CN112861699A - Method for estimating height of human body in any posture based on single depth image and multi-stage neural network - Google Patents
- Publication number
- CN112861699A (application CN202110150551.6A)
- Authority
- CN
- China
- Prior art keywords
- image
- neural network
- module
- height
- depth image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Human Computer Interaction (AREA)
- Image Processing (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method for estimating the height of a human body in any posture from a single depth image using a multi-stage neural network. A 4-stage neural network framework, combined with a "developing neural network" architecture and training method, accurately estimates the height of a person in any posture and at any position. The method has three main parts. Human body segmentation: a neural network segments the human torso image from a single depth image, with high-frequency edge detail supplied as an additional input so that the segmentation boundary is finer. Intermediate representation: the torso image is further divided into four parts with little bending (head, upper body, thigh and calf), and the length of each part is predicted separately, which exploits the strength of convolutional neural networks in local perception. Model design: a network architecture and training method for the developing neural network are designed and combined with a hybrid pooling strategy to estimate the height, which further improves network performance and reduces training time. The average height prediction accuracy of the invention is above 99.1%.
Description
Technical Field
The invention relates to the fields of computer vision and machine learning, and provides a method for estimating the height of a human body in any posture from a single depth image using a multi-stage neural network. The techniques involved are depth image segmentation, image feature extraction, image semantic information extraction, neural network architecture design and neural network training. With these techniques, a 4-stage neural network framework is constructed and a training method called the developing neural network is proposed, so that human height can be accurately estimated from a single depth image.
Background
In fields such as three-dimensional human body reconstruction, virtual reality, medical health and clothing design, height is an essential piece of data. The conventional method usually requires the person being measured to stand upright while an operator reads the value from a meter rule or height gauge. This consumes considerable time and labor, requires the active cooperation of the person being measured, and limits the usable scenarios. In particular, conventional measurement is impossible when no meter rule or height gauge is available, or when the person cannot stand upright because of injury or disease.
In recent years, some methods have tried to measure height without contact by extracting information from images or videos. They solve some of the problems of the conventional method but still have limitations. Most can only handle simple postures such as standing and walking, or require the subject to stand at a specified position, which greatly restricts where they can be used. Some require the head and feet to be marked manually, so they cannot be fully automated and need a large amount of manual labeling. Others require multiple photographs or multiple devices, which adds time and cost.
To address these problems, the present method needs only one depth image and outputs a reliable result within milliseconds, so that height is estimated from the image fully automatically and a great deal of labor and time is saved. The subject is not required to stay at a fixed position in the image: he or she may adopt any posture, such as standing, walking, bending or sitting, at any position within the acquisition range of the depth camera. The method therefore has good adaptability and robustness.
Disclosure of Invention
(I) Technical problem to be solved
The invention aims to provide a method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network, which accurately estimates the height of a person at any position and in any posture by fully automatic, non-contact measurement.
(II) Technical scheme
1. The invention provides a method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network, comprising the following steps:
S1, data acquisition: a human height data set of 2136 depth images is collected with a depth camera. The subject may be at any position within the acquisition range of the depth camera and in any posture, including non-upright postures such as sitting, bending and walking. The height of each volunteer and the lengths of four approximately rigid parts (head, upper body, thigh and calf) are measured and recorded. For each depth image, the ground-truth torso image and the ground-truth body part image are annotated.
S2, edge image extraction: an edge detection operator extracts the high-frequency edge information from the original depth image.
S3, human torso segmentation: a convolutional neural network f1(x) is designed to extract the human torso image from the original depth image and the edge image.
S4, body part recognition: a convolutional neural network f2(x) is designed to further segment the torso image into four approximately rigid body parts (head, upper body, thigh and calf), giving the body part image.
S5, body part length prediction: a convolutional neural network f3(x) is designed to predict the lengths of the four approximately rigid body parts (head, upper body, thigh and calf) from the body part image and the original depth image.
S6, human height prediction: a convolutional neural network f4(x) is designed to predict the height from the original depth image, the body part image and the body part lengths. Different pooling strategies are adopted according to the characteristics of the different input data, and a skip connection structure feeds the original input data into each convolutional layer.
S7, developing neural network design: based on the characteristics of the height prediction task and the architecture of the convolutional neural network f4(x), a training method is designed in which the network structure changes with the number of iterations, so that the fitted state is repeatedly disrupted until the neural network finds a global optimum.
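The data flow of steps S2 to S6 can be sketched as a simple chain of calls. This is only an illustration of how the stages feed each other: the function name `estimate_height`, the lambda stand-ins and the toy length values are assumptions made for the example, and in the method itself each stage is a trained convolutional neural network.

```python
def estimate_height(depth, edge_fn, f1, f2, f3, f4):
    """Chain stages S2-S6; every stage also receives the original depth image."""
    edge = edge_fn(depth)               # S2: edge image from the depth image
    torso = f1(depth, edge)             # S3: torso image
    parts = f2(depth, torso)            # S4: body part image
    lengths = f3(depth, parts)          # S5: head, upper body, thigh, calf lengths
    return f4(depth, parts, lengths)    # S6: estimated height


# Hypothetical stand-ins so the sketch runs; the numbers are made-up toy values.
height = estimate_height(
    "X",
    edge_fn=lambda x: "E",
    f1=lambda x, e: "T'",
    f2=lambda x, t: "L'",
    f3=lambda x, l: [0.23, 0.60, 0.45, 0.42],
    f4=lambda x, l, h: sum(h),  # placeholder only; the real f4 is a CNN, not a sum
)
```

Passing the original depth image to every stage mirrors the stated goal of limiting error accumulation between stages.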
2. The method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network of claim 1, wherein only a single depth image is used and the height is predicted without contact. The input data used in every step of the invention are either the original depth image or intermediate representations derived from it. Only a low-cost consumer-grade depth camera is needed to collect the original depth image, which reduces equipment cost and makes the method easy to use and to popularize. The depth image is acquired with infrared technology, so performance is unaffected even without an external light source, and the method can be applied in fields such as night-time security monitoring. Height is predicted from the image in a non-contact, fully automatic manner: only one image needs to be taken and the measurer does not need to touch the subject, so the method suits height measurement during routine epidemic prevention and control, physical examinations, boarding of transport, ticket sales and similar situations. The depth image contains no facial features or clothing texture, so using only a single depth image prevents the neural network from learning the subject's identity features, which would otherwise interfere with height estimation.
3. The method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network of claim 1, wherein a multi-stage neural network extracts the features relevant to height prediction and converts the height estimation problem into several small local problems. The torso image is obtained by the convolutional neural network f1(x), the body part image by the convolutional neural network f2(x), and the body part lengths by the convolutional neural network f3(x). Height estimation is thus decomposed into four approximately rigid parts (head, upper body, thigh and calf) that are predicted separately, and the four results are then combined into the height of the person being measured. This has three benefits: predicting the length of a rigid part is an easier problem than predicting height directly; the lengths of the four body parts and the topological relationships between them hint at the posture of the body and therefore provide useful cues for height estimation; and dividing the problem into four small local problems makes full use of the strength of convolutional neural networks in local perception.
4. The method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network of claim 1, further comprising: a convolutional neural network f1(x) for segmenting the human torso image from the single depth image. In height measurement, height is defined as the distance between the top of the head and the sole of the foot when the body is upright, so locating these two points is very important. Existing human body segmentation methods are often inaccurate at the body edge, which affects the selection of the head-top and sole points. The present method uses the Canny operator to extract edge information from the depth image and thereby sharpens the edges of the segmented human body image.
$$E = \mathrm{Canny}(X)$$
X is the original depth image acquired by the camera and E is the corresponding extracted edge image.
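To make the idea of an edge image concrete, here is a minimal sketch that thresholds the gradient magnitude of a depth image. It is a simplified stand-in, not the Canny operator named above: it omits Gaussian smoothing, non-maximum suppression and hysteresis thresholding. The function name `simple_edge_map`, the threshold and the toy depth values are assumptions for illustration; in practice a library routine such as OpenCV's `cv2.Canny` would typically be applied to an 8-bit version of the depth image.

```python
import numpy as np


def simple_edge_map(depth: np.ndarray, threshold: float) -> np.ndarray:
    """Binary edge map from the depth gradient magnitude (not full Canny)."""
    # np.gradient returns the derivative along rows first, then along columns.
    grad_rows, grad_cols = np.gradient(depth.astype(np.float64))
    magnitude = np.hypot(grad_rows, grad_cols)
    return (magnitude > threshold).astype(np.uint8)


# Toy depth image: background at 3.0 m with a 4x4 "body" block at 1.5 m.
depth = np.full((8, 8), 3.0)
depth[2:6, 2:6] = 1.5
edges = simple_edge_map(depth, threshold=0.5)
```

Pixels on the depth discontinuity around the block are marked 1, while the flat interior of the block and the flat background stay 0.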
f1(x) comprises five downsampling modules and five upsampling modules. In the five downsampling modules, which extract features, modules 1 and 2 each contain 2 convolutional layers with activation functions, and modules 3, 4 and 5 each contain 3 convolutional layers with activation functions. The five upsampling modules, which generate the output image, mirror them: modules 1, 2 and 3 each contain 3 convolutional layers with activation functions, and modules 4 and 5 each contain 2. The method takes the original depth image X and the edge image E as input and passes them through f1(x) to obtain the predicted torso segmentation image T′.
$$T' = f_1(X, E)$$
The loss Loss1 of f1(x) is the mean of the squared pixel-by-pixel differences between the predicted torso image and the ground-truth torso image, and the optimizer is Adam.
$$\mathrm{Loss}_1 = \frac{1}{N} \sum_{i=1}^{N} \left(T'_i - T_i\right)^2$$
N is the total number of pixels in the image, i is the loop variable indexing a pixel in the image, and T is the ground-truth torso image. Feeding the high-frequency information of the body edge into the neural network markedly improves the accuracy of the edges in the torso segmentation image.
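The loss described for Loss1, the mean over all N pixels of the squared difference between the predicted and ground-truth images, can be written in a few lines. The sketch below uses NumPy and tiny made-up 2 × 2 masks purely for illustration; the function name `pixel_mse` is an assumption, and training code would use the equivalent loss of whichever deep learning framework is chosen.

```python
import numpy as np


def pixel_mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean of the squared pixel-by-pixel differences between two images."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    return float(np.mean(diff ** 2))


# Toy 2x2 masks: one of the four pixels differs by 1.
T_true = np.array([[0, 1], [1, 1]])
T_pred = np.array([[0, 1], [0, 1]])
loss1 = pixel_mse(T_pred, T_true)
```

The same function applies unchanged to the body part image loss, which is stated in the same form for f2(x).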
5. The method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network of claim 1, further comprising: a convolutional neural network f2(x) for further dividing the torso image into four approximately rigid parts (head, upper body, thigh and calf) to obtain the body part image. f2(x) has the same network structure as f1(x): 5 downsampling modules and 5 upsampling modules, each containing several convolutional layers with activation functions. The predicted torso segmentation image T′ is input to f2(x) to obtain the predicted body part segmentation image L′. Because the torso segmentation image T′ produced by f1(x) contains errors, the original depth image X is input to f2(x) at the same time to avoid error accumulation.
$$L' = f_2(X, T')$$
The loss Loss2 of f2(x) is the mean of the squared pixel-by-pixel differences between the predicted body part image and the ground-truth body part image.
$$\mathrm{Loss}_2 = \frac{1}{N} \sum_{i=1}^{N} \left(L'_i - L_i\right)^2$$
N is the total number of pixels in the image, i is the loop variable indexing a pixel in the image, and L is the ground-truth body part image. The optimizer is Adam, which minimizes the error between the predicted and ground-truth images until the network converges stably.
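As a quick arithmetic check on the layout shared by f1(x) and f2(x): one half of the network has five modules with 2, 2, 3, 3 and 3 convolutional layers, and the other half mirrors it. The tally below (variable names are illustrative) reproduces the 10 convolution modules and 26 convolutional layers quoted in the figure descriptions.

```python
# Convolutional layers per module in one five-module half of f1(x) / f2(x).
HALF = [2, 2, 3, 3, 3]
# The other half is described as symmetric, i.e. the same counts in reverse order.
MIRRORED_HALF = list(reversed(HALF))

total_modules = len(HALF) + len(MIRRORED_HALF)
total_conv_layers = sum(HALF) + sum(MIRRORED_HALF)
```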
6. The method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network of claim 1, further comprising: a convolutional neural network f3(x) for predicting the lengths of the four approximately rigid parts: head, upper body, thigh and calf. f3(x) comprises 13 convolutional layers, 6 fully connected layers and the corresponding activation functions. The 13 convolutional layers form 5 convolution modules: modules 1 and 2 each contain 2 convolutional layers with activation functions, and modules 3, 4 and 5 each contain 3 convolutional layers with activation functions. The 6 fully connected layers, with 4096, 4096, 1000, 256, 64 and 4 nodes respectively, output a 1 × 4 vector representing the lengths of the 4 body parts. The predicted body part image L′ is input to f3(x); to reduce accumulated error, the original depth image X is input to the network as well, and the network outputs length estimates for the head, upper body, thigh and calf.
$$[H_{head},\ H_{upperbody},\ H_{thigh},\ H_{calf}]_{1 \times 4} = f_3(X, L')$$
$H_{head}$ is the head length, $H_{upperbody}$ is the upper body length, $H_{thigh}$ is the thigh length, and $H_{calf}$ is the calf length.
The loss Loss3 is the sum of the squared differences between the four predicted part lengths and their true values.
$$\mathrm{Loss}_3 = |H_{head} - TH_{head}|^2 + |H_{upperbody} - TH_{upperbody}|^2 + |H_{thigh} - TH_{thigh}|^2 + |H_{calf} - TH_{calf}|^2$$
$TH_{head}$, $TH_{upperbody}$, $TH_{thigh}$ and $TH_{calf}$ are the true lengths of the head, upper body, thigh and calf, respectively. The optimizer is Adam, and the lengths of the 4 approximately rigid parts are predicted by minimizing the loss.
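The part-length loss is a plain sum of four squared errors, one per body part. The sketch below evaluates it with NumPy on made-up lengths in centimetres; the function name `part_length_loss`, the part ordering and the numbers are assumptions for illustration only.

```python
import numpy as np

PARTS = ("head", "upper_body", "thigh", "calf")  # assumed ordering of the 1x4 vector


def part_length_loss(pred, true) -> float:
    """Sum of squared differences between predicted and true part lengths."""
    pred = np.asarray(pred, dtype=np.float64)
    true = np.asarray(true, dtype=np.float64)
    if pred.shape != (len(PARTS),) or true.shape != (len(PARTS),):
        raise ValueError("expected one length per body part")
    return float(np.sum((pred - true) ** 2))


# Toy values in centimetres (illustrative, not measured data).
predicted = [24.0, 60.0, 45.0, 42.0]
measured = [23.0, 61.0, 45.0, 40.0]
loss3 = part_length_loss(predicted, measured)  # 1 + 1 + 0 + 4
```

Because the errors are summed rather than averaged, every part contributes on the same scale regardless of how long it is.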
7. The method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network of claim 1, further comprising: a convolutional neural network f4(x) for estimating the height from the body part image, the body part lengths and the original depth image. f4(x) comprises 13 convolutional layers, 7 fully connected layers and the corresponding activation functions. The 13 convolutional layers form 5 convolution modules: modules 1 and 2 each contain 2 convolutional layers with activation functions, and modules 3, 4 and 5 each contain 3 convolutional layers with activation functions. The output of the last convolutional layer is flattened into a one-dimensional vector, and the 7 fully connected layers, whose node counts are given as 4096, 1000, 256, 64, 16 and 1, output the estimated height. To reduce accumulated error and improve prediction accuracy, the original depth image X and the predicted body part segmentation image L′ are both input to f4(x). The invention also adopts a skip connection structure with different pooling strategies chosen according to the characteristics of the input data: average pooling for the depth image and max pooling for the body part segmentation image, so that the input data at different scales are fed directly into each convolutional layer.
$$H_{human} = f_4(X, L', H_{4\text{-}part})$$
$H_{human}$ is the estimated height, and $H_{4\text{-}part}$ is the vector of the 4 approximately rigid body part lengths predicted in step S5.
The loss Loss4 of f4(x) is the square of the difference between the estimated height and the true height.
$$\mathrm{Loss}_4 = |H_{human} - TH_{human}|^2$$
$TH_{human}$ is the true height. The optimizer is Adam, and the height is estimated by minimizing the loss.
8. The method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network of claim 7, further comprising: a skip connection structure with a hybrid pooling strategy. f4(x) has 13 convolutional layers. Adding skip connections to f4(x) alleviates the vanishing-gradient problem that arises in deep networks, helps gradients propagate backward, and speeds up training. In steps S3, S4 and S5, three neural networks are used to obtain the length estimates of the head, upper body, thigh and calf, which are then input to f4(x). Each network's estimate deviates from the true value, and feeding that error into the next stage amplifies it. The skip connection structure therefore inputs the original depth image X and the body part image L directly into f4(x) to minimize the accumulated error.
The original depth image X and the body part image L have different characteristics. X contains many noise points, which appear in the depth image as extreme values; if max pooling alone were used, the noise points would be retained and depth information lost, which would interfere with learning and cause larger errors. L is smoother; if average pooling were used, the gradient at body part boundaries would be weakened, introducing errors. Therefore average pooling is applied to X, which smooths the noise of the depth image and supplies the network with an accurate depth image at each scale, while max pooling is applied to L, which preserves the segmentation boundaries and retains gradient information as the image size is reduced.
$$L_{next} = \mathrm{Maxpool}(L_{now})$$
$$X_{next} = \mathrm{Avgpool}(X_{now})$$
$L_{now}$ is the body part image at the current scale and $L_{next}$ is the body part image at the next scale. $X_{now}$ is the depth image at the current scale and $X_{next}$ is the depth image at the next scale.
Feeding the original information into each convolution module through the skip connection structure with hybrid pooling keeps the original image as undistorted as possible at every scale and improves the accuracy of the network's prediction.
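The two pooling rules can be illustrated with a small NumPy sketch that builds the multi-scale inputs for the skip connections. The 2 × 2 window with stride 2, the function names and the toy images are assumptions; the text specifies only which pooling type is applied to which input.

```python
import numpy as np


def pool2x2(img: np.ndarray, reduce_fn) -> np.ndarray:
    """Non-overlapping 2x2 pooling; odd trailing rows or columns are dropped."""
    h, w = img.shape
    blocks = img[: h - h % 2, : w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return reduce_fn(blocks, axis=(1, 3))


def hybrid_pyramid(depth: np.ndarray, parts: np.ndarray, levels: int):
    """Average-pool the depth image and max-pool the part image at each scale."""
    xs, ls = [depth], [parts]
    for _ in range(levels):
        xs.append(pool2x2(xs[-1], np.mean))  # X_next = Avgpool(X_now)
        ls.append(pool2x2(ls[-1], np.max))   # L_next = Maxpool(L_now)
    return xs, ls


# Toy inputs: a flat 2.0 m depth image with one noise spike, and a one-pixel part label.
depth = np.full((4, 4), 2.0)
depth[0, 0] = 10.0
parts = np.zeros((4, 4), dtype=int)
parts[1, 1] = 1
xs, ls = hybrid_pyramid(depth, parts, levels=2)
```

In this toy case the spike of 10.0 is averaged down to 4.0 at the next scale, whereas max pooling would have carried it through unchanged, and the single labeled pixel survives max pooling where averaging would have diluted it to 0.25.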
9. The method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network of claim 7, further comprising: the architecture and training method of the developing neural network. The invention proposes the developing neural network and applies it to the convolutional neural network f4(x) to improve network accuracy, reduce training time and prevent overfitting.
The main idea of the developing neural network is that, during training, when the network tends to converge, the architecture of its convolutional part is adjusted so that the network jumps out of the local minimum and searches for a global optimum.
The specific method is as follows. During training of f4(x), while the number of iterations is less than $4 \times 10^4$, the network is pre-trained and all 13 convolutional layers in the 5 modules are active. When the number of iterations equals $4 \times 10^4$, the pre-trained model is saved. When the number of iterations is greater than $4 \times 10^4$ and less than $6 \times 10^4$, each convolution module of f4(x) keeps only its first convolutional layer, so 5 convolutional layers in total are trained. When the number of iterations is greater than $6 \times 10^4$ and less than $8 \times 10^4$, modules 1 and 2 each restore one convolutional layer from the pre-trained model, so 7 convolutional layers participate in training. When the number of iterations is greater than $8 \times 10^4$ and less than $1 \times 10^5$, modules 3, 4 and 5 each restore one convolutional layer from the pre-trained model, so 10 convolutional layers participate in training. When the number of iterations is greater than $1 \times 10^5$, modules 3, 4 and 5 each restore one more convolutional layer from the pre-trained model, which brings back the original 13 convolutional layers.
The developing neural network repeatedly fits and then disrupts the fitted state until the network finds a global optimum. Because only some of the convolutional layers are trained while the number of iterations is between $4 \times 10^4$ and $1 \times 10^5$, training time is reduced. When the network tends to converge, adding and removing convolutional layers prevents overfitting and lets the network leave the local optimum to search for the global optimum, which effectively improves the accuracy of height estimation.
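The iteration schedule above can be summarized as a lookup from the iteration count to the number of active convolutional layers in each of the five modules of f4(x). The sketch below is one reading of that schedule: the function name is illustrative, and because the text leaves the exact boundary iterations (6 × 10^4, 8 × 10^4, 1 × 10^5) unassigned, placing each boundary in the earlier phase is an assumption.

```python
FULL = (2, 2, 3, 3, 3)  # conv layers per module when all 13 layers are active


def active_conv_layers(iteration: int) -> tuple:
    """Active conv layers per module of f4(x) at a given training iteration."""
    if iteration <= 40_000:
        return FULL             # pre-training; checkpoint saved at 40,000
    if iteration <= 60_000:
        return (1, 1, 1, 1, 1)  # only the first layer of each module: 5 layers
    if iteration <= 80_000:
        return (2, 2, 1, 1, 1)  # modules 1 and 2 restore one layer each: 7 layers
    if iteration <= 100_000:
        return (2, 2, 2, 2, 2)  # modules 3 to 5 restore one layer each: 10 layers
    return FULL                 # modules 3 to 5 restore their last layer: 13 layers


layer_totals = [sum(active_conv_layers(i)) for i in (1, 50_000, 70_000, 90_000, 120_000)]
```

A training loop would consult such a function at each phase change, freeze or drop the inactive layers, and reload restored layers from the checkpoint saved at 4 × 10^4 iterations.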
Drawings
FIG. 1 is a schematic diagram of the framework of the method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network according to an embodiment of the present invention. An edge image is first extracted from the acquired depth image and input, together with the depth image, to the neural network f1(x) to obtain the torso image; the torso image and the depth image are then input to the neural network f2(x) to obtain the body part image; the body part image and the depth image are input to the neural network f3(x) to obtain the predicted body part lengths; finally, the body part lengths, the body part image and the depth image are input to the neural network f4(x), which outputs the estimated height.
FIG. 2 is the network structure diagram of the convolutional neural network f1(x) according to an embodiment of the present invention. The network takes the depth image and the edge image as input and outputs the torso image. It contains 10 convolution modules and 26 convolutional layers. The first 5 convolution modules extract features, and the last 5 convolution modules generate the torso image.
FIG. 3 is the network structure diagram of the convolutional neural network f2(x) according to an embodiment of the present invention. The network takes the depth image and the torso image as input and outputs the body part image. It contains 10 convolution modules and 26 convolutional layers. The first 5 convolution modules extract features, and the last 5 convolution modules generate the body part image.
FIG. 4 is the network structure diagram of the convolutional neural network f3(x) according to an embodiment of the present invention. The network takes the depth image and the body part image as input and outputs the estimated body part lengths. It contains 5 convolution modules, 13 convolutional layers and 6 fully connected layers. The 5 convolution modules extract features, and the 6 fully connected layers then produce the length estimates of the 4 body parts: head, upper body, thigh and calf.
FIG. 5 is the network structure diagram of the convolutional neural network f4(x) according to an embodiment of the present invention. The network takes the depth image, the body part lengths and the body part image as input and outputs the height. It contains 5 convolution modules, 13 convolutional layers and 7 fully connected layers. The 5 convolution modules extract features, and the 7 fully connected layers then produce the estimated height.
FIG. 6 shows how the network structure of the convolutional neural network f4(x) changes with the number of iterations according to an embodiment of the present invention.
FIG. 7 shows examples of experimental results according to an embodiment of the present invention.
Detailed Description
The invention provides a method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network. It achieves accurate height estimation for a person at any position and in any posture by means of a 4-stage neural network framework and a developing neural network framework. First, a large number of depth images are collected from multiple subjects, the height of each subject and the length of each body part are recorded, and torso images and body part images are annotated as ground truth to build a data set. Next, the network and model are designed: height estimation is converted into length prediction for four approximately rigid body parts, and a 4-stage convolutional neural network is designed to carry out this process. Finally, the network architecture and training method are improved with an architecture called the developing neural network, which reduces training time, prevents overfitting and further improves the accuracy of network prediction.
In order that the objects, technical solutions and advantages of the present invention will become more apparent, the present invention will be further described in detail with reference to the accompanying drawings in conjunction with the following specific embodiments.
The invention provides a method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network; FIG. 1 is a schematic diagram of the framework of the method. As shown in FIG. 1, the invention achieves accurate human height estimation mainly through a 4-stage neural network framework comprising the neural networks f1(x), f2(x), f3(x) and f4(x). The neural network f1(x) estimates the torso image of the subject from the depth image and the edge image; the neural network f2(x) estimates, from the depth image and the torso image output by f1(x), a segmentation image of four approximately rigid body parts of the subject: the head, the upper body, the thigh and the calf; the neural network f3(x) estimates the lengths of these 4 body parts from the depth image and the body part image output by f2(x); the neural network f4(x) estimates the human height from the depth image, the body part image and the body part lengths output by f3(x). The invention also designs a hybrid pooling strategy and a developing neural network and applies them to f4(x), further improving the accuracy of height estimation.
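The data flow of the four stages can be sketched as a plain function composition. This is only an illustration of which inputs each stage receives according to the description; the function names `estimate_height`, `extract_edges` and `make_stub`, the string placeholders and the value 170.0 are hypothetical and stand in for the trained networks and real images.

```python
def estimate_height(depth, extract_edges, f1, f2, f3, f4):
    """Chain the four stages; the original depth image is passed to every stage."""
    edge = extract_edges(depth)        # edge image derived from the depth image
    torso = f1(depth, edge)            # stage 1: torso image
    parts = f2(depth, torso)           # stage 2: body part image
    lengths = f3(depth, parts)         # stage 3: head, upper body, thigh, calf lengths
    return f4(depth, parts, lengths)   # stage 4: estimated height


# Hypothetical stand-ins that only record the order of calls and their inputs.
trace = []


def make_stub(name, output):
    def stub(*inputs):
        trace.append((name, inputs))
        return output
    return stub


height = estimate_height(
    "X",
    make_stub("edges", "E"),
    make_stub("f1", "T"),
    make_stub("f2", "L"),
    make_stub("f3", "H4"),
    make_stub("f4", 170.0),  # placeholder value, not a measured result
)
print(height)
print([name for name, _ in trace])
```

Passing the depth image to every stage, rather than only to the first, mirrors the stated goal of limiting the error that accumulates from one stage to the next.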
With reference to fig. 2 to 6, specific technical steps of the above-mentioned framework are described:
S1. Segmenting the human torso image from the depth image and the edge image.
First, an edge image is obtained from the depth image using the Canny operator. Specifically, a Gaussian filter is selected to smooth the image, and non-maximum suppression is then applied to obtain the edge information of the depth image. The depth image and the edge image are then input into the convolutional neural network f1(x). Five downsampling modules extract features of the input images and represent them as a low-dimensional feature vector; modules one and two each contain 2 convolutional layers, and modules three, four and five each contain 3 convolutional layers. The feature vector is then fed into 5 upsampling modules, in which modules six, seven and eight each contain 3 convolutional layers and modules nine and ten each contain 2 convolutional layers; the 5 upsampling modules output the corresponding human torso image from the input feature vector.
The network structure of the convolutional neural network f1(x) is shown in FIG. 2.
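As a quick consistency check, the per-module layer counts listed for f1(x) can be tallied against the totals given for FIG. 2 (10 modules, 26 convolutional layers). The sketch below only does this arithmetic; the names `DOWN_MODULES`, `UP_MODULES` and `summarize` are illustrative and not part of the described method.

```python
# Convolutional layers per module, as listed in the text for f1(x).
DOWN_MODULES = [2, 2, 3, 3, 3]  # modules one to five (feature extraction)
UP_MODULES = [3, 3, 3, 2, 2]    # modules six to ten (image generation)


def summarize(down, up):
    """Return module and layer totals for an encoder/decoder layout."""
    return {
        "modules": len(down) + len(up),
        "down_layers": sum(down),
        "up_layers": sum(up),
        "total_layers": sum(down) + sum(up),
    }


summary = summarize(DOWN_MODULES, UP_MODULES)
print(summary)
```

The same tally applies to f2(x), which is described with the identical module layout.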
In this step, the loss value Loss1 quantifies the error between the image estimated by the neural network f1(x) and the real image; specifically, it is the mean of the squared pixel-wise differences between the predicted torso image and the real torso image:
$$\mathrm{Loss}_1=\frac{1}{N}\sum_{i=1}^{N}\left|f_1(X,E)_i-T_i\right|^2$$
N is the total number of pixels in the image, i is a loop variable indexing a pixel in the image, X is the depth image, E is the edge image, and T is the real torso image. The initial learning rate is set to 0.0001, with a decay factor of 0.8 applied every 5 training rounds. The Adam optimizer is used to compute the update step size, and the estimated human torso image is obtained by minimizing Loss1.
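A minimal sketch of the two quantities named in this step, in plain Python. It assumes that a "round" means one training epoch and that the reduction of 0.8 means the learning rate is multiplied by 0.8 every 5 rounds; neither reading is confirmed by the text. The function names `mse_loss` and `step_decay` are illustrative.

```python
def mse_loss(pred, target):
    """Mean of squared pixel-wise differences between two equally sized 2D images."""
    total = 0.0
    count = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            total += (p - t) ** 2
            count += 1
    return total / count


def step_decay(epoch, base_lr=1e-4, factor=0.8, every=5):
    """Learning rate after `epoch` rounds, assuming multiplicative step decay."""
    return base_lr * factor ** (epoch // every)


# Tiny binary masks used purely for illustration.
predicted = [[0, 1], [1, 0]]
real = [[0, 0], [1, 1]]
print(mse_loss(predicted, real))
print(step_decay(0), step_decay(5), step_decay(12))
```

With `factor=0.5` and `every=50`, the same helper would describe the schedule stated for f3(x) and f4(x), under the same assumptions.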
S2. Obtaining the body part image from the depth image and the torso image.
This step further segments the torso image by body part to obtain the body part image, and also takes the depth image as input to eliminate accumulated error, thereby converting human height estimation into estimation of the lengths of four approximately rigid parts: the head, the upper body, the thigh and the calf. Specifically, the depth image and the torso image are input into the convolutional neural network f2(x). Five downsampling modules extract features of the input images and represent them as a low-dimensional feature vector; modules one and two each contain 2 convolutional layers, and modules three, four and five each contain 3 convolutional layers. The feature vector is then fed into 5 upsampling modules, in which modules six, seven and eight each contain 3 convolutional layers and modules nine and ten each contain 2 convolutional layers; the 5 upsampling modules output the corresponding body part image from the input feature vector.
The network structure of the convolutional neural network f2(x) is shown in FIG. 3.
In this step, the loss value Loss2 quantifies the error between the image estimated by the neural network f2(x) and the real image; specifically, it is the mean of the squared pixel-wise differences between the predicted body part image and the real body part image:
$$\mathrm{Loss}_2=\frac{1}{N}\sum_{i=1}^{N}\left|f_2(X,T)_i-L_i\right|^2$$
N is the total number of pixels in the image, i is a loop variable indexing a pixel in the image, X is the depth image, T is the human torso image, and L is the real body part image. The initial learning rate is set to 0.0001, with a decay factor of 0.8 applied every 5 training rounds. The Adam optimizer is used to compute the update step size, and the estimated body part image is obtained by minimizing Loss2.
S3. Obtaining the body part lengths from the depth image and the body part image.
This step estimates the lengths of four approximately rigid body parts, namely the head, the upper body, the thigh and the calf, from the depth image and the body part image. Specifically, the depth image and the body part image are input into the convolutional neural network f3(x). Five downsampling modules first extract image features; modules one and two each contain 2 convolutional layers, and modules three, four and five each contain 3 convolutional layers. The output features of the last layer are then flattened into a 25088-dimensional vector and passed through 6 fully connected layers in sequence, which output a four-dimensional vector representing the lengths of the head, the upper body, the thigh and the calf. The process can be expressed as:
$$[H_{head}\;H_{upperbody}\;H_{thigh}\;H_{calf}]_{1\times 4}=f_3(X,L)$$
$H_{head}$ is the head length, $H_{upperbody}$ is the upper body length, $H_{thigh}$ is the thigh length, $H_{calf}$ is the calf length, X is the depth image, and L is the body part image.
The network structure of the convolutional neural network f3(x) is shown in FIG. 4, with the numbers of input and output nodes of each fully connected layer marked below the layer.
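The 25088-dimensional flattened vector is consistent with a common layout, although the text does not state the input resolution or channel widths. The sketch below assumes a 224 × 224 input, one 2 × 2 stride-2 pooling step after each of the 5 modules, and 512 channels in the last module; all three values are assumptions chosen for illustration, and `flattened_size` is a hypothetical helper.

```python
def flattened_size(height, width, channels, num_pools, pool=2):
    """Size of the flattened feature map after repeated stride-`pool` pooling."""
    for _ in range(num_pools):
        height //= pool
        width //= pool
    return height * width * channels


# Assumed values: 224 x 224 input, 5 pooling steps, 512 final channels.
size = flattened_size(224, 224, 512, 5)
print(size)
```

Under these assumptions the spatial size shrinks from 224 to 7 on each side, and 7 × 7 × 512 gives the 25088 entries mentioned in the description.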
In this step, the loss value Loss3 quantifies the error between the lengths estimated by the neural network f3(x) and the real lengths; specifically, it is the sum of the squared differences between the estimated and real lengths of the four parts:
$$\mathrm{Loss}_3=|H_{head}-TH_{head}|^2+|H_{upperbody}-TH_{upperbody}|^2+|H_{thigh}-TH_{thigh}|^2+|H_{calf}-TH_{calf}|^2$$
$TH_{head}$, $TH_{upperbody}$, $TH_{thigh}$ and $TH_{calf}$ represent the real lengths of the head, the upper body, the thigh and the calf, respectively. The initial learning rate is set to 0.0001, with a decay factor of 0.5 applied every 50 training rounds. The Adam optimizer is used to compute the update step size, and the estimated lengths of the 4 body parts are obtained by minimizing Loss3.
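The length loss can be written out directly from its definition. In the sketch below the dictionary keys and the numeric values are made up for illustration (the text gives no example measurements), and `loss3` is an illustrative name.

```python
PARTS = ("head", "upper_body", "thigh", "calf")


def loss3(predicted, real):
    """Sum of squared differences between predicted and real part lengths."""
    return sum((predicted[part] - real[part]) ** 2 for part in PARTS)


# Hypothetical lengths in centimetres, for illustration only.
predicted = {"head": 24, "upper_body": 60, "thigh": 45, "calf": 42}
real = {"head": 23, "upper_body": 62, "thigh": 45, "calf": 40}
print(loss3(predicted, real))
```

Because the four terms are summed without weights, an error of a given size counts the same whichever part it occurs in.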
S4. Obtaining the human height from the depth image, the body part lengths and the body part image.
This step estimates the human height by combining the body part lengths with the body part image, and also takes the original depth image as input to eliminate accumulated error. Specifically, the depth image, the body part image and the body part lengths are input into the convolutional neural network f4(x). Five downsampling modules first extract the features relevant to height estimation; modules one and two each contain 2 convolutional layers, and modules three, four and five each contain 3 convolutional layers. A skip connection is added before each convolutional layer, and the depth image and the body part segmentation image at the corresponding scale are input before each convolutional layer; this alleviates the vanishing-gradient problem in deep networks, aids backpropagation of the gradient and speeds up training. Different pooling strategies are selected according to the characteristics of each input image, so as to avoid introducing noise or losing the gradients of the original image: average pooling is used for the depth image and max pooling for the body part image. Finally, the output features of the last layer are flattened into a 25088-dimensional vector and passed through 7 fully connected layers in sequence, which output a one-dimensional vector representing the human height. The process can be expressed as:
$$[H_{human}]=f_4(X,L,H_{4\text{-}part})$$
$H_{human}$ is the estimated human height, and $H_{4\text{-}part}$ denotes the predicted lengths of the 4 approximately rigid body parts from step S3.
The network structure of the convolutional neural network f4(x) is shown in FIG. 5, with the numbers of input and output nodes of each fully connected layer marked below the layer.
In this step, the loss value Loss4 quantifies the error between the height estimated by the neural network f4(x) and the real height; specifically, it is the square of the difference between the estimated height and the real height:
$$\mathrm{Loss}_4=|H_{human}-TH_{human}|^2$$
The initial learning rate is set to 0.0001, with a decay factor of 0.5 applied every 50 training rounds. The Adam optimizer is used to compute the update step size, and the height estimate is obtained by minimizing Loss4.
S5. Optimizing the convolutional neural network f4(x) using the developing neural network framework.
Using the convolutional neural network f4(x) of step S4 directly to estimate human height is prone to overfitting, which degrades prediction accuracy. This step applies the developing neural network to f4(x) to improve network accuracy, reduce training time and prevent overfitting. During training of f4(x): when the number of iterations is less than 4 × 10^4, the network is pre-trained with all 13 convolutional layers in its 5 modules active; when the number of iterations equals 4 × 10^4, the pre-trained model is saved; when the number of iterations is greater than 4 × 10^4 and less than 6 × 10^4, each module keeps only its first convolutional layer, so 5 convolutional layers are trained in total; when the number of iterations is greater than 6 × 10^4 and less than 8 × 10^4, modules one and two each restore one convolutional layer from the pre-trained model, giving 7 convolutional layers in the network; when the number of iterations is greater than 8 × 10^4 and less than 1 × 10^5, modules three, four and five each restore one convolutional layer from the pre-trained model, giving 10 convolutional layers; when the number of iterations is greater than 1 × 10^5, modules three, four and five each restore one more convolutional layer from the pre-trained model, so that all 13 original convolutional layers are restored. Fitting and breaking the fitted state are repeated in this way until the network finds a globally optimal solution.
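The iteration thresholds above can be read as a lookup from the iteration count to the number of active convolutional layers in each of the 5 modules. The sketch below encodes that reading. The text does not say which phase the exact boundary iterations (6 × 10^4, 8 × 10^4, 1 × 10^5) belong to, so assigning them to the later phase is an assumption, and `active_conv_layers` is an illustrative name.

```python
FULL = (2, 2, 3, 3, 3)  # convolutional layers per module in the complete f4(x)


def active_conv_layers(iteration):
    """Active convolutional layers per module at a given training iteration."""
    if iteration <= 40_000:
        return FULL               # pre-training; the model is saved at 40,000
    if iteration < 60_000:
        return (1, 1, 1, 1, 1)    # each module keeps only its first layer
    if iteration < 80_000:
        return (2, 2, 1, 1, 1)    # modules one and two restore one layer each
    if iteration < 100_000:
        return (2, 2, 2, 2, 2)    # modules three to five restore one layer each
    return FULL                   # modules three to five restore their last layer


for it in (10_000, 50_000, 70_000, 90_000, 120_000):
    layers = active_conv_layers(it)
    print(it, layers, sum(layers))
```

The totals produced by this lookup are 13, 5, 7, 10 and 13 layers, matching the counts stated for the successive phases.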
FIG. 6 is a schematic diagram of how the convolutional layers of the convolutional neural network f4(x) change with the number of iterations after the developing neural network framework is applied.
FIG. 7 is an example of the experimental results of the present invention.
Experiments prove that the technology can accurately estimate the height of the human body from a single depth image.
The technique of the present invention may be implemented in computer software, for example written in Python; the development environment may be, for example, Windows 10 with PyCharm version 2018.3.
The hardware support required is:
GPU: NVIDIA GeForce RTX 2080 Ti Founders Edition
The required deep learning environments are:
PyTorch 1.1.0
NVIDIA CUDA 10.1.120 driver
cuDNN-10.0-windows10-x64 v7.3.1.20
Experiments show that the invention can accurately predict the height of a human body from only a single depth image, and that the subject may be at any position in the image and in any posture.
The above-mentioned embodiments are intended to illustrate the objects, aspects and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only examples of the present invention, and are not intended to limit the present invention, and any modifications, equivalent substitutions, improvements and the like within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (9)
1. A method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network, capable of accurately measuring the height of a human body at any position and in any posture, with an average prediction accuracy of 99.1%. The method can be summarized as the following steps.
S1. Data acquisition: a human height data set containing 2136 depth images is collected with a depth camera. The subject may be at any position within the acquisition range of the depth camera and may adopt any posture, including non-upright postures such as sitting, bending and walking. The height of each volunteer and the lengths of four approximately rigid parts, namely the head, the upper body, the thigh and the calf, are measured and recorded. The ground-truth torso image and the ground-truth body part image are annotated for each depth image.
S2. Edge image extraction: high-frequency edge information is extracted from the original depth image with an edge detection operator.
S3. Torso segmentation: a convolutional neural network f1(x) is designed to extract the human torso image from the original depth image and the edge image.
S4. Body part recognition: a convolutional neural network f2(x) is designed to further segment the torso image into four approximately rigid body parts (head, upper body, thigh and calf), yielding the body part image.
S5. Body part length prediction: a convolutional neural network f3(x) is designed to predict the lengths of the four approximately rigid body parts, namely the head, the upper body, the thigh and the calf, from the body part image and the original depth image.
S6. Human height prediction: a convolutional neural network f4(x) is designed to predict the human height from the original depth image, the body part image and the body part lengths. Different pooling strategies are adopted according to the characteristics of the different input data, and the original input data is fed into each convolutional layer through a skip connection structure.
S7. Designing a developing neural network: based on the characteristics of the height prediction task and the architecture of the convolutional neural network f4(x), a training method in which the network structure changes with the number of iterations is designed, and the fitted state is repeatedly broken until the neural network finds a globally optimal solution.
2. The arbitrary-posture human height estimation method based on a single depth image and a multi-stage neural network of claim 1, wherein: the human height is predicted in a non-contact manner using only a single depth image. The input data used in all steps of the invention are the original depth image or intermediate representations derived from it. Only a low-cost consumer-grade depth camera is needed to collect depth data as the original depth image, which reduces equipment cost and makes the method easy to use, practical and easy to popularize. The depth image is obtained with infrared technology, and performance is unaffected even without an external light source, so the method can be applied in fields such as night-time security monitoring. The human height is predicted from the image in a non-contact, fully automatic manner; only one image needs to be captured and the measurer does not need direct contact with the subject, so the method is suitable for measuring human height during periods of routine epidemic prevention and control and in situations such as physical examinations, boarding transport and ticket sales. The depth image contains neither facial features nor clothing texture features, so using only a single depth image prevents the neural network from learning identity features of the subject that would interfere with height estimation.
3. The arbitrary-posture human height estimation method based on a single depth image and a multi-stage neural network of claim 1, wherein: a multi-stage neural network is used to extract the features relevant to height prediction, and the height estimation problem is converted into several small local problems. The torso image is obtained through the convolutional neural network f1(x), the body part image is obtained through the convolutional neural network f2(x), and the body part lengths are obtained through the convolutional neural network f3(x), so that human height estimation is decomposed into separate predictions for four approximately rigid parts (the head, the upper body, the thigh and the calf), whose results are then integrated into the height of the subject. The benefits are as follows: decomposing height prediction into predictions for four rigid parts yields an easier problem; the lengths of the four body parts and the topological relationships between them indicate the posture of the human body and thus provide useful cues for height estimation; and dividing the height prediction problem into four small local problems makes full use of the strong local perception of convolutional neural networks.
4. The arbitrary-posture human height estimation method based on a single depth image and a multi-stage neural network of claim 1, further comprising: a convolutional neural network f1(x) for segmenting the human torso image from the single depth image. In height measurement, height is defined as the distance between the top of the head and the sole of the foot when the body is upright, so locating the head vertex and the sole point is of great importance. Existing human body segmentation methods are often inaccurate at the body edges, which affects the selection of the head vertex and the sole point. The method uses the Canny operator to extract edge information from the depth image and thereby sharpens the edges of the torso segmentation image.
E=Canny(X)
X represents the original depth image acquired by the camera and E represents the corresponding edge image extracted.
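To make the notion of an edge image concrete, the sketch below thresholds the Sobel gradient magnitude of a small 2D array. This is deliberately not the Canny operator: it omits Gaussian smoothing, non-maximum suppression and hysteresis thresholding, and shows only the gradient step that such detectors build on. The function name `sobel_edges`, the toy depth values and the threshold are illustrative assumptions.

```python
def sobel_edges(img, threshold):
    """Binary edge map from Sobel gradient magnitude (border pixels left at 0)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal and vertical Sobel responses at pixel (y, x).
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                out[y][x] = 1
    return out


# Toy "depth image": a vertical step between a near and a far surface.
depth = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
edges = sobel_edges(depth, threshold=20)
for row in edges:
    print(row)
```

On this toy input only the two columns next to the depth step are marked, which is the kind of boundary information the edge image E supplies to f1(x).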
f1(x) comprises five downsampling modules and five upsampling modules. Among the downsampling modules, modules one and two each contain 2 convolutional layers with activation functions, and modules three, four and five each contain 3 convolutional layers with activation functions. The upsampling modules mirror the downsampling modules: modules six, seven and eight each contain 3 convolutional layers with activation functions, and modules nine and ten each contain 2 convolutional layers with activation functions. The method takes the original depth image X and the edge image E as input and obtains the predicted torso segmentation image T′ through the convolutional neural network f1(x).
$$T'=f_1(X,E)$$
The loss value Loss1 of the convolutional neural network f1(x) is the mean of the squared pixel-wise differences between the predicted torso image and the real torso image, and the optimizer is Adam.
$$\mathrm{Loss}_1=\frac{1}{N}\sum_{i=1}^{N}\left|T'_i-T_i\right|^2$$
N is the total number of pixels in the image, i is a loop variable indexing a pixel in the image, and T is the real torso image. Feeding the high-frequency information of the human body edge into the neural network markedly improves the accuracy of the edges in the torso segmentation image.
5. The arbitrary-posture human height estimation method based on a single depth image and a multi-stage neural network of claim 1, further comprising: a convolutional neural network f2(x) for further dividing the torso image into four approximately rigid parts, namely the head, the upper body, the thigh and the calf, so as to obtain the body part image. The convolutional neural networks f2(x) and f1(x) have the same network structure: each comprises 5 downsampling modules and 5 upsampling modules, and each module contains several convolutional layers with activation functions. The predicted torso segmentation image T′ is input into f2(x) to obtain the predicted body part segmentation image L′; because the torso segmentation image T′ obtained by f1(x) contains errors, the original depth image X is also input into f2(x) to avoid error accumulation.
$$L'=f_2(X,T')$$
The loss value Loss2 of the convolutional neural network f2(x) is the mean of the squared pixel-wise differences between the predicted body part image and the real body part image.
$$\mathrm{Loss}_2=\frac{1}{N}\sum_{i=1}^{N}\left|L'_i-L_i\right|^2$$
N is the total number of pixels in the image, i is a loop variable indexing a pixel in the image, and L is the real body part image. The optimizer is Adam, which minimizes the error between the predicted and real images until the network converges robustly.
6. The arbitrary-posture human height estimation method based on a single depth image and a multi-stage neural network of claim 1, further comprising: a convolutional neural network f3(x) for predicting the lengths of four approximately rigid parts of the subject, namely the head, the upper body, the thigh and the calf. f3(x) comprises 13 convolutional layers and 6 fully connected layers with their corresponding activation functions. The 13 convolutional layers form 5 downsampling modules: modules one and two each contain 2 convolutional layers with activation functions, and modules three, four and five each contain 3 convolutional layers with activation functions. A 1 × 4 vector representing the lengths of the 4 body parts is then output through the 6 fully connected layers, whose node counts are 4096, 4096, 1000, 256, 64 and 4, respectively. The body part image L is input into f3(x); at the same time, to reduce accumulated error, the original depth image X is also input into the network, and length estimates for the 4 body parts (the head, the upper body, the thigh and the calf) are obtained.
$$[H_{head}\;H_{upperbody}\;H_{thigh}\;H_{calf}]_{1\times 4}=f_3(X,L)$$
$H_{head}$ is the head length, $H_{upperbody}$ is the upper body length, $H_{thigh}$ is the thigh length, and $H_{calf}$ is the calf length.
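The fully connected stack of f3(x) can be laid out from the node counts listed in the claim. The sketch below assumes the first layer receives the 25088-dimensional flattened feature vector mentioned in the description and that every layer has a bias term; the bias assumption and the helper name `fc_shapes` are not taken from the text.

```python
def fc_shapes(in_features, widths):
    """(input, output) sizes of consecutive fully connected layers."""
    shapes = []
    for width in widths:
        shapes.append((in_features, width))
        in_features = width
    return shapes


F3_WIDTHS = [4096, 4096, 1000, 256, 64, 4]  # node counts listed for f3(x)
shapes = fc_shapes(25088, F3_WIDTHS)
# Weights plus one bias per output node (biases are an assumption).
params = sum(i * o + o for i, o in shapes)
for shape in shapes:
    print(shape)
print(params)
```

Laid out this way, the first layer (25088 to 4096) accounts for most of the weights, while the final layer maps 64 features to the four length outputs.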
The loss value Loss3 is the sum of the squared differences between the predicted lengths of the four parts and their true values.
$$\mathrm{Loss}_3=|H_{head}-TH_{head}|^2+|H_{upperbody}-TH_{upperbody}|^2+|H_{thigh}-TH_{thigh}|^2+|H_{calf}-TH_{calf}|^2$$
$TH_{head}$, $TH_{upperbody}$, $TH_{thigh}$ and $TH_{calf}$ represent the real lengths of the head, the upper body, the thigh and the calf, respectively. The optimizer is Adam, and the lengths of the 4 approximately rigid parts are predicted by minimizing the loss value.
7. The arbitrary-posture human height estimation method based on a single depth image and a multi-stage neural network of claim 1, further comprising: a convolutional neural network f4(x) that estimates the human height from the body part image, the body part lengths and the original depth image. f4(x) comprises 13 convolutional layers and 7 fully connected layers with their corresponding activation functions. The 13 convolutional layers form 5 downsampling modules: modules one and two each contain 2 convolutional layers with activation functions, and modules three, four and five each contain 3 convolutional layers with activation functions. The output of the last convolutional layer is flattened into a one-dimensional vector, and the estimated height is output through the 7 fully connected layers, whose node counts are 4096, 1000, 256, 64, 16 and 1. To reduce accumulated error and improve prediction accuracy, the original depth image X and the predicted body part segmentation image L′ are input together into f4(x). The invention also adopts a skip structure with different pooling strategies chosen according to the characteristics of the different input data: average pooling is used for the depth image and max pooling for the body part segmentation image, so that the input data at different scales is fed directly into each convolutional layer.
$$[H_{human}]=f_4(X,L',H_{4\text{-}part})$$
$H_{human}$ is the estimated human height, and $H_{4\text{-}part}$ denotes the predicted lengths of the 4 approximately rigid body parts from step S5.
The loss value Loss4 of the convolutional neural network f4(x) is the square of the difference between the estimated height and the real height.
$$\mathrm{Loss}_4=|H_{human}-TH_{human}|^2$$
$TH_{human}$ is the true value of the human height. The optimizer is Adam, and the human height is estimated by minimizing the loss value.
8. The arbitrary-posture human height estimation method based on a single depth image and a multi-stage neural network of claim 7, further comprising: a skip connection structure with a hybrid pooling strategy. The convolutional neural network f4(x) has 13 convolutional layers; adding skip connections to f4(x) alleviates the vanishing-gradient problem in deep networks, aids backpropagation of the gradient and speeds up training. In steps S3, S4 and S5, three neural networks are used to obtain estimates of the lengths of the head, the upper body, the thigh and the calf, which are fed as input data into f4(x). Because the estimate predicted by each neural network deviates from the true value, and this error is passed into the next-stage neural network, the error grows. The skip connection structure is therefore used to input the original depth image X and the body part image L directly into f4(x), minimizing the accumulated error.
The original depth image X and the body part image L have different characteristics. The original depth image X contains many noise points, which appear in the depth image as extreme values; if max pooling were simply applied, these noise points would be retained and depth information lost, interfering with network learning and causing larger errors. The body part image L is smoother; if average pooling were applied, the gradient at body part boundaries would be weakened, introducing errors. Therefore, average pooling is adopted for the original depth image X, which smooths the noise of the depth image so that accurate depth images at different scales are input into the network, and max pooling is adopted for the body part image L, which preserves the accuracy of the segmentation boundaries and retains gradient information as the image size is reduced.
$$L_{next}=\mathrm{Maxpool}(L_{now})$$
$$X_{next}=\mathrm{Avgpool}(X_{now})$$
$L_{now}$ is the body part image at the current scale, and $L_{next}$ is the body part image at the next scale. $X_{now}$ is the depth image at the current scale, and $X_{next}$ is the depth image at the next scale.
Feeding the original information into each convolution module through a skip connection structure with a hybrid pooling strategy keeps the original image as undistorted as possible at every scale and improves the accuracy of network prediction.
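The effect of the two pooling choices can be seen on a small example. The sketch below assumes a 2 × 2 window with stride 2, which the text does not specify; the input arrays are toy values, and `pool2x2`, `mean` and `next_scale` are illustrative names.

```python
def pool2x2(img, reduce_fn):
    """Apply `reduce_fn` to each non-overlapping 2 x 2 block of a 2D list."""
    h, w = len(img), len(img[0])
    return [
        [reduce_fn([img[y][x], img[y][x + 1], img[y + 1][x], img[y + 1][x + 1]])
         for x in range(0, w - 1, 2)]
        for y in range(0, h - 1, 2)
    ]


def mean(values):
    return sum(values) / len(values)


def next_scale(depth, parts):
    """Average-pool the depth image and max-pool the body part image."""
    return pool2x2(depth, mean), pool2x2(parts, max)


# Toy depth image with one noisy extreme value, and a toy part-label image.
depth = [[10, 10, 10, 10],
         [10, 90, 10, 10],
         [10, 10, 10, 10],
         [10, 10, 10, 10]]
parts = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 1],
         [0, 2, 2, 2]]
depth_next, parts_next = next_scale(depth, parts)
print(depth_next)
print(parts_next)
print(pool2x2(depth, max))  # max pooling would carry the noise spike forward
```

In this toy case average pooling damps the isolated spike from 90 to 30, whereas max pooling would pass it on unchanged; on the label image, max pooling keeps the part labels crisp at the coarser scale.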
9. The arbitrary-posture human height estimation method based on a single depth image and a multi-stage neural network of claim 7, further comprising: a developing neural network architecture and training method. The invention proposes the developing neural network and applies it to the convolutional neural network f4(x) to improve network accuracy, reduce training time and prevent overfitting.
The main idea of the developing neural network is as follows: during training, when the network approaches convergence, the architecture of its convolutional part is adjusted so that the network escapes the local minimum and continues to search for a global optimum.
The specific method is as follows. While the number of training iterations of the convolutional neural network f4(x) is less than 4 × 10^4, the network is pre-trained with all 13 convolutional layers of its 5 modules active. When the number of iterations equals 4 × 10^4, the pre-trained model is saved. When the number of iterations is greater than 4 × 10^4 and less than 6 × 10^4, each convolution module of f4(x) keeps only its first convolutional layer, so that 5 convolutional layers are trained in total. When the number of iterations is greater than 6 × 10^4 and less than 8 × 10^4, module one and module two each restore one convolutional layer from the pre-trained model, so that 7 convolutional layers take part in training. When the number of iterations is greater than 8 × 10^4 and less than 1 × 10^5, module three, module four and module five each restore one convolutional layer from the pre-trained model, so that 10 convolutional layers take part in training. When the number of iterations is greater than 1 × 10^5, module three, module four and module five each restore one further convolutional layer from the pre-trained model, so that all 13 convolutional layers are restored.
The developing neural network works by repeatedly fitting and then disrupting the fit until the network finds a global optimum. Because only part of the convolutional layers are trained while the number of iterations is greater than 4 × 10^4 and less than 1 × 10^5, the network training time is reduced. When the network approaches convergence, removing and restoring convolutional layers prevents overfitting and lets the network escape a local optimum to search for a global optimum, which effectively improves the accuracy of height estimation.
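The layer schedule of claim 9 can be summarized in a short Python sketch. The function name `active_conv_layers` is hypothetical. The per-module split of (2, 2, 3, 3, 3) is inferred from the layer counts in the claim (5, 7, 10 and 13) rather than stated outright. The claim also leaves the behaviour at exactly 6 × 10^4, 8 × 10^4 and 1 × 10^5 iterations unspecified, so assigning those boundaries to the earlier stage is an assumption.

```python
# Convolutional layers per module when the full network is active.
# Inferred from the claim: modules one and two restore one layer each,
# modules three to five restore two layers each, for 13 in total.
FULL_LAYERS = (2, 2, 3, 3, 3)


def active_conv_layers(iteration):
    """Return the number of active conv layers in each of the 5 modules.

    Boundary iterations (60_000, 80_000, 100_000) are assigned to the
    earlier stage; the claim does not specify them.
    """
    if iteration <= 40_000:
        return FULL_LAYERS         # pre-training, model saved at 40_000
    if iteration <= 60_000:
        return (1, 1, 1, 1, 1)     # only the first layer of each module
    if iteration <= 80_000:
        return (2, 2, 1, 1, 1)     # modules one and two restore a layer
    if iteration <= 100_000:
        return (2, 2, 2, 2, 2)     # modules three to five restore a layer
    return FULL_LAYERS             # modules three to five restore another


# Total number of trained conv layers at a few sample iterations.
schedule = {it: sum(active_conv_layers(it))
            for it in (10_000, 50_000, 70_000, 90_000, 120_000)}
```

In a real training loop the restored layers would be initialized from the model saved at 4 × 10^4 iterations. That step depends on the training framework and is not shown here.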
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110150551.6A CN112861699A (en) | 2021-02-03 | 2021-02-03 | Method for estimating height of human body in any posture based on single depth image and multi-stage neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112861699A (en) | 2021-05-28 |
Family
ID=75987778
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110150551.6A Pending CN112861699A (en) | 2021-02-03 | 2021-02-03 | Method for estimating height of human body in any posture based on single depth image and multi-stage neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112861699A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109285139A (en) * | 2018-07-23 | 2019-01-29 | 同济大学 | A kind of x-ray imaging weld inspection method based on deep learning |
CN109522924A (en) * | 2018-09-28 | 2019-03-26 | 浙江农林大学 | A kind of broad-leaf forest wood recognition method based on single photo |
CN110020606A (en) * | 2019-03-13 | 2019-07-16 | 北京工业大学 | A kind of crowd density estimation method based on multiple dimensioned convolutional neural networks |
Non-Patent Citations (1)
Title |
---|
YIN FUKUN et al.: "Accurate Estimation of Body Height from a Single Depth Image via a Four-Stage Developing Network", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8264-8273 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023047157A1 (en) * | 2021-09-23 | 2023-03-30 | Sensetime International Pte. Ltd. | Image generating methods and apparatuses, and detecting methods and apparatuses |
AU2021240272A1 (en) * | 2021-09-23 | 2023-04-06 | Sensetime International Pte. Ltd. | Image generating methods and apparatuses, and detecting methods and apparatuses |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111598998B (en) | Three-dimensional virtual model reconstruction method, three-dimensional virtual model reconstruction device, computer equipment and storage medium | |
CN110827342B (en) | Three-dimensional human body model reconstruction method, storage device and control device | |
Paragios et al. | Non-rigid registration using distance functions | |
WO2017133009A1 (en) | Method for positioning human joint using depth image of convolutional neural network | |
US20070196007A1 (en) | Device Systems and Methods for Imaging | |
Chaudhari et al. | Yog-guru: Real-time yoga pose correction system using deep learning methods | |
CN109978037A (en) | Image processing method, model training method, device and storage medium | |
CN110853111B (en) | Medical image processing system, model training method and training device | |
JP2022517769A (en) | 3D target detection and model training methods, equipment, equipment, storage media and computer programs | |
CN111160111B (en) | Human body key point detection method based on deep learning | |
CN113111767A (en) | Fall detection method based on deep learning 3D posture assessment | |
Loureiro et al. | Using a skeleton gait energy image for pathological gait classification | |
CN111507184B (en) | Human body posture detection method based on parallel cavity convolution and body structure constraint | |
Yan et al. | Cine MRI analysis by deep learning of optical flow: Adding the temporal dimension | |
CN111462274A (en) | Human body image synthesis method and system based on SMPL model | |
CN106846372A (en) | Human motion quality visual A+E system and method | |
CN113229807A (en) | Human body rehabilitation evaluation device, method, electronic device and storage medium | |
CN114387317B (en) | CT image and MRI three-dimensional image registration method and device | |
CN113822792A (en) | Image registration method, device, equipment and storage medium | |
CN115346272A (en) | Real-time tumble detection method based on depth image sequence | |
CN117274599A (en) | Brain magnetic resonance segmentation method and system based on combined double-task self-encoder | |
CN112861699A (en) | Method for estimating height of human body in any posture based on single depth image and multi-stage neural network | |
CN116843725B (en) | River surface flow velocity measurement method and system based on deep learning optical flow method | |
CN113822323A (en) | Brain scanning image identification processing method, device, equipment and storage medium | |
CN117392746A (en) | Rehabilitation training evaluation assisting method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||