CN111709269A - Human hand segmentation method and device based on two-dimensional joint information in depth image - Google Patents

Human hand segmentation method and device based on two-dimensional joint information in depth image

Info

Publication number
CN111709269A
CN111709269A (application CN202010332317.0A)
Authority
CN
China
Prior art keywords
dimensional
dimensional joint
human hand
depth
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010332317.0A
Other languages
Chinese (zh)
Other versions
CN111709269B (en)
Inventor
左德鑫
邓小明
马翠霞
王宏安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Software of CAS
Original Assignee
Institute of Software of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Software of CAS filed Critical Institute of Software of CAS
Priority to CN202010332317.0A priority Critical patent/CN111709269B/en
Publication of CN111709269A publication Critical patent/CN111709269A/en
Application granted granted Critical
Publication of CN111709269B publication Critical patent/CN111709269B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a human hand segmentation method and device based on two-dimensional joint information in a depth image. The method comprises the following steps: acquiring the positions of the two-dimensional joint points of the human hand in the depth image with a two-dimensional joint point detection network; obtaining the three-dimensional key points of the hand from the two-dimensional joint points combined with the depth image; calculating a three-dimensional directional bounding box of the hand from the three-dimensional key points; and filtering the depth image with the three-dimensional directional bounding box to obtain the segmented hand region. The invention provides a human hand two-dimensional joint point detection method based on a deep neural network, a conversion method from two-dimensional joint points to three-dimensional key points, and a filtering scheme based on the three-dimensional bounding box and depth values. Practical use shows that the method is highly automated, accurate and fast, and can meet professional or general application requirements.

Description

Human hand segmentation method and device based on two-dimensional joint information in depth image
Technical Field
The invention belongs to the field of computer vision and computer image processing, and particularly relates to a human hand depth image segmentation method and device based on two-dimensional joint points.
Background
Hand pose estimation and gesture understanding are active problems in computer vision and human-computer interaction, with wide application in scenarios such as virtual reality, augmented reality and computer-aided design; accurate hand pose estimation and gesture understanding therefore have great application and research value. A human hand segmentation algorithm semantically separates the hand parts from the non-hand parts of an image and is an important preprocessing step for a computer to understand gestures; the present invention addresses this hand segmentation problem.
Currently, the mainstream human hand depth image data sets (e.g. NYU, HANDS 2017, ICVL, MSRA) generally provide depth images of the hand together with joint points, but only a few of them (e.g. NYU) provide segmentation masks for the hand, so the joint-point positions become the main basis for obtaining a hand mask. The hand joint points cover the key positions of the hand (finger joints, wrist, palm, etc.). A three-dimensional joint point is the coordinate of a joint in three-dimensional space, represented by three scalars; a two-dimensional joint point is the coordinate of a joint on the image plane, represented by two scalars. A three-dimensional bounding box of the hand, and hence the hand region, is easier to compute from three-dimensional joint points than from two-dimensional ones, but for unlabeled data accurate three-dimensional joint points are harder to obtain than two-dimensional ones. How to combine two-dimensional joint points with depth-map information is therefore the key to solving the hand segmentation problem on depth maps.
Disclosure of Invention
The invention provides a human hand segmentation method and a human hand segmentation device based on two-dimensional joint information in a depth image, and mainly solves the problem of how to segment a human hand region from a single depth image.
The invention discloses a human hand segmentation method based on two-dimensional joint information in a depth image, which comprises the following steps:
acquiring the position of a two-dimensional joint point of a human hand in the depth image by using a two-dimensional joint point detection network;
acquiring three-dimensional key points of the human hand by using the two-dimensional joint points and combining the depth image;
calculating a three-dimensional directional bounding box of the human hand by using the three-dimensional key points;
and filtering the depth image by using the three-dimensional directional bounding box to obtain the well-segmented human hand area.
Furthermore, the two-dimensional joint point detection network is mainly an hourglass network: global information and deep features are extracted by the convolution and down-sampling of the hourglass network, the required output is decoded by convolution and up-sampling, and skip connections are added so that the decoded features contain both deep semantic information and shallow morphological features.
Further, when the two-dimensional joint detection network is trained, the training data are first preprocessed, including scaling to a standard size, normalizing, and obtaining heat map labels; the two-dimensional joint point detection network takes the preprocessed image as input and obtains the specific positions of the two-dimensional joint points; the output of the two-dimensional joint detection network is a heat map.
Furthermore, the output of the two-dimensional joint point detection network is a heat map with J channels; each channel corresponds to one class of joint point, each pixel holds a scalar value reflecting the probability of that pixel being the j-th joint point, and the position of the point with the highest probability is taken as the coordinate of the joint point.
Further, the acquiring three-dimensional key points of the human hand by using the two-dimensional joint points and combining the depth image comprises the following steps: and estimating effective depth values of the adjacent areas of the two-dimensional joint points, and combining the two-dimensional joint points with the effective depth values to finish the conversion from the two-dimensional joint points to the three-dimensional key points.
Further, when calculating the effective depth value, the Gaussian mixture model is used to estimate the distribution of the foreground depth value, the background depth value and the segmented entity depth value, so as to eliminate the interference of the noise depth value.
Further, the principal axis directions of the three-dimensional directional bounding box are obtained by principal component analysis of the three-dimensional key points, and its lengths along the axes are determined by the extent of the projections of the three-dimensional key points onto those axes.
Further, when the depth map is filtered with the three-dimensional directed bounding box, each pixel of the original depth map is tested for whether it lies inside the box, and the computation is accelerated by GPU parallelism.
A human hand segmentation apparatus based on two-dimensional joint information in a depth image, comprising:
the two-dimensional joint point detection module is responsible for constructing a two-dimensional joint point detection network and obtaining the position of the two-dimensional joint point of the human hand in the depth image by utilizing the two-dimensional joint point detection network;
the key point acquisition module is responsible for acquiring three-dimensional key points of the human hand by utilizing the two-dimensional joint points and combining the depth image;
the bounding box calculation module is responsible for calculating the three-dimensional directed bounding box of the human hand by using the three-dimensional key points;
and the hand segmentation module is responsible for filtering the depth image by utilizing the three-dimensional directed bounding box to obtain a well segmented hand region.
Further, the apparatus further comprises:
the data preprocessing module is responsible for preprocessing the training data of the two-dimensional joint detection network, zooming the original depth map to a standard size, normalizing and acquiring a heat map label;
and the network construction and training module is responsible for constructing and training the two-dimensional joint detection network and is used for detecting the coordinates of the two-dimensional joint on the image plane.
The invention has the advantages and beneficial effects that:
the invention mainly solves the problem of human hand segmentation by using a human hand joint point prediction data set without Mask marks. The invention provides a segmentation algorithm based on two-dimensional joint point prediction and joint point region depth value clustering, which can eliminate the interference of foreground and background under the condition of large predicted two-dimensional joint point error and obtain accurate hand segmentation. Through practical use verification, the method has the advantages of high automation degree, high precision and real-time performance, and can meet professional or popular application requirements.
Compared with the directed bounding box directly calculated based on the three-dimensional joint points, the method is more accurate because the depth information of the joint labels of the partial data set is not particularly accurate, thereby resulting in incomplete segmentation. The method has advantages in some specific occasions, for example, when manual interactive annotation is carried out, the operation difficulty of annotating the three-dimensional joint points accurately is very high for annotators of the PC platform, the method can automatically combine the depth information of the image, and the depth information of the hand is deduced under the condition that the depth information annotation is missing.
The method obtains the area to be segmented by predicting the two-dimensional joint points, and estimates the distribution conditions of the foreground depth value, the background depth value and the segmented entity depth value by using a Gaussian mixture model to eliminate the interference of the noise depth value. Through practical use verification, the algorithm has high tolerance on errors of two-dimensional joint points and accurate segmentation.
Drawings
FIG. 1 is an overall flow diagram of the method of the present invention.
Fig. 2 is a general block diagram of a human hand two-dimensional joint detection network based on deep learning.
Figure 3 is a block diagram of an hourglass network.
Fig. 4 is a structural diagram of a residual module.
Fig. 5 is a diagram showing the results of the present invention in actual testing.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, the present invention shall be described in further detail with reference to the following detailed description and accompanying drawings.
FIG. 1 is a general flowchart of the human hand segmentation method based on two-dimensional joint information in depth images of the present invention. For an input depth map, the method first obtains the positions of the two-dimensional joint points of the hand with a two-dimensional joint point detection network; the two-dimensional joint points are then combined with the depth map to obtain three-dimensional key points of the hand; an oriented bounding box is computed from the three-dimensional key points; and the depth map is filtered with the three-dimensional oriented bounding box to obtain the segmented hand region.
The following describes, in order, the data preprocessing adopted by the invention, the specific structure of the two-dimensional joint detection network, the loss function used, cropping based on the two-dimensional joint points, removal of noisy depth values, computation of the bounding box, and computation of the mask.
Step 1: pre-processing of training data
The original depth map is scaled to a standard size, set in this method to height × width, and normalized (subtracting the mean of the depth map and dividing by the difference between the maximum and minimum depths of the depth camera).
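A minimal sketch of this preprocessing (Python), assuming an OpenCV resize and illustrative values for the standard size and the camera's depth range, neither of which is fixed by the description above:

import numpy as np
import cv2  # assumed dependency, used only for resizing

def preprocess_depth(depth, out_h=480, out_w=640, d_min=200.0, d_max=1500.0):
    """Scale a raw depth map to the standard (height, width) and normalize it.
    d_min / d_max stand for the depth camera's minimum and maximum depth;
    the values here are placeholders, not taken from the patent."""
    resized = cv2.resize(depth.astype(np.float32), (out_w, out_h),
                         interpolation=cv2.INTER_NEAREST)
    return (resized - resized.mean()) / (d_max - d_min)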
Obtaining labels: the label required to train the joint point detection network is a heat map, i.e. a three-dimensional tensor of size (height, width, J), whereas the label given by a typical data set is a set of joint point positions of size (J, 3) (J is the number of joints and equals the number of channels of the output), so the corresponding heat map has to be computed from the three-dimensional joint points. Let the label of a three-dimensional joint point be (u_gt, v_gt, d_gt), where u_gt is the horizontal coordinate of the joint on the picture, v_gt its vertical coordinate and d_gt its depth value, and denote an arbitrary position in the tensor by (u, v, j), where u is the horizontal coordinate of a pixel on the heat map, v its vertical coordinate and j the channel index. The value H_GT(u, v, j) at that position is given by:
[Equation image: Gaussian heat-map value H_GT(u, v, j) centred on (u_gt, v_gt); not reproduced in this text extraction.]
where σ is a fixed number, and s_x and s_y are computed as:
[Equation image: formulas for s_x and s_y; not reproduced in this text extraction.]
where f_x and f_y are the focal length parameters of the camera.
If only the two-dimensional joint points (u_gt, v_gt) are available, the formula for H_GT(u, v, j) becomes:
[Equation image: simplified heat-map formula for the two-dimensional-only case; not reproduced in this text extraction.]
The preprocessed samples that can be used for training are denoted {(D_i, H_GT^i)}, i = 1, ..., N (the exact set notation is an equation image not reproduced here), where N is the number of samples, and D_i and H_GT^i are respectively the normalized depth map and the corresponding heat map of the i-th sample.
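Since the heat-map equations themselves are present only as images, the sketch below assumes the plain two-dimensional Gaussian form described for the case where only two-dimensional joint points (u_gt, v_gt) are available; the depth-dependent s_x, s_y scaling of the full formula is omitted, and the function and variable names are illustrative:

import numpy as np

def make_heatmap_labels(joints_2d, height, width, sigma=2.0):
    """Build a (height, width, J) heat-map tensor from J two-dimensional
    joint labels (u_gt, v_gt), one Gaussian per channel."""
    u = np.arange(width)[None, :, None]           # (1, W, 1)
    v = np.arange(height)[:, None, None]          # (H, 1, 1)
    u_gt = joints_2d[None, None, :, 0]            # (1, 1, J)
    v_gt = joints_2d[None, None, :, 1]            # (1, 1, J)
    dist2 = (u - u_gt) ** 2 + (v - v_gt) ** 2     # broadcasts to (H, W, J)
    return np.exp(-dist2 / (2.0 * sigma ** 2))

# Example: labels for two joints on a 120 x 160 heat map
joints = np.array([[40.0, 30.0], [100.0, 80.0]])  # rows are (u_gt, v_gt)
H_gt = make_heatmap_labels(joints, height=120, width=160)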
Step 2: construction and training of two-dimensional joint point detection network
The present invention proposes a convolutional neural network of the hourglass (hour glass) type for predicting two-dimensional joint points. The design principle of the network is that global information and deep features are extracted by utilizing convolution and down sampling of an hourglass network, required output is decoded by utilizing convolution and up sampling, and jumping connection is added to ensure that the decoded features comprise deep semantic information and shallow morphological features, namely bottom-layer features and high-layer features of an image are utilized.
The basic modules of the two-dimensional joint point detection network are a residual block (residual block) and a convolution block (convolution block), and each residual block comprises a convolution module and a jump connection. The convolution module is a repeated superposition of convolutional layers, batch regularization (BN) layers, and Linear rectification function (ReLU) layers. In the residual block, the input of the residual block enters the convolution block on the one hand, and is combined with the output of the convolution block in an additive manner on the other hand by means of jump connection.
Fig. 2 shows the overall structure of the two-dimensional joint detection network. The depth map first needs symmetric padding and pooling to meet the size requirement. Before entering the hourglass module, several convolution layers and a residual module perform preliminary feature extraction, and the output of the hourglass module passes through several further convolution layers to reach the desired number of channels. In the figure, Residual denotes a residual module, K the size of the layer's convolution kernel, c the number of output channels of the layer, S the stride of the layer, and pad how many pixels are padded in the height and width dimensions of the layer. (480, 640, 1) and (J, 2) are the shapes of the input and output tensors, respectively.
Figure 3 shows the structure of the hourglass network. Each cuboid consists of three residual modules connected in series, all with the same number of output channels (256); pooling and up-sampling operations are applied between cuboids of different sizes. A circle with a plus sign joins two inputs, denoting element-wise addition.
The residual block shown in Fig. 4 contains several convolution layers with batch normalization (BN) and rectified linear unit (ReLU) layers between them. c denotes the number of channels output by the convolution block, and the convolution kernel parameters are given in Fig. 4; "P='same'" means the layer is padded so that its output and input have equal height and width. "+" denotes element-wise addition.
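For illustration, a residual block of the kind just described (stacked convolution, batch-normalization and ReLU layers plus a skip connection added element-wise) might look as follows in PyTorch; the bottleneck layout and channel counts are assumptions borrowed from common hourglass implementations, not a transcription of Fig. 4:

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Convolution block (BN-ReLU-Conv repeated) plus an identity/1x1 skip."""
    def __init__(self, in_ch, out_ch=256):
        super().__init__()
        mid = out_ch // 2
        self.conv_block = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, mid, kernel_size=1),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, kernel_size=3, padding=1),   # "same" padding
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, out_ch, kernel_size=1),
        )
        # Skip connection: identity if the shapes match, otherwise a 1x1 conv
        self.skip = (nn.Identity() if in_ch == out_ch
                     else nn.Conv2d(in_ch, out_ch, kernel_size=1))

    def forward(self, x):
        return self.conv_block(x) + self.skip(x)   # element-wise addition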
The input of the two-dimensional joint detection network is a depth map of size (H, W); the output is a heat map of size (H/s, W/s, J), where s is the factor by which the image is reduced after passing through the network. The loss function of the network is:
[Equation image: loss comparing the predicted heat map H_pred with the ground-truth heat map H_GT over all pixels and channels; not reproduced in this text extraction.]
where H_pred is the heat map output by the network, H_GT is the ground-truth heat map, and H, W, J are respectively the height, width and number of channels of the output. The output of the network is a heat map with J channels; each channel corresponds to one class of joint point, and each pixel holds a scalar value reflecting the probability of that pixel being the j-th joint point. The method takes the position of the point with the highest probability as the coordinate of the joint, so the two-dimensional joint coordinates (u, v)_j are computed as:
[Equation image: (u, v)_j obtained from the arg-max of the j-th channel of H_pred, scaled by s; not reproduced in this text extraction.]
where s is the factor by which the image is reduced after passing through the network.
The optimizer used in the network of the present invention is Adam, the learning rate is initially set to 0.001, and decays exponentially as the number of training steps increases.
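Assuming the loss shown only as an equation image is the usual per-pixel squared error between predicted and ground-truth heat maps, and that decoding takes the per-channel arg-max scaled back by the reduction factor s, the two pieces could be sketched as follows (names are illustrative):

import numpy as np
import torch

def heatmap_loss(h_pred, h_gt):
    """Mean squared error over all pixels and channels (assumed form of the
    loss; the original equation image is not reproduced in the text)."""
    return torch.mean((h_pred - h_gt) ** 2)

def heatmaps_to_joints(h_pred, s):
    """Decode an (H/s, W/s, J) heat-map array (numpy) into J joint
    coordinates by taking the arg-max of each channel and scaling by s."""
    h, w, num_joints = h_pred.shape
    joints = np.zeros((num_joints, 2))
    for j in range(num_joints):
        v, u = np.unravel_index(np.argmax(h_pred[:, :, j]), (h, w))
        joints[j] = (u * s, v * s)                 # (u, v)_j in input-image pixels
    return joints

# Adam with an exponentially decaying learning rate, as described above
# (the decay factor is an assumption):
# optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
# scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.99)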
Step 3: depth value estimation
The two-dimensional coordinates J_2D of the joint points are obtained in step 2, but two-dimensional information alone cannot give an accurate three-dimensional bounding box, so J_2D has to be combined with the depth at the corresponding positions of the original depth map. Note that, because of occlusion, obtaining the three-dimensional joint points J_3D directly from J_2D and the depth map is difficult. However, J_2D can be used to obtain points that are important for segmentation: its advantage is that it contains the information of all joints and roughly follows the contour of the hand, so no important part is missed. The points extracted from the depth map with J_2D are called key points, denoted P_3D. P_3D is computed as follows.
The acquisition of P_3D follows these principles:
1. Experiments show that the mean depth over a region is easily influenced by the depth of surrounding points, so it is not used directly as the depth of a joint; to keep the depth value accurate, the depth of the pixel nearest to J_2D is used.
2. The depth of that nearest pixel is strongly affected by noisy depth values when the two-dimensional joint prediction error is large: when J_2D is inaccurate it easily lands on the foreground or background and picks up a foreground or background depth value, so the validity of the neighbouring depth has to be checked. If it is judged invalid, a valid point P_alt must be used instead.
The principle for removing noisy depth values is as follows: the probability distribution of the depth values in the neighbourhood is examined; the foreground, the entity to be segmented and the background can be described by a Gaussian mixture model with three components, and the depth value corresponding to the centre of the Gaussian component closest to the intended target is taken. In the implementation, k-means clustering can be used to approximate the Expectation-Maximization (EM) algorithm.
The picture area corresponding to the two-dimensional joint points is cropped, k-means clustering is performed on its depth values, a candidate depth is obtained with the rule defined below, and the candidate depth is combined with the two-dimensional joint point to form a three-dimensional key point. The specific implementation is as follows:
Use J_2D to compute the minimum bounding box of the joints, crop out that area, and compute the median depth of the crop, denoted d_avg.
For each two-dimensional joint point, find the nearest pixel on the original image and read off its depth value (the symbols in the original equation images are not reproduced here), where i denotes the index of the joint point.
Computation of the substitute (candidate) depth: for each joint point, crop a picture area of 50 × 50 pixels centred on it (the size can be adjusted according to the depth map resolution); if the cropped area contains no depth values, enlarge the crop by a fixed proportion until it does. Run k-means clustering with k = 3 on the depth values of the area, sort the cluster centres, take the middle centre, and record the depth value closest to it as the candidate depth (again, the exact symbols are in equation images that are not reproduced here).
If the nearest-pixel depth differs from the reference depth by more than d_threshold (the exact comparison is given by an unreproduced equation image), the nearest-pixel depth is treated as a noise depth value and the candidate depth is used as d_i instead; otherwise the nearest-pixel depth is used directly as d_i. In this example d_threshold is set to 100.
Finally, d_i is concatenated with the corresponding two-dimensional joint point to give the three-dimensional key point.
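One plausible reading of this procedure, sketched in Python with scikit-learn's KMeans standing in for the k = 3 clustering; the exact comparison in the unreproduced equation images is assumed here to be a threshold on the difference to the crop's median depth, and the function and symbol names are illustrative:

import numpy as np
from sklearn.cluster import KMeans

def estimate_joint_depth(depth, u, v, d_avg, win=50, d_threshold=100.0):
    """Estimate a usable depth d_i for one predicted 2D joint (u, v).

    Crops a win x win window around the joint, clusters its valid depth
    values into k = 3 groups (foreground / target / background) and takes
    the middle cluster centre as the candidate depth; the nearest-pixel
    depth is kept only if it is close enough to the reference depth d_avg."""
    h, w = depth.shape
    u, v = int(round(u)), int(round(v))
    d_near = depth[min(max(v, 0), h - 1), min(max(u, 0), w - 1)]
    half, vals = win // 2, np.empty((0, 1))
    while len(vals) < 3 and half <= max(h, w):        # enlarge if crop has no depth
        crop = depth[max(v - half, 0):v + half, max(u - half, 0):u + half]
        vals = crop[crop > 0].reshape(-1, 1)
        half *= 2
    if len(vals) < 3:
        return d_near
    centres = KMeans(n_clusters=3, n_init=10).fit(vals).cluster_centers_.ravel()
    d_cand = np.sort(centres)[1]                      # middle cluster centre
    if abs(d_near - d_avg) > d_threshold:             # nearest-pixel depth looks like noise
        return d_cand
    return d_near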
Step 3 (accelerated variant): depth value estimation
For each point of the cropped region that has a depth value, D_j = (u_j, v_j, d_j) (j is the index of the pixel), two distances are computed; their defining equations are present only as images and are not reproduced in this text extraction. d_thresh is a fixed value, and the meaning of the first formula is that when d_j does not differ much from the average depth, the corresponding term is ignored. According to the imaging principle of the camera, a two-dimensional joint point determines a ray l_i in space; the second quantity is the distance from D_j to l_i, computed by the formula in the corresponding (unreproduced) equation image.
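Since the defining equations of this variant are only present as images, the following is a hedged sketch of the ray-distance part only, assuming a pinhole camera model in which the joint pixel and the camera centre determine the ray l_i; the intrinsics and names are illustrative:

import numpy as np

def backproject(u, v, d, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with depth d to a 3D point."""
    return np.array([(u - cx) * d / fx, (v - cy) * d / fy, d])

def point_to_ray_distance(p, u_joint, v_joint, fx, fy, cx, cy):
    """Distance from 3D point p to the camera ray through the joint pixel
    (u_joint, v_joint); the ray direction follows from the pinhole model."""
    direction = np.array([(u_joint - cx) / fx, (v_joint - cy) / fy, 1.0])
    direction /= np.linalg.norm(direction)
    # The ray passes through the camera centre (origin); the distance is the
    # norm of the component of p orthogonal to the ray direction.
    return np.linalg.norm(p - np.dot(p, direction) * direction)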
and 4, step 4: computation of directed bounding boxes
Computing a directional bounding box requires computing the center point of the box and the direction and size of the three axes. Center P of the box3DIs calculated to obtain the average value of (1). By P3DThe three eigenvectors after Principal Component Analysis (PCA) are taken as three principal axes of the box, and the length of the principal axis is represented by P3DThe length of the projection interval of (a) is determined, and if necessary, each axis can be extended in an appropriate proportion.
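A numpy sketch of this computation: the centre is the mean of P_3D, the axes are the principal components, and the half-length along each axis comes from the projection interval, optionally stretched by a small factor (the padding factor is an assumption):

import numpy as np

def oriented_bounding_box(p3d, pad=1.1):
    """Oriented bounding box of (N, 3) key points P_3D: returns the centre,
    the three axes (rows of `axes`) and the half-lengths along each axis."""
    center = p3d.mean(axis=0)
    centered = p3d - center
    _, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    axes = eigvecs.T                              # each row is one principal axis
    proj = centered @ axes.T                      # projections onto the axes
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    half_lengths = pad * np.maximum(np.abs(lo), np.abs(hi))
    return center, axes, half_lengths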
Step 5: filtering of the depth map
The region of the depth map inside the bounding box has to be extracted; the method is to test, for each pixel of the original depth map, whether it lies inside the box. The pixels of the depth map are converted with the camera parameters into a point cloud of three-dimensional points, each point is tested against the bounding box, and only the pixels whose points lie inside the box are kept. The retained pixels form the segmented hand, and the remaining pixels of the depth map form the non-hand part.
The outward direction of each face of the bounding box is taken as the positive direction, and each face determines the parameters (a, b, c, d) of a plane equation:
ax + by + cz + d = 0
First, the pixel to be tested is converted, together with its depth value, into a point of the real-space coordinate system using the camera parameters; the point is then substituted into the left-hand side of the six equations determined by the six faces, and the signs of the results are compared. If all six results are positive, or all six are negative, the pixel lies inside the bounding box, that is, it is part of the hand.
The speed is accelerated by GPU parallel computation during the implementation of the step.
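A sketch of this filtering step under an assumed pinhole camera model; instead of substituting each point into the six plane equations, this version expresses the point in the box's own coordinate frame, which is an equivalent inside/outside test (intrinsics fx, fy, cx, cy and names are illustrative; box_center, axes, half_lengths are as in the sketch after step 4):

import numpy as np

def segment_hand(depth, fx, fy, cx, cy, box_center, axes, half_lengths):
    """Keep only depth pixels whose back-projected 3D points fall inside
    the oriented bounding box; returns a boolean hand mask."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth.astype(np.float32)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)    # point cloud
    local = (points - box_center) @ axes.T                  # box coordinates
    inside = np.all(np.abs(local) <= half_lengths, axis=1)
    return (inside & (points[:, 2] > 0)).reshape(h, w)      # ignore missing depth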
Fig. 5 shows the segmentation results of the method. From left to right (columns 1 to 4) are the input depth map, the predicted two-dimensional joint points, the computed three-dimensional oriented bounding box, and the final segmentation result.
The scheme of the invention can be realized by software or hardware, such as:
in one embodiment, a human hand segmentation device based on two-dimensional joint information in a depth image is provided, which includes:
the two-dimensional joint point detection module is responsible for constructing a two-dimensional joint point detection network and obtaining the position of the two-dimensional joint point of the human hand in the depth image by utilizing the two-dimensional joint point detection network;
the key point acquisition module is responsible for acquiring three-dimensional key points of the human hand by utilizing the two-dimensional joint points and combining the depth image;
the bounding box calculation module is responsible for calculating the three-dimensional directed bounding box of the human hand by using the three-dimensional key points;
and the hand segmentation module is responsible for filtering the depth image by utilizing the three-dimensional directed bounding box to obtain a well segmented hand region.
In addition, the apparatus may further include:
and the data preprocessing module is responsible for preprocessing data before being input into the neural network (preprocessing training data of the two-dimensional joint detection network), zooming the original depth map to a standard size, normalizing and acquiring a heat map label.
And the network construction and training module is responsible for constructing a two-dimensional joint detection network and is used for detecting the coordinates of the two-dimensional joint on the image plane.
The two-dimensional joint point detection module, the key point acquisition module, the bounding box calculation module and the hand segmentation module can collectively be referred to as a two-dimensional-joint-point-based hand segmentation module. Together they are responsible for segmenting the hand region, covering joint point detection, mapping from joint points to key points, computation of the three-dimensional oriented bounding box and filtering of the depth map, and finally produce the hand region.
In another embodiment, an electronic device (computer, server, etc.) is provided comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the method of the invention.
In another embodiment, a computer readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) is provided, which stores a computer program that, when executed by a computer, implements the steps of the method of the present invention.
The above embodiments are only intended to illustrate the technical solution of the present invention and not to limit the same, and a person skilled in the art can modify the technical solution of the present invention or substitute the same without departing from the principle and scope of the present invention, and the scope of the present invention should be determined by the claims.

Claims (10)

1. A human hand segmentation method based on two-dimensional joint information in a depth image is characterized by comprising the following steps:
acquiring the position of a two-dimensional joint point of a human hand in the depth image by using a two-dimensional joint point detection network;
acquiring three-dimensional key points of the human hand by using the two-dimensional joint points and combining the depth image;
calculating a three-dimensional directional bounding box of the human hand by using the three-dimensional key points;
and filtering the depth image by using the three-dimensional directional bounding box to obtain the well-segmented human hand area.
2. The method of claim 1, wherein the two-dimensional joint point detection network is mainly an hourglass network: global information and deep features are extracted by the convolution and down-sampling of the hourglass network, the required output is decoded by convolution and up-sampling, and skip connections are added so that the decoded features contain both deep semantic information and shallow morphological features.
3. The method of claim 1, wherein the two-dimensional joint detection network, when trained, first preprocesses the training data, including scaling to a standard size, normalizing, and obtaining heat map labels; the two-dimensional joint point detection network takes the preprocessed image as input and obtains the specific positions of the two-dimensional joint points; the output of the two-dimensional joint point detection network is a heat map; the loss function of the two-dimensional joint detection network is as follows:
[Equation image FDA0002465400590000011: loss comparing the predicted heat map H_pred with the ground-truth heat map H_GT; not reproduced in this text extraction.]
wherein H_pred is the heat map output by the network, H_GT is the ground-truth heat map, and H, W, J are respectively the height, width and number of channels of the output picture.
4. The method of claim 3, wherein the output of the two-dimensional joint detection network is a heat map with J channels, each channel corresponding to one class of joint point; each pixel holds a scalar value reflecting the probability of that pixel being the j-th joint point, and the position of the point with the highest probability is taken as the coordinate of the joint point; the two-dimensional joint coordinates (u, v)_j are computed as:
[Equation image FDA0002465400590000012: (u, v)_j obtained from the per-channel arg-max of the heat map, scaled by s; not reproduced in this text extraction.]
wherein u denotes the horizontal coordinate of a pixel on the heat map, v denotes its vertical coordinate, j denotes the channel index, and s is the factor by which the image is reduced after passing through the network.
5. The method of claim 1, wherein acquiring the three-dimensional key points of the human hand using the two-dimensional joint points in combination with the depth image comprises: estimating effective depth values in the regions adjacent to the two-dimensional joint points, and combining the two-dimensional joint points with the effective depth values to complete the conversion from two-dimensional joint points to three-dimensional key points; when calculating the effective depth values, a Gaussian mixture model is used to estimate the distributions of the foreground depth values, the background depth values and the depth values of the entity to be segmented, so as to eliminate the interference of noisy depth values.
6. The method according to claim 1, wherein the principal axis directions of the three-dimensional directional bounding box are obtained by principal component analysis of the three-dimensional key points, and its lengths correspond to the extent of the projections of the three-dimensional key points on the principal axes; when the depth map is filtered with the three-dimensional directed bounding box, each pixel of the original depth map is tested for whether it lies inside the box, and the computation is accelerated by GPU parallel computation.
7. A human hand segmentation device based on two-dimensional joint information in a depth image is characterized by comprising:
the two-dimensional joint point detection module is responsible for constructing a two-dimensional joint point detection network and obtaining the position of the two-dimensional joint point of the human hand in the depth image by utilizing the two-dimensional joint point detection network;
the key point acquisition module is responsible for acquiring three-dimensional key points of the human hand by utilizing the two-dimensional joint points and combining the depth image;
the bounding box calculation module is responsible for calculating the three-dimensional directed bounding box of the human hand by using the three-dimensional key points;
and the hand segmentation module is responsible for filtering the depth image by utilizing the three-dimensional directed bounding box to obtain a well segmented hand region.
8. The apparatus of claim 7, further comprising:
the data preprocessing module is responsible for preprocessing the training data of the two-dimensional joint detection network, zooming the original depth map to a standard size, normalizing and acquiring a heat map label;
and the network construction and training module is responsible for constructing and training the two-dimensional joint detection network and is used for detecting the coordinates of the two-dimensional joint on the image plane.
9. An electronic apparatus, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the method of any of claims 1 to 6.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a computer, implements the method of any one of claims 1 to 6.
CN202010332317.0A 2020-04-24 2020-04-24 Human hand segmentation method and device based on two-dimensional joint information in depth image Active CN111709269B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010332317.0A CN111709269B (en) 2020-04-24 2020-04-24 Human hand segmentation method and device based on two-dimensional joint information in depth image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010332317.0A CN111709269B (en) 2020-04-24 2020-04-24 Human hand segmentation method and device based on two-dimensional joint information in depth image

Publications (2)

Publication Number Publication Date
CN111709269A true CN111709269A (en) 2020-09-25
CN111709269B CN111709269B (en) 2022-11-15

Family

ID=72536830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010332317.0A Active CN111709269B (en) 2020-04-24 2020-04-24 Human hand segmentation method and device based on two-dimensional joint information in depth image

Country Status (1)

Country Link
CN (1) CN111709269B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529911A (en) * 2020-12-07 2021-03-19 重庆大学 Training method of pancreas image segmentation model, image segmentation method and device
CN113379755A (en) * 2021-04-09 2021-09-10 南京航空航天大学 3D point cloud object example segmentation method in disordered scene based on graph

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460338A (en) * 2018-02-02 2018-08-28 北京市商汤科技开发有限公司 Estimation method of human posture and device, electronic equipment, storage medium, program
CN109214282A (en) * 2018-08-01 2019-01-15 中南民族大学 A kind of three-dimension gesture critical point detection method and system neural network based
CN109523552A (en) * 2018-10-24 2019-03-26 青岛智能产业技术研究院 Three-dimension object detection method based on cone point cloud
CN110222580A (en) * 2019-05-09 2019-09-10 中国科学院软件研究所 A kind of manpower 3 d pose estimation method and device based on three-dimensional point cloud
CN110443205A (en) * 2019-08-07 2019-11-12 北京华捷艾米科技有限公司 A kind of hand images dividing method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460338A (en) * 2018-02-02 2018-08-28 北京市商汤科技开发有限公司 Estimation method of human posture and device, electronic equipment, storage medium, program
CN109214282A (en) * 2018-08-01 2019-01-15 中南民族大学 A kind of three-dimension gesture critical point detection method and system neural network based
CN109523552A (en) * 2018-10-24 2019-03-26 青岛智能产业技术研究院 Three-dimension object detection method based on cone point cloud
CN110222580A (en) * 2019-05-09 2019-09-10 中国科学院软件研究所 A kind of manpower 3 d pose estimation method and device based on three-dimensional point cloud
CN110443205A (en) * 2019-08-07 2019-11-12 北京华捷艾米科技有限公司 A kind of hand images dividing method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIAOMING DENG et al.: "Joint Hand Detection and Rotation Estimation Using CNN", IEEE *
ZHOU XIAOQIN et al.: "Gesture recognition and hand tracking based on Kinect in the Virtools environment", Computer Applications and Software *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529911A (en) * 2020-12-07 2021-03-19 重庆大学 Training method of pancreas image segmentation model, image segmentation method and device
CN112529911B (en) * 2020-12-07 2024-02-09 重庆大学 Pancreatic image segmentation model training method, image segmentation method and device
CN113379755A (en) * 2021-04-09 2021-09-10 南京航空航天大学 3D point cloud object example segmentation method in disordered scene based on graph
CN113379755B (en) * 2021-04-09 2024-03-12 南京航空航天大学 3D point cloud object instance segmentation method in out-of-order scene based on graph

Also Published As

Publication number Publication date
CN111709269B (en) 2022-11-15

Similar Documents

Publication Publication Date Title
CN109544677B (en) Indoor scene main structure reconstruction method and system based on depth image key frame
US20210232924A1 (en) Method for training smpl parameter prediction model, computer device, and storage medium
CN111126272B (en) Posture acquisition method, and training method and device of key point coordinate positioning model
CN108549873B (en) Three-dimensional face recognition method and three-dimensional face recognition system
CN108764048B (en) Face key point detection method and device
CN113240691B (en) Medical image segmentation method based on U-shaped network
CN111080693A (en) Robot autonomous classification grabbing method based on YOLOv3
CN109934847B (en) Method and device for estimating posture of weak texture three-dimensional object
CN110363817B (en) Target pose estimation method, electronic device, and medium
CN108171133B (en) Dynamic gesture recognition method based on characteristic covariance matrix
JP2007164720A (en) Head detecting device, head detecting method, and head detecting program
CN110910437B (en) Depth prediction method for complex indoor scene
WO2023151237A1 (en) Face pose estimation method and apparatus, electronic device, and storage medium
CN111709269B (en) Human hand segmentation method and device based on two-dimensional joint information in depth image
CN112200056B (en) Face living body detection method and device, electronic equipment and storage medium
CN114549557A (en) Portrait segmentation network training method, device, equipment and medium
CN113643329B (en) Twin attention network-based online update target tracking method and system
Yin et al. Virtual reconstruction method of regional 3D image based on visual transmission effect
CN114627438A (en) Target detection model generation method, target detection method, device and medium
Geng et al. SANet: A novel segmented attention mechanism and multi-level information fusion network for 6D object pose estimation
CN112037282B (en) Aircraft attitude estimation method and system based on key points and skeleton
CN113808202A (en) Multi-target detection and space positioning method and system thereof
CN114608522A (en) Vision-based obstacle identification and distance measurement method
JP2023512359A (en) Associated object detection method and apparatus
CN112634331A (en) Optical flow prediction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant