CN109977834B - Method and device for segmenting human hand and interactive object from depth image - Google Patents

Method and device for segmenting human hand and interactive object from depth image

Info

Publication number
CN109977834B
Authority
CN
China
Prior art keywords
depth image
segmentation
human hand
data set
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910207311.8A
Other languages
Chinese (zh)
Other versions
CN109977834A (en)
Inventor
徐枫
薄子豪
雍俊海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201910207311.8A priority Critical patent/CN109977834B/en
Publication of CN109977834A publication Critical patent/CN109977834A/en
Application granted granted Critical
Publication of CN109977834B publication Critical patent/CN109977834B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/28 - Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a method and a device for segmenting a human hand and an interactive object from a depth image. The method includes the following steps: constructing a depth-image-based human hand segmentation data set using a color-image-based segmentation method; training a segmentation model on the depth-image-based human hand segmentation data set, the segmentation model consisting of an encoder, an attention transfer model and a decoder; and segmenting a depth image to be processed with the segmentation model to obtain a classification label map corresponding to the depth image to be processed, where the value of each pixel in the classification label map is that pixel's type value. Because the segmentation model is trained on a depth-image-based human hand segmentation data set and applied directly to the depth image to be processed, the method achieves pixel-level segmentation of the hand and the object, improves environmental robustness, attains high segmentation accuracy, and can handle hand-object segmentation under complex interaction conditions.

Description

Method and device for segmenting human hand and interactive object from depth image
Technical Field
The application relates to the technical field of computer vision, in particular to a method and a device for segmenting a human hand and an interactive object from a depth image.
Background
Human hand segmentation is a fundamental problem in many research fields such as gesture recognition, human hand tracking and human hand reconstruction. Compared with the motion of a bare hand alone, the study of the hand while it interacts with an object is more important in the fields of human-computer interaction and virtual reality.
In recent years, general semantic segmentation models based on neural networks have become increasingly mature. However, existing models have low environmental robustness and poor segmentation accuracy, and cannot handle human hand segmentation under complex interaction conditions.
Disclosure of Invention
The application provides a method and a device for segmenting a human hand and an interactive object from a depth image, which are used for solving the problems that the existing human hand segmentation model in the prior art is low in environmental robustness, poor in segmentation precision and incapable of processing human hand segmentation under the condition of complex interaction.
An embodiment of an aspect of the present application provides a method for segmenting a human hand and an interactive object from a depth image, including:
constructing a human hand segmentation data set based on a depth image by using a segmentation method based on a color image;
training by using the human hand segmentation data set based on the depth image to obtain a segmentation model, wherein the segmentation model consists of an encoder, an attention transfer model and a decoder;
and segmenting the depth image to be processed by utilizing the segmentation model to obtain a classification label map corresponding to the depth image to be processed, wherein the value of each pixel point in the classification label map is the type value of each pixel point, and the type value is used for representing the type of the pixel point in the depth image to be processed.
According to the method for segmenting a human hand and an interactive object from a depth image of the embodiment of the application, a depth-image-based human hand segmentation data set is constructed using a color-image-based segmentation method, a segmentation model is trained on this data set, and the depth image to be processed is segmented with the segmentation model to obtain the corresponding classification label map. The value of each pixel in the classification label map is that pixel's type value, so the type of each pixel can be determined from its type value. Because the segmentation model is trained on a depth-image-based human hand segmentation data set and applied directly to the depth image to be processed, pixel-level segmentation of the hand and the object is achieved, environmental robustness is improved, segmentation accuracy is high, and hand-object segmentation under complex interaction conditions can be handled.
Another embodiment of the present application provides an apparatus for segmenting a human hand and an interactive object from a depth image, including:
the construction module is used for constructing a human hand segmentation data set based on the depth image by utilizing a segmentation method based on the color image;
the training module is used for training to obtain a segmentation model by utilizing the human hand segmentation data set based on the depth image, and the segmentation model is composed of an encoder, an attention transfer model and a decoder;
the identification module is used for segmenting the depth image to be processed by utilizing the segmentation model to obtain a classification label map corresponding to the depth image to be processed, wherein the value of each pixel point in the classification label map is the type value of each pixel point, and the type value is used for representing the type of the pixel point in the depth image to be processed.
According to the device for segmenting a human hand and an interactive object from a depth image of the embodiment of the application, a depth-image-based human hand segmentation data set is constructed using a color-image-based segmentation method, and a segmentation model consisting of an encoder, an attention transfer model and a decoder is trained on this data set. The depth image to be processed is segmented with the segmentation model to obtain the corresponding classification label map, in which the value of each pixel is that pixel's type value, so the type of each pixel can be determined from its type value. Because the segmentation model is trained on a depth-image-based human hand segmentation data set and applied directly to the depth image to be processed, pixel-level segmentation of the hand and the object is achieved, environmental robustness is improved, segmentation accuracy is high, and hand-object segmentation under complex interaction conditions can be handled.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart illustrating a method for segmenting a human hand and an interactive object from a depth image according to an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of a segmentation model provided in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an attention mechanism model according to an embodiment of the present disclosure;
FIG. 4 is a schematic flowchart of another method for segmenting a human hand and an interactive object from a depth image according to an embodiment of the present disclosure;
fig. 5 is a schematic diagram of a training process of a segmentation model according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram illustrating an effect of using contour error according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an apparatus for segmenting a human hand and an interactive object from a depth image according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The following describes a method and an apparatus for segmenting a human hand and an interactive object from a depth image according to an embodiment of the present application with reference to the drawings.
Fig. 1 is a flowchart illustrating a method for segmenting a human hand and an interactive object from a depth image according to an embodiment of the present application.
As shown in fig. 1, the method for segmenting a human hand and an interactive object from a depth image includes:
and 101, constructing a human hand segmentation data set based on a depth image by using a segmentation method based on a color image.
Since a depth camera can acquire color images and depth images simultaneously, color and depth images of a human hand interacting with objects can be captured with the depth camera, yielding multiple pairs of color and depth images. The depth images are then labeled based on the color images, thereby obtaining a human hand segmentation data set for depth images.
In order to improve segmentation accuracy, in this embodiment, objects whose colors differ markedly from the color of human skin may be captured under a fixed light source with constant brightness and color temperature. For example, an image of a hand holding a blue pen is captured under the same brightness and light source.
And 102, training to obtain a segmentation model by utilizing a human hand segmentation data set based on the depth image.
After a human hand segmentation data set based on a depth image is obtained, an initial neural network model is trained by using the data set, and a segmentation model meeting requirements is obtained.
In the training process, the prediction performance of the segmentation model can be measured by using a loss function.
In this embodiment, the segmentation model is composed of an encoder, an attention transfer model and a decoder. The encoder uses a large convolutional network, and the decoder uses deconvolution layers to recover the high-level information back to the image pixel scale.
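As a concrete illustration, the following is a minimal PyTorch sketch of a convolutional encoder paired with a deconvolution decoder of the kind described above; the layer counts, channel widths and class count of three are illustrative assumptions, not the exact architecture of this embodiment.

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """Toy encoder-decoder: convolutions shrink the depth image, deconvolutions
    (transposed convolutions) recover the pixel scale and emit per-class logits."""
    def __init__(self, num_classes=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, depth):                      # depth: (B, 1, H, W)
        return self.decoder(self.encoder(depth))   # logits: (B, num_classes, H, W)
```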
Fig. 2 is a schematic structural diagram of a segmentation model according to an embodiment of the present application. As shown in fig. 2, the segmentation model is composed of an encoder, an attention transfer model, and a decoder. In this embodiment, an attention mechanism is added between the encoder and the decoder, and an attention feature map is constructed by fusing multi-scale image features for enhancing the same-layer connection between the encoder and the decoder, so that the accuracy and the effectiveness of information transmission between the encoder and the decoder can be improved.
Fig. 3 is a schematic structural diagram of an attention mechanism model according to an embodiment of the present disclosure. In fig. 3, the feature maps of layers 1 through i-1 are multiplied together to obtain the low-level attention map (FineAtt); each of layers 1 through i-1 passes through a scaling network (SN), which normalizes the feature map dimensions, and a bilinear down-sampling layer (DS). The feature maps of layers i+1 through n are multiplied together to obtain the high-level attention map (CoarseAtt); each of layers i+1 through n passes through an SN and an up-sampling layer (US). DS and US are used to reduce and enlarge the scale of the feature maps, respectively. The resulting FineAtt and CoarseAtt attention maps are concatenated with the feature map of layer i and input to the decoder. This attention mechanism is applied at every layer scale from 1 to n in fig. 3.
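A sketch of this attention transfer is given below in PyTorch. It assumes that SN is a 1x1 convolution projecting every encoder feature map to a common channel width and that DS/US are bilinear resampling; these are illustrative assumptions, since the exact operators are not fixed here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionTransfer(nn.Module):
    """Builds FineAtt from layers 1..i-1 and CoarseAtt from layers i+1..n, then
    concatenates both attention maps with the layer-i feature map for the decoder."""
    def __init__(self, in_channels, i, common_ch=32):
        super().__init__()
        self.i = i
        # SN: one 1x1 projection per encoder layer (assumed normalization form).
        self.sn = nn.ModuleList([nn.Conv2d(c, common_ch, 1) for c in in_channels])

    def forward(self, feats):                     # feats: list of n feature maps
        i, target_hw = self.i, feats[self.i].shape[-2:]

        def align(k):
            f = self.sn[k](feats[k])              # SN: normalize feature dimensions
            return F.interpolate(f, size=target_hw,        # DS (shrink) or US (enlarge)
                                 mode="bilinear", align_corners=False)

        fine = align(0)
        for k in range(1, i):                     # layers 1..i-1 -> FineAtt
            fine = fine * align(k)
        coarse = align(i + 1)
        for k in range(i + 2, len(feats)):        # layers i+1..n -> CoarseAtt
            coarse = coarse * align(k)
        return torch.cat([fine, coarse, feats[i]], dim=1)
```

In this sketch the element-wise products implement the multiplication of the resampled feature maps, and the concatenated result replaces the plain same-layer skip connection between encoder and decoder at layer i.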
And 103, segmenting the depth image to be processed by utilizing the segmentation model, and acquiring a classification label map corresponding to the depth image to be processed.
In this embodiment, the depth image to be processed may be acquired by the depth camera before it is segmented.
After the segmentation model is obtained, the depth image to be processed is input into the trained segmentation model, and the segmentation model outputs the classification label map corresponding to the depth image to be processed. The classification label map has the same size as the depth image to be processed, and the value of each pixel in the classification label map is that pixel's type value, which represents the type to which the pixel belongs in the depth image to be processed. In addition, pixel coordinates are implicit in the arrangement of the image pixels, and the value of each pixel in the input depth image is its depth value.
The types of the pixel points in the depth image to be processed can include human hands, objects and backgrounds. In specific implementation, three types of human hands, objects and backgrounds can be represented by different type values. For example, 0 represents a background, 1 represents a human hand, and 2 represents an object.
In this embodiment, according to the type value of each pixel point and the type corresponding to the type value, the segmentation result of the human hand and the object in the depth image to be processed can be obtained, and the segmentation of the human hand and the interactive object is realized.
As shown in fig. 2, the depth image to be processed is input into the deep network model: it first passes through the encoder, then through the attention transfer model, and finally through the decoder, which outputs the classification label map of the depth image to be processed. The positions of the human hand and the object are obtained from the type value of each pixel, so the human hand and the object are segmented.
In the embodiment of the application, the pixels belonging to the human hand and the pixels belonging to the object can be determined from the type value of each pixel in the classification label map output by the segmentation model and the type corresponding to that value. The interacting hand and object in the image to be processed are thereby segmented at the pixel level with high accuracy, and a hand and an object interacting in complex situations can be segmented.
In one embodiment of the present application, a depth image-based human hand segmentation training dataset may be constructed from color images. Fig. 4 is a flowchart illustrating another method for segmenting a human hand and an interactive object from a depth image according to an embodiment of the present application.
As shown in fig. 4, the method for constructing a depth image-based human hand segmentation data set includes:
step 301, acquiring multiple pairs of color images and depth images under the scene of interaction between a human hand and an object.
In this embodiment, several objects whose colors differ from the color of human skin may be collected in advance. Then, a depth camera is used to capture images of a human hand interacting with each object, obtaining multiple pairs of color and depth images. In addition, to increase the amount of data, images of different interaction postures between the hand and the same object can be acquired.
When capturing images with a depth camera, the lighting environment is fixed, e.g. using fixed light sources of the same brightness and color temperature, to ensure that the captured color images are clear and shadow-free.
Step 302, performing object segmentation based on HSV color space on all color images, and obtaining a type value of each pixel point in each color image.
In this embodiment, all backgrounds in the color image and the depth image may first be removed with a depth threshold, retaining only the human hand and the object. Then, all acquired color images are converted into the HSV color space using the standard conversion formula from the RGB color space to the HSV color space. The parameters of the HSV color space are hue (H), saturation (S) and lightness (V).
Next, segmentation is performed in the HSV color space corresponding to each color image to obtain the type value of each pixel in each color image. Specifically, the distributions in HSV space of the pixels of multiple bare-hand samples and interaction samples are analyzed; the region where the samples overlap corresponds to human hand pixels, and several linear constraints are fitted to it. All color images are then analyzed: pixels lying inside the constraints are labeled as hand, and pixels lying outside the constraints are labeled as object.
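A minimal OpenCV/NumPy sketch of this labeling step is shown below. The depth threshold and the HSV constraints are illustrative placeholders, since the actual linear constraints are fitted to the collected hand samples.

```python
import cv2
import numpy as np

def label_color_image(color_bgr, depth, depth_thresh=800):
    """Returns a per-pixel label map: 0 = background, 1 = hand, 2 = object."""
    hsv = cv2.cvtColor(color_bgr, cv2.COLOR_BGR2HSV)        # RGB -> HSV conversion
    h, s, v = cv2.split(hsv)

    labels = np.zeros(depth.shape, dtype=np.uint8)          # 0 = background
    foreground = (depth > 0) & (depth < depth_thresh)       # depth-threshold background removal

    # Hypothetical linear constraints fitted to hand-skin pixels in HSV space.
    skin = (h < 25) & (s > 40) & (v > 60)

    labels[foreground & skin] = 1                            # hand
    labels[foreground & ~skin] = 2                           # interacting object
    return labels
```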
And 303, aiming at each pair of color images and depth images, mapping each pixel point in the color images to the corresponding pixel point in the depth images, and constructing a human hand segmentation training data set based on the depth images.
For each pair of color and depth images, the color image and the depth image are pixel-aligned: the intrinsic and extrinsic camera parameters of the depth sensor and the color sensor are estimated, and the depth point cloud is transformed by an affine transformation into the color camera space. The automatic labeling method based on the color image then generates a real classification label map, which is also the real classification label map of the depth image corresponding to that color image. In the real classification label map, the type value of each pixel may be represented with 0 for background, 1 for hand and 2 for object.
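The following sketch illustrates the alignment step: the depth map is back-projected with the depth camera intrinsics, transformed with the depth-to-color extrinsics, and re-projected with the color camera intrinsics. The intrinsic matrices K_d and K_c and the extrinsics R, t are placeholders assumed to come from a separate calibration.

```python
import numpy as np

def align_depth_to_color(depth, K_d, K_c, R, t):
    """depth: (H, W) depth map. Returns, for every valid depth pixel, its
    projected pixel coordinates in the color image (used to transfer labels)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth.astype(np.float64)
    valid = z > 0

    # Back-project into the depth camera's 3D space (pinhole model).
    x = (u - K_d[0, 2]) * z / K_d[0, 0]
    y = (v - K_d[1, 2]) * z / K_d[1, 1]
    pts = np.stack([x[valid], y[valid], z[valid]], axis=0)   # (3, N) point cloud

    # Affine transform into the color camera space, then project with K_c.
    pts_c = R @ pts + t.reshape(3, 1)
    u_c = K_c[0, 0] * pts_c[0] / pts_c[2] + K_c[0, 2]
    v_c = K_c[1, 1] * pts_c[1] / pts_c[2] + K_c[1, 2]
    return u_c, v_c, pts_c[2]
```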
In this embodiment, all the depth images and the real classification label maps thereof constitute a human hand segmentation training data set based on the depth images.
Further, to improve segmentation accuracy, in an embodiment of the present application, the depth image may be preprocessed before the mapping: it is denoised using morphological operations and contour filtering, the background in the depth image is analyzed and removed, and only the human hand and the objects interacting with it are retained.
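A small OpenCV sketch of this preprocessing is given below: speckle noise is removed by morphological opening, and contour filtering keeps only the largest connected region (the hand and the interacting object). The kernel size and the largest-contour heuristic are illustrative assumptions.

```python
import cv2
import numpy as np

def denoise_depth(depth):
    """Morphological and contour-based cleanup of a raw depth map."""
    mask = (depth > 0).astype(np.uint8)
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)    # remove small speckles

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    clean = np.zeros_like(mask)
    if contours:
        largest = max(contours, key=cv2.contourArea)         # keep the hand-object region
        cv2.drawContours(clean, [largest], -1, 1, thickness=-1)
    return depth * clean
```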
After the data set used for training the segmentation model is obtained, when the model is trained, the human hand segmentation training data set based on the depth images can be divided into a training data set and a test data set, wherein the number of the depth images in the training data set is far larger than that of the depth images in the test data set, the training data set is used for training, and the test data set is used for testing the trained model.
The initial segmentation model is then trained using the training data set, and a first loss function is calculated. The first loss function adopts the softmax cross-entropy loss, shown in the following formula (1):

loss = -\sum_i y_i \log( e^{x_i} / \sum_j e^{x_j} )    (1)

where y_i denotes the true result, x_i denotes the predicted value output by the segmentation model, and the indices i and j both range over the different types. For example, when the pixels have three types, the loss for type value i = 0 is computed first:

loss_0 = -y_0 \log( e^{x_0} / (e^{x_0} + e^{x_1} + e^{x_2}) )

then the loss for type value i = 1:

loss_1 = -y_1 \log( e^{x_1} / (e^{x_0} + e^{x_1} + e^{x_2}) )

and the loss for type value i = 2:

loss_2 = -y_2 \log( e^{x_2} / (e^{x_0} + e^{x_1} + e^{x_2}) )

The loss of the model is then

loss = loss_0 + loss_1 + loss_2

The first loss function may also be any other loss function capable of realizing the segmentation task.
Specifically, a depth image in the training data set is input into the initial neural network model, and the network outputs a predicted classification label map for that depth image. Then, according to the difference between the predicted classification label map and the real classification label map of the depth image, a gradient descent algorithm feeds the gradient back to all parameters in the network, and the network parameters are updated accordingly. When a depth image is input the next time, the predicted classification label map output by the network is closer to the real classification label map.
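A minimal sketch of one such training step is shown below, using PyTorch's CrossEntropyLoss (which applies the per-pixel softmax cross entropy of formula (1)) and stochastic gradient descent; the model refers to the illustrative encoder-decoder sketch above, and the learning rate and optimizer choice are assumptions.

```python
import torch
import torch.nn as nn

model = EncoderDecoder(num_classes=3)              # illustrative model from the sketch above
criterion = nn.CrossEntropyLoss()                  # softmax cross entropy, formula (1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

def train_step(depth_batch, label_batch):
    # depth_batch: (B, 1, H, W) float depth values; label_batch: (B, H, W) long with values 0/1/2.
    logits = model(depth_batch)                    # predicted classification label map (logits)
    loss = criterion(logits, label_batch)          # difference from the real label map
    optimizer.zero_grad()
    loss.backward()                                # feed the gradient back to all parameters
    optimizer.step()                               # update the network parameters
    return loss.item()
```

Each call to train_step performs one update, so the predicted label map moves closer to the real label map over repeated iterations.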
When the value of the first loss function no longer drops, i.e., the model has been optimized as far as possible under the first loss function, training continues using the contour error as the loss function. The contour error is shown in the following formula (2):

E_contour = || B(S(M_logits)) - B(S(M_labels)) ||    (2)

where B is a blurring operation, for example Gaussian blurring with a 5x5 Gaussian kernel and σ = 2.121; S is contour extraction, for example using the Sobel operator; M_labels is the real classification label map, and M_logits is the predicted pixel-type map output by the network.
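The sketch below computes this contour error for a pair of label maps, under the assumption that formula (2) is the norm of the difference between the blurred Sobel contours of the predicted and real maps; the kernel size and sigma follow the values given above.

```python
import cv2
import numpy as np

def contour_error(m_logits, m_labels):
    """m_logits, m_labels: (H, W) arrays of pixel-type values (e.g. 0/1/2)."""
    def blurred_contour(m):
        m = m.astype(np.float32)
        gx = cv2.Sobel(m, cv2.CV_32F, 1, 0)                  # S: Sobel contour extraction
        gy = cv2.Sobel(m, cv2.CV_32F, 0, 1)
        edges = cv2.magnitude(gx, gy)
        return cv2.GaussianBlur(edges, (5, 5), 2.121)        # B: 5x5 Gaussian, sigma = 2.121
    diff = blurred_contour(m_logits) - blurred_contour(m_labels)
    return float(np.linalg.norm(diff))
```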
When the value of the contour error stabilizes and no longer decreases, training can be stopped to obtain the segmentation model. The segmentation model is then tested on the test set: the depth images in the test set are input into the segmentation model, the Intersection-over-Union (IoU) scores of all depth images in the test set are computed, and the IoU scores are used to judge whether the segmentation model meets the requirements.
The IoU is the ratio of an intersection to a union; in this embodiment, it is the ratio of the intersection of the model prediction and the real result to their union.
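A short sketch of this IoU computation is given below; averaging over the hand and object classes (1 and 2, as defined above) is an illustrative choice.

```python
import numpy as np

def mean_iou(pred, gt, classes=(1, 2)):
    """Mean Intersection-over-Union of the predicted and real label maps."""
    ious = []
    for c in classes:
        inter = np.logical_and(pred == c, gt == c).sum()     # intersection for class c
        union = np.logical_or(pred == c, gt == c).sum()      # union for class c
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0
```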
Fig. 5 is a schematic diagram of a training process of a segmentation model according to an embodiment of the present application. In fig. 5, the left side is a schematic diagram of a data construction process, and the right side is a schematic diagram of a model training process. When data is constructed, a color image acquired by a depth camera is aligned with a depth image, and an automatic labeling method is used for generating a real classification label map based on the color image, which is also a real classification label of the aligned corresponding depth image. All depth images and their true classification label images constitute a training data set of human hand segmentation based on the depth images.
During model training, a depth image from the data set is input into the attention segmentation network to obtain the classification label map predicted by the network model, the predicted map is compared with the real classification label map, the loss is calculated, and the network parameters are updated iteratively step by step.
Fig. 6 is a schematic diagram illustrating the effect of using the contour error according to an embodiment of the present application. In fig. 6, the left column shows the real labels of the object and the hand, the middle column shows the network output without the contour error, and the right column shows the network output after the contour error is used.
In the embodiment of the application, when the segmentation model is trained, the general loss function is used first; when the value of the general loss function stabilizes, i.e., the model is optimal under that loss function, training continues with the contour error as the loss function. Together with the attention mechanism model added to the segmentation model, this greatly improves the segmentation accuracy of the model.
Further, in order to enhance the generalization ability of the segmentation model, before the segmentation model is trained by using the training data set, a data augmentation operation may be performed on the training data set, and the depth image obtained by the data augmentation operation may be added to the training data set.
Wherein the data augmentation operation comprises at least one of freely rotating the depth image, adding random noise, and randomly flipping the depth image.
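The sketch below applies the three augmentation operations listed above to a depth image and its label map; the rotation range and noise level are illustrative assumptions.

```python
import random
import cv2
import numpy as np

def augment(depth, labels):
    """Free rotation, random noise and random flipping of a depth/label pair."""
    h, w = depth.shape
    angle = random.uniform(-180.0, 180.0)                    # free rotation
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)
    depth = cv2.warpAffine(depth, M, (w, h), flags=cv2.INTER_NEAREST)
    labels = cv2.warpAffine(labels, M, (w, h), flags=cv2.INTER_NEAREST)

    depth = depth + np.random.normal(0.0, 2.0, depth.shape)  # random noise on depth values

    if random.random() < 0.5:                                # random flip
        depth, labels = np.fliplr(depth).copy(), np.fliplr(labels).copy()
    return depth, labels
```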
In order to implement the above embodiments, the present application further provides an apparatus for segmenting a human hand and an interactive object from a depth image. Fig. 7 is a schematic structural diagram of an apparatus for segmenting a human hand and an interactive object from a depth image according to an embodiment of the present application.
As shown in fig. 7, the apparatus for segmenting a human hand and an interactive object from a depth image comprises: a construction module 610, a training module 620, and an identification module 630.
A construction module 610 for constructing a depth image-based human hand segmentation dataset using a color image-based segmentation method;
a training module 620, configured to train, by using the human hand segmentation data set based on the depth image, to obtain a segmentation model, where the segmentation model is composed of an encoder, an attention transfer model, and a decoder;
the identifying module 630 is configured to utilize the segmentation model to segment the depth image to be processed, and obtain a classification label map corresponding to the depth image to be processed, where a value of each pixel in the classification label map is a type value of each pixel, and the type value is used to represent a type to which the pixel belongs in the depth image to be processed.
In a possible implementation manner of the embodiment of the present application, the building module 610 is specifically configured to:
collecting a plurality of pairs of color images and depth images under the condition of interaction between hands and objects;
carrying out object segmentation based on HSV color space on all color images to obtain the type value of each pixel point in each color image;
and aiming at each pair of color images and depth images, mapping each pixel point in the color images to the corresponding pixel point in the depth images, and constructing a human hand segmentation training data set based on the depth images.
In a possible implementation manner of the embodiment of the present application, the depth image is preprocessed, including noise and background removal.
In a possible implementation manner of the embodiment of the present application, the human-hand segmentation data set based on the depth image includes a training data set and a testing data set, and the training module 620 is specifically configured to:
training the initial neural network model by utilizing a training data set, and calculating a first loss function, wherein the first loss function adopts a softmax cross entropy loss function;
when the value of the first loss function no longer drops, training continues using the contour error as a loss function.
In a possible implementation manner of the embodiment of the present application, the apparatus further includes:
and the processing module is used for carrying out data augmentation operation on the training data set, wherein the data augmentation operation comprises at least one of freely rotating the depth image, adding random noise and randomly turning over the depth image.
It should be noted that the above explanation of the embodiment of the method for segmenting the human hand and the interactive object from the depth image is also applicable to the apparatus for segmenting the human hand and the interactive object from the depth image in this embodiment, and therefore, the explanation thereof is omitted here.
According to the device for segmenting a human hand and an interactive object from a depth image of the embodiment of the application, a depth-image-based human hand segmentation data set is constructed using a color-image-based segmentation method, and a segmentation model consisting of an encoder, an attention transfer model and a decoder is trained on this data set. The depth image to be processed is segmented with the segmentation model to obtain the corresponding classification label map, in which the value of each pixel is that pixel's type value, so the type of each pixel can be determined from its type value. Because the segmentation model is trained on a depth-image-based human hand segmentation data set and applied directly to the depth image to be processed, pixel-level segmentation of the hand and the object is achieved, environmental robustness is improved, segmentation accuracy is high, and hand-object segmentation under complex interaction conditions can be handled.

Claims (10)

1. A method of segmenting a human hand and an interactive object from a depth image, comprising:
constructing a human hand segmentation data set based on a depth image by using a segmentation method based on a color image;
training an initial neural network model by using the human hand segmentation data set based on the depth image to obtain a segmentation model, wherein the segmentation model consists of an encoder, an attention transfer model and a decoder;
and segmenting the depth image to be processed by utilizing the segmentation model to obtain a classification label map corresponding to the depth image to be processed, wherein the value of each pixel point in the classification label map is the type value of each pixel point, and the type value is used for representing the type of the pixel point in the depth image to be processed.
2. The method of claim 1, wherein constructing a depth image-based human hand segmentation dataset using a color image-based segmentation method comprises:
acquiring a plurality of pairs of color images and depth images under the condition of interaction between hands and objects;
carrying out object segmentation based on HSV color space on all color images to obtain the type value of each pixel point in each color image;
and for each pair of color images and depth images, mapping each pixel point in the color images to corresponding pixel points in the depth images, and constructing a human hand segmentation training data set based on the depth images.
3. The method of claim 2, wherein mapping each pixel point in the color image to a corresponding pixel point in the depth image further comprises:
and preprocessing the depth image, including noise and background removal.
4. The method of claim 2, wherein the depth image-based human hand segmentation data set comprises a training data set and a test data set, and wherein training a segmentation model using the depth image-based human hand segmentation data set comprises:
training an initial neural network model by using the training data set, and calculating a first loss function, wherein the first loss function adopts a softmax cross entropy loss function;
when the value of the first loss function no longer drops, training continues using the contour error as a loss function.
5. The method of claim 4, wherein prior to training the segmentation model using the training data set, further comprising:
and carrying out data augmentation operation on the training data set, wherein the data augmentation operation comprises at least one of free rotation of the depth image, addition of random noise and random inversion of the depth image.
6. An apparatus for segmenting a human hand and an interactive object from a depth image, comprising:
the construction module is used for constructing a human hand segmentation data set based on the depth image by utilizing a segmentation method based on the color image;
the training module is used for training an initial neural network model by utilizing the human hand segmentation data set based on the depth image to obtain a segmentation model, and the segmentation model is composed of an encoder, an attention transfer model and a decoder;
the identification module is used for segmenting the depth image to be processed by utilizing the segmentation model to obtain a classification label map corresponding to the depth image to be processed, wherein the value of each pixel point in the classification label map is the type value of each pixel point, and the type value is used for representing the type of the pixel point in the depth image to be processed.
7. The apparatus of claim 6, wherein the build module is specifically configured to:
acquiring a plurality of pairs of color images and depth images under the condition of interaction between hands and objects;
carrying out object segmentation based on HSV color space on all color images to obtain the type value of each pixel point in each color image;
and for each pair of color images and depth images, mapping each pixel point in the color images to corresponding pixel points in the depth images, and constructing a human hand segmentation training data set based on the depth images.
8. The apparatus of claim 7, further comprising:
and the preprocessing module is used for preprocessing the depth image, including noise and background removal.
9. The apparatus of claim 7, wherein the depth image-based human hand segmentation dataset comprises a training dataset and a testing dataset, and wherein the training module is specifically configured to:
training an initial neural network model by using the training data set, and calculating a first loss function, wherein the first loss function adopts a softmax cross entropy loss function;
when the value of the first loss function no longer drops, training continues using the contour error as a loss function.
10. The apparatus of claim 9, further comprising:
and the processing module is used for carrying out data augmentation operation on the training data set, wherein the data augmentation operation comprises at least one of freely rotating the depth image, adding random noise and randomly turning over the depth image.
CN201910207311.8A 2019-03-19 2019-03-19 Method and device for segmenting human hand and interactive object from depth image Active CN109977834B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910207311.8A CN109977834B (en) 2019-03-19 2019-03-19 Method and device for segmenting human hand and interactive object from depth image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910207311.8A CN109977834B (en) 2019-03-19 2019-03-19 Method and device for segmenting human hand and interactive object from depth image

Publications (2)

Publication Number Publication Date
CN109977834A CN109977834A (en) 2019-07-05
CN109977834B true CN109977834B (en) 2021-04-06

Family

ID=67079395

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910207311.8A Active CN109977834B (en) 2019-03-19 2019-03-19 Method and device for segmenting human hand and interactive object from depth image

Country Status (1)

Country Link
CN (1) CN109977834B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111127535B (en) * 2019-11-22 2023-06-20 北京华捷艾米科技有限公司 Method and device for processing hand depth image
CN111568197A (en) * 2020-02-28 2020-08-25 佛山市云米电器科技有限公司 Intelligent detection method, system and storage medium
CN112396137B (en) * 2020-12-14 2023-12-15 南京信息工程大学 Point cloud semantic segmentation method integrating context semantics
CN113158774B (en) * 2021-03-05 2023-12-29 北京华捷艾米科技有限公司 Hand segmentation method, device, storage medium and equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729326B (en) * 2017-09-25 2020-12-25 沈阳航空航天大学 Multi-BiRNN coding-based neural machine translation method
CN108647214B (en) * 2018-03-29 2020-06-30 中国科学院自动化研究所 Decoding method based on deep neural network translation model

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106469446A (en) * 2015-08-21 2017-03-01 小米科技有限责任公司 The dividing method of depth image and segmenting device
WO2017116879A1 (en) * 2015-12-31 2017-07-06 Microsoft Technology Licensing, Llc Recognition of hand poses by classification using discrete values
CN108898142A (en) * 2018-06-15 2018-11-27 宁波云江互联网科技有限公司 A kind of recognition methods and calculating equipment of handwritten formula
CN109272513A (en) * 2018-09-30 2019-01-25 清华大学 Hand and object interactive segmentation method and device based on depth camera
CN109448006A (en) * 2018-11-01 2019-03-08 江西理工大学 A kind of U-shaped intensive connection Segmentation Method of Retinal Blood Vessels of attention mechanism

Also Published As

Publication number Publication date
CN109977834A (en) 2019-07-05

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant