CN113158774B - Hand segmentation method, device, storage medium and equipment - Google Patents

Hand segmentation method, device, storage medium and equipment

Info

Publication number
CN113158774B
Authority
CN
China
Prior art keywords
hand
value
output result
image
mask
Prior art date
Legal status
Active
Application number
CN202110245345.3A
Other languages
Chinese (zh)
Other versions
CN113158774A (en)
Inventor
古迎冬
李骊
Current Assignee
Beijing HJIMI Technology Co Ltd
Original Assignee
Beijing HJIMI Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing HJIMI Technology Co Ltd
Priority to CN202110245345.3A
Publication of CN113158774A
Application granted
Publication of CN113158774B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/107 Static hand or arm
    • G06V 40/113 Recognition of static hand signs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/107 Static hand or arm
    • G06V 40/117 Biometrics derived from hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a hand segmentation method, device, storage medium, and equipment, in which an image input by a user is acquired and input into a segmentation network to obtain an output result of the segmentation network, the output result containing a left-hand mask, a right-hand mask, a first value, and a second value. Whether the first value and the second value are both larger than a preset threshold is then judged. If both values are larger than the preset threshold, the left-hand mask and the right-hand mask are sent to the user; otherwise, a preset step is executed repeatedly to iterate on the output result until the first value and the second value indicated by the iterated output result are both larger than the preset threshold, at which point the left-hand mask and the right-hand mask contained in that output result are sent to the user. Compared with the prior art, the method markedly reduces the computation time and thereby improves the efficiency of hand segmentation. In addition, from the network structure of the segmentation network it can be seen that the network places low demands on hardware resources and can be widely applied by most individuals and teams.

Description

Hand segmentation method, device, storage medium and equipment
Technical Field
The present disclosure relates to the field of image processing, and in particular, to a hand segmentation method, device, storage medium, and apparatus.
Background
How to accurately segment the hands (the left hand and the right hand) in an image is a major concern for teams and enterprises currently researching gesture recognition. At present, hand segmentation is generally realized with a deep learning network. However, to guarantee an accurate segmentation result, existing deep learning networks usually require a long computation time, so hand segmentation is inefficient. They also place high demands on hardware resources, which makes them difficult for most individuals and teams to apply; their application range is therefore too narrow, which hinders the research and development of gesture recognition.
Disclosure of Invention
The application provides a hand segmentation method, device, storage medium, and equipment, which improve the efficiency of hand segmentation while ensuring an accurate hand segmentation result.
In order to achieve the above object, the present application provides the following technical solutions:
a hand segmentation method, comprising:
acquiring an image input by a user;
inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network; the output result comprises a left-hand mask, a right-hand mask, a first numerical value and a second numerical value; the first value indicates a probability of success of the left-hand identification, and the second value indicates a probability of success of the right-hand identification;
judging whether the first value and the second value are both larger than a preset threshold value or not;
transmitting the left-hand mask and the right-hand mask to the user when the first value and the second value are both greater than the preset threshold;
repeatedly executing a preset step in the case that the first value and the second value are not both larger than the preset threshold, so as to iterate on the output result until the first value and the second value indicated by the iterated output result are both larger than the preset threshold, and sending a left-hand mask and a right-hand mask contained in the iterated output result to the user; wherein the preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result.
Optionally, the segmentation network includes:
the downsampling structure is used for downsampling the image to obtain a downsampled image;
the feature recognition structure is used for recognizing and obtaining a feature image from the downsampled image; the characteristic images comprise a left-hand characteristic image and a right-hand characteristic image;
the up-sampling structure is used for up-sampling the left-hand characteristic image to obtain the left-hand mask and the probability that the left hand is successfully recognized, and for up-sampling the right-hand characteristic image to obtain the right-hand mask and the probability that the right hand is successfully recognized.
Optionally, the downsampling structure includes:
standard convolution layer, normalization layer, activation layer, and downsampling layer.
Optionally, the feature recognition structure includes:
a depth convolution layer, a normalization layer, an activation layer, and a three-dimensional point cloud operation layer.
Optionally, the upsampling structure includes:
standard convolution layer, normalization layer, activation layer, and transposed convolution layer.
Optionally, the segmentation network further includes:
a skip link structure, used for assisting the up-sampling structure in up-sampling the characteristic image.
Optionally, generating the new image based on the output result includes:
multiplying the left-hand mask with the first value to obtain a first product;
multiplying the right-hand mask with the second value to obtain a second product;
and carrying out channel combination on the first product and the second product to obtain a new image.
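The optional step above can be sketched as follows. This is a hedged NumPy illustration, not the patent's reference implementation; the function name, the array shapes, and the choice of stacking along the last axis are assumptions:

```python
import numpy as np

def generate_new_image(left_mask, right_mask, first_value, second_value):
    """Build the next iteration's input from the current output result.

    left_mask, right_mask: (H, W) arrays produced by the segmentation network.
    first_value, second_value: scalar recognition-success probabilities.
    """
    first_product = left_mask * first_value      # weight left mask by its confidence
    second_product = right_mask * second_value   # weight right mask by its confidence
    # Channel merge: stack the two products along a new channel axis -> (H, W, 2)
    return np.stack([first_product, second_product], axis=-1)

# Tiny example with 2x2 masks
left = np.array([[1.0, 0.0], [0.0, 1.0]])
right = np.array([[0.0, 1.0], [1.0, 0.0]])
new_image = generate_new_image(left, right, 0.5, 0.8)
print(new_image.shape)  # (2, 2, 2)
```

The channel-merged result has one channel per hand, so the segmentation network receives both confidence-weighted masks in a single input tensor on the next pass.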
A hand segmentation apparatus comprising:
an acquisition unit configured to acquire an image input by a user;
the segmentation unit is used for inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network; the output result comprises a left-hand mask, a right-hand mask, a first numerical value and a second numerical value; the first value indicates a probability of success of the left-hand identification, and the second value indicates a probability of success of the right-hand identification;
the judging unit is used for judging whether the first numerical value and the second numerical value are both larger than a preset threshold value or not;
a sending unit, configured to send the left-hand mask and the right-hand mask to the user when the first value and the second value are both greater than the preset threshold;
the iteration unit is used for repeatedly executing a preset step in the case that the first value and the second value are not both larger than the preset threshold, so as to iterate on the output result until the first value and the second value indicated by the iterated output result are both larger than the preset threshold, and for sending a left-hand mask and a right-hand mask contained in the iterated output result to the user; wherein the preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result.
A computer-readable storage medium comprising a stored program, wherein the program performs the hand segmentation method.
A hand segmentation apparatus comprising: a processor, a memory, and a bus; the processor is connected with the memory through the bus;
the memory is used for storing a program, and the processor is used for running the program, wherein the hand segmentation method is executed when the program runs.
According to the technical scheme, an image input by a user is acquired and input into a pre-constructed segmentation network to obtain the output result of the segmentation network. The output result includes a left-hand mask, a right-hand mask, a first value, and a second value. The first value indicates the probability that the left hand is successfully recognized, and the second value indicates the probability that the right hand is successfully recognized. Whether the first value and the second value are both larger than a preset threshold is then judged. In the case that both values are larger than the preset threshold, the left-hand mask and the right-hand mask are sent to the user. In the case that the two values are not both larger than the preset threshold, a preset step is executed repeatedly to iterate on the output result until the first value and the second value indicated by the iterated output result are both larger than the preset threshold, and the left-hand mask and the right-hand mask contained in that output result are sent to the user. The preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result. By comparing the first value and the second value with the preset threshold, the number of iterations applied to the output result of the segmentation network can be planned; in other words, the effect of hand segmentation is quantified by an index (the quantization index is the preset threshold, which governs how many iterations the output result undergoes), so redundant computation is avoided.
Compared with the prior art, the method markedly reduces the computation time, thereby improving the efficiency of hand segmentation. In addition, from the network structure of the segmentation network it can be seen that the network places low demands on hardware resources, can be widely applied by most individuals and teams, and thus has high applicability.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1a is a schematic diagram of a hand segmentation method according to an embodiment of the present application;
fig. 1b is a schematic diagram of a network structure of a segmentation network according to an embodiment of the present application;
fig. 1c is a schematic diagram of a network structure of another segmentation network according to an embodiment of the present application;
FIG. 2 is a schematic diagram of another hand segmentation method according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a hand segmentation device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
As shown in fig. 1a, a schematic diagram of a hand segmentation method according to an embodiment of the present application includes the following steps:
s101: an image input by a user is acquired.
The image includes, but is not limited to, a color image, an infrared image, a depth image, and the like.
S102: inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network.
The output result of the segmentation network comprises a first segmentation result, a second segmentation result, a first numerical value and a second numerical value.
The first segmentation result indicates a left-hand mask (mask), the second segmentation result indicates a right-hand mask, the first value indicates a probability of success of the left-hand recognition, and the second value indicates a probability of success of the right-hand recognition.
In an embodiment of the present application, the segmentation network includes a downsampling structure, a feature recognition structure, an upsampling structure, and a skip link structure.
Specifically, according to the network structure shown in fig. 1b, the process of dividing the network processing image includes:
1. the image is input into a downsampling structure to obtain a first result.
It should be noted that the function of the downsampling structure is to downsample the image to obtain a downsampled image (i.e., the first result). The downsampling structure includes a standard convolution layer (commonly known as standard Conv), a normalization layer (commonly known as a BN layer), an activation layer (commonly known as swish), and a downsampling layer (commonly known as pooling). In the embodiment of the present application, the number of standard convolution layers and the size of the convolution kernel may be set by a technician according to the actual situation.
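Of the layers just listed, the swish activation and the pooling downsample can be sketched minimally in NumPy. This is an assumption-laden illustration (real layers operate on multi-channel tensors and the convolution layers are learned), not the network's actual implementation:

```python
import numpy as np

def swish(x):
    """Swish activation: x * sigmoid(x), i.e. x / (1 + exp(-x))."""
    return x / (1.0 + np.exp(-x))

def max_pool_2x2(x):
    """2x2 max-pooling downsample of an (H, W) feature map (H, W even)."""
    h, w = x.shape
    # Split each axis into (blocks, 2) and take the max within each 2x2 block
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool_2x2(swish(x))
print(pooled.shape)  # (2, 2) -- spatial size halved in each dimension
```

Each pass through such a block halves the spatial resolution while the convolution layers (omitted here) extract features.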
2. And inputting the first result into the feature recognition structure to obtain a feature image.
It should be noted that the function of the feature recognition structure is to recognize the feature images from the downsampled image. The feature images include a left-hand feature image and a right-hand feature image, and the feature recognition structure includes a depth convolution layer (commonly referred to as DepthConv), a normalization layer, an activation layer, and a three-dimensional point cloud operation layer (commonly referred to as PointConv).
3. The left-hand feature image is input into the up-sampling structure through the skip link structure to obtain the left-hand mask and the probability that the left hand is successfully recognized.
4. The right-hand feature image is input into the up-sampling structure through the skip link structure to obtain the right-hand mask and the probability that the right hand is successfully recognized.
It should be noted that the function of the skip link structure is to assist the up-sampling structure in up-sampling the feature images, which also increases the training speed of the segmentation network. The skip link structure includes a channel merge layer (commonly known as concat), a standard convolution layer, and a 1×1 convolution layer (commonly known as 1×1 Conv). In the embodiment of the present application, the numbers of channel merge layers, standard convolution layers, and 1×1 convolution layers may be set by a technician according to the actual situation.
The function of the up-sampling structure is to up-sample the feature images (specifically, to up-sample the left-hand feature image to obtain the left-hand mask and the probability that the left hand is successfully recognized, and to up-sample the right-hand feature image to obtain the right-hand mask and the probability that the right hand is successfully recognized). The up-sampling structure includes a standard convolution layer, a normalization layer, an activation layer, and a transposed convolution layer (commonly referred to as TransConv).
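The transposed convolution layer that performs the up-sampling can be illustrated with a hedged 1-D NumPy sketch; the real TransConv layer is 2-D with learned kernels, so the kernel, stride, and signal here are assumed values chosen only to show how the operation enlarges the spatial size:

```python
import numpy as np

def transposed_conv1d(x, kernel, stride=2):
    """Stride-s transposed convolution of a 1-D signal.

    Equivalent to inserting (stride - 1) zeros between input samples,
    then running a full correlation with the kernel.
    """
    n, k = len(x), len(kernel)
    out = np.zeros(stride * (n - 1) + k)
    for i, v in enumerate(x):
        out[i * stride:i * stride + k] += v * kernel  # scatter-add each input sample
    return out

x = np.array([1.0, 2.0, 3.0])
y = transposed_conv1d(x, np.array([1.0, 1.0]), stride=2)
print(y)  # [1. 1. 2. 2. 3. 3.] -- length 6, roughly double the input
```

With stride 2 the output length is `2*(n-1)+k`, which is why stacking such layers restores the resolution that the downsampling structure removed.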
It should be emphasized that the segmentation network composed of the downsampling structure, the feature recognition structure, the upsampling structure, and the skip link structure can also be seen in fig. 1c.
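Likewise, the depth convolution layer (DepthConv) used in the feature recognition structure can be illustrated with a minimal NumPy sketch. This is an assumed, simplified version (valid padding, stride 1, one kernel per channel), not the layer as trained in the segmentation network:

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Depthwise convolution: each input channel is convolved with its own
    kernel, with no mixing across channels (valid padding, stride 1).

    x:       (C, H, W) feature map
    kernels: (C, kH, kW), one kernel per channel
    """
    c, h, w = x.shape
    _, kh, kw = kernels.shape
    out = np.zeros((c, h - kh + 1, w - kw + 1))
    for ch in range(c):  # channels stay independent -- far cheaper than standard conv
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[ch, i, j] = np.sum(x[ch, i:i + kh, j:j + kw] * kernels[ch])
    return out

x = np.ones((2, 4, 4))
k = np.ones((2, 3, 3))
y = depthwise_conv2d(x, k)
print(y.shape)  # (2, 2, 2)
```

Because each channel gets its own kernel instead of a kernel spanning all channels, a depth convolution needs far fewer multiplications than a standard convolution, which is consistent with the network's low hardware-resource requirements.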
S103: and judging whether the first value and the second value are both larger than a preset threshold value.
If the first value and the second value are both greater than the preset threshold, S104 is executed, otherwise S105 is executed.
S104: the left-hand mask, and the right-hand mask are sent to the user.
It should be noted that, if the first value and the second value are both greater than the preset threshold, it is determined that the effect of hand segmentation meets the preset requirement, that is, the accuracy of the hand segmentation result can be ensured.
S105: the left-hand mask is multiplied by the first value to obtain a first product.
S106: and multiplying the right-hand mask with the second value to obtain a second product.
Wherein S105 and S106 are performed concurrently.
The specific implementation principle of multiplying the left-hand mask and the right-hand mask by their respective values is common knowledge to a person skilled in the art, and will not be described here again.
S107: and carrying out channel combination on the first product and the second product to obtain a new image, and returning to S102.
The specific implementation principle of channel combination is common knowledge familiar to those skilled in the art, and will not be described herein.
The new image is processed by returning to S102, and the new output result obtained in this way achieves a better hand segmentation effect than the original output result.
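Putting S102 through S107 together, the control flow can be sketched as below. The `segmentation_network` stub is purely hypothetical (it simply raises its confidences on every call to mimic the refinement described above), and the threshold and iteration cap are assumed values; the real network is the trained model of fig. 1b:

```python
import numpy as np

def segmentation_network(image):
    """Hypothetical stand-in returning (left_mask, right_mask, p_left, p_right).

    The confidences grow with each call to mimic iterative refinement;
    a real network would compute them from the image.
    """
    segmentation_network.calls += 1
    h, w = image.shape[:2]
    p = min(1.0, 0.4 + 0.2 * segmentation_network.calls)
    return np.ones((h, w)), np.ones((h, w)), p, p

segmentation_network.calls = 0

def segment_hands(image, threshold=0.9, max_iters=10):
    """S102-S107: iterate until both confidences exceed the preset threshold."""
    left, right, p_l, p_r = segmentation_network(image)           # S102
    for _ in range(max_iters):
        if p_l > threshold and p_r > threshold:                   # S103
            return left, right                                    # S104
        new_image = np.stack([left * p_l, right * p_r], axis=-1)  # S105-S107
        left, right, p_l, p_r = segmentation_network(new_image)   # back to S102
    return left, right

left_mask, right_mask = segment_hands(np.zeros((4, 4)))
print(segmentation_network.calls)  # 3
```

The threshold check bounds the number of network invocations, which is the mechanism by which the method avoids redundant computation.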
In summary, by comparing the first value and the second value with the preset threshold, the number of iterations applied to the output result of the segmentation network can be planned; in other words, the effect of hand segmentation is quantified by an index (the quantization index is the preset threshold, which governs how many iterations the output result undergoes), so redundant computation is avoided. Compared with the prior art, the method of this embodiment markedly reduces the computation time, thereby improving the efficiency of hand segmentation. In addition, from the network structure of the segmentation network it can be seen that the network places low demands on hardware resources, can be widely applied by most individuals and teams, and thus has high applicability.
It should be noted that S105 and S106 mentioned in the foregoing embodiment are an optional specific implementation of the hand segmentation method described in the present application, as is S107. Accordingly, the flow shown in the above embodiment can be summarized as the method shown in fig. 2.
As shown in fig. 2, a schematic diagram of another hand segmentation method according to an embodiment of the present application includes the following steps:
s201: an image input by a user is acquired.
S202: inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network.
The output result comprises a left-hand mask, a right-hand mask, a first value and a second value. The first value indicates the probability of success of the left hand identification and the second value indicates the probability of success of the right hand identification.
S203: and judging whether the first value and the second value are both larger than a preset threshold value.
If the first value and the second value are both greater than the preset threshold, S204 is executed, otherwise S205 is executed.
S204: the left-hand mask, and the right-hand mask are sent to the user.
S205: and repeatedly executing the preset step, carrying out iterative processing on the output result until the first numerical value and the second numerical value indicated by the output result after the iterative processing are both larger than a preset threshold value, and sending a left-hand mask and a right-hand mask contained in the output result after the iterative processing to a user.
The preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result.
In summary, by comparing the first value and the second value with the preset threshold, the number of iterations applied to the output result of the segmentation network can be planned; in other words, the effect of hand segmentation is quantified by an index (the quantization index is the preset threshold, which governs how many iterations the output result undergoes), so redundant computation is avoided. Compared with the prior art, the method of this embodiment markedly reduces the computation time, thereby improving the efficiency of hand segmentation. In addition, from the network structure of the segmentation network it can be seen that the network places low demands on hardware resources, can be widely applied by most individuals and teams, and thus has high applicability.
Corresponding to the hand segmentation method described in the embodiment of the present application, the embodiment of the present application further provides a hand segmentation device.
Fig. 3 is a schematic structural diagram of a hand segmentation device according to an embodiment of the present application, including:
an acquisition unit 100 for acquiring an image input by a user.
The segmentation unit 200 is configured to input the image into a pre-constructed segmentation network, and obtain an output result of the segmentation network. The output results include a left-hand mask, a right-hand mask, a first value, and a second value. The first value indicates the probability of success of the left hand identification and the second value indicates the probability of success of the right hand identification.
Wherein the segmentation network comprises: a downsampling structure, used for downsampling the image to obtain a downsampled image; a feature recognition structure, used for recognizing a feature image from the downsampled image, the feature image including a left-hand feature image and a right-hand feature image; an up-sampling structure, used for up-sampling the left-hand feature image to obtain a left-hand mask and the probability that the left hand is successfully recognized, and for up-sampling the right-hand feature image to obtain a right-hand mask and the probability that the right hand is successfully recognized; and a skip link structure, used for assisting the up-sampling structure in up-sampling the feature image.
The downsampling structure includes a standard convolution layer, a normalization layer, an activation layer, and a downsampling layer.
The feature recognition structure comprises a depth convolution layer, a normalization layer, an activation layer and a three-dimensional point cloud operation layer.
The upsampling structure includes a standard convolution layer, a normalization layer, an activation layer, and a transposed convolution layer.
The judging unit 300 is configured to judge whether the first value and the second value are both greater than a preset threshold.
And a transmitting unit 400, configured to transmit the left-hand mask and the right-hand mask to the user when the first value and the second value are both greater than the preset threshold.
The iteration unit 500 is configured to repeatedly execute the preset step in the case that the first value and the second value are not both greater than the preset threshold, so as to iterate on the output result until the first value and the second value indicated by the iterated output result are both greater than the preset threshold, and to send a left-hand mask and a right-hand mask contained in the iterated output result to the user. The preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result.
The process by which the iteration unit 500 generates a new image based on the output result includes: multiplying the left-hand mask by the first value to obtain a first product; multiplying the right-hand mask by the second value to obtain a second product; and carrying out channel combination on the first product and the second product to obtain a new image.
In summary, by comparing the first value and the second value with the preset threshold, the number of iterations applied to the output result of the segmentation network can be planned; in other words, the effect of hand segmentation is quantified by an index (the quantization index is the preset threshold, which governs how many iterations the output result undergoes), so redundant computation is avoided. Compared with the prior art, the device of this embodiment markedly reduces the computation time, thereby improving the efficiency of hand segmentation. In addition, from the network structure of the segmentation network it can be seen that the network places low demands on hardware resources, can be widely applied by most individuals and teams, and thus has high applicability.
The present application also provides a computer-readable storage medium including a stored program, wherein the program executes the hand segmentation method provided by the present application.
The application also provides a hand segmentation apparatus comprising: a processor, a memory, and a bus. The processor is connected with the memory through a bus, the memory is used for storing a program, and the processor is used for running the program, wherein the hand segmentation method provided by the application is executed when the program runs, and the method comprises the following steps of:
acquiring an image input by a user;
inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network; the output result comprises a left-hand mask, a right-hand mask, a first numerical value and a second numerical value; the first value indicates a probability of success of the left-hand identification, and the second value indicates a probability of success of the right-hand identification;
judging whether the first value and the second value are both larger than a preset threshold value or not;
transmitting the left-hand mask and the right-hand mask to the user when the first value and the second value are both greater than the preset threshold;
repeatedly executing a preset step in the case that the first value and the second value are not both larger than the preset threshold, so as to iterate on the output result until the first value and the second value indicated by the iterated output result are both larger than the preset threshold, and sending a left-hand mask and a right-hand mask contained in the iterated output result to the user; wherein the preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result.
Optionally, the segmentation network includes:
the downsampling structure is used for downsampling the image to obtain a downsampled image;
the feature recognition structure is used for recognizing and obtaining a feature image from the downsampled image; the characteristic images comprise a left-hand characteristic image and a right-hand characteristic image;
the up-sampling structure is used for up-sampling the left hand characteristic image to obtain the left hand mask and the probability of success of the left hand identification; and upsampling the right hand characteristic image to obtain the right hand mask and the right hand recognition success probability.
Optionally, the downsampling structure comprises a standard convolution layer, a normalization layer, an activation layer, and a downsampling layer.
Optionally, the feature recognition structure comprises a depthwise convolution layer, a normalization layer, an activation layer, and a three-dimensional point cloud operation layer.
Optionally, the upsampling structure comprises a standard convolution layer, a normalization layer, an activation layer, and a transposed convolution layer.
Optionally, the segmentation network further comprises a skip connection structure for assisting the upsampling structure in upsampling the feature images.
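A minimal, purely illustrative sketch of this optional architecture follows. The toy "layers" below (stride slicing, nearest-neighbour repeat, channel concatenation, a sigmoid over the mean) merely stand in for the convolution, normalization, activation, and transposed-convolution layers named above; none of them is the application's actual implementation.

```python
import numpy as np

# Toy composition: downsampling structure -> feature recognition structure ->
# one upsampling head per hand, each fused with a skip connection from the
# full-resolution input. All operations are illustrative stand-ins.

def downsample(x):
    """Halve spatial resolution (stand-in for conv + norm + act + pooling)."""
    return x[::2, ::2, :]

def recognize_features(x):
    """Split channels into a left-hand and a right-hand feature image."""
    half = x.shape[-1] // 2
    return x[..., :half], x[..., half:]

def upsample_head(feature, skip):
    """Restore resolution, fuse the skip connection, emit a mask and a probability."""
    up = feature.repeat(2, axis=0).repeat(2, axis=1)    # transposed-conv stand-in
    fused = np.concatenate([up, skip], axis=-1)         # skip connection
    mask = fused.mean(axis=-1)                          # per-pixel score map
    prob = float(1.0 / (1.0 + np.exp(-mask.mean())))    # recognition success probability
    return mask, prob

image = np.random.rand(8, 8, 4)                 # toy H x W x C input
down = downsample(image)                        # 4 x 4 x 4
left_feat, right_feat = recognize_features(down)
left_mask, p_left = upsample_head(left_feat, image)
right_mask, p_right = upsample_head(right_feat, image)
print(left_mask.shape, right_mask.shape)        # → (8, 8) (8, 8)
```

The sketch keeps the claimed data flow (each head outputs a mask plus a success probability) while replacing every learned layer with a shape-preserving placeholder.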
Optionally, generating the new image based on the output result comprises:
multiplying the left-hand mask by the first value to obtain a first product;
multiplying the right-hand mask by the second value to obtain a second product; and
combining the first product and the second product by channel to obtain the new image.
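The new-image generation just described can be sketched directly: weight each mask by its recognition success probability, then merge the two products along the channel axis. Shapes and values below are illustrative only.

```python
import numpy as np

# Sketch of the "generate a new image" step: mask x probability per hand,
# then channel combination of the two products into an H x W x 2 image.

left_mask  = np.array([[0.0, 1.0], [1.0, 0.0]])   # toy 2x2 left-hand mask
right_mask = np.array([[1.0, 0.0], [0.0, 1.0]])   # toy 2x2 right-hand mask
p_left, p_right = 0.6, 0.4                        # first and second values

first_product  = left_mask * p_left               # left mask weighted by its value
second_product = right_mask * p_right             # right mask weighted by its value

# Channel combination: stack the two products along a new last axis.
new_image = np.stack([first_product, second_product], axis=-1)
print(new_image.shape)   # → (2, 2, 2)
```

Weighting by the probabilities means a low-confidence hand contributes weakly to the next iteration's input, so the network's attention is biased toward the hand it already recognizes well.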
If implemented as software functional units and sold or used as a stand-alone product, the functions described in the methods of the present application may be stored in a computing-device-readable storage medium. Based on this understanding, the part of the embodiments of the present application that contributes to the prior art, or a part of the technical solution, may be embodied as a software product stored in a storage medium and comprising several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and identical or similar parts among the embodiments may be referred to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A hand segmentation method, comprising:
acquiring an image input by a user;
inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network, the output result comprising a left-hand mask, a right-hand mask, a first value, and a second value, where the first value indicates the left-hand recognition success probability and the second value indicates the right-hand recognition success probability;
determining whether both the first value and the second value are greater than a preset threshold;
sending the left-hand mask and the right-hand mask to the user when both the first value and the second value are greater than the preset threshold; and
when the first value and the second value are not both greater than the preset threshold, repeatedly executing a preset step to iteratively process the output result until both the first value and the second value indicated by the iteratively processed output result are greater than the preset threshold, and sending the left-hand mask and the right-hand mask contained in that output result to the user; the preset step comprising: multiplying the left-hand mask by the first value to obtain a first product; multiplying the right-hand mask by the second value to obtain a second product; combining the first product and the second product by channel to obtain a new image; and inputting the new image into the segmentation network to obtain a new output result.
2. The method of claim 1, wherein the segmentation network comprises:
a downsampling structure for downsampling the image to obtain a downsampled image;
a feature recognition structure for recognizing feature images from the downsampled image, the feature images comprising a left-hand feature image and a right-hand feature image; and
an upsampling structure for upsampling the left-hand feature image to obtain the left-hand mask and the left-hand recognition success probability, and upsampling the right-hand feature image to obtain the right-hand mask and the right-hand recognition success probability.
3. The method of claim 2, wherein the downsampling structure comprises:
a standard convolution layer, a normalization layer, an activation layer, and a downsampling layer.
4. The method of claim 2, wherein the feature recognition structure comprises:
a depthwise convolution layer, a normalization layer, an activation layer, and a three-dimensional point cloud operation layer.
5. The method of claim 2, wherein the upsampling structure comprises:
a standard convolution layer, a normalization layer, an activation layer, and a transposed convolution layer.
6. The method of claim 2, wherein the segmentation network further comprises:
a skip connection structure for assisting the upsampling structure in upsampling the feature images.
7. A hand segmentation apparatus, comprising:
an acquisition unit configured to acquire an image input by a user;
a segmentation unit configured to input the image into a pre-constructed segmentation network to obtain an output result of the segmentation network, the output result comprising a left-hand mask, a right-hand mask, a first value, and a second value, where the first value indicates the left-hand recognition success probability and the second value indicates the right-hand recognition success probability;
a judging unit configured to determine whether both the first value and the second value are greater than a preset threshold;
a sending unit configured to send the left-hand mask and the right-hand mask to the user when both the first value and the second value are greater than the preset threshold; and
an iteration unit configured to, when the first value and the second value are not both greater than the preset threshold, repeatedly execute a preset step to iteratively process the output result until both the first value and the second value indicated by the iteratively processed output result are greater than the preset threshold, and send the left-hand mask and the right-hand mask contained in that output result to the user; the preset step comprising: multiplying the left-hand mask by the first value to obtain a first product; multiplying the right-hand mask by the second value to obtain a second product; combining the first product and the second product by channel to obtain a new image; and inputting the new image into the segmentation network to obtain a new output result.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored program, wherein, when run, the program performs the hand segmentation method according to any one of claims 1-6.
9. A hand segmentation apparatus, comprising: a processor, a memory, and a bus; the processor is connected with the memory through the bus;
the memory is configured to store a program, and the processor is configured to run the program, wherein, when run, the program performs the hand segmentation method according to any one of claims 1-6.
CN202110245345.3A 2021-03-05 2021-03-05 Hand segmentation method, device, storage medium and equipment Active CN113158774B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110245345.3A CN113158774B (en) 2021-03-05 2021-03-05 Hand segmentation method, device, storage medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110245345.3A CN113158774B (en) 2021-03-05 2021-03-05 Hand segmentation method, device, storage medium and equipment

Publications (2)

Publication Number Publication Date
CN113158774A CN113158774A (en) 2021-07-23
CN113158774B true CN113158774B (en) 2023-12-29

Family

ID=76884338

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110245345.3A Active CN113158774B (en) 2021-03-05 2021-03-05 Hand segmentation method, device, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN113158774B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491752A (en) * 2018-01-16 2018-09-04 北京航空航天大学 A kind of hand gestures method of estimation based on hand Segmentation convolutional network
CN109190559A (en) * 2018-08-31 2019-01-11 深圳先进技术研究院 A kind of gesture identification method, gesture identifying device and electronic equipment
CN109977834A (en) * 2019-03-19 2019-07-05 清华大学 The method and apparatus divided manpower from depth image and interact object
CN111448581A (en) * 2017-10-24 2020-07-24 巴黎欧莱雅公司 System and method for image processing using deep neural networks
CN111539288A (en) * 2020-04-16 2020-08-14 中山大学 Real-time detection method for gestures of both hands
WO2020199593A1 (en) * 2019-04-04 2020-10-08 平安科技(深圳)有限公司 Image segmentation model training method and apparatus, image segmentation method and apparatus, and device and medium
WO2020215565A1 (en) * 2019-04-26 2020-10-29 平安科技(深圳)有限公司 Hand image segmentation method and apparatus, and computer device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009131539A1 (en) * 2008-04-22 2009-10-29 Agency For Science, Technology And Research A method and system for detecting and tracking hands in an image
US8837780B2 (en) * 2012-06-22 2014-09-16 Hewlett-Packard Development Company, L.P. Gesture based human interfaces
CN113874883A (en) * 2019-05-21 2021-12-31 奇跃公司 Hand pose estimation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Left/right hand segmentation in egocentric videos; Betancourt, A. et al.; Computer Vision and Image Understanding (No. 154); pp. 73-81 *
Gesture recognition method based on RGB-D images; Tan Taizhe; Han Yawei; Shao Yang; Computer Engineering and Design (No. 02); pp. 511-515 *

Also Published As

Publication number Publication date
CN113158774A (en) 2021-07-23

Similar Documents

Publication Publication Date Title
KR102392094B1 (en) Sequence processing using convolutional neural networks
CN113887701B (en) Method, system and storage medium for generating output for neural network output layer
JP2020149719A (en) Batch normalization layers
CN110263162B (en) Convolutional neural network, text classification method thereof and text classification device
US10929610B2 (en) Sentence-meaning recognition method, sentence-meaning recognition device, sentence-meaning recognition apparatus and storage medium
CN109344242B (en) Dialogue question-answering method, device, equipment and storage medium
CN112183098B (en) Session processing method and device, storage medium and electronic device
CN110245621B (en) Face recognition device, image processing method, feature extraction model, and storage medium
CN113435196B (en) Intention recognition method, device, equipment and storage medium
CN109583586B (en) Convolution kernel processing method and device in voice recognition or image recognition
CN113032528A (en) Case analysis method, case analysis device, case analysis equipment and storage medium
JP2023543964A (en) Image processing method, image processing device, electronic device, storage medium and computer program
CN113361567B (en) Image processing method, device, electronic equipment and storage medium
CN113158774B (en) Hand segmentation method, device, storage medium and equipment
CN111353514A (en) Model training method, image recognition method, device and terminal equipment
CN110413750B (en) Method and device for recalling standard questions according to user questions
EP4116860A2 (en) Method for acquiring information, electronic device and storage medium
CN115457329B (en) Training method of image classification model, image classification method and device
CN114490969B (en) Question and answer method and device based on table and electronic equipment
CN110765245A (en) Emotion positive and negative judgment method, device and equipment based on big data and storage medium
CN114970666B (en) Spoken language processing method and device, electronic equipment and storage medium
CN112257470A (en) Model training method and device, computer equipment and readable storage medium
CN109165097B (en) Data processing method and data processing device
CN113344200A (en) Method for training separable convolutional network, road side equipment and cloud control platform
CN109325234B (en) Sentence processing method, sentence processing device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant