CN113158774B - Hand segmentation method, device, storage medium and equipment - Google Patents

Hand segmentation method, device, storage medium and equipment

Info

Publication number
CN113158774B
Authority
CN
China
Prior art keywords
hand
value
output result
image
mask
Prior art date
Legal status
Active
Application number
CN202110245345.3A
Other languages
Chinese (zh)
Other versions
CN113158774A (en)
Inventor
古迎冬
李骊
Current Assignee
Beijing HJIMI Technology Co Ltd
Original Assignee
Beijing HJIMI Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing HJIMI Technology Co Ltd
Priority to CN202110245345.3A
Publication of CN113158774A
Application granted
Publication of CN113158774B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/107 Static hand or arm
    • G06V 40/113 Recognition of static hand signs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/107 Static hand or arm
    • G06V 40/117 Biometrics derived from hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a hand segmentation method, device, storage medium, and equipment, in which an image input by a user is acquired and input into a segmentation network to obtain an output result of the segmentation network, the output result containing a left-hand mask, a right-hand mask, a first value, and a second value. Whether the first value and the second value are both larger than a preset threshold is then judged. If both values are larger than the preset threshold, the left-hand mask and the right-hand mask are sent to the user; otherwise, a preset step is executed repeatedly to iterate on the output result until the first value and the second value indicated by the iterated output result are both larger than the preset threshold, at which point the left-hand mask and the right-hand mask contained in that output result are sent to the user. Compared with the prior art, the method markedly reduces the computation time and thereby improves the efficiency of hand segmentation. In addition, from the network structure of the segmentation network it can be seen that the network places low demands on hardware resources and can be widely applied by most individuals and teams.

Description

Hand segmentation method, device, storage medium and equipment
Technical Field
The present disclosure relates to the field of image processing, and in particular, to a hand segmentation method, device, storage medium, and apparatus.
Background
How to accurately segment the hands (the left hand and the right hand) in an image is a major concern for teams and enterprises currently researching gesture recognition. At present, hand segmentation is generally realized with a deep learning network. However, to guarantee an accurate segmentation result, existing deep learning networks usually require a long computation time, so hand segmentation is inefficient. They also place high demands on hardware resources, which makes them difficult for most individuals and teams to apply; their application range is therefore too narrow, which hinders the research and development of gesture recognition.
Disclosure of Invention
The application provides a hand segmentation method, device, storage medium, and equipment, which improve the efficiency of hand segmentation while ensuring an accurate hand segmentation result.
In order to achieve the above object, the present application provides the following technical solutions:
a hand segmentation method, comprising:
acquiring an image input by a user;
inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network; the output result comprises a left-hand mask, a right-hand mask, a first numerical value and a second numerical value; the first value indicates a probability of success of the left-hand identification, and the second value indicates a probability of success of the right-hand identification;
judging whether the first value and the second value are both larger than a preset threshold value or not;
transmitting the left-hand mask and the right-hand mask to the user when the first value and the second value are both greater than the preset threshold;
repeatedly executing a preset step in the case that the first value and the second value are not both larger than the preset threshold, so as to iterate on the output result until the first value and the second value indicated by the iterated output result are both larger than the preset threshold, and sending a left-hand mask and a right-hand mask contained in the iterated output result to the user; wherein the preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result.
Optionally, the segmentation network includes:
the downsampling structure is used for downsampling the image to obtain a downsampled image;
the feature recognition structure is used for recognizing and obtaining a feature image from the downsampled image; the characteristic images comprise a left-hand characteristic image and a right-hand characteristic image;
the up-sampling structure is used for up-sampling the left-hand characteristic image to obtain the left-hand mask and the probability that the left hand is successfully recognized, and for up-sampling the right-hand characteristic image to obtain the right-hand mask and the probability that the right hand is successfully recognized.
Optionally, the downsampling structure includes:
standard convolution layer, normalization layer, activation layer, and downsampling layer.
Optionally, the feature recognition structure includes:
a depth convolution layer, a normalization layer, an activation layer, and a three-dimensional point cloud operation layer.
Optionally, the upsampling structure includes:
standard convolution layer, normalization layer, activation layer, and transposed convolution layer.
Optionally, the segmentation network further includes:
a skip link structure, used for assisting the up-sampling structure in up-sampling the characteristic image.
Optionally, generating the new image based on the output result includes:
multiplying the left-hand mask with the first value to obtain a first product;
multiplying the right-hand mask with the second value to obtain a second product;
and carrying out channel combination on the first product and the second product to obtain a new image.
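The optional step above can be sketched as follows. This is a hedged NumPy illustration, not the patent's reference implementation; the function name, the array shapes, and the choice of stacking along the last axis are assumptions:

```python
import numpy as np

def generate_new_image(left_mask, right_mask, first_value, second_value):
    """Build the next iteration's input from the current output result.

    left_mask, right_mask: (H, W) arrays produced by the segmentation network.
    first_value, second_value: scalar recognition-success probabilities.
    """
    first_product = left_mask * first_value      # weight left mask by its confidence
    second_product = right_mask * second_value   # weight right mask by its confidence
    # Channel merge: stack the two products along a new channel axis -> (H, W, 2)
    return np.stack([first_product, second_product], axis=-1)

# Tiny example with 2x2 masks
left = np.array([[1.0, 0.0], [0.0, 1.0]])
right = np.array([[0.0, 1.0], [1.0, 0.0]])
new_image = generate_new_image(left, right, 0.5, 0.8)
print(new_image.shape)  # (2, 2, 2)
```

The channel-merged result has one channel per hand, so the segmentation network receives both confidence-weighted masks in a single input tensor on the next pass.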
A hand segmentation apparatus comprising:
an acquisition unit configured to acquire an image input by a user;
the segmentation unit is used for inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network; the output result comprises a left-hand mask, a right-hand mask, a first numerical value and a second numerical value; the first value indicates a probability of success of the left-hand identification, and the second value indicates a probability of success of the right-hand identification;
the judging unit is used for judging whether the first numerical value and the second numerical value are both larger than a preset threshold value or not;
a sending unit, configured to send the left-hand mask and the right-hand mask to the user when the first value and the second value are both greater than the preset threshold;
the iteration unit is used for repeatedly executing a preset step in the case that the first value and the second value are not both larger than the preset threshold, so as to iterate on the output result until the first value and the second value indicated by the iterated output result are both larger than the preset threshold, and for sending a left-hand mask and a right-hand mask contained in the iterated output result to the user; wherein the preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result.
A computer-readable storage medium comprising a stored program, wherein the program performs the hand segmentation method.
A hand segmentation apparatus comprising: a processor, a memory, and a bus; the processor is connected with the memory through the bus;
the memory is used for storing a program, and the processor is used for running the program, wherein the hand segmentation method is executed when the program runs.
According to the technical scheme, an image input by a user is acquired and input into a pre-constructed segmentation network to obtain the output result of the segmentation network. The output result includes a left-hand mask, a right-hand mask, a first value, and a second value. The first value indicates the probability that the left hand is successfully recognized, and the second value indicates the probability that the right hand is successfully recognized. Whether the first value and the second value are both larger than a preset threshold is then judged. In the case that both values are larger than the preset threshold, the left-hand mask and the right-hand mask are sent to the user. In the case that the two values are not both larger than the preset threshold, a preset step is executed repeatedly to iterate on the output result until the first value and the second value indicated by the iterated output result are both larger than the preset threshold, and the left-hand mask and the right-hand mask contained in that output result are sent to the user. The preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result. By comparing the first value and the second value with the preset threshold, the number of iterations applied to the output result of the segmentation network can be planned; in other words, the effect of hand segmentation is quantified by an index (the quantization index is the preset threshold, which governs how many iterations the output result undergoes), so redundant computation is avoided.
Compared with the prior art, the method markedly reduces the computation time, thereby improving the efficiency of hand segmentation. In addition, from the network structure of the segmentation network it can be seen that the network places low demands on hardware resources, can be widely applied by most individuals and teams, and thus has high applicability.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1a is a schematic diagram of a hand segmentation method according to an embodiment of the present application;
fig. 1b is a schematic diagram of a network structure of a segmentation network according to an embodiment of the present application;
fig. 1c is a schematic diagram of a network structure of another segmentation network according to an embodiment of the present application;
FIG. 2 is a schematic diagram of another hand segmentation method according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a hand segmentation device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
As shown in fig. 1a, a schematic diagram of a hand segmentation method according to an embodiment of the present application includes the following steps:
s101: an image input by a user is acquired.
The image includes, but is not limited to, a color image, an infrared image, a depth image, and the like.
S102: inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network.
The output result of the segmentation network comprises a first segmentation result, a second segmentation result, a first numerical value and a second numerical value.
The first segmentation result indicates a left-hand mask (mask), the second segmentation result indicates a right-hand mask, the first value indicates a probability of success of the left-hand recognition, and the second value indicates a probability of success of the right-hand recognition.
In an embodiment of the present application, the segmentation network includes a downsampling structure, a feature recognition structure, an upsampling structure, and a skip link structure.
Specifically, according to the network structure shown in fig. 1b, the process of dividing the network processing image includes:
1. the image is input into a downsampling structure to obtain a first result.
It should be noted that the function of the downsampling structure is to downsample the image to obtain a downsampled image (i.e., the first result). The downsampling structure includes a standard convolution layer (commonly known as standard Conv), a normalization layer (commonly known as a BN layer), an activation layer (commonly known as swish), and a downsampling layer (commonly known as pooling). In the embodiment of the present application, the number of standard convolution layers and the size of the convolution kernel may be set by a technician according to the actual situation.
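Of the layers just listed, the swish activation and the pooling downsample can be sketched minimally in NumPy. This is an assumption-laden illustration (real layers operate on multi-channel tensors and the convolution layers are learned), not the network's actual implementation:

```python
import numpy as np

def swish(x):
    """Swish activation: x * sigmoid(x), i.e. x / (1 + exp(-x))."""
    return x / (1.0 + np.exp(-x))

def max_pool_2x2(x):
    """2x2 max-pooling downsample of an (H, W) feature map (H, W even)."""
    h, w = x.shape
    # Split each axis into (blocks, 2) and take the max within each 2x2 block
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool_2x2(swish(x))
print(pooled.shape)  # (2, 2) -- spatial size halved in each dimension
```

Each pass through such a block halves the spatial resolution while the convolution layers (omitted here) extract features.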
2. And inputting the first result into the feature recognition structure to obtain a feature image.
It should be noted that the function of the feature recognition structure is to recognize the feature images from the downsampled image. The feature images include a left-hand feature image and a right-hand feature image, and the feature recognition structure includes a depth convolution layer (commonly referred to as DepthConv), a normalization layer, an activation layer, and a three-dimensional point cloud operation layer (commonly referred to as PointConv).
3. The left-hand feature image is input into the up-sampling structure through the skip link structure to obtain the left-hand mask and the probability that the left hand is successfully recognized.
4. The right-hand feature image is input into the up-sampling structure through the skip link structure to obtain the right-hand mask and the probability that the right hand is successfully recognized.
It should be noted that the function of the skip link structure is to assist the up-sampling structure in up-sampling the feature images, which also increases the training speed of the segmentation network. The skip link structure includes a channel merge layer (commonly known as concat), a standard convolution layer, and a 1×1 convolution layer (commonly known as 1×1 Conv). In the embodiment of the present application, the numbers of channel merge layers, standard convolution layers, and 1×1 convolution layers may be set by a technician according to the actual situation.
The function of the up-sampling structure is to up-sample the feature images (specifically, to up-sample the left-hand feature image to obtain the left-hand mask and the probability that the left hand is successfully recognized, and to up-sample the right-hand feature image to obtain the right-hand mask and the probability that the right hand is successfully recognized). The up-sampling structure includes a standard convolution layer, a normalization layer, an activation layer, and a transposed convolution layer (commonly referred to as TransConv).
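The transposed convolution layer that performs the up-sampling can be illustrated with a hedged 1-D NumPy sketch; the real TransConv layer is 2-D with learned kernels, so the kernel, stride, and signal here are assumed values chosen only to show how the operation enlarges the spatial size:

```python
import numpy as np

def transposed_conv1d(x, kernel, stride=2):
    """Stride-s transposed convolution of a 1-D signal.

    Equivalent to inserting (stride - 1) zeros between input samples,
    then running a full correlation with the kernel.
    """
    n, k = len(x), len(kernel)
    out = np.zeros(stride * (n - 1) + k)
    for i, v in enumerate(x):
        out[i * stride:i * stride + k] += v * kernel  # scatter-add each input sample
    return out

x = np.array([1.0, 2.0, 3.0])
y = transposed_conv1d(x, np.array([1.0, 1.0]), stride=2)
print(y)  # [1. 1. 2. 2. 3. 3.] -- length 6, roughly double the input
```

With stride 2 the output length is `2*(n-1)+k`, which is why stacking such layers restores the resolution that the downsampling structure removed.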
It should be emphasized that the segmentation network composed of the downsampling structure, the feature recognition structure, the upsampling structure, and the skip link structure can also be seen in fig. 1c.
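Likewise, the depth convolution layer (DepthConv) used in the feature recognition structure can be illustrated with a minimal NumPy sketch. This is an assumed, simplified version (valid padding, stride 1, one kernel per channel), not the layer as trained in the segmentation network:

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Depthwise convolution: each input channel is convolved with its own
    kernel, with no mixing across channels (valid padding, stride 1).

    x:       (C, H, W) feature map
    kernels: (C, kH, kW), one kernel per channel
    """
    c, h, w = x.shape
    _, kh, kw = kernels.shape
    out = np.zeros((c, h - kh + 1, w - kw + 1))
    for ch in range(c):  # channels stay independent -- far cheaper than standard conv
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[ch, i, j] = np.sum(x[ch, i:i + kh, j:j + kw] * kernels[ch])
    return out

x = np.ones((2, 4, 4))
k = np.ones((2, 3, 3))
y = depthwise_conv2d(x, k)
print(y.shape)  # (2, 2, 2)
```

Because each channel gets its own kernel instead of a kernel spanning all channels, a depth convolution needs far fewer multiplications than a standard convolution, which is consistent with the network's low hardware-resource requirements.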
S103: and judging whether the first value and the second value are both larger than a preset threshold value.
If the first value and the second value are both greater than the preset threshold, S104 is executed, otherwise S105 is executed.
S104: the left-hand mask, and the right-hand mask are sent to the user.
It should be noted that, if the first value and the second value are both greater than the preset threshold, it is determined that the effect of hand segmentation meets the preset requirement, that is, the accuracy of the hand segmentation result can be ensured.
S105: the left-hand mask is multiplied by the first value to obtain a first product.
S106: and multiplying the right-hand mask with the second value to obtain a second product.
Wherein S105 and S106 are performed concurrently.
The specific implementation principle of multiplying the left-hand mask and the right-hand mask by their respective values is common knowledge to a person skilled in the art, and will not be described here again.
S107: and carrying out channel combination on the first product and the second product to obtain a new image, and returning to S102.
The specific implementation principle of channel combination is common knowledge familiar to those skilled in the art, and will not be described herein.
The new image is processed by returning to S102, and the new output result obtained in this way achieves a better hand segmentation effect than the original output result.
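Putting S102 through S107 together, the control flow can be sketched as below. The `segmentation_network` stub is purely hypothetical (it simply raises its confidences on every call to mimic the refinement described above), and the threshold and iteration cap are assumed values; the real network is the trained model of fig. 1b:

```python
import numpy as np

def segmentation_network(image):
    """Hypothetical stand-in returning (left_mask, right_mask, p_left, p_right).

    The confidences grow with each call to mimic iterative refinement;
    a real network would compute them from the image.
    """
    segmentation_network.calls += 1
    h, w = image.shape[:2]
    p = min(1.0, 0.4 + 0.2 * segmentation_network.calls)
    return np.ones((h, w)), np.ones((h, w)), p, p

segmentation_network.calls = 0

def segment_hands(image, threshold=0.9, max_iters=10):
    """S102-S107: iterate until both confidences exceed the preset threshold."""
    left, right, p_l, p_r = segmentation_network(image)           # S102
    for _ in range(max_iters):
        if p_l > threshold and p_r > threshold:                   # S103
            return left, right                                    # S104
        new_image = np.stack([left * p_l, right * p_r], axis=-1)  # S105-S107
        left, right, p_l, p_r = segmentation_network(new_image)   # back to S102
    return left, right

left_mask, right_mask = segment_hands(np.zeros((4, 4)))
print(segmentation_network.calls)  # 3
```

The threshold check bounds the number of network invocations, which is the mechanism by which the method avoids redundant computation.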
In summary, by comparing the first value and the second value with the preset threshold, the number of iterations applied to the output result of the segmentation network can be planned; in other words, the effect of hand segmentation is quantified by an index (the quantization index is the preset threshold, which governs how many iterations the output result undergoes), so redundant computation is avoided. Compared with the prior art, the method of this embodiment markedly reduces the computation time, thereby improving the efficiency of hand segmentation. In addition, from the network structure of the segmentation network it can be seen that the network places low demands on hardware resources, can be widely applied by most individuals and teams, and thus has high applicability.
It should be noted that S105 and S106 mentioned in the foregoing embodiment are an optional specific implementation of the hand segmentation method described in the present application, as is S107. Accordingly, the flow shown in the above embodiment can be summarized as the method shown in fig. 2.
As shown in fig. 2, a schematic diagram of another hand segmentation method according to an embodiment of the present application includes the following steps:
s201: an image input by a user is acquired.
S202: inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network.
The output result comprises a left-hand mask, a right-hand mask, a first value and a second value. The first value indicates the probability of success of the left hand identification and the second value indicates the probability of success of the right hand identification.
S203: and judging whether the first value and the second value are both larger than a preset threshold value.
If the first value and the second value are both greater than the preset threshold, S204 is executed, otherwise S205 is executed.
S204: the left-hand mask, and the right-hand mask are sent to the user.
S205: and repeatedly executing the preset step, carrying out iterative processing on the output result until the first numerical value and the second numerical value indicated by the output result after the iterative processing are both larger than a preset threshold value, and sending a left-hand mask and a right-hand mask contained in the output result after the iterative processing to a user.
The preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result.
In summary, by comparing the first value and the second value with the preset threshold, the number of iterations applied to the output result of the segmentation network can be planned; in other words, the effect of hand segmentation is quantified by an index (the quantization index is the preset threshold, which governs how many iterations the output result undergoes), so redundant computation is avoided. Compared with the prior art, the method of this embodiment markedly reduces the computation time, thereby improving the efficiency of hand segmentation. In addition, from the network structure of the segmentation network it can be seen that the network places low demands on hardware resources, can be widely applied by most individuals and teams, and thus has high applicability.
Corresponding to the hand segmentation method described in the embodiment of the present application, the embodiment of the present application further provides a hand segmentation device.
Fig. 3 is a schematic structural diagram of a hand segmentation device according to an embodiment of the present application, including:
an acquisition unit 100 for acquiring an image input by a user.
The segmentation unit 200 is configured to input the image into a pre-constructed segmentation network, and obtain an output result of the segmentation network. The output results include a left-hand mask, a right-hand mask, a first value, and a second value. The first value indicates the probability of success of the left hand identification and the second value indicates the probability of success of the right hand identification.
Wherein the segmentation network comprises: a downsampling structure, used for downsampling the image to obtain a downsampled image; a feature recognition structure, used for recognizing a feature image from the downsampled image, the feature image including a left-hand feature image and a right-hand feature image; an up-sampling structure, used for up-sampling the left-hand feature image to obtain a left-hand mask and the probability that the left hand is successfully recognized, and for up-sampling the right-hand feature image to obtain a right-hand mask and the probability that the right hand is successfully recognized; and a skip link structure, used for assisting the up-sampling structure in up-sampling the feature image.
The downsampling structure includes a standard convolution layer, a normalization layer, an activation layer, and a downsampling layer.
The feature recognition structure comprises a depth convolution layer, a normalization layer, an activation layer and a three-dimensional point cloud operation layer.
The upsampling structure includes a standard convolution layer, a normalization layer, an activation layer, and a transposed convolution layer.
The judging unit 300 is configured to judge whether the first value and the second value are both greater than a preset threshold.
And a transmitting unit 400, configured to transmit the left-hand mask and the right-hand mask to the user when the first value and the second value are both greater than the preset threshold.
The iteration unit 500 is configured to repeatedly execute the preset step in the case that the first value and the second value are not both greater than the preset threshold, so as to iterate on the output result until the first value and the second value indicated by the iterated output result are both greater than the preset threshold, and to send a left-hand mask and a right-hand mask contained in the iterated output result to the user. The preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result.
The process by which the iteration unit 500 generates a new image based on the output result includes: multiplying the left-hand mask by the first value to obtain a first product; multiplying the right-hand mask by the second value to obtain a second product; and carrying out channel combination on the first product and the second product to obtain a new image.
In summary, by comparing the first value and the second value with the preset threshold, the number of iterations applied to the output result of the segmentation network can be planned; in other words, the effect of hand segmentation is quantified by an index (the quantization index is the preset threshold, which governs how many iterations the output result undergoes), so redundant computation is avoided. Compared with the prior art, the device of this embodiment markedly reduces the computation time, thereby improving the efficiency of hand segmentation. In addition, from the network structure of the segmentation network it can be seen that the network places low demands on hardware resources, can be widely applied by most individuals and teams, and thus has high applicability.
The present application also provides a computer-readable storage medium including a stored program, wherein the program executes the hand segmentation method provided by the present application.
The application also provides a hand segmentation apparatus comprising: a processor, a memory, and a bus. The processor is connected with the memory through a bus, the memory is used for storing a program, and the processor is used for running the program, wherein the hand segmentation method provided by the application is executed when the program runs, and the method comprises the following steps of:
acquiring an image input by a user;
inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network; the output result comprises a left-hand mask, a right-hand mask, a first numerical value and a second numerical value; the first value indicates a probability of success of the left-hand identification, and the second value indicates a probability of success of the right-hand identification;
judging whether the first value and the second value are both larger than a preset threshold value or not;
transmitting the left-hand mask and the right-hand mask to the user when the first value and the second value are both greater than the preset threshold;
repeatedly executing a preset step in the case that the first value and the second value are not both larger than the preset threshold, so as to iterate on the output result until the first value and the second value indicated by the iterated output result are both larger than the preset threshold, and sending a left-hand mask and a right-hand mask contained in the iterated output result to the user; wherein the preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result.
Optionally, the segmentation network includes:
the downsampling structure is used for downsampling the image to obtain a downsampled image;
the feature recognition structure is used for recognizing and obtaining a feature image from the downsampled image; the characteristic images comprise a left-hand characteristic image and a right-hand characteristic image;
the up-sampling structure is used for up-sampling the left hand characteristic image to obtain the left hand mask and the probability of success of the left hand identification; and upsampling the right hand characteristic image to obtain the right hand mask and the right hand recognition success probability.
Optionally, the downsampling structure comprises a standard convolution layer, a normalization layer, an activation layer, and a downsampling layer.
Optionally, the feature recognition structure comprises a depthwise convolution layer, a normalization layer, an activation layer, and a three-dimensional point cloud operation layer.
Optionally, the upsampling structure comprises a standard convolution layer, a normalization layer, an activation layer, and a transposed convolution layer.
Optionally, the segmentation network further comprises a skip connection structure for assisting the upsampling structure in upsampling the feature images.
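A minimal, purely illustrative sketch of this optional architecture follows. The toy "layers" below (stride slicing, nearest-neighbour repeat, channel concatenation, a sigmoid over the mean) merely stand in for the convolution, normalization, activation, and transposed-convolution layers named above; none of them is the application's actual implementation.

```python
import numpy as np

# Toy composition: downsampling structure -> feature recognition structure ->
# one upsampling head per hand, each fused with a skip connection from the
# full-resolution input. All operations are illustrative stand-ins.

def downsample(x):
    """Halve spatial resolution (stand-in for conv + norm + act + pooling)."""
    return x[::2, ::2, :]

def recognize_features(x):
    """Split channels into a left-hand and a right-hand feature image."""
    half = x.shape[-1] // 2
    return x[..., :half], x[..., half:]

def upsample_head(feature, skip):
    """Restore resolution, fuse the skip connection, emit a mask and a probability."""
    up = feature.repeat(2, axis=0).repeat(2, axis=1)    # transposed-conv stand-in
    fused = np.concatenate([up, skip], axis=-1)         # skip connection
    mask = fused.mean(axis=-1)                          # per-pixel score map
    prob = float(1.0 / (1.0 + np.exp(-mask.mean())))    # recognition success probability
    return mask, prob

image = np.random.rand(8, 8, 4)                 # toy H x W x C input
down = downsample(image)                        # 4 x 4 x 4
left_feat, right_feat = recognize_features(down)
left_mask, p_left = upsample_head(left_feat, image)
right_mask, p_right = upsample_head(right_feat, image)
print(left_mask.shape, right_mask.shape)        # → (8, 8) (8, 8)
```

The sketch keeps the claimed data flow (each head outputs a mask plus a success probability) while replacing every learned layer with a shape-preserving placeholder.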
Optionally, generating the new image based on the output result comprises:
multiplying the left-hand mask by the first value to obtain a first product;
multiplying the right-hand mask by the second value to obtain a second product; and
combining the first product and the second product by channel to obtain the new image.
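The new-image generation just described can be sketched directly: weight each mask by its recognition success probability, then merge the two products along the channel axis. Shapes and values below are illustrative only.

```python
import numpy as np

# Sketch of the "generate a new image" step: mask x probability per hand,
# then channel combination of the two products into an H x W x 2 image.

left_mask  = np.array([[0.0, 1.0], [1.0, 0.0]])   # toy 2x2 left-hand mask
right_mask = np.array([[1.0, 0.0], [0.0, 1.0]])   # toy 2x2 right-hand mask
p_left, p_right = 0.6, 0.4                        # first and second values

first_product  = left_mask * p_left               # left mask weighted by its value
second_product = right_mask * p_right             # right mask weighted by its value

# Channel combination: stack the two products along a new last axis.
new_image = np.stack([first_product, second_product], axis=-1)
print(new_image.shape)   # → (2, 2, 2)
```

Weighting by the probabilities means a low-confidence hand contributes weakly to the next iteration's input, so the network's attention is biased toward the hand it already recognizes well.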
If implemented as software functional units and sold or used as a stand-alone product, the functions described in the methods of the present application may be stored in a computing-device-readable storage medium. Based on this understanding, the part of the embodiments of the present application that contributes to the prior art, or a part of the technical solution, may be embodied as a software product stored in a storage medium and comprising several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and identical or similar parts among the embodiments may be referred to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A hand segmentation method, comprising:
acquiring an image input by a user;
inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network, the output result comprising a left-hand mask, a right-hand mask, a first value, and a second value, where the first value indicates the left-hand recognition success probability and the second value indicates the right-hand recognition success probability;
determining whether both the first value and the second value are greater than a preset threshold;
sending the left-hand mask and the right-hand mask to the user when both the first value and the second value are greater than the preset threshold; and
when the first value and the second value are not both greater than the preset threshold, repeatedly executing a preset step to iteratively process the output result until both the first value and the second value indicated by the iteratively processed output result are greater than the preset threshold, and sending the left-hand mask and the right-hand mask contained in that output result to the user; the preset step comprising: multiplying the left-hand mask by the first value to obtain a first product; multiplying the right-hand mask by the second value to obtain a second product; combining the first product and the second product by channel to obtain a new image; and inputting the new image into the segmentation network to obtain a new output result.
2. The method of claim 1, wherein the segmentation network comprises:
a downsampling structure for downsampling the image to obtain a downsampled image;
a feature recognition structure for recognizing feature images from the downsampled image, the feature images comprising a left-hand feature image and a right-hand feature image; and
an upsampling structure for upsampling the left-hand feature image to obtain the left-hand mask and the left-hand recognition success probability, and upsampling the right-hand feature image to obtain the right-hand mask and the right-hand recognition success probability.
3. The method of claim 2, wherein the downsampling structure comprises:
a standard convolution layer, a normalization layer, an activation layer, and a downsampling layer.
4. The method of claim 2, wherein the feature recognition structure comprises:
a depthwise convolution layer, a normalization layer, an activation layer, and a three-dimensional point cloud operation layer.
5. The method of claim 2, wherein the upsampling structure comprises:
a standard convolution layer, a normalization layer, an activation layer, and a transposed convolution layer.
6. The method of claim 2, wherein the segmentation network further comprises:
a skip connection structure for assisting the upsampling structure in upsampling the feature images.
7. A hand segmentation apparatus, comprising:
an acquisition unit configured to acquire an image input by a user;
a segmentation unit configured to input the image into a pre-constructed segmentation network to obtain an output result of the segmentation network, the output result comprising a left-hand mask, a right-hand mask, a first value, and a second value, where the first value indicates the left-hand recognition success probability and the second value indicates the right-hand recognition success probability;
a judging unit configured to determine whether both the first value and the second value are greater than a preset threshold;
a sending unit configured to send the left-hand mask and the right-hand mask to the user when both the first value and the second value are greater than the preset threshold; and
an iteration unit configured to, when the first value and the second value are not both greater than the preset threshold, repeatedly execute a preset step to iteratively process the output result until both the first value and the second value indicated by the iteratively processed output result are greater than the preset threshold, and send the left-hand mask and the right-hand mask contained in that output result to the user; the preset step comprising: multiplying the left-hand mask by the first value to obtain a first product; multiplying the right-hand mask by the second value to obtain a second product; combining the first product and the second product by channel to obtain a new image; and inputting the new image into the segmentation network to obtain a new output result.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored program, wherein, when run, the program performs the hand segmentation method according to any one of claims 1-6.
9. A hand segmentation apparatus, comprising: a processor, a memory, and a bus; the processor is connected with the memory through the bus;
the memory is configured to store a program, and the processor is configured to run the program, wherein, when run, the program performs the hand segmentation method according to any one of claims 1-6.
CN202110245345.3A 2021-03-05 2021-03-05 Hand segmentation method, device, storage medium and equipment Active CN113158774B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110245345.3A CN113158774B (en) 2021-03-05 2021-03-05 Hand segmentation method, device, storage medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110245345.3A CN113158774B (en) 2021-03-05 2021-03-05 Hand segmentation method, device, storage medium and equipment

Publications (2)

Publication Number Publication Date
CN113158774A CN113158774A (en) 2021-07-23
CN113158774B true CN113158774B (en) 2023-12-29

Family

ID=76884338

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110245345.3A Active CN113158774B (en) 2021-03-05 2021-03-05 Hand segmentation method, device, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN113158774B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491752A (en) * 2018-01-16 2018-09-04 北京航空航天大学 A kind of hand gestures method of estimation based on hand Segmentation convolutional network
CN109190559A (en) * 2018-08-31 2019-01-11 深圳先进技术研究院 A kind of gesture identification method, gesture identifying device and electronic equipment
CN109977834A (en) * 2019-03-19 2019-07-05 清华大学 The method and apparatus divided manpower from depth image and interact object
CN111448581A (en) * 2017-10-24 2020-07-24 巴黎欧莱雅公司 System and method for image processing using deep neural networks
CN111539288A (en) * 2020-04-16 2020-08-14 中山大学 Real-time detection method for gestures of both hands
WO2020199593A1 (en) * 2019-04-04 2020-10-08 平安科技(深圳)有限公司 Image segmentation model training method and apparatus, image segmentation method and apparatus, and device and medium
WO2020215565A1 (en) * 2019-04-26 2020-10-29 平安科技(深圳)有限公司 Hand image segmentation method and apparatus, and computer device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009131539A1 (en) * 2008-04-22 2009-10-29 Agency For Science, Technology And Research A method and system for detecting and tracking hands in an image
US8837780B2 (en) * 2012-06-22 2014-09-16 Hewlett-Packard Development Company, L.P. Gesture based human interfaces
CN113874883A (en) * 2019-05-21 2021-12-31 奇跃公司 Hand pose estimation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Left/right hand segmentation in egocentric videos; Betancourt, A. et al.; Computer Vision and Image Understanding (No. 154); pp. 73-81 *
Gesture recognition method based on RGB-D images; Tan Taizhe; Han Yawei; Shao Yang; Computer Engineering and Design (No. 02); pp. 511-515 *

Also Published As

Publication number Publication date
CN113158774A (en) 2021-07-23

Similar Documents

Publication Publication Date Title
KR102392094B1 (en) Sequence processing using convolutional neural networks
CN113887701B (en) Method, system and storage medium for generating output for neural network output layer
JP2020149719A (en) Batch normalization layers
CN110263162B (en) Convolutional neural network, text classification method thereof and text classification device
US10929610B2 (en) Sentence-meaning recognition method, sentence-meaning recognition device, sentence-meaning recognition apparatus and storage medium
CN109344242B (en) Dialogue question-answering method, device, equipment and storage medium
CN112183098B (en) Session processing method and device, storage medium and electronic device
CN110245621B (en) Face recognition device, image processing method, feature extraction model, and storage medium
CN113435196B (en) Intention recognition method, device, equipment and storage medium
CN109583586B (en) Convolution kernel processing method and device in voice recognition or image recognition
CN113032528A (en) Case analysis method, case analysis device, case analysis equipment and storage medium
JP2023543964A (en) Image processing method, image processing device, electronic device, storage medium and computer program
CN113361567B (en) Image processing method, device, electronic equipment and storage medium
CN113158774B (en) Hand segmentation method, device, storage medium and equipment
CN111353514A (en) Model training method, image recognition method, device and terminal equipment
CN110413750B (en) Method and device for recalling standard questions according to user questions
EP4116860A2 (en) Method for acquiring information, electronic device and storage medium
CN115457329B (en) Training method of image classification model, image classification method and device
CN114490969B (en) Question and answer method and device based on table and electronic equipment
CN110765245A (en) Emotion positive and negative judgment method, device and equipment based on big data and storage medium
CN114970666B (en) Spoken language processing method and device, electronic equipment and storage medium
CN112257470A (en) Model training method and device, computer equipment and readable storage medium
CN109165097B (en) Data processing method and data processing device
CN113344200A (en) Method for training separable convolutional network, road side equipment and cloud control platform
CN109325234B (en) Sentence processing method, sentence processing device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant