CN112950652A - Robot and hand image segmentation method and device thereof - Google Patents

Robot and hand image segmentation method and device thereof

Info

Publication number
CN112950652A
CN112950652A (application CN202110182033.2A)
Authority
CN
China
Prior art keywords
image
segmented
neural network
network model
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110182033.2A
Other languages
Chinese (zh)
Other versions
CN112950652B (en)
Inventor
顾在旺
程骏
庞建新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ubtech Robotics Corp
Original Assignee
Ubtech Robotics Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ubtech Robotics Corp filed Critical Ubtech Robotics Corp
Priority to CN202110182033.2A priority Critical patent/CN112950652B/en
Publication of CN112950652A publication Critical patent/CN112950652A/en
Application granted granted Critical
Publication of CN112950652B publication Critical patent/CN112950652B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Abstract

The application belongs to the field of robots and provides a robot and a hand image segmentation method and device thereof. The method includes: acquiring an image to be segmented; performing feature extraction on the image to be segmented according to the trained first neural network model to obtain a first feature image corresponding to the image to be segmented; acquiring a second feature image corresponding to the image to be segmented according to a preset dilated (hole) convolution kernel; and decoding the first feature image and the second feature image according to the trained decoding neural network model to determine a hand image of the image to be segmented. Through the dilated convolution kernel, the method and device can learn features with a larger receptive field in the image to be segmented and provide richer feature data for the decoding neural network model, so that a more accurate segmentation result is obtained with essentially the same computational cost as multi-scale convolutional segmentation, which helps improve the segmentation accuracy of the hand image.

Description

Robot and hand image segmentation method and device thereof
Technical Field
The application belongs to the field of robots, and particularly relates to a robot and a hand image segmentation method and device thereof.
Background
In recent years, with the rapid development of artificial intelligence, many artificial intelligence applications have been deployed on robots. A robot can interact with a person through artificial intelligence algorithms. In human-robot interaction, gestures are a very simple and convenient interaction mode. In order to perform gesture interaction effectively, the robot needs to segment the hand region of the interacting person accurately, so that the gesture can be recognized accurately.
Current hand region recognition algorithms typically employ a fully convolutional neural network model. In the encoding part, successive convolutions are used to extract features from the image, while pooling operations reduce the size of the extracted features to reduce the computational load of the network. Such successive convolution and pooling operations ensure that the extracted features are stable and insensitive to deformation, but they also shrink the extracted features, which makes per-pixel classification difficult. Although the feature size can be restored through deconvolution, part of the information is lost when a smaller feature map is restored; since gestures take many forms and the environment changes easily, this is not conducive to improving the segmentation accuracy of the hand image.
Disclosure of Invention
In view of this, embodiments of the present application provide a robot and a hand image segmentation method and device thereof, to address the difficulty in the prior art of improving the segmentation accuracy of the hand image.
A first aspect of an embodiment of the present application provides a hand image segmentation method, including:
acquiring an image to be segmented;
performing feature extraction on the image to be segmented according to the trained first neural network model to obtain a first feature image corresponding to the image to be segmented;
acquiring a second feature image corresponding to the image to be segmented according to a preset dilated convolution kernel;
and decoding the first feature image and the second feature image according to the trained decoding neural network model to determine a hand image of the image to be segmented.
With reference to the first aspect, in a first possible implementation manner of the first aspect, before performing feature extraction on the image to be segmented according to a trained first neural network model to obtain a first feature image corresponding to the image to be segmented, the method further includes:
and carrying out normalization processing on the image to be segmented.
With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, the normalizing the image to be segmented includes:
determining pixel values of pixels in the image to be segmented;
determining a normalized value corresponding to each pixel according to the formula (Ii - Imin) / Imax, wherein Ii is the pixel value of any pixel, Imin is the minimum pixel value among the pixels in the image to be segmented, and Imax is the maximum pixel value among the pixels in the image to be segmented.
With reference to the first aspect, in a third possible implementation manner of the first aspect, the decoding the first feature image and the second feature image according to a trained decoding neural network model, and determining a hand image of the image to be segmented includes:
decoding the first feature image and the second feature image according to the trained decoding neural network model to obtain a portrait segmentation result and an edge detection result in the image to be segmented;
and determining a hand image of the image to be segmented according to the portrait segmentation result and the edge detection result.
With reference to the third possible implementation manner of the first aspect, in a fourth possible implementation manner of the first aspect, the decoding the first feature image and the second feature image according to the trained decoding neural network model to obtain a portrait segmentation result and an edge detection result in the image to be segmented includes:
decoding the first feature image and the second feature image according to a trained second neural network model to obtain a portrait segmentation result in the image to be segmented;
and decoding the first feature image and the second feature image according to a trained third neural network model to obtain an edge detection result in the image to be segmented.
With reference to the first aspect, in a fifth possible implementation manner of the first aspect, the obtaining a second feature image corresponding to the image to be segmented according to a preset dilated convolution kernel includes one or both of the following manners:
performing dilated convolution on the image to be segmented according to preset dilated convolution kernels with different dilation rates to generate a second feature image corresponding to the image to be segmented;
and performing dilated convolution on the first feature image according to preset dilated convolution kernels with different dilation rates to generate a second feature image corresponding to the image to be segmented.
With reference to the first aspect, in a sixth possible implementation manner of the first aspect, before performing feature extraction on the image to be segmented according to a trained first neural network model to obtain a first feature image corresponding to the image to be segmented, the method further includes:
and cropping the image to be segmented to a square of a predetermined size.
A second aspect of an embodiment of the present application provides a hand image segmentation apparatus, including:
the image acquisition unit is used for acquiring an image to be segmented;
the first feature image acquisition unit is used for performing feature extraction on the image to be segmented according to the trained first neural network model to obtain a first feature image corresponding to the image to be segmented;
the second feature image acquisition unit is used for acquiring a second feature image corresponding to the image to be segmented according to a preset dilated convolution kernel;
and the decoding unit is used for decoding the first feature image and the second feature image according to the trained decoding neural network model and determining the hand image of the image to be segmented.
A third aspect of embodiments of the present application provides a robot comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method according to any one of the first aspect when executing the computer program.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium, in which a computer program is stored, which, when executed by a processor, performs the steps of the method according to any one of the first aspect.
Compared with the prior art, the embodiments of the present application have the following advantages: a first feature image corresponding to an image to be segmented is determined with a trained first neural network model, a second feature image corresponding to the image to be segmented is obtained through dilated convolution, and the first feature image and the second feature image are decoded with the trained decoding neural network model to obtain the segmentation result of the hand image. Through the dilated convolution kernel, the method and device can learn features with a larger receptive field in the image to be segmented and provide richer feature data for the decoding neural network model, so that a more accurate segmentation result is obtained with essentially the same computational cost as multi-scale convolutional segmentation, which helps improve the segmentation accuracy of the hand image.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart illustrating an implementation process of a hand image segmentation method according to an embodiment of the present application;
fig. 2 is a schematic diagram of a 3 × 3 void convolution kernel according to an embodiment of the present application;
FIG. 3 is a block diagram of a hand image segmentation process provided in an embodiment of the present application;
fig. 4 is a schematic diagram of a hand image segmentation apparatus according to an embodiment of the present application;
fig. 5 is a schematic view of a robot provided in an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.
Current hand image segmentation algorithms include segmentation algorithms based on the fully convolutional network (FCN). During segmentation, the features of the image to be segmented are first extracted by successive convolutional layers, with pooling layers inserted between them so that the pooling operations reduce the size of the feature images and thereby the computational load of the network. Through successive convolution and pooling operations, high-dimensional feature images (which can be understood as feature images obtained by convolution operations of different sizes and kinds) are extracted from the image to be segmented. The extracted high-dimensional feature images are then restored by deconvolution operations to obtain a segmentation result of the same size as the original input image. The difference between the network output and the true label is measured by a loss function, for example a cross-entropy loss function. The parameters of the network are optimized by an optimization algorithm, such as stochastic gradient descent, to reduce the loss between the algorithm output and the labels. A better image segmentation model is obtained through multiple iterations.
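For illustration only, the following minimal PyTorch sketch shows the kind of fully convolutional encoder-decoder described above (the layer sizes, channel counts and two-class output are assumptions, not details of this application): successive convolution and pooling extract and shrink the features, and deconvolution restores the segmentation result to the input size.

```python
import torch
import torch.nn as nn

class SimpleFCN(nn.Module):
    """Illustrative FCN baseline: a conv + pool encoder and a deconv decoder."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                                   # halves the feature size
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                                   # quarters the feature size
        )
        self.decoder = nn.Sequential(                          # deconvolution restores the size
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, num_classes, 2, stride=2),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

logits = SimpleFCN()(torch.randn(1, 3, 480, 480))
print(logits.shape)  # torch.Size([1, 2, 480, 480]) -- same size as the input
```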
In the above image segmentation process, extracting features by successive convolutions and reducing their size by pooling operations lowers the computational load of the network. Moreover, regardless of where the target object is located, the successive convolution and pooling operations can still extract features from the image to be segmented. However, these operations reduce the size of the extracted feature images. Although the feature images can be restored by deconvolution operations, part of the information is lost when a smaller feature image is restored, which makes it difficult to obtain an accurate hand contour.
During gesture interaction between the robot and the target object, the environment varies widely and the gestures take many forms. When the difference between the hand and the background is small, current segmentation algorithms cannot segment the hand contour well, which hinders accurate gesture interaction.
In addition, although the reduction in feature-image size can be limited by reducing the number of convolution and pooling operations, doing so may leave the network insufficiently trained and unable to fully extract the features of the image to be segmented.
To address these shortcomings, the embodiments of the present application provide a hand image segmentation method based on dilated convolution (also called hole or atrous convolution). Dilated convolution is added to extract a second feature image, and the second feature image together with the first feature image obtained by encoding is input into the decoding neural network model, so that the more comprehensive feature input improves the segmentation accuracy of the hand image.
Fig. 1 is a schematic view of an implementation flow of a hand image segmentation method provided in an embodiment of the present application, which is detailed as follows:
in S101, an image to be segmented is acquired.
The image to be segmented in the embodiment of the application can be a scene image acquired by the robot in real time in an intelligent interaction process. The acquisition of the image to be segmented may be determined according to the state of the robot. For example, when the robot is in the intelligent interaction mode, the robot acquires images in an interaction scene in real time through the camera to obtain images to be segmented.
When the robot is in the intelligent interaction state, it collects a sequence of scene images. The acquisition frequency of the images to be segmented can be determined according to how much the gesture of the interacting person changes. In addition, in order to segment and recognize fine gestures effectively, a correspondence between the movement speed of different body parts and the acquisition frequency can be set. For example, for the same movement amplitude, a higher acquisition frequency can be used if the moving part is a finger, and a lower acquisition frequency can be used if the moving part is the elbow or wrist.
In S102, feature extraction is performed on the image to be segmented according to the trained first neural network model, so as to obtain a first feature image corresponding to the image to be segmented.
Before the first feature image is extracted, the method further includes a training process for the first neural network model.
When training the first neural network model, a plurality of pre-labeled training sample images may be acquired. A training sample image is input into the first neural network model to be trained, and the model outputs a prediction result (a predicted feature image) of the hand image. The prediction output by the first neural network model is compared with the labeled result of the sample image; for example, the loss of the prediction relative to the labeled result can be computed with a cross-entropy loss. Based on the difference or loss determined by the comparison, the parameters of the first neural network model can be optimized with an optimization algorithm, for example stochastic gradient descent.
After optimization, the training sample images are input again to obtain new predictions. The predictions are compared with the labeled results of the training sample images, and the parameters of the first neural network model are further optimized according to the comparison. After multiple iterations, when the difference between the predictions output by the first neural network model and the corresponding labeled results meets a predetermined requirement, the first neural network model can be considered trained.
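As an illustrative sketch of the iterative optimization described above (the model, data loader and stopping threshold are assumptions, not part of this application), a training loop with a cross-entropy loss and stochastic gradient descent might look as follows:

```python
import torch
import torch.nn as nn

def train_model(model, data_loader, max_epochs=50, loss_threshold=0.05):
    """Iteratively optimize the model until the average loss meets a preset requirement."""
    criterion = nn.CrossEntropyLoss()                          # prediction vs. labeled result
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for images, labels in data_loader:                     # pre-labeled training samples
            optimizer.zero_grad()
            predictions = model(images)                        # predicted segmentation
            loss = criterion(predictions, labels)              # difference from the labels
            loss.backward()
            optimizer.step()                                   # stochastic gradient descent
            epoch_loss += loss.item()

        if epoch_loss / max(len(data_loader), 1) < loss_threshold:
            break                                              # predetermined requirement met
    return model
```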
When the trained first neural network model is used to extract features from the image to be segmented, the image to be segmented can be input into the first neural network model, which, with its optimized parameters, computes and outputs the first feature image corresponding to the image to be segmented.
To further improve the convenience of image processing, the image to be segmented may be brought to a uniform size before feature extraction. For example, the image to be segmented may be pre-cropped to a preset standard size. The preset standard size can be a square region of a predetermined size, and cutting the image to be segmented to this region realizes the pre-segmentation of the image to be segmented.
For example, if the size of the image to be segmented is 640 × 480 and the predetermined size is 480 × 480, the image to be segmented can be pre-cropped to obtain images to be segmented of the same size.
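A minimal sketch of this pre-cropping step, assuming a center crop (this application only specifies cutting to a square of a predetermined size; the function name is illustrative):

```python
import numpy as np

def crop_to_square(image: np.ndarray, size: int = 480) -> np.ndarray:
    """Center-crop an H x W x C image to a size x size square."""
    h, w = image.shape[:2]
    top = max((h - size) // 2, 0)
    left = max((w - size) // 2, 0)
    return image[top:top + size, left:left + size]

frame = np.zeros((480, 640, 3), dtype=np.uint8)   # a 640 x 480 camera frame (width x height)
print(crop_to_square(frame).shape)                # (480, 480, 3)
```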
In order to improve the convenience of image processing, before feature extraction, a step of performing normalization processing on the image to be segmented can be further included.
In a possible implementation manner, the normalization may first obtain the pixel values of the pixels in the image to be segmented and determine the maximum pixel value and the minimum pixel value among them. The normalized value corresponding to each pixel can then be determined by the formula (Ii - Imin) / Imax, where Ii is the pixel value of any pixel, Imin is the minimum pixel value among the pixels in the image to be segmented, and Imax is the maximum pixel value among the pixels in the image to be segmented.
Of course, the normalization is not limited to this manner; the maximum pixel value and the minimum pixel value in the formula may also be set to fixed values, for example a maximum pixel value of 255 and a minimum pixel value of 0.
When the minimum and maximum pixel values of the image to be segmented are determined, the magnitude and relative position of each pixel value within the image can be characterized more effectively, so that feature extraction can be performed on the image to be segmented more accurately.
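For illustration, a sketch of the normalization formula above using the per-image minimum and maximum pixel values (the fixed-value variant is shown as a comment; the function name is an assumption):

```python
import numpy as np

def normalize_image(image: np.ndarray) -> np.ndarray:
    """Normalize each pixel as (Ii - Imin) / Imax, following the formula above."""
    pixels = image.astype(np.float32)
    i_min = float(pixels.min())      # minimum pixel value in the image to be segmented
    i_max = float(pixels.max())      # maximum pixel value in the image to be segmented
    # Fixed-value variant instead: i_min, i_max = 0.0, 255.0
    if i_max == 0.0:
        return pixels                # avoid division by zero for an all-black image
    return (pixels - i_min) / i_max

normalized = normalize_image(np.random.randint(0, 256, (480, 480, 3), dtype=np.uint8))
print(normalized.min(), normalized.max())
```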
In S103, a second feature image corresponding to the image to be segmented is obtained according to a preset dilated convolution kernel.
When obtaining the second feature image, dilated convolution can be computed directly on the image to be segmented with the preset dilated convolution kernel, and the second feature image corresponding to the image to be segmented is determined from the result.
The first feature image or the second feature image in the embodiments of the present application may comprise a plurality of feature images. For example, a plurality of feature images corresponding to different dilation rates can be obtained by selecting dilated convolution kernels with different dilation rates, or a plurality of different feature images can be obtained with dilated convolution kernels having different values. The dilation rate is the step interval between the effective values of the dilated convolution kernel. For example, the dilated convolution kernels shown in Fig. 2 have a size of 3 × 3 and dilation rates of 1, 3 and 5, respectively.
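The following sketch illustrates dilated convolution in PyTorch (the channel counts are assumptions). A 3 × 3 kernel with dilation rates 1, 3 and 5 covers receptive fields of 3 × 3, 7 × 7 and 11 × 11 respectively, which is how the larger receptive field described above is obtained without increasing the number of kernel weights:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 480, 480)                    # image (or feature image) to be convolved

branches = []
for rate in (1, 3, 5):                             # different dilation rates, as in Fig. 2
    conv = nn.Conv2d(3, 16, kernel_size=3, dilation=rate, padding=rate)
    branches.append(conv(x))                       # padding=rate keeps the spatial size

second_feature = torch.cat(branches, dim=1)        # stack into one multi-rate feature image
print(second_feature.shape)                        # torch.Size([1, 48, 480, 480])
```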
In a possible implementation manner, the dilated convolution kernel may be applied directly to the first feature image, and the second feature image is obtained from this dilated convolution.
Alternatively, in a possible implementation manner, the second feature image includes both a feature image obtained by applying dilated convolution to the first feature image and a feature image obtained by applying the dilated convolution kernel to the image to be segmented.
The second feature image obtained by the dilated convolution operation may be the same size as the image to be segmented or a different size.
In S104, the first feature image and the second feature image are decoded according to the trained decoding neural network model, and a hand image of the image to be segmented is determined.
In a possible implementation manner, the first feature image and the second feature image may both be used as inputs of the trained decoding neural network model, which outputs the region of the hand image in the image to be segmented.
Alternatively, in a possible implementation manner, as shown in the hand image segmentation flow diagram of Fig. 3, the first feature image may be input into a pre-trained second neural network model, which outputs the portrait segmentation result of the image to be segmented, giving the portrait region in the image to be segmented. The second feature image is input into the trained third neural network model to obtain the edge detection result of the image to be segmented. The hand region of the image to be segmented is then determined from the portrait region and the edge detection result, that is, from the edge detection result within the portrait region.
Alternatively, in a possible implementation manner, the first feature image may be input into the pre-trained second neural network model, which outputs the portrait segmentation result of the image to be segmented, giving the portrait region in the image to be segmented. The feature information corresponding to the portrait region, including the feature information of the first feature image within the portrait region and the feature information of the second feature image within the portrait region, is then input into the third neural network model to obtain the segmentation result of the hand region.
The second neural network model is used for detecting a portrait region in the image, and the third neural network model is used for detecting edges in the image.
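A minimal sketch of this decoding arrangement (the module sizes, the sigmoid heads and the simple mask combination are assumptions for illustration, not the claimed implementation): the first feature image is decoded by a portrait branch standing in for the second neural network model, the second feature image by an edge branch standing in for the third neural network model, and the hand region is taken where the edge response lies inside the portrait region.

```python
import torch
import torch.nn as nn

class HandSegmentationDecoder(nn.Module):
    """Illustrative decoder with a portrait branch and an edge branch."""
    def __init__(self, first_channels=64, second_channels=48):
        super().__init__()
        # Stands in for the "second neural network model": portrait segmentation head.
        self.portrait_head = nn.Sequential(
            nn.Conv2d(first_channels, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 1), nn.Sigmoid(),
        )
        # Stands in for the "third neural network model": edge detection head.
        self.edge_head = nn.Sequential(
            nn.Conv2d(second_channels, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 1), nn.Sigmoid(),
        )

    def forward(self, first_feature, second_feature):
        portrait = self.portrait_head(first_feature)       # portrait segmentation result
        edges = self.edge_head(second_feature)              # edge detection result
        # Simplistic combination: keep edge responses that lie inside the portrait region.
        hand_mask = (portrait > 0.5) & (edges > 0.5)
        return portrait, edges, hand_mask

decoder = HandSegmentationDecoder()
portrait, edges, hand = decoder(torch.randn(1, 64, 480, 480), torch.randn(1, 48, 480, 480))
print(hand.shape)  # torch.Size([1, 1, 480, 480])
```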
During the training of the third neural network model, its input can be obtained from pre-labeled training sample images, the trained first neural network model, the dilated convolution kernel parameters, and so on, and its parameters are optimized according to the difference between its predictions and the labeled results until the optimized third neural network model meets the preset requirement.
In the embodiments of the present application, features are extracted from the image to be segmented with a neural network model to obtain the first feature image, and global features with a wider receptive field are further extracted through the dilated convolution operation to obtain the second feature image, so that the decoding neural network model can identify the hand region from richer feature information. This helps improve the segmentation accuracy of the image to be segmented and makes robot interaction more intelligent.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 4 is a schematic diagram of a hand image segmentation apparatus according to an embodiment of the present application, the apparatus corresponding to the method shown in fig. 1, and as shown in fig. 4, the apparatus includes:
an image obtaining unit 401, configured to obtain an image to be segmented;
a first feature image obtaining unit 402, configured to perform feature extraction on the image to be segmented according to a trained first neural network model, so as to obtain a first feature image corresponding to the image to be segmented;
a second feature image obtaining unit 403, configured to obtain, according to a preset dilated convolution kernel, a second feature image corresponding to the image to be segmented;
a decoding unit 404, configured to perform decoding processing on the first feature image and the second feature image according to the trained decoding neural network model, and determine a hand image of the image to be segmented.
In a possible implementation manner, the apparatus further includes a normalization unit, configured to perform normalization processing on the image to be segmented.
In a possible implementation manner, the normalization unit may include:
the pixel value determining subunit is used for determining the pixel values of the pixels in the image to be segmented;
a calculation subunit, configured to determine a normalized value corresponding to each pixel according to the formula (Ii - Imin) / Imax, wherein Ii is the pixel value of any pixel, Imin is the minimum pixel value among the pixels in the image to be segmented, and Imax is the maximum pixel value among the pixels in the image to be segmented.
In a possible implementation, the decoding unit may include:
the decoding subunit is configured to perform decoding processing on the first feature image and the second feature image according to the trained decoding neural network model to obtain a portrait segmentation result and an edge detection result in the image to be segmented;
and the segmentation subunit is used for determining the hand image of the image to be segmented according to the portrait segmentation result and the edge detection result.
The decoding subunit may include:
the first detection module, used for decoding the first feature image and the second feature image according to a trained second neural network model to obtain a portrait segmentation result in the image to be segmented;
and the second detection module, used for decoding the first feature image and the second feature image according to the trained third neural network model to obtain an edge detection result in the image to be segmented.
In a possible implementation, the second feature image obtaining unit may include one or both of the following sub-units:
the first convolution subunit, used for performing dilated convolution on the image to be segmented according to preset dilated convolution kernels with different dilation rates to generate a second feature image corresponding to the image to be segmented;
and the second convolution subunit, used for performing dilated convolution on the first feature image according to preset dilated convolution kernels with different dilation rates to generate a second feature image corresponding to the image to be segmented.
In a possible implementation manner, the apparatus further includes a cropping unit, configured to crop the image to be segmented to a square of a predetermined size.
Fig. 5 is a schematic view of a robot provided in an embodiment of the present application. As shown in fig. 5, the robot 5 of this embodiment includes: a processor 50, a memory 51 and a computer program 52, such as a hand image segmentation program, stored in said memory 51 and executable on said processor 50. The processor 50, when executing the computer program 52, implements the steps in the various hand image segmentation method embodiments described above. Alternatively, the processor 50 implements the functions of the modules/units in the above-described device embodiments when executing the computer program 52.
Illustratively, the computer program 52 may be partitioned into one or more modules/units, which are stored in the memory 51 and executed by the processor 50 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 52 in the robot 5.
The robot may include, but is not limited to, a processor 50, a memory 51. Those skilled in the art will appreciate that fig. 5 is merely an example of a robot 5 and does not constitute a limitation of robot 5 and may include more or fewer components than shown, or some components in combination, or different components, e.g., the robot may also include input output devices, network access devices, buses, etc.
The Processor 50 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 51 may be an internal storage unit of the robot 5, such as a hard disk or a memory of the robot 5. The memory 51 may also be an external storage device of the robot 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, provided on the robot 5. Further, the memory 51 may also include both an internal storage unit and an external storage device of the robot 5. The memory 51 is used for storing the computer program and other programs and data required by the robot. The memory 51 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by hardware related to instructions of a computer program, which can be stored in a computer readable storage medium, and when the computer program is executed by a processor, the steps of the methods described above can be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain other components which may be suitably increased or decreased as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media which may not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A hand image segmentation method, characterized in that the method comprises:
acquiring an image to be segmented;
performing feature extraction on the image to be segmented according to the trained first neural network model to obtain a first feature image corresponding to the image to be segmented;
acquiring a second feature image corresponding to the image to be segmented according to a preset dilated convolution kernel;
and decoding the first feature image and the second feature image according to the trained decoding neural network model to determine a hand image of the image to be segmented.
2. The method according to claim 1, wherein before performing feature extraction on the image to be segmented according to the trained first neural network model to obtain a first feature image corresponding to the image to be segmented, the method further comprises:
and carrying out normalization processing on the image to be segmented.
3. The method according to claim 2, wherein the normalizing the image to be segmented comprises:
determining pixel values of pixels in the image to be segmented;
determining a normalized value corresponding to each pixel according to the formula (Ii - Imin) / Imax, wherein Ii is the pixel value of any pixel, Imin is the minimum pixel value among the pixels in the image to be segmented, and Imax is the maximum pixel value among the pixels in the image to be segmented.
4. The method of claim 1, wherein performing decoding processing on the first feature image and the second feature image according to a trained decoding neural network model to determine a hand image of the image to be segmented comprises:
decoding the first feature image and the second feature image according to the trained decoding neural network model to obtain a portrait segmentation result and an edge detection result in the image to be segmented;
and determining a hand image of the image to be segmented according to the portrait segmentation result and the edge detection result.
5. The method according to claim 4, wherein decoding the first feature image and the second feature image according to a trained decoding neural network model to obtain a portrait segmentation result and an edge detection result in the image to be segmented, comprises:
decoding the first feature image and the second feature image according to a trained second neural network model to obtain a portrait segmentation result in the image to be segmented;
and decoding the first feature image and the second feature image according to a trained third neural network model to obtain an edge detection result in the image to be segmented.
6. The method according to claim 1, wherein the obtaining of the second feature image corresponding to the image to be segmented according to a preset dilated convolution kernel comprises one or both of the following manners:
performing dilated convolution on the image to be segmented according to preset dilated convolution kernels with different dilation rates to generate a second feature image corresponding to the image to be segmented;
and performing dilated convolution on the first feature image according to preset dilated convolution kernels with different dilation rates to generate a second feature image corresponding to the image to be segmented.
7. The method according to claim 1, wherein before performing feature extraction on the image to be segmented according to the trained first neural network model to obtain a first feature image corresponding to the image to be segmented, the method further comprises:
and cropping the image to be segmented to a square of a predetermined size.
8. A hand image segmentation apparatus, the apparatus comprising:
the image acquisition unit, used for acquiring an image to be segmented;
the first feature image acquisition unit, used for performing feature extraction on the image to be segmented according to the trained first neural network model to obtain a first feature image corresponding to the image to be segmented;
the second feature image acquisition unit, used for acquiring a second feature image corresponding to the image to be segmented according to a preset dilated convolution kernel;
and the decoding unit, used for decoding the first feature image and the second feature image according to the trained decoding neural network model and determining the hand image of the image to be segmented.
9. A robot comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 7 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202110182033.2A 2021-02-08 2021-02-08 Robot and hand image segmentation method and device thereof Active CN112950652B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110182033.2A CN112950652B (en) 2021-02-08 2021-02-08 Robot and hand image segmentation method and device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110182033.2A CN112950652B (en) 2021-02-08 2021-02-08 Robot and hand image segmentation method and device thereof

Publications (2)

Publication Number Publication Date
CN112950652A (en) 2021-06-11
CN112950652B CN112950652B (en) 2024-01-19

Family

ID=76245251

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110182033.2A Active CN112950652B (en) 2021-02-08 2021-02-08 Robot and hand image segmentation method and device thereof

Country Status (1)

Country Link
CN (1) CN112950652B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780140A (en) * 2021-08-31 2021-12-10 河北大学 Gesture image segmentation and recognition method and device based on deep learning

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109785334A (en) * 2018-12-17 2019-05-21 深圳先进技术研究院 Cardiac magnetic resonance images dividing method, device, terminal device and storage medium
WO2019144469A1 (en) * 2018-01-24 2019-08-01 华讯方舟科技有限公司 Image quality classification method, system and terminal device
CN110111334A (en) * 2019-04-01 2019-08-09 浙江大华技术股份有限公司 A kind of crack dividing method, device, electronic equipment and storage medium
CN110232394A (en) * 2018-03-06 2019-09-13 华南理工大学 A kind of multi-scale image semantic segmentation method
CN110570394A (en) * 2019-08-01 2019-12-13 深圳先进技术研究院 medical image segmentation method, device, equipment and storage medium
CN111028246A (en) * 2019-12-09 2020-04-17 北京推想科技有限公司 Medical image segmentation method and device, storage medium and electronic equipment
CN111080660A (en) * 2019-11-14 2020-04-28 中国科学院深圳先进技术研究院 Image segmentation method and device, terminal equipment and storage medium
CN111402264A (en) * 2020-03-11 2020-07-10 南京三百云信息科技有限公司 Image region segmentation method and device, model training method thereof and computer equipment
CN111462133A (en) * 2020-03-31 2020-07-28 厦门亿联网络技术股份有限公司 System, method, storage medium and device for real-time video portrait segmentation
CN111489357A (en) * 2019-01-29 2020-08-04 广州市百果园信息技术有限公司 Image segmentation method, device, equipment and storage medium
CN112164082A (en) * 2020-10-09 2021-01-01 深圳市铱硙医疗科技有限公司 Method for segmenting multi-modal MR brain image based on 3D convolutional neural network

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019144469A1 (en) * 2018-01-24 2019-08-01 华讯方舟科技有限公司 Image quality classification method, system and terminal device
CN110232394A (en) * 2018-03-06 2019-09-13 华南理工大学 A kind of multi-scale image semantic segmentation method
CN109785334A (en) * 2018-12-17 2019-05-21 深圳先进技术研究院 Cardiac magnetic resonance images dividing method, device, terminal device and storage medium
CN111489357A (en) * 2019-01-29 2020-08-04 广州市百果园信息技术有限公司 Image segmentation method, device, equipment and storage medium
CN110111334A (en) * 2019-04-01 2019-08-09 浙江大华技术股份有限公司 A kind of crack dividing method, device, electronic equipment and storage medium
CN110570394A (en) * 2019-08-01 2019-12-13 深圳先进技术研究院 medical image segmentation method, device, equipment and storage medium
CN111080660A (en) * 2019-11-14 2020-04-28 中国科学院深圳先进技术研究院 Image segmentation method and device, terminal equipment and storage medium
CN111028246A (en) * 2019-12-09 2020-04-17 北京推想科技有限公司 Medical image segmentation method and device, storage medium and electronic equipment
CN111402264A (en) * 2020-03-11 2020-07-10 南京三百云信息科技有限公司 Image region segmentation method and device, model training method thereof and computer equipment
CN111462133A (en) * 2020-03-31 2020-07-28 厦门亿联网络技术股份有限公司 System, method, storage medium and device for real-time video portrait segmentation
CN112164082A (en) * 2020-10-09 2021-01-01 深圳市铱硙医疗科技有限公司 Method for segmenting multi-modal MR brain image based on 3D convolutional neural network

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780140A (en) * 2021-08-31 2021-12-10 河北大学 Gesture image segmentation and recognition method and device based on deep learning
CN113780140B (en) * 2021-08-31 2023-08-04 河北大学 Gesture image segmentation and recognition method and device based on deep learning

Also Published As

Publication number Publication date
CN112950652B (en) 2024-01-19

Similar Documents

Publication Publication Date Title
CN110020620B (en) Face recognition method, device and equipment under large posture
JP7236545B2 (en) Video target tracking method and apparatus, computer apparatus, program
CN106980856B (en) Formula identification method and system and symbolic reasoning calculation method and system
CN111860398B (en) Remote sensing image target detection method and system and terminal equipment
CN111950453A (en) Optional-shape text recognition method based on selective attention mechanism
CN111079785A (en) Image identification method and device and terminal equipment
CN111126481A (en) Training method and device of neural network model
CN109858327B (en) Character segmentation method based on deep learning
CN112288831A (en) Scene image generation method and device based on generation countermeasure network
CN111080654A (en) Image lesion region segmentation method and device and server
CN111507337A (en) License plate recognition method based on hybrid neural network
CN113015022A (en) Behavior recognition method and device, terminal equipment and computer readable storage medium
CN115761834A (en) Multi-task mixed model for face recognition and face recognition method
CN116863194A (en) Foot ulcer image classification method, system, equipment and medium
CN112950652A (en) Robot and hand image segmentation method and device thereof
CN111784699A (en) Method and device for carrying out target segmentation on three-dimensional point cloud data and terminal equipment
CN112418089A (en) Gesture recognition method and device and terminal
CN110633630B (en) Behavior identification method and device and terminal equipment
CN110210314B (en) Face detection method, device, computer equipment and storage medium
CN115376195B (en) Method for training multi-scale network model and face key point detection method
CN111104965A (en) Vehicle target identification method and device
CN112084874B (en) Object detection method and device and terminal equipment
CN115984179A (en) Nasal bone fracture identification method and device, terminal and storage medium
CN115424250A (en) License plate recognition method and device
CN110705695B (en) Method, device, equipment and storage medium for searching model structure

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant