CN112950652B - Robot and hand image segmentation method and device thereof - Google Patents

Robot and hand image segmentation method and device thereof

Info

Publication number
CN112950652B
CN112950652B (application CN202110182033.2A)
Authority
CN
China
Prior art keywords
image
segmented
characteristic
neural network
decoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110182033.2A
Other languages
Chinese (zh)
Other versions
CN112950652A (en)
Inventor
顾在旺
程骏
庞建新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ubtech Robotics Corp
Original Assignee
Ubtech Robotics Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ubtech Robotics Corp filed Critical Ubtech Robotics Corp
Priority to CN202110182033.2A priority Critical patent/CN112950652B/en
Publication of CN112950652A publication Critical patent/CN112950652A/en
Application granted granted Critical
Publication of CN112950652B publication Critical patent/CN112950652B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107Static hand or arm
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person

Abstract

The application belongs to the field of robots and provides a robot and a hand image segmentation method and device thereof. The method includes: acquiring an image to be segmented; performing feature extraction on the image to be segmented with a trained first neural network model to obtain a first feature image corresponding to the image to be segmented; obtaining a second feature image corresponding to the image to be segmented according to preset hole (dilated) convolution kernels; and decoding the first feature image and the second feature image with a trained decoding neural network model to determine the hand image in the image to be segmented. Through the hole convolution kernels, the method and device can learn features with a larger receptive field in the image to be segmented and therefore provide richer feature data for the decoding neural network model. With roughly the same amount of computation as a multi-scale convolution segmentation scheme, more accurate segmentation results can be obtained, which helps improve the segmentation accuracy of hand images.

Description

Robot and hand image segmentation method and device thereof
Technical Field
The application belongs to the field of robots, and particularly relates to a robot and a hand image segmentation method and device thereof.
Background
In recent years, with the rapid development of artificial intelligence, many artificial intelligence applications have been deployed on robots. A robot can interact with people through artificial intelligence algorithms. During human-robot interaction, gestures are a very simple and convenient interaction mode. For the robot to perform gesture interaction effectively, the hand region of the interaction object must be segmented accurately, which in turn facilitates accurate recognition of the interaction object's gestures.
Current hand region recognition algorithms typically employ a fully convolutional neural network model. In the encoding part, successive convolutions extract features from the image, and pooling operations reduce the size of the extracted feature maps to lower the network's computational load. These successive convolution and pooling operations keep the extracted features smooth and undistorted, but they also shrink the feature maps, which is detrimental to classifying each pixel. Although the features can be restored by deconvolution, restoring them from a smaller feature map loses part of the information. Because gestures vary widely and the environment changes easily, the segmentation accuracy of hand images is difficult to improve.
Disclosure of Invention
In view of this, the embodiments of the present application provide a robot and a hand image segmentation method and apparatus thereof, so as to address the problem in the prior art that the segmentation accuracy of hand images is difficult to improve.
A first aspect of an embodiment of the present application provides a hand image segmentation method, including:
acquiring an image to be segmented;
extracting features of the image to be segmented according to the trained first neural network model to obtain a first feature image corresponding to the image to be segmented;
acquiring a second characteristic image corresponding to the image to be segmented according to a preset hole convolution kernel;
and decoding the first characteristic image and the second characteristic image according to the trained decoding neural network model, and determining the hand image of the image to be segmented.
With reference to the first aspect, in a first possible implementation manner of the first aspect, before extracting features of the image to be segmented according to the trained first neural network model, and obtaining a first feature image corresponding to the image to be segmented, the method further includes:
and carrying out normalization processing on the image to be segmented.
With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, the normalizing the image to be segmented includes:
determining pixel values of pixels in the image to be segmented;
according to the formula (Ii − Imin)/Imax, determining a normalization value corresponding to each pixel, wherein Ii is the pixel value of any pixel, Imin is the minimum pixel value of the pixels in the image to be segmented, and Imax is the maximum pixel value of the pixels in the image to be segmented.
With reference to the first aspect, in a third possible implementation manner of the first aspect, the determining a hand image of the image to be segmented according to a trained decoding neural network model to perform decoding processing on the first feature image and the second feature image includes:
decoding the first characteristic image and the second characteristic image according to the trained decoding neural network model to obtain a portrait segmentation result and an edge detection result in the image to be segmented;
and determining the hand image of the image to be segmented according to the portrait segmentation result and the edge detection result.
With reference to the third possible implementation manner of the first aspect, in a fourth possible implementation manner of the first aspect, decoding the first feature image and the second feature image according to a trained decoding neural network model to obtain a portrait segmentation result and an edge detection result in the image to be segmented, where the decoding includes:
decoding the first characteristic image and the second characteristic image according to the trained second neural network model to obtain a portrait segmentation result in the image to be segmented;
and decoding the first characteristic image and the second characteristic image according to the trained third neural network model to obtain an edge detection result in the image to be segmented.
With reference to the first aspect, in a fifth possible implementation manner of the first aspect, the obtaining, according to a preset hole convolution kernel, a second feature image corresponding to the image to be segmented includes one or two of the following manners:
performing hole convolution on the image to be segmented according to preset hole convolution kernels with different hole rates, to generate a second characteristic image corresponding to the image to be segmented;
and performing hole convolution on the first characteristic image according to preset hole convolution kernels with different hole rates, to generate a second characteristic image corresponding to the image to be segmented.
With reference to the first aspect, in a sixth possible implementation manner of the first aspect, before performing feature extraction on the image to be segmented according to the trained first neural network model, and obtaining a first feature image corresponding to the image to be segmented, the method further includes:
and cropping the image to be segmented into a square of a preset size.
A second aspect of embodiments of the present application provides a hand image segmentation apparatus, the apparatus comprising:
the image acquisition unit is used for acquiring an image to be segmented;
the first characteristic image acquisition unit is used for extracting characteristics of the image to be segmented according to the trained first neural network model to obtain a first characteristic image corresponding to the image to be segmented;
the second characteristic image acquisition unit is used for acquiring a second characteristic image corresponding to the image to be segmented according to a preset hole convolution kernel;
and the decoding unit is used for decoding the first characteristic image and the second characteristic image according to the trained decoding neural network model and determining the hand image of the image to be segmented.
A third aspect of the embodiments of the present application provides a robot comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method according to any one of the first aspects when executing the computer program.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method according to any one of the first aspects.
Compared with the prior art, the embodiments of the application have the following beneficial effects: a first feature image corresponding to the image to be segmented is determined by a trained first neural network model, a second feature image corresponding to the image to be segmented is obtained by hole convolution, and the first and second feature images are decoded by a trained decoding neural network model to obtain the segmentation result of the hand image. The hole convolution kernels can learn features with a larger receptive field in the image to be segmented, providing richer feature data for the decoding neural network model. With roughly the same amount of computation as a multi-scale convolution segmentation scheme, more accurate segmentation results can be obtained, improving the segmentation accuracy of hand images.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required for the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic implementation flow chart of a hand image segmentation method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a 3×3 hole convolution kernel provided in an embodiment of the present application;
fig. 3 is a schematic diagram of a hand image segmentation flow module provided in an embodiment of the present application;
fig. 4 is a schematic diagram of a hand image segmentation apparatus according to an embodiment of the present application;
fig. 5 is a schematic view of a robot provided in an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to illustrate the technical solutions described in the present application, the following description is made by specific examples.
The current hand image segmentation algorithms include segmentation algorithms based on fully convolutional networks (FCN). In the segmentation process, the features in the image to be segmented are first extracted by successive convolution layers, with pooling layers inserted between the convolution layers so that pooling operations reduce the size of the feature maps and thereby the computational load of the network. Through these successive convolution and pooling operations, high-dimensional feature images (which can be understood as feature images obtained by convolution operations of different sizes and in different manners) are extracted from the image to be segmented. The extracted high-dimensional feature images are then restored by deconvolution operations to obtain a segmentation result of the same size as the original input image. The difference between the network output and the actual label is measured by a loss function, for example a cross-entropy loss function. Parameters in the network are optimized by an optimization algorithm, such as stochastic gradient descent, to reduce the loss between the algorithm's output and the labels. After multiple iterations, a good image segmentation model is obtained.
In the image segmentation process, extracting the features in the image through successive convolutions and shrinking the extracted features through pooling operations reduces the computational load of the network. Moreover, regardless of where the target object is, successive convolution and pooling operations can extract features from the image to be segmented. However, successive convolution and pooling operations reduce the size of the extracted feature images. Although the feature images can be restored in size by deconvolution operations, part of the information of the restored image is lost when it is recovered from a smaller feature image, which is not conducive to obtaining accurate hand contours.
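The encode/decode pattern described above can be illustrated with a minimal sketch, assuming PyTorch (which the patent does not mention): stacked convolution and pooling layers shrink the feature map, and transposed-convolution ("deconvolution") layers restore it to the input resolution. The layer sizes and channel counts below are illustrative only, not the patent's architecture.

```python
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    """Toy encoder-decoder: conv + pooling shrink the map, deconv restores it."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                       # 1/2 resolution
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                       # 1/4 resolution
        )
        self.decoder = nn.Sequential(              # "deconvolution" restoration
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(16, num_classes, 2, stride=2),
        )

    def forward(self, x):
        feats = self.encoder(x)     # smaller, high-dimensional feature map
        return self.decoder(feats)  # per-pixel class scores at input resolution

x = torch.randn(1, 3, 480, 480)
print(TinyFCN()(x).shape)           # torch.Size([1, 2, 480, 480])
```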
During gesture interaction between the robot and the target object, the environment keeps changing and gestures come in many kinds. When the difference between the hand and its surroundings is small, current segmentation algorithms cannot segment the hand contour well, which is not conducive to accurate gesture interaction.
In addition, although the extent of the size reduction of the feature images can be lessened by reducing the number of convolution and pooling operations, the network may then learn insufficiently and fail to fully extract the features of the image to be segmented.
In view of these defects, the embodiments of the present application provide a hand image segmentation method based on hole convolution: hole convolution is added to extract a second feature image, the second feature image and the first feature image obtained by encoding are both input into a decoding neural network model, and the more comprehensive feature input improves the segmentation accuracy of the hand image.
Fig. 1 is a schematic implementation flow chart of a hand image segmentation method provided in an embodiment of the present application, and is described in detail below:
in S101, an image to be segmented is acquired.
The image to be segmented in the embodiment of the application can be a scene image acquired by the robot in real time in the intelligent interaction process. The obtaining of the image to be segmented can be determined according to the state of the robot. For example, when the robot is in an intelligent interaction mode, the robot acquires images in an interaction scene in real time through a camera to obtain images to be segmented.
Since the robot is in the intelligent interaction state, it collects a plurality of scene images. When acquiring images to be segmented, the acquisition frequency can be determined according to how much the interaction object's gesture changes. In addition, in order to segment and recognize fine gestures effectively, a correspondence between the change speeds of different body parts and the acquisition frequency of segmented images can be set. For example, for the same movement amplitude, if the moving part is a finger, a higher acquisition frequency can be used; if the moving part is an elbow or a wrist, a lower acquisition frequency can be used.
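As a purely illustrative sketch of this correspondence, a robot could map the moving body part to a capture frequency; the part names and frequency values below are assumptions, not taken from the patent.

```python
# Hypothetical mapping from the moving body part to a capture frequency (Hz).
ACQUISITION_HZ = {"finger": 30, "wrist": 15, "elbow": 10}

def acquisition_frequency(moving_part: str, default_hz: int = 15) -> int:
    """Return the image acquisition frequency to use for the given moving part."""
    return ACQUISITION_HZ.get(moving_part, default_hz)

print(acquisition_frequency("finger"))  # 30: finer motion, sample more often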
In S102, feature extraction is performed on the image to be segmented according to the trained first neural network model, so as to obtain a first feature image corresponding to the image to be segmented.
Before the first feature image is extracted, the method further includes a training process for the first neural network model.
When training the first neural network model, a plurality of pre-calibrated training sample images may be acquired. A training sample image is input into the first neural network model to be trained, and the model outputs a prediction result (a predicted feature image) of the hand image. The prediction result output by the first neural network model is compared with the calibration result of the sample image; for example, a cross-entropy loss may be computed between the prediction result and the calibration result. Based on the differences or losses determined by the comparison, the parameters of the first neural network model may be optimized by an optimization algorithm, such as stochastic gradient descent.
After optimization, the training sample images are input again to obtain new prediction results. These are compared with the calibration results of the training sample images, and the parameters of the first neural network model are optimized further according to the comparison. After multiple rounds of iterative optimization, when the difference between the prediction result output by the first neural network model and the corresponding calibration result meets the preset requirement, the first neural network model can be considered trained.
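A hedged sketch of this iterative training, again assuming PyTorch: predictions are compared with the calibrated masks via a cross-entropy loss, and parameters are optimized by stochastic gradient descent. The data loader, epoch count, and learning rate are placeholders, not values from the patent.

```python
import torch
import torch.nn as nn

def train_first_network(model, loader, epochs=50, lr=1e-2):
    """Iteratively optimize the model against pre-calibrated (labelled) masks."""
    criterion = nn.CrossEntropyLoss()                 # loss vs. calibration result
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):                           # repeated iterative optimization
        for images, labels in loader:                 # labels: per-pixel class indices
            optimizer.zero_grad()
            loss = criterion(model(images), labels)   # prediction vs. calibration
            loss.backward()
            optimizer.step()                          # stochastic gradient descent step
    return model
```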
When extracting features from the image to be segmented with the trained first neural network model, the image to be segmented can be input into the first neural network model, the computation is performed by the parameter-optimized first neural network model, and the first feature image corresponding to the image to be segmented is output.
To further improve the convenience of image processing, before feature extraction is performed on the image to be segmented, the method may also include unifying the size of the images to be segmented. For example, the image to be segmented may be pre-segmented at a preset standard size: a square area of predetermined size is cropped from the image to be segmented, thereby performing the pre-segmentation operation.
For example, if the size of the image to be segmented is 640×480 and the predetermined size is 480×480, the image to be segmented can be pre-segmented to obtain images of uniform size.
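A minimal sketch of this pre-segmentation step, assuming NumPy arrays and a centre crop; the patent fixes only the square size, not the crop position, so the centring is an assumption.

```python
import numpy as np

def center_crop_square(image, size=480):
    """image: H x W x C array; return a size x size crop taken from the centre."""
    h, w = image.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # e.g. a 640x480 camera frame
print(center_crop_square(frame).shape)           # (480, 480, 3)
```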
In order to improve the convenience of image processing, the method can further comprise the step of carrying out normalization processing on the image to be segmented before carrying out feature extraction.
In a possible implementation, the normalization process may first obtain the pixel values of the pixels included in the image to be segmented and determine the maximum pixel value and the minimum pixel value among them. The formula (Ii − Imin)/Imax can then be used to determine the normalization value corresponding to each pixel, where Ii is the pixel value of any pixel, Imin is the minimum pixel value of the pixels in the image to be segmented, and Imax is the maximum pixel value of the pixels in the image to be segmented.
Of course, the manner of normalization processing is not limited to this, and the maximum pixel value and the minimum pixel value in the formula may be set to fixed values. For example, the maximum pixel value is set to 255, and the minimum pixel value is set to 0.
Determining the minimum and maximum pixel values of the image to be segmented makes it possible to judge where each pixel's value lies within the image's range more effectively, so that feature extraction on the image to be segmented can be performed more accurately.
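A small sketch of the per-pixel normalization above, (Ii − Imin)/Imax, assuming NumPy; it supports both the per-image minimum/maximum and the fixed values (0 and 255) mentioned as an alternative.

```python
import numpy as np

def normalize(image, i_min=None, i_max=None):
    """Apply (Ii - Imin) / Imax per pixel; Imax is assumed to be non-zero."""
    image = image.astype(np.float32)
    i_min = image.min() if i_min is None else i_min   # minimum pixel value
    i_max = image.max() if i_max is None else i_max   # maximum pixel value
    return (image - i_min) / i_max

# normalize(img)          -> uses the image's own minimum and maximum
# normalize(img, 0, 255)  -> fixed values, as in the alternative above
```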
In S103, a second feature image corresponding to the image to be segmented is obtained according to preset hole convolution kernels.
When acquiring the second feature image, hole convolution can be applied directly to the image to be segmented with the preset hole convolution kernels to determine the second feature image corresponding to the image to be segmented.
The first feature image or the second feature image in the embodiments of the present application may each comprise a plurality of feature maps. For example, by selecting hole convolution kernels with different hole rates, a plurality of feature maps corresponding to the different hole rates can be obtained, or a plurality of different feature maps can be computed with hole convolution kernels of different values. The hole rate (also called the dilation rate) is the step spacing between the effective values in the hole convolution kernel. For example, the 3×3 hole convolution kernels shown in Fig. 2 correspond to hole rates of 1, 3 and 5, respectively.
In a possible implementation manner, the hole convolution kernel may directly perform hole convolution calculation on the first feature image, and obtain the second feature image according to the hole convolution calculation.
Alternatively, in a possible implementation, the second feature image includes both a feature image obtained by applying hole convolution to the first feature image and a feature image obtained by applying hole convolution kernels to the image to be segmented.
The second feature image after the hole convolution operation may be the same size as the image to be segmented, or may be different from the image to be segmented.
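A hedged sketch of this second-feature-image step, assuming PyTorch, whose `dilation` argument plays the role of the hole rate: hole convolutions with several hole rates are applied to an input (the image to be segmented or the first feature image) and their outputs are concatenated. Channel counts and rates are illustrative.

```python
import torch
import torch.nn as nn

class HoleConvBranch(nn.Module):
    """One 3x3 hole convolution per hole rate; outputs are concatenated."""
    def __init__(self, in_channels, out_channels=16, rates=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList([
            # padding=r keeps the output the same size as the input for a 3x3 kernel
            nn.Conv2d(in_channels, out_channels, 3, padding=r, dilation=r)
            for r in rates
        ])

    def forward(self, x):
        # one feature map per hole rate, stacked into the second feature image
        return torch.cat([branch(x) for branch in self.branches], dim=1)

# Applied to the image to be segmented (3 channels) or to the first feature image.
second_features = HoleConvBranch(in_channels=3)(torch.randn(1, 3, 480, 480))
print(second_features.shape)  # torch.Size([1, 48, 480, 480])
```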
In S104, decoding the first feature image and the second feature image according to the trained decoding neural network model, and determining a hand image of the image to be segmented.
In a possible implementation, the first feature image and the second feature image may be input into the trained decoding neural network model, and the region of the hand image in the image to be segmented is output by the pre-trained decoding neural network model.
Alternatively, in a possible implementation, as shown in the schematic diagram of the hand image segmentation flow in Fig. 3, the first feature image may be input into a pre-trained second neural network model, which outputs the portrait segmentation result of the image to be segmented, giving the portrait region in the image to be segmented. The second feature image is input into a trained third neural network model to obtain the edge detection result of the image to be segmented. The hand region of the image to be segmented is then determined according to the obtained portrait region and the edge detection result of the portrait region.
Or, in a possible implementation, the first feature image can be input into the pre-trained second neural network model, which outputs the portrait segmentation result of the image to be segmented to obtain the portrait region. The feature information corresponding to the portrait region, including the feature information of the first feature image within the portrait region and the feature information of the second feature image within the portrait region, is then input into the third neural network model to obtain the segmentation result of the hand region.
The second neural network model is a neural network model for detecting a portrait region in the image, and the third neural network model is a neural network model for detecting an edge in the image.
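A sketch of the decoding stage in the implementation of Fig. 3, assuming PyTorch: a second network decodes the features into a portrait mask, a third network into an edge map, and the two are combined into the hand region. The simple masking used for the fusion is an assumption for illustration; the patent only states that the hand image is determined from the two results.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Shared shape for the second (portrait) and third (edge) networks."""
    def __init__(self, in_channels, out_channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, out_channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

def decode_hand(first_feat, second_feat, portrait_net, edge_net, thresh=0.5):
    feats = torch.cat([first_feat, second_feat], dim=1)  # richer feature input
    portrait = portrait_net(feats)                       # portrait segmentation result
    edges = edge_net(feats)                              # edge detection result
    # illustrative fusion: keep edge responses that fall inside the portrait region
    return (portrait > thresh).float() * edges

# e.g. portrait_net = Decoder(in_channels=first_c + second_c); edge_net likewise.
```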
During the training of the third neural network model, its input can be obtained from pre-calibrated training sample images, the trained first neural network model, the hole convolution kernel parameters, and so on, and the parameters of the third neural network model are optimized according to the difference between its output prediction result and the calibration result until the optimized third neural network model meets the preset requirement.
With the method and device of the embodiments of the present application, feature extraction is performed on the image to be segmented in a neural-network-based manner to obtain the first feature image, and global features with a wider field of view are further extracted through hole convolution operations to obtain the second feature image. The decoding neural network model can therefore identify the hand region from richer feature information, which improves the segmentation accuracy of the image to be segmented and makes robot interaction more intelligent.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Fig. 4 is a schematic diagram of a hand image segmentation apparatus according to an embodiment of the present application, where the apparatus corresponds to the method shown in fig. 1, and as shown in fig. 4, the apparatus includes:
an image acquisition unit 401 for acquiring an image to be segmented;
a first feature image obtaining unit 402, configured to perform feature extraction on the image to be segmented according to a trained first neural network model, so as to obtain a first feature image corresponding to the image to be segmented;
a second feature image obtaining unit 403, configured to obtain a second feature image corresponding to the image to be segmented according to a preset hole convolution kernel;
and the decoding unit 404 is configured to perform decoding processing on the first feature image and the second feature image according to the trained decoding neural network model, and determine a hand image of the image to be segmented.
In a possible implementation manner, the device further comprises a normalization unit, which is used for performing normalization processing on the image to be segmented.
In a possible implementation manner, the normalization unit may include:
a pixel value determining subunit, configured to determine a pixel value of a pixel in the image to be segmented;
a calculation subunit configured to determine, according to the formula (Ii − Imin)/Imax, a normalization value corresponding to each pixel, where Ii is the pixel value of any pixel, Imin is the minimum pixel value of the pixels in the image to be segmented, and Imax is the maximum pixel value of the pixels in the image to be segmented.
In a possible implementation, the decoding unit may include:
the decoding subunit is used for decoding the first characteristic image and the second characteristic image according to the trained decoding neural network model to obtain a portrait segmentation result and an edge detection result in the image to be segmented;
and the segmentation subunit is used for determining the hand image of the image to be segmented according to the portrait segmentation result and the edge detection result.
The decoding subunit may include:
the first detection model is used for decoding the first characteristic image and the second characteristic image according to the trained second neural network model to obtain a portrait segmentation result in the image to be segmented;
and the second detection module is used for decoding the first characteristic image and the second characteristic image according to the trained third neural network model to obtain an edge detection result in the image to be segmented.
In a possible implementation manner, the second characteristic image acquiring unit may include one or two of the following subunits:
the first convolution subunit is used for checking the image to be segmented according to the preset cavity convolution cores with different cavity rates to carry out cavity convolution, and generating a second characteristic image corresponding to the image to be segmented;
and the second convolution subunit is used for checking the first characteristic image according to the preset cavity convolution with different cavity rates to carry out cavity convolution so as to generate a second characteristic image corresponding to the image to be segmented.
In a possible implementation manner, the device further comprises a cropping unit, configured to crop the image to be segmented into a square of a predetermined size.
Fig. 5 is a schematic view of a robot according to an embodiment of the present application. As shown in fig. 5, the robot 5 of this embodiment includes: a processor 50, a memory 51 and a computer program 52, such as a hand image segmentation program, stored in the memory 51 and executable on the processor 50. The processor 50, when executing the computer program 52, implements the steps of the various hand image segmentation method embodiments described above. Alternatively, the processor 50, when executing the computer program 52, performs the functions of the modules/units of the apparatus embodiments described above.
By way of example, the computer program 52 may be partitioned into one or more modules/units that are stored in the memory 51 and executed by the processor 50 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing a specific function for describing the execution of the computer program 52 in the robot 5.
The robot may include, but is not limited to, a processor 50, a memory 51. It will be appreciated by those skilled in the art that fig. 5 is merely an example of a robot 5 and is not meant to be limiting of the robot 5, and may include more or fewer components than shown, or may combine certain components, or different components, e.g., the robot may also include input and output devices, network access devices, buses, etc.
The processor 50 may be a central processing unit (Central Processing Unit, CPU), other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 51 may be an internal storage unit of the robot 5, such as a hard disk or a memory of the robot 5. The memory 51 may be an external storage device of the robot 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the robot 5. Further, the memory 51 may also include both an internal memory unit and an external memory device of the robot 5. The memory 51 is used for storing the computer program and other programs and data required by the robot. The memory 51 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or detailed in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. With such understanding, all or part of the flow of the methods of the above embodiments may also be accomplished by instructing relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in each jurisdiction; for example, in some jurisdictions, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (8)

1. A method of hand image segmentation, the method comprising:
acquiring an image to be segmented;
extracting features of the image to be segmented according to the trained first neural network model to obtain a first feature image corresponding to the image to be segmented;
acquiring a second characteristic image corresponding to the image to be segmented according to a preset hole convolution kernel;
decoding the first characteristic image and the second characteristic image according to the trained decoding neural network model to determine the hand image of the image to be segmented, wherein the method comprises the following steps:
decoding the first characteristic image and the second characteristic image according to the trained decoding neural network model to obtain a portrait segmentation result and an edge detection result in the image to be segmented;
determining a hand image of the image to be segmented according to the portrait segmentation result and the edge detection result;
decoding the first characteristic image and the second characteristic image according to the trained decoding neural network model to obtain a portrait segmentation result and an edge detection result in the image to be segmented, wherein the method comprises the following steps:
decoding the first characteristic image and the second characteristic image according to the trained second neural network model to obtain a portrait segmentation result in the image to be segmented;
and decoding the first characteristic image and the second characteristic image according to the trained third neural network model to obtain an edge detection result in the image to be segmented.
2. The method of claim 1, wherein prior to feature extraction of the image to be segmented according to the trained first neural network model, obtaining a first feature image corresponding to the image to be segmented, the method further comprises:
and carrying out normalization processing on the image to be segmented.
3. The method according to claim 2, wherein normalizing the image to be segmented comprises:
determining pixel values of pixels in the image to be segmented;
according to the formula (Ii − Imin)/Imax, determining a normalization value corresponding to each pixel, wherein Ii is the pixel value of any pixel, Imin is the minimum pixel value of the pixels in the image to be segmented, and Imax is the maximum pixel value of the pixels in the image to be segmented.
4. The method according to claim 1, wherein obtaining the second feature image corresponding to the image to be segmented according to a preset hole convolution kernel comprises one or two of the following modes:
performing hole convolution on the image to be segmented according to preset hole convolution kernels with different hole rates, to generate a second characteristic image corresponding to the image to be segmented;
and performing hole convolution on the first characteristic image according to preset hole convolution kernels with different hole rates, to generate a second characteristic image corresponding to the image to be segmented.
5. The method of claim 1, wherein prior to feature extraction of the image to be segmented according to the trained first neural network model, obtaining a first feature image corresponding to the image to be segmented, the method further comprises:
and cropping the image to be segmented into a square of a preset size.
6. A hand image segmentation apparatus, the apparatus comprising:
the image acquisition unit is used for acquiring an image to be segmented;
the first characteristic image acquisition unit is used for extracting characteristics of the image to be segmented according to the trained first neural network model to obtain a first characteristic image corresponding to the image to be segmented;
the second characteristic image acquisition unit is used for acquiring a second characteristic image corresponding to the image to be segmented according to a preset hole convolution kernel;
the decoding unit is used for decoding the first characteristic image and the second characteristic image according to the trained decoding neural network model to determine the hand image of the image to be segmented, and comprises the steps of decoding the first characteristic image and the second characteristic image according to the trained decoding neural network model to obtain a portrait segmentation result and an edge detection result in the image to be segmented; determining a hand image of the image to be segmented according to the image segmentation result and the edge detection result, and decoding the first characteristic image and the second characteristic image according to a trained decoding neural network model to obtain the image segmentation result and the edge detection result in the image to be segmented, wherein the method comprises the following steps: decoding the first characteristic image and the second characteristic image according to the trained second neural network model to obtain a portrait segmentation result in the image to be segmented; and decoding the first characteristic image and the second characteristic image according to the trained third neural network model to obtain an edge detection result in the image to be segmented.
7. A robot comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 5 when the computer program is executed.
8. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 5.
CN202110182033.2A 2021-02-08 2021-02-08 Robot and hand image segmentation method and device thereof Active CN112950652B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110182033.2A CN112950652B (en) 2021-02-08 2021-02-08 Robot and hand image segmentation method and device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110182033.2A CN112950652B (en) 2021-02-08 2021-02-08 Robot and hand image segmentation method and device thereof

Publications (2)

Publication Number Publication Date
CN112950652A CN112950652A (en) 2021-06-11
CN112950652B true CN112950652B (en) 2024-01-19

Family

ID=76245251

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110182033.2A Active CN112950652B (en) 2021-02-08 2021-02-08 Robot and hand image segmentation method and device thereof

Country Status (1)

Country Link
CN (1) CN112950652B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780140B (en) * 2021-08-31 2023-08-04 河北大学 Gesture image segmentation and recognition method and device based on deep learning

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109785334A (en) * 2018-12-17 2019-05-21 深圳先进技术研究院 Cardiac magnetic resonance images dividing method, device, terminal device and storage medium
WO2019144469A1 (en) * 2018-01-24 2019-08-01 华讯方舟科技有限公司 Image quality classification method, system and terminal device
CN110111334A (en) * 2019-04-01 2019-08-09 浙江大华技术股份有限公司 A kind of crack dividing method, device, electronic equipment and storage medium
CN110232394A (en) * 2018-03-06 2019-09-13 华南理工大学 A kind of multi-scale image semantic segmentation method
CN110570394A (en) * 2019-08-01 2019-12-13 深圳先进技术研究院 medical image segmentation method, device, equipment and storage medium
CN111028246A (en) * 2019-12-09 2020-04-17 北京推想科技有限公司 Medical image segmentation method and device, storage medium and electronic equipment
CN111080660A (en) * 2019-11-14 2020-04-28 中国科学院深圳先进技术研究院 Image segmentation method and device, terminal equipment and storage medium
CN111402264A (en) * 2020-03-11 2020-07-10 南京三百云信息科技有限公司 Image region segmentation method and device, model training method thereof and computer equipment
CN111462133A (en) * 2020-03-31 2020-07-28 厦门亿联网络技术股份有限公司 System, method, storage medium and device for real-time video portrait segmentation
CN111489357A (en) * 2019-01-29 2020-08-04 广州市百果园信息技术有限公司 Image segmentation method, device, equipment and storage medium
CN112164082A (en) * 2020-10-09 2021-01-01 深圳市铱硙医疗科技有限公司 Method for segmenting multi-modal MR brain image based on 3D convolutional neural network

Also Published As

Publication number Publication date
CN112950652A (en) 2021-06-11

Similar Documents

Publication Publication Date Title
CN110020592B (en) Object detection model training method, device, computer equipment and storage medium
WO2019051941A1 (en) Method, apparatus and device for identifying vehicle type, and computer-readable storage medium
CN111950453A (en) Optional-shape text recognition method based on selective attention mechanism
CN112561080B (en) Sample screening method, sample screening device and terminal equipment
CN111079785A (en) Image identification method and device and terminal equipment
CN111860398A (en) Remote sensing image target detection method and system and terminal equipment
CN111444807A (en) Target detection method, device, electronic equipment and computer readable medium
CN111507337A (en) License plate recognition method based on hybrid neural network
CN112507897A (en) Cross-modal face recognition method, device, equipment and storage medium
CN112950652B (en) Robot and hand image segmentation method and device thereof
CN109712134B (en) Iris image quality evaluation method and device and electronic equipment
CN112966687B (en) Image segmentation model training method and device and communication equipment
CN110969640A (en) Video image segmentation method, terminal device and computer-readable storage medium
CN110210314B (en) Face detection method, device, computer equipment and storage medium
CN112418089A (en) Gesture recognition method and device and terminal
CN111753736A (en) Human body posture recognition method, device, equipment and medium based on packet convolution
CN112084874B (en) Object detection method and device and terminal equipment
CN112801045B (en) Text region detection method, electronic equipment and computer storage medium
CN110705695B (en) Method, device, equipment and storage medium for searching model structure
CN115424250A (en) License plate recognition method and device
CN113780278A (en) Method and device for identifying license plate content, electronic equipment and storage medium
CN113657364A (en) Method, device, equipment and storage medium for recognizing character mark
CN114596209A (en) Fingerprint image restoration method, system, equipment and storage medium
CN116912634B (en) Training method and device for target tracking model
CN113139617B (en) Power transmission line autonomous positioning method and device and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant