WO2022142297A1 - A robot grasping system and method based on few-shot learning
- Publication number: WO2022142297A1
- Application: PCT/CN2021/108568 (CN2021108568W)
- Authority: WIPO (PCT)
Classifications
- G06T7/215—Motion-based segmentation
- B25J15/08—Gripping heads and other end effectors having finger members
- B25J9/08—Programme-controlled manipulators characterised by modular constructions
- B25J9/1602—Programme controls characterised by the control system, structure, architecture
- B25J9/1694—Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
- B25J9/1697—Vision controlled systems
- G06T7/11—Region-based segmentation
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/20221—Image fusion; Image merging
- G06T2207/30204—Marker
The present invention also provides a robot grasping method based on small sample learning, comprising the following steps.

Step S1: The image acquisition module performs image acquisition of objects with different degrees of occlusion as well as different objects; the device platform uses a Daheng GigE Vision TL HD camera to capture images of the stacked objects, and, to restore reality as closely as possible, different objects are used as subjects in the images.

Step S2: To make the images clearer and easier to use, the image processing module pre-processes the captured images, including image sharpening and Gaussian filtering.

Step S3: The image processing module uses labelme to annotate the images, generates json files, and then uses the official api to generate mask images in png format. The resulting mask image dataset is divided into two parts: a training set of 60 images and a validation set of 20 images. The loss function is Dice Loss, which is commonly used in medical image segmentation. The value domain of Dice Loss is [0, 1]; it measures the similarity of two sets, and the smaller the value, the more similar the two sets are. Here X denotes the predicted classification value of an image pixel and Y denotes the true classification value.

Step S4: Using the image training set, the U-net3+ network of the image processing module is trained with a very small learning rate, starting from an initial value of 0.000001, for 100 epochs; the network is then tested using the image test set.

Step S5: The action processing module converts the image information into control information for controlling the robot, and controls the robot to complete the grasping action.
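The Dice Loss referenced in step S3 can be sketched in its standard form, 1 - 2|X ∩ Y| / (|X| + |Y|) (the excerpt describes the loss but does not reproduce the formula, so the standard definition is used here), computed over flat binary pixel lists:

```python
# Standard Dice Loss over binary pixel lists: 1 - 2|X ∩ Y| / (|X| + |Y|).
# A value of 0 means the predicted and true masks match exactly; values
# near 1 mean they are almost disjoint.

def dice_loss(pred, truth):
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    if total == 0:
        return 0.0          # both masks empty: treat as a perfect match
    return 1.0 - 2.0 * intersection / total

print(dice_loss([1, 1, 0, 0], [1, 1, 0, 0]))  # 0.0 -- identical masks
print(dice_loss([1, 1, 0, 0], [0, 1, 1, 0]))  # 0.5 -- half overlap
```

This matches the property stated above: the smaller the value, the more similar the predicted and true pixel sets.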
Abstract
The disclosure relates to a robot grasping system and method based on few-shot learning. The system includes an image acquisition module, an image processing module, and an action processing module. The image acquisition module includes a depth-of-field camera for capturing images; the image processing module includes a U-net3+ network structure for processing images; the action processing module includes a ROS system and a corresponding package for converting image information into control information for controlling the motor. The disclosure proposes a grasping algorithm for segmentation networks optimised for occlusion situations, using a U-net3+ network enhanced with dropblock and the Mish activation function, and designs a grasping system for few-shot learning under occlusion, resulting in improved accuracy and reduced training time across different networks and situations, and allowing the network to perform better in computation.
Description
The disclosure belongs to the field of robot learning technology and particularly relates to a robot grasping system and method based on Few-Shot Learning.
Background Art
With the progress of technology, robots have become an indispensable part of daily production and life. In intelligent industry, robot grasping, as the most basic robot function, has been well realized for unobstructed, single-object positioning and grasping. However, in actual production the objects to be grasped often obscure each other, for example when grasping apples for crating during fruit production and transportation. Current intelligent robots do not handle this situation well, so grasping by intelligent robots when target objects mutually obstruct one another has become an urgent problem to solve.
The key aspect of the vision-based robot grasping task is image segmentation: fast and accurate grasping is possible only if the target position is located accurately and effectively. Image segmentation methods divide mainly into traditional methods and deep learning methods. Traditional methods are influenced by the quality of the captured image and place high demands on it, requiring a large distinction between background and object and proper image contrast, and segmenting by color features or texture features.
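The limitation of such traditional methods can be illustrated with a minimal sketch (not from the patent): a plain intensity threshold segments well only when background and object are strongly distinguished, and fails under low contrast.

```python
# Illustrative sketch of a traditional, feature-based segmentation method:
# a global intensity threshold. It works on a high-contrast toy "image"
# and breaks on a low-contrast one -- the limitation described above.

def threshold_segment(gray_image, threshold=128):
    """Label each pixel foreground (1) if its intensity exceeds threshold."""
    return [[1 if px > threshold else 0 for px in row] for row in gray_image]

# Bright object on a dark background segments cleanly:
clean = [[10, 10, 10], [10, 200, 200], [10, 200, 200]]
print(threshold_segment(clean))   # [[0, 0, 0], [0, 1, 1], [0, 1, 1]]

# Low contrast between object and background defeats the method,
# motivating the learned segmentation used in this disclosure:
low_contrast = [[120, 125, 130], [126, 131, 129], [124, 127, 133]]
print(threshold_segment(low_contrast))
```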
With the research and development of deep learning, deep learning has gradually shown excellent ability and adaptability in the field of machine vision; problems that are difficult to solve with traditional methods can often be solved satisfactorily by deep learning. Commonly used deep learning segmentation methods include U-net, R-CNN, and other neural network methods. However, deep learning usually requires a large amount of data to train the network, labeling the training set consumes a great deal of time, and production tasks do not allow overly long debugging time, so achieving good results with as small a dataset as possible has become a research focus. In summary, there are few studies on robot grasping in multi-target, occluded scenes. Yet multiple objects obscuring each other is a common situation in production environments, and solving this problem with deep learning methods requires networks that achieve highly accurate image segmentation.
Summary of The Invention
In view of the above, it is necessary to provide a robot grasping system and method based on small sample learning, using a U-net3+ network enhanced with the characteristics of dropblock and the Mish activation function to improve the segmentation effect under occlusion, thus enabling the robot to better handle grasping tasks in occluded scenes.
To achieve the above purpose, the present invention is realized according to the following technical solutions.
On the one hand, the present invention provides a robot grasping system based on small sample learning, comprising an image acquisition module, an image processing module, and an action processing module.
The image acquisition module includes a depth-of-field camera for capturing images.

The image processing module includes a U-net3+ network structure for processing images and completing recognition, localization, and segmentation tasks.

The action processing module includes a ROS (Robot Operating System) system and a corresponding package for converting image information into control information for controlling the motor.
Further, the U-net3+ network structure includes an encoding part and a decoding part: the encoding part extracts contextual information, and the decoding part achieves precise localization of the target from the extracted results. The encoding part consists of a stack of convolutional and pooling layers, and the decoding part consists of a stack of up-sampling, convolutional, and BN layers. The input of each convolutional layer of the decoding part is a fusion of all the outputs of the encoding part; before fusion, the encoder outputs are adjusted to the same size as the current layer by up-sampling or pooling, and the fused result is then fed to the convolutional layer of that layer for convolution.
Further, the U-net3+ network structure also comprises a dropblock module for enhancing the network's ability to recognize occluded objects.
Further, the activation function of the U-net3+ network structure is a Mish activation function, used to optimize the network and obtain better accuracy and generalization capability. The Mish function is expressed by the formula

Mish (x) = x × tanh (ln (1 + e^x))  (1)

where Mish is the activation function, tanh is the hyperbolic tangent function, ln is the natural logarithm, and e^x is the exponential function with base e.
On the other hand, the present invention provides a robot grasping method based on small sample learning, comprising the following steps.
Step S1: An image acquisition module performs image acquisition of objects with different degrees of occlusion as well as different objects.

Step S2: The image processing module pre-processes the acquired images, including image sharpening and Gaussian filtering.

Step S3: The image processing module uses labelme to annotate the images, generates json files, and then uses the official api to generate mask images in png format; the resulting mask image dataset is divided into a training set and a validation set, and the loss function Dice Loss is used to measure the similarity between the predicted and true pixel sets.

Step S4: Using the image training set, the U-net3+ network of the image processing module is trained with a learning rate starting from an initial value of 0.000001; the network is then tested using the image test set to complete tasks such as recognition, localization, and segmentation.

Step S5: The action processing module converts the image information into control information for controlling the robot, and controls the robot to complete the grasping action.
Further, in step S3, the annotation of an image comprises classification annotation according to the type of object in the image and classification annotation according to the occlusion of the object in the image; the occlusion annotation comprises two types: occluded and unoccluded.
The advantages and positive effects of the present invention compared to the prior art include at least the following.
The present invention proposes a grasping algorithm for segmentation networks optimized for occlusion situations. It uses a U-net3+ network and enhances it with the characteristics of dropblock and the Mish activation function to improve segmentation under occlusion; it designs a segmentation neural network for small sample situations and a grasping system for multiple targets under occlusion; and it uses the Mish activation function to optimize the network, resulting in improved accuracy and reduced training time across different networks and situations, allowing the network to perform better in computation and allowing better input information to be passed into the network, so that the network achieves better accuracy and generalization.
Description of the accompanying figures
To illustrate the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Obviously, the accompanying drawings in the following description are only some embodiments of the present invention, and a person of ordinary skill in the art may obtain other drawings based on these drawings without any creative effort.
Fig. 1 shows a schematic diagram of the robot grasping system of the present invention.

Fig. 2 shows a schematic diagram of the third-layer structure of the decoding part of the U-net3+ network of an embodiment of the present invention.

Fig. 3 shows a schematic diagram of the Mish function curve of the U-net3+ network of an embodiment of the present invention.

Fig. 4 is a schematic diagram of the network improvement comparison of an embodiment of the present invention.
Specific embodiments
To make the above-mentioned objects, features, and advantages of the present invention more obvious and understandable, the technical solutions of the present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments. It is to be noted that the embodiments described are only a part of the embodiments of the present invention and not all of them, and based on the embodiments in the present invention, all other embodiments obtained without creative labour by a person of ordinary skill in the art fall within the scope of protection of the present invention.
It should be noted that the specific parameters or quantities used in the embodiments are only a few possible or preferable combinations and should not be understood as limiting the scope of the invention; a person of ordinary skill in the art may make several variations and improvements without departing from the conception of the invention, and these fall within its scope of protection. Therefore, the scope of protection of the present invention shall be governed by the appended claims.
Example 1
The present invention proposes a grasping algorithm for segmentation networks optimised for occlusion situations. The main body of the algorithm is a U-net3+ network, which has achieved good results in the field of medical segmentation, and the network is enhanced using the properties of dropblock and the Mish activation function to improve segmentation under occlusion.
The main elements of the invention are: the design of a segmentation neural network for the small sample case; the design of a grasping system for multiple targets in the occlusion case; and the optimisation of the network using the Mish activation function.
Figure 1 provides a schematic diagram of the robot grasping system of the present invention. As shown in Figure 1, the present invention provides a robot grasping system based on small sample learning, comprising an image acquisition module, an image processing module, and an action processing module.
The image acquisition module comprises a depth-of-field camera for capturing images.

The image processing module comprises a U-net3+ network structure for processing images.

The action processing module comprises a ROS (Robot Operating System) system and a corresponding package for converting the image information into control information for controlling the motor.
In order to better solve the robot grasping problem under object occlusion, the invention adds a dropblock layer to the U-net3+ structure to optimize the network's performance when objects occlude each other. The main structure of the U-net3+ encoding part is a stack of convolutional and pooling layers, with 3×3 convolutional kernels in each layer, activated using the ReLU function, with 64, 128, 256, 512, and 1024 convolutional kernels in the successive layers. The decoding part consists of a stack of up-sampling, convolutional, and BN layers, each with 3×3 convolutional kernels activated using the ReLU function, with the number of convolutional kernels corresponding to the encoding part and each convolution followed by a BN layer. The input to each convolutional layer of the decoding part is a fusion of all the outputs of the encoding part. Before fusing the input of each layer, the encoder outputs are up-sampled or pooled to the same size as that layer, then fused and fed to the convolutional layer as its input for convolution.
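The encoder's channel progression can be tracked with a small bookkeeping sketch. The 2×2 pooling stride is an assumption (the standard U-Net choice; this excerpt does not state the pooling factor), and the 3×3 convolutions are assumed to be padded so they preserve spatial size:

```python
# Shape bookkeeping for the encoder described above: five stages of 3x3
# convolutions with 64, 128, 256, 512, and 1024 kernels, each stage
# followed by pooling. The 2x2 pooling stride is an assumption.

ENCODER_CHANNELS = [64, 128, 256, 512, 1024]

def encoder_shapes(height, width, pool=2):
    """Return (channels, height, width) after each encoder stage."""
    shapes = []
    for i, ch in enumerate(ENCODER_CHANNELS):
        shapes.append((ch, height, width))
        if i < len(ENCODER_CHANNELS) - 1:   # no pooling after the last stage
            height, width = height // pool, width // pool
    return shapes

for shape in encoder_shapes(256, 256):
    print(shape)
# from (64, 256, 256) down to (1024, 16, 16)
```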
Figure 2 gives a schematic diagram of the structure of the third layer of the decoding part of the U-net3+ network in one embodiment of the present invention. As shown in Figure 2, taking the third decoding layer as an example, the input of this layer is the fusion of the output of the previous layer with the outputs of the encoding part, and the encoding outputs must first be adjusted to the designed input size of this layer. The fused result is then fed into the network as the input of this layer for computation.
In addition, the original network structure does not have a dropblock module, so in order to address target occlusion, a dropblock layer is introduced to enhance the network's ability to recognise occluded objects. The dropblock is added after each convolutional layer in the encoder model. Its main function is to randomly discard the information in a contiguous region, which improves the feature extraction ability of the whole network; data processed by dropblock closely resembles data in which the object is partially obscured, so the network structure is thereby optimised for the segmentation of occluded objects.
The Mish function curve of the U-net3+ network of one embodiment of the invention is given in Figure 3. As shown in Figure 3, in order to optimise the network, the invention also modifies the activation function of the network by introducing the Mish activation function, which results in improved accuracy and reduced training time across different networks and under different circumstances. The formula for the Mish function is expressed as
Mish(x) = x × tanh(ln(1 + e^x))  (1)
Because the Mish function is unbounded above (positive values can reach any height), it avoids the saturation caused by capping. Because it admits small negative values, it also performs better: theoretically, a slight allowance for negative values permits better gradient flow than the hard zero boundary of ReLU. Most critically, the Mish function is smooth, allowing better input information to be passed into the network and resulting in better accuracy and generalisation capability.
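Formula (1) is straightforward to implement; a numpy sketch using `logaddexp(0, x)` as a numerically stable form of ln(1 + e^x):

```python
import numpy as np

def mish(x):
    """Mish(x) = x * tanh(ln(1 + e^x)), i.e. x * tanh(softplus(x)).
    logaddexp(0, x) computes ln(1 + e^x) without overflow."""
    return x * np.tanh(np.logaddexp(0.0, x))
```

For large positive x the function approaches the identity (no upper cap), while for negative x it passes small negative values instead of the hard zero of ReLU, matching the behaviour described above.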
By adding the dropblock module to the U-net3+ network and replacing the network's activation function with the Mish activation function, the network is improved; a comparison of the partial structure of the improved network is shown in Figure 4, where the original U-net3+ network structure is shown on the left and the improved network structure is shown on the right.
Example 2
The present invention also provides a robot grasping method based on small sample learning, comprising the following steps.
Step S1: The image acquisition module performs image acquisition of objects with different degrees of occlusion as well as of different objects. The device platform uses a Daheng GigE Vision TL HD camera to capture images of the stacked objects, and, to reflect real conditions as closely as possible, different objects are used as subjects in the images.
Step S2: To make the images clearer and easier to use, the image processing module pre-processes the captured images, including image sharpening and Gaussian filtering.
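The two pre-processing operations named in step S2 can be sketched as follows. In practice such pipelines typically use OpenCV; this is a hedged, dependency-free numpy illustration on a grayscale image, with a standard Laplacian sharpening kernel and a separable Gaussian kernel (both chosen by the editor, not specified in the source):

```python
import numpy as np

def filter2d(img, kernel):
    """Same-size 2-D filtering with zero padding (grayscale image).
    Implemented as correlation, which equals convolution for the
    symmetric kernels used here."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros(img.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0],
                                         j:j + img.shape[1]]
    return out

def sharpen(img):
    """Laplacian sharpening: boost the centre, subtract neighbours."""
    kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float)
    return filter2d(img, kernel)

def gaussian_blur(img, sigma=1.0, size=5):
    """Gaussian low-pass filtering to suppress sensor noise."""
    ax = np.arange(size) - size // 2
    g = np.exp(-ax ** 2 / (2 * sigma ** 2))
    kernel = np.outer(g, g)
    kernel /= kernel.sum()
    return filter2d(img, kernel)
```

On a constant image both operations leave interior pixels unchanged, which is a quick sanity check for the kernel normalisation.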
Step S3: The image processing module uses labelme to annotate the images, generates json files, and then uses the official api to generate mask images in png format. The resulting mask image data set is divided into two parts, a training set and a validation set, and the loss function Dice Loss is used to measure the similarity of the two sets.
In order to enable the robot to better handle the grasping task under occlusion, a new approach was adopted for the annotation of the dataset: images are annotated not only according to the type of object but also according to the occlusion situation, with each object type further classified as occluded or unoccluded.
The resulting mask image dataset was divided into two parts: a training set of 60 images and a validation set of 20 images. In the experiments, it was found that the loss functions commonly used for classification did not effectively describe the real situation of network training; there were often cases where the accuracy and loss values looked good but the segmentation results were unsatisfactory. After several experiments, it was decided to use the loss function Dice Loss, which is commonly used in medical image segmentation. Dice Loss measures the similarity of two sets and has a value domain of [0, 1], with smaller values indicating more similar sets; the specific formula is shown below:

Dice Loss = 1 − 2|X∩Y| / (|X| + |Y|)

where X is the predicted classification value of the image pixels and Y is the true classification value of the image pixels.
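A minimal numpy sketch of Dice Loss as described above, in the common "soft" form that also accepts probability maps (the small epsilon, an editorial addition, guards against division by zero on empty masks):

```python
import numpy as np

def dice_loss(pred, target, eps=1e-7):
    """Dice Loss = 1 - 2|X ∩ Y| / (|X| + |Y|).
    0 means identical masks; values near 1 mean almost no overlap.
    pred may be a binary mask or soft per-pixel probabilities."""
    pred, target = pred.ravel(), target.ravel()
    intersection = (pred * target).sum()
    return 1.0 - (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```

Because the loss is computed from region overlap rather than per-pixel accuracy, it remains informative even when the foreground occupies only a small fraction of the image, which is exactly the failure mode of classification losses noted above.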
Step S4: Using the image training set, the U-net3+ network of the image processing module is trained with a learning rate initialised to 0.000001, and the network is then tested using the image test set. In order to obtain accurate classification results, and to avoid the situation where the network reports a high accuracy while the actual results have a large error, a very small learning rate is used: the network is trained from this initial value of 0.000001 for 100 epochs.
Step S5: The action processing module converts the image information into control information for controlling the robot, and controls the robot to complete the grasping action.
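The source implements step S5 with ROS packages and does not detail the conversion. As a hedged, ROS-free sketch of one plausible image-to-control step, the segmentation mask can be reduced to a 3-D grasp target in the camera frame by back-projecting its pixel centroid through a pinhole model; the intrinsics and depth below are placeholder values, not parameters from the patent:

```python
import numpy as np

def mask_to_grasp_point(mask, fx=600.0, fy=600.0, cx=320.0, cy=240.0,
                        depth=0.5):
    """Convert a binary segmentation mask plus a depth estimate into a
    3-D grasp target (metres, camera frame) via a pinhole camera model.
    fx, fy, cx, cy are hypothetical camera intrinsics."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None                      # nothing was segmented
    u, v = xs.mean(), ys.mean()          # pixel centroid of the object
    x = (u - cx) * depth / fx            # back-project to camera frame
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])
```

In a real system the resulting point would be transformed into the robot base frame and handed to a motion planner (e.g. via MoveIt) rather than sent to the motors directly.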
The advantages and positive effects of the present invention over the prior art include at least the following.
The present invention proposes a grasping algorithm based on a segmentation network optimised for occlusion situations. It utilises the U-net3+ network and enhances it with the dropblock module and the Mish activation function to improve the segmentation effect under occlusion; it designs a segmentation neural network for few-shot situations and a grasping system for multiple targets under occlusion; and it uses the Mish activation function to optimise the network, resulting in improved accuracy and reduced training time across different networks and situations, allowing the network to perform better in computation, and allowing better input information to be passed into the network, so that the network achieves better accuracy and generalisation.
The above-described embodiments express only several embodiments of the present invention, which are described in more specific and detailed terms, but should not be construed as limiting the scope of the patent of the present invention. It should be noted that, for a person of ordinary skill in the art, several variations and improvements can be made without departing from the conception of the present invention, and these fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be governed by the appended claims.
Claims (6)
- A robot grasping system based on few-shot learning, comprising: an image acquisition module, an image processing module, and an action processing module; the image acquisition module includes a depth-of-field camera for capturing images; the image processing module includes a U-net3+ network structure for processing images and completing recognition, localization, and segmentation tasks; the action processing module includes a ROS system and a corresponding package for converting image information into control information for controlling the motor.
- A robot grasping system based on few-shot learning as claimed in claim 1, wherein the U-net3+ network structure includes an encoding part and a decoding part, the encoding part is used to achieve extraction of contextual information, and the decoding part is used to achieve precise localization of the target according to the extracted results; the encoding part consists of a stack of convolutional and pooling layers, and the decoding part consists of a stack of up-sampling, convolutional, and BN layers; the input of each convolutional layer of the decoding part is the fusion of all the lead-in results of the encoding part: before fusing the input of each layer, the lead-in results of the encoding part are adjusted to the same size as this layer by up-sampling or pooling, then fused and fed to the convolutional layer as the input of this layer for convolution.
- A robot grasping system based on few-shot learning as claimed in claim 2, wherein, the U-net3+ network structure further comprises a dropblock module for enhancing the network's recognition capability against occluded objects.
- A robot grasping system based on few-shot learning as claimed in claim 3, wherein, the activation function of the U-net3+ network structure is a Mish activation function for optimizing the network to obtain better accuracy and generalization capability; the Mish function is expressed by the formula Mish(x) = x × tanh(ln(1 + e^x))  (1), where Mish is the activation function, tanh is the hyperbolic tangent function, ln(x) is the natural logarithm, and e^x is the exponential function with e as the base.
- A robot grasping method based on few-shot learning, comprising: step S1: an image acquisition module performs image acquisition of objects with different degrees of occlusion as well as of different objects; step S2: the image processing module pre-processes the acquired images, including image sharpening and image Gaussian filtering; step S3: the image processing module uses labelme to annotate the images, generates json files, and then uses the official api to generate mask images in png format; the obtained mask image data set is divided into two parts, the training set and the validation set, and the loss function Dice Loss is used to measure the similarity of the two sets; step S4: using the image training set, the U-net3+ network of the image processing module is trained using a learning rate with an initial value of 0.000001, and the network is tested using the image test set to complete tasks such as recognition, localization, and segmentation; step S5: the action processing module converts the image information into control information for controlling the robot and controls the robot to complete the grasping action.
- A robot grasping method based on few-shot learning as claimed in claim 5, wherein, in step S3, the annotation of the image comprises classification annotation according to the type of object in the image and classification annotation according to the occlusion of the object in the image; the classification annotation according to the occlusion of the object comprises two different types: occluded and unoccluded.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110004574.6 | 2021-01-04 | ||
CN202110004574.6A CN114723775A (en) | 2021-01-04 | 2021-01-04 | Robot grabbing system and method based on small sample learning |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022142297A1 true WO2022142297A1 (en) | 2022-07-07 |
Family
ID=82234481
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/108568 WO2022142297A1 (en) | 2021-01-04 | 2021-07-27 | A robot grasping system and method based on few-shot learning |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114723775A (en) |
WO (1) | WO2022142297A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116452936A (en) * | 2023-04-22 | 2023-07-18 | 安徽大学 | Rotation target detection method integrating optics and SAR image multi-mode information |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115631401A (en) * | 2022-12-22 | 2023-01-20 | 广东省科学院智能制造研究所 | Robot autonomous grabbing skill learning system and method based on visual perception |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109584298A (en) * | 2018-11-07 | 2019-04-05 | 上海交通大学 | Object manipulator picks up the automatic measure on line method of task from master object |
US20200086483A1 (en) * | 2018-09-15 | 2020-03-19 | X Development Llc | Action prediction networks for robotic grasping |
CN111898699A (en) * | 2020-08-11 | 2020-11-06 | 海之韵(苏州)科技有限公司 | Automatic detection and identification method for hull target |
CN112136505A (en) * | 2020-09-07 | 2020-12-29 | 华南农业大学 | Fruit picking sequence planning method based on visual attention selection mechanism |
Non-Patent Citations (1)
Title |
---|
HUIMIN HUANG; LANFEN LIN; RUOFENG TONG; HONGJIE HU; QIAOWEI ZHANG; YUTARO IWAMOTO; XIANHUA HAN; YEN-WEI CHEN; JIAN WU: "UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation", arXiv.org, Cornell University Library, 19 April 2020 (2020-04-19), XP081647951 * |
Also Published As
Publication number | Publication date |
---|---|
CN114723775A (en) | 2022-07-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21913048; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21913048; Country of ref document: EP; Kind code of ref document: A1 |