CN114750164A - Transparent object grabbing method and system and computer readable storage medium - Google Patents

Transparent object grabbing method and system and computer readable storage medium

Info

Publication number
CN114750164A
Authority
CN
China
Prior art keywords
transparent object
grabbing
image
touch
transparent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210576645.4A
Other languages
Chinese (zh)
Other versions
CN114750164B (en)
Inventor
梁斌
刘厚德
于海鑫
李寿杰
王学谦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Shenzhen International Graduate School of Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen International Graduate School of Tsinghua University filed Critical Shenzhen International Graduate School of Tsinghua University
Priority to CN202210576645.4A priority Critical patent/CN114750164B/en
Publication of CN114750164A publication Critical patent/CN114750164A/en
Application granted granted Critical
Publication of CN114750164B publication Critical patent/CN114750164B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 - Programme-controlled manipulators
    • B25J9/16 - Programme controls
    • B25J9/1694 - Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697 - Vision controlled systems
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A - TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00 - Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/80 - Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in fisheries management
    • Y02A40/81 - Aquaculture, e.g. of fish

Abstract

The invention provides a transparent object grabbing method, a transparent object grabbing system and a computer-readable storage medium, wherein the method comprises the following steps: collecting, with an RGB camera, an RGB image of the whole working area containing the transparent object; taking the RGB image as input to a generative residual convolutional neural network, which outputs a heat map representing candidate grabbing positions of the transparent object and a grabbing-radius heat map, from which the grabbing point is located; acquiring a tactile image at the grabbing point with a gripper with artificial touch mounted at the end of the robot arm, extracting the contour of the transparent object in the tactile image, and judging the position of the transparent object relative to the gripper from the contour until the transparent object has been moved to the center of the gripper; and grabbing the transparent object and placing it at a specified position. Vision provides coarse guidance and touch then performs fine positioning, so that the transparent object is grabbed by combining vision and touch.

Description

Transparent object grabbing method and system and computer readable storage medium
Technical Field
The invention relates to the technical field of robot grabbing, in particular to a method and a system for grabbing a transparent object and a computer readable storage medium.
Background
Robot grabbing is one of the most widely applied robot technologies, but many problems remain difficult to solve, and grabbing transparent objects is one of them. Transparent objects are common in domestic and industrial environments and pose a significant challenge to robotic grabbing. On the one hand, the surface of a transparent object does not produce effective diffuse reflection, so a depth camera cannot acquire a usable depth image of it; grabbing schemes based on depth images therefore fail on transparent objects. On the other hand, the appearance of a transparent object is strongly affected by the background and the lighting, and it is difficult to detect the object reliably by vision in extreme environments such as frequently changing backgrounds, confusing lighting or underwater scenes.
Existing transparent object grabbing schemes generally detect, classify and grab transparent objects directly from visual information: the salient visual features of the transparent object are detected, and the robot arm then plans the grabbing motion. The main existing schemes are as follows:
The method disclosed in "KeyPose: Multi-View 3D Labeling and Keypoint Estimation for Transparent Objects" can accurately detect the 6-degree-of-freedom pose of an object in space, and grabbing in an arbitrary pose can be realized from this 6-DoF pose; however, the object to be detected must have a CAD model, so the method cannot be applied to unknown objects, it requires stable lighting and background, and it cannot cope with occluded or cluttered objects.
The method provided in "ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation" performs depth completion on the object so that it can be grabbed; however, the completion accuracy is not high, grabbing easily fails, and the object to be detected must have a CAD model, so the method cannot be applied to unknown objects. It also cannot be used under complex background and lighting conditions.
The method disclosed in "Fuzzy-Depth Objects Grasping Based on FSG Algorithm and a Soft Robotic Hand" can grab unknown objects, and its flexible fin-ray claw has high fault tolerance; however, it has no tactile feedback, cannot be applied in environments with complex backgrounds and lighting, and cannot effectively identify and classify objects during grabbing.
In the prior art, transparent object grabbing schemes generally locate the transparent object by vision alone; however, the visual features of transparent objects are easily affected by the background, illumination and similar conditions, so the accuracy of purely visual localization is very limited in extreme environments.
The above background disclosure is only for the purpose of assisting understanding of the concept and technical solution of the present invention and does not necessarily belong to the prior art of the present patent application, and should not be used for evaluating the novelty and inventive step of the present application in the case that there is no clear evidence that the above content is disclosed at the filing date of the present patent application.
Disclosure of Invention
The invention provides a method and a system for grabbing a transparent object and a computer readable storage medium for solving the existing problems.
In order to solve the above problems, the technical solution adopted by the present invention is as follows:
A transparent object grabbing method comprises the following steps: S1: collecting, with an RGB camera, an RGB image of the whole working area containing the transparent object; S2: taking the RGB image as input and processing it with a generative residual convolutional neural network, which generates a heat map representing the grabbing position of the transparent object and a grabbing-radius heat map, from which the grabbing point is located; S3: acquiring a tactile image at the grabbing point with a gripper with artificial touch mounted at the end of the robot arm, extracting the contour of the transparent object in the tactile image, and judging the position of the transparent object relative to the gripper from the contour until the transparent object has been moved to the center of the gripper; S4: grabbing the transparent object and placing it at a specified position.
Preferably, the perceived transparent object is classified before it is grabbed; this specifically comprises: matching the RGB image with the tactile image and then cropping the RGB image so that the cropped RGB image and the tactile image have the same field of view; and inputting the cropped RGB image and the tactile image into a classification network for object classification.
Preferably, taking the RGB image as input and processing it with the generative residual convolutional neural network specifically comprises: performing feature extraction on the RGB image with a first, a second and a third convolutional layer, generating 32-, 64- and 128-dimensional feature maps respectively; feeding the convolutional output into 5 residual blocks, whose input and output resolutions are equal; and then passing the result through a fourth, a fifth and a sixth convolutional layer to obtain two output maps with the same resolution as the input RGB image, namely the grabbing position and the grabbing radius.
Preferably, the generative residual convolutional neural network uses cross-layer connections: the first convolutional layer is connected across layers to the sixth convolutional layer, the second convolutional layer to the fifth convolutional layer, and the third convolutional layer to the fourth convolutional layer.
Preferably, a fully convolutional neural network is used to extract the contour of the transparent object in the tactile image, comprising the following steps: the fully convolutional neural network detects a mask of the transparent object; the center position of the mask is calculated, and the position of the transparent object relative to the gripper with artificial touch is judged from the center position of the mask.
Preferably, hand-eye calibration of the robot arm and the RGB camera yields the transformation between the robot arm base coordinate system and the coordinate system of the RGB camera; the relation between the robot arm base coordinate system and the robot arm end coordinate system is uniquely determined from the joint angle information of the robot arm through forward kinematics; from these two relations the relation between the coordinate system of the gripper with artificial touch and the RGB camera coordinate system is obtained, and hence the position of the gripper in the RGB image.
Preferably, the classification network comprises 16 convolutional layers and 3 fully connected layers; a set of transparent object classes is predefined, and the type of the transparent object is determined from the input matched RGB image and tactile image.
The present invention further provides a transparent object grabbing system, comprising: an image acquisition unit, used for collecting, with an RGB camera, an RGB image of the whole working area containing the transparent object; a visual positioning unit, used for taking the RGB image as input and processing it with a generative residual convolutional neural network, generating a heat map representing the grabbing position of the transparent object and a grabbing-radius heat map, from which the grabbing point is located; a tactile calibration unit, used for acquiring a tactile image at the grabbing point with a gripper with artificial touch mounted at the end of the robot arm, extracting the contour of the transparent object in the tactile image, and judging the position of the transparent object relative to the gripper from the contour until the transparent object has been moved to the center of the gripper; and a grabbing unit, used for grabbing the perceived transparent object and placing it at a specified position.
Preferably, the system further comprises: a visual-tactile image matching unit, used for matching the RGB image with the tactile image and then cropping the RGB image so that the cropped RGB image and the tactile image have the same field of view; and a visual-tactile fusion classification unit, used for inputting the cropped RGB image and the tactile image into a classification network for object classification.
The invention further provides a computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method as set forth in any one of the above.
The beneficial effects of the invention are as follows: a transparent object grabbing method, system and computer-readable storage medium are provided that detect potential positions of transparent objects under complex conditions (changing backgrounds, changing lighting and underwater environments); after coarse guidance by vision, touch is used for fine positioning, and the transparent object is grabbed by combining vision and touch. Visual-tactile fusion handles the grabbing of transparent objects effectively even in cluttered conditions, and copes well with complex backgrounds, changing lighting and underwater scenes.
In one embodiment of the invention, the perceived transparent objects are classified before they are grabbed.
Drawings
Fig. 1 is a schematic diagram of a method for grabbing a transparent object according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a method for classifying transparent objects according to an embodiment of the present invention.
Fig. 3 is a schematic view of a transparent object grasping system according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of a generative residual convolutional neural network according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a fully convolutional neural network in an embodiment of the present invention.
Fig. 6 is a schematic diagram of a method for extracting a contour of a transparent object in the visual image by using a full convolution neural network according to an embodiment of the present invention.
Fig. 7 is a schematic diagram of determining a moving direction according to the contour of the transparent object in the embodiment of the present invention.
FIG. 8 is a diagram illustrating the movement effect of a haptic gripper according to an embodiment of the invention.
FIG. 9 is a diagram illustrating the overall procedure of the tactile calibration according to an embodiment of the present invention.
Fig. 10(a) -fig. 10(b) are schematic diagrams of coordinate transformation relations in the embodiment of the present invention.
FIG. 11 is a diagram illustrating a visual-tactile image matching method according to an embodiment of the present invention.
Fig. 12 is a schematic diagram of a classification network in an embodiment of the invention.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects to be solved by the embodiments of the present invention more clearly understood, the present invention is further described in detail below with reference to the accompanying drawings and the embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It will be understood that when an element is referred to as being "secured to" or "disposed on" another element, it can be directly on the other element or be indirectly on the other element. When an element is referred to as being "connected to" another element, it can be directly connected to the other element or be indirectly connected to the other element. In addition, the connection may be for either a fixing or a circuit communication.
It is to be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like are used in an orientation or positional relationship indicated in the drawings for convenience in describing the embodiments of the present invention and to simplify the description, and are not intended to indicate or imply that the referenced device or element must have a particular orientation, be constructed in a particular orientation, and be in any way limiting of the present invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present invention, "a plurality" means two or more unless specifically limited otherwise.
Traditional transparent object grabbing schemes generally locate the transparent object by vision alone; however, the visual features of transparent objects are easily affected by the background, illumination and similar conditions, so the accuracy of visual localization is very limited in extreme environments. This patent therefore aims to design a scheme that grabs transparent objects by fusing vision and touch, so that grabbing in complex environments can be accomplished.
As shown in fig. 1, the present invention provides a method for grabbing a transparent object, comprising the following steps:
S1: collecting, with an RGB camera, an RGB image of the whole working area containing the transparent object;
S2: taking the RGB image as input and processing it with a generative residual convolutional neural network, which generates a heat map representing the grabbing position of the transparent object and a grabbing-radius heat map, from which the grabbing point is located;
S3: acquiring a tactile image at the grabbing point with a gripper with artificial touch mounted at the end of the robot arm, extracting the contour of the transparent object in the tactile image, and judging the position of the transparent object relative to the gripper from the contour until the transparent object has been moved to the center of the gripper;
S4: grabbing the transparent object and placing it at a specified position.
Potential positions of the transparent object are detected under complex conditions (changing backgrounds, changing lighting and underwater environments); after coarse guidance by vision, touch is used for fine positioning, and the transparent object is grabbed by combining vision and touch. The visual-tactile fusion scheme handles the grabbing of transparent objects effectively even in cluttered conditions, and copes well with complex backgrounds, changing lighting and underwater scenes.
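For orientation, the four steps could be orchestrated roughly as in the following sketch. The helper objects and function names (camera, arm, gripper, grasp_net, contour_net and so on) are hypothetical placeholders for the units described in the rest of this document, not an implementation disclosed by the patent.
```python
import numpy as np

def center_of_mask(mask):
    """Centroid of a binary mask (row, col)."""
    ys, xs = np.nonzero(mask)
    return ys.mean(), xs.mean()

def grab_transparent_object(camera, arm, gripper, grasp_net, contour_net,
                            center_tol_px=5, max_iters=20):
    # S1: acquire an RGB image covering the whole working area
    rgb = camera.capture()                       # HxWx3 array

    # S2: predict grabbing-position and grabbing-radius heat maps, take the peak
    pos_map, radius_map = grasp_net(rgb)         # both HxW float maps
    v, u = np.unravel_index(np.argmax(pos_map), pos_map.shape)
    radius = radius_map[v, u]
    arm.move_above_pixel(u, v)                   # coarse, vision-guided placement

    # S3: refine with touch until the object sits at the gripper center
    for _ in range(max_iters):
        tactile = gripper.capture_tactile_image()
        mask = contour_net(tactile)              # per-pixel transparent-object mask
        cy, cx = center_of_mask(mask)
        dy = cy - tactile.shape[0] / 2.0
        dx = cx - tactile.shape[1] / 2.0
        if abs(dx) < center_tol_px and abs(dy) < center_tol_px:
            break                                # object centered under the gripper
        arm.translate_in_image_plane(dx, dy)

    # S4: grab the object and place it at the specified position
    gripper.close(radius)
    arm.place_at_target()
```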
In one embodiment of the invention, the perceived transparent objects are classified before they are grabbed.
As shown in fig. 2, the method specifically comprises the following steps:
matching the RGB image with the tactile image, and then cropping the RGB image so that the cropped RGB image and the tactile image have the same field of view;
and inputting the cropped RGB image and the tactile image into a classification network for object classification.
As shown in fig. 3, the present invention also provides a transparent object grabbing system, comprising:
the image acquisition unit is used for acquiring an RGB image of the whole working area containing the transparent object through an RGB camera;
the visual positioning unit is used for taking the RGB image as input and processing it with a generative residual convolutional neural network, generating a heat map representing the grabbing position of the transparent object and a grabbing-radius heat map, from which the grabbing point is located;
the tactile calibration unit is used for acquiring a tactile image at the grabbing point with a gripper with artificial touch mounted at the end of the robot arm, extracting the contour of the transparent object in the tactile image, and judging the position of the transparent object relative to the gripper from the contour until the transparent object has been moved to the center of the gripper;
and the grabbing unit is used for grabbing the perceived transparent object and placing the transparent object at a specified position.
It will be appreciated that transparent objects of unknown kind can be grabbed with the above units alone.
In an embodiment of the invention, transparent objects of known type can be classified before being grabbed, so that they can be sorted and placed accordingly.
As further shown in fig. 3, the system for grasping a transparent object further includes:
the visual-tactile image matching unit is used for matching the RGB image with the tactile image and then cropping the RGB image so that the cropped RGB image and the tactile image have the same field of view;
and the visual-tactile fusion classification unit is used for inputting the cropped RGB image and the tactile image into a classification network for object classification.
The method and system of the present invention are described below unit by unit.
(1) Image acquisition unit
In the image acquisition unit, an RGB camera is used to capture RGB images of the scene; the RGB camera is placed outside the working area so that its field of view covers the entire working area.
(2) Visual positioning unit
The visual positioning unit is mainly used for visually detecting the feasible transparent object grabbing positions.
In the visual positioning unit, the RGB image from the image acquisition unit is first taken as input and processed by the generative residual convolutional neural network, which generates a heat map representing the grabbing position of the transparent object and a heat map representing the grabbing radius. The value of the grabbing-position heat map represents how feasible it is to grab the transparent object at that position: the larger the value of a pixel, the better it is to grab at that position. Each pixel value of the grabbing-radius heat map indicates the grabbing radius at that pixel.
In a specific embodiment, the grabbing position is a specific grabbing point: the center of the flexible gripper should be aligned with the selected grabbing point for grabbing. Since the head of the tactile gripper can be a spherical elastic bag (see the variable-stiffness flexible gripper with artificial touch, CN202111109271.7), the contact area between the elastic bag and the grabbed surface differs at different pressing heights and is roughly circular, like the contact patch of an air-filled ball pressed against a surface. A small object can be fully enclosed without pressing down much, while a large object requires pressing down further to be enclosed; the grabbing radius therefore describes the circular contact patch that results from the pressing height.
Specifically, taking the RGB image as input, processing it with the generative residual convolutional neural network comprises:
performing feature extraction on the RGB image with a first, a second and a third convolutional layer, generating 32-, 64- and 128-dimensional feature maps respectively; feeding the convolutional output into 5 residual blocks, whose input and output resolutions are equal; and then passing the result through a fourth, a fifth and a sixth convolutional layer to obtain two output maps with the same resolution as the input RGB image, namely the grabbing position and the grabbing radius.
Fig. 4 is a schematic diagram of the generative residual convolutional neural network in an embodiment of the present invention, in which the first convolutional layer is convolutional layer 1, the second is convolutional layer 2, the third is convolutional layer 3, the fourth is convolutional layer 4, the fifth is convolutional layer 5, and the sixth is convolutional layer 6. The network outputs two maps with the same resolution as the input RGB image, namely the grabbing position and the grabbing radius, and together they determine the final grabbing plan: the peak of the grabbing-position map is selected as the grabbing position, and the value at the same location in the grabbing-radius map is used as the grabbing radius.
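As a small concrete example of this peak-selection step, assuming the two heat maps are NumPy arrays of the same resolution as the input image, the grabbing point and radius could be read out as follows.
```python
import numpy as np

def select_grasp(pos_heatmap, radius_heatmap):
    """Return ((u, v), radius): the pixel with the highest graspability score
    and the grabbing radius predicted at that same pixel."""
    v, u = np.unravel_index(np.argmax(pos_heatmap), pos_heatmap.shape)
    return (int(u), int(v)), float(radius_heatmap[v, u])

# Example: pos_map, radius_map = net(rgb); (u, v), r = select_grasp(pos_map, radius_map)
```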
In an embodiment of the present invention, the generative residual convolutional neural network uses cross-layer connections: the first convolutional layer is connected across layers to the sixth, the second to the fifth, and the third to the fourth.
Continuing with fig. 4, convolutional layer 1 is connected across layers to convolutional layer 6, convolutional layer 2 to convolutional layer 5, and convolutional layer 3 to convolutional layer 4. A cross-layer connection feeds the information from the first half of the network directly, without further processing, into the second half, which helps the network fuse multi-scale information and yields a more stable estimate of the grabbing position.
The invention combines cross-layer connections with a generative residual convolutional neural network and applies them effectively to the detection of transparent objects.
The invention thus provides a transparent object grabbing-point localization algorithm whose main body is a generative residual convolutional neural network: the network input is an RGB picture, the network outputs are the grabbing-position heat map and the grabbing-radius heat map, and the cross-layer connections in the network realize multi-scale information fusion and give a more stable localization result.
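A rough PyTorch sketch of such a network is given below. Only the 32/64/128 channel widths, the five residual blocks, the two output maps and the three cross-layer connections come from the description above; kernel sizes, strides and the use of transposed convolutions in the decoder are assumptions.
```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
        )
    def forward(self, x):
        return torch.relu(x + self.body(x))       # input and output resolution are equal

class GenerativeResidualNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(3, 32, 9, stride=1, padding=4), nn.ReLU(inplace=True))
        self.conv2 = nn.Sequential(nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True))
        self.conv3 = nn.Sequential(nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True))
        self.res_blocks = nn.Sequential(*[ResidualBlock(128) for _ in range(5)])
        # decoder mirrors the encoder and restores the input resolution step by step
        self.conv4 = nn.Sequential(nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True))
        self.conv5 = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True))
        self.conv6 = nn.Conv2d(32, 32, 9, stride=1, padding=4)
        # two heads: grabbing-position heat map and grabbing-radius heat map
        self.pos_head = nn.Conv2d(32, 1, 1)
        self.radius_head = nn.Conv2d(32, 1, 1)

    def forward(self, rgb):
        f1 = self.conv1(rgb)                       # 32-d features, full resolution
        f2 = self.conv2(f1)                        # 64-d features
        f3 = self.conv3(f2)                        # 128-d features
        r = self.res_blocks(f3)
        d4 = self.conv4(r + f3)                    # cross-layer: conv3 -> conv4
        d5 = self.conv5(d4 + f2)                   # cross-layer: conv2 -> conv5
        d6 = self.conv6(d5 + f1)                   # cross-layer: conv1 -> conv6
        return self.pos_head(d6), self.radius_head(d6)
```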
(3) Tactile calibration unit
In the tactile calibration unit, a tactile gripper mounted at the end of the robot arm is used to acquire a tactile image at the grabbing point. The tactile gripper used in the invention is the variable-stiffness flexible gripper with artificial touch described in CN202111109271.7. Its filling medium is a transparent liquid, and its solid particles are transparent particles that become invisible when immersed in the transparent liquid. The tactile gripper realizes artificial touch with a camera: when the soft bag contacts an external object it deforms, the deformation reveals the shape of the object, and the camera captures this deformation to generate a tactile image. During grabbing, the filling medium is pumped out of the soft bag, so that the soft bag and the solid particles inside it gradually wrap the object to be grabbed; the bag thus deforms adaptively to the shape of the object and the object can be grabbed.
Because the initial placement of the tactile gripper is guided only by visual detection, and the appearance of the transparent object is easily disturbed by the external environment, visual localization alone cannot reliably bring the tactile gripper exactly over the center of the object to be grabbed. To move the gripper to the exact center of the transparent object for grabbing, the position of the object relative to the tactile gripper must be determined. The invention therefore uses an FCN (Fully Convolutional Network) to perform edge detection on the part of the transparent object under the tactile gripper, extracts the contour of the transparent object in the tactile image, and judges the position of the object relative to the gripper from this contour. If the transparent object is not at the center of the tactile gripper, the robot arm keeps moving until the object has been moved to the center of the gripper.
As shown in fig. 5, the FCN is composed entirely of convolutional layers, accepts an input image of any size, and upsamples the feature map of the last convolutional layer back to the size of the input image, so that a prediction is generated for each pixel while the spatial information of the original input image is preserved.
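A minimal fully convolutional segmenter in this spirit might look as follows; the number of layers and channels is an assumption, and only the all-convolutional structure with upsampling back to the input resolution reflects the description above.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFCN(nn.Module):
    """Toy FCN: convolution + pooling encoder, then upsampling back to the
    input resolution so that every pixel gets a transparent/background score."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.classifier = nn.Conv2d(128, num_classes, 1)   # 1x1 conv, no FC layers

    def forward(self, x):
        h, w = x.shape[-2:]
        feat = self.encoder(x)
        logits = self.classifier(feat)
        # upsample the coarse prediction back to the input size (per-pixel output)
        return F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)
```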
As shown in fig. 6, extracting the contour of the transparent object in the tactile image with the fully convolutional neural network comprises the following steps:
the fully convolutional neural network detects a mask of the transparent object;
the center position of the mask is calculated, and the position of the transparent object relative to the gripper with artificial touch is judged from the center position of the mask.
Fig. 7 is a schematic diagram of determining the moving direction from the contour of the transparent object. Specifically, the tactile image collected by the tactile gripper is fed into the FCN; after convolution, pooling and deconvolution, pixel-level edge detection is performed on the transparent object in the tactile image to obtain its boundary. The resulting boundary determines the position of the object relative to the tactile gripper, and the gripper can then be driven toward the center of the object.
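For illustration, given a binary mask of the transparent object in the tactile image, the offset of the mask center from the image center gives the direction in which to move the gripper; the tolerance threshold below is a hypothetical value.
```python
import numpy as np

def gripper_move_direction(mask, tol_px=5):
    """mask: HxW binary array from the FCN. Returns (dx, dy) in pixels,
    or None when the object is already centered under the tactile gripper."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None                          # nothing detected under the gripper
    cy, cx = ys.mean(), xs.mean()            # center of the detected mask/contour
    dy = cy - mask.shape[0] / 2.0
    dx = cx - mask.shape[1] / 2.0
    if abs(dx) < tol_px and abs(dy) < tol_px:
        return None                          # already at the gripper center
    return dx, dy                            # drive the arm to reduce this offset
```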
Fig. 8 shows the moving effect of the tactile gripper according to the present invention. It can be seen that when the movement is complete, the center of the transparent object lies at the center of the tactile gripper.
Fig. 9 is a schematic diagram of the overall procedure of the tactile calibration according to the present invention.
The tactile calibration unit uses touch as the guiding information, obtains the boundary of the transparent object through the FCN, and gradually moves the tactile gripper to the center of the object, which effectively counteracts the influence of external disturbances on the visual localization of the transparent object.
(4) Visual-tactile image matching unit
In the visual-tactile image matching unit, the field of view of the RGB camera is large, so it covers the whole workspace, while the field of view of the tactile image is small and covers only the space under the tactile gripper. The RGB image therefore needs to be cropped to match the tactile image: the position of the gripper in the RGB image is determined from the spatial position of the gripper, and the RGB image is cropped so that the cropped RGB image and the tactile image cover the same range.
Specifically, in one embodiment of the present invention, since the gripper is mounted at the end of the robot arm, the positions of the six joints of the robot arm can be read in real time, so the spatial position of the end of the robot arm is uniquely determined; using the well-established forward kinematics of the robot arm, the spatial position of the gripper at the end of the arm can be obtained accurately every time the arm moves.
Because the tactile gripper is mounted at the end of the robot arm, the transformation between the robot arm base coordinate system and the RGB camera coordinate system can be obtained by hand-eye calibration of the robot arm and the RGB camera. The scheme is as follows:
As shown in fig. 10(a) and 10(b), a checkerboard calibration plate is rigidly fixed on the wrist just below the tactile gripper. Denote the robot arm base coordinate system by {B}, the end (tactile gripper) coordinate system by {E}, the calibration plate coordinate system by {K} and the camera coordinate system by {C}. The transformation chain is then
$${}^{B}_{C}T = {}^{B}_{E}T \, {}^{E}_{K}T \, {}^{K}_{C}T$$
where the transformation matrix ${}^{B}_{E}T$ (from the robot arm base coordinate system to the coordinate system of the end tactile gripper) can be solved by the forward kinematics of the robot arm. Meanwhile, since the intrinsic calibration of the camera has already been completed, the transformation matrix ${}^{K}_{C}T$ (from the calibration plate coordinate system to the camera coordinate system) can be obtained from the images captured by the camera and solved directly, for example in MATLAB. Therefore only ${}^{E}_{K}T$ (from the coordinate system of the end tactile gripper to the calibration plate coordinate system) remains to be solved in order to completely determine the matrix ${}^{B}_{C}T$ (from the robot base coordinate system to the RGB camera coordinate system).
To solve the transformation matrix ${}^{E}_{K}T$, the robot arm is moved to a second, different pose, and the camera again captures an image of the calibration plate as in the first step. Comparing the two poses, the transformation matrices satisfy
$${}^{B}_{E}T_{1} \, {}^{E}_{K}T \, {}^{K}_{C}T_{1} = {}^{B}_{E}T_{2} \, {}^{E}_{K}T \, {}^{K}_{C}T_{2}$$
where subscripts 1 and 2 refer to the two poses: quantities measured in the first pose carry subscript 1 and those in the second pose carry subscript 2. The relation holds because the relative pose of the robot base coordinate system {B} and the camera coordinate system {C} does not change (the RGB camera is not moved, and neither is the robot base), and because ${}^{E}_{K}T$ is fixed, the calibration plate being rigidly attached below the gripper. Rearranging the formula gives
$$\left({}^{B}_{E}T_{2}\right)^{-1} {}^{B}_{E}T_{1} \, {}^{E}_{K}T = {}^{E}_{K}T \, {}^{K}_{C}T_{2} \left({}^{K}_{C}T_{1}\right)^{-1}$$
Let
$$A = \left({}^{B}_{E}T_{2}\right)^{-1} {}^{B}_{E}T_{1}, \qquad B = {}^{K}_{C}T_{2} \left({}^{K}_{C}T_{1}\right)^{-1}, \qquad X = {}^{E}_{K}T$$
so that the formula can be expressed as
$$AX = XB$$
The required transformation is then obtained by solving for the matrix X, which completes the hand-eye calibration; X can be solved with existing prior-art schemes.
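For illustration, the AX = XB problem above can be solved with OpenCV's cv2.calibrateHandEye once several robot poses and calibration-plate poses have been collected. OpenCV's native convention is the eye-in-hand case (camera mounted on the gripper); the sketch below uses the common eye-to-hand adaptation of inverting the robot poses, which is an assumption about the setup rather than the patent's own procedure, and the helper's argument layout is likewise illustrative.
```python
import cv2
import numpy as np

def hand_eye_calibrate(T_base_gripper_list, T_target_cam_list):
    """T_base_gripper_list: 4x4 gripper poses in the robot base frame
    (from forward kinematics), one per calibration shot.
    T_target_cam_list: 4x4 checkerboard poses in the camera frame
    (from intrinsic calibration + PnP), one per shot."""
    R_g, t_g, R_t, t_t = [], [], [], []
    for T_bg, T_tc in zip(T_base_gripper_list, T_target_cam_list):
        T_gb = np.linalg.inv(T_bg)               # base->gripper (eye-to-hand trick)
        R_g.append(T_gb[:3, :3]); t_g.append(T_gb[:3, 3])
        R_t.append(T_tc[:3, :3]); t_t.append(T_tc[:3, 3])
    R_x, t_x = cv2.calibrateHandEye(R_g, t_g, R_t, t_t,
                                    method=cv2.CALIB_HAND_EYE_TSAI)
    X = np.eye(4)
    X[:3, :3], X[:3, 3] = R_x, t_x.ravel()
    return X        # with the inverted inputs above: camera pose in the base frame
```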
Fig. 11 is a schematic diagram of the visual-tactile image matching method according to the present invention. In a specific embodiment, after the hand-eye calibration is completed, the relation between the robot arm base coordinate system and the RGB camera coordinate system is fully determined; the relation between the robot arm base coordinate system and the robot arm end coordinate system (i.e. the coordinate system of the tactile gripper) is uniquely determined by the forward kinematics of the robot arm once the joint angles are read. From these two relations the relation between the tactile gripper coordinate system and the RGB camera coordinate system is obtained directly, and hence the position of the tactile gripper in the RGB image.
The visual-tactile image matching unit obtains the relative pose of the RGB camera and the tactile gripper at the end of the robot arm through hand-eye calibration, locates the tactile gripper in the RGB image, and produces a cropped RGB image corresponding to the tactile image, which effectively resolves the mismatch between the fields of view of the RGB image and the tactile image.
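Once the base-to-camera transform from the hand-eye calibration and the forward-kinematics pose of the gripper are available, the gripper center can be projected into the RGB image with the camera intrinsics and a window cropped around it. The sketch below assumes a pinhole camera model and hypothetical variable names (T_base_cam, T_base_gripper, K); in practice the crop size would be chosen to match the tactile field of view.
```python
import numpy as np

def crop_rgb_around_gripper(rgb, T_base_cam, T_base_gripper, K, crop_size=224):
    """Project the tactile gripper's origin into the RGB image and crop a
    window around it so that the crop and the tactile image share a field
    of view. T_base_cam: camera pose in the base frame; T_base_gripper:
    gripper pose from forward kinematics; K: 3x3 camera intrinsics."""
    # gripper position expressed in the camera frame
    T_cam_base = np.linalg.inv(T_base_cam)
    p_cam = T_cam_base @ T_base_gripper @ np.array([0.0, 0.0, 0.0, 1.0])
    # pinhole projection to pixel coordinates
    uh, vh, zh = K @ p_cam[:3]
    u, v = int(round(uh / zh)), int(round(vh / zh))
    # clamp the crop window to the image borders
    h, w = rgb.shape[:2]
    half = crop_size // 2
    x0, y0 = max(0, u - half), max(0, v - half)
    x1, y1 = min(w, u + half), min(h, v + half)
    return rgb[y0:y1, x0:x1]
```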
(5) Visual-tactile fusion classification unit
In the visual-tactile fusion classification unit, the cropped RGB image and the tactile image are fed together into a classification network for object classification; after classification, the object is grabbed with the tactile gripper and placed at the specified position.
The classification function of the invention amounts to sorting, i.e. putting objects of different types into different containers; whether or not an object is classified does not affect the grabbing itself.
The classification described above is limited to known, classifiable objects; objects of unknown class, such as broken, irregular glass fragments, are grabbed directly without identification, in which case the process passes directly from the tactile calibration unit to the grabbing unit.
As shown in fig. 12, after the visual-tactile image matching unit has produced the matched tactile image and the cropped RGB image, the two are stacked and fed into the classification network, which comprises 16 convolutional layers and 3 fully connected layers. Since a set of transparent object classes is defined in advance, the network outputs the determined object type for the given input.
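As a concrete illustration of such a network, a VGG-19-style model with 16 convolutional layers and 3 fully connected layers, taking the cropped RGB image stacked with the tactile image as a 6-channel input, might be sketched as follows; the channel widths, the 224x224 input resolution and the number of classes are assumptions, since only the layer counts are fixed by the description.
```python
import torch
import torch.nn as nn

def vgg_block(in_ch, out_ch, n_convs):
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))
    return layers

class VisuoTactileClassifier(nn.Module):
    """16 convolutional layers + 3 fully connected layers; the input is the
    cropped RGB image stacked with the tactile image (6 channels in total)."""
    def __init__(self, num_classes=10, input_size=224):
        super().__init__()
        self.features = nn.Sequential(
            *vgg_block(6, 64, 2), *vgg_block(64, 128, 2),
            *vgg_block(128, 256, 4), *vgg_block(256, 512, 4), *vgg_block(512, 512, 4),
        )
        feat = input_size // 32                      # five 2x poolings
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * feat * feat, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, num_classes),
        )

    def forward(self, rgb_crop, tactile):
        x = torch.cat([rgb_crop, tactile], dim=1)    # stack along the channel axis
        return self.classifier(self.features(x))
```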
The visual-tactile fusion classification unit takes the cropped RGB image and the tactile image as input and obtains the classification category after network processing; by fusing visual and tactile information it effectively addresses the fact that transparent objects are strongly affected by the external environment and are difficult to detect and identify.
(6) Grabbing unit
In the grasping unit, the sensed transparent object is grasped and placed at a designated position.
A classifiable transparent object is grabbed after it has been classified. The invention adopts the variable-stiffness flexible gripper with artificial touch: when the filling medium is pumped out of the gripper's soft bag, the flexible gripper wraps around the transparent object and the grab is completed. For a non-classifiable transparent object, once the tactile calibration is finished, visual-tactile image matching and classification are not needed and the object can be grabbed directly.
In a home environment, transparent objects are everywhere, and the environments they sit in are comparatively complex and changeable, for example a water cup on a dining table or in a dish-washing sink. Using the classification mode, such cups can be sorted and placed at the specified positions. With the prior-art methods, besides the type information of the object, a CAD model of the object is needed; scanning a CAD model is expensive, and the scanned model must be completely consistent with the object (in shape and color) to be usable. The prior-art methods therefore cannot operate, for the following reasons:
(1) the CAD model cannot be obtained, or the obtained CAD model differs from the object in shape or color;
(2) the background is too cluttered and changeable, and the lighting is complex and variable;
(3) a CAD model can be acquired, but at too high a cost.
So the three prior-art methods cannot classify the object at all and therefore cannot accomplish this task.
With the scheme provided by the invention, the object is located by vision and then explored by touch; the fusion of vision and touch allows accurate object classification and grabbing, with a classification accuracy above 95%.
For grabbing unknown objects, such as a cup that has fallen to the ground and shattered into pieces, the prior-art methods either cannot be used at all, because for methods that require known objects the broken pieces have random sizes and shapes, or they can cope only with a single, plain background and cannot handle cluttered backgrounds and complex lighting; moreover, the prior-art schemes have no feedback loop and cannot detect whether the center of the object has been grabbed.
The method provided by the invention can grab under complex backgrounds and complex lighting and can grab unknown objects, because it first localizes by vision and then explores by touch.
In conclusion, the existing schemes cannot handle the above scenarios, whereas the scheme of the present invention can complete these tasks effectively and successfully.
An embodiment of the present application further provides a control apparatus, including a processor and a storage medium for storing a computer program; wherein a processor is adapted to perform at least the method as described above when executing said computer program.
Embodiments of the present application also provide a storage medium for storing a computer program, which when executed performs at least the method described above.
Embodiments of the present application further provide a processor, where the processor executes a computer program to perform at least the method described above.
The storage medium may be implemented by any type of volatile or non-volatile storage device, or a combination thereof. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferroelectric Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be disk storage or tape storage. The volatile memory may be a Random Access Memory (RAM), which serves as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM) and Direct Rambus Random Access Memory (DRRAM). The storage media described in connection with the embodiments of the invention are intended to comprise, without being limited to, these and any other suitable types of memory.
In the several embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only one logical function division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps of implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer-readable storage medium, and when executed, executes the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as a removable Memory device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention or portions thereof contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media capable of storing program code.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to arrive at new method embodiments.
Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict.
The features disclosed in the several method or apparatus embodiments provided in the present application may be combined arbitrarily, without conflict, to arrive at new method embodiments or apparatus embodiments.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments, and the invention is not to be considered limited to these specific details. For those skilled in the art to which the invention pertains, several equivalent substitutions or obvious modifications with the same performance or use can be made without departing from the spirit of the invention, and all of them are considered to fall within the scope of protection of the invention.

Claims (10)

1. A transparent object grabbing method is characterized by comprising the following steps:
S1: collecting an RGB image of the whole working area containing the transparent object by an RGB camera;
S2: taking the RGB image as input and processing it with a generative residual convolutional neural network, which generates a heat map representing the grabbing position of the transparent object and a grabbing-radius heat map, from which the grabbing point is located;
S3: acquiring a tactile image at the grabbing point with a gripper with artificial touch mounted at the end of the robot arm, extracting the contour of the transparent object in the tactile image, and judging the position of the transparent object relative to the gripper with artificial touch from the contour until the transparent object has been moved to the center of the gripper with artificial touch;
S4: grabbing the transparent object and placing it at a specified position.
2. The transparent object grabbing method according to claim 1, wherein the perceived transparent object is classified before it is grabbed; this specifically comprises:
matching the RGB image with the tactile image, and then cropping the RGB image so that the cropped RGB image and the tactile image have the same field of view;
and inputting the cropped RGB image and the tactile image into a classification network for object classification.
3. The transparent object grabbing method according to claim 1, wherein taking the RGB image as input and processing it with the generative residual convolutional neural network specifically comprises:
performing feature extraction on the RGB image with a first, a second and a third convolutional layer, generating 32-, 64- and 128-dimensional feature maps respectively; feeding the convolutional output into 5 residual blocks, whose input and output resolutions are equal; and then passing the result through a fourth, a fifth and a sixth convolutional layer to obtain two output maps with the same resolution as the input RGB image, namely the grabbing position and the grabbing radius.
4. The transparent object grabbing method according to claim 3, wherein the generative residual convolutional neural network uses cross-layer connections: the first convolutional layer is connected across layers to the sixth convolutional layer, the second convolutional layer to the fifth convolutional layer, and the third convolutional layer to the fourth convolutional layer.
5. The transparent object grabbing method according to claim 1, wherein a fully convolutional neural network is used to extract the contour of the transparent object in the tactile image, comprising the following steps:
the fully convolutional neural network detects a mask of the transparent object;
the center position of the mask is calculated, and the position of the transparent object relative to the gripper with artificial touch is judged from the center position of the mask.
6. The transparent object grabbing method according to claim 1, wherein hand-eye calibration of the robot arm and the RGB camera yields the transformation between the robot arm base coordinate system and the coordinate system of the RGB camera; the relation between the robot arm base coordinate system and the robot arm end coordinate system is uniquely determined from the joint angle information of the robot arm through forward kinematics; from these relations the relation between the coordinate system of the gripper with artificial touch and the RGB camera coordinate system is obtained, and hence the position of the gripper with artificial touch in the RGB image.
7. The transparent object grabbing method according to claim 2, wherein the classification network comprises 16 convolutional layers and 3 fully connected layers;
and a set of transparent object classes is predefined, and the type of the transparent object is determined from the input matched RGB image and tactile image.
8. A transparent object grasping system, comprising:
The image acquisition unit is used for acquiring an RGB image of the whole working area containing the transparent object through an RGB camera;
the visual positioning unit is used for taking the RGB image as input and processing it with a generative residual convolutional neural network, generating a heat map representing the grabbing position of the transparent object and a grabbing-radius heat map, from which the grabbing point is located;
the tactile calibration unit is used for acquiring a tactile image at the grabbing point with a gripper with artificial touch mounted at the end of the robot arm, extracting the contour of the transparent object in the tactile image, and judging the position of the transparent object relative to the gripper with artificial touch from the contour until the transparent object has been moved to the center of the gripper with artificial touch;
and the grabbing unit is used for grabbing the perceived transparent object and placing the transparent object at a specified position.
9. The transparent object grasping system according to claim 8, further comprising:
the visual-tactile image matching unit is used for matching the RGB image with the tactile image and then cropping the RGB image so that the cropped RGB image and the tactile image have the same field of view;
and the visual-tactile fusion classification unit is used for inputting the cropped RGB image and the tactile image into a classification network for object classification.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202210576645.4A 2022-05-25 2022-05-25 Transparent object grabbing method, transparent object grabbing system and computer readable storage medium Active CN114750164B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210576645.4A CN114750164B (en) 2022-05-25 2022-05-25 Transparent object grabbing method, transparent object grabbing system and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210576645.4A CN114750164B (en) 2022-05-25 2022-05-25 Transparent object grabbing method, transparent object grabbing system and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN114750164A true CN114750164A (en) 2022-07-15
CN114750164B CN114750164B (en) 2023-06-20

Family

ID=82335805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210576645.4A Active CN114750164B (en) 2022-05-25 2022-05-25 Transparent object grabbing method, transparent object grabbing system and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN114750164B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116673962A (en) * 2023-07-12 2023-09-01 安徽大学 Intelligent mechanical arm grabbing method and system based on FasterR-CNN and GRCNN

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112045676A (en) * 2020-07-31 2020-12-08 广州中国科学院先进技术研究所 Method for grabbing transparent object by robot based on deep learning
CN112509046A (en) * 2020-12-10 2021-03-16 电子科技大学 Weak supervision convolutional neural network image target positioning method
CN112733044A (en) * 2021-03-30 2021-04-30 腾讯科技(深圳)有限公司 Recommended image processing method, apparatus, device and computer-readable storage medium
CN112801937A (en) * 2020-12-29 2021-05-14 彩虹(合肥)液晶玻璃有限公司 Method, device and system for detecting warping defect of glass substrate
CN113696186A (en) * 2021-10-09 2021-11-26 东南大学 Mechanical arm autonomous moving and grabbing method based on visual-touch fusion under complex illumination condition
CN113888631A (en) * 2021-08-31 2022-01-04 华南理工大学 Designated object grabbing method based on target cutting area

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112045676A (en) * 2020-07-31 2020-12-08 广州中国科学院先进技术研究所 Method for grabbing transparent object by robot based on deep learning
CN112509046A (en) * 2020-12-10 2021-03-16 电子科技大学 Weak supervision convolutional neural network image target positioning method
CN112801937A (en) * 2020-12-29 2021-05-14 彩虹(合肥)液晶玻璃有限公司 Method, device and system for detecting warping defect of glass substrate
CN112733044A (en) * 2021-03-30 2021-04-30 腾讯科技(深圳)有限公司 Recommended image processing method, apparatus, device and computer-readable storage medium
CN113888631A (en) * 2021-08-31 2022-01-04 华南理工大学 Designated object grabbing method based on target cutting area
CN113696186A (en) * 2021-10-09 2021-11-26 东南大学 Mechanical arm autonomous moving and grabbing method based on visual-touch fusion under complex illumination condition

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Cui Shaowei; Wei Junhang; Wang Rui; Wang Shuo: "Robot grasping slip detection based on visual-tactile fusion", Journal of Huazhong University of Science and Technology (Natural Science Edition) *
Zhang Tao, Chen Zhang, et al.: "Review and prospects of key technologies for space robot teleoperation", Aerospace Control and Application *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116673962A (en) * 2023-07-12 2023-09-01 安徽大学 Intelligent mechanical arm grabbing method and system based on FasterR-CNN and GRCNN
CN116673962B (en) * 2023-07-12 2024-03-19 安徽大学 Intelligent mechanical arm grabbing method and system based on Faster R-CNN and GRCNN

Also Published As

Publication number Publication date
CN114750164B (en) 2023-06-20

Similar Documents

Publication Publication Date Title
CN112476434B (en) Visual 3D pick-and-place method and system based on cooperative robot
CN109255813B (en) Man-machine cooperation oriented hand-held object pose real-time detection method
CN111055279B (en) Multi-mode object grabbing method and system based on combination of touch sense and vision
JP7071054B2 (en) Information processing equipment, information processing methods and programs
US8488888B2 (en) Classification of posture states
EP3710978A1 (en) Pose estimation and model retrieval for objects in images
Taylor et al. Visual perception and robotic manipulation: 3D object recognition, tracking and hand-eye coordination
WO2022017022A1 (en) Spatial positioning method, related apparatus and navigation stick
CN109359514A (en) A kind of gesture tracking identification federation policies method towards deskVR
KR101256046B1 (en) Method and system for body tracking for spatial gesture recognition
JP2016014954A (en) Method for detecting finger shape, program thereof, storage medium of program thereof, and system for detecting finger shape
CN111966217A (en) Unmanned aerial vehicle control method and system based on gestures and eye movements
CN114750164A (en) Transparent object grabbing method and system and computer readable storage medium
CN115816460A (en) Manipulator grabbing method based on deep learning target detection and image segmentation
Allen Mapping haptic exploratory procedures to multiple shape representations
Pang et al. Towards safe human-to-robot handovers of unknown containers
Xompero et al. Multi-view shape estimation of transparent containers
Sun et al. Autonomous clothes manipulation using a hierarchical vision architecture
Shen et al. A multi-view camera-projector system for object detection and robot-human feedback
JP2000149025A (en) Gesture recognizing device and method thereof
Ude et al. Integrating surface-based hypotheses and manipulation for autonomous segmentation and learning of object representations
Xie et al. Three-dimensional object recognition system for enhancing the intelligence of a KUKA robot
TWI763453B (en) Control method and system for picking equipment and automatic picking system
Sun et al. Robot Vision Architecture for Autonomous Clothes Manipulation
Bevec et al. Pushing and grasping for autonomous learning of object models with foveated vision

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant