CN112802107A - Robot-based control method and device for clamp group - Google Patents

Robot-based control method and device for clamp group

Info

Publication number
CN112802107A
Authority
CN
China
Prior art keywords
graspable
region
clamp
determining
robot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110167302.8A
Other languages
Chinese (zh)
Inventor
段文杰
夏冬青
丁有爽
邵天兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mech Mind Robotics Technologies Co Ltd
Original Assignee
Mech Mind Robotics Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mech Mind Robotics Technologies Co Ltd
Priority to CN202110167302.8A
Publication of CN112802107A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/74 Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 13/00 Controls for manipulators
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00 Programme-controlled manipulators
    • B25J 9/16 Programme controls
    • B25J 9/1602 Programme controls characterised by the control system, structure, architecture
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00 Programme-controlled manipulators
    • B25J 9/16 Programme controls
    • B25J 9/1602 Programme controls characterised by the control system, structure, architecture
    • B25J 9/161 Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00 Programme-controlled manipulators
    • B25J 9/16 Programme controls
    • B25J 9/1656 Programme controls characterised by programming, planning systems for manipulators
    • B25J 9/1664 Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • G06T 1/0014 Image feed-back for automatic industrial control, e.g. robot with camera
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses a robot-based control method and device for a clamp group. The method comprises the following steps: acquiring a two-dimensional color image corresponding to a three-dimensional article region and a depth image corresponding to the two-dimensional color image along a preset depth direction; inputting the two-dimensional color image and the depth image into a deep learning model, and predicting a graspable region contained in the two-dimensional color image according to the output result; determining the geometric features of the graspable region according to the contour line of the graspable region, and determining the number of clamps corresponding to the graspable region and the clamp identifier of each clamp according to the geometric features; and controlling the robot to call each clamp in the clamp group corresponding to a clamp identifier to execute the grasping operation. The method determines the geometric features of the graspable region from its contour line, and thereby determines the number of clamps corresponding to the graspable region and the clamp identifier of each clamp. Determining the number of clamps and the clamp identifiers from the geometric features can remarkably improve the reliability of clamp grasping.

Description

Robot-based control method and device for clamp group
Technical Field
The invention relates to the technical field of manipulator control, in particular to a control method and a control device of a clamp group based on an intelligent programmed robot.
Background
At present, with the wide popularization of intelligent program-controlled robots, more and more articles can be grasped and transported by means of such robots. For example, logistics packages can be grasped by an intelligent program-controlled robot, which greatly improves grasping efficiency. In order to improve grasping efficiency and flexibly adapt to various article objects, an intelligent program-controlled robot is usually provided with a clamp group consisting of a plurality of clamps, so that different clamps in the clamp group can be flexibly selected for different article objects.
However, conventional intelligent robots can only be used to grasp known article objects. For example, when characteristics such as the placement position, size and type of the article are known in advance, targeted grasping is performed by a clamp matching that size and type. However, for an arbitrary unknown article, since information such as the shape and position of the article cannot be determined in advance, accurate grasping cannot be performed, and it cannot be determined in advance which clamps should be called to perform the grasping.
Disclosure of Invention
In view of the above, the present invention has been made to provide a method and apparatus for controlling a robot-based gripper set that overcomes or at least partially solves the above-mentioned problems.
According to an aspect of the present invention, there is provided a robot-based jig set control method including:
acquiring a two-dimensional color image corresponding to a three-dimensional object area and a depth image corresponding to the two-dimensional color image along a preset depth direction;
inputting the two-dimensional color image and the depth image into a deep learning model, and predicting a graspable area contained in the two-dimensional color image according to an output result;
determining the geometric characteristics of the grippable region according to the contour line of the grippable region, and determining the number of the clamps corresponding to the grippable region and the clamp identification of each clamp according to the geometric characteristics of the grippable region;
and controlling the robot to call each clamp corresponding to the clamp identification in the clamp group to execute the grabbing operation.
Optionally, the determining the geometric features of the graspable region according to the contour line of the graspable region includes:
and drawing a maximum inscribed circle, a minimum inscribed circle, a maximum inscribed rectangle and/or a minimum inscribed rectangle of the graspable region according to the contour line of the graspable region.
Optionally, the determining, according to the geometric features of the graspable region, the number of clips corresponding to the graspable region and the clip identifier of each clip includes:
acquiring the radius of the maximum inscribed circle of the grippable region and/or acquiring the length of a main shaft of the grippable region;
and comparing the radius with a first radius threshold value, and/or comparing the length of the spindle with a spindle threshold value, and determining the number of the clamps corresponding to the grippable region and the clamp identification of each clamp according to the comparison result.
Optionally, the determining, according to the geometric features of the graspable region, the number of clips corresponding to the graspable region and the clip identifier of each clip includes:
acquiring the radius of a maximum inscribed circle of the graspable area and/or acquiring the length of a maximum inscribed rectangle of the graspable area;
and comparing the radius with a second radius threshold value, and/or comparing the length of the maximum inscribed rectangle with a first length threshold value, and determining the number of clamps corresponding to the graspable region and the clamp identifier of each clamp according to the comparison result.
Optionally, the determining, according to the geometric features of the graspable region, the number of clips corresponding to the graspable region and the clip identifier of each clip includes:
acquiring the width of the maximum inscribed rectangle of the graspable region and/or acquiring the length of the maximum inscribed rectangle of the graspable region;
and comparing the width with a width threshold value, and/or comparing the length of the maximum inscribed rectangle with a second length threshold value, and determining the number of the clamps corresponding to the graspable region and the clamp identification of each clamp according to the comparison result.
Optionally, the controlling the robot to call each fixture included in the fixture group and corresponding to the fixture identifier to perform the grabbing operation includes:
and acquiring a conversion relation between a camera coordinate system and a robot coordinate system, converting the position information of the graspable area corresponding to the camera coordinate system into the robot coordinate system according to the conversion relation, and outputting the converted position information of each graspable object and the jig identifier to the robot.
Optionally, the clamp group includes: a plurality of suction cup clamps.
According to yet another aspect of the present invention, there is also provided a control apparatus of a robot-based gripper set, comprising:
the acquisition module is suitable for acquiring a two-dimensional color image corresponding to a three-dimensional object area and a depth image corresponding to the two-dimensional color image along a preset depth direction;
the prediction module is suitable for inputting the two-dimensional color image and the depth image into a deep learning model and predicting a graspable area contained in the two-dimensional color image according to an output result;
the determining module is suitable for determining the geometric characteristics of the grippable region according to the contour line of the grippable region, and determining the number of the clamps corresponding to the grippable region and the clamp identification of each clamp according to the geometric characteristics of the grippable region;
and the control module is suitable for controlling the robot to call each clamp which is contained in the clamp group and corresponds to the clamp identification to execute the grabbing operation.
Optionally, the determining module is specifically adapted to:
and drawing a maximum inscribed circle, a minimum inscribed circle, a maximum inscribed rectangle and/or a minimum inscribed rectangle of the graspable region according to the contour line of the graspable region.
Optionally, the determining module is specifically adapted to:
acquiring the radius of the maximum inscribed circle of the grippable region and/or acquiring the length of a main shaft of the grippable region;
and comparing the radius with a first radius threshold value, and/or comparing the length of the spindle with a spindle threshold value, and determining the number of the clamps corresponding to the grippable region and the clamp identification of each clamp according to the comparison result.
Optionally, the determining module is specifically adapted to:
acquiring the radius of a maximum inscribed circle of the graspable area and/or acquiring the length of a maximum inscribed rectangle of the graspable area;
and comparing the radius with a second radius threshold value, and/or comparing the length of the maximum inscribed rectangle with a first length threshold value, and determining the number of clamps corresponding to the graspable region and the clamp identifier of each clamp according to the comparison result.
Optionally, the determining module is specifically adapted to:
acquiring the width of the maximum inscribed rectangle of the graspable region and/or acquiring the length of the maximum inscribed rectangle of the graspable region;
and comparing the width with a width threshold value, and/or comparing the length of the maximum inscribed rectangle with a second length threshold value, and determining the number of the clamps corresponding to the graspable region and the clamp identification of each clamp according to the comparison result.
Optionally, the control module is specifically adapted to:
and acquiring a conversion relation between a camera coordinate system and a robot coordinate system, converting the position information of the graspable area corresponding to the camera coordinate system into the robot coordinate system according to the conversion relation, and outputting the converted position information of each graspable object and the jig identifier to the robot.
Optionally, the clamp group includes: a plurality of suction cup clamps.
According to still another aspect of the present invention, there is provided an electronic apparatus including: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the control method of the robot-based clamp group.
According to yet another aspect of the present invention, there is provided a computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the method for controlling a robot-based gripper set as described above.
In the control method and the device of the robot-based clamp group, the two-dimensional color image corresponding to the three-dimensional object area and the depth image corresponding to the two-dimensional color image can be obtained along the preset depth direction, and the graspable area contained in the two-dimensional color image is predicted through the deep learning model; and determining the geometric characteristics of the graspable area according to the contour line of the graspable area, and determining the number of the clamps corresponding to the graspable area and the clamp identifier of each clamp according to the geometric characteristics of the graspable area, so as to control the robot to call each clamp corresponding to the clamp identifier included in the clamp group to execute the grasping operation. It can be seen that, in the present invention, on one hand, the shape and position of the graspable region can be predicted by means of the deep learning model; on the other hand, the geometric features of the graspable region can be determined from the contour line of the graspable region, thereby determining the number of jigs corresponding to the graspable region and the jig identification of each jig. The reliability of clamp grabbing can be remarkably improved by determining the number of the clamps and the identification of the clamps through the geometric characteristics.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 shows a flow diagram of a method of controlling a robot-based gripper set according to one embodiment of the present invention;
FIG. 2 shows a flow diagram of a method of controlling a robot-based gripper set according to another embodiment of the present invention;
FIG. 3 illustrates a schematic structural diagram of a control device for a robot-based gripper set according to yet another embodiment of the present invention;
fig. 4 shows a schematic structural diagram of an electronic device according to the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 shows a flow diagram of a method for controlling a robot-based gripper set according to an embodiment of the present invention, wherein the robot may be a smart programmed robot, as shown in fig. 1, the method comprising:
step S110: and acquiring a two-dimensional color image corresponding to the three-dimensional object area and a depth image corresponding to the two-dimensional color image along a preset depth direction.
The preset depth direction can be flexibly set according to the actual service scene. Specifically, the preset depth direction includes at least one of the following: the camera shooting direction, the gravity direction, and the direction of the vertical line of the article bearing surface.
In one implementation, the preset depth direction is the depth direction along which the camera takes a picture, also called the camera shooting direction. Specifically, the light ray generated by the camera lens starts from a first position and extends toward a second position, and the preset depth direction is the direction pointing from the first position to the second position. For example, when the camera takes a picture from top to bottom, the preset depth direction is the top-to-bottom direction; when the camera takes a picture from left to right, the preset depth direction is the left-to-right direction. If one camera is used for photographing, the preset depth direction is the direction pointing from the camera to the article area; if two cameras are used for photographing, the preset depth direction is the direction pointing from the midpoint between the two cameras to the article area. Of course, for scenes with multiple cameras, the preset depth direction may be set according to the direction in which the center position of the cameras points to the article area; the present invention does not limit these details.
In another implementation, the preset depth direction is the direction of the vertical line of the article bearing surface, that is, the direction perpendicular to the article bearing surface. In practice, the shooting angle of the camera can be set flexibly; for example, the shooting angle may form an angle with the direction in which the articles are placed, that is, the camera is in a tilted state. For this reason, and for accuracy of description, the preset depth direction may also be taken as the direction perpendicular to the article bearing surface. In actual implementation, the preset depth direction may be any direction, for example a vertical direction or some inclined direction; the present invention does not limit the preset depth direction.
Here, the article bearing surface refers to the plane of the carrier on which the three-dimensional articles are placed. For example, when the three-dimensional articles are placed on the ground, the ground is the carrier, and correspondingly the article bearing surface is the plane of the ground; for another example, when the three-dimensional articles are placed on a tray, a conveyor belt or a material basket, the tray, the conveyor belt or the material basket is the carrier, and correspondingly the article bearing surface is the plane of the tray, the conveyor belt or the material basket. In a specific scenario, a carrier such as a tray, conveyor belt or material basket may be arranged obliquely; for example, for convenience of loading and unloading, the plane of the conveyor belt may form a preset angle with the horizontal plane, and correspondingly the preset depth direction, being perpendicular to the plane of the conveyor belt, also forms a preset angle with the vertical direction.
In addition, the preset depth direction may be a gravity direction. For example, when the object carrying surface is consistent with the horizontal plane, the predetermined depth direction is the gravity direction.
For example, in one specific case, the preset depth direction refers to the depth direction along which the camera takes a picture, also called the shooting direction: the light ray generated by the camera lens starts from a first position and extends toward a second position, and the preset depth direction is the direction pointing from the first position to the second position. When the camera takes a picture from top to bottom, the preset depth direction is the top-to-bottom direction; when the camera takes a picture from left to right, the preset depth direction is the left-to-right direction. In addition, the three-dimensional article region refers to a three-dimensional region in which a plurality of articles are stacked. Since a stacking phenomenon often exists among the articles in this embodiment, the orientation relationship among the articles cannot be accurately described by a planar image alone, so the three-dimensional article region is used for the description.
In specific implementation, the two-dimensional color image corresponding to the three-dimensional article region and the depth image corresponding to the two-dimensional color image are acquired by a 3D camera. The two-dimensional color image corresponds to an image of a plane region perpendicular to the preset depth direction; each pixel point in the depth map corresponds one-to-one with a pixel point in the two-dimensional color image, and its value is the depth value of that pixel point. The depth value is determined according to the distance between the article and the camera. For example, when the camera photographs from top to bottom, the two-dimensional color map corresponds to a top plan view, while the depth map represents the distance of each article from the camera.
Therefore, the orientation relationship between the articles can be accurately described from a three-dimensional perspective through the two-dimensional color image and its corresponding depth image.
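As an illustration of this acquisition step, the following minimal Python sketch shows one way the paired images could be fetched and validated; the camera.capture() call is a hypothetical wrapper around a vendor 3D-camera SDK and is not part of the patent.

```python
import numpy as np

def acquire_rgbd(camera):
    """Fetch one aligned RGB-D frame from a 3D camera.

    `camera` is a hypothetical wrapper around a vendor SDK; it is assumed to
    return an HxWx3 uint8 color image and an HxW depth map whose pixels are
    registered to the color image (one depth value per color pixel).
    """
    color, depth = camera.capture()                  # hypothetical SDK call
    color = np.asarray(color, dtype=np.uint8)
    depth = np.asarray(depth, dtype=np.float32)      # depth along the preset depth direction
    if color.shape[:2] != depth.shape:
        raise ValueError("depth map is not pixel-aligned with the color image")
    return color, depth
```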
Step S120: and inputting the two-dimensional color image and the depth image into a deep learning model, and predicting a graspable area contained in the two-dimensional color image according to an output result.
The deep learning model is obtained by training on a plurality of training samples generated in advance. Specifically, by learning from the training samples, the deep learning model can predict one or more graspable regions contained in the two-dimensional color image based on the two-dimensional color image and the depth image. In specific implementation, the graspable regions contained in the plurality of training samples can be labeled in advance, and accordingly the graspable regions contained in the two-dimensional color image are predicted by the deep learning model trained on the labeled samples. The deep learning model may be any of various types of machine learning models; the present invention does not limit these details.
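The patent does not fix a particular network architecture, so the following sketch only illustrates one plausible wiring: a trained PyTorch module that takes a four-channel RGB-D tensor and returns a per-pixel graspable score. The model interface and the 0.5 cut-off are assumptions.

```python
import numpy as np
import torch

def predict_graspable_mask(model, color, depth, threshold=0.5):
    """Predict a per-pixel graspable mask from an RGB-D pair.

    `model` is assumed to be a trained torch.nn.Module taking a 1x4xHxW tensor
    (RGB channels plus depth) and returning a 1x1xHxW map of graspable scores
    in [0, 1]; the architecture and the cut-off are illustrative assumptions.
    """
    rgb = torch.from_numpy(color).float().permute(2, 0, 1) / 255.0   # 3xHxW
    d = torch.from_numpy(depth.astype(np.float32)).unsqueeze(0)      # 1xHxW
    x = torch.cat([rgb, d], dim=0).unsqueeze(0)                      # 1x4xHxW
    with torch.no_grad():
        scores = model(x)                                            # 1x1xHxW
    return (scores.squeeze() > threshold).cpu().numpy()              # HxW boolean mask
```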
Step S130: and determining the geometric characteristics of the grippable region according to the contour line of the grippable region, and determining the number of the clamps corresponding to the grippable region and the clamp identification of each clamp according to the geometric characteristics of the grippable region.
Specifically, the geometric characteristics of the graspable region, which are used to represent the geometric shape of the graspable region, such as various shapes of a circle, an ellipse, a rectangle, a trapezoid, and the like, can be determined according to the contour line of the graspable region. Accordingly, the number of jigs corresponding to the graspable region and the jig identification of each jig are determined according to the geometric features of the graspable region. For example, for a circular graspable region, a gripper corresponding to the graspable region is used to grasp with respect to the position of the center of a circle; for a rectangular graspable region, a gripper corresponding to the graspable region is used to grasp for the long side position of the rectangle. In summary, the manner of determining the clamps according to the geometric features helps to ensure that the number of clamps matches the geometric features, thereby improving the gripping effect.
Step S140: and controlling the robot to call each clamp corresponding to the clamp identification in the clamp group to execute the grabbing operation.
Specifically, a clamp control command is output to the robot, and the clamp control command includes at least one clamp identifier, so that the robot calls at least one clamp corresponding to the clamp identifier included in the clamp group to perform a grabbing operation.
The fixture in this embodiment includes various types, for example, various types of universal fixtures, and a universal fixture refers to a fixture having a standardized structure and having a wide application range, for example, a three-jaw chuck and a four-jaw chuck for a lathe, a flat tongs and an index head for a milling machine, and the like. For another example, the clamping apparatus may be divided into a manual clamping apparatus, a pneumatic clamping apparatus, a hydraulic clamping apparatus, a gas-liquid linkage clamping apparatus, an electromagnetic clamping apparatus, a vacuum clamping apparatus, and the like, according to a clamping power source used by the clamping apparatus. The present invention does not limit the specific type of the clamp as long as the article grasping operation can be achieved.
It can be seen that, in the present invention, on one hand, the shape and position of the graspable region can be predicted by means of the deep learning model; on the other hand, the geometric features of the graspable region can be determined from the contour line of the graspable region, thereby determining the number of jigs corresponding to the graspable region and the jig identification of each jig. The reliability of clamp grabbing can be remarkably improved by determining the number of the clamps and the identification of the clamps through the geometric characteristics.
Fig. 2 shows a flow diagram of a method for controlling a robot-based gripper set according to another embodiment of the invention. As shown in fig. 2, the method includes:
step S200: and training the deep learning model through the pre-acquired sample images corresponding to the three-dimensional sample regions.
Specifically, the deep learning model is obtained by training in the following way:
first, a sample image corresponding to a three-dimensional sample region is acquired, and a plurality of object objects included in the sample image are determined. Wherein, a plurality of articles to be grabbed as samples are contained in the three-dimensional sample area. The sample image corresponding to the three-dimensional sample region includes: the depth map comprises a two-dimensional color map corresponding to a three-dimensional sample region and a depth map corresponding to the two-dimensional color map, wherein the two-dimensional color map is acquired along a preset depth direction. The specific obtaining manner may refer to the corresponding description in step S110, and is not described herein again. When a plurality of article objects contained in the sample image are determined, information such as outlines and boundary lines among the articles can be identified in an example segmentation mode, and the article objects contained in the sample image are segmented according to identification results.
Then, according to the positional relationship among the plurality of article objects, the graspable region and the non-graspable region included in the sample image are labeled respectively. Since a stacking phenomenon exists among the plurality of article objects in this embodiment, the article objects stacked below may not be easy to grasp; therefore, the area corresponding to a graspable article object and the area corresponding to a non-graspable article object both need to be marked. Specifically, when the graspable region and the non-graspable region included in the sample image are labeled according to the positional relationship among the plurality of article objects, at least one of the following implementations can be used:
in an optional implementation manner, a stacking order of each article object along a preset depth direction is determined, an area corresponding to the article object located at the top layer is marked as a graspable area, and an area corresponding to the article object located at the bottom layer is marked as a non-graspable area. Conventional example segmentation algorithms do not distinguish whether an item in the scene is graspable, i.e.: a complete and accurate instance mask needs to be given for all the items in the scene. Therefore, if the conventional example segmentation algorithm is directly applied to the recognition of the graspable region, the article to be pressed located at the bottom layer is recognized as the graspable article or the article irrelevant to the background is recognized as the graspable article, thereby causing a recognition error. In order to prevent the above problem, in this implementation, a stacking order of each article object along a preset depth direction is determined, so that an area corresponding to the article object located on the top layer is marked as a graspable area, and an area corresponding to the article object located on the bottom layer is marked as a non-graspable area, thereby avoiding an abnormality caused by grasping an article on the bottom layer by the robot. For example, in a carton unstacking scenario, it is necessary to unstack from the uppermost layer to the lowermost layer, and it is not possible to grasp a lower layer of cartons without fully grasping the lower layer of cartons. Therefore, in a similar scene, only the uppermost carton is labeled as a graspable object, and the rest cartons are labeled as non-graspable objects. The articles on the uppermost layer and the articles on the non-uppermost layer can be accurately distinguished through the marking mode, and then accurate pixel-level article positioning is given.
In yet another optional implementation, according to the exposure proportion of each article object, the area corresponding to an article object whose exposure proportion is greater than a preset threshold is marked as a graspable area, and the area corresponding to an article object whose exposure proportion is not greater than the preset threshold is marked as a non-graspable area. In some scenarios, the mutual stacking relationship between the articles is not easy to determine, and articles on the same layer may overlap one another; in this case it is difficult to accurately mark the top-layer article. For example, in a supermarket goods-picking scene, the upper-lower relationship between goods is not clear and goods on the same layer also overlap, so the requirements on the grasping order are not strict, but the distinction between graspable and non-graspable goods is. In such a case, items with little surface exposure, or items whose grasping may cause other items in the scene to fly out, should not be labeled as graspable. Accordingly, in the above scenario, a labeling threshold, such as 85%, may be set: if the exposed surface area of the article is greater than 85%, the article is marked as graspable; if the exposed surface area is not greater than 85%, the article is marked as non-graspable. Of course, the exposed proportion of an article can be quantified by the exposed volume in addition to the exposed surface area; the present invention does not limit these details. A brief sketch of this exposure-proportion labeling is given below, after the labeling manners are described.
In yet another alternative implementation manner, a contact area included in each item object is determined according to the shape and/or type of each item object, an area corresponding to an item object whose contact area is not blocked is marked as a graspable area, and an area corresponding to an item object whose contact area is blocked is marked as a non-graspable area. Wherein, the contact area refers to a force-bearing area which is convenient for grabbing in the object. For example, in the case of an article such as a metal part, in order to prevent the part from being damaged, a specific region in the metal part, that is, a contact region, is required to be grasped, and the specific region is generally a region which is firmer and is not easy to fall off in the metal part. Therefore, when the articles are marked, it is necessary to determine whether the contact area is completely exposed and is not blocked. If the exposed surface area of the article is large, but the contact area is blocked, the article is marked as a non-graspable article.
The above-mentioned labeling methods can be used alone or in combination, and the present invention is not limited thereto.
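As referenced above, a minimal sketch of the exposure-proportion labeling manner, assuming that an upstream annotation or instance-segmentation step supplies, for each item, both its complete (amodal) mask and the visible part of that mask; the 85% threshold mirrors the example in the text.

```python
def label_by_exposure(full_masks, visible_masks, threshold=0.85):
    """Label each item as graspable or non-graspable by its exposed proportion.

    `full_masks` and `visible_masks` are parallel lists of boolean HxW arrays:
    the complete (amodal) instance mask of an item and the part of it that is
    not occluded by other items. Both are assumed to come from an upstream
    annotation or instance-segmentation step.
    """
    labels = []
    for full, visible in zip(full_masks, visible_masks):
        exposure = visible.sum() / max(full.sum(), 1)    # exposed surface proportion
        labels.append("graspable" if exposure > threshold else "non-graspable")
    return labels
```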
Finally, the deep learning model is trained according to the labeled sample images. Generally, the larger the number of samples, the better the training effect. To prevent a poor training effect caused by a small number of samples, in this embodiment the number of training samples is increased by multiplying (augmenting) the sample data, so as to train the deep learning model. Various methods can be adopted to achieve this data multiplication. Specifically, the labeled sample images may be used as the original training set, the original training set may be expanded by brightness and contrast adjustment, picture affine transformation and/or random white-balance transformation of the pictures, and the deep learning model may be trained with the expanded training set. This expansion processing increases the number of samples and improves the training effect.
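One way to realize the three expansion transforms named above (brightness/contrast adjustment, picture affine transformation, random white-balance change) with OpenCV is sketched below; the parameter ranges are illustrative assumptions, and the matching depth map and masks would need the same affine warp applied, which is omitted here.

```python
import cv2
import numpy as np

def augment_sample(color, rng):
    """Produce one augmented copy of a labeled color image.

    Applies the three perturbations named above: brightness/contrast jitter,
    a small random affine warp, and a random per-channel white-balance gain.
    Parameter ranges are illustrative; the matching depth map and masks would
    need the same affine warp, which is omitted here for brevity.
    """
    h, w = color.shape[:2]
    # brightness / contrast: out = alpha * in + beta
    alpha, beta = rng.uniform(0.8, 1.2), rng.uniform(-20, 20)
    out = cv2.convertScaleAbs(color, alpha=alpha, beta=beta)
    # small random affine transform (rotation + scale) about the image centre
    angle, scale = rng.uniform(-10, 10), rng.uniform(0.9, 1.1)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    out = cv2.warpAffine(out, M, (w, h))
    # random white balance: independent gain per colour channel
    gains = rng.uniform(0.9, 1.1, size=3)
    return np.clip(out.astype(np.float32) * gains, 0, 255).astype(np.uint8)
```

Calling augment_sample(color, np.random.default_rng(0)) several times per labeled image would yield the expanded training set.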
Step S210: and acquiring a two-dimensional color image corresponding to the three-dimensional object area and a depth image corresponding to the two-dimensional color image along a preset depth direction.
The preset depth direction can be flexibly set according to the actual service scene. Specifically, the preset depth direction includes at least one of the following: the camera shooting direction, the gravity direction, and the direction of the vertical line of the article bearing surface.
In one implementation, the preset depth direction is the depth direction along which the camera takes a picture, also called the camera shooting direction. Specifically, the light ray generated by the camera lens starts from a first position and extends toward a second position, and the preset depth direction is the direction pointing from the first position to the second position. For example, when the camera takes a picture from top to bottom, the preset depth direction is the top-to-bottom direction; when the camera takes a picture from left to right, the preset depth direction is the left-to-right direction. If one camera is used for photographing, the preset depth direction is the direction pointing from the camera to the article area; if two cameras are used for photographing, the preset depth direction is the direction pointing from the midpoint between the two cameras to the article area. Of course, for scenes with multiple cameras, the preset depth direction may be set according to the direction in which the center position of the cameras points to the article area; the present invention does not limit these details.
In another implementation, the preset depth direction is the direction of the vertical line of the article bearing surface, that is, the direction perpendicular to the article bearing surface. In practice, the shooting angle of the camera can be set flexibly; for example, the shooting angle may form an angle with the direction in which the articles are placed, that is, the camera is in a tilted state. For this reason, and for accuracy of description, the preset depth direction may also be taken as the direction perpendicular to the article bearing surface. In actual implementation, the preset depth direction may be any direction, for example a vertical direction or some inclined direction; the present invention does not limit the preset depth direction.
Here, the article bearing surface refers to the plane of the carrier on which the three-dimensional articles are placed. For example, when the three-dimensional articles are placed on the ground, the ground is the carrier, and correspondingly the article bearing surface is the plane of the ground; for another example, when the three-dimensional articles are placed on a tray, a conveyor belt or a material basket, the tray, the conveyor belt or the material basket is the carrier, and correspondingly the article bearing surface is the plane of the tray, the conveyor belt or the material basket. In a specific scenario, a carrier such as a tray, conveyor belt or material basket may be arranged obliquely; for example, for convenience of loading and unloading, the plane of the conveyor belt may form a preset angle with the horizontal plane, and correspondingly the preset depth direction, being perpendicular to the plane of the conveyor belt, also forms a preset angle with the vertical direction.
In addition, the preset depth direction may be a gravity direction. For example, when the object carrying surface is consistent with the horizontal plane, the predetermined depth direction is the gravity direction.
For example, in one specific case, the preset depth direction refers to the depth direction along which the camera takes a picture, also called the shooting direction: the light ray generated by the camera lens starts from a first position and extends toward a second position, and the preset depth direction is the direction pointing from the first position to the second position. When the camera takes a picture from top to bottom, the preset depth direction is the top-to-bottom direction; when the camera takes a picture from left to right, the preset depth direction is the left-to-right direction. In addition, the three-dimensional article region refers to a three-dimensional region in which a plurality of articles are stacked. Since a stacking phenomenon often exists among the articles in this embodiment, the orientation relationship among the articles cannot be accurately described by a planar image alone, so the three-dimensional article region is used for the description.
In specific implementation, the two-dimensional color image corresponding to the three-dimensional article region and the depth image corresponding to the two-dimensional color image are acquired by a 3D camera. The two-dimensional color image corresponds to an image of a plane region perpendicular to the preset depth direction; each pixel point in the depth map corresponds one-to-one with a pixel point in the two-dimensional color image, and its value is the depth value of that pixel point. The depth value is determined according to the distance between the article and the camera. For example, when the camera photographs from top to bottom, the two-dimensional color map corresponds to a top plan view, while the depth map represents the distance of each article from the camera. Therefore, the orientation relationship between the articles can be accurately described from a three-dimensional perspective through the two-dimensional color image and its corresponding depth image.
Step S220: and inputting the two-dimensional color image and the depth image into the deep learning model, and predicting a graspable area contained in the two-dimensional color image according to an output result.
Since the deep learning model is generated according to the samples marked with the graspable region and the non-graspable region, the graspable region included in the two-dimensional color image can be predicted by the model. Specifically, after the two-dimensional color image and the depth image are input into the deep learning model, the model outputs a graspable region and a non-graspable region included in the two-dimensional color image, wherein the graspable region corresponds to the graspable object.
In particular, when the model outputs the prediction results corresponding to the respective pixel regions, the prediction results may be expressed in various ways. For example, in one representation, the prediction result includes two states: graspable and non-graspable. In another representation, the prediction result is a grasp probability predicted for each pixel: after a two-dimensional color image and the corresponding depth image are input, the deep learning model predicts, over the 2D image space, a 2D probability map of successfully grasping an object with a suction cup or other grasping tool at each pixel; the value of each pixel represents the model-predicted probability of successfully picking an object out of the bin when the suction cup is moved to that point. In the latter mode, the prediction result is therefore accurate down to the graspable probability of each pixel: the larger the grasping probability, the higher the success rate of a grasping operation performed at the corresponding pixel; the smaller the probability, the lower the success rate.
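A sketch of one straightforward post-processing of the 2D probability map described above: pixels above a probability cut-off are grouped into connected components, and each surviving component becomes one graspable region. The threshold and minimum area are assumptions, not values from the patent.

```python
import cv2
import numpy as np

def extract_graspable_regions(prob_map, prob_threshold=0.5, min_area=200):
    """Turn the 2D pick-success probability map into graspable-region masks.

    `prob_map` is an HxW float array in [0, 1], one value per pixel as
    described above. Pixels above the threshold are grouped into connected
    components; components smaller than `min_area` pixels are discarded.
    """
    binary = (prob_map > prob_threshold).astype(np.uint8)
    num_labels, labels = cv2.connectedComponents(binary)
    regions = []
    for i in range(1, num_labels):                 # label 0 is the background
        mask = (labels == i)
        if mask.sum() >= min_area:
            regions.append(mask)                   # one boolean mask per graspable region
    return regions
```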
Step S230: and determining the geometric characteristics of the grippable region according to the contour line of the grippable region, and determining the number of the clamps corresponding to the grippable region and the clamp identification of each clamp according to the geometric characteristics of the grippable region.
Specifically, the geometric characteristics of the graspable region, which are used to represent the geometric shape of the graspable region, such as various shapes of a circle, an ellipse, a rectangle, a trapezoid, and the like, can be determined according to the contour line of the graspable region. Accordingly, the number of jigs corresponding to the graspable region and the jig identification of each jig are determined according to the geometric features of the graspable region. For example, for a circular graspable region, a gripper corresponding to the graspable region is used to grasp with respect to the position of the center of a circle; for a rectangular graspable region, a gripper corresponding to the graspable region is used to grasp for the long side position of the rectangle. In summary, the manner of determining the clamps according to the geometric features helps to ensure that the number of clamps matches the geometric features, thereby improving the gripping effect.
In specific implementation, the maximum inscribed circle, the minimum inscribed circle, the maximum inscribed rectangle and/or the minimum inscribed rectangle of the graspable region are drawn according to the contour line of the graspable region; the shape characteristics of the graspable region are then determined from these figures, and a clamp combination matching the shape characteristics is allocated for grasping.
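The geometric features can be extracted from a graspable-region mask with standard OpenCV operations, for example as in the sketch below. The maximum inscribed circle comes from a distance transform, while the minimum-area rotated rectangle is used here as a stand-in for the rectangle features (computing a true maximum inscribed rectangle requires a dedicated search). OpenCV 4.x is assumed.

```python
import cv2
import numpy as np

def region_geometry(mask):
    """Extract the geometric features of one graspable region.

    `mask` is a boolean HxW array. The maximum inscribed circle is taken as
    the pixel farthest from the region boundary (distance transform); the
    minimum-area rotated bounding rectangle stands in for the rectangle
    features. OpenCV 4.x is assumed for the findContours return signature.
    """
    mask_u8 = mask.astype(np.uint8)
    # maximum inscribed circle: centre = pixel with the largest boundary distance
    dist = cv2.distanceTransform(mask_u8, cv2.DIST_L2, 5)
    radius = float(dist.max())
    cy, cx = np.unravel_index(np.argmax(dist), dist.shape)
    # contour-based features: minimum-area (rotated) bounding rectangle
    contours, _ = cv2.findContours(mask_u8, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contour = max(contours, key=cv2.contourArea)
    rw, rh = cv2.minAreaRect(contour)[1]
    length, width = max(rw, rh), min(rw, rh)       # principal-axis length and width
    return {"circle_center": (int(cx), int(cy)), "circle_radius": radius,
            "rect_length": float(length), "rect_width": float(width)}
```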
In an alternative implementation, the radius of the maximum inscribed circle of the graspable region is obtained, and/or the length of the principal axis of the graspable region is obtained; and comparing the radius with a first radius threshold value and/or comparing the spindle length with a spindle threshold value, and determining the number of the clamps corresponding to the graspable area and the clamp identification of each clamp according to the comparison result. The radius of the maximum inscribed circle of the grippable region is used for reflecting the length of the grippable region in the narrower direction, and the length of the main axis of the grippable region is used for reflecting the length of the grippable region in the longer direction. In specific implementation, the jigs included in the jig group are divided into a plurality of combinations, such as four combinations. When the radius of the maximum inscribed circle of the grippable region is larger than a first radius threshold value and the length of the main shaft is larger than a main shaft threshold value, the grippable region is wide and long, so that the grippable region is gripped by using the clamps contained in a first combination mode, and the number of the clamps contained in the first combination mode is large; when the radius of the maximum inscribed circle of the grippable area is larger than the first radius threshold value and the length of the main shaft is smaller than the main shaft threshold value, gripping by adopting a fixture contained in a second combination mode; when the radius of the maximum inscribed circle of the grippable region is smaller than the first radius threshold value and the length of the main shaft is larger than the main shaft threshold value, gripping by using a fixture contained in a third combination mode, which indicates that the grippable region is thin and long, and the fixture contained in the third combination mode is also long and narrow; when the radius of the maximum inscribed circle of the grippable region is smaller than the first radius threshold value and the length of the main shaft is smaller than the main shaft threshold value, the shape of the grippable region is narrow and short, so that the grippable region is gripped by using the grippers included in the fourth combination mode, and the number of the grippers included in the fourth combination mode is small.
In yet another alternative implementation, the radius of the maximum inscribed circle of the graspable region is obtained, and/or the length of the maximum inscribed rectangle of the graspable region is obtained; and comparing the radius with a second radius threshold value, and/or comparing the length of the maximum inscribed rectangle with a first length threshold value, and determining the number of the clamps corresponding to the graspable region and the clamp identification of each clamp according to the comparison result. Wherein, the length of the maximum inscribed rectangle of the grippable region can also reflect the overall length of the grippable region. Similar to the former implementation, the respective jigs included in the jig group may be divided into a plurality of combinations, for example, four combinations. When the radius of the maximum inscribed circle of the grippable region is larger than a second radius threshold value and the length of the maximum inscribed rectangle is larger than a first length threshold value, the grippable region is wide and long, so that the grippable region is gripped by using the clamps included in a first combination mode, and the number of the clamps included in the first combination mode is large; when the radius of the maximum inscribed circle of the graspable area is larger than a second radius threshold and the length of the maximum inscribed rectangle is smaller than a first length threshold, the grasper included in the second combination mode is adopted to grasp; when the radius of the maximum inscribed circle of the graspable region is smaller than the second radius threshold and the length of the maximum inscribed rectangle is larger than the first length threshold, using a fixture included in a third combination mode to grasp, which indicates that the shape of the graspable region is thin and long, and the shape of the fixture included in the third combination mode is also long and narrow; when the radius of the maximum inscribed circle of the grippable region is smaller than the second radius threshold value and the length of the maximum inscribed rectangle is smaller than the first length threshold value, the shape of the grippable region is narrow and short, so that the grippable region is gripped by using the grippers included in the fourth combination mode, and the number of the grippers included in the fourth combination mode is small.
In yet another alternative implementation, the width of the maximum inscribed rectangle of the graspable region is obtained, and/or the length of the maximum inscribed rectangle of the graspable region is obtained; and comparing the width of the maximum inscribed rectangle with a width threshold value, and/or comparing the length of the maximum inscribed rectangle with a second length threshold value, and determining the number of the clamps corresponding to the graspable area and the clamp identification of each clamp according to the comparison result. Similar to the first two implementation manners, the respective jigs included in the jig group may be divided into a plurality of combinations, for example, four combinations. The present invention will not be described in detail.
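The comparison logic of the first alternative above (maximum-inscribed-circle radius against a radius threshold, principal-axis length against a spindle threshold) reduces to a four-way branch such as the following sketch; the threshold values and the gripper identifiers in each combination are placeholders that a real cell would calibrate to its suction-cup layout.

```python
def select_gripper_combination(radius, spindle_length,
                               radius_threshold=60.0, spindle_threshold=200.0):
    """Map the geometric features of a graspable region to a gripper combination.

    Follows the wide/long branching described above for the first alternative.
    The thresholds (in pixels or millimetres, depending on calibration) and the
    gripper identifiers are placeholders.
    """
    wide = radius > radius_threshold
    long_ = spindle_length > spindle_threshold
    if wide and long_:
        return ["cup_1", "cup_2", "cup_3", "cup_4"]   # first combination: most suction cups
    if wide:
        return ["cup_1", "cup_2", "cup_3"]            # second combination
    if long_:
        return ["cup_1", "cup_4"]                     # third combination: long and narrow
    return ["cup_1"]                                  # fourth combination: fewest cups
```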
In specific implementation, various parameters and thresholds can be flexibly combined for judgment, and the invention does not limit the specific details.
Step S240: and controlling the robot to call each clamp corresponding to the clamp identification in the clamp group to execute the grabbing operation.
Specifically, a clamp control command is output to the robot, and the clamp control command includes at least one clamp identifier, so that the robot calls at least one clamp corresponding to the clamp identifier included in the clamp group to perform a grabbing operation. Wherein, anchor clamps group includes: a plurality of suction cup clamps.
In specific implementation, the conversion relationship between the camera coordinate system and the robot coordinate system needs to be acquired; the position information of the graspable region in the camera coordinate system is converted into the robot coordinate system according to this conversion relationship, and the converted position information of each graspable object together with the clamp identifiers is output to the robot. Here it is considered that the camera is usually not located at the same position as the robot, so the graspable object is positioned by means of coordinate-system conversion. Since the three-dimensional pose information of the graspable object described in the preceding steps is determined in the camera coordinate system, it needs to be converted into the robot coordinate system so that the robot can perform the grasping operation. The conversion between the camera coordinate system and the robot coordinate system can be determined according to the relative relationship between the position of the camera and the position of the robot.
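With the conversion relation expressed as a 4x4 homogeneous transform obtained from hand-eye calibration, converting graspable positions into the robot coordinate system is a single matrix product, as in the sketch below; the calibration procedure itself is outside the scope of this illustration.

```python
import numpy as np

def camera_to_robot(points_cam, T_robot_cam):
    """Convert graspable positions from the camera frame to the robot frame.

    `points_cam` is an Nx3 array of XYZ positions in the camera coordinate
    system and `T_robot_cam` a 4x4 homogeneous transform from hand-eye
    calibration (the "conversion relation" above); the calibration procedure
    itself is not shown here.
    """
    pts = np.asarray(points_cam, dtype=np.float64)
    homogeneous = np.hstack([pts, np.ones((pts.shape[0], 1))])   # Nx4
    return (homogeneous @ T_robot_cam.T)[:, :3]                  # Nx3 in the robot frame
```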
In summary, with the approach in this embodiment, the grasping of arbitrary articles can be realized through the deep learning model; moreover, the number of clamps and the clamp identifiers can be set flexibly according to the geometric features of the graspable region, so that the clamps matching the characteristics of the region are called to perform the grasping operation, thereby improving the grasping effect.
In addition, various modifications and alterations can be made by those skilled in the art with respect to the above-described embodiments:
when the number of the graspable areas is plural, the grasping order of the graspable objects corresponding to each graspable area may be further set, and the grasping order may be specifically set in the following manner:
and calculating the three-dimensional pose information of each graspable object according to the point cloud information corresponding to the three-dimensional object area, sequencing each graspable object along a preset depth direction according to the three-dimensional pose information, and determining the grasping sequence of each graspable object according to the sequencing result. The point cloud is a data set of points in a preset coordinate system. The points contain rich information including three-dimensional coordinates X, Y, Z, color, classification values, intensity values, time, etc. The point cloud can atomize the real world, and the real world can be restored through high-precision point cloud data. Therefore, the point cloud information can reflect the three-dimensional characteristics of the three-dimensional object area. In the present embodiment, point cloud information can be constructed from the two-dimensional color map and the depth map. Or, the point cloud information can be generated by additionally combining elements such as a laser detector, a visible light detector such as an LED, an infrared detector, a radar detector and the like, so that the point cloud information is more accurate. And calculating the three-dimensional pose information of each object capable of being grabbed through the point cloud information. And the three-dimensional pose information is used for describing the three-dimensional posture of the graspable object in the three-dimensional world. The three-dimensional pose information of the object which can be grabbed is also called object position information and object position information, and can be determined in various modes. The three-dimensional pose information can be described by a grabbing point or a grabbing area contained in the object to be grabbed. For example, the three-dimensional pose information is represented by a grab point. Correspondingly, when the grabbing point corresponding to the object which can be grabbed is determined, the method can be realized in multiple modes: the point with the maximum probability of being grabbed in the area corresponding to the object to be grabbed can be used as the grabbing point; it is also possible to calculate the 2D center of gravity of the graspable object, thereby determining the grasping point from the 2D center of gravity. The grasp points are used to describe the approximate orientation of the graspable object in three-dimensional space. In addition, because the grabbing point is a point in a three-dimensional coordinate system, the depth value information corresponding to the grabbing point can be determined according to the three-dimensional pose information, namely: the distance between the grabbed object and the camera can be controlled. In one specific example, three-dimensional pose information of each graspable object is calculated by: firstly, establishing a three-dimensional coordinate system corresponding to a three-dimensional article area; the directions of a first coordinate axis and a second coordinate axis contained in the three-dimensional coordinate system are matched with the two-dimensional color image, and the direction of a third coordinate axis (also called a depth coordinate axis) in the three-dimensional coordinate system is matched with the preset depth direction. Wherein the preset depth direction may be set by any one of the three ways mentioned above. 
Then, the depth coordinate value of each graspable object on the third coordinate axis is calculated, and the three-dimensional pose information of each graspable object is obtained from this depth coordinate value. When the direction of the depth coordinate axis is set according to the shooting direction of the camera, the depth coordinate value reflects the distance between the graspable object and the camera lens. In other words, the depth coordinate value of each graspable object can be read from its three-dimensional pose information, and the magnitude of this value determines how the graspable objects are ordered along the preset depth direction.

In a concrete implementation, the graspable objects are sorted by their distance to the camera, and the grasping order is determined from the sorting result: the closer a graspable object is to the camera, the earlier it is grasped; the farther it is from the camera, the later it is grasped. Since the camera usually shoots from top to bottom, graspable objects close to the camera lie on the top layer and graspable objects far from the camera lie on the bottom layer. The sorting result therefore arranges the graspable objects from the top layer to the bottom layer, and grasping them in that order avoids the problem of upper-layer objects being knocked away when a lower-layer object is grasped first.

Alternatively, when the preset depth direction is the direction of the normal to the article bearing surface, the farther a graspable object is from the article bearing surface, the earlier it is grasped, and the closer it is to the article bearing surface, the later it is grasped. Here, the distance between a graspable object and the article bearing surface is the distance measured along the normal to the article bearing surface, i.e. the vertical distance between the graspable object and the article bearing surface.
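The following Python sketch illustrates this ordering step under stated assumptions: each graspable region is given as a binary mask aligned with the two-dimensional color image, the depth image stores the distance to the camera per pixel, and the grasp point is taken as the 2D center of gravity of the mask. The function and variable names are illustrative and are not taken from the original disclosure.

```python
import numpy as np

def grasp_point_2d(mask: np.ndarray) -> tuple[int, int]:
    """Return the 2D center of gravity (row, col) of a binary region mask."""
    rows, cols = np.nonzero(mask)
    return int(rows.mean()), int(cols.mean())

def order_graspable_objects(masks: list[np.ndarray], depth_image: np.ndarray) -> list[int]:
    """Sort graspable objects from the top layer to the bottom layer.

    The depth image is assumed to store, per pixel, the distance to the camera
    along the preset depth direction (camera shooting from top to bottom),
    so a smaller depth value means the object lies closer to the camera.
    """
    depths = []
    for mask in masks:
        r, c = grasp_point_2d(mask)
        depths.append(float(depth_image[r, c]))
    # Ascending depth: objects closest to the camera (top layer) are grasped first.
    return sorted(range(len(masks)), key=lambda i: depths[i])
```

If the preset depth direction is instead the normal to the article bearing surface, the same sketch applies with the comparison reversed: objects farther from the bearing surface are grasped first.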
The graspable articles in this embodiment include: sheet-like article packages such as cartons, envelopes, file bags, postcards and flexible plastic bags (including, but not limited to, snack packaging, Tetra-pillow milk packaging, plastic milk packaging, and the like), cosmeceutical bottles and packages, and/or irregularly shaped toys, and the like.
Fig. 3 is a schematic structural diagram of a control apparatus for a robot-based clamp group according to still another embodiment of the present invention. As shown in Fig. 3, the apparatus includes:
the acquisition module 31 is adapted to acquire a two-dimensional color image corresponding to a three-dimensional object region and a depth image corresponding to the two-dimensional color image along a preset depth direction;
the prediction module 32 is adapted to input the two-dimensional color image and the depth image into a deep learning model, and predict a graspable region included in the two-dimensional color image according to an output result;
a determining module 33, adapted to determine geometric characteristics of the graspable region according to the contour line of the graspable region, and determine the number of clamps corresponding to the graspable region and the clamp identifier of each clamp according to the geometric characteristics of the graspable region;
and the control module 34 is suitable for controlling the robot to call each clamp corresponding to the clamp identification in the clamp group to execute the grabbing operation.
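As a rough illustration of how these four modules cooperate, the sketch below models the apparatus as a single Python class with one method per module. The class name, method names and signatures are assumptions made for illustration only; the disclosure does not prescribe any particular software structure.

```python
class ClampGroupController:
    """Apparatus sketch: one placeholder method per module of Fig. 3."""

    def acquire(self):
        """Acquisition module 31: return a 2D color image of the 3D object region
        and the depth image aligned with it along the preset depth direction."""
        raise NotImplementedError

    def predict(self, color_image, depth_image):
        """Prediction module 32: feed both images to the deep learning model and
        return the graspable regions predicted in the color image."""
        raise NotImplementedError

    def determine(self, graspable_region):
        """Determining module 33: derive geometric features from the region contour
        and return the number of clamps and their clamp identifiers."""
        raise NotImplementedError

    def control(self, robot, clamp_ids, graspable_region):
        """Control module 34: instruct the robot to actuate the clamps in the clamp
        group that match the returned clamp identifiers."""
        raise NotImplementedError
```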
Optionally, the determining module is specifically adapted to:
and drawing a maximum inscribed circle, a minimum inscribed circle, a maximum inscribed rectangle and/or a minimum inscribed rectangle of the graspable region according to the contour line of the graspable region.
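A minimal sketch of extracting such geometric features from a region contour with OpenCV is given below. It is one possible realization under assumptions of my own: the graspable region is a binary mask, the maximum inscribed circle is approximated by the peak of a distance transform, the principal-axis length is taken from a fitted ellipse, and the rectangle feature is the minimum-area enclosing rectangle returned by cv2.minAreaRect. None of these specific routines are mandated by the disclosure.

```python
import cv2
import numpy as np

def region_geometry(mask: np.ndarray) -> dict:
    """Extract simple geometric features of a graspable region (binary mask)."""
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contour = max(contours, key=cv2.contourArea)

    # Maximum inscribed circle: the largest distance-transform value inside the
    # region is its radius, and the location of that value is its center.
    dist = cv2.distanceTransform(mask.astype(np.uint8), cv2.DIST_L2, 5)
    radius = float(dist.max())
    center = np.unravel_index(int(dist.argmax()), dist.shape)

    # Principal-axis length of the region, taken from a fitted ellipse
    # (the contour is assumed to have at least five points).
    (_, _), (axis_a, axis_b), _ = cv2.fitEllipse(contour)

    # Minimum-area enclosing rectangle of the contour.
    (_, _), (rect_w, rect_h), _ = cv2.minAreaRect(contour)

    return {
        "inscribed_circle_radius": radius,
        "inscribed_circle_center": center,
        "principal_axis_length": float(max(axis_a, axis_b)),
        "rect_length": float(max(rect_w, rect_h)),
        "rect_width": float(min(rect_w, rect_h)),
    }
```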
Optionally, the determining module is specifically adapted to:
acquiring the radius of the maximum inscribed circle of the graspable region and/or acquiring the length of the principal axis of the graspable region;
and comparing the radius with a first radius threshold, and/or comparing the length of the principal axis with a principal-axis threshold, and determining the number of clamps corresponding to the graspable region and the clamp identifier of each clamp according to the comparison result.
Optionally, the determining module is specifically adapted to:
acquiring the radius of a maximum inscribed circle of the graspable area and/or acquiring the length of a maximum inscribed rectangle of the graspable area;
and comparing the radius with a second radius threshold value, and/or comparing the length of the maximum inscribed rectangle with a first length threshold value, and determining the number of clamps corresponding to the graspable region and the clamp identifier of each clamp according to the comparison result.
Optionally, the determining module is specifically adapted to:
acquiring the width of the maximum inscribed rectangle of the graspable region and/or acquiring the length of the maximum inscribed rectangle of the graspable region;
and comparing the width with a width threshold value, and/or comparing the length of the maximum inscribed rectangle with a second length threshold value, and determining the number of the clamps corresponding to the graspable region and the clamp identification of each clamp according to the comparison result.
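The three optional comparison rules above can be realized, for example, as simple threshold checks on the features returned by the geometry sketch shown earlier. The mapping below is an assumed, illustrative policy: the threshold values, the clamp identifiers and the idea of scaling the clamp count with region size are not specified by the disclosure.

```python
def select_clamps(features: dict,
                  radius_threshold: float = 60.0,
                  length_threshold: float = 180.0,
                  width_threshold: float = 90.0) -> tuple[int, list[str]]:
    """Map the geometric features of a graspable region to a clamp count and the
    clamp identifiers of a suction-cup clamp group (illustrative policy only)."""
    radius = features["inscribed_circle_radius"]
    length = features["rect_length"]
    width = features["rect_width"]

    if radius >= radius_threshold and length >= length_threshold and width >= width_threshold:
        # Large region: actuate all four suction cups of the group.
        return 4, ["cup_1", "cup_2", "cup_3", "cup_4"]
    if length >= length_threshold:
        # Long, narrow region: actuate two cups arranged along the long side.
        return 2, ["cup_1", "cup_2"]
    # Small region: a single suction cup is enough.
    return 1, ["cup_1"]
```

In practice the thresholds would be derived from the physical diameter and spacing of the suction cups in the clamp group.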
Optionally, the control module is specifically adapted to:
and acquiring a conversion relation between a camera coordinate system and a robot coordinate system, converting the position information of the graspable area corresponding to the camera coordinate system into the robot coordinate system according to the conversion relation, and outputting the converted position information of each graspable object and the clamp identifier to the robot.
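Converting a grasp position from the camera frame to the robot frame is typically done with a homogeneous transform obtained from hand-eye calibration. The sketch below assumes the conversion relation is available as a 4x4 matrix T_robot_camera; how that matrix is obtained, and how the result is transmitted to the robot, are outside the scope of this example.

```python
import numpy as np

def camera_to_robot(point_camera: np.ndarray, T_robot_camera: np.ndarray) -> np.ndarray:
    """Transform a 3D grasp point from the camera frame to the robot frame.

    point_camera   : (3,) XYZ position of the graspable object in the camera frame.
    T_robot_camera : (4, 4) homogeneous transform from the camera frame to the
                     robot frame, obtained beforehand by hand-eye calibration.
    """
    p = np.append(point_camera, 1.0)   # homogeneous coordinates
    # The converted position and the clamp identifiers are then output to the
    # robot controller together (transport layer not shown here).
    return (T_robot_camera @ p)[:3]
```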
Optionally, the clamp group includes: a plurality of suction cup clamps.
For the specific structure and working principle of each module, reference may be made to the description of the corresponding steps in the method embodiments, which will not be repeated here.
An embodiment of the present application provides a non-volatile computer storage medium in which at least one executable instruction is stored, and the executable instruction can cause a processor to perform the control method of the robot-based clamp group in any of the above method embodiments.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, and the specific embodiment of the present invention does not limit the specific implementation of the electronic device.
As shown in Fig. 4, the electronic device may include: a processor 402, a communication interface 404, a memory 406, and a communication bus 408.
Wherein:
the processor 402, communication interface 404, and memory 406 communicate with each other via a communication bus 408.
A communication interface 404 for communicating with network elements of other devices, such as clients or other servers.
The processor 402 is configured to execute the program 410, and may specifically perform the relevant steps in the above embodiments of the control method of the robot-based clamp group.
In particular, program 410 may include program code comprising computer operating instructions.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The electronic device comprises one or more processors, which may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
The memory 406 is used for storing a program 410. The memory 406 may comprise a high-speed RAM memory, and may also include a non-volatile memory, such as at least one disk memory.
The program 410 may be specifically configured to cause the processor 402 to perform the operations in the above-described method embodiments.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components in an electronic device according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second and third, etcetera does not indicate any ordering. These words may be interpreted as names.

Claims (16)

1. A control method of a robot-based clamp group, comprising:
acquiring a two-dimensional color image corresponding to a three-dimensional object area and a depth image corresponding to the two-dimensional color image along a preset depth direction;
inputting the two-dimensional color image and the depth image into a deep learning model, and predicting a graspable area contained in the two-dimensional color image according to an output result;
determining the geometric characteristics of the grippable region according to the contour line of the grippable region, and determining the number of the clamps corresponding to the grippable region and the clamp identification of each clamp according to the geometric characteristics of the grippable region;
and controlling the robot to call each clamp corresponding to the clamp identification in the clamp group to execute the grabbing operation.
2. The method of claim 1, wherein said determining geometric features of the graspable region from the outline of the graspable region comprises:
and drawing a maximum inscribed circle, a minimum inscribed circle, a maximum inscribed rectangle and/or a minimum inscribed rectangle of the graspable region according to the contour line of the graspable region.
3. The method of claim 2, wherein the determining the number of grippers corresponding to the grippable region and the gripper identity of each gripper according to the geometric features of the grippable region comprises:
acquiring the radius of the maximum inscribed circle of the grippable region and/or acquiring the length of the principal axis of the grippable region;
and comparing the radius with a first radius threshold value, and/or comparing the length of the principal axis with a principal-axis threshold value, and determining the number of the clamps corresponding to the grippable region and the clamp identification of each clamp according to the comparison result.
4. The method of claim 2, wherein the determining the number of grippers corresponding to the grippable region and the gripper identity of each gripper according to the geometric features of the grippable region comprises:
acquiring the radius of a maximum inscribed circle of the graspable area and/or acquiring the length of a maximum inscribed rectangle of the graspable area;
and comparing the radius with a second radius threshold value, and/or comparing the length of the maximum inscribed rectangle with a first length threshold value, and determining the number of clamps corresponding to the graspable region and the clamp identifier of each clamp according to the comparison result.
5. The method of claim 2, wherein the determining the number of grippers corresponding to the grippable region and the gripper identity of each gripper according to the geometric features of the grippable region comprises:
acquiring the width of the maximum inscribed rectangle of the graspable region and/or acquiring the length of the maximum inscribed rectangle of the graspable region;
and comparing the width with a width threshold value, and/or comparing the length of the maximum inscribed rectangle with a second length threshold value, and determining the number of the clamps corresponding to the graspable region and the clamp identification of each clamp according to the comparison result.
6. The method of any one of claims 1-5, wherein the controlling the robot to call each clamp corresponding to the clamp identification in the clamp group to execute the grabbing operation comprises:
and acquiring a conversion relation between a camera coordinate system and a robot coordinate system, converting the position information of the graspable area corresponding to the camera coordinate system into the robot coordinate system according to the conversion relation, and outputting the converted position information of each graspable object and the clamp identifier to the robot.
7. The method of any one of claims 1-6, wherein the clamp group comprises: a plurality of suction cup clamps.
8. A control device for a robot-based clamp group, comprising:
the acquisition module is suitable for acquiring a two-dimensional color image corresponding to a three-dimensional object area and a depth image corresponding to the two-dimensional color image along a preset depth direction;
the prediction module is suitable for inputting the two-dimensional color image and the depth image into a deep learning model and predicting a graspable area contained in the two-dimensional color image according to an output result;
the determining module is suitable for determining the geometric characteristics of the grippable region according to the contour line of the grippable region, and determining the number of the clamps corresponding to the grippable region and the clamp identification of each clamp according to the geometric characteristics of the grippable region;
and the control module is suitable for controlling the robot to call each clamp which is contained in the clamp group and corresponds to the clamp identification to execute the grabbing operation.
9. The apparatus of claim 8, wherein the determination module is specifically adapted to:
and drawing a maximum inscribed circle, a minimum inscribed circle, a maximum inscribed rectangle and/or a minimum inscribed rectangle of the graspable region according to the contour line of the graspable region.
10. The apparatus of claim 9, wherein the determination module is specifically adapted to:
acquiring the radius of the maximum inscribed circle of the grippable region and/or acquiring the length of the principal axis of the grippable region;
and comparing the radius with a first radius threshold value, and/or comparing the length of the principal axis with a principal-axis threshold value, and determining the number of the clamps corresponding to the grippable region and the clamp identification of each clamp according to the comparison result.
11. The apparatus of claim 9, wherein the determination module is specifically adapted to:
acquiring the radius of a maximum inscribed circle of the graspable area and/or acquiring the length of a maximum inscribed rectangle of the graspable area;
and comparing the radius with a second radius threshold value, and/or comparing the length of the maximum inscribed rectangle with a first length threshold value, and determining the number of clamps corresponding to the graspable region and the clamp identifier of each clamp according to the comparison result.
12. The apparatus of claim 9, wherein the determination module is specifically adapted to:
acquiring the width of the maximum inscribed rectangle of the graspable region and/or acquiring the length of the maximum inscribed rectangle of the graspable region;
and comparing the width with a width threshold value, and/or comparing the length of the maximum inscribed rectangle with a second length threshold value, and determining the number of the clamps corresponding to the graspable region and the clamp identification of each clamp according to the comparison result.
13. The apparatus according to any one of claims 8-12, wherein the control module is specifically adapted to:
and acquiring a conversion relation between a camera coordinate system and a robot coordinate system, converting the position information of the graspable area corresponding to the camera coordinate system into the robot coordinate system according to the conversion relation, and outputting the converted position information of each graspable object and the clamp identifier to the robot.
14. The apparatus of any one of claims 8-13, wherein the clamp group comprises: a plurality of suction cup clamps.
15. An electronic device, comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the control method of the robot-based clamp group according to any one of claims 1-7.
16. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the method of controlling a robot-based gripper set according to any one of claims 1-7.
CN202110167302.8A 2021-02-05 2021-02-05 Robot-based control method and device for clamp group Pending CN112802107A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110167302.8A CN112802107A (en) 2021-02-05 2021-02-05 Robot-based control method and device for clamp group

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110167302.8A CN112802107A (en) 2021-02-05 2021-02-05 Robot-based control method and device for clamp group

Publications (1)

Publication Number Publication Date
CN112802107A true CN112802107A (en) 2021-05-14

Family

ID=75814577

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110167302.8A Pending CN112802107A (en) 2021-02-05 2021-02-05 Robot-based control method and device for clamp group

Country Status (1)

Country Link
CN (1) CN112802107A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109102543A (en) * 2018-08-17 2018-12-28 深圳蓝胖子机器人有限公司 Object positioning method, equipment and storage medium based on image segmentation
CN109859277A (en) * 2019-01-21 2019-06-07 陕西科技大学 A kind of robotic vision system scaling method based on Halcon
CN110000783A (en) * 2019-04-04 2019-07-12 上海节卡机器人科技有限公司 Robotic vision grasping means and device
CN110342252A (en) * 2019-07-01 2019-10-18 芜湖启迪睿视信息技术有限公司 A kind of article automatically grabs method and automatic grabbing device
CN111754515A (en) * 2019-12-17 2020-10-09 北京京东尚科信息技术有限公司 Method and device for sequential gripping of stacked articles
CN111178250A (en) * 2019-12-27 2020-05-19 深圳市越疆科技有限公司 Object identification positioning method and device and terminal equipment
CN111844019A (en) * 2020-06-10 2020-10-30 安徽鸿程光电有限公司 Method and device for determining grabbing position of machine, electronic device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
意匠说机器人: "[Popular Science] Six Key Points You Must Know When Selecting a Robot Gripper" (「科普」选择机器人手爪必知的六大要点), pages 1 - 3, Retrieved from the Internet <URL:https://baijiahao.baidu.com/s?id=1584550903383151286> *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115476349A (en) * 2021-05-31 2022-12-16 梅卡曼德(北京)机器人科技有限公司 Clamp group checking method and device, electronic equipment and storage medium
WO2023083273A1 (en) * 2021-11-10 2023-05-19 梅卡曼德(北京)机器人科技有限公司 Grip point information acquisition method and apparatus, electronic device, and storage medium
CN116197885A (en) * 2021-11-28 2023-06-02 梅卡曼德(北京)机器人科技有限公司 Image data processing method, device, electronic equipment and storage medium
CN116197885B (en) * 2021-11-28 2023-11-24 梅卡曼德(北京)机器人科技有限公司 Image data filtering method, device, equipment and medium based on press-fit detection

Similar Documents

Publication Publication Date Title
US10124489B2 (en) Locating, separating, and picking boxes with a sensor-guided robot
JP7411932B2 (en) Automated package registration systems, devices, and methods
CN112802105A (en) Object grabbing method and device
US11780101B2 (en) Automated package registration systems, devices, and methods
CN108399639B (en) Rapid automatic grabbing and placing method based on deep learning
CN112802107A (en) Robot-based control method and device for clamp group
CN113351522B (en) Article sorting method, device and system
US9227323B1 (en) Methods and systems for recognizing machine-readable information on three-dimensional objects
CN112802093B (en) Object grabbing method and device
CN115330819B (en) Soft package segmentation positioning method, industrial personal computer and robot grabbing system
CN113284178B (en) Object stacking method, device, computing equipment and computer storage medium
CN113307042B (en) Object unstacking method and device based on conveyor belt, computing equipment and storage medium
CN114310892B (en) Object grabbing method, device and equipment based on point cloud data collision detection
WO2023092519A1 (en) Grabbing control method and apparatus, and electronic device and storage medium
CN113800270B (en) Robot control method and system for logistics unstacking
CN112802106A (en) Object grabbing method and device
CN114092428A (en) Image data processing method, image data processing device, electronic equipment and storage medium
CN116228854B (en) Automatic parcel sorting method based on deep learning
CN116175542B (en) Method, device, electronic equipment and storage medium for determining clamp grabbing sequence
CN114022341A (en) Acquisition method and device for acquisition point information, electronic equipment and storage medium
JP7021620B2 (en) Manipulators and mobile robots
CN113298866B (en) Object classification method and device
CN116197888B (en) Method and device for determining position of article, electronic equipment and storage medium
CN116175541B (en) Grabbing control method, grabbing control device, electronic equipment and storage medium
TWI768954B (en) Control method for object retrieval equipment, three-dimensional point cloud data processing method, automatic object retrieval system and control system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 1100, 1st Floor, No. 6 Chuangye Road, Shangdi Information Industry Base, Haidian District, Beijing 100085

Applicant after: MECH-MIND (BEIJING) ROBOTICS TECHNOLOGIES CO.,LTD.

Address before: 100085 1001, floor 1, building 3, No.8 Chuangye Road, Haidian District, Beijing

Applicant before: MECH-MIND (BEIJING) ROBOTICS TECHNOLOGIES CO.,LTD.
