CN114092428A - Image data processing method, image data processing device, electronic equipment and storage medium - Google Patents

Image data processing method, image data processing device, electronic equipment and storage medium

Info

Publication number
CN114092428A
Authority
CN
China
Prior art keywords
mask
image data
processed
information
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111338191.9A
Other languages
Chinese (zh)
Inventor
李辉
司林林
丁有爽
邵天兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mech Mind Robotics Technologies Co Ltd
Original Assignee
Mech Mind Robotics Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mech Mind Robotics Technologies Co Ltd filed Critical Mech Mind Robotics Technologies Co Ltd
Priority to CN202111338191.9A priority Critical patent/CN114092428A/en
Publication of CN114092428A publication Critical patent/CN114092428A/en
Priority to PCT/CN2022/131217 priority patent/WO2023083273A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30108Industrial image inspection
    • G06T2207/30164Workpiece; Machine component

Abstract

The application discloses an image data processing method and device, an electronic device and a storage medium. The image data processing method includes: receiving image data containing an article to be processed; identifying the article to be processed from the image data and generating a mask for it; performing morphological processing on the generated mask; processing the morphologically processed mask with a template matching algorithm based on a pre-stored template of the article to be processed; and acquiring a corrected mask and/or grasping point associated information of the article based on the result of the template matching algorithm. According to the invention, when an accurate article mask cannot be obtained, the inaccurate mask can be corrected to recover a mask that is as accurate as possible and, in turn, accurate grasping point information, effectively avoiding inaccurate grasping points caused by an inaccurate mask and the resulting failed grasps or dropped articles.

Description

Image data processing method, image data processing device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of automatic control of robot arms or grippers (program control, B25J), and more particularly, to an image data processing method and apparatus, an electronic device, and a storage medium.
Background
In recent years, industries such as e-commerce and express delivery have grown rapidly, creating good development opportunities for the logistics industry. With the maturing of technologies such as artificial intelligence and machine vision, automated equipment is used more and more in the most basic links of logistics, namely storage, transport and sorting, and replacing manual labor with industrial robots is a clear trend. As an important component of the industrial robot, the gripper is increasingly applied in the logistics, sorting, 3C and food-packing industries.
At present, automatic control of a robot arm or gripper generally relies on point cloud data of the object to be grasped. The usual approach is to collect point cloud data with a 3D camera, determine information such as trajectory points or grasping points of the robot arm motion from the point cloud, and then control the robot arm to perform the grasping operation at the proper position and along the proper trajectory based on that information. The robot arm operates in three-dimensional space and its motion trajectory is three-dimensional in length, width and height, so trajectory points and grasping points also carry three-dimensional information on the X, Y and Z axes, and the point cloud data acquired by the 3D camera contains this three-dimensional information. However, in some industrial scenarios it is difficult to obtain clear point cloud information for the objects to be grasped; for example, the cosmetics industry uses glass bottles, especially black glass bottles, to hold liquids. A glass bottle returns a weak light signal, its reflective material is easily disturbed by multiple reflections from surrounding objects, and the transparency of the glass produces a great deal of diffuse and multiply reflected data, so suitable point cloud data cannot easily be acquired. Specifically, in such an industrial scenario the point cloud of the object to be grasped may not be obtained at all, the collected point cloud may be incomplete, or the point cloud may otherwise be of poor quality, with the result that the robot cannot recognize the object to be grasped, or calculates an erroneous grasping point from the erroneous point cloud and uses it to perform grasping, so that the grasp fails or the bottle is even dropped. The prior art contains no scheme for automatically controlling a robot to perform grasping based on robot vision in such industrial scenarios with missing point clouds.
Disclosure of Invention
In view of the above, the present invention has been made to overcome the above problems or at least partially solve them. Specifically, first, the invention can acquire grasping point information for the object to be grasped from other image data, such as color pictures, when the point cloud of the object cannot be acquired, so that the robot or gripper can grasp the object directly from the grasping point information without relying on its point cloud, effectively solving the problem of grasping objects in an environment with missing point clouds. Second, the invention provides a method for correcting the mask and solving for two-dimensional grasping point information, so that when an accurate object mask cannot be obtained, the inaccurate mask can be corrected to recover a mask that is as accurate as possible and, in turn, accurate grasping point information, effectively avoiding inaccurate grasping points caused by an inaccurate mask and the resulting failed grasps or dropped objects. Third, the invention provides a method by which the robot automatically converts input two-dimensional grasping point information into three-dimensional grasping point information; it can automatically acquire, from the environmental characteristics of the object to be grasped, reference information capable of converting two-dimensional grasping points into three-dimensional ones, and acquire grasping point information based on that reference information for the robot to grasp.
All the solutions disclosed in the claims and in the description of the present application have one or more of the above-mentioned innovations and, accordingly, are capable of solving one or more of the above-mentioned technical problems. Specifically, the application provides an image data processing method, an image data processing device, an electronic device and a storage medium.
An image data processing method according to an embodiment of the present application includes:
receiving image data containing an article to be processed;
identifying an article to be processed from image data and generating a mask for the article to be processed;
performing morphological processing on the generated mask of the article to be processed;
processing the mask after morphological processing by using a template matching algorithm based on a template of a pre-stored object to be processed;
and acquiring the associated information of the correction mask and/or the grabbing point of the object to be processed based on the processing result of the template matching algorithm.
In certain embodiments, the morphological processing comprises morphological dilation processing.
In some embodiments, the matching algorithm comprises a shape-based matching algorithm.
In some embodiments, the grasping point includes a center point of the graspable region of the article.
In some embodiments, the identifying an item to be processed from image data and generating a mask for the item to be processed comprises: the image data is processed based on deep learning to identify an item to be processed and to generate a mask for the item to be processed.
In some embodiments, the grasp point association information includes two-dimensional information of the grasp point.
An image data processing apparatus according to an embodiment of the present application includes:
the image data receiving module is used for receiving image data containing an article to be processed;
the mask generation module is used for identifying an article to be processed from the image data and generating a mask of the article to be processed;
the mask processing module is used for performing morphological processing on the generated mask of the object to be processed;
and the processing module is used for processing the mask after morphological processing by using a template matching algorithm based on a template of the pre-stored object to be processed, and acquiring the correction mask and/or the grabbing point correlation information of the object to be processed based on the processing result of the template matching algorithm.
In certain embodiments, the morphological processing comprises morphological dilation processing.
In some embodiments, the matching algorithm comprises a shape-based matching algorithm.
In some embodiments, the grasping point includes a center point of the graspable region of the article.
In some embodiments, the identifying an item to be processed from image data and generating a mask for the item to be processed comprises: the image data is processed based on deep learning to identify an item to be processed and to generate a mask for the item to be processed.
In some embodiments, the grasp point association information includes two-dimensional information of the grasp point.
The electronic device of the embodiments of the present application includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the image data processing method of any of the above embodiments when executing the computer program.
The computer-readable storage medium of the embodiments of the present application has stored thereon a computer program that, when executed by a processor, implements the image data processing method of any of the embodiments described above.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic flowchart of a grasping point information acquisition method in a scene with poor point clouds according to some embodiments of the present disclosure;
FIG. 2 is a schematic flow chart diagram illustrating a method for obtaining information associated with a correction mask and a capture point according to some embodiments of the present disclosure;
FIG. 3 is a schematic flow chart diagram illustrating a method for converting two-dimensional grab point information to three-dimensional grab point information according to some embodiments of the present application;
FIG. 4 is a schematic illustration of an article to be grabbed and an article mask acquired using a non-dedicated deep learning network in accordance with certain embodiments of the present application;
fig. 5 is a schematic structural diagram of a grasping point information acquisition apparatus in a scene with poor point clouds according to some embodiments of the present application;
FIG. 6 is a schematic structural diagram of an apparatus for obtaining information associated with a correction mask and a capture point according to some embodiments of the present disclosure;
FIG. 7 is a schematic diagram of an apparatus for converting two-dimensional grab point information to three-dimensional grab point information according to some embodiments of the present application;
FIG. 8 is a schematic diagram of an electronic device according to some embodiments of the present application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 is a schematic flow chart of an article grabbing point information acquisition method according to an embodiment of the present invention, and as shown in fig. 1, the method includes:
step S100, acquiring a non-point cloud image containing an article to be grabbed;
step S110, processing the non-point cloud image to obtain a mask of the object to be grabbed;
step S120, processing a mask of an article to be grabbed to acquire grabbing point associated information, wherein the grabbing point associated information comprises information for calculating grabbing point information;
and step S130, acquiring grabbing point information for controlling the robot to grab the object to be grabbed based on the grabbing point associated information.
For step S100, in an industrial scenario a robot usually needs to grasp a large number of articles, so there may be a plurality of articles to be grasped rather than just one. As a preferred embodiment, the objects to be grasped may be a group of black cosmetic bottles placed in a material frame, and the robot needs to grasp the cosmetic bottles from the frame and carry them to another position. Because black glass returns a weak light signal, is easily disturbed by multiple reflections from surrounding objects, and produces a great deal of diffuse and multiply reflected data, a point cloud may not be obtainable when an industrial camera photographs the scene, and the 3D point cloud data normally required for robot control is difficult to acquire. The method of the invention is therefore particularly suitable for such a scenario: the object to be grasped is identified from image data other than point cloud data, the grasping point information is calculated, and the robot is controlled to perform grasping. The non-point cloud image data in this embodiment may preferably be a 2D color picture of the articles. Unlike a point cloud, a 2D color image can clearly show the objects it contains; even objects such as black transparent glass bottles, which yield no point cloud, can be clearly photographed and identified. The picture can be taken with an industrial camera: the group of articles to be grasped is placed below the vision sensor and photographed to acquire the image data of the articles to be grasped.
For step S110, the mask of the article may be calculated using any existing means. Preferably, the mask of the article may be obtained based on a deep learning network. Image recognition is a conventional application of deep learning networks, and there are general deep learning networks capable of performing image recognition in the prior art. As a specific embodiment, the object may be identified and the mask may be extracted using an existing deep learning network for identifying the object or extracting the mask of the object, and the network may be trained in advance in order to improve the accuracy of the identification. Inputting the acquired 2D color image into the deep learning network, processing the color image through the deep learning network, identifying the interested area of the image to be generated with a mask, generating the image mask in the area, and using the image mask to cover the original image area. Therefore, the mask of the object to be grabbed can be obtained on the basis of the color image.
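The following is a minimal sketch, purely for illustration, of how an off-the-shelf instance segmentation network could be used to identify articles and extract masks from a 2D color image as described above; the choice of torchvision's pre-trained Mask R-CNN, the file name and the 0.5 score/mask thresholds are assumptions of this example, not part of the disclosure.

import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Load a generic, pre-trained instance segmentation network (assumption:
# any off-the-shelf mask-producing network can play this role).
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = Image.open("items_to_grasp.png").convert("RGB")  # 2D color picture
with torch.no_grad():
    prediction = model([to_tensor(image)])[0]

# Keep confident detections and binarize their soft masks.
masks = []
for score, soft_mask in zip(prediction["scores"], prediction["masks"]):
    if score > 0.5:                                   # assumed confidence threshold
        masks.append((soft_mask[0] > 0.5).numpy().astype("uint8") * 255)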
For step S120, after the mask of the article is acquired, the grasping point associated information of the article can be calculated within the mask region. The grasping point associated information lacks part of the information required for complete grasping point information, and so cannot be used directly by the robot to perform grasping, but it can be used to calculate the grasping point information. As a specific embodiment, the grasping point associated information may be two-dimensional information of the grasping point, such as X-axis and Y-axis coordinates. Assuming the control system in use needs the X-axis, Y-axis and Z-axis information of the grasping point before it can control the grasping position of the robot (depending on the gripper used, more information may be needed in a real industrial scene, such as the rotation angle of the gripper), grasping cannot be performed from the two-dimensional grasping point information alone, and that information must be converted into three-dimensional grasping point information in a subsequent step. The specific position of the grasping point depends on the gripper used and the object to be grasped; for example, when the object to be grasped is a black cosmetic bottle, the center point of the bottle mouth can be selected as the grasping point.
When an off-the-shelf deep learning network that can identify articles in a variety of situations processes a photograph, identifies an article in it and extracts the article mask, the resulting mask typically does not fit the photographed article exactly: sometimes it slightly exceeds the grippable region of the article, sometimes it covers slightly less than the grippable region, and its shape generally does not coincide with the grippable region. Fig. 4 shows an object to be grasped and a typical mask generated for it by a deep learning network; specifically, fig. 4 shows the scene in which the object to be grasped is the cosmetic bottle described above, the circular portion being the bottle-mouth region of the glass bottle, i.e. the grasping region for the robot arm. The shaded portion is the mask extracted by a general deep learning network. It can be seen that in this case the mask of the object is inaccurate, the calculated grasping point is naturally inaccurate as well, and an inaccurate grasping point can mean that the bottle cannot be grasped or is dropped during grasping. One solution is to design a deep learning network dedicated to this scenario and to improve the accuracy of its output by repeated training, so that accurate masks and grasping points are obtained when an image is input into it. Another solution is to process the inaccurate mask and obtain the grasping point information from the processed mask.
The present invention preferably uses the latter solution; specifically, the applicant has developed a method for correcting the mask and acquiring grasping points that is relatively low in cost and relatively general, and this is one of the key points of the present invention.
Fig. 2 is a flowchart illustrating an image data processing method for mask correction and acquisition of information related to a capture point according to an embodiment of the present invention. As shown in fig. 2, the method includes:
step S200, receiving image data containing an article to be processed;
step S210, identifying an article to be processed from image data and generating a mask of the article to be processed;
step S220, performing morphological processing on the generated mask of the object to be processed;
step S230, further processing the mask after the morphological processing to obtain the correction mask and/or the information related to the grabbing point of the object to be processed.
In step S200, the present embodiment does not limit the type of image data, and any type of image data may be applied to the present embodiment as long as the object included therein can be identified and the mask can be generated by the existing deep learning algorithm. Preferably, the present embodiment may acquire image data, which may be a 2D color map, in a similar manner to step S100.
For step S210, a mask for the article to be processed may be identified and generated in a manner similar to step S120.
For step S220, as shown in fig. 4, in an object mask obtained with a general deep learning network, the mask region usually cannot fit the object outline perfectly (the grasping region in fig. 4 is the region where the bottle mouth is located, and it is easy to see that the difference between the mask and the actual bottle mouth is large), it may be skewed, and there may be many holes inside the mask region. For general object recognition applications this matters little, but when the invention is applied to the industrial scene of grasping objects the precision requirement is high and such errors cannot be tolerated, so the acquired mask needs to be processed. First, morphological processing is performed to change the shape of the mask region. In one embodiment, the morphological processing may be dilation. After the two-dimensional image information is acquired, dilation is applied to the image to fill in defects such as missing or irregular areas. For example, for each pixel point on the mask, a certain number of surrounding points, e.g. 8-25 points, may be set to the same color as that point. This step amounts to filling in the area around every pixel, so if part of the object mask is missing, the operation fills in the missing part; after such processing the object mask becomes complete, the mask as a whole becomes slightly "fatter" because of the dilation, and an appropriate amount of dilation helps the subsequent image processing operations.
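As a minimal sketch of this dilation step, assuming OpenCV and a binary (0/255) mask image; the 5x5 kernel and single iteration are example values only.

import cv2
import numpy as np

def dilate_mask(mask: np.ndarray, kernel_size: int = 5) -> np.ndarray:
    """Fill small holes and gaps in a binary mask by morphological dilation.

    kernel_size=5 means each foreground pixel also paints its 5x5
    neighbourhood (24 surrounding points), roughly matching the 8-25 point
    range discussed above; the exact value is an assumption.
    """
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    return cv2.dilate(mask, kernel, iterations=1)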
As an example, in step S230, the present invention discloses three processing manners to obtain the correction mask of the article and obtain the information related to the grabbing point.
The first processing mode comprises the following steps:
step S240, acquiring a circumscribed rectangle of the mask subjected to expansion processing;
step S241, generating an inscribed circle of the circumscribed rectangle based on the circumscribed rectangle of the mask;
and step S242, acquiring the correction mask and/or grabbing point correlation information of the object to be processed based on the inscribed circle.
For step S240, an arbitrary circumscribed rectangle algorithm may be used to find the circumscribed rectangle of the mask. As a specific implementation, the X and Y coordinate values of each pixel point in the mask are calculated, and the minimum X value, the minimum Y value, the maximum X value and the maximum Y value are selected; next, these 4 values are combined into the coordinates of points, that is, the minimum X value and the minimum Y value form the coordinate (Xmin, Ymin), the maximum X and Y values form the coordinate (Xmax, Ymax), the minimum X value and the maximum Y value form the coordinate (Xmin, Ymax), and the maximum X value and the minimum Y value form the coordinate (Xmax, Ymin). Taking the points (Xmin, Ymin), (Xmax, Ymax), (Xmin, Ymax), (Xmax, Ymin) as the 4 corner points of the circumscribed rectangle and connecting them in sequence yields the circumscribed rectangle.
In step S241, the key point of the present invention is that the inscribed circle algorithm is used as one link in calculating the corrected mask and grasping point information; no improvement is made to the inscribed circle algorithm itself, so the specific inscribed circle algorithm is not limited and any inscribed circle algorithm can be used in the present invention.
For step S242, after the inscribed circle is acquired, the mask portion enclosed by the inscribed circle can be calculated and used as the corrected mask for the object to be processed. The outline of the corrected mask obtained this way has the same shape and size as the inscribed circle. The position of the center of the inscribed circle is then calculated on this corrected mask, and the center information is taken as the grasping point associated information; specifically, the two-dimensional position of the center, for example its X-axis and Y-axis coordinates, can be used as the X-axis and Y-axis position information of the grasping point.
With the corrected mask obtained in this way, the mask shape is kept consistent with the shape of the bottle mouth, but the covered area may still differ from the actual area of the bottle mouth.
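A minimal sketch of this first processing mode, assuming OpenCV/NumPy and a binary dilated mask; here the inscribed circle of the axis-aligned circumscribed rectangle is taken as the circle centred on the rectangle centre with radius half the shorter side.

import cv2
import numpy as np

def correct_mask_inscribed_circle(dilated_mask: np.ndarray):
    """Circumscribed rectangle -> inscribed circle -> corrected mask and 2D grasp point."""
    ys, xs = np.nonzero(dilated_mask)
    x_min, x_max = xs.min(), xs.max()          # corners (Xmin, Ymin) ... (Xmax, Ymax)
    y_min, y_max = ys.min(), ys.max()

    # Inscribed circle of the axis-aligned circumscribed rectangle.
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    radius = min(x_max - x_min, y_max - y_min) / 2.0

    circle = np.zeros_like(dilated_mask)
    cv2.circle(circle, (int(round(cx)), int(round(cy))), int(radius), 255, -1)
    corrected_mask = cv2.bitwise_and(dilated_mask, circle)

    # The circle centre supplies the X/Y grasping point associated information.
    return corrected_mask, (cx, cy)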
The second processing mode comprises the following steps:
step S250, processing the mask after expansion processing by using a circle detection algorithm;
and step S251, acquiring the correction mask and/or grabbing point correlation information of the to-be-processed object based on the processing result of the circle detection algorithm.
For step S250, the circle detection algorithm, also called a circle-finding algorithm, can be used to detect circular features in an irregular figure and find the circles it contains. Commonly used algorithms include the circular Hough transform and random circle detection. This embodiment focuses on finding a circle in the morphologically processed mask with a circle detection algorithm and does not limit which circle detection algorithm is used. Since the bottle mouth is circular and the acquired mask contains part of the bottle-mouth shape, the circle found in the mask region is roughly the position of the bottle mouth.
In step S251, after the circle in the mask is found by the circle detection algorithm, the mask portion surrounded by the circle can be used as the corrected mask for the object to be processed. The center of the circle is then calculated and its information is taken as the grasping point associated information. As before, the grasping point associated information may be the two-dimensional position of the center, used as the X-axis and Y-axis position information of the grasping point.
The second method does not need to construct a circumscribed rectangle and an inscribed circle that are not present in the original image; it only needs to search for the circular portion within the existing mask region, so its accuracy is higher than that of the first method.
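A minimal sketch of this second processing mode using OpenCV's circular Hough transform; all HoughCircles parameters shown are assumed example values that would need tuning for a real camera and bottle size.

import cv2
import numpy as np

def correct_mask_hough_circle(dilated_mask: np.ndarray):
    """Find the (roughly circular) bottle-mouth region inside the dilated mask."""
    blurred = cv2.GaussianBlur(dilated_mask, (9, 9), 2)
    circles = cv2.HoughCircles(
        blurred, cv2.HOUGH_GRADIENT, dp=1, minDist=50,
        param1=100, param2=20, minRadius=10, maxRadius=200)  # assumed parameters
    if circles is None:
        return None, None

    cx, cy, radius = circles[0, 0]                 # strongest detected circle
    circle_mask = np.zeros_like(dilated_mask)
    cv2.circle(circle_mask, (int(cx), int(cy)), int(radius), 255, -1)
    corrected_mask = cv2.bitwise_and(dilated_mask, circle_mask)

    # The detected centre is used as the 2D grasping point associated information.
    return corrected_mask, (float(cx), float(cy))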
The third processing mode comprises the following steps:
step S260, processing the mask subjected to expansion processing by using a template matching algorithm based on a template of a pre-stored object to be processed;
step S261, obtaining correction mask and/or capture point related information of the to-be-processed item based on the processing result of the template matching algorithm.
For step S260, the template of the object to be processed may be a template of the entire object or a template of its grasping region; for example, in the scenario in which the object to be grasped is a black glass cosmetic bottle and the gripper needs to grasp the bottle mouth, the template may model the whole three-dimensional bottle or only the grippable bottle-mouth region. After the uncorrected mask region is obtained, the pre-stored template is matched within the mask region using a matching algorithm. Put simply, the template corresponds to a known small image, and template matching corresponds to searching for that object in a larger image that contains it: the object to be found is known and has the same size, orientation and image elements as the template, and the template matching algorithm can locate the object, i.e. the small image, in the larger image and determine its pose. This embodiment does not limit the specific matching algorithm; since the mask itself may have lost color information, the emphasis is on matching by shape rather than by color, so the invention preferably uses a shape-based matching algorithm. In addition, considering both matching efficiency and accuracy, a match can be considered successful when the shape similarity reaches 70-95%; the specific value can be chosen and adjusted according to the requirements of the actual application scene.
In step S261, after finding the shape matching the pre-stored template from the mask, the mask surrounded by the shape may be used as a correction mask, and the information related to the capture point may be further calculated. Due to the adoption of the template, no matter what shape the object to be grabbed presents in the area to be grabbed, the object can be matched with and grabbed, and the method is not limited to the scene that the grabbing area of the object to be grabbed is circular. Accordingly, the position of the gripping point is different when gripping different articles. In one embodiment, the grabbing point may be a central point of a correction mask of the object to be grabbed, and the grabbing point association information may be two-dimensional position information of the central point, that is, X-axis position information and Y-axis position information of the grabbing point.
In engineering practice, the inventor finds that the third method can reach a higher standard in both precision and operation speed, and is the best of the three implementation modes of the invention, and the first two methods are only applicable to industrial scenes in which the area to be grabbed is circular in nature, while the third method can be used for grabbing any article, so the third implementation mode is also one of the key points of the invention.
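A minimal sketch of the third processing mode; cv2.matchTemplate is used here only as a simple stand-in for the shape-based matcher preferred above (it is not rotation-invariant), and the binary bottle-mouth template and the 0.7 similarity threshold are assumptions of this example.

import cv2
import numpy as np

def correct_mask_by_template(dilated_mask: np.ndarray,
                             template_mask: np.ndarray,
                             min_similarity: float = 0.7):
    """Locate a pre-stored binary template of the grippable region inside the mask.

    min_similarity=0.7 reflects the 70%-95% range mentioned above and is
    otherwise an assumed value.
    """
    result = cv2.matchTemplate(dilated_mask, template_mask, cv2.TM_CCOEFF_NORMED)
    _, best_score, _, best_loc = cv2.minMaxLoc(result)
    if best_score < min_similarity:
        return None, None                          # no sufficiently similar shape found

    th, tw = template_mask.shape[:2]
    x, y = best_loc
    corrected_mask = np.zeros_like(dilated_mask)
    corrected_mask[y:y + th, x:x + tw] = cv2.bitwise_and(
        dilated_mask[y:y + th, x:x + tw], template_mask)

    # Centre of the matched region as the 2D grasping point associated information.
    return corrected_mask, (x + tw / 2.0, y + th / 2.0)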
In step S130, as described above, the grasping point associated information lacks part of the information required for complete grasping point information and cannot be used directly by the robot to perform grasping, but it can be used to calculate the grasping point information. In one embodiment, the grasping point associated information may be the two-dimensional information of the grasping point, while the grasping point information required by the robot is three-dimensional; for the robot to use it, the two-dimensional grasping point associated information must be turned into three-dimensional information, and the object is then grasped based on that three-dimensional information. When three-dimensional grasping point data cannot be obtained directly, one way to convert two-dimensional data into three-dimensional data is to input the third-dimension data of the object to be grasped manually. For example, when the object to be grasped is a black glass cosmetic bottle for which three-dimensional grasping point information cannot be obtained, its height can be entered manually in advance; after the two-dimensional image has been processed by the present scheme and the two-dimensional grasping point information obtained, that information can be converted into three-dimensional grasping point information for the robot using the height information. This method, however, requires height information to be entered manually before grasping, whereas many robot-based application scenarios aim precisely to reduce human involvement; moreover, with manually entered height information the material frame holding the bottles must always stay in the same position, and if the frame is placed elsewhere the height changes and must be re-entered, otherwise correct grasping is impossible. It is therefore desirable to handle this in a more automated way.
In order to realize such an automated processing mode, the applicant proposes a method for converting two-dimensional grasping point information into three-dimensional grasping point information without human intervention, and this is one of the key points of the present invention.
Fig. 3 shows a flow chart of a method for converting two-dimensional grab point information into three-dimensional grab point information according to an embodiment of the invention. As shown in fig. 3, the method includes:
step S300, acquiring reference object information of an object to be grabbed;
step S310, acquiring two-dimensional grabbing point information of an article to be grabbed;
step S320, processing reference object information of the object to be grabbed to acquire reference information, wherein the reference information comprises information which does not exist in the two-dimensional grabbing point information and can convert the two-dimensional information into three-dimensional information;
and step S330, generating three-dimensional grabbing point information of the object to be grabbed based on the reference information and the two-dimensional grabbing point information of the object to be grabbed.
As for step S300, the invention is applied to scenarios in which the point cloud of the object to be grasped is poor, so the reference object should be an object with a qualified point cloud. In the present invention, a qualified point cloud is one from which the dimension missing from the object's two-dimensional grasping point information can be obtained; for example, when the X-axis and Y-axis information of the grasping point is available, any object whose Z-axis information can be identified may serve as the reference object. The reference object may be an object close to the object to be grasped, or other similar objects to be grasped placed together with it. In particular, for industrial scenes in which a large number of objects to be grasped, for example cosmetic bottles, are placed in a material frame, the frame is suitable as a reference object, since its point cloud is usually complete and, as long as the frame is not strongly deformed, its height at every position is consistent, i.e. at every position the height of the material frame is the same as the height of the object to be grasped or differs from it by a fixed amount. Therefore, when the point cloud of the object to be grasped cannot be acquired in this scene, the 2D color image data of the object and the point cloud data identifying the frame can be acquired for the subsequent steps. In other embodiments, because the objects to be grasped lie in different positions when the camera takes the picture, the point cloud quality of each object differs within the overall point cloud data captured from a given position; suitable point cloud data may not be obtainable for some of the objects but may be obtainable for others. In this case, an object to be grasped with a better point cloud can be selected as the reference object for the other objects.
Point cloud information can be acquired with a 3D industrial camera. A 3D industrial camera is generally equipped with two lenses that capture the group of objects to be grasped from different angles, and after processing, a three-dimensional image of the objects can be presented. The group of objects to be grasped is placed below the vision sensor and the two lenses photograph it simultaneously; from the relative pose parameters of the two resulting images, a general binocular stereo vision algorithm calculates the X, Y and Z coordinate values and coordinate directions of every point, converting the scene into point cloud data of the group of objects to be grasped. In a specific implementation, the point cloud may also be generated using elements such as visible-light detectors (e.g. laser detectors and LEDs), infrared detectors and radar detectors.
As an example, a two-dimensional color map corresponding to a three-dimensional object region and a depth map corresponding to the two-dimensional color map may also be acquired in a depth direction perpendicular to the object. The two-dimensional color image corresponds to an image of a plane area vertical to a preset depth direction; each pixel point in the depth map corresponding to the two-dimensional color image corresponds to each pixel point in the two-dimensional color image one by one, and the value of each pixel point is the depth value of the pixel point. In one embodiment, the acquired reference object information may be a point cloud of the reference object or a depth map of the reference object.
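For illustration, a depth map of this kind can be back-projected into 3D points through the camera intrinsics; fx, fy, cx and cy below are assumed intrinsic parameters of the 3D camera and are not specified in the disclosure.

import numpy as np

def depth_map_to_points(depth: np.ndarray, fx: float, fy: float,
                        cx: float, cy: float) -> np.ndarray:
    """Back-project a depth map (in metres) into an H x W x 3 array of XYZ points."""
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    x = (us - cx) * depth / fx
    y = (vs - cy) * depth / fy
    return np.dstack((x, y, depth))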
For step S310, the present embodiment needs to acquire two-dimensional grasp point information, but the emphasis is not on the acquisition method, and thus the specific method of acquiring the information is not limited. Preferably, the two-dimensional grab point information may be acquired by using the method for acquiring the grab point related information in any of the foregoing embodiments.
For step S320, taking as an example an industrial scene in which a number of black glass cosmetic bottles to be grasped are arranged in a material frame, after the point cloud of the whole object group is obtained, clear point clouds of the reference object can be identified from it, for example the point cloud of the material frame or the point clouds of the bottle mouths of those bottles whose point clouds are clear. The identified point cloud is then processed to extract its height information as the reference information. Although this embodiment takes the case where the information missing from the two-dimensional grasping point information is height information, those skilled in the art will understand that the reference information need not be height information when the missing information is something else.
For step S330, if the point cloud of a bottle mouth is used, then since the bottles in the material frame are of the same type, its height equals the bottle-mouth height of all the bottles; after the height information of the bottle mouth is obtained, it is combined with the two-dimensional grabbing point information to obtain three-dimensional grabbing point information, and the fixture can then perform grabbing based on that information. If a point cloud of the material frame is used, the height obtained from the point cloud may or may not be the same as that of the bottles. If the heights differ, an adjustment value can be preset according to the height difference between the two. After the height information of the material frame and the two-dimensional grabbing point information are obtained, the three-dimensional grabbing point information can be determined by combining them with the adjustment value. For example, if the height of the frame is 10 cm and the adjustment value is -2 cm, the height of the bottle mouth is 10 - 2 = 8 cm, and combining this height with the X-axis and Y-axis information of the grabbing point gives the three-dimensional grabbing point information.
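Putting the pieces together, a minimal sketch of this 2D-to-3D conversion, assuming the reference height is read from a depth map region of the reference object (the material frame rim or a bottle mouth with a clear point cloud) and that the pre-set adjustment value corresponds to the -2 cm offset in the example above.

import numpy as np

def to_3d_grasp_point(grasp_xy, reference_depth: np.ndarray,
                      reference_mask: np.ndarray, height_adjustment: float = 0.0):
    """Combine a 2D grasp point with a reference height to form a 3D grasp point.

    reference_depth   : depth map aligned with the 2D color image (assumption).
    reference_mask    : binary mask of the reference object region.
    height_adjustment : pre-set offset between reference and grasp height,
                        e.g. -0.02 m for the 10 cm - 2 cm = 8 cm example above.
    """
    # Representative height of the reference object (median is robust to noise).
    z_reference = float(np.median(reference_depth[reference_mask > 0]))
    z_grasp = z_reference + height_adjustment

    x, y = grasp_xy
    return (float(x), float(y), z_grasp)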
The robot or the fixture in the above embodiments may include various types of universal fixtures, which are standardized in structure and have a wide application range, such as three-jaw chucks and four-jaw chucks for lathes, flat tongs and index heads for milling machines, and the like. For another example, the clamp may be divided into a manual clamp, a pneumatic clamp, a hydraulic clamp, a pneumatic-hydraulic linkage clamp, an electromagnetic clamp, a vacuum clamp, or other bionic devices capable of picking up objects according to the clamping power source used by the clamp. The present invention does not limit the specific type of the clamp as long as the article grasping operation can be achieved.
In addition, it should be noted that although each embodiment of the present invention has a specific combination of features, further combinations and cross-combinations of these features between embodiments are also possible.
According to the above embodiments, first, the grabbing point information of the object to be grabbed can be obtained from other image data such as color pictures when the point cloud of the object cannot be obtained, so that the robot or gripper can grab the object directly from the grabbing point information without relying on its point cloud, effectively solving the problem of grabbing objects in an environment with missing point clouds. Second, the invention provides three methods for correcting the mask and solving for the two-dimensional grabbing point information, so that when an accurate object mask cannot be obtained, the inaccurate mask can be corrected to recover, as far as possible, an accurate mask and accurate grabbing point information, effectively avoiding inaccurate grabbing points caused by an inaccurate mask and the resulting failed grabs or dropped objects. Third, the invention provides a method by which the robot automatically converts input two-dimensional grabbing point information into three-dimensional grabbing point information: it can automatically acquire, from the environmental characteristics of the object to be grabbed, reference information capable of converting two-dimensional grabbing points into three-dimensional ones, and acquire grabbing point information based on that reference information for the robot to grab.
Fig. 5 shows a grasp point information acquisition apparatus according to still another embodiment of the present invention, the apparatus including:
an image acquisition module 400, configured to acquire a non-point cloud image including an object to be grabbed, that is, to implement step S100;
a mask generation module 410, configured to process the non-point cloud image to obtain a mask of the object to be grabbed, that is, to implement step S110;
the mask processing module 420 is configured to process a mask of an object to be grabbed to obtain grabbing point association information, that is, to implement step S120;
and a grabbing point information generating module 430, configured to obtain grabbing point information used for controlling the robot to grab an article to be grabbed based on the grabbing point related information, that is, to implement step S130.
Fig. 6 shows an image data processing apparatus according to still another embodiment of the present invention, the apparatus including:
an image data receiving module 500, configured to receive image data including an article to be processed, that is, to implement step S200;
a mask generation module 510, configured to identify an article to be processed from the image data, and generate a mask of the article to be processed, that is, to implement step S210;
a mask processing module 520, configured to perform morphological processing on the generated mask of the to-be-processed object, that is, to implement step S220;
the processing module 530 is configured to further process the morphologically processed mask to obtain information related to the correction mask and/or the grabbing point of the to-be-processed object, that is, to implement step S230.
Fig. 7 shows a grasp point information acquisition apparatus according to still another embodiment of the present invention, the apparatus including:
a reference object information obtaining module 600, configured to obtain reference object information of an object to be grabbed, that is, to implement step S300;
a two-dimensional information obtaining module 610, configured to obtain two-dimensional grabbing point information of an article to be grabbed, that is, to implement step S310;
a reference information obtaining module 620, configured to process reference object information of the object to be grabbed, and obtain reference information, where the reference information includes information that the two-dimensional grabbing point information does not have, and can convert two-dimensional information into three-dimensional information, that is, is used to implement step S320;
a grabbing point information generating module 630, configured to generate three-dimensional grabbing point information of the to-be-grabbed item based on the reference information and the two-dimensional grabbing point information of the to-be-grabbed item, that is, to implement step S330.
In the device embodiments shown in fig. 5 to fig. 7, only the main functions of the modules are described, all the functions of each module correspond to the corresponding steps in the method embodiment, and the working principle of each module may also refer to the description of the corresponding steps in the method embodiment, which is not described herein again. In addition, although the correspondence between the functions of the functional modules and the method is defined in the above embodiments, it can be understood by those skilled in the art that the functions of the functional modules are not limited to the correspondence, that is, a specific functional module can also implement other method steps or a part of the method steps. For example, the above embodiment describes the method of implementing step S330 by the capture point information generating module 630, however, the capture point information generating module 630 may also be used to implement the method or part of the method of step S300, S310 or S320 according to the needs of practical situations.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of the above embodiments. It should be noted that the computer program stored in the computer-readable storage medium of the embodiments of the present application may be executed by a processor of an electronic device, and the computer-readable storage medium may be a storage medium built in the electronic device or a storage medium that can be plugged into the electronic device in an attachable and detachable manner.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, where the electronic device may be a control system/electronic system configured in an automobile, a mobile terminal (e.g., a smart mobile phone, etc.), a personal computer (PC, e.g., a desktop computer or a notebook computer, etc.), a tablet computer, a server, and the like, and a specific implementation of the electronic device is not limited by the specific embodiment of the present invention.
As shown in fig. 8, the electronic device may include: a processor (processor)1202, a communication Interface 1204, a memory 1206, and a communication bus 1208.
Wherein:
the processor 1202, communication interface 1204, and memory 1206 communicate with one another via a communication bus 1208.
A communication interface 1204 for communicating with network elements of other devices, such as clients or other servers.
The processor 1202 is configured to execute the program 1210, and may specifically perform the relevant steps in the foregoing method embodiments.
In particular, program 1210 may include program code comprising computer operating instructions.
The processor 1202 may be a central processing unit CPU, or an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement an embodiment of the present invention. The electronic device comprises one or more processors, which can be the same type of processor, such as one or more CPUs; or may be different types of processors such as one or more CPUs and one or more ASICs.
The memory 1206 is used for storing programs 1210. The memory 1206 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The program 1210 may be downloaded and installed from a network through the communication interface 1204, and/or installed from a removable medium. The program, when executed by the processor 1202, may cause the processor 1202 to perform the operations of the above-described method embodiments. Broadly, the inventive content of the invention comprises:
a method for acquiring grab point information comprises the following steps:
acquiring a non-point cloud image containing an article to be grabbed;
processing the non-point cloud image to obtain a mask of the object to be grabbed;
processing a mask of an article to be grabbed to acquire grabbing point associated information;
and acquiring grabbing point information for controlling the robot to grab the object to be grabbed based on the grabbing point correlation information.
Optionally, the non-point cloud image comprises a color image.
Optionally, the processing the non-point cloud image includes processing the non-point cloud image based on deep learning.
Optionally, the processing of the non-point cloud image based on deep learning includes pre-constructing a deep learning network, and inputting the non-point cloud image into the deep learning network for processing.
Optionally, the processing of the mask of the object to be grabbed includes performing morphological processing on the mask of the object to be grabbed.
Optionally, the information related to the grabbing point includes two-dimensional information of the grabbing point.
Optionally, the gripping point comprises a centre point of the grippable region of the article.
Optionally, the capture point information includes three-dimensional information of the capture point.
Optionally, the method includes acquiring, based on the grabbing point associated information, grabbing point information for controlling the robot to grab the object to be grabbed, including preset missing dimension information of the grabbing point associated information, and acquiring, based on the grabbing point associated information and the preset missing dimension information of the grabbing point associated information, grabbing point information for controlling the robot to grab the object to be grabbed.
A grasp point information acquisition apparatus comprising:
the image acquisition module is used for acquiring a non-point cloud image containing an article to be grabbed;
the mask generation module is used for processing the non-point cloud image to obtain a mask of the object to be grabbed;
the mask processing module is used for processing a mask of an article to be grabbed so as to acquire the associated information of the grabbing point;
and the grabbing point information generating module is used for acquiring grabbing point information used for controlling the robot to grab the object to be grabbed based on the grabbing point associated information.
Optionally, the non-point cloud image comprises a color image.
Optionally, the mask generation module processes the non-point cloud image based on deep learning.
Optionally, a deep learning network is pre-constructed, and the mask generation module inputs the non-point cloud image into the deep learning network for processing.
Optionally, the mask processing module performs morphological processing on a mask of the object to be grabbed.
Optionally, the information related to the grabbing point includes two-dimensional information of the grabbing point.
Optionally, the gripping point comprises a centre point of the grippable region of the article.
Optionally, the capture point information includes three-dimensional information of the capture point.
Optionally, the missing dimension information of the grabbing point associated information is preset, and the grabbing point information generating module acquires grabbing point information for controlling the robot to grab the object to be grabbed based on the grabbing point associated information and the missing dimension information of the preset grabbing point associated information.
A method for acquiring grab point information comprises the following steps:
acquiring reference object information of an object to be grabbed;
acquiring two-dimensional grabbing point information of an article to be grabbed;
processing reference object information of the object to be grabbed to obtain reference information, wherein the reference information comprises information which is not contained in the two-dimensional grabbing point information and which enables the two-dimensional information to be converted into three-dimensional information;
and generating three-dimensional grabbing point information of the object to be grabbed based on the reference information and the two-dimensional grabbing point information of the object to be grabbed.
Optionally, the reference object information includes a point cloud and/or a depth map of the reference object.
Optionally, the reference object has a qualified point cloud.
Optionally, the reference object comprises other objects to be grabbed and/or a material frame.
Optionally, the two-dimensional grab point information includes X-axis information and Y-axis information of the grab point.
Optionally, the reference information includes Z-axis information.
Optionally, the generating of the three-dimensional grabbing point information of the object to be grabbed based on the reference information and the two-dimensional grabbing point information includes: presetting a reference information adjustment value, adjusting the reference information by using the reference information adjustment value, and then generating the three-dimensional grabbing point information of the object to be grabbed based on the adjusted reference information and the two-dimensional grabbing point information of the object to be grabbed.
Optionally, the grabbing point comprises a center point of the grippable region of the article.
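A minimal sketch of this 2D-to-3D completion, assuming a pinhole camera model: the Z value is read from the reference object's depth map (the reference with a qualified point cloud), optionally shifted by a preset adjustment value, and the X and Y axes are back-projected from the two-dimensional grabbing point. The intrinsics fx, fy, cx0, cy0 and the adjustment dz are illustrative parameters, not values from this application.

```python
import numpy as np

def lift_to_3d(u, v, ref_depth: np.ndarray, fx, fy, cx0, cy0, dz=0.0):
    """(u, v): two-dimensional grab point in pixels; ref_depth: depth map of a
    reference object (e.g. the material frame or a neighbouring item);
    dz: preset adjustment applied to the reference Z value."""
    z = float(np.median(ref_depth[ref_depth > 0])) + dz   # Z taken from the reference, then adjusted
    x = (u - cx0) * z / fx                                 # back-project with the pinhole model
    y = (v - cy0) * z / fy
    return x, y, z
```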
A grasp point information acquisition apparatus comprising:
the reference object information acquisition module is used for acquiring reference object information of an object to be grabbed;
the two-dimensional information acquisition module is used for acquiring two-dimensional grabbing point information of an article to be grabbed;
the reference information acquisition module is used for processing reference object information of the object to be grabbed to acquire reference information, wherein the reference information comprises information which is not contained in the two-dimensional grabbing point information and which enables the two-dimensional information to be converted into three-dimensional information;
and the grabbing point information generating module is used for generating three-dimensional grabbing point information of the object to be grabbed based on the reference information and the two-dimensional grabbing point information of the object to be grabbed.
Optionally, the reference object information includes a point cloud and/or a depth map of the reference object.
Optionally, the reference object has a qualified point cloud.
Optionally, the reference object comprises other objects to be grabbed and/or a material frame.
Optionally, the two-dimensional grab point information includes X-axis information and Y-axis information of the grab point.
Optionally, the reference information includes Z-axis information.
Optionally, a reference information adjustment value is preset, and the grabbing point information generating module adjusts the reference information by using the reference information adjustment value and then generates three-dimensional grabbing point information of the object to be grabbed based on the adjusted reference information and the two-dimensional grabbing point information of the object to be grabbed.
Optionally, the grabbing point comprises a center point of the grippable region of the article.
An image data processing method, comprising:
receiving image data containing an article to be processed;
identifying an article to be processed from image data and generating a mask for the article to be processed;
performing morphological processing on the generated mask of the article to be processed;
and further processing the mask after the morphological processing to acquire the correction mask and/or the grabbing point associated information of the article to be processed.
Optionally, the morphological processing comprises morphological dilation processing.
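For illustration only, the morphological dilation step could be realized with OpenCV as below; the 5×5 elliptical kernel, the single iteration and the stand-in mask are arbitrary example choices.

```python
import cv2
import numpy as np

# Stand-in binary mask of the article to be processed (white on black).
mask = np.zeros((480, 640), np.uint8)
cv2.circle(mask, (320, 240), 60, 255, -1)

# Morphological dilation: closes small gaps and smooths the mask boundary
# before the subsequent rectangle/circle or template-matching steps.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
dilated_mask = cv2.dilate(mask, kernel, iterations=1)
```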
Optionally, the further processing the mask after the morphological processing to obtain the correction mask and/or the information related to the grabbing point of the object to be processed includes:
acquiring a circumscribed rectangle of the mask subjected to the dilation processing;
generating an inscribed circle of the circumscribed rectangle based on the circumscribed rectangle of the mask;
and acquiring the correction mask and/or the grabbing point associated information of the article to be processed based on the inscribed circle.
Optionally, the acquiring of the circumscribed rectangle of the mask subjected to the dilation processing includes: generating 4 corner points of the circumscribed rectangle based on the mask subjected to the dilation processing, and then generating the circumscribed rectangle based on the corner points.
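A hedged sketch of this rectangle-then-inscribed-circle step: cv2.minAreaRect and cv2.boxPoints produce the four corner points of a rotated circumscribed rectangle of the dilated mask, and the inscribed circle is taken as the circle centred on the rectangle with radius equal to half of its shorter side. Treating the rectangle as rotated rather than axis-aligned is an assumption of the example.

```python
import cv2
import numpy as np

def inscribed_circle_of_mask(dilated_mask: np.ndarray):
    """Return ((cx, cy), radius, corners) for the dilated binary mask, or None."""
    contours, _ = cv2.findContours(dilated_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    points = np.vstack(contours).reshape(-1, 2)
    rect = cv2.minAreaRect(points)            # rotated circumscribed rectangle
    corners = cv2.boxPoints(rect)             # its 4 corner points
    (cx, cy), (w, h), _angle = rect
    radius = min(w, h) / 2.0                  # inscribed circle of that rectangle
    # The circle centre can serve as grabbing point associated information (2D);
    # drawing the circle onto an empty image would give a correction mask.
    return (cx, cy), radius, corners
```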
Optionally, the further processing the mask after the morphological processing to obtain the correction mask and/or the information related to the grabbing point of the object to be processed includes:
processing the mask subjected to the dilation processing by using a circle detection algorithm;
and acquiring the correction mask and/or the grabbing point associated information of the article to be processed based on the processing result of the circle detection algorithm.
Optionally, the circle detection algorithm includes: a circular Hough transform algorithm, a random Hough transform algorithm, and/or a random circle detection algorithm.
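By way of example, the circular Hough transform variant could use cv2.HoughCircles; all parameter values below (dp, minDist, param1, param2, radius bounds) are illustrative and would need tuning for real masks.

```python
import cv2
import numpy as np

def detect_circle(dilated_mask: np.ndarray):
    """Detect the strongest circle in the dilated mask and return ((x, y), r), or None."""
    blurred = cv2.GaussianBlur(dilated_mask, (9, 9), 2)
    circles = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1.2, minDist=50,
                               param1=100, param2=20, minRadius=10, maxRadius=0)
    if circles is None:
        return None
    x, y, r = circles[0][0]          # circle centre as 2D grab point, radius for a correction mask
    return (float(x), float(y)), float(r)
```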
Optionally, the further processing the mask after the morphological processing to obtain the correction mask and/or the information related to the grabbing point of the object to be processed includes:
processing the mask subjected to the dilation processing by using a template matching algorithm based on a pre-stored template of the article to be processed;
and acquiring the correction mask and/or the grabbing point associated information of the article to be processed based on the processing result of the template matching algorithm.
Optionally, the template matching algorithm comprises a shape-based matching algorithm.
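The sketch below approximates the template-matching variant with cv2.matchTemplate. Note that OpenCV's matchTemplate is an intensity-correlation matcher rather than a true shape-based matcher, so this is only an analogy for the idea: locate the pre-stored template of the article within the dilated mask and take the match centre as grabbing point associated information.

```python
import cv2
import numpy as np

def match_template_on_mask(dilated_mask: np.ndarray, template_mask: np.ndarray):
    """Both inputs are single-channel uint8; template_mask must be smaller than dilated_mask."""
    result = cv2.matchTemplate(dilated_mask, template_mask, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    th, tw = template_mask.shape[:2]
    cx = max_loc[0] + tw / 2.0        # match centre as 2D grab-point-associated information
    cy = max_loc[1] + th / 2.0
    return (cx, cy), max_val          # the score can gate whether to trust the match
```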
Optionally, the identifying the object to be processed from the image data and generating the mask of the object to be processed includes: the image data is processed based on deep learning to identify an item to be processed and to generate a mask for the item to be processed.
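Purely as an illustration of the deep learning step, any off-the-shelf instance segmentation network can play the role of the pre-constructed network; the sketch below uses torchvision's Mask R-CNN, which is an assumption of the example rather than the network of this application.

```python
import numpy as np
import torch
import torchvision

# Pretrained instance segmentation network used as a stand-in for the pre-constructed network.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

def masks_from_image(color_image: np.ndarray, score_thresh: float = 0.5):
    """color_image: HxWx3 uint8 RGB. Returns one binary mask per detected item."""
    tensor = torch.from_numpy(color_image).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        out = model([tensor])[0]
    keep = out["scores"] > score_thresh
    return (out["masks"][keep, 0] > 0.5).numpy().astype(np.uint8) * 255
```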
An image data processing apparatus comprising:
the image data receiving module is used for receiving image data containing an article to be processed;
the mask generation module is used for identifying an article to be processed from the image data and generating a mask of the article to be processed;
the mask processing module is used for performing morphological processing on the generated mask of the object to be processed;
and the processing module is used for further processing the mask after the morphological processing so as to acquire the correction mask and/or the grabbing point associated information of the article to be processed.
Optionally, the morphological processing comprises morphological dilation processing.
Optionally, the processing module is specifically configured to:
acquiring a circumscribed rectangle of the mask subjected to the dilation processing;
generating an inscribed circle of the circumscribed rectangle based on the circumscribed rectangle of the mask;
and acquiring the correction mask and/or the grabbing point associated information of the article to be processed based on the inscribed circle.
Optionally, the acquiring of the circumscribed rectangle of the mask subjected to the dilation processing includes: generating 4 corner points of the circumscribed rectangle based on the mask subjected to the dilation processing, and then generating the circumscribed rectangle based on the corner points.
Optionally, the processing module is specifically configured to:
processing the mask subjected to the dilation processing by using a circle detection algorithm;
and acquiring the correction mask and/or the grabbing point associated information of the article to be processed based on the processing result of the circle detection algorithm.
Optionally, the circle detection algorithm includes: a circular Hough transform algorithm, a random Hough transform algorithm, and/or a random circle detection algorithm.
Optionally, the processing module is specifically configured to:
processing the mask subjected to the dilation processing by using a template matching algorithm based on a pre-stored template of the article to be processed;
and acquiring the correction mask and/or the grabbing point associated information of the article to be processed based on the processing result of the template matching algorithm.
Optionally, the template matching algorithm comprises a shape-based matching algorithm.
Optionally, the identifying the object to be processed from the image data and generating the mask of the object to be processed includes: the image data is processed based on deep learning to identify an item to be processed and to generate a mask for the item to be processed.
An image data processing method, comprising:
receiving image data containing an article to be processed;
identifying an article to be processed from image data and generating a mask for the article to be processed;
performing morphological processing on the generated mask of the article to be processed;
processing the mask after the morphological processing by using a template matching algorithm based on a pre-stored template of the article to be processed;
and acquiring the correction mask and/or the grabbing point associated information of the article to be processed based on the processing result of the template matching algorithm.
Optionally, the morphological processing comprises morphological dilation processing.
Optionally, the template matching algorithm comprises a shape-based matching algorithm.
Optionally, the grabbing point comprises a center point of the grippable region of the article.
Optionally, the identifying the object to be processed from the image data and generating the mask of the object to be processed includes: the image data is processed based on deep learning to identify an item to be processed and to generate a mask for the item to be processed.
Optionally, the information related to the grabbing point includes two-dimensional information of the grabbing point.
An image data processing apparatus comprising:
the image data receiving module is used for receiving image data containing an article to be processed;
the mask generation module is used for identifying an article to be processed from the image data and generating a mask of the article to be processed;
the mask processing module is used for performing morphological processing on the generated mask of the object to be processed;
and the processing module is used for processing the mask after the morphological processing by using a template matching algorithm based on a pre-stored template of the article to be processed, and acquiring the correction mask and/or the grabbing point associated information of the article to be processed based on the processing result of the template matching algorithm.
Optionally, the morphological processing comprises morphological dilation processing.
Optionally, the template matching algorithm comprises a shape-based matching algorithm.
Optionally, the grabbing point comprises a center point of the grippable region of the article.
Optionally, the identifying the object to be processed from the image data and generating the mask of the object to be processed includes: the image data is processed based on deep learning to identify an item to be processed and to generate a mask for the item to be processed.
Optionally, the information related to the grabbing point includes two-dimensional information of the grabbing point.
In the description herein, reference to the description of the terms "one embodiment," "some embodiments," "an illustrative embodiment," "an example," "a specific example" or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, such as an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processing module-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
The Processor may be a Central Processing Unit (CPU), another general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
It should be understood that portions of the embodiments of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations of the above embodiments may be made by those of ordinary skill in the art within the scope of the present application.

Claims (14)

1. An image data processing method characterized by comprising:
receiving image data containing an article to be processed;
identifying an article to be processed from image data and generating a mask for the article to be processed;
performing morphological processing on the generated mask of the article to be processed;
processing the mask after the morphological processing by using a template matching algorithm based on a pre-stored template of the article to be processed;
and acquiring the correction mask and/or the grabbing point associated information of the article to be processed based on the processing result of the template matching algorithm.
2. The image data processing method according to claim 1, wherein the morphological processing includes morphological dilation processing.
3. The image data processing method according to claim 1, characterized in that: the template matching algorithm comprises a shape-based matching algorithm.
4. The image data processing method according to any one of claims 1 to 3, characterized in that: the grabbing point comprises a center point of the grippable region of the article.
5. The image data processing method according to any one of claims 1 to 3, characterized in that: the identifying an article to be processed from image data and generating a mask for the article to be processed includes: the image data is processed based on deep learning to identify an item to be processed and to generate a mask for the item to be processed.
6. The image data processing method according to any one of claims 1 to 3, characterized in that: the grabbing point associated information comprises two-dimensional information of the grabbing points.
7. An image data processing apparatus characterized by comprising:
the image data receiving module is used for receiving image data containing an article to be processed;
the mask generation module is used for identifying an article to be processed from the image data and generating a mask of the article to be processed;
the mask processing module is used for performing morphological processing on the generated mask of the object to be processed;
and the processing module is used for processing the mask after the morphological processing by using a template matching algorithm based on a pre-stored template of the article to be processed, and acquiring the correction mask and/or the grabbing point associated information of the article to be processed based on the processing result of the template matching algorithm.
8. The image data processing apparatus according to claim 7, wherein the morphological processing includes morphological dilation processing.
9. The image data processing apparatus according to claim 7, characterized in that: the template matching algorithm comprises a shape-based matching algorithm.
10. The image data processing apparatus according to any one of claims 7 to 9, characterized in that: the grabbing point comprises a center point of the grippable region of the article.
11. The image data processing apparatus according to any one of claims 7 to 9, characterized in that: the identifying an article to be processed from image data and generating a mask for the article to be processed includes: the image data is processed based on deep learning to identify an item to be processed and to generate a mask for the item to be processed.
12. The image data processing apparatus according to any one of claims 7 to 9, characterized in that: the grabbing point associated information comprises two-dimensional information of the grabbing points.
13. An electronic device, comprising: memory, processor and computer program stored on the memory and executable on the processor, the processor implementing the image data processing method of any one of claims 1 to 6 when executing the computer program.
14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the image data processing method of any one of claims 1 to 6.
CN202111338191.9A 2021-11-10 2021-11-10 Image data processing method, image data processing device, electronic equipment and storage medium Pending CN114092428A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111338191.9A CN114092428A (en) 2021-11-10 2021-11-10 Image data processing method, image data processing device, electronic equipment and storage medium
PCT/CN2022/131217 WO2023083273A1 (en) 2021-11-10 2022-11-10 Grip point information acquisition method and apparatus, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111338191.9A CN114092428A (en) 2021-11-10 2021-11-10 Image data processing method, image data processing device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114092428A true CN114092428A (en) 2022-02-25

Family

ID=80300458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111338191.9A Pending CN114092428A (en) 2021-11-10 2021-11-10 Image data processing method, image data processing device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114092428A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114851202A (en) * 2022-05-20 2022-08-05 梅卡曼德(北京)机器人科技有限公司 Collision detection method, control method, capture system and computer storage medium
WO2023083273A1 (en) * 2021-11-10 2023-05-19 梅卡曼德(北京)机器人科技有限公司 Grip point information acquisition method and apparatus, electronic device, and storage medium

Similar Documents

Publication Publication Date Title
CN109785317B (en) Automatic pile up neatly truss robot's vision system
CN108044627B (en) Method and device for detecting grabbing position and mechanical arm
JP4309439B2 (en) Object take-out device
CN113524194A (en) Target grabbing method of robot vision grabbing system based on multi-mode feature deep learning
CN110660104A (en) Industrial robot visual identification positioning grabbing method, computer device and computer readable storage medium
CN108381549B (en) Binocular vision guide robot rapid grabbing method and device and storage medium
CN114092428A (en) Image data processing method, image data processing device, electronic equipment and storage medium
CN112837371A (en) Object grabbing method and device based on 3D matching and computing equipment
KR102178013B1 (en) Training data generation method and pose determination method for grasping object
CN111721259A (en) Underwater robot recovery positioning method based on binocular vision
CN115609591B (en) Visual positioning method and system based on 2D Marker and compound robot
CN113524187B (en) Method and device for determining workpiece grabbing sequence, computer equipment and medium
CN114029946A (en) Method, device and equipment for guiding robot to position and grab based on 3D grating
CN113689509A (en) Binocular vision-based disordered grabbing method and system and storage medium
CN111784655A (en) Underwater robot recovery positioning method
CN112102289A (en) Cell sample centrifugal processing system and method based on machine vision
US11126844B2 (en) Control apparatus, robot system, and method of detecting object
CN109863365B (en) Method, electronic device and system for picking up objects from container
CN114037595A (en) Image data processing method, image data processing device, electronic equipment and storage medium
CN114022342A (en) Acquisition method and device for acquisition point information, electronic equipment and storage medium
CN114022341A (en) Acquisition method and device for acquisition point information, electronic equipment and storage medium
CN113172636B (en) Automatic hand-eye calibration method and device and storage medium
CN113610833A (en) Material grabbing method and device, electronic equipment and storage medium
WO2023083273A1 (en) Grip point information acquisition method and apparatus, electronic device, and storage medium
CN111389750B (en) Vision measurement system and measurement method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination