CN114820775A - Computer vision guided unstacking method and equipment - Google Patents

Computer vision guided unstacking method and equipment

Info

Publication number
CN114820775A
CN114820775A (application CN202110122401.4A)
Authority
CN
China
Prior art keywords
point cloud
objects
cloud data
unstacking
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110122401.4A
Other languages
Chinese (zh)
Inventor
丁万
丁凯
刘柯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Robert Bosch GmbH
Original Assignee
Robert Bosch GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Robert Bosch GmbH filed Critical Robert Bosch GmbH
Priority to CN202110122401.4A
Publication of CN114820775A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B65 CONVEYING; PACKING; STORING; HANDLING THIN OR FILAMENTARY MATERIAL
    • B65G TRANSPORT OR STORAGE DEVICES, e.g. CONVEYORS FOR LOADING OR TIPPING, SHOP CONVEYOR SYSTEMS OR PNEUMATIC TUBE CONVEYORS
    • B65G59/00 De-stacking of articles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/06 Topological mapping of higher dimensional structures onto lower dimensional surfaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/08 Projecting images onto non-planar surfaces, e.g. geodetic screens
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

A computer-vision-guided unstacking method and apparatus are provided. The method includes receiving a 2D image and 3D point cloud data of a stack of objects; determining the vertical direction of the stack of objects using a principal component analysis algorithm based on the 3D point cloud data; extracting first 3D point cloud data corresponding to the current highest layer from the 3D point cloud data based on the vertical direction; projecting the first 3D point cloud data into the 2D image to obtain a first 2D image of the current highest layer; extracting one or more objects included in the current highest layer from the first 2D image; projecting the extracted one or more objects back into the 3D point cloud data to obtain second 3D point cloud data corresponding to each of the one or more objects; and, for one of the one or more objects, determining a 3D pose of the object based on the second 3D point cloud data of the object and the corresponding 3D point cloud template, for unstacking the object. This enables unstacking of irregular objects without restricting the placement of the stereoscopic vision device.

Description

Computer vision guided unstacking method and equipment
Technical Field
The invention relates to the field of logistics or industrial manufacturing, in particular to unstacking under the guidance of computer vision.
Background
In the field of logistics or industrial manufacturing, unstacking is a necessary operation. Manual unstacking is a common scenario in factories, but it is very labor intensive and also unsafe for workers. Computer vision guided unstacking has been developed and used, wherein the unstacking is performed using computer vision techniques to drive a robotic arm.
Current computer-vision-guided unstacking methods are typically designed for objects with smooth, flat surfaces, such as cubic objects, and therefore derive only 2D information about an object, without taking 3D information into account, when unstacking it. In actual logistics or industrial manufacturing scenarios, however, irregularly shaped objects also need to be unstacked, so identifying the 3D pose of an object is meaningful.
Furthermore, current computer-vision-guided unstacking methods typically require that a stereoscopic vision device, such as a depth camera, be placed above the stacked objects and oriented perpendicular to them. This restricts the relative position of the stack and the camera device and makes it difficult to meet the requirements of different application scenarios; for example, where there is insufficient space above the stack to place the camera device, such an approach is difficult to apply.
Disclosure of Invention
It is desirable to provide a computer vision-guided unstacking method and apparatus that does not restrict the relative positions of the stereoscopic vision device and the stack, and obtains 3D pose information of the objects to facilitate unstacking of irregularly shaped objects.
According to one aspect, a computer-vision-guided unstacking method is provided. The method comprises receiving a 2D image and 3D point cloud data of a stack of objects, the 2D image and the 3D point cloud data being acquired by a stereoscopic vision apparatus; determining a vertical direction of the stack of objects based on the 3D point cloud data using a principal component analysis algorithm; extracting first 3D point cloud data corresponding to a current highest layer of the stack of objects from the 3D point cloud data based on the determined vertical direction; projecting the first 3D point cloud data into the 2D image to obtain a first 2D image of the current highest layer; extracting one or more objects included in the current highest layer from the first 2D image; projecting the extracted one or more objects back to the 3D point cloud data or the first 3D point cloud data to obtain second 3D point cloud data corresponding to each of the one or more objects; and, for one of the one or more objects, determining a 3D pose of the object based on the second 3D point cloud data of the object and a 3D point cloud template of the object, for unstacking the object.
According to another aspect, a computer-vision-guided unstacking apparatus is provided. The apparatus comprises a receiving unit that receives a 2D image and 3D point cloud data of a stack of objects, the 2D image and the 3D point cloud data being acquired by a stereo vision device; and a processing unit that determines a vertical direction of the stack of objects using a principal component analysis algorithm based on the 3D point cloud data; extracts first 3D point cloud data corresponding to a current highest layer of the stack of objects from the 3D point cloud data based on the determined vertical direction; projects the first 3D point cloud data into the 2D image to obtain a first 2D image of the current highest layer; extracts one or more objects included in the current highest layer from the first 2D image; projects the extracted one or more objects back to the 3D point cloud data or the first 3D point cloud data to obtain second 3D point cloud data corresponding to each of the one or more objects; and, for one of the one or more objects, determines a 3D pose of the object based on the second 3D point cloud data of the object and a 3D point cloud template of the object, for unstacking the object.
According to another aspect, an unstacking system is provided. The system includes a robot arm; a control device that controls the robot arm to unstack a stack of objects; a stereoscopic vision device that acquires a 2D image and 3D point cloud data of the stack of objects; and a computer-vision-guided unstacking apparatus according to various embodiments.
According to a further aspect, a computer program product is provided, comprising computer program instructions which, when executed, cause an unstacking apparatus according to various embodiments of the present disclosure to perform a method of unstacking according to various embodiments of the present disclosure.
According to various embodiments of the various aspects of the present disclosure, on the one hand, during the unstacking process a principal component analysis algorithm is first used to determine the vertical direction of a stack of objects based on its 3D point cloud data, and first 3D point cloud data corresponding to the current highest layer of the stack of objects is then derived from the 3D point cloud data based on the determined vertical direction, so that the highest layer of objects can be accurately determined without arranging the stereoscopic vision device above and perpendicular to the stack of objects; on the other hand, the 3D pose of an object in the highest layer of the stack of objects is determined based on the 3D point cloud template of the object, and because the 3D pose of the object is taken into account, unstacking of irregular objects can be realized.
Drawings
Embodiments are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements.
FIG. 1 illustrates an unstacking system according to one embodiment;
FIG. 2 illustrates a flow diagram of a computer vision-guided unstacking method according to one embodiment.
Various aspects and features of various embodiments of the present invention are described with reference to the above-identified figures. The drawings described above are only schematic and non-limiting. The size, shape, reference numerals, or appearance of the respective elements in the above-described drawings may be changed without departing from the gist of the present invention. In addition, not all parts of the unstacking system according to the embodiments of the present invention are denoted by reference numerals in the above drawings; only the relevant components are denoted in some drawings, which does not limit the parts to only those shown in the drawings of the specification.
Detailed Description
Fig. 1 shows an unstacking system 1 according to an embodiment. In the logistics or industrial manufacturing industry, it is necessary to unstack not only regularly shaped objects, such as cubes, but in some cases also irregularly shaped objects. For example, where a stack of objects 6 as shown in fig. 1 is formed by stacking a plurality of irregularly shaped parts, such objects also need to be unstacked under computer vision guidance.
The unstacking system 1 shown in fig. 1 supports the unstacking of a stack of objects 6 comprising irregularly shaped objects, although this is not limiting, and the unstacking system 1 can equally well be used for unstacking regularly shaped objects, such as cubes.
The unstacking system 1 comprises a stereoscopic vision apparatus 2, a computer-vision-guided unstacking device 3, a control device 4 and a robot arm 5. The stereoscopic vision apparatus 2 comprises a 3D stereo camera capable of acquiring an RGBD image of the stack of objects, i.e. a 2D image and 3D point cloud data of the stack of objects, during unstacking, each time before the robot arm is controlled to perform an unstacking operation on the highest layer of objects in the stack of objects 6. The stereoscopic vision apparatus 2 may be positioned at any position relative to the stack of objects 6; it is not limited to being placed directly above the stack and perpendicular to it, and may in particular also be positioned away from directly above.
The acquired 2D images and 3D point cloud data are received by the receiving unit 31 of the computer vision guided unstacking apparatus 3 and processed by the processing unit 32 to determine 3D pose information of the object to be unstacked comprised in the highest layer of the stack of objects 6. The specific process of the computer-vision-guided unstacking apparatus 3 will be described with reference to fig. 2 below. Since the 3D pose information of the object to be unstacked is obtained, it is possible to unstack not only an object having a regular shape such as a cube but also a part having an irregular shape as shown in fig. 1 using the information. In particular, the center of gravity of an irregularly shaped object to be unstacked can be determined from the determined 3D pose information, thereby determining a gripping position for the object, facilitating the generation of corresponding control signals to control the robot arm 5 to grip at the determined gripping position, thereby enabling an unstacking operation of the object.
In one embodiment, after determining the 3D pose information of the object to be grabbed, the processing unit 32 further generates control signals for the robot arm 5 to grab the respective object based on the determined 3D pose information. Specifically, the processing unit 32 determines the barycenter of the respective object based on the determined 3D pose information, and determines the grasp position for the object based on the barycenter of the object, and then generates the control signal based on the determined grasp position. The generated control signal is transmitted to the control device 4 so that the control device 4 can accordingly control the robot arm 5 to perform a de-stacking operation on the object with the determined gripping position.
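As a concrete illustration of how a grasp position can be derived from the determined 3D pose, the following minimal sketch (not part of the patent text; the function and variable names are hypothetical) maps the centroid of the object's 3D point cloud template, used here as a proxy for the center of gravity, into the camera frame with the estimated pose:

```python
import numpy as np

def grasp_point_from_pose(template_points: np.ndarray, pose: np.ndarray) -> np.ndarray:
    """Estimate a grasp point for one object.

    template_points: (N, 3) 3D point cloud template of the object (object frame).
    pose:            (4, 4) homogeneous transform from object frame to camera frame,
                     i.e. the 3D pose estimated for the object.

    The centroid of the template is used as a proxy for the center of gravity
    (reasonable for roughly uniform density) and is mapped into the camera frame.
    """
    centroid_obj = template_points.mean(axis=0)   # center-of-gravity proxy in object frame
    centroid_h = np.append(centroid_obj, 1.0)     # homogeneous coordinates
    return (pose @ centroid_h)[:3]                # grasp point in the camera frame
```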
In another embodiment, after determining the 3D pose information of the object to be gripped, the processing unit 32 sends the determined 3D pose information directly to the control device 4, the 3D pose information is further processed by the control device 4, and based on the 3D pose information control signals for the robot arm 5 are generated for controlling the robot arm 5 for the unstacking operation.
The control device 4 receives the 3D pose information or control signals from the processing unit 32, and controls the robot arm 5 based thereon. Specifically, the control device 4 can control the robot arm 5 to move to a specific gripping position in order to accurately grip the object. The robot arm 5 is capable of a motion with a plurality of degrees of freedom, and is moved to a specific gripping position under the control of the control device 4 to grip an object.
FIG. 2 illustrates a flow diagram of a computer vision-guided unstacking method 100 according to one embodiment. The method is performed by a computer vision guided unstacking apparatus 3.
According to the method 100, first at step 105, an RGBD image acquired by the stereo vision apparatus 2 is received from the stereo vision apparatus 2, comprising a 2D image and 3D point cloud data of the stack of objects 6. Since the 2D image and the 3D point cloud data are acquired by the same stereo vision apparatus 2, they share the same coordinate system. The 3D point cloud data comprises ordered 3D point cloud data corresponding to the 2D image, and a predetermined correspondence exists between the pixel indices of the 2D image and the 3D point cloud data. However, it is not excluded that the 2D image and the 3D point cloud data are acquired by different means. Step 105 is performed by the receiving unit 31 of the unstacking apparatus 3 described above.
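One possible way to represent such an ordered RGBD frame in code is sketched below (an illustration only; the class name and layout are assumptions, not part of the patent). The pixel grid of the 2D image and the point map share the same indices, which is the predetermined correspondence exploited later in steps 125 and 135.

```python
import numpy as np

class RGBDFrame:
    """Hypothetical container for one RGBD frame from the stereo vision apparatus 2."""

    def __init__(self, color: np.ndarray, points: np.ndarray):
        assert color.shape[:2] == points.shape[:2]   # same H x W grid
        self.color = color     # (H, W, 3) uint8 2D image
        self.points = points   # (H, W, 3) float32 ordered point cloud (NaN where invalid)

    def point_at(self, v: int, u: int) -> np.ndarray:
        """3D point behind pixel (v, u): same index, same scene point."""
        return self.points[v, u]
```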
Next, at step 110, the received RGBD image is pre-processed, which includes background filtering and the like, to extract the 2D image and 3D point cloud data corresponding to the region of interest (i.e., the stack of objects 6).
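A minimal sketch of such background filtering, assuming a known region-of-interest box around the stack in the camera frame (the box corners are hypothetical inputs), might look as follows:

```python
import numpy as np

def filter_background(frame_points: np.ndarray,
                      roi_min: np.ndarray,
                      roi_max: np.ndarray) -> np.ndarray:
    """Background filtering by an axis-aligned region-of-interest box (camera frame).

    frame_points: (H, W, 3) ordered point cloud.
    roi_min/roi_max: (3,) corners of a box assumed to enclose the stack of objects 6.
    Returns a boolean (H, W) mask selecting points that belong to the region of interest.
    """
    valid = np.isfinite(frame_points).all(axis=-1)                    # drop invalid depth
    inside = ((frame_points >= roi_min) & (frame_points <= roi_max)).all(axis=-1)
    return valid & inside
```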
At step 115, principal component analysis is performed on the 3D point cloud data to determine the vertical direction of the stack of objects, i.e. the direction in which the objects are stacked, as indicated by arrow a in fig. 1. As shown in fig. 1, the stereo vision apparatus 2 is placed obliquely with respect to the stack of objects 6. Such a placement may be necessary when the space above the stack is insufficient for the stereoscopic vision apparatus 2, or may be chosen to reduce reflections and thereby improve image quality.
The principal component analysis of the 3D point cloud data aims to find the direction along which the projections of all neighborhood points are most concentrated; the vertical direction of the 3D point cloud data, i.e. the vertical direction of the stack of objects, can then be determined from this direction.
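A simplified sketch of this step is given below. It assumes that, after background filtering, the visible geometry is dominated by the flat top of the stack, so the principal axis with the smallest variance, i.e. the direction in which the projected points are most concentrated, approximates the stacking direction; the sign is resolved with a rough camera-up hint, which is an assumption of this sketch rather than something stated in the patent.

```python
import numpy as np

def stack_vertical_direction(points: np.ndarray, camera_up_hint: np.ndarray) -> np.ndarray:
    """Estimate the stacking direction (arrow a in fig. 1) by PCA.

    points:         (N, 3) background-filtered 3D point cloud of the stack.
    camera_up_hint: (3,) rough "up" direction used only to fix the sign (assumed input).
    """
    centered = points - points.mean(axis=0)
    cov = np.cov(centered.T)                   # 3 x 3 covariance matrix of the points
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    vertical = eigvecs[:, 0]                   # smallest-variance direction: surface normal
    if np.dot(vertical, camera_up_hint) < 0:   # make it point "upwards"
        vertical = -vertical
    return vertical / np.linalg.norm(vertical)
```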
After the vertical direction of the stack of objects 6 has been determined, the first 3D point cloud data corresponding to the highest layer of the stack of objects 6 can be extracted from the 3D point cloud data at step 120 according to the vertical direction determined in step 115, even if the stereoscopic vision apparatus is placed in a position tilted with respect to the stack of objects 6. The stack of objects 6 consists of a plurality of layers of objects, where each layer may comprise one or more objects; likewise, the highest layer may also include one or more objects. Since unstacking needs to be carried out layer by layer, each pass operates on the current highest layer, and the first 3D point cloud data of that layer therefore needs to be extracted. The first 3D point cloud data includes the 3D point cloud data of all objects included in the highest layer.
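One straightforward way to realize this extraction is sketched below, under the assumption that the top layer can be isolated by a height band along the estimated vertical direction; the band thickness is an assumed parameter, not taken from the patent.

```python
import numpy as np

def extract_highest_layer(points: np.ndarray,
                          vertical: np.ndarray,
                          layer_tol: float = 0.03) -> np.ndarray:
    """Select the first 3D point cloud data: points of the current highest layer.

    points:    (N, 3) stack point cloud.
    vertical:  (3,) unit stacking direction from the PCA step.
    layer_tol: assumed thickness (metres) of the band that still counts as the top layer.
    Returns a boolean mask over `points`.
    """
    heights = points @ vertical          # signed height of each point along the stack axis
    top = heights.max()
    return heights >= top - layer_tol    # keep everything within the top band
```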
In step 125, the first 3D point cloud data extracted in step 120 is projected into the 2D image of the stack of objects 6 received in step 105 or obtained in step 110, to obtain a first 2D image of the highest layer of the stack of objects 6. The first 2D image shows the one or more objects comprised in the highest layer. Projecting the first 3D point cloud data into the 2D image is based on the predetermined correspondence between the pixel indices of the 2D image and the 3D point cloud data.
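Because the point cloud is ordered on the same pixel grid as the 2D image, this projection reduces to re-using the top-layer point selection as a pixel mask; the following is a minimal sketch under that assumption (function name and mask handling are illustrative only):

```python
import numpy as np

def first_2d_image(color: np.ndarray, top_layer_mask: np.ndarray) -> np.ndarray:
    """Obtain the first 2D image of the current highest layer.

    color:          (H, W, 3) 2D image of the stack.
    top_layer_mask: (H, W) boolean mask of top-layer pixels, obtained by reshaping the
                    highest-layer point selection back onto the image grid, which is
                    possible because the point cloud is ordered per pixel.
    """
    first_image = np.zeros_like(color)
    first_image[top_layer_mask] = color[top_layer_mask]   # keep only top-layer pixels
    return first_image
```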
After the first 2D image showing all objects of the highest layer is obtained in step 125, one object comprised in the highest layer of the stack of objects 6 is extracted from the first 2D image as the current object in step 130, to facilitate the subsequent unstacking operation on that object.
In one embodiment, each of all objects included in the highest layer may also be extracted in this step. In further embodiments, a portion of the objects in the highest layer may be extracted.
The extraction of the objects in the highest layer can be achieved by template matching or by neural networks. In one embodiment, the computer-vision-guided unstacking apparatus 3 further comprises a memory (not shown) storing 2D image templates and corresponding 3D point cloud templates of the individual objects. One or more objects in the highest layer can be extracted from the first 2D image based on the 2D image template. The 2D image template and the corresponding 3D point cloud template may be acquired and determined in advance for a single object in the stack of objects 6.
In another embodiment, one or more objects included in the highest layer can be extracted from the first 2D image based on a trained neural network. As an example, the first 2D image of the highest layer of the stack of objects 6 can be directly input to a trained neural network to extract all objects comprised in the highest layer from the first 2D image. As another example, the first 2D image can be first processed to obtain images including only one object therein, and then each image including one object is input to a trained neural network to extract the corresponding object.
The neural network, for example Mask R-CNN, is generated by training in advance on a plurality of 2D images of the object. In one embodiment, the one or more objects included in the highest layer can be extracted from the first 2D image using instance segmentation.
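As an illustration only, such an instance-segmentation step could be sketched with torchvision's Mask R-CNN as below; the weight file name, class count and score threshold are assumptions, and a model of this kind would have to be fine-tuned on 2D images of the actual parts beforehand.

```python
import numpy as np
import torch
import torchvision

# Assumed: a Mask R-CNN fine-tuned for one part class (background + part);
# the weight file name is hypothetical.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights=None, num_classes=2)
model.load_state_dict(torch.load("part_maskrcnn.pth", map_location="cpu"))
model.eval()

def extract_objects(first_image_rgb: np.ndarray) -> np.ndarray:
    """Instance segmentation of the first 2D image; returns (K, H, W) binary masks."""
    x = torch.from_numpy(first_image_rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        out = model([x])[0]
    keep = out["scores"] > 0.7                           # assumed confidence threshold
    return (out["masks"][keep, 0] > 0.5).cpu().numpy()   # soft (K, 1, H, W) masks -> binary
```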
Next, in step 135, the extracted current object is projected back to the 3D point cloud data or the first 3D point cloud data, thereby obtaining second 3D point cloud data of the current object. In case that a plurality of objects in the highest layer are extracted in step 130, the extracted plurality of objects can be projected back to the 3D point cloud data or the first 3D point cloud data in step 135, thereby obtaining second 3D point cloud data of the plurality of objects. The projection is based on a predetermined correspondence between the pixel numbers of the 2D image and the 3D point cloud data.
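With an ordered point cloud, projecting the extracted objects back to 3D again amounts to indexing the point map with each object's pixel mask; a minimal sketch under that assumption:

```python
import numpy as np

def second_point_clouds(points: np.ndarray, object_masks: np.ndarray) -> list:
    """Project the extracted objects back to 3D.

    points:       (H, W, 3) ordered point cloud (full frame or top layer only).
    object_masks: (K, H, W) boolean masks from the segmentation step.
    Returns a list of (Ni, 3) arrays, one second point cloud per object, again using
    the pixel-index correspondence of the ordered point cloud.
    """
    clouds = []
    for mask in object_masks:
        pts = points[mask]
        clouds.append(pts[np.isfinite(pts).all(axis=-1)])   # drop invalid points
    return clouds
```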
At step 140, a 3D pose of the current object is determined based on the second 3D point cloud data of the current object and a 3D point cloud template of an object pre-stored in memory. The grabbing position of the robot arm can be determined based on the 3D pose, so that the robot arm can perform unstacking operation on the current object according to the grabbing position.
Where second point cloud data for a plurality of objects is obtained in step 135, a 3D pose of each object of the plurality of objects can be determined in step 140 based on the second 3D point cloud data for the object and a 3D point cloud template for the object pre-stored in memory.
Specifically, the following method may be employed to determine the 3D pose of the object. The six-dimensional pose of the object, comprising the three-dimensional coordinate position and the orientation (pitch, yaw and roll angles), is first roughly determined using a point pair feature (PPF) based algorithm, and this rough pose is then used as the initial pose for an iterative closest point (ICP) algorithm to determine the 3D pose of the object. Combining the PPF algorithm with the ICP algorithm yields more accurate pose information.
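A hedged sketch of the refinement stage using Open3D's ICP is given below; the coarse pose T_init is assumed to come from a PPF matcher, which is not shown here, and the correspondence distance is an assumed value.

```python
import numpy as np
import open3d as o3d

def refine_pose(template_pts: np.ndarray,
                object_pts: np.ndarray,
                T_init: np.ndarray) -> np.ndarray:
    """ICP refinement of a coarse 6D pose.

    template_pts: (M, 3) 3D point cloud template of the object.
    object_pts:   (N, 3) second 3D point cloud of the object from the scene.
    T_init:       (4, 4) coarse pose, e.g. the best hypothesis from a PPF matcher.
    """
    src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(template_pts))
    dst = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(object_pts))
    result = o3d.pipelines.registration.registration_icp(
        src, dst,
        max_correspondence_distance=0.01,   # assumed 1 cm search radius
        init=T_init,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
    )
    return result.transformation            # refined 4x4 object-to-camera pose
```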
After the 3D pose of the current object is determined in step 140, control signals for the robotic arm are generated based on the 3D pose of the current object to unstack the current object in step 145. If the 3D poses of multiple objects are determined in step 140, control signals for the robotic arm can be generated in step 145 for each object in turn based on the corresponding 3D pose.
In step 150, the control signal is transmitted to the control device 4, so that the control device 4 can control the robot arm 5 to unstack the current object according to the control signal. After the unstacking of the current object is completed, a signal may be returned to the computer-vision-guided unstacking apparatus 3.
After receiving this signal, the method 100 may proceed to step 155 to determine whether the unstacking of all objects included in the current highest layer has been completed. If it has not been completed (i.e., "N" in fig. 2), the method 100 returns to step 130 to extract the next object in the highest layer as the current object, and then performs steps 130 through 150 for that object; if it has been completed (i.e., "Y" in fig. 2), the method 100 returns to step 105 to receive from the stereo vision apparatus 2 a further RGBD image, comprising a 2D image and 3D point cloud data of the stack of objects 6, and the method shown in fig. 2 is repeated to unstack the next highest layer.
In one embodiment, if in step 140 the 3D pose is determined for each of the objects in the current highest layer, then after it is determined in step 155 that the unstacking of all objects comprised in the current highest layer has not been completed, the method 100 returns to step 145 to generate control signals for the robotic arm to unstack the next object in the current highest layer based on the 3D pose of that next object.
Various embodiments of a computer-vision-guided unstacking method and apparatus have been described above with reference to figs. 1-2; these embodiments can be combined with one another to achieve different effects. Furthermore, the individual units/steps/processes mentioned above are not limiting; their functionality can be combined, altered or modified to obtain the corresponding effects.
In an embodiment, some of the steps of the method as shown in fig. 2 can be performed by the control device 4, for example, steps 145 and 150 can also be performed by the control device 4.
The functions of these units can be implemented by software or by corresponding hardware, or by means of a processor, for example as a computer program stored in a memory and executed by a processor to implement the functions of the units.
In particular, the functionality of the above-described devices can be implemented in a processor of the device itself, or at a location remote from the device.
It will be appreciated that the computer vision-guided unstacking method and apparatus of the various embodiments of the present disclosure can be implemented by a computer program/software. These software can be loaded into the working memory of a microprocessor and when run, used to perform methods according to embodiments of the present disclosure.
Exemplary embodiments of the present disclosure cover both cases: the computer program/software of the present disclosure being created and used from the outset, and an existing program/software being converted, by means of an update, into a program/software that uses the present disclosure.
According to further embodiments of the present disclosure, a computer program product, such as a machine (e.g. computer) readable medium, such as a CD-ROM, is provided, comprising computer program code which, when executed, causes a computer or processor to perform a method according to embodiments of the present disclosure. The machine-readable medium may be, for example, an optical storage medium or a solid-state medium supplied together with or as part of other hardware.
Computer programs for carrying out methods according to embodiments of the present disclosure may also be distributed in other forms, such as via the internet or other wired or wireless telecommunication systems.
The computer program may also be provided over a network, such as the World Wide Web, and can be downloaded from such a network into the working memory of a microprocessor.
It has to be noted that embodiments of the present disclosure are described with reference to different subject-matters. In particular, some embodiments are described with reference to method type claims whereas other embodiments are described with reference to apparatus type claims. However, a person skilled in the art will gather from the above and the following description that, unless other notified, in addition to any combination of features belonging to one type of subject-matter also any combination between features relating to different subject-matters is considered to be disclosed with this application. Also, all features can be combined, providing a synergistic effect greater than a simple sum of the features.
The foregoing description of specific embodiments of the present disclosure has been described. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The present disclosure has been described above with reference to specific embodiments, and it will be understood by those skilled in the art that the technical solutions of the present disclosure can be implemented in various ways without departing from the spirit and essential characteristics of the present disclosure. The specific embodiments are merely illustrative and not restrictive. In addition, any combination of these embodiments can be used to achieve the purpose of the present disclosure. The scope of the disclosure is defined by the appended claims.
The word "comprising" in the description and in the claims does not exclude the presence of other elements or steps, the order in which "first", "second", "step", etc. are recited and shown in the figures does not limit the order or number of steps. The functions of the respective elements described in the specification or recited in the claims may be divided or combined into plural corresponding elements or may be implemented by a single element.

Claims (15)

1. A computer-vision-guided unstacking method comprising:
receiving a 2D image and 3D point cloud data of a stack of objects, wherein the 2D image and the 3D point cloud data have a predetermined corresponding relationship;
determining a vertical direction of the stack of objects based on the 3D point cloud data using a principal component analysis algorithm;
extracting first 3D point cloud data corresponding to a current highest layer of the stack of objects from the 3D point cloud data based on the determined vertical direction;
projecting the first 3D point cloud data into the 2D image to obtain a first 2D image of the current highest layer;
extracting one or more objects included in the current highest layer from the first 2D image;
projecting the extracted one or more objects back to the 3D point cloud data or the first 3D point cloud data to obtain second 3D point cloud data corresponding to each of the one or more objects; and
for one of the one or more objects, determining a 3D pose of the object based on the second 3D point cloud data of the object and a 3D point cloud template of the object for unstacking the object.
2. The unstacking method as recited in claim 1, wherein the one or more objects included in the current highest level are extracted from the first 2D image based on a 2D image template of the object.
3. The unstacking method as recited in claim 1, wherein the one or more objects included in the current highest level are extracted from the first 2D image based on a trained neural network.
4. The unstacking method as recited in any one of claims 1-3, wherein the 3D pose of the object is determined using a point pair feature-based algorithm and an iterative closest point algorithm.
5. The unstacking method as claimed in any one of claims 1-3, further comprising:
performing background filtering on the 3D point cloud data; and
determining the vertical direction of the stack of objects using the principal component analysis algorithm based on the background filtered 3D point cloud data.
6. The method of unstacking as recited in any one of claims 1-3 further comprising:
generating control signals for a robotic arm based on the 3D pose of the object for unstacking the object.
7. A computer vision-guided unstacking apparatus comprising:
a receiving unit which receives a 2D image and 3D point cloud data of a stack of objects, the 2D image and the 3D point cloud data having a predetermined correspondence therebetween;
a processing unit that determines a vertical direction of the stack of objects using a principal component analysis algorithm based on the 3D point cloud data; extracts first 3D point cloud data corresponding to a current highest layer of the stack of objects from the 3D point cloud data based on the determined vertical direction; projects the first 3D point cloud data into the 2D image to obtain a first 2D image of the current highest layer; extracts one or more objects included in the current highest layer from the first 2D image; projects the extracted one or more objects back to the 3D point cloud data or the first 3D point cloud data to obtain second 3D point cloud data corresponding to each object of the one or more objects; and, for one of the one or more objects, determines a 3D pose of the object based on the second 3D point cloud data of the object and a 3D point cloud template of the object, for unstacking the object.
8. The unstacking apparatus according to claim 7, wherein the processing unit extracts the one or more objects included in the current highest layer from the first 2D image based on a 2D image template of the object.
9. The unstacking apparatus as recited in claim 7, wherein the processing unit extracts the one or more objects included in the currently highest layer from the first 2D image based on a trained neural network.
10. The unstacking apparatus according to any one of claims 7-9, wherein the processing unit determines the 3D pose of the object using a point pair feature-based algorithm and an iterative closest point algorithm.
11. The unstacking apparatus according to any one of claims 7-9, wherein the processing unit further performs background filtering on the 3D point cloud data and determines the vertical direction of the stacked objects using the principal component analysis algorithm based on the background filtered 3D point cloud data.
12. The unstacking apparatus according to any one of claims 7-9 wherein the processing unit generates control signals for a robotic arm to unstack the objects based on the 3D pose of the objects.
13. An unstacking system comprising:
a robot arm;
a control device that controls the robot arm to unstack a stack of objects;
a stereoscopic vision device that collects 2D images and 3D point cloud data of the stack of objects; and
a computer-vision-guided unstacking apparatus as claimed in any one of claims 7-12.
14. The unstacking system of claim 13 wherein,
the stereoscopic vision apparatus can be positioned at any position relative to the stack of objects other than directly above.
15. A computer program product comprising computer program instructions which, when executed, cause a computer vision-guided unstacking apparatus according to any one of claims 7-12 to perform the method according to any one of claims 1-6.
CN202110122401.4A 2021-01-28 2021-01-28 Computer vision guided unstacking method and equipment Pending CN114820775A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110122401.4A CN114820775A (en) 2021-01-28 2021-01-28 Computer vision guided unstacking method and equipment

Publications (1)

Publication Number Publication Date
CN114820775A (en) 2022-07-29

Family

ID=82525840

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110122401.4A Pending CN114820775A (en) 2021-01-28 2021-01-28 Computer vision guided unstacking method and equipment

Country Status (1)

Country Link
CN (1) CN114820775A (en)

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination