CN111368852A - Article identification and pre-sorting system and method based on deep learning and robot - Google Patents


Info

Publication number
CN111368852A
CN111368852A (application CN201811605348.8A)
Authority
CN
China
Prior art keywords
target object
target
image data
neural network
sorting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811605348.8A
Other languages
Chinese (zh)
Inventor
姜楠
曲道奎
邹风山
王晓东
毕丰隆
徐佳新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Siasun Robot and Automation Co Ltd
Original Assignee
Shenyang Siasun Robot and Automation Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Siasun Robot and Automation Co Ltd filed Critical Shenyang Siasun Robot and Automation Co Ltd
Priority to CN201811605348.8A priority Critical patent/CN111368852A/en
Publication of CN111368852A publication Critical patent/CN111368852A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds

Abstract

The invention provides an article identification and pre-sorting method based on deep learning, which comprises the steps of: acquiring image data containing a target article by using an RGB-D (red-green-blue-depth) camera; performing target positioning on the image data by using a convolutional neural network; performing pixel-level segmentation on the positioned target object by using a multi-scale target detection FPN (feature pyramid network) to obtain a target article pixel point set; processing the target article pixel point set according to the correspondence between the image data and the point cloud to obtain a point cloud set; and processing the point cloud set by iterative closest point (ICP) matching to obtain pose information of the target object, the pose information being used for sorting the target object. Accurate pose information of the target article can thus be provided, guaranteeing rapid sorting by the robot. The invention further provides a corresponding article identification and pre-sorting system and robot based on deep learning.

Description

Article identification and pre-sorting system and method based on deep learning and robot
Technical Field
The invention relates to the field of robot vision, in particular to an article identification and pre-sorting system and method based on deep learning and a robot.
Background
With the rapid development of electronic commerce, using a robot as the sorting actuator can greatly improve the flexibility and efficiency of sorting. For the robot to work autonomously beyond the teaching mode, identifying and positioning targets through machine vision is the more effective and practical approach.
The robot consists of a mechanical arm and a mobile chassis, so it can serve multiple stations. When machine vision is used to position a target, two methods are common: one is based on a two-dimensional camera and grasps the target by means of plane calibration; the other adopts a three-dimensional sensor, which can position and manipulate any article in space. In either case the vision recognition system becomes an indispensable auxiliary unit.
A vision recognition system is often very complex, requiring the ability to accurately capture images and react to external changes in real time. It is also frequently required to track external moving targets in real time, which places high demands on the real-time performance of the hardware and software; in actual use, traditional methods therefore still dominate.
Disclosure of Invention
The embodiments of the invention provide an article identification and pre-sorting system and method based on deep learning, and a robot, which can provide accurate pose information of target articles and guarantee rapid sorting by the robot.
In a first aspect, the invention provides a deep learning-based item identification pre-sorting method, which comprises the following steps:
acquiring image data containing a target object by using an RGBD camera;
performing target positioning on the image data by using a convolutional neural network, and performing pixel level segmentation on the positioned target object by using a multi-scale target detection FPN network to obtain a target object pixel point set;
processing the target article pixel point set according to the corresponding relation between the image data and the point cloud to obtain a point cloud set;
and processing the point cloud set by iterative closest point (ICP) matching to obtain pose information of the target object, wherein the pose information is used for sorting the target object.
Optionally, before the performing target localization on the image data by using a convolutional neural network algorithm and performing pixel-level segmentation on the localized target object by using an FPN network to obtain a target item pixel point set, the method further includes:
and carrying out image recognition on the image data to obtain the identification information of the target object, wherein the identification information is used for identifying the uniqueness of the target object.
Optionally, after processing the point cloud set by ICP matching to obtain the pose information of the target object, the pose information being used for sorting the target object, the method further comprises:
and converting the pose information into position parameters of a robot coordinate system, so that the robot can grab the target object according to the position parameters.
Optionally, the performing target location on the image data by using a convolutional neural network includes:
the method comprises the steps of constructing a convolutional neural network, inputting an image of a target article into the convolutional neural network for training to obtain a trained convolutional neural network model, training a Softmax classifier through global target article features extracted by the convolutional neural network model, wherein the convolutional neural network comprises a convolutional pooling layer, a local feature fusion layer and a full connection layer, conducting modular preprocessing on image data, inputting results into the trained convolutional neural network model to obtain the features of the target article, and identifying by using the trained Softmax classifier to obtain the positioning of the target article.
Optionally, the performing target location on the image data by using a convolutional neural network includes:
receiving input image samples of multiple categories, normalizing the input image sample data of each category, convolving the normalized image sample data, mapping the convolved image sample data by adopting a preset asymmetric mapping matrix, arranging the mapped image sample data to obtain corresponding one-dimensional feature description, and calculating a neural network weight corresponding to the image of each category according to the one-dimensional feature description;
distributing the corresponding neural network weights of the plurality of category images by adopting a hierarchical structure, wherein the category number distributed in each layer is the maximum distinguishing classification number determined according to the asymmetric mapping matrix, the plurality of category images are sequentially distributed in the plurality of layers, and each layer forms a corresponding learning library;
processing the input test-category image sample data to obtain the corresponding one-dimensional feature description, and performing feed-forward learning with that one-dimensional feature description and the neural network weights in the learning library to determine whether the test category is among the learned image categories.
Optionally, the processing the point cloud set by the iterative closest point ICP matching to obtain the pose information of the target object includes:
determining any two three-dimensional point sets in the point cloud set, namely a first three-dimensional point set X1 and a second three-dimensional point set X2;
calculating, for each point in the second three-dimensional point set X2, the corresponding nearest point in the first three-dimensional point set X1;
solving for the rigid-body transformation that minimizes the average distance to the corresponding nearest points, obtaining translation and rotation parameters;
applying the translation and rotation parameters obtained in the previous step to the second three-dimensional point set X2 to obtain a new transformed point set;
and stopping the iterative computation when the average distance between the new transformed point set and the reference point set is smaller than a given threshold; otherwise taking the new transformed point set as the new second three-dimensional point set X2 and continuing to iterate until the objective function requirement is met, thereby obtaining the pose information of the target object.
Optionally, the performing image recognition on the image data to obtain the identification information of the target item includes:
and recognizing the image data by optical character recognition (OCR) to obtain the identification information of the target article.
In a second aspect, the present invention provides an item identification pre-sorting system based on deep learning, comprising:
the vision board card, configured to acquire image data containing a target article by using an RGB-D camera, process the target-article pixel point set according to the correspondence between the image data and the point cloud to obtain a point cloud set, and process the point cloud set by iterative closest point (ICP) matching to obtain pose information of the target object, the pose information being used for sorting the target object;
the vision processing unit GPU server is used for carrying out target positioning on the image data by utilizing a convolutional neural network and carrying out pixel level segmentation on the positioned target object by utilizing a multi-scale target detection FPN network to obtain a target article pixel point set;
and the vision board card and the GPU server communicate via the TCP/IP protocol.
Optionally, the vision board is further configured to convert the pose information into a position parameter of a robot coordinate system, so that the robot grasps the target object according to the position parameter;
the GPU server is also used for carrying out image recognition on the image data to obtain the identification information of the target object, and the identification information is used for identifying the uniqueness of the target object.
Optionally, the GPU server is specifically configured to recognize the image data by using an optical character recognition OCR to obtain the identification information of the target item.
In a third aspect, the invention provides a robot for performing the deep learning based item identification pre-sorting method as described above.
According to the technical scheme, the embodiment of the invention has the following advantages:
the invention provides an article identification pre-sorting method based on deep learning, which comprises the following steps: the method comprises the steps of acquiring image data containing a target object by using an RGBD (red green blue) camera, performing target positioning on the image data by using a convolutional neural network, performing pixel level segmentation on the positioned target object by using a multi-scale target detection FPN (field programmable gate array) network to obtain a target object pixel point set, processing the target object pixel point set according to the corresponding relation between the image data and point cloud to obtain a point cloud set, processing the point cloud set in an iterative closest point ICP (inductively coupled plasma) matching mode to obtain pose information of the target object, wherein the pose information is used for sorting the target object, can provide accurate pose information of the target object and guarantee the rapid sorting of a robot, correspondingly provides an object identification pre-sorting system and the robot based on deep learning, and effectively improves the calculation speed by adopting a local private server, and accurate pose information of target objects can be provided, and a guarantee is provided for the robot to rapidly sort.
Drawings
FIG. 1 is a flow chart of a deep learning based item identification pre-sort method provided in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a deep learning based item identification pre-sort method provided in an embodiment of the present invention;
fig. 3 is a block diagram of an item identification pre-sorting system based on deep learning provided in an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1, the present invention provides a deep learning-based item identification pre-sorting method, which includes:
s101, acquiring image data containing the target object by using an RGBD camera.
The RGB-D camera adopts binocular stereo vision, a method of acquiring three-dimensional geometric information of an object from multiple images based on the parallax principle. In a machine vision system, binocular vision generally obtains two digital images of the surrounding scene from different angles, either simultaneously with two cameras or at different times with a single camera. Based on the parallax principle, the three-dimensional geometric information of an object can be recovered and the three-dimensional shape and position of the surrounding scene reconstructed. Binocular vision obtains three-dimensional information from the parallax by triangulation: the image planes of the two cameras and the measured object form a triangle. Given the known positional relationship between the two cameras, the three-dimensional size of an object in their common field of view and the three-dimensional coordinates of the spatial feature points can be obtained; a binocular vision system is thus composed of the two cameras.
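For a rectified stereo pair, the triangulation relationship described above reduces to the textbook formula Z = f·B/d. The sketch below is illustrative only; the focal length, baseline and disparity values are assumed, not taken from the patent.

```python
def depth_from_disparity(f_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth of a point from a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a visible point")
    return f_px * baseline_m / disparity_px

# A point with 20 px disparity, seen by a rig with a 700 px focal
# length and a 10 cm baseline, lies 3.5 m from the cameras.
z = depth_from_disparity(700.0, 0.10, 20.0)  # → 3.5
```

Note the inverse relationship: halving the disparity doubles the depth, which is why depth precision degrades for distant objects.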
S102, carrying out target positioning on the image data by using a convolutional neural network, and carrying out pixel level segmentation on the positioned target object by using a multi-scale target detection FPN network to obtain a target object pixel point set.
The FPN network admits several positioning strategies: multi-scale features can be obtained from multi-scale input images; prediction can be made directly on the topmost convolutional feature map, similar to mainstream R-CNN-style detection; prediction can be made independently on the feature map of each layer at its own scale; or each layer's feature map can be fused with a shallower layer after upsampling before prediction. Specifically, the higher-layer feature map is upsampled 2x, the shallower feature map has its channels compressed by a 1x1 convolution, and the two are then fused. Anchors are configured on each scale, with areas of {32², 64², 128², 256², 512²} pixels and aspect ratios {1:2, 1:1, 2:1}, giving 15 anchors in total, each mapped back to a region of the original image.
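The top-down fusion step described above (2x upsampling of the deeper map, 1x1 channel compression of the shallower map, element-wise addition) can be sketched with plain NumPy arrays. The shapes and weight values below are illustrative, not taken from the patent.

```python
import numpy as np

def fpn_merge(shallow: np.ndarray, deep: np.ndarray, w_1x1: np.ndarray) -> np.ndarray:
    """One FPN top-down merge: 2x-upsample the deeper map, compress the
    shallower map's channels with a 1x1 convolution, then add element-wise."""
    up = deep.repeat(2, axis=0).repeat(2, axis=1)  # nearest-neighbour 2x upsampling
    lateral = shallow @ w_1x1                      # 1x1 conv == per-pixel matmul
    return up + lateral

deep = np.ones((4, 4, 8))      # higher-level (coarser) feature map, 8 channels
shallow = np.ones((8, 8, 16))  # shallower (finer) feature map, 16 channels
w = np.full((16, 8), 0.5)      # hypothetical 1x1-conv weights: 16 -> 8 channels
merged = fpn_merge(shallow, deep, w)  # shape (8, 8, 8)
```

In a real FPN the weights are learned and the merged map is further smoothed by a 3x3 convolution before prediction; this sketch only shows the shape bookkeeping of the merge itself.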
S103, processing the target article pixel point set according to the corresponding relation between the image data and the point cloud to obtain a point cloud set.
A point cloud is a collection of a vast number of points sampled on the surface of an object. A point cloud obtained according to the laser measurement principle comprises three-dimensional coordinates (XYZ) and laser reflection intensity; one obtained according to the photogrammetry principle comprises three-dimensional coordinates (XYZ) and color information (RGB); combining the two principles yields a point cloud comprising three-dimensional coordinates, laser reflection intensity and color information. Once the spatial coordinates of each sampling point on the object surface are obtained, the resulting point set constitutes the point cloud. A three-dimensional coordinate measuring machine yields few, widely spaced points, called a sparse point cloud; a three-dimensional laser scanner or photographic scanner yields many, densely spaced points, called a dense point cloud.
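The correspondence between image pixels and the point cloud used in step S103 follows the pinhole camera model: each pixel with a known depth back-projects to one 3D point. A minimal sketch, with assumed intrinsic parameters (fx, fy, cx, cy) that are illustrative only:

```python
import numpy as np

def pixels_to_points(depth_m: np.ndarray, pixels, fx: float, fy: float,
                     cx: float, cy: float) -> np.ndarray:
    """Back-project (u, v) pixels with known depth into camera-frame XYZ
    using the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    pts = []
    for u, v in pixels:
        z = depth_m[v, u]
        pts.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return np.array(pts)

depth = np.full((480, 640), 2.0)  # a flat scene 2 m from the camera
cloud = pixels_to_points(depth, [(320, 240), (380, 240)],
                         fx=600.0, fy=600.0, cx=320.0, cy=240.0)
```

Applying this to the segmented target-article pixel set yields exactly the point cloud subset that is then fed to ICP matching.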
And S104, processing the point cloud set by iterative closest point (ICP) matching to obtain pose information of the target object, wherein the pose information is used for sorting the target object.
The ICP matching specifically includes:
determining any two three-dimensional point sets in the point cloud set, namely a first three-dimensional point set X1 and a second three-dimensional point set X2; calculating, for each point in X2, the corresponding nearest point in X1; solving for the rigid-body transformation that minimizes the average distance to the corresponding nearest points, obtaining translation and rotation parameters; applying those parameters to X2 to obtain a new transformed point set; and stopping the iterative computation when the average distance between the new transformed point set and the reference point set is smaller than a given threshold, otherwise taking the new transformed point set as the new X2 and continuing to iterate until the objective function requirement is met, thereby obtaining the pose information of the target object.
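The iteration above can be condensed into a minimal NumPy sketch. The nearest-neighbour search here is brute-force, and the "rigid transformation with minimum average distance" step is solved by SVD (the Kabsch method) — one standard solver; the patent does not prescribe a particular one.

```python
import numpy as np

def best_rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Least-squares rotation R and translation t aligning src to dst (SVD/Kabsch)."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, c_dst - R @ c_src

def icp(x2: np.ndarray, x1: np.ndarray, iters: int = 20, tol: float = 1e-6) -> np.ndarray:
    """Iterate: pair each point of X2 with its nearest point in X1, solve the
    rigid transform, apply it, and stop when the mean distance falls below tol."""
    cur = x2.copy()
    for _ in range(iters):
        d = np.linalg.norm(cur[:, None, :] - x1[None, :, :], axis=2)
        matched = x1[d.argmin(axis=1)]  # nearest point in X1 for each point of X2
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
        if np.linalg.norm(matched - cur, axis=1).mean() < tol:
            break
    return cur
```

A point set offset by a small translation snaps back onto the reference in a single iteration, because the nearest-neighbour pairing is then already correct; larger initial misalignments are where the iteration earns its keep.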
In order to facilitate the distinguishing and marking of the target object, before the performing target localization on the image data by using the convolutional neural network algorithm and performing pixel-level segmentation on the localized target object by using the FPN network to obtain a target item pixel point set, the method further includes:
the image data is subjected to image recognition to obtain identification information of the target object, and the identification information is used for identifying the uniqueness of the target object, for example, by adding a character signboard, and obtaining character content by OCR recognition, which is not limited herein.
After the point cloud set is processed in an ICP matching manner to obtain pose information of the target object, where the pose information is used for sorting the target object, the method further includes:
and S105, converting the pose information into position parameters of a robot coordinate system, so that the robot can grab the target object according to the position parameters.
With reference to fig. 2, the coordinate system can be converted by the robot's hand-eye calibration module, and the scheme can be applied to the article sorting process. In the hand-eye calibration stage, the object pose obtained from camera recognition is converted into the robot coordinate system: the calibration board remains static while the robot moves the camera, and the transformation is solved. In the object recognition and positioning stage, the object type, actual position and posture are obtained: identification information of the object is acquired through OCR recognition, pixel-level segmentation is performed, and finally the mapped point cloud is matched by ICP to obtain the pose information. In the grasping execution stage, the object pose converted through the hand-eye calibration is used to grasp the target.
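Once hand-eye calibration has produced the camera-to-base transform, mapping a camera-frame pose into robot coordinates is a single homogeneous-matrix product. The 4x4 values below are illustrative, not calibration results from the patent.

```python
import numpy as np

def to_robot_frame(T_base_cam: np.ndarray, pose_cam: np.ndarray) -> np.ndarray:
    """Map a 4x4 object pose from the camera frame to the robot base frame."""
    return T_base_cam @ pose_cam

# Hypothetical calibration: camera axes aligned with the base,
# camera origin 0.5 m above the base origin.
T_base_cam = np.eye(4)
T_base_cam[2, 3] = 0.5
pose_cam = np.eye(4)
pose_cam[:3, 3] = [0.1, 0.2, 0.3]  # object position in the camera frame
pose_base = to_robot_frame(T_base_cam, pose_cam)
```

The resulting translation column of `pose_base` is the grasp position handed to the robot controller.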
In one embodiment of step S102, the performing target localization on the image data by using a convolutional neural network includes:
the method comprises the steps of constructing a convolutional neural network, inputting an image of a target article into the convolutional neural network for training to obtain a trained convolutional neural network model, training a Softmax classifier through global target article features extracted by the convolutional neural network model, wherein the convolutional neural network comprises a convolutional pooling layer, a local feature fusion layer and a full connection layer, conducting modular preprocessing on image data, inputting results into the trained convolutional neural network model to obtain the features of the target article, and identifying by using the trained Softmax classifier to obtain the positioning of the target article.
In another embodiment of step S102, the performing target location on the image data by using a convolutional neural network includes:
receiving input image samples of multiple categories, normalizing the input image sample data of each category, convolving the normalized image sample data, mapping the convolved image sample data by adopting a preset asymmetric mapping matrix, arranging the mapped image sample data to obtain corresponding one-dimensional feature description, and calculating a neural network weight corresponding to the image of each category according to the one-dimensional feature description;
distributing the corresponding neural network weights of the plurality of category images by adopting a hierarchical structure, wherein the category number distributed in each layer is the maximum distinguishing classification number determined according to the asymmetric mapping matrix, the plurality of category images are sequentially distributed in the plurality of layers, and each layer forms a corresponding learning library;
processing the input test-category image sample data to obtain the corresponding one-dimensional feature description, and performing feed-forward learning with that one-dimensional feature description and the neural network weights in the learning library to determine whether the test category is among the learned image categories.
The invention provides an article identification and pre-sorting method based on deep learning, comprising: acquiring image data containing a target article by using an RGB-D camera; performing target positioning on the image data by using a convolutional neural network; performing pixel-level segmentation on the positioned target object by using a multi-scale target detection FPN (feature pyramid network) to obtain a target article pixel point set; processing the target article pixel point set according to the correspondence between the image data and the point cloud to obtain a point cloud set; and processing the point cloud set by iterative closest point (ICP) matching to obtain pose information of the target object, the pose information being used for sorting the target object. Accurate pose information of the target article can thus be provided, and rapid sorting by the robot is guaranteed.
As shown in fig. 3, the present invention provides a deep learning-based item identification pre-sorting system, which includes:
the vision board card, configured to acquire image data containing a target article by using an RGB-D camera, process the target-article pixel point set according to the correspondence between the image data and the point cloud to obtain a point cloud set, and process the point cloud set by iterative closest point (ICP) matching to obtain pose information of the target object, the pose information being used for sorting the target object;
the vision processing unit GPU server is used for carrying out target positioning on the image data by utilizing a convolutional neural network and carrying out pixel level segmentation on the positioned target object by utilizing a multi-scale target detection FPN network to obtain a target article pixel point set;
The vision board card and the GPU server communicate via the TCP/IP protocol; adopting a local private server effectively increases the computation speed.
The vision board card is also used for converting the pose information into position parameters of a robot coordinate system so that the robot can grab the target object according to the position parameters;
the GPU server is further configured to perform image recognition on the image data to obtain identification information of the target object, the identification information being used to identify the uniqueness of the target object; specifically, the GPU server recognizes the image data by optical character recognition (OCR) to obtain the identification information of the target article.
Deep learning computation is usually accelerated with a GPU; a more complex deep learning network needs a more powerful GPU, and a GPU with stronger computing capability usually draws more power. Such a graphics card is not suitable for mounting on the composite robot, as it would greatly reduce the robot's working time. The system architecture is therefore realized in an edge computing mode: an i7 computing board card on the composite robot body connects to a local private server through a wireless network, and the functions requiring GPU computation are placed on the private server. To facilitate system integration and customization, the vision software adopts ROS as the basic framework to realize multi-machine distributed processing.
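The patent specifies TCP/IP between the board card and the GPU server but no wire format. One common convention for such links is a length-prefixed JSON frame, sketched below; the message fields (`item_id`, `pose`) are hypothetical, not from the patent.

```python
import json
import struct

def pack_msg(obj) -> bytes:
    """Serialise a message as a 4-byte big-endian length prefix + UTF-8 JSON."""
    payload = json.dumps(obj).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload

def unpack_msg(data: bytes):
    """Inverse of pack_msg: read the length prefix, then decode the JSON body."""
    (n,) = struct.unpack(">I", data[:4])
    return json.loads(data[4:4 + n].decode("utf-8"))

msg = {"item_id": "A-1", "pose": [0.1, 0.2, 0.3, 0.0, 0.0, 1.57]}
roundtrip = unpack_msg(pack_msg(msg))  # == msg
```

The length prefix lets the receiver reassemble one logical message from a TCP byte stream regardless of how the kernel fragments it.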
The article identification and pre-sorting system provided by the invention can be applied in the e-commerce warehousing and logistics industry: a logistics warehouse holds many types of daily goods, and the system can replace the manual work of classifying and warehousing them. It can also be applied on industrial production lines, effectively replacing workers in certain harsh production environments for tasks such as reading instruments and meters and operating simple control buttons.
The article identification and pre-sorting system based on deep learning provided by the invention comprises a vision board card and a vision processing unit (GPU) server, the latter deployed as a local private server. The vision board card acquires image data containing a target article by using an RGB-D camera, processes the target-article pixel point set according to the correspondence between the image data and the point cloud to obtain a point cloud set, and processes the point cloud set by iterative closest point (ICP) matching to obtain pose information of the target object, the pose information being used for sorting the target object. The GPU server performs target positioning on the image data by using a convolutional neural network and performs pixel-level segmentation on the positioned target object by using a multi-scale target detection FPN network to obtain the target article pixel point set. The vision board card and the GPU server communicate via the TCP/IP protocol; the computation speed is effectively improved, accurate pose information of the target article can be provided, and rapid sorting by the robot is guaranteed.
Accordingly, the present invention provides a robot for performing the deep learning based item identification pre-sorting method as described above.
The vision board card is arranged on the robot body and is in communication connection with the GPU server through a TCP/IP protocol.
When the robot performs multi-station operations on an industrial site, the system scheme of this patent can effectively save the investment that fixed robots would otherwise require and, under certain conditions, realize flexible multi-station operation. Likewise, in the production of goods such as medicines and daily supplies, the system can reliably distinguish different article types and realize functions such as classification and arrangement.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic or optical disk, or the like.
The article identification and pre-sorting system, method and robot based on deep learning provided by the present invention have been described in detail above. Those skilled in the art will appreciate that the concepts of the embodiments of the present invention may vary in their specific implementations and ranges of application.

Claims (10)

1. A deep learning based item identification pre-sorting method, the method comprising:
acquiring image data containing a target object by using an RGBD camera;
performing target positioning on the image data by using a convolutional neural network, and performing pixel level segmentation on the positioned target object by using a multi-scale target detection FPN network to obtain a target object pixel point set;
processing the target article pixel point set according to the corresponding relation between the image data and the point cloud to obtain a point cloud set;
and processing the point cloud set by iterative closest point (ICP) matching to obtain pose information of the target object, wherein the pose information is used for sorting the target object.
2. The deep learning based item identification pre-sorting method of claim 1, wherein before performing target localization on the image data by using a convolutional neural network and performing pixel-level segmentation of the localized target object by using the FPN network to obtain the target item pixel point set, the method further comprises:
and carrying out image recognition on the image data to obtain the identification information of the target object, wherein the identification information is used for identifying the uniqueness of the target object.
3. The deep learning based item identification pre-sorting method according to claim 1, wherein after the point cloud set is processed by ICP matching to obtain the pose information of the target object for sorting use, the method further comprises:
and converting the pose information into position parameters of a robot coordinate system, so that the robot can grab the target object according to the position parameters.
4. The deep learning based item identification pre-sorting method of claim 1, wherein the target locating the image data by using a convolutional neural network comprises:
the method comprises the steps of constructing a convolutional neural network, inputting an image of a target article into the convolutional neural network for training to obtain a trained convolutional neural network model, training a Softmax classifier through global target article features extracted by the convolutional neural network model, wherein the convolutional neural network comprises a convolutional pooling layer, a local feature fusion layer and a full connection layer, conducting modular preprocessing on image data, inputting results into the trained convolutional neural network model to obtain the features of the target article, and identifying by using the trained Softmax classifier to obtain the positioning of the target article.
5. The deep learning based item identification pre-sorting method of claim 1, wherein the target locating the image data by using a convolutional neural network comprises:
receiving input image samples of multiple categories, normalizing the input image sample data of each category, convolving the normalized image sample data, mapping the convolved image sample data by adopting a preset asymmetric mapping matrix, arranging the mapped image sample data to obtain corresponding one-dimensional feature description, and calculating a neural network weight corresponding to the image of each category according to the one-dimensional feature description;
distributing the corresponding neural network weights of the plurality of category images by adopting a hierarchical structure, wherein the category number distributed in each layer is the maximum distinguishing classification number determined according to the asymmetric mapping matrix, the plurality of category images are sequentially distributed in the plurality of layers, and each layer forms a corresponding learning library;
processing input test type image sample data to obtain corresponding one-dimensional feature description, and performing feed-forward learning on the one-dimensional feature description corresponding to the test type image sample data and the neural network weight in the learning library to obtain whether the test type is in the learned type image.
6. The deep learning-based item identification pre-sorting method according to claim 1, wherein the processing the point cloud set by means of iterative closest point ICP matching to obtain the pose information of the target object comprises:
determining any two three-dimensional point sets in the point cloud set, namely a first three-dimensional point set X1 and a second three-dimensional point set X2;
calculating a corresponding near point for each point in the second set of three-dimensional points X2 in the first set of three-dimensional points X1;
obtaining rigid body transformation which enables the corresponding close point to have the minimum average distance, and obtaining translation parameters and rotation parameters;
obtaining a new transformation point set by using the translation and rotation parameters obtained in the previous step for the second three-dimensional point set X2;
and when the average distance between the new transformation point set and the reference point set is smaller than a given threshold value, stopping iterative computation, otherwise, taking the new transformation point set as a new second three-dimensional point set X2 to continue iteration until the requirement of the objective function is met, and obtaining the pose information of the target object.
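The iteration of claim 6 can be sketched with NumPy as a minimal point-to-point ICP. The SVD-based (Kabsch) rigid fit and all names are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def best_rigid_transform(src, dst):
    # Least-squares rotation R and translation t mapping src onto dst
    # (Kabsch/SVD solution of the "minimum average distance" step).
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

def icp(X2, X1, max_iter=50, tol=1e-6):
    # Align the second point set X2 to the first (reference) set X1.
    src = X2.copy()
    prev_err = np.inf
    for _ in range(max_iter):
        # step 1: for each point of X2, find its closest point in X1
        d = np.linalg.norm(src[:, None, :] - X1[None, :, :], axis=2)
        nn = d.argmin(axis=1)
        err = d[np.arange(len(src)), nn].mean()
        # step 2: rigid transform (rotation + translation parameters)
        # minimizing the distance to those corresponding close points
        R, t = best_rigid_transform(src, X1[nn])
        # step 3: apply it to obtain the new transformed point set
        src = src @ R.T + t
        # step 4: iterate until the error change falls below the threshold
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return src
```

A production system would use a k-d tree for the closest-point search; the brute-force distance matrix here is O(n²) and only suitable for small clouds.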
7. The deep learning based item identification pre-sorting method according to claim 2, wherein the image recognition of the image data to obtain the identification information of the target item comprises:
and recognizing the image data by optical character recognition (OCR) to obtain the identification information of the target object.
8. An item identification pre-sorting system based on deep learning, comprising:
the vision board, used for acquiring image data containing a target object with an RGBD (red, green, blue and depth) camera, processing the target object's pixel point set according to the correspondence between the image data and the point cloud to obtain a point cloud set, and processing the point cloud set by iterative closest point (ICP) matching to obtain pose information of the target object, wherein the pose information is used for sorting the target object;
the graphics processing unit (GPU) server, used for performing target localization on the image data by using a convolutional neural network and performing pixel-level segmentation of the localized target object by using a multi-scale target detection FPN network to obtain the target item pixel point set;
and the visual board card and the GPU server are communicated by adopting a TCP/IP protocol.
9. The deep learning based item identification pre-sorting system of claim 8, wherein the vision board is further configured to convert the pose information into position parameters of a robot coordinate system, so that a robot grabs the target object according to the position parameters;
the GPU server is also used for carrying out image recognition on the image data to obtain the identification information of the target object, and the identification information is used for identifying the uniqueness of the target object.
10. A robot for performing the deep learning based item identification pre-sorting method of any of claims 1 to 7.
CN201811605348.8A 2018-12-26 2018-12-26 Article identification and pre-sorting system and method based on deep learning and robot Pending CN111368852A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811605348.8A CN111368852A (en) 2018-12-26 2018-12-26 Article identification and pre-sorting system and method based on deep learning and robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811605348.8A CN111368852A (en) 2018-12-26 2018-12-26 Article identification and pre-sorting system and method based on deep learning and robot

Publications (1)

Publication Number Publication Date
CN111368852A true CN111368852A (en) 2020-07-03

Family

ID=71209840

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811605348.8A Pending CN111368852A (en) 2018-12-26 2018-12-26 Article identification and pre-sorting system and method based on deep learning and robot

Country Status (1)

Country Link
CN (1) CN111368852A (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070818A (en) * 2020-11-10 2020-12-11 纳博特南京科技有限公司 Robot disordered grabbing method and system based on machine vision and storage medium
CN112264309A (en) * 2020-09-30 2021-01-26 北京京东振世信息技术有限公司 Package sorting method, server and storage medium
CN112288819A (en) * 2020-11-20 2021-01-29 中国地质大学(武汉) Multi-source data fusion vision-guided robot grabbing and classifying system and method
CN112509145A (en) * 2020-12-22 2021-03-16 珠海格力智能装备有限公司 Material sorting method and device based on three-dimensional vision
CN112784717A (en) * 2021-01-13 2021-05-11 中北大学 Automatic pipe fitting sorting method based on deep learning
CN112788326A (en) * 2020-12-28 2021-05-11 北京迁移科技有限公司 Image data online acquisition system and method based on 3D vision
CN113021355A (en) * 2021-03-31 2021-06-25 重庆正格技术创新服务有限公司 Agricultural robot operation method for predicting sheltered crop picking point
CN113393522A (en) * 2021-05-27 2021-09-14 湖南大学 6D pose estimation method based on monocular RGB camera regression depth information
CN113609985A (en) * 2021-08-05 2021-11-05 诺亚机器人科技(上海)有限公司 Object pose detection method, detection device, robot and storage medium
CN113780464A (en) * 2021-09-26 2021-12-10 唐山百川智能机器股份有限公司 Method for detecting anti-loose identification of bogie fastener
CN113808197A (en) * 2021-09-17 2021-12-17 山西大学 Automatic workpiece grabbing system and method based on machine learning
CN113920142A (en) * 2021-11-11 2022-01-11 江苏昱博自动化设备有限公司 Sorting manipulator multi-object sorting method based on deep learning
CN114170521A (en) * 2022-02-11 2022-03-11 杭州蓝芯科技有限公司 Forklift pallet butt joint identification positioning method
CN114871120A (en) * 2022-05-26 2022-08-09 江苏省徐州医药高等职业学校 Medicine determining and sorting method and device based on image data processing
CN115755920A (en) * 2022-11-30 2023-03-07 南京蔚蓝智能科技有限公司 Automatic charging method for robot dog
CN116187718A (en) * 2023-04-24 2023-05-30 深圳市宏大供应链服务有限公司 Intelligent goods identification and sorting method and system based on computer vision
CN116228854A (en) * 2022-12-29 2023-06-06 中科微至科技股份有限公司 Automatic parcel sorting method based on deep learning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544506A (en) * 2013-10-12 2014-01-29 Tcl集团股份有限公司 Method and device for classifying images on basis of convolutional neural network
CN103955939A (en) * 2014-05-16 2014-07-30 重庆理工大学 Boundary feature point registering method for point cloud splicing in three-dimensional scanning system
CN105590102A (en) * 2015-12-30 2016-05-18 中通服公众信息产业股份有限公司 Front car face identification method based on deep learning
CN106600639A (en) * 2016-12-09 2017-04-26 江南大学 Genetic algorithm and adaptive threshold constraint-combined ICP (iterative closest point) pose positioning technology
CN108171748A (en) * 2018-01-23 2018-06-15 哈工大机器人(合肥)国际创新研究院 A kind of visual identity of object manipulator intelligent grabbing application and localization method
CN108198145A (en) * 2017-12-29 2018-06-22 百度在线网络技术(北京)有限公司 For the method and apparatus of point cloud data reparation
CN108710919A (en) * 2018-05-25 2018-10-26 东南大学 A kind of crack automation delineation method based on multi-scale feature fusion deep learning

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112264309A (en) * 2020-09-30 2021-01-26 北京京东振世信息技术有限公司 Package sorting method, server and storage medium
CN112070818A (en) * 2020-11-10 2020-12-11 纳博特南京科技有限公司 Robot disordered grabbing method and system based on machine vision and storage medium
CN112070818B (en) * 2020-11-10 2021-02-05 纳博特南京科技有限公司 Robot disordered grabbing method and system based on machine vision and storage medium
CN112288819A (en) * 2020-11-20 2021-01-29 中国地质大学(武汉) Multi-source data fusion vision-guided robot grabbing and classifying system and method
CN112509145A (en) * 2020-12-22 2021-03-16 珠海格力智能装备有限公司 Material sorting method and device based on three-dimensional vision
CN112509145B (en) * 2020-12-22 2023-12-08 珠海格力智能装备有限公司 Material sorting method and device based on three-dimensional vision
CN112788326B (en) * 2020-12-28 2023-06-06 北京迁移科技有限公司 3D vision-based image data online acquisition system and method
CN112788326A (en) * 2020-12-28 2021-05-11 北京迁移科技有限公司 Image data online acquisition system and method based on 3D vision
CN112784717B (en) * 2021-01-13 2022-05-13 中北大学 Automatic pipe fitting sorting method based on deep learning
CN112784717A (en) * 2021-01-13 2021-05-11 中北大学 Automatic pipe fitting sorting method based on deep learning
CN113021355A (en) * 2021-03-31 2021-06-25 重庆正格技术创新服务有限公司 Agricultural robot operation method for predicting sheltered crop picking point
CN113393522A (en) * 2021-05-27 2021-09-14 湖南大学 6D pose estimation method based on monocular RGB camera regression depth information
CN113609985A (en) * 2021-08-05 2021-11-05 诺亚机器人科技(上海)有限公司 Object pose detection method, detection device, robot and storage medium
CN113609985B (en) * 2021-08-05 2024-02-23 诺亚机器人科技(上海)有限公司 Object pose detection method, detection device, robot and storable medium
CN113808197A (en) * 2021-09-17 2021-12-17 山西大学 Automatic workpiece grabbing system and method based on machine learning
CN113780464A (en) * 2021-09-26 2021-12-10 唐山百川智能机器股份有限公司 Method for detecting anti-loose identification of bogie fastener
CN113920142B (en) * 2021-11-11 2023-09-26 江苏昱博自动化设备有限公司 Sorting manipulator multi-object sorting method based on deep learning
CN113920142A (en) * 2021-11-11 2022-01-11 江苏昱博自动化设备有限公司 Sorting manipulator multi-object sorting method based on deep learning
CN114170521A (en) * 2022-02-11 2022-03-11 杭州蓝芯科技有限公司 Forklift pallet butt joint identification positioning method
CN114871120A (en) * 2022-05-26 2022-08-09 江苏省徐州医药高等职业学校 Medicine determining and sorting method and device based on image data processing
CN114871120B (en) * 2022-05-26 2023-11-07 江苏省徐州医药高等职业学校 Medicine determining and sorting method and device based on image data processing
CN115755920A (en) * 2022-11-30 2023-03-07 南京蔚蓝智能科技有限公司 Automatic charging method for robot dog
CN116228854B (en) * 2022-12-29 2023-09-08 中科微至科技股份有限公司 Automatic parcel sorting method based on deep learning
CN116228854A (en) * 2022-12-29 2023-06-06 中科微至科技股份有限公司 Automatic parcel sorting method based on deep learning
CN116187718B (en) * 2023-04-24 2023-08-04 深圳市宏大供应链服务有限公司 Intelligent goods identification and sorting method and system based on computer vision
CN116187718A (en) * 2023-04-24 2023-05-30 深圳市宏大供应链服务有限公司 Intelligent goods identification and sorting method and system based on computer vision

Similar Documents

Publication Publication Date Title
CN111368852A (en) Article identification and pre-sorting system and method based on deep learning and robot
CN108171748B (en) Visual identification and positioning method for intelligent robot grabbing application
CN108555908B (en) Stacked workpiece posture recognition and pickup method based on RGBD camera
CN111328396B (en) Pose estimation and model retrieval for objects in images
CN109816725B (en) Monocular camera object pose estimation method and device based on deep learning
CN109870983B (en) Method and device for processing tray stack image and system for warehousing goods picking
CN106156778B (en) The method of known object in the visual field of NI Vision Builder for Automated Inspection for identification
CN111080693A (en) Robot autonomous classification grabbing method based on YOLOv3
CN102141398B (en) Monocular vision-based method for measuring positions and postures of multiple robots
CN111695562B (en) Autonomous robot grabbing method based on convolutional neural network
CN111178250A (en) Object identification positioning method and device and terminal equipment
CN111179324A (en) Object six-degree-of-freedom pose estimation method based on color and depth information fusion
CN110084243B (en) File identification and positioning method based on two-dimensional code and monocular camera
CN107992881A (en) A kind of Robotic Dynamic grasping means and system
CN114952809B (en) Workpiece identification and pose detection method, system and mechanical arm grabbing control method
CN109461184B (en) Automatic positioning method for grabbing point for grabbing object by robot mechanical arm
CN110400315A (en) A kind of defect inspection method, apparatus and system
Ni et al. A new approach based on two-stream cnns for novel objects grasping in clutter
Zhuang et al. Instance segmentation based 6D pose estimation of industrial objects using point clouds for robotic bin-picking
CN115816460A (en) Manipulator grabbing method based on deep learning target detection and image segmentation
WO2023278550A1 (en) Systems and methods for picking objects using 3-d geometry and segmentation
Salem et al. Assessment of methods for industrial indoor object recognition
Zhang et al. A fast detection and grasping method for mobile manipulator based on improved faster R-CNN
CN111242057A (en) Product sorting system, method, computer device and storage medium
CN207752527U (en) A kind of Robotic Dynamic grasping system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200703