CN116213306A - Automatic visual identification method and sorting system - Google Patents

Automatic visual identification method and sorting system Download PDF

Info

Publication number
CN116213306A
CN116213306A
Authority
CN
China
Prior art keywords
point
image
target commodity
mechanical arm
point cloud
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310216821.8A
Other languages
Chinese (zh)
Inventor
刘炫志
叶于平
梁积鑫
郑泽凡
徐川
宋展
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202310216821.8A priority Critical patent/CN116213306A/en
Publication of CN116213306A publication Critical patent/CN116213306A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B07SEPARATING SOLIDS FROM SOLIDS; SORTING
    • B07CPOSTAL SORTING; SORTING INDIVIDUAL ARTICLES, OR BULK MATERIAL FIT TO BE SORTED PIECE-MEAL, e.g. BY PICKING
    • B07C5/00Sorting according to a characteristic or feature of the articles or material being sorted, e.g. by control effected by devices which detect or measure such characteristic or feature; Sorting by manually actuated devices, e.g. switches
    • B07C5/36Sorting apparatus characterised by the means used for distribution
    • B07C5/361Processing or control devices therefor, e.g. escort memory
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B07SEPARATING SOLIDS FROM SOLIDS; SORTING
    • B07CPOSTAL SORTING; SORTING INDIVIDUAL ARTICLES, OR BULK MATERIAL FIT TO BE SORTED PIECE-MEAL, e.g. BY PICKING
    • B07C5/00Sorting according to a characteristic or feature of the articles or material being sorted, e.g. by control effected by devices which detect or measure such characteristic or feature; Sorting by manually actuated devices, e.g. switches
    • B07C5/36Sorting apparatus characterised by the means used for distribution
    • B07C5/361Processing or control devices therefor, e.g. escort memory
    • B07C5/362Separating or distributor mechanisms
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Sorting Of Articles (AREA)

Abstract

The invention discloses an automatic visual identification method and a sorting system. The method comprises the following steps: scanning an area containing a target commodity, reconstructing a three-dimensional point cloud and generating a 2D grayscale image; inputting the 2D grayscale image into a trained neural network model, outputting the 2D coordinate information of the center point of each target commodity in the image, and calculating the 3D coordinate information corresponding to each 2D point in the camera coordinate system from the three-dimensional point cloud. After the target commodity is identified, a sorting system can further control a mechanical arm to grasp and handle the target commodity. The invention integrates deep learning, polarized-light imaging, structured-light three-dimensional reconstruction and other technologies, overcomes the imaging quality problems caused by light-transmitting/reflective materials, reduces the influence of image quality on the performance of the deep learning model, and can position and grasp objects more accurately.

Description

Automatic visual identification method and sorting system
Technical Field
The invention relates to the technical field of computer vision, in particular to an automatic vision identification method and a sorting system.
Background
With the development of science and technology, using automatic mechanical arms instead of manual labor in production workshops is gradually becoming a trend. Compared with manual sorting, mechanical arms offer high efficiency, low cost and small error. The mechanical-arm sorting systems commonly used on the market can be roughly divided into two types. The first type is fixed grasping with fixed parameters: an invariant motion parameter is input to the mechanical arm, which translates and rotates up, down, left and right by the same distance every time to reach a fixed place for grasping. This method is generally only applicable to grasping objects at fixed positions and imposes strict limits on where objects may lie on the production line. The second type adopts computer vision algorithms: the position of an object is identified automatically through computer vision imaging and recognition techniques, and the relevant position information is then transmitted to the mechanical arm for grasping. This approach is more intelligent, but at present problems such as inaccurate grasping still occur when grasping objects with non-Lambertian surfaces or transparent packaging bags.
Existing research mainly designs mechanical-arm grasping along four lines. The first class of methods uses fixed grasping with fixed parameters: translation or rotation parameters of the mechanical arm are set manually, and the arm is controlled to move to a designated position to operate. The second class of methods uses traditional 2D image feature extraction algorithms for fixed grasping. In 2D feature-based grasping, the image is first preprocessed, e.g., by gray-level transformation, binarization and morphological processing, to extract useful information; computer vision techniques such as feature extraction, template matching and object tracking are then used to locate the position of the object. The third class of methods uses a deep learning algorithm (e.g., Mask R-CNN) to extract 2D image features: a neural network model is trained in advance so that it can recognize a specified object in a 2D image, and the manipulator then locates and grasps the object according to the recognition result. The fourth class of methods is based on grasping with 3D features: such methods use three-dimensional sensors, e.g., three-dimensional laser scanning or depth imaging, to locate objects by measuring three-dimensional information of the object surface.
For example, patent application CN202210455324.9 proposes a grasping control system and method for an industrial robot having a mechanical arm electrically coupled with a camera device. The camera device comprises an information acquisition module and a sending module; the mechanical arm comprises a receiving module, an information processing module and a grasp judgment module. The information acquisition module acquires commodity information pictures on the production line; the sending module sends the commodity information picture to the receiving module; the information processing module matches the received commodity information picture against a full-angle template set containing 360 template pictures at different angles; and the grasp judgment module rotates the mechanical arm according to the offset angle so that the clamping angle of the gripper is consistent with the offset angle of the target commodity, and then executes the grasping action. In this scheme, the offset angle of the target commodity is identified and the gripper is rotated to align with it, so that the gripper can clamp the target commodity firmly.
Patent application CN201910690105.7 discloses a garbage can identifying and grabbing method based on three-dimensional point cloud. Firstly, a three-dimensional point cloud laser radar module is added on a traditional garbage truck with a hanging barrel, and a hydraulic device is replaced by a mechanical arm executing unit. And then the vehicle-mounted three-dimensional laser radar scans to obtain distance and angle information of each scanning point, converts the polar coordinate information into three-dimensional coordinate point information under a rectangular laser radar coordinate system, integrates the three-dimensional coordinate point information to form point cloud data, and sends the point cloud data to the data processing unit. The data processing unit calculates the relative position relation between the garbage bin and the mechanical arm according to the point cloud model, combines the target position between the garbage bin and the mechanical arm, transmits error information between the two positions to the control solver, and generates a control signal to be transmitted to the mechanical arm. Finally, the mechanical arm accurately reaches the given target end point position, and garbage can grabbing, garbage dumping and garbage can homing are completed.
Patent application CN202110207871.0 discloses a planar grasp detection method based on computer vision and deep learning, comprising: collecting or building a grasping dataset and applying specific data augmentation; supplementing depth-map information with a depth-completion algorithm, fusing the depth information, and performing unified cropping and training/validation splits on the dataset; and, with the trained grasp detection model, taking real image data as network input and the grasp quality score and a five-dimensional representation of the grasp box as output, using a back-propagation algorithm and a standard gradient-based optimizer so that the difference between the detected grasp box and the ground truth is minimized, converting the four vertex coordinates of the grasp box through ordering optimization for visualization, and finally mapping them to real-world coordinates.
Patent application CN202210163948.3 provides a mechanical-arm target grasping method and system, comprising: acquiring a gripping-target recognition dataset; performing image preprocessing on the dataset; training a target detection model on the preprocessed dataset; predicting and recognizing the target to be gripped with the trained model; and, based on the prediction result, solving the motion pose of the mechanical arm through inverse kinematics. Training the target detection model on the preprocessed dataset involves position encoding of the data, after which an encoder-decoder based on the self-attention principle is applied to the position-encoded data to obtain a prediction set of gripping-target recognition data, from which the final target detection box is obtained.
The prior art, as analyzed, has mainly the following drawbacks:
1) For fixed grabbing using fixed parameters, the application scenario is single. The mechanical arm is controlled to move by manually setting parameters, which means that the mechanical arm can only grasp objects at fixed positions and the position of the mechanical arm cannot be changed, so that the layout design on the production line is strictly limited, and the subsequent adjustment is not facilitated.
2) The traditional 2D image feature extraction algorithm has poor robustness, and can not well complete the recognition tasks of various complex targets.
3) 2D image feature extraction based on deep learning is easily affected by external factors; for example, reflection/transmission caused by the target's surface material degrades imaging quality and leads to errors in the model's predictions.
4) Traditional robotic-arm visual grasping uses 2D coordinates for localization to guide grasping, with the z-axis of the three-dimensional coordinate generally set to a fixed value; as a result the mechanical arm cannot reach the target accurately and empty grasps occur.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides an automatic visual identification method and a sorting system, which are particularly suitable for transparent and reflective packaging bags.
According to a first aspect of the present invention, an automatic visual recognition method is provided. The method comprises the following steps:
scanning an area with a target commodity, reconstructing a three-dimensional point cloud and generating a 2D gray scale image;
and inputting the 2D gray level map into a trained neural network model, outputting 2D coordinate information of a central point of each target commodity in the image, and calculating 3D coordinate information corresponding to each 2D point under a camera coordinate system according to the three-dimensional point cloud.
According to a second aspect of the present invention, a sorting system is provided. The system comprises: a 3D structured light camera, a robotic arm, and a computer device, wherein the 3D structured light camera is configured to scan an area having a target commodity and to transmit a scanned image to the computer device; the computer device is configured to perform: obtaining 3D coordinate information of the target commodity according to the method, and taking the 3D coordinate information as grabbing point information; and calculating the corresponding point of the grabbing point information under the coordinate system of the mechanical arm base according to the hand-eye calibration conversion matrix, inputting the point into a mechanical arm operation instruction, and realizing the positioning grabbing of the mechanical arm.
Compared with the prior art, the invention provides a method for unordered sorting of complex-surface objects based on structured-light three-dimensional reconstruction: a polarization-imaging 3D structured light camera collects multiple image datasets of the objects; after labeling, the datasets are fed into an instance segmentation network for training to obtain a neural network model capable of recognizing transparent/reflective objects; the objects are then located with the structured-light 3D camera in combination with hand-eye calibration parameters, and the mechanical arm is finally guided to grasp them. By combining the excellent 2D image recognition performance of deep learning with polarization-based 3D structured-light imaging, the invention can accurately recognize and grasp objects, in particular transparent/reflective packaging bags.
Other features of the present invention and its advantages will become apparent from the following detailed description of exemplary embodiments of the invention, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic illustration of a sorting system application scenario for packages according to one embodiment of the present invention;
FIG. 2 is a flow chart of a method of automatic visual identification of packages according to one embodiment of the invention;
FIG. 3 is a process schematic of an automatic visual identification method for packages according to one embodiment of the invention;
FIG. 4 is a schematic diagram of a polarizer principle according to one embodiment of the present invention;
FIG. 5 is a schematic diagram of a polarized camera pixel array according to one embodiment of the invention;
FIG. 6 is a block diagram of a MASK-RCNN network in accordance with one embodiment of the invention;
FIG. 7 is a schematic diagram of a FPN architecture according to one embodiment of the invention;
fig. 8 is a schematic diagram of an RPN structure according to one embodiment of the invention;
FIG. 9 is a reconstructed greyscale image effect according to one embodiment of the present invention;
FIG. 10 is a graph of recognition results based on a deep learning model implementation in accordance with one embodiment of the present invention;
in the drawings, prediction-prediction; score-score; cordinates-coordinates; anchor boxes-anchor boxes; sliding window-sliding window; intermediate layer-intermediate layer; a reg layer-regression layer; cls layer-classification layer; a conv feature map-convolution feature map; conv-convolution; roialign-region of interest alignment.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.
The following description of at least one exemplary embodiment is merely exemplary in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of exemplary embodiments may have different values.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
The automatic visual recognition method provided by the invention can recognize various types of objects or commodities; for clarity, the following description takes packaging-bag recognition as an example. After the target commodity is identified, a sorting system can further control a mechanical arm to grasp and handle the target commodity. The mechanical arm is an electromechanical device that can move an object according to time-varying requirements on its spatial pose (position and orientation) so as to complete a set task, and it can take many forms. For example, depending on the grasping target, the suction cup at the end of the mechanical arm can be replaced with other forms of grasping device, such as a mechanical gripper.
Referring to fig. 1, taking the eye-to-hand configuration (camera mounted outside the hand) as an example, a sorting system is provided that includes a 3D structured light camera, a robotic arm, and a computer device. The 3D structured light camera scans a target area, such as a workstation, and sends the scanned image to the computer device for processing. The computer device may perform the following processing: reconstructing a three-dimensional point cloud from the scanned image using 3D structured light and generating a 2D grayscale image; feeding the 2D grayscale image into a trained neural network model, which predicts the mask information of each packaging-bag instance and the 2D coordinate information of its center point in the image; calculating, from the reconstructed three-dimensional point cloud, the 3D coordinate corresponding to each 2D point in the camera coordinate system; and calculating, via the hand-eye calibration transformation matrix, the corresponding point in the robot-arm base coordinate system and feeding that point into the arm's motion command, so that the arm performs positioning and grasping.
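As a compact illustration of this processing flow, the sketch below strings the steps together in Python; the model_predict callable (returning detection boxes), the array layouts and the variable names are hypothetical placeholders, not an API defined by the patent.

import numpy as np

def process_scan(gray, cloud, model_predict, H_handeye):
    # gray:          2D grayscale image from the 3D structured light camera
    # cloud:         (H, W, 3) camera-frame point cloud aligned pixel-to-pixel with gray
    # model_predict: callable returning per-instance boxes (x_l, y_l, x_r, y_r) - hypothetical
    # H_handeye:     4x4 hand-eye calibration matrix (camera frame -> robot base frame)
    grasp_points = []
    for x_l, y_l, x_r, y_r in model_predict(gray):
        u, v = int((x_l + x_r) / 2), int((y_l + y_r) / 2)    # 2D center point of the box
        p_cam = cloud[v, u]                                  # corresponding camera-frame 3D point
        if np.isnan(p_cam).any():                            # skip points lost during reconstruction
            continue
        p_base = H_handeye[:3, :3] @ p_cam + H_handeye[:3, 3]
        grasp_points.append(p_base)                          # fed into the arm's motion command
    return grasp_points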
Specifically, as shown in fig. 2 and 3, the provided automatic visual recognition method for the packaging bag comprises the following steps:
step S210, scanning the target area, reconstructing a three-dimensional point cloud and generating a 2D gray scale map.
The 3D structured light camera obtains three-dimensional point cloud information by scanning the target area and can also generate a 2D image of the target. The points in the 2D image are in one-to-one correspondence with the points in the three-dimensional point cloud: each 2D coordinate corresponds to exactly one 3D point. Once the 2D coordinate information P_2D(x_2D, y_2D) of the object is obtained, the corresponding 3D coordinates P_3D(x_3D, y_3D, z_3D) can be calculated as follows:

index = y_2D · W_pic + x_2D    (1)

x_3D = x[index]    (2)

y_3D = y[index]    (3)

z_3D = z[index]    (4)

where index is the index value of the 2D coordinate, W_pic is the width of the 2D image, and x, y, z store the three-dimensional point cloud data of the image.
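By way of illustration, equations (1)-(4) amount to a simple array lookup; the sketch below assumes the point cloud is stored as three flat arrays in row-major image order (the storage layout and names are assumptions, not specified by the patent).

import numpy as np

def lookup_3d(x, y, z, x_2d, y_2d, w_pic):
    # Map a 2D pixel (x_2d, y_2d) to its 3D point, per equations (1)-(4).
    # x, y, z: flat arrays holding the point cloud in row-major image order.
    # w_pic:   width of the 2D image.
    index = y_2d * w_pic + x_2d          # eq. (1)
    return x[index], y[index], z[index]  # eqs. (2)-(4)

# Example with a 640x480 scan stored as flat arrays of length 640*480
w, h = 640, 480
x = np.random.rand(w * h); y = np.random.rand(w * h); z = np.random.rand(w * h)
print(lookup_3d(x, y, z, x_2d=100, y_2d=50, w_pic=w))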
The complex surface of the packaging bag, e.g., its reflection or light transmission, can disturb the reconstruction of the three-dimensional point cloud and cause the 3D coordinates at reflective points to be lost. To reduce this effect, the properties of polarized light are used to eliminate reflections. On the surface of a transparent/reflective packaging bag, light undergoes specular reflection (high degree of polarization) and diffuse reflection (low degree of polarization) at the same time, so the light received by the 3D structured light camera sensor is a mixture of rays with different degrees of polarization. A polarizing filter can filter out the polarized component or convert it into unpolarized light before it reaches the camera, thereby reducing the influence of reflection on the three-dimensional point cloud reconstruction. The principle of the polarizer is shown in fig. 4.
In one embodiment, in order to reconstruct a complete point cloud, a polarization camera capable of imaging 4 polarization states in a single shot is used, and a linear polarizer is placed in front of the projector lens so that the projected light is polarized at a fixed angle; 4 point clouds are reconstructed from the 4 images with different polarization states and fused to obtain the complete point cloud. As shown in fig. 5, the polarization camera stores one polarization state in each of 4 adjacent pixels, at 0°, 45°, 90° and 135° respectively. After image acquisition, the pixels of each polarization state are extracted and split into 4 pictures, so a single exposure yields 4 images with different polarization states, each at one quarter of the original resolution.
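A rough sketch of extracting the four polarization channels from the raw mosaic by strided slicing follows; which corner of each 2×2 super-pixel holds 0°, 45°, 90° or 135° is an assumption here and depends on the specific sensor.

import numpy as np

def split_polarization_mosaic(raw):
    # raw: 2D array whose 2x2 super-pixels each hold one pixel per polarization state.
    # The angle-to-corner assignment below is an assumption; check the sensor datasheet.
    return {
        0:   raw[0::2, 0::2],   # assumed 0 degrees
        45:  raw[0::2, 1::2],   # assumed 45 degrees
        90:  raw[1::2, 1::2],   # assumed 90 degrees
        135: raw[1::2, 0::2],   # assumed 135 degrees
    }

raw = np.random.randint(0, 255, (480, 640), dtype=np.uint8)
images = split_polarization_mosaic(raw)
print({angle: img.shape for angle, img in images.items()})  # each is 240x320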
Using the 3D structured-light three-dimensional reconstruction method, the point clouds and 2D grayscale images under the 4 polarization states can be reconstructed. In the reconstructed point clouds, overexposed or overly dark image pixels lead to missing points, so the fusion operation must check the grayscale pixel values and whether the corresponding point cloud points are missing.
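The fusion step could look like the following sketch, which assumes missing points are stored as NaN and uses illustrative exposure thresholds on the grayscale images; neither the thresholds nor the first-valid-wins policy is prescribed by the patent.

import numpy as np

def fuse_point_clouds(clouds, grays, low=10, high=245):
    # clouds: list of 4 point clouds, each (H, W, 3), with NaN where reconstruction failed
    # grays:  list of 4 grayscale images used to reject over-/under-exposed pixels
    fused = np.full_like(clouds[0], np.nan)
    filled = np.zeros(clouds[0].shape[:2], dtype=bool)
    for cloud, gray in zip(clouds, grays):
        valid = ~np.isnan(cloud).any(axis=2) & (gray > low) & (gray < high)
        take = valid & ~filled          # only fill pixels not yet covered
        fused[take] = cloud[take]
        filled |= take
    return fused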
Step S220, inputting the 2D gray level map into a trained neural network model, outputting mask information of each target object instance in the image and 2D coordinate information of a center point, calculating corresponding 3D coordinates of the 2D point under a camera coordinate system according to the reconstructed three-dimensional point cloud, and further determining grabbing point coordinates.
The neural network model for object segmentation may be any of various deep learning models. For example, a Mask R-CNN instance segmentation model is trained on the 2D grayscale images and can then be used for instance segmentation of packaging bags with complex surfaces; the network performs segmentation on top of object detection, and its structure is shown in fig. 6. The Mask R-CNN pipeline generally includes: preprocessing the input image (resizing, normalization, etc.); extracting a feature map with a feature extraction network; setting regions of interest (ROIs) through each point in the feature map to obtain multiple ROI candidate boxes; sending the ROI candidate boxes to a Region Proposal Network (RPN) for classification and regression, filtering out part of the candidate ROIs; performing the ROI Align operation on the remaining ROIs; and finally performing classification, regression and mask generation on these ROIs.
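For intuition, the inference flow can be reproduced with an off-the-shelf Mask R-CNN such as torchvision's implementation (shown below with generic COCO weights and a grayscale image replicated to three channels); the patent's actual model is trained on its own four-channel polarization dataset, so this is only an illustrative stand-in.

import numpy as np
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

gray = np.random.rand(480, 640).astype(np.float32)   # stand-in for the reconstructed 2D grayscale map
img = torch.from_numpy(np.stack([gray] * 3))          # replicate to the 3 channels this model expects

with torch.no_grad():
    pred = model([img])[0]   # dict with "boxes", "labels", "scores" and per-instance "masks"

keep = pred["scores"] > 0.5  # illustrative confidence threshold
print(pred["boxes"][keep].shape, pred["masks"][keep].shape)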
In one embodiment, in order to combine information from deep and shallow layers, an FPN (Feature Pyramid Network) structure as shown in fig. 7 is adopted in the neural network model; it fuses images at multiple scales and combines deep and shallow image features, meeting the needs of both object detection and image classification.
The RPN structure adopted in the neural network model is shown in fig. 8; the RPN extracts candidate boxes and generates regions of interest, with the advantages of low time consumption and high accuracy.
In one embodiment, the input image of the neural network model is a four-channel image, where each channel is a single-channel grayscale image reconstructed at 0°, 45°, 90° or 135°. Through the neural network model, the mask information and box position information of each surface-layer packaging bag instance can be identified. From the upper-left corner coordinate L(X_L, Y_L) and lower-right corner coordinate R(X_R, Y_R) of the box output by the network, the midpoint formula gives:

x_mid = (X_L + X_R)/2    (5)

y_mid = (Y_L + Y_R)/2    (6)

so the center point coordinate of the instance object can be calculated as M(x_mid, y_mid).
In order to prevent a failure to grasp caused by a missing 3D point at the center point M, a square region with M as its upper-left corner coordinate and a side length of 5 pixels is taken; for each point M_ij in the region, the corresponding three-dimensional point cloud coordinate {P_ij(x_ij, y_ij, z_ij)} is obtained, and the average of these coordinates is calculated as the actual center-point grasp coordinate.
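A sketch combining equations (5)-(6) with the 5-pixel neighborhood averaging is given below, assuming the point cloud is an (H, W, 3) array with NaN for missing points; the layout and names are illustrative.

import numpy as np

def grasp_point(box, cloud, side=5):
    # box:   (x_l, y_l, x_r, y_r) upper-left and lower-right corners of the detection box
    # cloud: (H, W, 3) camera-frame point cloud, NaN where reconstruction failed
    x_l, y_l, x_r, y_r = box
    x_mid = int((x_l + x_r) / 2)          # eq. (5)
    y_mid = int((y_l + y_r) / 2)          # eq. (6)
    patch = cloud[y_mid:y_mid + side, x_mid:x_mid + side].reshape(-1, 3)
    patch = patch[~np.isnan(patch).any(axis=1)]           # keep only points that exist
    return patch.mean(axis=0) if len(patch) else None     # average as the actual grasp coordinate

cloud = np.random.rand(480, 640, 3)
print(grasp_point((100, 120, 180, 200), cloud))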
Step S230, calculating the corresponding point of the grasp point in the robot-arm base coordinate system according to the hand-eye calibration transformation matrix, and then driving the mechanical arm with a motion command to perform positioning and grasping.
For the calculated 3D point coordinates {P_ij(x_ij, y_ij, z_ij)}, the coordinate values are expressed in the camera coordinate system P_Camera. Each point under P_Camera can be converted, through a transformation matrix H, into its corresponding point under the coordinate system P_robot centered at the mechanical arm base:

H = [R, T; 0, 1]    (7)

where R is referred to as the rotation matrix and T as the translation vector. After the transformation matrix H is constructed, the value in P_robot of a point under P_Camera can be calculated:

[P'_ij; 1] = H · [P_ij; 1]    (8)

that is, for a point P_ij in the P_Camera coordinate system:

[x'_ij, y'_ij, z'_ij, 1]^T = H · [x_ij, y_ij, z_ij, 1]^T    (9)

P'_ij = R · P_ij + T    (10)

where P'_ij is the corresponding point of P_ij in the P_robot coordinate system.
P'_ij is input to the mechanical arm as the final result, i.e., the mechanical arm is commanded to move to this point and perform the grasp.
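The final transform of equations (7)-(10) reduces to one matrix multiplication; a minimal sketch follows, with an illustrative hand-eye matrix H standing in for the one obtained from an actual hand-eye calibration.

import numpy as np

def camera_to_base(p_cam, H):
    # H is the 4x4 hand-eye matrix [[R, T], [0, 1]]; eqs. (7)-(10)
    R, T = H[:3, :3], H[:3, 3]
    return R @ p_cam + T                  # eq. (10)

# Illustrative hand-eye matrix: 90-degree rotation about z plus a translation (not real calibration data)
H = np.eye(4)
H[:3, :3] = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
H[:3, 3] = [0.5, 0.1, 0.3]
p_base = camera_to_base(np.array([0.2, 0.0, 0.6]), H)
print(p_base)   # this point is fed into the arm's motion command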
To further verify the effect of the present invention, experimental system tests were performed. Fig. 9 is a gray scale image reconstructed in a random state. Fig. 10 is a recognition effect achievable by the deep learning model. Experiments prove that the invention can drive the mechanical arm to carry out more accurate grabbing or moving control on the basis of accurately identifying the target object.
In summary, compared with the prior art, the invention has the following advantages:
1) The invention designs an unordered sorting scheme for transparent/reflective packaging-bag objects based on structured-light reconstruction, in which the 3D structured light camera and the mechanical arm divide the work and cooperate to realize intelligent recognition and accurate grasping. Compared with traditional mechanical-arm grasping, the invention does not require manually setting the arm's motion parameters, can be applied flexibly to various production scenes, and offers strong robustness, high flexibility and excellent performance.
2) The invention integrates multiple technologies such as deep learning, polarized light imaging, three-dimensional reconstruction of structured light and the like, reduces specular reflection by utilizing polarized light, carries out target identification by the deep learning, and obtains accurate target position information by the three-dimensional reconstruction of the structured light. Compared with a 3D grabbing technology using a deep learning algorithm, the method provided by the invention uses the characteristic of polarized light, overcomes the imaging quality problem caused by light transmission/reflection materials, and reduces the influence of the image quality on the performance of a deep learning model.
3) The invention adopts a three-dimensional point cloud reconstruction technology to convert 2D information into 3D information. Compared with the grabbing technology using the traditional visual algorithm, the method has stronger robustness, can identify targets with different color, shape and characteristics, and can be applied to production of a production line.
The present invention may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: portable computer disks, hard disks, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static Random Access Memory (SRAM), portable compact disk read-only memory (CD-ROM), digital Versatile Disks (DVD), memory sticks, floppy disks, mechanical coding devices, punch cards or in-groove structures such as punch cards or grooves having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, Python, and the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing electronic circuitry, such as programmable logic circuitry, Field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), with state information for computer readable program instructions, which can execute the computer readable program instructions.
Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are all equivalent.
The foregoing description of embodiments of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvements in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (10)

1. An automatic visual identification method comprising the steps of:
scanning an area with a target commodity, reconstructing a three-dimensional point cloud and generating a 2D gray scale image;
and inputting the 2D gray level map into a trained neural network model, outputting 2D coordinate information of a central point of each target commodity in the image, and calculating 3D coordinate information corresponding to each 2D point under a camera coordinate system according to the three-dimensional point cloud.
2. The method of claim 1, wherein the neural network model is a MASK-RCNN model comprising a region selection network and a feature pyramid structure, the region selection network being configured to extract candidate boxes from an input image to generate a region of interest; the feature pyramid structure adopts two paths from top to bottom and from bottom to top to extract multi-scale images, and combines deep and shallow image features.
3. The method of claim 1, wherein the 2D coordinate information of the center point of the target commodity is calculated according to the following formulas:

x_mid = (X_L + X_R)/2

y_mid = (Y_L + Y_R)/2

wherein (X_L, Y_L) is the upper-left corner coordinate of the target commodity, (X_R, Y_R) is the lower-right corner coordinate of the target commodity, and M(x_mid, y_mid) is the 2D coordinate information of the center point of the target commodity.
4. A method according to claim 3, further comprising: taking a square region with point M as its upper-left corner coordinate and a side length of 5 pixels, obtaining the three-dimensional point cloud coordinates corresponding to each point M_ij in the region, and calculating their average value as the actual center point coordinate.
5. The method of claim 1, wherein the 3D coordinate information is calculated from the three-dimensional point cloud according to the following formulas:

index = y_2D · W_pic + x_2D

x_3D = x[index]

y_3D = y[index]

z_3D = z[index]

wherein P_2D(x_2D, y_2D) represents the 2D coordinate information, (x_3D, y_3D, z_3D) represents the corresponding 3D coordinates, index is the index value of the 2D coordinate, and W_pic is the width of the 2D image.
6. A sorting system, comprising: 3D structured light camera, arm and computer device, wherein:
the 3D structured light camera is used for scanning the area with the target commodity and transmitting the scanned image to the computer equipment;
the computer device is configured to perform:
obtaining 3D coordinate information of a target commodity as grasping point information according to the method of any one of claims 1 to 5;
and calculating the corresponding point of the grabbing point information under the coordinate system of the mechanical arm base according to the hand-eye calibration conversion matrix, inputting the point into a mechanical arm operation instruction, and realizing the positioning grabbing of the mechanical arm.
7. The system of claim 6, wherein the 3D structured light camera is configured with a linear polarizer in front of the projector lens so that the projected light is polarized at a fixed angle; and the computer device reconstructs 4 point clouds from images in the 0°, 45°, 90° and 135° polarization states and fuses them to obtain the complete three-dimensional point cloud.
8. The system of claim 6, wherein the corresponding point of the grasp point in the robot-arm base coordinate system is calculated according to the following formulas:

H = [R, T; 0, 1]

P'_ij = R · P_ij + T

wherein {P_ij(x_ij, y_ij, z_ij)} is the 3D coordinate of the grasp point, P'_ij is the corresponding point of P_ij in the robot-arm base center coordinate system and is input to the mechanical arm, R is the rotation matrix, and T is the translation vector.
9. The system of claim 6, wherein the target commodity is a packaging bag.
10. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor realizes the steps of the method according to any of claims 1 to 5.
CN202310216821.8A 2023-02-27 2023-02-27 Automatic visual identification method and sorting system Pending CN116213306A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310216821.8A CN116213306A (en) 2023-02-27 2023-02-27 Automatic visual identification method and sorting system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310216821.8A CN116213306A (en) 2023-02-27 2023-02-27 Automatic visual identification method and sorting system

Publications (1)

Publication Number Publication Date
CN116213306A 2023-06-06

Family

ID=86587095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310216821.8A Pending CN116213306A (en) 2023-02-27 2023-02-27 Automatic visual identification method and sorting system

Country Status (1)

Country Link
CN (1) CN116213306A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117095002A (en) * 2023-10-19 2023-11-21 深圳市信润富联数字科技有限公司 Hub defect detection method and device and storage medium
CN117095002B (en) * 2023-10-19 2024-02-06 深圳市信润富联数字科技有限公司 Hub defect detection method and device and storage medium
CN117462073A (en) * 2023-12-25 2024-01-30 西北工业大学宁波研究院 Hand-held polarization imaging intraocular pressure detection device and method
CN117462073B (en) * 2023-12-25 2024-04-19 西北工业大学宁波研究院 Hand-held polarization imaging intraocular pressure detection device and method

Similar Documents

Publication Publication Date Title
CN111328396B (en) Pose estimation and model retrieval for objects in images
CN113524194B (en) Target grabbing method of robot vision grabbing system based on multi-mode feature deep learning
CN116213306A (en) Automatic visual identification method and sorting system
Singh et al. Bigbird: A large-scale 3d database of object instances
CN111368852A (en) Article identification and pre-sorting system and method based on deep learning and robot
EP3499414B1 (en) Lightweight 3d vision camera with intelligent segmentation engine for machine vision and auto identification
JP6305171B2 (en) How to detect objects in a scene
JP7329143B2 (en) Systems and methods for segmentation of transparent objects using polarization cues
CN112836734A (en) Heterogeneous data fusion method and device and storage medium
CN112529015A (en) Three-dimensional point cloud processing method, device and equipment based on geometric unwrapping
GB2580691A (en) Depth estimation
Gao et al. Polarimetric pose prediction
WO2021114776A1 (en) Object detection method, object detection device, terminal device, and medium
Weinmann et al. Preliminaries of 3D point cloud processing
WO2022235483A1 (en) Systems and methods for using computer vision to pick up small objects
Shugurov et al. Multi-view object pose refinement with differentiable renderer
CN110751090A (en) Three-dimensional point cloud labeling method and device and electronic equipment
Wu et al. This is the way: Sensors auto-calibration approach based on deep learning for self-driving cars
CN114037595A (en) Image data processing method, image data processing device, electronic equipment and storage medium
Wang et al. GraspFusionNet: a two-stage multi-parameter grasp detection network based on RGB–XYZ fusion in dense clutter
CN116863371A (en) Deep learning-based AGV forklift cargo pallet pose recognition method
WO2023082417A1 (en) Grabbing point information obtaining method and apparatus, electronic device, and storage medium
US20220405506A1 (en) Systems and methods for a vision guided end effector
Kallasi et al. Object detection and pose estimation algorithms for underwater manipulation
CN114022342A (en) Acquisition method and device for acquisition point information, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination