CN114049557A - Garbage sorting robot visual identification method based on deep learning - Google Patents

Garbage sorting robot visual identification method based on deep learning Download PDF

Info

Publication number
CN114049557A
CN114049557A (application CN202111323743.9A)
Authority
CN
China
Prior art keywords
target
camera
data
values
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111323743.9A
Other languages
Chinese (zh)
Inventor
严圣军
刘德峰
梅文豪
倪玮玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhiying Robot Technology Co ltd
Jiangsu Tianying Environmental Protection Energy Equipment Co Ltd
China Tianying Inc
Original Assignee
Shanghai Zhiying Robot Technology Co ltd
Jiangsu Tianying Environmental Protection Energy Equipment Co Ltd
China Tianying Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Zhiying Robot Technology Co ltd, Jiangsu Tianying Environmental Protection Energy Equipment Co Ltd, China Tianying Inc filed Critical Shanghai Zhiying Robot Technology Co ltd
Priority to CN202111323743.9A
Publication of CN114049557A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T7/85: Stereo camera calibration

Abstract

The invention discloses a garbage sorting robot visual identification method based on deep learning. A 2D area-array camera and a 3D line-scan camera are synchronously calibrated in the same world coordinate system; a YOLOv4 target detection model is established; multithreading is started so that the encoder pulse value is acquired in real time while the 2D camera and the 3D camera acquire image data simultaneously; the current contour image data are read from memory to obtain the actual height and width information of the target object, which is sent, together with the rectangular frame center coordinates, the rectangular frame width, the rectangular frame height and the target type obtained by the YOLOv4 target detection model, to the garbage sorting robot, so that the robot performs real-time online grabbing and classification. Because the images of the 2D camera and the 3D camera are acquired synchronously, the encoder pulse value associates the target image detected by the YOLOv4 detection model with the 3D scan data, from which the target height and width information is obtained.

Description

Garbage sorting robot visual identification method based on deep learning
Technical Field
The invention relates to a visual identification method, in particular to a visual identification method of a garbage sorting robot based on deep learning, and belongs to the field of visual identification of garbage sorting robots.
Background
Robot picking of materials relies mainly on visual identification from front-end images. For flat objects, the target material can be captured with a 2D camera, but three-dimensional materials, whose shapes and sizes vary, must be handled by a 3D camera. In traditional 3D setups, object height information is obtained by installing a laser displacement sensor at a fixed position. Although a laser displacement sensor has excellent linearity and high accuracy, there are many sources of instability while the object is moving during data transmission, which often makes the robot's grabbing inefficient and can even cause empty grabs or collisions. On this basis, a 2D camera and a 3D camera are generally fused for recognition, but the two cameras must see the same field of view as the object passes them; if the two cameras are not aligned, the robot cannot be positioned accurately. Existing technology cannot fuse the 2D camera and the 3D camera well enough to achieve complete synchronization of their data.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a garbage sorting robot visual identification method based on deep learning that achieves synchronous fusion of the 2D camera data and the 3D camera data.
In order to solve the above technical problem, the technical solution adopted by the invention is as follows:
a garbage sorting robot visual identification method based on deep learning is characterized by comprising the following steps:
step one: the 2D area-array camera and the 3D line-scan camera are synchronously calibrated in the same world coordinate system;
step two: establishing a YOLOv4 target detection model;
step three: starting multithreading, acquiring the encoder pulse value in real time while the 2D camera and the 3D camera acquire image data simultaneously; the 2D camera acquires the RGB image to be identified and detected, and the 3D camera acquires the contour and height information of the target image;
step four: inputting the RGB image into a YOLO v4 target detection model to obtain a rectangular frame center coordinate, a rectangular frame width, a rectangular frame height and a target type;
step five: the 3D camera circularly acquires single-contour images, and data are sequentially stored in a memory with a specified size, wherein the memory occupation size of the single-contour images is determined by the pulse number of the encoder;
step six: reading the current contour image data from the memory to obtain the actual height and width information of the target object, and sending them, together with the rectangular frame center coordinates, the rectangular frame width, the rectangular frame height and the target type obtained by the YOLOv4 target detection model, to the garbage sorting robot, so that the robot performs real-time online grabbing and classification.
Further, the calibration method of the 2D area-array camera in the first step includes:
capturing images of a chessboard calibration board at different positions and rotation angles with the camera;
performing intrinsic calibration with Matlab to obtain the intrinsic parameter matrix and the distortion coefficients;
determining a world coordinate system on the chessboard calibration board, determining, by a 4-point PnP calibration method, the pixel coordinates of 4 corner points on the calibration board and the world coordinates corresponding to these 4 corner points in the determined world coordinate system, and performing extrinsic calibration on the 4 corner points with the solvePnP operator to obtain the extrinsic rotation matrix and translation matrix, thereby completing the extrinsic calibration.
Further, the calculation formula for converting the pixel coordinates to the world coordinates is as follows:
s\begin{bmatrix}u\\ v\\ 1\end{bmatrix}=\begin{bmatrix}f_x & 0 & u_0\\ 0 & f_y & v_0\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}R & T\end{bmatrix}\begin{bmatrix}x_w\\ y_w\\ z_w\\ 1\end{bmatrix}
wherein: u and v are respectively the pixel abscissa and the pixel ordinate in the pixel coordinate system; x_w, y_w and z_w are respectively the abscissa, the ordinate and the vertical coordinate in the world coordinate system; R is the rotation matrix; T is the translation matrix; u_0, v_0, f_x and f_y are the camera intrinsic parameters, i.e. u_0 and v_0 are respectively the image-center abscissa and ordinate, and f_x and f_y are respectively the horizontal and vertical equivalent focal lengths; s is a scale factor, namely the depth of the point along the optical axis in the camera coordinate system.
Further, the calibration method of the 3D line-scan camera comprises:
projecting the 3D laser onto the calibration board so that the laser line is parallel to the x axis of the fixed world coordinate system;
turning off the laser, increasing the exposure time, and capturing an image with clearly visible corner points;
after the image has been captured, turning the laser back on and advancing the conveyor belt a certain distance so that the laser line falls on the other checkerboard and remains parallel to the x axis;
turning off the laser, increasing the exposure time, and capturing another image with clearly visible corner points;
finally performing the calibration; the calibration of the 3D camera is complete after the data have been saved.
Further, the second step is specifically: RGB images are collected with the 2D camera, the acquired images are annotated by labeling personnel, and model training is carried out with the YOLOv4 target detection model to generate the final YOLOv4 target detection model.
Further, the fourth step is specifically:
inputting the collected RGB image into the YOLOv4 target detection model with an input size of 608×608 to obtain a list of Bounding Boxes for all positions in the image that contain target material, and filtering them with the non-maximum suppression (NMS) algorithm to obtain the coordinate position information of the target garbage points that are finally retained;
the non-maximum suppression (NMS) algorithm is as follows:
S_i=\begin{cases}S_i, & \mathrm{IoU}(M,b_i)<N_t\\ 0, & \mathrm{IoU}(M,b_i)\ge N_t\end{cases}
wherein S_i represents the score of each box, M represents the box with the highest current score, b_i represents one of the remaining boxes, N_t is the set NMS threshold, and IoU is the ratio of the overlap area to the union area of the two detection boxes;
when the YOLOv4 target detection model detects a target, the current encoder pulse value is acquired in real time; this pulse value corresponds to the center coordinate of the target material; the height of the rectangular frame detected by the YOLOv4 target detection model is taken along the moving direction of the conveyor belt, and the current encoder pulse value, the detected target center coordinates, and the width and height information of the rectangular frame are sent together to the data analysis processing module.
Further, the fifth step is specifically:
the data analysis processing module calculates the initial position of the target material in the 3D storage memory, and the specific calculation formula is as follows:
start position = ((A - B) / 5) × 1216 × 3 × 4 (bytes)
wherein A represents the current encoder pulse value; b represents an initial starting pulse value; 1216 denotes 1216 scan points on one contour; 3 represents 3 coordinate values of x, y and z; 4 represents that the x, y and z values of each scanning point respectively occupy 4 bytes;
circularly acquiring single-contour images of the 3D line-scan camera and sequentially storing the data in a memory block of specified size, wherein the number of bytes stored is calculated by the following formula:
stored byte size = (target extent along the belt direction in mm ÷ 1.6 mm) × 1216 × 3 × 4 (bytes)
wherein 1.6mm represents the world coordinate distance between the two scanned contours; 1216 denotes 1216 scan points on one contour; 3 represents 3 coordinate values of x/y/z; 4 indicates that the x/y/z values of each scanning point each occupy 4 bytes;
and finally, the final end position of the target material in the 3D storage memory is obtained from its start position in the memory and its stored byte size, according to the following formula:
final end position = memory start position + stored byte size
the corresponding target 3D scan memory data are then read in the data analysis processing module, which completes the reading of the x, y and z values of the target material in the three-dimensional world from the 3D storage memory data.
Further, the world-coordinate distance of 1.6 mm between two adjacent contours is derived as follows:
the 3D camera scans one contour every 5 pulses, and one contour has 1216 points; each point contains x/y/z values, and each value occupies 4 bytes. The encoder outputs 1000 pulses per revolution, and the belt travels 320 mm per revolution, i.e. one pulse corresponds to 0.32 mm; since one contour is scanned every 5 pulses, 5 × 0.32 mm = 1.6 mm, so the distance between two adjacent contours corresponds to 1.6 mm in world coordinates.
Further, when 50000 contours have been acquired, the 3D data are stored again starting from the initial position of the memory space, the previously stored data are overwritten, and storage then continues sequentially, so that the data are stored in an endless loop; when the program stops running, the allocated memory space is released, preventing memory overflow or leakage.
Further, the sixth step is specifically:
the 3D memory data are read, the x and y values are used as the row and column pixel coordinates of a Mat in OpenCV, the z values are normalized to the range 0 to 255, and the normalized values are used as the gray values at the corresponding x, y pixel coordinates in the Mat; for the height, the gray value in the Mat corresponding to the pixel coordinates of the target center is extracted and de-normalized to obtain the actual height information of the target object;
the Mat image is then processed with OpenCV algorithms such as threshold segmentation, opening and closing operations and minimum-contour processing to extract the width information of the target object;
finally, the center coordinate position and the target type of the target object obtained from the 2D data, together with the target height and width information obtained from the 3D data, are sent to the robot as a whole, so that the robot can perform real-time online grabbing and classification.
Compared with the prior art, the invention has the following advantages and effects:
1. The method acquires the images of the 2D camera and the 3D camera synchronously, and the encoder pulse value associates the target image detected by the YOLOv4 detection model with the 3D scan data, from which the target height and width information is solved, so the detection speed is fast and the recognition accuracy is high;
2. A cyclic 3D memory data storage scheme is formed in the data reading process, so that the memory does not overflow and the data in the corresponding memory space can be read in real time by combining the memory space with the encoder;
3. The deep-learning YOLO target detection model is applied to the fusion of the 2D camera and the 3D camera, and OpenCV is used to process the memory data, so the robot's real-time grabbing is more accurate and more robust.
Drawings
Fig. 1 is a flowchart of a visual recognition method of a garbage sorting robot based on deep learning according to the present invention.
Fig. 2 is a schematic flow chart of the camera synchronization calibration of the present invention.
Fig. 3 is a schematic diagram of a calibration board before calibration of a 3D camera according to the present invention.
Fig. 4 is a schematic diagram of the calibration board after calibration of the 3D camera according to the present invention.
Fig. 5 is a flowchart of the OpenCV data processing and of sending the processing result to the robot to implement grabbing according to the present invention.
Fig. 6 is a schematic diagram of the target width AB obtained by OpenCV processing of the memory data corresponding to the target according to the present invention.
Detailed Description
To elaborate the technical solutions adopted by the present invention to achieve the intended technical objects, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, rather than all, of the embodiments of the present invention, and technical means or technical features in the embodiments may be replaced without creative effort. The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments.
As shown in fig. 1, the visual recognition method of the garbage sorting robot based on deep learning of the present invention includes the following steps:
the method comprises the following steps: and the 2D area-array camera and the 3D line-array camera complete synchronous calibration under the same world coordinate system.
As shown in fig. 2, the calibration method of the 2D area-array camera includes:
capturing images of a chessboard calibration board at different positions and rotation angles with the camera;
performing intrinsic calibration with Matlab to obtain the intrinsic parameter matrix and the distortion coefficients;
determining a world coordinate system on the chessboard calibration board, determining, by a 4-point PnP calibration method, the pixel coordinates of 4 corner points on the calibration board and the world coordinates corresponding to these 4 corner points in the determined world coordinate system, and performing extrinsic calibration on the 4 corner points with the solvePnP operator to obtain the extrinsic rotation matrix and translation matrix, thereby completing the extrinsic calibration.
Further, the calculation formula for converting the pixel coordinates to the world coordinates is as follows:
s\begin{bmatrix}u\\ v\\ 1\end{bmatrix}=\begin{bmatrix}f_x & 0 & u_0\\ 0 & f_y & v_0\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}R & T\end{bmatrix}\begin{bmatrix}x_w\\ y_w\\ z_w\\ 1\end{bmatrix}
wherein: u and v are respectively the pixel abscissa and the pixel ordinate in the pixel coordinate system; x_w, y_w and z_w are respectively the abscissa, the ordinate and the vertical coordinate in the world coordinate system; R is the rotation matrix; T is the translation matrix; u_0, v_0, f_x and f_y are the camera intrinsic parameters, i.e. u_0 and v_0 are respectively the image-center abscissa and ordinate, and f_x and f_y are respectively the horizontal and vertical equivalent focal lengths; s is a scale factor, namely the depth of the point along the optical axis in the camera coordinate system.
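As an illustrative sketch only, the extrinsic calibration with the solvePnP operator and the subsequent pixel-to-world conversion could be implemented with OpenCV roughly as follows; the intrinsic values, the four corner correspondences and the assumption that targets lie on the belt plane z_w = 0 are placeholders chosen for the example, not values from the patent.

```cpp
// Sketch: extrinsic calibration from 4 corner points and pixel-to-world conversion
// on the belt plane. All numeric values below are placeholders.
#include <cstdio>
#include <opencv2/opencv.hpp>
#include <vector>

int main() {
    // Intrinsic matrix and distortion coefficients from the intrinsic calibration.
    cv::Mat K = (cv::Mat_<double>(3, 3) << 1200.0, 0.0, 640.0,
                                              0.0, 1200.0, 512.0,
                                              0.0,    0.0,   1.0);
    cv::Mat dist = cv::Mat::zeros(5, 1, CV_64F);

    // Four chessboard corners: pixel coordinates and world coordinates (mm, z_w = 0).
    std::vector<cv::Point2f> pixelPts = {{321.f, 240.f}, {998.f, 243.f},
                                         {995.f, 760.f}, {318.f, 757.f}};
    std::vector<cv::Point3f> worldPts = {{0.f, 0.f, 0.f}, {200.f, 0.f, 0.f},
                                         {200.f, 150.f, 0.f}, {0.f, 150.f, 0.f}};

    // Extrinsic calibration with solvePnP: rotation and translation of the camera.
    cv::Mat rvec, tvec;
    cv::solvePnP(worldPts, pixelPts, K, dist, rvec, tvec);
    cv::Mat R;
    cv::Rodrigues(rvec, R);

    // Pixel (u, v) -> world (x_w, y_w) on the belt plane z_w = 0:
    // s*K^-1*[u v 1]^T = R*[x_w y_w 0]^T + T, i.e. r1*x_w + r2*y_w - s*ray = -T.
    auto pixelToWorld = [&](double u, double v) {
        cv::Mat uv1 = (cv::Mat_<double>(3, 1) << u, v, 1.0);
        cv::Mat ray = K.inv() * uv1;
        cv::Mat A(3, 3, CV_64F);
        R.col(0).copyTo(A.col(0));
        R.col(1).copyTo(A.col(1));
        ray.copyTo(A.col(2));
        cv::Mat sol = A.inv() * (-tvec);          // sol = [x_w, y_w, -s]
        return cv::Point2d(sol.at<double>(0), sol.at<double>(1));
    };

    cv::Point2d w = pixelToWorld(660.0, 500.0);
    std::printf("world coordinates on the belt plane: (%.1f, %.1f) mm\n", w.x, w.y);
    return 0;
}
```

With the extrinsics fixed once, any detected pixel center can be mapped to belt-plane world coordinates for the robot in this way.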
The calibration method of the 3D line-scan camera is as follows:
the 3D laser is projected onto the calibration board so that the laser line is parallel to the x axis of the fixed world coordinate system;
as shown in fig. 3, the laser is turned off, the exposure time is increased, and an image with clearly visible corner points is captured;
after the image has been captured, the laser is turned back on and the conveyor belt is advanced a certain distance so that the laser line falls on the other checkerboard and remains parallel to the x axis;
as shown in fig. 4, the laser is turned off, the exposure time is increased, and another image with clearly visible corner points is captured;
finally, the 3D camera is calibrated after the data have been saved; the calibration procedure for the 4 corner points is the same as in the 2D calibration.
Step two: a YOLOv4 target detection model is established. RGB images are collected with the 2D camera, the acquired images are annotated by labeling personnel, and model training is carried out with the YOLOv4 target detection model to generate the final YOLOv4 target detection model.
Step three: multithreading is started, the encoder pulse value is acquired in real time, and the 2D camera and the 3D camera acquire image data simultaneously; the 2D camera acquires the RGB image to be identified and detected, and the 3D camera acquires the contour and height information of the target image.
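A minimal sketch of one possible thread layout for this step is given below; readEncoderPulse, grab2DFrame and grab3DProfile are hypothetical stand-ins for the encoder and camera SDK calls, which are not specified here.

```cpp
// Sketch: one encoder thread keeps a shared pulse value up to date while the 2D and
// 3D acquisition threads tag each frame/contour with the pulse value at capture time.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

std::atomic<long> g_pulse{0};       // latest encoder pulse value shared by all threads
std::atomic<bool> g_running{true};

// Hypothetical stand-ins for the encoder and camera SDK calls.
long readEncoderPulse() { static long p = 0; return ++p; }
void grab2DFrame(long pulse)   { std::printf("2D frame at pulse %ld\n", pulse); }
void grab3DProfile(long pulse) { std::printf("3D contour at pulse %ld\n", pulse); }

int main() {
    std::thread encoderThread([] {          // real-time pulse acquisition
        while (g_running) {
            g_pulse = readEncoderPulse();
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    });
    std::thread cam2dThread([] {            // RGB images for YOLOv4 detection
        while (g_running) {
            grab2DFrame(g_pulse.load());
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
        }
    });
    std::thread cam3dThread([] {            // contour/height data from the 3D camera
        while (g_running) {
            grab3DProfile(g_pulse.load());
            std::this_thread::sleep_for(std::chrono::milliseconds(5));
        }
    });

    std::this_thread::sleep_for(std::chrono::seconds(1));
    g_running = false;
    encoderThread.join();
    cam2dThread.join();
    cam3dThread.join();
    return 0;
}
```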
step four: inputting the RGB image into a YOLO v4 target detection model to obtain the center coordinates of a rectangular frame, the width of the rectangular frame, the height of the rectangular frame and the type of a target.
Inputting the collected RGB image into a YOLOv4 target detection model with an input size of 608X 608 to obtain a list of all position frames Bounding Box with target materials in the image, and filtering by a non-maximum suppression NMS algorithm to obtain the coordinate position information of the target garbage points which need to be reserved finally;
the non-maximum suppression (NMS) algorithm is as follows:
S_i=\begin{cases}S_i, & \mathrm{IoU}(M,b_i)<N_t\\ 0, & \mathrm{IoU}(M,b_i)\ge N_t\end{cases}
wherein S_i represents the score of each box, M represents the box with the highest current score, b_i represents one of the remaining boxes, N_t is the set NMS threshold, and IoU is the ratio of the overlap area to the union area of the two detection boxes;
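The greedy hard-NMS rule above can be sketched as follows; the Detection structure and the example values are illustrative assumptions rather than the exact implementation used here.

```cpp
// Sketch: greedy hard NMS matching the formula above (S_i kept if IoU < N_t, else 0).
// In practice NMS is usually applied per target class.
#include <algorithm>
#include <cstdio>
#include <opencv2/core.hpp>
#include <vector>

struct Detection {
    cv::Rect2f box;   // rectangular frame
    float score;      // S_i
    int classId;      // target type
};

static float iou(const cv::Rect2f& a, const cv::Rect2f& b) {
    float inter = (a & b).area();                        // overlap area
    float uni = a.area() + b.area() - inter;             // union area
    return uni > 0.f ? inter / uni : 0.f;
}

std::vector<Detection> nms(std::vector<Detection> dets, float Nt) {
    std::sort(dets.begin(), dets.end(),
              [](const Detection& a, const Detection& b) { return a.score > b.score; });
    std::vector<Detection> kept;
    for (const auto& d : dets) {
        bool suppressed = false;
        for (const auto& m : kept)                       // M: boxes already kept
            if (iou(m.box, d.box) >= Nt) { suppressed = true; break; }  // score set to 0
        if (!suppressed) kept.push_back(d);
    }
    return kept;
}

int main() {
    std::vector<Detection> dets = {{{100.f, 100.f, 80.f, 60.f}, 0.9f, 0},
                                   {{105.f, 102.f, 80.f, 60.f}, 0.8f, 0},
                                   {{400.f, 200.f, 50.f, 40.f}, 0.7f, 1}};
    std::vector<Detection> kept = nms(dets, 0.45f);
    std::printf("%zu boxes kept\n", kept.size());        // expect 2
    return 0;
}
```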
when the YOLOv4 target detection model detects a target, the current encoder pulse value is acquired in real time; this pulse value corresponds to the center coordinate of the target material. The height of the rectangular frame detected by the YOLOv4 target detection model is taken along the moving direction of the conveyor belt, and the current encoder pulse value, the detected target center coordinates, and the width and height information of the rectangular frame are sent together to the data analysis processing module.
Step five: the 3D camera circularly acquires single-contour images, data are sequentially stored in a memory with a specified size, and the memory occupation size of the single-contour images is determined by the pulse number of the encoder.
The data analysis processing module calculates the initial position of the target material in the 3D storage memory, and the specific calculation formula is as follows:
start position = ((A - B) / 5) × 1216 × 3 × 4 (bytes)
wherein A represents the current encoder pulse value; b represents an initial starting pulse value; 1216 denotes 1216 scan points on one contour; 3 represents 3 coordinate values of x, y and z; 4 represents that the x, y and z values of each scanning point respectively occupy 4 bytes;
circularly acquiring single-contour images of the 3D line-scan camera and sequentially storing the data in a memory block of specified size, wherein the number of bytes stored is calculated by the following formula:
stored byte size = (target extent along the belt direction in mm ÷ 1.6 mm) × 1216 × 3 × 4 (bytes)
wherein 1.6mm represents the world coordinate distance between the two scanned contours; 1216 denotes 1216 scan points on one contour; 3 represents 3 coordinate values of x/y/z; 4 indicates that the x/y/z values of each scanning point each occupy 4 bytes;
and finally, the final end position of the target material in the 3D storage memory is obtained from its start position in the memory and its stored byte size, according to the following formula:
final end position = memory start position + stored byte size
the corresponding target 3D scan memory data are then read in the data analysis processing module, which completes the reading of the x, y and z values of the target material in the three-dimensional world from the 3D storage memory data.
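A sketch of how the start offset, stored byte size and end position could be computed and used to read the target's x/y/z values from the memory block is shown below. It assumes one contour per 5 pulses, 1216 points of three 4-byte floats per contour and 1.6 mm between contours as stated above; converting the rectangular-frame height to millimeters is an assumption made for illustration, and all numbers in main are placeholders.

```cpp
// Sketch: locating and reading a target's contour data in the 3D memory block.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

constexpr std::size_t kPointsPerContour = 1216;
constexpr std::size_t kBytesPerContour  = kPointsPerContour * 3 * 4;  // x, y, z floats
constexpr double      kMmPerContour     = 1.6;   // world spacing between contours
constexpr long        kPulsesPerContour = 5;

// Byte offset of the first contour belonging to the target
// (A: pulse value at the target center, B: initial starting pulse value).
std::size_t startOffset(long pulseAtTarget, long pulseAtStart) {
    long contours = (pulseAtTarget - pulseAtStart) / kPulsesPerContour;
    return static_cast<std::size_t>(contours) * kBytesPerContour;
}

// Number of bytes the target occupies, from its extent along the belt in millimeters
// (assumed here to come from the rectangular-frame height after unit conversion).
std::size_t byteSize(double targetLengthMm) {
    auto contours = static_cast<std::size_t>(targetLengthMm / kMmPerContour + 0.5);
    return contours * kBytesPerContour;
}

// Read the target's x/y/z float values out of the 3D memory block.
std::vector<float> readTarget(const std::vector<std::uint8_t>& memory,
                              std::size_t start, std::size_t size) {
    if (start >= memory.size()) return {};
    std::size_t end = std::min(start + size, memory.size()); // end = start + byte size
    std::vector<float> values((end - start) / sizeof(float));
    std::memcpy(values.data(), memory.data() + start, values.size() * sizeof(float));
    return values;                                            // x0, y0, z0, x1, y1, z1, ...
}

int main() {
    std::vector<std::uint8_t> memory(3000 * kBytesPerContour);    // demo-sized 3D block
    std::size_t start = startOffset(/*A=*/105000, /*B=*/100000);  // 1000 contours in
    std::size_t size  = byteSize(/*targetLengthMm=*/80.0);        // 50 contours
    std::vector<float> xyz = readTarget(memory, start, size);
    std::printf("read %zu float values\n", xyz.size());
    return 0;
}
```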
The world-coordinate distance of 1.6 mm between two adjacent contours is derived as follows:
the 3D camera scans one contour every 5 pulses, and one contour has 1216 points; each point contains x/y/z values, and each value occupies 4 bytes. The encoder outputs 1000 pulses per revolution, and the belt travels 320 mm per revolution, i.e. one pulse corresponds to 0.32 mm; since one contour is scanned every 5 pulses, 5 × 0.32 mm = 1.6 mm, so the distance between two adjacent contours corresponds to 1.6 mm in world coordinates.
According to the invention, a cyclic 3D memory data storage scheme is formed in the data reading process: when 50000 contours have been acquired, the 3D data are stored again starting from the initial position of the memory space, the previously stored data are overwritten, and storage then continues sequentially, so that the data are stored in an endless loop; when the program stops running, the allocated memory space is released, preventing memory overflow or leakage.
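The circular storage described above might look like the following sketch, sized for 50000 contours; the class name and interface are illustrative assumptions only.

```cpp
// Sketch: circular 3D memory block for 50000 contours; writes wrap to the start of the
// block and overwrite the oldest data. Note the full block is roughly 730 MB.
#include <cstdint>
#include <cstring>
#include <vector>

class ContourRing {
public:
    static constexpr std::size_t kMaxContours     = 50000;
    static constexpr std::size_t kBytesPerContour = 1216 * 3 * 4;

    ContourRing() : buf_(kMaxContours * kBytesPerContour), next_(0) {}

    // Store one scanned contour; when the block is full, start again from the beginning.
    void push(const std::uint8_t* contour) {
        std::memcpy(buf_.data() + next_ * kBytesPerContour, contour, kBytesPerContour);
        next_ = (next_ + 1) % kMaxContours;      // wrap around, overwrite oldest data
    }

    // Access contour data by its slot index in the block.
    const std::uint8_t* at(std::size_t i) const { return buf_.data() + i * kBytesPerContour; }

private:
    std::vector<std::uint8_t> buf_;   // released automatically when the program stops
    std::size_t next_;
};

int main() {
    ContourRing ring;
    std::vector<std::uint8_t> contour(ContourRing::kBytesPerContour, 0);
    for (int i = 0; i < 100; ++i) ring.push(contour.data());
    return 0;
}
```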
Step six: the current contour image data are read from the memory to obtain the actual height and width information of the target object, which is sent, together with the rectangular frame center coordinates, the rectangular frame width, the rectangular frame height and the target type obtained by the YOLOv4 target detection model, to the garbage sorting robot, so that the robot performs real-time online grabbing and classification.
As shown in fig. 5, the 3D memory data are acquired, the x and y values are used as the row and column pixel coordinates of a Mat in OpenCV, the z values are normalized to the range 0 to 255, and the normalized values are used as the gray values at the corresponding x, y pixel coordinates in the Mat; for the height, the gray value in the Mat corresponding to the pixel coordinates of the target center is extracted and de-normalized to obtain the actual height information of the target object.
The Mat image is then processed with OpenCV algorithms such as threshold segmentation, opening and closing operations and minimum-contour processing to extract the width information of the target object; as shown in fig. 6, the contour information of the corresponding garbage object is extracted, and the length of the segment AB is the width information of this object.
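An illustrative OpenCV sketch of this height and width extraction is given below; the Mat size, the fake object region and the normalization range are assumptions made so the example runs on its own.

```cpp
// Sketch: height from a de-normalized gray value, width from contour analysis.
#include <algorithm>
#include <cstdio>
#include <opencv2/opencv.hpp>
#include <vector>

int main() {
    // Assume the target occupies 120 contours of 1216 points; z values are in mm.
    const int rows = 120, cols = 1216;
    cv::Mat z(rows, cols, CV_32F, cv::Scalar(0));
    z(cv::Rect(500, 40, 200, 50)).setTo(cv::Scalar(35.0f));   // fake 35 mm high object

    // Normalize z to 0-255 gray values (the Mat image described above).
    double zMin, zMax;
    cv::minMaxLoc(z, &zMin, &zMax);
    cv::Mat gray;
    z.convertTo(gray, CV_8U, 255.0 / (zMax - zMin), -255.0 * zMin / (zMax - zMin));

    // Height: gray value at the target center pixel, de-normalized back to mm.
    cv::Point center(600, 60);      // target center in the Mat, located via the encoder
    double g = gray.at<uchar>(center);
    double heightMm = g / 255.0 * (zMax - zMin) + zMin;

    // Width: threshold segmentation, opening/closing, minimum-area rectangle.
    cv::Mat bin;
    cv::threshold(gray, bin, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(5, 5));
    cv::morphologyEx(bin, bin, cv::MORPH_OPEN, kernel);
    cv::morphologyEx(bin, bin, cv::MORPH_CLOSE, kernel);
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(bin, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    double widthPx = 0.0;
    for (const auto& c : contours) {
        cv::RotatedRect r = cv::minAreaRect(c);
        widthPx = std::max(widthPx, static_cast<double>(std::min(r.size.width, r.size.height)));
    }
    // Converting widthPx to millimeters still needs the point pitch across the contour
    // and the 1.6 mm spacing between contours.
    std::printf("height: %.1f mm, width: %.1f px\n", heightMm, widthPx);
    return 0;
}
```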
Finally, the center coordinate position and the target type of the target object obtained from the 2D data, together with the target height and width information obtained from the 3D data, are sent to the robot as a whole, so that the robot can perform real-time online grabbing and classification.
The method acquires the images of the 2D camera and the 3D camera synchronously, and the encoder pulse value associates the target image detected by the YOLOv4 detection model with the 3D scan data, from which the target height and width information is solved, so the detection speed is fast and the recognition accuracy is high. A cyclic 3D memory data storage scheme is formed in the data reading process, so that the memory does not overflow and the data in the corresponding memory space can be read in real time by combining the memory space with the encoder. The deep-learning YOLO target detection model is applied to the fusion of the 2D camera and the 3D camera, and OpenCV is used to process the memory data, so the robot's real-time grabbing is more accurate and more robust.
Although the present invention has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A garbage sorting robot visual identification method based on deep learning is characterized by comprising the following steps:
step one: the 2D area-array camera and the 3D line-scan camera are synchronously calibrated in the same world coordinate system;
step two: establishing a YOLOv4 target detection model;
step three: starting multithreading, acquiring the encoder pulse value in real time while the 2D camera and the 3D camera acquire image data simultaneously; the 2D camera acquires the RGB image to be identified and detected, and the 3D camera acquires the contour and height information of the target image;
step four: inputting the RGB image into a YOLO v4 target detection model to obtain a rectangular frame center coordinate, a rectangular frame width, a rectangular frame height and a target type;
step five: the 3D camera circularly acquires single-contour images, and data are sequentially stored in a memory with a specified size, wherein the memory occupation size of the single-contour images is determined by the pulse number of the encoder;
step six: reading the current contour image data from the memory to obtain the actual height and width information of the target object, and sending them, together with the rectangular frame center coordinates, the rectangular frame width, the rectangular frame height and the target type obtained by the YOLOv4 target detection model, to the garbage sorting robot, so that the robot performs real-time online grabbing and classification.
2. The visual recognition method of the garbage sorting robot based on deep learning of claim 1, wherein: the calibration method of the 2D area-array camera in the first step comprises the following steps:
capturing images of a chessboard calibration board at different positions and rotation angles with the camera;
performing intrinsic calibration with Matlab to obtain the intrinsic parameter matrix and the distortion coefficients;
determining a world coordinate system on the chessboard calibration board, determining, by a 4-point PnP calibration method, the pixel coordinates of 4 corner points on the calibration board and the world coordinates corresponding to these 4 corner points in the determined world coordinate system, and performing extrinsic calibration on the 4 corner points with the solvePnP operator to obtain the extrinsic rotation matrix and translation matrix, thereby completing the extrinsic calibration.
3. The visual recognition method of the garbage sorting robot based on deep learning of claim 1, wherein: the calculation formula for converting pixel coordinates to world coordinates is as follows:
s\begin{bmatrix}u\\ v\\ 1\end{bmatrix}=\begin{bmatrix}f_x & 0 & u_0\\ 0 & f_y & v_0\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}R & T\end{bmatrix}\begin{bmatrix}x_w\\ y_w\\ z_w\\ 1\end{bmatrix}
wherein: u and v are respectively the pixel abscissa and the pixel ordinate in the pixel coordinate system; x_w, y_w and z_w are respectively the abscissa, the ordinate and the vertical coordinate in the world coordinate system; R is the rotation matrix; T is the translation matrix; u_0, v_0, f_x and f_y are the camera intrinsic parameters, i.e. u_0 and v_0 are respectively the image-center abscissa and ordinate, and f_x and f_y are respectively the horizontal and vertical equivalent focal lengths; s is a scale factor, namely the depth of the point along the optical axis in the camera coordinate system.
4. The visual recognition method of the garbage sorting robot based on deep learning of claim 2, wherein: the calibration method of the 3D line-scan camera comprises the following steps:
projecting the 3D laser onto the calibration board so that the laser line is parallel to the x axis of the fixed world coordinate system;
turning off the laser, increasing the exposure time, and capturing an image with clearly visible corner points;
after the image has been captured, turning the laser back on and advancing the conveyor belt a certain distance so that the laser line falls on the other checkerboard and remains parallel to the x axis;
turning off the laser, increasing the exposure time, and capturing another image with clearly visible corner points;
finally performing the calibration; the calibration of the 3D camera is complete after the data have been saved.
5. The visual recognition method of the garbage sorting robot based on deep learning of claim 1, wherein: the second step is specifically: RGB images are collected with the 2D camera, the acquired images are annotated by labeling personnel, and model training is carried out with the YOLOv4 target detection model to generate the final YOLOv4 target detection model.
6. The visual recognition method of the garbage sorting robot based on deep learning of claim 1, wherein: the fourth step is specifically as follows:
inputting the collected RGB image into the YOLOv4 target detection model with an input size of 608×608 to obtain a list of Bounding Boxes for all positions in the image that contain target material, and filtering them with the non-maximum suppression (NMS) algorithm to obtain the coordinate position information of the target garbage points that are finally retained;
the non-maximum suppression (NMS) algorithm is as follows:
S_i=\begin{cases}S_i, & \mathrm{IoU}(M,b_i)<N_t\\ 0, & \mathrm{IoU}(M,b_i)\ge N_t\end{cases}
wherein S_i represents the score of each box, M represents the box with the highest current score, b_i represents one of the remaining boxes, N_t is the set NMS threshold, and IoU is the ratio of the overlap area to the union area of the two detection boxes;
when the YOLOv4 target detection model detects a target, the current encoder pulse value is acquired in real time; this pulse value corresponds to the center coordinate of the target material; the height of the rectangular frame detected by the YOLOv4 target detection model is taken along the moving direction of the conveyor belt, and the current encoder pulse value, the detected target center coordinates, and the width and height information of the rectangular frame are sent together to the data analysis processing module.
7. The visual recognition method of the garbage sorting robot based on deep learning of claim 1, wherein: the fifth step is specifically as follows:
the data analysis processing module calculates the initial position of the target material in the 3D storage memory, and the specific calculation formula is as follows:
start position = ((A - B) / 5) × 1216 × 3 × 4 (bytes)
wherein A represents the current encoder pulse value; b represents an initial starting pulse value; 1216 denotes 1216 scan points on one contour; 3 represents 3 coordinate values of x, y and z; 4 represents that the x, y and z values of each scanning point respectively occupy 4 bytes;
circularly acquiring single-contour images of the 3D line-scan camera and sequentially storing the data in a memory block of specified size, wherein the number of bytes stored is calculated by the following formula:
stored byte size = (target extent along the belt direction in mm ÷ 1.6 mm) × 1216 × 3 × 4 (bytes)
wherein 1.6mm represents the world coordinate distance between the two scanned contours; 1216 denotes 1216 scan points on one contour; 3 represents 3 coordinate values of x/y/z; 4 indicates that the x/y/z values of each scanning point each occupy 4 bytes;
and finally, the final end position of the target material in the 3D storage memory is obtained from its start position in the memory and its stored byte size, according to the following formula:
final end position = memory start position + stored byte size
the corresponding target 3D scan memory data are then read in the data analysis processing module, which completes the reading of the x, y and z values of the target material in the three-dimensional world from the 3D storage memory data.
8. The visual recognition method of the garbage sorting robot based on deep learning of claim 7, wherein: the world-coordinate distance of 1.6 mm between two adjacent contours is derived as follows:
the 3D camera scans one contour every 5 pulses, and one contour has 1216 points; each point contains x/y/z values, and each value occupies 4 bytes; the encoder outputs 1000 pulses per revolution, and the belt travels 320 mm per revolution, i.e. one pulse corresponds to 0.32 mm; since one contour is scanned every 5 pulses, 5 × 0.32 mm = 1.6 mm, so the distance between two adjacent contours corresponds to 1.6 mm in world coordinates.
9. The visual recognition method of the garbage sorting robot based on deep learning of claim 7, wherein: when 50000 contours have been acquired, the 3D data are stored again starting from the initial position of the memory space, the previously stored data are overwritten, and storage then continues sequentially, so that the data are stored in an endless loop; when the program stops running, the allocated memory space is released, preventing memory overflow or leakage.
10. The visual recognition method of the garbage sorting robot based on deep learning of claim 1, wherein: the sixth step is specifically as follows:
the 3D memory data are read, the x and y values are used as the row and column pixel coordinates of a Mat in OpenCV, the z values are normalized to the range 0 to 255, and the normalized values are used as the gray values at the corresponding x, y pixel coordinates in the Mat; for the height, the gray value in the Mat corresponding to the pixel coordinates of the target center is extracted and de-normalized to obtain the actual height information of the target object;
the Mat image is then processed with OpenCV algorithms such as threshold segmentation, opening and closing operations and minimum-contour processing to extract the width information of the target object;
finally, the center coordinate position and the target type of the target object obtained from the 2D data, together with the target height and width information obtained from the 3D data, are sent to the robot as a whole, so that the robot can perform real-time online grabbing and classification.
CN202111323743.9A 2021-11-10 2021-11-10 Garbage sorting robot visual identification method based on deep learning Pending CN114049557A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111323743.9A CN114049557A (en) 2021-11-10 2021-11-10 Garbage sorting robot visual identification method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111323743.9A CN114049557A (en) 2021-11-10 2021-11-10 Garbage sorting robot visual identification method based on deep learning

Publications (1)

Publication Number Publication Date
CN114049557A true CN114049557A (en) 2022-02-15

Family

ID=80207948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111323743.9A Pending CN114049557A (en) 2021-11-10 2021-11-10 Garbage sorting robot visual identification method based on deep learning

Country Status (1)

Country Link
CN (1) CN114049557A (en)


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114821283A (en) * 2022-06-29 2022-07-29 山东施卫普环保科技有限公司 Sweeper garbage sweeping method and system based on visual perception
CN115296738A (en) * 2022-07-28 2022-11-04 吉林大学 Unmanned aerial vehicle visible light camera communication method and system based on deep learning
CN115296738B (en) * 2022-07-28 2024-04-16 吉林大学 Deep learning-based unmanned aerial vehicle visible light camera communication method and system
CN115100492A (en) * 2022-08-26 2022-09-23 摩尔线程智能科技(北京)有限责任公司 Yolov3 network training and PCB surface defect detection method and device
CN115100492B (en) * 2022-08-26 2023-04-07 摩尔线程智能科技(北京)有限责任公司 Yolov3 network training and PCB surface defect detection method and device
CN115546566A (en) * 2022-11-24 2022-12-30 杭州心识宇宙科技有限公司 Intelligent body interaction method, device, equipment and storage medium based on article identification
CN116921247A (en) * 2023-09-15 2023-10-24 北京安麒智能科技有限公司 Control method of intelligent garbage sorting system
CN116921247B (en) * 2023-09-15 2023-12-12 北京安麒智能科技有限公司 Control method of intelligent garbage sorting system
CN117124302A (en) * 2023-10-24 2023-11-28 季华实验室 Part sorting method and device, electronic equipment and storage medium
CN117124302B (en) * 2023-10-24 2024-02-13 季华实验室 Part sorting method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN114049557A (en) Garbage sorting robot visual identification method based on deep learning
CN111951237B (en) Visual appearance detection method
CN108555908B (en) Stacked workpiece posture recognition and pickup method based on RGBD camera
CN110580725A (en) Box sorting method and system based on RGB-D camera
CN111612765B (en) Method for identifying and positioning round transparent lens
CN112067233B (en) Six-degree-of-freedom motion capture method for wind tunnel model
CN110084243B (en) File identification and positioning method based on two-dimensional code and monocular camera
CN113177565B (en) Binocular vision position measuring system and method based on deep learning
CN105678682B (en) A kind of bianry image connected region information fast acquiring system and method based on FPGA
CN112816418B (en) Mobile phone metal middle frame defect imaging system and detection method
CN113870267B (en) Defect detection method, defect detection device, computer equipment and readable storage medium
CN106651849A (en) Area-array camera-based PCB bare board defect detection method
US9245375B2 (en) Active lighting for stereo reconstruction of edges
CN113554757A (en) Three-dimensional reconstruction method and system for workpiece track based on digital twinning
CN115131268A (en) Automatic welding system based on image feature extraction and three-dimensional model matching
CN111626241A (en) Face detection method and device
CN104614372B (en) Detection method of solar silicon wafer
CN108182700B (en) Image registration method based on two-time feature detection
CN111724432B (en) Object three-dimensional detection method and device
CN113256568A (en) Machine vision plate counting general system and method based on deep learning
CN110136248B (en) Transmission shell three-dimensional reconstruction device and method based on binocular stereoscopic vision
CN116188763A (en) Method for measuring carton identification positioning and placement angle based on YOLOv5
CN112347904B (en) Living body detection method, device and medium based on binocular depth and picture structure
KR100276445B1 (en) Property recognition apparatus
Li et al. Structured light based high precision 3D measurement and workpiece pose estimation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination