CN112784717B - Automatic pipe fitting sorting method based on deep learning - Google Patents
- Publication number
- CN112784717B CN112784717B CN202110039092.4A CN202110039092A CN112784717B CN 112784717 B CN112784717 B CN 112784717B CN 202110039092 A CN202110039092 A CN 202110039092A CN 112784717 B CN112784717 B CN 112784717B
- Authority
- CN
- China
- Prior art keywords
- area
- pipe fitting
- network
- pipe
- camera
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B07—SEPARATING SOLIDS FROM SOLIDS; SORTING
- B07C—POSTAL SORTING; SORTING INDIVIDUAL ARTICLES, OR BULK MATERIAL FIT TO BE SORTED PIECE-MEAL, e.g. BY PICKING
- B07C5/00—Sorting according to a characteristic or feature of the articles or material being sorted, e.g. by control effected by devices which detect or measure such characteristic or feature; Sorting by manually actuated devices, e.g. switches
- B07C5/36—Sorting apparatus characterised by the means used for distribution
- B07C5/361—Processing or control devices therefor, e.g. escort memory
- B07C5/362—Separating or distributor mechanisms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/02—Affine transformations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0004—Industrial image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B07—SEPARATING SOLIDS FROM SOLIDS; SORTING
- B07C—POSTAL SORTING; SORTING INDIVIDUAL ARTICLES, OR BULK MATERIAL FIT TO BE SORTED PIECE-MEAL, e.g. BY PICKING
- B07C2501/00—Sorting according to a characteristic or feature of the articles or material to be sorted
- B07C2501/0063—Using robots
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30108—Industrial image inspection
- G06T2207/30164—Workpiece; Machine component
Abstract
The invention belongs to the technical field of machine vision and specifically relates to a method for automatically sorting pipe fittings based on deep learning. It addresses the problems that manual sorting is costly, inefficient, tedious and monotonous, and that traditional methods achieve only a low recognition rate. The invention improves the Mask R-CNN algorithm, raising the recognition rate and the quality of the mask prediction while maintaining speed. Pictures taken by the camera are fed into the network to obtain a classification result and a mask; the type and size of the pipe fitting are judged and the information is recorded. The pipe-fitting grasping point is located through Zhang Zhengyou calibration combined with eye-to-hand calibration (camera mounted outside the hand), and a robotic arm grasps and stacks the pipe fittings. The method sorts pipe fittings efficiently and is robust in identifying and grasping them in different environments; it can be widely applied to factory sorting, object classification and grasping.
Description
Technical Field
The invention belongs to the technical field of machine vision, and particularly relates to a method for automatically sorting pipe fittings based on deep learning.
Background
Research on robot vision can improve a robot's ability to perceive the external environment and greatly lighten the burden on workers, freeing them from tedious and monotonous working conditions. At present the demand for labor in China is very high; applying robots in these fields reduces production costs, raises working efficiency, and improves the market competitiveness of enterprises. For this reason industrial robots are employed as core workers in many industries, raising the production efficiency of those enterprises.
In the field of pipe-fitting recognition by industrial robots, deep learning can replace traditional learning methods, giving the robot a better recognition effect and a higher part-recognition rate. A mask contour obtained by deep-learning training is little affected by external factors (illumination, shadow), so the mask area can be used to judge the scale of a pipe fitting. This makes pipe grasping and path planning more accurate, improves grasping efficiency, better guarantees the accuracy, stability and practicality of the operation, and remedies the low recognition rate of traditional approaches.
Conventional industrial robots can generally only recognize pipe fittings of one type and size; sorting pipe fittings of various types and sizes would require several robots. The invention can use a single robot to grasp pipe fittings of multiple classes and scales.
Therefore, research on automatic pipe-fitting sorting based on deep learning has extremely important practical significance.
Disclosure of Invention
To relieve the boring and fussy work of sorting pipe fittings, deep learning is introduced to classify pipe fittings by category and size. The invention provides a method for automatically sorting pipe fittings based on deep learning. The method is suitable for both static and dynamic pipe sorting (grasping and stacking of pipes on a conveyor belt).
In order to achieve the purpose, the invention adopts the following technical scheme:
a method for automatically sorting pipes based on deep learning comprises the following steps:
step 1, labeling categories and mask outlines of the collected pipe fitting images with the labelme tool to obtain pipe fitting images with json-type labeling results;
step 2, performing data enhancement on the pipe fitting images with labeling results to obtain a data set;
step 3, designing a network structure and reading the data set of step 2 into the network for training;
step 4, evaluating with accuracy, recall rate and intersection-over-union, reading the network weights trained in step 3 into the network, and reading an image to be predicted into the network for prediction to obtain the category and mask outline of the pipe fitting;
step 5, judging the different sizes of pipe fittings of the same type from the mask area and calculating the grasping point;
step 6, converting between the camera and robot coordinate systems and calculating the pipe fitting grasping point in the coordinate system of the robot-arm end;
and step 7, connecting the computer and the camera through a GigE gigabit network port, setting the IP of the computer network port connected to the camera into the same 192.169.1.X segment as the camera, connecting the computer and the robot arm through the TCP/IP protocol, writing RAPID code with RobotStudio software to control the shooting and grasping operations of the camera and robot, and receiving feedback from the robot.
Further, in step 2, data enhancement is performed on the pipe fitting images with labeling results, specifically: the images are rotated, flipped, blurred, Gaussian-filtered, bilateral-filtered, and corrupted with white noise.
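As a minimal, NumPy-only sketch of the step-2 enhancements (rotation, flipping, blurring and additive white noise), one might write the following; a plain box blur stands in for the Gaussian and bilateral filters, which in practice would come from OpenCV's cv2.GaussianBlur and cv2.bilateralFilter:

```python
import numpy as np

def augment(img, rng):
    """Return augmented copies of a grayscale image (H x W uint8 array)."""
    out = []
    out.append(np.rot90(img))                         # 90-degree rotation
    out.append(np.fliplr(img))                        # horizontal flip
    # simple box blur standing in for Gaussian/bilateral filtering
    blur = img.astype(float).copy()
    blur[1:-1, 1:-1] = (img[1:-1, 1:-1].astype(float) + img[:-2, 1:-1]
                        + img[2:, 1:-1] + img[1:-1, :-2] + img[1:-1, 2:]) / 5.0
    out.append(blur)
    # additive white (Gaussian) noise, clipped back to the valid 0..255 range
    noisy = img.astype(float) + rng.normal(0.0, 10.0, img.shape)
    out.append(np.clip(noisy, 0, 255))
    return out

rng = np.random.default_rng(0)
img = np.arange(64, dtype=np.uint8).reshape(8, 8)
augmented = augment(img, rng)
```

Each augmented copy keeps the same labeling as the original image, so one labeled photograph yields several training samples.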
Further, the step 3 designs a network structure, specifically:
forming a backbone network from ResNet-101-FPN: the first layer uses a 7 × 7 convolution kernel with stride 2 and 64 channels, followed by a 3 × 3 max-pooling operation with stride 2; the second stage consists of three bottleneck blocks of 1 × 1, 3 × 3 and 1 × 1 convolutions with stride 1 and 64, 64 and 256 channels; the third stage of four such blocks with 128, 128 and 512 channels; the fourth stage of twenty-three blocks with 256, 256 and 1024 channels; and the last stage of three blocks with 512, 512 and 2048 channels;
secondly, a region proposal network is used: a standard configuration adopts 5 scales of different areas, each with 3 aspect ratios, generating 15 different candidate boxes at every pixel of the feature map; considering the sizes of the pipe fittings, scales of 32 × 32, 64 × 64 and 128 × 128 with aspect ratios of 0.8:1, 1:1 and 1:1.25 are adopted instead, generating 9 different candidate boxes. The image is scanned with a sliding window to find region targets; the class and the pixels of each found target region are predicted separately, and a mixed attention mechanism is added before the pixel prediction to improve the prediction effect.
Still further, the mixed attention mechanism is a mixture of channel attention and spatial attention.
Further, in step 5 the different sizes of pipe fittings of the same type are judged from the mask area; the specific steps are:
calculate the areas of the mask outlines labeled in step 1 and place them in order into an m × n array area[][]; obtain the type of the pipe fitting through the network of step 4 and find the corresponding row of the area array, denoted q; compute the mask outline produced by the network and its pixel area, compare it with the data in area[q][], and find the position of the closest value, denoted p; then area[q][p] is the position sought, and the size of the pipe fitting is determined.
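The area[q][p] lookup described above can be sketched as follows; the reference areas here are hypothetical placeholder values for illustration, not figures from the patent:

```python
import numpy as np

# Hypothetical m x n reference table area[][]: one row per pipe class,
# one column per size, filled from the labelled mask outlines of step 1.
area = np.array([
    [1500.0, 2400.0, 3600.0],   # class 0 (e.g. tee): three sizes
    [ 900.0, 1600.0, 2500.0],   # class 1 (e.g. elbow)
    [ 600.0, 1100.0, 1800.0],   # class 2 (e.g. outer-thread fitting)
])

def size_index(q, mask_area):
    """Row q is fixed by the predicted class; return the column p whose
    reference area is closest to the measured mask pixel area."""
    return int(np.argmin(np.abs(area[q] - mask_area)))

p = size_index(0, 2550.0)   # a class-0 mask covering ~2550 pixels
```

With the values above, a class-0 mask of about 2550 pixels matches the middle reference area, so p = 1 selects the middle size.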
Further, step 5 calculates the grasping point as follows:
first compute the four extreme points of the pipe fitting's contour polygon P, namely xminP, xmaxP, yminP and ymaxP, and construct the four tangent lines of P through these points, determining two "rotating calipers" sets. If one or two of the lines coincide with an edge, compute the area of the rectangle determined by the four lines and store it as the current minimum; otherwise define the current minimum as infinity. Rotate the lines clockwise until one of them coincides with an edge of the polygon, compute the area of the new rectangle, compare it with the current minimum and update if it is smaller, storing the rectangle that attains the minimum. Repeat until the lines have rotated by more than 90 degrees; this yields the minimum-area circumscribed rectangle of the pipe fitting. Denote two diagonally opposite corner points of this rectangle as (X1, Y1) and (X2, Y2); the center point of the pipe fitting is then ((X1 + X2)/2, (Y1 + Y2)/2), and this point is the robot's grasping point.
Further, in step 6 the camera and robot coordinate systems are converted and the pipe fitting grasping point in the coordinate system of the robot-arm end is calculated, specifically as follows:
Zhang Zhengyou's planar-target camera calibration method is used to obtain the position of the pipe fitting in the camera coordinate system, and the positional correspondence between the robot and camera coordinate systems is obtained through hand-eye calibration using an affine transformation, whose matrix form is:
X=m11×x+m12×y+m13
Y=m21×x+m22×y+m23
Z=m31×x+m32×y+m33
where (x, y) is an image pixel coordinate, (X, Y, Z) are the coordinates converted to the end of the robot arm, m11, m12, m21, m22, m31 and m32 are rotation components, and m13, m23 and m33 are translation components.
The robot's path to the grasping point is planned with a probabilistic roadmap (random path graph) algorithm, so that the robot obtains an optimal path between the initial position and the target position while avoiding collisions with obstacles.
Compared with the prior art, the invention has the following advantages. In the existing art, the overwhelming majority of robot sorting applies to pipe fittings that are placed neatly and grasped from fixed positions, and suffers from problems such as a single graspable type and low recognition accuracy. The invention can sort pipe fittings of different types and sizes with the same mechanical gripper, and the pipe fittings need not be placed in order, saving usage cost and time cost in practice. In terms of accuracy, the network structure proposed by the invention improves recognition precision over other network structures and further reduces problems such as false and missed grasps during grasping.
Drawings
FIG. 1 is a data set annotation interface diagram;
FIG. 2 is a network layout of the present method;
FIG. 3 is a model diagram of a channel attention mechanism;
FIG. 4 is a model diagram of a spatial attention mechanism;
FIG. 5 is a pipe image artwork;
FIG. 6 is a result of network instance segmentation in accordance with the present invention;
FIG. 7 is a schematic diagram of the robot and the conveyor belt.
Detailed Description
A method for automatically sorting pipes based on deep learning comprises the following steps:
Step 1, shooting images of pipe fitting types such as tees, outer-thread (male) fittings, inner-thread (female) fittings and elbows, and labeling the categories and mask outlines of the collected images with the labelme tool to obtain pipe fitting images with json-type labeling results;
Step 2, performing data enhancement on the labeled pipe fitting images to obtain the data set. Step 3, designing the network structure and reading the data set of step 2 into the network for training, with 3000 pipe fitting images as the training set and 1000 as the test set, a batch size of 3, and a learning rate of 0.0001.
The network structure designed by the invention is shown in Fig. 2. A backbone network is formed from ResNet-101-FPN: the first layer uses a 7 × 7 convolution kernel with stride 2 and 64 channels, followed by a 3 × 3 max-pooling operation with stride 2; the second stage consists of three bottleneck blocks of 1 × 1, 3 × 3 and 1 × 1 convolutions with stride 1 and 64, 64 and 256 channels; the third stage of four such blocks with 128, 128 and 512 channels; the fourth stage of twenty-three blocks with 256, 256 and 1024 channels; and the last stage of three blocks with 512, 512 and 2048 channels. Because low-level features carry less semantic information but locate targets accurately, while high-level features are semantically rich but locate targets only coarsely, fusing the two gives better recognition: features in the image are fused in a top-down manner, finally yielding feature maps at five scales.
A region proposal network is used: a standard configuration adopts 5 scales of different areas, each with 3 aspect ratios, generating 15 different candidate boxes at every pixel of the feature map; considering the sizes of the pipe fittings, scales of 32 × 32, 64 × 64 and 128 × 128 with aspect ratios of 0.8:1, 1:1 and 1:1.25 are adopted instead, generating 9 different candidate boxes. With this improvement, training converges better, less video memory is occupied, and classification, regression and segmentation of small targets improve. The image is scanned with a sliding window to find region targets; the class and the pixels of each found target region are predicted separately, and a mixed attention mechanism is added before the pixel prediction to improve the prediction effect. As shown in Fig. 3 and Fig. 4, the basic idea of the attention mechanism is to let the model ignore irrelevant information and focus more on the important information we want it to attend to.
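The 9-anchor generation at one feature-map location (3 scales × 3 aspect ratios) can be sketched as below. How the patent splits width and height for a given ratio is not spelled out, so the square-root split that preserves the anchor's area is an assumption:

```python
import numpy as np

def make_anchors(cx, cy, scales=(32, 64, 128), ratios=(0.8, 1.0, 1.25)):
    """Return the 9 candidate boxes (x1, y1, x2, y2) centred on (cx, cy).
    Each box keeps an area of scale*scale while its width:height equals
    the given ratio (assumed convention, not stated in the patent)."""
    boxes = []
    for s in scales:
        for r in ratios:
            w = s * np.sqrt(r)   # width/height = r while width*height = s*s
            h = s / np.sqrt(r)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)

anchors = make_anchors(0.0, 0.0)
```

Dropping the two largest scales of the standard 5-scale configuration is what reduces the anchor count from 15 to 9 per location, which is where the memory and small-target gains described above come from.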
Accuracy (Precision): among all the targets predicted by the model, the proportion that is predicted correctly. The calculation formula is Precision = TP / (TP + FP).
Recall: among all the real targets, the proportion of pipe fittings that the model predicts correctly. The calculation formula is Recall = TP / (TP + FN).
IoU: the ratio of the intersection to the union of the predicted bounding box and the ground-truth bounding box, i.e. IoU = |B_pred ∩ B_gt| / |B_pred ∪ B_gt|.
In these formulas, TP (True Positive) is the number of detection boxes with IoU > 0.5, where each GT (Ground Truth) box is counted at most once;
FP (False Positive) is the number of detection boxes with IoU ≤ 0.5, together with redundant detection boxes matching an already-detected GT box;
FN (False Negative) is the number of GT boxes that are not detected.
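Since the formula images did not survive extraction, here is a small sketch of how IoU is computed from two boxes and how precision and recall follow from the TP/FP/FN counts:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP); Recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

v = iou((0, 0, 2, 2), (1, 1, 3, 3))   # overlap area 1, union area 7
prec, rec = precision_recall(8, 2, 4)
```

The example counts (8 TP, 2 FP, 4 FN) are illustrative only; the patent's measured values appear in Tables 1 and 2.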
The category and mask outline of the pipe fitting with the highest score are acquired and recorded. The input image and the prediction result are shown in Fig. 5 and Fig. 6.
Step 5, judging the different sizes of pipe fittings of the same type from the mask area: calculate the areas of the mask outlines labeled in step 1 and place them in order into an m × n array area[][]; obtain the type of the pipe fitting through the network of step 4 and find the corresponding row of the area array, denoted q; compute the mask outline produced by the network and its pixel area, compare it with the data in area[q][], and find the position of the closest value, denoted p; then area[q][p] is the position sought, and the size of the pipe fitting is determined.
Calculating the grasping point: first compute the four extreme points of the pipe fitting's contour polygon P, namely xminP, xmaxP, yminP and ymaxP, and construct the four tangent lines of P through these points, determining two "rotating calipers" sets. If one or two of the lines coincide with an edge, compute the area of the rectangle determined by the four lines and store it as the current minimum; otherwise define the current minimum as infinity. Rotate the lines clockwise until one of them coincides with an edge of the polygon, compute the area of the new rectangle, compare it with the current minimum and update if it is smaller, storing the rectangle that attains the minimum. Repeat until the lines have rotated by more than 90 degrees; this yields the minimum-area circumscribed rectangle of the pipe fitting. Denote two diagonally opposite corner points of this rectangle as (X1, Y1) and (X2, Y2); the center point of the pipe fitting is then ((X1 + X2)/2, (Y1 + Y2)/2), and this point is the robot's grasping point.
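A compact sketch of the minimum-area enclosing rectangle and its centre (the grasping point). Instead of rotating four tangent lines explicitly, it rotates the convex hull so each edge in turn is axis-aligned and keeps the smallest axis-aligned bounding box; this finds the same rectangle, since the optimal rectangle shares an edge direction with some hull edge. The function names are illustrative, not from the patent:

```python
import numpy as np

def _cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(pts):
    """Andrew's monotone chain; pts is an (N, 2) float array."""
    pts = pts[np.lexsort((pts[:, 1], pts[:, 0]))]   # sort by x, then y
    def build(seq):
        h = []
        for p in seq:
            while len(h) >= 2 and _cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h[:-1]
    return np.array(build(pts) + build(pts[::-1]))

def grasp_point(contour_pts):
    """Return (min_area, centre) of the minimum-area enclosing rectangle."""
    hull = convex_hull(np.asarray(contour_pts, float))
    best_area, best_centre = np.inf, None
    for i in range(len(hull)):
        ex, ey = hull[(i + 1) % len(hull)] - hull[i]
        ang = np.arctan2(ey, ex)
        c, s = np.cos(-ang), np.sin(-ang)
        R = np.array([[c, -s], [s, c]])
        rot = hull @ R.T                         # edge i is now horizontal
        lo, hi = rot.min(axis=0), rot.max(axis=0)
        a = (hi[0] - lo[0]) * (hi[1] - lo[1])
        if a < best_area:
            best_area = a
            best_centre = R.T @ ((lo + hi) / 2)  # back to image coordinates
    return best_area, best_centre
```

For an axis-aligned 4 × 2 rectangle of contour points this returns area 8 and centre (2, 1), i.e. the midpoint of the rectangle's diagonal, matching the ((X1+X2)/2, (Y1+Y2)/2) formula in the text.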
Step 6, converting between the camera and robot coordinate systems and calculating the pipe fitting grasping point in the coordinate system of the robot-arm end: Zhang Zhengyou's planar-target camera calibration method is used to obtain the position of the pipe fitting in the camera coordinate system, and the positional correspondence between the robot and camera coordinate systems is obtained through hand-eye calibration using an affine transformation, whose matrix form is:
X=m11×x+m12×y+m13
Y=m21×x+m22×y+m23
Z=m31×x+m32×y+m33
where (x, y) is an image pixel coordinate, (X, Y, Z) are the coordinates converted to the end of the robot arm, m11, m12, m21, m22, m31 and m32 are rotation components, and m13, m23 and m33 are translation components.
The transformation parameters are obtained as follows: four spots on the tray are chosen at random and marked with a brightly colored pen. The conveyor belt is started, and when the tray reaches the position below the camera, the camera photographs it. The positions of the four pen marks are extracted from the photograph. The robot arm is then operated manually, its end is moved to each of the four marked positions in turn, and the x and y values in the arm's coordinate system are recorded for each. The transformation parameters m are obtained through the affine transformation matrix and wrapped in a function; each call converts the previously computed camera-coordinate center point of a pipe fitting into coordinates in the arm's coordinate system.
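A sketch of recovering the affine parameters m from the four marked points by least squares (any three or more non-collinear correspondences suffice); the pixel and robot coordinates below are made-up illustration values, not calibration data from the patent:

```python
import numpy as np

def fit_affine(img_pts, robot_pts):
    """Fit X = m11*x + m12*y + m13, Y = m21*x + m22*y + m23 by least
    squares from point correspondences; returns the 2 x 3 matrix M."""
    A = np.hstack([np.asarray(img_pts, float),
                   np.ones((len(img_pts), 1))])        # rows [x, y, 1]
    M, *_ = np.linalg.lstsq(A, np.asarray(robot_pts, float), rcond=None)
    return M.T                                          # shape (2, 3)

def apply_affine(M, pt):
    """Convert one camera pixel coordinate to robot-arm coordinates."""
    x, y = pt
    return (M[0, 0] * x + M[0, 1] * y + M[0, 2],
            M[1, 0] * x + M[1, 1] * y + M[1, 2])

# hypothetical calibration data: pen marks in pixels vs. arm coordinates
pix = [(100, 100), (500, 100), (500, 400), (100, 400)]
rob = [(200.0, 50.0), (400.0, 50.0), (400.0, 200.0), (200.0, 200.0)]
M = fit_affine(pix, rob)
grasp = apply_affine(M, (300, 250))   # a pipe centre seen by the camera
```

Using four points rather than the minimal three, as the text does, overdetermines the system and lets the least-squares fit average out small marking and measurement errors.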
Step 7, connecting the computer and the camera through a GigE gigabit network port, setting the IP of the computer network port connected to the camera into the same 192.169.1.X segment as the camera, connecting the computer and the robot arm through the TCP/IP protocol, writing RAPID code with RobotStudio software to control the shooting and grasping operations of the camera and robot, and receiving feedback from the robot.
As the schematic of the robot and conveyor belt in Fig. 7 shows, the conveyor belt and the arm are controlled, and the state of the belt (starting, stopping, and whether the grasping point has been reached) is controlled and judged through IO signals. When the program judges that the tray on the belt has reached the shooting point, it signals the computer to trigger the camera. The picture is saved and read into the trained network, and the position of the pipe fitting in the arm's end coordinate system is obtained using steps 3 and 4. Since TCP/IP transmits only one character string at a time, the grasping point and the stacking point of the pipe fitting are joined with "/" into one long string, which the computer sends to the robot. The robot receives the string and decodes it by locating the "/" separators. The grasping and stacking points are recorded, the arm's path is planned with the probabilistic roadmap algorithm, and the grasping task is executed, completing one pipe-grasping cycle. The belt is then restarted to wait for the next task.
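The "/"-separated message format can be sketched as below; the two-point, fixed-precision layout is an assumption, since the patent only says the points are joined with "/" into one string:

```python
def encode_points(grasp, stack):
    """Join a grasp point and a stacking point into one TCP/IP string."""
    return "/".join(f"{v:.2f}" for v in (*grasp, *stack))

def decode_points(msg):
    """Split the received string back into the two (x, y) points."""
    x1, y1, x2, y2 = (float(v) for v in msg.split("/"))
    return (x1, y1), (x2, y2)

msg = encode_points((312.5, 87.25), (120.0, 450.75))
g, s = decode_points(msg)
```

On the robot side, the RAPID program would perform the equivalent of decode_points by scanning the received string for the "/" separators.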
In this embodiment, the network designed by the invention is evaluated with mAP (mean average precision) and mRecall (mean recall), with at most 100 detections per picture (maxDets = 100). The target-detection performance evaluation is shown in Table 1 and the mask-detection performance evaluation in Table 2. Experimental results show that, compared with currently popular segmentation networks (SOLO, Mask R-CNN and DeepMask), the proposed network model achieves better target detection and mask segmentation. Compared with the basic Mask R-CNN model, the precision and recall of target detection improve by 3.5% and 7.1% respectively, and the precision and recall of mask detection improve by 2.8% and 2.4%.
TABLE 1 comparison of target detection Performance evaluation
TABLE 2 mask test Performance evaluation comparison
The method distinguishes different sizes of the same type of pipe fitting by computing the mask area. Tee fittings were selected for testing: the same camera photographed them at the same height, the pixel areas of the tees were computed, and the comparison is shown in Table 3. For the half-inch, one-inch and one-and-a-half-inch tees, the pixel areas computed by the proposed network are all closer to the manually labeled areas than those of the basic Mask R-CNN, so using the mask pixels to determine pipe size has a clear advantage.
TABLE 3 tee pixel area comparison
Those skilled in the art will appreciate that the invention may be practiced without these specific details. Although illustrative embodiments of the invention have been described above to help those skilled in the art understand it, the invention is not limited to the scope of those embodiments. To those skilled in the art, any changes within the spirit and scope of the invention as defined by the appended claims are protected, so long as they employ the inventive concepts.
Claims (7)
1. A method for automatically sorting pipe fittings based on deep learning is characterized by comprising the following steps:
step 1, labeling categories and mask outlines of collected pipe fitting images by using a labelme tool to obtain a pipe fitting image of a json type labeling result;
step 2, performing data enhancement on the pipe fitting image with the labeling result to obtain a data set;
step 3, designing a network structure, and reading the data set in the step 2 into the network for training;
step 4, evaluating the network weights trained in step 3 by precision, recall and intersection-over-union, loading the selected weights into the network, and reading an image to be predicted into the network for prediction to obtain the category and mask outline of the pipe fitting;
step 5, judging the different sizes of the same type of pipe fitting according to the mask area, and calculating the grasping point;
step 6, converting between the camera and robot coordinate systems, and calculating the pipe fitting grasping point in the coordinate system at the end of the robot arm;
and step 7, connecting the computer and the camera through a GigE gigabit network port, setting the address of the computer network port connected to the camera to 192.169.1.X, on the same subnet as the camera, connecting the computer and the robot arm through the TCP/IP protocol, writing RAPID programs in RobotStudio software to control the shooting and grasping operations of the camera and the robot, and receiving feedback from the robot.
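The evaluation in step 4 relies on the intersection-over-union of predicted and ground-truth boxes; a minimal, self-contained sketch of that metric (the `(x1, y1, x2, y2)` box format is an assumption):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)       # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)            # overlap / union

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 50 / 150 -> 0.333...
```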
2. The method for automatically sorting pipe fittings based on deep learning as claimed in claim 1, wherein the data enhancement of the pipe fitting images with labeling results in step 2 specifically comprises: rotating, flipping and blurring the labeled pipe fitting images, applying Gaussian filtering and bilateral filtering, and adding white noise.
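A dependency-free sketch of the augmentations listed in claim 2, assuming 8-bit grayscale images; Gaussian blur and bilateral filtering are only noted in a comment, since in practice they would be delegated to a library such as OpenCV:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Yield simple augmented copies of an HxW uint8 image: rotation, flips,
    additive white noise. Gaussian and bilateral filtering (claim 2) would
    typically use cv2.GaussianBlur / cv2.bilateralFilter and are omitted to
    keep this sketch dependency-free. Mask labels must be transformed with
    the same geometric operations."""
    yield np.rot90(img)                  # 90-degree rotation
    yield np.fliplr(img)                 # horizontal flip
    yield np.flipud(img)                 # vertical flip
    noisy = img.astype(float) + rng.normal(0, 10, img.shape)  # white noise
    yield np.clip(noisy, 0, 255).astype(np.uint8)

img = rng.integers(0, 255, size=(64, 48), dtype=np.uint8)
aug = list(augment(img))
print(len(aug))        # 4 augmented variants
print(aug[0].shape)    # (48, 64) after the 90-degree rotation
```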
3. The method for automatically sorting pipe fittings based on deep learning as claimed in claim 1, wherein the network structure of step 3 is designed specifically as follows:
the backbone network is formed by ResNet-101-FPN: the first layer uses a 7 × 7 convolution kernel with stride 2 and 64 channels, followed by 3 × 3 max pooling with stride 2; the second convolution stage consists, with stride 1, of 1 × 1, 3 × 3 and 1 × 1 convolution kernels with channel numbers 64, 64 and 256, repeated three times; the third stage comprises, with stride 1, four groups of 1 × 1, 3 × 3 and 1 × 1 convolution kernels with channel numbers 128, 128 and 512; the fourth stage consists, with stride 1, of twenty-three groups of 1 × 1, 3 × 3 and 1 × 1 convolution kernels with channel numbers 256, 256 and 1024; the last stage, with stride 1, has three groups of 1 × 1, 3 × 3 and 1 × 1 convolution kernels with channel numbers 512, 512 and 2048;
secondly, a region proposal network is used: a standard configuration adopts 5 scales of different area sizes, each with 3 different aspect ratios, so that each pixel in the feature map generates 15 candidate boxes in total; considering the sizes of the pipe fittings, the scales 32 × 32, 64 × 64 and 128 × 128 and the aspect ratios 0.8:1, 1:1 and 1:1.25 are adopted instead, generating 9 different candidate boxes; the image is scanned with a sliding window to find region targets; the categories and pixels of the found target regions are classified separately, and a hybrid attention mechanism is added before predicting the pixels of the found target regions to improve the prediction effect.
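The 9 candidate boxes of the adapted region proposal network can be sketched as below; mapping an aspect ratio r to a width/height pair while preserving the anchor area is a common convention and an assumption here, not stated in the claim:

```python
def anchors_at(cx, cy, scales=(32, 64, 128), ratios=(0.8, 1.0, 1.25)):
    """Generate the 9 candidate boxes (3 scales x 3 aspect ratios) centred on
    one feature-map pixel, as (x1, y1, x2, y2) tuples."""
    boxes = []
    for s in scales:
        for r in ratios:
            # keep the anchor area ~ s*s while setting the w:h ratio to r
            w = s * (r ** 0.5)
            h = s / (r ** 0.5)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

boxes = anchors_at(100, 100)
print(len(boxes))  # 9
print(boxes[1])    # the square 32x32 anchor: (84.0, 84.0, 116.0, 116.0)
```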
4. The method of claim 3, wherein the hybrid attention mechanism is a hybrid of channel attention and spatial attention.
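Claim 4 does not fix the internal structure of the hybrid attention; the sketch below follows the common CBAM-style ordering (channel gating, then spatial gating), with the learned MLP/convolution weights replaced by identity mappings purely for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hybrid_attention(feat):
    """CBAM-style sketch on a (C, H, W) feature map: gate channels by a
    global-average-pooled descriptor, then gate pixels by a per-pixel
    channel mean. Real modules learn weights for both steps."""
    c_desc = feat.mean(axis=(1, 2))                # (C,) channel descriptor
    feat = feat * sigmoid(c_desc)[:, None, None]   # channel attention gate
    s_map = feat.mean(axis=0)                      # (H, W) spatial descriptor
    return feat * sigmoid(s_map)[None, :, :]       # spatial attention gate

out = hybrid_attention(np.ones((8, 4, 4)))
print(out.shape)  # (8, 4, 4): attention reweights, it does not reshape
```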
5. The method for automatically sorting pipes based on deep learning as claimed in claim 1, wherein the step 5 is to judge different sizes of pipes of the same category by mask area, and comprises the following specific steps:
calculating the areas of the mask outlines labeled in step 1 and placing them in order into an m × n array area[][]; obtaining the pipe fitting category through the network of step 4 and finding the corresponding row of the area array, denoted q; computing the pixel area of the mask outline obtained from the network, comparing it with the data in area[q][], and finding the position of the closest value, denoted p; area[q][p] is then the obtained position, from which the size of the pipe fitting is judged.
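The area[q][p] lookup of claim 5 reduces to a nearest-value search along one row; a minimal sketch with hypothetical labeled areas:

```python
def size_index(pixel_area, area_row):
    """Given the measured mask pixel area and the row of labeled areas for
    the predicted pipe class (area[q][...]), return the column p of the
    closest labeled value."""
    return min(range(len(area_row)), key=lambda p: abs(area_row[p] - pixel_area))

# Hypothetical labeled areas for class q: half-, one-, one-and-a-half-inch
area_q = [5200, 9800, 15600]
print(size_index(10100, area_q))  # closest to 9800 -> column 1
```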
6. The method for automatically sorting pipes based on deep learning according to claim 1, wherein the grasping point is calculated in the step 5, and the method comprises the following specific steps:
firstly, calculating the four extreme points of the polygon P of the pipe fitting, namely xminP, xmaxP, yminP and ymaxP, and constructing four tangent lines of P through these points, determining two "rotating calipers" sets; if one or two lines coincide with an edge, calculating the area of the rectangle determined by the four lines and storing it as the current minimum; otherwise, defining the current minimum as infinity; rotating the lines clockwise until one line coincides with an edge of the polygon, calculating the area of the new rectangle, comparing it with the current minimum, updating if it is smaller, and storing the rectangle information determining the minimum; repeating this operation until the lines have rotated by more than 90 degrees, obtaining the minimum area of the circumscribed rectangle of the pipe fitting, namely the minimum circumscribed rectangle of the pipe fitting; denoting two diagonal end points of the circumscribed rectangle as (X1, Y1) and (X2, Y2), the center point of the pipe fitting is ((X1+X2)/2, (Y1+Y2)/2); this point is the grasping point of the robot.
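For a convex mask polygon, the minimum-area circumscribed rectangle of claim 6 can be found by aligning each edge with the x-axis and keeping the smallest axis-aligned bounding box; this is the rotating-calipers idea without the incremental tangent update (O(n²) instead of O(n)), sketched here for illustration:

```python
import math

def min_area_rect_center(poly):
    """Grasp-point sketch: for a convex polygon (list of (x, y) vertices),
    rotate the polygon so each edge in turn lies along the x-axis, take the
    axis-aligned bounding box, keep the smallest-area one, and return its
    centre mapped back to the original frame."""
    best = (float("inf"), None)
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        theta = math.atan2(y2 - y1, x2 - x1)
        cos_t, sin_t = math.cos(-theta), math.sin(-theta)
        rot = [(x * cos_t - y * sin_t, x * sin_t + y * cos_t) for x, y in poly]
        xs, ys = [p[0] for p in rot], [p[1] for p in rot]
        area = (max(xs) - min(xs)) * (max(ys) - min(ys))
        if area < best[0]:
            # centre of the bbox, rotated back into the original frame
            cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
            best = (area, (cx * math.cos(theta) - cy * math.sin(theta),
                           cx * math.sin(theta) + cy * math.cos(theta)))
    return best[1]

print(min_area_rect_center([(0, 0), (4, 0), (4, 2), (0, 2)]))  # ~ (2.0, 1.0)
```

In production one would typically use cv2.minAreaRect on the mask contour instead.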
7. The method for automatically sorting pipe fittings based on deep learning as claimed in claim 1, wherein the step 6 of converting between the camera and robot coordinate systems and calculating the pipe fitting grasping point in the coordinate system at the end of the robot arm comprises the following specific steps:
the position information of the pipe fitting in the camera coordinate system is obtained using Zhang Zhengyou's planar-target camera calibration method, the positional correspondence between the robot coordinate system and the camera coordinate system is obtained through hand-eye calibration, and an affine transformation is used, whose matrix gives:
X=m11×x+m12×y+m13
Y=m21×x+m22×y+m23
Z=m31×x+m32×y+m33
where (x, y) is an image pixel coordinate, (X, Y, Z) is the coordinate transformed to the end of the robot arm, m11, m12, m21, m22, m31, m32 are the rotation components, and m13, m23, m33 are the translation components.
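The three equations of claim 7 apply the 3 × 3 transform row by row; a direct sketch with a hypothetical calibration matrix (a real one comes from hand-eye calibration):

```python
def pixel_to_robot(M, x, y):
    """Apply the 3x3 hand-eye transform M = [[m11, m12, m13],
    [m21, m22, m23], [m31, m32, m33]] to an image pixel (x, y),
    exactly as in the claim's three equations."""
    X = M[0][0] * x + M[0][1] * y + M[0][2]
    Y = M[1][0] * x + M[1][1] * y + M[1][2]
    Z = M[2][0] * x + M[2][1] * y + M[2][2]
    return X, Y, Z

# Hypothetical calibration: pure translation by (10, 20), Z fixed at 5
M = [[1, 0, 10], [0, 1, 20], [0, 0, 5]]
print(pixel_to_robot(M, 3, 4))  # (13, 24, 5)
```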
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110039092.4A CN112784717B (en) | 2021-01-13 | 2021-01-13 | Automatic pipe fitting sorting method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112784717A CN112784717A (en) | 2021-05-11 |
CN112784717B true CN112784717B (en) | 2022-05-13 |
Family
ID=75755550
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110039092.4A Active CN112784717B (en) | 2021-01-13 | 2021-01-13 | Automatic pipe fitting sorting method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112784717B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113894058B (en) * | 2021-09-06 | 2024-03-01 | 东莞职业技术学院 | Quality detection and sorting method, system and storage medium based on deep learning |
CN113731857B (en) * | 2021-09-15 | 2023-04-25 | 重庆大学 | Automatic sorting system for small package medicine bags and control method thereof |
CN116188559A (en) * | 2021-11-28 | 2023-05-30 | 梅卡曼德(北京)机器人科技有限公司 | Image data processing method, device, electronic equipment and storage medium |
CN114648702B (en) * | 2022-04-02 | 2024-08-09 | 湖北工业大学 | Lobster sorting method, system, medium, equipment and terminal |
CN116071605B (en) * | 2023-03-07 | 2023-09-01 | 超音速人工智能科技股份有限公司 | Deep learning-based labeling method, device and storage medium |
CN116386028B (en) * | 2023-04-06 | 2023-10-03 | 扬州市管件厂有限公司 | Image layering identification method and device for processing tee pipe fitting |
CN117132828B (en) * | 2023-08-30 | 2024-03-19 | 常州润来科技有限公司 | Automatic classification method and system for solid waste in copper pipe machining process |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107024989A (en) * | 2017-03-24 | 2017-08-08 | 中北大学 | A kind of husky method for making picture based on Leap Motion gesture identifications |
CN108109174A (en) * | 2017-12-13 | 2018-06-01 | 上海电气集团股份有限公司 | A kind of robot monocular bootstrap technique sorted at random for part at random and system |
CN108126914A (en) * | 2017-11-24 | 2018-06-08 | 上海发那科机器人有限公司 | More object robots method for sorting at random in a kind of material frame based on deep learning |
CN108171748A (en) * | 2018-01-23 | 2018-06-15 | 哈工大机器人(合肥)国际创新研究院 | A kind of visual identity of object manipulator intelligent grabbing application and localization method |
CN108550169A (en) * | 2018-04-24 | 2018-09-18 | 中北大学 | The computational methods of the determination of pieces of chess position and its height in three dimensions |
CN110751691A (en) * | 2019-09-24 | 2020-02-04 | 同济大学 | Automatic pipe fitting grabbing method based on binocular vision |
CN111368852A (en) * | 2018-12-26 | 2020-07-03 | 沈阳新松机器人自动化股份有限公司 | Article identification and pre-sorting system and method based on deep learning and robot |
CN112170233A (en) * | 2020-09-01 | 2021-01-05 | 燕山大学 | Small part sorting method and system based on deep learning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10788836B2 (en) * | 2016-02-29 | 2020-09-29 | AI Incorporated | Obstacle recognition method for autonomous robots |
- 2021-01-13 CN CN202110039092.4A patent/CN112784717B/en active Active
Non-Patent Citations (2)
Title |
---|
A visual detection and recognition method for cluttered workpieces oriented to robotic sorting; Xie Xianwu et al.; High Technology Letters (《高技术通讯》); 2018-04-15 (No. 04); 344-353 *
Robot grasping system based on regions of interest; Ma Shichao et al.; Science Technology and Engineering (《科学技术与工程》); 2020-04-18 (No. 11); 4395-4403 *
Also Published As
Publication number | Publication date |
---|---|
CN112784717A (en) | 2021-05-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112784717B (en) | Automatic pipe fitting sorting method based on deep learning | |
CN112170233B (en) | Small part sorting method and system based on deep learning | |
US11878433B2 (en) | Method for detecting grasping position of robot in grasping object | |
CN108010078B (en) | Object grabbing detection method based on three-level convolutional neural network | |
CN108827316B (en) | Mobile robot visual positioning method based on improved Apriltag | |
CN112949564A (en) | Pointer type instrument automatic reading method based on deep learning | |
CN105046252B (en) | A kind of RMB prefix code recognition methods | |
CN110598609A (en) | Weak supervision target detection method based on significance guidance | |
CN110991435A (en) | Express waybill key information positioning method and device based on deep learning | |
CN108126914B (en) | Deep learning-based robot sorting method for scattered multiple objects in material frame | |
CN115816460B (en) | Mechanical arm grabbing method based on deep learning target detection and image segmentation | |
CN104268602A (en) | Shielded workpiece identifying method and device based on binary system feature matching | |
CN114693661A (en) | Rapid sorting method based on deep learning | |
CN115272204A (en) | Bearing surface scratch detection method based on machine vision | |
CN116228854B (en) | Automatic parcel sorting method based on deep learning | |
CN113657423A (en) | Target detection method suitable for small-volume parts and stacked parts and application thereof | |
CN109615610B (en) | Medical band-aid flaw detection method based on YOLO v2-tiny | |
CN114663382A (en) | Surface defect detection method for electronic component based on YOLOv5 convolutional neural network | |
CN116863463A (en) | Egg assembly line rapid identification and counting method | |
CN114800533B (en) | Sorting control method and system for industrial robot | |
CN114708234B (en) | Method and device for identifying number of detonators on automatic bayonet coding all-in-one machine | |
CN110334818A (en) | A kind of method and system of pipeline automatic identification | |
CN115753765A (en) | Method for identifying position and posture of cheese | |
CN114842335A (en) | Slotting target identification method and system for construction robot | |
CN111783537A (en) | Two-stage rapid grabbing detection method based on target detection characteristics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||