CN108846415A

CN108846415A - Object recognition device and method for industrial sorting robot

Info

Publication number: CN108846415A
Application number: CN201810496518.7A
Authority: CN
Inventors: 周庆华; 王乐; 王磊; 李分芳; 李野华
Original assignee: Changsha University of Science and Technology
Current assignee: Changsha University of Science and Technology
Priority date: 2018-05-22
Filing date: 2018-05-22
Publication date: 2018-11-20

Abstract

The invention relates to an object recognition device and method for an industrial sorting robot. The device includes a camera, a processor and a memory, and also includes a program that performs the following steps: acquiring image information of the target to be detected through the camera; loading a convolutional neural network framework and its trained model; using several candidate anchor boxes to generate bounding boxes in the feature map; predicting the category and category score corresponding to each bounding box; and obtaining the bounding box with the highest category score through non-maximum suppression. The trained model of the convolutional neural network is produced by a program with the following steps: pre-training the network; obtaining a labeled data set of the target object; performing cluster analysis on the target boxes in the labeled data set; obtaining the output of each network layer by forward propagation and computing the error between the output and the label; and computing the gradients of the weights and biases of each layer in reverse order according to the error, then adjusting the weights and biases of each layer. The detection speed of the invention is high, and real-time performance is achieved while accuracy is maintained.

Description

Object recognition device and method for industrial sorting robot

Technical field

The invention relates to industrial sorting robots, and in particular to an object recognition device and method for an industrial sorting robot.

Background

Automated production is increasingly widespread, and robots are widely used on all kinds of industrial assembly lines; mechanical sorting of workpieces is a common task in industrial processes. Recognizing and locating moving targets is the basis of sorting. Most early sorting robots relied on teach-in programming: they could carry out fixed operations but not more intelligent tasks. Conventional machine vision techniques, such as simple edge extraction, template matching and image enhancement, can detect target objects, but they struggle to sort target objects accurately in complex scenes.

A convolutional neural network (CNN) is mainly used to recognize two-dimensional patterns with invariance to distortion. It is a mathematical model that can imitate an animal brain: once trained with known patterns, it can learn a large number of mappings between inputs and outputs. Applying a CNN algorithm to a sorting robot can meet the sorting accuracy required in complex scenes. However, when the scene contains many densely packed distractors, multiple target objects must be processed at the same time, and the result is easily affected by factors such as illumination. Target recognition and localization are then slow and prone to missed and false detections, and cannot meet the real-time requirement of the gripper at the end of the robot arm during sorting.

These are the shortcomings of the prior art.

Summary of the invention

In view of the shortcomings of the prior art, the technical problem to be solved by the invention is to provide an object recognition device and method for an industrial sorting robot that help achieve real-time detection of target objects.

The object recognition device of the industrial sorting robot of the invention comprises a camera, a processor and a memory, and is characterized in that the device further comprises a program that performs the following steps:

acquiring image information of the target to be detected through the camera;

loading the convolutional neural network framework and its trained model to obtain several candidate anchor boxes;

using the anchor boxes to generate several bounding boxes in the feature map;

predicting the corresponding category and category score for each of these bounding boxes;

obtaining the bounding box with the highest category score through non-maximum suppression, its category being taken as the category of the target.

The trained model of the convolutional neural network that is loaded is produced by a program with the following steps:

first pre-training the network on the ImageNet data set to obtain a pre-trained model of the network;

then acquiring a data set of the target object through the camera and labeling it manually to obtain a labeled data set;

clustering the widths and heights of the target boxes in the labeled data set with k-means to obtain several sets of candidate anchor boxes;

obtaining the output of each network layer through forward propagation, and computing the error between the output and the label;

computing the gradients of the weights and biases of each layer in reverse order according to the error, and adjusting the weights and biases of each layer.

Here k = 2.

In pre-training, multi-scale training is used, and the size of the input image is changed every 10 rounds.

In pre-training, 32 is used as the downsampling factor, and the size of the input image is kept between 320 and 608.

The convolutional neural network is an improvement on Darknet-19: the last convolutional layer is removed, and three convolutional layers with 3x3 kernels and 1024 channels plus one convolutional layer with a 1x1 kernel are added.

The object recognition method for an industrial sorting robot of the invention is characterized by comprising the following steps:

The trained model of the convolutional neural network that is loaded is produced by the following steps:

The beneficial effects of the invention are as follows. First, the model trained with the convolutional neural network can recognize and locate targets in complex scenes. Because the whole image is used to train the model and the entire detection pipeline is a single convolutional neural network, detection performance can be optimized end to end; detection is therefore very fast, and the real-time requirement is met while accuracy is maintained. Second, the convolutional neural network of the invention is an improvement on the base network Darknet-19: three convolutional layers with 3x3 kernels and 1024 channels and one convolutional layer with a 1x1 kernel are added. This deepens the network while reducing its training parameters, allows the network to extract richer feature information, and improves the accuracy with which it recognizes target objects. Third, because the network is trained at multiple scales, it is robust to input images of different sizes. Finally, cluster analysis of the hand-labeled bounding boxes in the data set reveals their statistical regularities, and k = 2 is found to give good sorting results.

Description of the drawings

Fig. 1 is a flow chart of object detection for the industrial sorting robot of the invention.

Fig. 2 is a flow chart of the training stage of the convolutional neural network of the industrial sorting robot of the invention.

Fig. 3 is the network structure table of the convolutional neural network of the industrial sorting robot of the invention.

Fig. 4 is a plot of the loss function against the number of training iterations of the convolutional neural network of the invention.

Fig. 5 is a plot of the cost function against the value of k for the industrial sorting robot of the invention.

Detailed description

The invention is now described in further detail with reference to the drawings and embodiments.

The invention applies an improved YOLOv2 convolutional neural network algorithm to a sorting robot for machined parts, implementing sorting based on machine vision. The system acquires images through the camera on the conveyor belt, recognizes and locates the target parts among parts of different types in real time, and grasps them with the gripper at the end of the robot arm so that they are placed in order.

Referring to Fig. 1 and Fig. 2, the YOLOv2 convolutional neural network algorithm treats object detection as a regression problem. It predicts the positions and categories of multiple target boxes in a single real-time pass and performs localization and classification of target objects in one network, so the whole detection process is fully unified.

Because generating bounding boxes from the grid alone gives a low recall, YOLOv2 applies k-means cluster analysis to the hand-labeled bounding boxes in the data set to find their statistical regularities, and uses anchor boxes with the resulting widths and heights to generate bounding boxes in the feature map, which improves recall and localization accuracy.

Several bounding boxes may be produced for one target object. Each bounding box contains the variables x, y, w and h, a confidence score and category scores, where x and y are the center coordinates of the bounding box relative to the coordinate origin, and w and h are its width and height relative to the whole image. The confidence score reflects the accuracy of the bounding box prediction.

Non-maximum suppression is then used to retain the bounding box with the highest confidence score, so that the object is located precisely.

The feature information extracted by the convolutional layers is fed into a softmax classifier. Among the category probabilities predicted for each bounding box, the prediction with the highest category probability score is likewise retained through non-maximum suppression, which classifies the object.
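The suppression step can be illustrated with a minimal sketch in plain Python. The patent does not give an overlap criterion, so the intersection-over-union (IoU) threshold of 0.5, the corner-format box layout and the function names below are assumptions made for illustration only.

```python
from typing import List, Tuple

# Assumed layout for one detection: (x_min, y_min, x_max, y_max, score).
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: List[Box], iou_threshold: float = 0.5) -> List[Box]:
    """Keep the highest-scoring box of every group of overlapping boxes."""
    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept: List[Box] = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Discard every lower-scoring box that overlaps the kept box too much.
        remaining = [b for b in remaining if iou(best, b) < iou_threshold]
    return kept


# Hypothetical detections: two overlapping boxes on one part, one box on another.
detections = [
    (10, 10, 50, 50, 0.9),
    (12, 12, 52, 52, 0.6),
    (100, 100, 140, 140, 0.8),
]
kept = non_max_suppression(detections)
```

In this example the 0.6 box overlaps the 0.9 box heavily and is dropped, while the box on the second part survives because it does not overlap either of the others.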

The YOLOv2 convolutional neural network algorithm is applied to the sorting of machined parts. The network is not trained with sliding windows or extracted candidate regions; instead, the whole image is used directly to train the model. The entire detection pipeline is a single convolutional neural network whose detection performance can be optimized end to end, so detection is very fast and the real-time requirement is met while accuracy is maintained.

Labeled data are scarce and often of low resolution. If the network is trained directly on the labeled data set that was produced, accuracy tends to be low and localization poor, so pre-training is used in practice. The ImageNet data set has more than 14 million images covering more than 20,000 categories; the images are clear and of high resolution, and most of them carry explicit category labels. The network is therefore pre-trained on ImageNet to obtain a pre-trained model, which gives the network a general understanding of target objects.

Next, a data set of the target object is acquired through the camera and labeled manually to obtain a labeled data set; for example, the network is retrained with the data set of 1000 hand-labeled nuts and washers that we collected. The widths and heights in the labeled data set are then clustered with k-means, and the optimal value of k is obtained from the relationship between k and the cost function; as shown in Fig. 5, sorting performs well when k = 2. Cluster analysis with the optimal k then yields several sets of anchor boxes.
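The clustering of box widths and heights can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the patent does not name the distance metric, so the sketch borrows the 1 - IoU distance commonly used with YOLOv2 anchors; the sample sizes, the deterministic initialisation and the function names are hypothetical.

```python
from typing import List, Tuple

Size = Tuple[float, float]  # (width, height) of a labeled box


def size_iou(a: Size, b: Size) -> float:
    """IoU of two boxes assumed to share the same centre."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(sizes: List[Size], k: int, iters: int = 50) -> Tuple[List[Size], float]:
    """Cluster box sizes and return (anchors, mean 1 - IoU cost)."""
    ordered = sorted(sizes, key=lambda s: s[0] * s[1])
    # Deterministic start: k sizes spread evenly over the area-sorted list.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters: List[List[Size]] = [[] for _ in range(k)]
        for s in sizes:
            nearest = max(range(k), key=lambda j: size_iou(s, centroids[j]))
            clusters[nearest].append(s)
        updated = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    cost = sum(1 - max(size_iou(s, c) for c in centroids) for s in sizes) / len(sizes)
    return centroids, cost


# Hypothetical box sizes in pixels: three small parts and three large parts.
sizes = [(38, 40), (40, 42), (42, 40), (78, 80), (80, 82), (82, 80)]
costs = {k: kmeans_anchors(sizes, k)[1] for k in (1, 2, 3)}
anchors, _ = kmeans_anchors(sizes, 2)
```

Plotting `costs` against k gives the kind of curve the text attributes to Fig. 5: the cost falls sharply from k = 1 to k = 2 for data with two size groups, and adding further clusters brings little improvement.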

The training stage of the network consists mainly of forward propagation and back propagation. Forward propagation computes the output of each layer in turn; back propagation computes the gradients of the weights and biases of each layer in reverse order according to the error and, once the computation is complete, adjusts the weights and biases of each layer. During training, forward propagation first yields the output of each layer and then the error between the output and the label; back propagation then updates the parameters repeatedly to reduce the loss function until the network has fully converged. The loss function is visualized during training to check whether training has converged and whether the accuracy meets the preset requirement.
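The forward and backward passes can be shown on a deliberately tiny model. The sketch below fits a single linear unit y = w*x + b with a mean squared error loss in plain Python; it is a stand-in chosen for brevity rather than the detection network, and the learning rate, epoch count and data are assumptions.

```python
def train_linear(samples, lr=0.05, epochs=500):
    """Fit y = w*x + b by gradient descent; returns (w, b, loss_history)."""
    w, b = 0.0, 0.0
    history = []
    n = len(samples)
    for _ in range(epochs):
        # Forward propagation: compute outputs and their error against the labels.
        errors = [(w * x + b) - y for x, y in samples]
        history.append(sum(e * e for e in errors) / n)
        # Back propagation: gradients of the mean squared error w.r.t. w and b.
        grad_w = 2 * sum(e * x for e, (x, _) in zip(errors, samples)) / n
        grad_b = 2 * sum(errors) / n
        # Adjust the weight and the bias against their gradients.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, history


# Hypothetical data generated from y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b, history = train_linear(samples)
```

The list `history` plays the role of the loss curve that the text says is visualized: training is judged to have converged once the values stop decreasing.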

Referring to Fig. 4, when the number of training iterations reaches 40,000 the model has converged, and the average loss is 1.0 at that point. The trained model can accurately recognize and locate target objects in complex scenes.

The trained model reaches a speed of 20 frames per second on the conveyor belt, which meets the real-time requirement. Tests on single images show that, for any input image containing the target object, the model trained with the convolutional neural network can detect the target object in a complex scene and can classify and localize it accurately.

To make the network robust to input images of different sizes, multi-scale training is used: the size of the input image is changed after every 10 rounds of training. During training, 32 is used as the downsampling factor, and the input image size is kept between 320 and 608.
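With a downsampling factor of 32, the admissible input sizes are the multiples of 32 from 320 to 608. The sketch below enumerates them and builds a size schedule that changes every 10 rounds; the text does not say how each new size is chosen, so the uniform random choice, the fixed seed and the function names are assumptions.

```python
import random


def candidate_sizes(low=320, high=608, stride=32):
    """Input sizes that are multiples of the downsampling factor."""
    return list(range(low, high + 1, stride))


def size_schedule(num_rounds, period=10, seed=0):
    """Input size used in each training round; a new size is drawn every `period` rounds."""
    rng = random.Random(seed)
    sizes = candidate_sizes()
    current = rng.choice(sizes)
    schedule = []
    for round_index in range(num_rounds):
        if round_index > 0 and round_index % period == 0:
            current = rng.choice(sizes)
        schedule.append(current)
    return schedule


sizes = candidate_sizes()
schedule = size_schedule(30)
```

Here `sizes` holds ten values (320, 352, and so on up to 608), and `schedule` keeps one size for each block of 10 consecutive rounds.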

Referring to Fig. 3, the network structure of the invention is implemented as a convolutional neural network (CNN). A CNN uses convolution to model feature discrimination, reduces the number of network parameters by orders of magnitude through weight sharing and pooling, and finally completes tasks such as classification with a conventional neural network. Darknet-19, the base model of YOLO, has 19 convolutional layers. The first layer is a convolutional layer with 32 channels and a 3x3 kernel; the second layer is a max pooling layer with a 2x2 kernel and a stride of 2; the third layer is a convolutional layer with 64 channels and a 3x3 kernel; the fourth layer is a max pooling layer with a 2x2 kernel and a stride of 2. The fifth, sixth and seventh layers are convolutional layers with kernel sizes of 3x3, 1x1 and 3x3 and 128, 64 and 128 channels respectively; the eighth layer is a max pooling layer with a 2x2 kernel and a stride of 2. The ninth, tenth and eleventh layers are convolutional layers with kernel sizes of 3x3, 1x1 and 3x3 and 256, 128 and 256 channels respectively; the twelfth layer is a max pooling layer with a 2x2 kernel and a stride of 2. The thirteenth to seventeenth layers are convolutional layers with kernel sizes of 3x3, 1x1, 3x3, 1x1 and 3x3 and 512, 256, 512, 256 and 512 channels respectively; the eighteenth layer is a max pooling layer with a 2x2 kernel and a stride of 2. The nineteenth to twenty-third layers are convolutional layers with kernel sizes of 3x3, 1x1, 3x3, 1x1 and 3x3 and 1024, 512, 1024, 512 and 1024 channels respectively; the twenty-fourth layer is a convolutional layer with a 1x1 kernel and 1000 channels. Darknet-19 makes extensive use of cascaded convolutions, with kernels mainly of two sizes, 3x3 and 1x1. Borrowing the idea of Network in Network, a 1x1 convolution is inserted between the 3x3 convolutions. The convolutional layers extract the feature information of the target object, while the max pooling layers extract its key information, which reduces redundant information and the number of parameters to be trained.
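The layer list above can be written down as data and checked mechanically. The sketch below encodes the 24 layers as tuples and counts the convolutional and pooling layers; the tuple layout is a hypothetical representation, and the fifteenth layer is taken as 3x3, following the alternating 3x3 and 1x1 pattern of the neighbouring layers.

```python
# Each entry is (layer type, kernel size, output channels); pooling layers have no channel count.
DARKNET19 = [
    ("conv", 3, 32), ("pool", 2, None),
    ("conv", 3, 64), ("pool", 2, None),
    ("conv", 3, 128), ("conv", 1, 64), ("conv", 3, 128), ("pool", 2, None),
    ("conv", 3, 256), ("conv", 1, 128), ("conv", 3, 256), ("pool", 2, None),
    ("conv", 3, 512), ("conv", 1, 256), ("conv", 3, 512),
    ("conv", 1, 256), ("conv", 3, 512), ("pool", 2, None),
    ("conv", 3, 1024), ("conv", 1, 512), ("conv", 3, 1024),
    ("conv", 1, 512), ("conv", 3, 1024),
    ("conv", 1, 1000),
]

num_conv = sum(1 for kind, _, _ in DARKNET19 if kind == "conv")
num_pool = sum(1 for kind, _, _ in DARKNET19 if kind == "pool")

# Every 2x2 max pooling layer with stride 2 halves the spatial resolution.
total_stride = 2 ** num_pool
```

The five pooling layers give a total stride of 32, which matches the downsampling factor used to choose the input sizes.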

This work improves on Darknet-19: the last convolutional layer is removed, and three convolutional layers with 3x3 kernels and 1024 channels plus one convolutional layer with a 1x1 kernel are added. This deepens the network while reducing its training parameters, allows the network to extract richer feature information, and improves the accuracy with which it recognizes target objects. Anchor boxes are used to predict bounding boxes on the feature map.
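The size of the prediction tensor produced by the final 1x1 layer can be estimated with a short calculation. The text does not state the number of output channels, so this sketch assumes the usual YOLOv2 convention of (4 box values + 1 confidence + class scores) per anchor; the choice of two anchors (k = 2) and two classes (nuts and washers) is likewise an assumption drawn from the example data set.

```python
def output_shape(input_size, num_anchors=2, num_classes=2, stride=32):
    """Return (grid_h, grid_w, channels) of the prediction tensor for a square input."""
    if input_size % stride != 0:
        raise ValueError("input size must be a multiple of the downsampling factor")
    grid = input_size // stride
    # Per anchor: x, y, w, h, confidence, plus one score per class (assumed layout).
    channels = num_anchors * (5 + num_classes)
    return grid, grid, channels


smallest = output_shape(320)
largest = output_shape(608)
```

Under these assumptions the grid ranges from 10x10 cells at the smallest training size to 19x19 cells at the largest, with 14 channels per cell.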

The above describes only preferred embodiments of the invention and does not limit the invention in any form. Although the invention has been disclosed above by way of preferred embodiments, they are not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the invention, use the technical content disclosed above to make minor changes or modifications that result in equivalent embodiments. Any simple modification, equivalent change or refinement made to the above embodiments according to the technical essence of the invention, without departing from the content of its technical solution, still falls within the scope of the technical solution of the invention.

Claims

1. A target recognition device of an industrial sorting robot, comprising a camera, a processor and a memory, is characterized in that: the device also includes a program that performs the following steps:

Obtain the picture information of the target to be tested through the camera;

Load the convolutional neural network framework and its training model to obtain several candidate anchor boxes;

Use anchor boxes to generate several bounding boxes in the feature map;

Predict the corresponding categories and category scores for the above-mentioned several bounding boxes;

The bounding box with the highest category score is obtained by non-maximum suppression, and its corresponding category is used as the category of the target.

2. The target recognition device of an industrial sorting robot according to claim 1, characterized in that: the trained model of the convolutional neural network that is loaded is produced by a program with the following steps:

First, pre-train the network on the ImageNet dataset to obtain the pre-training model of the network;

Then obtain the data set of the target object through the camera, perform manual labeling, and obtain the label data set;

The width and height of the target box in the label data set are clustered and analyzed by k-means, and several sets of candidate anchor boxes are obtained;

Obtain the network output of each layer through forward propagation, and obtain the error between the output and the label;

Calculate the gradients of the weights and biases of each layer in reverse order according to the error, and adjust the weights and biases of each layer.

3. The object recognition device of an industrial sorting robot according to claim 2, characterized in that: said k=2.

4. The object recognition device of an industrial sorting robot according to claim 2, characterized in that: in the pre-training, multi-scale training is adopted, and the size of the input picture is changed every 10 rounds.

5. The object recognition device of an industrial sorting robot according to claim 4, characterized in that: in the pre-training, 32 is used as the downsampling factor to keep the size of the input picture between 320 and 608.

6. The object recognition device of an industrial sorting robot according to any one of claims 1 to 5, characterized in that: the convolutional neural network is an improvement on Darknet-19, in which the last convolutional layer is removed, and three convolutional layers with a kernel size of 3x3 and 1024 channels and one convolutional layer with a kernel size of 1x1 are added.

7. A target recognition method for an industrial sorting robot, characterized in that it comprises the steps:

Obtain the picture information of the target to be tested through the camera;

Use anchor boxes to generate several bounding boxes in the feature map;

8. The target recognition method for an industrial sorting robot according to claim 7, characterized in that: the trained model of the convolutional neural network that is loaded is produced by the following steps: