CN111178405A - Similar object identification method fusing multiple neural networks - Google Patents
Similar object identification method fusing multiple neural networks
- Publication number
- CN111178405A CN201911310303.2A
- Authority
- CN
- China
- Prior art keywords
- training
- image
- neural network
- setting
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000013528 artificial neural network Methods 0.000 title claims abstract description 69
- 238000000034 method Methods 0.000 title claims abstract description 29
- 238000012549 training Methods 0.000 claims abstract description 73
- 238000013527 convolutional neural network Methods 0.000 claims abstract description 34
- 230000007246 mechanism Effects 0.000 claims abstract description 12
- 238000012795 verification Methods 0.000 claims abstract description 10
- 238000002372 labelling Methods 0.000 claims abstract description 7
- 238000007781 pre-processing Methods 0.000 claims abstract description 5
- 238000011478 gradient descent method Methods 0.000 claims description 12
- 238000012545 processing Methods 0.000 claims description 11
- 238000010586 diagram Methods 0.000 claims description 5
- 238000012216 screening Methods 0.000 claims description 5
- 230000009466 transformation Effects 0.000 claims description 4
- 238000013519 translation Methods 0.000 claims description 4
- 230000002238 attenuated effect Effects 0.000 claims description 3
- 238000012217 deletion Methods 0.000 claims description 3
- 230000037430 deletion Effects 0.000 claims description 3
- 238000003062 neural network model Methods 0.000 claims description 3
- 230000008569 process Effects 0.000 claims description 3
- 230000001360 synchronised effect Effects 0.000 claims description 3
- 230000006870 function Effects 0.000 description 7
- 238000001514 detection method Methods 0.000 description 6
- 238000000605 extraction Methods 0.000 description 3
- 238000001914 filtration Methods 0.000 description 3
- 238000012937 correction Methods 0.000 description 2
- 230000007547 defect Effects 0.000 description 2
- 238000005457 optimization Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 238000007405 data analysis Methods 0.000 description 1
- 238000003708 edge detection Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000009432 framing Methods 0.000 description 1
- 238000005286 illumination Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
- 230000007306 turnover Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a similar object identification method fusing multiple neural networks. The method acquires a plurality of image sets containing similar objects through a camera, preprocesses and labels the image sets, and expands the processed data sets with a data enhancement method. The training sample sets are then trained with Cascade R-CNN, Grid R-CNN, Libra R-CNN and Retina-Net respectively; the training results of the four networks are integrated, and a multi-network voting mechanism is set according to the verification accuracy to obtain the final integrated recognition network. A real-time image of the object to be recognized is imported through the camera, the object is recognized by the integrated neural network, and the recognition result for the object in the image is finally output. The invention realizes identification of similar objects by integrating the recognition results of multiple neural networks, and can identify similar objects in a short time with high accuracy.
Description
Technical Field
The invention relates to the technical field of electric digital data processing, in particular to a similar object identification method fusing multiple neural networks for image data analysis and processing.
Background
In recent years, technological innovation in the field of image recognition has been developing continuously, and many industries are actively transforming and upgrading, using image recognition technology to reduce costs and improve efficiency. Image recognition has become an important research direction in areas such as warehouse storage and inventory and the recognition of same-type articles in supermarkets; using a computer to efficiently distinguish similar objects helps reduce the labor intensity of workers, improve working efficiency and reduce up-front costs.
In the prior art, besides discrimination of objects by humans, there are techniques that process images by computer and distinguish similar objects in an image. Currently, target identification is mainly performed as follows:
(1) preprocessing a template image and an image of the object to be detected to generate a feature library; detecting object edges in the image with operators such as the Sobel, Roberts, Prewitt and Kirsch operators; removing stray edges, burrs and the like from the edge image to preserve edge connectivity; and measuring the similarity of the object to be detected with the image-contour feature-matching cost for discrimination and identification;
(2) identifying and distinguishing similar pictures with a combination of Python and OpenCV: the images are processed with algorithms such as average hash and perceptual hash, the Hamming distance is computed, and the magnitude of the Hamming distance represents the degree of similarity between the objects (a brief sketch follows).
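As an illustration of the hash-based approach in (2), the following is a minimal sketch using OpenCV and NumPy; the hash size, file names and similarity threshold are illustrative assumptions rather than values taken from the prior art described above.

```python
# Minimal average-hash comparison sketch (illustrative, not from the patent).
import cv2
import numpy as np

def average_hash(image_path: str, hash_size: int = 8) -> np.ndarray:
    """Shrink to hash_size x hash_size, grayscale, threshold at the mean."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)  # path is illustrative
    small = cv2.resize(img, (hash_size, hash_size), interpolation=cv2.INTER_AREA)
    return (small > small.mean()).flatten()

def hamming_distance(h1: np.ndarray, h2: np.ndarray) -> int:
    """Number of differing bits; a smaller distance means more similar images."""
    return int(np.count_nonzero(h1 != h2))

d = hamming_distance(average_hash("a.jpg"), average_hash("b.jpg"))
print(f"Hamming distance: {d}")  # below ~10 of 64 bits is often treated as similar
```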
However, in the above technical solutions, edge detection takes a long time and the recognition efficiency of the Python-and-OpenCV approach is not high; both methods have certain limitations.
Disclosure of Invention
In view of the above prior art and background, the invention addresses the following problems: the traditional method of manually distinguishing similar objects takes a long time and consumes labor, hindering the competitiveness of an enterprise; and general computer neural-network recognition algorithms take a long time to recognize similar objects, their detection rate cannot meet practical requirements, and they place strict demands on the clarity of the acquired images.
The invention adopts the technical scheme that a similar object identification method fusing multiple neural networks comprises the following steps:
Step 1: acquiring a plurality of image sets containing similar objects through a camera;
Step 2: preprocessing and labeling the image sets;
Step 3: expanding the processed data set with a data enhancement method to obtain a training sample set meeting the number of samples required for neural network training;
Step 4: training the training sample set with Cascade R-CNN, Grid R-CNN, Libra R-CNN and Retina-Net respectively;
Step 5: integrating the results of the four network trainings and setting a multi-network voting mechanism that outputs results according to verification accuracy, obtaining the final integrated neural network;
Step 6: importing the image of the object to be recognized in real time through the camera, recognizing the object through the integrated neural network, and finally outputting the recognition result of the object to be recognized in the image.
Further, in the step 1, the similar objects are objects of the same type and similar shapes.
Further, in step 2, the image processing specifically includes the following steps:
Step 2.1: manually screening and checking all images in the image set, and removing images whose object occlusion rate exceeds a preset value;
Step 2.2: performing bounding-box annotation on the screened image set, and adding the specific category and name of each object in the image to the annotation file.
Further, in step 3, the data enhancement includes the following steps:
Step 3.1: for each image in the processed image set, selecting one or more of random scaling, translation, rotation, flipping, pixel deletion, contrast adjustment and affine transformation, so that the total number of images is expanded to 16 times with no duplicate images;
Step 3.2: calculating the bounding box of each object after data enhancement and storing it in the annotation file, realizing synchronous expansion of the picture set and the bounding boxes and completing the data enhancement;
Step 3.3: obtaining, after data enhancement, a training sample set meeting the number of samples required for neural network training.
Further, in step 4, initializing the neural network training models includes the following steps:
Step 4.1: initializing the Cascade R-CNN neural network training model: training a plurality of cascaded detectors; using stochastic gradient descent to set the learning rate, momentum and weight decay rate; setting a linear decay that attenuates the learning rate once every A steps; and setting the total number of training rounds, each round training on all the data;
Step 4.2: initializing the Grid R-CNN neural network training model: using stochastic gradient descent to set the learning rate, momentum and weight decay rate; setting a linear decay that attenuates the learning rate once every B steps; and setting the total number of training rounds, each round training on all the data;
Step 4.3: initializing the Libra R-CNN neural network training model: using stochastic gradient descent to set the learning rate, momentum and weight decay rate; setting a linear decay that attenuates the learning rate once every C steps; and setting the total number of training rounds, each round training on all the data;
Step 4.4: initializing the Retina-Net neural network training model: using stochastic gradient descent to set the learning rate, momentum and weight decay rate; setting a linear decay that attenuates the learning rate once every D steps; and setting the total number of training rounds, each round training on all the data.
Further, in step 5, a weight is assigned to each network according to the output accuracy of the four neural network models of step 4 on the verification set, and a multi-network voting mechanism is established on the basis of these weights; the voting mechanism combines the recognition result of each network with its proportional weight to output the final result, realizing an integrated neural network that fuses the recognition results of the various networks.
Further, in step 6, the integrated neural network can automatically process images transmitted by the camera in real time, with a processing speed of 0.2 s/frame.
The invention provides a similar object identification method fusing multiple neural networks, which combines the advantages of single networks into a multi-network ensemble. A large number of image sets containing similar objects are acquired through a camera and screened; after the data are labeled, the data set is enhanced and expanded to obtain a training sample set meeting the requirements of the neural networks. The target data set is trained with Cascade R-CNN, Grid R-CNN, Libra R-CNN and Retina-Net respectively, with the learning rate, momentum and weight decay rate set by the stochastic gradient descent optimization method; corresponding weights are determined from the verification set, and finally a voting mechanism integrates the training results of the four networks into a final multi-network recognition model. This network combines the advantages of each single network and achieves a higher recognition rate. Finally, a real-time image of the object to be recognized in the environment is imported through a camera, the object is recognized and identified by the integrated multi-neural-network recognition model, and the recognition result for the object in the image is output.
The invention integrates multiple neural networks and combines the recognition advantages of each single network: objects can be distinguished in a short time with a high recognition rate, and similar objects in real-time images can be identified efficiently and accurately. It overcomes the low efficiency and high cost of manual identification in the prior art, and at the same time makes up for the low recognition speed and accuracy of existing single networks. It can meet the requirements of practical use and can be widely applied in the future to warehouse article counting, supermarket article counting and other areas; mounted on a robot's camera, it can better realize unmanned operation and improve operating efficiency.
Drawings
FIG. 1 is a flow chart of similar object recognition with multiple neural networks fused.
Fig. 2 is a diagram of the recognition effect displayed by the built test bed.
Detailed Description
The present invention is described in further detail with reference to the following examples, but the scope of the present invention is not limited thereto.
The invention relates to a similar object identification method fusing multiple neural networks, which comprises the following steps.
Step 1: acquiring a plurality of image sets with similar objects through a camera;
in the step 1, the similar objects are the same type and similar in shape.
In the present invention, similar objects, such as mineral water of different brands, are generally objects of the same type and similar shapes.
In the invention, the image set is shot manually to obtain representative data sets with different scales, angles, placing modes and illumination conditions, and the representative data sets can be selected by the technicians in the field.
Step 2: preprocessing and labeling the image set;
In step 2, the image processing specifically includes the following steps:
Step 2.1: manually screening and checking all images in the image set, and removing images whose object occlusion rate exceeds a preset value;
Step 2.2: performing bounding-box annotation on the screened image set, and adding the specific category and name of each object in the image to the annotation file.
In the present invention, the preset value in step 2.1 may be set based on the features of the image set to be recognized and trained, for example, 70%.
In the invention, the annotation in step 2.2 may be done manually, or by computer identification with automatic box selection.
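The patent does not prescribe a particular annotation file format; as a hedged sketch, a COCO-style JSON entry is one common way to store the bounding box together with the specific category and name required by step 2.2. All field names and values below are illustrative assumptions.

```python
# Illustrative annotation entry (format and field names are assumptions).
import json

annotation = {
    "image": "shelf_001.jpg",                # illustrative file name
    "objects": [
        {"bbox": [120, 45, 310, 480],        # [x_min, y_min, x_max, y_max]
         "category": "bottled_water",        # specific category of the object
         "name": "brand_A_550ml"}            # specific name, per step 2.2
    ],
}
with open("shelf_001.json", "w") as f:
    json.dump(annotation, f, indent=2)
```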
Step 3: expanding the processed data set with a data enhancement method to obtain a training sample set meeting the number of samples required for neural network training;
in step 3, the data enhancement includes the following steps:
Step 3.1: for each image in the processed image set, selecting one or more of random scaling, translation, rotation, flipping, pixel deletion, contrast adjustment and affine transformation, so that the total number of images is expanded to 16 times with no duplicate images;
Step 3.2: calculating the bounding box of each object after data enhancement and storing it in the annotation file, realizing synchronous expansion of the picture set and the bounding boxes and completing the data enhancement;
Step 3.3: obtaining, after data enhancement, a training sample set meeting the number of samples required for neural network training.
In the invention, an embodiment of the data expansion in step 3.1 is given: the flip probability is set to 0.5, the contrast adjustment to 1.5, the rotation angle to between -45 and 45 degrees, the Gaussian filtering to 0-3.0, the mean filtering to 2-7, the median filtering to 3-11, the pixel-deletion ratio to 0.01-0.1, the color-channel swap probability to 0.05, the brightness adjustment to +/- 10 pixel values, and the translation distance to 10% of the image width.
In the invention, in step 3.2, because the enhancement operations follow mathematical rules, the bounding box or polygonal mask of each enhanced object can be calculated through those rules and stored in the annotation file.
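As a hedged sketch of steps 3.1 and 3.2, the imgaug library can apply the example parameters of the embodiment above while transforming the bounding boxes synchronously with the images; the patent does not name a library, so the API choice and file names are assumptions.

```python
# Augmentation sketch using imgaug with the example parameters given above.
import cv2
import imgaug.augmenters as iaa
from imgaug.augmentables.bbs import BoundingBox, BoundingBoxesOnImage

seq = iaa.Sequential([
    iaa.Fliplr(0.5),                                   # flip probability 0.5
    iaa.LinearContrast(1.5),                           # contrast adjusted to 1.5
    iaa.Affine(rotate=(-45, 45),                       # rotation in [-45, 45] degrees
               translate_percent={"x": (-0.1, 0.1)}),  # translation up to 10% of width
    iaa.GaussianBlur(sigma=(0.0, 3.0)),                # Gaussian filtering 0-3.0
    iaa.AverageBlur(k=(2, 7)),                         # mean filtering kernel 2-7
    iaa.MedianBlur(k=(3, 11)),                         # median filtering kernel 3-11
    iaa.Dropout(p=(0.01, 0.1)),                        # pixel-deletion ratio 0.01-0.1
    iaa.ChannelShuffle(0.05),                          # color-channel swap probability 0.05
    iaa.Add((-10, 10)),                                # brightness +/- 10 pixel values
], random_order=True)

image = cv2.imread("sample.jpg")                       # illustrative file name
boxes = BoundingBoxesOnImage(
    [BoundingBox(x1=120, y1=45, x2=310, y2=480)], shape=image.shape)
# Boxes are transformed together with the image, so the annotation file can
# be expanded synchronously with the picture set (step 3.2).
aug_image, aug_boxes = seq(image=image, bounding_boxes=boxes)
```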
Step 4: training the training sample set with Cascade R-CNN, Grid R-CNN, Libra R-CNN and Retina-Net respectively;
In step 4, the four separate neural network training models are initialized as follows:
Step 4.1: initializing the Cascade R-CNN neural network training model: training a plurality of cascaded detectors; using stochastic gradient descent to set the learning rate, momentum and weight decay rate; setting a linear decay that attenuates the learning rate once every A steps; and setting the total number of training rounds, each round training on all the data;
Step 4.2: initializing the Grid R-CNN neural network training model: using stochastic gradient descent to set the learning rate, momentum and weight decay rate; setting a linear decay that attenuates the learning rate once every B steps; and setting the total number of training rounds, each round training on all the data;
Step 4.3: initializing the Libra R-CNN neural network training model: using stochastic gradient descent to set the learning rate, momentum and weight decay rate; setting a linear decay that attenuates the learning rate once every C steps; and setting the total number of training rounds, each round training on all the data;
Step 4.4: initializing the Retina-Net neural network training model: using stochastic gradient descent to set the learning rate, momentum and weight decay rate; setting a linear decay that attenuates the learning rate once every D steps; and setting the total number of training rounds, each round training on all the data.
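A minimal PyTorch sketch of the shared optimizer setup in steps 4.1 to 4.4 follows; the concrete learning rate, momentum, weight decay, step size (A, B, C or D) and round count are placeholders, since the patent leaves these values unspecified.

```python
# Optimizer and step decay shared by the four detectors (values are placeholders).
import torch

def make_optimizer(model: torch.nn.Module, step_size: int):
    # Stochastic gradient descent with momentum and a weight decay rate.
    optimizer = torch.optim.SGD(model.parameters(),
                                lr=0.02, momentum=0.9, weight_decay=1e-4)
    # Attenuate the learning rate once every `step_size` steps
    # (A, B, C or D, depending on the network).
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                                step_size=step_size, gamma=0.1)
    return optimizer, scheduler

# Each round (epoch) trains on all of the data, for a set total of rounds:
# for epoch in range(total_rounds):
#     for images, targets in train_loader:
#         loss = compute_loss(model, images, targets)
#         optimizer.zero_grad(); loss.backward()
#         optimizer.step(); scheduler.step()
```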
In the invention, the Cascade R-CNN consists of a residual convolutional neural network of depth 50 and a feature pyramid network of depth 4; the residual network has 4 stages, and the channel numbers of the pyramid levels are 256, 512, 1024 and 2048. The region proposal network of the residual convolutional neural network has 256 input channels and 256 output channels, the window-frame aspect ratios are 0.5, 1.0 and 2.0, and the window strides on the feature layers are 4, 8, 16, 32 and 64; the classification loss uses cross entropy with the network's Sigmoid function for classification, the regression loss uses the SmoothL1Loss function, and both losses have weight 1. The target-detection extraction layer has output size 7 and sampling number 2, with feature-map strides of 4, 8, 16 and 32; the IoU thresholds of the cascaded target-detection layers are 0.5, 0.6 and 0.7, and the final connected layer has 256 input channels and 1024 output channels. Training a plurality of cascaded detectors addresses the problem of noise interference in the detection boxes.
In the invention, Grid R-CNN replaces the conventional regression-based approach to target-position correction with accurate correction of the target localization box through a fully convolutional network. It consists of a residual convolutional neural network of depth 50 and a feature pyramid network of depth 4; the residual network has 4 stages, and the channel numbers of the pyramid levels are 256, 512, 1024 and 2048. The region proposal network has 256 input channels and 256 output channels, the window-frame aspect ratios are 0.5, 1.0 and 2.0, and the window strides on the feature layers are 4, 8, 16, 32 and 64; the classification loss uses cross entropy with the network's Sigmoid function for classification, the regression loss uses the SmoothL1Loss function, and both losses have weight 1. The target-detection extraction layer of the network uses 9 grid points with 256 input channels and a cross-entropy loss, and the final connected layer has 256 input channels and 1024 output channels.
In the invention, Libra R-CNN consists of a residual convolutional neural network of depth 50 and a feature pyramid network of depth 4; the residual network has 4 stages, and the channel numbers of the pyramid levels are 256, 512, 1024 and 2048. The region proposal network has 256 input channels and 256 output channels, the window-frame aspect ratios are 0.5, 1.0 and 2.0, and the window strides on the feature layers are 4, 8, 16, 32 and 64; the classification loss uses cross entropy with the network's Sigmoid function for classification, the regression loss uses BalancedL1Loss, and both losses have weight 1. The target-detection extraction layer has output size 7 and sampling number 2, with feature-map strides of 4, 8, 16 and 32, and the final connected layer has 256 input channels and 1024 output channels.
In the invention, Retina-Net consists of a residual convolutional neural network of depth 50 and a feature pyramid network of depth 4; the residual network has 4 stages, and the channel numbers of the pyramid levels are 256, 512, 1024 and 2048. The region proposal network has 256 input channels and 256 output channels, the window-frame aspect ratios are 0.5, 1.0 and 2.0, and the window strides on the feature layers are 4, 8, 16, 32 and 64; the classification loss uses cross entropy with the network's Sigmoid function for classification, the regression loss uses the SmoothL1Loss function, and both losses have weight 1.
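The layer sizes listed in the four paragraphs above (ResNet-50 backbone, pyramid channels 256/512/1024/2048 reduced to 256, anchor aspect ratios 0.5/1.0/2.0, strides 4/8/16/32/64, RoI output size 7 with sampling number 2, cascade IoU thresholds 0.5/0.6/0.7) closely match MMDetection-style detector configurations; a condensed excerpt for the Cascade R-CNN case is sketched below. The framework mapping itself is an assumption, not something the patent states.

```python
# Condensed MMDetection-style config sketch (assumed mapping of the values above).
model = dict(
    type='CascadeRCNN',
    backbone=dict(type='ResNet', depth=50, num_stages=4),  # residual network, depth 50
    neck=dict(type='FPN',
              in_channels=[256, 512, 1024, 2048],          # per-stage channel counts
              out_channels=256, num_outs=5),
    rpn_head=dict(type='RPNHead', in_channels=256, feat_channels=256,
                  anchor_generator=dict(type='AnchorGenerator',
                                        ratios=[0.5, 1.0, 2.0],       # window aspect ratios
                                        strides=[4, 8, 16, 32, 64]),  # per-level strides
                  loss_cls=dict(type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
                  loss_bbox=dict(type='SmoothL1Loss', loss_weight=1.0)),
    roi_head=dict(type='CascadeRoIHead', num_stages=3,
                  bbox_roi_extractor=dict(
                      roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=2),
                      featmap_strides=[4, 8, 16, 32])),
)
# Cascaded stages with rising IoU thresholds 0.5 / 0.6 / 0.7:
train_cfg = dict(rcnn=[dict(assigner=dict(pos_iou_thr=t)) for t in (0.5, 0.6, 0.7)])
```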
Step 5: integrating the results of the four network trainings and setting a multi-network voting mechanism that outputs results according to verification accuracy, obtaining the final integrated neural network;
In step 5, a weight is assigned to each network according to the output accuracy of the four neural network models of step 4 on the verification set, and a multi-network voting mechanism is established on the basis of these weights; the voting mechanism combines the recognition result of each network with its proportional weight to output the final result, realizing an integrated neural network that fuses the recognition results of the various networks.
In the invention, the verification set corresponds to the training sample set.
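A minimal sketch of the multi-network voting mechanism of step 5 follows, assuming each detector outputs one label per detected object and that each network's weight is its accuracy on the verification set; all names and numbers below are illustrative.

```python
# Weighted voting across the four detectors (weights and labels are illustrative).
from collections import defaultdict

def ensemble_vote(predictions: dict, weights: dict) -> str:
    """Return the label with the largest weighted vote.

    predictions maps network name -> predicted label;
    weights maps network name -> verification-set accuracy."""
    tally = defaultdict(float)
    for net, label in predictions.items():
        tally[label] += weights[net]
    return max(tally, key=tally.get)

weights = {"cascade_rcnn": 0.97, "grid_rcnn": 0.95,   # assumed verification accuracies
           "libra_rcnn": 0.96, "retinanet": 0.93}
preds = {"cascade_rcnn": "brand_A", "grid_rcnn": "brand_A",
         "libra_rcnn": "brand_B", "retinanet": "brand_A"}
print(ensemble_vote(preds, weights))  # -> brand_A
```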
Step 6: and (3) leading in the image of the object to be recognized in real time through the camera so as to recognize the object to be recognized through the integrated neural network, and finally outputting the recognition result of the object to be recognized in the image.
In step 6, the integrated neural network can automatically process the images transmitted by the camera in real time, with a processing speed of 0.2 s/frame.
In the invention, the identification accuracy rate reaches more than 98%.
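As a hedged sketch of step 6, the loop below reads camera frames with OpenCV and hands them to the integrated network; `ensemble_predict` is a hypothetical stand-in for the fused four-network model, not an interface defined by the patent.

```python
# Real-time recognition loop sketch (camera index and model call are assumptions).
import cv2

def ensemble_predict(frame):
    """Hypothetical fused four-network detector; returns
    a list of ((x1, y1, x2, y2), label) tuples."""
    return []

cap = cv2.VideoCapture(0)  # real-time camera stream
try:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        for (x1, y1, x2, y2), label in ensemble_predict(frame):
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(frame, label, (x1, y1 - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
        cv2.imshow("recognition", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
finally:
    cap.release()
    cv2.destroyAllWindows()
```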
The invention performs real-time image recognition by integrating the training results of four neural network recognition models, combining the advantages of single networks into a multi-network ensemble. A large number of image sets containing similar objects are acquired through a camera and screened; after the data are labeled, the data set is enhanced and expanded to obtain a training sample set meeting the requirements of the neural networks. The target data set is trained with Cascade R-CNN, Grid R-CNN, Libra R-CNN and Retina-Net respectively, with the learning rate, momentum and weight decay rate set by the stochastic gradient descent optimization method; corresponding weights are determined from the verification set, and finally a voting mechanism integrates the training results of the four networks into a final multi-network recognition model. This network combines the advantages of each single network and achieves a higher recognition rate. Finally, a real-time image of the object to be recognized in the environment is imported through a camera, the object is recognized and identified by the integrated multi-neural-network recognition model, and the recognition result for the object in the image is output.
The invention integrates multiple neural networks and combines the recognition advantages of each single network: objects can be distinguished in a short time with a high recognition rate, and similar objects in real-time images can be identified efficiently and accurately. It overcomes the low efficiency and high cost of manual identification in the prior art, and at the same time makes up for the low recognition speed and accuracy of existing single networks. It can meet the requirements of practical use and can be widely applied in the future to warehouse article counting, supermarket article counting and other areas; mounted on a robot's camera, it can better realize unmanned operation and improve operating efficiency.
Claims (7)
1. A similar object identification method fusing multiple neural networks, characterized in that the method comprises the following steps:
step 1: acquiring a plurality of image sets with similar objects through a camera;
step 2: preprocessing and labeling the image set;
step 3: expanding the processed data set with a data enhancement method to obtain a training sample set meeting the number of samples required for neural network training;
step 4: training the training sample set with Cascade R-CNN, Grid R-CNN, Libra R-CNN and Retina-Net respectively;
step 5: integrating the results of the four network trainings and setting a multi-network voting mechanism that outputs results according to verification accuracy, obtaining the final integrated neural network;
step 6: importing the image of the object to be recognized in real time through the camera, recognizing the object through the integrated neural network, and finally outputting the recognition result of the object to be recognized in the image.
2. The method for identifying similar objects fusing multiple neural networks according to claim 1, wherein in step 1, the similar objects are objects of the same type and similar shape.
3. The method for identifying similar objects fusing multiple neural networks according to claim 1, wherein step 2 comprises the following steps:
step 2.1: manually screening and checking all images in the image set, and removing images whose object occlusion rate exceeds a preset value;
step 2.2: performing bounding-box annotation on the screened image set, and adding the specific category and name of each object in the image to the annotation file.
4. The method for identifying similar objects fusing multiple neural networks according to claim 1, wherein step 3 comprises the following steps:
step 3.1: for each image in the processed image set, selecting one or more of random scaling, translation, rotation, flipping, pixel deletion, contrast adjustment and affine transformation, so that the total number of images is expanded to 16 times with no duplicate images;
step 3.2: calculating the bounding box of each object after data enhancement and storing it in the annotation file, realizing synchronous expansion of the picture set and the bounding boxes and completing the data enhancement;
step 3.3: obtaining, after data enhancement, a training sample set meeting the number of samples required for neural network training.
5. The method for identifying similar objects fusing multiple neural networks according to claim 1, wherein in step 4, initializing the neural network training models includes the following steps:
step 4.1: initializing the Cascade R-CNN neural network training model: training a plurality of cascaded detectors; using stochastic gradient descent to set the learning rate, momentum and weight decay rate; setting a linear decay that attenuates the learning rate once every A steps; and setting the total number of training rounds, each round training on all the data;
step 4.2: initializing the Grid R-CNN neural network training model: using stochastic gradient descent to set the learning rate, momentum and weight decay rate; setting a linear decay that attenuates the learning rate once every B steps; and setting the total number of training rounds, each round training on all the data;
step 4.3: initializing the Libra R-CNN neural network training model: using stochastic gradient descent to set the learning rate, momentum and weight decay rate; setting a linear decay that attenuates the learning rate once every C steps; and setting the total number of training rounds, each round training on all the data;
step 4.4: initializing the Retina-Net neural network training model: using stochastic gradient descent to set the learning rate, momentum and weight decay rate; setting a linear decay that attenuates the learning rate once every D steps; and setting the total number of training rounds, each round training on all the data.
6. The method for identifying similar objects fusing multiple neural networks according to claim 1, wherein in step 5, a weight is assigned to each network according to the output accuracy of the four neural network models of step 4 on the verification set, and a multi-network voting mechanism is established on the basis of these weights; the voting mechanism combines the recognition result of each network with its proportional weight to output the final result, realizing an integrated neural network that fuses the recognition results of the various networks.
7. The method for identifying similar objects fusing multiple neural networks according to claim 1, wherein in step 6, the integrated neural network automatically processes the images transmitted by the camera in real time, with a processing speed of 0.2 s/frame.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911310303.2A CN111178405A (en) | 2019-12-18 | 2019-12-18 | Similar object identification method fusing multiple neural networks |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911310303.2A CN111178405A (en) | 2019-12-18 | 2019-12-18 | Similar object identification method fusing multiple neural networks |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111178405A true CN111178405A (en) | 2020-05-19 |
Family
ID=70655594
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911310303.2A Pending CN111178405A (en) | 2019-12-18 | 2019-12-18 | Similar object identification method fusing multiple neural networks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111178405A (en) |
- 2019-12-18 CN CN201911310303.2A patent/CN111178405A/en active Pending
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040220837A1 (en) * | 2003-04-30 | 2004-11-04 | Ge Financial Assurance Holdings, Inc. | System and process for a fusion classification for insurance underwriting suitable for use by an automated system |
CN102831413A (en) * | 2012-09-11 | 2012-12-19 | 上海中原电子技术工程有限公司 | Face identification method and face identification system based on fusion of multiple classifiers |
CN106650721A (en) * | 2016-12-28 | 2017-05-10 | 吴晓军 | Industrial character identification method based on convolution neural network |
CN108921047A (en) * | 2018-06-12 | 2018-11-30 | 江西理工大学 | A kind of multi-model ballot mean value action identification method based on cross-layer fusion |
CN110390251A (en) * | 2019-05-15 | 2019-10-29 | 上海海事大学 | A kind of pictograph semantic segmentation method based on the processing of multiple neural network Model Fusion |
CN110163213A (en) * | 2019-05-16 | 2019-08-23 | 西安电子科技大学 | Remote sensing image segmentation method based on disparity map and multiple dimensioned depth network model |
CN110245644A (en) * | 2019-06-22 | 2019-09-17 | 福州大学 | A kind of unmanned plane image transmission tower lodging knowledge method for distinguishing based on deep learning |
Non-Patent Citations (2)
Title |
---|
XIN LU: "Grid R-CNN", arXiv:1811.12030, pages 1-9 *
ZHAOWEI CAI: "Cascade R-CNN: Delving into High Quality Object Detection", arXiv:1712.00726, pages 1-9 *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111652292A (en) * | 2020-05-20 | 2020-09-11 | 贵州电网有限责任公司 | Similar object real-time detection method and system based on NCS and MS |
CN111598885A (en) * | 2020-05-21 | 2020-08-28 | 公安部交通管理科学研究所 | Automatic visibility grade marking method for highway foggy pictures |
WO2023071121A1 (en) * | 2021-10-26 | 2023-05-04 | 苏州浪潮智能科技有限公司 | Multi-model fusion-based object detection method and apparatus, device and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111325713B (en) | Neural network-based wood defect detection method, system and storage medium | |
CN108960245B (en) | Tire mold character detection and recognition method, device, equipment and storage medium | |
CN110148130B (en) | Method and device for detecting part defects | |
CN108898137B (en) | Natural image character recognition method and system based on deep neural network | |
CN111062915B (en) | Real-time steel pipe defect detection method based on improved YOLOv3 model | |
CN109509187B (en) | Efficient inspection algorithm for small defects in large-resolution cloth images | |
CN107230203B (en) | Casting defect identification method based on human eye visual attention mechanism | |
CN110929756B (en) | Steel size and quantity identification method based on deep learning, intelligent equipment and storage medium | |
CN112348787B (en) | Training method of object defect detection model, object defect detection method and device | |
CN109767422A (en) | Pipe detection recognition methods, storage medium and robot based on deep learning | |
CN111539330B (en) | Transformer substation digital display instrument identification method based on double-SVM multi-classifier | |
CN111968098A (en) | Strip steel surface defect detection method, device and equipment | |
CN111178405A (en) | Similar object identification method fusing multiple neural networks | |
CN113393426B (en) | Steel rolling plate surface defect detection method | |
CN111724355A (en) | Image measuring method for abalone body type parameters | |
CN111415339B (en) | Image defect detection method for complex texture industrial product | |
CN111539957A (en) | Image sample generation method, system and detection method for target detection | |
CN111814852A (en) | Image detection method, image detection device, electronic equipment and computer-readable storage medium | |
CN114863189B (en) | Intelligent image identification method based on big data | |
CN111161295A (en) | Background stripping method for dish image | |
CN114863464B (en) | Second-order identification method for PID drawing picture information | |
CN118279304B (en) | Abnormal recognition method, device and medium for special-shaped metal piece based on image processing | |
CN110334775B (en) | Unmanned aerial vehicle line fault identification method and device based on width learning | |
CN109615610B (en) | Medical band-aid flaw detection method based on YOLO v2-tiny | |
CN117808746A (en) | Fruit quality grading method based on image processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200519 ||