CN112085024A - Tank surface character recognition method - Google Patents

Tank surface character recognition method

Info

Publication number
CN112085024A
CN112085024A (application CN202010998223.7A)
Authority
CN
China
Prior art keywords
image
character
text
training
yolo
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010998223.7A
Other languages
Chinese (zh)
Inventor
罗印升
周兴杰
宋伟
刘亚东
陈传毅
曹阳阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University of Technology
Original Assignee
Jiangsu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University of Technology filed Critical Jiangsu University of Technology
Priority to CN202010998223.7A priority Critical patent/CN112085024A/en
Publication of CN112085024A publication Critical patent/CN112085024A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/148: Segmentation of character regions
    • G06V30/153: Segmentation of character regions using recognition of characters or words
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/60: Type of objects
    • G06V20/62: Text, e.g. of license plates, overlay texts or captions on TV images
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition

Abstract

The invention provides a method for recognizing characters on the surface of a can, comprising the following steps: acquiring a can surface image in real time; locating characters in the acquired image with a pre-established YOLO-V3 neural network detection model to determine whether the image contains characters and obtain the character position information; outputting the positions of the character prediction boxes and cropping out the character regions; correcting the cropped character regions with a text rectification algorithm, since they do not fit the actual text areas tightly; segmenting the text into individual lines by histogram projection; recognizing each cropped text region with an improved end-to-end variable-length-text CRNN model; and, depending on the system state at the time, uploading the recognition result to a database in online mode or storing it locally in offline mode. The method is highly robust, accurate and fast, and provides milk-powder-can manufacturers with an intelligent character localization and detection scheme.

Description

Tank surface character recognition method
Technical Field
The invention relates to the field of character recognition on automatic production lines, and in particular to a character recognition method based on YOLO-V3 and an improved CRNN.
Background
In actual industrial production, character recognition technology is very important and has been successfully applied to production and assembly: characters are ink-jet printed or labeled on the surfaces of many small electronic components, circuit boards and some large parts, and manufacturers identify and track product information through these markings. Because the human eye cannot inspect efficiently over long periods, and harsh working environments such as high temperature and high pressure make manual detection and identification difficult, automatic and accurate character localization and recognition by machine vision has become an important link in the industrial production flow.
Image processing methods based on contour detection and edge processing can also extract character regions, but each type of image requires a dedicated layout analysis, and contour selection is strongly affected by the image background, so such methods generalize poorly. Deep convolutional neural networks can locate and extract the text information in a layout and the text regions of various kinds of certificates, giving character recognition and localization a wider range of application and better robustness to complex backgrounds.
The OCR recognition methods currently applied in industry mainly have the following problems:
1) traditional algorithms for extracting character regions and segmenting character text adapt poorly, and can hardly extract characters accurately under uneven illumination, character adhesion, character blurring and similar conditions;
2) feature extraction must be designed manually for the extracted characters, so generality on new fonts is poor. The invention therefore provides a character localization and recognition method based on YOLO-V3 and CRNN.
Disclosure of Invention
In order to achieve the above object, the present invention provides a can surface character recognition method, comprising: acquiring a can surface image in real time; locating characters in the acquired image with a pre-established YOLO-V3 neural network detection model to determine whether the image contains characters and obtain the character position information; outputting the positions of the character prediction boxes and cropping out the character regions; correcting the cropped character regions with a text rectification algorithm, since they do not fit the actual text areas tightly; segmenting the text into individual lines by histogram projection; recognizing each cropped text region with an improved end-to-end variable-length-text CRNN model; and, depending on the system state at the time, uploading the recognition result to a database in online mode or storing it locally in offline mode.
In the above scheme, the can surface character recognition method further includes: the actuator switching to the next workpiece to be recognized.
In the above scheme, the can surface includes the surface of a milk powder can or of a tin can.
In the above scheme, the method for establishing the YOLO-V3 neural network detection model includes:
S1: collecting a large number of character images of the milk powder can surface;
S2: preprocessing the milk-powder-can surface images; the preprocessing includes normalization: scaling each image with its aspect ratio preserved, pasting it into a blank image of a preset pixel size, and filling all pixels outside the can image with a set color; the blank image is 416 x 416 pixels, large enough to cover the scaled can image; the set color includes gray;
S3: calibrating the character regions of the preprocessed images to obtain character position labels: the character regions of the preprocessed can images are marked by manual box selection, and the coordinates of the upper-left and lower-right corner points of each calibrated region, together with the characters of the corresponding region, are stored as character labels;
S4: dividing the images and their position labels into training set samples and test set samples at a ratio of 9:1; and training the preset YOLO-V3 deep neural network on a GPU with the training and test set samples to obtain the YOLO-V3 neural network detection model.
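The normalization in S2 can be sketched as follows. This is a pure-NumPy illustration, not the patent's implementation: the gray fill value of 128 and the nearest-neighbour resizing are assumptions.

```python
import numpy as np

def letterbox(image: np.ndarray, size: int = 416, fill: int = 128) -> np.ndarray:
    """Scale the can image preserving aspect ratio, paste it into a
    size x size canvas, and fill the remaining pixels with gray."""
    h, w = image.shape[:2]
    scale = size / max(h, w)                  # equal scaling, no distortion
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # nearest-neighbour resize in pure NumPy to keep the sketch dependency-free
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = image[rows][:, cols]
    canvas = np.full((size, size) + image.shape[2:], fill, dtype=image.dtype)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas
```

Centering the can image and padding with a uniform color keeps the original texture undistorted, which matches the stated goal of the normalization step.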
In this scheme, the preprocessing further includes data enhancement of the normalized can surface character images, comprising:
geometric transformations: flipping, rotation, cropping, deformation and scaling;
color transformations: noise, blur, color shifts, erasure and padding;
and generating sample pictures that imitate the style of the original pictures.
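Two of the augmentations listed above can be sketched as follows; the horizontal flip (geometric class) and additive Gaussian noise (color class) are assumed concrete choices, since the patent names the categories but not the implementations.

```python
import numpy as np

rng = np.random.default_rng(0)

def flip_horizontal(img: np.ndarray) -> np.ndarray:
    """Geometric augmentation: mirror the image left-to-right."""
    return img[:, ::-1].copy()

def add_noise(img: np.ndarray, sigma: float = 10.0) -> np.ndarray:
    """Color augmentation: add Gaussian pixel noise, clipped to uint8 range."""
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)
```

Each augmented copy keeps its original character-position label (flipping also mirrors the box coordinates), multiplying the effective size of the training set.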
In the above scheme, training the previously constructed YOLO-V3 neural network includes:
S1: setting the training parameters: the number of iterations epochs is set to 10000, the optimizer is set to 'adam', the batch training sample number batch_size is set to 64, and the sizes of the 9 anchor boxes are set;
S2: training the YOLO-V3 neural network: inputting the training set samples and the can surface character data corresponding to each sample into the YOLO-V3 neural network for model training;
S3: testing the model: after each round of training, testing the trained model with the test set samples; if the model's detection rate on character regions in the test set exceeds 95% and the detection accuracy is not lower than 95%, taking the model obtained in the last round as the final YOLO-V3 neural network detection model; otherwise, taking the model obtained in the last round as the YOLO-V3 network currently to be trained, and repeating steps S2-S3 until the final YOLO-V3 neural network detection model is obtained.
In the above scheme, the sizes of the anchor boxes are set by k-means clustering: 9 character-region rectangles are randomly selected as the cluster centers of the training set; each rectangle is assigned to the closest cluster center according to its distance from each center; after the samples are assigned, the cluster centers are recomputed; and the assignment is repeated until no cluster center changes. The values of the 9 final cluster centers are the sizes of the 9 anchor boxes.
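The anchor clustering above can be sketched as follows. The patent says only "distance"; this sketch assumes the convention from the original YOLO work of clustering (width, height) pairs with 1 - IoU as the distance, which is not stated in the text.

```python
import numpy as np

def iou_wh(boxes: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """IoU between (w, h) pairs, as if all boxes shared one corner."""
    inter = (np.minimum(boxes[:, None, 0], centers[None, :, 0])
             * np.minimum(boxes[:, None, 1], centers[None, :, 1]))
    union = ((boxes[:, 0] * boxes[:, 1])[:, None]
             + (centers[:, 0] * centers[:, 1])[None, :] - inter)
    return inter / union

def kmeans_anchors(boxes: np.ndarray, k: int = 9, seed: int = 0) -> np.ndarray:
    """Cluster character-region (w, h) pairs into k anchor-box sizes."""
    rng = np.random.default_rng(seed)
    centers = boxes[rng.choice(len(boxes), k, replace=False)]
    while True:
        assign = np.argmax(iou_wh(boxes, centers), axis=1)  # closest center
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else centers[i] for i in range(k)])
        if np.allclose(new, centers):                       # no center changed
            return new
        centers = new
```

Run on the full training set of calibrated character rectangles with k = 9, the returned centers become the 9 prior anchor-box sizes.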
In the above scheme, the text rectification algorithm includes:
S1: graying the cropped image, so that the characters are white and the background dark;
S2: applying Gaussian blur so that the text parts merge into connected blocks, then thresholding the image to obtain a binary image;
S3: fitting minimum-area rectangles and screening by rectangle area to obtain the text region part, i.e. the minimum bounding rectangle of the text;
S4: taking the center point of the minimum bounding rectangle and applying an affine transformation to the text region, thereby correcting the tilted text.
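The tilt-correction idea in S3-S4 can be sketched in pure NumPy. The details are assumptions: the angle is estimated from the principal axis of the white-pixel coordinate covariance rather than a minimum-area-rectangle fit, and the affine transform is a plain rotation about the image center.

```python
import numpy as np

def text_angle(binary: np.ndarray) -> float:
    """Estimate the tilt angle (degrees) of the white text pixels."""
    ys, xs = np.nonzero(binary)
    cov = np.cov(np.stack([xs, ys]))
    eigvals, eigvecs = np.linalg.eigh(cov)
    vx, vy = eigvecs[:, np.argmax(eigvals)]          # dominant direction
    ang = np.degrees(np.arctan2(vy, vx)) % 180.0
    return ang if ang <= 90.0 else ang - 180.0       # map to (-90, 90]

def derotate(binary: np.ndarray, angle_deg: float) -> np.ndarray:
    """Rotate the image by -angle_deg about its center (nearest neighbour)."""
    h, w = binary.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    t = np.radians(angle_deg)
    yy, xx = np.mgrid[0:h, 0:w]
    # inverse-map each output pixel back into the tilted source image
    sx = np.cos(t) * (xx - cx) - np.sin(t) * (yy - cy) + cx
    sy = np.sin(t) * (xx - cx) + np.cos(t) * (yy - cy) + cy
    sx, sy = sx.round().astype(int), sy.round().astype(int)
    ok = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(binary)
    out[ok] = binary[sy[ok], sx[ok]]
    return out
```

In practice an OpenCV pipeline (cv2.minAreaRect plus cv2.warpAffine) would replace both helpers; the sketch only shows the geometry.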
In the above scheme, the histogram horizontal projection algorithm includes:
S1: binarizing the corrected image obtained in step four;
S2: computing the pixel sum of each row of the binarized image and drawing a histogram whose horizontal axis is the image row coordinate, ranging from 0 to the image height, and whose vertical axis is the pixel sum;
S3: finding the valleys of the histogram; the valleys are the dividing lines;
S4: segmenting the image into lines of text along the dividing lines.
In the above scheme, recognition with the improved end-to-end variable-length-text CRNN model includes:
S1: graying each line of text, converting the three-channel image into a single-channel grayscale image;
S2: scaling and padding the length: the input image height is fixed at 32 pixels with variable width, and the image is scaled or padded so that it resembles the training samples;
S3: converting the data labels into a sparse matrix and processing the label matrix into the required data format;
S4: building the model with a changed structure: the original CNN + RNN + CTC transcription design is changed to CNN + CTC; the backbone is a small convolutional network redesigned from VGG with 7 convolutional layers and 5 pooling layers, and batch normalization is added twice in the middle layers to avoid gradient vanishing and accelerate convergence;
S5: post-processing the data: the predicted indices into the dictionary array are mapped back through the dictionary to obtain the actual text.
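The post-processing in S5 can be sketched as a greedy CTC decode; the decode rule (collapse repeats, drop the blank) is the standard one and an assumption here, and the dictionary below is hypothetical, not the patent's character set.

```python
BLANK = 0
# hypothetical dictionary array; index 0 is reserved for the CTC blank
CHARSET = ["-", "0", "1", "2", "A", "B"]

def ctc_greedy_decode(indices):
    """Collapse repeated indices, drop blanks, then map through CHARSET."""
    out, prev = [], None
    for idx in indices:
        if idx != prev and idx != BLANK:
            out.append(CHARSET[idx])
        prev = idx
    return "".join(out)
```

For example, the per-timestep argmax sequence [0, 1, 1, 0, 2, 2, 5, 0, 5] decodes to "01BB": the repeated 1 and 2 collapse, the blanks separate the two B's so both survive.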
The can surface character recognition method uses the YOLO-V3 and CRNN deep learning methods to recognize text characters in real time: the regions where the characters sit are quickly boxed out and then recognized. Because deep learning is applied, the method is highly robust, accurate and fast; it provides milk-powder-can manufacturers with an intelligent character localization and detection scheme and overcomes the low efficiency, low speed and high labor cost of manual identification.
Drawings
FIG. 1 is a schematic view of a process for recognizing characters on the surface of a milk powder container according to an embodiment of the present invention;
FIG. 2 is a diagram of the training process of YOLO-V3 according to an embodiment of the present invention;
FIG. 3 is a diagram of an improved CRNN partial network architecture according to an embodiment of the present invention.
Detailed Description
So that the manner in which the features and aspects of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings.
The invention provides a method for recognizing characters on the surface of a can, shown in FIG. 1, which is a schematic flow chart of the milk-powder-can surface character recognition process provided by an embodiment of the invention. The recognition comprises: acquiring a can surface image in real time. In the data-acquisition stage, the milk-powder-can surface is photographed by a camera from as many angles as possible, covering characters at various angles and different characters, so that the trained model achieves a high detection rate and accuracy.
Characters are then located in the acquired image with the pre-established YOLO-V3 neural network detection model, which determines whether the image contains characters and obtains the character position information; the positions of the character prediction boxes are output and the character regions are cropped out; since the cropped regions do not fit the actual text areas tightly, they are corrected with a text rectification algorithm; the text is segmented into individual lines by histogram projection; each cropped text region is recognized with the improved end-to-end variable-length-text CRNN model; and, depending on the system state at the time, the recognition result is uploaded to a database in online mode or stored locally in offline mode.
The can surface character recognition method further comprises: the actuator switching to the next workpiece to be recognized.
In the embodiments provided herein, the can surface can comprise the surface of a milk powder can, a tin can, or other canned packaging.
In the embodiment provided by the invention, the method for establishing the YOLO-V3 neural network detection model comprises the following steps:
S1: collecting a large number of character images of the milk powder can surface;
S2: preprocessing the milk-powder-can surface images; the preprocessing includes normalization: scaling each image with its aspect ratio preserved, pasting it into a blank image of a preset pixel size, and filling all pixels outside the can image with a set color; the blank image is 416 x 416 pixels, large enough to cover the scaled can image; the normalization scales the image according to its original aspect ratio, pastes it into the middle of the 416 x 416 blank image, and fills the other pixels uniformly (for example with white) so that the original texture of the image is preserved as far as possible; data enhancement is applied to the normalized images through flipping and translation strategies; the set color includes gray;
S3: calibrating the character regions of the preprocessed images to obtain character position labels: the character regions of the preprocessed can images are marked by manual box selection, and the coordinates of the upper-left and lower-right corner points of each calibrated region, together with the characters of the corresponding region, are stored as character labels. In the image-calibration stage, the character image labels are produced as follows: labeling software is used to mark the character regions manually with rectangular boxes, and the horizontal and vertical coordinates of the upper-left and lower-right corner points of each box are stored in a txt file; existing tools such as image-editing software or labelImg can be used, and the corresponding coordinate data are obtained automatically or manually after the manual box selection.
S4: dividing the images and their position labels into training set samples and test set samples, randomly split at a ratio of about 9:1 (the ratio may vary slightly); and training the preset YOLO-V3 deep neural network on a GPU with the training and test set samples to obtain the YOLO-V3 neural network detection model.
In the embodiment provided by the invention, the preprocessing further includes data enhancement of the normalized can surface character images, comprising:
geometric transformations: flipping, rotation, cropping, deformation and scaling;
color transformations: noise, blur, color shifts, erasure and padding;
and generating sample pictures that imitate the style of the original pictures.
In the embodiment provided by the invention, training the preset YOLO-V3 neural network comprises the following steps:
S1: setting the training parameters: the number of iterations epochs is set to 10000, the optimizer is set to 'adam', the batch training sample number batch_size is set to 64, and the sizes of the 9 anchor boxes are set;
S2: training the YOLO-V3 neural network: inputting the training set samples and the can surface character data corresponding to each sample into the YOLO-V3 neural network for model training;
S3: testing the model: after each round of training, testing the trained model with the test set samples; if the model's detection rate on character regions in the test set exceeds 95% and the detection accuracy is not lower than 95%, taking the model obtained in the last round as the final YOLO-V3 neural network detection model; otherwise, taking the model obtained in the last round as the YOLO-V3 network currently to be trained, and repeating steps S2-S3 until the final YOLO-V3 neural network detection model is obtained.
The final YOLO-V3 neural network detection model can then detect characters in real time: a camera acquires images of the milk-powder-can surface to be inspected, and the trained YOLO-V3 network detects the positions of the characters on the can surface in real time.
In the embodiment provided by the invention, the sizes of the anchor boxes are set by k-means clustering: 9 character-region rectangles are randomly selected as the cluster centers of the training set; each rectangle is assigned to the closest cluster center according to its distance from each center; after the samples are assigned, the cluster centers are recomputed; and the assignment is repeated until no cluster center changes. The values of the 9 final cluster centers are the sizes of the 9 anchor boxes.
For a better understanding of the character-position detection method based on the YOLO-V3 network, its working principle is briefly explained here:
a. the YOLO-V3 network evenly divides the input image into S × S cells;
b. each cell predicts B bounding boxes, whose information is given in the form of vectors comprising position information (the center coordinates, width and height of the rectangular box), a confidence score, and class information of the predicted object;
c. for the training data, after the images and labels are input, the five parameters output by each cell (center coordinates, width, height and confidence) are substituted into a loss function, which measures the difference between the five parameters computed by the YOLO-V3 network and the five labeled parameters; the weights are adjusted by back-propagation so that the confidence of correct character regions increases and that of incorrect character regions decreases. The same five parameters are computed when data collected in real time are input into the YOLO-V3 network; after they pass through the loss function, the bounding box with the lowest loss value, i.e. the final classification box, is obtained as the detection result.
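Comparing a predicted box against a labeled box is commonly done with intersection over union (IoU); the helper below is an illustration of that comparison, not the patent's actual loss, which combines coordinate, confidence and class terms.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])   # overlap's top-left
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])   # overlap's bottom-right
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0
```

An IoU of 1 means the prediction matches the labeled character region exactly, 0 means no overlap; training drives the confidence of high-IoU boxes up and low-IoU boxes down.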
In an embodiment provided by the present invention, the text rectification algorithm includes:
S1: graying the cropped image, so that the characters are white and the background dark;
S2: applying Gaussian blur so that the text parts merge into connected blocks, then thresholding the image to obtain a binary image;
S3: fitting minimum-area rectangles and screening by rectangle area to obtain the text region part, i.e. the minimum bounding rectangle of the text;
S4: taking the center point of the minimum bounding rectangle and applying an affine transformation to the text region, thereby correcting the tilted text.
In an embodiment provided by the present invention, the histogram horizontal projection algorithm includes:
S1: binarizing the corrected image obtained in step four;
S2: computing the pixel sum of each row of the binarized image and drawing a histogram whose horizontal axis is the image row coordinate, ranging from 0 to the image height, and whose vertical axis is the pixel sum;
S3: finding the valleys of the histogram; the valleys are the dividing lines;
S4: segmenting the image into lines of text along the dividing lines.
In an embodiment provided by the present invention, recognition with the improved end-to-end variable-length-text CRNN model includes:
S1: graying each line of text, converting the three-channel image into a single-channel grayscale image;
S2: scaling and padding the length: the input image height is fixed at 32 pixels with variable width, and the image is scaled or padded so that it resembles the training samples;
S3: converting the data labels into a sparse matrix and processing the label matrix into the required data format;
S4: building the model with a changed structure: the original CNN + RNN + CTC transcription design is changed to CNN + CTC; the backbone is a small convolutional network redesigned from VGG with 7 convolutional layers and 5 pooling layers, and batch normalization is added twice in the middle layers to avoid gradient vanishing and accelerate convergence;
S5: post-processing the data: the predicted indices into the dictionary array are mapped back through the dictionary to obtain the actual text.
In the embodiment provided by the invention, the network structure of the pre-established improved CRNN neural network is as follows:
S1: the original RNN structure is deleted; the CNN comprises 7 convolutional layers, each followed by max pooling except the third and seventh;
S2: the parameters in the CRNN are adjusted: the batch size is set to 16 or 32, the learning rate to 0.00001, and the number of epochs to 100;
S3: the text images obtained in the previous step are divided into a training set and a test set; according to the text characteristics in the pictures, code is written to synthesize similar pictures and text pictures covering various conditions, increasing the generalization ability of the model; the text dataset is shuffled and randomly divided into two parts, a test set of 10000 images and a training set of 100000 images;
S4: the training set is fed into the CRNN network for training, and the current model is tested during training;
S5: training stops when both the training and test losses converge, yielding the CRNN model;
S6: the cropped text is recognized with the CRNN recognition model;
S7: depending on the system state at the time, the recognition result is uploaded to the database in online mode, or stored locally in offline mode;
S8: the actuator switches to the next workpiece to be recognized.
according to the method for recognizing the characters on the surface of the can, real-time recognition of text characters is achieved by using a YOLO-V3 and CRNN deep learning method, the area where the characters are located can be quickly selected out, and recognition can be achieved. Because deep learning is applied, the method has the advantages of high robustness, high accuracy and high speed, provides an intelligent character positioning detection scheme for milk powder tank production enterprises, and solves the problems of low manual identification efficiency, low speed, high personnel cost and the like.
The above is only an embodiment of the present invention, but the scope of the invention is not limited thereto; any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed herein shall be covered by the invention. The protection scope of the present invention shall therefore be subject to the appended claims.

Claims (10)

1. A method of canister surface character recognition, the method comprising:
acquiring a tank surface image in real time;
carrying out character positioning on the tank surface image acquired in real time by using a pre-established YOLO-V3 neural network detection model to obtain whether the tank surface image contains characters and character position information;
outputting the position of the character prediction box, and cutting out a character area;
correcting the cropped character regions with a text rectification algorithm, since they do not fit the actual text areas tightly;
segmenting text characters of each line through histogram projection;
identifying the cut text region by using an improved end-to-end indefinite length text CRNN model;
according to the system condition at the time, if the mode is an online mode, the recognition result is uploaded to a database; if the mode is an off-line mode, the recognition result is stored locally.
2. The can surface character recognition method of claim 1, further comprising: and the executing mechanism switches the next workpiece to be identified.
3. The can surface character recognition method of claim 1, wherein the can surface comprises: the surface of a milk powder can or of a tin can.
4. The can surface character recognition method of claim 1, wherein building the YOLO-V3 neural network detection model comprises:
s1: collecting a large number of character images of milk powder can surfaces;
s2: preprocessing the milk powder can surface images; the preprocessing comprises normalizing each milk powder can surface image: scaling the image proportionally, placing it on a blank image of a preset pixel size, and filling all pixels of the blank image outside the can image with a set color; the blank image is 416 × 416 pixels, large enough to cover the scaled milk powder can image; the set color comprises gray;
s3: calibrating character regions on the preprocessed images to obtain character position labels; the character regions of the preprocessed milk powder can images are calibrated by manual box selection, and the coordinates of the upper-left and lower-right corner points of each calibrated region, together with the characters of the corresponding region, are stored as character labels;
s4: dividing the images and their position labels into training set samples and test set samples at a ratio of 9:1; and training the preset YOLO-V3 deep neural network on a GPU with the training set samples and test set samples to obtain the YOLO-V3 neural network detection model.
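The normalization of step s2 is essentially the letterbox preprocessing commonly used with YOLO-style detectors: scale proportionally, then pad to 416 × 416 with gray. A minimal sketch; the function name and the NumPy-only nearest-neighbor resize are illustrative assumptions, not details from the patent:

```python
import numpy as np

def letterbox(img, target=416, pad_value=128):
    """Scale an image proportionally, then pad it onto a gray target x target canvas."""
    h, w = img.shape[:2]
    scale = target / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # nearest-neighbor resize with plain NumPy to stay dependency-free
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = img[rows][:, cols]
    # gray canvas; the scaled image is centered, the rest stays the set color
    canvas = np.full((target, target) + img.shape[2:], pad_value, dtype=img.dtype)
    top, left = (target - new_h) // 2, (target - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas

img = np.random.randint(0, 255, (600, 400, 3), dtype=np.uint8)
out = letterbox(img)
print(out.shape)  # (416, 416, 3)
```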
5. The can surface character recognition method of claim 4, wherein the preprocessing further comprises data augmentation of the can surface character images obtained after normalization; the data augmentation of the can surface character images comprises:
geometric transformations: flipping, rotation, cropping, deformation, and scaling;
color transformations: noise, blur, color shifts, erasing, and filling;
and generating sample images by simulating the image style.
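The augmentations of claim 5 can be sketched as a random pipeline applied per image; the probabilities, magnitudes, and function name below are illustrative assumptions rather than values from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Apply an illustrative subset of the claim-5 augmentations at random."""
    out = img.copy()
    if rng.random() < 0.5:          # geometric: horizontal flip
        out = out[:, ::-1]
    if rng.random() < 0.5:          # geometric: 90-degree rotation
        out = np.rot90(out)
    if rng.random() < 0.5:          # color: additive Gaussian noise
        noise = rng.normal(0, 10, out.shape)
        out = np.clip(out.astype(float) + noise, 0, 255).astype(np.uint8)
    if rng.random() < 0.5:          # color: random erasing of a patch
        h, w = out.shape[:2]
        y, x = rng.integers(0, h // 2), rng.integers(0, w // 2)
        out[y:y + h // 4, x:x + w // 4] = 0
    return out
```

Each original sample can be run through `augment` several times to multiply the training set.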
6. The can surface character recognition method of claim 1, wherein training the pre-built YOLO-V3 neural network comprises:
s1: setting the training parameters: number of iterations epochs = 10000, optimizer = 'adam', batch training sample number batch_size = 64, and the sizes of the 9 prior boxes (anchor boxes);
s2: training the YOLO-V3 neural network: inputting the training set samples and the milk powder can surface character data corresponding to each sample into the YOLO-V3 neural network for model training;
s3: testing the model: after each round of training, testing the trained model with the test set samples; if the model's detection rate on character regions in the test set samples exceeds 95% and its detection accuracy is not lower than 95%, taking the model from the last round of training as the final YOLO-V3 neural network detection model; otherwise, taking the model from the last round of training as the YOLO-V3 neural network currently to be trained, and repeating steps S2-S3 until the final YOLO-V3 neural network detection model is obtained.
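The train-test loop of steps S2-S3 reduces to an iterate-until-threshold pattern. In this sketch, `train_fn` and `eval_fn` are hypothetical stand-ins for the actual YOLO-V3 training and evaluation routines:

```python
def train_until_target(model, train_fn, eval_fn, target=0.95, max_rounds=50):
    """Repeat steps S2-S3: train a round, test, stop once both metrics reach the bar."""
    for _ in range(max_rounds):
        model = train_fn(model)
        recall, accuracy = eval_fn(model)
        if recall > target and accuracy >= target:  # >95% detection rate, >=95% accuracy
            return model
    return model

# toy stand-ins: each "training round" improves both metrics by 0.1
final = train_until_target(0.5, lambda m: m + 0.1, lambda m: (m, m))
```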
7. The can surface character recognition method of claim 6, wherein the sizes of the prior boxes are set by a k-means clustering method comprising: randomly selecting 9 character-region rectangular boxes from the training set as cluster centers; assigning each box to its nearest cluster center according to distance; after the assignment, recomputing the cluster centers; repeating the assignment process until no cluster center changes; the values of the final 9 cluster centers are the sizes of the 9 prior boxes.
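The clustering of claim 7 is standard k-means over the (width, height) pairs of the labeled boxes. A sketch using Euclidean distance (YOLO implementations typically cluster with a 1 − IoU distance instead; the function name is illustrative):

```python
import numpy as np

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster (w, h) pairs of labeled character boxes into k prior-box sizes."""
    rng = np.random.default_rng(seed)
    centers = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        # assign every box to its nearest cluster center
        d = np.linalg.norm(boxes[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute centers; keep the old one if a cluster went empty
        new_centers = np.array([
            boxes[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(k)])
        if np.allclose(new_centers, centers):  # no center changed: converged
            break
        centers = new_centers
    return centers

wh = np.array([[12., 30.], [14., 32.], [50., 20.], [52., 22.]])
anchors = kmeans_anchors(wh, k=2)
```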
8. The can surface character recognition method of claim 1, wherein the text rectification algorithm comprises:
s1: graying the cropped image so that the characters appear white against the background;
s2: applying Gaussian blur so that the text parts merge into a single blob, then thresholding the image to obtain a binary image;
s3: fitting a minimum-area rectangle and screening by rectangle area to obtain the text region part, i.e. the minimum bounding rectangle of the text;
s4: obtaining the center point of the minimum bounding rectangle and applying an affine transformation to the text region to correct the skewed text.
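Steps s2-s4 amount to finding the text blob's minimum-area rectangle and rotating it upright, usually done with `cv2.minAreaRect` and `cv2.warpAffine`. A dependency-free approximation of the angle-finding step, estimating the skew from the principal axis of the text pixels (the function name is an illustrative assumption):

```python
import numpy as np

def estimate_skew(binary):
    """Estimate text skew angle (radians) from the principal axis of text pixels."""
    ys, xs = np.nonzero(binary)
    # center the point cloud, then take the covariance of (x, y)
    pts = np.stack([xs - xs.mean(), ys - ys.mean()])
    cov = pts @ pts.T / pts.shape[1]
    evals, evecs = np.linalg.eigh(cov)
    major = evecs[:, np.argmax(evals)]     # direction of largest spread
    return np.arctan2(major[1], major[0])  # angle of the text baseline
```

Rotating the region by the negative of this angle about the rectangle center then yields the upright text.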
9. The can surface character recognition method of claim 1, wherein the histogram horizontal projection algorithm comprises:
s1: binarizing the rectified image obtained in step four;
s2: computing the pixel sum of each row of the binarized image and plotting a histogram whose horizontal axis is the row coordinate (0 to the image height) and whose vertical axis is the pixel sum;
s3: finding the valleys of the histogram, which are the segmentation lines;
s4: segmenting the image into individual text-line images along the segmentation lines.
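The steps above can be sketched directly: sum each row, treat zero-sum rows as valleys, and cut at the transitions. A minimal version, assuming a clean binary image in which background rows sum to zero (function name illustrative):

```python
import numpy as np

def split_lines(binary):
    """Split a binary text image into line images via horizontal projection."""
    profile = binary.sum(axis=1)   # pixel sum of every row
    in_text = profile > 0          # zero-sum rows are the histogram valleys
    lines, start = [], None
    for row, flag in enumerate(in_text):
        if flag and start is None:       # entering a text band
            start = row
        elif not flag and start is not None:  # leaving a text band: cut here
            lines.append(binary[start:row])
            start = None
    if start is not None:                # text band runs to the bottom edge
        lines.append(binary[start:])
    return lines
```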
10. The can surface character recognition method of claim 1, wherein recognition with the improved end-to-end variable-length text CRNN model comprises:
s1: graying each text-line image, converting the three-channel image into a single-channel grayscale image;
s2: scaling and padding along the width: fixing the input image height to 32, and scaling or padding the width so that the image resembles the training samples;
s3: converting the data labels into a sparse matrix, processing the label matrix into the required data format;
s4: building the model with a changed structure: the original CNN + RNN + CTC transcription design is changed to CNN + CTC; the backbone network is modified and redesigned on a VGG basis into a small convolutional neural network of 7 convolutional layers and 5 layers, and batch normalization is added twice in the middle layers to avoid vanishing gradients and accelerate convergence;
s5: post-processing the data: mapping the dictionary-array indices corresponding to the predicted values to obtain the actual text.
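Step s5's mapping from dictionary indices to text is the usual CTC greedy decoding: collapse consecutive repeated indices, then drop the blank symbol. A sketch, where the blank index and the dictionary layout are assumptions:

```python
def ctc_greedy_decode(index_seq, dictionary, blank=0):
    """Map per-timestep dictionary indices to text: collapse repeats, drop blanks."""
    out, prev = [], None
    for idx in index_seq:
        if idx != blank and idx != prev:
            out.append(dictionary[idx - 1])  # index 0 reserved for the CTC blank
        prev = idx
    return "".join(out)

chars = list("0123456789")
print(ctc_greedy_decode([1, 1, 0, 1, 3, 3, 0], chars))  # prints "002"
```

The blank between the two 1s is what lets the decoder emit the same character twice in a row.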
CN202010998223.7A 2020-09-21 2020-09-21 Tank surface character recognition method Pending CN112085024A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010998223.7A CN112085024A (en) 2020-09-21 2020-09-21 Tank surface character recognition method

Publications (1)

Publication Number Publication Date
CN112085024A true CN112085024A (en) 2020-12-15

Family

ID=73738292

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010998223.7A Pending CN112085024A (en) 2020-09-21 2020-09-21 Tank surface character recognition method

Country Status (1)

Country Link
CN (1) CN112085024A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097048A (en) * 2019-04-02 2019-08-06 江苏理工学院 A kind of SOT chip image quickly corrects and character identifying method
CN110490874A (en) * 2019-09-04 2019-11-22 河海大学常州校区 Weaving cloth surface flaw detecting method based on YOLO neural network
CN110956171A (en) * 2019-11-06 2020-04-03 广州供电局有限公司 Automatic nameplate identification method and device, computer equipment and storage medium
CN111444908A (en) * 2020-03-25 2020-07-24 腾讯科技(深圳)有限公司 Image recognition method, device, terminal and storage medium


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CV.EXP: "Mainstream OCR algorithms: the CNN+BLSTM+CTC architecture", pages 1 - 3, Retrieved from the Internet <URL:https://blog.csdn.net/forest_world/article/details/78566737> *
LIU Xiaojie et al.: "Application of image processing technology in ship draft mark detection", Ship Science and Technology, vol. 39, no. 16, pages 10 - 12 *
ZHONG Qiao: "Graph-theory-based text line segmentation and rectification of scanned images", China Master's Theses Full-text Database, Information Science and Technology, no. 7, pages 138 - 1469 *
CHEN Yuxin: "Research on scene text recognition based on deep learning", China Master's Theses Full-text Database, Information Science and Technology, no. 5, pages 138 - 166 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112613527B (en) * 2020-12-17 2023-07-28 西北大学 Minimum quantitative feature detection method based on unsupervised learning
CN112613527A (en) * 2020-12-17 2021-04-06 西北大学 Minimum quantization feature detection method based on unsupervised learning
CN112818823A (en) * 2021-01-28 2021-05-18 建信览智科技(北京)有限公司 Text extraction method based on bill content and position information
CN112818823B (en) * 2021-01-28 2024-04-12 金科览智科技(北京)有限公司 Text extraction method based on bill content and position information
CN112883965A (en) * 2021-02-08 2021-06-01 中兴盛达电气技术(郑州)有限公司 Date detection method on packaging vessel, electronic device and computer-readable storage medium
CN113436180A (en) * 2021-07-07 2021-09-24 京东科技控股股份有限公司 Method, device, system, equipment and medium for detecting spray codes on production line
CN113971779B (en) * 2021-10-29 2022-07-01 中国水利水电科学研究院 Water gauge automatic reading method based on deep learning
CN113971779A (en) * 2021-10-29 2022-01-25 中国水利水电科学研究院 Water gauge automatic reading method based on deep learning
CN114299502B (en) * 2022-03-07 2022-06-17 科大智能物联技术股份有限公司 Method for correcting and identifying inclination of code-spraying characters on end face of round casting blank and storage medium
CN114299502A (en) * 2022-03-07 2022-04-08 科大智能物联技术股份有限公司 Method for correcting and identifying inclination of code-spraying characters on end face of round casting blank and storage medium
CN115578695A (en) * 2022-11-21 2023-01-06 昆明理工大学 Water gauge water level machine vision detection method and device with free shooting visual angle
CN116912845A (en) * 2023-06-16 2023-10-20 广东电网有限责任公司佛山供电局 Intelligent content identification and analysis method and device based on NLP and AI
CN116912845B (en) * 2023-06-16 2024-03-19 广东电网有限责任公司佛山供电局 Intelligent content identification and analysis method and device based on NLP and AI

Similar Documents

Publication Publication Date Title
CN112085024A (en) Tank surface character recognition method
CN111723585B (en) Style-controllable image text real-time translation and conversion method
CN113160192B (en) Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background
CN108765371B (en) Segmentation method of unconventional cells in pathological section
CN111259899B (en) Code spraying character detection method
CN108376244B (en) Method for identifying text font in natural scene picture
CN108918536B (en) Tire mold surface character defect detection method, device, equipment and storage medium
CN111160352B (en) Workpiece metal surface character recognition method and system based on image segmentation
CN111860348A (en) Deep learning-based weak supervision power drawing OCR recognition method
CN113724231B (en) Industrial defect detection method based on semantic segmentation and target detection fusion model
CN110598698B (en) Natural scene text detection method and system based on adaptive regional suggestion network
CN110929795B (en) Method for quickly identifying and positioning welding spot of high-speed wire welding machine
CN111582294A (en) Method for constructing convolutional neural network model for surface defect detection and application thereof
CN110766016B (en) Code-spraying character recognition method based on probabilistic neural network
CN114694038A (en) High-resolution remote sensing image classification method and system based on deep learning
CN113221956B (en) Target identification method and device based on improved multi-scale depth model
CN114155527A (en) Scene text recognition method and device
CN111027538A (en) Container detection method based on instance segmentation model
CN112541491A (en) End-to-end text detection and identification method based on image character region perception
CN114241469A (en) Information identification method and device for electricity meter rotation process
CN110363196B (en) Method for accurately recognizing characters of inclined text
CN113989604A (en) Tire DOT information identification method based on end-to-end deep learning
CN111612802B (en) Re-optimization training method based on existing image semantic segmentation model and application
CN113591850A (en) Two-stage trademark detection method based on computer vision robustness target detection
CN112598013A (en) Computer vision processing method based on neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination