CN116486228A - Paper medicine box steel seal character recognition method based on improved YOLOV5 model


Info

Publication number
CN116486228A
Authority
CN
China
Prior art keywords
model
network
loss
yolov5
improved
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310351717.XA
Other languages
Chinese (zh)
Inventor
王俊茹
黄杨乐天
刘宜胜
向忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Sci Tech University ZSTU
Original Assignee
Zhejiang Sci Tech University ZSTU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Sci Tech University ZSTU filed Critical Zhejiang Sci Tech University ZSTU
Priority to CN202310351717.XA
Publication of CN116486228A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/16 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173 Classification techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The invention discloses a method for recognizing the embossed (steel seal) characters of paper medicine boxes based on an improved YOLOV5 model, which comprises the following steps: acquiring steel seal character images of the medicine boxes with an image acquisition device; preprocessing the images, for example by image enhancement, and labeling the recognition areas; inputting the resulting dataset into the improved YOLOV5 model to obtain a trained YOLOV5 model. The improvements to the YOLOV5 model include: adding the efficient position attention mechanism CA (Coordinate Attention) to the YOLOV5 backbone network, which lets the network attend to position information over a large range without excessive computation, helping to improve model performance and to locate and recognize targets better; replacing the SPPF fast pooling layer with SimSPPF, which increases training speed; and, for dense small targets, additionally using smaller prior boxes in the model and correspondingly adding a smaller detection head. Finally, the dataset to be recognized is input into the trained model to obtain the final prediction result.

Description

Paper medicine box steel seal character recognition method based on improved YOLOV5 model
Technical Field
The invention relates to the field of medicine box target identification, in particular to a paper medicine box embossed character identification method based on an improved YOLOV5 model.
Background
Embossed (steel seal) characters appear frequently on medicine boxes, foods, machine nameplates and industrial materials, and mainly carry information such as factory numbers, production dates and operating conditions. Compared with inkjet or printed characters, the embossing process is relatively simple, does not pollute the environment, resists physical wear, and does not fade or fall off over time, so many medicine boxes use embossed characters for the production date and shelf life of the product. The accuracy of this information is a necessary condition for ensuring that consumers use the medicine within its period of validity, so recognizing the embossed characters on medicine boxes is a key link.
On early production lines, the embossed characters of a product were typically inspected manually. Manual inspection is slow; because judgments of character clarity vary from person to person, the rates of missed and false detections are high, and labor costs are considerable. With the development of technology, machine recognition of embossed characters has gradually matured, and means such as OCR character recognition and template matching have been enriched. For manufacturers using printed or inkjet characters, information such as the production date and shelf life is clear, contrasts strongly with the background color, and is easy to recognize with ordinary image processing. For embossed characters, however, the target and background have similar colors and low contrast; moreover, when a camera captures images of medicine boxes moving quickly on a production line, inaccurate focusing can blur the images, which hinders traditional image processing. Background lighting and the fact that the characters do not lie on a single plane further increase the difficulty of recognition, and the resulting accuracy is not ideal. In recent years, target detection has attracted wide attention in the field of machine vision and is already used by many manufacturers for production-line detection and recognition. Many target detection methods have been proposed, employing excellent neural networks such as CNN, RCNN and VGG. Besides detection accuracy, detection speed must also be considered, since it determines whether the real-time recognition requirements of the production line can be met.
Disclosure of Invention
In view of the above background, in order to improve the precision and efficiency of detecting the various embossed characters of paper medicine boxes, a paper medicine box embossed character recognition method based on an improved YOLOV5 model is provided. The method adds the efficient position attention mechanism CA (Coordinate Attention) to the YOLOV5 model, so that the network can attend to position information over a wide range without excessive computation, improving model performance and locating and recognizing targets better; the SPPF fast pooling layer is replaced with SimSPPF to increase training speed; and for dense small targets, a smaller set of prior boxes is used in the model, together with a correspondingly smaller detection head.
The technical scheme adopted by the invention is as follows:
a paper medicine box steel seal character recognition method based on an improved YOLOV5 model comprises the following steps:
s1, acquiring steel seal character images of clear and fuzzy paper medicine boxes by using a camera, and manually labeling the steel seal character images to prepare a data set in a YOLO format after image enhancement and image expansion;
s2, dividing the data set into a verification set, a training set and a test set according to the proportion;
s3, setting parameters batch_size, works and Epoch according to configuration information of a computer memory, a display card and a display memory which are actually operated;
s4, loading a pre-training model, randomly initializing network weights, and inputting a data set into the improved YOLOV5 model for training;
s5, the YOLOV5 model consists of a Backbone network, a Neck network and a Head network; the method comprises the steps that a back bone network performs feature extraction on a data set, a Neck network double-tower structure samples to further extract features, and a Head network performs target prediction on features of different sizes of anchors of a grid by using the extracted features;
s6, adjusting the prior frame by a training prediction graph obtained by one round of network training to obtain a prediction frame, calculating IoU with a real frame of a target to represent the intersection ratio of the prediction frame and the real frame, and weighting the classification loss, the confidence loss and the positioning loss to obtain the total loss of the network according to the positioning loss obtained by the target regression function CIoU;
s7, reversely transmitting the loss to an improved YOLOV5 model, and updating network weight parameters by means of an SGD random gradient descent method to obtain a new weight model;
s8, repeating the steps S4) -S7) through the Epoch value set in the S3), wherein each round of updated network model parameters is used as a pre-training model of the next round to start a new round of training; index information obtained according to each round; finally obtaining an optimal model in the set training wheel number;
s9, inputting the images to be tested in the test set into the optimal model in the step S8), training, outputting the obtained prediction frames, restraining according to NMS maximum values, grouping all the prediction frames according to labels of each category, sequencing the confidence degrees in the groups from large to small to obtain rectangular frames with highest score, traversing the remaining rectangular frames, calculating the ratio of intersection and union of the rectangular frames with the highest score, removing the remaining rectangular frames if the ratio is larger than a preset threshold, and continuously performing the operation on the remaining detection frames until the final prediction frames are obtained, so that the identification of the steel seal characters of the medicine box is realized; and obtaining a final prediction frame, and realizing the recognition of the seal characters of the medicine box.
Further, the image enhancement and dataset expansion of step S1) include: image contrast adjustment, image blurring, image noise addition, image scaling and image rotation.
Further, the obtained images are annotated with the online web tool MakeSense: the categories to be recognized are preset, the production date to be recognized is framed with a rectangular box and labeled with the corresponding category, and after all images are labeled, the YOLO labels are output in TXT format, the label contents being the category (class), the normalized rectangle center coordinates (x_center, y_center), and the rectangle width (width) and height (height).
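For illustration, the following minimal Python sketch parses one such TXT label line; the numeric values in the example are hypothetical, not taken from the patent's dataset:

```python
def parse_yolo_label(line: str):
    """Parse one YOLO-format label line:
    'class x_center y_center width height', box values normalized to [0, 1]."""
    cls, xc, yc, w, h = line.split()
    return int(cls), float(xc), float(yc), float(w), float(h)

# Hypothetical label line for a production-date region:
print(parse_yolo_label("0 0.512 0.347 0.083 0.061"))
# -> (0, 0.512, 0.347, 0.083, 0.061)
```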
Further, the improved YOLOV5 of step S5) is composed of three networks: the Backbone network, the Neck network and the Head network.
Further, the YOLOV5 model of step S4) includes the following improvements:
a. a CA position attention mechanism module is added before the fast pooling layer; the module splits the input feature map into two directions, extracting features along one direction while preserving position information along the other, applies global average pooling to each direction, and then encodes the feature maps obtained in the two directions.
b. the fast pooling layer SPPF is replaced with the faster SimSPPF, and the SiLU activation function is replaced with the ReLU activation function;
c. for small target detection, a smaller set of prior boxes [5,6, 8,14, 15,11] is added, and a correspondingly smaller detection head is added in the Head network; the original Neck network is extended with a further round of up-sampling so that the feature map keeps expanding, and the resulting feature map is fused with the corresponding feature map in the Backbone network for small target detection.
Further, the weighted total loss of YOLOV5 in step S6) is given by:
LOSS = λ1·L_cls + λ2·L_obj + λ3·L_loc
in this formula: L_cls is the classification loss, computed only for positive samples;
L_obj is the confidence (target) loss, computed over all samples;
L_loc is the localization loss, using CIoU_loss and computed only for positive samples;
λ1, λ2 and λ3 are weight coefficients.
further, in step S6), the CIoU loss function is used as a prediction block regression loss function for improving the YOLOV5 model algorithm, and the correlation formula is as follows:
wherein:
b- -the center point of the prediction box;
b gt -the center point of the real frame;
c- -the diagonal distance of the smallest bounding rectangle that can contain both the predicted and real frames;
w, h- -the width and height of the prediction box;
alpha-weight coefficient;
v- -aspect ratio similarity coefficient.
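As a non-limiting illustration, the following Python sketch computes the CIoU loss for boxes given in (x1, y1, x2, y2) form, following the formula above; it is an assumed re-implementation, not code from the patent:

```python
import math
import torch

def ciou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """CIoU loss, L = 1 - IoU + rho^2(b, b_gt)/c^2 + alpha*v, for (N, 4) boxes in xyxy form."""
    # intersection and union -> IoU
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    w1, h1 = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w2, h2 = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union
    # rho^2: squared distance between box centers; c^2: squared diagonal of the enclosing box
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((pred[:, 0] + pred[:, 2] - target[:, 0] - target[:, 2]) ** 2 +
            (pred[:, 1] + pred[:, 3] - target[:, 1] - target[:, 3]) ** 2) / 4
    # v: aspect-ratio similarity term; alpha: its weight coefficient
    v = (4 / math.pi ** 2) * (torch.atan(w2 / (h2 + eps)) - torch.atan(w1 / (h1 + eps))) ** 2
    with torch.no_grad():
        alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```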
Compared with the prior art, the invention has the following beneficial effects:
1. in the paper medicine box steel seal character recognition method based on the improved YOLOV5 model, a CA position attention mechanism module is added before the fast pooling layer, which helps the network model locate and recognize the steel seal characters of the medicine box better and improves recognition precision;
2. the fast pooling layer SPPF is replaced with the faster SimSPPF, and the SiLU activation function is replaced with the ReLU activation function, which increases training speed and better meets the real-time recognition requirements of medicine box manufacturers' production lines;
3. for small target detection, a smaller set of prior boxes is added, and a correspondingly smaller detection head is added in the Head network; the original Neck network receives one further round of up-sampling so that the feature map keeps expanding to adapt to complex backgrounds, reducing missed detections of small steel seal character targets on medicine boxes and improving the robustness of the algorithm.
Drawings
FIG. 1 is a flow chart of a method for identifying embossed characters of a medicine box according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a model structure of an improved YOLOV5 according to an embodiment of the present invention;
FIG. 3 is a graph of mAP_0.5 results according to an embodiment of the invention.
Detailed Description
In order that the manner in which the invention is practiced may be readily understood, a more particular description of the invention is given below with reference to specific embodiments. In the description of the present invention, it should be noted that the step numbers are for descriptive purposes only and are not to be construed as indicating or implying relative importance. Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings, but the disclosure should not be limited by the embodiments set forth herein.
Based on the above, the embodiment of the invention provides a paper medicine box embossed character recognition method based on an improved YOLOV5 model; the flow is shown in fig. 1 and specifically comprises the following steps:
s1, acquiring a certain number of clear and fuzzy embossed character images of a paper medicine box by using a camera, and carrying out image contrast adjustment, image blurring, image noise adding, image scaling, image rotation and other enhancement expansion images, wherein each type of image is 200 sheets on average, marking the embossed character by using an online tool MakeSence, and making a data set in a YOLO format by using a rectangular frame;
s2, the steel seal image data set of the medicine box is prepared according to the proportion of 8:1:1 is divided into a training set, a validation set and a test set. Respectively placing the TXT-format files into corresponding train, val, test folders under the labels folders, and respectively placing the steel seal character images of the medicine boxes into corresponding train, val, test folders under the images folders;
s3, the computer memory used in the embodiment is 16GB, the display card is NVIDIA GTX1060, the display memory is 6GB, the batch_size=4, the works=2 and the epoch=200 are set according to the memory, a configuration file (yaml) is created, and the configuration file is set as a corresponding file path in the step S2);
s4, setting a path of weight parameters, loading a pre-training model (.pt), randomly initializing network weights, and inputting a data set into an improved YOLOV5 model for training;
s5, the network model consists of a backhaul network, a Neck network and a Head network, and the network structure is shown in figure 2:
the method comprises the steps of a, extracting features by a backbone network, inputting images in a size of 3 x 640, and outputting 64 x 320 images by using a Mosaic data enhancement structure to randomly scale, shear and splice four images to improve a receptive field and reduce the loss of initial image information; then, a Conv layer is adopted, and a specific calculation formula of Conv [ ch_out, kernel, stride, and padding ] is adopted:
Output=(Input-kernel+2*padding)/stride+1
The output image size is therefore 128 x 160 x 160 (with Input = 320, kernel = 3, stride = 2 and padding = 1: (320 - 3 + 2)/2 + 1 = 160); feature fusion is then realized through the C3 layer, which learns residual features without changing the channel numbers of input and output. After 4 rounds of Conv and C3 modules in the Backbone network, four feature maps of different scales (20 x 20 x 1024, 40 x 40 x 512, 80 x 80 x 256, 160 x 160 x 128) are obtained. The CA position attention mechanism module then splits the input feature map into the width and height directions, extracting features along one direction while preserving position information along the other, applies global average pooling to each direction, and encodes the feature maps obtained in the two directions as:
z_c^h(h) = (1/W)·Σ_{0 ≤ i < W} x_c(h, i),  z_c^w(w) = (1/H)·Σ_{0 ≤ j < H} x_c(j, w)
wherein z_c is the output associated with channel c, and H and W are the height and width of the input image.
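A minimal PyTorch sketch of such a CA module follows, modeled on the published Coordinate Attention design; the reduction ratio and the intermediate activation are assumptions, since the text does not fix them:

```python
import torch
import torch.nn as nn

class CoordAtt(nn.Module):
    """Coordinate Attention: pool along each spatial direction, encode jointly,
    then re-weight the input with direction-wise attention maps."""
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # average over width  -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # average over height -> (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        x_h = self.pool_h(x)                      # position preserved along height
        x_w = self.pool_w(x).permute(0, 1, 3, 2)  # position preserved along width
        y = torch.cat([x_h, x_w], dim=2)          # joint encoding of the two directions
        y = self.act(self.bn1(self.conv1(y)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * a_h * a_w
```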
Finally, the fast pooling SimSPPF layer fuses, by max pooling, three successive feature maps produced with identical 5 x 5 kernels, enriching the expressive capacity of the feature map while leaving its size and channel count unchanged;
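A minimal PyTorch sketch of a SimSPPF layer of this kind follows; it mirrors the common SPPF structure (three chained 5 x 5 max-poolings concatenated with the input) with ReLU in place of SiLU, and the channel split is an assumption:

```python
import torch
import torch.nn as nn

class SimConv(nn.Module):
    """Conv + BN + ReLU; the ReLU is what distinguishes SimSPPF from SPPF (which uses SiLU)."""
    def __init__(self, c1: int, c2: int, k: int = 1, s: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class SimSPPF(nn.Module):
    """Three successive 5x5 max-poolings concatenated with their input; spatial size unchanged."""
    def __init__(self, c1: int, c2: int, k: int = 5):
        super().__init__()
        c_ = c1 // 2
        self.cv1 = SimConv(c1, c_, 1, 1)
        self.cv2 = SimConv(c_ * 4, c2, 1, 1)
        self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.m(x)
        y2 = self.m(y1)
        return self.cv2(torch.cat([x, y1, y2, self.m(y2)], dim=1))
```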
the Neck network can better extract the characteristics given by the backhaul, so that the performance of the network is improved, and the characteristics of the double-tower structure, namely the top-down characteristic fusion of the FPN structure and the bottom-up characteristic fusion of the PAN structure are adopted. And the feature map obtained by the rapid pooling layer passes through an upsamples up-sampling module in the Pytorch to obtain a feature map with the size of 40 x 512, the up-sampled feature map is spliced with a feature map with the corresponding size in a backhaul network through a Concat module, and feature fusion is carried out through a C3 module. Because the backup network has completed the extraction of the main characteristic information, the Neck network adopts a C3 module without residual error, so that the large-scale semantic information and the small-scale detail information can be better fused, thereby enhancing the positioning capability on a plurality of scales. The improved YOLOV5 model adds an additional up-sampling round for 4 times, so that a 4-layer feature map is obtained to better detect small targets.
The 4 detection layers of the improved Head network correspond to the 20 x 20, 40 x 40, 80 x 80 and 160 x 160 feature maps obtained in the Neck, respectively. For predicting and regressing targets, each grid cell on a feature map holds initially set prior boxes carrying its location and classification information (an illustrative anchor configuration is sketched below).
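An illustrative 4-scale anchor configuration consistent with the text might look as follows; only the first row (the added smaller prior boxes) is given by the text, while the remaining rows are YOLOV5's default anchors and are an assumption here:

```python
# (w, h) anchor pairs per detection layer, in pixels on the 640x640 input.
anchors = [
    [5, 6, 8, 14, 15, 11],          # extra small-target head, 160x160 feature map
    [10, 13, 16, 30, 33, 23],       # 80x80 feature map
    [30, 61, 62, 45, 59, 119],      # 40x40 feature map
    [116, 90, 156, 198, 373, 326],  # 20x20 feature map
]
```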
S6, one round of network training yields the total loss, which consists of the classification loss, the target (confidence) loss and the localization loss:
a. the prior boxes are adjusted according to the training prediction map to obtain prediction boxes, and the IoU of each prediction box with the ground-truth box of the target is computed as the ratio of their intersection to their union:
IoU = |A ∩ B| / |A ∪ B|
the localization loss is then obtained from the target regression function CIoU given above:
L_loc = L_CIoU = 1 - IoU + ρ^2(b, b_gt)/c^2 + α·v
b. the classification loss is calculated with a binary cross-entropy function:
L_cls = -Σ[y·log(p) + (1 - y)·log(1 - p)]
c. as with the above loss, the confidence loss L_obj is calculated with a binary cross-entropy function.
The classification loss, confidence loss and localization loss are then weighted to obtain the total loss of the network (a short sketch of this weighting follows the formula):
LOSS = λ1·L_cls + λ2·L_obj + λ3·L_loc
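A short Python sketch of this weighting follows; the λ values are illustrative assumptions, and ciou refers to per-box CIoU losses such as those produced by the sketch given earlier:

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def total_loss(cls_logits, cls_targets, obj_logits, obj_targets, ciou,
               lambdas=(0.5, 1.0, 0.05)):
    """LOSS = l1*L_cls + l2*L_obj + l3*L_loc; the lambda values here are placeholders."""
    l_cls = bce(cls_logits, cls_targets)  # classification BCE (caller passes positive samples)
    l_obj = bce(obj_logits, obj_targets)  # confidence BCE over all samples
    l_loc = ciou.mean()                   # mean CIoU loss of the positive samples
    l1, l2, l3 = lambdas
    return l1 * l_cls + l2 * l_obj + l3 * l_loc
```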
S7, back-propagating the loss through the YOLOV5 model and updating the network weight parameters with the SGD stochastic gradient descent method to obtain a new weight model;
S8, repeating steps S4)-S7) for the Epoch value set in S3), using the network model parameters updated in each round as the pre-trained model for the next round of training; each round of training yields index information such as the bounding-box loss, target detection loss, classification loss, P (precision), R (recall), AP (average precision of a single class) and mAP (mean average precision over all classes); the finally trained improved YOLOV5 network gives the optimal model (best.pt) within the set number of training rounds;
S9, inputting the images to be tested from the test set into the optimal model of S8) and outputting the resulting prediction boxes; applying NMS non-maximum suppression (see the sketch after these steps): all prediction boxes are grouped by class label, the confidences within each group are sorted in descending order to find the highest-scoring rectangular box, the remaining boxes are traversed and the ratio of intersection to union with the highest-scoring box is computed, boxes whose ratio exceeds a preset threshold are removed, and the operation is repeated on the remaining detection boxes until the final prediction boxes are obtained, thereby realizing recognition of the steel seal characters of the medicine box;
S10, to judge parameters such as the running speed, computation, accuracy and loss of the model and to assess its generalization capability, the network model is further tuned and its structure optimized by comparing the indexes of different models, and it is finally confirmed whether the actual capability of the network meets the detection standard.
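The per-class NMS of step S9) can be sketched in Python as follows; this is an assumed re-implementation for boxes in (x1, y1, x2, y2) form, not code from the patent:

```python
import torch

def class_wise_nms(boxes, scores, classes, iou_thr=0.45):
    """Group boxes by class label, sort each group by confidence, keep the top box,
    and drop remaining boxes whose IoU with it exceeds the threshold."""
    keep = []
    for c in classes.unique():
        idx = (classes == c).nonzero(as_tuple=True)[0]
        order = idx[scores[idx].argsort(descending=True)]
        while order.numel() > 0:
            best = order[0]
            keep.append(best.item())
            if order.numel() == 1:
                break
            rest = order[1:]
            # IoU of the highest-scoring box against all remaining boxes of this class
            x1 = torch.max(boxes[best, 0], boxes[rest, 0])
            y1 = torch.max(boxes[best, 1], boxes[rest, 1])
            x2 = torch.min(boxes[best, 2], boxes[rest, 2])
            y2 = torch.min(boxes[best, 3], boxes[rest, 3])
            inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
            area_b = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
            area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
            iou = inter / (area_b + area_r - inter)
            order = rest[iou <= iou_thr]
    return torch.tensor(keep, dtype=torch.long)
```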
According to the above embodiment, the average running speed of the SimSPPF layer is 17.16% faster than that of the original SPPF layer, as shown in Table 1; the comparison of the mAP indexes of the YOLOV5 model trained before and after the improvement is shown in Table 2.
Fast pooling layer | Average single-layer running speed (s)
SPPF | 0.170002503
SimSPPF | 0.140821323
Table 1: Average single-layer running speed of the SPPF and SimSPPF layers
Network model | mAP@0.5 (mean average precision at IoU = 0.5)
YOLOV5 | 0.973
Improved YOLOV5 | 0.995
Table 2: mAP comparison of the YOLOV5 model before and after the improvement
According to the above embodiment, the invention provides a paper medicine box steel seal character recognition method based on an improved YOLOV5 model that optimizes the backbone network: a CA position attention module is embedded in the backbone, helping the model locate and recognize targets better and improving the detection precision for small targets; SimSPPF replaces the original SPPF, speeding up recognition and better meeting the real-time detection requirements of a production line; and a smaller prior box set and detection head are added, improving the model's recognition of small-size targets and reducing missed detections of small targets. Even against complex backgrounds, the method effectively addresses missed and false detections of paper medicine box steel seal character targets.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. The generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not limited to the embodiments shown herein, but various modifications and improvements made to the technical solution of the invention shall fall within the scope of protection of the invention.

Claims (7)

1. A paper medicine box steel seal character recognition method based on an improved YOLOV5 model, characterized by comprising the following steps:
s1, acquiring steel seal character images of clear and fuzzy paper medicine boxes by using a camera, and manually labeling the steel seal character images to prepare a data set in a YOLO format after image enhancement and image expansion;
s2, dividing the data set into a verification set, a training set and a test set according to the proportion;
s3, setting parameters batch_size, works and Epoch according to configuration information of a computer memory, a display card and a display memory which are actually operated;
s4, loading a pre-training model, randomly initializing network weights, and inputting a data set into the improved YOLOV5 model for training;
s5, the YOLOV5 model consists of a Backbone network, a Neck network and a Head network; the back bone network performs feature extraction on the data set, the Neck network double-tower structure samples are used for further extracting features, and the Head network performs target prediction on features of different sizes of anchors of the grid by using the extracted features;
s6, adjusting the prior frame by a training prediction graph obtained by one round of network training to obtain a prediction frame, calculating IoU with a real frame of a target to represent the intersection ratio of the prediction frame and the real frame, and weighting the classification loss, the confidence loss and the positioning loss to obtain the total loss of the network according to the positioning loss obtained by the target regression function CIoU;
s7, reversely transmitting the loss to an improved YOLOV5 model, and updating network weight parameters by means of an SGD random gradient descent method to obtain a new weight model;
s8, repeating the steps S4) -S7) through the Epoch value set in the S3), wherein each round of updated network model parameters is used as a pre-training model of the next round to start a new round of training; index information obtained according to each round; finally obtaining an optimal model in the set training wheel number;
s9, inputting the images to be tested in the test set into the optimal model in the step S8), training, outputting the obtained prediction frames, restraining according to NMS maximum values, grouping all the prediction frames according to labels of each category, sequencing the confidence degrees in the groups from large to small to obtain rectangular frames with highest score, traversing the remaining rectangular frames, calculating the ratio of intersection and union of the rectangular frames with the highest score, removing the remaining rectangular frames if the ratio is larger than a preset threshold, and continuously performing the operation on the remaining detection frames until the final prediction frames are obtained, so that the identification of the steel seal characters of the medicine box is realized; and obtaining a final prediction frame, and realizing the recognition of the seal characters of the medicine box.
2. The paper medicine box steel seal character recognition method based on an improved YOLOV5 model as claimed in claim 1, characterized in that the image enhancement and dataset expansion of step S1) include: image contrast adjustment, image blurring, image noise addition, image scaling and image rotation.
3. The paper medicine box steel seal character recognition method based on an improved YOLOV5 model as claimed in claim 2, characterized in that the obtained images are annotated with the online web tool MakeSense: the categories to be recognized are preset, the production date to be recognized is framed with a rectangular box and labeled with the corresponding category, and after all images are labeled, the YOLO labels are output in TXT format, the label contents being the category (class), the normalized rectangle center coordinates (x_center, y_center), and the rectangle width (width) and height (height).
4. The paper medicine box steel seal character recognition method based on an improved YOLOV5 model as claimed in claim 1, characterized in that the improved YOLOV5 of step S5) is composed of three networks: the Backbone network, the Neck network and the Head network.
5. The paper medicine box steel seal character recognition method based on an improved YOLOV5 model as claimed in claim 1, characterized in that the YOLOV5 model of step S4) includes the following improvements:
a. a CA position attention mechanism module is added before the fast pooling layer; the module splits the input feature map into two directions, extracting features along one direction while preserving position information along the other, applies global average pooling to each direction, and then encodes the feature maps obtained in the two directions as:
z_c^h(h) = (1/W)·Σ_{0 ≤ i < W} x_c(h, i),  z_c^w(w) = (1/H)·Σ_{0 ≤ j < H} x_c(j, w)
wherein z_c is the output associated with channel c, and H and W are the height and width of the input image;
b. the fast pooling layer SPPF is replaced with the faster SimSPPF, and the SiLU activation function is replaced with the ReLU activation function;
c. for small target detection, a smaller set of prior boxes [5,6, 8,14, 15,11] is added, and a correspondingly smaller detection head is added in the Head network; the original Neck network is extended with a further round of up-sampling so that the feature map keeps expanding, and the resulting feature map is fused with the corresponding feature map in the Backbone network for small target detection.
6. The paper medicine box steel seal character recognition method based on an improved YOLOV5 model as claimed in claim 1, characterized in that the weighted total loss of YOLOV5 in step S6) is:
LOSS = λ1·L_cls + λ2·L_obj + λ3·L_loc
in this formula: L_cls is the classification loss, computed only for positive samples;
L_obj is the confidence (target) loss, computed over all samples;
L_loc is the localization loss, using CIoU_loss and computed only for positive samples;
λ1, λ2 and λ3 are weight coefficients.
7. The paper medicine box steel seal character recognition method based on an improved YOLOV5 model as claimed in claim 1, characterized in that in step S6) the CIoU loss function is used as the prediction box regression loss function of the improved YOLOV5 model algorithm:
L_CIoU = 1 - IoU + ρ^2(b, b_gt)/c^2 + α·v, with v = (4/π^2)·(arctan(w_gt/h_gt) - arctan(w/h))^2 and α = v/((1 - IoU) + v)
wherein:
b: the center point of the prediction box;
b_gt: the center point of the ground-truth box;
ρ: the Euclidean distance between the two center points;
c: the diagonal length of the smallest rectangle enclosing both the prediction box and the ground-truth box;
w, h: the width and height of the prediction box;
α: the weight coefficient;
v: the aspect-ratio similarity coefficient.
CN202310351717.XA 2023-03-30 2023-03-30 Paper medicine box steel seal character recognition method based on improved YOLOV5 model Pending CN116486228A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310351717.XA CN116486228A (en) 2023-03-30 2023-03-30 Paper medicine box steel seal character recognition method based on improved YOLOV5 model


Publications (1)

Publication Number Publication Date
CN116486228A 2023-07-25

Family

ID=87220502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310351717.XA Pending CN116486228A (en) 2023-03-30 2023-03-30 Paper medicine box steel seal character recognition method based on improved YOLOV5 model

Country Status (1)

Country Link
CN (1) CN116486228A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116994282A (en) * 2023-09-25 2023-11-03 安徽省交通规划设计研究总院股份有限公司 Reinforcing steel bar quantity identification and collection method for bridge design drawing
CN116994282B (en) * 2023-09-25 2023-12-15 安徽省交通规划设计研究总院股份有限公司 Reinforcing steel bar quantity identification and collection method for bridge design drawing

Similar Documents

Publication Publication Date Title
CN109902622B (en) Character detection and identification method for boarding check information verification
CN111598861B (en) Improved Faster R-CNN model-based non-uniform texture small defect detection method
CN111401372A (en) Method for extracting and identifying image-text information of scanned document
CN110503054B (en) Text image processing method and device
CN110598693A (en) Ship plate identification method based on fast-RCNN
CN109740515B (en) Evaluation method and device
CN111914838A (en) License plate recognition method based on text line recognition
CN111652273B (en) Deep learning-based RGB-D image classification method
CN112307919B (en) Improved YOLOv 3-based digital information area identification method in document image
CN113963147B (en) Key information extraction method and system based on semantic segmentation
CN114862845B (en) Defect detection method, device and equipment for mobile phone touch screen and storage medium
CN114155527A (en) Scene text recognition method and device
CN113191348B (en) Template-based text structured extraction method and tool
CN106778717A (en) A kind of test and appraisal table recognition methods based on image recognition and k nearest neighbor
CN116486228A (en) Paper medicine box steel seal character recognition method based on improved YOLOV5 model
CN114972316A (en) Battery case end surface defect real-time detection method based on improved YOLOv5
CN111461133B (en) Express delivery surface single item name identification method, device, equipment and storage medium
CN113393438A (en) Resin lens defect detection method based on convolutional neural network
CN113221956A (en) Target identification method and device based on improved multi-scale depth model
CN113688821B (en) OCR text recognition method based on deep learning
CN110796145A (en) Multi-certificate segmentation association method based on intelligent decision and related equipment
Cheriet et al. Extraction of key letters for cursive script recognition
CN116758545A (en) Paper medicine packaging steel seal character recognition method based on deep learning
CN115953744A (en) Vehicle identification tracking method based on deep learning
CN115937651A (en) Cylindrical roller surface detection method and system based on improved yolov5s network model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination