CN115527210A - OCR character detection method and device based on YOLOv7 algorithm


Info

Publication number
CN115527210A
CN115527210A (application CN202211170987.2A)
Authority
CN
China
Prior art keywords
ocr character
module
character detection
yolov7
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211170987.2A
Other languages
Chinese (zh)
Inventor
姚正
刘超
张庆庆
李建勋
李欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunshan Baiao Software Co ltd
Original Assignee
Kunshan Baiao Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunshan Baiao Software Co ltd filed Critical Kunshan Baiao Software Co ltd
Priority to CN202211170987.2A
Publication of CN115527210A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/146Aligning or centring of the image pick-up or image-field
    • G06V30/147Determination of region of interest
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/191Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses an OCR character detection method and device based on the YOLOv7 algorithm. The detection method comprises the following steps: constructing an OCR character detection network model, wherein the model adopts the YOLOv7 network structure as its basic framework, performs deep feature extraction in the backbone feature extraction network of the YOLOv7 structure by means of depthwise separable convolution operations, and embeds an SE attention mechanism module in the head prediction module of the YOLOv7 structure to make key features explicit and extract them; establishing a training set and a verification set for training, learning and detection verification of the OCR character detection network model; and configuring a calling and deployment module that converts the trained OCR character detection network model into a general model in a unified file format, deploys the general model on an engineering platform, detects the training set and/or the verification set, and outputs OCR character prediction results. The detection method has a small parameter count and computation cost, achieving a lightweight OCR character detection effect with high accuracy and high speed.

Description

OCR character detection method and device based on YOLOv7 algorithm
Technical Field
The invention relates to the technical field of OCR character image processing, in particular to an OCR character detection method and device based on a YOLOv7 algorithm.
Background
In modern life, electronic devices are widely used in production and daily life. OCR characters here refer to characters printed on electronic devices; through them, industry can quickly obtain a device's related production information. Because such characters are constrained by device size and manufacturing processes and are difficult to read quickly and directly with the naked eye, OCR character recognition has long been a subject of close attention for researchers. With the continuous development of computer technology, deep-learning-based OCR methods have made new progress in OCR character detection and, thanks to their powerful feature generalization and expression capability, have become a main research approach in many fields of image processing. At present, two kinds of methods are mainly used for OCR character detection/recognition: traditional methods and deep-learning-based methods.
Traditional OCR character detection methods set corresponding thresholds according to the shape and character features of the OCR characters on the electronic device. Features are screened manually by choosing reasonable thresholds or by applying multiple image-morphology operations, and a detection effect of a certain precision can usually be achieved depending on the actual situation.
Previous OCR character detection models were implemented with deep and wide convolutional neural networks, which usually yield high-precision detection. However, as network depth increases, the parameter count and computation cost of the network also grow greatly, seriously affecting the detection speed and the program memory footprint of the OCR character detection model in industrial deployment.
Therefore, how to obtain a high-accuracy, high-speed OCR character detection model using a detection network with few parameters and a small weight-file footprint has become a technical problem to be solved urgently. In view of the above, the present invention is proposed.
Disclosure of Invention
In order to overcome the above defects, the invention provides an OCR character detection method and device based on the YOLOv7 algorithm, which have few parameters and low computation cost and achieve a lightweight OCR character detection effect with high accuracy and high speed.
The technical scheme adopted by the invention to solve the technical problem is as follows: an OCR character detection method based on the YOLOv7 algorithm comprises the following steps:
constructing an OCR character detection network model, wherein the model adopts the YOLOv7 network structure as its basic framework, performs deep feature extraction in the backbone feature extraction network of the YOLOv7 structure by means of depthwise separable convolution operations, and embeds an SE attention mechanism module in the head prediction module of the YOLOv7 structure to make key features explicit and extract them;
establishing a training set and a verification set for training, learning and detection verification of the OCR character detection network model;
and configuring a calling and deployment module that converts the trained OCR character detection network model into a general model in a unified file format, deploys the general model on an engineering platform, detects the training set and/or the verification set, and outputs OCR character prediction results.
As a further improvement of the present invention, the backbone feature extraction network has four groups of first CBS modules, each composed of a depthwise separable convolution layer, a batch normalization layer BN and the activation function SiLu, and three groups of first combination modules, each composed of an MP1 module and an ELAN module. The four groups of first CBS modules and the three groups of first combination modules are arranged sequentially in data-processing order; the three groups of first combination modules each output a feature map, and the feature maps they output differ in size.
The head prediction module has a second combination module consisting of an SPPCSPC module, two UP modules, four ELAN-H modules and two MP2 modules, and three groups of third combination modules each consisting of a REP module and a CONV module. The second combination module receives and fuses the feature maps output by the first combination modules to obtain three groups of fused feature results; the three groups of third combination modules respectively receive and process the three groups of fused feature results to obtain three groups of network prediction results of different sizes. In addition, the SE attention mechanism module is embedded in the SPPCSPC module.
As a further refinement of the present invention, the depthwise separable convolution layer includes a depthwise convolution with a 3 × 3 kernel and a pointwise convolution with a 1 × 1 kernel;
in each first combination module, the MP1 module is composed of three second CBS modules and one max-pooling layer, the ELAN module is composed of seven third CBS modules and one fully connected layer, and the second and third CBS modules each have the same structure as the first CBS module.
As a further improvement of the present invention, the SPPCSPC module is composed of seven fourth CBS modules, three max-pooling layers and the SE attention mechanism module, the MP2 module is composed of three fifth CBS modules and one max-pooling layer, and the fourth and fifth CBS modules each have the same structure as the first CBS module.
As a further improvement of the present invention, a method for establishing the training set and the verification set comprises:
acquiring a plurality of target object pictures with OCR characters;
labeling the obtained target object picture to obtain a corresponding labeling frame; dividing the marked target object picture into a training sample set and a verification sample set;
performing enhancement processing on the obtained training sample set to obtain the training set;
the validation sample set is used directly as the validation set.
As a further improvement of the invention, based on the training set, the method for performing iterative training on the OCR character detection network model comprises the following steps:
inputting the training set into the OCR character detection network model in batches for forward propagation training, and obtaining a network prediction result of each batch of forward propagation training after passing through an intermediate layer of the OCR character detection network model;
updating and optimizing the OCR character detection network model with a back-propagation algorithm, namely: first, using the loss function of the OCR character detection network model to compute the error between the network prediction result of each batch's forward-propagation training and the labeling frames corresponding to that batch of training samples; then, using an Adam optimizer to update the convolution-kernel weight parameters and intermediate-layer connection parameters of the model until the error value stabilizes or the model reaches the set number of training epochs, thus obtaining the trained OCR character detection network model.
As a further improvement of the invention, based on the verification set, the method for performing detection verification by the OCR character detection network model comprises the following steps:
and inputting the verification set into the trained OCR character detection network model to obtain the size, position and category of an OCR character image, namely completing a target detection task and verifying the OCR character detection network model.
As a further improvement of the invention, the unified file format of the general model is the ONNX format.
The invention also provides an OCR character detection device based on the YOLOv7 algorithm, which comprises the following components:
the OCR character detection network model is configured to adopt the YOLOv7 network structure as its basic framework, perform deep feature extraction in the backbone feature extraction network of the YOLOv7 structure by means of depthwise separable convolution operations, and embed an SE attention mechanism module in the head prediction module of the YOLOv7 structure to make key features explicit and extract them;
a training set configured for training learning for the OCR character detection network model;
a validation set configured for the OCR character detection network model to perform detection validation;
and the calling and deploying module is configured to convert the trained OCR character detection network model into a general model in a unified file format, deploy the general model into an engineering platform, detect the training set and/or the verification set and output an OCR character prediction result.
The beneficial effects of the invention are: through technical innovation, the constructed OCR character detection network model embeds depthwise separable convolution operations and an SE attention mechanism on the basis of the YOLOv7 network structure, so that the model's parameter count and computation cost are small, a lightweight OCR character detection effect with high accuracy and high speed is achieved, and the problems of low accuracy and slow inference in existing OCR detection networks are well resolved.
Drawings
FIG. 1 is a flow chart of the OCR character detection method based on the YOLOv7 algorithm according to the present invention;
FIG. 2 is a block diagram of the architecture of an OCR character detection network model according to the present invention;
FIG. 3 is a block diagram of the MP module according to the present invention;
FIG. 4 is a block diagram of the ELAN module according to the present invention;
FIG. 5 is a block diagram of the structure of the SPPCSPC module according to the present invention;
FIG. 6 is a block diagram of a configuration of an SE attention mechanism module according to the present invention;
fig. 7 is a block diagram of the OCR character detection apparatus based on the YOLOv7 algorithm according to the present invention.
The following description is made with reference to the accompanying drawings:
1. an OCR character detection network model; 10. a backbone feature extraction network; 100. a first CBS module; 101. a first combination module; 11. a head prediction module; 110. a second combination module; 111. a third combination module; 20. a training set; 21. a verification set; 3. a calling and deploying module.
Detailed Description
A preferred embodiment of the present invention will be described in detail below with reference to the accompanying drawings.
Example 1:
please refer to fig. 1, which is a flowchart illustrating an OCR character detection method based on the YOLOv7 algorithm according to the present invention.
The OCR character detection method based on the YOLOv7 algorithm comprises the following operation steps:
s1: constructing an OCR character detection network model 1, wherein the OCR character detection network model 1 adopts a YOLOv7 network structure as a basic frame, deep feature extraction is carried out in a backbone feature extraction network 10 of the YOLOv7 network structure by adopting a depth separable convolution operation mode, specifically, OCR character feature information is extracted by adopting depth separable convolution, and an SE attention mechanism module is embedded in a head prediction module 11 of the YOLOv7 network structure for visualization and extraction of key features, so that the detection precision is improved;
s2: establishing a training set 20 and a verification set 21 for the OCR character detection network model 1 to perform training, learning and detection verification;
s3: and the configuration calling and deploying module 3 is used for converting the trained OCR character detection network model 1 into a universal model in a unified file format, deploying the universal model into an engineering platform, detecting the training set 20 and/or the verification set 21, and outputting an OCR character prediction result.
Through technical innovation, the constructed OCR character detection network model embeds depthwise separable convolution operations and an SE attention mechanism on the basis of the YOLOv7 network structure, so that the model's parameter count and computation cost are small, a lightweight OCR character detection effect with high accuracy and high speed is achieved, and the problems of low accuracy and slow inference in existing OCR detection networks are well resolved.
Example 2:
embodiment 2 provides an OCR character detection network model structure, and the specific structure can be referred to fig. 2.
The OCR character detection network model 1 comprises a backbone feature extraction network 10 and a head prediction module 11. The backbone feature extraction network 10 comprises four first CBS modules 100, each composed of a depthwise separable convolution layer, a batch normalization layer BN and the activation function SiLu, and three first combination modules 101, each composed of an MP1 module and an ELAN module. The four first CBS modules 100 and the three first combination modules 101 are arranged sequentially in data-processing order; the three first combination modules 101 each output a feature map, and the feature maps they output differ in size. The head prediction module 11 has a second combination module 110 composed of an SPPCSPC module, two UP modules, four ELAN-H modules and two MP2 modules, and three groups of third combination modules 111 each composed of a REP module and a CONV module. The second combination module 110 receives and fuses the three groups of feature maps output by the first combination modules 101 to obtain three groups of fused feature results; the three groups of third combination modules 111 respectively receive and process the three groups of fused feature results to obtain three groups of network prediction results (i.e., prediction boxes) of different sizes. In addition, the SE attention mechanism module is embedded in the SPPCSPC module.
Further preferably, the depthwise separable convolution layer includes a depthwise convolution with a 3 × 3 kernel and a pointwise convolution with a 1 × 1 kernel, which reduces the parameter count.
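For illustration, a minimal PyTorch sketch of such a first CBS module 100 is given below; the class and argument names are illustrative, not taken from the patent.

```python
import torch.nn as nn

class DepthwiseSeparableCBS(nn.Module):
    """Sketch of a first CBS module: depthwise separable convolution,
    batch normalization BN, and SiLu activation."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # 3 x 3 depthwise convolution: one filter per input channel
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # 1 x 1 pointwise convolution mixes channels and sets the output width
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```

For C_in input and C_out output channels, the 3 × 3 depthwise plus 1 × 1 pointwise pair needs 9·C_in + C_in·C_out weights instead of the 9·C_in·C_out of a standard 3 × 3 convolution, which is the parameter reduction relied on here.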
Further preferably, in each first combination module 101, the MP1 module is composed of three second CBS modules and one max-pooling layer, the ELAN module is composed of seven third CBS modules and one fully connected layer, and the second and third CBS modules have the same structure as the first CBS module 100. The MP1 module (see fig. 3 for the MP module's structure) mainly serves to make features explicit while reducing dimensionality with minimal information loss. The ELAN module (see fig. 4) expands the channels and cardinality, applies the same group parameters and channel multipliers to the convolution kernels of a computational layer, shuffles the result into several groups, and then concatenates and fuses the groups, enabling the OCR character detection network model 1 to learn diversified and multi-scale feature information.
Further preferably, each first CBS module 100 corresponds to 2 layers. Accordingly, per the backbone feature extraction network 10 structure shown in fig. 2, the MP1 module corresponds to 7 layers, the ELAN module corresponds to 15 layers, and the backbone feature extraction network 10 comprises 93 layers in total.
Further preferably, the SPPCSPC module is composed of seven fourth CBS modules, three max-pooling layers and the SE attention mechanism module, and the MP2 module is composed of three fifth CBS modules and one max-pooling layer; the fourth and fifth CBS modules have the same structure as the first CBS module 100. Regarding the SPPCSPC module, from its structure diagram in fig. 5 together with the SE attention mechanism module diagram in fig. 6 it can be seen that the SPPCSPC module splits its input into two branches: one branch undergoes feature extraction and then max-pooling of features of different sizes through a feature pyramid, so that feature information is explicitly extracted, and the result is concatenated with the feature map of the other branch, which does not pass through the feature-pyramid pooling. As for the MP2 module, the MP1 and MP2 modules are both MP modules and differ mainly in their channel counts.
In addition, the SE attention mechanism module uses conventional techniques and is therefore not described in detail here. The ELAN-H module has the same structure as the ELAN module, but its number of output channels is 1/4 of its number of input channels. The REP module includes one depthwise separable convolution layer with a 3 × 3 kernel and one convolution layer with a 1 × 1 kernel.
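Since the SE module follows conventional practice, the standard squeeze-and-excitation block can be sketched in PyTorch as follows; the reduction ratio of 16 is the common default from the SE literature, not a value stated in this patent.

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Standard squeeze-and-excitation block."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # squeeze: global spatial context
        self.fc = nn.Sequential(              # excitation: per-channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                          # reweight the input channels
```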
In summary, with reference to fig. 2, the overall structure of the OCR character detection network model 1 is as follows. (1) After the four groups of first CBS modules 100, one ELAN module is applied, followed by the three groups of first combination modules 101; the outputs of the three groups of first combination modules 101 correspond to the C3/C4/C5 outputs, and the three output feature maps have sizes 80 × 80 × 512, 40 × 40 × 1024 and 20 × 20 × 1024, respectively. (2) The three groups of feature maps output by the backbone feature extraction network 10 enter the corresponding-size branches of the head prediction module 11: one group (20 × 20 × 1024) enters the SPPCSPC module, where the channel count is reduced from 1024 to 512; the remaining two groups (80 × 80 × 512 and 40 × 40 × 1024) are fused top-down with the outputs of the UP modules to obtain results P3, P4 and P5, which are then fused bottom-up with P4 and P5 to obtain the three groups of fused feature results. Finally, the three groups of fused feature results undergo channel-count adjustment and convolution through the three groups of third combination modules 111; specifically, three output boxes of different sizes are obtained, the intersection-over-union between the ground truth of the training set and the three output boxes is computed and compared, and the prediction box with the highest intersection-over-union value (i.e., the network prediction result) is output.
In addition, the loss function defined in the OCR character detection network model 1 according to the present invention is:
L_total = Σ_{i=1}^{N} ( λ1 · L_conf + λ2 · L_local + λ3 · L_cls )
in the above formula, L_total is the total loss function, N is the number of detection layers, L_conf is the confidence loss function, L_local is the localization loss function, L_cls is the classification loss function, and λ1, λ2 and λ3 are the weights of the three loss functions, respectively.
The three loss functions are specifically described below:
(1) The confidence loss function L_conf has the binary cross-entropy form:
L_conf = -(1/n) · Σ_{i=1}^{n} [ y_i · ln(σ(x_i)) + (1 - y_i) · ln(1 - σ(x_i)) ]
y_i = IOU(b, b^gt)
σ(x_i) = Sigmoid(x_i)
in the above formulas, y_i ∈ [0,1] is the intersection-over-union (IOU) of the predicted target bounding box b and the real target bounding box b^gt, σ denotes the resulting prediction confidence, n is the number of positive and negative samples, and x_i is the probability of the current class obtained after the activation function.
(2) The localization loss function L_local is:
L_local = (x - x1)^2 + (y - y1)^2 + (w - w1)^2 + (h - h1)^2
in the above formula, x, y, w and h are the center coordinates, width and height of the predicted box, and x1, y1, w1 and h1 are the corresponding values of the labeled box.
(3) The classification loss function L_cls likewise has a binary cross-entropy form, summed over the prediction grid:
L_cls = -Σ_{grid ∈ S_i × S_j} Σ_{i=1}^{n} [ y_i · ln(x_i) + (1 - y_i) · ln(1 - x_i) ]
in the above formula, S_i × S_j is the number of grids into which the scale is divided, n is the number of positive and negative samples, x_i is the current class prediction, and y_i is the true value of the current category.
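Under the binary cross-entropy reading of L_conf and L_cls given above, the three components and their weighted combination might be sketched in PyTorch as follows; all function and argument names are illustrative.

```python
import torch.nn.functional as F

def conf_loss(obj_logits, iou_targets):
    # L_conf: binary cross-entropy between predicted confidence sigma(x_i)
    # and the IOU soft label y_i = IOU(b, b^gt)
    return F.binary_cross_entropy_with_logits(obj_logits, iou_targets)

def local_loss(pred_box, gt_box):
    # L_local: squared error over (x, y, w, h), predicted vs. labeled box
    return ((pred_box - gt_box) ** 2).sum(dim=-1).mean()

def cls_loss(cls_logits, cls_targets):
    # L_cls: binary cross-entropy over the per-grid class predictions
    return F.binary_cross_entropy_with_logits(cls_logits, cls_targets)

def total_loss(per_layer_terms, lambdas=(1.0, 1.0, 1.0)):
    # L_total: weighted sum over the N detection layers; the lambda
    # values are placeholders, the patent does not state them
    l1, l2, l3 = lambdas
    return sum(l1 * c + l2 * b + l3 * k for c, b, k in per_layer_terms)
```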
Example 3:
this embodiment 3 provides a method for establishing the training set 20 and the validation set 21.
The method for establishing the training set 20 and the verification set 21 comprises the following steps:
s21: acquiring a plurality of target object pictures with OCR characters;
specifically, a wafer image is acquired through an acquisition device (such as a camera), then a wafer image without OCR characters is removed, and a wafer image with OCR characters is reserved, so that a plurality of target object images with OCR characters are obtained.
S22: labeling the obtained target object picture to obtain a corresponding labeling frame; dividing the marked target object picture into a training sample set and a verification sample set;
specifically, labelme label making software is used for manually labeling each wafer picture with OCR characters, namely, each OCR character is labeled with a label, and after labeling, a corresponding labeling frame, position information (including coordinates of a central point of the labeling frame, frame height and frame width) and category information can be obtained; then, dividing the marked target object picture into a training sample set and a verification sample set, wherein the number proportion of the pictures in the training sample set and the verification sample set is not limited, and the number proportion is preferably 9:1 or 8:2, or others.
S23: performing enhancement processing on the obtained training sample set to obtain the training set 20; the validation sample set may be used directly as the validation set 21.
Specifically, the training sample set is enhanced with modes such as mosaic, mosaic9, mixup and copy_paste to obtain the training set 20. Mosaic randomly draws 4 images (mosaic9, 9 images) from the dataset and splices random regions of them for data enhancement; mixup enhances by blending two random images; and copy_paste enhances by randomly swapping the backgrounds of different images.
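As an example of these modes, a minimal mixup sketch is given below; the beta parameter alpha = 8.0 is a common YOLO default assumed here, since the patent does not specify one, and the two images are assumed to have the same shape.

```python
import numpy as np

def mixup(img_a, img_b, boxes_a, boxes_b, alpha=8.0):
    """Blend two training images pixel-wise and keep both sets of
    labeled boxes; lam is drawn from a Beta(alpha, alpha) distribution."""
    lam = np.random.beta(alpha, alpha)
    mixed = (lam * img_a.astype(np.float32)
             + (1.0 - lam) * img_b.astype(np.float32))
    return mixed.astype(img_a.dtype), boxes_a + boxes_b
```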
Example 4:
based on the training set 20 and the verification set 21 established in embodiment 3, this embodiment 4 provides a method for performing training, learning, detection, and verification on the OCR character detection network model 1, which specifically includes:
i) based on the training set 20, the method for performing iterative training on the OCR character detection network model 1 comprises the following steps:
s24: setting an initial iteration number (e.g. 2000), an initial learning rate (e.g. 10) -4 ) And batch _ size (e.g., 320), and set data set path parameters and category parameters (e.g., 38), trained with the parallel GPU;
s25: inputting the training set 20 into the OCR character detection network model 1 in batches for forward propagation training, and obtaining a network prediction result of each batch of forward propagation training after passing through the middle layer of the OCR character detection network model 1;
specifically, clustering target object pictures in the training set 20 to obtain K prior frames, inputting all the prior frames and original pictures in the training set 20 into the OCR character detection network model 1 to generate a feature map, and obtaining position information, category information and confidence of the feature map relative to the prior frames; obtaining a certain number of candidate frames based on a set confidence threshold value, and position information, category information and confidence of the feature map relative to the prior frame; and performing non-maximum suppression on all the candidate frames to obtain a prediction frame, namely a network prediction result.
S26: update and optimize the OCR character detection network model 1 with a back-propagation algorithm, namely: first, use the loss function of the OCR character detection network model 1 to compute the error between the network prediction result (prediction boxes) of each batch's forward-propagation training and the labeling frames corresponding to that batch of training samples; then, use an Adam optimizer to update the convolution-kernel weight parameters and intermediate-layer connection parameters of the model 1 until the error value stabilizes or the model reaches the set number of training epochs, thus obtaining the trained OCR character detection network model 1.
Specifically, the update formula is:
ω_{t+1} = ω_t - η · ∂L/∂ω_t,   v_{t+1} = v_t - η · ∂L/∂v_t
in the above formulas, η is the step size, typically η = 1 × 10^-5; ω_{t+1} and v_{t+1} denote the updated values of ω_t and v_t, respectively; and ∂L/∂ω_t and ∂L/∂v_t denote the partial derivatives of the loss with respect to the parameters.
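Put together, one iteration of S25 and S26 might look as follows in PyTorch; model, loss_fn and the tensors are placeholders standing for the network, the composite loss L_total and a batch from the training set 20.

```python
import torch

def train_step(model, optimizer, images, targets, loss_fn):
    """One batch of forward-propagation training (S25) followed by the
    back-propagation update of S26."""
    optimizer.zero_grad()
    preds = model(images)            # forward propagation
    loss = loss_fn(preds, targets)   # error against the labeling frames
    loss.backward()                  # back-propagate gradients
    optimizer.step()                 # Adam updates weights and connections
    return loss.item()

# Usage sketch with the example hyperparameters of S24:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```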
II) based on the verification set 21, the method for the OCR character detection network model 1 to perform detection verification (specifically, the method is used for verifying the model detection result of the OCR character detection network model 1 after each iteration training) comprises the following steps:
and inputting the verification set 21 into the trained OCR character detection network model 1 to obtain the size, position and category of an OCR character image, namely completing a target detection task and verifying the OCR character detection network model 1.
Example 5:
this embodiment 5 provides a method for the calling and deploying module 3 to convert and deploy the trained OCR character detection network model 1, which specifically includes:
s51: applying the trained OCR character detection network model 1 weight to a Yolov7 official program, converting a suffix pt format file into a suffix ONNX format, and obtaining the general model, namely, the file format in the general model is the ONNX format;
s52: downloading visual studio 2022 from the official network, downloading NET platform deployment engineering from the Yolov7 official network, replacing ONNX format files in the original Yolov7 with ONNX format files of the general model, and then modifying type parameters, type quantity parameters, size parameters, confidence coefficient parameters and the like to finish the deployment of the general model in the engineering platform;
s53: and inputting the training set 20 and/or the verification set 21 into an engineering platform for detection to obtain all prediction results, and simultaneously performing test analysis.
From the above, the calling and deploying module 3 can be understood as the set comprising the official YOLOv7 program, Visual Studio 2022, and the .NET-platform deployment project downloaded from the YOLOv7 official repository.
The engineering platform detects the training set 20 and/or the verification set 21 as follows: during detection, six evaluation indexes, namely precision, recall, intersection-over-union (IOU), average precision (AP), mean average precision (mAP) and detection speed, are adopted as the evaluation standards of the OCR character detection experiments.
The precision is defined as:
Precision = TP / (TP + FP)
The recall is defined as:
Recall = TP / (TP + FN)
The intersection-over-union is defined as:
IOU = (A ∩ B) / (A ∪ B)
The average precision is defined as:
AP = Σ_k P(k) · ΔR(k)
The mean average precision is defined as:
mAP = (1/n) · Σ_{i=1}^{n} AP_i
In the above formulas, TP denotes positive samples predicted by the model as the positive class, TN denotes negative samples predicted as the negative class, FP denotes negative samples predicted as the positive class, FN denotes positive samples predicted as the negative class, A denotes the predicted OCR character result, B denotes the OCR character label, n denotes the number of classes, and P(k) and ΔR(k) denote the precision and the recall increment, respectively.
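These indexes are standard; for reference, a plain-Python sketch of the first four follows, assuming the TP/FP/FN counts and box coordinates are computed upstream.

```python
def precision(tp, fp):
    # Precision = TP / (TP + FP)
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    # Recall = TP / (TP + FN)
    return tp / (tp + fn) if (tp + fn) else 0.0

def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def average_precision(p_values, delta_r_values):
    # AP = sum over k of P(k) * delta-R(k), per the definition above
    return sum(p * dr for p, dr in zip(p_values, delta_r_values))
```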
Example 6:
this embodiment 6 provides an OCR character detection device based on the YOLOv7 algorithm, and the specific structure can be seen from fig. 7.
The OCR character detection apparatus includes an OCR character detection network model 1, a training set 20, a verification set 21 and a calling and deploying module 3. The OCR character detection network model 1 is configured to adopt the YOLOv7 network structure as its basic framework, perform deep feature extraction in the backbone feature extraction network 10 of the YOLOv7 structure by means of depthwise separable convolution operations, and embed an SE attention mechanism module in the head prediction module 11 of the YOLOv7 structure to make key features explicit and extract them, improving detection accuracy. The training set 20 is configured for the training and learning of the OCR character detection network model 1. The verification set 21 is configured for the detection verification of the OCR character detection network model 1, specifically for verifying the model's detection results after each round of iterative training. The calling and deploying module 3 is configured to convert the trained OCR character detection network model 1 into a general model in a unified file format, deploy the general model on an industrial-application engineering platform, and then detect the training set 20 and/or the verification set 21 and output OCR character prediction results.
Further preferably, the OCR character detection network model 1 may adopt the network model structure shown in embodiment 2.
The training set 20 and the verification set 21 may be established as shown in embodiment 3.
The calling and deploying module 3 may adopt the official YOLOv7 program, Visual Studio 2022 downloaded from its official website, and the .NET-platform deployment project downloaded from the YOLOv7 official repository, as shown in embodiment 5.
The OCR character detection apparatus provided in this embodiment 6 can execute the OCR character detection method based on the YOLOv7 algorithm provided in any embodiment of the present invention, and has corresponding beneficial effects of the execution method.
In conclusion, through technical innovation, the constructed OCR character detection network model embeds depthwise separable convolution operations and an SE attention mechanism on the basis of the YOLOv7 network structure, so that the model's parameter count and computation cost are small, a lightweight OCR character detection effect with high accuracy and high speed is achieved, and the problems of low accuracy and slow inference in existing OCR detection networks are well resolved.
In the preceding description, numerous specific details were set forth to provide a thorough understanding of the present invention. The foregoing describes only preferred embodiments; the invention can be practiced in many ways other than those described here, and it is therefore not limited to the specific implementations disclosed above. Those skilled in the art may, using the methods and techniques disclosed above, make many possible variations and modifications to the disclosed embodiments, or modify them into equivalent embodiments, without departing from the scope of the claimed subject matter. Any simple modification, equivalent change or modification of the above embodiments according to the technical essence of the present invention remains within the scope of the technical solution of the present invention.

Claims (9)

1. An OCR character detection method based on a YOLOv7 algorithm is characterized in that: the method comprises the following steps:
constructing an OCR character detection network model (1), wherein the OCR character detection network model (1) adopts the YOLOv7 network structure as its basic framework, deep feature extraction is performed in a backbone feature extraction network (10) of the YOLOv7 network structure by means of depthwise separable convolution operations, and an SE attention mechanism module is embedded in a head prediction module (11) of the YOLOv7 network structure to make key features explicit and extract them;
establishing a training set (20) and a verification set (21) for the OCR character detection network model (1) to perform training learning and detection verification;
and configuring a calling and deploying module (3) that converts the trained OCR character detection network model (1) into a general model in a unified file format, deploys the general model on an engineering platform, detects the training set (20) and/or the verification set (21), and outputs OCR character prediction results.
2. An OCR character detection method based on the YOLOv7 algorithm according to claim 1, characterized in that: the backbone feature extraction network (10) has four first CBS modules (100), each composed of a depthwise separable convolution layer, a batch normalization layer BN and the activation function SiLu, and three first combination modules (101), each composed of an MP1 module and an ELAN module; the four first CBS modules (100) and the three first combination modules (101) are arranged sequentially in data-processing order, the three first combination modules (101) each output a feature map, and the feature maps output by the three first combination modules (101) differ in size;
the head prediction module (11) has a second combination module (110) consisting of an SPPCSPC module, two UP modules, four ELAN-H modules and two MP2 modules, and three third combination modules (111) each consisting of a REP module and a CONV module, wherein the second combination module (110) is used for receiving and fusing the three groups of feature maps output by the first combination modules (101) to obtain three groups of fused feature results; the three groups of third combination modules (111) are respectively used for receiving and processing the three groups of fused feature results to obtain three groups of network prediction results of different sizes; in addition, the SE attention mechanism module is embedded in the SPPCSPC module.
3. An OCR character detection method based on the YOLOv7 algorithm according to claim 2, characterized in that: the depthwise separable convolution layer comprises a depthwise convolution with a 3 × 3 kernel and a pointwise convolution with a 1 × 1 kernel;
in each first combination module (101), the MP1 module is composed of three second CBS modules and one max-pooling layer, the ELAN module is composed of seven third CBS modules and one fully connected layer, and the second and third CBS modules each have the same structure as the first CBS module (100).
4. An OCR character detection method based on the YOLOv7 algorithm according to claim 2, characterized in that: the SPPCSPC module is composed of seven fourth CBS modules, three max-pooling layers and the SE attention mechanism module, the MP2 module is composed of three fifth CBS modules and one max-pooling layer, and the fourth and fifth CBS modules each have the same structure as the first CBS module (100).
5. An OCR character detection method based on the YOLOv7 algorithm according to claim 1, characterized in that: the method for establishing the training set (20) and the verification set (21) comprises:
acquiring a plurality of target object pictures with OCR characters;
labeling the obtained target object picture to obtain a corresponding labeling frame; dividing the marked target object picture into a training sample set and a verification sample set;
performing enhancement processing on the obtained training sample set to obtain the training set (20);
the verification sample set is directly used as the verification set (21).
6. An OCR character detection method based on the YOLOv7 algorithm according to claim 5, characterized in that: based on the training set (20), the method for iteratively training the OCR character detection network model (1) comprises the following steps:
inputting the training set (20) into the OCR character detection network model (1) in batches for forward propagation training, and obtaining a network prediction result of each batch of forward propagation training after passing through the middle layer of the OCR character detection network model (1);
updating and optimizing the OCR character detection network model (1) with a back-propagation algorithm, namely: first, using the loss function of the OCR character detection network model (1) to compute the error between the network prediction result of each batch's forward-propagation training and the labeling frames corresponding to that batch of training samples; then, using an Adam optimizer to update the convolution-kernel weight parameters and intermediate-layer connection parameters of the OCR character detection network model (1) until the error value stabilizes or the model reaches the set number of training epochs, to obtain the trained OCR character detection network model (1).
7. An OCR character detection method based on the YOLOv7 algorithm according to claim 5, characterized in that: based on the verification set (21), the method for detection verification of the OCR character detection network model (1) comprises the following steps:
and inputting the verification set (21) into the trained OCR character detection network model (1) to obtain the size, position and category of an OCR character image, namely completing a target detection task and verifying the OCR character detection network model (1).
8. An OCR character detection method based on the YOLOv7 algorithm according to claim 1, characterized in that: the unified file format of the general model is the ONNX format.
9. An OCR character detection device based on a YOLOv7 algorithm is characterized in that: the method comprises the following steps:
the OCR character detection network model (1) is configured to adopt the YOLOv7 network structure as its basic framework, perform deep feature extraction in a backbone feature extraction network (10) of the YOLOv7 network structure by means of depthwise separable convolution operations, and embed an SE attention mechanism module in a head prediction module (11) of the YOLOv7 network structure to make key features explicit and extract them;
a training set (20) configured for training learning for the OCR character detection network model (1);
a validation set (21) configured for detection validation by the OCR character detection network model (1);
and the calling and deploying module (3) is configured to convert the trained OCR character detection network model (1) into a general model in a unified file format and, after the general model is deployed on an engineering platform, detect the training set (20) and/or the verification set (21) and output OCR character prediction results.
Application CN202211170987.2A, filed 2022-09-23 (priority 2022-09-23): OCR character detection method and device based on YOLOv7 algorithm. Status: Pending. Published as CN115527210A.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211170987.2A CN115527210A (en) 2022-09-23 2022-09-23 OCR character detection method and device based on YOLOv7 algorithm


Publications (1)

Publication Number Publication Date
CN115527210A 2022-12-27

Family

ID=84698960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211170987.2A Pending CN115527210A (en) 2022-09-23 2022-09-23 OCR character detection method and device based on YOLOv7 algorithm

Country Status (1)

Country Link
CN (1) CN115527210A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116246282A (en) * 2023-02-10 2023-06-09 青海师范大学 Scene Tibetan detection method based on improved double-attention YOLOv7
CN116129566A (en) * 2023-04-18 2023-05-16 松立控股集团股份有限公司 Intelligent parking spot lock linkage method
CN116563650A (en) * 2023-07-10 2023-08-08 邦世科技(南京)有限公司 Deep learning-based endplate inflammatory degeneration grading method and system
CN116563650B (en) * 2023-07-10 2023-10-13 邦世科技(南京)有限公司 Deep learning-based endplate inflammatory degeneration grading method and system
CN116935473A (en) * 2023-07-28 2023-10-24 山东智和创信息技术有限公司 Real-time detection method and system for wearing safety helmet based on improved YOLO v7 under complex background
CN117315614A (en) * 2023-11-28 2023-12-29 南昌大学 Traffic target detection method based on improved YOLOv7
CN117315614B (en) * 2023-11-28 2024-03-29 南昌大学 Traffic target detection method based on improved YOLOv7


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination