CN113591866B - Special operation certificate detection method and system based on DB and CRNN - Google Patents
- Publication number: CN113591866B (application CN202110865778.9A)
- Authority: CN (China)
- Legal status: Active (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G06F18/241 — Pattern recognition; analysing; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/044 — Computing arrangements based on biological models; neural networks; recurrent networks, e.g. Hopfield networks
- G06N3/045 — Computing arrangements based on biological models; neural networks; combinations of networks
- G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
- Y04S10/50 — Systems supporting electrical power generation, transmission or distribution; systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention discloses a special operation certificate detection method and system based on DB and CRNN. The method comprises the following steps: inputting each target special operation certificate image into a DB text detection network model to determine a text box data set corresponding to each target special operation certificate image; and inputting each target special operation certificate image together with its corresponding text box data set into a CRNN text recognition network model to determine the text information in each target text box of each target special operation certificate image. The Backbone module of the DB text detection network adopts a MobileNetV3-large structure; part of the CNN module in the CRNN text recognition network adopts a MobileNetV3-small structure. The invention reduces the manual workload and improves the detection efficiency for certificate images.
Description
Technical Field
The invention relates to the technical field of optical character recognition, in particular to a special operation certificate detection method and system based on DB and CRNN.
Background
During the construction of 5G base stations, it is an indispensable safety requirement that construction workers hold qualified and valid special operation certificates. At present, these certificates are mostly checked manually, which is inefficient and cannot provide timely and effective feedback on the inspection results.
Disclosure of Invention
The invention aims to provide a special operation certificate detection method and system based on DB and CRNN, so as to achieve the purposes of reducing the manual workload and improving the certificate image detection efficiency.
In order to achieve the above object, the present invention provides the following solutions:
a special operation certificate detection method based on DB and CRNN comprises the following steps:
acquiring a special operation certificate image data set, the data set comprising a plurality of target special operation certificate images, each having text information; inputting each target special operation certificate image into a DB text detection network model to determine a corresponding text box data set, the elements of which represent the position information of the target text boxes; and inputting each target special operation certificate image together with its text box data set into a CRNN text recognition network model to determine the text information in each target text box, the text information comprising at least one of the construction worker's name, the construction worker's sex, the certificate number, the operation category, and the certificate validity date;
The DB text detection network model is obtained by training a DB text detection network on a first training data set; the Backbone module of the DB text detection network adopts a MobileNetV3-large structure; each element in the first training data set comprises a historical special operation certificate image and a corresponding first-class label, the first-class label being the position information of the historical text boxes. The CRNN text recognition network model is obtained by training a CRNN text recognition network on a second training data set; part of the CNN module in the CRNN text recognition network adopts a MobileNetV3-small structure; each element in the second training data set comprises a historical special operation certificate image and a corresponding second-class label, the second-class label being the historical text information.
A special operation certificate detection system based on DB and CRNN, comprising:
the data acquisition module is used for acquiring a special operation certificate image data set, which comprises a plurality of target special operation certificate images, each having text information; the text box data set determining module is used for inputting each target special operation certificate image into the DB text detection network model to determine a corresponding text box data set, the elements of which represent the position information of the target text boxes; and the text information determining module is used for inputting each target special operation certificate image together with its corresponding text box data set into the CRNN text recognition network model to determine the text information in each target text box, the text information comprising at least one of the construction worker's name, the construction worker's sex, the certificate number, the operation category, and the certificate validity date;
The DB text detection network model is obtained by training a DB text detection network on a first training data set; the Backbone module of the DB text detection network adopts a MobileNetV3-large structure; each element in the first training data set comprises a historical special operation certificate image and a corresponding first-class label, the first-class label being the position information of the historical text boxes. The CRNN text recognition network model is obtained by training a CRNN text recognition network on a second training data set; part of the CNN module in the CRNN text recognition network adopts a MobileNetV3-small structure; each element in the second training data set comprises a historical special operation certificate image and a corresponding second-class label, the second-class label being the historical text information.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention adopts a DB text detection network model and a CRNN text recognition network model, and can rapidly and accurately detect special operation certificate images. The DB text detection network model works well with a lightweight network as its feature extraction module; after the model is made lightweight, it rapidly predicts the text in special operation certificates without consuming extra memory or time, marks text regions with boxes, extracts the text regions from the image, and obtains the bounding-box information of the text targets. The CRNN text recognition network model performs text recognition on the predicted text block images. Because the special operation certificate image data consist of short texts, BiLSTM and CTC mechanisms are introduced, which strengthen the global prediction of the text feature sequence and learn directly from line-level labels of the short texts, without requiring additional, detailed character-level labels for training, thereby improving the accuracy and efficiency of text recognition.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a special operation certificate detection method based on DB and CRNN;
FIG. 2 is a schematic diagram of a special operation certificate detection system based on DB and CRNN;
FIG. 3 is an overall flow chart of the method for detecting special operation certificates based on DB and CRNN of the invention;
FIG. 4 is a schematic diagram of the overall structure of the DB text detection network of the present invention;
fig. 5 is a schematic diagram of the overall structure of the CRNN text recognition network according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
The invention uses optical character recognition technology based on deep learning models to efficiently detect special operation certificate images. Existing deep-learning-based optical character recognition methods mainly adopt a two-stage approach: text detection followed by text recognition. The DB (Differentiable Binarization) algorithm used in the invention moves the binarization operation into the network at the pixel level and optimizes it jointly, so that the threshold of each pixel can be predicted adaptively; the binarization remains differentiable in its approximate form when used together with a segmentation network, which simplifies the post-processing and accelerates target detection. The CRNN (Convolutional Recurrent Neural Network) algorithm used in the invention combines a CNN, an LSTM (Long Short-Term Memory) network, and CTC (Connectionist Temporal Classification); the CTC method solves the problem that characters cannot be aligned during training, and no serial decoding operation is needed as in Attention OCR, so the network structure is more streamlined.
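The differentiable binarization at the heart of DB can be illustrated with a minimal pure-Python sketch of the published DB formula (an illustration, not the patent's implementation): instead of a hard step function, a steep sigmoid with amplification factor k combines each pixel's predicted probability and adaptive threshold, so gradients can flow through the binarization during training.

```python
import math

def approximate_binarization(prob, thresh, k=50.0):
    """Differentiable approximation of hard thresholding used by DB.

    A hard binarization (1 if prob > thresh else 0) has no useful gradient;
    the steep sigmoid below approaches it as k grows (k = 50 is the value
    used in the DB paper) while staying differentiable.
    """
    return 1.0 / (1.0 + math.exp(-k * (prob - thresh)))
```

A pixel whose probability clearly exceeds its adaptive threshold maps to a value near 1, one clearly below it to a value near 0, and the transition around the threshold is smooth, which is what allows the threshold map itself to be learned.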
Example 1
This embodiment discloses a special operation certificate detection method based on DB and CRNN, which predicts text positions in the certificate image data set and recognizes the specific text content to support the detection of special operation certificates, thereby judging whether a certificate is qualified. The method belongs to the field of computer vision recognition, in particular optical character recognition. Referring to fig. 1, the method for detecting a special operation certificate based on DB and CRNN according to the present embodiment includes the following steps.
Step 101: acquiring a special operation certificate image data set; the data set comprises a plurality of target special operation certificate images, each having text information. Step 102: inputting each target special operation certificate image into a DB text detection network model to determine a corresponding text box data set; the elements in the text box data set represent the position information of the target text boxes. Step 103: inputting each target special operation certificate image and its corresponding text box data set into a CRNN text recognition network model to determine the text information in each target text box; the text information includes at least one of the construction worker's name, the construction worker's sex, the certificate number, the operation category, and the certificate validity date. Step 104: determining, based on the text information, whether each special operation certificate meets the construction operation requirements.
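Steps 101 to 104 can be sketched as a single pipeline function. This is a hypothetical sketch: `db_model`, `crnn_model`, and `judge` are placeholder names standing in for the trained DB detector, the trained CRNN recognizer, and the custom judgment logic, none of which are specified at code level in the patent.

```python
def detect_certificates(images, db_model, crnn_model, judge):
    """Two-stage detection pipeline over a batch of certificate images."""
    results = []
    for image in images:
        boxes = db_model(image)                            # step 102: text box positions
        texts = [crnn_model(image, box) for box in boxes]  # step 103: text in each box
        results.append(judge(texts))                       # step 104: meets requirements?
    return results
```

Any callables with these shapes can be plugged in, which also makes the two-stage structure easy to test in isolation.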
The DB text detection network model is obtained by training a DB text detection network on a first training data set; the Backbone module of the DB text detection network adopts a MobileNetV3-large structure; each element in the first training data set comprises a historical special operation certificate image and a corresponding first-class label, the first-class label being the position information of the historical text boxes. The CRNN text recognition network model is obtained by training a CRNN text recognition network on a second training data set; part of the CNN module in the CRNN text recognition network adopts a MobileNetV3-small structure; each element in the second training data set comprises a historical special operation certificate image and a corresponding second-class label, the second-class label being the historical text information. The training process of the DB text detection network model and the CRNN text recognition network model is described in the third embodiment and is not repeated here.
Example two
Referring to fig. 2, the special operation certificate detection system provided in this embodiment includes:
a data acquisition module 201, configured to acquire a special operation certificate image data set; the data set comprises a plurality of target special operation certificate images, each having text information. A text box data set determining module 202, configured to input each target special operation certificate image into a DB text detection network model to determine a corresponding text box data set; the elements in the text box data set represent the position information of the target text boxes. A text information determining module 203, configured to input each target special operation certificate image and its corresponding text box data set into a CRNN text recognition network model to determine the text information in each target text box; the text information includes at least one of the construction worker's name, the construction worker's sex, the certificate number, the operation category, and the certificate validity date. And a detection module 204, configured to determine, based on the text information, whether each special operation certificate meets the construction operation requirements.
The text box data set determining module 202 specifically: preprocesses each target special operation certificate image (the preprocessing is the same as in the third embodiment and is not repeated here), and inputs each preprocessed target special operation certificate image into the DB text detection network model to determine the corresponding text box data set.
The text information determining module 203 specifically: preprocesses each target special operation certificate image (the preprocessing is the same as in the third embodiment and is not repeated here), and inputs each preprocessed image together with its corresponding text box data set into the CRNN text recognition network model to determine the text information in each target text box.
For details of the DB text detection network model and the CRNN text recognition network model, see embodiment one. The training process of the DB text detection network model and the CRNN text recognition network model is described in the third embodiment, and is not repeated here.
Example III
During 5G base station construction, the certificate image data set formed from special operation certificate images is characterized by large text aspect ratios, an orderly layout, and a large data volume. Although detection methods with preset box shapes can obtain accurate text targets for regularly shaped text, such text boxes cannot describe text of certain special shapes well (such as an excessively large aspect ratio or an arc shape); segmentation-based detection methods such as PSENet and LSAE can detect irregularly shaped text, but they require complex post-processing to assemble pixel-level results into text lines, their prediction cost is high, and they cannot meet the requirement of fast detection over a large amount of data.
The invention adopts a DB text detection network model in which the binarization operation is made differentiable, which overcomes the complex post-processing and heavy time consumption of segmentation-based methods, improves the detection speed, and enables rapid detection of large amounts of certificate image data. Meanwhile, in choosing a text recognition method, the invention avoids the problems of high training-sample requirements, additional calculation parameters in the transcription layer, and low detection speed: since special operation certificate images contain short texts, the invention selects a CRNN text recognition network model, which is faster, achieves higher recognition accuracy on short text segments, and needs no additional calculation parameters, thereby ensuring efficient text recognition and detection.
Therefore, based on the DB text detection network model and the CRNN text recognition network model, the invention first manually screens out low-quality special operation certificate images, then uses a semi-automatic labeling method to obtain a high-standard image data set and training data set; semi-automatic labeling completes the annotation of special operation certificate images quickly while ensuring high labeling accuracy. The DB text detection network is then trained on this data set to predict and frame the text positions in the special operation certificate images, and the CRNN text recognition network is trained to recognize the text information at the calibrated positions. Because special operation certificate images are regular, of a single type, and easy to detect, a MobileNetV3 network is selected as the feature extraction module of both the DB text detection network model and the CRNN text recognition network model, which keeps the models lightweight and improves detection speed while maintaining detection accuracy. In the detection stage, based on the certificate text information predicted and recognized by the two models, custom judgment logic further infers the certificate type and validity period in the special operation certificate image, thereby rapidly completing the safety check of the special operation certificate.
Referring to fig. 3, the method for detecting a special job certificate based on DB and CRNN according to the embodiment of the present invention is divided into 4 steps.
Step (1): generating the certificate image data set. Specifically: acquire special operation certificate images of construction workers from 5G base station construction sites; each special operation certificate image contains the construction worker's name, sex, certificate number, operation category, certificate validity date, and the like. Next, quickly label each special operation certificate image with a semi-automatic labeling tool to obtain a labeled data set, and then preprocess the labeled data set. Finally, divide the preprocessed labeled data set (i.e., the generated certificate image data set) into a training set and a test set. Step (2): construction and training of the DB text detection network. Specifically: construct the Backbone, Neck, and Head modules of the DB text detection network in sequence. The Backbone module adopts a MobileNetV3-large structure and serves as a feature pyramid to extract features from the input image, yielding feature maps; the Neck module adopts an FPN (Feature Pyramid Network) structure to further process the feature maps; and the Head module processes the resulting feature maps for output, predicting a probability map and a threshold map, from which an approximate binary map is obtained. After preparing the files and setting the parameters required for training, train the DB text detection network on the training set from step (1). Step (3): construction and training of the CRNN text recognition network. Specifically: construct the CNN module, the BiLSTM (Bi-directional Long Short-Term Memory) module, and the CTC network structure of the CRNN text recognition network in sequence.
Part of the CNN module adopts a MobileNetV3-small structure and extracts features from the text image; the BiLSTM module fuses the extracted feature maps into feature vectors and further extracts the contextual features of the character sequence, yielding a probability distribution for each column of features; the CTC network structure takes these hidden-vector probability distributions as input and predicts the text sequence. After preparing the files and setting the parameters required for training, train the CRNN text recognition network on the training set from step (1). Step (4): detecting the special operation certificates of construction workers. Specifically: predict on the certificate images to be detected with the DB text detection network model to obtain the target text boxes and their coordinate positions, recognize the text inside the target text boxes with the CRNN text recognition network model, and finally complete the detection of the special operation certificates through the custom special-operation-certificate judgment logic.
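The CTC stage described above can be made concrete with a best-path (greedy) decoder sketch (an assumed simplification: real CRNN implementations may use beam search, and the charset and blank index here are illustrative). Per time step the most probable class is taken, consecutive repeats are collapsed, and blanks are removed.

```python
def ctc_greedy_decode(logits, charset, blank=0):
    """Best-path CTC decoding: argmax per time step, collapse repeats,
    drop blanks. `logits` is a list of per-time-step score lists over
    [blank] + charset; class i > 0 maps to charset[i - 1]."""
    decoded, prev = [], None
    for frame in logits:
        idx = max(range(len(frame)), key=frame.__getitem__)
        if idx != blank and idx != prev:
            decoded.append(charset[idx - 1])
        prev = idx
    return "".join(decoded)
```

This is why CRNN can train from line-level labels only: CTC marginalizes over all alignments during training, and decoding recovers the label sequence without character-level position annotations.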
The step (1) specifically comprises:
1.1: labeling an image; the method comprises the following steps: the special operation certificate image data set obtained by sampling the 5G base station construction site has low quality samples of the phenomena of blocked characters, excessive exposure of the certificate, blurry characters, unclear characters, too small a certificate duty ratio image, incapability of accurately classifying and identifying a plurality of certificates in one image, and the like, and needs to be manually screened and removed, thereby obtaining the original certificate image data set L= { L 1 ,l 2 ,...,l n }。
The original certificate image data set L has the following characteristics: (1) Compared with other types of image data (such as shop signs, street signs, and clothing tags), special operation certificate image data are regular, orderly, and easy to label. (2) A special operation certificate image carries fixed text fields such as name, sex, certificate number, operation category, date of first issue, and validity period. (3) The certificate text is clearly visible and unoccluded, and the certificate's proportion of the image meets the required ratio (e.g., 80%). (4) The certificate images are of a single type, so text detection and text recognition are easy to realize.
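The screening step is manual in the patent, but its acceptance criteria can be expressed as a filter for illustration (a hypothetical sketch: the quality flags and the `cert_ratio` field are assumed names, and the 0.8 default mirrors the 80% example ratio above):

```python
def screen_images(samples, min_cert_ratio=0.8):
    """Keep only samples with none of the defect flags set and with the
    certificate occupying at least `min_cert_ratio` of the image.
    Each sample is a dict of quality attributes (illustrative schema)."""
    defects = ("occluded", "overexposed", "blurred", "multiple_certs")
    return [s for s in samples
            if not any(s.get(flag) for flag in defects)
            and s.get("cert_ratio", 0.0) >= min_cert_ratio]
```

Writing the criteria down this way also documents, in one place, exactly which defect classes disqualify a sample.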
Given these characteristics, the embodiment of the invention compares manual labeling with semi-automatic labeling and finds that, for certificate images that are regular, orderly and easy to label, semi-automatic labeling markedly improves labeling efficiency. Therefore, the original certificate image dataset L is labeled with a semi-automatic method, comprising steps 1.1.1-1.1.2 below.
1.1.1: Starting from the original certificate image dataset L, the automatic stage of semi-automatic labeling is performed using PPOCRLabel (PaddlePaddle OCR Label). PPOCRLabel uses a built-in OCR model (comprising a text detection model and a text recognition model) to predict the text in each image of L, frame the corresponding text regions, and recognize the text inside each frame, producing the automatically labeled dataset L' = {L'_1, L'_2, ..., L'_n}, where L'_i = {L'_i1, L'_i2, ..., L'_it}. Each labeled image L'_i contains t text prediction boxes; because the certificate is standardized image data, t is always a fixed value. Each box L'_ij = {L'_ij1, L'_ij2, L'_ij3, L'_ij4, L'_ij-t} carries 5 data values: L'_ij1, L'_ij2, L'_ij3, L'_ij4 are the upper-left, lower-left, upper-right and lower-right corner coordinates of the predicted text box, and L'_ij-t is the predicted text content inside the box.
1.1.2: The second stage of semi-automatic labeling, manual screening and confirmation, is performed on the dataset L'. If a text box was missed or its coordinate values are wrong, the coordinates are corrected manually; if the text in a box was misrecognized, its content is corrected manually. This yields the labeled dataset X = {X_1, X_2, ..., X_n}. Each certificate image X_i in the labeled dataset has t text prediction boxes, X_i = {X_i1, X_i2, ..., X_it}, and each text prediction box X_ij has 5 data values, X_ij = {X_ij1, X_ij2, X_ij3, X_ij4, X_ij-t}, where X_ij1, X_ij2, X_ij3, X_ij4 are the upper-left, lower-left, upper-right and lower-right corner coordinates of the box and X_ij-t is the text content inside it. After labeling, the labeled dataset X and the corresponding annotation file Label are obtained for training the DB text detection network and the CRNN text recognition network: X_ij1, X_ij2, X_ij3, X_ij4 serve as training labels for the DB text detection network, and X_ij-t serves as training labels for the CRNN text recognition network.
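As an illustration, the labeled dataset X can be materialized as a PPOCRLabel-style annotation file. The sketch below parses one line into the per-box structure described above; the layout (an image path, a tab, then a JSON list with "transcription" and "points" keys) is assumed from the common PPOCRLabel Label.txt format and the file name and text are hypothetical.

```python
import json

def parse_label_line(line):
    """Parse one PPOCRLabel-style annotation line into (image_path, boxes)."""
    path, ann = line.rstrip("\n").split("\t", 1)
    boxes = []
    for item in json.loads(ann):
        boxes.append({
            "text": item["transcription"],   # assumed key: recognized text (label X_ij-t)
            "points": item["points"],        # assumed key: four corner coordinates
        })
    return path, boxes

line = ('certs/img_001.jpg\t'
        '[{"transcription": "ELECTRICIAN", '
        '"points": [[10, 20], [110, 20], [110, 45], [10, 45]]}]')
path, boxes = parse_label_line(line)
```

The corner coordinates feed DB text detection training and the transcription feeds CRNN text recognition training, matching the label split described above.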
1.2: Dividing the dataset, specifically: the labeled dataset X from step 1.1 is divided into a training set X_train and a test set X_test. The training set X_train (80%) is used to train the DB text detection network and the CRNN text recognition network; the test set X_test (20%) is used to test the trained networks.
1.3: the data preprocessing is specifically as follows:
1.3.1: Decoding the labeled dataset X, specifically: the labeled dataset X is input, and the data of each original image X_i is converted in turn into a uint8 matrix and then decoded, i.e. the image is restored from JPEG format to a three-dimensional matrix. The color format of the decoded image is BGR (Blue × Green × Red) and the matrix dimensions are ordered HWC (Height × Width × Channel), yielding the pixel matrix dataset X_m = {X_1m, X_2m, ..., X_nm}.
1.3.2: Normalizing the pixel matrix dataset X_m, specifically: the dataset X_m is input, and each pixel of every image X_im (i = 1, 2, ..., n) is mapped to the interval [0, 1]. During mapping, each pixel value is divided by 255 (the linear transformation parameter that maps pixel values from [0, 255] to [0, 1]), the mean of the corresponding channel is subtracted, and the result is divided by the standard deviation of the corresponding channel, yielding the normalized result dataset X'_m.
1.3.3: Rearranging the normalized result dataset X'_m, specifically: the dataset X'_m is input, and the pixels of each image X'_im are rearranged so that the dimensions of the image matrix are converted from HWC format (Height × Width × Channel) to CHW format (Channel × Height × Width), yielding a new certificate image dataset X''_m.
1.3.4: Scaling the images of the certificate image dataset X''_m, specifically: the dataset X''_m is input; if the length or width of an image X''_im exceeds the specified maximum size or falls below the specified minimum size, the image is scaled so that each side length becomes an integral multiple of 32 within the specified range, with blank areas filled with 0, yielding the preprocessed certificate image dataset X''.
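The preprocessing chain of steps 1.3.2-1.3.4 can be sketched in NumPy as follows; the per-channel mean and standard deviation below are the common ImageNet statistics, an assumption, since the patent does not state its values.

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406])  # assumed per-channel mean (ImageNet stats)
STD  = np.array([0.229, 0.224, 0.225])  # assumed per-channel std  (ImageNet stats)

def preprocess(img_u8, max_side=640):
    """Steps 1.3.2-1.3.4 sketch: normalize, HWC->CHW, pad sides to multiples of 32."""
    x = img_u8.astype(np.float32) / 255.0      # linear map [0, 255] -> [0, 1]
    x = (x - MEAN) / STD                       # per-channel normalization (1.3.2)
    x = x.transpose(2, 0, 1)                   # HWC -> CHW rearrangement (1.3.3)
    c, h, w = x.shape
    H = min(max_side, ((h + 31) // 32) * 32)   # round each side up to a multiple of 32
    W = min(max_side, ((w + 31) // 32) * 32)
    out = np.zeros((c, H, W), dtype=np.float32)  # blank area filled with 0 (1.3.4)
    out[:, :min(h, H), :min(w, W)] = x[:, :H, :W]
    return out

img = (np.random.rand(100, 150, 3) * 255).astype(np.uint8)
t = preprocess(img)
```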
The step (2) specifically comprises:
2.1: Preprocessing the input images, specifically: before input to the DB text detection network, the certificate image dataset X'' obtained from step 1.3 undergoes a scale transformation. The specific process is: the images in X'' are adjusted to the input size (640 × 640 × 3, i.e. width pixels × height pixels × RGB (Red × Green × Blue) channels) expected by the Backbone module of the DB text detection network, yielding the processed dataset X''_DB, which is input to the feature extraction module Backbone of step 2.2. Without this scale transformation the image input size would differ from the preset aspect ratio (640 × 640 × 3), and the upsampling operations in the feature enhancement module FPN of step 2.3 would continually produce pixel gaps, making the merging operations between feature images in step 2.3 impossible.
2.2: Constructing the feature extraction module Backbone, specifically: the certificate image dataset X''_DB from step 2.1 is input. Because the special operation certificate images described in step 1.1 are regular, of a single type and easy to detect, a MobileNetV3-large network is adopted as the Backbone feature extraction module of the DB text detection network, which keeps the model small and the detection fast while preserving high feature-extraction accuracy. The MobileNetV3-large network extracts feature information from each image X''_iDB (i = 1, 2, ..., n) of X''_DB and outputs four feature images K_2 ~ K_5. The network structure of MobileNetV3-large is shown in Table 1.
Table 1 feature extraction network structure table of MobileNetV3-large network
In the MobileNetV3-large network, the layers consist of Conv, Bneck_Mix1, Bneck_Mix2, Bneck_Mix3, Bneck_Mix4 and Pool modules. (1) The Conv module performs a convolution on the preprocessed feature image K_0 (the images preprocessed in steps 1.3 and 2.1) to obtain feature image K_1; the H-swish approximate activation function (2-1) replaces the swish formula as the activation, reducing computation cost and improving speed. (2) Each Bneck module consists of a 1 × 1 convolution kernel, a 3 × 3 or 5 × 5 depthwise convolution kernel (3 × 3 when the Bneck module is a 3 × 3 depthwise separable convolution, 5 × 5 when it is a 5 × 5 depthwise separable convolution) and a 1 × 1 pointwise convolution kernel. The 1 × 1 convolution first raises the dimension of the feature map, the 3 × 3 or 5 × 5 depthwise kernel extracts features in the higher-dimensional space, and the 1 × 1 pointwise kernel then reduces the dimension again; together they form a depthwise separable convolution, reducing the parameter count and the multiply-add operations to a fraction of those of an ordinary convolution. A lightweight attention mechanism model (SE) is also introduced: the SE model learns the importance of each feature channel, then promotes useful features and suppresses features of little use to the current task, adjusting the weight of each channel.
The Bneck_Mix1 module consists of three Bneck modules (3 × 3 depthwise convolution kernels) using the ReLU6 activation function (2-2); here "3 × 3 depthwise convolution kernels" means the depthwise kernels in the constituent Bneck modules are of size 3 × 3, and the same convention applies to the 5 × 5 depthwise kernels below and in step 3.2. The Bneck_Mix2 module consists of three Bneck modules (5 × 5 depthwise kernels) with the ReLU6 activation. The Bneck_Mix3 module consists of six Bneck modules (3 × 3 depthwise kernels) with the H-swish activation. The Bneck_Mix4 module consists of three Bneck modules (5 × 5 depthwise kernels) with the H-swish activation. These modules apply multiple layers of depthwise separable convolutions to feature images K_1, K_2, K_3, K_4 respectively, producing feature images K_2, K_3, K_4, K_5. (3) A Conv module then convolves feature map K_5 to obtain feature image K_6. The Pool module downsamples K_6 by average pooling. After pooling, features are extracted by a 1 × 1 convolution and divided into k output channels, extracting the feature map K_9 of the input image. According to the MobileNetV3-large structure of Table 1, the feature maps K_2 ~ K_5 of the second to fifth layers are computed in turn and taken as the input of the feature enhancement module Neck in step 2.3.
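The parameter saving of a depthwise separable convolution over an ordinary convolution can be checked with simple arithmetic; the channel counts below are illustrative, not taken from Table 1.

```python
def standard_conv_params(k, c_in, c_out):
    # weights of one ordinary k x k convolution (bias omitted)
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # one k x k depthwise kernel per input channel + a 1 x 1 pointwise projection
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 24, 24)          # 5184 weights
sep = depthwise_separable_params(3, 24, 24)    # 216 + 576 = 792 weights
```

The ratio is roughly 1/c_out + 1/k², which is why the Bneck design shrinks the model substantially at these kernel sizes.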
2.3: Constructing the feature enhancement module Neck, specifically: the outputs K_2 ~ K_5 from step 2.2 are taken as the inputs C_2 ~ C_5 of this step. An FPN structure serves as the feature enhancement Neck module of the DB text detection network: through convolution, upsampling and related operations, the inputs C_2 ~ C_5 are transformed to a uniform size to obtain P_2 ~ P_5 of identical dimensions, and finally P_2 ~ P_5 are merged to generate the feature image F. The structure of the constructed FPN is shown in Table 2.
Table 2 network structure table of feature enhancement module FPN
Network layer number | Module name | Input feature image | Output feature image
1 | Conv1 module | C_5 (20×20×160) | IN_5 (20×20×96)
2 | Conv1 module | C_4 (40×40×112) | IN_4 (40×40×96)
3 | Conv1 module | C_3 (80×80×40) | IN_3 (80×80×96)
4 | Conv1 module | C_2 (160×160×24) | IN_2 (160×160×96)
5 | Conv2 module | IN_5 (20×20×96) | P_5 (160×160×24)
6 | Conv2 module | IN_4 (40×40×96) | P_4 (160×160×24)
7 | Conv2 module | IN_3 (80×80×96) | P_3 (160×160×24)
8 | Conv2 module | IN_2 (160×160×96) | P_2 (160×160×24)
The FPN network structure consists of Conv1 and Conv2 modules. (1) The Conv1 module consists of a 1 × 1 convolution used to reduce the channel count of the input feature images C_2 ~ C_5, producing IN_2 ~ IN_5. IN_5 is upsampled by 2× nearest-neighbour interpolation and added to IN_4 to obtain a new IN_4; the new IN_4 is upsampled by 2× and added to IN_3 to obtain a new IN_3; IN_2 is obtained analogously by adding the upsampled new IN_3. (2) The Conv2 module consists of a 3 × 3 convolution that applies feature-fusion smoothing to the obtained IN_2 ~ IN_5, reducing the aliasing caused by nearest-neighbour interpolation; the fused feature images P_3, P_4, P_5 are then upsampled by 2×, 4× and 8× respectively, and finally the processed feature images P_2 ~ P_5 are added point by point to obtain the final feature image F of this layer. This layer performs feature extraction, upsampling and merging on C_2 ~ C_5, combining low-level high-resolution information with high-level strong semantic information into the feature-enhanced image F, which is then input to the output module Head of step 2.4.
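The top-down upsample-and-add pathway of step 2.3 can be sketched with nearest-neighbour upsampling on toy single-channel maps (the real module operates on 96-channel tensors; the constant-valued inputs are only for illustration).

```python
import numpy as np

def upsample2x(x):
    """2x nearest-neighbour upsampling of a CHW feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

# toy lateral features IN_5..IN_2 (channels collapsed to 1 for readability)
in5 = np.ones((1, 20, 20))
in4 = np.ones((1, 40, 40))
in3 = np.ones((1, 80, 80))
in2 = np.ones((1, 160, 160))

# top-down pathway: upsample the coarser map and add it to the next lateral input
in4 = in4 + upsample2x(in5)
in3 = in3 + upsample2x(in4)
in2 = in2 + upsample2x(in3)
```

Each level accumulates the semantic information of all coarser levels, which is what "combining low-level high-resolution information with high-level strong semantic information" amounts to numerically.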
2.4: Constructing the output module Head, specifically: the feature image F from step 2.3 is input; DB_Head serves as the output module of the DB text detection network and further processes F to output the probability map M_p (Probability Map), the threshold map M_T (Threshold Map) and the approximate binary map M_A (Approximate Binary Map). The constructed DB_Head network structure is shown in Table 3.
Table 3 network structure table of output module db_head
Network layer number | Module name | Input feature image | Output feature image
1 | Conv module | F (160×160×96) | F_1 (160×160×24)
2 | BN module | F_1 (160×160×24) | F_2 (160×160×24)
3 | Conv module | F_2 (160×160×24) | F_3 (320×320×6)
4 | BN module | F_3 (320×320×6) | F_4 (320×320×6)
5 | Conv module | F_4 (320×320×6) | F_5 (640×640×1)
(1) DB_Head consists of Conv and BN (Batch Normalization) modules. Each Conv module consists of one convolution: the first layer uses a 3 × 3 convolution and the third and fifth layers use 2 × 2 convolutions to extract image features. The BN module normalizes the data: it computes the mean (2-3) and variance (2-4) of each training batch, uses them to normalize (2-5) the batch so that the mean is 0 and the variance is 1, and then applies the reconstruction transform (2-6), i.e. scale and shift. The equations involved in the BN layer are as follows:
μ = (1/N) Σ_{i=1}^{N} x_i (2-3)
σ² = (1/N) Σ_{i=1}^{N} (x_i − μ)² (2-4)
x̂_i = (x_i − μ) / √(σ² + ε) (2-5)
y_i = γ x̂_i + β (2-6)
where (2-3) is the mean formula, (2-4) the variance formula, (2-5) the normalization formula and (2-6) the reconstruction transform; N is the mini-batch size (at each training step the dataset is divided into batches and further into smaller mini-batches for gradient descent), and γ and β are learnable reconstruction parameters of the corresponding feature map (each feature map has exactly one pair of learnable parameters γ, β, which allow the network to recover the feature distribution the original network should learn).
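A minimal NumPy sketch of the four BN equations (mean, variance, normalization, reconstruction), with a small ε added for numerical stability as is standard:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    mu = x.mean()                              # (2-3) mini-batch mean
    var = x.var()                              # (2-4) mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)      # (2-5) normalization
    return gamma * x_hat + beta                # (2-6) scale and shift

y = batch_norm(np.array([1.0, 2.0, 3.0, 4.0]))
```

With γ = 1 and β = 0 the output has mean 0 and standard deviation (approximately) 1, matching the text.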
(2) Generation of the probability map M_p and threshold map M_T: the feature image F is input; a 3 × 3 convolution layer first compresses the number of channels (dimension) to 1/4 of the input, then a BN layer applies the BN operation and the ReLU activation function (2-7) to obtain feature map F_2; this is input to the next 2 × 2 convolution layer, where a deconvolution operation yields feature map F_3; the BN operation and ReLU activation are applied again, and the cycle repeats until the final feature image F_5 is obtained; finally the Sigmoid function (2-8) outputs the probability map M_p and threshold map M_T.
(3) Generation of the approximate binary map M_A: the differentiable binarization formula (2-9) combines the probability map M_p and the threshold map M_T to generate the approximate binary map M_A.
B̂_{i,j} = 1 / (1 + e^{−k (P_{i,j} − T_{i,j})}) (2-9)
In formula (2-9), B̂ is the approximate binarization feature map (Approximate Binary Map), k is an amplification factor with value 50, i and j denote coordinate information, P is the probability feature map (Probability Map), and T is the adaptive threshold map (Threshold Map) learned by the DB text detection network.
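Formula (2-9) is straightforward to evaluate; the sketch below applies it to a toy 2 × 2 probability/threshold pair with the amplification factor k = 50.

```python
import numpy as np

def approximate_binary_map(P, T, k=50):
    """Differentiable binarization (2-9): B = 1 / (1 + exp(-k (P - T)))."""
    return 1.0 / (1.0 + np.exp(-k * (P - T)))

P = np.array([[0.9, 0.2],
              [0.5, 0.5]])
T = np.full((2, 2), 0.5)   # a flat threshold map, just for the toy example
B = approximate_binary_map(P, T)
```

Pixels well above the threshold saturate near 1 and pixels well below it near 0, so the map behaves like a hard binarization while remaining differentiable for training.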
2.5: calculating DB text detection network regression optimization loss
K_0 is input to the DB text detection network, and forward propagation yields the probability map M_p, threshold map M_T and approximate binary map M_A from step 2.4. The loss function computes the loss value between the predicted and real text boxes, and the network parameters of the DB text detection network are adjusted backwards according to this loss, iteratively optimizing the parameters and improving prediction accuracy.
The calculation method of the DB text detection network regression optimization total loss value L is as shown in the formula (2-10):
L = L_s + α × L_b + β × L_t (2-10).
L_s is the loss (2-11) for the shrunk text-instance probability map M_p, L_b is the loss (2-11) for the shrunk text-instance approximate binary map M_A after binarization, and L_t is the loss (2-12) for the binarized threshold map M_T, with α = 5 and β = 10.
L_s and L_b both adopt the binary cross-entropy loss function, together with a hard-example mining strategy: hard negative samples are re-trained during model training, alleviating the imbalance of positive and negative samples:
L_s = L_b = −Σ_{i∈S_l} [y_i log x_i + (1 − y_i) log(1 − x_i)] (2-11)
In formula (2-11), S_l is the sampled set, with a positive-to-negative sampling ratio of 1:3; y_i is the true label and x_i is the predicted result.
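A sketch of binary cross-entropy with hard negative mining at the stated 1:3 positive:negative ratio; the exact sampling details (sorting all negatives by loss and keeping the hardest ones) are an illustrative assumption.

```python
import numpy as np

def ohem_bce(pred, label, neg_ratio=3, eps=1e-6):
    """Binary cross-entropy (2-11) with hard negative mining at 1:3 (sketch)."""
    loss = -(label * np.log(pred + eps) + (1 - label) * np.log(1 - pred + eps))
    pos = label > 0.5
    n_neg = int(pos.sum()) * neg_ratio
    hard_neg = np.sort(loss[~pos])[::-1][:n_neg]   # keep only the hardest negatives
    kept = int(pos.sum()) + len(hard_neg)
    return float(loss[pos].sum() + hard_neg.sum()) / max(kept, 1)

pred  = np.array([0.9, 0.6, 0.2, 0.1, 0.4, 0.3])
label = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
l = ohem_bce(pred, label)
```

With one positive, the three highest-loss negatives are kept and the two easy negatives are dropped, so easy background pixels do not dominate the gradient.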
L_t = Σ_{i∈R_d} |y*_i − x*_i| (2-12)
In formula (2-12), L_t is an L1 distance loss; R_d is the set of pixel indices inside G_d, where G_d is the set of text segmentation regions G of the threshold map M_T generated in step 2.3, expanded by the offset D (2-13); y*_i is the label of the threshold map and x*_i is the prediction of the threshold map.
D = A × (1 − r²) / L (2-13)
In formula (2-13), D is the offset, A and L are the area and perimeter of the original segmentation region set G respectively, and r is the shrink ratio, fixed at 0.4.
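For example, for a rectangular 100 × 40 text region the offset of (2-13) evaluates as:

```python
def dilation_offset(area, perimeter, r=0.4):
    """Offset D of formula (2-13): D = A * (1 - r^2) / L, with r fixed at 0.4."""
    return area * (1 - r * r) / perimeter

# a 100 x 40 rectangular text region: A = 4000, L = 280
D = dilation_offset(area=100 * 40, perimeter=2 * (100 + 40))
```

So the threshold-map supervision band extends 12 pixels outward from this region's boundary.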
2.6: Fixing the model parameters of the DB text detection network, specifically: the test set X_test divided in step 1.2 is used to test the accuracy of the DB text detection network model. X_test is input to the model and predicted through steps 1.3-2.5. The obtained approximate binary map M_A is compared with the actual annotation file Label: an image is counted as correctly predicted only if every instance is predicted correctly and no background region is predicted as an instance; otherwise it is counted as mispredicted. Let v_1 be the number of positives predicted as positive, v_2 the number of positives mispredicted as negative, and v_3 the number of negatives mispredicted as positive. The proportion of correctly predicted positives among all original positives in the dataset, i.e. the model recall, is computed by formula (2-14); the proportion of samples classified as positive that are actually positive, i.e. the precision, is computed by formula (2-15). To evaluate recall and precision jointly, an evaluation score Score (2-16) is defined in terms of the recall r and precision p. Finally, the DB text detection network model with the highest Score is selected as the final fixed DB text detection network model, and its parameters are fixed accordingly.
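Recall (2-14) and precision (2-15) follow directly from the counts v_1, v_2, v_3. The patent's exact Score formula (2-16) is not reproduced here, so the harmonic mean (F1) is shown only as one plausible, assumed choice.

```python
def recall(v1, v2):
    """(2-14): correctly predicted positives over all true positives."""
    return v1 / (v1 + v2)

def precision(v1, v3):
    """(2-15): correctly predicted positives over all predicted positives."""
    return v1 / (v1 + v3)

def score(p, r):
    """An F1-style combination of precision and recall; the actual Score
    formula (2-16) is not given in this excerpt, so this is an assumption."""
    return 2 * p * r / (p + r)

r = recall(v1=80, v2=20)     # 80 of 100 true positives found
p = precision(v1=80, v3=10)  # 80 of 90 positive predictions correct
s = score(p, r)
```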
Step (3), specifically comprising:
3.1: Preprocessing the input images, specifically: before input to the CRNN text recognition network, the images in the text-box dataset X_DB predicted by the DB text detection network undergo a scale transformation to obtain the preprocessed dataset X_CRNN. The specific process is: each image is first scaled proportionally so that its height is 32; widths below 320 are padded with 0, and samples with an aspect ratio greater than 10 are discarded directly. This yields the image input size (320 × 32 × 3, i.e. width pixels × height pixels × RGB channels) required by the CNN module of the CRNN text recognition network, and the resulting certificate image dataset X_CRNN is input to the visual feature extraction module CNN of step 3.2.
The BiLSTM module of step 3.3 requires the input sequence to have height 1, and the CNN module of step 3.2 downsamples the input image height by a factor of 32, so the input image height in step 3.2 must be 32. At the same time, to keep the aspect ratio of the CRNN input a fixed value, the network training process adopts 320, a multiple of 32, as the width.
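The resize-and-pad rule of step 3.1 can be sketched as follows; nearest-neighbour resampling via index selection is used here only to keep the example dependency-free, not because the patent specifies it.

```python
import numpy as np

def resize_for_crnn(img, target_h=32, max_w=320, max_ratio=10):
    """Step 3.1 sketch: scale to height 32, pad width to 320, drop extreme ratios."""
    h, w, c = img.shape
    if w / h > max_ratio:
        return None                               # aspect ratio > 10: discarded
    new_w = min(max_w, int(round(w * target_h / h)))
    ys = np.arange(target_h) * h // target_h      # nearest-neighbour row indices
    xs = np.arange(new_w) * w // new_w            # nearest-neighbour column indices
    resized = img[ys][:, xs]
    out = np.zeros((target_h, max_w, c), dtype=img.dtype)  # width < 320 padded with 0
    out[:, :new_w] = resized
    return out

r = resize_for_crnn(np.ones((64, 200, 3), dtype=np.uint8))
```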
3.2: Constructing the visual feature extraction module CNN, specifically: the certificate image dataset X_CRNN from step 3.1 is input, and its images X_iCRNN (i = 1, 2, ..., n×t; n is the number of images in the labeled dataset X, each predicted by the DB network to have t text boxes, hence n×t) are taken in turn as the input feature image M_0. Because the special operation certificate images of step 1.1 are regular, of a single type and easy to recognize, the CRNN text recognition network adopts a MobileNetV3-small network as the visual feature extraction module CNN, reducing the CRNN model size and improving detection speed while preserving high feature-extraction accuracy. The network extracts the convolutional features of M_0 to obtain the output feature image M_5, which is input to the BiLSTM module of step 3.3 for text expression and text classification. Since the inputs, after processing by the DB text detection network, become small box images much smaller than the original input images, the MobileNetV3-small model better balances speed and detection precision under low-resource conditions. The network structure of MobileNetV3-small is shown in Table 4.
TABLE 4 network Structure Table of feature extraction network MobileNet V3-small
Network layer number | Module name | Input feature image | Output feature image
1 | Conv module | M_0 (320×32×3) | M_1 (160×16×16)
2 | Bneck_Mix5 module | M_1 (160×16×16) | M_2 (160×4×24)
3 | Bneck_Mix6 module | M_2 (160×4×24) | M_3 (160×1×96)
4 | Conv module | M_3 (160×1×96) | M_4 (160×1×576)
5 | Pool module | M_4 (160×1×576) | M_5 (80×1×576)
The MobileNetV3-small network consists of Conv, Bneck_Mix5, Bneck_Mix6 and Pool modules. (1) The image M_0 of the dataset X_CRNN processed in step 3.1 is input to the MobileNetV3-small network, and a Conv module convolves M_0 to obtain feature map M_1. (2) The Bneck_Mix5 module consists of three Bneck modules (3 × 3 depthwise convolution kernels) using the ReLU6 activation function, and the Bneck_Mix6 module consists of eight Bneck modules (5 × 5 depthwise kernels) using the H-swish activation; they apply depthwise separable convolutions to feature images M_1 and M_2 respectively, producing M_2 and M_3. The Bneck module structure is the same as described in step 2.2. (3) The feature map M_3 is convolved again to obtain M_4, which is input to the Pool module for average pooling: the feature image is divided into 80 rectangular regions and the feature points of each region are averaged, shrinking the image to obtain M_5.
3.3: Constructing the sequence feature extraction module BiLSTM, specifically: the feature image M_5 from step 3.2 is input. Step 3.3 adopts a variant of recurrent neural networks (Recurrent Neural Networks, RNN), the bidirectional long short-term memory network (BiLSTM), as the sequence feature extraction module: M_5 is first converted into the feature vector sequence S_1, and text sequence features are then continuously extracted to obtain the hidden-vector probability distribution output S_2. The network structure of BiLSTM is shown in Table 5.
Table 5 network structure table of sequence feature extraction module BiLSTM
Network layer number | Module name | Input feature image | Output feature image
1 | Reshape module | M_5 (80×1×576) | S_1 (80×576)
2 | BiLSTM module | S_1 (80×576) | S_2 (80×m)
The network consists of Reshape and BiLSTM modules. Because the RNN only accepts input as a sequence of feature vectors, the Reshape module converts the feature map M_5 extracted by the CNN module of step 3.2 into the feature vector sequence S_1 (80×576) column by column (left to right): S_1 consists of 80 column vectors, each containing 576-dimensional features, i.e. the i-th feature vector is the concatenation of the i-th column pixels of all 576 feature maps. Each column of the feature map corresponds to a receptive field of the original image, forming a sequence of feature vectors; this step is called Map-to-Sequence. The BiLSTM module predicts on the feature sequence S_1, learning each feature vector in the sequence to obtain the hidden-vector probability distribution output S_2 over all characters, where m in Table 5 denotes the length of the character set that each column vector must identify.
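The Map-to-Sequence step is a pure reshape once the feature map height has been reduced to 1:

```python
import numpy as np

# CNN output M_5 with shape 80 x 1 x 576 (width x height x channels, per Table 5)
m5 = np.random.rand(80, 1, 576).astype(np.float32)

# Map-to-Sequence: squeeze the height-1 axis into 80 column vectors of 576 features
s1 = m5.reshape(80, 576)
```

Each of the 80 rows of s1 is one time step fed to the BiLSTM, corresponding to one vertical slice (receptive field) of the original text image.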
3.4: Constructing the prediction module CTC, specifically: the hidden-vector probability distribution output S_2 of each feature vector from step 3.3 is input. The CTC module serves as the prediction module of the CRNN text recognition network and converts the input into the result character sequence l through a de-duplication and merging operation. The network structure of the prediction module CTC is shown in Table 6.
Table 6 network architecture table of prediction module CTC
Network layer number | Module name | Input feature image | Output feature image
1 | FC+Softmax | S_2 (80×m) | l
The CTC module consists of FC (Fully Connected) layers, a Softmax operation and the sequence merging mechanism Blank. The hidden-vector probability distribution output S_2 from step 3.3 is input to the FC layer, which maps it into T character probability distributions; the sequence merging mechanism then adds a blank symbol to the labeled character set p to form the new character set p', so that the length of the character probability distribution matches the fixed length required by the Softmax operation. The Softmax operation (3-1) selects the label (character) with the maximum value to obtain the character distribution output, and finally the sequence transformation function β (3-2) removes the blank symbols and predicted repeated characters, decoding the result character sequence l.
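The β transformation of (3-2) combined with a per-step argmax can be sketched as follows; greedy decoding is one simple decoding strategy, and reserving index 0 for the blank symbol is an assumed convention.

```python
def ctc_greedy_decode(probs, charset, blank=0):
    """β transform of (3-2): argmax per step, collapse repeats, drop blanks."""
    best = [row.index(max(row)) for row in probs]   # most likely label per time step
    out, prev = [], blank
    for idx in best:
        if idx != blank and idx != prev:            # merge repeats, skip blank
            out.append(charset[idx - 1])
        prev = idx
    return "".join(out)

# toy per-step distributions over {blank, 'A', 'B'}
probs = [
    [0.1, 0.8, 0.1],    # 'A'
    [0.1, 0.7, 0.2],    # 'A' again: merged with the previous step
    [0.9, 0.05, 0.05],  # blank: separates repeated characters
    [0.1, 0.7, 0.2],    # 'A' after a blank: kept as a second character
]
text = ctc_greedy_decode(probs, charset="AB")
```

The blank between the last two steps is what lets CTC output genuinely repeated characters while still collapsing frame-level repeats.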
S_i = e^{v_i} / Σ_j e^{v_j} (3-1)
In formula (3-1), v_i denotes the i-th element of a column vector in the character probability distribution matrix v, j ranges over all elements of that column vector, and S_i is the ratio of the exponential of the element to the sum of the exponentials of all elements in the column vector. In formula (3-2), p' is the character set obtained by adding the blank symbol to the labeled character set p, and T is the length of the hidden-vector probability distribution output S_2 after mapping by the FC layer; after the β transformation, a result character sequence l shorter than the sequence length T is output.
3.5: Calculating the CRNN text recognition network regression optimization loss CTC Loss, specifically: the images X_iCRNN (i = 1, 2, ..., n×t) of the dataset X_CRNN processed in step 3.1 are input to the CRNN text recognition network; forward propagation computes the loss value between the prediction l and the true value via the loss function, and the posterior probability p(l|y) (3-4) of the label l output by the CTC module in step 3.4 is adjusted backwards according to this loss. The CTC Loss is computed as follows:
L(S) = −Σ_{(I,l)∈S} ln p(l|y) (3-3).
In formula (3-3), p(l|y) is defined by formula (3-4); S = {(I, l)} is the training set, where I is an input image of the training set and l is its real output character sequence.
The CTC formula (3-4) operates on the probability distribution matrix S_1 input to the BiLSTM module after the Map-to-Sequence operation of step 3.3 (here S_1 is denoted y); it gives all possible output distributions and outputs the most likely result label sequence l, the aim being to maximize the posterior probability p(l|y) of l.
p(l|y) = Σ_{π:β(π)=l} p(π|y) (3-4).
In formula (3-4), y is the input probability distribution matrix, y = y^1, y^2, ..., y^T, where T is the sequence length; π: β(π) = l denotes all paths π that yield the final result label sequence l after the β transformation (3-2); p(π|y) is defined by formula (3-5).
p(π|y) = Π_{t=1}^{T} y^t_{π_t} (3-5)
In formula (3-5), y^t_{π_t} denotes the probability of having label π_t at time stamp t; the subscript t denotes each time step of the path π.
3.6: Fixing the model parameters of the CRNN text recognition network, specifically: the test set X_test divided in step 1.2 is used to test the character recognition accuracy of the CRNN text recognition network. After the preprocessing of step 1.3, X_test is input to the DB network model with fixed parameters to obtain the predicted text small-box dataset, which is then recognized through steps 3.1-3.5. The obtained result character sequences l are compared with the actual annotation file Label; a line is counted as correctly recognized only if the entire line of text is recognized correctly, and otherwise counted as an error.
Defining the number of text lines recognized correctly by the model as l_true and the number recognized incorrectly as l_false, the model character recognition accuracy L_accuracy is calculated by equation (3-6):
L_accuracy = l_true / (l_true + l_false)    (3-6)
Finally, the CRNN training model with the highest corresponding L_accuracy is selected as the final fixed CRNN text recognition network model, and its parameters are fixed accordingly.
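The line-level accuracy of equation (3-6) can be sketched as follows; the example strings are hypothetical, and a prediction counts as correct only when the entire line matches the label exactly, as the step above specifies:

```python
def line_accuracy(preds, labels):
    """Equation (3-6): L_accuracy = l_true / (l_true + l_false)."""
    assert len(preds) == len(labels)
    l_true = sum(1 for p, l in zip(preds, labels) if p == l)  # whole line must match
    l_false = len(preds) - l_true
    return l_true / (l_true + l_false)

# One of the two hypothetical lines matches its label exactly.
print(line_accuracy(["电工作业", "2021"], ["电工作业", "2027"]))
```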
Step (4), specifically comprising:
4.1: text detection and recognition are carried out on special operation certificates of constructors, and the method specifically comprises the following steps: the DB training model with fixed parameters in the step 2.6 is loaded and converted into a DB text detection network model, and the special operation certificate image set X of the constructor to be detected is input d Certificate image X in (a) id (i=1, 2,., n) obtaining t text predicted block images of the document via a DB text detection network with fixed weights 4 pieces of coordinate information of predictive text box +.>Wherein the upper left corner of the predictive text box is included +.>Left lower corner->Upper right corner->Lower right corner coordinates->4 coordinate-scaled predictive text small block image set obtained by predicting through DB text detection network>Inputting to the CRNN text recognition network with fixed parameters in step 3.6, outputting relevant text recognition information +.>And its character recognition accuracy.
4.2: The judgment logic for detecting the special operation certificates of constructors is specifically as follows: based on the text recognition information obtained in step 4.1, whether the certificate is legal is judged by the following logic, and the certificate detection result is finally obtained.
(1) If the four characters of "valid period" are recognized in the certificate image through text prediction and recognition, it is judged that the certificate contains valid-period information and the next judgment step is entered; if these characters are not recognized, the certificate detection is judged to have failed and a prompt that the certificate photograph is unqualified is given. (2) If "valid period" is recognized, the corresponding predicted text box is selected and the year-month-day digits from the start date to the end date following "valid period" (e.g., 20100601 to 20200601) are extracted; the last eight digits (e.g., 20200601) are extracted through logical processing, and if the eight digits cannot be extracted normally, the certificate is judged to be undetectable and a prompt that the valid period cannot be recognized normally is given. (3) If the valid period and the last eight digits of the corresponding text box are recognized successfully, whether the four characters of "job category" appear in the recognized text of the certificate image is judged; if "job category" is recognized, the next judgment step is entered, otherwise the certificate detection is judged to have failed and a prompt that the job category cannot be recognized normally is given. (4) When "job category" is recognized, the corresponding predicted text box is selected and the specific category text following "job category" (electrician or high-place operation) is extracted; step (5) is entered if the category is "electrician", and step (6) if it is "high-place operation".
(5) The last eight digits of the valid-period text box obtained from the recognized certificate image are compared with the current Beijing time. If the valid period is earlier than the current Beijing time, the certificate is judged to have expired and to be unqualified, detection failure is prompted, and the case is handed over to manual inspection; if the valid period is later than the current Beijing time, the certificate is judged to be qualified and a prompt "the electrician category of the special operation certificate is detected successfully" is given. (6) The same comparison is performed for the high-place operation category: if the valid period is earlier than the current Beijing time, the certificate is judged to have expired and to be unqualified, detection failure is prompted, and the case is handed over to manual inspection; if the valid period is later than the current Beijing time, the certificate is judged to be qualified and a prompt "the high-place operation category of the special operation certificate is detected successfully" is given.
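A minimal sketch of the judgment logic in steps (1)–(6), assuming the recognized text arrives as a list of lines and using the Chinese keywords 有效期 (valid period), 作业类别 (job category), 电工 (electrician) and 高处作业 (high-place operation); the function name, input format and regex-based digit extraction are illustrative assumptions, not the patent's exact implementation:

```python
import re
from datetime import datetime, timezone, timedelta

def check_certificate(recognized_lines):
    """Hypothetical implementation of the step-4.2 decision chain."""
    text = " ".join(recognized_lines)
    if "有效期" not in text:                                  # step (1)
        return "fail: certificate photo unqualified (no valid-period field)"
    m = re.search(r"(\d{8})\D*至\D*(\d{8})", text)            # step (2): e.g. 20100601 至 20200601
    if not m:
        return "fail: valid period not normally recognized"
    if "作业类别" not in text:                                 # step (3)
        return "fail: job category not normally recognized"
    if "电工" in text:                                         # step (4)
        category = "电工"
    elif "高处作业" in text:
        category = "高处作业"
    else:
        return "fail: unknown job category, manual check required"
    expiry = datetime.strptime(m.group(2), "%Y%m%d")          # steps (5)/(6)
    now_beijing = datetime.now(timezone(timedelta(hours=8))).replace(tzinfo=None)
    if expiry < now_beijing:
        return "fail: certificate expired, manual check required"
    return f"pass: special-operation certificate ({category}) detected successfully"

print(check_certificate(["有效期", "20200601 至 20990601", "作业类别 电工"]))
```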
The semiautomatic labeling tool used to label the image data set is efficient and accurate. Tailored to the characteristics of special operation certificate images, the proposed combined network model is small, easy to deploy and fast in detection; meanwhile, the customized special operation certificate judgment logic raises the degree of automation of the method, effectively improves the detection efficiency of special operation certificates, and effectively reduces labor cost.
Example IV
The special operation certificate detection based on DB and CRNN is specifically as follows.
1: Data preprocessing, specifically: according to step 1.1, the certificate image data set obtained from China Mobile Yunnan Company is manually screened and then used as the original certificate image data set L, and L is labeled by the semi-automatic labeling method, as shown in Table 7.
According to step 1.2, the labeled data set X is divided into a training set X_train and a test set X_test at a ratio of 8:2. According to step 1.3, image decoding, image normalization, rearrangement and image scaling are performed in sequence on the labeled data set X to obtain the preprocessed data set X'.
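The 8:2 split of step 1.2 can be sketched as follows; the file names and the shuffle seed are illustrative assumptions:

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=0):
    """Shuffle deterministically, then split into train/test at train_ratio."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]

# 100 hypothetical certificate images -> 80 train / 20 test.
images = [f"cert_{i:04d}.jpg" for i in range(100)]
x_train, x_test = split_dataset(images)
print(len(x_train), len(x_test))
```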
2: Constructing and training the DB text detection network; the overall structure of the DB text detection network is shown in FIG. 4. According to steps 2.1-2.4, the images in the labeled data set X' are first scale-transformed to obtain (640×640×3) images; the Backbone, Neck and Head modules of the DB text detection network are then constructed in sequence. The input and output feature image sizes of each network layer are shown in Table 8.
Table 8. Input/output data flow table for each network layer in the DB text detection network
In Table 8, a certificate image input as (640×640×3) is predicted by the DB text detection network, which finally outputs a prediction result probability map M_P (640×640×1), threshold map M_T (640×640×1) and approximate binary map M_A (640×640×1). After the training files are prepared and the training parameters are set, the DB text detection network is trained according to step 2.5.
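The approximate binary map M_A is derived from M_P and M_T by differentiable binarization; the sketch below follows the original DB paper's formulation with amplification factor k = 50, which is an assumption about this particular implementation:

```python
import numpy as np

def approximate_binary_map(prob_map, thresh_map, k=50.0):
    """Differentiable binarization: M_A = 1 / (1 + exp(-k * (M_P - M_T)))."""
    return 1.0 / (1.0 + np.exp(-k * (prob_map - thresh_map)))

P = np.random.rand(640, 640, 1).astype(np.float32)  # M_P (640x640x1), toy values
T = np.full_like(P, 0.3)                            # M_T (640x640x1), toy constant threshold
A = approximate_binary_map(P, T)                    # M_A (640x640x1)
print(A.shape, float(A.min()) >= 0.0, float(A.max()) <= 1.0)
```

Pixels whose probability exceeds the local threshold are pushed toward 1, the rest toward 0, while the whole map stays differentiable for training.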
(1) The training set train_images folder, the test set test_images folder, the matching training annotation file train_label.txt, the matching test annotation file test_label.txt and the training file train.py are prepared for training the DB text detection network. (2) Parameters such as epoch, batch size and learning rate are set in train.py, the configuration file is modified, the corresponding pre-training weight file and training data set are added, and train.py is run. After the training files are prepared and the training parameters are set, training of the DB text detection network can be started.
First, X_train is loaded into the prepared training file train.py, and the losses L, L_s, L_b and L_t are calculated by forward propagation according to equations (2-10)-(2-12); the network training parameters are then continuously optimized until the loss function value of the DB text detection network converges. Finally, according to step 2.6, X_test is input into the DB text detection network to obtain the corresponding approximate binary map M_A and coordinate position information, which are compared with the corresponding image coordinate information in the annotation file of the test set X_test; the model prediction Recall, Precision and evaluation Score are calculated, and the model with the best evaluation Score is selected as the final fixed-parameter DB text detection network model.
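The Recall, Precision and evaluation Score of step 2.6 can be sketched as follows; taking the Score as the harmonic mean (F1) of Precision and Recall, and treating a predicted box as a true positive via IoU matching, are assumptions for illustration, as are the counts in the example:

```python
def detection_metrics(num_true_pos, num_pred, num_gt):
    """Precision, Recall and an F1-style evaluation Score from box counts."""
    precision = num_true_pos / num_pred if num_pred else 0.0
    recall = num_true_pos / num_gt if num_gt else 0.0
    score = (2 * precision * recall / (precision + recall)
             if precision + recall else 0.0)
    return precision, recall, score

# Hypothetical counts: 100 predicted boxes, 95 ground-truth boxes, 90 matched.
p, r, s = detection_metrics(num_true_pos=90, num_pred=100, num_gt=95)
print(round(p, 3), round(r, 3), round(s, 3))
```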
3: Constructing and training the CRNN text recognition network; the overall structure of the CRNN text recognition network is shown in FIG. 5. According to steps 3.1-3.4, the images in the text small-box data set X_DB obtained through the step-2 DB text detection network prediction are first scale-transformed to obtain (320×32×3) images; the CNN, BiLSTM and CTC modules of the CRNN text recognition network are then constructed in sequence. The input and output feature image sizes of each network layer are shown in Table 9.
Table 9. Input/output data flow table for each network layer in the CRNN text recognition network

Network layer number | Module name | Input feature image | Output feature image
---|---|---|---
1 | Conv module | M_0 (320×32×3) | M_1 (160×16×16)
2 | Bneck_Mix5 module | M_1 (160×16×16) | M_2 (160×4×24)
3 | Bneck_Mix6 module | M_2 (160×4×24) | M_3 (160×1×96)
4 | Conv module | M_3 (160×1×96) | M_4 (160×1×576)
5 | Pool module | M_4 (160×1×576) | M_5 (80×1×576)
6 | Reshape module | M_5 (80×1×576) | S_1 (80×576)
7 | BiLSTM module | S_1 (80×576) | S_2 (80×m)
8 | FC+Softmax | S_2 (80×m) | l
In Table 9, a certificate image input as (320×32×3) is predicted by the CRNN text recognition network, which finally outputs the predicted result sequence l. The training files are prepared, the training parameters are set, and the model is trained according to step 3.5. (1) The training set train_images folder, the test set test_images folder, the two txt files rec_train.txt and rec_test.txt recording the image text content labels, the training file train.py and the dictionary word_subject.txt are prepared for training the CRNN text recognition network. The dictionary is stored in utf-8 encoding format and is used to map the characters appearing in the labeled data set X to indices of the dictionary. (2) Parameters such as epoch, batch size and learning rate are set in train.py, the configuration file is modified, the corresponding pre-training weight file and training data set are added, and train.py is run.
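The character-to-index dictionary mapping can be sketched as follows; the one-character-per-line layout of the dictionary file and the reservation of index 0 for the CTC blank are assumptions about the implementation:

```python
def load_dictionary(lines):
    """Map each dictionary character to an index; index 0 is reserved for the CTC blank."""
    return {ch: i + 1 for i, ch in enumerate(lines)}

# Hypothetical dictionary contents (one character per line in the real file).
chars = ["电", "工", "作", "业", "0", "1"]
d = load_dictionary(chars)
print(d["电"], d["1"], len(d) + 1)  # +1 class for the blank
```

The FC+Softmax layer in Table 9 then outputs, per time step, a distribution over these len(d)+1 classes.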
After the training files are prepared and the training parameters are set, training of the CRNN text recognition network can be started. First, X_train is loaded into the prepared training file train.py; L(S) is calculated by forward propagation according to equations (3-3)-(3-5), and the network training parameters are continuously optimized until the loss function value of the CRNN text recognition network converges. Finally, according to step 3.6, X_test is input into the CRNN text recognition network to obtain the predicted result sequence of the corresponding image text, which is compared with the corresponding annotation text information in the annotation file of the test set X_test; the model character recognition accuracy L_accuracy is calculated, and the model with the highest L_accuracy is selected as the final fixed-parameter CRNN text recognition network model.
4: Detecting the special operation certificates of constructors, specifically: according to step 4.1, the fixed-parameter DB text detection network model of step 2.6 and the fixed-parameter CRNN text recognition network model of step 3.6 are loaded. The certificate image data set X_d to be detected is first input; the model predicts the approximate binary map M_A (640×640×1) of the text target boxes according to its fixed parameters, from which the corresponding text boxes and their coordinate position information are obtained. The predicted text-block images are then input into the CRNN text recognition network model, which outputs the corresponding text information.
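The two-stage inference of step 4.1 can be sketched as follows; the model callables, the toy image, and the TL/TR/BR/BL box corner ordering are hypothetical stand-ins for the fixed-parameter networks, not the patent's actual interfaces:

```python
def detect_and_recognize(image, db_model, crnn_model):
    """DB detects text boxes, the boxes are cropped, CRNN recognizes each crop."""
    boxes = db_model(image)             # list of 4-point boxes derived from M_A
    results = []
    for box in boxes:
        (x1, y1), _, (x2, y2), _ = box  # assumed TL, TR, BR, BL corner order
        crop = [row[min(x1, x2):max(x1, x2)]
                for row in image[min(y1, y2):max(y1, y2)]]
        results.append((box, crnn_model(crop)))
    return results

# Toy stand-ins: a 4x8 single-channel "image", one detected box,
# and a "recognizer" that just sums the cropped pixels.
image = [[1] * 8 for _ in range(4)]
fake_db = lambda img: [[(1, 0), (5, 0), (5, 3), (1, 3)]]
fake_crnn = lambda crop: f"{sum(map(sum, crop))} px"
results = detect_and_recognize(image, fake_db, fake_crnn)
print(results[0][1])
```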
A certificate image X_kd is randomly selected from the certificate image data set X_d to be detected as a model output example; the information of image X_kd after prediction and recognition by the fixed DB and CRNN models is shown in Table 10.
Table 10. Information of example image X_kd after prediction and recognition by the fixed DB and CRNN models
According to step 4.2 and the customized detection and judgment logic for the special operation certificates of constructors, the obtained text information X_kd-t is extracted, and whether the special operation certificate of the constructor is qualified and valid is judged based on the extracted text information.
4. Advantages and positive effects of the invention compared with the prior art
(1) Aiming at the regular, orderly and easily labeled characteristics of the special operation certificate image data set provided by the 5G base station construction sites of China Mobile Yunnan Company, the invention provides an efficient semi-automatic certificate image data set labeling method: the PPOCRLabel tool is first used to automatically label the text boxes in the certificate images and the characters in the corresponding text boxes, and manual screening is then used to perform a second round of manual correction on text boxes and texts that were not predicted successfully or were labeled incorrectly, which improves labeling efficiency while ensuring high accuracy of the labeled data set.
(2) Aiming at the characteristics that the special operation certificate image data are regular and of a single type, making text detection and text recognition easy to complete, the invention adopts a MobileNetV3 network as the backbone of both the DB text detection network model and the CRNN text recognition network model for extracting image features. The certificates can still be detected accurately while the number of feature channels of the two networks is reduced and the size of the corresponding models is reduced by 90%, which adapts to situations where computing capacity is limited; meanwhile, the detection speed on certificate images is improved, thereby improving the efficiency of the certificate checking method.
(3) The invention combines the given image data set of special operation certificates of constructors with the prediction and recognition results of the combined model, customizes the judgment logic of special operation certificate detection, realizes unmanned 24-hour automatic detection of special operation certificates by computer, and improves the degree of automation of the method.
The principles and embodiments of the present invention have been described herein with reference to specific examples; the description is intended only to assist in understanding the method of the present invention and its core ideas. Meanwhile, modifications made by those of ordinary skill in the art in light of the present teachings also fall within the scope of the present invention. In view of the foregoing, this description should not be construed as limiting the invention.
Claims (10)
1. A special operation certificate detection method based on DB and CRNN is characterized by comprising the following steps:
acquiring a special operation certificate image data set; the special operation certificate image data set comprises a plurality of target special operation certificate images, and each target special operation certificate image has text information;
inputting each target special operation certificate image into a DB text detection network model to determine a text box data set corresponding to each target special operation certificate image; the elements in the text box data set represent the position information of the target text box;
inputting each target special operation certificate image and a text box data set corresponding to each target special operation certificate image into a CRNN text recognition network model to determine text information in each target text box in each target special operation certificate image; the text information comprises at least one of constructor name, constructor sex, certificate number, operation category and certificate validity date;
the DB text detection network model is obtained by training based on a DB text detection network and a first training data set; the Backbone module in the DB text detection network adopts a MobileNetV3-large structure; each element in the first training data set comprises a historical special operation certificate image and a first class label corresponding to the historical special operation certificate image; the first category label is the position information of the history text box;
The CRNN text recognition network model is obtained by training based on the CRNN text recognition network and the second training data set; the partial structure of the CNN module in the CRNN text recognition network adopts a MobileNetV3-small structure; each element in the second training data set comprises a history special operation certificate image and a second class label corresponding to the history special operation certificate image; the second category labels are historical text information.
2. The special job certificate detection method based on DB and CRNN as set forth in claim 1, further comprising: and determining whether each special operation certificate meets the construction operation requirement or not based on the text information.
3. The special operation certificate detection method based on DB and CRNN according to claim 1, wherein the step of inputting each target special operation certificate image into a DB text detection network model to determine a text box data set corresponding to each target special operation certificate image specifically comprises the following steps:
preprocessing each target special operation certificate image; the pretreatment comprises the following steps: decoding, normalization, rearrangement, and image scaling;
And inputting each preprocessed target special operation certificate image into a DB text detection network model to determine a text box data set corresponding to each target special operation certificate image.
4. The special operation certificate detection method based on DB and CRNN according to claim 3, wherein the inputting each target special operation certificate image and the text box data set corresponding to each target special operation certificate image into the CRNN text recognition network model to determine the text information in each target text box in each target special operation certificate image specifically comprises:
and inputting each preprocessed target special operation certificate image and a preprocessed text box data set corresponding to each preprocessed target special operation certificate image into a CRNN text recognition network model to determine text information in each target text box in each target special operation certificate image.
5. The special job certificate detection method based on DB and CRNN as claimed in claim 1, wherein the determination process of the DB text detection network model is as follows:
Constructing a DB text detection network;
determining a first training data set;
and training the DB text detection network based on the first training data set to obtain a DB text detection network model.
6. The method for detecting a special job certificate based on DB and CRNN as set forth in claim 5, wherein the determining the first training data set specifically includes:
acquiring an original certificate image data set; the original document image data set comprises a plurality of original historical special operation document images;
marking each original historical special operation certificate image by adopting a semiautomatic marking tool to obtain each historical marking image and a first type label corresponding to each historical marking image;
preprocessing each history labeling image to obtain a history special operation certificate image; the preprocessing comprises: decoding, normalization, rearrangement, and image scaling; the first class label corresponding to the historical special operation certificate image is the first class label corresponding to the historical labeling image.
7. The special job certificate detection method based on DB and CRNN as claimed in claim 1, wherein the determination process of the CRNN text recognition network model is as follows:
Constructing a CRNN text recognition network;
determining a second training data set;
and training the CRNN text recognition network based on the second training data set to obtain a CRNN text recognition network model.
8. The method for detecting special job certificates based on DB and CRNN as set forth in claim 7, wherein the determining the second training data set specifically includes:
acquiring an original certificate image data set; the original document image data set comprises a plurality of original historical special operation document images;
marking each original historical special operation certificate image by adopting a semiautomatic marking tool to obtain each historical marking image and a second class label corresponding to each historical marking image;
preprocessing each history labeling image to obtain a history special operation certificate image; the preprocessing comprises: decoding, normalization, rearrangement, and image scaling; the second class labels corresponding to the historical special operation certificate images are the second class labels corresponding to the historical labeling images.
9. A special job certificate detection system based on DB and CRNN, comprising:
the data acquisition module is used for acquiring a special operation certificate image data set; the special operation certificate image data set comprises a plurality of target special operation certificate images, and each target special operation certificate image has text information;
The text box data set determining module is used for inputting each target special operation certificate image into the DB text detection network model so as to determine a text box data set corresponding to each target special operation certificate image; the elements in the text box data set represent the position information of the target text box;
the text information determining module is used for inputting each target special operation certificate image and a text box data set corresponding to each target special operation certificate image into the CRNN text recognition network model so as to determine text information in each target text box in each target special operation certificate image; the text information comprises at least one of constructor name, constructor sex, certificate number, operation category and certificate validity date;
the DB text detection network model is obtained by training based on a DB text detection network and a first training data set; the Backbone module in the DB text detection network adopts a MobileNetV3-large structure; each element in the first training data set comprises a historical special operation certificate image and a first class label corresponding to the historical special operation certificate image; the first category label is the position information of the history text box;
The CRNN text recognition network model is obtained by training based on the CRNN text recognition network and the second training data set; the partial structure of the CNN module in the CRNN text recognition network adopts a MobileNetV3-small structure; each element in the second training data set comprises a history special operation certificate image and a second class label corresponding to the history special operation certificate image; the second category labels are historical text information.
10. The special job certificate detection system based on DB and CRNN as set forth in claim 9, further comprising: and the detection module is used for determining whether each special operation certificate meets the construction operation requirement or not based on the text information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110865778.9A CN113591866B (en) | 2021-07-29 | 2021-07-29 | Special operation certificate detection method and system based on DB and CRNN |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113591866A CN113591866A (en) | 2021-11-02 |
CN113591866B true CN113591866B (en) | 2023-07-07 |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114266751B (en) * | 2021-12-23 | 2024-09-24 | 福州大学 | Product packaging bag coding defect detection method and system based on AI technology |
CN115131797B (en) * | 2022-06-28 | 2023-06-09 | 北京邮电大学 | Scene text detection method based on feature enhancement pyramid network |
CN116935396B (en) * | 2023-06-16 | 2024-02-23 | 北京化工大学 | OCR college entrance guide intelligent acquisition method based on CRNN algorithm |
CN116532046B (en) * | 2023-07-05 | 2023-10-10 | 南京邮电大学 | Microfluidic automatic feeding device and method for spirofluorene xanthene |
CN116958998B (en) * | 2023-09-20 | 2023-12-26 | 四川泓宝润业工程技术有限公司 | Digital instrument reading identification method based on deep learning |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020111676A1 (en) * | 2018-11-28 | 2020-06-04 | 삼성전자 주식회사 | Voice recognition device and method |
CN111401371A (en) * | 2020-06-03 | 2020-07-10 | 中邮消费金融有限公司 | Text detection and identification method and system and computer equipment |
WO2020218512A1 (en) * | 2019-04-26 | 2020-10-29 | Arithmer株式会社 | Learning model generating device, character recognition device, learning model generating method, character recognition method, and program |
CN113076992A (en) * | 2021-03-31 | 2021-07-06 | 武汉理工大学 | Household garbage detection method and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10672414B2 (en) * | 2018-04-13 | 2020-06-02 | Microsoft Technology Licensing, Llc | Systems, methods, and computer-readable media for improved real-time audio processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
OL01 | Intention to license declared | ||