CN113591866A - Special job certificate detection method and system based on DB and CRNN


Info

Publication number: CN113591866A
Application number: CN202110865778.9A
Authority: CN (China)
Prior art keywords: text, data set, image, certificate image, CRNN
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN113591866B (en)
Inventors: 彭光灵, 岳昆, 刘伯涛, 李忠斌, 杨晰, 魏立力, 段亮
Current Assignee: Yunnan University YNU
Original Assignee: Yunnan University YNU

Events: application filed by Yunnan University YNU; priority to CN202110865778.9A; publication of CN113591866A; application granted; publication of CN113591866B.

Classifications

    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods for neural networks
    • Y04S 10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications


Abstract

The invention discloses a special job certificate detection method and system based on DB and CRNN. The method comprises the following steps: inputting each target special job certificate image into a DB text detection network model to determine the text box data set corresponding to each image; and inputting each target special job certificate image, together with its corresponding text box data set, into a CRNN text recognition network model to determine the text information in each target text box of each image. The Backbone module of the DB text detection network adopts a MobileNetV3-large structure; the CNN module of the CRNN text recognition network partially adopts a MobileNetV3-small structure. The invention reduces manual workload and improves certificate image detection efficiency.

Description

Special job certificate detection method and system based on DB and CRNN
Technical Field
The invention relates to the technical field of optical character recognition, in particular to a special job certificate detection method and system based on DB and CRNN.
Background
During 5G base station construction, confirming that construction workers hold qualified and valid special job certificates is an indispensable safety guarantee. At present, special job certificates are mostly checked manually: detection efficiency is low, and feedback on certificate checks cannot be obtained promptly and effectively.
Disclosure of Invention
The invention aims to provide a special job certificate detection method and system based on DB and CRNN, so as to reduce manual workload and improve certificate image detection efficiency.
In order to achieve the purpose, the invention provides the following scheme:
a special job certificate detection method based on DB and CRNN comprises the following steps:
acquiring a special job certificate image data set, where the data set comprises a plurality of target special job certificate images and each image contains text information; inputting each target special job certificate image into a DB text detection network model to determine the text box data set corresponding to each image, where the elements of the text box data set represent the position information of target text boxes; and inputting each target special job certificate image, together with its corresponding text box data set, into a CRNN text recognition network model to determine the text information in each target text box of each image, where the text information comprises at least one of construction worker name, gender, certificate number, job category and certificate validity date;
the DB text detection network model is trained from a DB text detection network and a first training data set; the Backbone module of the DB text detection network adopts a MobileNetV3-large structure; each element of the first training data set comprises a historical special job certificate image and its corresponding first-class label, the first-class label being the position information of a historical text box. The CRNN text recognition network model is trained from a CRNN text recognition network and a second training data set; the CNN module of the CRNN text recognition network partially adopts a MobileNetV3-small structure; each element of the second training data set comprises a historical special job certificate image and its corresponding second-class label, the second-class label being historical text information.
A special job certificate detection system based on DB and CRNN comprises:
the data acquisition module, used to acquire a special job certificate image data set, where the data set comprises a plurality of target special job certificate images and each image contains text information; the text box data set determining module, used to input each target special job certificate image into a DB text detection network model to determine the text box data set corresponding to each image, where the elements of the text box data set represent the position information of target text boxes; and the text information determining module, used to input each target special job certificate image, together with its corresponding text box data set, into a CRNN text recognition network model to determine the text information in each target text box of each image, where the text information comprises at least one of construction worker name, gender, certificate number, job category and certificate validity date;
the DB text detection network model is trained from a DB text detection network and a first training data set; the Backbone module of the DB text detection network adopts a MobileNetV3-large structure; each element of the first training data set comprises a historical special job certificate image and its corresponding first-class label, the first-class label being the position information of a historical text box. The CRNN text recognition network model is trained from a CRNN text recognition network and a second training data set; the CNN module of the CRNN text recognition network partially adopts a MobileNetV3-small structure; each element of the second training data set comprises a historical special job certificate image and its corresponding second-class label, the second-class label being historical text information.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention adopts the DB text detection network model and the CRNN text recognition network model, and can quickly and accurately complete the detection of the special job certificate image. The DB text detection network model can be well adapted to a lightweight network as a feature extraction module, quickly predicts a corresponding text in a special work certificate and marks a text region by adopting a frame under the condition that extra memory and time are not consumed after the model is lightened, and extracts the text region from an image to obtain frame information of a text target. The CRNN text recognition network model performs text recognition on the predicted text block image, aiming at the condition that the image data of the special job certificate is all short text, the CRNN text recognition network model can introduce a BilSTM and CTC mechanism, strengthens global prediction on a text characteristic sequence and directly learns in the short text (line-level labeling), and does not need to be used for learning and training extra detailed character-level labeling, thereby improving the accuracy and efficiency of text recognition.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a special job certificate detection method based on DB and CRNN according to the present invention;
FIG. 2 is a schematic structural diagram of a DB and CRNN-based special job certificate detection system according to the present invention;
FIG. 3 is an overall flowchart of the method for detecting the special job certificate based on DB and CRNN according to the present invention;
FIG. 4 is a schematic diagram of the overall structure of a DB text detection network according to the present invention;
FIG. 5 is a schematic diagram of the overall structure of the CRNN text recognition network of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
The invention uses the optical character recognition technology in the deep learning model to efficiently detect the special job certificate image. The existing optical character recognition method based on deep learning mainly adopts a two-stage mode: text detection and text recognition. The DB (differential binary) algorithm used by the invention is characterized in that Binarization operation is put into a network and optimized simultaneously from the pixel level, so that the threshold value of each pixel point can be self-adaptively predicted, and the DB algorithm is realized by an approximate method and finishes Binarization and microminiaturization when being used together with a segmentation network, thereby simplifying the post-processing process and accelerating the detection speed of a target. The CRNN (common Current Neural Network) algorithm used by the invention adopts a combination method of CNN, LSTM (Long Short Term Memory) and CTC (connected termination Temporal classification), introduces the CTC method to solve the problem that characters can not be aligned during training, and does not need to carry out serial decoding operation like Attention OCR, thereby enabling the Network structure to be more optimized.
Example one
This embodiment discloses a special job certificate detection method based on DB and CRNN, which predicts the text positions in a certificate image data set and recognizes the specific text content to support detection of special job certificates and judge their qualification. It belongs to the field of computer vision recognition, in particular to the field of optical character recognition. Referring to fig. 1, the method for detecting special job certificates based on DB and CRNN includes the following steps.
Step 101: acquiring a special job certificate image data set; the data set comprises a plurality of target special job certificate images, and each image contains text information.
Step 102: inputting each target special job certificate image into a DB text detection network model to determine the text box data set corresponding to each image; the elements of the text box data set represent the position information of target text boxes.
Step 103: inputting each target special job certificate image, together with its corresponding text box data set, into a CRNN text recognition network model to determine the text information in each target text box of each image; the text information comprises at least one of construction worker name, gender, certificate number, job category and certificate validity date.
Step 104: determining, based on the text information, whether each special job certificate meets the construction job requirements.
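For orientation, the following is a minimal sketch of how steps 101 to 104 chain together at inference time. The wrappers db_model and crnn_model and the validity-date rule are hypothetical illustrations, not the patent's actual interfaces; the custom judgment logic of step 104 is only described abstractly in the patent.

```python
from datetime import date

def detect_certificates(images, db_model, crnn_model):
    """Sketch of steps 101-104: detect boxes, recognize text, judge validity."""
    results = []
    for image in images:
        boxes = db_model.predict(image)               # step 102: text box positions
        texts = [crnn_model.predict(image, box)       # step 103: text per box
                 for box in boxes]
        # step 104: custom judgment logic; here only an illustrative
        # validity-date check on any recognized date field.
        qualified = any(_is_unexpired(t) for t in texts)
        results.append({"boxes": boxes, "texts": texts, "qualified": qualified})
    return results

def _is_unexpired(text):
    # Illustrative parse: accepts "YYYY-MM-DD" expiry strings only.
    try:
        return date.fromisoformat(text) >= date.today()
    except ValueError:
        return False
```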
Step 102 specifically comprises: preprocessing each target special job certificate image (the preprocessing process is the same as in the third embodiment and is not repeated here), and inputting each preprocessed image into the DB text detection network model to determine the text box data set corresponding to each image.
Step 103 specifically comprises: preprocessing each target special job certificate image (the preprocessing process is the same as in the third embodiment and is not repeated here), and inputting each preprocessed image, together with its corresponding text box data set, into the CRNN text recognition network model to determine the text information in each target text box of each image.
The DB text detection network model is trained from a DB text detection network and a first training data set; the Backbone module of the DB text detection network adopts a MobileNetV3-large structure; each element of the first training data set comprises a historical special job certificate image and its corresponding first-class label, the first-class label being the position information of a historical text box. The CRNN text recognition network model is trained from a CRNN text recognition network and a second training data set; the CNN module of the CRNN text recognition network partially adopts a MobileNetV3-small structure; each element of the second training data set comprises a historical special job certificate image and its corresponding second-class label, the second-class label being historical text information. For the training process of the DB text detection network model and the CRNN text recognition network model, please refer to the third embodiment; it is not described in detail here.
Example two
Referring to fig. 2, the special job certificate detection system provided in this embodiment includes:
the data acquisition module 201, used to acquire a special job certificate image data set; the data set comprises a plurality of target special job certificate images, and each image contains text information. The text box data set determining module 202 is used to input each target special job certificate image into a DB text detection network model to determine the text box data set corresponding to each image; the elements of the text box data set represent the position information of target text boxes. The text information determining module 203 is used to input each target special job certificate image, together with its corresponding text box data set, into a CRNN text recognition network model to determine the text information in each target text box of each image; the text information comprises at least one of construction worker name, gender, certificate number, job category and certificate validity date. The detection module 204 is used to determine, based on the text information, whether each special job certificate meets the construction job requirements.
The text box data set determining module 202 specifically: preprocesses each target special job certificate image (the preprocessing process is the same as in the third embodiment and is not repeated here), and inputs each preprocessed image into the DB text detection network model to determine the text box data set corresponding to each image.
The text information determining module 203 specifically: preprocesses each target special job certificate image (the preprocessing process is the same as in the third embodiment and is not repeated here), and inputs each preprocessed image, together with its corresponding text box data set, into the CRNN text recognition network model to determine the text information in each target text box of each image.
The details of the DB text detection network model and the CRNN text recognition network model are as shown in embodiment one. Please refer to the third embodiment for the training process of the DB text detection network model and the CRNN text recognition network model, which is not described herein in detail.
EXAMPLE III
In the 5G base station construction process, the certificate image data set composed of special job certificate images has the characteristics of large text aspect ratio, standard layout and large data volume. Although regression-based text detection methods can obtain accurate text targets for regular text, text boxes of preset shapes cannot describe texts of special shapes well (such as an excessively large aspect ratio or an arc); and although segmentation-based detection methods such as PSENet and LSAE can detect irregularly shaped text, they require complex post-processing to form pixel-level results into text lines, and their prediction cost is high, so they cannot meet the requirement of completing detection quickly over a large amount of data.
The invention adopts the DB text detection network model, which, by making the binarization operation differentiable, avoids the complex post-processing and serious time consumption of segmentation-based methods and improves detection speed, enabling the method to detect a large amount of certificate image data quickly. Meanwhile, in choosing the text recognition method, Attention OCR has high requirements on training samples, extra computation parameters in the transcription layer, and low detection speed; therefore, given that the special job certificate images consist entirely of short texts, the CRNN text recognition network model, which has higher detection speed and higher recognition precision on short text segments and needs no extra computation parameters, is selected to ensure the efficiency of text recognition and detection.
Therefore, based on the DB text detection network model and the CRNN text recognition network model, the method first screens out low-quality special job certificate images manually, then obtains a high-standard image data set and training data set through semi-automatic annotation; with semi-automatic annotation, labeling of the special job certificate images can be completed quickly while keeping the annotation results highly accurate. The DB text detection network is then trained on the data set to predict and box the text positions in the special job certificate images, and the CRNN text recognition network is trained on the data set to recognize the text information at the calibrated positions. Given that special job certificate images are regular, of a single type and easy to detect, the MobileNetV3 network is chosen as the feature extraction module of both the DB text detection network model and the CRNN text recognition network model, making the models lightweight and faster to run while retaining detection accuracy. In the special job certificate image detection stage, the text information of the certificate image is predicted with the DB text detection network model and the CRNN text recognition network model, and the certificate category and validity period in the image are further deduced through custom judgment logic, so that safety detection of special job certificates is completed quickly.
Referring to fig. 3, the method for detecting a document for a special job based on DB and CRNN according to the embodiment of the present invention includes 4 steps.
Step (1): generating the certificate image data set. Specifically: first, special job certificate images of construction workers are acquired from 5G base station construction sites; each image contains the worker's name, gender, certificate number, job category, certificate validity date and so on. Second, each image is labeled quickly with a semi-automatic annotation tool to obtain a labeled data set, which is then preprocessed. Finally, the preprocessed labeled data set (i.e., the generated certificate image data set) is divided into a training set and a test set.
Step (2): building and training the DB text detection network. Specifically: the Backbone, Neck and Head modules of the DB text detection network are built in turn. The Backbone module adopts a MobileNetV3-large structure and serves as a feature pyramid, extracting features of the input image to obtain feature images; the Neck module adopts an FPN (Feature Pyramid Networks) structure, further processing the obtained feature images; the Head module performs output processing on the processed feature images, predicting a probability map and a threshold map, and obtains an approximate binary map from them. After preparing the files and setting the parameters required for training, the DB text detection network is trained on the training set from step (1).
Step (3): building and training the CRNN text recognition network. Specifically: the CNN module, BiLSTM (Bi-directional Long Short-Term Memory) module and CTC network structure of the CRNN text recognition network are built in turn. The CNN module partially adopts a MobileNetV3-small structure and extracts features from the text images; the BiLSTM module uses the extracted feature images for feature vector fusion, further extracting context features of the character sequence to obtain the probability distribution of each column of features; the CTC network structure takes the hidden-vector probability distribution as input and predicts the text sequence. After preparing the files and setting the parameters required for training, the CRNN text recognition network is trained on the training set from step (1).
Step (4): detecting the construction workers' special job certificates. Specifically: the certificate image to be detected is predicted by the DB text detection network model to obtain target text boxes and their coordinate positions, the text in the target text boxes is then recognized by the CRNN text recognition network model, and finally detection of the special job certificate is realized through custom special-job-certificate judgment logic.
The step (1) specifically comprises the following steps:
1.1: Image annotation. Specifically: the special job certificate image data set sampled from 5G base station construction sites contains low-quality samples exhibiting character occlusion, overexposure, blur, unclear characters, a certificate occupying too small a proportion of the image, multiple certificates in one image that cannot be definitely classified and recognized, and so on. These low-quality samples are screened out manually, yielding the original certificate image data set L = {l_1, l_2, ..., l_n}.
The original certificate image data set L has the following characteristics: (1) compared with the text positions in other image data (such as shop signs, street signposts and clothing tags), special job certificate image data are regular, ordered and easy to annotate; (2) text information such as name, gender, certificate number, job category, start date and expiry date is stored at fixed positions in a special job certificate image; (3) the certificate characters are clearly visible and unobstructed, and the certificate occupies the required proportion of the image (such as 80%); (4) the certificate images are of a single type, so text detection and text recognition are easy to realize.
Given these characteristics, the embodiment of the invention compared manual annotation with semi-automatic annotation and found that, for certificate images that are regular, ordered and easy to annotate, the semi-automatic method improves annotation efficiency markedly. Therefore a semi-automatic annotation method is adopted for labeling the original certificate image data set L. The semi-automatic annotation process comprises steps 1.1.1 and 1.1.2 below.
1.1.1: Using the original certificate image data set L, the automatic stage of the semi-automatic annotation process is performed with PPOCRLabel (the PaddleOCR labeling tool). PPOCRLabel uses a built-in OCR model (comprising a text detection model and a text recognition model) to predict the text in each image of L, box the corresponding text, and further recognize the text inside each box, yielding the automatically annotated data set L' = {L'_1, L'_2, ..., L'_n} with L'_i = {L'_i1, L'_i2, ..., L'_it}. Each annotated image L'_i contains t text prediction boxes; because the certificate images are standardized image data, t is always a fixed value. Each box record L'_ij = {L'_ij1, L'_ij2, L'_ij3, L'_ij4, L'_ij-t} holds 5 data values: L'_ij1, L'_ij2, L'_ij3 and L'_ij4 are respectively the upper-left, lower-left, upper-right and lower-right corner coordinates of the text box L'_ij predicted in image L'_i, and L'_ij-t is the text content of that box.
1.1.2: The second stage of the semi-automatic annotation process, manual screening and confirmation, is performed on the data set L'. If a text box or its coordinate values are predicted incorrectly, the coordinates are corrected manually; if the text inside a box is recognized incorrectly, the text content is corrected manually. This yields the labeled data set X = {X_1, X_2, ..., X_n}. Each certificate image X_i in the labeled data set has t text prediction boxes, X_i = {X_i1, X_i2, ..., X_it}, and each text prediction box X_ij holds 5 data values, X_ij = {X_ij1, X_ij2, X_ij3, X_ij4, X_ij-t}: X_ij1, X_ij2, X_ij3 and X_ij4 are respectively the upper-left, lower-left, upper-right and lower-right corner coordinates of the text prediction box, and X_ij-t is its text content. After annotation, the labeled data set X and the corresponding annotation file Label are obtained and can be used to train the DB text detection network and the CRNN text recognition network: X_ij1 through X_ij4 are the labels for training the DB text detection network, and X_ij-t is the label for training the CRNN text recognition network.
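To make the annotation structure concrete, the sketch below shows one hypothetical record for a labeled image X_i and how the two kinds of labels are separated for training. The record layout is an illustration inferred from the description above, not PPOCRLabel's actual file format.

```python
# Hypothetical annotation record for one certificate image X_i: a list of
# t boxes, each holding four corner coordinates plus the box text, mirroring
# X_ij = {X_ij1, X_ij2, X_ij3, X_ij4, X_ij-t}. The layout is illustrative.
record = [
    {"corners": [[12, 8], [12, 40], [180, 8], [180, 40]], "text": "张三"},
    {"corners": [[12, 50], [12, 82], [180, 50], [180, 82]], "text": "男"},
]

det_labels = [box["corners"] for box in record]   # labels for the DB network
rec_labels = [box["text"] for box in record]      # labels for the CRNN network
print(det_labels, rec_labels)
```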
1.2: Dividing the data set. Specifically: the labeled data set X from step 1.1 is divided into a training set X_train and a test set X_test. X_train, used to train the DB text detection network and the CRNN text recognition network, accounts for 80%; X_test, used to test the trained DB text detection network and CRNN text recognition network, accounts for 20%.
1.3: the data preprocessing specifically comprises the following steps:
1.3.1: Decoding the labeled data set X. Specifically: the labeled data set X is input, and each original image X_i in X is converted in turn into a uint8 matrix; the image is then decoded from JPEG format into a three-dimensional matrix. The color format of the decoded image is BGR (Blue × Green × Red) and the matrix dimensions are arranged in HWC (Height × Width × Channel) order, yielding the pixel matrix data set X_m = {X_1m, X_2m, ..., X_nm}.
1.3.2: Normalizing the pixel matrix data set X_m. Specifically: X_m is input, and each pixel of each image X_im (i = 1, 2, ..., n) in X_m is mapped to the interval [0, 1]. In the mapping process the pixel value is first divided by 255, the linear transformation parameter (which converts pixel values from the interval [0, 255] to the interval [0, 1]); the mean of the corresponding channel is then subtracted; and the result is finally divided by the standard deviation of the corresponding channel, yielding the normalized result data set X'_m.
1.3.3: Rearranging the normalized result data set X'_m. Specifically: X'_m is input, and the pixels of each image X'_im in X'_m are rearranged, converting the image matrix dimensions from HWC (Height × Width × Channel) format to CHW (Channel × Height × Width) format and yielding the new certificate image data set X''_m.
1.3.4: Scaling the images of the certificate image data set X''_m. Specifically: X''_m is input; when the length or width of an image X''_im in X''_m exceeds the specified maximum size or falls below the specified minimum size, the image is rescaled. The scaling process shrinks any side exceeding the side-length limit to an integral multiple of 32 within the limit, and the blank area is filled with 0, yielding the preprocessed certificate image data set X'.
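A compact sketch of sub-steps 1.3.1 to 1.3.4, assuming OpenCV and NumPy. The per-channel mean and standard deviation constants are assumptions (the patent does not list its values); the common ImageNet statistics are used here as placeholders.

```python
import cv2
import numpy as np

# Assumed per-channel mean/std; the patent does not state its constants.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(path, max_side=960):
    img = cv2.imread(path)                      # 1.3.1: decode JPEG -> uint8 BGR, HWC
    h, w = img.shape[:2]
    if max(h, w) > max_side:                    # 1.3.4a: shrink oversized images
        scale = max_side / max(h, w)
        img = cv2.resize(img, (int(w * scale), int(h * scale)))
    x = img.astype(np.float32) / 255.0          # 1.3.2: [0, 255] -> [0, 1] ...
    x = (x - MEAN) / STD                        # ... then per-channel mean/std
    x = x.transpose(2, 0, 1)                    # 1.3.3: HWC -> CHW
    c, h, w = x.shape                           # 1.3.4b: pad each side up to a
    H, W = -(-h // 32) * 32, -(-w // 32) * 32   #         multiple of 32 with zeros
    out = np.zeros((c, H, W), dtype=np.float32)
    out[:, :h, :w] = x
    return out
```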
The step (2) specifically comprises the following steps:
2.1: Input image preprocessing. Specifically: before input to the DB text detection network, the certificate image data set X' obtained from the preprocessing of step 1.3 is scale-transformed. The specific process: the images in X' are adjusted to the input size required by the Backbone module of the DB text detection network, 640 × 640 × 3 (width pixels × height pixels × RGB (Red × Green × Blue) channels); after this scale adjustment, the processed data set X'_DB is obtained and fed to the feature extraction module of step 2.2. Without this scaling, the input size of an image would differ from the preset aspect ratio (640 × 640 × 3), pixel mismatches would keep arising after the upsampling operations in the FPN feature enhancement module of step 2.3, and the merging operations between images in step 2.3 could not be performed.
2.2: Constructing the feature extraction module Backbone. Specifically: the certificate image data set X'_DB obtained from step 2.1 is input. Given that special job certificate images, as described in step 1.1, are regular and of a single type, making text detection easy to realize, this step adopts the MobileNetV3-large network as the feature extraction Backbone module of the DB text detection network, reducing model size and increasing detection speed while keeping the model's image feature extraction highly accurate. The MobileNetV3-large network extracts feature information from each image X'_iDB (i = 1, 2, ..., n) of X'_DB, outputting four feature images K_2 to K_5. The network structure of MobileNetV3-large is shown in Table 1.
Table 1: network structure of the MobileNetV3-large feature extraction network
[Table 1 is reproduced as images in the original publication.]
The MobileNetV3-large network consists of Conv, Bneck_Mix1, Bneck_Mix2, Bneck_Mix3, Bneck_Mix4 and Pool modules. (1) The Conv module performs a convolution operation on the preprocessed feature image K_0 (the image preprocessed in steps 1.3 and 2.1) to obtain the feature image K_1, and activates with the H-swish approximate activation function (2-1) in place of the swish formula, reducing computation cost and increasing computation speed. (2) A Bneck module consists of a 1×1 convolution kernel, a 3×3 or 5×5 depthwise convolution kernel (a 3×3 depthwise kernel when the Bneck module performs 3×3 depthwise separable convolution, a 5×5 depthwise kernel when it performs 5×5 depthwise separable convolution), and a 1×1 pointwise convolution kernel. First the 1×1 convolution kernel raises the dimension of the feature map; the 3×3 or 5×5 depthwise convolution kernel then performs the convolution in the higher-dimensional space to extract features; and the 1×1 pointwise convolution kernel finally reduces the dimension of the feature map. Combined into a depthwise separable convolution, this reduces the parameter count and the multiply-add workload to about one ninth of ordinary convolution. At the same time a lightweight attention model, SE (Squeeze-and-Excitation), is introduced: the SE model automatically learns the importance of each feature channel, then boosts useful features and suppresses features useless for the current task according to the result, adjusting the weight of each channel. The Bneck_Mix1 module consists of three Bneck modules (3×3 depthwise convolution kernels) using the ReLU6 activation function (2-2), where "3×3 depthwise convolution kernels" means the depthwise kernels in the constituent Bneck modules have size 3×3; the later 5×5 notation and step 3.2 follow the same convention. The Bneck_Mix2 module consists of three Bneck modules (5×5 depthwise convolution kernels) using the ReLU6 activation function. The Bneck_Mix3 module consists of six Bneck modules (3×3 depthwise convolution kernels) using the H-swish activation function. The Bneck_Mix4 module consists of three Bneck modules (5×5 depthwise convolution kernels) using the H-swish activation function. These modules apply several layers of depthwise separable convolution to the feature images K_1, K_2, K_3 and K_4 respectively, obtaining the feature images K_2, K_3, K_4 and K_5. (3) The Conv module performs another convolution operation on the feature map K_5 to obtain the feature image K_6. The Pool module downsamples K_6 by average pooling. After pooling, features are extracted through a 1×1 convolution and finally divided into K output channels, extracting the feature map K_9 of the input image.
According to the MobileNetV3-large network structure of Table 1, the feature maps K_2 to K_5, computed in turn by the second through fifth layers of the network, serve in turn as the input to the feature enhancement module Neck of step 2.3.
H-swish(x) = x × ReLU6(x + 3) / 6 (2-1)
ReLU6(x) = min(max(0, x), 6) (2-2)
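Formulas (2-1) and (2-2) translate directly into code; a minimal NumPy sketch:

```python
import numpy as np

def relu6(x):
    # ReLU6 (2-2): clip activations to [0, 6]
    return np.minimum(np.maximum(0.0, x), 6.0)

def h_swish(x):
    # H-swish (2-1): piecewise-linear approximation of swish,
    # cheaper to compute than x * sigmoid(x)
    return x * relu6(x + 3.0) / 6.0
```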
2.3: Constructing the feature enhancement module Neck. Specifically: the outputs K_2 to K_5 obtained in step 2.2 serve as the inputs C_2 to C_5 of this step. The FPN structure forms the feature enhancement Neck module of the DB text detection network; through convolution, upsampling and related operations it converts the inputs C_2 to C_5 to the uniform-size maps P_2 to P_5, and finally merges P_2 to P_5 to generate the feature image F. The constructed FPN structure is shown in Table 2.
Table 2: network structure of the feature enhancement module FPN
Layer | Module | Input feature image | Output feature image
1 | Conv1 | C_5 (20×20×160) | IN_5 (20×20×96)
2 | Conv1 | C_4 (40×40×112) | IN_4 (40×40×96)
3 | Conv1 | C_3 (80×80×40) | IN_3 (80×80×96)
4 | Conv1 | C_2 (160×160×24) | IN_2 (160×160×96)
5 | Conv2 | IN_5 (20×20×96) | P_5 (160×160×24)
6 | Conv2 | IN_4 (40×40×96) | P_4 (160×160×24)
7 | Conv2 | IN_3 (80×80×96) | P_3 (160×160×24)
8 | Conv2 | IN_2 (160×160×96) | P_2 (160×160×24)
The FPN network structure consists of Conv1 and Conv2 modules. (1) The Conv1 module consists of a 1×1 convolution, which reduces the channel count of the input feature images C_2 to C_5, producing IN_2 to IN_5. IN_5 then undergoes 2× nearest-neighbor upsampling and is added to IN_4 to obtain a new IN_4; the new IN_4 is upsampled 2× by nearest neighbor and added to IN_3 to obtain a new IN_3; and IN_2 is obtained analogously by adding the upsampled IN_3 to IN_2. (2) The Conv2 module consists of a 3×3 convolution, which applies convolutional feature-fusion smoothing to the resulting IN_2 to IN_5, reducing the aliasing caused by nearest-neighbor interpolation. The fused feature images P_3, P_4 and P_5 are then upsampled by factors of 2, 4 and 8 respectively, and finally the processed feature images P_2 to P_5 are merged point by point to obtain the network's final feature image F. This layer performs feature extraction, upsampling and merging on the images C_2 to C_5, combining low-level high-resolution information with high-level strong-semantic information to obtain the feature-enhanced image F, which is input to the output module Head of step 2.4.
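As a concrete reading of this merge, below is a minimal PyTorch sketch, an illustration under the Table 2 channel sizes rather than the patent's exact implementation. The final merge is written as channel concatenation so that F has the 96 channels shown in Table 3; the text's point-by-point addition would keep 24 channels instead.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNNeck(nn.Module):
    """Sketch of the FPN Neck; channel sizes follow Table 2."""
    def __init__(self, in_chs=(24, 40, 112, 160), mid=96, out=24):
        super().__init__()
        # Conv1: 1x1 convolutions reducing C2..C5 to a uniform channel count
        self.lateral = nn.ModuleList([nn.Conv2d(c, mid, 1) for c in in_chs])
        # Conv2: 3x3 convolutions smoothing the merged maps
        self.smooth = nn.ModuleList([nn.Conv2d(mid, out, 3, padding=1) for _ in in_chs])

    def forward(self, c2, c3, c4, c5):
        ins = [lat(c) for lat, c in zip(self.lateral, (c2, c3, c4, c5))]
        for i in (3, 2, 1):  # top-down: 2x nearest-neighbor upsample, then add
            ins[i - 1] = ins[i - 1] + F.interpolate(ins[i], scale_factor=2, mode="nearest")
        ps = [s(x) for s, x in zip(self.smooth, ins)]
        for i, f in zip((1, 2, 3), (2, 4, 8)):  # bring P3, P4, P5 to P2's size
            ps[i] = F.interpolate(ps[i], scale_factor=f, mode="nearest")
        # merged by channel concatenation so F has the 96 channels of Table 3
        return torch.cat(ps, dim=1)
```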
2.4: Constructing the output module Head. Specifically: the feature image F obtained from step 2.3 is input, and DB_Head serves as the output module of the DB text detection network, further processing F to output the probability map M_p (Probability Map), the threshold map M_T (Threshold Map) and the approximate binary map M_A (Approximate Binary Map). The constructed DB_Head network structure is shown in Table 3.
Table 3: network structure of the output module DB_Head
Layer | Module | Input feature image | Output feature image
1 | Conv | F (160×160×96) | F_1 (160×160×24)
2 | BN | F_1 (160×160×24) | F_2 (160×160×24)
3 | Conv | F_2 (160×160×24) | F_3 (320×320×6)
4 | BN | F_3 (320×320×6) | F_4 (320×320×6)
5 | Conv | F_4 (320×320×6) | F_5 (640×640×1)
(1) DB_Head consists of Conv modules and BN (Batch Normalization) modules. Each Conv module consists of one convolution: the first-layer convolution is 3×3, and the third- and fifth-layer convolutions are 2×2; the convolutions extract image features. The BN module normalizes the data: it computes the mean (2-3) and variance (2-4) of each training batch, normalizes the batch's training data with them (2-5) to obtain a distribution with mean 0 and variance 1, and then performs the reconstruction transformation (2-6), i.e., scaling and shifting. The formulas involved in the BN layer are as follows:
μ_B = (1/n) × Σ_{i=1..n} x_i (2-3)
σ_B² = (1/n) × Σ_{i=1..n} (x_i − μ_B)² (2-4)
x̂_i = (x_i − μ_B) / sqrt(σ_B² + ε) (2-5)
y_i = γ × x̂_i + β (2-6)
Here (2-3) is the mean formula, (2-4) the variance formula, (2-5) the normalization formula and (2-6) the reconstruction transformation formula. n is the mini-batch size (in each training pass the data set is divided into batches and then into smaller mini-batches on which gradient descent is performed), and γ and β are learnable reconstruction parameters of the corresponding feature map (each feature map has exactly one pair of learnable parameters γ and β, which allow the network to recover the feature distribution the original network would learn).
(2) Generation of the probability map M_p and the threshold map M_T: the feature image F is input to a 3×3 convolution layer that compresses the channel count (dimension) of the feature map to 1/4 of the input; after the BN operation and the ReLU activation function (2-7), the feature map F_2 is obtained. F_2 is fed to the next 2×2 convolution layer, where a deconvolution operation yields the feature map F_3. Repeating the BN operation and ReLU activation in this cycle produces the final feature image F_5, from which the probability map M_p and the threshold map M_T are finally output through the Sigmoid function (2-8).
ReLU(x) = max(0, x) (2-7)
Sigmoid(x) = 1 / (1 + e^(−x)) (2-8)
(3) Generation of the approximate binary map M_A: the differentiable binarization formulation (2-9) combines the probability map M_p and the threshold map M_T to generate the approximate binary map M_A.
B̂_{i,j} = 1 / (1 + e^(−k × (P_{i,j} − T_{i,j}))) (2-9)
In formula (2-9), B̂ is the approximate binary feature map (Approximate Binary Map), k is an amplification factor with value 50, i and j denote coordinate information, P is the probability feature map (Probability Map), and T is the adaptive threshold map (Threshold Map) learned by the DB text detection network.
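Formula (2-9) in code form, as a small NumPy sketch of how the approximate binary map is obtained from P and T:

```python
import numpy as np

def approximate_binary_map(P, T, k=50.0):
    """Differentiable binarization (2-9): B_hat = 1 / (1 + exp(-k (P - T))).

    P and T are (H, W) probability and threshold maps; k = 50 is the
    amplification factor used above. Unlike a hard step function, the
    result is differentiable, so it can be trained end to end.
    """
    return 1.0 / (1.0 + np.exp(-k * (P - T)))
```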
2.5: Calculating the DB text detection network regression optimization loss.
The input K_0 to the DB text detection network is propagated forward to obtain the probability map M_p, threshold map M_T and approximate binary map M_A produced in step 2.4. A loss value between the predicted text boxes and the real text boxes is computed with the loss function, and the network parameters of the DB text detection network are adjusted backward according to this loss value, iteratively optimizing the parameters and improving prediction accuracy.
The total regression optimization loss L of the DB text detection network is computed by formula (2-10):
L = L_s + α × L_b + β × L_t (2-10)
L_s is the loss on the probability map M_p of the shrunk text instances, computed by formula (2-11); L_b is the loss on the approximate binary map M_A of the shrunk text instances after binarization, also computed by formula (2-11); L_t is the loss on the binarized threshold map M_T, computed by formula (2-12). α is set to 5 and β to 10.
L_s = L_b = Σ_{i∈S_l} [−y_i × log x_i − (1 − y_i) × log(1 − x_i)] (2-11)
L_s and L_b adopt the binary cross-entropy loss function, together with an additional hard negative mining strategy: difficult negative samples are retrained during model training, alleviating the imbalance between positive and negative samples. In formula (2-11), S_l is the sampled data set, with a positive-to-negative sampling ratio of 1:3; y_i is the true label and x_i the prediction result.
L_t = Σ_{i∈R_d} |y*_i − x*_i| (2-12)
In formula (2-12), L_t adopts the L1 distance loss function. R_d is the set of pixel indices inside G_d, where G_d is obtained by dilating the set G of text segmentation regions in the threshold map M_T by the offset D of formula (2-13); y*_i is the threshold map label and x*_i the threshold map prediction.
D = A × (1 − r²) / L (2-13)
In formula (2-13), D is the offset, A and L are respectively the area and perimeter of the original segmentation region set G, and r is the shrink ratio, fixed at 0.4.
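A condensed sketch of the total loss (2-10) with (2-11) and (2-12), assuming NumPy arrays. The hard negative mining (1:3 sampling) described above is omitted for brevity, so this is an illustration rather than the exact training loss.

```python
import numpy as np

def bce(y, x, eps=1e-6):
    # binary cross-entropy used by L_s and L_b (2-11)
    x = np.clip(x, eps, 1.0 - eps)
    return -np.mean(y * np.log(x) + (1.0 - y) * np.log(1.0 - x))

def db_loss(prob_map, binary_map, thresh_map, gt_shrunk, gt_thresh, thresh_mask,
            alpha=5.0, beta=10.0):
    """Total loss L = L_s + alpha * L_b + beta * L_t (2-10).

    thresh_mask selects the dilated region G_d for the L1 loss (2-12);
    hard negative mining is omitted here for brevity.
    """
    L_s = bce(gt_shrunk, prob_map)
    L_b = bce(gt_shrunk, binary_map)
    L_t = np.sum(np.abs(gt_thresh - thresh_map) * thresh_mask) / max(thresh_mask.sum(), 1.0)
    return L_s + alpha * L_b + beta * L_t
```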
2.6: Fixing the DB text detection network model parameters. Specifically: the test set X_test partitioned in step 1.2 is used to test the accuracy of the DB text detection network model. X_test is input to the DB text detection network model and predictions are made through steps 1.3 to 2.5. The resulting approximate binary map M_A is compared with the actual Label annotation file: an image is considered correctly predicted if every instance is predicted correctly and no non-background part is predicted as an instance; otherwise the prediction is wrong. Let v_1 be the number of positive instances predicted as positive and v_2 the number of positive instances mispredicted as negative; formula (2-14) computes the proportion of correctly predicted positives among all original positives in the data set, i.e. the model recall. Let v_3 be the number of negative instances mispredicted as positive; formula (2-15) computes the proportion of instances classified as positive that are indeed positive, i.e. the precision. To evaluate recall and precision jointly, an evaluation score, Score (2-16), is defined, where r is the recall and p the precision. Finally, the DB text detection network model with the highest Score is selected as the final fixed DB text detection network model, and its model parameters are fixed accordingly.
r = v_1 / (v_1 + v_2) (2-14)
p = v_1 / (v_1 + v_3) (2-15)
Score = (2 × p × r) / (p + r) (2-16)
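The selection metrics in code; Score (2-16) is reconstructed here as the standard F-measure, the harmonic mean of precision and recall:

```python
def model_score(v1, v2, v3):
    # v1: positives predicted positive, v2: positives predicted negative,
    # v3: negatives predicted positive
    r = v1 / (v1 + v2)            # recall (2-14)
    p = v1 / (v1 + v3)            # precision (2-15)
    return 2 * p * r / (p + r)    # Score (2-16)
```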
Step (3), specifically comprising:
3.1: Input image preprocessing. Specifically: before input to the CRNN text recognition network, the images in the text box data set X_DB predicted by the DB text detection network are scale-transformed to obtain the preprocessed data set X_CRNN. The specific process: each image is first scaled proportionally so that its height is 32; widths short of 320 are padded with 0, and samples with an aspect ratio greater than 10 are discarded directly. This yields the input size required by the CNN module of the CRNN text recognition network, 320 × 32 × 3 (width pixels × height pixels × RGB channels), which is input as the certificate image data set X_CRNN to the visual feature extraction module CNN of step 3.2.
The BiLSTM module of step 3.3 requires the input sequence height to be 1, while the CNN module of step 3.2 downsamples the input image by a factor of 32, so the input image height in step 3.2 must be 32. Meanwhile, the width-to-height ratio of the CRNN text recognition network's input size must be a fixed value, so the network model training process adopts 320, a multiple of 32, as the width.
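A sketch of the step 3.1 resizing rule under the stated constraints (proportional scaling to height 32, zero-padding of widths below 320, discarding aspect ratios above 10), assuming OpenCV images:

```python
import cv2
import numpy as np

def prepare_crnn_input(crop, height=32, width=320, max_ratio=10):
    h, w = crop.shape[:2]
    if w / h > max_ratio:           # discard extreme aspect ratios directly
        return None
    new_w = min(width, int(round(w * height / h)))
    resized = cv2.resize(crop, (new_w, height))   # proportional scale to height 32
    out = np.zeros((height, width, 3), dtype=resized.dtype)
    out[:, :new_w] = resized        # pad the remaining width with 0
    return out
```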
3.2: Constructing the visual feature extraction module CNN. Specifically: the certificate image data set X_CRNN processed in step 3.1 is input, and its images X_iCRNN (i = 1, 2, ..., n × t, where n is the number of images in the labeled data set X and each image yields t text prediction boxes after DB network prediction) serve in turn as the feature image M_0 input to this module. Given that special job certificate images, as described in step 1.1, are regular, of a single type and easy to recognize, the CRNN text recognition network adopts the MobileNetV3-small network as the model of its visual feature extraction module CNN, reducing the CRNN model size and increasing detection speed while keeping image feature extraction highly accurate. The network extracts the features of M_0 to obtain the extracted output feature image M_5, which is input to the BiLSTM module of step 3.3 for text expression and text classification. After DB text detection processing, the input images become small box images much smaller than the original inputs, so the MobileNetV3-small network model better balances speed against detection precision under low resources. The network structure of MobileNetV3-small is shown in Table 4.
Table 4: network structure of the feature extraction network MobileNetV3-small
Layer | Module | Input feature image | Output feature image
1 | Conv | M_0 (320×32×3) | M_1 (160×16×16)
2 | Bneck_Mix5 | M_1 (160×16×16) | M_2 (160×4×24)
3 | Bneck_Mix6 | M_2 (160×4×24) | M_3 (160×1×96)
4 | Conv | M_3 (160×1×96) | M_4 (160×1×576)
5 | Pool | M_4 (160×1×576) | M_5 (80×1×576)
The MobileNetV3-small network consists of Conv, Bneck_Mix5, Bneck_Mix6 and Pool modules. (1) The image M_0 from the certificate image data set X_CRNN processed in step 3.1 is input, and the Conv module performs a convolution operation on M_0 to obtain the feature map M_1. (2) The Bneck_Mix5 module consists of three Bneck modules (3×3 depthwise convolution kernels) using the ReLU6 activation function, and the Bneck_Mix6 module consists of eight Bneck modules (5×5 depthwise convolution kernels) using the H-swish activation function; these modules apply depthwise separable convolution to the feature images M_1 and M_2 respectively, obtaining the feature images M_2 and M_3. The Bneck module structure is the same as described in step 2.2. (3) Another convolution operation on the feature map M_3 yields the feature map M_4, which is input to the Pool module; M_4 is average-pooled, i.e., the feature image is divided into 80 rectangular regions and the feature points of each region are averaged to shrink the image, obtaining M_5.
3.3: Constructing the sequence feature extraction module BiLSTM, specifically: the feature image M5 obtained in step 3.2 is input. Step 3.3 adopts a variant of the recurrent neural network (RNN), the bidirectional long short-term memory network (BiLSTM), as the sequence feature extraction module: M5 is first converted into a feature vector sequence S1, and text sequence features are then further extracted to obtain the hidden-vector probability distribution output S2. The network structure of the BiLSTM is shown in Table 5.
Table 5 network structure table of sequence feature extraction module BiLSTM
Layer | Module | Input feature image | Output feature image
1 | Reshape | M5 (80×1×576) | S1 (80×576)
2 | BiLSTM | S1 (80×576) | S2 (80×m)
The network consists of the Reshape and BiLSTM modules. Since an RNN only accepts a specific feature vector sequence as input, the Reshape module takes the feature map M5 extracted by the CNN module in step 3.2 and generates, column by column (left to right), the feature vector sequence S1 (80×576): S1 consists of 80 column feature vectors, each containing 576-dimensional features, i.e., the i-th column feature vector is the concatenation of the i-th column pixels of all 576 feature maps, and each column corresponds to a receptive field of the original image, forming a feature vector sequence. This step is called Map-to-Sequence. The BiLSTM module then predicts over the feature sequence S1, learning each feature vector in the sequence to obtain the hidden-vector probability distribution output S2 for all characters, where m in Table 5 denotes the length of the character set to be recognized for each column vector.
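A minimal PyTorch sketch of the Map-to-Sequence step and the BiLSTM prediction; the hidden size and the per-column projection to the m-way output are assumed values for illustration:

```python
import torch
import torch.nn as nn

m = 6625           # assumed length of the recognizable character set
hidden = 48        # assumed BiLSTM hidden size

feat = torch.randn(1, 576, 1, 80)           # M5 as (N, C, H, W) = (1, 576, 1, 80)
seq = feat.squeeze(2).permute(2, 0, 1)      # Map-to-Sequence: (T, N, C) = (80, 1, 576)

bilstm = nn.LSTM(input_size=576, hidden_size=hidden, bidirectional=True)
proj = nn.Linear(2 * hidden, m)             # per-column mapping to the m-way output

out, _ = bilstm(seq)                        # (80, 1, 2*hidden)
s2 = proj(out)                              # S2: (80, 1, m), one distribution per column
print(s2.shape)
```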
3.4: Constructing the prediction module CTC, specifically: the hidden-vector probability distribution output S2 for each feature vector obtained in step 3.3 is input. The CTC module serves as the prediction module of the CRNN text recognition network and converts the input through de-duplication and integration operations to obtain the result character sequence l. The network structure of the prediction module CTC is shown in Table 6.
Table 6 network architecture table of prediction module CTC
Layer | Module | Input feature image | Output
1 | FC + Softmax | S2 (80×m) | l
The CTC module consists of an FC (fully connected) layer, a Softmax operation, and the sequence merging mechanism Blank. The hidden-vector probability distribution output S2 obtained in step 3.3 is input to the FC layer, which maps S2 to a character probability distribution of length T. The character probability distribution is then processed by the sequence merging mechanism: a blank symbol is added to the labeled character set p to form a new labeled character set p′, so that the character probability distribution has the fixed length required by the Softmax operation. The Softmax operation (3-1) selects the label (character) corresponding to the maximum value to obtain the character distribution output, and finally the sequence conversion function β (3-2) eliminates the blank symbols and the predicted repeated characters, decoding the result character sequence l:

S_{ij} = e^{v_{ij}} / ∑_{j=1}^{|p′|} e^{v_{ij}}  (3-1)

β: (p′)^T → p″, |p″| ≤ T  (3-2)

In formula (3-1), v_{ij} denotes the j-th element of the i-th column vector in the character probability distribution matrix v, and S_{ij} is the ratio of the exponential of that element to the sum of the exponentials of all elements in the column vector. In formula (3-2), p′ is the character set consisting of the labeled character set p and the blank symbol, and T is the length of the hidden-vector probability distribution output S2 after FC layer mapping; after the β transformation, a result character sequence p″ shorter than the sequence length T is output.
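A minimal numpy sketch combining the Softmax (3-1), a greedy per-column argmax, and the β transform (3-2) that collapses repeats and removes blanks; the toy three-symbol character set is an assumption:

```python
import numpy as np

def ctc_greedy_decode(v, charset, blank=0):
    """v: (T, |p'|) raw character scores; charset: blank plus the labeled set p."""
    e = np.exp(v - v.max(axis=1, keepdims=True))
    s = e / e.sum(axis=1, keepdims=True)      # Softmax (3-1), applied per column vector
    path = s.argmax(axis=1)                   # most probable label at each time step
    out, prev = [], None
    for k in path:                            # beta transform (3-2):
        if k != prev and k != blank:          # drop repeats, then drop blanks
            out.append(charset[k])
        prev = k
    return "".join(out)

scores = np.array([[5., 0, 0], [0, 5, 0], [0, 5, 0], [5, 0, 0], [0, 0, 5]])
print(ctc_greedy_decode(scores, charset=["-", "a", "b"]))   # -> "ab"
```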
3.5: Calculating the CRNN text recognition network regression optimization loss CTC Loss, specifically: the images X_i^CRNN (i = 1, 2, ..., n×t) of the certificate image data set X_CRNN processed in step 3.1 are input to the CRNN text recognition network; through forward propagation, the loss value between the predicted result l and the true value is calculated by the loss function, and the posterior probability p(l|y) (3-4) of the label l output by the CTC module in step 3.4 is adjusted backward according to the loss value. The CRNN text recognition network regression optimization loss CTC Loss is computed as:
L(S) = −∑_{(I,l)∈S} ln p(l|y)  (3-3)
In formula (3-3), p(l|y) is defined by formula (3-4); S = {(I, l)} is the training set, I is an image input from the training set, and l is the true character sequence output.
The CTC formula (3-4) applies to the probability distribution over the sequence S1 input to the BiLSTM module after the Map-to-Sequence operation of step 3.3; here S1 is regarded as y. Given all possible output distributions, the most likely result label sequence l is output, the aim being to maximize the posterior probability p(l|y) of l.
p(l|y) = ∑_{π: β(π)=l} p(π|y)  (3-4)
In formula (3-4), y is the input probability distribution matrix, y = y^1, y^2, ..., y^T, where T is the sequence length; π: β(π) = l denotes all paths π that map to the final label sequence l through the β transformation (3-2); and p(π|y) is defined by formula (3-5).
p(π|y) = ∏_{t=1}^{T} y^t_{π_t}  (3-5)

In formula (3-5), y^t_{π_t} denotes the probability of emitting the label π_t at time step t, where the subscript t indexes each time step of the path π.
3.6: Fixing the model parameters of the CRNN text recognition network, specifically: the test set X_test divided in step 1.2 is used to test the character recognition accuracy of the CRNN text recognition network. X_test is preprocessed as in step 1.3 and input into the DB network model with fixed parameters to obtain the predicted-text small box data set, which is tested and recognized through steps 3.1-3.5; the obtained result label sequence l is compared with the actual label file Label, and a recognition is judged correct only when the whole line of text is recognized correctly, otherwise it is judged wrong. Defining the number of correctly recognized texts as l_true and the number of wrongly recognized texts as l_false, the model character recognition accuracy L_accuracy is calculated by formula (3-6):

L_accuracy = l_true / (l_true + l_false)  (3-6)

Finally, the CRNN training model with the highest L_accuracy is selected as the final fixed CRNN text recognition network model, and its parameters are fixed accordingly.
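A minimal sketch of computing L_accuracy under the whole-line-correct criterion of (3-6):

```python
def char_recognition_accuracy(preds, labels):
    # a line counts toward l_true only when the whole recognized line matches exactly
    l_true = sum(p == t for p, t in zip(preds, labels))
    l_false = len(labels) - l_true
    return l_true / (l_true + l_false)

print(char_recognition_accuracy(["20200601", "electrician job"],
                                ["20200601", "electrician jobs"]))   # 0.5
```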
Step (4), specifically comprising:
4.1: Performing text detection and recognition on the constructor special job certificate, specifically: the DB training model with parameters fixed in step 2.6 is loaded and converted into a DB text detection network model, and each certificate image X_i^d (i = 1, 2, ..., n) in the image set X_d of constructor special job certificates to be detected is input. Through the fixed-weight DB text detection network, t predicted text box images are obtained for each certificate, together with the 4 coordinate pairs of each predicted text box: the upper-left, lower-left, upper-right, and lower-right corner coordinates. The set of predicted-text small box images marked by these 4 coordinates is then input into the CRNN text recognition network with parameters fixed in step 3.6, which outputs the corresponding text recognition information and its character recognition accuracy. A minimal sketch of this pipeline is given below.
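A minimal, self-contained sketch of this two-stage detect-then-recognize pipeline; db_detect and crnn_recognize are hypothetical stand-ins for the fixed-parameter DB and CRNN models, not interfaces defined in this document:

```python
from typing import Callable, List, Tuple

Box = List[Tuple[int, int]]   # [upper-left, lower-left, upper-right, lower-right]

def detect_and_recognize(image,
                         db_detect: Callable[[object], List[Box]],
                         crnn_recognize: Callable[[object, Box], Tuple[str, float]]):
    results = []
    for box in db_detect(image):                 # predicted text boxes with 4 corners
        text, conf = crnn_recognize(image, box)  # recognize the cropped text box
        results.append({"box": box, "text": text, "confidence": conf})
    return results

# toy stand-ins so the sketch runs end to end
demo = detect_and_recognize(
    image=None,
    db_detect=lambda img: [[(0, 0), (0, 10), (50, 0), (50, 10)]],
    crnn_recognize=lambda img, box: ("validity period 20100601-20200601", 0.98))
print(demo)
```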
4.2: Judgment logic for detecting the constructor special job certificate, specifically: based on the text recognition information obtained in step 4.1, whether the certificate is legal is judged through the following logic, and the certificate detection result is finally obtained.
(1) If the four characters "validity period" are recognized after text prediction and recognition of the certificate image, it is judged that validity-period information has been detected, and the next judgment is made; if no such words are recognized, it is judged that the certificate cannot be detected successfully, and "unqualified certificate shooting" is prompted.
(2) If "validity period" is recognized, the corresponding predicted text box is selected, and the year-month-day digits from the start date to the end date of validity that follow "validity period" (such as 20100601-20200601) are extracted; logic processing then extracts the trailing eight-digit number (such as 20200601). If the eight digits cannot be extracted normally, it is judged that the certificate cannot be detected successfully, and "the certificate validity period cannot be recognized normally" is prompted.
(3) If the validity period and the trailing eight digits of its text box are recognized successfully, whether the four characters "job type" are recognized is judged from the recognized text of the certificate image. If "job type" is recognized, the next judgment is made; if not, it is judged that the certificate cannot be detected successfully, and "the certificate job type cannot be recognized normally" is prompted.
(4) With "job type" recognized, the corresponding predicted text box is selected, and the specific type following "job type" (electrician job or high-altitude job) is extracted; the process proceeds to step (5) for "electrician job" and to step (6) for "high-altitude job".
(5) The eight digits following the text box corresponding to the validity period recognized from the certificate image are compared with the current Beijing time. If the expiry date is earlier than the current Beijing time, the certificate is judged to have passed its validity period and is unqualified, and "detection failed, manual detection" is prompted; if the expiry date is later than the current Beijing time, the certificate is judged qualified, and "the electrician job type of the special job certificate is detected successfully; detection qualified" is prompted.
(6) The eight digits following the text box corresponding to the validity period recognized from the certificate image are compared with the current Beijing time. If the expiry date is earlier than the current Beijing time, the certificate is judged to have passed its validity period and is unqualified, and "detection failed, manual detection" is prompted; if the expiry date is later than the current Beijing time, the certificate is judged qualified, and "the high-altitude job type of the special job certificate is detected successfully; detection qualified" is prompted.
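A minimal sketch of this judgment logic, assuming the recognized text has been rendered as an English string containing the keywords above, and taking the system clock as the current Beijing time:

```python
import re
from datetime import datetime

def judge_certificate(text, now=None):
    now = now or datetime.now()
    if "validity period" not in text:                                  # step (1)
        return "unqualified certificate shooting"
    m = re.search(r"\d{8}\s*-\s*(\d{8})", text)                        # step (2)
    if not m:
        return "certificate validity period cannot be recognized normally"
    expiry = datetime.strptime(m.group(1), "%Y%m%d")
    jt = re.search(r"job type\D*(electrician job|high-altitude job)", text)  # (3)(4)
    if jt is None:
        return "certificate job type cannot be recognized normally"
    if expiry < now:                                                   # steps (5)(6)
        return "detection failed, manual detection"
    return jt.group(1) + " type of the special job certificate detected successfully"

print(judge_certificate("job type: electrician job validity period 20100601-20300601"))
```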
Labeling the image data set with a semi-automatic labeling tool is efficient and accurate; tailored to the characteristics of special job certificate images, the proposed network combination model is small, easy to deploy, and fast in detection; meanwhile, the custom special job certificate judgment logic raises the degree of automation of the method, effectively improves the detection efficiency of special job certificates, and effectively reduces labor cost.
Example four
Special job certificate detection based on DB and CRNN is performed as follows.
1: Data preprocessing, specifically: according to step 1.1, the certificate image data set obtained from China Mobile Yunnan Company is manually screened to form the original certificate image data set L, which is then labeled by the semi-automatic labeling method, as shown in Table 7.
TABLE 7 example of image data annotation for special job certificates
According to step 1.2, the labeled data set X is divided into a training set X_train and a test set X_test at a ratio of 8:2, as sketched below. And according to step 1.3, image decoding, image normalization, rearrangement, and image scaling operations are performed in turn on the labeled data set X to obtain the labeled data set X'.
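A minimal sketch of the 8:2 split; random shuffling with a fixed seed is an assumption, since the text fixes only the ratio:

```python
import random

def split_dataset(samples, ratio=0.8, seed=0):
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)          # assumed: shuffle before splitting
    cut = int(len(samples) * ratio)
    train = [samples[i] for i in idx[:cut]]
    test = [samples[i] for i in idx[cut:]]
    return train, test

train_set, test_set = split_dataset(list(range(100)))
print(len(train_set), len(test_set))          # 80 20
```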
2: Building and training the DB text detection network; the overall structure of the DB text detection network is shown in fig. 4. According to steps 2.1-2.4, the images in the labeled data set X' are first scale-transformed to obtain (640 × 640 × 3) images; then the Backbone, Neck, and Head modules of the DB text detection network are constructed in turn. The input and output feature image sizes of the network layers are shown in Table 8.
Table 8 DB text detection network input/output data flow table for each network layer
In Table 8, the (640 × 640 × 3) certificate image undergoes DB text detection network prediction, finally outputting the prediction probability map M_P (640 × 640 × 1), threshold map M_T (640 × 640 × 1), and approximate binary map M_A (640 × 640 × 1). After preparing the training files and setting the training parameters, the DB text detection network is trained according to step 2.5.
(1) Prepare the training set train_images folder, the test set test_images folder, the training set label file train_label.txt, the test set label file test_label.txt, and the training file train.py for training the DB text detection network. (2) Set parameters such as epoch, batch size, and learning rate in train.py. Once the training files are prepared and the training parameters are set, training of the DB text detection network can begin.
First, X_train is loaded into the prepared training file train.py; through forward propagation, the losses L, L_s, L_b, and L_t are calculated by formulas (2-10) to (2-12), and the network training parameters are continuously optimized until the loss function value of the DB text detection network converges. Finally, according to step 2.6, X_test is input into the DB text detection network to obtain the corresponding approximate binary map M_A and corresponding coordinate position information, which are compared with the corresponding image coordinate information in the X_test label file; the model prediction recall, precision, and evaluation Score are calculated, and the model with the best evaluation Score is selected as the final fixed-parameter DB text detection network model.
3: Constructing and training the CRNN text recognition network; the overall structure of the CRNN text recognition network is shown in fig. 5. According to steps 3.1-3.4, the images in the text small box data set X_DB predicted by the DB text detection network in step 2 are first scale-transformed to obtain (320 × 32 × 3) images; then the CNN, BiLSTM, and CTC modules of the CRNN text recognition network are constructed in turn. The input and output feature image sizes of the network layers are shown in Table 9.
Table 9 Input and output data flow of each network layer in the CRNN text recognition network
Layer | Module | Input feature image | Output feature image
1 | Conv | M0 (320×32×3) | M1 (160×16×16)
2 | Bneck_Mix5 | M1 (160×16×16) | M2 (160×4×24)
3 | Bneck_Mix6 | M2 (160×4×24) | M3 (160×1×96)
4 | Conv | M3 (160×1×96) | M4 (160×1×576)
5 | Pool | M4 (160×1×576) | M5 (80×1×576)
6 | Reshape | M5 (80×1×576) | S1 (80×576)
7 | BiLSTM | S1 (80×576) | S2 (80×m)
8 | FC + Softmax | S2 (80×m) | l
In Table 9, the (320 × 32 × 3) certificate image undergoes CRNN text recognition network prediction, finally outputting the prediction result sequence l. Training files are prepared and training parameters are set for training the model according to step 3.5. (1) Prepare the training set text_images folder, the test set text_images folder, the two txt files recording the image text content labels of the training and test sets, the training file train.py, and the dictionary word_direct.txt for training the CRNN text recognition network. The dictionary is stored in utf-8 encoding and maps the characters appearing in the labeled data set X to dictionary indices (see the sketch below). (2) Set parameters such as epoch, batch size, and learning rate in train.py.
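A minimal sketch of loading this dictionary and mapping characters to indices; the file name is taken from the text, and one character per line is an assumed format:

```python
def load_char_dict(path="word_direct.txt"):   # file name as given in the text
    with open(path, encoding="utf-8") as f:
        chars = [line.rstrip("\n") for line in f]
    return {c: i for i, c in enumerate(chars)}   # character -> dictionary index

# usage (assuming the dictionary file exists alongside train.py):
# char_to_idx = load_char_dict()
# label_indices = [char_to_idx[ch] for ch in "20200601"]
```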
After the training files are prepared and the training parameters are set, training of the CRNN text recognition network can begin. First, X_train is loaded into the prepared training file train.py; through forward propagation, L(S) is calculated by formulas (3-3) to (3-5), and the network training parameters are continuously optimized until the loss function value of the CRNN text recognition network converges. Finally, according to step 3.6, X_test is input into the CRNN text recognition network to obtain the prediction result sequence l of the corresponding image text, which is compared with the corresponding labeled text information in the X_test label file; the model character recognition accuracy L_accuracy is calculated, and the model with the highest L_accuracy is selected as the final fixed-parameter CRNN text recognition network model.
4: Detecting the constructor special job certificate, specifically: according to step 4.1, the DB text detection network model with parameters fixed in step 2.6 and the CRNN text recognition network model with parameters fixed in step 3.6 are loaded. The image data set X_d of certificates to be detected is first input; for each constructor special job certificate image to be detected, the model predicts the approximate binary map M_A (640 × 640 × 1) of the text target boxes according to its parameters, obtaining the corresponding text boxes and their coordinate position information. The text box images predicted by the DB text detection network model are then input into the CRNN text recognition network model, which outputs the recognized text information.
Any certificate image X_k^d is arbitrarily selected from the certificate image data set X_d to be detected as an example of model output; the information recognized for image X_k^d by prediction with the fixed DB and CRNN models is shown in Table 10.
Table 10 Information for example image X_k^d after prediction and recognition by the fixed DB and CRNN models
According to step 4.2, the obtained text information is extracted according to the custom constructor special job certificate detection judgment logic, and whether the constructor's special job certificate is qualified and valid is judged based on the extracted text information.
Fourth, the advantages and positive effects of the invention compared with the prior art are as follows.
(1) Aiming at the regular, orderly, and easily labeled characteristics of the special job certificate image data sets provided by construction sites of China Mobile Yunnan Company's 5G base stations, the invention provides an efficient semi-automatic certificate image data set labeling method: the PPOCRLabel tool performs the first labeling step, automatically labeling the text boxes in the certificate image data set and the characters in the corresponding text boxes, and manual screening then applies a second round of manual correction to text boxes and texts that were not predicted successfully or were labeled incorrectly, improving labeling efficiency while ensuring high accuracy of the labeled data set.
(2) Aiming at the characteristics of the special job certificate, namely regular image data, a single type, and text detection and text recognition that are easy to complete, the invention adopts the MobileNetV3 network as the backbone of both the DB text detection network model and the CRNN text recognition network model for extracting image features. The number of feature channels of the two networks is reduced while the certificate can still be detected accurately, and the size of the corresponding models is reduced by 90%, suiting conditions of limited computing capacity; meanwhile, the detection speed on certificate images is improved, thereby improving the efficiency of the certificate inspection method.
(3) The method combines the given constructor special job certificate image data set with the prediction and recognition results of the combined model, customizes the judgment logic of special job certificate detection, realizes a 24-hour unattended automatic special job certificate detection program on a computer, and raises the degree of automation of the method.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (10)

1. A special job certificate detection method based on DB and CRNN is characterized by comprising the following steps:
acquiring a special job certificate image data set; the special operation certificate image data set comprises a plurality of target special operation certificate images, and each target special operation certificate image has text information;
inputting each target special job certificate image into a DB text detection network model to determine a text box data set corresponding to each target special job certificate image; elements in the text box data set represent position information of a target text box;
inputting each target special job certificate image and a text box data set corresponding to each target special job certificate image into a CRNN text recognition network model to determine text information in each target text box in each target special job certificate image; the text information comprises at least one of constructor name, constructor gender, certificate number, operation category and certificate valid date;
the DB text detection network model is obtained by training based on a DB text detection network and a first training data set; the Backbone module in the DB text detection network adopts a MobileNetV3-large structure; each element in the first training data set comprises a historical special job certificate image and a first category label corresponding to the historical special job certificate image; the first category label is position information of a historical text box;
the CRNN text recognition network model is obtained by training based on a CRNN text recognition network and a second training data set; the partial structure of the CNN module in the CRNN text recognition network adopts a MobileNet V3-small structure; each element in the second training data set comprises a historical special job certificate image and a second class label corresponding to the historical special job certificate image; the second category label is historical text information.
2. The method for detecting special job certificates based on DB and CRNN as claimed in claim 1, further comprising: determining whether each special job certificate meets the construction job requirement based on the text information.
3. The method as claimed in claim 1, wherein the step of inputting each target special job certificate image into a DB text detection network model to determine a text box data set corresponding to each target special job certificate image comprises:
preprocessing each target special job certificate image; the preprocessing comprises: decoding, normalization, rearrangement, and image scaling;
and inputting each preprocessed target special job certificate image into a DB text detection network model to determine a text box data set corresponding to each target special job certificate image.
4. The method as claimed in claim 3, wherein the step of inputting each target special job certificate image and the text box data set corresponding to each target special job certificate image into a CRNN text recognition network model to determine the text information in each target text box of each target special job certificate image comprises:
inputting each preprocessed target special job certificate image and a preprocessed text box data set corresponding to each preprocessed target special job certificate image into a CRNN text recognition network model to determine text information in each target text box in each target special job certificate image.
5. The method for detecting special job certificates based on the DB and the CRNN as claimed in claim 1, wherein the determining process of the DB text detection network model is as follows:
constructing a DB text detection network;
determining a first training data set;
and training the DB text detection network based on the first training data set to obtain a DB text detection network model.
6. The method for detecting special job certificates based on the DB and CRNN according to claim 5, wherein the determining the first training data set specifically includes:
acquiring an original certificate image data set; the original certificate image data set comprises a plurality of original historical special job certificate images;
labeling each original historical special job certificate image by adopting a semi-automatic labeling tool to obtain each historical labeling image and a first class label corresponding to each historical labeling image;
preprocessing each historical annotation image to obtain a historical special job certificate image, the preprocessing comprising: decoding, normalization, rearrangement, and image scaling; the first category label corresponding to the historical special job certificate image is the first category label corresponding to the historical annotation image.
7. The method for detecting documents for special jobs based on DB and CRNN as claimed in claim 1, wherein the CRNN text recognition network model determining process is:
constructing a CRNN text recognition network;
determining a second training data set;
and training the CRNN text recognition network based on the second training data set to obtain a CRNN text recognition network model.
8. The method for detecting special job certificates based on DB and CRNN according to claim 7, wherein the determining the second training data set specifically includes:
acquiring an original certificate image data set; the original certificate image data set comprises a plurality of original historical special job certificate images;
labeling each original historical special job certificate image by adopting a semi-automatic labeling tool to obtain each historical labeling image and a second type label corresponding to each historical labeling image;
preprocessing each historical annotation image to obtain a historical special job certificate image, the preprocessing comprising: decoding, normalization, rearrangement, and image scaling; the second category label corresponding to the historical special job certificate image is the second category label corresponding to the historical annotation image.
9. A special job certificate detection system based on DB and CRNN is characterized by comprising:
the data acquisition module is used for acquiring a special job certificate image data set; the special operation certificate image data set comprises a plurality of target special operation certificate images, and each target special operation certificate image has text information;
the text box data set determining module is used for inputting each target special job certificate image into a DB text detection network model so as to determine a text box data set corresponding to each target special job certificate image; elements in the text box data set represent position information of a target text box;
the text information determining module is used for inputting each target special job certificate image and a text box data set corresponding to each target special job certificate image into a CRNN text recognition network model so as to determine text information in each target text box in each target special job certificate image; the text information comprises at least one of constructor name, constructor gender, certificate number, operation category and certificate valid date;
the DB text detection network model is obtained by training based on a DB text detection network and a first training data set; the Backbone module in the DB text detection network adopts a MobileNetV3-large structure; each element in the first training data set comprises a historical special job certificate image and a first category label corresponding to the historical special job certificate image; the first category label is position information of a historical text box;
the CRNN text recognition network model is obtained by training based on a CRNN text recognition network and a second training data set; the partial structure of the CNN module in the CRNN text recognition network adopts a MobileNet V3-small structure; each element in the second training data set comprises a historical special job certificate image and a second class label corresponding to the historical special job certificate image; the second category label is historical text information.
10. The DB and CRNN-based specialty job document detection system according to claim 9, further comprising: and the detection module is used for determining whether each special operation certificate meets the construction operation requirement or not based on the text information.
CN202110865778.9A 2021-07-29 2021-07-29 Special operation certificate detection method and system based on DB and CRNN Active CN113591866B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110865778.9A CN113591866B (en) 2021-07-29 2021-07-29 Special operation certificate detection method and system based on DB and CRNN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110865778.9A CN113591866B (en) 2021-07-29 2021-07-29 Special operation certificate detection method and system based on DB and CRNN

Publications (2)

Publication Number Publication Date
CN113591866A true CN113591866A (en) 2021-11-02
CN113591866B CN113591866B (en) 2023-07-07

Family

ID=78252001

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110865778.9A Active CN113591866B (en) 2021-07-29 2021-07-29 Special operation certificate detection method and system based on DB and CRNN

Country Status (1)

Country Link
CN (1) CN113591866B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114266751A (en) * 2021-12-23 2022-04-01 福州大学 AI technology-based product packaging bag coding defect detection method and system
CN115131797A (en) * 2022-06-28 2022-09-30 北京邮电大学 Scene text detection method based on feature enhancement pyramid network
CN116532046A (en) * 2023-07-05 2023-08-04 南京邮电大学 Microfluidic automatic feeding device and method for spirofluorene xanthene
CN116935396A (en) * 2023-06-16 2023-10-24 北京化工大学 OCR college entrance guide intelligent acquisition method based on CRNN algorithm
CN116958998A (en) * 2023-09-20 2023-10-27 四川泓宝润业工程技术有限公司 Digital instrument reading identification method based on deep learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190318755A1 (en) * 2018-04-13 2019-10-17 Microsoft Technology Licensing, Llc Systems, methods, and computer-readable media for improved real-time audio processing
WO2020111676A1 (en) * 2018-11-28 2020-06-04 삼성전자 주식회사 Voice recognition device and method
CN111401371A (en) * 2020-06-03 2020-07-10 中邮消费金融有限公司 Text detection and identification method and system and computer equipment
WO2020218512A1 (en) * 2019-04-26 2020-10-29 Arithmer株式会社 Learning model generating device, character recognition device, learning model generating method, character recognition method, and program
CN113076992A (en) * 2021-03-31 2021-07-06 武汉理工大学 Household garbage detection method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190318755A1 (en) * 2018-04-13 2019-10-17 Microsoft Technology Licensing, Llc Systems, methods, and computer-readable media for improved real-time audio processing
WO2020111676A1 (en) * 2018-11-28 2020-06-04 삼성전자 주식회사 Voice recognition device and method
WO2020218512A1 (en) * 2019-04-26 2020-10-29 Arithmer株式会社 Learning model generating device, character recognition device, learning model generating method, character recognition method, and program
CN111401371A (en) * 2020-06-03 2020-07-10 中邮消费金融有限公司 Text detection and identification method and system and computer equipment
CN113076992A (en) * 2021-03-31 2021-07-06 武汉理工大学 Household garbage detection method and device

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114266751A (en) * 2021-12-23 2022-04-01 福州大学 AI technology-based product packaging bag coding defect detection method and system
CN115131797A (en) * 2022-06-28 2022-09-30 北京邮电大学 Scene text detection method based on feature enhancement pyramid network
CN116935396A (en) * 2023-06-16 2023-10-24 北京化工大学 OCR college entrance guide intelligent acquisition method based on CRNN algorithm
CN116935396B (en) * 2023-06-16 2024-02-23 北京化工大学 OCR college entrance guide intelligent acquisition method based on CRNN algorithm
CN116532046A (en) * 2023-07-05 2023-08-04 南京邮电大学 Microfluidic automatic feeding device and method for spirofluorene xanthene
CN116532046B (en) * 2023-07-05 2023-10-10 南京邮电大学 Microfluidic automatic feeding device and method for spirofluorene xanthene
CN116958998A (en) * 2023-09-20 2023-10-27 四川泓宝润业工程技术有限公司 Digital instrument reading identification method based on deep learning
CN116958998B (en) * 2023-09-20 2023-12-26 四川泓宝润业工程技术有限公司 Digital instrument reading identification method based on deep learning

Also Published As

Publication number Publication date
CN113591866B (en) 2023-07-07

Similar Documents

Publication Publication Date Title
CN109902622B (en) Character detection and identification method for boarding check information verification
CN113591866B (en) Special operation certificate detection method and system based on DB and CRNN
CN111325203B (en) American license plate recognition method and system based on image correction
US10817741B2 (en) Word segmentation system, method and device
CN107194400B (en) Financial reimbursement full ticket image recognition processing method
CN111950453A (en) Optional-shape text recognition method based on selective attention mechanism
CN110969129B (en) End-to-end tax bill text detection and recognition method
CN105512611A (en) Detection and identification method for form image
CN114155527A (en) Scene text recognition method and device
CN111539330B (en) Transformer substation digital display instrument identification method based on double-SVM multi-classifier
CN111680690A (en) Character recognition method and device
US20210334573A1 (en) Text line normalization systems and methods
CN115131797A (en) Scene text detection method based on feature enhancement pyramid network
CN114067300A (en) End-to-end license plate correction and identification method
CN114283431B (en) Text detection method based on differentiable binarization
CN112365451B (en) Method, device, equipment and computer readable medium for determining image quality grade
CN111832497B (en) Text detection post-processing method based on geometric features
CN111553361B (en) Pathological section label identification method
CN116681657B (en) Asphalt pavement disease detection method based on improved YOLOv7 model
CN111414917A (en) Identification method of low-pixel-density text
CN111476226A (en) Text positioning method and device and model training method
CN115424250A (en) License plate recognition method and device
Rani et al. Object Detection in Natural Scene Images Using Thresholding Techniques
CN111738255A (en) Guideboard text detection and recognition algorithm based on deep learning
Zhao et al. Text Spotting of Electrical Diagram Based on Improved PP-OCRv3

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant