CN113591866B - Special operation certificate detection method and system based on DB and CRNN

Publication number: CN113591866B (application CN202110865778.9A; earlier publication CN113591866A)
Original language: Chinese (zh)
Inventors: 彭光灵, 岳昆, 刘伯涛, 李忠斌, 杨晰, 魏立力, 段亮
Assignee: Yunnan University YNU
Legal status: Active (granted)

Classifications

    • G06F18/241 — Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/044 — Neural networks; recurrent networks, e.g. Hopfield networks
    • G06N3/045 — Neural networks; combinations of networks
    • G06N3/08 — Neural networks; learning methods
    • Y04S10/50 — Systems or methods supporting power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention discloses a special operation certificate detection method and system based on DB and CRNN. The method comprises the following steps: inputting each target special operation certificate image into a DB text detection network model to determine the text box data set corresponding to each image; and inputting each target special operation certificate image and its corresponding text box data set into a CRNN text recognition network model to determine the text information in each target text box of each image. The Backbone module of the DB text detection network adopts a MobileNetV3-large structure; part of the CNN module of the CRNN text recognition network adopts a MobileNetV3-small structure. The invention reduces manual workload and improves the detection efficiency of certificate images.

Description

Special operation certificate detection method and system based on DB and CRNN
Technical Field
The invention relates to the technical field of optical character recognition, in particular to a special operation certificate detection method and system based on DB and CRNN.
Background
During 5G base station construction, it is an indispensable safety guarantee that construction workers hold qualified and valid special operation certificates. At present, special operation certificates are mostly checked manually: detection efficiency is low, and feedback on the certificate checks cannot be obtained promptly and effectively.
Disclosure of Invention
The invention aims to provide a special operation certificate detection method and system based on DB and CRNN, so as to achieve the purposes of reducing the manual workload and improving the certificate image detection efficiency.
In order to achieve the above object, the present invention provides the following solutions:
a special operation certificate detection method based on DB and CRNN comprises the following steps:
acquiring a special operation certificate image data set, wherein the data set comprises a plurality of target special operation certificate images, each containing text information; inputting each target special operation certificate image into a DB text detection network model to determine the text box data set corresponding to each image, wherein the elements of the text box data set represent the position information of the target text boxes; and inputting each target special operation certificate image and its corresponding text box data set into a CRNN text recognition network model to determine the text information in each target text box of each image, wherein the text information comprises at least one of constructor name, constructor sex, certificate number, operation category, and certificate validity date.
The DB text detection network model is obtained by training a DB text detection network on a first training data set; the Backbone module of the DB text detection network adopts a MobileNetV3-large structure; each element of the first training data set comprises a historical special operation certificate image and its corresponding first-class label, the first-class label being the position information of the historical text boxes. The CRNN text recognition network model is obtained by training a CRNN text recognition network on a second training data set; part of the CNN module of the CRNN text recognition network adopts a MobileNetV3-small structure; each element of the second training data set comprises a historical special operation certificate image and its corresponding second-class label, the second-class label being the historical text information.
A special operation certificate detection system based on DB and CRNN comprises:
a data acquisition module, used for acquiring a special operation certificate image data set, wherein the data set comprises a plurality of target special operation certificate images, each containing text information; a text box data set determining module, used for inputting each target special operation certificate image into the DB text detection network model to determine the text box data set corresponding to each image, wherein the elements of the text box data set represent the position information of the target text boxes; and a text information determining module, used for inputting each target special operation certificate image and its corresponding text box data set into the CRNN text recognition network model to determine the text information in each target text box of each image, wherein the text information comprises at least one of constructor name, constructor sex, certificate number, operation category, and certificate validity date.
The DB text detection network model is obtained by training a DB text detection network on a first training data set; the Backbone module of the DB text detection network adopts a MobileNetV3-large structure; each element of the first training data set comprises a historical special operation certificate image and its corresponding first-class label, the first-class label being the position information of the historical text boxes. The CRNN text recognition network model is obtained by training a CRNN text recognition network on a second training data set; part of the CNN module of the CRNN text recognition network adopts a MobileNetV3-small structure; each element of the second training data set comprises a historical special operation certificate image and its corresponding second-class label, the second-class label being the historical text information.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
The invention adopts a DB text detection network model and a CRNN text recognition network model, and can rapidly and accurately detect special operation certificate images. The DB text detection network model works well with a lightweight network as its feature extraction module; after the model is made lightweight, it rapidly predicts the text in special operation certificates without consuming extra memory or time, marks the text regions with boxes, extracts the text regions from the image, and obtains the bounding box information of the text targets. The CRNN text recognition network model performs text recognition on the predicted text block images. Since special operation certificate image data consist of short texts, BiLSTM and CTC mechanisms are introduced so that global prediction over the text feature sequences is enhanced and the feature sequences are learned directly from the short texts (line-level labels); no additional, finer character-level labels are needed for training, which improves the accuracy and efficiency of text recognition.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a special operation certificate detection method based on DB and CRNN;
FIG. 2 is a schematic diagram of a special operation certificate detection system based on DB and CRNN;
FIG. 3 is an overall flow chart of the method for detecting special operation certificates based on DB and CRNN of the invention;
FIG. 4 is a schematic diagram of the overall structure of the DB text detection network of the present invention;
fig. 5 is a schematic diagram of the overall structure of the CRNN text recognition network according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
The invention uses optical character recognition techniques with deep learning models to efficiently detect special operation certificate images. Existing deep-learning-based optical character recognition methods mainly adopt a two-stage scheme: text detection followed by text recognition. The DB (Differentiable Binarization) algorithm used in the invention moves the binarization operation into the network at the pixel level and optimizes it jointly, so that the threshold of each pixel can be adaptively predicted; with an approximation, the binarization becomes differentiable and can be trained together with the segmentation network, which simplifies post-processing and accelerates target detection. The CRNN (Convolutional Recurrent Neural Network) algorithm used in the invention combines CNN, LSTM (Long Short-Term Memory), and CTC (Connectionist Temporal Classification); the CTC method solves the problem that characters cannot be aligned during training, and no serial decoding operation is needed as in Attention OCR, so the network structure is more streamlined.
Embodiment 1
This embodiment discloses a special operation certificate detection method based on DB and CRNN, which predicts text positions in the certificate image data set and recognizes the specific text information to support the detection of special operation certificates and thereby judge their qualification; it belongs to the field of computer vision recognition, in particular optical character recognition. Referring to fig. 1, the special operation certificate detection method based on DB and CRNN of this embodiment includes the following steps.
Step 101: acquiring a special operation certificate image data set; the data set comprises a plurality of target special operation certificate images, each containing text information.
Step 102: inputting each target special operation certificate image into a DB text detection network model to determine the text box data set corresponding to each image; the elements of the text box data set represent the position information of the target text boxes.
Step 103: inputting each target special operation certificate image and its corresponding text box data set into a CRNN text recognition network model to determine the text information in each target text box of each image; the text information includes at least one of constructor name, constructor sex, certificate number, job category, and certificate validity date.
Step 104: determining, based on the text information, whether each special operation certificate meets the construction operation requirements.
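For illustration only, the following Python sketch strings steps 101-104 together; the function names and the axis-aligned box format are hypothetical, since the patent does not define a programming interface.

```python
from typing import Callable, List, Sequence
import numpy as np

Box = List[int]  # hypothetical axis-aligned box: [x1, y1, x2, y2]

def run_pipeline(
    images: Sequence[np.ndarray],                 # step 101: acquired images
    detect: Callable[[np.ndarray], List[Box]],    # step 102: DB detection model
    recognize: Callable[[np.ndarray], str],       # step 103: CRNN recognition model
    qualifies: Callable[[List[str]], bool],       # step 104: custom judgment logic
) -> List[bool]:
    verdicts = []
    for img in images:
        boxes = detect(img)
        texts = [recognize(img[y1:y2, x1:x2]) for x1, y1, x2, y2 in boxes]
        verdicts.append(qualifies(texts))
    return verdicts
```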
Step 102 specifically comprises: preprocessing each target special operation certificate image (the preprocessing is the same as in Embodiment 3 and is not repeated here), and then inputting each preprocessed image into the DB text detection network model to determine the text box data set corresponding to each image.
Step 103 specifically comprises: preprocessing each target special operation certificate image (the preprocessing is the same as in Embodiment 3 and is not repeated here), and then inputting each preprocessed image and its corresponding text box data set into the CRNN text recognition network model to determine the text information in each target text box of each image.
The DB text detection network model is obtained by training a DB text detection network on a first training data set; the Backbone module of the DB text detection network adopts a MobileNetV3-large structure; each element of the first training data set comprises a historical special operation certificate image and its corresponding first-class label, the first-class label being the position information of the historical text boxes. The CRNN text recognition network model is obtained by training a CRNN text recognition network on a second training data set; part of the CNN module of the CRNN text recognition network adopts a MobileNetV3-small structure; each element of the second training data set comprises a historical special operation certificate image and its corresponding second-class label, the second-class label being the historical text information. The training processes of the DB text detection network model and the CRNN text recognition network model are described in Embodiment 3 and are not repeated here.
Embodiment 2
Referring to fig. 2, the special operation certificate detection system based on DB and CRNN provided in this embodiment comprises:
a data acquisition module 201, configured to acquire a special operation certificate image data set; the data set comprises a plurality of target special operation certificate images, each containing text information. A text box data set determining module 202, configured to input each target special operation certificate image into the DB text detection network model to determine the text box data set corresponding to each image; the elements of the text box data set represent the position information of the target text boxes. A text information determining module 203, configured to input each target special operation certificate image and its corresponding text box data set into the CRNN text recognition network model to determine the text information in each target text box of each image; the text information includes at least one of constructor name, constructor sex, certificate number, job category, and certificate validity date. A detection module 204, configured to determine, based on the text information, whether each special operation certificate meets the construction operation requirements.
The text box data set determining module 202 specifically performs: preprocessing each target special operation certificate image (the preprocessing is the same as in Embodiment 3 and is not repeated here), and inputting each preprocessed image into the DB text detection network model to determine the text box data set corresponding to each image.
The text information determining module 203 specifically performs: preprocessing each target special operation certificate image (the preprocessing is the same as in Embodiment 3 and is not repeated here), and inputting each preprocessed image and its corresponding text box data set into the CRNN text recognition network model to determine the text information in each target text box of each image.
For details of the DB text detection network model and the CRNN text recognition network model, see Embodiment 1. Their training processes are described in Embodiment 3 and are not repeated here.
Embodiment 3
During 5G base station construction, the certificate image data set formed from special operation certificate images is characterized by large text aspect ratios, an orderly standard layout, and a large data volume. Regression-based detectors can obtain accurate text targets for regular-shaped text, but text boxes of preset shapes cannot describe text of certain special shapes well (such as excessively large aspect ratios or arc shapes). Segmentation-based detectors such as PSENet and LSAE can detect irregularly shaped text, but they require complex post-processing to assemble pixel-level results into text lines, their prediction cost is high, and they cannot meet the requirement of rapid detection over a large amount of data.
The invention adopts a DB text detection network model in which the binarization operation is made differentiable, which avoids the complex and time-consuming post-processing of segmentation-based methods, improves detection speed, and allows a large amount of certificate image data to be detected rapidly. Meanwhile, in choosing a text recognition method — to avoid high training-sample requirements, extra computation parameters in the transcription layer, and low detection speed, and because special operation certificate images contain short text — the invention selects a CRNN text recognition network model, which detects quickly, recognizes short text segments with high precision, and needs no extra computation parameters, thereby ensuring efficient text recognition and detection.
Therefore, based on the DB text detection network model and the CRNN text recognition network model, the invention first manually screens out low-quality special operation certificate images and then uses semi-automatic labeling to obtain a high-standard image data set and training data set; semi-automatic labeling completes the annotation of special operation certificate images rapidly while ensuring highly accurate labeling results. The DB text detection network is then trained on this data set to predict and box the text positions in the special operation certificate images, and the CRNN text recognition network is trained to recognize the text information at the boxed positions. Because special operation certificate images are regular, of a single type, and easy to detect, a MobileNetV3 network is selected as the feature extraction module of both the DB text detection network model and the CRNN text recognition network model, which keeps the models lightweight and fast while maintaining detection accuracy. In the detection stage, based on the certificate text information predicted and recognized by the DB and CRNN models, custom judgment logic further infers the certificate type and validity period in the image, so that the safety check of the special operation certificate is completed rapidly.
Referring to fig. 3, the special operation certificate detection method based on DB and CRNN according to the embodiment of the present invention is divided into 4 steps.
Step (1): generating the certificate image data set. Specifically: first, special operation certificate images of construction workers are collected from 5G base station construction sites; each image contains the worker's name, sex, certificate number, job category, certificate validity date, and so on. Second, each image is rapidly annotated with a semi-automatic labeling tool to obtain an annotated data set, which is then preprocessed. Finally, the preprocessed annotated data set (i.e., the generated certificate image data set) is divided into a training set and a test set.
Step (2): constructing and training the DB text detection network. Specifically: the Backbone, Neck, and Head modules of the DB text detection network are built in turn. The Backbone module adopts a MobileNetV3-large structure and, acting as a feature pyramid, extracts features from the input image to obtain feature images; the Neck module adopts an FPN (Feature Pyramid Networks) structure to further process the feature images; the Head module performs output processing on the processed feature images, predicts a probability map and a threshold map, and derives an approximate binary map from them. After preparing the files and setting the parameters required for training, the DB text detection network is trained on the training set from step (1).
Step (3): constructing and training the CRNN text recognition network. Specifically: the CNN module, the BiLSTM (Bi-directional Long Short-Term Memory) module, and the CTC network structure of the CRNN text recognition network are built in turn. Part of the CNN module adopts a MobileNetV3-small structure and extracts features from the text images; the BiLSTM module fuses the extracted feature images into feature vectors and further extracts the context features of the character sequence, yielding a probability distribution for each column of features; the CTC network structure takes the hidden-vector probability distribution as input and predicts the text sequence. After preparing the files and setting the parameters required for training, the CRNN text recognition network is trained on the training set from step (1).
Step (4): detecting the construction workers' special operation certificates. Specifically: the certificate image to be detected is predicted by the DB text detection network model to obtain the target text boxes and their coordinate positions; the text inside the target boxes is recognized by the CRNN text recognition network model; finally, the detection of the special operation certificates is realized through the custom special-operation-certificate judgment logic.
The step (1) specifically comprises:
1.1: image annotation. Specifically: the special operation certificate image data set sampled from 5G base station construction sites contains low-quality samples exhibiting occluded characters, over-exposed certificates, blurred or unclear characters, certificates occupying too small a proportion of the image, and multiple certificates in one image that cannot be accurately classified and recognized; these must be manually screened out, yielding the original certificate image data set L = {l_1, l_2, ..., l_n}.
The original document image dataset L has the following characteristics: (1) compared with other types of image data (such as shop signboards, street signs, and clothing tags), special operation certificate image data are regular, orderly, and easy to annotate; (2) a special operation certificate image always carries fixed text fields such as name, gender, certificate number, job category, date of first issue, and validity period; (3) the certificate text is clearly visible and unoccluded, and the certificate occupies a proportion of the image that meets the requirement (e.g., 80%); (4) the certificate images are of a single type, so text detection and text recognition are easy to realize.
Given these characteristics, the embodiment of the invention compared manual annotation with semi-automatic annotation and found that, for certificate images that are regular, orderly, and easy to annotate, semi-automatic annotation markedly improves labeling efficiency. Therefore, the original document image dataset L is annotated with a semi-automatic labeling method, which proceeds as steps 1.1.1-1.1.2 below.
1.1.1: starting from the original document image dataset L, the automatic stage of semi-automatic annotation is performed with PPOCRLabel (PaddlePaddle OCRLabel). PPOCRLabel uses a built-in OCR model (comprising a text detection model and a text recognition model) to predict the text in each image of L, frame the corresponding text, and recognize the text inside each frame, yielding the automatically annotated data set L' = {L'_1, L'_2, ..., L'_n}, with L'_i = {L'_{i1}, L'_{i2}, ..., L'_{it}}. Each annotated image L'_i contains t text prediction boxes; because the certificate images are standard image data, t is always a fixed value. Each box carries 5 data values, L'_{ij} = {L'_{ij1}, L'_{ij2}, L'_{ij3}, L'_{ij4}, L'_{ij-t}}, where L'_{ij1}, L'_{ij2}, L'_{ij3}, L'_{ij4} are respectively the upper-left, lower-left, upper-right, and lower-right corner coordinates of the predicted text box L'_{ij} in image L'_i, and L'_{ij-t} is the text content of that box.
1.1.2: the second stage of semi-automatic annotation, manual screening and confirmation, is performed on the dataset L'. If a text box was not predicted or its coordinates are wrong, the coordinates are corrected manually; if the text inside a box was recognized incorrectly, the text content is corrected manually. This yields the annotated data set X = {X_1, X_2, ..., X_n}. Each document image X_i in the data set carries t text prediction boxes, X_i = {X_{i1}, X_{i2}, ..., X_{it}}, and each text prediction box has 5 data values, X_{ij} = {X_{ij1}, X_{ij2}, X_{ij3}, X_{ij4}, X_{ij-t}}, where X_{ij1}, X_{ij2}, X_{ij3}, X_{ij4} are respectively the upper-left, lower-left, upper-right, and lower-right corner coordinates of the text prediction box and X_{ij-t} is its text content. After annotation, the annotated data set X and the corresponding annotation file Label are obtained for training the DB text detection network and the CRNN text recognition network: X_{ij1}, X_{ij2}, X_{ij3}, X_{ij4} are the training labels of the DB text detection network, and X_{ij-t} is the training label of the CRNN text recognition network.
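As a concrete illustration, the sketch below parses one line of a PPOCRLabel-style Label file; the exact file layout (a tab-separated image path plus a JSON list with "points" and "transcription" keys) is an assumption about the tool's output, not something the patent specifies.

```python
import json

def parse_label_line(line: str):
    """Parse one assumed PPOCRLabel line: '<image path>\t<JSON box list>'.

    Each JSON entry is taken to hold the four corner 'points'
    (the DB labels X_ij1..X_ij4) and a 'transcription' string
    (the CRNN label X_ij-t).
    """
    path, payload = line.rstrip("\n").split("\t", 1)
    boxes = json.loads(payload)
    return path, [(box["points"], box["transcription"]) for box in boxes]

# Hypothetical example line in the assumed format:
sample = ('cert_0001.jpg\t[{"transcription": "NAME", '
          '"points": [[12, 8], [12, 40], [220, 8], [220, 40]]}]')
print(parse_label_line(sample))
```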
1.2: dataset division. Specifically: the annotated data set X from step 1.1 is divided into a training set X_train and a test set X_test. The training set X_train, 80% of the data, is used to train the DB text detection network and the CRNN text recognition network; the test set X_test, 20% of the data, is used to test the trained DB text detection network and CRNN text recognition network.
1.3: the data preprocessing is specifically as follows:
1.3.1: decoding the annotated data set X. Specifically: the annotated data set X is input, and the data of each original image X_i are converted in turn into a matrix of type uint8 and then decoded, i.e., the image is restored from JPEG format to a three-dimensional matrix. The color format of the decoded image is BGR (Blue × Green × Red), and the matrix dimensions are arranged in HWC order (Height × Width × Channel), giving the pixel matrix data set X_m = {X_{1m}, X_{2m}, ..., X_{nm}}.
1.3.2: normalizing the pixel matrix data set X_m. Specifically: X_m is input, and each pixel of every image X_{im} (i = 1, 2, ..., n) is mapped to the interval [0, 1]: the pixel value is divided by 255 (the linear transformation parameter that moves pixel values from the interval [0, 255] to the interval [0, 1]), the mean of the corresponding channel is subtracted, and the result is divided by the standard deviation of the corresponding channel, yielding the normalized result data set X'_m.
1.3.3: rearranging the normalized result data set X'_m. Specifically: X'_m is input, and the pixels of each image X'_{im} are rearranged so that the dimension order of the image matrix is converted from HWC format (height × width × channel) to CHW format (channel × height × width), giving a new certificate image data set X''_m.
1.3.4: scaling the images of the certificate image data set X''_m. Specifically: X''_m is input; when the length or width of an image X''_{im} exceeds the specified maximum size or falls below the specified minimum size, the image is rescaled: the offending side is scaled to an integer multiple of 32 within the specified side-length range, and the blank area is filled with 0, yielding the preprocessed certificate image data set X'.
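A minimal NumPy/OpenCV sketch of steps 1.3.1-1.3.4 follows; the per-channel mean/std values and the size limit are assumptions, since the patent does not list them.

```python
import cv2
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # assumed channel stats
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)   # assumed channel stats

def preprocess(jpeg_bytes: bytes, max_side: int = 960) -> np.ndarray:
    # 1.3.1 decode: JPEG bytes -> uint8 BGR pixel matrix in HWC order
    img = cv2.imdecode(np.frombuffer(jpeg_bytes, np.uint8), cv2.IMREAD_COLOR)
    # 1.3.2 normalize: map to [0, 1] via /255, then per-channel mean/std
    x = (img.astype(np.float32) / 255.0 - MEAN) / STD
    # 1.3.4 scale: shrink oversized sides, round each side up to a multiple
    # of 32, and fill the blank border with 0 (done in HWC for convenience)
    h, w = x.shape[:2]
    scale = min(1.0, max_side / max(h, w))
    x = cv2.resize(x, (max(32, int(w * scale)), max(32, int(h * scale))))
    pad_h, pad_w = -x.shape[0] % 32, -x.shape[1] % 32
    x = np.pad(x, ((0, pad_h), (0, pad_w), (0, 0)))
    # 1.3.3 rearrange: HWC -> CHW
    return x.transpose(2, 0, 1)
```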
The step (2) specifically comprises:
2.1: input image preprocessing. Specifically: before entering the DB text detection network, the certificate image data set X' obtained in step 1.3 undergoes a scale transformation: the images in X' are adjusted to the input size required by the Backbone module of the DB text detection network, 640 × 640 × 3 (width pixels × height pixels × RGB (Red × Green × Blue) three channels). The adjusted data set X'_DB is then input into the feature extraction module Backbone of step 2.2. Without this scale transformation, the image input size would differ from the preset aspect ratio (640 × 640 × 3), and pixel-size mismatches would keep arising after the upsampling operations in the feature enhancement module FPN of step 2.3, making the merging operations of step 2.3 impossible.
2.2: constructing the feature extraction module Backbone. Specifically: the certificate image data set X'_DB from step 2.1 is input. Because the special operation certificate images described in step 1.1 are regular, of a single type, and easy to detect, a MobileNetV3-large network is adopted as the feature extraction Backbone module of the DB text detection network, which shrinks the model and speeds up detection while keeping the accuracy of feature extraction high. The MobileNetV3-large network extracts feature information from each image X'_{iDB} (i = 1, 2, ..., n) of X'_DB and outputs four feature images K_2-K_5. The network structure of MobileNetV3-large is shown in Table 1.
Table 1 feature extraction network structure table of MobileNetV3-large network
[Table 1 appears as an image in the original publication.]
The MobileNetV3-large network consists of Conv, Bneck_Mix1, Bneck_Mix2, Bneck_Mix3, Bneck_Mix4, and Pool modules. (1) The Conv module performs a convolution on the preprocessed feature image K_0 (the image preprocessed in steps 1.3 and 2.1) to obtain the feature image K_1; the H-swish approximate activation function (2-1) replaces the swish function for activation, reducing computation cost and increasing speed. (2) A Bneck module consists of a 1 × 1 convolution kernel, a 3 × 3 or 5 × 5 depthwise convolution kernel (3 × 3 when the Bneck module is a 3 × 3 depthwise separable convolution, 5 × 5 when it is a 5 × 5 depthwise separable convolution), and a 1 × 1 pointwise convolution kernel. The 1 × 1 convolution first raises the dimension of the feature map, the 3 × 3 or 5 × 5 depthwise convolution extracts features in the higher-dimensional space, and the 1 × 1 pointwise convolution then lowers the dimension again; combined into a depthwise separable convolution, this reduces the parameter count and the number of multiply-add operations relative to ordinary convolution. A lightweight attention model (SE) is introduced at the same time: the SE model learns the importance of each feature channel automatically and, according to the result, promotes useful features and suppresses features that are less useful for the current task, adjusting the weight of each channel. The Bneck_Mix1 module consists of three Bneck modules (3 × 3 depthwise convolution kernels) using the ReLU6 activation function (2-2), where "3 × 3 depthwise convolution kernels" means that the depthwise kernels in the constituent Bneck modules are of size 3 × 3; the same convention applies to the 5 × 5 depthwise kernels below and in step 3.2. The Bneck_Mix2 module consists of three Bneck modules (5 × 5 depthwise convolution kernels) using the ReLU6 activation function. The Bneck_Mix3 module consists of six Bneck modules (3 × 3 depthwise convolution kernels) using the H-swish activation function. The Bneck_Mix4 module consists of three Bneck modules (5 × 5 depthwise convolution kernels) using the H-swish activation function. These modules apply multiple layers of depthwise separable convolution to the feature images K_1, K_2, K_3, K_4, producing the feature images K_2, K_3, K_4, K_5 respectively. (3) A Conv module then convolves the feature map K_5 to obtain the feature image K_6. The Pool module downsamples K_6 by average pooling. After pooling, features are extracted by a 1 × 1 convolution and finally mapped into k output channels, extracting the feature map K_9 of the input image.
Following the MobileNetV3-large structure of Table 1, the feature maps K_2-K_5 of the second, third, fourth, and fifth layers of the network are computed in turn and used as the input of the feature enhancement module Neck in step 2.3.
H-swish(x) = x × ReLU6(x + 3) / 6   (2-1)
ReLU6(x) = min(max(x, 0), 6)   (2-2)
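The two activations can be written directly from formulas (2-1) and (2-2); a NumPy sketch:

```python
import numpy as np

def relu6(x: np.ndarray) -> np.ndarray:
    # formula (2-2): clip activations into [0, 6]
    return np.minimum(np.maximum(x, 0.0), 6.0)

def h_swish(x: np.ndarray) -> np.ndarray:
    # formula (2-1): piecewise-linear approximation of swish
    return x * relu6(x + 3.0) / 6.0
```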
2.3: constructing the feature enhancement module Neck. Specifically: the outputs K_2-K_5 obtained in step 2.2 are used as the inputs C_2-C_5 of this step. An FPN structure serves as the feature enhancement Neck module of the DB text detection network: through convolution, upsampling, and similar operations, the inputs C_2-C_5 are transformed to a uniform size, giving P_2-P_5 of identical dimensions, which are finally merged to generate the feature image F. The structure of the constructed FPN is shown in Table 2.
Table 2 network structure table of feature enhancement module FPN
| Network layer | Module name | Input feature image | Output feature image |
|---|---|---|---|
| 1 | Conv1 module | C_5 (20×20×160) | IN_5 (20×20×96) |
| 2 | Conv1 module | C_4 (40×40×112) | IN_4 (40×40×96) |
| 3 | Conv1 module | C_3 (80×80×40) | IN_3 (80×80×96) |
| 4 | Conv1 module | C_2 (160×160×24) | IN_2 (160×160×96) |
| 5 | Conv2 module | IN_5 (20×20×96) | P_5 (160×160×24) |
| 6 | Conv2 module | IN_4 (40×40×96) | P_4 (160×160×24) |
| 7 | Conv2 module | IN_3 (80×80×96) | P_3 (160×160×24) |
| 8 | Conv2 module | IN_2 (160×160×96) | P_2 (160×160×24) |
The FPN network structure consists of a Conv1 module and a Conv2 module. (1) The Conv1 module consists of a 1 × 1 convolution used to reduce the channel number of the input feature images C_2-C_5. On the channel-reduced maps IN_2-IN_5: IN_5 undergoes 2× nearest-neighbor upsampling and is added to IN_4 to obtain a new IN_4; the new IN_4 is upsampled by 2× and added to IN_3 to obtain a new IN_3; IN_2 is obtained similarly by adding the upsampled new IN_3. (2) The Conv2 module consists of a 3 × 3 convolution applied to the obtained IN_2-IN_5 for convolutional feature-fusion smoothing, reducing the aliasing caused by nearest-neighbor interpolation; of the fused feature images, P_3, P_4, and P_5 are upsampled by 2×, 4×, and 8× respectively, and finally the processed feature images P_2-P_5 are added point by point to obtain the final feature image F of this layer. This layer performs feature extraction, upsampling, and merging on C_2-C_5, combining low-level high-resolution information with high-level strong semantic information to obtain the feature-enhanced image F, which is then input into the output module Head of step 2.4.
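To make the top-down merge concrete, here is a NumPy sketch of the nearest-neighbor upsample-and-add path using the Table 2 sizes; the 1 × 1 channel-reduction and 3 × 3 smoothing convolutions are deliberately omitted, so this is an illustration of the merge order only.

```python
import numpy as np

def upsample2x(t: np.ndarray) -> np.ndarray:
    """2x nearest-neighbour upsampling of an (H, W, C) feature map."""
    return t.repeat(2, axis=0).repeat(2, axis=1)

# Toy IN_2..IN_5 maps, i.e. C_2..C_5 after the (omitted) 1x1 convolutions:
in5 = np.random.rand(20, 20, 96)
in4 = np.random.rand(40, 40, 96)
in3 = np.random.rand(80, 80, 96)
in2 = np.random.rand(160, 160, 96)

in4 = in4 + upsample2x(in5)   # new IN_4
in3 = in3 + upsample2x(in4)   # new IN_3
in2 = in2 + upsample2x(in3)   # new IN_2
# The (omitted) 3x3 Conv2 smoothing yields P_2..P_5; P_5/P_4/P_3 are then
# upsampled 8x/4x/2x to 160x160 and added point-by-point with P_2, giving F.
```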
2.4: constructing the output module Head. Specifically: the feature image F processed in step 2.3 is input, and DB_Head serves as the output module of the DB text detection network, further processing F so as to output the probability map M_p (Probability Map), the threshold map M_T (Threshold Map), and the approximate binary map M_A (Approximate Binary Map). The constructed DB_Head network structure is shown in Table 3.
Table 3 Network structure table of the output module DB_Head
| Network layer | Module name | Input feature image | Output feature image |
|---|---|---|---|
| 1 | Conv module | F (160×160×96) | F_1 (160×160×24) |
| 2 | BN module | F_1 (160×160×24) | F_2 (160×160×24) |
| 3 | Conv module | F_2 (160×160×24) | F_3 (320×320×6) |
| 4 | BN module | F_3 (320×320×6) | F_4 (320×320×6) |
| 5 | Conv module | F_4 (320×320×6) | F_5 (640×640×1) |
(1) DB_Head consists of Conv and BN (Batch Normalization) modules. Each Conv module contains one convolution: the first layer uses a 3 × 3 convolution, while the third and fifth layers use 2 × 2 (transposed) convolutions; the convolutions extract image features. The BN module normalizes the data: the mean (2-3) and variance (2-4) of each training batch are computed and used to normalize the batch (2-5) so that it has mean 0 and variance 1, after which the reconstruction transform (2-6), i.e., scaling and shifting, is applied. The formulas involved in the BN layer are as follows:
μ_B = (1/N) × Σ_{i=1..N} x_i   (2-3)
σ_B² = (1/N) × Σ_{i=1..N} (x_i − μ_B)²   (2-4)
x̂_i = (x_i − μ_B) / √(σ_B² + ε)   (2-5)
y_i = γ × x̂_i + β   (2-6)
where (2-3) is the mean formula, (2-4) the variance formula, (2-5) the normalization formula, and (2-6) the reconstruction transform; N is the mini-batch size (in each training pass the data set is divided into batches and further into smaller mini-batches for gradient descent); ε is a small constant for numerical stability; and γ and β are the learnable reconstruction parameters of the corresponding feature map (each feature map has exactly one pair of learnable parameters γ and β, which allow the network to recover the feature distribution that the original network would learn).
(2) Generation of the probability map M_p and the threshold map M_T: the feature image F is input; a 3 × 3 convolution layer first compresses the channel number (dimension) to 1/4 of the input, then a BN layer applies the BN operation and the ReLU activation function (2-7) to obtain the feature image F_2; F_2 is fed into the next 2 × 2 convolution layer, where a deconvolution operation produces the feature map F_3; the BN operation and ReLU activation are applied again, and the cycle repeats until the final feature image F_5 is obtained; finally, the Sigmoid function (2-8) outputs the probability map M_p and the threshold map M_T.
ReLU(x) = max(0, x)   (2-7)
Sigmoid(x) = 1 / (1 + e^(−x))   (2-8)
(3) Generation of the approximate binary map M_A: the differentiable binarization formula (2-9) combines the probability map M_p and the threshold map M_T to generate the approximate binary map M_A:

B̂_{i,j} = 1 / (1 + e^(−k × (P_{i,j} − T_{i,j})))   (2-9)

In formula (2-9), B̂ is the approximate binary feature map (Approximate Binary Map), k is an amplification factor set to 50, i and j are coordinate indices, P is the probability feature map (Probability Map), and T is the adaptive threshold map (Threshold Map) learned by the DB text detection network.
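Formula (2-9) is a one-liner in NumPy; a sketch:

```python
import numpy as np

def approximate_binary_map(P: np.ndarray, T: np.ndarray, k: float = 50.0) -> np.ndarray:
    """Formula (2-9): element-wise differentiable binarization of the
    probability map P against the learned threshold map T."""
    return 1.0 / (1.0 + np.exp(-k * (P - T)))
```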
2.5: calculating the DB text detection network regression optimization loss. Specifically: the input K_0 is fed into the DB text detection network, and forward propagation yields the probability map M_p, the threshold map M_T, and the approximate binary map M_A of step 2.4; a loss function computes the loss between the predicted and ground-truth text boxes, and the network parameters of the DB text detection network are adjusted backwards according to the loss value, iteratively optimizing the parameters and improving prediction accuracy.
The calculation method of the DB text detection network regression optimization total loss value L is as shown in the formula (2-10):
L = L_s + α × L_b + β × L_t   (2-10)
L_s is the loss (2-11) computed on the probability map M_p of the shrunk text instances; L_b is the loss (2-11) computed on the approximate binary map M_A of the shrunk text instances after binarization; L_t is the loss (2-12) computed on the binarized threshold map M_T; α = 5 and β = 10.
L_s = L_b = −Σ_{i∈S_l} ( y_i × log x_i + (1 − y_i) × log(1 − x_i) )   (2-11)
Both L_s and L_b adopt the binary cross-entropy loss function; in addition, a hard-example mining strategy is adopted, i.e., hard negative samples are re-trained during model training, alleviating the imbalance between positive and negative samples. In formula (2-11), S_l is the sampled set, with a positive-to-negative sampling ratio of 1:3; y_i is the ground-truth label and x_i the predicted result.
L_t = Σ_{i∈R_d} | y*_i − x*_i |   (2-12)

In formula (2-12), L_t is an L1 distance loss function; R_d is the set of pixel indices inside G_d; G_d is the set of text segmentation regions G dilated by the offset D of formula (2-13); y*_i is the label of the threshold map and x*_i is the predicted result of the threshold map.
D = A × (1 − r²) / L   (2-13)

In formula (2-13), D is the offset, A and L are respectively the area and perimeter of the original segmentation region set G, and r is the shrink ratio, fixed at 0.4.
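A NumPy sketch of the loss terms (2-10)-(2-12) follows, with hard negative mining at the stated 1:3 ratio; the exact sampling and masking details are simplified assumptions.

```python
import numpy as np

def bce_hard_mined(pred: np.ndarray, gt: np.ndarray, neg_ratio: int = 3) -> float:
    """Formula (2-11) over the sampled set S_l (positives : negatives = 1 : 3),
    keeping only the hardest negatives."""
    eps = 1e-6
    loss = -(gt * np.log(pred + eps) + (1 - gt) * np.log(1 - pred + eps))
    pos, neg = gt > 0.5, gt <= 0.5
    n_neg = min(int(pos.sum()) * neg_ratio, int(neg.sum()))
    hardest_neg = np.sort(loss[neg])[::-1][:n_neg]
    return float((loss[pos].sum() + hardest_neg.sum()) / (pos.sum() + n_neg + eps))

def threshold_l1(pred_T: np.ndarray, gt_T: np.ndarray, rd_mask: np.ndarray) -> float:
    """Formula (2-12): L1 distance restricted to the dilated region R_d."""
    return float(np.abs(pred_T[rd_mask] - gt_T[rd_mask]).mean())

def db_loss(Mp, Ma, Mt, gt_shrunk, gt_T, rd_mask, alpha=5.0, beta=10.0) -> float:
    """Formula (2-10): L = L_s + alpha * L_b + beta * L_t."""
    return (bce_hard_mined(Mp, gt_shrunk)
            + alpha * bce_hard_mined(Ma, gt_shrunk)
            + beta * threshold_l1(Mt, gt_T, rd_mask))
```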
2.6: fixing the model parameters of the DB text detection network. Specifically: the test set X_test divided in step 1.2 is used to test the accuracy of the DB text detection network model: X_test is input into the model and predictions are made through steps 1.3-2.5. The resulting approximate binary map M_A is compared with the ground-truth annotation file Label; an image is considered correctly predicted only if every instance is predicted correctly and no background region is predicted as an instance, and otherwise it is mispredicted. Let v_1 be the number of positives predicted as positive, v_2 the number of positives mispredicted as negative, and v_3 the number of negatives mispredicted as positive. The proportion of correctly predicted positives among all original positives in the data set, i.e., the model recall, is computed by formula (2-14); the proportion of true positives among all samples classified as positive, i.e., the precision, is computed by formula (2-15). To evaluate recall and precision jointly, an evaluation score, Score (2-16), is defined, where r is the recall and p the precision. Finally, the DB text detection network model with the highest Score is selected, and its parameters are fixed as the final DB text detection network model.
recall = v_1 / (v_1 + v_2)   (2-14)
precision = v_1 / (v_1 + v_3)   (2-15)
Score = 2 × p × r / (p + r)   (2-16)
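These reduce to a few lines; the sketch below assumes (2-16) is the usual harmonic mean (F-measure) of precision and recall, as the surrounding definitions suggest.

```python
def recall(v1: int, v2: int) -> float:
    return v1 / (v1 + v2)            # formula (2-14)

def precision(v1: int, v3: int) -> float:
    return v1 / (v1 + v3)            # formula (2-15)

def score(r: float, p: float) -> float:
    return 2 * p * r / (p + r)       # formula (2-16), assumed F-measure

print(score(recall(90, 10), precision(90, 5)))  # toy counts
```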
The step (3) specifically comprises:
3.1: input image preprocessing. Specifically: before entering the CRNN text recognition network, the images of the text box data set X_DB predicted by the DB text detection network undergo a scale transformation to obtain the preprocessed data set X_CRNN. The procedure is: each image is first scaled proportionally so that its height is 32; widths below 320 are padded with 0, and samples with an aspect ratio greater than 10 are discarded directly. This produces the input size required by the CNN module of the CRNN text recognition network, 320 × 32 × 3 (width pixels × height pixels × RGB three channels), and the result is input as the certificate image data set X_CRNN into the visual feature extraction module CNN of step 3.2.
The BiLSTM module of step 3.3 requires the input sequence to have height 1, and the CNN module of step 3.2 downsamples the input image by a factor of 32, so the input image height in step 3.2 must be 32. Meanwhile, to keep the aspect ratio of the CRNN input fixed, the training process uses 320, a multiple of 32, as the width value.
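A sketch of the step 3.1 sizing rule (proportional scaling to height 32, zero-padding the width to 320, discarding aspect ratios above 10):

```python
import cv2
import numpy as np

def crnn_input(crop: np.ndarray, height: int = 32, width: int = 320,
               max_ratio: float = 10.0):
    h, w = crop.shape[:2]
    if w / h > max_ratio:
        return None                    # aspect ratio > 10: discard the sample
    new_w = min(width, max(1, round(w * height / h)))
    resized = cv2.resize(crop, (new_w, height))
    padded = np.zeros((height, width, 3), dtype=resized.dtype)
    padded[:, :new_w] = resized        # widths under 320 are completed with 0
    return padded
```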
3.2: constructing the visual feature extraction module CNN. Specifically: the certificate image data set X_CRNN processed in step 3.1 is input, and each image X_{iCRNN} (i = 1, 2, ..., n×t; n is the number of images in the annotated data set X, and each image yields t text prediction boxes from the DB network, hence n×t) is taken in turn as the module input feature image M_0. Because the special operation certificate images described in step 1.1 are regular, of a single type, and easy to recognize, the CRNN text recognition network adopts a MobileNetV3-small network as the model of its visual feature extraction module CNN, which shrinks the CRNN model and speeds up detection while keeping feature-extraction accuracy high. The network extracts the convolutional features of M_0 to obtain the output feature image M_5, which is input into the BiLSTM module of step 3.3 for text expression and text classification. Since the DB text detection network turns the inputs into small box images much smaller than the original input, the MobileNetV3-small model better balances speed and detection precision under low-resource conditions. The network structure of MobileNetV3-small is shown in Table 4.
Table 4 Network structure table of the feature extraction network MobileNetV3-small

| Network layer | Module name | Input feature image | Output feature image |
|---|---|---|---|
| 1 | Conv module | M_0 (320×32×3) | M_1 (160×16×16) |
| 2 | Bneck_Mix5 module | M_1 (160×16×16) | M_2 (160×4×24) |
| 3 | Bneck_Mix6 module | M_2 (160×4×24) | M_3 (160×1×96) |
| 4 | Conv module | M_3 (160×1×96) | M_4 (160×1×576) |
| 5 | Pool module | M_4 (160×1×576) | M_5 (80×1×576) |
The MobileNetV3-small network consists of Conv, Bneck_Mix5, Bneck_Mix6, and Pool modules. (1) The image M_0 from the certificate image data set X_CRNN processed in step 3.1 is input; the Conv module convolves M_0 to obtain the feature map M_1. (2) The Bneck_Mix5 module consists of three Bneck modules (3 × 3 depthwise convolution kernels) using the ReLU6 activation function, and the Bneck_Mix6 module consists of eight Bneck modules (5 × 5 depthwise convolution kernels) using the H-swish activation function; they apply depthwise separable convolutions to the feature images M_1 and M_2, obtaining the feature images M_2 and M_3 respectively. The Bneck module structure is the same as described in step 2.2. (3) The feature map M_3 produced by the depthwise separable convolutions is convolved again to obtain the feature map M_4, which is input into the Pool module; M_4 is average-pooled, i.e., the feature image is divided into 80 rectangular regions and the feature points of each region are averaged, shrinking the image to obtain M_5.
3.3: Construct the sequence feature extraction module BiLSTM, specifically: the feature image M_5 processed in step 3.2 is input. Step 3.3 adopts a variant of the recurrent neural network (Recurrent Neural Networks, RNN), the bidirectional long short-term memory network (BiLSTM), as the sequence feature extraction module: M_5 is first converted into the feature vector sequence S_1, and text sequence features are then extracted continuously to obtain the hidden-vector probability distribution output S_2. The network structure of BiLSTM is shown in Table 5.
Table 5. Network structure of the sequence feature extraction module BiLSTM

Layer | Module  | Input feature map | Output feature map
1     | Reshape | M_5 (80×1×576)    | S_1 (80×576)
2     | BiLSTM  | S_1 (80×576)      | S_2 (80×m)
The network consists of Reshape and BiLSTM modules. Because an RNN only accepts a sequence of feature vectors as input, the Reshape module turns the feature map M_5 extracted by the CNN module of step 3.2 into the feature vector sequence S_1 (80×576) column by column (left to right). S_1 consists of 80 feature vectors, each containing 576-dimensional features; that is, the i-th feature vector is the concatenation of the i-th column pixels of all 576 feature maps, and each column of the feature map corresponds to a receptive field of the original image, thereby forming a sequence of feature vectors. This step is called Map-to-Sequence. The BiLSTM module predicts on the feature sequence S_1, learning each feature vector in the sequence to obtain the hidden-vector probability distribution output S_2 over all characters, where m in Table 5 is the length of the character set each column vector needs to identify.
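For illustration, a minimal PyTorch sketch of the Map-to-Sequence step and the BiLSTM head follows; the hidden size 256 and character-set size m=100 are assumptions, and the linear projection to m classes (formally part of the step 3.4 FC layer) is folded into this head for compactness.

import torch
import torch.nn as nn

class SequenceHead(nn.Module):
    def __init__(self, feat_dim=576, hidden=256, m=100):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, m)     # per-column character distribution

    def forward(self, feat):                   # feat: (N, 576, 1, 80) from the CNN
        s1 = feat.squeeze(2).permute(0, 2, 1)  # Map-to-Sequence -> (N, 80, 576)
        s2, _ = self.rnn(s1)                   # (N, 80, 512) bidirectional features
        return self.fc(s2)                     # (N, 80, m) hidden-vector distributions

logits = SequenceHead()(torch.randn(2, 576, 1, 80))
print(logits.shape)                            # torch.Size([2, 80, 100]) with m = 100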
3.4: Construct the prediction module CTC, specifically: the hidden-vector probability distribution output S_2 of each feature vector from step 3.3 is input. The CTC module serves as the prediction module of the CRNN text recognition network and converts the input into the result character sequence l through a de-duplication and integration operation. The network structure of the prediction module CTC is shown in Table 6.
Table 6. Network structure of the prediction module CTC

Layer | Module     | Input feature map | Output
1     | FC+Softmax | S_2 (80×m)        | l
The CTC module consists of FC (Fully Connected Layers), a Softmax operation, and the sequence merging mechanism Blank. The hidden-vector probability distribution output S_2 from step 3.3 is fed to the FC layer, which maps S_2 into T character probability distributions; the sequence merging mechanism then adds a blank symbol "blank" to the annotated character set p to form the new character set p′, so that the length of the character probability distribution matches the fixed length required by the Softmax operation. The Softmax operation (3-1) selects the label (character) with the maximum value to obtain the character distribution output, and finally the sequence transformation function β (3-2) eliminates the blank symbols and predicted repeated characters, decoding the result character sequence l.
S_ij = e^{v_ij} / ∑_{k=1}^{m′} e^{v_kj} (3-1)

In formula (3-1), v_ij denotes the i-th element of the j-th column vector of the character probability distribution matrix v, and S_ij is the ratio of the exponential of that element to the sum of the exponentials of all elements in the column vector. In formula (3-2), p′ is the character set obtained by adding the blank symbol to the annotated character set p, and T is the length of the hidden-vector probability distribution output S_2 after mapping by the FC layer; the β transform maps length-T sequences over p′ to the result character sequence l, of length no greater than T, over p.
3.5: Compute the CRNN text recognition network regression optimization loss CTC Loss, specifically: the images X_iCRNN (i=1,2,...,n×t) of the certificate image data set X_CRNN processed in step 3.1 are input to the CRNN text recognition network; the loss value between the predicted result l and the true value is computed by forward propagation through the loss function, and the posterior probability p(l/y) (3-4) of the label l output by the CTC module of step 3.4 is adjusted in reverse according to the loss value. The CRNN text recognition network regression optimization loss CTC Loss is computed as follows:
L(S) = -∑_{(I,l)∈S} ln p(l/y) (3-3).
In formula (3-3), p(l/y) is defined by formula (3-4), S = {I, l} is the training set, I is an image input in the training set, and l is the output real character sequence.
CTC formula (3-4) acts on the probability distribution matrix obtained from the sequence S_1 that entered the BiLSTM module after the Map-to-Sequence operation of step 3.3; here this matrix is taken as y. It accounts for all possible output distributions and outputs the most likely result label sequence l, the aim being to maximize the posterior probability p(l/y) of l.
p(l/y) = ∑_{π: β(π)=l} p(π/y) (3-4).
In formula (3-4), y is the input probability distribution matrix, y = y^1, y^2, ..., y^T, where T is the sequence length; π: β(π)=l denotes all paths π that yield the final result label sequence l after the β transform (3-2), and p(π/y) is defined by formula (3-5).
p(π/y) = ∏_{t=1}^{T} y^t_{π_t} (3-5).

In formula (3-5), y^t_{π_t} denotes the probability of the label π_t at time stamp t; the subscript t indexes each time step of the path π.
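For illustration, the CTC Loss of formulas (3-3)-(3-5) can be exercised with torch.nn.CTCLoss, which internally sums p(π/y) over all paths with β(π)=l. Shapes follow the earlier sketches (T=80 time steps, m+1 classes with blank index 0); the random tensors are placeholders.

import torch
import torch.nn as nn

T, N, C = 80, 2, 101                 # time steps, batch, m+1 classes (blank = 0)
logits = torch.randn(T, N, C, requires_grad=True)
log_probs = logits.log_softmax(2)    # per-step label distributions y
targets = torch.randint(1, C, (N, 12))             # true label sequences l
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 12, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)            # -ln p(l/y), summed over beta-equivalent paths
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                      # reverse adjustment described in step 3.5
print(loss.item())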
3.6: Fix the model parameters of the CRNN text recognition network, specifically: the test set X_test divided in step 1.2 is used to test the character recognition accuracy of the CRNN text recognition network. After the preprocessing of step 1.3, X_test is input to the parameter-fixed DB network model to obtain the predicted text small-box data set X_DB^test. Test recognition is performed according to steps 3.1-3.5, the obtained result label sequence l is compared with the actual label file Label, and a sample counts as correctly recognized only if the entire line of text is recognized correctly; otherwise it counts as an error.

Let l_true be the number of texts the model recognizes correctly and l_false the number recognized incorrectly; the model character recognition accuracy L_accuracy is computed by formula (3-6):

L_accuracy = l_true / (l_true + l_false) (3-6).

Finally, the CRNN training model with the highest L_accuracy is selected as the final fixed CRNN text recognition network model, and its parameters are fixed accordingly.
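For illustration, a minimal sketch of the whole-line accuracy of formula (3-6) follows; the example strings are placeholders.

def line_accuracy(preds, labels):
    """A prediction counts as correct only if the entire line matches its label."""
    l_true = sum(p == g for p, g in zip(preds, labels))
    l_false = len(labels) - l_true
    return l_true / (l_true + l_false)     # L_accuracy, formula (3-6)

print(line_accuracy(["电工作业", "高处作业"], ["电工作业", "高处安装"]))  # 0.5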
Step (4), specifically comprising:
4.1: Text detection and recognition on constructors' special operation certificates, specifically: the parameter-fixed DB training model from step 2.6 is loaded and converted into a DB text detection network model. Each certificate image X_id (i=1,2,...,n) in the to-be-detected constructor special operation certificate image set X_d is input; the fixed-weight DB text detection network yields the t predicted text-block images of the certificate together with the 4 coordinate tuples of each predicted text box, namely the upper-left, lower-left, upper-right, and lower-right corner coordinates. The coordinate-scaled predicted text small-block image set X_DB^d obtained from the DB text detection network is then input to the parameter-fixed CRNN text recognition network of step 3.6, which outputs the related text recognition information and its character recognition accuracy.
4.2: Judgment logic for detecting constructors' special operation certificates, specifically: based on the text recognition information obtained in step 4.1, whether the certificate is legal and valid is judged by the following logic, finally yielding the certificate detection result.
(1) If the four characters of "valid period" (有效期限) are recognized in the certificate image through text prediction and recognition, it is judged that the valid-period information is detected and the next judgment step is entered; if the keywords are not recognized, certificate detection fails and "certificate photographing unqualified" is prompted.
(2) With "valid period" recognized, the predicted text box is selected and the year/month/day digits from the start date to the end date of validity following the valid period (e.g., 20100601 to 20200601) are extracted; logical processing extracts the last eight digits (e.g., 20210601). If the eight digits cannot be extracted normally, detection fails and "the certificate valid period cannot be recognized normally" is prompted.
(3) If the valid period and the last eight digits of the corresponding text box are recognized successfully, whether the four characters of "job category" (作业类别) are recognized is judged from the recognized text of the certificate image; if "job category" is recognized, the next judgment step follows; otherwise detection fails and "the certificate job category cannot be recognized normally" is prompted.
(4) With "job category" recognized, the predicted text box is selected and the specific category text after the job category (electrician or high-place operation) is extracted; if the category is "electrician", go to step (5); if it is "high-place operation", go to step (6).
(5) The last eight digits of the valid-period text box recognized from the certificate image are compared with the current Beijing time. If the expiry date is earlier than the current Beijing time, the certificate is judged to have passed its valid period and is unqualified; detection failure is prompted and the case is turned over to manual inspection. If the expiry date is later than the current Beijing time, the certificate is within its valid period and passes detection, prompting "the electrician job category of the special operation certificate is detected successfully" and that the certificate is qualified.
(6) The last eight digits of the valid-period text box recognized from the certificate image are compared with the current Beijing time. If the expiry date is earlier than the current Beijing time, the certificate is judged to have passed its valid period and is unqualified; detection failure is prompted and the case is turned over to manual inspection. If the expiry date is later than the current Beijing time, the certificate is within its valid period and passes detection, prompting "the high-place-operation job category of the special operation certificate is detected successfully" and that the certificate is qualified.
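For illustration, a hedged Python sketch of this judgment logic follows; the regular-expression extraction, the UTC+8 clock, and the prompt strings are assumptions of this example rather than the patent's implementation.

import re
from datetime import datetime, timezone, timedelta

def check_certificate(texts: list[str]) -> str:
    joined = " ".join(texts)
    if "有效期限" not in joined:                      # (1) "valid period" keywords
        return "detection failed: certificate photographing unqualified"
    digits = re.findall(r"\d{8}", joined)            # (2) eight-digit dates
    if not digits:
        return "detection failed: valid period not recognized"
    if "作业类别" not in joined:                      # (3) "job category" keywords
        return "detection failed: job category not recognized"
    category = "电工" if "电工" in joined else "高处作业"  # (4) electrician / high-place
    expiry = datetime.strptime(digits[-1], "%Y%m%d")      # last eight digits = expiry
    now = datetime.now(timezone(timedelta(hours=8))).replace(tzinfo=None)  # Beijing time
    if expiry < now:                                 # (5)/(6) expiry comparison
        return "detection failed: certificate expired, manual inspection required"
    return f"detection succeeded: {category} certificate qualified"

print(check_certificate(["作业类别 电工", "有效期限 20210601 至 20310601"]))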
The semi-automatic labeling tool makes labeling the image data set efficient and accurate; tailored to the characteristics of special operation certificate images, the proposed network combination model is small, easy to deploy, and fast in detection; and the customized special operation certificate judgment logic raises the degree of automation of the method, effectively improving the detection efficiency of special operation certificates and reducing labor cost.
Example IV
Special operation certificate detection based on DB and CRNN proceeds specifically as follows.
1: Data preprocessing, specifically: according to step 1.1, the certificate image data set obtained from China Mobile Yunnan Company is manually screened and used as the original certificate image data set L, and L is labeled by the semi-automatic labeling method, as shown in Table 7.
Table 7. Labeling example of the special operation certificate image data set (table contents rendered as images in the source).
According to step 1.2, the labeled data set X is divided into a training set X_train and a test set X_test at a ratio of 8:2. According to step 1.3, image decoding, image normalization, rearrangement, and image scaling are performed in sequence on the labeled data set X to obtain the preprocessed data set X′.
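For illustration, a minimal sketch of the 8:2 split follows, assuming X is a list of (image_path, label) pairs; the fixed seed and function name are assumptions.

import random

def split_dataset(X, train_ratio=0.8, seed=0):
    samples = list(X)
    random.Random(seed).shuffle(samples)   # reproducible shuffle before splitting
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]    # X_train, X_test

X_train, X_test = split_dataset([("img_%d.jpg" % i, "label") for i in range(100)])
print(len(X_train), len(X_test))           # 80 20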
2: Construct and train the DB text detection network; the overall structure of the DB text detection network is shown in Fig. 4. According to steps 2.1-2.4, the images in the labeled data set X′ are first scale-transformed to (640×640×3); the Backbone, Neck, and Head modules of the DB text detection network are then constructed in sequence. The input and output feature image sizes of each network layer are shown in Table 8.
Table 8. Input/output data flow of each layer in the DB text detection network (table contents rendered as images in the source).
In Table 8, a certificate image input as (640×640×3) is predicted by the DB text detection network, finally outputting the prediction result probability map M_P (640×640×1), threshold map M_T (640×640×1), and approximate binary map M_A (640×640×1). After the training files are prepared and the training parameters are set, the DB text detection network is trained according to step 2.5.
(1) Prepare the training set train_images folder, the test set test_images folder, the matching annotation files train_label.txt and test_label.txt, and the training file train.py for training the DB text detection network. (2) Set parameters such as epoch, batch size, and learning rate in train.py, modify the configuration file, add the corresponding pre-training weight file and training data set, and run train.py. Once the training files are prepared and the training parameters are set, training of the DB text detection network can begin.
First, X_train is loaded into the prepared training file train.py, and L, L_s, L_b, and L_t are computed by forward propagation according to formulas (2-10)-(2-12); the network training parameters are then optimized continuously until the loss function value of the DB text detection network converges. Finally, according to step 2.6, X_test is input to the DB text detection network to obtain the corresponding approximate binary map M_A and coordinate position information, which are compared with the corresponding image coordinate information in the test set X_test annotation file; the model prediction recall (Recall), precision (Precision), and evaluation score (Score) are calculated, and the model with the best Score is selected as the final parameter-fixed DB text detection network model.
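For illustration, a hedged sketch of the step 2.6 evaluation follows; it assumes boxes match at IoU >= 0.5 and that Score is the harmonic mean of Precision and Recall, whereas the patent's exact definitions are given by its earlier formulas.

def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a; bx1, by1, bx2, by2 = b
    ix = max(0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def detection_score(pred_boxes, gt_boxes, thr=0.5):
    matched = sum(any(iou(p, g) >= thr for g in gt_boxes) for p in pred_boxes)
    precision = matched / len(pred_boxes) if pred_boxes else 0.0
    recall = matched / len(gt_boxes) if gt_boxes else 0.0
    score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, score

print(detection_score([(0, 0, 10, 10)], [(1, 1, 10, 10), (20, 20, 30, 30)]))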
3: Construct and train the CRNN text recognition network; the overall structure of the CRNN text recognition network is shown in Fig. 5. According to steps 3.1-3.4, the images in the text small-box data set X_DB predicted by the step 2 DB text detection network are first scale-transformed to (320×32×3); the CNN, BiLSTM, and CTC modules of the CRNN text recognition network are then constructed in sequence. The input and output feature image sizes of each network layer are shown in Table 9.
Table 9. Input/output data flow of each network layer in the CRNN text recognition network

Layer | Module      | Input feature map | Output feature map
1     | Conv        | M_0 (320×32×3)    | M_1 (160×16×16)
2     | Bneck_Mix5  | M_1 (160×16×16)   | M_2 (160×4×24)
3     | Bneck_Mix6  | M_2 (160×4×24)    | M_3 (160×1×96)
4     | Conv        | M_3 (160×1×96)    | M_4 (160×1×576)
5     | Pool        | M_4 (160×1×576)   | M_5 (80×1×576)
6     | Reshape     | M_5 (80×1×576)    | S_1 (80×576)
7     | BiLSTM      | S_1 (80×576)      | S_2 (80×m)
8     | FC+Softmax  | S_2 (80×m)        | l
In Table 9, a certificate image input as (320×32×3) is predicted by the CRNN text recognition network, finally outputting the prediction result sequence l. Prepare the training files, set the training parameters, and train the model according to step 3.5. (1) Prepare the training set train_images folder, the test set test_images folder, the two txt files rec_train.txt and rec_test.txt recording image text content labels, the training file train.py, and the dictionary word_subject.txt for training the CRNN text recognition network. The dictionary is stored in utf-8 encoding and maps the characters appearing in the annotation data set X to dictionary indices. (2) Set parameters such as epoch, batch size, and learning rate in train.py, modify the configuration file, add the corresponding pre-training weight file and training data set, and run train.py.
Once the training files are prepared and the training parameters are set, training of the CRNN text recognition network can begin. First, X_train is loaded into the prepared training file train.py, and L(S) is computed by forward propagation according to formulas (3-3)-(3-5); the network training parameters are then optimized continuously until the loss function value of the CRNN text recognition network converges. Finally, according to step 3.6, X_test is input to the CRNN text recognition network to obtain the predicted result sequence for each image's text, which is compared with the corresponding annotation text in the test set X_test annotation file; the model character recognition accuracy L_accuracy is calculated, and the model with the highest L_accuracy is selected as the final parameter-fixed CRNN text recognition network model.
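For illustration, a minimal sketch of the dictionary lookup follows; it assumes word_subject.txt holds one character per line in utf-8, which matches the description above but is not confirmed by the source.

def load_char_dict(path="word_subject.txt"):
    """Read the utf-8 dictionary file and map each character to its line index."""
    with open(path, encoding="utf-8") as f:
        chars = [line.rstrip("\n") for line in f]
    return {c: i for i, c in enumerate(chars)}

def encode_label(text, char2idx):
    return [char2idx[c] for c in text]   # annotation character -> dictionary index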
4: Detect the constructors' special operation certificates, specifically: according to step 4.1, the parameter-fixed DB text detection network model of step 2.6 and the parameter-fixed CRNN text recognition network model of step 3.6 are loaded. The certificate image data set X_d to be detected is first input; from its fixed parameters, the DB model predicts the approximate binary map M_A (640×640×1) of each text target box to obtain the corresponding text boxes X_DB^d and their coordinate position information. The predicted text-block images are then input to the CRNN text recognition network model, which outputs the text information of the certificate image.
A certificate image X_kd is selected at random from the to-be-detected certificate image data set X_d as a model output example; the information for image X_kd after prediction and recognition by the fixed DB and CRNN models is shown in Table 10.
Table 10. Information of the example image X_kd after prediction and recognition by the fixed DB and CRNN models (table contents rendered as images in the source).
According to step 4.2 and the customized detection judgment logic for constructors' special operation certificates, the obtained text information X_kd-t is extracted, and whether the constructor's special operation certificate is qualified and valid is judged based on the extracted text information.
4. Advantages and positive effects of the invention compared with the prior art
(1) For the regular, orderly, easy-to-label characteristics of the special operation certificate image data sets provided by 5G base station construction sites of China Mobile Yunnan Company, the invention provides an efficient semi-automatic certificate image data set labeling method: the PPOCRLabel tool automatically labels the text boxes in the certificate images and the characters in the corresponding text boxes, and manual screening then applies a second round of manual correction to the text boxes and texts that were not predicted successfully or were labeled incorrectly, improving labeling efficiency while ensuring high accuracy of the labeled data set.
(2) For the regular image data, single type, and easily completed text detection and recognition of special operation certificates, the invention adopts the MobileNetV3 network as the backbone of both the DB text detection network model and the CRNN text recognition network model for extracting image features. Certificates can be detected accurately while the number of feature channels of the two networks is reduced, shrinking the corresponding model sizes by 90% and thus adapting to situations with limited computing power; at the same time, the detection speed on certificate images is improved, raising the efficiency of the certificate checking method.
(3) Combining the given constructor special operation certificate image data set with the prediction and recognition results of the combined model, the invention customizes the judgment logic of special operation certificate detection, realizing an unattended 24-hour automatic computer detection procedure for special operation certificates and raising the degree of automation of the method.
The principles and embodiments of the present invention have been described herein with reference to specific examples; the description is intended only to assist in understanding the method of the present invention and its core ideas. Those of ordinary skill in the art may make modifications in light of these teachings without departing from the scope of the invention. In view of the foregoing, this description should not be construed as limiting the invention.

Claims (10)

1. A special operation certificate detection method based on DB and CRNN is characterized by comprising the following steps:
acquiring a special operation certificate image data set; the special operation certificate image data set comprises a plurality of target special operation certificate images, and each target special operation certificate image has text information;
inputting each target special operation certificate image into a DB text detection network model to determine a text box data set corresponding to each target special operation certificate image; the elements in the text box data set represent the position information of the target text box;
inputting each target special operation certificate image and a text box data set corresponding to each target special operation certificate image into a CRNN text recognition network model to determine text information in each target text box in each target special operation certificate image; the text information comprises at least one of constructor name, constructor sex, certificate number, operation category and certificate validity date;
the DB text detection network model is obtained by training based on a DB text detection network and a first training data set; the Backbone module in the DB text detection network adopts a MobileNetV3-large structure; each element in the first training data set comprises a historical special operation certificate image and a first class label corresponding to the historical special operation certificate image; the first class label is the position information of the historical text box;
the CRNN text recognition network model is obtained by training based on a CRNN text recognition network and a second training data set; the partial structure of the CNN module in the CRNN text recognition network adopts a MobileNetV3-small structure; each element in the second training data set comprises a historical special operation certificate image and a second class label corresponding to the historical special operation certificate image; the second class label is historical text information.
2. The special job certificate detection method based on DB and CRNN as set forth in claim 1, further comprising: determining, based on the text information, whether each special operation certificate meets the construction operation requirement.
3. The special operation certificate detection method based on DB and CRNN according to claim 1, wherein the step of inputting each target special operation certificate image into a DB text detection network model to determine a text box data set corresponding to each target special operation certificate image specifically comprises the following steps:
preprocessing each target special operation certificate image, the preprocessing comprising: decoding, normalization, rearrangement, and image scaling;
inputting each preprocessed target special operation certificate image into the DB text detection network model to determine the text box data set corresponding to each target special operation certificate image.
4. The special operation certificate detection method based on DB and CRNN according to claim 3, wherein the inputting each target special operation certificate image and the text box data set corresponding to each target special operation certificate image into the CRNN text recognition network model to determine the text information in each target text box in each target special operation certificate image specifically comprises:
inputting each preprocessed target special operation certificate image and the preprocessed text box data set corresponding to each preprocessed target special operation certificate image into the CRNN text recognition network model to determine the text information in each target text box in each target special operation certificate image.
5. The special job certificate detection method based on DB and CRNN as claimed in claim 1, wherein the determination process of the DB text detection network model is as follows:
Constructing a DB text detection network;
determining a first training data set;
training the DB text detection network based on the first training data set to obtain the DB text detection network model.
6. The method for detecting a special job certificate based on DB and CRNN as set forth in claim 5, wherein the determining the first training data set specifically includes:
acquiring an original certificate image data set; the original certificate image data set comprises a plurality of original historical special operation certificate images;
labeling each original historical special operation certificate image by a semi-automatic labeling tool to obtain each historical labeled image and the first class label corresponding to each historical labeled image;
preprocessing each historical labeled image to obtain a historical special operation certificate image, the preprocessing comprising: decoding, normalization, rearrangement, and image scaling; the first class label corresponding to the historical special operation certificate image is the first class label corresponding to the historical labeled image.
7. The special job certificate detection method based on DB and CRNN as claimed in claim 1, wherein the determination process of the CRNN text recognition network model is as follows:
Constructing a CRNN text recognition network;
determining a second training data set;
training the CRNN text recognition network based on the second training data set to obtain the CRNN text recognition network model.
8. The method for detecting special job certificates based on DB and CRNN as set forth in claim 7, wherein the determining the second training data set specifically includes:
acquiring an original certificate image data set; the original certificate image data set comprises a plurality of original historical special operation certificate images;
labeling each original historical special operation certificate image by a semi-automatic labeling tool to obtain each historical labeled image and the second class label corresponding to each historical labeled image;
preprocessing each historical labeled image to obtain a historical special operation certificate image, the preprocessing comprising: decoding, normalization, rearrangement, and image scaling; the second class label corresponding to the historical special operation certificate image is the second class label corresponding to the historical labeled image.
9. A special job certificate detection system based on DB and CRNN, comprising:
the data acquisition module is used for acquiring a special operation certificate image data set; the special operation certificate image data set comprises a plurality of target special operation certificate images, and each target special operation certificate image has text information;
The text box data set determining module is used for inputting each target special operation certificate image into the DB text detection network model so as to determine a text box data set corresponding to each target special operation certificate image; the elements in the text box data set represent the position information of the target text box;
the text information determining module is used for inputting each target special operation certificate image and a text box data set corresponding to each target special operation certificate image into the CRNN text recognition network model so as to determine text information in each target text box in each target special operation certificate image; the text information comprises at least one of constructor name, constructor sex, certificate number, operation category and certificate validity date;
the DB text detection network model is obtained by training based on a DB text detection network and a first training data set; the Backbone module in the DB text detection network adopts a MobileNetV3-large structure; each element in the first training data set comprises a historical special operation certificate image and a first class label corresponding to the historical special operation certificate image; the first class label is the position information of the historical text box;
the CRNN text recognition network model is obtained by training based on a CRNN text recognition network and a second training data set; the partial structure of the CNN module in the CRNN text recognition network adopts a MobileNetV3-small structure; each element in the second training data set comprises a historical special operation certificate image and a second class label corresponding to the historical special operation certificate image; the second class label is historical text information.
10. The special job certificate detection system based on DB and CRNN as set forth in claim 9, further comprising: a detection module configured to determine, based on the text information, whether each special operation certificate meets the construction operation requirement.