CN113591866A - Special job certificate detection method and system based on DB and CRNN


Info

Publication number: CN113591866A
Application number: CN202110865778.9A
Authority: CN (China)
Prior art keywords: text, data set, image, certificate image, CRNN
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN113591866B (en)
Inventors: 彭光灵, 岳昆, 刘伯涛, 李忠斌, 杨晰, 魏立力, 段亮
Current Assignee: Yunnan University YNU
Original Assignee: Yunnan University YNU

Events: application filed by Yunnan University YNU; priority to CN202110865778.9A; publication of CN113591866A; application granted; publication of CN113591866B.

Classifications

    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods for neural networks
    • Y04S 10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications


Abstract

The invention discloses a special job certificate detection method and system based on DB and CRNN. The method comprises the following steps: inputting each target special job certificate image into a DB text detection network model to determine the text box data set corresponding to each image; and inputting each target special job certificate image, together with its corresponding text box data set, into a CRNN text recognition network model to determine the text information in each target text box of each image. The Backbone module of the DB text detection network adopts a MobileNetV3-large structure; the CNN module of the CRNN text recognition network partially adopts a MobileNetV3-small structure. The invention reduces manual workload and improves certificate image detection efficiency.

Description

Special job certificate detection method and system based on DB and CRNN
Technical Field
The invention relates to the technical field of optical character recognition, in particular to a special job certificate detection method and system based on DB and CRNN.
Background
During 5G base station construction, confirming that construction workers hold qualified and valid special job certificates is an indispensable safety guarantee. At present, special job certificates are mostly checked manually: detection efficiency is low, and feedback on certificate checks cannot be obtained promptly and effectively.
Disclosure of Invention
The invention aims to provide a special job certificate detection method and system based on DB and CRNN, so as to reduce manual workload and improve certificate image detection efficiency.
In order to achieve the purpose, the invention provides the following scheme:
a special job certificate detection method based on DB and CRNN comprises the following steps:
acquiring a special job certificate image data set, where the data set comprises a plurality of target special job certificate images and each image contains text information; inputting each target special job certificate image into a DB text detection network model to determine the text box data set corresponding to each image, where the elements of the text box data set represent the position information of target text boxes; and inputting each target special job certificate image, together with its corresponding text box data set, into a CRNN text recognition network model to determine the text information in each target text box of each image, where the text information comprises at least one of construction worker name, gender, certificate number, job category and certificate validity date;
the DB text detection network model is trained from a DB text detection network and a first training data set; the Backbone module of the DB text detection network adopts a MobileNetV3-large structure; each element of the first training data set comprises a historical special job certificate image and its corresponding first-class label, the first-class label being the position information of a historical text box. The CRNN text recognition network model is trained from a CRNN text recognition network and a second training data set; the CNN module of the CRNN text recognition network partially adopts a MobileNetV3-small structure; each element of the second training data set comprises a historical special job certificate image and its corresponding second-class label, the second-class label being historical text information.
A special job certificate detection system based on DB and CRNN comprises:
the data acquisition module, used to acquire a special job certificate image data set, where the data set comprises a plurality of target special job certificate images and each image contains text information; the text box data set determining module, used to input each target special job certificate image into a DB text detection network model to determine the text box data set corresponding to each image, where the elements of the text box data set represent the position information of target text boxes; and the text information determining module, used to input each target special job certificate image, together with its corresponding text box data set, into a CRNN text recognition network model to determine the text information in each target text box of each image, where the text information comprises at least one of construction worker name, gender, certificate number, job category and certificate validity date;
the DB text detection network model is trained from a DB text detection network and a first training data set; the Backbone module of the DB text detection network adopts a MobileNetV3-large structure; each element of the first training data set comprises a historical special job certificate image and its corresponding first-class label, the first-class label being the position information of a historical text box. The CRNN text recognition network model is trained from a CRNN text recognition network and a second training data set; the CNN module of the CRNN text recognition network partially adopts a MobileNetV3-small structure; each element of the second training data set comprises a historical special job certificate image and its corresponding second-class label, the second-class label being historical text information.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention adopts the DB text detection network model and the CRNN text recognition network model, and can quickly and accurately complete the detection of the special job certificate image. The DB text detection network model can be well adapted to a lightweight network as a feature extraction module, quickly predicts a corresponding text in a special work certificate and marks a text region by adopting a frame under the condition that extra memory and time are not consumed after the model is lightened, and extracts the text region from an image to obtain frame information of a text target. The CRNN text recognition network model performs text recognition on the predicted text block image, aiming at the condition that the image data of the special job certificate is all short text, the CRNN text recognition network model can introduce a BilSTM and CTC mechanism, strengthens global prediction on a text characteristic sequence and directly learns in the short text (line-level labeling), and does not need to be used for learning and training extra detailed character-level labeling, thereby improving the accuracy and efficiency of text recognition.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a special job certificate detection method based on DB and CRNN according to the present invention;
FIG. 2 is a schematic structural diagram of a DB and CRNN-based special job certificate detection system according to the present invention;
FIG. 3 is an overall flowchart of the method for detecting the special job certificate based on DB and CRNN according to the present invention;
FIG. 4 is a schematic diagram of the overall structure of a DB text detection network according to the present invention;
FIG. 5 is a schematic diagram of the overall structure of the CRNN text recognition network of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
The invention uses the optical character recognition technology in the deep learning model to efficiently detect the special job certificate image. The existing optical character recognition method based on deep learning mainly adopts a two-stage mode: text detection and text recognition. The DB (differential binary) algorithm used by the invention is characterized in that Binarization operation is put into a network and optimized simultaneously from the pixel level, so that the threshold value of each pixel point can be self-adaptively predicted, and the DB algorithm is realized by an approximate method and finishes Binarization and microminiaturization when being used together with a segmentation network, thereby simplifying the post-processing process and accelerating the detection speed of a target. The CRNN (common Current Neural Network) algorithm used by the invention adopts a combination method of CNN, LSTM (Long Short Term Memory) and CTC (connected termination Temporal classification), introduces the CTC method to solve the problem that characters can not be aligned during training, and does not need to carry out serial decoding operation like Attention OCR, thereby enabling the Network structure to be more optimized.
Example one
This embodiment discloses a special job certificate detection method based on DB and CRNN, which predicts the text positions in a certificate image data set and recognizes the specific text content to support detection of special job certificates and judge their qualification. It belongs to the field of computer vision recognition, in particular to the field of optical character recognition. Referring to fig. 1, the method for detecting special job certificates based on DB and CRNN includes the following steps.
Step 101: acquiring a special job certificate image data set; the data set comprises a plurality of target special job certificate images, and each image contains text information.
Step 102: inputting each target special job certificate image into a DB text detection network model to determine the text box data set corresponding to each image; the elements of the text box data set represent the position information of target text boxes.
Step 103: inputting each target special job certificate image, together with its corresponding text box data set, into a CRNN text recognition network model to determine the text information in each target text box of each image; the text information comprises at least one of construction worker name, gender, certificate number, job category and certificate validity date.
Step 104: determining, based on the text information, whether each special job certificate meets the construction job requirements.
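For orientation, the following is a minimal sketch of how steps 101 to 104 chain together at inference time. The wrappers db_model and crnn_model and the validity-date rule are hypothetical illustrations, not the patent's actual interfaces; the custom judgment logic of step 104 is only described abstractly in the patent.

```python
from datetime import date

def detect_certificates(images, db_model, crnn_model):
    """Sketch of steps 101-104: detect boxes, recognize text, judge validity."""
    results = []
    for image in images:
        boxes = db_model.predict(image)               # step 102: text box positions
        texts = [crnn_model.predict(image, box)       # step 103: text per box
                 for box in boxes]
        # step 104: custom judgment logic; here only an illustrative
        # validity-date check on any recognized date field.
        qualified = any(_is_unexpired(t) for t in texts)
        results.append({"boxes": boxes, "texts": texts, "qualified": qualified})
    return results

def _is_unexpired(text):
    # Illustrative parse: accepts "YYYY-MM-DD" expiry strings only.
    try:
        return date.fromisoformat(text) >= date.today()
    except ValueError:
        return False
```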
Step 102 specifically comprises: preprocessing each target special job certificate image (the preprocessing process is the same as in the third embodiment and is not repeated here), and inputting each preprocessed image into the DB text detection network model to determine the text box data set corresponding to each image.
Step 103 specifically comprises: preprocessing each target special job certificate image (the preprocessing process is the same as in the third embodiment and is not repeated here), and inputting each preprocessed image, together with its corresponding text box data set, into the CRNN text recognition network model to determine the text information in each target text box of each image.
The DB text detection network model is trained from a DB text detection network and a first training data set; the Backbone module of the DB text detection network adopts a MobileNetV3-large structure; each element of the first training data set comprises a historical special job certificate image and its corresponding first-class label, the first-class label being the position information of a historical text box. The CRNN text recognition network model is trained from a CRNN text recognition network and a second training data set; the CNN module of the CRNN text recognition network partially adopts a MobileNetV3-small structure; each element of the second training data set comprises a historical special job certificate image and its corresponding second-class label, the second-class label being historical text information. For the training process of the DB text detection network model and the CRNN text recognition network model, please refer to the third embodiment; it is not described in detail here.
Example two
Referring to fig. 2, the special job certificate detection system provided in this embodiment includes:
the data acquisition module 201, used to acquire a special job certificate image data set; the data set comprises a plurality of target special job certificate images, and each image contains text information. The text box data set determining module 202 is used to input each target special job certificate image into a DB text detection network model to determine the text box data set corresponding to each image; the elements of the text box data set represent the position information of target text boxes. The text information determining module 203 is used to input each target special job certificate image, together with its corresponding text box data set, into a CRNN text recognition network model to determine the text information in each target text box of each image; the text information comprises at least one of construction worker name, gender, certificate number, job category and certificate validity date. The detection module 204 is used to determine, based on the text information, whether each special job certificate meets the construction job requirements.
The text box data set determining module 202 specifically: preprocesses each target special job certificate image (the preprocessing process is the same as in the third embodiment and is not repeated here), and inputs each preprocessed image into the DB text detection network model to determine the text box data set corresponding to each image.
The text information determining module 203 specifically: preprocesses each target special job certificate image (the preprocessing process is the same as in the third embodiment and is not repeated here), and inputs each preprocessed image, together with its corresponding text box data set, into the CRNN text recognition network model to determine the text information in each target text box of each image.
The details of the DB text detection network model and the CRNN text recognition network model are as shown in embodiment one. Please refer to the third embodiment for the training process of the DB text detection network model and the CRNN text recognition network model, which is not described herein in detail.
EXAMPLE III
In the 5G base station construction process, the certificate image data set composed of special job certificate images has the characteristics of large text aspect ratio, standard layout and large data volume. Although regression-based text detection methods can obtain accurate text targets for regular text, text boxes of preset shapes cannot describe texts of special shapes well (such as an excessively large aspect ratio or an arc); and although segmentation-based detection methods such as PSENet and LSAE can detect irregularly shaped text, they require complex post-processing to form pixel-level results into text lines, and their prediction cost is high, so they cannot meet the requirement of completing detection quickly over a large amount of data.
The invention adopts the DB text detection network model, which, by making the binarization operation differentiable, avoids the complex post-processing and serious time consumption of segmentation-based methods and improves detection speed, enabling the method to detect a large amount of certificate image data quickly. Meanwhile, in choosing the text recognition method, Attention OCR has high requirements on training samples, extra computation parameters in the transcription layer, and low detection speed; therefore, given that the special job certificate images consist entirely of short texts, the CRNN text recognition network model, which has higher detection speed and higher recognition precision on short text segments and needs no extra computation parameters, is selected to ensure the efficiency of text recognition and detection.
Therefore, based on the DB text detection network model and the CRNN text recognition network model, the method first screens out low-quality special job certificate images manually, then obtains a high-standard image data set and training data set through semi-automatic annotation; with semi-automatic annotation, labeling of the special job certificate images can be completed quickly while keeping the annotation results highly accurate. The DB text detection network is then trained on the data set to predict and box the text positions in the special job certificate images, and the CRNN text recognition network is trained on the data set to recognize the text information at the calibrated positions. Given that special job certificate images are regular, of a single type and easy to detect, the MobileNetV3 network is chosen as the feature extraction module of both the DB text detection network model and the CRNN text recognition network model, making the models lightweight and faster to run while retaining detection accuracy. In the special job certificate image detection stage, the text information of the certificate image is predicted with the DB text detection network model and the CRNN text recognition network model, and the certificate category and validity period in the image are further deduced through custom judgment logic, so that safety detection of special job certificates is completed quickly.
Referring to fig. 3, the method for detecting a document for a special job based on DB and CRNN according to the embodiment of the present invention includes 4 steps.
Step (1): generating the certificate image data set. Specifically: first, special job certificate images of construction workers are acquired from 5G base station construction sites; each image contains the worker's name, gender, certificate number, job category, certificate validity date and so on. Second, each image is labeled quickly with a semi-automatic annotation tool to obtain a labeled data set, which is then preprocessed. Finally, the preprocessed labeled data set (i.e., the generated certificate image data set) is divided into a training set and a test set.
Step (2): building and training the DB text detection network. Specifically: the Backbone, Neck and Head modules of the DB text detection network are built in turn. The Backbone module adopts a MobileNetV3-large structure and serves as a feature pyramid, extracting features of the input image to obtain feature images; the Neck module adopts an FPN (Feature Pyramid Networks) structure, further processing the obtained feature images; the Head module performs output processing on the processed feature images, predicting a probability map and a threshold map, and obtains an approximate binary map from them. After preparing the files and setting the parameters required for training, the DB text detection network is trained on the training set from step (1).
Step (3): building and training the CRNN text recognition network. Specifically: the CNN module, BiLSTM (Bi-directional Long Short-Term Memory) module and CTC network structure of the CRNN text recognition network are built in turn. The CNN module partially adopts a MobileNetV3-small structure and extracts features from the text images; the BiLSTM module uses the extracted feature images for feature vector fusion, further extracting context features of the character sequence to obtain the probability distribution of each column of features; the CTC network structure takes the hidden-vector probability distribution as input and predicts the text sequence. After preparing the files and setting the parameters required for training, the CRNN text recognition network is trained on the training set from step (1).
Step (4): detecting the construction workers' special job certificates. Specifically: the certificate image to be detected is predicted by the DB text detection network model to obtain target text boxes and their coordinate positions, the text in the target text boxes is then recognized by the CRNN text recognition network model, and finally detection of the special job certificate is realized through custom special-job-certificate judgment logic.
The step (1) specifically comprises the following steps:
1.1: Image annotation. Specifically: the special job certificate image data set sampled from 5G base station construction sites contains low-quality samples exhibiting character occlusion, overexposure, blur, unclear characters, a certificate occupying too small a proportion of the image, multiple certificates in one image that cannot be definitely classified and recognized, and so on. These low-quality samples are screened out manually, yielding the original certificate image data set L = {l_1, l_2, ..., l_n}.
The original certificate image data set L has the following characteristics: (1) compared with the text positions in other image data (such as shop signs, street signposts and clothing tags), special job certificate image data are regular, ordered and easy to annotate; (2) text information such as name, gender, certificate number, job category, start date and expiry date is stored at fixed positions in a special job certificate image; (3) the certificate characters are clearly visible and unobstructed, and the certificate occupies the required proportion of the image (such as 80%); (4) the certificate images are of a single type, so text detection and text recognition are easy to realize.
Given these characteristics, the embodiment of the invention compared manual annotation with semi-automatic annotation and found that, for certificate images that are regular, ordered and easy to annotate, the semi-automatic method improves annotation efficiency markedly. Therefore a semi-automatic annotation method is adopted for labeling the original certificate image data set L. The semi-automatic annotation process comprises steps 1.1.1 and 1.1.2 below.
1.1.1: Using the original certificate image data set L, the automatic stage of the semi-automatic annotation process is performed with PPOCRLabel (the PaddleOCR labeling tool). PPOCRLabel uses a built-in OCR model (comprising a text detection model and a text recognition model) to predict the text in each image of L, box the corresponding text, and further recognize the text inside each box, yielding the automatically annotated data set L' = {L'_1, L'_2, ..., L'_n} with L'_i = {L'_i1, L'_i2, ..., L'_it}. Each annotated image L'_i contains t text prediction boxes; because the certificate images are standardized image data, t is always a fixed value. Each box record L'_ij = {L'_ij1, L'_ij2, L'_ij3, L'_ij4, L'_ij-t} holds 5 data values: L'_ij1, L'_ij2, L'_ij3 and L'_ij4 are respectively the upper-left, lower-left, upper-right and lower-right corner coordinates of the text box L'_ij predicted in image L'_i, and L'_ij-t is the text content of that box.
1.1.2: The second stage of the semi-automatic annotation process, manual screening and confirmation, is performed on the data set L'. If a text box or its coordinate values are predicted incorrectly, the coordinates are corrected manually; if the text inside a box is recognized incorrectly, the text content is corrected manually. This yields the labeled data set X = {X_1, X_2, ..., X_n}. Each certificate image X_i in the labeled data set has t text prediction boxes, X_i = {X_i1, X_i2, ..., X_it}, and each text prediction box X_ij holds 5 data values, X_ij = {X_ij1, X_ij2, X_ij3, X_ij4, X_ij-t}: X_ij1, X_ij2, X_ij3 and X_ij4 are respectively the upper-left, lower-left, upper-right and lower-right corner coordinates of the text prediction box, and X_ij-t is its text content. After annotation, the labeled data set X and the corresponding annotation file Label are obtained and can be used to train the DB text detection network and the CRNN text recognition network: X_ij1 through X_ij4 are the labels for training the DB text detection network, and X_ij-t is the label for training the CRNN text recognition network.
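To make the annotation structure concrete, the sketch below shows one hypothetical record for a labeled image X_i and how the two kinds of labels are separated for training. The record layout is an illustration inferred from the description above, not PPOCRLabel's actual file format.

```python
# Hypothetical annotation record for one certificate image X_i: a list of
# t boxes, each holding four corner coordinates plus the box text, mirroring
# X_ij = {X_ij1, X_ij2, X_ij3, X_ij4, X_ij-t}. The layout is illustrative.
record = [
    {"corners": [[12, 8], [12, 40], [180, 8], [180, 40]], "text": "张三"},
    {"corners": [[12, 50], [12, 82], [180, 50], [180, 82]], "text": "男"},
]

det_labels = [box["corners"] for box in record]   # labels for the DB network
rec_labels = [box["text"] for box in record]      # labels for the CRNN network
print(det_labels, rec_labels)
```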
1.2: Dividing the data set. Specifically: the labeled data set X from step 1.1 is divided into a training set X_train and a test set X_test. X_train, used to train the DB text detection network and the CRNN text recognition network, accounts for 80%; X_test, used to test the trained DB text detection network and CRNN text recognition network, accounts for 20%.
1.3: the data preprocessing specifically comprises the following steps:
1.3.1: Decoding the labeled data set X. Specifically: the labeled data set X is input, and each original image X_i in X is converted in turn into a uint8 matrix; the image is then decoded from JPEG format into a three-dimensional matrix. The color format of the decoded image is BGR (Blue × Green × Red) and the matrix dimensions are arranged in HWC (Height × Width × Channel) order, yielding the pixel matrix data set X_m = {X_1m, X_2m, ..., X_nm}.
1.3.2: Normalizing the pixel matrix data set X_m. Specifically: X_m is input, and each pixel of each image X_im (i = 1, 2, ..., n) in X_m is mapped to the interval [0, 1]. In the mapping process the pixel value is first divided by 255, the linear transformation parameter (which converts pixel values from the interval [0, 255] to the interval [0, 1]); the mean of the corresponding channel is then subtracted; and the result is finally divided by the standard deviation of the corresponding channel, yielding the normalized result data set X'_m.
1.3.3: Rearranging the normalized result data set X'_m. Specifically: X'_m is input, and the pixels of each image X'_im in X'_m are rearranged, converting the image matrix dimensions from HWC (Height × Width × Channel) format to CHW (Channel × Height × Width) format and yielding the new certificate image data set X''_m.
1.3.4: Scaling the images of the certificate image data set X''_m. Specifically: X''_m is input; when the length or width of an image X''_im in X''_m exceeds the specified maximum size or falls below the specified minimum size, the image is rescaled. The scaling process shrinks any side exceeding the side-length limit to an integral multiple of 32 within the limit, and the blank area is filled with 0, yielding the preprocessed certificate image data set X'.
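A compact sketch of sub-steps 1.3.1 to 1.3.4, assuming OpenCV and NumPy. The per-channel mean and standard deviation constants are assumptions (the patent does not list its values); the common ImageNet statistics are used here as placeholders.

```python
import cv2
import numpy as np

# Assumed per-channel mean/std; the patent does not state its constants.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(path, max_side=960):
    img = cv2.imread(path)                      # 1.3.1: decode JPEG -> uint8 BGR, HWC
    h, w = img.shape[:2]
    if max(h, w) > max_side:                    # 1.3.4a: shrink oversized images
        scale = max_side / max(h, w)
        img = cv2.resize(img, (int(w * scale), int(h * scale)))
    x = img.astype(np.float32) / 255.0          # 1.3.2: [0, 255] -> [0, 1] ...
    x = (x - MEAN) / STD                        # ... then per-channel mean/std
    x = x.transpose(2, 0, 1)                    # 1.3.3: HWC -> CHW
    c, h, w = x.shape                           # 1.3.4b: pad each side up to a
    H, W = -(-h // 32) * 32, -(-w // 32) * 32   #         multiple of 32 with zeros
    out = np.zeros((c, H, W), dtype=np.float32)
    out[:, :h, :w] = x
    return out
```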
The step (2) specifically comprises the following steps:
2.1: Input image preprocessing. Specifically: before input to the DB text detection network, the certificate image data set X' obtained from the preprocessing of step 1.3 is scale-transformed. The specific process: the images in X' are adjusted to the input size required by the Backbone module of the DB text detection network, 640 × 640 × 3 (width pixels × height pixels × RGB (Red × Green × Blue) channels); after this scale adjustment, the processed data set X'_DB is obtained and fed to the feature extraction module of step 2.2. Without this scaling, the input size of an image would differ from the preset aspect ratio (640 × 640 × 3), pixel mismatches would keep arising after the upsampling operations in the FPN feature enhancement module of step 2.3, and the merging operations between images in step 2.3 could not be performed.
2.2: Constructing the feature extraction module Backbone. Specifically: the certificate image data set X'_DB obtained from step 2.1 is input. Given that special job certificate images, as described in step 1.1, are regular and of a single type, making text detection easy to realize, this step adopts the MobileNetV3-large network as the feature extraction Backbone module of the DB text detection network, reducing model size and increasing detection speed while keeping the model's image feature extraction highly accurate. The MobileNetV3-large network extracts feature information from each image X'_iDB (i = 1, 2, ..., n) of X'_DB, outputting four feature images K_2 to K_5. The network structure of MobileNetV3-large is shown in Table 1.
Table 1: network structure of the MobileNetV3-large feature extraction network
[Table 1 is reproduced as images in the original publication.]
The MobileNetV3-large network consists of Conv, Bneck_Mix1, Bneck_Mix2, Bneck_Mix3, Bneck_Mix4 and Pool modules. (1) The Conv module performs a convolution operation on the preprocessed feature image K_0 (the image preprocessed in steps 1.3 and 2.1) to obtain the feature image K_1, and activates with the H-swish approximate activation function (2-1) in place of the swish formula, reducing computation cost and increasing computation speed. (2) A Bneck module consists of a 1×1 convolution kernel, a 3×3 or 5×5 depthwise convolution kernel (a 3×3 depthwise kernel when the Bneck module performs 3×3 depthwise separable convolution, a 5×5 depthwise kernel when it performs 5×5 depthwise separable convolution), and a 1×1 pointwise convolution kernel. First the 1×1 convolution kernel raises the dimension of the feature map; the 3×3 or 5×5 depthwise convolution kernel then performs the convolution in the higher-dimensional space to extract features; and the 1×1 pointwise convolution kernel finally reduces the dimension of the feature map. Combined into a depthwise separable convolution, this reduces the parameter count and the multiply-add workload to about one ninth of ordinary convolution. At the same time a lightweight attention model, SE (Squeeze-and-Excitation), is introduced: the SE model automatically learns the importance of each feature channel, then boosts useful features and suppresses features useless for the current task according to the result, adjusting the weight of each channel. The Bneck_Mix1 module consists of three Bneck modules (3×3 depthwise convolution kernels) using the ReLU6 activation function (2-2), where "3×3 depthwise convolution kernels" means the depthwise kernels in the constituent Bneck modules have size 3×3; the later 5×5 notation and step 3.2 follow the same convention. The Bneck_Mix2 module consists of three Bneck modules (5×5 depthwise convolution kernels) using the ReLU6 activation function. The Bneck_Mix3 module consists of six Bneck modules (3×3 depthwise convolution kernels) using the H-swish activation function. The Bneck_Mix4 module consists of three Bneck modules (5×5 depthwise convolution kernels) using the H-swish activation function. These modules apply several layers of depthwise separable convolution to the feature images K_1, K_2, K_3 and K_4 respectively, obtaining the feature images K_2, K_3, K_4 and K_5. (3) The Conv module performs another convolution operation on the feature map K_5 to obtain the feature image K_6. The Pool module downsamples K_6 by average pooling. After pooling, features are extracted through a 1×1 convolution and finally divided into K output channels, extracting the feature map K_9 of the input image.
According to the MobileNetV3-large network structure of Table 1, the feature maps K_2 to K_5, computed in turn by the second through fifth layers of the network, serve in turn as the input to the feature enhancement module Neck of step 2.3.
H-swish(x) = x × ReLU6(x + 3) / 6 (2-1)
ReLU6(x) = min(max(0, x), 6) (2-2)
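Formulas (2-1) and (2-2) translate directly into code; a minimal NumPy sketch:

```python
import numpy as np

def relu6(x):
    # ReLU6 (2-2): clip activations to [0, 6]
    return np.minimum(np.maximum(0.0, x), 6.0)

def h_swish(x):
    # H-swish (2-1): piecewise-linear approximation of swish,
    # cheaper to compute than x * sigmoid(x)
    return x * relu6(x + 3.0) / 6.0
```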
2.3: Constructing the feature enhancement module Neck. Specifically: the outputs K_2 to K_5 obtained in step 2.2 serve as the inputs C_2 to C_5 of this step. The FPN structure forms the feature enhancement Neck module of the DB text detection network; through convolution, upsampling and related operations it converts the inputs C_2 to C_5 to the uniform-size maps P_2 to P_5, and finally merges P_2 to P_5 to generate the feature image F. The constructed FPN structure is shown in Table 2.
Table 2: network structure of the feature enhancement module FPN
Layer | Module | Input feature image | Output feature image
1 | Conv1 | C_5 (20×20×160) | IN_5 (20×20×96)
2 | Conv1 | C_4 (40×40×112) | IN_4 (40×40×96)
3 | Conv1 | C_3 (80×80×40) | IN_3 (80×80×96)
4 | Conv1 | C_2 (160×160×24) | IN_2 (160×160×96)
5 | Conv2 | IN_5 (20×20×96) | P_5 (160×160×24)
6 | Conv2 | IN_4 (40×40×96) | P_4 (160×160×24)
7 | Conv2 | IN_3 (80×80×96) | P_3 (160×160×24)
8 | Conv2 | IN_2 (160×160×96) | P_2 (160×160×24)
The FPN network structure consists of Conv1 and Conv2 modules. (1) The Conv1 module consists of a 1×1 convolution, which reduces the channel count of the input feature images C_2 to C_5, producing IN_2 to IN_5. IN_5 then undergoes 2× nearest-neighbor upsampling and is added to IN_4 to obtain a new IN_4; the new IN_4 is upsampled 2× by nearest neighbor and added to IN_3 to obtain a new IN_3; and IN_2 is obtained analogously by adding the upsampled IN_3 to IN_2. (2) The Conv2 module consists of a 3×3 convolution, which applies convolutional feature-fusion smoothing to the resulting IN_2 to IN_5, reducing the aliasing caused by nearest-neighbor interpolation. The fused feature images P_3, P_4 and P_5 are then upsampled by factors of 2, 4 and 8 respectively, and finally the processed feature images P_2 to P_5 are merged point by point to obtain the network's final feature image F. This layer performs feature extraction, upsampling and merging on the images C_2 to C_5, combining low-level high-resolution information with high-level strong-semantic information to obtain the feature-enhanced image F, which is input to the output module Head of step 2.4.
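As a concrete reading of this merge, below is a minimal PyTorch sketch, an illustration under the Table 2 channel sizes rather than the patent's exact implementation. The final merge is written as channel concatenation so that F has the 96 channels shown in Table 3; the text's point-by-point addition would keep 24 channels instead.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNNeck(nn.Module):
    """Sketch of the FPN Neck; channel sizes follow Table 2."""
    def __init__(self, in_chs=(24, 40, 112, 160), mid=96, out=24):
        super().__init__()
        # Conv1: 1x1 convolutions reducing C2..C5 to a uniform channel count
        self.lateral = nn.ModuleList([nn.Conv2d(c, mid, 1) for c in in_chs])
        # Conv2: 3x3 convolutions smoothing the merged maps
        self.smooth = nn.ModuleList([nn.Conv2d(mid, out, 3, padding=1) for _ in in_chs])

    def forward(self, c2, c3, c4, c5):
        ins = [lat(c) for lat, c in zip(self.lateral, (c2, c3, c4, c5))]
        for i in (3, 2, 1):  # top-down: 2x nearest-neighbor upsample, then add
            ins[i - 1] = ins[i - 1] + F.interpolate(ins[i], scale_factor=2, mode="nearest")
        ps = [s(x) for s, x in zip(self.smooth, ins)]
        for i, f in zip((1, 2, 3), (2, 4, 8)):  # bring P3, P4, P5 to P2's size
            ps[i] = F.interpolate(ps[i], scale_factor=f, mode="nearest")
        # merged by channel concatenation so F has the 96 channels of Table 3
        return torch.cat(ps, dim=1)
```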
2.4: Constructing the output module Head. Specifically: the feature image F obtained from step 2.3 is input, and DB_Head serves as the output module of the DB text detection network, further processing F to output the probability map M_p (Probability Map), the threshold map M_T (Threshold Map) and the approximate binary map M_A (Approximate Binary Map). The constructed DB_Head network structure is shown in Table 3.
Table 3: network structure of the output module DB_Head
Layer | Module | Input feature image | Output feature image
1 | Conv | F (160×160×96) | F_1 (160×160×24)
2 | BN | F_1 (160×160×24) | F_2 (160×160×24)
3 | Conv | F_2 (160×160×24) | F_3 (320×320×6)
4 | BN | F_3 (320×320×6) | F_4 (320×320×6)
5 | Conv | F_4 (320×320×6) | F_5 (640×640×1)
(1) DB_Head consists of Conv modules and BN (Batch Normalization) modules. Each Conv module consists of one convolution: the first-layer convolution is 3×3, and the third- and fifth-layer convolutions are 2×2; the convolutions extract image features. The BN module normalizes the data: it computes the mean (2-3) and variance (2-4) of each training batch, normalizes the batch's training data with them (2-5) to obtain a distribution with mean 0 and variance 1, and then performs the reconstruction transformation (2-6), i.e., scaling and shifting. The formulas involved in the BN layer are as follows:
μ_B = (1/n) × Σ_{i=1..n} x_i (2-3)
σ_B² = (1/n) × Σ_{i=1..n} (x_i − μ_B)² (2-4)
x̂_i = (x_i − μ_B) / sqrt(σ_B² + ε) (2-5)
y_i = γ × x̂_i + β (2-6)
Here (2-3) is the mean formula, (2-4) the variance formula, (2-5) the normalization formula and (2-6) the reconstruction transformation formula. n is the mini-batch size (in each training pass the data set is divided into batches and then into smaller mini-batches on which gradient descent is performed), and γ and β are learnable reconstruction parameters of the corresponding feature map (each feature map has exactly one pair of learnable parameters γ and β, which allow the network to recover the feature distribution the original network would learn).
(2) Generation of the probability map M_p and the threshold map M_T: the feature image F is input to a 3×3 convolution layer that compresses the channel count (dimension) of the feature map to 1/4 of the input; after the BN operation and the ReLU activation function (2-7), the feature map F_2 is obtained. F_2 is fed to the next 2×2 convolution layer, where a deconvolution operation yields the feature map F_3. Repeating the BN operation and ReLU activation in this cycle produces the final feature image F_5, from which the probability map M_p and the threshold map M_T are finally output through the Sigmoid function (2-8).
ReLU(x) = max(0, x) (2-7)
Sigmoid(x) = 1 / (1 + e^(−x)) (2-8)
(3) Generation of the approximate binary map M_A: the differentiable binarization formulation (2-9) combines the probability map M_p and the threshold map M_T to generate the approximate binary map M_A.
B̂_{i,j} = 1 / (1 + e^(−k × (P_{i,j} − T_{i,j}))) (2-9)
In formula (2-9), B̂ is the approximate binary feature map (Approximate Binary Map), k is an amplification factor with value 50, i and j denote coordinate information, P is the probability feature map (Probability Map), and T is the adaptive threshold map (Threshold Map) learned by the DB text detection network.
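Formula (2-9) in code form, as a small NumPy sketch of how the approximate binary map is obtained from P and T:

```python
import numpy as np

def approximate_binary_map(P, T, k=50.0):
    """Differentiable binarization (2-9): B_hat = 1 / (1 + exp(-k (P - T))).

    P and T are (H, W) probability and threshold maps; k = 50 is the
    amplification factor used above. Unlike a hard step function, the
    result is differentiable, so it can be trained end to end.
    """
    return 1.0 / (1.0 + np.exp(-k * (P - T)))
```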
2.5: Calculating the DB text detection network regression optimization loss.
The input K_0 to the DB text detection network is propagated forward to obtain the probability map M_p, threshold map M_T and approximate binary map M_A produced in step 2.4. A loss value between the predicted text boxes and the real text boxes is computed with the loss function, and the network parameters of the DB text detection network are adjusted backward according to this loss value, iteratively optimizing the parameters and improving prediction accuracy.
The total regression optimization loss L of the DB text detection network is computed by formula (2-10):
L = L_s + α × L_b + β × L_t (2-10)
L_s is the loss on the probability map M_p of the shrunk text instances, computed by formula (2-11); L_b is the loss on the approximate binary map M_A of the shrunk text instances after binarization, also computed by formula (2-11); L_t is the loss on the binarized threshold map M_T, computed by formula (2-12). α is set to 5 and β to 10.
L_s = L_b = Σ_{i∈S_l} [−y_i × log x_i − (1 − y_i) × log(1 − x_i)] (2-11)
L_s and L_b adopt the binary cross-entropy loss function, together with an additional hard negative mining strategy: difficult negative samples are retrained during model training, alleviating the imbalance between positive and negative samples. In formula (2-11), S_l is the sampled data set, with a positive-to-negative sampling ratio of 1:3; y_i is the true label and x_i the prediction result.
L_t = Σ_{i∈R_d} |y*_i − x*_i| (2-12)
In formula (2-12), L_t adopts the L1 distance loss function. R_d is the set of pixel indices inside G_d, where G_d is obtained by dilating the set G of text segmentation regions in the threshold map M_T by the offset D of formula (2-13); y*_i is the threshold map label and x*_i the threshold map prediction.
D = A × (1 − r²) / L (2-13)
In formula (2-13), D is the offset, A and L are respectively the area and perimeter of the original segmentation region set G, and r is the shrink ratio, fixed at 0.4.
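A condensed sketch of the total loss (2-10) with (2-11) and (2-12), assuming NumPy arrays. The hard negative mining (1:3 sampling) described above is omitted for brevity, so this is an illustration rather than the exact training loss.

```python
import numpy as np

def bce(y, x, eps=1e-6):
    # binary cross-entropy used by L_s and L_b (2-11)
    x = np.clip(x, eps, 1.0 - eps)
    return -np.mean(y * np.log(x) + (1.0 - y) * np.log(1.0 - x))

def db_loss(prob_map, binary_map, thresh_map, gt_shrunk, gt_thresh, thresh_mask,
            alpha=5.0, beta=10.0):
    """Total loss L = L_s + alpha * L_b + beta * L_t (2-10).

    thresh_mask selects the dilated region G_d for the L1 loss (2-12);
    hard negative mining is omitted here for brevity.
    """
    L_s = bce(gt_shrunk, prob_map)
    L_b = bce(gt_shrunk, binary_map)
    L_t = np.sum(np.abs(gt_thresh - thresh_map) * thresh_mask) / max(thresh_mask.sum(), 1.0)
    return L_s + alpha * L_b + beta * L_t
```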
2.6: Fixing the DB text detection network model parameters. Specifically: the test set X_test partitioned in step 1.2 is used to test the accuracy of the DB text detection network model. X_test is input to the DB text detection network model and predictions are made through steps 1.3 to 2.5. The resulting approximate binary map M_A is compared with the actual Label annotation file: an image is considered correctly predicted if every instance is predicted correctly and no non-background part is predicted as an instance; otherwise the prediction is wrong. Let v_1 be the number of positive instances predicted as positive and v_2 the number of positive instances mispredicted as negative; formula (2-14) computes the proportion of correctly predicted positives among all original positives in the data set, i.e. the model recall. Let v_3 be the number of negative instances mispredicted as positive; formula (2-15) computes the proportion of instances classified as positive that are indeed positive, i.e. the precision. To evaluate recall and precision jointly, an evaluation score, Score (2-16), is defined, where r is the recall and p the precision. Finally, the DB text detection network model with the highest Score is selected as the final fixed DB text detection network model, and its model parameters are fixed accordingly.
r = v_1 / (v_1 + v_2) (2-14)
p = v_1 / (v_1 + v_3) (2-15)
Score = (2 × p × r) / (p + r) (2-16)
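The selection metrics in code; Score (2-16) is reconstructed here as the standard F-measure, the harmonic mean of precision and recall:

```python
def model_score(v1, v2, v3):
    # v1: positives predicted positive, v2: positives predicted negative,
    # v3: negatives predicted positive
    r = v1 / (v1 + v2)            # recall (2-14)
    p = v1 / (v1 + v3)            # precision (2-15)
    return 2 * p * r / (p + r)    # Score (2-16)
```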
Step (3), specifically comprising:
3.1: Input image preprocessing. Specifically: before input to the CRNN text recognition network, the images in the text box data set X_DB predicted by the DB text detection network are scale-transformed to obtain the preprocessed data set X_CRNN. The specific process: each image is first scaled proportionally so that its height is 32; widths short of 320 are padded with 0, and samples with an aspect ratio greater than 10 are discarded directly. This yields the input size required by the CNN module of the CRNN text recognition network, 320 × 32 × 3 (width pixels × height pixels × RGB channels), which is input as the certificate image data set X_CRNN to the visual feature extraction module CNN of step 3.2.
The BiLSTM module of step 3.3 requires the input sequence height to be 1, while the CNN module of step 3.2 downsamples the input image by a factor of 32, so the input image height in step 3.2 must be 32. Meanwhile, the width-to-height ratio of the CRNN text recognition network's input size must be a fixed value, so the network model training process adopts 320, a multiple of 32, as the width.
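A sketch of the step 3.1 resizing rule under the stated constraints (proportional scaling to height 32, zero-padding of widths below 320, discarding aspect ratios above 10), assuming OpenCV images:

```python
import cv2
import numpy as np

def prepare_crnn_input(crop, height=32, width=320, max_ratio=10):
    h, w = crop.shape[:2]
    if w / h > max_ratio:           # discard extreme aspect ratios directly
        return None
    new_w = min(width, int(round(w * height / h)))
    resized = cv2.resize(crop, (new_w, height))   # proportional scale to height 32
    out = np.zeros((height, width, 3), dtype=resized.dtype)
    out[:, :new_w] = resized        # pad the remaining width with 0
    return out
```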
3.2: Constructing the visual feature extraction module CNN. Specifically: the certificate image data set X_CRNN processed in step 3.1 is input, and its images X_iCRNN (i = 1, 2, ..., n × t, where n is the number of images in the labeled data set X and each image yields t text prediction boxes after DB network prediction) serve in turn as the feature image M_0 input to this module. Given that special job certificate images, as described in step 1.1, are regular, of a single type and easy to recognize, the CRNN text recognition network adopts the MobileNetV3-small network as the model of its visual feature extraction module CNN, reducing the CRNN model size and increasing detection speed while keeping image feature extraction highly accurate. The network extracts the features of M_0 to obtain the extracted output feature image M_5, which is input to the BiLSTM module of step 3.3 for text expression and text classification. After DB text detection processing, the input images become small box images much smaller than the original inputs, so the MobileNetV3-small network model better balances speed against detection precision under low resources. The network structure of MobileNetV3-small is shown in Table 4.
Table 4: network structure of the feature extraction network MobileNetV3-small
Layer | Module | Input feature image | Output feature image
1 | Conv | M_0 (320×32×3) | M_1 (160×16×16)
2 | Bneck_Mix5 | M_1 (160×16×16) | M_2 (160×4×24)
3 | Bneck_Mix6 | M_2 (160×4×24) | M_3 (160×1×96)
4 | Conv | M_3 (160×1×96) | M_4 (160×1×576)
5 | Pool | M_4 (160×1×576) | M_5 (80×1×576)
The MobileNetV3-small network consists of Conv, Bneck_Mix5, Bneck_Mix6 and Pool modules. (1) The image M_0 from the certificate image data set X_CRNN processed in step 3.1 is input, and the Conv module performs a convolution operation on M_0 to obtain the feature map M_1. (2) The Bneck_Mix5 module consists of three Bneck modules (3×3 depthwise convolution kernels) using the ReLU6 activation function, and the Bneck_Mix6 module consists of eight Bneck modules (5×5 depthwise convolution kernels) using the H-swish activation function; these modules apply depthwise separable convolution to the feature images M_1 and M_2 respectively, obtaining the feature images M_2 and M_3. The Bneck module structure is the same as described in step 2.2. (3) Another convolution operation on the feature map M_3 yields the feature map M_4, which is input to the Pool module; M_4 is average-pooled, i.e., the feature image is divided into 80 rectangular regions and the feature points of each region are averaged to shrink the image, obtaining M_5.
3.3: Constructing the sequence feature extraction module BiLSTM, specifically: the feature image M5 obtained in step 3.2 is input. Step 3.3 adopts a variant of the recurrent neural network (RNN), the bidirectional long short-term memory network (BiLSTM), as the sequence feature extraction module: M5 is first converted into a feature vector sequence S1, and text sequence features are then further extracted to obtain the hidden-vector probability distribution output S2. The network structure of the BiLSTM is shown in Table 5.
Table 5 network structure table of sequence feature extraction module BiLSTM
Layer | Module | Input feature image | Output feature image
1 | Reshape | M5 (80×1×576) | S1 (80×576)
2 | BiLSTM | S1 (80×576) | S2 (80×m)
The network consists of the Reshape and BiLSTM modules. Since an RNN only accepts a specific feature vector sequence as input, the Reshape module takes the feature map M5 extracted by the CNN module in step 3.2 and generates, column by column (left to right), the feature vector sequence S1 (80×576): S1 consists of 80 column feature vectors, each containing 576-dimensional features, i.e., the i-th column feature vector is the concatenation of the i-th column pixels of all 576 feature maps, and each column corresponds to a receptive field of the original image, forming a feature vector sequence. This step is called Map-to-Sequence. The BiLSTM module then predicts over the feature sequence S1, learning each feature vector in the sequence to obtain the hidden-vector probability distribution output S2 for all characters, where m in Table 5 denotes the length of the character set to be recognized for each column vector.
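A minimal PyTorch sketch of the Map-to-Sequence step and the BiLSTM prediction; the hidden size and the per-column projection to the m-way output are assumed values for illustration:

```python
import torch
import torch.nn as nn

m = 6625           # assumed length of the recognizable character set
hidden = 48        # assumed BiLSTM hidden size

feat = torch.randn(1, 576, 1, 80)           # M5 as (N, C, H, W) = (1, 576, 1, 80)
seq = feat.squeeze(2).permute(2, 0, 1)      # Map-to-Sequence: (T, N, C) = (80, 1, 576)

bilstm = nn.LSTM(input_size=576, hidden_size=hidden, bidirectional=True)
proj = nn.Linear(2 * hidden, m)             # per-column mapping to the m-way output

out, _ = bilstm(seq)                        # (80, 1, 2*hidden)
s2 = proj(out)                              # S2: (80, 1, m), one distribution per column
print(s2.shape)
```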
3.4: Constructing the prediction module CTC, specifically: the hidden-vector probability distribution output S2 for each feature vector obtained in step 3.3 is input. The CTC module serves as the prediction module of the CRNN text recognition network and converts the input through de-duplication and integration operations to obtain the result character sequence l. The network structure of the prediction module CTC is shown in Table 6.
Table 6 network architecture table of prediction module CTC
Layer | Module | Input feature image | Output
1 | FC + Softmax | S2 (80×m) | l
The CTC module consists of an FC (fully connected) layer, a Softmax operation, and the sequence merging mechanism Blank. The hidden-vector probability distribution output S2 obtained in step 3.3 is input to the FC layer, which maps S2 to a character probability distribution of length T. The character probability distribution is then processed by the sequence merging mechanism: a blank symbol is added to the labeled character set p to form a new labeled character set p′, so that the character probability distribution has the fixed length required by the Softmax operation. The Softmax operation (3-1) selects the label (character) corresponding to the maximum value to obtain the character distribution output, and finally the sequence conversion function β (3-2) eliminates the blank symbols and the predicted repeated characters, decoding the result character sequence l:

S_{ij} = e^{v_{ij}} / ∑_{j=1}^{|p′|} e^{v_{ij}}  (3-1)

β: (p′)^T → p″, |p″| ≤ T  (3-2)

In formula (3-1), v_{ij} denotes the j-th element of the i-th column vector in the character probability distribution matrix v, and S_{ij} is the ratio of the exponential of that element to the sum of the exponentials of all elements in the column vector. In formula (3-2), p′ is the character set consisting of the labeled character set p and the blank symbol, and T is the length of the hidden-vector probability distribution output S2 after FC layer mapping; after the β transformation, a result character sequence p″ shorter than the sequence length T is output.
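A minimal numpy sketch combining the Softmax (3-1), a greedy per-column argmax, and the β transform (3-2) that collapses repeats and removes blanks; the toy three-symbol character set is an assumption:

```python
import numpy as np

def ctc_greedy_decode(v, charset, blank=0):
    """v: (T, |p'|) raw character scores; charset: blank plus the labeled set p."""
    e = np.exp(v - v.max(axis=1, keepdims=True))
    s = e / e.sum(axis=1, keepdims=True)      # Softmax (3-1), applied per column vector
    path = s.argmax(axis=1)                   # most probable label at each time step
    out, prev = [], None
    for k in path:                            # beta transform (3-2):
        if k != prev and k != blank:          # drop repeats, then drop blanks
            out.append(charset[k])
        prev = k
    return "".join(out)

scores = np.array([[5., 0, 0], [0, 5, 0], [0, 5, 0], [5, 0, 0], [0, 0, 5]])
print(ctc_greedy_decode(scores, charset=["-", "a", "b"]))   # -> "ab"
```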
3.5: Calculating the CRNN text recognition network regression optimization loss CTC Loss, specifically: the images X_i^CRNN (i = 1, 2, ..., n×t) of the certificate image data set X_CRNN processed in step 3.1 are input to the CRNN text recognition network; through forward propagation, the loss value between the predicted result l and the true value is calculated by the loss function, and the posterior probability p(l|y) (3-4) of the label l output by the CTC module in step 3.4 is adjusted backward according to the loss value. The CRNN text recognition network regression optimization loss CTC Loss is computed as:
L(S) = −∑_{(I,l)∈S} ln p(l|y)  (3-3)
In formula (3-3), p(l|y) is defined by formula (3-4); S = {(I, l)} is the training set, I is an image input from the training set, and l is the true character sequence output.
The CTC formula (3-4) applies to the probability distribution over the sequence S1 input to the BiLSTM module after the Map-to-Sequence operation of step 3.3; here S1 is regarded as y. Given all possible output distributions, the most likely result label sequence l is output, the aim being to maximize the posterior probability p(l|y) of l.
p(l|y) = ∑_{π: β(π)=l} p(π|y)  (3-4)
In formula (3-4), y is the input probability distribution matrix, y = y^1, y^2, ..., y^T, where T is the sequence length; π: β(π) = l denotes all paths π that map to the final label sequence l through the β transformation (3-2); and p(π|y) is defined by formula (3-5).
p(π|y) = ∏_{t=1}^{T} y^t_{π_t}  (3-5)

In formula (3-5), y^t_{π_t} denotes the probability of emitting the label π_t at time step t, where the subscript t indexes each time step of the path π.
3.6: Fixing the model parameters of the CRNN text recognition network, specifically: the test set X_test divided in step 1.2 is used to test the character recognition accuracy of the CRNN text recognition network. X_test is preprocessed as in step 1.3 and input into the DB network model with fixed parameters to obtain the predicted-text small box data set, which is tested and recognized through steps 3.1-3.5; the obtained result label sequence l is compared with the actual label file Label, and a recognition is judged correct only when the whole line of text is recognized correctly, otherwise it is judged wrong. Defining the number of correctly recognized texts as l_true and the number of wrongly recognized texts as l_false, the model character recognition accuracy L_accuracy is calculated by formula (3-6):

L_accuracy = l_true / (l_true + l_false)  (3-6)

Finally, the CRNN training model with the highest L_accuracy is selected as the final fixed CRNN text recognition network model, and its parameters are fixed accordingly.
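A minimal sketch of computing L_accuracy under the whole-line-correct criterion of (3-6):

```python
def char_recognition_accuracy(preds, labels):
    # a line counts toward l_true only when the whole recognized line matches exactly
    l_true = sum(p == t for p, t in zip(preds, labels))
    l_false = len(labels) - l_true
    return l_true / (l_true + l_false)

print(char_recognition_accuracy(["20200601", "electrician job"],
                                ["20200601", "electrician jobs"]))   # 0.5
```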
Step (4), specifically comprising:
4.1: Performing text detection and recognition on the constructor special job certificate, specifically: the DB training model with parameters fixed in step 2.6 is loaded and converted into a DB text detection network model, and each certificate image X_i^d (i = 1, 2, ..., n) in the image set X_d of constructor special job certificates to be detected is input. Through the fixed-weight DB text detection network, t predicted text box images are obtained for each certificate, together with the 4 coordinate pairs of each predicted text box: the upper-left, lower-left, upper-right, and lower-right corner coordinates. The set of predicted-text small box images marked by these 4 coordinates is then input into the CRNN text recognition network with parameters fixed in step 3.6, which outputs the corresponding text recognition information and its character recognition accuracy. A minimal sketch of this pipeline is given below.
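A minimal, self-contained sketch of this two-stage detect-then-recognize pipeline; db_detect and crnn_recognize are hypothetical stand-ins for the fixed-parameter DB and CRNN models, not interfaces defined in this document:

```python
from typing import Callable, List, Tuple

Box = List[Tuple[int, int]]   # [upper-left, lower-left, upper-right, lower-right]

def detect_and_recognize(image,
                         db_detect: Callable[[object], List[Box]],
                         crnn_recognize: Callable[[object, Box], Tuple[str, float]]):
    results = []
    for box in db_detect(image):                 # predicted text boxes with 4 corners
        text, conf = crnn_recognize(image, box)  # recognize the cropped text box
        results.append({"box": box, "text": text, "confidence": conf})
    return results

# toy stand-ins so the sketch runs end to end
demo = detect_and_recognize(
    image=None,
    db_detect=lambda img: [[(0, 0), (0, 10), (50, 0), (50, 10)]],
    crnn_recognize=lambda img, box: ("validity period 20100601-20200601", 0.98))
print(demo)
```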
4.2: Judgment logic for detecting the constructor special job certificate, specifically: based on the text recognition information obtained in step 4.1, whether the certificate is legal is judged through the following logic, and the certificate detection result is finally obtained.
(1) If the four characters "validity period" are recognized after text prediction and recognition of the certificate image, it is judged that validity-period information has been detected, and the next judgment is made; if no such words are recognized, it is judged that the certificate cannot be detected successfully, and "unqualified certificate shooting" is prompted.
(2) If "validity period" is recognized, the corresponding predicted text box is selected, and the year-month-day digits from the start date to the end date of validity that follow "validity period" (such as 20100601-20200601) are extracted; logic processing then extracts the trailing eight-digit number (such as 20200601). If the eight digits cannot be extracted normally, it is judged that the certificate cannot be detected successfully, and "the certificate validity period cannot be recognized normally" is prompted.
(3) If the validity period and the trailing eight digits of its text box are recognized successfully, whether the four characters "job type" are recognized is judged from the recognized text of the certificate image. If "job type" is recognized, the next judgment is made; if not, it is judged that the certificate cannot be detected successfully, and "the certificate job type cannot be recognized normally" is prompted.
(4) With "job type" recognized, the corresponding predicted text box is selected, and the specific type following "job type" (electrician job or high-altitude job) is extracted; the process proceeds to step (5) for "electrician job" and to step (6) for "high-altitude job".
(5) The eight digits following the text box corresponding to the validity period recognized from the certificate image are compared with the current Beijing time. If the expiry date is earlier than the current Beijing time, the certificate is judged to have passed its validity period and is unqualified, and "detection failed, manual detection" is prompted; if the expiry date is later than the current Beijing time, the certificate is judged qualified, and "the electrician job type of the special job certificate is detected successfully; detection qualified" is prompted.
(6) The eight digits following the text box corresponding to the validity period recognized from the certificate image are compared with the current Beijing time. If the expiry date is earlier than the current Beijing time, the certificate is judged to have passed its validity period and is unqualified, and "detection failed, manual detection" is prompted; if the expiry date is later than the current Beijing time, the certificate is judged qualified, and "the high-altitude job type of the special job certificate is detected successfully; detection qualified" is prompted.
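A minimal sketch of this judgment logic, assuming the recognized text has been rendered as an English string containing the keywords above, and taking the system clock as the current Beijing time:

```python
import re
from datetime import datetime

def judge_certificate(text, now=None):
    now = now or datetime.now()
    if "validity period" not in text:                                  # step (1)
        return "unqualified certificate shooting"
    m = re.search(r"\d{8}\s*-\s*(\d{8})", text)                        # step (2)
    if not m:
        return "certificate validity period cannot be recognized normally"
    expiry = datetime.strptime(m.group(1), "%Y%m%d")
    jt = re.search(r"job type\D*(electrician job|high-altitude job)", text)  # (3)(4)
    if jt is None:
        return "certificate job type cannot be recognized normally"
    if expiry < now:                                                   # steps (5)(6)
        return "detection failed, manual detection"
    return jt.group(1) + " type of the special job certificate detected successfully"

print(judge_certificate("job type: electrician job validity period 20100601-20300601"))
```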
Labeling the image data set with a semi-automatic labeling tool is efficient and accurate; tailored to the characteristics of special job certificate images, the proposed network combination model is small, easy to deploy, and fast in detection; meanwhile, the custom special job certificate judgment logic raises the degree of automation of the method, effectively improves the detection efficiency of special job certificates, and effectively reduces labor cost.
Example four
Special job certificate detection based on DB and CRNN is performed as follows.
1: Data preprocessing, specifically: according to step 1.1, the certificate image data set obtained from China Mobile Yunnan Company is manually screened to form the original certificate image data set L, which is then labeled by the semi-automatic labeling method, as shown in Table 7.
TABLE 7 example of image data annotation for special job certificates
According to step 1.2, the labeled data set X is divided into a training set X_train and a test set X_test at a ratio of 8:2, as sketched below. And according to step 1.3, image decoding, image normalization, rearrangement, and image scaling operations are performed in turn on the labeled data set X to obtain the labeled data set X'.
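A minimal sketch of the 8:2 split; random shuffling with a fixed seed is an assumption, since the text fixes only the ratio:

```python
import random

def split_dataset(samples, ratio=0.8, seed=0):
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)          # assumed: shuffle before splitting
    cut = int(len(samples) * ratio)
    train = [samples[i] for i in idx[:cut]]
    test = [samples[i] for i in idx[cut:]]
    return train, test

train_set, test_set = split_dataset(list(range(100)))
print(len(train_set), len(test_set))          # 80 20
```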
2: Building and training the DB text detection network; the overall structure of the DB text detection network is shown in fig. 4. According to steps 2.1-2.4, the images in the labeled data set X' are first scale-transformed to obtain (640 × 640 × 3) images; then the Backbone, Neck, and Head modules of the DB text detection network are constructed in turn. The input and output feature image sizes of the network layers are shown in Table 8.
Table 8 DB text detection network input/output data flow table for each network layer
In Table 8, the (640 × 640 × 3) certificate image undergoes DB text detection network prediction, finally outputting the prediction probability map M_P (640 × 640 × 1), threshold map M_T (640 × 640 × 1), and approximate binary map M_A (640 × 640 × 1). After preparing the training files and setting the training parameters, the DB text detection network is trained according to step 2.5.
(1) Prepare the training set train_images folder, the test set test_images folder, the training set label file train_label.txt, the test set label file test_label.txt, and the training file train.py for training the DB text detection network. (2) Set parameters such as epoch, batch size, and learning rate in train.py. Once the training files are prepared and the training parameters are set, training of the DB text detection network can begin.
First, X_train is loaded into the prepared training file train.py; through forward propagation, the losses L, L_s, L_b, and L_t are calculated by formulas (2-10) to (2-12), and the network training parameters are continuously optimized until the loss function value of the DB text detection network converges. Finally, according to step 2.6, X_test is input into the DB text detection network to obtain the corresponding approximate binary map M_A and corresponding coordinate position information, which are compared with the corresponding image coordinate information in the X_test label file; the model prediction recall, precision, and evaluation Score are calculated, and the model with the best evaluation Score is selected as the final fixed-parameter DB text detection network model.
3: Constructing and training the CRNN text recognition network; the overall structure of the CRNN text recognition network is shown in fig. 5. According to steps 3.1-3.4, the images in the text small box data set X_DB predicted by the DB text detection network in step 2 are first scale-transformed to obtain (320 × 32 × 3) images; then the CNN, BiLSTM, and CTC modules of the CRNN text recognition network are constructed in turn. The input and output feature image sizes of the network layers are shown in Table 9.
Table 9 Input and output data flow of each network layer in the CRNN text recognition network
Layer | Module | Input feature image | Output feature image
1 | Conv | M0 (320×32×3) | M1 (160×16×16)
2 | Bneck_Mix5 | M1 (160×16×16) | M2 (160×4×24)
3 | Bneck_Mix6 | M2 (160×4×24) | M3 (160×1×96)
4 | Conv | M3 (160×1×96) | M4 (160×1×576)
5 | Pool | M4 (160×1×576) | M5 (80×1×576)
6 | Reshape | M5 (80×1×576) | S1 (80×576)
7 | BiLSTM | S1 (80×576) | S2 (80×m)
8 | FC + Softmax | S2 (80×m) | l
In Table 9, the (320 × 32 × 3) certificate image undergoes CRNN text recognition network prediction, finally outputting the prediction result sequence l. Training files are prepared and training parameters are set for training the model according to step 3.5. (1) Prepare the training set text_images folder, the test set text_images folder, the two txt files recording the image text content labels of the training and test sets, the training file train.py, and the dictionary word_direct.txt for training the CRNN text recognition network. The dictionary is stored in utf-8 encoding and maps the characters appearing in the labeled data set X to dictionary indices (see the sketch below). (2) Set parameters such as epoch, batch size, and learning rate in train.py.
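A minimal sketch of loading this dictionary and mapping characters to indices; the file name is taken from the text, and one character per line is an assumed format:

```python
def load_char_dict(path="word_direct.txt"):   # file name as given in the text
    with open(path, encoding="utf-8") as f:
        chars = [line.rstrip("\n") for line in f]
    return {c: i for i, c in enumerate(chars)}   # character -> dictionary index

# usage (assuming the dictionary file exists alongside train.py):
# char_to_idx = load_char_dict()
# label_indices = [char_to_idx[ch] for ch in "20200601"]
```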
After the training files are prepared and the training parameters are set, training of the CRNN text recognition network can begin. First, X_train is loaded into the prepared training file train.py; through forward propagation, L(S) is calculated by formulas (3-3) to (3-5), and the network training parameters are continuously optimized until the loss function value of the CRNN text recognition network converges. Finally, according to step 3.6, X_test is input into the CRNN text recognition network to obtain the prediction result sequence l of the corresponding image text, which is compared with the corresponding labeled text information in the X_test label file; the model character recognition accuracy L_accuracy is calculated, and the model with the highest L_accuracy is selected as the final fixed-parameter CRNN text recognition network model.
4: Detecting the constructor special job certificate, specifically: according to step 4.1, the DB text detection network model with parameters fixed in step 2.6 and the CRNN text recognition network model with parameters fixed in step 3.6 are loaded. The image data set X_d of certificates to be detected is first input; for each constructor special job certificate image to be detected, the model predicts the approximate binary map M_A (640 × 640 × 1) of the text target boxes according to its parameters, obtaining the corresponding text boxes and their coordinate position information. The text box images predicted by the DB text detection network model are then input into the CRNN text recognition network model, which outputs the recognized text information.
Any certificate image X_k^d is arbitrarily selected from the certificate image data set X_d to be detected as an example of model output; the information recognized for image X_k^d by prediction with the fixed DB and CRNN models is shown in Table 10.
Table 10 Information for example image X_k^d after prediction and recognition by the fixed DB and CRNN models
According to step 4.2, the obtained text information is extracted according to the custom constructor special job certificate detection judgment logic, and whether the constructor's special job certificate is qualified and valid is judged based on the extracted text information.
Fourth, the advantages and positive effects of the invention compared with the prior art are as follows.
(1) Aiming at the regular, orderly, and easily labeled characteristics of the special job certificate image data sets provided by construction sites of China Mobile Yunnan Company's 5G base stations, the invention provides an efficient semi-automatic certificate image data set labeling method: the PPOCRLabel tool performs the first labeling step, automatically labeling the text boxes in the certificate image data set and the characters in the corresponding text boxes, and manual screening then applies a second round of manual correction to text boxes and texts that were not predicted successfully or were labeled incorrectly, improving labeling efficiency while ensuring high accuracy of the labeled data set.
(2) Aiming at the characteristics of the special job certificate, namely regular image data, a single type, and text detection and text recognition that are easy to complete, the invention adopts the MobileNetV3 network as the backbone of both the DB text detection network model and the CRNN text recognition network model for extracting image features. The number of feature channels of the two networks is reduced while the certificate can still be detected accurately, and the size of the corresponding models is reduced by 90%, suiting conditions of limited computing capacity; meanwhile, the detection speed on certificate images is improved, thereby improving the efficiency of the certificate inspection method.
(3) The method combines the given constructor special job certificate image data set with the prediction and recognition results of the combined model, customizes the judgment logic of special job certificate detection, realizes a 24-hour unattended automatic special job certificate detection program on a computer, and raises the degree of automation of the method.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (10)

1. A special job certificate detection method based on DB and CRNN is characterized by comprising the following steps:
acquiring a special job certificate image data set; the special operation certificate image data set comprises a plurality of target special operation certificate images, and each target special operation certificate image has text information;
inputting each target special job certificate image into a DB text detection network model to determine a text box data set corresponding to each target special job certificate image; elements in the text box data set represent position information of a target text box;
inputting each target special job certificate image and a text box data set corresponding to each target special job certificate image into a CRNN text recognition network model to determine text information in each target text box in each target special job certificate image; the text information comprises at least one of constructor name, constructor gender, certificate number, operation category and certificate valid date;
the DB text detection network model is obtained by training based on a DB text detection network and a first training data set; the Backbone module in the DB text detection network adopts a MobileNetV3-large structure; each element in the first training data set comprises a historical special job certificate image and a first category label corresponding to the historical special job certificate image; the first category label is position information of a historical text box;
the CRNN text recognition network model is obtained by training based on a CRNN text recognition network and a second training data set; the partial structure of the CNN module in the CRNN text recognition network adopts a MobileNet V3-small structure; each element in the second training data set comprises a historical special job certificate image and a second class label corresponding to the historical special job certificate image; the second category label is historical text information.
2. The method for detecting special job certificates based on DB and CRNN as claimed in claim 1, further comprising: determining whether each special job certificate meets the construction job requirement based on the text information.
3. The method as claimed in claim 1, wherein the step of inputting each target special job certificate image into a DB text detection network model to determine a text box data set corresponding to each target special job certificate image comprises:
preprocessing each target special job certificate image; the preprocessing comprises: decoding, normalization, rearrangement, and image scaling;
and inputting each preprocessed target special job certificate image into a DB text detection network model to determine a text box data set corresponding to each target special job certificate image.
4. The method as claimed in claim 3, wherein the step of inputting each target special job certificate image and the text box data set corresponding to each target special job certificate image into a CRNN text recognition network model to determine the text information in each target text box of each target special job certificate image comprises:
inputting each preprocessed target special job certificate image and a preprocessed text box data set corresponding to each preprocessed target special job certificate image into a CRNN text recognition network model to determine text information in each target text box in each target special job certificate image.
5. The method for detecting special job certificates based on the DB and the CRNN as claimed in claim 1, wherein the determining process of the DB text detection network model is as follows:
constructing a DB text detection network;
determining a first training data set;
and training the DB text detection network based on the first training data set to obtain a DB text detection network model.
6. The method for detecting special job certificates based on the DB and CRNN according to claim 5, wherein the determining the first training data set specifically includes:
acquiring an original certificate image data set; the original certificate image data set comprises a plurality of original historical special job certificate images;
labeling each original historical special job certificate image by adopting a semi-automatic labeling tool to obtain each historical labeling image and a first class label corresponding to each historical labeling image;
preprocessing each historical annotation image to obtain a historical special job certificate image, the preprocessing comprising: decoding, normalization, rearrangement, and image scaling; the first category label corresponding to the historical special job certificate image is the first category label corresponding to the historical annotation image.
7. The method for detecting documents for special jobs based on DB and CRNN as claimed in claim 1, wherein the CRNN text recognition network model determining process is:
constructing a CRNN text recognition network;
determining a second training data set;
and training the CRNN text recognition network based on the second training data set to obtain a CRNN text recognition network model.
8. The method for detecting special job certificates based on DB and CRNN according to claim 7, wherein the determining the second training data set specifically includes:
acquiring an original certificate image data set; the original certificate image data set comprises a plurality of original historical special job certificate images;
labeling each original historical special job certificate image by adopting a semi-automatic labeling tool to obtain each historical labeling image and a second type label corresponding to each historical labeling image;
preprocessing each historical annotation image to obtain a historical special job certificate image, the preprocessing comprising: decoding, normalization, rearrangement, and image scaling; the second category label corresponding to the historical special job certificate image is the second category label corresponding to the historical annotation image.
9. A special job certificate detection system based on DB and CRNN is characterized by comprising:
the data acquisition module is used for acquiring a special job certificate image data set; the special operation certificate image data set comprises a plurality of target special operation certificate images, and each target special operation certificate image has text information;
the text box data set determining module is used for inputting each target special job certificate image into a DB text detection network model so as to determine a text box data set corresponding to each target special job certificate image; elements in the text box data set represent position information of a target text box;
the text information determining module is used for inputting each target special job certificate image and a text box data set corresponding to each target special job certificate image into a CRNN text recognition network model so as to determine text information in each target text box in each target special job certificate image; the text information comprises at least one of constructor name, constructor gender, certificate number, operation category and certificate valid date;
the DB text detection network model is obtained by training based on a DB text detection network and a first training data set; the Backbone module in the DB text detection network adopts a MobileNetV3-large structure; each element in the first training data set comprises a historical special job certificate image and a first category label corresponding to the historical special job certificate image; the first category label is position information of a historical text box;
the CRNN text recognition network model is obtained by training based on a CRNN text recognition network and a second training data set; the partial structure of the CNN module in the CRNN text recognition network adopts a MobileNet V3-small structure; each element in the second training data set comprises a historical special job certificate image and a second class label corresponding to the historical special job certificate image; the second category label is historical text information.
10. The DB and CRNN-based specialty job document detection system according to claim 9, further comprising: and the detection module is used for determining whether each special operation certificate meets the construction operation requirement or not based on the text information.
CN202110865778.9A 2021-07-29 2021-07-29 Special operation certificate detection method and system based on DB and CRNN Active CN113591866B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110865778.9A CN113591866B (en) 2021-07-29 2021-07-29 Special operation certificate detection method and system based on DB and CRNN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110865778.9A CN113591866B (en) 2021-07-29 2021-07-29 Special operation certificate detection method and system based on DB and CRNN

Publications (2)

Publication Number Publication Date
CN113591866A true CN113591866A (en) 2021-11-02
CN113591866B CN113591866B (en) 2023-07-07

Family

ID=78252001

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110865778.9A Active CN113591866B (en) 2021-07-29 2021-07-29 Special operation certificate detection method and system based on DB and CRNN

Country Status (1)

Country Link
CN (1) CN113591866B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114266751A (en) * 2021-12-23 2022-04-01 福州大学 AI technology-based product packaging bag coding defect detection method and system
CN115131797A (en) * 2022-06-28 2022-09-30 北京邮电大学 Scene text detection method based on feature enhancement pyramid network
CN116532046A (en) * 2023-07-05 2023-08-04 南京邮电大学 Microfluidic automatic feeding device and method for spirofluorene xanthene
CN116935396A (en) * 2023-06-16 2023-10-24 北京化工大学 OCR college entrance guide intelligent acquisition method based on CRNN algorithm
CN116958998A (en) * 2023-09-20 2023-10-27 四川泓宝润业工程技术有限公司 Digital instrument reading identification method based on deep learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190318755A1 (en) * 2018-04-13 2019-10-17 Microsoft Technology Licensing, Llc Systems, methods, and computer-readable media for improved real-time audio processing
WO2020111676A1 (en) * 2018-11-28 2020-06-04 삼성전자 주식회사 Voice recognition device and method
CN111401371A (en) * 2020-06-03 2020-07-10 中邮消费金融有限公司 Text detection and identification method and system and computer equipment
WO2020218512A1 (en) * 2019-04-26 2020-10-29 Arithmer株式会社 Learning model generating device, character recognition device, learning model generating method, character recognition method, and program
CN113076992A (en) * 2021-03-31 2021-07-06 武汉理工大学 Household garbage detection method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190318755A1 (en) * 2018-04-13 2019-10-17 Microsoft Technology Licensing, Llc Systems, methods, and computer-readable media for improved real-time audio processing
WO2020111676A1 (en) * 2018-11-28 2020-06-04 삼성전자 주식회사 Voice recognition device and method
WO2020218512A1 (en) * 2019-04-26 2020-10-29 Arithmer株式会社 Learning model generating device, character recognition device, learning model generating method, character recognition method, and program
CN111401371A (en) * 2020-06-03 2020-07-10 中邮消费金融有限公司 Text detection and identification method and system and computer equipment
CN113076992A (en) * 2021-03-31 2021-07-06 武汉理工大学 Household garbage detection method and device

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114266751A (en) * 2021-12-23 2022-04-01 福州大学 AI technology-based product packaging bag coding defect detection method and system
CN115131797A (en) * 2022-06-28 2022-09-30 北京邮电大学 Scene text detection method based on feature enhancement pyramid network
CN116935396A (en) * 2023-06-16 2023-10-24 北京化工大学 OCR college entrance guide intelligent acquisition method based on CRNN algorithm
CN116935396B (en) * 2023-06-16 2024-02-23 北京化工大学 OCR college entrance guide intelligent acquisition method based on CRNN algorithm
CN116532046A (en) * 2023-07-05 2023-08-04 南京邮电大学 Microfluidic automatic feeding device and method for spirofluorene xanthene
CN116532046B (en) * 2023-07-05 2023-10-10 南京邮电大学 Microfluidic automatic feeding device and method for spirofluorene xanthene
CN116958998A (en) * 2023-09-20 2023-10-27 四川泓宝润业工程技术有限公司 Digital instrument reading identification method based on deep learning
CN116958998B (en) * 2023-09-20 2023-12-26 四川泓宝润业工程技术有限公司 Digital instrument reading identification method based on deep learning

Also Published As

Publication number Publication date
CN113591866B (en) 2023-07-07

Similar Documents

Publication Publication Date Title
CN109902622B (en) Character detection and identification method for boarding check information verification
CN113591866B (en) Special operation certificate detection method and system based on DB and CRNN
CN111325203B (en) American license plate recognition method and system based on image correction
US10817741B2 (en) Word segmentation system, method and device
CN107194400B (en) Financial reimbursement full ticket image recognition processing method
CN111950453A (en) Optional-shape text recognition method based on selective attention mechanism
CN110969129B (en) End-to-end tax bill text detection and recognition method
CN105512611A (en) Detection and identification method for form image
CN114155527A (en) Scene text recognition method and device
CN111539330B (en) Transformer substation digital display instrument identification method based on double-SVM multi-classifier
CN111680690A (en) Character recognition method and device
US20210334573A1 (en) Text line normalization systems and methods
CN115131797A (en) Scene text detection method based on feature enhancement pyramid network
CN114067300A (en) End-to-end license plate correction and identification method
CN114283431B (en) Text detection method based on differentiable binarization
CN112365451B (en) Method, device, equipment and computer readable medium for determining image quality grade
CN111832497B (en) Text detection post-processing method based on geometric features
CN111553361B (en) Pathological section label identification method
CN116681657B (en) Asphalt pavement disease detection method based on improved YOLOv7 model
CN111414917A (en) Identification method of low-pixel-density text
CN111476226A (en) Text positioning method and device and model training method
CN115424250A (en) License plate recognition method and device
Rani et al. Object Detection in Natural Scene Images Using Thresholding Techniques
CN111738255A (en) Guideboard text detection and recognition algorithm based on deep learning
Zhao et al. Text Spotting of Electrical Diagram Based on Improved PP-OCRv3

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant