CN113378812A - Digital dial plate identification method based on Mask R-CNN and CRNN - Google Patents


Info

Publication number
CN113378812A
Authority
CN
China
Prior art keywords
mask
crnn
cnn
feature
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110559663.7A
Other languages
Chinese (zh)
Inventor
张卫星
韩颖
余利敏
赵博学
徐畅
李子良
周旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University
Original Assignee
Zhengzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University filed Critical Zhengzhou University
Priority to CN202110559663.7A
Publication of CN113378812A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention provides a digital dial plate identification method based on Mask R-CNN and CRNN: after a digital dial plate recognition model based on Mask R-CNN and CRNN is constructed, the original image to be recognized is input into the model and the recognition result is output. The recognition model is obtained by training as follows: collect original images and divide them into training, validation and test data sets; label the training set images to obtain label images; extract features to obtain regions of interest (ROIs); run an FCN on the fused feature maps and output a mask; classify the ROIs and output a class and a bounding box; compute the four corner coordinates of the region to be recognized in the mask and transform the quadrilateral recognition region into a rectangle by perspective transformation; and feed the saved rectangular image into a convolutional recurrent neural network (CRNN) for digit recognition.

Description

Digital dial plate identification method based on Mask R-CNN and CRNN
Technical Field
The invention belongs to the field of image recognition, and particularly relates to a digital dial plate recognition method based on Mask R-CNN and CRNN.
Background
In recent years artificial intelligence has developed rapidly, especially in commercial applications: intelligent products are increasingly common in shopping malls, homes and other settings, and AI has become an important future development strategy for many countries. In particular, algorithms centered on deep learning, supported by big data and powerful hardware platforms, are trained to produce network models that imitate aspects of human reasoning, and different network models can now solve a wide range of problems. Many digit recognition methods based on deep learning exist, but very few are actually deployed in practice. In real scenes, factors such as image resolution and exposure interfere with feature extraction and data fusion, making the data difficult to train and recognize and reducing recognition accuracy to a certain extent.
The invention addresses the problem that readings such as balance, total amount and remaining amount on a gas meter must otherwise be copied and summarized manually. It combines Mask R-CNN with a CRNN model to recognize the digits in gas meter images. Mask R-CNN performs instance segmentation on top of object detection and semantic segmentation, achieves pixel-level image classification, and simultaneously outputs a mask; the CRNN, compared with the prior art, recognizes variable-length character sequences with high accuracy. At present, however, two models must be trained separately to recognize mechanical-roller digits and LCD digits, and for such images there are few systems that first locate the digit region, then recognize the digits, and put the model into practice.
Disclosure of Invention
To solve the above problems, it is necessary to provide a digital dial plate recognition method based on Mask R-CNN and CRNN.
The first aspect of the invention provides a digital dial plate recognition model based on Mask R-CNN and CRNN, which is obtained by training according to the following method:
Step 1: preprocess the collected original digital dial plate images, then divide them into training, validation and test data sets in a 6:2:2 ratio;
Step 2: label the training set images, using the labelme tool to annotate the regions of the original images that need to be located and segmented, to obtain label images;
Step 3: input the training set images into a ResNet101 model for feature extraction, fuse the feature maps with an FPN to obtain fused feature maps, and input the fused feature maps into the region proposal network (RPN) and the ROIAlign layer to obtain regions of interest (ROIs);
Step 4: run an FCN on the fused feature maps and output a mask; classify the ROIs and output a class and a bounding box;
Step 5: compute the four corner coordinates of the region to be recognized in the mask, transform the quadrilateral recognition region into a rectangle by perspective transformation, and save the rectangular image;
Step 6: feed the saved rectangular image into the convolutional recurrent neural network (CRNN) for digit recognition.
The second aspect of the invention provides a digital dial plate identification method based on Mask R-CNN and CRNN: after the digital dial plate recognition model based on Mask R-CNN and CRNN is constructed, the original image to be recognized is input into the model and the recognition result is output.
The third aspect of the invention provides a terminal comprising a processor, a memory and a digital dial plate recognition program stored in the memory; when the program is run by the processor, the steps of the digital dial plate identification method based on Mask R-CNN and CRNN are implemented.
A fourth aspect of the invention provides a computer-readable storage medium having computer instructions stored thereon; when executed by a processor, the computer instructions implement the steps of the digital dial plate identification method based on Mask R-CNN and CRNN.
Compared with the prior art, the invention has prominent substantive features and represents notable progress: the constructed model uses a Mask R-CNN model to achieve pixel-level classification and outputs a mask for digit recognition; a CRNN model then performs the digit recognition, and the introduction of the bidirectional LSTM and CTC significantly improves recognition accuracy.
The model constructed by the invention is particularly applicable to recognizing the digits in gas meter images; compared with prior approaches that model instance segmentation and digit recognition separately, it offers wide application scenarios, high accuracy and fast recognition.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a Mask R-CNN model of the digital dial identification model based on Mask R-CNN and CRNN of the present invention.
FIG. 2 shows the ResNet101 and FPN frameworks of the Mask R-CNN model of the present invention.
FIG. 3 is a CRNN network structure of the digital dial identification model of the present invention based on Mask R-CNN and CRNN.
FIG. 4 is an image digit recognition process of the method of the present invention.
FIG. 5 is a schematic diagram of a feature sequence of the CRNN network structure of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
Example 1
As shown in FIG. 1, this embodiment provides a digital dial plate recognition model based on Mask R-CNN and CRNN, which is obtained by training according to the following method:
Step 1: preprocess the collected original digital dial plate images, then divide them into training, validation and test data sets in a 6:2:2 ratio.
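For illustration, a minimal sketch of such a 6:2:2 split; the directory layout and file pattern are assumptions, not taken from the patent:

```python
import random
from pathlib import Path

paths = sorted(Path("data/raw").glob("*.jpg"))   # assumed location of the collected dial images
random.seed(42)                                  # fixed seed so the split is reproducible
random.shuffle(paths)

n = len(paths)
n_train, n_val = int(0.6 * n), int(0.2 * n)
train_set = paths[:n_train]                      # 60% for training
val_set = paths[n_train:n_train + n_val]         # 20% for validation
test_set = paths[n_train + n_val:]               # 20% for testing
```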
Step 2: label the training set images, using the labelme tool to annotate the regions of the original images that need to be located and segmented, to obtain label images.
Step 3: input the training set images into the ResNet101 model for feature extraction, fuse the feature maps with an FPN to obtain fused feature maps, and input the fused feature maps into the region proposal network (RPN) and the ROIAlign layer to obtain regions of interest (ROIs);
wherein the ResNet101 model comprises five stages, conv1, conv2_x, conv3_x, conv4_x and conv5_x, whose outputs are C1, C2, C3, C4 and C5 respectively;
each lower feature layer is passed through a 1×1 convolution so that its channel count matches the layer above; the upper feature layer is upsampled to the length and width of the layer below, and the two are added, yielding the fused feature layers P2-P6; in this way the FPN completes the fusion of the feature maps from bottom to top. Feature layers P2-P5 are used to predict the class, box and mask of the object, while P2-P6 are used to train the RPN; that is, P6 is used only in the RPN network.
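As a rough PyTorch sketch of this fusion step, assuming ResNet101's usual C2-C5 channel counts (a simplification, not the patent's exact network):

```python
import torch.nn as nn
import torch.nn.functional as F

class FPNFuse(nn.Module):
    """Top-down FPN fusion: 1x1 lateral convolutions plus nearest-neighbour upsampling."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)

    def forward(self, c2, c3, c4, c5):
        p5 = self.lateral[3](c5)
        # upsample the upper layer to the lower layer's size, then add the lateral output
        p4 = self.lateral[2](c4) + F.interpolate(p5, scale_factor=2, mode="nearest")
        p3 = self.lateral[1](c3) + F.interpolate(p4, scale_factor=2, mode="nearest")
        p2 = self.lateral[0](c2) + F.interpolate(p3, scale_factor=2, mode="nearest")
        p6 = F.max_pool2d(p5, kernel_size=1, stride=2)  # P6 is used only by the RPN
        return p2, p3, p4, p5, p6
```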
The region proposal network (RPN) slides a window over the shared feature map and generates, at each position, nine anchors with preset aspect ratios and areas. The nine initial anchors cover three areas (128×128, 256×256, 512×512), and each area contains three aspect ratios (1:1, 1:2, 2:1). The RPN adopts a tree-shaped structure: the trunk is a 3×3 convolutional layer and the branches are two 1×1 convolutional layers; the first 1×1 convolution judges whether an anchor covers a target, and the second performs coordinate correction on the foreground anchors, thereby outputting corrected ROI boxes.
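A small NumPy sketch of the nine anchors at one sliding-window position, using the three areas and three aspect ratios listed above:

```python
import numpy as np

def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """Return the 9 anchors centred at (cx, cy) as (x1, y1, x2, y2) rows."""
    boxes = []
    for s in scales:
        for r in ratios:
            w, h = s * np.sqrt(r), s / np.sqrt(r)  # keeps area s*s and aspect ratio w/h = r
            boxes.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(boxes)

print(make_anchors(400, 300).shape)  # (9, 4)
```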
The RoIAlign layer slightly modifies RoI pooling to remove the large pixel misalignment that RoI pooling introduces when feature maps are mapped back to the original image. It divides the feature map region into k×k units, fixes four sampling positions in each unit, computes the values of these positions by bilinear interpolation, and then applies max pooling to obtain an accurate ROI.
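torchvision ships an RoIAlign operator with this bilinear-sampling behaviour; a minimal usage sketch in which all sizes are made-up examples:

```python
import torch
from torchvision.ops import roi_align

feat = torch.randn(1, 256, 50, 50)                       # one fused feature map (e.g. a P-level)
boxes = torch.tensor([[0.0, 10.0, 10.0, 200.0, 120.0]])  # (batch_index, x1, y1, x2, y2) in image coords
pooled = roi_align(feat, boxes, output_size=(7, 7),
                   spatial_scale=50 / 800,               # feature-map size / input-image size
                   sampling_ratio=2, aligned=True)
print(pooled.shape)                                      # torch.Size([1, 256, 7, 7])
```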
Step 4: run an FCN on the fused feature maps and output a mask; classify the ROIs and output a class and a bounding box.
the FCN can receive an input image with any size, then up-samples the feature map of the last convolutional layer through the deconvolution layer to restore the feature map to the same size as the input image, so that a prediction can be generated for each pixel, space information in the original input image is kept, and finally, each pixel is classified on a feature map with the same size as the input map to output a mask.
The ROI passes through a fully connected network to obtain the final detection class and bounding box. Notably, the mask branch and the classification branch run in parallel during training, whereas during prediction the classification is performed before the mask. When the ROIs are classified, the loss function of each ROI is:
$$L = L_{cls} + L_{box} + L_{mask}$$

$$L_{cls} = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)$$

$$L_{box} = \frac{1}{N_{reg}} \sum_i p_i^* \, \mathrm{smooth}_{L_1}(t_i - t_i^*)$$

$$L_{mask} = -\frac{1}{m^2} \sum_{1 \le i,j \le m} \left[ y_{ij} \log \hat{y}_{ij} + (1 - y_{ij}) \log(1 - \hat{y}_{ij}) \right]$$

where L_{cls} and L_{box} are the classification and regression losses respectively; N_{cls} is the classification weight coefficient; p_i is the predicted probability that anchor i is the target; p_i^* is the binary ground-truth label, equal to 1 if the anchor contains an object and 0 otherwise; N_{reg} is the regression weight coefficient; t_i^* are the ground-truth coordinates and t_i the predicted four-point coordinates.

The mask branch outputs K·m·m dimensions for each ROI, where m is the mask size and K the number of categories. After the predicted mask is obtained, a sigmoid is applied to each pixel value of the mask, and the result is the input of L_{mask}, the average binary cross-entropy over the m×m pixels, where \hat{y}_{ij} is the sigmoid output at pixel (i, j) and y_{ij} the corresponding ground-truth label.
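A hedged PyTorch sketch of the combined loss following the definitions above; the tensor shapes and the way the weight coefficients are passed in are assumptions:

```python
import torch.nn.functional as F

def detection_loss(cls_logits, labels,    # (N, K+1) class scores, (N,) ground-truth labels
                   box_pred, box_gt,      # (N, 4) predicted and ground-truth coordinates
                   pos_mask,              # (N,) float, 1 for foreground anchors (p_i* = 1), else 0
                   mask_logits, mask_gt,  # (N, m, m) mask scores and binary ground truth
                   n_cls, n_reg):
    l_cls = F.cross_entropy(cls_logits, labels, reduction="sum") / n_cls
    per_box = F.smooth_l1_loss(box_pred, box_gt, reduction="none").sum(dim=1)
    l_box = (per_box * pos_mask).sum() / n_reg
    # per-pixel sigmoid + average binary cross-entropy for the mask branch
    l_mask = F.binary_cross_entropy_with_logits(mask_logits, mask_gt)
    return l_cls + l_box + l_mask
```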
Step 5: compute the four corner coordinates of the region to be recognized in the mask, transform the quadrilateral recognition region into a rectangle by perspective transformation, and save the rectangular image.
in this step, digital recognition is realized by calculating the position coordinates of the target area in the mask and taking the target area image (quadrangle) as input. Before recognition, the quadrangle should be transformed into a rectangle through perspective transformation. Perspective transformation formula:
Figure BDA0003078565560000071
u and v are the coordinates of the original picture, the parameter ω is equal to 1, and the picture coordinates x, y are obtained after perspective transformation, wherein,
Figure BDA0003078565560000072
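In practice this transformation can be carried out with OpenCV, which solves for the 3×3 matrix above (with a33 fixed to 1) from four point correspondences; a minimal sketch in which the image path and corner coordinates are placeholders:

```python
import cv2
import numpy as np

img = cv2.imread("gas_meter.jpg")                                 # original image (assumed path)
src = np.float32([[112, 48], [415, 60], [418, 130], [108, 122]])  # four mask corners (placeholders)
w, h = 256, 32                                                    # target rectangle size
dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])

M = cv2.getPerspectiveTransform(src, dst)    # the a11..a33 coefficients
rect = cv2.warpPerspective(img, M, (w, h))   # quadrilateral region -> rectangle
cv2.imwrite("roi_rect.png", rect)            # saved rectangular image for the CRNN
```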
Step 6: feed the saved rectangular image into the convolutional recurrent neural network (CRNN) for digit recognition.
Specifically, digit recognition with the saved rectangular image as the input of the CRNN proceeds as follows:
the saved rectangular image is fed into the deep CNN for feature extraction to obtain a digit feature map; the feature map is converted into a sequence of feature vectors and input into a BLSTM network, which predicts the sequence and outputs a predicted label distribution; finally, CTC loss converts the obtained label distribution into the final label sequence, giving the predicted value, i.e. the true value of the recognized digits.
Specifically, the convolutional recurrent neural network model CRNN includes three parts, which are from bottom to top:
The convolutional layers (CNN) use a deep CNN to extract features from the input image, obtaining a feature map from which a sequence of feature vectors is extracted. Specifically, the grayscale image (height H = 32) passes through the deep CNN to produce a feature map of height H = 1, from which the feature vector sequence required by the RNN is extracted. Two adjustments are made to the deep CNN: so that the CNN features can be fed into the RNN network, the window sizes of the last two pooling layers are changed from 2×2 to 1×2; and to speed up model convergence and shorten training, BN (batch normalization) modules are introduced for normalization. When the feature vector sequence is extracted, the feature vectors are generated on the feature map from left to right, column by column: the i-th feature vector is the concatenation of the pixels in the i-th column of all feature maps. These vectors form a sequence, and each sequence vector corresponds, with a fixed stride, to a region of the original image, which it serves to classify.
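A one-step illustration of this map-to-sequence conversion in PyTorch (the shapes are illustrative):

```python
import torch

feat = torch.randn(4, 512, 1, 40)        # (batch, channels, height=1, width) from the deep CNN
seq = feat.squeeze(2).permute(2, 0, 1)   # -> (width=40, batch, channels)
# seq[i] is the i-th column of the feature map, i.e. one time step for the BLSTM
```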
The recurrent layers (RNN) use a bidirectional LSTM to predict the feature vector sequence, learning each feature vector in the sequence and outputting the predicted label distribution.
The feature vector sequence is input into a two-layer bidirectional LSTM network, which predicts which character each feature vector corresponds to and obtains the softmax probability distribution over all characters, a vector whose length equals the number of character classes; this serves as the input of the CTC layer.
The transcription layer (CTC loss) converts the sequence of label distributions obtained from the recurrent layers into the final label sequence. By converting the RNN's per-feature-vector predictions into a label sequence and introducing a blank mechanism, the problem of merging consecutive identical characters during prediction is solved.
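A minimal training-loss sketch with PyTorch's built-in nn.CTCLoss, assuming ten digit classes plus a blank (all sizes are illustrative):

```python
import torch
import torch.nn as nn

T, B, C = 40, 4, 11                                    # time steps, batch size, classes (10 digits + blank)
log_probs = torch.randn(T, B, C).log_softmax(2)        # per-step log-probabilities from the BLSTM
targets = torch.randint(1, C, (B, 6))                  # e.g. 6-digit readings; label 0 is the blank
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 6, dtype=torch.long)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```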
The specific network configuration, following the standard CRNN configuration with the adjustments described above, is listed below; the first row is the top level, and "k", "s" and "p" denote kernel size, stride and padding size.

Type               | Configuration
-------------------|--------------------------------
Transcription      | -
Bidirectional LSTM | #hidden units: 256
Bidirectional LSTM | #hidden units: 256
Map-to-Sequence    | -
Convolution        | #maps: 512, k: 2x2, s: 1, p: 0
MaxPooling         | window: 1x2, s: 2
BatchNormalization | -
Convolution        | #maps: 512, k: 3x3, s: 1, p: 1
BatchNormalization | -
Convolution        | #maps: 512, k: 3x3, s: 1, p: 1
MaxPooling         | window: 1x2, s: 2
Convolution        | #maps: 256, k: 3x3, s: 1, p: 1
Convolution        | #maps: 256, k: 3x3, s: 1, p: 1
MaxPooling         | window: 2x2, s: 2
Convolution        | #maps: 128, k: 3x3, s: 1, p: 1
MaxPooling         | window: 2x2, s: 2
Convolution        | #maps: 64, k: 3x3, s: 1, p: 1
Input              | W x 32 grayscale image
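A compact PyTorch sketch approximating this configuration, including the 1×2-style pooling and BN adjustments described above (a simplification under stated assumptions, not the patent's exact network):

```python
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_classes=11, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(                        # input: (B, 1, 32, W) grayscale
            nn.Conv2d(1, 64, 3, 1, 1), nn.ReLU(True), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, 1, 1), nn.ReLU(True), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, 1, 1), nn.ReLU(True),
            nn.Conv2d(256, 256, 3, 1, 1), nn.ReLU(True),
            nn.MaxPool2d((2, 2), (2, 1), (0, 1)),        # halve height only, keep width resolution
            nn.Conv2d(256, 512, 3, 1, 1), nn.BatchNorm2d(512), nn.ReLU(True),
            nn.Conv2d(512, 512, 3, 1, 1), nn.BatchNorm2d(512), nn.ReLU(True),
            nn.MaxPool2d((2, 2), (2, 1), (0, 1)),
            nn.Conv2d(512, 512, 2, 1, 0), nn.ReLU(True), # -> feature map of height 1
        )
        self.rnn = nn.LSTM(512, hidden, num_layers=2, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):
        f = self.cnn(x)                       # (B, 512, 1, W')
        seq = f.squeeze(2).permute(2, 0, 1)   # map-to-sequence: (W', B, 512)
        out, _ = self.rnn(seq)                # two-layer bidirectional LSTM
        return self.fc(out).log_softmax(2)    # per-step log-probabilities for CTC
```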
The model constructed by the invention uses a Mask R-CNN model to achieve pixel-level classification and outputs a mask for digit recognition; a CRNN model then performs the digit recognition, and the introduction of the bidirectional LSTM and CTC significantly improves recognition accuracy.
It should be noted that after the training of the recognition model is completed, the validation and test data sets are used to verify and test the model.
Example 2
This embodiment provides a digital dial plate identification method based on Mask R-CNN and CRNN: after the digital dial plate recognition model based on Mask R-CNN and CRNN of Embodiment 1 is constructed, the original image to be recognized is input into the model and the recognition result is output.
Specifically, the recognition model of Embodiment 1 is embedded into the front-end software of the gas meter to form a digital recognition mode that integrates photographing, classification and recognition; the model is used in the following steps (see the sketch after this list):
acquiring a gas meter image: the meter is photographed manually and the image is uploaded to the front end to obtain the original image;
inputting the original image into the recognition model;
running the recognition model, outputting the digits to be recognized in the target region, and feeding the result back to the front end.
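A hedged end-to-end sketch of this flow; mask_rcnn, mask_to_quad, crnn and ctc_greedy_decode are hypothetical stand-ins for the trained segmentation model, a mask corner-extraction helper, the trained CRNN and a CTC decoder, none of which are defined by the patent:

```python
import cv2
import numpy as np
import torch

img = cv2.imread("gas_meter.jpg")                    # 1. photo uploaded to the front end
quad = mask_to_quad(mask_rcnn(img))                  # 2. four corners of the digit region (hypothetical helpers)
dst = np.float32([[0, 0], [256, 0], [256, 32], [0, 32]])
M = cv2.getPerspectiveTransform(quad.astype(np.float32), dst)
rect = cv2.warpPerspective(img, M, (256, 32))        #    quadrilateral -> rectangle

gray = cv2.cvtColor(rect, cv2.COLOR_BGR2GRAY)
x = torch.from_numpy(gray).float().div(255.0)[None, None]  # (1, 1, 32, 256)
with torch.no_grad():
    log_probs = crnn(x)                              # 3. per-column digit log-probabilities
print(ctc_greedy_decode(log_probs))                  #    reading fed back to the front end
```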
Example 3
This embodiment provides a terminal comprising a processor, a memory and a digital dial plate recognition program stored in the memory; when the program is executed by the processor, the steps of the Mask R-CNN and CRNN-based digital dial plate recognition method of Embodiment 2 are implemented.
Example 4
The present embodiment provides a computer-readable storage medium, on which computer instructions are stored, and when the computer instructions are executed by a processor, the steps of the Mask R-CNN and CRNN-based digital dial plate recognition method according to embodiment 2 are implemented.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Each functional unit in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated module may be stored in a computer-readable storage medium if it is implemented in the form of a software functional unit and sold or used as a separate product. Based on such understanding, all or part of the flow in the method of the embodiments described above may be implemented by a computer program, which may be stored in a computer-readable storage medium and can implement the steps of the embodiments of the methods described above when the computer program is executed by a processor. The computer program includes computer program code, and the computer program code may be in a source code form, an object code form, an executable file or some intermediate form.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (9)

1. A digital dial plate recognition model based on Mask R-CNN and CRNN is characterized in that the model is obtained by training according to the following method:
step 1: preprocessing the collected original digital dial plate images, then dividing them into training, validation and test data sets in a 6:2:2 ratio;
step 2: labeling the training set images, using the labelme tool to annotate the regions of the original images that need to be located and segmented, to obtain label images;
step 3: inputting the training set images into a ResNet101 model for feature extraction, fusing the feature maps with an FPN to obtain fused feature maps, and inputting the fused feature maps into the region proposal network (RPN) and the ROIAlign layer to obtain regions of interest (ROIs);
step 4: running an FCN on the fused feature maps and outputting a mask; classifying the ROIs and outputting a class and a bounding box;
step 5: computing the four corner coordinates of the region to be recognized in the mask, transforming the quadrilateral recognition region into a rectangle by perspective transformation, and saving the rectangular image;
and step 6: feeding the saved rectangular image into the convolutional recurrent neural network (CRNN) for digit recognition.
2. The Mask R-CNN and CRNN-based digital dial plate recognition model of claim 1, wherein: in step 3, the ResNet101 model comprises five stages, conv1, conv2_x, conv3_x, conv4_x and conv5_x, whose outputs are C1, C2, C3, C4 and C5 respectively;
each lower feature layer is passed through a 1×1 convolution so that its channel count matches the layer above, the upper feature layer is upsampled to the length and width of the layer below, and the two are added, yielding the fused feature layers P2-P6 and completing the FPN fusion of the feature maps; feature layers P2-P5 are used to predict the class, box and mask of the object, and feature layers P2-P6 are used to train the RPN.
3. The Mask R-CNN and CRNN-based digital dial identification model according to claim 1, wherein the candidate region generation network RPN adopts a tree structure, the trunk is a 3 × 3 convolutional layer, the branches are two 1 × 1 convolutional layers, the first 1 × 1 convolutional layer is used to determine whether the target frame anchor covers the target, and the second 1 × 1 convolutional layer is used to perform coordinate correction on the target frame anchor in the foreground, thereby outputting a corrected frame ROI.
4. The Mask R-CNN and CRNN-based digital dial plate recognition model of claim 3, wherein the ROI passes through a fully connected network to obtain the final detection class and bounding box, and the loss function of each ROI is:
$$L = L_{cls} + L_{box} + L_{mask}$$

$$L_{cls} = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)$$

$$L_{box} = \frac{1}{N_{reg}} \sum_i p_i^* \, \mathrm{smooth}_{L_1}(t_i - t_i^*)$$

$$L_{mask} = -\frac{1}{m^2} \sum_{1 \le i,j \le m} \left[ y_{ij} \log \hat{y}_{ij} + (1 - y_{ij}) \log(1 - \hat{y}_{ij}) \right]$$

where L_{cls} and L_{box} are the classification and regression losses respectively; N_{cls} is the classification weight coefficient; p_i is the predicted probability that anchor i is the target; p_i^* is the binary ground-truth label, equal to 1 if the anchor contains an object and 0 otherwise; N_{reg} is the regression weight coefficient; t_i^* are the ground-truth coordinates and t_i the predicted four-point coordinates;
the mask branch outputs K·m·m dimensions for each ROI, where m is the mask size and K the number of categories; after the predicted mask is obtained, a sigmoid is applied to each pixel value of the mask and the result is the input of L_{mask}, the average binary cross-entropy over the m×m pixels, where \hat{y}_{ij} is the sigmoid output at pixel (i, j) and y_{ij} the corresponding ground-truth label.
5. The Mask R-CNN and CRNN-based digital dial recognition model of claim 4, wherein the formula of perspective transformation in step 5 is:
$$[x', y', w'] = [u, v, \omega] \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$

where u and v are the coordinates of the original picture and the parameter ω equals 1; the picture coordinates after the perspective transformation are

$$x = \frac{x'}{w'}, \qquad y = \frac{y'}{w'}$$
6. the Mask R-CNN and CRNN-based digital dial identification model according to claim 5, wherein the convolutional recurrent neural network model CRNN in step 6 includes three parts, from bottom to top:
the convolutional layers (CNN), which use a deep CNN to extract features from the input image, obtain a feature map and extract a sequence of feature vectors;
the recurrent layers (RNN), which use a bidirectional LSTM to predict the feature vector sequence, learning each feature vector in the sequence and outputting the predicted label distribution;
the transcription layer (CTC loss), which converts the sequence of label distributions obtained from the recurrent layers into the final label sequence;
wherein digit recognition with the saved rectangular image as the input of the CRNN comprises:
feeding the saved rectangular image into the deep CNN for feature extraction to obtain a digit feature map; converting the feature map into a sequence of feature vectors and inputting it into a BLSTM network, which predicts the sequence and outputs a predicted label distribution; and finally converting the obtained label distribution into the final label sequence with CTC loss to obtain the predicted value, i.e. the true value of the recognized digits.
7. A digital dial plate identification method based on Mask R-CNN and CRNN, characterized in that: after the digital dial plate recognition model based on Mask R-CNN and CRNN of any one of claims 1 to 6 is constructed, the original image to be recognized is input into the model and the recognition result is output.
8. A terminal, characterized in that: it comprises a processor, a memory and a digital dial plate recognition program stored in the memory; when the program is executed by the processor, the steps of the Mask R-CNN and CRNN-based digital dial plate identification method of claim 7 are implemented.
9. A computer-readable storage medium having stored thereon computer instructions, characterized in that: the computer instructions, when executed by a processor, implement the steps of the Mask R-CNN and CRNN-based digital dial identification method of claim 7.
CN202110559663.7A 2021-05-21 2021-05-21 Digital dial plate identification method based on Mask R-CNN and CRNN Pending CN113378812A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110559663.7A CN113378812A (en) 2021-05-21 2021-05-21 Digital dial plate identification method based on Mask R-CNN and CRNN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110559663.7A CN113378812A (en) 2021-05-21 2021-05-21 Digital dial plate identification method based on Mask R-CNN and CRNN

Publications (1)

Publication Number Publication Date
CN113378812A true CN113378812A (en) 2021-09-10

Family

ID=77571662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110559663.7A Pending CN113378812A (en) 2021-05-21 2021-05-21 Digital dial plate identification method based on Mask R-CNN and CRNN

Country Status (1)

Country Link
CN (1) CN113378812A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107480730A (en) * 2017-09-05 2017-12-15 广州供电局有限公司 Power equipment identification model construction method and system, the recognition methods of power equipment
CN110599448A (en) * 2019-07-31 2019-12-20 浙江工业大学 Migratory learning lung lesion tissue detection system based on MaskScoring R-CNN network
CN112418225A (en) * 2020-10-16 2021-02-26 中山大学 Offline character recognition method for address scene recognition
CN112364778A (en) * 2020-11-12 2021-02-12 上海明华电力科技有限公司 Power plant safety behavior information automatic detection method based on deep learning
CN112699876A (en) * 2021-03-24 2021-04-23 中海油能源发展股份有限公司采油服务分公司 Automatic reading method for various meters of gas collecting station

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082922A (en) * 2022-08-24 2022-09-20 济南瑞泉电子有限公司 Water meter digital picture processing method and system based on deep learning
CN115482491A (en) * 2022-09-23 2022-12-16 湖南大学 Bridge defect identification method and system based on transformer
CN116343223A (en) * 2023-05-31 2023-06-27 南京畅洋科技有限公司 Character wheel type water meter reading method based on deep learning

Similar Documents

Publication Publication Date Title
CN109886121B (en) Human face key point positioning method for shielding robustness
CN109299274B (en) Natural scene text detection method based on full convolution neural network
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN109583483B (en) Target detection method and system based on convolutional neural network
CN107818554B (en) Information processing apparatus and information processing method
US20160364633A1 (en) Font recognition and font similarity learning using a deep neural network
CN113378812A (en) Digital dial plate identification method based on Mask R-CNN and CRNN
CN112800903B (en) Dynamic expression recognition method and system based on space-time diagram convolutional neural network
CN109872278B (en) Image cloud layer removing method based on U-shaped network and generation countermeasure network
US11853892B2 (en) Learning to segment via cut-and-paste
CN111401293B (en) Gesture recognition method based on Head lightweight Mask scanning R-CNN
CN110263768A (en) A kind of face identification method based on depth residual error network
CN114187450A (en) Remote sensing image semantic segmentation method based on deep learning
CN110738102A (en) face recognition method and system
CN116645592B (en) Crack detection method based on image processing and storage medium
CN111695633A (en) Low-illumination target detection method based on RPF-CAM
CN110969171A (en) Image classification model, method and application based on improved convolutional neural network
CN111553438A (en) Image identification method based on convolutional neural network
CN113011253B (en) Facial expression recognition method, device, equipment and storage medium based on ResNeXt network
CN114998220A (en) Tongue image detection and positioning method based on improved Tiny-YOLO v4 natural environment
CN112364974B (en) YOLOv3 algorithm based on activation function improvement
CN112861718A (en) Lightweight feature fusion crowd counting method and system
CN110826534B (en) Face key point detection method and system based on local principal component analysis
CN112329771A (en) Building material sample identification method based on deep learning
CN115115552B (en) Image correction model training method, image correction device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210910)