CN111027443B - Bill text detection method based on multitask deep learning

Info

Publication number: CN111027443B
Application number: CN201911225976.8A
Authority: CN
Inventors: 刘桂雄; 刘思洋
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2019-12-04
Filing date: 2019-12-04
Publication date: 2023-04-07
Anticipated expiration: 2039-12-04
Also published as: CN111027443A

Abstract

The invention provides a bill text detection method based on multitask deep learning, which comprises the following steps: constructing a multilayer convolutional neural network as an image feature extraction backbone network to extract features from the bill image; marking the bill text regions and region center lines on the convolutional feature map and training the network to achieve bill text information region segmentation and text center line detection; advancing along the text center line within the bill text information region by a sliding window method to segment the region into single characters; and classifying and recognizing the segmented single characters in sequence to form the complete bill text information. By exploiting the strong feature extraction and induction capabilities of deep learning, the invention provides an end-to-end multi-task learning method that achieves bill text region segmentation, text character segmentation, and text character recognition, and addresses the limited applicability and low efficiency of classical bill text information detection methods.

Description

Bill text detection method based on multitask deep learning

Technical Field

The invention relates to the field of bill anti-counterfeiting identification, in particular to a bill text detection method based on multi-task deep learning.

Background

Visual detection and recognition technology is widely used because of its high accuracy, non-contact operation, and good applicability. Bill image text information is characterized by diverse text information regions and by intermixed Chinese characters, digits, and English letters. At present, bill image text information is read manually; the work is tedious and highly repetitive, and misreadings occur easily when attention lapses under fatigue. Research on methods for acquiring bill image text information by machine is therefore a research focus in this field.

In recent years, with the rapid development of the electronic hardware and information industries, the computing power of computers has improved rapidly, making large-scale image computation and inference possible. Image detection methods based on deep learning have been applied to image text information acquisition with remarkable results. A text information detection method based on deep learning uses multilayer convolution operations to extract image features layer by layer, then processes and summarizes those features, and combines multiple tasks (text information region localization, text character segmentation, and text character classification and recognition) into an efficient and broadly applicable text information reading method. Manual detection and traditional image classification methods have shortcomings in bill text information detection, whereas bill text information acquisition based on deep learning offers broad applicability and high detection efficiency, and helps advance the digitization and intelligent automation of the financial industry.

Disclosure of Invention

In order to solve the above problems and defects, the invention provides a bill text detection method based on multi-task deep learning. The method divides bill text detection into three tasks (bill text region segmentation, bill text character segmentation, and bill text character classification and recognition), integrates the three tasks into one deep learning framework, and uses supervised learning to acquire the bill text information, thereby solving problems such as the dependence on manual labor in conventional bill text information acquisition.

The purpose of the invention is realized by the following technical scheme:

A bill text detection method based on multitask deep learning comprises the following steps:

A. constructing a multilayer convolutional neural network as an image feature extraction backbone network to realize feature extraction of the bill image;

B. marking the bill text regions and region center lines on the convolutional feature map and training the network to realize bill text information region segmentation and text center line detection;

C. advancing along the text center line in the bill text information region by a sliding window method to realize single character segmentation of the bill text information region;

D. sequentially classifying and recognizing the segmented single characters to form the complete bill text information.

The invention has the beneficial effects that:

By exploiting the strengths of deep learning in feature extraction, induction, and reasoning, bill text detection is divided into three tasks (bill text region segmentation, bill text character segmentation, and bill text character classification and recognition), and a deep neural network is trained on a large amount of labeled data, so that bill text information is detected and recognized efficiently and accurately.

Drawings

FIG. 1 is a flow chart of a method for detecting a bill text based on multitask deep learning according to the present invention.

Detailed Description

The present invention will be described in further detail with reference to the following examples and the accompanying drawings.

The invention discloses a bill text detection method based on multitask deep learning, which comprises the following steps:

Step 10, constructing a multilayer convolutional neural network as an image feature extraction backbone network to realize feature extraction of the bill image:

Dilated (atrous) convolution is introduced into the convolutional layers of the feature extraction backbone network: bilinear interpolation is applied to the feature map produced by the previous layer's convolution to enlarge the resolution of the convolutional feature map, and the layer's convolution is then performed, so that the convolutional receptive field is enlarged while the convolution kernel parameters remain unchanged, yielding richer bill image features;
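The receptive-field effect described above can be illustrated with a minimal sketch. Note that the text describes the operation in terms of interpolating the feature map, whereas the sketch below shows the standard dilated-convolution formulation, in which the kernel taps are spaced apart instead; this is a swapped-in illustration, not the patent's implementation. The function names `effective_kernel_size` and `dilated_conv2d`, the square kernel, and the absence of padding are all assumptions made for the example.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel along one axis: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv2d(image, kernel, dilation=1):
    """Stride-1, unpadded 2D dilated cross-correlation on nested lists.

    Assumes a square kernel and a rectangular image (hypothetical helper).
    """
    k = len(kernel)
    span = effective_kernel_size(k, dilation)
    out_h = len(image) - span + 1
    out_w = len(image[0]) - span + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    # Kernel taps are spaced `dilation` pixels apart.
                    acc += kernel[i][j] * image[r + i * dilation][c + j * dilation]
            row.append(acc)
        output.append(row)
    return output


# A 7x7 ramp image and a 3x3 kernel: 9 weights regardless of dilation.
image = [[float(r * 7 + c) for c in range(7)] for r in range(7)]
kernel = [[1.0] * 3 for _ in range(3)]

dense = dilated_conv2d(image, kernel, dilation=1)    # each output sees a 3x3 patch
dilated = dilated_conv2d(image, kernel, dilation=2)  # each output sees a 5x5 patch
```

With the same nine kernel weights, a dilation of 2 lets each output value draw on a 5x5 neighborhood instead of 3x3, which is the "larger receptive field with unchanged kernel parameters" property the step relies on.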

During extraction of the bill image features by the multilayer convolutional neural network, the output feature vectors of the lower convolutional layers and the output vectors of the higher convolutional layers are concatenated to form the final output feature map, so as to retain the edge and texture features of the lower convolutional layers and the semantic features of the higher convolutional layers.
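The fusion of low-level and high-level features can be sketched as a channel-wise concatenation. The text does not say how feature maps of different resolution are aligned, so the nearest-neighbor upsampling below is an assumption, as are the helper names `upsample_nearest` and `concat_channels` and the toy `[channels][height][width]` nested-list layout.

```python
def upsample_nearest(feature_map, factor):
    """Upsample a [channels][height][width] nested list by an integer factor."""
    upsampled = []
    for channel in feature_map:
        rows = []
        for row in channel:
            wide_row = [value for value in row for _ in range(factor)]
            rows.extend([list(wide_row) for _ in range(factor)])
        upsampled.append(rows)
    return upsampled


def concat_channels(low_level, high_level):
    """Concatenate two [C][H][W] feature maps along the channel axis."""
    same_height = len(low_level[0]) == len(high_level[0])
    same_width = len(low_level[0][0]) == len(high_level[0][0])
    if not (same_height and same_width):
        raise ValueError("spatial sizes must match before concatenation")
    return low_level + high_level


# Toy data: 2 low-level channels at 4x4, 3 high-level channels at 2x2.
low = [[[c] * 4 for _ in range(4)] for c in range(2)]
high = [[[10 + c, 20 + c], [30 + c, 40 + c]] for c in range(3)]

# Bring the coarse map to 4x4 (assumed alignment step), then stack channels.
fused = concat_channels(low, upsample_nearest(high, 2))
```

The fused map has 2 + 3 = 5 channels at the finer resolution, so later layers see both the edge and texture responses and the semantic responses at every position.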

Step 20, marking the bill text regions and region center lines on the convolutional feature map and training the network to realize bill text information region segmentation and text center line detection;

The parameters for bill text information region segmentation and text center line detection comprise the center-line pixel coordinates (x_i, y_i), the offset from the center-line pixel to the upper boundary of the text region, and the offset from the center-line pixel to the lower boundary of the text region.

The network is trained with these parameters as output targets to obtain the bill text information region segmentation and text region center line detection results.
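To make the parameterization concrete, the following minimal sketch rebuilds a text region outline from center-line pixels and their two boundary offsets. It assumes the offsets are measured vertically in image coordinates (y increasing downward); the text does not state the offset direction, and the function name `region_polygon` is hypothetical.

```python
def region_polygon(center_line, upper_offsets, lower_offsets):
    """Build a closed text-region polygon from a center line and offsets.

    Assumption: offsets are vertical distances in image coordinates, so the
    upper boundary lies at y - upper and the lower boundary at y + lower.
    """
    upper = [(x, y - up) for (x, y), up in zip(center_line, upper_offsets)]
    lower = [(x, y + down) for (x, y), down in zip(center_line, lower_offsets)]
    # Walk the upper boundary left to right, then the lower one right to left.
    return upper + lower[::-1]


# Three center-line pixels (x_i, y_i) with their boundary offsets.
center_line = [(10, 50), (20, 51), (30, 52)]
upper_offsets = [8, 8, 9]
lower_offsets = [7, 8, 8]

polygon = region_polygon(center_line, upper_offsets, lower_offsets)
```

Because each center-line pixel carries its own offsets, the resulting outline can follow slightly tilted or curved text lines rather than being limited to an axis-aligned rectangle.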

Step 30, advancing along the text center line in the bill text information region by a sliding window method to realize the single character segmentation of the bill text information region;

Within the bill text information region, a sliding window advances along the text center line, and for each pixel (x_i, y_i) on the center line, the distances from the upper-left, upper-right, lower-left, and lower-right vertices of each character to that center-line pixel are predicted.

The true distances of each character from the center line are used as the corresponding ground-truth values.

A loss function is constructed over these distances, wherein α_lt, α_rt, α_ld, and α_rd are the correction terms of the respective distance losses and control the proportion of each distance loss.
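The exact loss formula is not recoverable from the text, so the sketch below shows only one plausible form: a weighted sum of per-vertex mean absolute errors, with one weight per vertex playing the role of α_lt, α_rt, α_ld, and α_rd. The choice of absolute error, the averaging over characters, and the name `weighted_distance_loss` are all assumptions for illustration.

```python
VERTICES = ("lt", "rt", "ld", "rd")  # upper-left, upper-right, lower-left, lower-right


def weighted_distance_loss(predicted, target, alpha):
    """Weighted sum of per-vertex mean absolute distance errors.

    Assumed form only: the source does not give the formula. `predicted` and
    `target` map each vertex key to a list of distances, one per character;
    `alpha` maps each vertex key to its weight.
    """
    total = 0.0
    for key in VERTICES:
        errors = [abs(p - t) for p, t in zip(predicted[key], target[key])]
        total += alpha[key] * sum(errors) / len(errors)
    return total


# Two characters; distances in pixels.
predicted = {"lt": [5.0, 6.0], "rt": [5.0, 4.0], "ld": [7.0, 7.0], "rd": [6.0, 8.0]}
target = {"lt": [4.0, 6.0], "rt": [5.0, 5.0], "ld": [7.0, 7.0], "rd": [8.0, 8.0]}

equal_weights = {"lt": 1.0, "rt": 1.0, "ld": 1.0, "rd": 1.0}
damped_rd = {"lt": 1.0, "rt": 1.0, "ld": 1.0, "rd": 0.5}

loss_equal = weighted_distance_loss(predicted, target, equal_weights)
loss_damped = weighted_distance_loss(predicted, target, damped_rd)
```

Lowering one weight reduces that vertex's share of the total, which is the balancing role the correction terms are described as playing.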

Step 40, classifying and recognizing the segmented single characters in sequence to form the complete bill text information:

A Softmax multi-class classifier is pre-trained on a character image data set, and is then applied in sequence to classify and recognize each single character obtained by the segmentation in Step 30, forming the complete bill text information.
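The final recognition step can be sketched as applying a Softmax over per-character class scores and reading off the most probable class for each segmented character in order. The scores here are hand-written stand-ins for the output of a trained classifier, and the four-symbol `charset` and the function names `softmax` and `recognize` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable Softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def recognize(char_logits, charset):
    """Classify each segmented character in order and join the results."""
    text = []
    for logits in char_logits:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        text.append(charset[best])
    return "".join(text)


# Hypothetical class list and per-character scores (one row per character,
# in the left-to-right order produced by the segmentation step).
charset = ["0", "1", "A", "票"]
char_logits = [
    [0.1, 2.0, 0.3, 0.0],
    [3.0, 0.2, 0.1, 0.0],
    [0.0, 0.1, 0.2, 4.0],
]

text = recognize(char_logits, charset)
```

Keeping the characters in segmentation order is what turns independent per-character classifications into a readable text string.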

Although the embodiments of the present invention have been described above, the above descriptions are only for the convenience of understanding the present invention, and are not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A bill text detection method based on multitask deep learning is characterized by comprising the following steps:

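A. constructing a multilayer convolutional neural network as an image feature extraction backbone network to realize feature extraction of the bill image;

B. marking the bill text regions and region center lines on the convolutional feature map and training the network to realize bill text information region segmentation and text center line detection;

C. advancing along the text center line in the bill text information region by a sliding window method to realize single character segmentation of the bill text information region;

D. sequentially classifying and recognizing the segmented single characters to form the complete bill text information;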

in step B, the network is trained with the parameters of bill text information region segmentation and text center line detection as output targets to obtain the bill text information region segmentation and text region center line detection results; the parameters comprise the center-line pixel coordinates (x_i, y_i), the offset from the center-line pixel to the upper boundary of the text region, and the offset from the center-line pixel to the lower boundary of the text region;

in step C, for each pixel (x_i, y_i) on the text center line, the distances from the upper-left, upper-right, lower-left, and lower-right vertices of each character to the center-line pixel are predicted; the true distances of each character from the center line are used as the corresponding ground-truth values; and a loss function is constructed over these distances.

2. The bill text detection method based on multitask deep learning as claimed in claim 1, wherein in step A, dilated (atrous) convolution is introduced into the convolutional layers of the feature extraction backbone network, that is, bilinear interpolation is applied to the feature map produced by the previous layer's convolution to enlarge the resolution of the convolutional feature map, and the layer's convolution is then performed, so that the convolutional receptive field is enlarged while the convolution kernel parameters remain unchanged, yielding richer bill image features.

3. The bill text detection method based on multitask deep learning as claimed in claim 1, wherein in step A, during extraction of the bill image features by the multilayer convolutional neural network, the output feature vectors of the lower convolutional layers and the output vectors of the higher convolutional layers are concatenated to form the final output feature map, so as to retain the edge and texture features of the lower convolutional layers and the semantic features of the higher convolutional layers.

4. The bill text detection method based on multitask deep learning as claimed in claim 1, wherein in step D, a Softmax multi-class character classifier is pre-trained on a character image data set, and the single characters obtained by the segmentation in step C are classified and recognized in sequence to form the complete bill text information.