CN111027443B - Bill text detection method based on multitask deep learning - Google Patents
Bill text detection method based on multitask deep learning
- Publication number
- CN111027443B (application CN201911225976.8A)
- Authority
- CN
- China
- Prior art keywords
- text
- bill
- center line
- text information
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Character Input (AREA)
Abstract
The invention provides a bill text detection method based on multitask deep learning, comprising the following steps: constructing a multilayer convolutional neural network as the image feature extraction backbone network to extract features from the bill image; labeling the bill text regions and region center lines on the convolutional feature map and training the network to segment bill text information regions and detect text center lines; advancing along the text center line within each bill text information region with a sliding window to segment the region into single characters; and classifying and recognizing the segmented characters in sequence to form the complete bill text information. By exploiting the strong feature extraction and induction capabilities of deep learning, the invention provides an end-to-end multitask learning method that performs bill text region segmentation, text character segmentation and text character recognition, and addresses the limited applicability and low efficiency of classical bill text information detection methods.
Description
Technical Field
The invention relates to the field of bill anti-counterfeiting identification, in particular to a bill text detection method based on multi-task deep learning.
Background
Visual detection and recognition technology is widely used because it is accurate, contactless and broadly applicable. Bill images contain many kinds of text information regions, in which Chinese characters, digits and English letters are mixed. At present, the text information in bill images is read manually; the work is tedious and highly repetitive, and fatigue and lapses of attention easily lead to misreading. Machine-based acquisition of bill image text information is therefore a research focus in this field.
In recent years, rapid development of the electronic hardware and information industries has greatly increased computing power, making large-scale image computation and inference feasible. Deep-learning-based image detection has been applied to image text information acquisition with remarkable results. Such methods use multilayer convolution to extract image features layer by layer, then process and aggregate those features, and combine several tasks (text information region localization, text character segmentation, and text character classification and recognition) into an efficient and general text reading method. Manual inspection and traditional image classification methods have shortcomings in bill text information detection, whereas deep-learning-based bill text information acquisition is general and efficient, and helps advance the digitization and intelligent automation of the financial industry.
Disclosure of Invention
To address the above problems and shortcomings, the invention provides a bill text detection method based on multitask deep learning. The method divides bill text detection into three tasks (bill text region segmentation, bill text character segmentation, and bill text character classification and recognition), integrates the three tasks into one deep learning framework, and uses supervised learning to acquire bill text information, thereby addressing problems such as the reliance on manual labor in conventional bill text information acquisition.
The purpose of the invention is realized by the following technical scheme:
A bill text detection method based on multitask deep learning comprises the following steps:
A, constructing a multilayer convolutional neural network as the image feature extraction backbone network to extract features from the bill image;
B, labeling the bill text region and the region center line on the convolutional feature map and training the network to realize bill text information region segmentation and text center line detection;
C, advancing along the text center line within the bill text information region with a sliding window to segment the region into single characters;
D, classifying and recognizing the segmented single characters in sequence to form the complete bill text information.
The invention has the beneficial effects that:
By exploiting the advantages of deep learning in feature extraction, induction and inference, bill text detection is divided into three tasks (bill text region segmentation, bill text character segmentation, and bill text character classification and recognition), and a deep neural network is trained with a large amount of labeled data, so that bill text information is detected and recognized efficiently and accurately.
Drawings
FIG. 1 is a flow chart of a method for detecting a bill text based on multitask deep learning according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following examples and the accompanying drawings.
The invention discloses a bill text detection method based on multitask deep learning, which comprises the following steps:
Dilated (atrous) convolution is introduced into the convolution layers of the feature extraction backbone network: bilinear interpolation is applied to the feature map output by the previous convolution layer to enlarge its resolution, and the convolution of the current layer is then performed, so that the convolutional receptive field is enlarged without changing the convolution kernel parameters and richer bill image features are obtained.
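The two-step operation described above (bilinear interpolation of the previous layer's feature map, followed by an ordinary convolution with unchanged kernel parameters) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the corner-aligned interpolation, the scale factor of 2 and the unpadded ("valid") convolution are all hypothetical choices. Note that deep learning libraries usually implement dilated convolution differently, by spacing the kernel taps apart.

```python
def bilinear_upsample(fmap, scale=2):
    """Enlarge a 2-D feature map (list of rows) by bilinear interpolation.

    Assumption: corner-aligned sampling, so the four corner values are preserved.
    """
    h, w = len(fmap), len(fmap[0])
    out_h, out_w = h * scale, w * scale
    out = []
    for i in range(out_h):
        # Map the output row index to a fractional row in the source map.
        y = i * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = j * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            dx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
            bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


def conv2d_valid(fmap, kernel):
    """Plain 2-D convolution (cross-correlation) without padding."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(fmap), len(fmap[0])
    return [
        [
            sum(fmap[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(w - kw + 1)
        ]
        for i in range(h - kh + 1)
    ]


def upsample_then_conv(fmap, kernel, scale=2):
    """Apply the same kernel to the interpolated (enlarged) feature map."""
    return conv2d_valid(bilinear_upsample(fmap, scale), kernel)
```

The kernel passed to `upsample_then_conv` is the same one that would be applied to the original map, which reflects the statement that the convolution kernel parameters are not changed.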
During extraction of the bill image features by the multilayer convolutional neural network, the output feature vectors of the lower convolution layers and those of the higher convolution layers are concatenated to form the final output feature map, which retains the edge and texture features of the lower layers together with the semantic features of the higher layers.
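The concatenation of low-layer and high-layer features can be illustrated with a short sketch. It assumes, hypothetically, that feature maps are stored as nested lists indexed `[channel][row][column]`, that the high-layer map has a lower resolution by an integer factor, and that it is brought to the low-layer resolution by nearest-neighbour upsampling before the channels are joined. The text does not specify how resolutions are matched, so that step is an assumption.

```python
def upsample_nearest(channels, scale):
    """Nearest-neighbour upsampling of a [C][H][W] feature map."""
    out = []
    for ch in channels:
        h, w = len(ch), len(ch[0])
        out.append([
            [ch[i // scale][j // scale] for j in range(w * scale)]
            for i in range(h * scale)
        ])
    return out


def concat_features(low, high):
    """Concatenate low-layer and high-layer feature maps along the channel axis.

    low  : [C_low][H][W]   edge and texture features
    high : [C_high][h][w]  semantic features, with H = s*h and W = s*w
    """
    low_h, low_w = len(low[0]), len(low[0][0])
    high_h, high_w = len(high[0]), len(high[0][0])
    if low_h % high_h or low_w % high_w or low_h // high_h != low_w // high_w:
        raise ValueError("resolutions must differ by one integer scale factor")
    scale = low_h // high_h
    # Channel-wise concatenation: low-layer channels first, then high-layer ones.
    return low + upsample_nearest(high, scale)
```

With 2 low-layer channels at 4x4 and 3 high-layer channels at 2x2, the fused map has 5 channels at 4x4.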
The parameters for bill text information region segmentation and text center line detection comprise the center line pixel coordinates (x_i, y_i), the offset from each center line pixel to the upper boundary of the text region, and the offset from each center line pixel to the lower boundary of the text region. The network is trained with these parameters as output targets to obtain the bill text information region segmentation and text region center line detection results.
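To make the role of these parameters concrete, the sketch below rebuilds a text region outline from center line pixels and their two boundary offsets. It is an illustration under assumptions: the offsets are taken to be vertical pixel distances to the upper and lower boundary, image coordinates are assumed to have y growing downward, and the function name `region_polygon` is hypothetical.

```python
def region_polygon(center_line, up_offsets, down_offsets):
    """Rebuild a text region outline from center line pixels and boundary offsets.

    center_line  : list of (x_i, y_i) pixels ordered left to right
    up_offsets   : distance from each center line pixel to the upper boundary
    down_offsets : distance from each center line pixel to the lower boundary
    Returns the polygon vertices: upper boundary left to right,
    then lower boundary right to left.
    """
    if not (len(center_line) == len(up_offsets) == len(down_offsets)):
        raise ValueError("one offset pair is required per center line pixel")
    # y grows downward in image coordinates, so the upper boundary has smaller y.
    upper = [(x, y - up) for (x, y), up in zip(center_line, up_offsets)]
    lower = [(x, y + down) for (x, y), down in zip(center_line, down_offsets)]
    return upper + lower[::-1]
```

For a horizontal center line from (0, 10) to (5, 10) with upper offsets of 3 and lower offsets of 2, the outline is the rectangle with corners (0, 7), (5, 7), (5, 12) and (0, 12).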
Within the bill text information region, a sliding window advances along the text center line. For each pixel (x_i, y_i) on the center line, the distances from the upper-left, upper-right, lower-left and lower-right vertices of each character to the center line pixel are predicted. A loss function is constructed from these predicted distances and the true distances of each character's vertices from the center line, wherein α_lt, α_rt, α_ld and α_rd are correction terms for the individual distance losses that control the weight of each distance loss.
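The loss formula itself is not reproduced in this text, so the following is only a sketch of one plausible form: a weighted sum of squared errors between predicted and true vertex distances, with one weight per vertex. The squared-error form, the averaging over center line pixels, and the names `corner_distance_loss` and `mean_corner_loss` are assumptions made for illustration; only the four weights α_lt, α_rt, α_ld, α_rd come from the description.

```python
# Vertex keys follow the subscripts used in the description:
# lt = upper left, rt = upper right, ld = lower left, rd = lower right.
CORNERS = ("lt", "rt", "ld", "rd")


def corner_distance_loss(pred, true, alpha):
    """Weighted squared-error loss over the four vertex distances.

    pred, true : dicts mapping each vertex key to a distance
    alpha      : dict mapping each vertex key to its correction term (weight)
    Assumption: squared error; the source does not state the exact form.
    """
    return sum(alpha[c] * (pred[c] - true[c]) ** 2 for c in CORNERS)


def mean_corner_loss(preds, trues, alpha):
    """Average the per-pixel loss over all center line pixels of a region."""
    if len(preds) != len(trues) or not preds:
        raise ValueError("need one prediction per ground-truth entry")
    total = sum(corner_distance_loss(p, t, alpha) for p, t in zip(preds, trues))
    return total / len(preds)
```

Raising one weight relative to the others makes errors on that vertex dominate the total, which is the sense in which the correction terms control the contribution of each distance loss.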
A Softmax multi-class classifier is pre-trained on a character image data set and is then applied in sequence to classify and recognize the single characters obtained by the segmentation in step 30, forming the complete bill text information.
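The final recognition step can be sketched as follows, assuming the classifier has already produced one score vector (logits) per segmented character. The function names, the example label set and the score values are hypothetical; training of the classifier is not shown.

```python
import math


def softmax(logits):
    """Numerically stable Softmax over one score vector."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def recognize(logit_rows, labels):
    """Classify each segmented character in order and join the results.

    logit_rows : one score vector per segmented character, in reading order
    labels     : the character class for each position in a score vector
    """
    chars = []
    for logits in logit_rows:
        probs = softmax(logits)
        # Pick the class with the highest Softmax probability.
        best = max(range(len(probs)), key=probs.__getitem__)
        chars.append(labels[best])
    return "".join(chars)
```

Because the characters are classified in the order in which the sliding window produced them, joining the predicted labels yields the text of the region.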
Although the embodiments of the present invention have been described above, the above descriptions are only for the convenience of understanding the present invention, and are not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (4)
1. A bill text detection method based on multitask deep learning is characterized by comprising the following steps:
A, constructing a multilayer convolutional neural network as the image feature extraction backbone network to extract features from the bill image;
B, labeling the bill text region and the region center line on the convolutional feature map and training on them to realize bill text information region segmentation and text center line detection;
C, advancing along the text center line within the bill text information region with a sliding window to segment the region into single characters;
D, classifying and recognizing the segmented single characters in sequence to form the complete bill text information;
wherein in step B, the network is trained with the parameters of bill text information region segmentation and text center line detection as output targets to obtain the bill text information region segmentation and text region center line detection results; the parameters comprise the center line pixel coordinates (x_i, y_i), the offset from the center line pixel to the upper boundary of the text region, and the offset from the center line pixel to the lower boundary of the text region;
in step C, for each pixel (x_i, y_i) on the text center line, the distances from the upper-left, upper-right, lower-left and lower-right vertices of each character to the center line pixel are predicted, and a loss function is constructed from the predicted distances and the true distances of each character's vertices from the center line,
wherein α_lt, α_rt, α_ld and α_rd are correction terms for the individual distance losses that control the weight of each distance loss.
2. The bill text detection method based on multitask deep learning as claimed in claim 1, wherein in step A, dilated (atrous) convolution is introduced into the convolution layers of the feature extraction backbone network, that is, bilinear interpolation is applied to the feature map output by the previous convolution layer to enlarge the resolution of the convolutional feature map, and the convolution of the current layer is then performed, so that the convolutional receptive field is enlarged without changing the convolution kernel parameters and richer bill image features are obtained.
3. The bill text detection method based on multitask deep learning as claimed in claim 1, wherein in step A, during extraction of the bill image features by the multilayer convolutional neural network, the output feature vectors of the lower convolution layers and those of the higher convolution layers are concatenated to form the final output feature map, so as to retain the edge and texture features of the lower layers and the semantic features of the higher layers.
4. The bill text detection method based on multitask deep learning as claimed in claim 1, wherein in step D, a character Softmax multi-class classifier is pre-trained on a character image data set, and the single characters obtained by the segmentation in step C are classified and recognized in sequence to form the complete bill text information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911225976.8A CN111027443B (en) | 2019-12-04 | 2019-12-04 | Bill text detection method based on multitask deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911225976.8A CN111027443B (en) | 2019-12-04 | 2019-12-04 | Bill text detection method based on multitask deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111027443A (en) | 2020-04-17 |
CN111027443B (en) | 2023-04-07 |
Family
ID=70204203
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911225976.8A Active CN111027443B (en) | 2019-12-04 | 2019-12-04 | Bill text detection method based on multitask deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111027443B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112926372B (en) * | 2020-08-22 | 2023-03-10 | 清华大学 | Scene character detection method and system based on sequence deformation |
CN112101385B (en) * | 2020-09-21 | 2022-06-10 | 西南大学 | Weak supervision text detection method |
CN112541491B (en) * | 2020-12-07 | 2024-02-02 | 沈阳雅译网络技术有限公司 | End-to-end text detection and recognition method based on image character region perception |
CN113139548B (en) * | 2020-12-31 | 2022-05-06 | 重庆邮电大学 | Mathematical formula identification method based on operator action domain and center line |
CN113744278A (en) * | 2021-01-20 | 2021-12-03 | 北京沃东天骏信息技术有限公司 | Text detection method and device |
CN112949621A (en) * | 2021-03-16 | 2021-06-11 | 新东方教育科技集团有限公司 | Method and device for marking test paper answering area, storage medium and electronic equipment |
CN113191251B (en) * | 2021-04-28 | 2023-04-07 | 北京有竹居网络技术有限公司 | Method and device for detecting stroke order, electronic equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103208004A (en) * | 2013-03-15 | 2013-07-17 | 北京英迈杰科技有限公司 | Automatic recognition and extraction method and device for bill information area |
CN107133616A (en) * | 2017-04-02 | 2017-09-05 | 南京汇川图像视觉技术有限公司 | A kind of non-division character locating and recognition methods based on deep learning |
CN108446621A (en) * | 2018-03-14 | 2018-08-24 | 平安科技(深圳)有限公司 | Bank slip recognition method, server and computer readable storage medium |
CN109977723A (en) * | 2017-12-22 | 2019-07-05 | 苏宁云商集团股份有限公司 | Big bill picture character recognition methods |
CN110033000A (en) * | 2019-03-21 | 2019-07-19 | 华中科技大学 | A kind of text detection and recognition methods of bill images |
CN110175610A (en) * | 2019-05-23 | 2019-08-27 | 上海交通大学 | A kind of bill images text recognition method for supporting secret protection |
CN110490193A (en) * | 2019-07-24 | 2019-11-22 | 西安网算数据科技有限公司 | Single Text RegionDetection method and ticket contents recognition methods |
Also Published As
Publication number | Publication date |
---|---|
CN111027443A (en) | 2020-04-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111027443B (en) | Bill text detection method based on multitask deep learning | |
CN107392141B (en) | Airport extraction method based on significance detection and LSD (Line Segment Detector) line detection | |
Rehman et al. | Performance analysis of character segmentation approach for cursive script recognition on benchmark database | |
CN111028217A (en) | Image crack segmentation method based on full convolution neural network | |
CN104408449B (en) | Intelligent mobile terminal scene literal processing method | |
CN105931253A (en) | Image segmentation method combined with semi-supervised learning | |
CN114372955A (en) | Casting defect X-ray diagram automatic identification method based on improved neural network | |
CN109766823A (en) | A kind of high-definition remote sensing ship detecting method based on deep layer convolutional neural networks | |
CN108596952B (en) | Rapid deep learning remote sensing image target detection method based on candidate region screening | |
CN110991439A (en) | Method for extracting handwritten characters based on pixel-level multi-feature joint classification | |
CN110751154A (en) | Complex environment multi-shape text detection method based on pixel-level segmentation | |
CN114581782A (en) | Fine defect detection method based on coarse-to-fine detection strategy | |
Alrehali et al. | Historical Arabic manuscripts text recognition using convolutional neural network | |
Van Phan et al. | A nom historical document recognition system for digital archiving | |
Mahajan et al. | Word level script identification using convolutional neural network enhancement for scenic images | |
CN113269049A (en) | Method for detecting handwritten Chinese character area | |
CN114187595A (en) | Document layout recognition method and system based on fusion of visual features and semantic features | |
CN110910388A (en) | Cancer cell image segmentation method based on U-Net and density estimation | |
CN111832497B (en) | Text detection post-processing method based on geometric features | |
CN105844299B (en) | A kind of image classification method based on bag of words | |
CN112633327A (en) | Staged metal surface defect detection method, system, medium, equipment and application | |
Yao et al. | Invoice detection and recognition system based on deep learning | |
Santosh et al. | Scalable arrow detection in biomedical images | |
Radwan et al. | Predictive segmentation using multichannel neural networks in Arabic OCR system | |
Rong et al. | Weakly supervised text attention network for generating text proposals in scene images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||