CN108133212B - Quota invoice amount recognition system based on deep learning - Google Patents


Info

Publication number
CN108133212B
CN108133212B CN201810011763.4A
Authority
CN
China
Prior art keywords
image
module
deep learning
recognition
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810011763.4A
Other languages
Chinese (zh)
Other versions
CN108133212A (en
Inventor
李顿伟
王直杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Donghua University
Original Assignee
Donghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Donghua University filed Critical Donghua University
Priority to CN201810011763.4A priority Critical patent/CN108133212B/en
Publication of CN108133212A publication Critical patent/CN108133212A/en
Application granted granted Critical
Publication of CN108133212B publication Critical patent/CN108133212B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/243 Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/2163 Partitioning the feature space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

The invention relates to a quota invoice amount recognition system based on deep learning, comprising an image acquisition module, an image rotation module, an image recognition module and a result storage module. The image acquisition module acquires an image file; the image rotation module corrects the picture file; the image recognition module uses a deep learning model to obtain the specific position of the image file to be recognized and performs image recognition; and the result storage module stores the final recognition result. The invention can improve the OCR recognition rate when the image is contaminated.

Description

Quota invoice amount recognition system based on deep learning
Technical Field
The invention relates to the technical field of image recognition, in particular to a quota invoice amount recognition system based on deep learning.
Background
The concept of OCR (Optical Character Recognition) was proposed as early as the 1920s and has long been an important research direction in the field of pattern recognition.
In recent years, with the rapid update and iteration of mobile devices and the rapid development of mobile internet, OCR has a wider application scene, from the character recognition of the original scanned files to the recognition of picture characters in natural scenes, such as the characters in identification cards, bank cards, house numbers, bills and various network pictures.
Conventional OCR techniques are as follows:
First, text regions are located; skewed text is then corrected; individual characters are segmented and recognized; and finally semantic error correction is performed using a statistical model such as a hidden Markov model (HMM). This pipeline can be divided into three stages: preprocessing, recognition, and post-processing. The preprocessing stage is critical, since its quality directly determines the final recognition result, and it is therefore described in detail below.
The preprocessing stage comprises three steps:
(1) Locating the text regions in the picture. Text detection is mainly based on connected-component analysis: text regions are quickly separated from non-text regions by clustering on character color, brightness and edge information. Two popular algorithms are the Maximally Stable Extremal Regions (MSER) algorithm and the Stroke Width Transform (SWT) algorithm. In natural scenes, interference from illumination, photo quality and text-like backgrounds means the detection results contain a great number of non-text regions; at present, two main methods are used to separate true text regions from the candidates: rule-based judgment or a lightweight neural network classifier;
(2) Correcting the text-region image, mainly by rotation and affine transformations;
(3) Segmenting individual characters by rows and columns. Exploiting the gaps between rows and between characters, the split points are found by binarization and projection. This works well when the text is clearly distinguishable from the background, but photographs are affected by illumination and imaging quality, and when the text is hard to separate from the background, mis-segmentation often occurs.
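The projection step in (3) can be sketched in plain Python (an illustrative toy, not the patent's code; the function name and sample image are hypothetical): after binarization, the column sums of ink pixels drop to zero in the gaps between characters, and those zero runs give the split points.

```python
def vertical_projection_splits(binary_rows):
    """binary_rows: list of rows, each a list of 0/1 ink pixels.
    Returns (start, end) column ranges containing ink (one per character)."""
    width = len(binary_rows[0])
    # project the image onto the horizontal axis: sum ink per column
    col_sums = [sum(row[c] for row in binary_rows) for c in range(width)]
    segments, start = [], None
    for c, s in enumerate(col_sums):
        if s > 0 and start is None:
            start = c                    # entering an inked region
        elif s == 0 and start is not None:
            segments.append((start, c))  # gap found: close the segment
            start = None
    if start is not None:
        segments.append((start, width))
    return segments

# A 3-row "image" with two characters separated by a blank column:
img = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1],
]
print(vertical_projection_splits(img))  # [(0, 2), (3, 5)]
```

As the text notes, this only works when binarization cleanly separates ink from background; a stained or poorly lit invoice produces nonzero sums in the gaps and the split points vanish.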
Because the conventional OCR recognition framework involves many steps, errors easily accumulate and degrade the final recognition result.
Disclosure of Invention
The technical problem the invention aims to solve is to provide a quota invoice amount recognition system based on deep learning, which can improve the OCR recognition rate when an image is contaminated.
The technical solution adopted by the invention to solve the above problem is as follows: the system comprises an image acquisition module, an image rotation module, an image recognition module and a result storage module, wherein the image acquisition module is used for acquiring an image file; the image rotation module is used for correcting the picture file; the image recognition module obtains the specific position of the image file to be recognized by using a deep learning model and performs image recognition; and the result storage module is used for storing the final recognition result.
The image rotation module corrects the picture file by combining Tesseract, which detects the coarse orientation, with OpenCV rotation, which adjusts the fine angle.
The image rotation module extracts straight lines through the Hough transform: starting from a line's top pixel, it computes for several candidate angles the distance from the origin to the line; it then traverses the pixels of the whole image, finds the most frequently repeated distance, obtains the corresponding line equation, and finally derives the rotation angle.
The image rotation module obtains the rotation angle of the image characters by using tesseract.
The image recognition module comprises a sample processing unit, an image training unit and a testing unit. The sample processing unit sorts the collected sample pictures and labels the picture categories, producing an xml file for each picture that contains its category information and position information. The image training unit adopts 24 convolutional layers and 2 fully connected layers: the convolutional layers extract features, the fully connected layers predict the results, and the output of the last layer has k dimensions, where k = S * S * (B * 5 + C); k comprises the category predictions and the bbox coordinate predictions, S is the number of grid cells per side, B is the number of targets each grid cell is responsible for, and C is the number of categories. The testing unit multiplies the category information predicted by each grid cell by the confidence predicted for each bounding box to obtain each bounding box's score, sets a threshold to filter out low-scoring results, and applies NMS to the retained results to obtain the final detection result.
Advantageous effects
By adopting the above technical solution, the invention has the following advantages over the prior art: compared with the traditional OCR recognition framework, the number of steps is reduced, so error accumulation has less influence on the final recognition result. The invention combines deep learning with OCR image recognition, can greatly improve the OCR recognition rate when the image is contaminated, and is convenient to operate. The system can be applied in the accounting field, improving accountants' working efficiency and freeing them from tedious work.
Drawings
FIG. 1 is a block diagram of the system of the present invention;
FIG. 2 is an internal structural view of the present invention;
fig. 3A-3B are graphs of recognition results after an embodiment of the present invention is employed.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
The embodiment of the invention relates to a quota invoice amount recognition system based on deep learning, which comprises an image acquisition module, an image rotation module, an image recognition module and a result storage module, wherein the image acquisition module is used for acquiring an image file; the image rotation module is used for correcting the picture file; the image recognition module obtains the specific position of the image file to be recognized by using a deep learning model and performs image recognition; and the result storage module is used for storing the final recognition result.
As shown in fig. 2, this embodiment identifies the amount, the invoice code and the invoice number on a quota invoice scanned by the customer. Since the picture uploaded by the client may be tilted or inverted, a rotation step is added to ease later recognition, and the rotated picture is passed to OCR to obtain the above fields.
Image rotation is needed because some of the pictures scanned and uploaded by users are tilted or inverted. The image rotation module in this embodiment corrects the picture by combining Tesseract orientation detection with OpenCV small-angle rotation adjustment.
OpenCV rotation adjustment
This embodiment mainly uses OpenCV line extraction and then obtains the inclination angle of the extracted line; the line extraction method is the Hough transform.
For any point O(x, y) in the rectangular coordinate system, any straight line passing through O satisfies y = kx + b, except lines perpendicular to the X axis (which have infinite slope). Because of this special case, it suffices to convert to a polar coordinate system.
In the polar coordinate system, any straight line can be represented as ρ = x cos θ + y sin θ.
Assume there is a straight line in a 10 × 10 image. Starting from the line's top pixel, compute the distances from the origin to the line for angles of 180°, 135°, 90°, 45° and 0°. Repeat this for every pixel of the image, find the most frequently repeated distance, obtain the corresponding line equation, and derive the angle.
When several straight lines are found in one picture, the angle that occurs most frequently is taken as the rotation angle of the picture.
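The voting procedure above can be sketched in plain Python (an illustrative reimplementation, not the patent's code; in practice OpenCV's cv2.HoughLines would do this). Each ink pixel votes for (θ, ρ) pairs via ρ = x cos θ + y sin θ, and the most-voted pair identifies the dominant line; 180° is dropped from the candidate angles here because it duplicates 0° up to the sign of ρ.

```python
import math
from collections import Counter

def dominant_line_angle(points, angles_deg=(0, 45, 90, 135)):
    """Return the angle (degrees) of the most-voted straight line.

    points: (x, y) coordinates of the ink pixels in the image.
    """
    votes = Counter()
    for x, y in points:
        for a in angles_deg:
            rad = math.radians(a)
            # distance from the origin to the line through (x, y) at angle a
            rho = round(x * math.cos(rad) + y * math.sin(rad))
            votes[(a, rho)] += 1
    (angle, _rho), _count = votes.most_common(1)[0]
    return angle

# A horizontal line y = 3 in a 10x10 image: every pixel yields rho = 3
# at theta = 90 degrees, so that bin accumulates the most votes.
line = [(x, 3) for x in range(10)]
print(dominant_line_angle(line))  # 90
```

A real deskew pipeline would use a fine angular grid (e.g. 1° steps) rather than four angles, then rotate the image by the detected angle with cv2.warpAffine.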
Tesseract rotation
Tesseract is an OCR engine developed by Ray Smith at Hewlett-Packard Laboratories between 1985 and 1995; it ranked among the top engines in the 1995 UNLV accuracy test, but development essentially stopped after 1996. In 2006, Google invited Smith to join and restarted the project. The project is currently licensed under Apache 2.0 and supports mainstream platforms such as Windows, Linux and macOS, but as an engine it only provides command-line tools.
Tesseract can recognize most written languages (including Chinese) and can return both the text content of a picture and the rotation of the picture's characters (270°, 180°, 90° or 0°). Because its recognition accuracy is not high enough here, this embodiment uses Tesseract only to obtain the rotation angle of the image text. Tesseract accepts only grayscale images, so a color input image must first be converted to grayscale.
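The grayscale conversion mentioned above is commonly done with the ITU-R BT.601 luma weights (the weighting used by OpenCV's cvtColor; the patent does not specify a formula, so this choice is an assumption):

```python
def rgb_to_gray(pixel):
    """Convert one (R, G, B) pixel to a grayscale value using BT.601 weights."""
    r, g, b = pixel
    # green contributes most because the eye is most sensitive to it
    return round(0.299 * r + 0.587 * g + 0.114 * b)

print(rgb_to_gray((255, 255, 255)))  # 255 (white stays white)
print(rgb_to_gray((255, 0, 0)))      # 76  (pure red is fairly dark)
```

With OpenCV the same step is a single call, e.g. cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), before the image is handed to Tesseract.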
In this embodiment, the image recognition module recognizes text using a deep learning method, specifically the deep learning object detection method YOLO (You Only Look Once).
The idea of YOLO: regress the bounding-box positions and their class probabilities directly in the output layer (the whole image is the network input, turning the object detection problem into a regression problem).
1. Sample processing:
The collected sample pictures are sorted, and the picture categories are annotated with the labelme software to obtain corresponding xml files containing the category and position information of each picture.
2. Image training:
First, the picture is resized to 448 × 448 and divided into a 7 × 7 grid of cells; if the center of an object falls into a grid cell, that cell is responsible for predicting the object.
CNN feature extraction and prediction: the convolutional layers extract the features, and the fully connected part produces the prediction. The final layer's output has k dimensions, where
k = S * S * (B * 5 + C) (1)
k contains the class predictions and the bbox coordinate predictions; S is the number of grid cells per side, B is the number of targets each grid cell is responsible for, and C is the number of categories. The 5 covers the predicted center coordinates, the width and height, and the confidence. The confidence of a bbox is expressed as:
Confidence = Pr(Object) * IOU_pred^truth (2)
where the first term is 1 if a ground-truth box (a manually annotated object) falls in the grid cell and 0 otherwise, and the second term is the IOU between the predicted bounding box and the actual ground-truth box.
The network structure follows GoogLeNet: 24 convolutional layers and 2 fully connected layers (GoogLeNet's inception modules are replaced with 1 × 1 reduction layers followed by 3 × 3 convolutional layers).
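Equation (1) can be sanity-checked with the standard YOLOv1 settings (S = 7, B = 2, C = 20 are the original paper's values, used here purely for illustration):

```python
def output_dim(S, B, C):
    """Dimension of YOLO's final layer output, equation (1).

    Each of the S*S cells predicts B boxes of 5 values each
    (x, y, w, h, confidence) plus C class probabilities.
    """
    return S * S * (B * 5 + C)

print(output_dim(7, 2, 20))  # 1470, i.e. a 7x7x30 output tensor
```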
The loss function is designed to balance the coordinate terms (x, y, w, h), the confidence and the classification.
For bbox predictions of different sizes, the same deviation is less tolerable in a small box than in a large box, yet the same offset contributes equally to a sum-of-squares loss. To alleviate this problem, this embodiment predicts the square roots of the box width and height instead of the raw values.
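A quick numeric check (toy numbers, not from the patent) shows why the square root helps: the same 2-pixel width error costs far more on a small box than on a large one once square roots are taken, whereas a plain squared error treats both identically.

```python
import math

def wh_loss(pred, true, use_sqrt):
    """Squared width/height error, optionally on square-rooted values."""
    if use_sqrt:
        return (math.sqrt(pred) - math.sqrt(true)) ** 2
    return (pred - true) ** 2

# Both boxes are off by 2 pixels in width:
small = wh_loss(6, 4, use_sqrt=True)      # ~0.202
large = wh_loss(102, 100, use_sqrt=True)  # ~0.0099
print(small > large)  # True: the small box is penalized harder

# Without the square root the two errors would be identical:
print(wh_loss(6, 4, False) == wh_loss(102, 100, False))  # True
```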
Each grid cell predicts multiple bounding boxes, but during training we want only one bounding-box predictor to be responsible for each object (one object, one bbox). Specifically, the bounding box with the largest IOU with the ground-truth box is made responsible for predicting that object. This practice is called specialization of the bounding-box predictors: each predictor becomes better and better at predicting objects of particular sizes or aspect ratios.
3. Testing:
At test time, the class information Pr(Class_i | Object) predicted by each grid cell is multiplied by the confidence predicted for each bounding box:

Pr(Class_i | Object) * Pr(Object) * IOU_pred^truth = Pr(Class_i) * IOU_pred^truth
This product gives each bounding box a class-specific score. After the score of each bbox is obtained, a threshold is set, the low-scoring boxes are filtered out, and NMS is applied to the retained boxes to obtain the final detection result. Fig. 3A-3B show the recognition results obtained with the invention.
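The threshold-then-NMS step can be sketched in plain Python (an assumed implementation for illustration; the (x1, y1, x2, y2) box format, thresholds, and sample scores are hypothetical, not the patent's values):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, score_thresh=0.3, iou_thresh=0.5):
    """Drop low-score boxes, then greedily keep the best-scoring box and
    suppress any remaining box that overlaps it above iou_thresh."""
    order = [i for i in sorted(range(len(boxes)), key=lambda i: -scores[i])
             if scores[i] >= score_thresh]
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in kept):
            kept.append(i)
    return kept

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 and is suppressed
```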
Compared with the traditional OCR recognition framework, the invention reduces the number of steps and thereby reduces the influence of error accumulation on the final recognition result. It combines deep learning with OCR image recognition, can greatly improve the OCR recognition rate when the image is contaminated, and is convenient to operate. The system can be applied in the accounting field, improving accountants' working efficiency and freeing them from tedious work.

Claims (4)

1. A quota invoice amount recognition system based on deep learning, comprising an image acquisition module, an image rotation module, an image recognition module and a result storage module, characterized in that the image acquisition module is used for acquiring an image file; the image rotation module is used for correcting the picture file, and corrects it by combining Tesseract orientation adjustment with OpenCV rotation angle adjustment; the image recognition module obtains the specific position of the image file to be recognized by using a deep learning model and performs image recognition; and the result storage module is used for storing the final recognition result.
2. The deep learning based quota invoice amount recognition system of claim 1, wherein the image rotation module extracts straight lines through the Hough transform, computing for several candidate angles the distance from the origin to each line starting from the line's top pixel; it traverses the pixels of the whole image, finds the most frequently repeated distance, obtains the corresponding line equation, and finally derives the rotation angle.
3. The deep learning based quota invoice amount identification system of claim 1, wherein the image rotation module uses tesseract to derive a rotation angle of the image text.
4. The deep learning based quota invoice amount recognition system of claim 1, wherein the image recognition module comprises a sample processing unit, an image training unit and a testing unit; the sample processing unit is used for sorting the collected sample pictures and labelling the picture categories to obtain an xml file corresponding to each picture, the xml file containing the category information and the position information of the picture; the image training unit adopts 24 convolutional layers and 2 fully connected layers, wherein the convolutional layers extract features, the fully connected layers predict results, and the output of the last layer has k dimensions, where k = S * S * (B * 5 + C), k comprises the category predictions and the bbox coordinate predictions, S is the number of grid cells per side, B is the number of targets each grid cell is responsible for, and C is the number of categories; the testing unit multiplies the category information predicted by each grid cell by the confidence predicted for each bounding box to obtain each bounding box's score, sets a threshold to filter out low-scoring results, and applies NMS to the retained results to obtain the final detection result.
CN201810011763.4A 2018-01-05 2018-01-05 Quota invoice amount recognition system based on deep learning Active CN108133212B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810011763.4A CN108133212B (en) 2018-01-05 2018-01-05 Quota invoice amount recognition system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810011763.4A CN108133212B (en) 2018-01-05 2018-01-05 Quota invoice amount recognition system based on deep learning

Publications (2)

Publication Number Publication Date
CN108133212A CN108133212A (en) 2018-06-08
CN108133212B true CN108133212B (en) 2021-06-29

Family

ID=62399437

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810011763.4A Active CN108133212B (en) 2018-01-05 2018-01-05 Quota invoice amount recognition system based on deep learning

Country Status (1)

Country Link
CN (1) CN108133212B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109086756B (en) * 2018-06-15 2021-08-03 众安信息技术服务有限公司 Text detection analysis method, device and equipment based on deep neural network
CN109002768A (en) * 2018-06-22 2018-12-14 深源恒际科技有限公司 Medical bill class text extraction method based on the identification of neural network text detection
CN109816118B (en) * 2019-01-25 2022-12-06 上海深杳智能科技有限公司 Method and terminal for creating structured document based on deep learning model
CN109886257B (en) * 2019-01-30 2022-10-18 四川长虹电器股份有限公司 Method for correcting invoice image segmentation result by adopting deep learning in OCR system
CN109993160B (en) * 2019-02-18 2022-02-25 北京联合大学 Image correction and text and position identification method and system
CN109948617A (en) * 2019-03-29 2019-06-28 南京邮电大学 A kind of invoice image position method
WO2020223859A1 (en) * 2019-05-05 2020-11-12 华为技术有限公司 Slanted text detection method, apparatus and device
CN110348346A (en) * 2019-06-28 2019-10-18 苏宁云计算有限公司 A kind of bill classification recognition methods and system
CN110781726A (en) * 2019-09-11 2020-02-11 深圳壹账通智能科技有限公司 Image data identification method and device based on OCR (optical character recognition), and computer equipment
CN111160395A (en) * 2019-12-05 2020-05-15 北京三快在线科技有限公司 Image recognition method and device, electronic equipment and storage medium
CN111401371B (en) * 2020-06-03 2020-09-08 中邮消费金融有限公司 Text detection and identification method and system and computer equipment
CN112464872A (en) * 2020-12-11 2021-03-09 广东电网有限责任公司 Automatic extraction method and device based on NLP (non-line segment) natural language
CN112686319A (en) * 2020-12-31 2021-04-20 南京太司德智能电气有限公司 Merging method of electric power signal model training files
CN113159086B (en) * 2020-12-31 2024-04-30 南京太司德智能电气有限公司 Efficient electric power signal description model training method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103617415A (en) * 2013-11-19 2014-03-05 北京京东尚科信息技术有限公司 Device and method for automatically identifying invoice
CN104573688A (en) * 2015-01-19 2015-04-29 电子科技大学 Mobile platform tobacco laser code intelligent identification method and device based on deep learning
CN106096607A (en) * 2016-06-12 2016-11-09 湘潭大学 A kind of licence plate recognition method
CN107341523A (en) * 2017-07-13 2017-11-10 浙江捷尚视觉科技股份有限公司 Express delivery list information identifying method and system based on deep learning
CN107358232A (en) * 2017-06-28 2017-11-17 中山大学新华学院 Invoice recognition methods and identification and management system based on plug-in unit


Also Published As

Publication number Publication date
CN108133212A (en) 2018-06-08

Similar Documents

Publication Publication Date Title
CN108133212B (en) Quota invoice amount recognition system based on deep learning
CN111325203B (en) American license plate recognition method and system based on image correction
CN109086714B (en) Form recognition method, recognition system and computer device
CN110363122B (en) Cross-domain target detection method based on multi-layer feature alignment
CN111611643B (en) Household vectorization data acquisition method and device, electronic equipment and storage medium
CN110619327A (en) Real-time license plate recognition method based on deep learning in complex scene
Lee et al. Binary segmentation algorithm for English cursive handwriting recognition
CN105308944A (en) Classifying objects in images using mobile devices
WO2022121039A1 (en) Bankcard tilt correction-based detection method and apparatus, readable storage medium, and terminal
US20100189316A1 (en) Systems and methods for graph-based pattern recognition technology applied to the automated identification of fingerprints
CN105283884A (en) Classifying objects in digital images captured using mobile devices
US20040086153A1 (en) Methods and systems for recognizing road signs in a digital image
CN104809481A (en) Natural scene text detection method based on adaptive color clustering
CN111783757A (en) OCR technology-based identification card recognition method in complex scene
Yin et al. Robust vanishing point detection for mobilecam-based documents
CN102646193A (en) Segmentation method of character images distributed in ring shape
CN108681735A (en) Optical character recognition method based on convolutional neural networks deep learning model
CN105335760A (en) Image number character recognition method
CN108961262B (en) Bar code positioning method in complex scene
CN113128507A (en) License plate recognition method and device, electronic equipment and storage medium
Wang et al. Scene text recognition via gated cascade attention
CN111695373A (en) Zebra crossing positioning method, system, medium and device
CN116740758A (en) Bird image recognition method and system for preventing misjudgment
CN109657682B (en) Electric energy representation number identification method based on deep neural network and multi-threshold soft segmentation
CN117115614B (en) Object identification method, device, equipment and storage medium for outdoor image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant