CN109977957A - Invoice recognition method and system based on deep learning - Google Patents

Invoice recognition method and system based on deep learning

Publication number
CN109977957A
Authority
CN
China
Prior art keywords
invoice
image
detection model
training
position coordinates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910161502.5A
Other languages
Chinese (zh)
Inventor
郭近之
杨现
王云龙
刘亚峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suningcom Group Co Ltd
Original Assignee
Suningcom Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suningcom Group Co Ltd
Priority to CN201910161502.5A
Publication of CN109977957A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07D HANDLING OF COINS OR VALUABLE PAPERS, e.g. TESTING, SORTING BY DENOMINATIONS, COUNTING, DISPENSING, CHANGING OR DEPOSITING
    • G07D7/00 Testing specially adapted to determine the identity or genuineness of valuable papers or for segregating those which are unacceptable, e.g. banknotes that are alien to a currency
    • G07D7/20 Testing patterns thereon
    • G07D7/2008 Testing patterns thereon using pre-processing, e.g. de-blurring, averaging, normalisation or rotation
    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07D HANDLING OF COINS OR VALUABLE PAPERS, e.g. TESTING, SORTING BY DENOMINATIONS, COUNTING, DISPENSING, CHANGING OR DEPOSITING
    • G07D7/00 Testing specially adapted to determine the identity or genuineness of valuable papers or for segregating those which are unacceptable, e.g. banknotes that are alien to a currency
    • G07D7/20 Testing patterns thereon
    • G07D7/202 Testing patterns thereon using pattern matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Input (AREA)

Abstract

The present invention discloses an invoice recognition method and system based on deep learning, relating to the technical field of invoice recognition. It addresses the technical problem in the prior art that OCR data extraction from invoices is inaccurate because invoice types are varied and invoices are pasted irregularly. The method comprises: obtaining multiple sample expense reports annotated with invoice types and position coordinates, and constructing a training dataset and a validation dataset; training, on the training dataset, multiple invoice detection models, each combining one of several pre-networks (backbone networks) with the Faster-RCNN framework; verifying the performance of each invoice detection model on the validation dataset and selecting the optimal invoice detection model; detecting an expense report with the optimal invoice detection model to identify each invoice image in it together with its invoice type and position coordinates; and performing text recognition with an OCR model on the invoice image at each set of position coordinates, then packaging the recognized text, invoice type and/or position coordinates of each invoice as the invoice recognition result for output.

Description

Invoice recognition method and system based on deep learning
Technical field
The present invention relates to the technical field of invoice recognition, and in particular to an invoice recognition method and system based on deep learning.
Background art
In recent years, with the rapid development of the social economy, economic activity has become increasingly frequent, and both ordinary consumers and enterprises of all kinds have become increasingly aware that an invoice must be issued for a purchase before the expense can be reimbursed. During expense reimbursement, enterprises generally require employees to paste the relevant invoices flat onto a sheet of paper for image capture. Because a single reimbursement may involve several types of invoice, several invoices of different types are often pasted onto one expense report, and invoices are sometimes pasted sideways or upside down to save space on the sheet. This variety of invoice types and irregular pasting reduces the accuracy of OCR data extraction from invoices, which in turn hampers the enterprise's fine-grained financial management.
Summary of the invention
The object of the present invention is to provide an invoice recognition method and system based on deep learning that solve the technical problem in the prior art that OCR data extraction from invoices is inaccurate because invoice types are varied and invoices are pasted irregularly.
To achieve the above object, one aspect of the present invention provides an invoice recognition method based on deep learning, comprising:
Step S1: obtaining multiple sample expense reports annotated with invoice types and position coordinates, and constructing a training dataset and a validation dataset;
Step S3: training, on the training dataset, multiple invoice detection models, each combining one of several pre-networks with the Faster-RCNN framework;
Step S4: verifying the performance of each invoice detection model on the validation dataset, and selecting the optimal invoice detection model;
Step S5: detecting an expense report with the optimal invoice detection model, and identifying each invoice image in the expense report together with its invoice type and position coordinates;
Step S7: performing text recognition with an OCR model on the invoice image at each set of position coordinates, and packaging the recognized text, invoice type and/or position coordinates of each invoice as the invoice recognition result for output.
Preferably, the method further comprises, between step S1 and step S3:
Step S2: capturing images of the sample expense reports in the training dataset and the validation dataset respectively, and normalizing each image according to preset parameters;
the normalization comprises one or more of image cropping, image resizing, image rotation adjustment, brightness or contrast adjustment, and hue or saturation adjustment.
By way of example, the pre-networks include an Inception-v2 network, a Resnet-50 network and a Resnet-18 network.
Preferably, the number of sample expense report images in the validation dataset should be at least one third of the number of sample expense report images in the training dataset.
Preferably, the method further comprises, between step S5 and step S7:
Step S61: manually annotating, with bounding boxes, the continuous text regions in the sample expense report images of the training dataset;
Step S62: training, based on a deep neural network, a text region detection model capable of identifying the box-annotated regions.
Further, step S7 comprises:
Step S71: detecting the invoice image at each set of position coordinates with the text region detection model, and outputting the box-annotated regions in it;
Step S72: calling the OCR model to perform text recognition on all box-annotated regions in each invoice image, obtaining the recognized text of all box-annotated regions in that invoice image;
Step S73: associating and packaging the recognized text, invoice type and/or position coordinates of each invoice image.
Compared with the prior art, the invoice recognition method based on deep learning provided by the present invention has the following beneficial effects:
In the invoice recognition method based on deep learning provided by the present invention, multiple sample expense reports are first obtained and then divided in proportion into a training dataset and a validation dataset. Several invoices may be pasted on each sample expense report, and invoices vary in type and in size. To facilitate training of the invoice detection model, the invoice type and pasted position coordinates of every invoice on each sample expense report therefore need to be annotated, so that the trained invoice detection model can accurately identify the invoice type and pasted position coordinates of every invoice on an expense report. In addition, to improve detection performance, the present invention trains multiple invoice detection models at the same time, each combining a different pre-network with the Faster-RCNN framework. The performance of each of these models is verified in turn on the validation dataset, and the model whose recognition results are accurate and whose recognition efficiency is high is selected as the optimal invoice detection model. The optimal invoice detection model is then used to detect actual expense reports, identifying each invoice image together with its invoice type and position coordinates; an OCR model performs text recognition on the invoice image at each set of position coordinates; and finally the recognized text, invoice type and/or position coordinates of each invoice are packaged as the invoice recognition result for output.
As can be seen, by training the invoice detection model with deep learning, the present invention can identify the invoice types and invoice information on an expense report efficiently and accurately, which effectively solves the technical problem that manual recognition is slow and error-prone. Furthermore, when pasting invoices onto an expense report, a claimant will often paste them too close together in order to save paper, in which case prior-art recognition methods based on template matching or seals cannot accurately extract the image of each individual invoice. Locating invoices by position coordinates, as the present invention does, effectively solves this problem, ensures that each invoice image is obtained validly, and thereby ensures the accuracy of the recognition result.
Another aspect of the present invention provides an invoice recognition system based on deep learning, which applies the invoice recognition method based on deep learning of the above technical solution, the system comprising:
a sample construction unit, configured to obtain multiple sample expense reports annotated with invoice types and position coordinates, and to construct a training dataset and a validation dataset;
an invoice detection model training unit, configured to train, on the training dataset, multiple invoice detection models, each combining one of several pre-networks with the Faster-RCNN framework;
an invoice detection model screening unit, configured to verify the performance of each invoice detection model on the validation dataset and to select the optimal invoice detection model;
an invoice detection unit, configured to detect an expense report with the optimal invoice detection model and to identify each invoice image in the expense report together with its invoice type and position coordinates;
a recognition output unit, configured to perform text recognition with an OCR model on the invoice image at each set of position coordinates, and then to package the recognized text, invoice type and/or position coordinates of each invoice as the invoice recognition result for output.
Preferably, the system further comprises, between the sample construction unit and the invoice detection model training unit:
an image processing unit, configured to capture images of the sample expense reports in the training dataset and the validation dataset respectively, and to normalize each image according to preset parameters;
the normalization comprises one or more of image cropping, image resizing, image rotation adjustment, brightness or contrast adjustment, and hue or saturation adjustment.
Preferably, the system further comprises, between the invoice detection unit and the recognition output unit:
a manual box-selection module, configured to manually annotate, with bounding boxes, the continuous text regions in the sample expense report images of the training dataset;
a text region detection model training module, configured to train, based on a deep neural network, a text region detection model capable of identifying the box-annotated regions.
Preferably, the recognition output unit comprises:
a box-selection annotation module, configured to detect the invoice image at each set of position coordinates with the text region detection model and to output the box-annotated regions in it;
an OCR model module, configured to perform text recognition in turn on all box-annotated regions in each invoice image, obtaining the recognized text of all box-annotated regions in that invoice image;
a recognition output module, configured to associate, package and output the recognized text, invoice type and/or position coordinates of each invoice image.
Compared with the prior art, the beneficial effects of the invoice recognition system based on deep learning provided by the present invention are the same as those of the invoice recognition method based on deep learning provided by the above technical solution, and are not repeated here.
Brief description of the drawings
The drawings described here are provided for further understanding of the present invention and form part of it. The illustrative embodiments of the present invention and their descriptions serve to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a flow diagram of the invoice recognition method based on deep learning in Embodiment one of the present invention;
Fig. 2 is a flow diagram of training the optimal invoice detection model in Fig. 1;
Fig. 3 is a structural block diagram of the invoice recognition system based on deep learning in Embodiment two of the present invention.
Reference numerals:
1 - sample construction unit, 2 - image processing unit;
3 - invoice detection model training unit, 4 - invoice detection model screening unit;
5 - invoice detection unit, 7 - recognition output unit;
61 - manual box-selection module, 62 - text region detection model training module;
71 - box-selection annotation module, 72 - OCR model module;
73 - recognition output module.
Detailed description of the embodiments
To make the above objects, features and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Embodiment one
Referring to Fig. 1 and Fig. 2, the present embodiment provides an invoice recognition method based on deep learning, comprising:
Step S1: obtaining multiple sample expense reports annotated with invoice types and position coordinates, and constructing a training dataset and a validation dataset; Step S3: training, on the training dataset, multiple invoice detection models, each combining one of several pre-networks (backbone networks) with the Faster-RCNN framework; Step S4: verifying the performance of each invoice detection model on the validation dataset, and selecting the optimal invoice detection model; Step S5: detecting an expense report with the optimal invoice detection model, and identifying each invoice image in the expense report together with its invoice type and position coordinates; Step S7: performing text recognition with an OCR model on the invoice image at each set of position coordinates, and packaging the recognized text, invoice type and/or position coordinates of each invoice as the invoice recognition result for output.
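The inference part of this flow (steps S5 and S7) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the patent: the names `Detection`, `InvoiceResult`, `crop` and `recognize_expense_report`, the (x_min, y_min, x_max, y_max) box convention, and the stub detector and OCR functions are all hypothetical. In practice the detector would be the optimal invoice detection model selected in step S4 and `ocr` a real OCR model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumed conventions: a box is (x_min, y_min, x_max, y_max) in pixels,
# and an image is a list of rows of grayscale pixel values.
Box = Tuple[int, int, int, int]
Image = List[List[int]]


@dataclass
class Detection:
    invoice_type: str  # type predicted by the invoice detection model
    box: Box           # position coordinates of the invoice on the report


@dataclass
class InvoiceResult:
    invoice_type: str
    box: Box
    text: str          # text recognized by the OCR model


def crop(image: Image, box: Box) -> Image:
    """Cut the sub-image covered by `box` out of `image`."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def recognize_expense_report(
    image: Image,
    detector: Callable[[Image], List[Detection]],
    ocr: Callable[[Image], str],
) -> List[InvoiceResult]:
    """Step S5: detect each invoice; step S7: OCR each crop and package it."""
    results = []
    for det in detector(image):
        invoice_image = crop(image, det.box)
        results.append(InvoiceResult(det.invoice_type, det.box, ocr(invoice_image)))
    return results


# Stand-ins for the trained models, for illustration only.
def stub_detector(image: Image) -> List[Detection]:
    return [Detection("taxi", (0, 0, 2, 2)), Detection("train", (2, 0, 4, 2))]


def stub_ocr(invoice_image: Image) -> str:
    return f"{len(invoice_image[0])}x{len(invoice_image)} crop"


report = [[0] * 4 for _ in range(2)]
results = recognize_expense_report(report, stub_detector, stub_ocr)
```

Passing the detector and the OCR model in as callables keeps the sketch independent of any particular deep learning library.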
In the invoice recognition method based on deep learning provided by the present embodiment, multiple sample expense reports are first obtained and then divided in proportion into a training dataset and a validation dataset. Several invoices may be pasted on each sample expense report, and invoices vary in type and in size. To facilitate training of the invoice detection model, the invoice type and pasted position coordinates of every invoice on each sample expense report therefore need to be annotated, so that the trained invoice detection model can accurately identify the invoice type and pasted position coordinates of every invoice on an expense report. In addition, to improve detection performance, the present embodiment trains multiple invoice detection models at the same time, each combining a different pre-network with the Faster-RCNN framework. The performance of each of these models is verified in turn on the validation dataset, and the model whose recognition results are accurate and whose recognition efficiency is high is selected as the optimal invoice detection model. The optimal invoice detection model is then used to detect actual expense reports, identifying each invoice image together with its invoice type and position coordinates; an OCR model performs text recognition on the invoice image at each set of position coordinates; and finally the recognized text, invoice type and/or position coordinates of each invoice are packaged as the invoice recognition result for output.
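The text does not state how accuracy and efficiency are weighed against each other during screening. One possible selection rule, given here purely as an assumption, is to keep the candidate with the highest validation accuracy and break ties by the shorter inference time. The names `Candidate` and `select_optimal`, the optional accuracy floor, and all figures in the example are hypothetical placeholders, not measured results.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    pre_network: str          # e.g. "Inception-v2", "Resnet-50", "Resnet-18"
    accuracy: float           # share of validation invoices detected correctly
    seconds_per_image: float  # average inference time on the validation set


def select_optimal(candidates: List[Candidate], min_accuracy: float = 0.0) -> Candidate:
    """Pick the most accurate candidate; on equal accuracy prefer the faster one."""
    eligible = [c for c in candidates if c.accuracy >= min_accuracy]
    if not eligible:
        raise ValueError("no candidate meets the accuracy floor")
    # Sort key: higher accuracy first, then lower latency.
    return max(eligible, key=lambda c: (c.accuracy, -c.seconds_per_image))


# Placeholder figures for illustration only; they are not results from the patent.
candidates = [
    Candidate("Inception-v2", accuracy=0.95, seconds_per_image=0.30),
    Candidate("Resnet-50", accuracy=0.95, seconds_per_image=0.20),
    Candidate("Resnet-18", accuracy=0.90, seconds_per_image=0.10),
]
best = select_optimal(candidates)
```

A weighted score of accuracy and latency would be an equally plausible rule; the lexicographic ordering above is simply the easiest to reason about.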
As can be seen, by training the invoice detection model with deep learning, the present embodiment can identify the invoice types and invoice information on an expense report efficiently and accurately, which effectively solves the technical problem that manual recognition is slow and error-prone. Furthermore, when pasting invoices onto an expense report, a claimant will often paste them too close together in order to save paper, in which case prior-art recognition methods based on template matching or seals cannot accurately extract the image of each individual invoice. Locating invoices by position coordinates, as the present embodiment does, effectively solves this problem, ensures that each invoice image is obtained validly, and thereby ensures the accuracy of the recognition result.
Preferably, referring to Fig. 1, the method further comprises, between step S1 and step S3:
Step S2: capturing images of the sample expense reports in the training dataset and the validation dataset respectively, and normalizing each image according to preset parameters; the normalization comprises one or more of image cropping, image resizing, image rotation adjustment, brightness or contrast adjustment, and hue or saturation adjustment.
In a specific implementation, because captured invoice images differ in quality and size, the invoice images in the training dataset and the validation dataset need to be normalized. For example, each invoice image is processed according to a preset parameter template by image cropping, image resizing, image rotation adjustment, brightness or contrast adjustment, hue or saturation adjustment and the like, so as to improve the recognition of the invoice images.
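A few of the listed normalization operations can be sketched in plain Python on a grayscale image stored as nested lists. The `Preset` fields, the function names, nearest-neighbour resizing and the restriction to 90-degree rotations are assumptions made for this illustration; the patent does not specify the parameter template or the algorithms used.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values in 0..255


@dataclass
class Preset:
    width: int               # target width in pixels
    height: int              # target height in pixels
    quarter_turns: int = 0   # number of 90-degree clockwise rotations
    contrast: float = 1.0    # multiplicative contrast factor
    brightness: int = 0      # additive brightness offset


def rotate_quarter(image: Image, turns: int) -> Image:
    """Rotate clockwise by `turns` * 90 degrees."""
    for _ in range(turns % 4):
        image = [list(row) for row in zip(*image[::-1])]
    return image


def resize_nearest(image: Image, width: int, height: int) -> Image:
    """Resize with nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def adjust(image: Image, contrast: float, brightness: int) -> Image:
    """Apply contrast and brightness, clamping each pixel to 0..255."""
    return [
        [max(0, min(255, round(contrast * p + brightness))) for p in row]
        for row in image
    ]


def normalize(image: Image, preset: Preset) -> Image:
    """Rotate, then resize, then adjust brightness and contrast."""
    image = rotate_quarter(image, preset.quarter_turns)
    image = resize_nearest(image, preset.width, preset.height)
    return adjust(image, preset.contrast, preset.brightness)


sample = [[0, 100], [200, 250]]
normalized = normalize(sample, Preset(width=4, height=4, brightness=10))
```

Applying the same preset to the training and validation images keeps the inputs seen during screening comparable to those seen during training.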
Optionally, the pre-networks in the above embodiment include an Inception-v2 network, a Resnet-50 network and a Resnet-18 network, where the Resnet-18 network is the network obtained by pruning and optimizing Resnet-50.
To ensure that the invoice detection model is trained accurately, the numbers of invoice images in the validation dataset and the training dataset need to be set appropriately. Repeated practical analysis has shown that a better-performing invoice detection model is obtained only when the number of sample expense report images in the validation dataset is at least one third of the number in the training dataset. For example, when there are 10000 sample expense reports in total, 7500 of them can be assigned to the training dataset and the remaining 2500 to the validation dataset.
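The split and the one-third constraint can be sketched as follows. The function name `split_samples`, the shuffling step and the fixed seed are assumptions for this illustration; the text only specifies the proportion, under which 7500 training samples require at least 2500 validation samples.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_samples(
    samples: Sequence[T], train_count: int, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle the samples and split them into training and validation sets.

    Raises ValueError if the validation set would be smaller than one third
    of the training set.
    """
    if not 0 < train_count < len(samples):
        raise ValueError("train_count must leave at least one validation sample")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # assumed: a random, reproducible split
    train, validation = shuffled[:train_count], shuffled[train_count:]
    if 3 * len(validation) < len(train):
        raise ValueError("validation set is smaller than one third of the training set")
    return train, validation


# The 10000-sample example from the text: 7500 for training, 2500 for validation.
train_set, validation_set = split_samples(range(10000), train_count=7500)
```

The check is written as `3 * len(validation) < len(train)` so that it uses integer arithmetic only and the boundary case of exactly one third passes.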
Referring to Fig. 1, the method further comprises, between step S5 and step S7:
Step S61: manually annotating, with bounding boxes, the continuous text regions in the sample expense report images of the training dataset; Step S62: training, based on a deep neural network, a text region detection model capable of identifying the box-annotated regions.
For the practical application scenario, 7500 sample expense reports were manually annotated with bounding boxes, covering essentially all invoice types that occur in practice; end-to-end model training was then carried out with deep learning, yielding the text region detection model.
Specifically, step S7 comprises:
Step S71: detecting the invoice image at each set of position coordinates with the text region detection model, and outputting the box-annotated regions in it; Step S72: calling the OCR model to perform text recognition on all box-annotated regions in each invoice image, obtaining the recognized text of all box-annotated regions in that invoice image; Step S73: associating and packaging the recognized text, invoice type and/or position coordinates of each invoice image. By associating, packaging and storing the recognized text, invoice type and/or position coordinates of each invoice image, electronic management of reimbursement documents is achieved, so that the records can later be retrieved and consulted.
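Steps S71 to S73 for a single invoice image can be sketched as follows. The record layout, the JSON serialization, the function names and the stub models are assumptions introduced for this illustration; the patent does not specify the packaged format.

```python
import json
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # assumed (x_min, y_min, x_max, y_max)
Image = List[List[int]]          # rows of grayscale pixel values


def crop(image: Image, box: Box) -> Image:
    """Cut the sub-image covered by `box` out of `image`."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def recognize_invoice(
    invoice_image: Image,
    invoice_type: str,
    position: Box,
    detect_text_regions: Callable[[Image], List[Box]],
    ocr: Callable[[Image], str],
) -> Dict:
    """Build one associated record for an invoice image."""
    texts = []
    for region in detect_text_regions(invoice_image):       # step S71
        recognized = ocr(crop(invoice_image, region))        # step S72
        texts.append({"region": list(region), "text": recognized})
    # Step S73: associate recognized text with the invoice type and position.
    return {"invoice_type": invoice_type, "position": list(position), "texts": texts}


# Stand-ins for the text region detection model and the OCR model.
def stub_regions(image: Image) -> List[Box]:
    return [(0, 0, 3, 1), (0, 1, 2, 2)]


def stub_ocr(region_image: Image) -> str:
    return "w{}h{}".format(len(region_image[0]), len(region_image))


invoice = [[0] * 4 for _ in range(3)]
record = recognize_invoice(invoice, "taxi receipt", (10, 20, 14, 23), stub_regions, stub_ocr)
packed = json.dumps(record, ensure_ascii=False)  # keeps Chinese text readable
```

Storing one such record per invoice keeps the recognized text linked to its invoice type and position, which is what later retrieval relies on.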
Embodiment two
Referring to Fig. 1 and Fig. 3, the present embodiment provides an invoice recognition system based on deep learning, comprising:
a sample construction unit 1, configured to obtain multiple sample expense reports annotated with invoice types and position coordinates, and to construct a training dataset and a validation dataset;
an invoice detection model training unit 3, configured to train, on the training dataset, multiple invoice detection models, each combining one of several pre-networks with the Faster-RCNN framework;
an invoice detection model screening unit 4, configured to verify the performance of each invoice detection model on the validation dataset and to select the optimal invoice detection model;
an invoice detection unit 5, configured to detect an expense report with the optimal invoice detection model and to identify each invoice image in the expense report together with its invoice type and position coordinates;
a recognition output unit 7, configured to perform text recognition with an OCR model on the invoice image at each set of position coordinates, and then to package the recognized text, invoice type and/or position coordinates of each invoice as the invoice recognition result for output.
Preferably, the system further comprises, between the sample construction unit 1 and the invoice detection model training unit 3:
an image processing unit 2, configured to capture images of the sample expense reports in the training dataset and the validation dataset respectively, and to normalize each image according to preset parameters;
the normalization comprises one or more of image cropping, image resizing, image rotation adjustment, brightness or contrast adjustment, and hue or saturation adjustment.
Preferably, the system further comprises, between the invoice detection unit 5 and the recognition output unit 7:
a manual box-selection module 61, configured to manually annotate, with bounding boxes, the continuous text regions in the sample expense report images of the training dataset;
a text region detection model training module 62, configured to train, based on a deep neural network, a text region detection model capable of identifying the box-annotated regions.
Preferably, the recognition output unit 7 comprises:
a box-selection annotation module 71, configured to detect the invoice image at each set of position coordinates with the text region detection model and to output the box-annotated regions in it;
an OCR model module 72, configured to perform text recognition in turn on all box-annotated regions in each invoice image, obtaining the recognized text of all box-annotated regions in that invoice image;
a recognition output module 73, configured to associate, package and output the recognized text, invoice type and/or position coordinates of each invoice image.
Compared with the prior art, the beneficial effects of the invoice recognition system based on deep learning provided by this embodiment of the present invention are the same as those of the invoice recognition method based on deep learning provided by Embodiment one above, and are not repeated here.
Those of ordinary skill in the art will understand that all or part of the steps of the above method may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method of the above embodiments; the storage medium may be a ROM/RAM, a magnetic disk, an optical disc, a memory card or the like.
The above is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of protection of the claims.

Claims (10)

1. An invoice recognition method based on deep learning, characterized by comprising:
Step S1: obtaining multiple sample expense reports annotated with invoice types and position coordinates, and constructing a training dataset and a validation dataset;
Step S3: training, on the training dataset, multiple invoice detection models, each combining one of several pre-networks with the Faster-RCNN framework;
Step S4: verifying the performance of each invoice detection model on the validation dataset, and selecting the optimal invoice detection model;
Step S5: detecting an expense report with the optimal invoice detection model, and identifying each invoice image in the expense report together with its invoice type and position coordinates;
Step S7: performing text recognition with an OCR model on the invoice image at each set of position coordinates, and packaging the recognized text, invoice type and/or position coordinates of each invoice as the invoice recognition result for output.
2. The method according to claim 1, characterized in that it further comprises, between step S1 and step S3:
Step S2: capturing images of the sample expense reports in the training dataset and the validation dataset respectively, and normalizing each image according to preset parameters;
the normalization comprises one or more of image cropping, image resizing, image rotation adjustment, brightness or contrast adjustment, and hue or saturation adjustment.
3. The method according to claim 1, characterized in that the pre-networks include an Inception-v2 network, a Resnet-50 network and a Resnet-18 network.
4. The method according to claim 2, characterized in that the number of sample expense report images in the validation dataset should be at least one third of the number of sample expense report images in the training dataset.
5. The method according to claim 1, characterized in that, between step S5 and step S7, the method further comprises:
Step S61: manually annotating, with bounding boxes, the continuous text regions in the sample expense report images of the training dataset;
Step S62: training, based on a deep neural network, a text region detection model capable of identifying the box-annotated regions.
6. The method according to claim 5, characterized in that step S7 comprises:
Step S71: detecting the invoice image corresponding to each set of position coordinates with the text region detection model, and outputting the box-annotated regions therein;
Step S72: calling the OCR model to perform text recognition on all box-annotated regions in each invoice image, obtaining the recognized text content of all box-annotated regions in that invoice image;
Step S73: associating and packaging the recognized text content, invoice type and/or position coordinates of each invoice image respectively.
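The flow of steps S71 to S73 can be sketched as a loop over detected invoices: crop each invoice, find its text regions, run OCR on each region, and bundle the text with the invoice type and coordinates. In the sketch below the text region detector and the OCR model are passed in as callables and replaced by trivial stand-ins, and the "image" is a list of strings so the example runs without any imaging library. All names (`InvoiceResult`, `recognize_invoices`, the box layout) are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), assumed layout


@dataclass
class InvoiceResult:
    invoice_type: str
    position: Box
    texts: List[str]


def crop(image: Sequence, box: Box) -> List:
    """Cut the rectangle described by box out of a row-major image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def recognize_invoices(
    report_image: Sequence,
    detections: Sequence[Tuple[str, Box]],
    find_text_regions: Callable[[Sequence], List[Box]],
    ocr: Callable[[Sequence], str],
) -> List[Dict]:
    """Run text region detection and OCR per invoice, then package the results."""
    results = []
    for invoice_type, box in detections:
        invoice_img = crop(report_image, box)
        regions = find_text_regions(invoice_img)               # S71
        texts = [ocr(crop(invoice_img, r)) for r in regions]   # S72
        results.append(InvoiceResult(invoice_type, box, texts))  # S73
    return [asdict(r) for r in results]


# Stand-ins: a character grid plays the role of the image, and "OCR" simply
# joins and strips the characters in the cropped region.
report = ["TAXI 12", "METRO 5"]
detections = [("taxi", (0, 0, 7, 1)), ("metro", (0, 1, 7, 2))]
packaged = recognize_invoices(
    report,
    detections,
    find_text_regions=lambda img: [(0, 0, 5, 1), (5, 0, 7, 1)],
    ocr=lambda region: "".join(region).strip(),
)
print(packaged)
```

Returning plain dictionaries keeps the packaged result easy to serialize, for example as JSON, though the claim does not fix an output format.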
7. An invoice recognition system based on deep learning, characterized by comprising:
a sample construction unit, configured to obtain multiple sample expense reports annotated with invoice types and position coordinates, and to construct a training dataset and a validation dataset;
an invoice detection model training unit, configured to train, based on the training dataset, a plurality of invoice detection models, each combining one of a plurality of pre-networks with the Faster-RCNN framework;
an invoice detection model screening unit, configured to evaluate each invoice detection model on the validation dataset and select the optimal invoice detection model;
an invoice detection unit, configured to detect an invoice expense report using the optimal invoice detection model, and to identify each invoice image in the invoice expense report together with its invoice type and position coordinates;
a recognition output unit, configured to perform text recognition on the invoice image corresponding to each set of position coordinates using an OCR model, and then to package the recognized text content, invoice type and/or position coordinates of each invoice as an invoice recognition result for output.
8. The system according to claim 7, characterized in that, between the sample construction unit and the invoice detection model training unit, the system further comprises:
an image processing unit, configured to acquire images of the sample expense reports in the training dataset and in the validation dataset respectively, and to normalize each image according to preset parameters;
wherein the normalization processing comprises one or more of image cropping, image size adjustment, image rotation adjustment, image brightness or contrast adjustment, and image hue or saturation adjustment.
9. The system according to claim 7, characterized in that, between the invoice detection unit and the recognition output unit, the system further comprises:
a manual box-selection module, configured to manually annotate, with bounding boxes, the continuous text regions in the sample expense report images of the training dataset;
a text region detection model training module, configured to train, based on a deep neural network, a text region detection model capable of identifying the box-annotated regions.
10. The system according to claim 7, characterized in that the recognition output unit comprises:
a box-selection annotation module, configured to detect the invoice image corresponding to each set of position coordinates with the text region detection model, and to output the box-annotated regions therein;
an OCR model module, configured to perform text recognition in turn on all box-annotated regions in each invoice image, obtaining the recognized text content of all box-annotated regions in that invoice image;
a recognition output module, configured to associate, package and output the recognized text content, invoice type and/or position coordinates corresponding to each invoice image respectively.
CN201910161502.5A 2019-03-04 2019-03-04 A kind of invoice recognition methods and system based on deep learning Pending CN109977957A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910161502.5A CN109977957A (en) 2019-03-04 2019-03-04 A kind of invoice recognition methods and system based on deep learning

Publications (1)

Publication Number Publication Date
CN109977957A (en) 2019-07-05

Family

ID=67077833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910161502.5A Pending CN109977957A (en) 2019-03-04 2019-03-04 A kind of invoice recognition methods and system based on deep learning

Country Status (1)

Country Link
CN (1) CN109977957A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104751194A (en) * 2015-04-27 2015-07-01 陈包容 Processing method and processing device for financial expense reimbursement
CN106709917A (en) * 2017-01-03 2017-05-24 青岛海信医疗设备股份有限公司 Neural network model training method, device and system
CN107192690A (en) * 2017-05-19 2017-09-22 重庆大学 Near infrared spectrum Noninvasive Blood Glucose Detection Methods and its detection network model training method
CN107194400A (en) * 2017-05-31 2017-09-22 北京天宇星空科技有限公司 A kind of finance reimbursement unanimous vote is according to picture recognition processing method
CN108171127A (en) * 2017-12-13 2018-06-15 广东电网有限责任公司清远供电局 A kind of invoice automatic identifying method based on deep learning
CN108664897A (en) * 2018-04-18 2018-10-16 平安科技(深圳)有限公司 Bank slip recognition method, apparatus and storage medium
CN109017797A (en) * 2018-08-17 2018-12-18 大陆汽车投资(上海)有限公司 Driver's Emotion identification method and the vehicular control unit for implementing this method
CN109034155A (en) * 2018-07-24 2018-12-18 百卓网络科技有限公司 A kind of text detection and the method and system of identification
CN109064304A (en) * 2018-08-03 2018-12-21 四川长虹电器股份有限公司 Finance reimbursement bill automated processing system and method
CN109086756A (en) * 2018-06-15 2018-12-25 众安信息技术服务有限公司 A kind of text detection analysis method, device and equipment based on deep neural network
CN109145981A (en) * 2018-08-17 2019-01-04 上海非夕机器人科技有限公司 Deep learning automation model training method and equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
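Tang Liqun et al.: "Detection Framework Based on the Cascade Algorithm", in "Analysis of Digital Image Pattern Recognition Methods" *
Zhao Zhifang et al.: "Remote Sensing Data Processing and Remote Sensing Image Map Production", in "Research on Mineralization Remote Sensing Anomaly Information" *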

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472524A (en) * 2019-07-25 2019-11-19 广东工业大学 Invoice information management method, system and readable medium based on deep learning
CN110472524B (en) * 2019-07-25 2022-09-13 广东工业大学 Invoice information management method and system based on deep learning and readable medium
CN110705533A (en) * 2019-09-09 2020-01-17 武汉联析医疗技术有限公司 AI recognition and grabbing system for inspection report
CN110728566A (en) * 2019-09-17 2020-01-24 卓尔智联(武汉)研究院有限公司 Data processing method and device in reimbursement file, computer equipment and storage medium
CN110728566B (en) * 2019-09-17 2022-08-02 卓尔智联(武汉)研究院有限公司 Data processing method and device in reimbursement file, computer equipment and storage medium
CN110807455A (en) * 2019-09-19 2020-02-18 平安科技(深圳)有限公司 Bill detection method, device and equipment based on deep learning and storage medium
CN111104844A (en) * 2019-10-12 2020-05-05 中国平安财产保险股份有限公司 Multi-invoice information input method and device, electronic equipment and storage medium
CN111104844B (en) * 2019-10-12 2023-11-14 中国平安财产保险股份有限公司 Multi-invoice information input method and device, electronic equipment and storage medium
CN110909733A (en) * 2019-10-28 2020-03-24 世纪保众(北京)网络科技有限公司 Template positioning method and device based on OCR picture recognition and computer equipment
CN110866495B (en) * 2019-11-14 2022-06-28 杭州睿琪软件有限公司 Bill image recognition method, bill image recognition device, bill image recognition equipment, training method and storage medium
CN110866495A (en) * 2019-11-14 2020-03-06 杭州睿琪软件有限公司 Bill image recognition method, bill image recognition device, bill image recognition equipment, training method and storage medium
CN111160188A (en) * 2019-12-20 2020-05-15 中国建设银行股份有限公司 Financial bill identification method, device, equipment and storage medium
CN111222572A (en) * 2020-01-06 2020-06-02 紫光云技术有限公司 Office scene-oriented optical character recognition method
CN111368828A (en) * 2020-02-27 2020-07-03 大象慧云信息技术有限公司 Multi-bill identification method and device
CN111546804B (en) * 2020-04-08 2021-03-23 远光软件股份有限公司 Automatic original bill pasting method and device
CN111546804A (en) * 2020-04-08 2020-08-18 远光软件股份有限公司 Automatic original bill pasting method and device
CN111652162A (en) * 2020-06-08 2020-09-11 成都知识视觉科技有限公司 Text detection and identification method for medical document structured knowledge extraction
US11687704B2 (en) 2020-06-12 2023-06-27 Beijing Baidu Netcom Science Technology Co., Ltd. Method, apparatus and electronic device for annotating information of structured document
CN111966432A (en) * 2020-06-30 2020-11-20 北京百度网讯科技有限公司 Verification code processing method and device, electronic equipment and storage medium
CN111966432B (en) * 2020-06-30 2023-07-28 北京百度网讯科技有限公司 Verification code processing method and device, electronic equipment and storage medium
CN111931780A (en) * 2020-08-10 2020-11-13 福建博思软件股份有限公司 Intelligent management method and equipment for accounting documents
CN112464941A (en) * 2020-10-23 2021-03-09 北京思特奇信息技术股份有限公司 Invoice identification method and system based on neural network
CN112464941B (en) * 2020-10-23 2024-05-24 北京思特奇信息技术股份有限公司 Invoice identification method and system based on neural network
CN112347994B (en) * 2020-11-30 2022-04-22 四川长虹电器股份有限公司 Invoice image target detection and angle detection method based on deep learning
CN112347994A (en) * 2020-11-30 2021-02-09 四川长虹电器股份有限公司 Invoice image target detection and angle detection method based on deep learning
CN112541461A (en) * 2020-12-21 2021-03-23 四川新网银行股份有限公司 Automatic auditing method and device for consumption credentials without fixed format template
CN112801041A (en) * 2021-03-08 2021-05-14 北京市商汤科技开发有限公司 Financial data reimbursement method, device, equipment and storage medium
CN113129285A (en) * 2021-04-20 2021-07-16 国网山东省电力公司安丘市供电公司 Method and system for verifying regional protection pressing plate
CN113205133A (en) * 2021-04-30 2021-08-03 成都国铁电气设备有限公司 Tunnel water stain intelligent identification method based on multitask learning
CN113205133B (en) * 2021-04-30 2024-01-26 成都国铁电气设备有限公司 Tunnel water stain intelligent identification method based on multitask learning
CN113657162A (en) * 2021-07-15 2021-11-16 福建新大陆软件工程有限公司 Bill OCR recognition method based on deep learning
CN115035510A (en) * 2022-08-11 2022-09-09 深圳前海环融联易信息科技服务有限公司 Text recognition model training method, text recognition device, and medium
CN117114910A (en) * 2023-09-22 2023-11-24 浙江河马管家网络科技有限公司 Automatic ticket business accounting system and method based on machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190705