CN114913538A

CN114913538A - Multi-class invoice identification method and system based on deep learning

Info

Publication number: CN114913538A
Application number: CN202210546183.1A
Authority: CN
Inventors: 郭庆汝; 孙卫超; 赵振江
Original assignee: Shandong Guozi Software Co ltd
Current assignee: Shandong Guozi Software Co ltd
Priority date: 2022-05-19
Filing date: 2022-05-19
Publication date: 2022-08-16

Abstract

The invention belongs to the field of image processing and provides a method and a system for identifying multi-class invoices based on deep learning. The method comprises: obtaining an invoice to be processed; applying an invoice detection model to the invoice to be processed to obtain an invoice area and an invoice category; preprocessing the invoice to be processed; applying, to the preprocessed invoice, the invoice-specific text region detection model corresponding to its category to obtain important text block regions, which are further segmented into single-line text regions; applying a text recognition model to the single-line text regions to obtain text information; and performing regularized correction on the field information in the text information to obtain structured invoice text information. The contents of railway tickets, linked value-added tax invoices (special invoices, common invoices and electronic invoices), roll-type value-added tax invoices and quota invoices can thereby be identified with high accuracy.

Description

Multi-class invoice identification method and system based on deep learning

Technical Field

The invention belongs to the field of image processing, and particularly relates to a multi-class invoice identification method and system based on deep learning.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

Invoice reimbursement is an important component of financial management and a basis for demonstrating its quality and service level. However, most invoice reimbursement work is still done manually, which adds a large amount of repetitive work for tax staff and also takes up much of the claimants' time and energy. An analysis of existing invoice identification methods reveals the following main problems:

1. Current invoice types include linked value-added tax invoices (special invoices, common invoices and electronic invoices), roll-type value-added tax invoices, train tickets, taxi tickets and quota invoices. Existing invoice identification systems recognize text only on invoices of a single, known type; they cannot identify several invoices of unknown and differing types at the same time, so their functionality is limited.

2. In actual invoice reimbursement, several invoices of different types often need to be laid flat and pasted onto one sheet of A4 paper for identification. Existing invoice identification systems can only recognize the text of a single invoice in a captured picture and require manual adjustment, so they are limited in function and are neither intelligent nor automated.

3. Existing invoice recognition systems can only correct slightly tilted invoices at the preprocessing stage. When an invoice is placed rotated by 90, 180 or 270 degrees, it cannot be processed automatically and must be corrected manually, which increases the workload of staff.

4. Existing invoice identification systems only recognize the text of a single invoice and cannot classify linked value-added tax invoices (special invoices, common invoices and electronic invoices), roll-type value-added tax invoices, train tickets, taxi tickets and quota invoices, which hinders classifying and filing invoices.

5. When locating specific areas of an invoice, the target detection algorithms used by existing invoice identification systems run slowly and have high missed-detection and false-detection rates, so invoice identification takes a long time and its accuracy is limited.

6. The special, common and electronic invoices among linked value-added tax invoices differ slightly in structural layout, and existing region detection processes are deficient in this respect, so identification accuracy is low.

Disclosure of Invention

To solve the technical problems described in the background, the invention provides a multi-class invoice identification method and system based on deep learning. The invention uses a large number of invoices from real financial reimbursement scenarios to build a labeled training data set covering linked value-added tax invoices, roll-type value-added tax invoices, train tickets, taxi tickets and quota invoices, and trains a Yolov5 invoice detection model on it. The resulting detection model can detect invoices laid flat and pasted on A4 paper in real scenarios and assign their categories, so that text content information is obtained with very high confidence and good generalization.

In order to achieve the purpose, the invention adopts the following technical scheme:

A first aspect of the invention provides a multi-class invoice identification method based on deep learning.

A multi-category invoice identification method based on deep learning comprises the following steps:

acquiring an invoice to be processed;

based on the invoice to be processed, adopting an invoice detection model to obtain an invoice area and an invoice category;

preprocessing the invoice to be processed;

based on the preprocessed invoice, combining the type of the invoice, and adopting an invoice specific text region detection model corresponding to the type to obtain an important text block region, and further segmenting into single-line text regions; based on the single-line text area, adopting a text recognition model to obtain text information; and correcting the field information based on the text information to obtain invoice structured text information.
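As a non-limiting illustration, the sequence of steps above can be sketched as a small orchestration function. The sketch assumes that each stage is supplied as a callable; the names `Detection`, `InvoiceResult`, `recognize_invoices` and all parameter names are hypothetical and do not come from the disclosure, and the lambdas at the end are toy stand-ins for trained models.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Detection:
    category: str  # invoice category predicted by the invoice detection model
    crop: Any      # cropped invoice area (any image representation)


@dataclass
class InvoiceResult:
    category: str
    fields: Dict[str, str] = field(default_factory=dict)


def recognize_invoices(
    image: Any,
    detect: Callable[[Any], List[Detection]],
    preprocess: Callable[[Any], Any],
    region_models: Dict[str, Callable[[Any], Dict[str, Any]]],
    recognize: Callable[[Any], str],
    correct: Callable[[str, str], str],
) -> List[InvoiceResult]:
    """Run detection, preprocessing, region detection, recognition, correction."""
    results = []
    for det in detect(image):
        clean = preprocess(det.crop)
        # The region model is chosen by invoice category and returns a
        # mapping from field name to single-line text region.
        line_regions = region_models[det.category](clean)
        fields = {
            name: correct(name, recognize(region))
            for name, region in line_regions.items()
        }
        results.append(InvoiceResult(det.category, fields))
    return results


# Toy stand-ins so the sketch runs end to end (assumptions, not real models).
demo_results = recognize_invoices(
    image="a4-scan",
    detect=lambda img: [Detection("quota_invoice", "crop-0")],
    preprocess=lambda crop: crop,
    region_models={"quota_invoice": lambda crop: {"invoice_number": " 1234 5678 "}},
    recognize=lambda region: region,
    correct=lambda name, text: text.replace(" ", ""),
)
```

Passing the stages in as callables keeps the per-category region models interchangeable, which mirrors the way the method selects a text region detection model by invoice category.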

A second aspect of the invention provides a deep learning-based multi-class invoice recognition system.

A deep learning based multi-category invoice recognition system, comprising:

a data acquisition module configured to: acquiring an invoice to be processed;

a category identification module configured to: based on the invoice to be processed, adopting an invoice detection model to obtain an invoice area and type;

a pre-processing module configured to: preprocessing the invoice to be processed;

a text recognition module configured to: based on the preprocessed invoice, combining the type corresponding to the invoice, and adopting an invoice specific text region detection model corresponding to the type to obtain an important text block region, and further segmenting into single-line text regions; based on the single-line text region, adopting a text recognition model to obtain text information; based on the text information, regularized correction is carried out on the field information to obtain invoice structured text information.

A third aspect of the invention provides a computer-readable storage medium.

A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, carries out the steps of the deep learning based multi-class invoice recognition method according to the first aspect above.

A fourth aspect of the invention provides a computer apparatus.

A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the deep learning based multi-class invoice recognition method according to the first aspect when executing the program.

Compared with the prior art, the invention has the beneficial effects that:

(1) The invention can simultaneously identify a plurality of invoices of unknown and differing types, such as linked value-added tax invoices (special invoices, common invoices and electronic invoices), roll-type value-added tax invoices, train tickets, taxi tickets and quota invoices.

(2) The invention can automatically perform text recognition on a plurality of invoices of different types on one A4 paper without manual adjustment.

(3) The invention can automatically correct the angle of the invoice without manual intervention.

(4) The invention can classify and identify linked value-added tax invoices (special invoices, common invoices and electronic invoices), roll-type value-added tax invoices, train tickets, taxi tickets and quota invoices, which facilitates classified filing of invoices.

(5) When locating specific areas of an invoice, the target detection algorithm used by the invention runs fast and has low missed-detection and false-detection rates, so invoice identification takes little time and is highly accurate.

(6) The invention can distinguish the differences in structural layout among the special, common and electronic invoices of the linked value-added tax invoice, and achieves high identification accuracy.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention. They illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention and not to limit it.

Fig. 1 is a flowchart illustrating a method for identifying multi-class invoices based on deep learning according to an embodiment of the present invention.

Detailed Description

The invention is further described with reference to the following figures and examples.

It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.

It is noted that the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods and systems according to various embodiments of the present disclosure. It should be noted that each block in the flowchart or block diagrams may represent a module, a segment, or a portion of code, which may comprise one or more executable instructions for implementing the logical function specified in the respective embodiment. It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Example one

As shown in fig. 1, the present embodiment provides a method for identifying multiple classes of invoices based on deep learning. The embodiment is illustrated by applying the method to a server; it is to be understood that the method may also be applied to a terminal, or to a system including a terminal and a server, in which case it is implemented through interaction between the terminal and the server. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in this application. In this embodiment, the method includes the steps of:

acquiring an invoice to be processed;

based on the invoice to be processed, adopting a Yolov5 invoice detection model to obtain an invoice area and category;

wherein the categories of the invoice include: linked value-added tax invoices (special invoices, common invoices and electronic invoices), roll-type value-added tax invoices, train tickets, taxi tickets and quota invoices.
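By way of illustration only, the raw boxes produced by a detector of this kind are commonly filtered by confidence and by non-maximum suppression before the invoice areas are cropped. The sketch below is a generic, class-agnostic version of that post-processing written in plain Python; it is not taken from the disclosure, and the `Box` type, the category strings and both thresholds are assumptions.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float
    score: float
    category: str  # hypothetical label such as "train_ticket"


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes: List[Box], score_thr: float = 0.5,
                      iou_thr: float = 0.45) -> List[Box]:
    """Drop low-confidence boxes, then keep the best box of each overlapping group."""
    kept: List[Box] = []
    candidates = sorted((b for b in boxes if b.score >= score_thr),
                        key=lambda b: b.score, reverse=True)
    for box in candidates:
        if all(iou(box, k) < iou_thr for k in kept):
            kept.append(box)
    return kept


raw_boxes = [
    Box(0, 0, 100, 50, 0.90, "train_ticket"),
    Box(5, 2, 102, 52, 0.60, "quota_invoice"),  # overlaps the first box
    Box(0, 100, 200, 220, 0.80, "vat_roll"),
    Box(300, 300, 320, 310, 0.20, "taxi_ticket"),  # below the score threshold
]
invoice_boxes = filter_detections(raw_boxes)
```

Suppression is applied across categories here on the assumption that invoices pasted on one sheet do not overlap, so two strongly overlapping boxes are treated as competing guesses for the same invoice.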

Preprocessing the invoice to be processed;

the preprocessing comprises the steps of carrying out edge detection, contour detection, affine transformation and angle rotation correction on the invoice to be processed to obtain the preprocessed invoice.
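As a non-limiting sketch of the affine-transformation step, the following code assumes that edge and contour detection have already produced the four corner points of the invoice. It estimates the tilt from the top edge and builds a 2x3 affine matrix that rotates the invoice back to level. The function names and the sample corner coordinates are hypothetical; a practical implementation would typically apply an equivalent matrix to the image pixels with an image library.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def skew_angle(top_left: Point, top_right: Point) -> float:
    """Angle (radians) of the invoice's top edge relative to the x axis."""
    return math.atan2(top_right[1] - top_left[1], top_right[0] - top_left[0])


def rotation_matrix(angle: float, center: Point) -> List[List[float]]:
    """2x3 affine matrix that rotates points by `angle` radians about `center`."""
    c, s = math.cos(angle), math.sin(angle)
    cx, cy = center
    return [[c, -s, cx - c * cx + s * cy],
            [s, c, cy - s * cx - c * cy]]


def apply_affine(m: List[List[float]], p: Point) -> Point:
    """Apply a 2x3 affine matrix to a single point."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


# Corners of a slightly tilted invoice: top-left, top-right, bottom-right, bottom-left.
corners = [(10.0, 20.0), (110.0, 30.0), (105.0, 80.0), (5.0, 70.0)]
center = (sum(x for x, _ in corners) / 4, sum(y for _, y in corners) / 4)

# Rotating by the negative of the measured tilt levels the top edge.
deskew = rotation_matrix(-skew_angle(corners[0], corners[1]), center)
leveled = [apply_affine(deskew, p) for p in corners]
```

This step only removes small tilts; rotations by 90, 180 or 270 degrees are handled separately by the text direction model described later in this embodiment.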

Based on the preprocessed invoice and the category of the invoice, adopting the Yolov5 text region detection model corresponding to that category to obtain important structured text block regions, which are further segmented into single-line text regions;

the invoice text region detection models comprise: a train ticket text region detection model, a linked value-added tax invoice (special invoice, common invoice and electronic invoice) text region detection model, a roll-type value-added tax invoice text region detection model, a quota invoice text region detection model and a taxi ticket text region detection model.

Because the structural layouts of the train ticket, the quota invoice and the taxi ticket are fixed and their important fields are all single-line text, single-line text regions are obtained directly by detection and segmentation with separately trained Yolov5 single-line text detection models. Because the text of a few fields of the roll-type value-added tax invoice wraps onto more than one line, text block regions are first obtained with a Yolov5 text region detection model trained for that invoice, and a CTPN text detection model is then used to segment the multi-line text in each block into single-line text regions.

The detection process for linked value-added tax invoices (special invoices, common invoices and electronic invoices) is as follows. The field layouts of the linked value-added tax special invoice and the linked value-added tax common invoice are fixed and consistent with each other, so the two share one trained Yolov5 text region detection model. The field layout of the linked value-added tax electronic invoice differs from that of the other two, so a separate Yolov5 text region detection model is trained for it. Each invoice is passed through its respective Yolov5 text region detection model to obtain text block regions and, because field text may wrap onto more than one line, the CTPN text detection model is used to segment the multi-line text in each block into single-line text regions.
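The selection of a text region detection model by invoice category, as described above, can be sketched as a lookup table. This is a minimal illustration: the category keys, the model identifiers and the `RegionRoute` type are hypothetical names chosen for the example, and only the sharing and splitting relationships are taken from the description.

```python
from typing import Dict, NamedTuple


class RegionRoute(NamedTuple):
    region_model: str       # identifier of the Yolov5 text region model to load
    needs_line_split: bool  # True when a CTPN pass must split multi-line blocks


# Hypothetical identifiers; special and common invoices share one model.
ROUTES: Dict[str, RegionRoute] = {
    "train_ticket": RegionRoute("yolov5_train_ticket_lines", False),
    "taxi_ticket": RegionRoute("yolov5_taxi_ticket_lines", False),
    "quota_invoice": RegionRoute("yolov5_quota_invoice_lines", False),
    "vat_roll": RegionRoute("yolov5_vat_roll_blocks", True),
    "vat_special": RegionRoute("yolov5_vat_special_common_blocks", True),
    "vat_common": RegionRoute("yolov5_vat_special_common_blocks", True),
    "vat_electronic": RegionRoute("yolov5_vat_electronic_blocks", True),
}


def route_for(category: str) -> RegionRoute:
    """Return the region model and splitting requirement for an invoice category."""
    try:
        return ROUTES[category]
    except KeyError:
        raise ValueError(f"unsupported invoice category: {category!r}") from None
```

Keeping the routing in data rather than in branching code makes it straightforward to add a further invoice category with its own trained model.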

The linked value-added tax special, common and electronic invoices look similar (they differ only in the layout of a few fields) and are therefore hard to classify directly. To address this, the system first uses the trained Yolov5 text region detection model for linked value-added tax special and common invoices to detect and extract the invoice title area (namely the 'invoice type' area), then recognizes the text in the title with the trained text recognition model (CRNN + CTC) to obtain the specific type of the invoice, and finally obtains the text block regions with the Yolov5 text region detection model corresponding to that type, which improves classification and identification accuracy.
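As an illustrative sketch of the last part of this step, the recognized title text can be mapped to one of the three linked value-added tax invoice types by keyword. The keywords and the returned labels below are assumptions made for the example (they reflect the usual wording of printed invoice titles); the disclosure does not specify how the title text is mapped to a type.

```python
def vat_subtype_from_title(title_text: str) -> str:
    """Map recognized invoice title text to a hypothetical subtype label."""
    text = "".join(title_text.split())  # drop whitespace introduced by recognition
    # "电子" is checked first because an electronic invoice title
    # usually also contains "普通" (common).
    if "电子" in text:
        return "vat_electronic"
    if "专用" in text:
        return "vat_special"
    if "普通" in text:
        return "vat_common"
    return "unknown"
```

For example, under these assumptions `vat_subtype_from_title("增值税 专用发票")` yields `"vat_special"`, and a title that matches no keyword yields `"unknown"` so that the invoice can be sent for manual review instead of being routed to the wrong model.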

Based on a single-line text region, adopting a text recognition model (CRNN + CTC) to obtain text information;
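For illustration, the CTC part of a CRNN + CTC recognizer is commonly decoded greedily: the most likely class is taken for each frame, consecutive repeats are collapsed, and blank symbols are removed. The sketch below shows this generic decoding rule in plain Python. It assumes that class index 0 is the blank symbol and that the remaining indices map onto `alphabet` in order; both conventions are assumptions, not details from the disclosure.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    chars: List[str] = []
    previous = None
    for index in best:
        if index != previous and index != BLANK:
            chars.append(alphabet[index - 1])  # index 0 is reserved for the blank
        previous = index
    return "".join(chars)


def one_hot(index: int, size: int) -> List[float]:
    """Toy per-frame score vector with all mass on one class."""
    return [1.0 if i == index else 0.0 for i in range(size)]


digits = "0123456789"
# Per-frame best classes: 2 2 blank 2 3 blank blank 1
frames = [one_hot(i, len(digits) + 1) for i in (2, 2, 0, 2, 3, 0, 0, 1)]
decoded = ctc_greedy_decode(frames, digits)
```

The blank between the two runs of class 2 is what allows a repeated character, such as the doubled digit in this toy sequence, to survive the collapse step.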

based on the text information, regularized information correction is carried out on the field information to obtain invoice structured text information.
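If the regularized correction is read as rule-based or regular-expression-based cleanup of each field, it could look like the following minimal sketch. The substitution table for look-alike characters, the accepted digit lengths and the output formats are assumptions chosen for the example; the disclosure does not list the actual rules.

```python
import re
from typing import Optional, Set

# Assumed look-alike substitutions for numeric fields.
_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"})


def correct_digits(raw: str, lengths: Set[int]) -> Optional[str]:
    """Normalize a numeric field; return None if the length is implausible."""
    cleaned = re.sub(r"\s+", "", raw).translate(_DIGIT_FIXES)
    digits = re.sub(r"\D", "", cleaned)
    return digits if len(digits) in lengths else None


def correct_date(raw: str) -> Optional[str]:
    """Normalize dates such as '2022年05月19日' or '2022/5/19' to YYYY-MM-DD."""
    match = re.search(r"(\d{4})\D+(\d{1,2})\D+(\d{1,2})", raw)
    if not match:
        return None
    year, month, day = (int(g) for g in match.groups())
    if not (1 <= month <= 12 and 1 <= day <= 31):
        return None
    return f"{year:04d}-{month:02d}-{day:02d}"


def correct_amount(raw: str) -> Optional[str]:
    """Extract an amount in figures and format it with two decimal places."""
    cleaned = raw.replace(",", "").replace("，", "")
    match = re.search(r"\d+(?:\.\d{1,2})?", cleaned)
    return f"{float(match.group()):.2f}" if match else None
```

Returning `None` for values that fail validation, instead of a best guess, lets the caller flag the field for manual review.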

The specific scheme of the implementation can be realized by referring to the following contents:

As noted in the background, existing invoice recognition systems can only correct slightly tilted invoices at the preprocessing stage; when an invoice is placed rotated by 90°, 180° or 270°, it cannot be processed automatically and must be corrected manually, which increases the workload of staff. To address this, the present embodiment trains a text direction detection model on a large amount of invoice data, and the resulting model can classify whether an invoice is placed at 90°, 180° or 270°.

In this embodiment, image processing and affine transformation are first applied to each single invoice cropped by the Yolov5 invoice detection model to correct its tilt; the trained text direction detection model is then used to obtain the rotation angle of the invoice; and finally the image is rotated accordingly to obtain a corrected invoice.

After processing by the invoice detection model, the text direction detection model and angle rotation correction, a number of randomly placed invoices become individual invoices that can be filed and stored by category, and they are written into a database to facilitate financial management.

Existing invoice identification systems locate specific text areas of an invoice with target detection algorithms that run slowly, have high missed-detection and false-detection rates, and take a long time. To address these problems, the present embodiment uses the better-performing Yolov5 target detection algorithm to detect the specific text areas in the invoice.
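The final rotation step can be illustrated as follows. The sketch assumes that the text direction model reports how far the invoice has been rotated clockwise (0, 90, 180 or 270 degrees) and represents the image as a list of pixel rows; the function names and the clockwise convention are assumptions made for the example.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values


def rotate_cw(image: Image, turns: int = 1) -> Image:
    """Rotate an image clockwise by `turns` quarter turns."""
    for _ in range(turns % 4):
        # Reversing the rows and transposing is a 90-degree clockwise rotation.
        image = [list(row) for row in zip(*image[::-1])]
    return image


def correct_orientation(image: Image, placed_cw_degrees: int) -> Image:
    """Undo a clockwise placement of 0, 90, 180 or 270 degrees."""
    if placed_cw_degrees not in (0, 90, 180, 270):
        raise ValueError("orientation must be one of 0, 90, 180, 270")
    # Completing the full turn clockwise is equivalent to rotating back.
    return rotate_cw(image, (4 - placed_cw_degrees // 90) % 4)


upright = [[1, 2, 3],
           [4, 5, 6]]
placed = rotate_cw(upright, 1)              # invoice placed rotated 90 degrees
restored = correct_orientation(placed, 90)  # back to the upright layout
```

Because only quarter turns are involved, the correction is lossless: no interpolation is needed, unlike the small-angle tilt correction performed beforehand.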
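As a non-limiting illustration of filing corrected invoices by category, the sketch below writes one row per invoice into a database using Python's built-in `sqlite3` module. The table layout, column names and file paths are hypothetical; the disclosure does not specify a database or schema.

```python
import sqlite3
from typing import Dict, Iterable, Tuple


def file_invoices(conn: sqlite3.Connection,
                  invoices: Iterable[Tuple[str, str]]) -> None:
    """Store (category, image_path) pairs, one row per corrected invoice."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices ("
        "id INTEGER PRIMARY KEY, category TEXT NOT NULL, image_path TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO invoices (category, image_path) VALUES (?, ?)", invoices
    )
    conn.commit()


def count_by_category(conn: sqlite3.Connection) -> Dict[str, int]:
    """Return the number of filed invoices per category."""
    rows = conn.execute("SELECT category, COUNT(*) FROM invoices GROUP BY category")
    return dict(rows)


conn = sqlite3.connect(":memory:")  # in-memory database for the example
file_invoices(conn, [
    ("train_ticket", "crops/0001.png"),
    ("vat_roll", "crops/0002.png"),
    ("train_ticket", "crops/0003.png"),
])
filed_counts = count_by_category(conn)
```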

First, the important field information in train tickets, taxi tickets and quota invoices is single-line text. The important field areas of the train ticket are therefore labeled as 'invoice number, ticket checking place, listing number, originating station, destination station, departure time, seat number, seat type, ticket price, ticket purchaser information and invoice information'; the taxi ticket fields are labeled as 'invoice code, invoice number, date, departure time, getting-off time and ticket price'; and the quota invoice fields are labeled as 'invoice type, invoice code and invoice number'.

Secondly, the important fields of the roll-type value-added tax invoice are labeled as: invoice code, invoice number, seller name and taxpayer identification number, invoicing date, purchaser name and taxpayer identification number, amount in words (uppercase), amount in figures (lowercase) and check code. Because the text of a few fields wraps onto more than one line, text block regions are first obtained with the Yolov5 text region detection model trained for this invoice; a CTPN text detection model then detects the multi-line text in the seller and purchaser blocks and segments it into single-line text regions, which avoids missed text caused by line wrapping.

Thirdly, because important field text in linked value-added tax invoices (special invoices, common invoices and electronic invoices) and roll-type value-added tax invoices may wrap onto more than one line, the present embodiment uses a Yolov5 text region detection model to perform a first-stage segmentation of the invoice text regions, which yields the text block regions. The text areas of the linked value-added tax common invoice are labeled as: invoice code, invoice type, invoice number, amount in words (uppercase), amount in figures (lowercase), invoicing date, check code, pre-tax amount, buyer information and seller information. The text areas of the linked value-added tax special invoice are labeled as: invoice type, invoice number, check code, invoice date, buyer information, amount in words (uppercase), amount in figures (lowercase), pre-tax amount and seller information. The text areas of the linked value-added tax electronic invoice are labeled as: invoice type, invoice code, invoice number, check code, buyer information, amount in words (uppercase), amount in figures (lowercase), pre-tax amount and seller information.

Each invoice is passed through the Yolov5 text region detection model to obtain its 'invoice type' text region, and the type of the invoice (common, special or electronic) is then obtained by the text recognition model (CRNN + CTC). Comparing the layouts of the three invoices shows that the common invoice and the special invoice have the same field layout, whereas the electronic invoice differs in the layout of the upper right corner. The common invoice and the special invoice can therefore share one text region detection model, and the electronic invoice has its own text region detection model.

Passing the invoice data through the first-stage Yolov5 text region detection model yields the corresponding text block regions, with much higher processing speed and positioning accuracy than existing positioning methods. However, important field text in the 'buyer information' and 'seller information' blocks may wrap onto more than one line, so a finer second-stage segmentation is performed on this basis: a CTPN text detection model together with image processing techniques produces better-quality single-line text regions, and logical analysis is then applied to divide the 'buyer name, seller name, address, telephone, account number and taxpayer identification number' in the block into single-line text regions.

After invoices of different types pass through their respective text region detection models, the corresponding text region blocks are obtained. A single-line text detection model is then combined with traditional image processing techniques to obtain clean single-line text regions. In this embodiment, a text recognition model trained on a large data set recognizes the single-line text regions to obtain the corresponding structured text information. After the structured text information undergoes the corresponding regularized correction, invoice text content is identified with high accuracy.
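The logical analysis mentioned above is not detailed in the description. One possible reading, shown purely as an assumption, is to order the recognized lines of a buyer or seller block from top to bottom, start a new field whenever a line begins with a printed field label, and treat unlabeled lines as wrapped continuations of the previous field. The label strings, field keys and placeholder sample values below are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumed printed labels inside a buyer or seller block and hypothetical field keys.
LABELS: Dict[str, str] = {
    "名称": "name",
    "纳税人识别号": "taxpayer_id",
    "地址、电话": "address_phone",
    "开户行及账号": "bank_account",
}


def group_party_lines(lines: List[Tuple[float, str]]) -> Dict[str, str]:
    """Group (y_center, text) lines of one block into labeled fields."""
    fields: Dict[str, str] = {}
    current: Optional[str] = None
    for _, text in sorted(lines, key=lambda item: item[0]):  # top to bottom
        text = text.strip()
        for label, key in LABELS.items():
            if text.startswith(label):
                current = key
                fields[current] = text[len(label):].lstrip(":： ")
                break
        else:
            # No label found: treat the line as a wrapped continuation.
            if current is not None:
                fields[current] += text
    return fields


# Placeholder values only; the lines arrive in arbitrary order.
buyer_lines = [
    (40.0, "地址、电话:1 Example Road,"),
    (10.0, "名称:Example Software Co"),
    (25.0, "纳税人识别号:000000000000000000"),
    (52.0, "Example City 000-0000000"),
    (66.0, "开户行及账号:Example Bank 0000000000"),
]
buyer_fields = group_party_lines(buyer_lines)
```

This reading relies on the labels being inside the detected block; if the region model crops them out, a purely positional rule would be needed instead.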

Example two

The embodiment provides a multi-class invoice recognition system based on deep learning.

A deep learning-based multi-category invoice recognition system, comprising:

a data acquisition module configured to: acquiring an invoice to be processed;

a category identification module configured to: obtaining the category of the invoice by adopting an invoice detection model based on the invoice to be processed;

It should be noted here that the examples and application scenarios implemented by the data acquisition module, the category identification module, the preprocessing module and the text recognition module are the same as those of the corresponding steps in the first embodiment, but are not limited to the disclosure of the first embodiment. It should also be noted that the modules described above, as part of a system, may be implemented in a computer system, for example as a set of computer-executable instructions.

Example three

The present embodiment provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps in the deep learning-based multi-class invoice recognition method as described in the first embodiment above.

Example four

The embodiment provides a computer device, which includes a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps in the deep learning-based multi-class invoice identification method according to the first embodiment.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A multi-class invoice identification method based on deep learning is characterized by comprising the following steps:

acquiring an invoice to be processed;

obtaining the type of the invoice by adopting an invoice detection model based on the invoice to be processed;

preprocessing the invoice to be processed;

based on the preprocessed invoice, combining the type corresponding to the invoice, and adopting an invoice specific text region detection model corresponding to the type to obtain an important text block region, and further segmenting into single-line text regions; based on the single-line text region, adopting a text recognition model to obtain text information; based on the text information, regularized correction is carried out on the field information to obtain invoice structured text information.

2. The deep learning-based multi-class invoice recognition method of claim 1, wherein the classes of invoices include: train ticket, linked value-added tax invoice, roll-type value-added tax invoice and quota invoice.

3. The method according to claim 1, wherein the preprocessing comprises performing edge detection, contour detection, affine transformation and/or angular rotation correction on the invoice to be processed to obtain the preprocessed invoice.

4. The deep learning-based multi-class invoice recognition method of claim 1, wherein the invoice text region detection model comprises: the system comprises a train ticket text region detection model, a linked value-added tax invoice text region detection model, a roll-type value-added tax invoice text region detection model and a quota invoice text region detection model.

5. The deep learning-based multi-class invoice recognition method according to claim 4, characterized in that the roll-type value-added tax invoice text region detection model detection process comprises: based on the preprocessed roll type value-added tax invoice, a text region detection model of the roll type value-added tax invoice is adopted to obtain a text information block region, and an image processing and single-line text detection model are adopted to obtain a single-line text region for the text information block region.

6. The deep learning-based multi-class invoice recognition method of claim 4, wherein the detection process of the linked value-added tax invoice text region detection model comprises: based on the preprocessed invoice, performing text recognition on the invoice title area by adopting the linked value-added tax invoice text region detection model to obtain the category of the linked value-added tax invoice; the categories of linked value-added tax invoices comprise linked value-added tax common invoices, linked value-added tax special invoices and linked value-added tax electronic invoices;

for the linked value-added tax common invoice and the linked value-added tax special invoice, a text region detection model for linked value-added tax common and special invoices is adopted to obtain a text information block region and a single-line text region.

7. The deep learning-based multi-class invoice recognition method of claim 6, wherein, for the linked value-added tax electronic invoice, a text region detection model for linked value-added tax electronic invoices is adopted to obtain a text information block region and a single-line text region; and based on the text information block region, image processing and a single-line text detection model are adopted to obtain the single-line text region.

8. A deep learning-based multi-category invoice recognition system, comprising:

a data acquisition module configured to: acquiring an invoice to be processed;

a category identification module configured to: obtaining the type of the invoice by adopting an invoice detection model based on the invoice to be processed;

9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the deep learning based multi-class invoice recognition method according to any one of claims 1-7.

10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps in the deep learning based multi-class invoice recognition method according to any one of claims 1-7 when executing the program.