CN111753744A - Method, device and equipment for classifying bill images and readable storage medium

Method, device and equipment for classifying bill images and readable storage medium

Info

Publication number
CN111753744A
Authority
CN
China
Prior art keywords: bill, image, recognition result, handwriting, result
Prior art date
Legal status
Granted
Application number
CN202010595991.8A
Other languages
Chinese (zh)
Other versions
CN111753744B (en)
Inventor
杜泓江
庞敏辉
谢国斌
陈兴波
李丹青
曲福
韩光耀
冯博豪
杨舰
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010595991.8A
Publication of CN111753744A
Application granted
Publication of CN111753744B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/30 Writer recognition; Reading and verifying signatures
    • G06V 40/33 Writer recognition; Reading and verifying signatures based only on signature image, e.g. static signature recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40 Document-oriented image-based pattern recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Character Discrimination (AREA)

Abstract

The embodiment of the application discloses a method and an apparatus for classifying bill images, an electronic device and a computer-readable storage medium, and relates to the technical fields of image processing, cloud computing and natural language processing. One embodiment of the method comprises: acquiring a bill image to be classified; performing a preset image classification operation, an OCR (optical character recognition) operation and a handwriting recognition operation on the bill image to be classified in parallel by using a multithreading technique, to correspondingly obtain an image classification result, an OCR recognition result and a handwriting recognition result, wherein the handwriting recognition operation is completed based on a handwriting recognition model constructed by combining a plurality of sub-models with a voting mechanism, and each sub-model is constructed based on a different handwritten-bill feature; and determining the actual category of the bill image to be classified according to at least one of the image classification result, the OCR recognition result and the handwriting recognition result. This implementation provides a parallel, automatically executable classification scheme that improves both the execution efficiency of the classification operation and the recognition accuracy for handwritten bills.

Description

Method, device and equipment for classifying bill images and readable storage medium
Technical Field
The embodiments of the application relate to the technical field of data classification, in particular to the technical fields of image processing and natural language processing, and specifically to a method and an apparatus for classifying bill images, an electronic device and a computer-readable storage medium.
Background
With the development of electronic information technology, more and more data need to be converted into electronic form for storage.
In the prior art, various paper bills, for example, are still recognized and classified by financial staff by visual inspection, and the electronic entry of the corresponding bill contents is then completed according to the recognized categories.
Disclosure of Invention
The embodiment of the application provides a method and a device for classifying bill images, electronic equipment and a computer-readable storage medium.
In a first aspect, an embodiment of the present application provides a method for classifying a bill image, including: acquiring a bill image to be classified; performing a preset image classification operation, an optical character recognition (OCR) operation and a handwriting recognition operation on the bill image to be classified by using a multithreading technique, to correspondingly obtain an image classification result, an OCR recognition result and a handwriting recognition result, wherein the handwriting recognition operation is completed based on a handwriting recognition model constructed by combining a plurality of sub-models with a voting mechanism, and each sub-model is constructed based on a different handwritten-bill feature; and determining the actual category of the bill image to be classified according to at least one of the image classification result, the OCR recognition result and the handwriting recognition result.
In a second aspect, an embodiment of the present application provides an apparatus for classifying a bill image, including: a to-be-classified bill image acquiring unit configured to acquire a bill image to be classified; a parallel recognition unit configured to perform a preset image classification operation, an optical character recognition (OCR) operation and a handwriting recognition operation on the bill image to be classified simultaneously by using a multithreading technique, to correspondingly obtain an image classification result, an OCR recognition result and a handwriting recognition result, wherein the handwriting recognition operation is completed based on a handwriting recognition model constructed by combining a plurality of sub-models with a voting mechanism, and each sub-model is constructed based on a different handwritten-bill feature; and an actual category determining unit configured to determine the actual category of the bill image to be classified according to at least one of the image classification result, the OCR recognition result and the handwriting recognition result.
In a third aspect, an embodiment of the present application provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for document image classification as described in any one of the implementations of the first aspect when executed.
In a fourth aspect, the present application provides a non-transitory computer-readable storage medium storing computer instructions for enabling a computer to implement the method for classifying a bill image as described in any one of the implementations of the first aspect.
According to the method, the apparatus, the electronic device and the computer-readable storage medium for classifying bill images provided by the embodiments of the application, a bill image to be classified is first acquired; then a preset image classification operation, an optical character recognition (OCR) operation and a handwriting recognition operation are performed on the bill image to be classified simultaneously by using a multithreading technique, to correspondingly obtain an image classification result, an OCR recognition result and a handwriting recognition result, wherein the handwriting recognition operation is completed based on a handwriting recognition model constructed by combining a plurality of sub-models with a voting mechanism, and each sub-model is constructed based on a different handwritten-bill feature; and finally the actual category of the bill image to be classified is determined according to at least one of the image classification result, the OCR recognition result and the handwriting recognition result.
Unlike the existing manual bill image classification operation, the above technical solution provides a parallel, automatically executable bill image classification scheme. Because the image classification operation, the OCR recognition operation and the handwriting recognition operation are executed simultaneously, the execution efficiency of the classification operation can be significantly improved and a category recognition result that is as comprehensive as possible can be obtained, and the recognition accuracy for handwritten bills is improved by the handwriting recognition model constructed from multiple sub-models and a voting mechanism.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture to which the present application may be applied;
FIG. 2 is a flow chart of a method for classifying images of documents according to an embodiment of the present application;
FIG. 3 is a flowchart of a method for performing handwriting recognition operation on a to-be-classified bill image in the method for classifying bill images provided by the embodiment of the application;
FIG. 4 is a flowchart of a method for performing an image classification operation on a to-be-classified document image in the document image classification method provided in the embodiment of the present application;
fig. 5 is a flowchart of a specific implementation of a method for classifying bill images in a specific application scenario according to an embodiment of the present application;
FIG. 6 is a block diagram of an apparatus for classifying bill images according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device suitable for executing a method for classifying bill images according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
FIG. 1 illustrates an exemplary system architecture 100 to which embodiments of the present methods, apparatuses, electronic devices and computer-readable storage media for document image classification may be applied.
As shown in fig. 1, system architecture 100 may include an image capture device 101, a network 102, and a server 103. Network 102 serves as a medium to provide a communication link between image capture device 101 and server 103. Network 102 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use image capture device 101 to interact with server 103 over network 102 to receive or send messages or the like. The image capturing device 101 and the server 103 may be installed with various applications for implementing information communication between the two, such as an image uploading application, a bill image classification application, an instant messaging application, and the like.
The image capturing apparatus 101 and the server 103 may be hardware or software. When the image capturing device 101 is a hardware device, it may be various electronic devices with a display screen and a camera, including but not limited to a smart phone, a tablet computer, a computer, various independent camera devices, and the like; when the image capturing device 101 is software, it may be installed in the electronic devices listed above, and it may be implemented as multiple software or software modules, or may be implemented as a single software or software module, and is not limited in this respect. When the server 103 is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server; when the server 103 is software, it may be implemented as a plurality of software or software modules, or may be implemented as a single software or software module, and is not limited in this respect.
The server 103 can provide various services through various built-in applications. Taking a bill image classification application that provides a bill image classification service as an example, the server 103 can achieve the following effects when running the application: a bill image to be classified is first acquired from the image capture device 101 through the network 102; then a preset image classification operation, an OCR (Optical Character Recognition) operation and a handwriting recognition operation are performed on the bill image to be classified simultaneously by using a multithreading technique, to correspondingly obtain an image classification result, an OCR recognition result and a handwriting recognition result; and finally the actual category of the bill image to be classified is determined according to at least one of the image classification result, the OCR recognition result and the handwriting recognition result. That is, the server 103 outputs the actual category of the bill image to be classified through the automatic execution of the above steps, so as to assist the bill entry personnel in judging the bill category.
It should be noted that, besides being acquired from the image capture device 101 through the network 102, the bill image to be classified may also be stored locally in the server 103 in advance in various ways. Thus, when the server 103 detects that such data is already stored locally (e.g., a previously received bill image classification task awaiting processing), it may choose to retrieve the data directly from the local storage, in which case the exemplary system architecture 100 may not include the image capture device 101 and the network 102.
Since judging the actual category to which a bill image to be classified belongs requires considerable computing resources and computing power, the method for classifying bill images provided in the following embodiments of the present application is generally executed by the server 103, which has stronger computing power and more computing resources, and accordingly the apparatus for classifying bill images is generally disposed in the server 103.
However, it should be noted that when the image capture device 101 also has sufficient computing capability and computing resources, it may complete, through the bill image classification application installed on it, the operations otherwise performed by the server 103 and output the same result as the server 103. This is especially relevant when there are multiple image capture devices with different computing capabilities. For example, when the image capture device on which the bill image classification application runs has strong computing capability and a large amount of spare computing resources, the device may execute the above operations itself, thereby appropriately reducing the computing pressure on the server 103, or it may provide a coarse classification result of the actual category, thereby appropriately reducing the classification pressure on the server 103. Accordingly, the apparatus for classifying bill images may also be disposed in the image capture device 101. In such a case, the exemplary system architecture 100 may not include the server 103 and the network 102.
It should be understood that the numbers of image capture devices, networks and servers in fig. 1 are merely illustrative. There may be any number of image capture devices, networks and servers, as required by the implementation.
With continuing reference to FIG. 2, an implementation flow 200 of one embodiment of the method for classifying bill images according to the present application is shown, comprising the following steps:
step 201: and acquiring a bill image to be classified.
This step is intended to acquire a bill image to be classified by the execution body of the bill image classification (for example, the server 103 shown in fig. 1). A bill image to be classified is any bill image whose category needs to be identified, including images of electronic bills and of paper bills.
It should be noted that the bill image to be classified can be obtained by the execution body directly from a local storage device, or from a non-local storage device (for example, the image capture device 101 shown in fig. 1). The local storage device may be a data storage module arranged in the execution body, such as a server hard disk, in which case the bill image to be classified can be read locally and quickly; the non-local storage device may be any other electronic device configured to store data, such as certain user terminals, in which case the execution body can obtain the bill image to be classified by receiving a bill image classification request sent by that electronic device.
Step 202: and simultaneously carrying out preset image classification operation, OCR (optical character recognition) operation and handwriting recognition operation on the bill image to be classified by utilizing a multithreading technology to correspondingly obtain an image classification result, an OCR recognition result and a handwriting recognition result.
On the basis of step 201, this step is intended for the execution body to process the bill image to be classified with the image classification operation, the OCR recognition operation and the handwriting recognition operation in parallel by using a multithreading technique, so as to obtain the image classification result, the OCR recognition result and the handwriting recognition result as close to simultaneously as possible.
The multithreading technique creates a plurality of sub-threads within one process and executes them at the same time to achieve a parallel-execution effect. For example, one bill classification process is created, an image classification thread, an OCR recognition thread and a handwriting recognition thread are created within it, the image to be classified is used simultaneously as the input of the three threads, and the three threads are controlled to execute in parallel. In some programming language environments, however, true parallel execution among multiple sub-threads cannot be achieved; instead the sub-threads are executed in rapid rotation, which creates the illusion of parallel execution without genuinely improving execution efficiency.
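As an illustration only, not the patent's reference implementation, the parallel dispatch described above can be sketched in Python roughly as follows. The functions classify_image, run_ocr and detect_handwriting are hypothetical stand-ins for the three models, and for CPU-bound models true parallelism would typically require processes, as in the scenario of fig. 5.

```python
# Minimal sketch of the multithreaded dispatch; the three stand-in functions below are
# placeholders for the image classification, OCR and handwriting recognition models.
from concurrent.futures import ThreadPoolExecutor

def classify_image(bill_image):      # placeholder: image classification model
    return {"category": "vat_invoice_catering", "score": 0.92}

def run_ocr(bill_image):             # placeholder: OCR recognition model
    return {"has_text": True, "text": "..."}

def detect_handwriting(bill_image):  # placeholder: handwriting recognition model
    return {"is_handwritten": False, "score": 0.08}

def recognize_in_parallel(bill_image):
    """Run the three recognition operations concurrently and gather their results."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(fn, bill_image)
                   for fn in (classify_image, run_ocr, detect_handwriting)]
        image_result, ocr_result, handwriting_result = [f.result() for f in futures]
    return image_result, ocr_result, handwriting_result
```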
The OCR recognition operation aims to identify whether recognizable characters exist on the bill to be classified, so as to judge whether the bill image to be classified is a real bill image: the presence of a large amount of meaningful text, and of structured relations among the texts, is a significant difference between a bill image and a non-bill image. The handwriting recognition operation aims to judge whether the bill image to be classified is a handwritten bill: compared with a machine-printed bill, a handwritten bill exhibits various handwriting-related characteristics, such as smearing, handwriting strokes and ink-penetration characteristics, and the handwriting recognition operation can be carried out by recognizing these characteristics that distinguish it from a machine-printed bill. The purpose of the image classification operation is to provide a reference opinion on which category of bill the bill image to be classified belongs to, for example whether it is a value-added tax invoice with a specified format and, further, whether it is a catering-type or import-export-type invoice among the value-added tax invoices.
In order to achieve the best possible classification or recognition effect, each of the image classification operation, the OCR recognition operation and the handwriting recognition operation can be performed by a pre-built deep learning model, for example by a pre-built image classification model, OCR recognition model and handwriting recognition model. Compared with the relatively mature image classification and OCR recognition mechanisms, handwritten bills are far more complex because of the handwriting itself. In order to improve the recognition accuracy of the handwriting recognition model for the image to be classified as much as possible, the handwriting recognition model used in this embodiment is constructed by combining a plurality of sub-models with a voting mechanism, and each sub-model is constructed based on a different handwritten-bill feature. Taking as examples the handwriting-stroke feature, the ink-penetration feature and the smearing feature, each of which can to some extent indicate that a bill is handwritten, a handwriting-stroke discrimination sub-model, an ink-usage discrimination sub-model and a smearing discrimination sub-model are respectively constructed from training samples of the three features. With the help of voting models such as a random forest model and a decision tree model, or of a suitable weighting mechanism, the discrimination results of the sub-models can be well integrated, and a relatively comprehensive and accurate handwriting recognition result is finally obtained.
Step 203: and determining the actual category of the bill image to be classified according to at least one of the image classification result, the OCR recognition result and the handwriting recognition result.
On the basis of step 202, this step is intended for the execution body to determine the actual category of the bill image to be classified according to at least one of the image classification result, the OCR recognition result and the handwriting recognition result. That is, the category of the bill to be classified may be determined from as few as one of the results, or from a combination of all three at most; several meaningful combinations of results and the categories they lead to are given below:
1) When the OCR recognition result is that no text content exists, the bill image to be classified is determined to be a non-bill image, which indicates that the image most likely entered the system by mistake or through an abnormality. 2) When the OCR recognition result is that text content exists and the handwriting recognition result is that handwriting features exist, the bill image to be classified is determined to be a handwritten bill: the text content indicates that the image is a bill, and on that basis the presence of handwriting features indicates that it is handwritten. Because handwritten bills follow no uniform standard and are highly arbitrary, it is of little use for a machine or device to further determine the specific category of a handwritten bill, so handwritten bills are usually not further subdivided. 3) When the OCR recognition result is that text content exists and the handwriting recognition result is that no handwriting feature exists, the bill image to be classified is determined to be a machine-printed bill, and the image classification result is taken as the real category of the machine-printed bill.
In order to further improve the accuracy with which the image classification result sub-classifies machine-printed bills, a preset expert knowledge base can be used to identify whether a machine-printed bill is a value-added tax invoice, and the image classification result is taken as the real category only for machine-printed bills that are value-added tax invoices. The expert knowledge base is abstracted and constructed from multiple pieces of expert knowledge that strongly identify value-added tax invoices.
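A minimal sketch of the decision rules 1) to 3) above, together with the expert-knowledge refinement, is given below. The dictionary fields and the is_vat_invoice callback are illustrative assumptions, not the patent's actual data structures.

```python
# Sketch of the category decision; field names and is_vat_invoice() are assumptions.
def decide_actual_category(image_result, ocr_result, handwriting_result,
                           is_vat_invoice=lambda ocr: True):
    # Case 1: no recognizable text at all -> not a bill image.
    if not ocr_result["has_text"]:
        return "non_bill_image"
    # Case 2: text plus handwriting features -> handwritten bill, no further subdivision.
    if handwriting_result["is_handwritten"]:
        return "handwritten_bill"
    # Case 3: text without handwriting -> machine-printed bill; use the image
    # classification result as its real category, gated by the expert knowledge base.
    if is_vat_invoice(ocr_result):
        return image_result["category"]
    return "machine_printed_other"
```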
Unlike the existing manual bill image classification operation, the above technical solution provides a parallel, automatically executable bill image classification scheme. Because the image classification operation, the OCR recognition operation and the handwriting recognition operation are executed simultaneously, the execution efficiency of the classification operation can be significantly improved and a category recognition result that is as comprehensive as possible can be obtained, and the recognition accuracy for handwritten bills is improved by the handwriting recognition model constructed from multiple sub-models and a voting mechanism.
On the basis of the above embodiment, in order to explain in more depth how the handwriting recognition operation is implemented by the handwriting recognition model constructed from multiple sub-models and a voting mechanism, the present application also provides, through fig. 3, a flow 300 of a method for performing the handwriting recognition operation on the bill image to be classified, i.e. a specific scheme for this operation, including the following steps:
Step 301: extract a handwriting-stroke feature, a machine-printed format feature and a smearing feature from the bill image to be classified.
The handwriting-stroke feature is extracted from the strokes of content written by a person according to his or her own intention and writing habits, and characterizes whether the strokes are handwritten, so as to distinguish them from machine-printed strokes; the smearing feature characterizes the smearing operations a person performs on written content because of mistakes. Whereas the handwriting-stroke feature and the smearing feature reflect the writer from the positive side, the machine-printed format feature verifies from the opposite side whether format features specific to machine-printed bills can be extracted from the bill image to be classified, so as to judge in the reverse direction whether the image is an image of a handwritten bill.
Step 302: respectively and correspondingly inputting the handwriting characteristics, the mechanical printing format characteristics and the smearing characteristics into a handwriting recognition submodel, a mechanical printing format recognition submodel and a smearing recognition submodel of the handwriting recognition model to correspondingly obtain a handwriting recognition result, a mechanical printing format recognition result and a smearing recognition result;
the handwriting characteristics, the machine format characteristics and the smearing characteristics extracted in the step 301 are all actual characteristics extracted from the note images to be classified, and the corresponding handwriting recognition submodel, the machine format recognition submodel and the smearing recognition submodel are obtained based on training sample training and are used for judging the category of the actual characteristics.
Step 303: and performing handwritten bill voting on the handwriting recognition result, the machine-made format recognition result and the smearing recognition result through at least one preset voting model, and taking the obtained voting result as a handwriting recognition result.
On the basis of step 302, the step aims to perform handwritten bill voting on the handwriting recognition result, the machine format recognition result and the smearing recognition result by at least one preset voting model, and taking the obtained comprehensive voting result as a handwriting recognition result.
To improve the accuracy of the voting result as much as possible, the voting model may include at least one of the following: a random forest model, a LightGBM decision tree model (an improved decision tree model) and an XGBoost decision tree model (another improved decision tree model), all of which perform better than other common voting models. Correspondingly, when the number of voting models is greater than 1, the method further comprises: weighting the handwritten-bill voting results of the different voting models.
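The weighted voting over the three sub-model results might look roughly like the sketch below, assuming the sub-model outputs are stacked into a small feature vector and scored by three fitted voters; the toy training data and the weights are purely illustrative assumptions.

```python
# Sketch of weighted voting over the sub-model results; toy data and weights are assumed.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier

def weighted_handwriting_vote(stroke_score, format_score, smear_score, voters, weights):
    """Combine the three sub-model scores into a final handwritten/machine-printed vote."""
    x = np.array([[stroke_score, format_score, smear_score]])
    probs = [clf.predict_proba(x)[0, 1] for clf in voters]  # P(handwritten) per voter
    return float(np.dot(weights, probs)) >= 0.5

# Each row holds the (stroke, format, smear) sub-model scores of one training bill image.
X_votes = np.array([[0.9, 0.1, 0.8], [0.2, 0.9, 0.1], [0.8, 0.2, 0.7], [0.1, 0.8, 0.2]])
y = np.array([1, 0, 1, 0])  # 1 = handwritten bill, 0 = machine-printed bill
voters = [RandomForestClassifier(n_estimators=50).fit(X_votes, y),
          LGBMClassifier(n_estimators=50, min_child_samples=1).fit(X_votes, y),
          XGBClassifier(n_estimators=50).fit(X_votes, y)]
print(weighted_handwriting_vote(0.85, 0.15, 0.7, voters, weights=[0.4, 0.3, 0.3]))
```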
On the basis of all the beneficial effects of the previous embodiment, this embodiment provides a specific implementation of the handwriting recognition operation performed on the bill image to be classified in step 202. Three different features are selected, from the positive and the opposite side, for identifying handwritten bills, and in order to better integrate the discrimination results of the three features, the integration is implemented by voting with at least one voting model, so that the discrimination accuracy of the handwriting recognition result is improved as much as possible.
It should be noted that this embodiment is a lower-level scheme of the handwriting recognition operation performed on the bill image to be classified in step 202, and the other steps and operations in the process 200 remain unchanged; it is presented separately only to highlight the specific implementation of the handwriting recognition operation, and the technical scheme of steps 301 to 303 can be fully combined with the process 200 to form a new complete embodiment.
On the basis of any of the above embodiments, in order to improve the accuracy of the result obtained by the image classification operation as much as possible, this embodiment further provides a flow 400 of a method for performing an image classification operation on a to-be-classified bill image through fig. 4, including the following steps:
step 401: inputting a bill image to be classified into a preset image classification model;
step 402: extracting features by using a residual error network in the image classification model to obtain actual features;
in order to solve and avoid the problems of gradient diffusion/explosion and Network accuracy degradation caused by excessive iteration as much as possible, a Residual Network (often abbreviated as ResNet) is used as the feature extraction Network in the embodiment.
Step 403: performing semantic representation processing on the actual features by using a semantic representation sub-model in the image classification model to obtain semantic representation features;
on the basis of step 402, this step aims to perform semantic representation processing on the actual features by using a BERT (Bidirectional encoder representation from transforms, which is translated into Bidirectional encoder tokens from a transformer, and is generally called a semantic representation model) model, so as to obtain semantic representation features.
The BERT model is used here because the problems of gradient vanishing/explosion and network accuracy degradation that the residual network solves were originally identified in the image processing field: features in that field are 'narrow and deep', and the residual network handles these problems well for such features. Features in the conventional natural language processing field, by contrast, are 'wide and shallow' and cannot fully exploit the strengths of the residual network; after processing with the BERT model, however, the features also take on a 'narrow and deep' character, so that the residual network can be combined well with the natural language processing field.
Step 404: and processing the semantic representation characteristics by using a multivariate classification network in the image classification model, and taking the obtained processing result as an image classification result.
On the basis of step 403, this step is intended to process the semantic representation features using a Softmax network (a multi-class classification network) and to take the resulting output as the image classification result. Compared with a conventional binary classification network, which can only distinguish two classes, the Softmax network is suitable for multi-class classification; since machine-printed bills have a large number of sub-categories, an accurate classification result can be better output through the Softmax model.
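A minimal PyTorch sketch of this pipeline, under the assumption that the residual-network features are projected to the BERT hidden size and fed to BERT as a one-token embedded sequence, is shown below; the dimensions, the projection layer and the pretrained checkpoint name are illustrative choices, not details given by the patent.

```python
# Sketch of steps 401-404: ResNet features -> BERT semantic representation -> softmax.
import torch
import torch.nn as nn
import torchvision.models as tvm
from transformers import BertModel

class BillImageClassifier(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        resnet = tvm.resnet50(weights=None)
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop the fc head
        self.proj = nn.Linear(2048, 768)              # map visual features to BERT width
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        self.head = nn.Linear(768, num_classes)       # multi-class (softmax) head

    def forward(self, images):                        # images: (B, 3, H, W)
        feats = self.backbone(images).flatten(1)      # step 402: actual features (B, 2048)
        tokens = self.proj(feats).unsqueeze(1)        # pseudo-sequence of length 1
        sem = self.bert(inputs_embeds=tokens).last_hidden_state[:, 0]  # step 403 features
        return torch.softmax(self.head(sem), dim=-1)  # step 404: classification result
```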
On the basis of all the beneficial effects of the above embodiments, this embodiment provides a specific implementation of the image classification operation performed on the bill image to be classified in step 202. Through the residual network, the BERT model and the multi-class classification network, it provides a high-accuracy recognition and classification scheme that combines image recognition and natural language processing, thereby improving the discrimination accuracy of the image classification result as much as possible.
It should also be noted that this embodiment is a lower-level scheme of the image classification operation performed on the bill image to be classified in step 202, and the other steps and operations in the process 200 remain unchanged; it is presented separately only to highlight the specific implementation of the image classification operation, and the technical scheme of steps 401 to 404 can be fully combined with the process 200 to form a new complete embodiment.
For further understanding, the present application also provides a specific implementation scheme in conjunction with a specific application scenario, please refer to the timing diagram shown in fig. 5.
Step 501-1: the main process sends the received bill image to be classified to an image classification process;
step 501-2: the main process sends the received bill image to be classified to an OCR recognition process;
step 501-3: the main process sends the received bill image to be classified to the handwriting recognition process;
steps 501-1, 501-2 and 501-3 are performed simultaneously under the control of the main process, and an image classification process, an OCR recognition process and a handwriting recognition process are created in advance.
Step 502-1: the image classification process executes image classification operation on the received image to be classified to obtain an image classification result;
step 502-2: the OCR recognition process executes OCR recognition operation on the received image to be classified to obtain an OCR recognition result;
step 502-3: the handwriting recognition process executes handwriting recognition operation on the received image to be classified to obtain a handwriting recognition result;
the processes are performed simultaneously, via steps 502-1, 502-2 and 502-3, respectively, in response to receipt of a document image to be sorted.
Step 503-1: the image classification process returns the image classification result to the main process;
step 503-2: the OCR recognition process returns an OCR recognition result to the main process;
step 503-3: the handwriting recognition process returns the handwriting recognition result to the main process;
step 504: the main process integrates the results to determine the actual category of the bill image to be classified;
specifically, the results input by each process may form a tuple in the form of (picture id, process id, and specific content of the identification result) and be pushed into a shared queue maintained by the main process, and then the main process determines according to the process id and the specific content of the identification result recorded in the shared queue. Therefore, as the main process continuously receives new bill images to be classified, the image classification process, the OCR recognition process and the handwriting recognition process can be continuously executed.
Step 505: the host process outputs the actual class.
With further reference to fig. 6, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an apparatus for classifying bill images, which corresponds to the embodiment of the method shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 6, the apparatus 600 for classifying bill images of the present embodiment may include: a to-be-classified bill image acquiring unit 601, a parallel recognition unit 602 and an actual category determining unit 603. The to-be-classified bill image acquiring unit 601 is configured to acquire a bill image to be classified; the parallel recognition unit 602 is configured to perform a preset image classification operation, an OCR recognition operation and a handwriting recognition operation on the bill image to be classified simultaneously by using a multithreading technique, to correspondingly obtain an image classification result, an OCR recognition result and a handwriting recognition result, wherein the handwriting recognition operation is completed based on a handwriting recognition model constructed by combining a plurality of sub-models with a voting mechanism, and each sub-model is constructed based on a different handwritten-bill feature; and the actual category determining unit 603 is configured to determine the actual category of the bill image to be classified according to at least one of the image classification result, the OCR recognition result and the handwriting recognition result.
In the apparatus 600 for classifying bill images of the present embodiment, the detailed processing of the to-be-classified bill image acquiring unit 601, the parallel recognition unit 602 and the actual category determining unit 603, and the technical effects thereof, can refer to the related descriptions of steps 201 to 203 in the embodiment corresponding to fig. 2, and are not described again here.
In some optional implementations of this embodiment, the parallel recognition unit 602 comprises a handwriting recognition subunit configured to: extract a handwriting-stroke feature, a machine-printed format feature and a smearing feature from the bill image to be classified; input the handwriting-stroke feature, the machine-printed format feature and the smearing feature respectively into the handwriting-stroke recognition sub-model, the machine-printed format recognition sub-model and the smearing recognition sub-model of the handwriting recognition model, to correspondingly obtain a handwriting-stroke recognition result, a machine-printed format recognition result and a smearing recognition result; and perform handwritten-bill voting on the handwriting-stroke recognition result, the machine-printed format recognition result and the smearing recognition result through at least one preset voting model, taking the obtained voting result as the handwriting recognition result.
In some optional implementations of this embodiment, the voting model includes at least one of: a random forest model, a LightGBM decision tree model and an XGBoost decision tree model; correspondingly, when the number of voting models is greater than 1, the method further comprises: weighting the handwritten-bill voting results of the different voting models.
In some optional implementations of this embodiment, the parallel recognition unit 602 may include an image classification subunit, which may be configured to: inputting a bill image to be classified into a preset image classification model; extracting features by using a residual error network in the image classification model to obtain actual features; performing semantic representation processing on the actual features by using a semantic representation sub-model in the image classification model to obtain semantic representation features; and processing the semantic representation characteristics by using a multivariate classification network in the image classification model, and taking the obtained processing result as an image classification result.
In some optional implementations of this embodiment, the actual category determining unit 603 may be further configured to: when the OCR recognition result is that no text content exists, determine that the bill image to be classified is a non-bill image; when the OCR recognition result is that text content exists and the handwriting recognition result is that handwriting features exist, determine that the bill image to be classified is a handwritten bill; and when the OCR recognition result is that text content exists and the handwriting recognition result is that no handwriting feature exists, determine that the bill image to be classified is a machine-printed bill and take the image classification result as the real category of the machine-printed bill.
In some optional implementations of this embodiment, the apparatus 600 for classifying bill images may further include: a value-added tax invoice determining unit configured to determine, by using a preset expert knowledge base, whether the machine-printed bill is a value-added tax invoice; and a value-added tax invoice category determining unit configured to take the image classification result as the real category of the value-added tax invoice when the machine-printed bill is a value-added tax invoice.
This embodiment is the apparatus embodiment corresponding to the above method embodiment. Unlike the existing manual bill image classification operation, the apparatus for classifying bill images provided by this embodiment provides, through the above technical solution, a parallel and automatically executable bill image classification scheme: by executing the image classification operation, the OCR recognition operation and the handwriting recognition operation simultaneously, the execution efficiency of the classification operation can be significantly improved and a category recognition result that is as comprehensive as possible can be obtained, and the recognition accuracy for handwritten bills is improved by the handwriting recognition model constructed from multiple sub-models and a voting mechanism.
According to an embodiment of the present application, an electronic device and a computer-readable storage medium are also provided.
FIG. 7 shows a block diagram of an electronic device suitable for use in implementing the method for document image classification of embodiments of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 7, the electronic device includes: one or more processors 701, a memory 702, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 7, one processor 701 is taken as an example.
The memory 702 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for document image classification provided herein. A non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method for document image classification provided herein.
The memory 702, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for classifying bill images in the embodiments of the present application (for example, the to-be-classified bill image acquiring unit 601, the parallel recognition unit 602 and the actual category determining unit 603 shown in fig. 6). The processor 701 executes the various functional applications and data processing of the server, namely implements the method for classifying bill images in the above method embodiments, by running the non-transitory software programs, instructions and modules stored in the memory 702.
The memory 702 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store various types of data created by the electronic device in performing the method for classifying the bill image, and the like. Further, the memory 702 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 702 optionally includes memory located remotely from the processor 701, which may be connected via a network to an electronic device adapted to perform a method for document image classification. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device adapted to perform the method for document image classification may further comprise: an input device 703 and an output device 704. The processor 701, the memory 702, the input device 703 and the output device 704 may be connected by a bus or other means, and fig. 7 illustrates an example of a connection by a bus.
The input device 703 may receive input numeric or character information and generate key signal inputs related to user settings and function control of an electronic device suitable for performing the method for classifying bill images, and may be, for example, a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball or a joystick. The output device 704 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The parallel, automatically executable bill image classification scheme provided by the embodiments executes the image classification operation, the OCR recognition operation and the handwriting recognition operation simultaneously, which can significantly improve the execution efficiency of the classification operation and also obtain a category recognition result that is as comprehensive as possible, while the handwriting recognition model constructed from multiple sub-models and a voting mechanism improves the recognition accuracy for handwritten bills.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (14)

1. A method for classifying a bill image, comprising:
acquiring a bill image to be classified;
performing a preset image classification operation, an optical character recognition (OCR) operation and a handwriting recognition operation on the bill image to be classified by using a multithreading technique, to correspondingly obtain an image classification result, an OCR recognition result and a handwriting recognition result; wherein the handwriting recognition operation is completed based on a handwriting recognition model constructed by combining a plurality of sub-models with a voting mechanism, and each sub-model is constructed based on a different handwritten-bill feature;
and determining the actual category of the bill image to be classified according to at least one of the image classification result, the OCR recognition result and the handwriting recognition result.
2. The method of claim 1, wherein performing the preset handwriting recognition operation on the bill image to be classified to obtain the handwriting recognition result comprises:
extracting handwriting features, machine-printed format features and smearing features from the bill image to be classified;
inputting the handwriting features, the machine-printed format features and the smearing features into a handwriting recognition sub-model, a machine-printed format recognition sub-model and a smearing recognition sub-model of the handwriting recognition model respectively, to correspondingly obtain a handwriting recognition sub-result, a machine-printed format recognition sub-result and a smearing recognition sub-result;
and performing handwritten bill voting on the handwriting recognition sub-result, the machine-printed format recognition sub-result and the smearing recognition sub-result through at least one preset voting model, and taking the obtained voting result as the handwriting recognition result.
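By way of example only, the sketch below feeds the three sub-model scores into a single random forest acting as the voting model; the scores and toy labels are fabricated for illustration, and the real sub-models and training data of the embodiment are not reproduced here.

    # Sketch: one voting model over the outputs of the three sub-models.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Each row holds the scores of the handwriting, machine-printed format and
    # smearing sub-models for one bill image (fabricated values).
    sub_model_scores = np.array([
        [0.87, 0.10, 0.65],
        [0.05, 0.92, 0.03],
    ])
    labels = np.array([1, 0])   # 1 = handwritten bill, 0 = not handwritten (toy labels)

    voting_model = RandomForestClassifier(n_estimators=100, random_state=0)
    voting_model.fit(sub_model_scores, labels)         # trained offline in practice
    handwriting_recognition_result = voting_model.predict(sub_model_scores)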
3. The method of claim 2, wherein the voting model comprises at least one of: a random forest model, a LightGBM decision tree model and an XGBoost decision tree model;
correspondingly, when the number of the voting models is greater than 1, the method further comprises:
and carrying out weighting processing on the handwritten bill voting results of different voting models.
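When several voting models are used, their votes can be merged with a simple weighted average, as in the sketch below; the weights, probabilities and the 0.5 threshold are placeholder values chosen for the example rather than parameters taken from the embodiment.

    # Sketch: weight the handwritten-bill probabilities of several voting models.
    import numpy as np

    def weighted_vote(probabilities, weights):
        """Weighted average of per-model probabilities that a bill is handwritten."""
        probabilities = np.asarray(probabilities, dtype=float)
        weights = np.asarray(weights, dtype=float)
        weights = weights / weights.sum()      # normalize so the weights sum to 1
        return float(np.dot(weights, probabilities))

    # e.g. random forest, LightGBM and XGBoost outputs for one bill image
    p_handwritten = weighted_vote([0.82, 0.74, 0.91], weights=[0.3, 0.3, 0.4])
    is_handwritten = p_handwritten >= 0.5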
4. The method of claim 1, wherein the step of performing a preset image classification operation on the to-be-classified bill image to obtain an image classification result comprises:
inputting the bill image to be classified into a preset image classification model;
extracting features by using a residual network in the image classification model to obtain actual features;
performing semantic representation processing on the actual features by utilizing a semantic representation sub-model in the image classification model to obtain semantic representation features;
and processing the semantic representation features by using a multi-class classification network in the image classification model, and taking the obtained processing result as the image classification result.
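A minimal PyTorch sketch of the pipeline in claim 4 is given below: a residual network extracts features, a small fully connected layer stands in for the semantic representation sub-model, and a linear head acts as the multi-class classification network. The ResNet-50 backbone, the layer sizes and the class count are assumptions made for illustration.

    # Sketch: residual network -> semantic representation -> multi-class head.
    import torch
    import torch.nn as nn
    import torchvision.models as models

    class BillClassifier(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            backbone = models.resnet50(weights=None)            # residual network
            self.features = nn.Sequential(*list(backbone.children())[:-1])
            self.semantic = nn.Sequential(                      # semantic representation
                nn.Flatten(), nn.Linear(2048, 512), nn.ReLU())
            self.head = nn.Linear(512, num_classes)             # multi-class network

        def forward(self, x):
            actual_features = self.features(x)
            semantic_features = self.semantic(actual_features)
            return self.head(semantic_features)                 # image classification result

    model = BillClassifier(num_classes=10)
    logits = model(torch.randn(1, 3, 224, 224))                 # one dummy bill image
    predicted_category = logits.argmax(dim=1)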
5. The method of any of claims 1 to 4, wherein determining the actual category of the bill image to be classified according to at least one of the image classification result, the OCR recognition result and the handwriting recognition result comprises:
when the OCR recognition result indicates that there is no text content, determining that the bill image to be classified is a non-bill image;
when the OCR recognition result indicates that there is text content and the handwriting recognition result indicates that there are handwriting features, determining that the bill image to be classified is a handwritten bill;
and when the OCR recognition result indicates that there is text content and the handwriting recognition result indicates that there are no handwriting features, determining that the bill image to be classified is a machine-printed bill, and taking the image classification result as the actual category of the machine-printed bill.
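The decision rules of claim 5 can be written as a short branch, sketched below with hypothetical result dictionaries; the field names are invented for the example.

    # Sketch: map the three recognition results to the actual bill category.
    def determine_actual_category(image_result, ocr_result, handwriting_result):
        if not ocr_result["has_text"]:
            return "non-bill image"
        if handwriting_result["is_handwritten"]:
            return "handwritten bill"
        # Machine-printed bill: fall back on the image classification result.
        return image_result["category"]

    actual_category = determine_actual_category(
        {"category": "train_ticket"},
        {"has_text": True},
        {"is_handwritten": False},
    )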
6. The method of claim 5, further comprising:
determining, by using a preset expert knowledge base, whether the machine-printed bill is a value-added tax invoice;
and if the machine-printed bill is a value-added tax invoice, taking the image classification result as the actual category of the value-added tax invoice.
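One simple way to realize the expert knowledge base of claim 6 is a set of keyword rules over the OCR text, as sketched below; the keywords are examples only, and a production knowledge base would contain far richer rules.

    # Sketch: keyword rules deciding whether a machine-printed bill is a VAT invoice.
    VAT_KEYWORDS = ("增值税专用发票", "增值税普通发票", "纳税人识别号")

    def is_vat_invoice(ocr_text):
        return any(keyword in ocr_text for keyword in VAT_KEYWORDS)

    sample_ocr_text = "北京增值税普通发票 纳税人识别号: ..."
    if is_vat_invoice(sample_ocr_text):
        # Per claim 6, the image classification result is then kept as the
        # actual category of this value-added tax invoice.
        pass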
7. An apparatus for document image classification, comprising:
a to-be-classified bill image acquiring unit, configured to acquire a bill image to be classified;
a parallel recognition unit, configured to simultaneously perform a preset image classification operation, an optical character recognition (OCR) operation and a handwriting recognition operation on the bill image to be classified by using a multithreading technique, to correspondingly obtain an image classification result, an OCR recognition result and a handwriting recognition result; wherein the handwriting recognition operation is completed based on a handwriting recognition model constructed by combining a plurality of sub-models with a voting mechanism, and each sub-model is constructed based on a different handwritten bill feature;
an actual category determining unit configured to determine an actual category of the to-be-classified bill image according to at least one of the image classification result, the OCR recognition result and the handwriting recognition result.
8. The apparatus of claim 7, wherein the parallel recognition unit comprises a handwriting recognition subunit configured to:
extract handwriting features, machine-printed format features and smearing features from the bill image to be classified;
input the handwriting features, the machine-printed format features and the smearing features into a handwriting recognition sub-model, a machine-printed format recognition sub-model and a smearing recognition sub-model of the handwriting recognition model respectively, to correspondingly obtain a handwriting recognition sub-result, a machine-printed format recognition sub-result and a smearing recognition sub-result;
and perform handwritten bill voting on the handwriting recognition sub-result, the machine-printed format recognition sub-result and the smearing recognition sub-result through at least one preset voting model, and take the obtained voting result as the handwriting recognition result.
9. The apparatus of claim 8, wherein the voting model comprises at least one of: a random forest model, a LightGBM decision tree model and an XGBoost decision tree model;
correspondingly, when the number of the voting models is greater than 1, the handwriting recognition subunit is further configured to:
carry out weighting processing on the handwritten bill voting results of the different voting models.
10. The apparatus of claim 7, wherein the parallel recognition unit comprises an image classification subunit configured to:
input the bill image to be classified into a preset image classification model;
extract features by using a residual network in the image classification model to obtain actual features;
perform semantic representation processing on the actual features by using a semantic representation sub-model in the image classification model to obtain semantic representation features;
and process the semantic representation features by using a multi-class classification network in the image classification model, and take the obtained processing result as the image classification result.
11. The apparatus according to any of claims 7 to 10, wherein the actual category determining unit is further configured to:
when the OCR recognition result indicates that there is no text content, determine that the bill image to be classified is a non-bill image;
when the OCR recognition result indicates that there is text content and the handwriting recognition result indicates that there are handwriting features, determine that the bill image to be classified is a handwritten bill;
and when the OCR recognition result indicates that there is text content and the handwriting recognition result indicates that there are no handwriting features, determine that the bill image to be classified is a machine-printed bill, and take the image classification result as the actual category of the machine-printed bill.
12. The apparatus of claim 11, further comprising:
a value-added tax invoice determination unit, configured to determine, by using a preset expert knowledge base, whether the machine-printed bill is a value-added tax invoice;
and a value-added tax invoice category determination unit, configured to take the image classification result as the actual category of the value-added tax invoice when the machine-printed bill is a value-added tax invoice.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for document image classification of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method for document image classification of any one of claims 1-6.
CN202010595991.8A 2020-06-28 2020-06-28 Method, apparatus, device and readable storage medium for bill image classification Active CN111753744B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010595991.8A CN111753744B (en) 2020-06-28 2020-06-28 Method, apparatus, device and readable storage medium for bill image classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010595991.8A CN111753744B (en) 2020-06-28 2020-06-28 Method, apparatus, device and readable storage medium for bill image classification

Publications (2)

Publication Number Publication Date
CN111753744A true CN111753744A (en) 2020-10-09
CN111753744B CN111753744B (en) 2024-04-16

Family

ID=72677469

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010595991.8A Active CN111753744B (en) 2020-06-28 2020-06-28 Method, apparatus, device and readable storage medium for bill image classification

Country Status (1)

Country Link
CN (1) CN111753744B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110078191A1 (en) * 2009-09-28 2011-03-31 Xerox Corporation Handwritten document categorizer and method of training
CN104916035A (en) * 2015-06-09 2015-09-16 普联软件股份有限公司 Bill recognition system and recognition method based on painting technology
WO2019174130A1 (en) * 2018-03-14 2019-09-19 平安科技(深圳)有限公司 Bill recognition method, server, and computer readable storage medium
CN110785773A (en) * 2018-05-25 2020-02-11 福冈金融集团株式会社 Bill recognition system
CN109636557A (en) * 2018-12-11 2019-04-16 厦门商集网络科技有限责任公司 A kind of intelligent classification bookkeeping methods and equipment based on bank slip recognition
CN110348441A (en) * 2019-07-10 2019-10-18 深圳市华云中盛科技有限公司 VAT invoice recognition methods, device, computer equipment and storage medium
CN110866530A (en) * 2019-11-13 2020-03-06 云南大学 Character image recognition method and device and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈念; 李小新; 徐际宏; 沈佐民: "Handwritten character identification on bills based on Gabor filtering and texture analysis" (基于Gabor滤波和纹理分析的票据手写体字符鉴别), 软件导刊 (Software Guide), No. 09, 30 September 2008 (2008-09-30) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112651397A (en) * 2020-12-24 2021-04-13 上海艾莎医学科技有限公司 Inspection list classification method and device, computer equipment and storage medium
CN112651397B (en) * 2020-12-24 2024-04-26 上海艾莎医学科技有限公司 Inspection sheet classification method, apparatus, computer device, and storage medium
CN112818824A (en) * 2021-01-28 2021-05-18 建信览智科技(北京)有限公司 Extraction method of non-fixed format document information based on machine learning
CN112949450A (en) * 2021-02-25 2021-06-11 北京百度网讯科技有限公司 Bill processing method, bill processing device, electronic device and storage medium
CN112949450B (en) * 2021-02-25 2024-01-23 北京百度网讯科技有限公司 Bill processing method, device, electronic equipment and storage medium
CN113780116A (en) * 2021-08-26 2021-12-10 众安在线财产保险股份有限公司 Invoice classification method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN111753744B (en) 2024-04-16

Similar Documents

Publication Publication Date Title
US10949661B2 (en) Layout-agnostic complex document processing system
CN111753744B (en) Method, apparatus, device and readable storage medium for bill image classification
CN111966890B (en) Text-based event pushing method and device, electronic equipment and storage medium
CN111753717B (en) Method, device, equipment and medium for extracting structured information of text
CN110991456B (en) Bill identification method and device
CN109726712A (en) Character recognition method, device and storage medium, server
CN112507090B (en) Method, apparatus, device and storage medium for outputting information
CN112509690B (en) Method, apparatus, device and storage medium for controlling quality
US9934422B1 (en) Digitized handwriting sample ingestion systems and methods
US11995905B2 (en) Object recognition method and apparatus, and electronic device and storage medium
EP3852007B1 (en) Method, apparatus, electronic device, readable storage medium and program for classifying video
US20220027575A1 (en) Method of predicting emotional style of dialogue, electronic device, and storage medium
CN112541359A (en) Document content identification method and device, electronic equipment and medium
CN112487242A (en) Method and device for identifying video, electronic equipment and readable storage medium
CN111611990A (en) Method and device for identifying table in image
CN111507354A (en) Information extraction method, device, equipment and storage medium
CN113657395A (en) Text recognition method, and training method and device of visual feature extraction model
CN115937887A (en) Method and device for extracting document structured information, electronic equipment and storage medium
CN111008624A (en) Optical character recognition method and method for generating training sample for optical character recognition
CN114418124A (en) Method, device, equipment and storage medium for generating graph neural network model
CN114419636A (en) Text recognition method, device, equipment and storage medium
CN112837466A (en) Bill recognition method, device, equipment and storage medium
CN111552829A (en) Method and apparatus for analyzing image material
US20230048495A1 (en) Method and platform of generating document, electronic device and storage medium
CN112329708A (en) Bill identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant