CN111008635A - OCR-based multi-bill automatic identification method and system - Google Patents
- Publication number
- CN111008635A (application CN201911192294.1A)
- Authority
- CN
- China
- Prior art keywords
- image
- bill
- ocr
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/142—Image acquisition using hand-held instruments; Constructional details of the instruments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The invention discloses an OCR-based multi-bill automatic identification method and system, comprising the following steps: obtaining OCR bill samples; an image acquisition module acquires a bill image to be identified; the bill image is input into an image preprocessing module and processed to obtain a secondary image; a denoising module denoises the secondary image to obtain a standard image; and the standard image is input into a bill recognition module for detection and recognition. The beneficial effect of the invention is that the OCR-based multi-bill automatic recognition method reduces the recognition difference when a plurality of different bills are present in one image.
Description
Technical Field
The invention relates to the technical field of character recognition, and in particular to an OCR-based multi-bill automatic recognition method and an OCR-based multi-bill automatic recognition system.
Background
In recent years, bill identification services have developed rapidly, but the recognition rate remains relatively low, so after automatic recognition, bill-entry personnel must manually verify every recognized field to correct the errors of automatic recognition. Because the recognition rate is low and the manual verification process is time-consuming, the commercial utilization rate of bill recognition services has remained low.
In AI-based intelligent financial reimbursement systems, invoices can be recognized automatically with the help of technologies such as OCR, reducing the data-entry workload of the person filing the reimbursement and the review workload of the auditor, and thereby improving the degree of automation and the efficiency of reimbursement. For a long time, bill recognition engines have not followed a uniform specification; the service APIs provided by different recognition engines differ greatly and are not mutually compatible. Despite the increasing development of electronic payment and electronic bills, traditional paper bills, such as various paper invoices and financial receipts, are still widely used in real work and life. Existing bill recognition targets different types of samples, and its character detection and recognition performance differs greatly between sample types.
Disclosure of Invention
This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.
The present invention has been made in view of the above-mentioned conventional problems.
Therefore, one technical problem solved by the present invention is: providing a method for identifying different types of bills that keeps the recognition difference small when different samples are recognized.
In order to solve the technical problems, the invention provides the following technical scheme: an OCR-based multi-bill automatic identification method comprising the following steps: obtaining an OCR bill sample; an image acquisition module acquires a bill image to be identified; the bill image is input into an image preprocessing module and processed to obtain a secondary image; a denoising module denoises the secondary image to obtain a standard image; and the standard image is input into a bill recognition module for detection and recognition.
As a preferable scheme of the OCR-based multi-bill automatic recognition method of the present invention: the image preprocessing module comprises the following preprocessing steps: rotating or perspective-scaling the bill image; aligning the characters in the bill image along the horizontal and vertical directions after the rotation or perspective scaling; and cropping the aligned image to obtain the secondary image.
As a preferable scheme of the OCR-based multi-bill automatic recognition method of the present invention: the denoising module comprises the following steps: performing decolorizing processing on the secondary image; adjusting the histogram information of the secondary image; retaining light pixels in light areas and dark pixels in dark areas; and obtaining the standard image as a high-contrast sample.
As a preferable scheme of the OCR-based multi-bill automatic recognition method of the present invention: the bill recognition module comprises the following recognition processing steps: analyzing the structure of the standard image containing the characters to be recognized; denoising and correcting the object to be detected using a threshold value; performing row-column segmentation of the text information; and feeding the segmented character images into a recognition model for processing to obtain the character information in the original image.
As a preferable scheme of the OCR-based multi-bill automatic recognition method of the present invention: the recognition model adopts a CTPN algorithm model and comprises the following recognition steps: detecting the unit blocks into which horizontal lines of text are divided in complex scenes; adding a vertical anchor to detect vertical text; learning spatial features and sequence features in the image with a bidirectional LSTM layer; and using regular expressions to find the corresponding meaning of each character in the bill image.
As a preferable scheme of the OCR-based multi-bill automatic recognition method of the present invention: the character segmentation comprises the following steps: cutting out single characters by a non-uniform image segmentation method; obtaining the width of each character using a function, and selecting the group suitable for segmentation from several approximate classifications; and using a CNN algorithm model to recognize the classified group of characters.
As a preferable scheme of the OCR-based multi-bill automatic recognition method of the present invention: the CTPN algorithm model comprises the following steps: using the first 5 convolutional stages of VGG16 to obtain a feature map of size W × H × C; extracting features on the feature map with a 3 × 3 sliding window; predicting the candidate target regions defined by multiple anchors using the extracted features; inputting the extracted features into a bidirectional LSTM layer, which outputs W × 256 results; feeding the result into a 512-dimensional fully connected layer; and finally obtaining the recognition output through classification or regression.
As a preferable scheme of the OCR-based multi-bill automatic recognition method of the present invention: the output comprises the height of the proposal box and the y-axis coordinate of its center, the class information of the k anchors, and the horizontal offset of the proposal box; the class information indicates whether the box contains a character.
As a preferable scheme of the OCR-based multi-bill automatic recognition method of the present invention: the image preprocessing module further comprises the following steps: obtaining the secondary image through uniform sizing and alignment; setting a global threshold T for the secondary image; dividing the image data into two parts by T, namely the group of pixels greater than T and the group of pixels less than T; and setting the pixel values of the group greater than T to white and those of the group less than T to black.
Another technical problem solved by the invention is: providing a system on which the above method is implemented, which keeps the recognition difference small when different samples are recognized.
In order to solve the technical problems, the invention provides the following technical scheme: an OCR-based multi-bill automatic identification system comprising an image acquisition module, an image preprocessing module, a denoising module and a bill identification module; the image acquisition module is used for acquiring a bill image to be identified; the image preprocessing module is used for processing the acquired image to obtain a secondary image; the denoising module is used for denoising the secondary image to obtain a standard image; and the bill identification module is used for detecting and recognizing the standard image to generate an identification result.
The invention has the beneficial effects that: the OCR-based multi-bill automatic recognition method reduces the recognition difference when a plurality of different bills are present in one image.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise. Wherein:
fig. 1 is a schematic flow chart of the CTPN algorithm according to the first embodiment of the present invention;
fig. 2 is a schematic diagram of the CNN algorithm model according to the first embodiment of the present invention;
fig. 3 is a schematic diagram of the bidirectional LSTM structure model according to the first embodiment of the present invention;
fig. 4 is a schematic structural diagram of the overall principle of an OCR-based multi-bill automatic recognition system according to the second embodiment of the present invention;
fig. 5 is a schematic diagram of an actual recognition effect according to the second embodiment of the present invention;
fig. 6 is a schematic diagram of an actual recognition effect on multiple bills according to the second embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, specific embodiments accompanied with figures are described in detail below, and it is apparent that the described embodiments are a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present invention, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
Furthermore, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
The present invention will be described in detail with reference to the drawings, wherein the cross-sectional views illustrating the structure of the device are not enlarged partially in general scale for convenience of illustration, and the drawings are only exemplary and should not be construed as limiting the scope of the present invention. In addition, the three-dimensional dimensions of length, width and depth should be included in the actual fabrication.
Meanwhile, in the description of the present invention, it should be noted that the terms "upper, lower, inner and outer" and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation and operate, and thus, cannot be construed as limiting the present invention. Furthermore, the terms first, second, or third are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected and connected" in the present invention are to be understood broadly, unless otherwise explicitly specified or limited, for example: can be fixedly connected, detachably connected or integrally connected; they may be mechanically, electrically, or directly connected, or indirectly connected through intervening media, or may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Example 1
This embodiment employs OCR (optical character recognition), which refers to the process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates the shapes into computer text by a character recognition method. For printed characters, it is a technology that optically converts the characters in a paper document into an image file of a black-and-white dot matrix, and then converts the characters in the image into a text format through recognition software for further editing and processing by word-processing software. The main indicators for measuring the performance of an OCR system are: the rejection rate, the misrecognition rate, the recognition speed, user-interface friendliness, product stability, usability and feasibility; how to tune the system or use auxiliary information to improve recognition accuracy is a central concern.
Referring to the illustrations of fig. 1 to 3: OCR uses optical technology and computer technology to read characters printed or written on paper and convert them into a form that both a computer and a person can understand. This embodiment provides an OCR-based multi-bill automatic recognition method, which specifically comprises the following steps,
s1: acquiring an OCR bill sample;
s2: the image acquisition module 100 acquires a bill image to be identified;
s3: the bill image is input into an image preprocessing module 200 to be processed to obtain a secondary image; the image pre-processing module 200 in this step comprises the following pre-processing steps,
rotating or perspective zooming the bill image;
aligning the characters in the bill image along the horizontal and vertical directions after rotating or perspective zooming;
the aligned image is cropped to obtain a secondary image.
In order to reduce the recognition difference between different images, the method also comprises the following steps,
obtaining a secondary image through uniform size and alignment;
setting a global threshold value T for the secondary image;
dividing data of the image into two parts by T, wherein the two parts comprise pixel groups larger than T and pixel groups smaller than T;
setting the pixel values of the pixel group larger than T to white and the pixel values of the pixel group smaller than T to black.
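As a minimal sketch (not the patent's implementation), the global-threshold binarization described above can be expressed in a few lines of NumPy; the threshold T is assumed here to be given externally:

```python
import numpy as np

def binarize(image: np.ndarray, T: int) -> np.ndarray:
    """Split the image data into two pixel groups by the global threshold T:
    pixels greater than T become white (255), the rest become black (0)."""
    out = np.zeros_like(image)            # pixel group <= T -> black
    out[image > T] = 255                  # pixel group  > T -> white
    return out

# Tiny synthetic "secondary image": dark text strokes on a light background.
img = np.array([[200, 30, 210],
                [ 40, 220, 35],
                [205, 45, 215]], dtype=np.uint8)
binary = binarize(img, T=128)
print(binary.tolist())  # strokes map to 0, background to 255
```

In practice T would be tuned per document class, or replaced by an adaptive choice such as Otsu's thresholding.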
S4: the denoising module 300 denoises the secondary image to obtain a standard image; the denoising module 300 includes the following steps,
performing decolorizing processing on the secondary image;
adjusting histogram information of the secondary image;
reserving light pixels in the light area and dark pixels in the dark area;
a standard image of the high contrast sample is obtained.
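A hedged sketch of these denoising-module steps, assuming the decolorizing is a luminosity grayscale conversion and the histogram adjustment is a linear contrast stretch (the patent does not fix either choice):

```python
import numpy as np

def decolorize(rgb: np.ndarray) -> np.ndarray:
    # Luminosity grayscale: weights approximate human brightness perception.
    return rgb @ np.array([0.299, 0.587, 0.114])

def stretch_contrast(gray: np.ndarray) -> np.ndarray:
    # Linear histogram stretch: light pixels stay at the light end and dark
    # pixels at the dark end, but the full 0..255 range is now used.
    lo, hi = gray.min(), gray.max()
    return ((gray - lo) / (hi - lo) * 255).astype(np.uint8)

# A 2x2 low-contrast RGB patch (all channels equal for simplicity).
rgb = np.array([[[100, 100, 100], [180, 180, 180]],
                [[120, 120, 120], [160, 160, 160]]], dtype=np.float64)
std = stretch_contrast(decolorize(rgb))
```

The result `std` plays the role of the high-contrast standard image passed on to the bill recognition module.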
S5: the standard image is input into the bill recognition module 400 for detection and recognition.
The ticket identification module 400 includes the following identification processing steps,
analyzing a structure of a standard image containing characters to be recognized;
denoising and correcting the object to be detected by using a threshold value;
performing row-column segmentation on the text information;
and introducing the divided character image into a recognition model for processing to obtain character information in the original image.
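The row-column segmentation step is often implemented with projection profiles; the sketch below (an assumed technique, since the patent does not name a specific algorithm) finds text rows of a binary image by summing text pixels along each row:

```python
import numpy as np

def segment_rows(binary: np.ndarray) -> list:
    """Return (start, end) row-index ranges that contain text pixels.

    `binary` holds 1 for text (black) pixels and 0 for background."""
    profile = binary.sum(axis=1)           # horizontal projection profile
    rows, start = [], None
    for i, count in enumerate(profile):
        if count > 0 and start is None:
            start = i                      # entering a text band
        elif count == 0 and start is not None:
            rows.append((start, i))        # leaving a text band
            start = None
    if start is not None:
        rows.append((start, len(profile)))
    return rows

# Two "text lines" separated by a blank row.
page = np.array([[1, 1, 0],
                 [0, 1, 1],
                 [0, 0, 0],
                 [1, 0, 1]])
print(segment_rows(page))  # [(0, 2), (3, 4)]
```

Applying the same function to the transpose of each detected band yields the column (character) boundaries within that row.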
Furthermore, the identification model adopts a CTPN algorithm model and comprises the following identification steps,
detecting the unit blocks into which horizontal lines of text are divided in a complex scene;
adding a vertical Anchor to detect vertical characters;
learning spatial features and sequence features in the image by using a bidirectional LSTM layer;
the regular expression is used to find the corresponding meaning of each character in the bill image.
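The final step, mapping recognized text back to bill fields with regular expressions, can be sketched as follows; the field patterns (invoice number, date, amount) are hypothetical examples for illustration, not the patent's actual expressions:

```python
import re

# Hypothetical field patterns for a recognized invoice text line.
FIELD_PATTERNS = {
    "invoice_no": re.compile(r"No\.?\s*(\d{8})"),
    "date": re.compile(r"(\d{4}-\d{2}-\d{2})"),
    "amount": re.compile(r"[¥￥]\s*(\d+(?:\.\d{2})?)"),
}

def extract_fields(text: str) -> dict:
    """Map each matched pattern to the meaning of its field."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        m = pattern.search(text)
        if m:
            fields[name] = m.group(1)
    return fields

line = "Invoice No.12345678 2019-11-28 Total ¥ 150.00"
print(extract_fields(line))
```

Unmatched patterns are simply absent from the result, so downstream code can flag missing fields for manual review.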
The text segmentation in this embodiment comprises the following steps,
cutting a single character by an image non-uniform segmentation method;
obtaining the width of each character by using a function, and selecting a group suitable for segmentation from a plurality of approximate classifications;
and using a CNN algorithm model to identify and recognize the classified group of characters.
The CTPN algorithm model includes the following steps,
using the first 5 convolutional stages of VGG16 to obtain a feature map of size W × H × C;
extracting features on the feature map with a 3 × 3 sliding window;
predicting a target to-be-selected area defined by a plurality of anchors by utilizing the extracted features;
inputting the extracted features into a bidirectional LSTM layer, which outputs W × 256 results;
inputting the result to a 512-dimensional full connection layer;
and finally, obtaining the recognized output through classification or regression.
Wherein the output comprises the height of the proposal box and the y-axis coordinate of its center, the class information of the k anchors, and the horizontal offset of the proposal box; the class information indicates whether the box contains a character.
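The tensor shapes in the CTPN steps above can be made concrete with a shape-only mock; random matrices stand in for the VGG16, BiLSTM and fully-connected weights, so only the W × H × C → 256 → 512 → per-anchor output dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Feature map produced by the first 5 conv stages of VGG16 (C = 512).
H, W, C = 14, 14, 512
feature_map = rng.standard_normal((H, W, C))

# 3x3 sliding window at each position -> a 3*3*C feature vector.
padded = np.pad(feature_map, ((1, 1), (1, 1), (0, 0)))
windows = np.stack([padded[i:i + 3, j:j + 3, :].ravel()
                    for i in range(H) for j in range(W)]).reshape(H, W, 9 * C)

# Bidirectional LSTM stand-in: 128 dims per direction -> 256 total,
# i.e. W x 256 results per feature-map row.
blstm_out = windows @ rng.standard_normal((9 * C, 256)) / np.sqrt(9 * C)

# 512-dimensional fully connected layer, then k = 10 anchors per position.
fc_out = blstm_out @ rng.standard_normal((256, 512))
k = 10
heights_y = fc_out @ rng.standard_normal((512, 2 * k))  # height + y-center
scores = fc_out @ rng.standard_normal((512, 2 * k))     # text / non-text class
x_offsets = fc_out @ rng.standard_normal((512, k))      # horizontal offset
```

In the real model the W × 256 outputs come from running a bidirectional LSTM along each feature-map row; the mock only reproduces the dimensions, not the learned sequence features.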
Scene one:
the technical effects adopted in the method are verified and explained, different methods selected in the embodiment and the method are adopted for comparison and test, and the test results are compared by means of scientific demonstration to verify the real effect of the method. The traditional technical scheme has the defect of insufficient accuracy in identification.
In order to verify that the method has higher identification precision compared with other methods.
In this embodiment, the ocr algorithm tesseract of the google open source and the method are adopted to respectively identify different bills, so as to compare the bills.
In the embodiment, 50 high-iron tickets and 50 value-added tax invoices are used as test samples to test the performances of the two methods. tesseract is mainly tested using google open source code. The method herein is tested in the python programming language.
The results are shown in table 1 below.
Table 1: and (6) testing results.
| | Misrecognition rate, 50 high-speed rail tickets | Misrecognition rate, 50 VAT invoices |
| --- | --- | --- |
| Tesseract | 10% | 16% |
| Method herein | 4% | 8% |

From the table above it can be seen that the method proposed herein is superior to Tesseract on both sample types.
It should be recognized that embodiments of the present invention can be realized and implemented by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The methods may be implemented in a computer program using standard programming techniques, including a non-transitory computer-readable storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and figures described in the detailed description. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.
Further, the operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) collectively executed on one or more processors, by hardware, or combinations thereof. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any type of computing platform operatively connected to a suitable interface, including but not limited to a personal computer, mini computer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and the like. Aspects of the invention may be embodied in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optically read and/or write storage medium, RAM, ROM, or the like, such that it may be read by a programmable computer, which when read by the storage medium or device, is operative to configure and operate the computer to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described herein includes these and other different types of non-transitory computer-readable storage media when such media include instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The invention also includes the computer itself when programmed according to the methods and techniques described herein. A computer program can be applied to input data to perform the functions described herein to transform the input data to generate output data that is stored to non-volatile memory. The output information may also be applied to one or more output devices, such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including particular visual depictions of physical and tangible objects produced on a display.
Example 2
Referring to the illustrations of fig. 4 to 6, an OCR-based multi-bill automatic identification system comprises an image acquisition module 100, an image preprocessing module 200, a denoising module 300 and a bill identification module 400. The image acquisition module 100 is used for acquiring a bill image to be identified; the image preprocessing module 200 is used for processing the acquired image to obtain a secondary image; the denoising module 300 is used for denoising the secondary image to obtain a standard image; and the bill identification module 400 is used for detecting and recognizing the standard image and generating a recognition result. The image acquisition module 100 is a camera-based capture device that acquires the front-end image. The image preprocessing module 200, the denoising module 300 and the bill identification module 400 are software modules running on a computer processor, implementing the corresponding processing and recognition functions through program code.
Referring to fig. 6, a schematic diagram of an image containing a plurality of bills is shown. The segmentation and field recognition of several invoices on one picture can be seen clearly from the recognition result in fig. 6: the method provided by this embodiment identifies the edge of each invoice and successfully recognizes the effective fields.
As used in this application, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being: a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of example, both an application running on a computing device and the computing device can be a component. One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer readable media having various data structures thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).
It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.
Claims (10)
1. An OCR-based multi-bill automatic identification method, characterized in that it comprises the following steps,
acquiring an OCR bill sample;
the image acquisition module (100) acquires a bill image to be identified;
the bill image is input into an image preprocessing module (200) to be processed to obtain a secondary image;
a denoising module (300) is used for denoising the secondary image to obtain a standard image;
and the standard image is input into a bill recognition module (400) for detection and recognition.
2. An OCR-based multi-bill automatic recognition method according to claim 1, characterized in that: the image pre-processing module (200) comprises the following pre-processing steps,
rotating or perspective zooming the bill image;
aligning the characters in the bill image along the horizontal and vertical directions after rotating or perspective zooming;
and clipping the aligned image to obtain the secondary image.
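The geometry behind these preprocessing steps can be sketched with plain coordinate arithmetic. This is an illustrative sketch only, assuming a known skew angle; the claim does not specify how rotation or perspective correction is computed:

```python
import math

def deskew_point(x, y, angle_deg, cx=0.0, cy=0.0):
    """Rotate a point about (cx, cy) by -angle_deg so that text
    detected at skew angle_deg becomes axis-aligned (horizontal)."""
    a = math.radians(-angle_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(a) - dy * math.sin(a),
            cy + dx * math.sin(a) + dy * math.cos(a))

def crop_box(points):
    """Axis-aligned bounding box of the deskewed text corners,
    used to crop the aligned image down to the secondary image."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))
```

In practice a full implementation would estimate the skew angle from detected text lines and apply the inverse transform to the whole raster, but the per-point math is as above.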
3. The OCR-based multi-bill automatic identification method according to claim 1 or 2, characterized in that the denoising module (300) performs the following steps:
converting the secondary image to grayscale;
adjusting the histogram of the secondary image;
preserving light pixels in light areas and dark pixels in dark areas;
and obtaining the standard image as a high-contrast sample.
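The decoloring and contrast steps above can be sketched as follows. The luma weights and the `low`/`high` cut-offs are illustrative assumptions; the claim only requires that light areas stay light, dark areas stay dark, and the result be high-contrast:

```python
def to_gray(r, g, b):
    """Luma-weighted decoloring (ITU-R BT.601 weights)."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def stretch_contrast(img, low=64, high=192):
    """Histogram stretch: pixels at or below `low` go to black,
    at or above `high` go to white, and the midrange is rescaled
    linearly, yielding the high-contrast 'standard image'."""
    def f(p):
        if p <= low:
            return 0
        if p >= high:
            return 255
        return round((p - low) * 255 / (high - low))
    return [[f(p) for p in row] for row in img]
```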
4. The OCR-based multi-bill automatic identification method according to claim 3, characterized in that the bill identification module (400) performs the following recognition steps:
analyzing the structure of the standard image containing the characters to be recognized;
denoising and correcting the region to be detected using a threshold;
performing row and column segmentation on the text information;
and feeding the segmented character images into a recognition model to obtain the character information in the original image.
5. The OCR-based multi-bill automatic identification method according to any one of claims 1, 2 or 4, characterized in that the recognition model adopts the CTPN algorithm and comprises the following steps:
detecting the unit blocks into which horizontal rows of characters are divided in complex scenes;
adding a vertical anchor mechanism to detect vertically oriented characters;
learning the spatial and sequential features in the image with a bidirectional LSTM layer;
and using regular expressions to map each recognized character string in the bill image to its corresponding field meaning.
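The regular-expression mapping step can be sketched as below. The field names and patterns are hypothetical examples; the claim states only that regular expressions assign meaning to recognized text, without fixing any bill layout:

```python
import re

# Illustrative field patterns (not taken from the patent): invoice
# number, ISO-style date, and a currency amount.
FIELD_PATTERNS = {
    "invoice_no": re.compile(r"No\.?\s*(\d{8})"),
    "date":       re.compile(r"(\d{4}-\d{2}-\d{2})"),
    "amount":     re.compile(r"[¥￥]\s*([\d,]+\.\d{2})"),
}

def extract_fields(ocr_text):
    """Map each matched substring of the OCR output to its field."""
    found = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            found[field] = match.group(1)
    return found
```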
6. The OCR-based multi-bill automatic identification method according to claim 5, characterized in that the text segmentation comprises the following steps:
cutting out single characters by a non-uniform image segmentation method;
obtaining the width of each character by a function and selecting, from several approximate classifications, the group best suited for segmentation;
and recognizing the classified group of characters with a CNN model.
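A common way to cut single characters non-uniformly is a vertical projection profile, sketched below. This is an assumption about the unspecified segmentation function; the claim does not name the method:

```python
def segment_columns(binary_img):
    """Non-uniform character segmentation by vertical projection:
    a column with any ink (1) lies inside a character, a blank
    column separates characters, so cut widths differ per glyph."""
    profile = [sum(col) for col in zip(*binary_img)]  # ink per column
    spans, start = [], None
    for x, ink in enumerate(profile):
        if ink and start is None:
            start = x                  # a character begins
        elif not ink and start is not None:
            spans.append((start, x))   # the character ends
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans
```

The resulting per-character widths are exactly the quantity the claim feeds into the width-based grouping step.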
7. The OCR-based multi-bill automatic identification method according to claim 6, characterized in that the CTPN algorithm comprises the following steps:
obtaining a feature map of size W × H × C with the first five convolutional stages (Conv1–Conv5) of VGG16;
extracting features on the feature map with 3 × 3 sliding windows;
predicting the candidate target regions defined by a plurality of anchors from the extracted features;
feeding the extracted features into a bidirectional LSTM layer, which outputs W × 256 results;
inputting the results to a 512-dimensional fully connected layer;
and finally obtaining the recognition output by classification or regression.
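The tensor shapes implied by these steps can be walked through without any deep-learning framework. The value k = 10 anchors per position comes from the original CTPN design and is an assumption here, as the claims do not fix it:

```python
def ctpn_shapes(img_h, img_w, k=10):
    """Shape walkthrough of the CTPN-style pipeline in claim 7.
    VGG16's five conv stages downsample by a factor of 16; each
    feature-map row feeds a bidirectional LSTM (2 x 128 = 256)."""
    H, W, C = img_h // 16, img_w // 16, 512      # conv5 feature map
    return {
        "feature_map": (H, W, C),
        "sliding_window": (3, 3, C),             # 3x3 window per position
        "bilstm_out": (H, W, 256),               # bidirectional LSTM output
        "fc_out": (H, W, 512),                   # 512-d fully connected
        "heads": {
            "vertical_coords": 2 * k,            # box height + y-center
            "scores": 2 * k,                     # text / non-text
            "side_refinement": k,                # horizontal offset
        },
    }
```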
8. The OCR-based multi-bill automatic identification method according to claim 7, characterized in that the output comprises the height and y-axis center coordinate of each candidate box, the category information of the k anchors, and the horizontal offset of the candidate box; the category information indicates whether the region contains a character.
9. The OCR-based multi-bill automatic identification method according to claim 8, characterized in that the image preprocessing module (200) further performs the following steps:
obtaining the secondary image through size normalization and alignment;
setting a global threshold T for the secondary image;
dividing the image data into two parts by T: the group of pixels greater than T and the group of pixels less than T;
and setting the pixel values of the group greater than T to white and the pixel values of the group less than T to black.
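The global-threshold binarization described above is direct to sketch. The mean-based choice of T is an illustrative fallback, since the claim does not say how T is selected (Otsu's method would be the usual refinement):

```python
def binarize(img, T=128):
    """Global thresholding per claim 9: pixels greater than T
    become white (255), the rest become black (0)."""
    return [[255 if p > T else 0 for p in row] for row in img]

def mean_threshold(img):
    """A simple way to pick T when none is given: the mean gray
    level of the image (an assumption, not the patent's method)."""
    pixels = [p for row in img for p in row]
    return sum(pixels) // len(pixels)
```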
10. An OCR-based multi-bill automatic identification system, characterized in that it comprises an image acquisition module (100), an image preprocessing module (200), a denoising module (300) and a bill identification module (400);
the image acquisition module (100) is used for acquiring a bill image to be identified;
the image preprocessing module (200) is used for processing the acquired image to obtain a secondary image;
the denoising module (300) is used for denoising the secondary image to obtain a standard image;
the bill identification module (400) is used for detecting and identifying the standard image to generate an identification result.
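The four-module data flow of claim 10 can be sketched as a simple staged pipeline. Each stage is a callable placeholder, since the claim fixes only the interfaces (bill image → secondary image → standard image → recognition result), not their implementations:

```python
class BillOCRPipeline:
    """Sketch of the claim-10 system: four modules chained in order
    (acquisition, preprocessing, denoising, recognition)."""
    def __init__(self, acquire, preprocess, denoise, recognize):
        self.stages = [acquire, preprocess, denoise, recognize]

    def run(self, source):
        data = source
        for stage in self.stages:   # each module consumes the last output
            data = stage(data)
        return data
```

A concrete system would plug in the preprocessing, denoising, and CTPN-based recognition steps described in claims 2–9 as the four stage callables.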
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911192294.1A CN111008635A (en) | 2019-11-28 | 2019-11-28 | OCR-based multi-bill automatic identification method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111008635A true CN111008635A (en) | 2020-04-14 |
Family
ID=70112157
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911192294.1A Pending CN111008635A (en) | 2019-11-28 | 2019-11-28 | OCR-based multi-bill automatic identification method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111008635A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108717543A (en) * | 2018-05-14 | 2018-10-30 | 北京市商汤科技开发有限公司 | A kind of invoice recognition methods and device, computer storage media |
CN110363199A (en) * | 2019-07-16 | 2019-10-22 | 济南浪潮高新科技投资发展有限公司 | Certificate image text recognition method and system based on deep learning |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111931664A (en) * | 2020-08-12 | 2020-11-13 | 腾讯科技(深圳)有限公司 | Mixed note image processing method and device, computer equipment and storage medium |
CN111931664B (en) * | 2020-08-12 | 2024-01-12 | 腾讯科技(深圳)有限公司 | Mixed-pasting bill image processing method and device, computer equipment and storage medium |
CN112883954A (en) * | 2021-02-22 | 2021-06-01 | 的卢技术有限公司 | OCR bill recognition method, device, computer equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110766014B (en) | Bill information positioning method, system and computer readable storage medium | |
CN103065134B (en) | A kind of fingerprint identification device and method with information | |
US8611662B2 (en) | Text detection using multi-layer connected components with histograms | |
CN110781885A (en) | Text detection method, device, medium and electronic equipment based on image processing | |
CN103034848B (en) | A kind of recognition methods of form types | |
US9679354B2 (en) | Duplicate check image resolution | |
US11657644B2 (en) | Automatic ruler detection | |
CN111259891B (en) | Method, device, equipment and medium for identifying identity card in natural scene | |
CN111626249B (en) | Method and device for identifying geometric figure in topic image and computer storage medium | |
US9396389B2 (en) | Techniques for detecting user-entered check marks | |
CN114549993A (en) | Method, system and device for scoring line segment image in experiment and readable storage medium | |
CN111008635A (en) | OCR-based multi-bill automatic identification method and system | |
CN111199240A (en) | Training method of bank card identification model, and bank card identification method and device | |
JP2017521011A (en) | Symbol optical detection method | |
CN110222660B (en) | Signature authentication method and system based on dynamic and static feature fusion | |
US9378428B2 (en) | Incomplete patterns | |
US20230048495A1 (en) | Method and platform of generating document, electronic device and storage medium | |
CN114663899A (en) | Financial bill processing method, device, equipment and medium | |
CN113780116A (en) | Invoice classification method and device, computer equipment and storage medium | |
CN112861861A (en) | Method and device for identifying nixie tube text and electronic equipment | |
Bhatt et al. | Text Extraction & Recognition from Visiting Cards | |
CN111612045A (en) | Universal method for acquiring target detection data set | |
CN115471846B (en) | Image correction method and device, electronic equipment and readable storage medium | |
WO2024001051A1 (en) | Spatial omics single cell data acquisition method and apparatus, and electronic device | |
CN113537026B (en) | Method, device, equipment and medium for detecting graphic elements in building plan |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | ||
Address after: 11th Floor, Building A1, Huizhi Science and Technology Park, No. 8 Hengtai Road, Nanjing Economic and Technological Development Zone, Jiangsu Province, 211000
Applicant after: DILU TECHNOLOGY Co.,Ltd.
Address before: Building C4, No.55 Liyuan South Road, Moling Street, Nanjing, Jiangsu Province
Applicant before: DILU TECHNOLOGY Co.,Ltd.