CN110688998A - Bill identification method and device - Google Patents

Bill identification method and device

Info

Publication number
CN110688998A
Authority
CN
China
Prior art keywords
bill
data
field
similarity
tuple
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910921362.7A
Other languages
Chinese (zh)
Inventor
丁平
杨春明
郭铸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd filed Critical Bank of China Ltd
Priority to CN201910921362.7A priority Critical patent/CN110688998A/en
Publication of CN110688998A publication Critical patent/CN110688998A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a bill identification method and a bill identification device, wherein the method comprises the following steps: acquiring an OCR recognition result of the bill to be recognized, wherein the OCR recognition result comprises data recognition results of each field contained in each bill element in the bill to be recognized; acquiring a plurality of data tuples corresponding to each bill element according to an OCR recognition result, wherein each data tuple comprises real data of a corresponding field; calculating the similarity of each bill element and each corresponding data tuple according to the data identification result of each field contained in each bill element and the real data of the corresponding field in each data tuple corresponding to each bill element; determining the data tuple with the maximum similarity as the recognition result of each bill element; and generating an identification result of the bill to be identified according to the identification result of each bill element in the bill to be identified. The invention can improve the accuracy of diversified bill identification and meet the bill identification requirements of more application scenes.

Description

Bill identification method and device
Technical Field
The invention relates to the field of image processing, in particular to a bill identification method and a bill identification device.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
As the original voucher for an enterprise's financial accounting, a bill must be processed electronically before it can circulate digitally. With traditional manual entry, an enterprise has to invest substantial labor and time, which raises operating costs; entry speed is also hard to improve, and errors occur easily. With the rapid development and wide application of Optical Character Recognition (OCR) technology, bill recognition efficiency has improved greatly. As an efficient, low-cost data acquisition scheme, OCR provides strong support for the rapid development of enterprise business.
Because OCR cannot achieve one hundred percent recognition accuracy, a number of OCR post-processing methods have emerged. Existing OCR post-processing methods rely on a general corpus and post-process the characters recognized by OCR using techniques such as N-Gram language models, context-free models, N-POS models, and decision-tree-based language models. OCR recognition based on a general corpus improves recognition accuracy for ordinary bills to some extent, but it struggles with the post-processing of certain special bills.
For example, a bank generates a large number of bills while handling transactions. Storing this large and varied volume of paper bills electronically requires bill scanning, data entry, manual proofreading, and similar work, and OCR bill recognition plays a major role here. Compared with traditional manual entry, OCR-based entry is far faster, saves considerable human resources, and frees personnel for more meaningful work. However, because some large banks operate across a wide range of business, they generate many types of bills. This variety not only makes OCR recognition harder; the contents of some newly added bill fields may also be missing from the general corpus, which causes OCR recognition errors. In addition, if the field contents stored in the general corpus are inconsistent with the field contents defined by the bank, OCR may misrecognize the affected field.
Therefore, there is an urgent need for a bill identification method that improves bill identification efficiency while also meeting the bill identification requirements of more application scenarios and improving accuracy on diversified bills.
Disclosure of Invention
The embodiment of the invention provides a bill recognition method, which is used for solving the technical problem that the conventional OCR recognition method based on a general corpus has low accuracy on diversified bills. The method comprises the following steps: acquiring an OCR recognition result of a bill to be recognized, wherein the bill to be recognized comprises at least one bill element, each bill element comprises a plurality of fields with an association relation, and the OCR recognition result comprises a data recognition result of each field contained in each bill element in the bill to be recognized; acquiring a plurality of data tuples corresponding to each bill element according to the OCR recognition result, wherein each data tuple comprises real data of a corresponding field; calculating the similarity between each bill element and each corresponding data tuple according to the data recognition result of each field contained in each bill element and the real data of the corresponding field in each data tuple corresponding to each bill element; determining, among the plurality of data tuples corresponding to each bill element, the data tuple with the maximum similarity to that bill element as the recognition result of the bill element; and generating the recognition result of the bill to be recognized according to the recognition result of each bill element in the bill to be recognized.
The embodiment of the invention also provides a bill recognition device, which is used for solving the technical problem that the conventional OCR recognition method based on a general corpus has low accuracy on diversified bills. The device comprises the following units: a bill OCR recognition unit, used for acquiring an OCR recognition result of a bill to be recognized, wherein the bill to be recognized comprises at least one bill element, each bill element comprises a plurality of fields with an association relation, and the OCR recognition result comprises a data recognition result of each field contained in each bill element in the bill to be recognized; a data tuple acquisition unit, used for acquiring a plurality of data tuples corresponding to each bill element according to the OCR recognition result, wherein each data tuple comprises real data of a corresponding field; a data similarity calculation unit, used for calculating the similarity between each bill element and each corresponding data tuple according to the data recognition result of each field contained in each bill element and the real data of the corresponding field in each data tuple corresponding to each bill element; a data similarity comparison unit, used for determining, among the plurality of data tuples corresponding to each bill element, the data tuple with the maximum similarity to that bill element as the recognition result of the bill element; and a bill recognition result generation unit, used for generating the recognition result of the bill to be recognized according to the recognition result of each bill element in the bill to be recognized.
The embodiment of the invention also provides computer equipment for solving the technical problem of low accuracy rate of diversified bill identification by using the conventional OCR identification method based on the general corpus.
The embodiment of the invention also provides a computer readable storage medium, which is used for solving the technical problem of low accuracy rate of diversified bill identification in the conventional OCR identification method based on the general corpus, and the computer readable storage medium stores a computer program for executing the bill identification method.
In the embodiment of the invention, after a bill to be recognized is recognized using OCR technology and its OCR recognition result is obtained, the data tuples corresponding to each bill element in the OCR recognition result are acquired. Because the data tuples of each bill element contain real data of the corresponding fields in that bill element, the similarity between each bill element and each corresponding data tuple is calculated according to the data recognition result of each field in the bill element and the real data of the corresponding field in the data tuple; the data tuple with the maximum similarity to each bill element, among the plurality of data tuples corresponding to that bill element, is then determined as the recognition result of the bill element; and the recognition result of the bill to be recognized is generated according to the recognition result of each bill element in the bill to be recognized.
By the embodiment of the invention, the accuracy of the OCR recognition method based on the general corpus to diversified bill recognition can be improved, and the bill recognition requirements of more application scenes can be met.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts. In the drawings:
FIG. 1 is a flow chart of a bill identification method provided in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a bill identifying device provided in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
As described in the Background section of the present application, the conventional OCR recognition method based on a general corpus has low recognition accuracy for diversified bills. The inventors found that, as banks have become more digitized, the relevant fields of the various bank bills have electronically archived data. Post-processing the bill OCR recognition result with the electronically archived data corresponding to the bill fields can therefore improve bill recognition accuracy and meet the bill recognition requirements of more application scenarios.
The embodiment of the invention provides a bill identification method, fig. 1 is a flowchart of the bill identification method provided in the embodiment of the invention, and as shown in fig. 1, the method includes the following steps:
S101, obtaining an OCR recognition result of the bill to be recognized, wherein the bill to be recognized comprises at least one bill element, each bill element comprises a plurality of fields with an association relation, and the OCR recognition result comprises a data recognition result of each field contained in each bill element in the bill to be recognized.
It should be noted that the bill to be recognized may be any paper bill, including but not limited to any of the following: special bills, special checks, bills of lading, and the like. Because bill recognition mainly consists of recognizing the data corresponding to each field contained in the bill, and many fields in a bill are associated with one another, the embodiment of the invention treats a group of associated fields in the bill to be recognized as one bill element. Thus, the OCR recognition result of the bill to be recognized in S101 includes the data recognition result of each field contained in each bill element in the bill to be recognized.
For example, an "account name and account number" of a payer in a certain bank bill is taken as a bill element, and the bill element includes two fields, wherein the first field is the account name, and the second field is the account number corresponding to the account name.
As an optional implementation manner, the step S101 may specifically include the following steps: collecting a bill image of a bill to be identified; and identifying the bill image by adopting an OCR (optical character recognition) algorithm to obtain an OCR identification result of the bill to be identified.
It should be noted that the bill to be identified may be an image obtained by scanning a paper bill through various scanning electronic devices (e.g., a scanner or a camera); and recognizing the words or characters on the bill image by adopting an OCR recognition algorithm, and converting the words or characters into words or characters which can be processed by a computer.
Optionally, before the OCR recognition algorithm is used to recognize the image of the bill, the bill to be recognized may be preprocessed, including but not limited to image binarization, image denoising, tilt correction, and the like.
Because the bill image of the bill to be identified can be a color image, the amount of information contained in the color image is large, and the calculation efficiency is influenced. Therefore, the color image is divided into the foreground and the background, and the foreground and background information are respectively defined as black and white to obtain the binary image corresponding to the bill to be identified. Through image binarization, the speed of recognizing characters by a computer can be improved.
Because the acquired bill image of the bill to be recognized may contain noise information, before the bill image is recognized by adopting the OCR recognition algorithm, the bill image of the bill to be recognized can be denoised by adopting various denoising algorithms so as to improve the bill recognition accuracy.
In addition, when a user collects a bill image to be recognized through the scanning electronic device, the collected bill image may be inclined due to human factors, and therefore, before the bill image is recognized through the OCR recognition algorithm, the collected bill image needs to be subjected to inclination correction processing. Through inclination correction, the bill identification accuracy can be improved.
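As an illustration of the binarization step described above, the following is a minimal sketch only. It assumes the bill image has already been converted to grayscale, that dark text sits on light paper, and that a fixed global threshold is adequate; the function name `binarize` and the threshold value 128 are assumptions made for this example and are not taken from the embodiment.

```python
from typing import List


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Map each grayscale pixel (0-255) to black (0) or white (255).

    Pixels darker than the threshold are treated as foreground (text)
    and set to black; all other pixels are treated as background.
    """
    return [[0 if px < threshold else 255 for px in row] for row in gray]


# A tiny 3x3 grayscale patch: dark strokes on a light background.
image = [
    [250, 30, 245],
    [20, 25, 240],
    [235, 40, 230],
]
binary = binarize(image)
```

A production system would more likely choose the threshold adaptively, but the fixed threshold keeps the sketch short.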
S102, acquiring a plurality of data tuples corresponding to each bill element according to the OCR recognition result, wherein each data tuple comprises real data of a corresponding field.
It should be noted that, because the fields contained in each bill element in the bill to be recognized are associated with one another, once the OCR recognition result of the bill to be recognized has been obtained in S101, each field contained in each bill element can be identified from that result. All real data corresponding to those fields is then obtained from the various bill electronic systems, and a plurality of data tuples containing the fields is constructed, each data tuple holding one real data value for each corresponding field.
Taking the bill element "account name and account number" as an example, a data tuple corresponding to the bill element may be expressed as < (K1, V1), (K2, V2) >, where K1 is the account name field and V1 is the real value of the account name; K2 is the account number field and V2 is the real value of the account number. For example, one data tuple of the bill element is < ("account name", "XX environmental protection company"), ("account number", "214234132143284372414") >.
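The < (K1, V1), (K2, V2) > form can be represented directly as a tuple of key-value pairs. The sketch below is one possible representation, not the embodiment's own data structure; the type alias `DataTuple` and the helper `make_data_tuple` are names assumed for illustration.

```python
from typing import Tuple

# One data tuple: an ordered sequence of (field name, real value) pairs.
DataTuple = Tuple[Tuple[str, str], ...]


def make_data_tuple(account_name: str, account_number: str) -> DataTuple:
    """Build the data tuple for the bill element 'account name and account number'."""
    return (("account name", account_name), ("account number", account_number))


candidate = make_data_tuple(
    "XX environmental protection company", "214234132143284372414"
)

# Converting to a dict gives lookup by field name when comparing with OCR output.
as_dict = dict(candidate)
```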
It should be noted that, for different bills, corresponding element information can be obtained from different electronic systems, and the element information corresponds to some fields having association in the bills. For bank bills, a plurality of data tuples corresponding to bill elements 'account names and account numbers' can be constructed according to the account names and account numbers of all users in a bank system.
Therefore, in an optional implementation manner, before executing S102, the method for identifying a ticket according to an embodiment of the present invention may further include the following steps: collecting a plurality of real data corresponding to each field from a plurality of electronic systems according to each field contained in each bill element in an OCR recognition result; and generating a plurality of data tuples corresponding to each bill element according to a plurality of real data corresponding to each field.
Specifically, after each field contained in each bill element in the bill to be recognized is obtained according to an OCR recognition result of the bill to be recognized, real data of relevant fields are crawled from each electronic system; in order to further improve the speed of bill identification, the embodiment of the invention can only crawl the data of the corresponding field in the electronic system associated with the bill to be identified according to the type of the bill to be identified.
Still taking the bill element "account name and account number" as an example, if a bill is for an enterprise user, only enterprise account names and account numbers are crawled; if a bill is for an individual user, only individual account names and account numbers are crawled.
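Restricting the candidate data tuples by bill type can be sketched as a simple filter. Everything concrete here is assumed for illustration: the in-memory list `ACCOUNT_RECORDS` stands in for the bank's electronic systems, the records are invented sample data, and `candidate_tuples` is a hypothetical helper name.

```python
from typing import Dict, List

# Hypothetical stand-in for records held in the bank's electronic systems.
# The names and numbers below are invented sample data.
ACCOUNT_RECORDS: List[Dict[str, str]] = [
    {"type": "enterprise", "account name": "Example Trading Co.",
     "account number": "1000000000000000001"},
    {"type": "enterprise", "account name": "Example Logistics Co.",
     "account number": "1000000000000000002"},
    {"type": "individual", "account name": "Example Person",
     "account number": "2000000000000000001"},
]


def candidate_tuples(bill_type: str,
                     records: List[Dict[str, str]] = ACCOUNT_RECORDS
                     ) -> List[Dict[str, str]]:
    """Return data tuples only for accounts matching the bill's user type."""
    return [
        {field: value for field, value in record.items() if field != "type"}
        for record in records
        if record["type"] == bill_type
    ]


enterprise_candidates = candidate_tuples("enterprise")
```

Filtering before the similarity calculation shrinks the number of tuples that must be compared, which is the speed-up the paragraph above describes.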
S103, calculating the similarity between each bill element and each corresponding data tuple according to the data identification result of each field contained in each bill element and the real data of the corresponding field in each data tuple corresponding to each bill element.
It should be noted that, when calculating the similarity between the recognition result of a field and the real data, the embodiment of the present invention may determine the similarity of the two data from the edit distance between them. The edit distance is the minimum number of edit operations required to convert one character string into another, where the edit operations are: a replacement operation that replaces one character with another, an addition operation that inserts one new character, and a deletion operation that deletes one character. The smaller the edit distance, the greater the similarity of the two data. Specifically, the relationship between the similarity of two data and their edit distance is as follows:
$$\gamma_{1,2} = 1 - \frac{L_{1,2}}{\max\{L_1, L_2\}}$$
where $\gamma_{1,2}$ denotes the similarity between the first data and the second data; $L_{1,2}$ denotes the minimum number of edit operations required to convert the first data into the second data; $L_1$ denotes the character string length of the first data; $L_2$ denotes the character string length of the second data; and $\max\{L_1, L_2\}$ denotes the larger of $L_1$ and $L_2$.
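The edit distance and the resulting similarity can be sketched as follows. This is a minimal dynamic-programming implementation, assuming the similarity takes the form 1 - L12 / max(L1, L2), which matches the symbol definitions above; the function names `edit_distance` and `similarity`, and the convention that two empty strings have similarity 1, are assumptions of this sketch.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of replace, add, and delete operations turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # add cb
                prev[j - 1] + cost,  # replace ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1 means the two strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumed convention for two empty strings
    return 1.0 - edit_distance(a, b) / longest
```

For instance, a recognized account number that differs from the real one in a single digit out of four would score 0.75 under this definition.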
Since the fields contained in a bill element may have different data types (for example, in the bill element "account name and account number", the value of the "account name" field consists of Chinese characters while the value of the "account number" field consists of digits), the embodiment of the present invention performs the similarity calculation separately for each field contained in each bill element, in order to reduce computational complexity and improve computational efficiency.
Therefore, as an optional implementation manner, the step S103 may specifically include the following steps: calculating the similarity between each field contained in each bill element and the corresponding field in each corresponding data tuple according to the data identification result of each field contained in each bill element and the real data of the corresponding field in each corresponding data tuple; and determining the sum of the similarity of each field contained in each bill element and the corresponding field in each corresponding data tuple as the similarity of each bill element and each corresponding data tuple.
Assume that the OCR recognition result of the bill element "account name and account number" is R. When calculating the similarity between the data tuple < (K1, V1), (K2, V2) > and the recognition result R, the similarity between the account name value in R and V1 is calculated to obtain W1, and the similarity between the account number value in R and V2 is calculated to obtain W2. Finally, the similarities are summed, W = W1 + W2, and W is used as the similarity between the data tuple < (K1, V1), (K2, V2) > and the recognition result R.
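The field-by-field sum W = W1 + W2 can be sketched as below. The per-field similarity is assumed to be the edit-distance form 1 - L12 / max(L1, L2); the helpers are repeated here only so that the block runs on its own, and `element_similarity` is a name chosen for this illustration. Recognition results and data tuples are assumed to be dictionaries keyed by field name.

```python
from typing import Dict


def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character replace, add, and delete operations."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (0 if ca == cb else 1)))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed per-field similarity: 1 - edit distance / longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def element_similarity(recognized: Dict[str, str],
                       candidate: Dict[str, str]) -> float:
    """Sum the per-field similarities between OCR result R and one data tuple."""
    return sum(
        similarity(recognized.get(field, ""), real_value)
        for field, real_value in candidate.items()
    )


# R contains one misrecognized character in the account name ("compamy").
R = {"account name": "XX environmental protection compamy",
     "account number": "214234132143284372414"}
tuple_1 = {"account name": "XX environmental protection company",
           "account number": "214234132143284372414"}
W = element_similarity(R, tuple_1)
```

With two fields the sum ranges from 0 to 2, so it is only meaningful when comparing data tuples of the same bill element.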
Preferably, in order to further improve the bill recognition accuracy, for the OCR recognition result of each bill element, when determining the similarity between the bill element and each data tuple according to the similarity of each field, a corresponding weight may be configured for each field in the bill element in advance, and finally, the similarity between the data recognition result of each field in each bill element and the real data of the corresponding field in each data tuple is weighted and averaged, and the weighted average of the similarity of each field is used as the similarity between the bill element and each data tuple.
For example, for the bill element "account name and account number", the weight of the "account name" field is configured as a and the weight of the "account number" field as b, with a + b = 1. The similarity between the account name value in R and V1 is calculated to obtain W1, and the similarity between the account number value in R and V2 is calculated to obtain W2. Finally, the field similarities are weighted and averaged to obtain W = aW1 + bW2, and W is used as the similarity between the data tuple < (K1, V1), (K2, V2) > and the recognition result R.
S104, determining the data tuple with the maximum similarity with each bill element in the plurality of data tuples corresponding to each bill element as the identification result of each bill element.
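The weighted variant W = aW1 + bW2 can be sketched as a function over per-field similarities that have already been computed. The function name `weighted_similarity`, the dictionary-based interface, and the example weights 0.25 and 0.75 are assumptions made for this illustration; the embodiment only requires that the weights sum to 1.

```python
from typing import Dict


def weighted_similarity(field_sims: Dict[str, float],
                        weights: Dict[str, float]) -> float:
    """Combine per-field similarities using preconfigured weights that sum to 1."""
    if set(field_sims) != set(weights):
        raise ValueError("every field needs exactly one weight")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[field] * field_sims[field] for field in field_sims)


# Example: trust the numeric account number more than the account name.
W = weighted_similarity(
    {"account name": 0.5, "account number": 1.0},    # W1, W2
    {"account name": 0.25, "account number": 0.75},  # a, b (illustrative)
)
```

Because the weights sum to 1, the result stays in the same 0 to 1 range as the individual field similarities.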
Specifically, after the similarity between each bill element and each data tuple has been calculated, the data tuples are sorted by similarity from large to small, and the data tuple with the maximum similarity is determined as the recognition result of the corresponding bill element. Because the data of each field in a data tuple is real data collected from the bill electronic system, that data is itself one hundred percent accurate; determining the data tuple containing real data as the recognition result of each bill element in the bill to be recognized can therefore achieve one hundred percent bill recognition accuracy.
S105, generating the recognition result of the bill to be recognized according to the recognition result of each bill element in the bill to be recognized.
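Selecting the data tuple with the maximum similarity can be sketched as below. The sketch assumes the similarities have already been computed and paired with their data tuples; `best_match` is a hypothetical helper name, and the sample scores are invented. It uses Python's built-in `max` instead of a full sort, which yields the same top-ranked tuple.

```python
from typing import Dict, List, Tuple

ScoredTuple = Tuple[Dict[str, str], float]  # (data tuple, similarity to R)


def best_match(scored: List[ScoredTuple]) -> Dict[str, str]:
    """Return the data tuple with the highest similarity.

    If several tuples share the maximum similarity, the first one wins.
    """
    if not scored:
        raise ValueError("no candidate data tuples to choose from")
    best_tuple, _ = max(scored, key=lambda pair: pair[1])
    return best_tuple


# Invented similarities for three candidate data tuples.
scored_candidates = [
    ({"account name": "A company", "account number": "111"}, 0.40),
    ({"account name": "B company", "account number": "222"}, 1.90),
    ({"account name": "C company", "account number": "333"}, 1.20),
]
result = best_match(scored_candidates)
```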
Specifically, because the bill to be recognized contains a plurality of bill elements, the recognition result of the bill to be recognized can be generated according to the recognition results of all the bill elements in the bill to be recognized.
As can be seen from the above, in the bill identifying method provided in the embodiment of the present invention, after the bill to be identified is identified by using the OCR technology and the OCR identification result of the bill to be identified is obtained, the data tuple corresponding to each bill element in the OCR identification result is obtained, because the data tuple of each bill element includes the real data of the corresponding field in each bill element, the similarity between each bill element and each corresponding data tuple is calculated according to the data identification result of each field in each bill element and the real data of the corresponding field in the corresponding data tuple, and then the data tuple with the maximum similarity to each bill element in the plurality of data tuples corresponding to each bill element is determined as the identification result of each bill element; and generating an identification result of the bill to be identified according to the identification result of each bill element in the bill to be identified.
By the bill recognition method provided by the embodiment of the invention, the accuracy of the OCR recognition method based on the general corpus to diversified bill recognition can be improved, and the bill recognition requirements of more application scenes can be met.
Based on the same inventive concept, the embodiment of the present invention further provides a bill identifying device, as described in the following embodiments. Because the principle of solving the problems of the embodiment of the device is similar to that of the bill identification method, the implementation of the embodiment of the device can refer to the implementation of the method, and repeated parts are not described again.
Fig. 2 is a schematic diagram of a bill identifying apparatus provided in an embodiment of the present invention, and as shown in fig. 2, the apparatus may include: the bill OCR recognition unit 21, the data tuple acquisition unit 22, the data similarity calculation unit 23, the data similarity comparison unit 24 and the bill recognition result generation unit 25;
the bill OCR recognition unit 21 is configured to obtain an OCR recognition result of a bill to be recognized, where the bill to be recognized includes at least one bill element, each bill element includes a plurality of fields having an association relationship, and the OCR recognition result includes a data recognition result of each field included in each bill element in the bill to be recognized; the data tuple obtaining unit 22 is configured to obtain, according to the OCR recognition result, a plurality of data tuples corresponding to each ticket element, where each data tuple includes one real data of a corresponding field; the data similarity calculation unit 23 is configured to calculate a similarity between each ticket element and each corresponding data tuple according to a data identification result of each field included in each ticket element and real data of a corresponding field in each data tuple corresponding to each ticket element; the data similarity comparison unit 24 is configured to determine, as an identification result of each ticket element, a data tuple with the maximum similarity to each ticket element among the multiple data tuples corresponding to each ticket element; and the bill identification result generation unit 25 is used for generating an identification result of the bill to be identified according to the identification result of each bill element in the bill to be identified.
As can be seen from the above, in the bill recognition device provided in the embodiment of the present invention, the bill OCR recognition unit 21 recognizes the bill to be recognized using OCR technology to obtain an OCR recognition result of the bill to be recognized; the data tuple acquisition unit 22 acquires, according to the OCR recognition result of the bill to be recognized, the data tuples corresponding to each bill element in the OCR recognition result; because the data tuples of each bill element contain real data of the corresponding fields in that bill element, the data similarity calculation unit 23 calculates the similarity between each bill element and each corresponding data tuple according to the data recognition result of each field in the bill element and the real data of the corresponding field in the data tuple; the data similarity comparison unit 24 then determines, among the plurality of data tuples corresponding to each bill element, the data tuple with the maximum similarity to that bill element as the recognition result of the bill element; and finally, the bill recognition result generation unit 25 generates the recognition result of the bill to be recognized according to the recognition result of each bill element in the bill to be recognized.
The bill recognition device provided by the embodiment of the invention can improve the accuracy of OCR recognition methods based on a general-purpose corpus when they are applied to diverse bills, and can meet the bill recognition requirements of more application scenarios.
In an optional embodiment, in the bill recognition device provided in the embodiment of the present invention, the data similarity calculation unit 23 may include: the first calculation module 231, configured to calculate, according to the data recognition result of each field contained in each bill element and the real data of the corresponding field in each corresponding data tuple, the similarity between each field contained in each bill element and the corresponding field in each data tuple; and the second calculation module 232, configured to determine the sum of the similarities between the fields contained in each bill element and the corresponding fields in each corresponding data tuple as the similarity between the bill element and that data tuple.
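The two modules can be sketched as follows. This passage does not fix the per-field similarity measure, so the sketch assumes the character-overlap ratio from Python's standard `difflib.SequenceMatcher` as a stand-in; the function names and the dictionary layout are likewise hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict


def field_similarity(recognized: str, real: str) -> float:
    """Role of module 231: similarity of one OCR field to one real value.

    Assumed measure: SequenceMatcher ratio in [0, 1]; any string similarity
    could be substituted here.
    """
    return SequenceMatcher(None, recognized, real).ratio()


def element_similarity(ocr_fields: Dict[str, str], data_tuple: Dict[str, str]) -> float:
    """Role of module 232: sum of the per-field similarities."""
    return sum(
        field_similarity(text, data_tuple.get(name, ""))
        for name, text in ocr_fields.items()
    )


if __name__ == "__main__":
    ocr_fields = {"account": "6222O21", "name": "Zhang San"}
    close = {"account": "6222021", "name": "Zhang San"}
    far = {"account": "9999999", "name": "Li Si"}
    print(element_similarity(ocr_fields, close) > element_similarity(ocr_fields, far))
```

Because the fields of a bill element are associated, summing over fields lets a correctly recognized field compensate for a misread one when the candidate tuples are ranked.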
In an optional embodiment, the bill identifying apparatus provided in the embodiment of the present invention may further include: the bill data acquisition unit 26 is used for acquiring a plurality of real data corresponding to each field from a plurality of electronic systems according to each field contained in each bill element in the OCR recognition result; and the bill data processing unit 27 is configured to generate a plurality of data tuples corresponding to each bill element according to the plurality of real data corresponding to each field.
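One way to picture units 26 and 27 is the sketch below. It assumes that each electronic system can be queried for records represented as field-to-value dictionaries, and that a data tuple is the projection of one record onto the fields of a bill element, so that associated values stay together. The system names, the record layout, and the de-duplication step are illustrative assumptions, not details taken from the embodiment.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# Assumed record layout: field name -> real value.
Record = Dict[str, str]


def build_data_tuples(
    systems: Dict[str, Iterable[Record]],
    fields: Sequence[str],
) -> List[Tuple[str, ...]]:
    """Project records from several systems onto the fields of a bill element."""
    seen = set()
    data_tuples: List[Tuple[str, ...]] = []
    for records in systems.values():
        for record in records:
            # Skip records that cannot supply every field of the element.
            if not all(name in record for name in fields):
                continue
            candidate = tuple(record[name] for name in fields)
            # Drop duplicates reported by more than one system.
            if candidate not in seen:
                seen.add(candidate)
                data_tuples.append(candidate)
    return data_tuples


if __name__ == "__main__":
    systems = {
        "core_banking": [{"account": "1", "name": "A", "branch": "X"}],
        "crm": [{"account": "1", "name": "A"}, {"account": "2", "name": "B"}],
    }
    print(build_data_tuples(systems, ["account", "name"]))
```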
Based on any one of the above optional apparatus embodiments, as an optional embodiment, in the bill recognition device provided in the embodiment of the present invention, the bill OCR recognition unit 21 may include: the image acquisition module 211, configured to acquire a bill image of the bill to be recognized; and the OCR recognition module 212, configured to recognize the bill image by using an OCR recognition algorithm to obtain the OCR recognition result of the bill to be recognized.
In summary, the bill identification method provided by the embodiment of the invention obtains the field information corresponding to different bills from electronic bill systems, constructs a field corpus for the bills, post-processes the bill OCR recognition result, and finally generates the bill recognition result from the real field data with the greatest similarity to the OCR recognition result of the fields, so that the bill recognition error rate caused by inconsistent fields can be reduced and the method can be applied to bill recognition in more application scenarios.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method of bill identification, comprising:
acquiring an OCR recognition result of a bill to be recognized, wherein the bill to be recognized comprises at least one bill element, each bill element comprises a plurality of fields with an association relation, and the OCR recognition result comprises a data recognition result of each field contained in each bill element in the bill to be recognized;
acquiring a plurality of data tuples corresponding to each bill element according to the OCR recognition result, wherein each data tuple comprises real data of a corresponding field;
calculating the similarity of each bill element and each corresponding data tuple according to the data identification result of each field contained in each bill element and the real data of the corresponding field in each data tuple corresponding to each bill element;
determining the data tuple with the maximum similarity with each bill element in the plurality of data tuples corresponding to each bill element as the identification result of each bill element;
and generating the recognition result of the bill to be recognized according to the recognition result of each bill element in the bill to be recognized.
2. The method of claim 1, wherein calculating the similarity of each bill element and each corresponding data tuple according to the data identification result of each field contained in each bill element and the real data of the corresponding field in each data tuple corresponding to each bill element comprises:
calculating the similarity between each field contained in each bill element and the corresponding field in each corresponding data tuple according to the data identification result of each field contained in each bill element and the real data of the corresponding field in each corresponding data tuple;
and determining the sum of the similarity of each field contained in each bill element and the corresponding field in each corresponding data tuple as the similarity of each bill element and each corresponding data tuple.
3. The method of claim 1, wherein prior to acquiring the plurality of data tuples corresponding to each bill element according to the OCR recognition result, the method further comprises:
collecting a plurality of real data corresponding to each field from a plurality of electronic systems according to each field contained in each bill element in the OCR recognition result;
and generating a plurality of data tuples corresponding to each bill element according to a plurality of real data corresponding to each field.
4. The method of any one of claims 1 to 3, wherein acquiring the OCR recognition result of the bill to be recognized comprises:
collecting a bill image of the bill to be identified;
and identifying the bill image by adopting an OCR (optical character recognition) algorithm to obtain an OCR identification result of the bill to be identified.
5. A bill identifying apparatus, comprising:
the bill OCR recognition unit is used for acquiring an OCR recognition result of a bill to be recognized, wherein the bill to be recognized comprises at least one bill element, each bill element comprises a plurality of fields with an association relation, and the OCR recognition result comprises a data recognition result of each field contained by each bill element in the bill to be recognized;
the data tuple acquisition unit is used for acquiring a plurality of data tuples corresponding to each bill element according to the OCR recognition result, wherein each data tuple comprises real data of a corresponding field;
the data similarity calculation unit is used for calculating the similarity between each bill element and each corresponding data tuple according to the data identification result of each field contained in each bill element and the real data of the corresponding field in each data tuple corresponding to each bill element;
the data similarity comparison unit is used for determining the data tuple with the maximum similarity with each bill element in the plurality of data tuples corresponding to each bill element as the identification result of each bill element;
and the bill identification result generation unit is used for generating the identification result of the bill to be identified according to the identification result of each bill element in the bill to be identified.
6. The apparatus of claim 5, wherein the data similarity calculation unit comprises:
the first calculation module is used for calculating the similarity between each field contained in each bill element and the corresponding field in each corresponding data tuple according to the data identification result of each field contained in each bill element and the real data of the corresponding field in each corresponding data tuple;
and the second calculation module is used for determining the sum of the similarity of each field contained in each bill element and the corresponding field in each corresponding data tuple as the similarity of each bill element and each corresponding data tuple.
7. The apparatus of claim 5, wherein the apparatus further comprises:
the bill data acquisition unit is used for acquiring a plurality of real data corresponding to each field from a plurality of electronic systems according to each field contained in each bill element in the OCR recognition result;
and the bill data processing unit is used for generating a plurality of data tuples corresponding to each bill element according to the plurality of real data corresponding to each field.
8. The apparatus of any one of claims 5 to 7, wherein the bill OCR recognition unit comprises:
the image acquisition module is used for acquiring a bill image of the bill to be identified;
and the OCR recognition module is used for recognizing the bill image by adopting an OCR recognition algorithm to obtain an OCR recognition result of the bill to be recognized.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the bill identification method of any one of claims 1 to 4 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the bill identification method according to any one of claims 1 to 4.
CN201910921362.7A 2019-09-27 2019-09-27 Bill identification method and device Pending CN110688998A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910921362.7A CN110688998A (en) 2019-09-27 2019-09-27 Bill identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910921362.7A CN110688998A (en) 2019-09-27 2019-09-27 Bill identification method and device

Publications (1)

Publication Number Publication Date
CN110688998A true CN110688998A (en) 2020-01-14

Family

ID=69110516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910921362.7A Pending CN110688998A (en) 2019-09-27 2019-09-27 Bill identification method and device

Country Status (1)

Country Link
CN (1) CN110688998A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103927352A (en) * 2014-04-10 2014-07-16 江苏唯实科技有限公司 Chinese business card OCR (optical character recognition) data correction system utilizing massive associated information of knowledge base
CN107610320A (en) * 2017-09-06 2018-01-19 深圳怡化电脑股份有限公司 A kind of bank slip recognition method and apparatus
CN109684440A (en) * 2018-12-13 2019-04-26 北京惠盈金科技术有限公司 Address method for measuring similarity based on level mark
CN109784342A (en) * 2019-01-24 2019-05-21 厦门商集网络科技有限责任公司 A kind of OCR recognition methods and terminal based on deep learning model
CN109919076A (en) * 2019-03-04 2019-06-21 厦门商集网络科技有限责任公司 The method and medium of confirmation OCR recognition result reliability based on deep learning
CN111046879A (en) * 2019-10-15 2020-04-21 平安科技(深圳)有限公司 Certificate image classification method and device, computer equipment and readable storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291726A (en) * 2020-03-12 2020-06-16 泰康保险集团股份有限公司 Medical bill sorting method, device, equipment and medium
CN111291726B (en) * 2020-03-12 2023-08-08 泰康保险集团股份有限公司 Medical bill sorting method, device, equipment and medium
CN113239921A (en) * 2021-05-10 2021-08-10 上海交大慧谷通用技术有限公司 Task grading and distributing method and system for OCR (optical character recognition) service
CN114495031A (en) * 2022-03-31 2022-05-13 青岛海信网络科技股份有限公司 License plate information correction method, equipment and device

Similar Documents

Publication Publication Date Title
US10943105B2 (en) Document field detection and parsing
EP3440591B1 (en) Improving optical character recognition (ocr) accuracy by combining results across video frames
CN109543690B (en) Method and device for extracting information
CN110909725A (en) Method, device and equipment for recognizing text and storage medium
US8838657B1 (en) Document fingerprints using block encoding of text
CN111914835A (en) Bill element extraction method and device, electronic equipment and readable storage medium
CN105930159A (en) Image-based interface code generation method and system
CN111353491B (en) Text direction determining method, device, equipment and storage medium
CN110688998A (en) Bill identification method and device
CN110287125B (en) Software instantiation test method and device based on image recognition
WO2000052645A1 (en) Document image processor, method for extracting document title, and method for imparting document tag information
CN111444795A (en) Bill data identification method, electronic device, storage medium and device
CN110110325B (en) Repeated case searching method and device and computer readable storage medium
CN111401099A (en) Text recognition method, device and storage medium
GB2588251A (en) Partial perceptual image hashing for invoice deconstruction
CN112949455A (en) Value-added tax invoice identification system and method
CN104966109A (en) Medical laboratory report image classification method and apparatus
CN112508000B (en) Method and equipment for generating OCR image recognition model training data
CN113469005A (en) Recognition method of bank receipt, related device and storage medium
CN114529933A (en) Contract data difference comparison method, device, equipment and medium
CN110147516A (en) The intelligent identification Method and relevant device of front-end code in Pages Design
CN111325207A (en) Bill identification method and device based on preprocessing
CN116798061A (en) Bill auditing and identifying method, device, terminal and storage medium
CN111797922B (en) Text image classification method and device
CN112861843A (en) Method and device for analyzing selection frame based on feature image recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200114