CN110263239B - Invoice identification method and device, storage medium and computer equipment - Google Patents

Invoice identification method and device, storage medium and computer equipment

Info

Publication number
CN110263239B
Authority
CN
China
Prior art keywords
invoice
data
field
image
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910469612.8A
Other languages
Chinese (zh)
Other versions
CN110263239A (en)
Inventor
周晓凤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910469612.8A
Publication of CN110263239A
Application granted
Publication of CN110263239B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07D HANDLING OF COINS OR VALUABLE PAPERS, e.g. TESTING, SORTING BY DENOMINATIONS, COUNTING, DISPENSING, CHANGING OR DEPOSITING
    • G07D7/00 Testing specially adapted to determine the identity or genuineness of valuable papers or for segregating those which are unacceptable, e.g. banknotes that are alien to a currency
    • G07D7/20 Testing patterns thereon

Abstract

The application provides a method, a device, a storage medium and computer equipment for invoice identification. The method comprises: acquiring an invoice image of a target invoice; identifying invoice information in the invoice image, wherein the invoice information comprises an invoice field and invoice data corresponding to the invoice field; and, according to the correspondence between the text fields of an invoice verification system and the invoice fields of the invoice information, entering the corresponding invoice data into the text fields of the invoice verification system and acquiring the verification result fed back by the invoice verification system. The method requires no manual verification, greatly reduces staff time and cost, and improves the speed and efficiency of verifying invoice authenticity.

Description

Invoice identification method and device, storage medium and computer equipment
Technical Field
The present application relates to the field of invoice identification technologies, and in particular, to a method, an apparatus, a storage medium, and a computer device for invoice identification.
Background
An invoice is a business voucher issued and received by organizations and individuals when purchasing and selling goods, providing or receiving services, or engaging in other business activities. It is the original basis for accounting and an important basis for law-enforcement inspection by audit and tax authorities.
At present, as the economy develops, illegal acts such as forging invoices and selling forged invoices are increasing, so financial staff and everyday consumers must look up invoice information themselves to verify that an invoice is authentic. Traditionally, checking an invoice requires finding the official tax website and manually typing information such as the invoice number and invoice code into its query window, after which the website verifies the invoice. The operation is cumbersome and error-prone, wastes a great deal of labor time, and is inefficient.
Disclosure of Invention
To solve the above problems, the application provides a method, a device, a storage medium and computer equipment for invoice identification.
According to a first aspect of the present application there is provided a method of invoice identification, comprising:
acquiring an invoice image of a target invoice;
identifying invoice information in the invoice image, wherein the invoice information comprises an invoice field and invoice data corresponding to the invoice field;
and according to the corresponding relation between the text field of the invoice verification system and the invoice field of the invoice information, inputting corresponding invoice data in the invoice information into the text field of the invoice verification system, and acquiring a verification result fed back by the invoice verification system.
In one possible implementation, the identifying invoice information in the invoice image includes:
identifying all pending data in the invoice image, determining one or more invoice fields corresponding to each pending data according to the data format of the pending data, and determining, for each invoice field, the one or more pending data corresponding to it;
when the invoice field corresponds to one pending data, taking that pending data as the invoice data corresponding to the invoice field;
when the invoice field corresponds to a plurality of pending data, selecting one pending data from the plurality of pending data as valid pending data, and taking the valid pending data as the invoice data corresponding to the invoice field.
In one possible implementation, the selecting one pending data from the plurality of pending data as valid pending data includes:
marking pending data whose correspondence with an invoice field has been determined as being in an identified state;
and removing the pending data in the identified state from the plurality of pending data corresponding to the invoice field, and selecting one pending data from the remaining pending data as the valid pending data.
In one possible implementation, the selecting one pending data from the plurality of pending data as valid pending data includes:
taking an invoice field that corresponds to a plurality of pending data as a target invoice field, and determining the position in the invoice image of each pending data corresponding to the target invoice field;
identifying the invoice text corresponding to the target invoice field in the invoice image, and determining the position of the invoice text in the invoice image;
determining the spacing between the invoice text and each pending data according to the position of the invoice text in the invoice image and the position of each pending data in the invoice image, wherein the spacing comprises a row spacing and/or a column spacing;
and taking the pending data corresponding to the minimum spacing among all the spacings as the valid pending data.
In one possible implementation, the selecting one pending data from the plurality of pending data as valid pending data includes:
determining related identification information according to the pending data whose correspondence with an invoice field has been determined and the corresponding invoice field;
and taking an invoice field that corresponds to a plurality of pending data as a target invoice field, and, when the invoice data corresponding to the target invoice field contains the identification information, taking the pending data that matches the identification information as the valid pending data.
In one possible implementation, the identifying invoice information in the invoice image includes:
determining the valid invoice fields required by the invoice verification system;
and identifying the invoice information related to the valid invoice fields in the invoice image, wherein the invoice fields in the invoice information are the valid invoice fields.
In one possible implementation manner, after the verification result fed back by the invoice verification system is obtained, the method further includes:
setting a reimbursement flag bit for the invoice information when the verification result is that verification passed, and updating the reimbursement flag bit of the invoice information after the target invoice corresponding to the invoice information is reimbursed.
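The reimbursement flag bit described above can be sketched as follows. This is only an illustrative sketch: the flag values and the function names `flag_after_verification` and `mark_reimbursed` are hypothetical and are not specified by the application.

```python
from enum import Enum


class ReimbursementFlag(Enum):
    """Hypothetical flag values; the application does not name them."""
    NOT_SET = "not_set"                # verification did not pass, no flag
    NOT_REIMBURSED = "not_reimbursed"  # verified, waiting for reimbursement
    REIMBURSED = "reimbursed"          # verified and already reimbursed


def flag_after_verification(passed):
    """Set the flag only when the verification result is a pass."""
    if passed:
        return ReimbursementFlag.NOT_REIMBURSED
    return ReimbursementFlag.NOT_SET


def mark_reimbursed(flag):
    """Update the flag once the target invoice has been reimbursed."""
    if flag is not ReimbursementFlag.NOT_REIMBURSED:
        raise ValueError("only a verified, unreimbursed invoice can be reimbursed")
    return ReimbursementFlag.REIMBURSED
```

One plausible use of such a flag is to refuse a second reimbursement of the same invoice: `mark_reimbursed` raises an error when the flag already reads `REIMBURSED`.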
According to a second aspect of the present application there is provided an apparatus for invoice recognition, comprising:
an image acquisition module, configured to acquire an invoice image of a target invoice;
an identification module, configured to identify invoice information in the invoice image, wherein the invoice information comprises an invoice field and invoice data corresponding to the invoice field;
and a verification module, configured to enter the corresponding invoice data in the invoice information into the text fields of an invoice verification system according to the correspondence between the text fields of the invoice verification system and the invoice fields of the invoice information, and to acquire the verification result fed back by the invoice verification system.
According to a third aspect of the present application there is provided a computer readable storage medium having stored thereon computer readable instructions which when executed by a processor perform the above steps.
According to a fourth aspect of the present application there is provided a computer device comprising a memory, a processor and computer readable instructions stored on the memory and executable on the processor, the processor implementing the steps described above when executing the computer readable instructions.
The method, device, storage medium and computer equipment for invoice identification provided by the application can automatically identify an invoice: they identify the invoice fields and invoice data in the invoice image, automatically fill the invoice data into the invoice verification system to query and verify the authenticity of the invoice, and acquire the verification result fed back by the invoice verification system, thereby automatically determining whether the invoice is authentic. The method requires no manual verification, greatly reduces staff time and cost, and improves the speed and efficiency of verifying invoice authenticity. Determining the valid invoice fields reduces the amount of image recognition processing and improves recognition efficiency. Identifying the invoice data first further improves recognition accuracy. In addition, the invoice field to which pending data may belong is first judged from the data format of the pending data and then corrected using the invoice text, so the invoice data corresponding to the target invoice field can be determined more accurately.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the application is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate the application and together with the embodiments of the application, serve to explain the application. In the drawings:
FIG. 1 is a flow chart of a method for invoice recognition in an embodiment of the application;
FIG. 2 is a flow chart of identifying invoice information in an invoice image in an embodiment of the application;
FIG. 3 is a schematic representation of an invoice image in accordance with an embodiment of the application;
FIG. 4 is a schematic diagram of a first configuration of an apparatus for invoice recognition according to an embodiment of the present application;
FIG. 5 is a second schematic diagram of an apparatus for invoice recognition according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a computer device for executing the invoice recognition method according to an embodiment of the present application.
Detailed Description
The preferred embodiments of the present application will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present application only, and are not intended to limit the present application.
As shown in FIG. 1, the invoice identification method provided by an embodiment of the application comprises the following steps:
Step 101: Acquire an invoice image of the target invoice.
In the embodiment of the application, the target invoice is an invoice that needs to be identified and verified, that is, an invoice whose authenticity needs to be verified. Specifically, the invoice image of the target invoice can be obtained by scanning; alternatively, if the target invoice is an electronic invoice, the image corresponding to the electronic invoice can be used directly as the invoice image.
Step 102: Identify invoice information in the invoice image, the invoice information including an invoice field and invoice data corresponding to the invoice field.
In the embodiment of the application, after the invoice image is acquired, the invoice fields and the corresponding invoice data in the invoice image can be identified, that is, the invoice information in the invoice image is identified. Specifically, the invoice information in the invoice image may be identified using image recognition techniques such as OCR (Optical Character Recognition).
In the embodiment of the application, the invoice fields include one or more of the invoice code, invoice number, invoicing date, invoicing amount (excluding tax), check code, purchaser name, purchaser taxpayer identification number, seller name, seller taxpayer identification number, and the like, and the invoice data is the data corresponding to an invoice field. For example, the invoice field is the 8-digit "invoice number" and the corresponding invoice data is "12345678".
Step 103: According to the correspondence between the text fields of the invoice verification system and the invoice fields of the invoice information, enter the corresponding invoice data into the text fields of the invoice verification system, and acquire the verification result fed back by the invoice verification system.
In the embodiment of the application, the invoice verification system is a system or official platform capable of verifying the authenticity of invoices, such as the official website of the national tax bureau. A text field is a control of the invoice verification system; in this embodiment, it may be a single-line or multi-line text box control. Each text field corresponds to one invoice field and lets a user submit the corresponding invoice data in order to query whether the invoice is genuine. After the invoice fields and invoice data are identified, if the correspondence between the text fields of the invoice verification system and the invoice fields can be determined, the corresponding invoice data can be filled into the corresponding text fields automatically, the target invoice is verified by the invoice verification system, and the verification result fed back by the invoice verification system is obtained. If the automatically submitted invoice data is correct, the invoice verification system feeds back a verification result indicating that the target invoice is a legal invoice; otherwise, it feeds back a verification result indicating that the target invoice is an illegal invoice.
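The filling of text fields in step 103 can be sketched as a lookup through a correspondence table. This is a minimal sketch under stated assumptions: the text-field names (`code_box`, `number_box`, `amount_box`) and the function `fill_text_fields` are hypothetical, and how the filled values are actually submitted to a verification system is not shown because the application does not specify it.

```python
# Hypothetical correspondence between the verification system's text fields
# and invoice fields; real control names depend on the verification system.
TEXT_FIELD_TO_INVOICE_FIELD = {
    "code_box": "invoice code",
    "number_box": "invoice number",
    "amount_box": "invoicing amount",
}


def fill_text_fields(invoice_info, text_field_map=TEXT_FIELD_TO_INVOICE_FIELD):
    """Return {text field: invoice data} for every text field in the map.

    invoice_info maps invoice fields to invoice data, as produced by step 102.
    """
    missing = [f for f in text_field_map.values() if f not in invoice_info]
    if missing:
        # Without data for a required field the form cannot be filled in.
        raise KeyError(f"no invoice data identified for: {missing}")
    return {box: invoice_info[field] for box, field in text_field_map.items()}


invoice_info = {
    "invoice code": "123456789012",
    "invoice number": "12345678",
    "invoicing amount": "100.00",
}
form_values = fill_text_fields(invoice_info)
```

Keeping the correspondence in a table rather than in code means a different verification system only needs a different table.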
Optionally, an invoice contains many kinds of information, but the invoice verification system requires the user to submit only part of it when verifying authenticity. Accordingly, step 102 of identifying the invoice information in the invoice image specifically includes:
determining the valid invoice fields required by the invoice verification system; and identifying the invoice information related to the valid invoice fields in the invoice image, wherein the invoice fields in the invoice information are the valid invoice fields.
In the embodiment of the application, the valid invoice fields are the fields that the invoice verification system requires in order to verify the authenticity of an invoice, and only the valid invoice fields and their corresponding invoice data need to be identified in step 102. For example, suppose the invoice verification system requires the user to provide three invoice fields when verifying an invoice: the invoice code, the invoice number and the invoicing amount. The valid invoice fields are then the invoice code, the invoice number and the invoicing amount, and when the invoice information of the target invoice is identified, only the invoice data corresponding to these three fields is identified. Determining the valid invoice fields reduces the amount of image recognition processing and improves recognition efficiency.
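Restricting identification to the valid invoice fields can be sketched as narrowing the set of fields the recognizer looks for. This is an illustrative sketch only: `fields_to_identify` and the format descriptions in `known_formats` are hypothetical names introduced for the example.

```python
def fields_to_identify(known_formats, valid_fields):
    """Keep only the invoice fields that the verification system requires.

    known_formats: {invoice field: description of its data format}
    valid_fields:  set of invoice fields required by the verification system
    """
    return {
        field: fmt
        for field, fmt in known_formats.items()
        if field in valid_fields
    }


known_formats = {
    "invoice code": "12 digits",
    "invoice number": "8 digits",
    "invoicing date": "date",
    "invoicing amount": "amount",
}
valid_fields = {"invoice code", "invoice number", "invoicing amount"}
to_identify = fields_to_identify(known_formats, valid_fields)
```

Here the invoicing date is dropped before any recognition work is spent on it, which is the saving the paragraph above describes.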
The invoice identification method provided by the embodiment of the application can automatically identify an invoice: it identifies the invoice fields and invoice data in the invoice image, automatically fills the invoice data into the invoice verification system to query and verify the authenticity of the invoice, and acquires the verification result fed back by the invoice verification system, thereby automatically determining whether the invoice is authentic. The method requires no manual verification, greatly reduces staff time and cost, and improves the speed and efficiency of verifying invoice authenticity. Determining the valid invoice fields also reduces the amount of image recognition processing and improves recognition efficiency.
On the basis of the above embodiment, in order to more accurately identify the invoice information, as shown in fig. 2, the step 102 of identifying invoice information in an invoice image includes:
step 1021: all the undetermined data in the invoice image are identified, one or more invoice fields corresponding to the undetermined data are determined according to the data format of the undetermined data, and one or more undetermined data corresponding to each invoice field are respectively determined.
In the embodiment of the application, the invoice data and the corresponding invoice fields are determined by firstly identifying the invoice data. Since invoice data in an invoice is generally composed of numbers, that is, "invoice data" in the present embodiment refers to data containing numbers; for example, the invoice data corresponding to "the amount of the posting" is "100.00" or "100.00", and the invoice data corresponding to "the posting date" is "2018, 1 month, 1 day" or "2018/1/1", etc. In general, the accuracy rate of identifying numbers based on the image identification technology is high, and in this embodiment, the accuracy rate of identification can be improved by first identifying invoice data.
Specifically, in this embodiment, first, pending data in an invoice image needs to be identified, where the pending data is data including numbers; since invoice data corresponding to different invoice fields in an invoice generally has a specific format, in this embodiment, according to the data format of the pending data, which one or more invoice fields the pending data may correspond to may be determined, that is, one or more invoice fields corresponding to the pending data may be determined. The data format of the pending data refers to the number of bits of the data, the inherent expression form of the data, and the like. For example, a 12-bit invoice code (the invoice code of the previous value-added tax invoice is 10 bits), an 8-bit invoice number, an "X year, X month and X day" format, an "X-X" format, or an "X/X" format of the date of the invoice, an "XXX" format, or an "xx.xx" format of the amount of the invoice, and the like. For example, the invoice number is 8 digits, the telephone (such as a supervision telephone) is also 8 digits, and if a certain pending data is "12345678", the pending data "12345678" corresponds to the "invoice number" and "telephone", that is, "12345678" may be the invoice number or the telephone.
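Determining candidate invoice fields from the data format can be sketched with regular expressions. This is a sketch under stated assumptions rather than the application's own implementation: the table `FIELD_FORMATS` and the function `candidate_fields` are hypothetical, dates are matched only in the hyphen and slash forms, and amounts are assumed to carry two decimal places so that a bare 8-digit string is not also treated as an amount.

```python
import re

# Hypothetical format table built from the examples in the description.
FIELD_FORMATS = {
    "invoice code": re.compile(r"\d{12}|\d{10}"),
    "invoice number": re.compile(r"\d{8}"),
    "telephone": re.compile(r"\d{8}"),
    "invoicing date": re.compile(r"\d{4}-\d{1,2}-\d{1,2}|\d{4}/\d{1,2}/\d{1,2}"),
    "invoicing amount": re.compile(r"\d+\.\d{2}"),
}


def candidate_fields(pending_data):
    """Return every invoice field whose format the pending data matches."""
    return [
        field
        for field, pattern in FIELD_FORMATS.items()
        if pattern.fullmatch(pending_data)
    ]
```

With this table, `candidate_fields("12345678")` yields both "invoice number" and "telephone", which reproduces the ambiguity described above, while `candidate_fields("2018/1/1")` yields only "invoicing date".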
Meanwhile, as described above, the correspondence between the pending data and the invoice field is one-to-one or one-to-many, and accordingly, the correspondence between the invoice field and the pending data may be one-to-one or one-to-many, that is, one invoice field may correspond to one pending data or may correspond to a plurality of pending data. For example, pending data having a data format of 8 digits includes "12345678" and "87654321", and the invoice field "invoice number" corresponds to two pending data "12345678" and "87654321", i.e., the invoice number may be 12345678 or 87654321.
It should be noted that "pending data" and "invoice data" in this embodiment are two designations of the same data at different time points. Specifically, the "pending data" refers to data that is initially identified from the invoice image, and at this time, it is not known which invoice field the "pending data" corresponds to; while "invoice data" refers to data that knows which invoice field corresponds to, i.e., if it is known which invoice field a certain data corresponds to, the data is referred to as "invoice data". For example, data "12345678" is identified from the invoice image, and this data "12345678" is referred to as "pending data" at this time, and if it is later determined that this data corresponds to the field "invoice number", this data "12345678" is referred to as "invoice data".
Step 1022: When the invoice field corresponds to one pending data, take that pending data as the invoice data corresponding to the invoice field.
Step 1023: When the invoice field corresponds to a plurality of pending data, select one pending data from the plurality of pending data as valid pending data, and take the valid pending data as the invoice data corresponding to the invoice field.
In the embodiment of the application, if the invoice field and the pending data are in one-to-one correspondence, the pending data can be directly determined to be the invoice data corresponding to the invoice field. When the invoice field corresponds to a plurality of pending data, one of them is selected as the invoice data corresponding to the invoice field.
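Steps 1022 and 1023 can be sketched by inverting the pending-data-to-field mapping and separating the fields that have exactly one candidate from those that have several. The function names `group_by_field` and `split_unique` are hypothetical, and the input is assumed to be a mapping from each pending data to its candidate invoice fields.

```python
from collections import defaultdict


def group_by_field(candidates):
    """Invert {pending data: [invoice fields]} into {invoice field: [pending data]}."""
    by_field = defaultdict(list)
    for pending, fields in candidates.items():
        for field in fields:
            by_field[field].append(pending)
    return dict(by_field)


def split_unique(by_field):
    """Step 1022: fields with one candidate are resolved directly.
    Step 1023: fields with several candidates still need a selection."""
    resolved = {f: p[0] for f, p in by_field.items() if len(p) == 1}
    ambiguous = {f: p for f, p in by_field.items() if len(p) > 1}
    return resolved, ambiguous


candidates = {
    "12345678": ["invoice number", "telephone"],
    "87654321": ["invoice number", "telephone"],
    "2018/1/1": ["invoicing date"],
}
resolved, ambiguous = split_unique(group_by_field(candidates))
```

In this example the invoicing date is resolved immediately, while "invoice number" and "telephone" each keep two candidates and require one of the selection methods described next.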
Specifically, the step 1023 "selecting one pending data from the plurality of pending data as valid pending data" includes:
Step A1: Mark pending data whose correspondence with an invoice field has been determined as being in an identified state.
In the embodiment of the application, for some pending data the corresponding invoice field can be determined uniquely; such data is the "pending data whose correspondence with an invoice field has been determined" and can be marked as being in the identified state. The pending data determined in step 1022 is pending data of this kind. For example, the data format of the invoicing date is "xx year xx month xx day" or "xx/xx/xx"; if only one pending data in the invoice image matches this format, that pending data is the invoice data corresponding to the invoicing date and can be marked as being in the identified state.
Step A2: Remove the pending data in the identified state from the plurality of pending data corresponding to the invoice field, and select one pending data from the remaining pending data as the valid pending data.
In the embodiment of the application, when an invoice field corresponds to a plurality of pending data and one or more of them are in the identified state, their correspondence with other invoice fields has already been determined, so they do not belong to the current invoice field. These pending data can therefore be removed, and one pending data is selected from those remaining as the valid pending data. If only one pending data remains for the invoice field after the removal, that pending data is the valid pending data. Removing pending data in the identified state further improves the accuracy of selecting the valid pending data.
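Steps A1 and A2 can be sketched as a set-based filter. This is a minimal sketch: `select_after_removal` is a hypothetical name, the identified state is represented simply as a set of pending data, and the case where several candidates remain is left unresolved because the application handles it by other means.

```python
def select_after_removal(pending_list, identified):
    """Drop pending data already in the identified state (step A2).

    Returns (valid pending data or None, remaining candidates). The valid
    pending data is set only when exactly one candidate remains.
    """
    remaining = [p for p in pending_list if p not in identified]
    valid = remaining[0] if len(remaining) == 1 else None
    return valid, remaining


# "87654321" is assumed to have been matched to another field already (step A1).
identified = {"87654321"}
valid, remaining = select_after_removal(["12345678", "87654321"], identified)
```

Here `valid` is "12345678", since it is the only candidate left for the field once the identified data is removed.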
Optionally, there are many types of invoices, such as ordinary invoices and value-added tax invoices, and each type has several subclasses. Value-added tax invoices, for example, include special value-added tax invoices, ordinary value-added tax invoices, unified motor vehicle sales invoices, unified second-hand vehicle sales invoices, and special value-added tax invoices for the goods transportation industry; there are also fixed-amount invoices, general machine-printed invoices, and so on. The face size and layout of each kind of invoice differ, which makes it difficult to identify the invoice information in an invoice image. To overcome this problem, in the embodiment of the application, the step 1023 "selecting one pending data from the plurality of pending data as valid pending data" includes:
Step B1: Take an invoice field that corresponds to a plurality of pending data as a target invoice field, and determine the position in the invoice image of each pending data corresponding to the target invoice field.
Step B2: Identify the invoice text corresponding to the target invoice field in the invoice image, and determine the position of the invoice text in the invoice image.
In the embodiment of the present application, if a certain invoice field corresponds to a plurality of pending data, for convenience of subsequent description, the invoice field is referred to as a target invoice field. Meanwhile, the undetermined data are data identified from the invoice image, the undetermined data are located at specific positions in the invoice image, and at the moment, the position of each undetermined data corresponding to the target invoice field in the invoice image can be determined. For example, a two-dimensional coordinate system is established for the invoice image, and at this time, the position of the pending data may be described in the form of coordinate points, for example, coordinates of a point in the lower left corner of the pending data are taken as the position of the pending data, or coordinates of a center point of the pending data are taken as the position of the pending data.
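The two ways of describing a position mentioned above can be sketched as follows. The sketch assumes that the recognizer reports a bounding box for each recognized item and that the coordinate origin is at the lower left of the image; the `Box` type and both function names are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical bounding box of a recognized item in image coordinates."""
    left: float
    bottom: float
    width: float
    height: float


def lower_left(box):
    """Position taken as the coordinates of the lower left corner."""
    return (box.left, box.bottom)


def center(box):
    """Position taken as the coordinates of the center point."""
    return (box.left + box.width / 2, box.bottom + box.height / 2)


box = Box(left=10, bottom=20, width=40, height=10)
```

Either convention works as long as the same one is used for both the pending data and the invoice text, since only differences between positions matter later.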
Further, in the preceding embodiment, the invoice data is identified first when identifying invoice information. Because the invoice fields contained in each invoice are known, and are even identical across invoices, the invoice fields in the identified invoice information are determined from the identified invoice data rather than read directly from the invoice image by image recognition. Likewise, the target invoice field is not read directly from the invoice image. In this embodiment, however, in order to determine correctly which invoice field a piece of invoice data corresponds to, the invoice text in the invoice image is identified by image recognition, that is, the invoice fields in text form are identified from the invoice image; if an identified invoice text corresponds to the target invoice field, the position of that invoice text in the invoice image is determined. Determining the position of the invoice text is similar to determining the position of pending data and is not described again here.
Step B3: Determine the spacing between the invoice text and each pending data according to the position of the invoice text in the invoice image and the position of each pending data in the invoice image, wherein the spacing comprises a row spacing and/or a column spacing.
Step B4: Take the pending data corresponding to the minimum spacing among all the spacings as the valid pending data.
In the embodiment of the application, the row spacing or column spacing between the invoice text and the pending data can be determined from their positions in the invoice image. FIG. 3 shows a schematic representation of an invoice image, using as an example an invoice image that contains four invoice texts and three pending data.
In the embodiment of the application, the row spacing between the invoice text and the pending data is the difference between their position coordinates along the vertical axis; correspondingly, the column spacing is the difference between their position coordinates along the horizontal axis. The smaller the row spacing, the more likely the invoice text and the pending data are in the same row; the smaller the column spacing, the more likely they are in the same column. As shown in FIG. 3, the row spacing between invoice text a and pending data 1 is small, and the column spacing between invoice text d and pending data 1, 2 and 3 is small.
If the row spacing or column spacing between the invoice text and a given pending data is the smallest among the spacings between the invoice text and all of the related pending data, it is the minimum spacing. The pending data with the minimum spacing is the most likely to lie in the same row or column as the invoice text and therefore corresponds to it, so it can be taken as the valid pending data. As shown in FIG. 3, if the invoice text corresponding to the target invoice field is invoice text a, and the pending data corresponding to the target invoice field are pending data 1, 2 and 3, then the minimum spacing (here the minimum row spacing) is between invoice text a and pending data 1, so pending data 1 can be taken as the valid pending data corresponding to invoice text a.
In this embodiment, the invoice field corresponding to the pending data is determined by the minimum distance, so that different invoice styles can be adapted. Meanwhile, the invoice text and the pending data of the machine invoice can be different lines, even can be wrong lines (similar to the one shown in fig. 3), for example, the invoicing amount is printed to one line of the invoicing date; if the field corresponding to the pending data is determined directly according to the positional relationship between the invoice data and the invoice text, there may be erroneous judgment. In this embodiment, the invoice field to which the pending data may belong is first determined based on the data format of the pending data, and then corrected based on the invoice text, so that the invoice data corresponding to the target invoice field can be more accurately determined.
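Steps B3 and B4 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `Box` type, the coordinates, and the choice to take the smaller of the row spacing and the column spacing as "the spacing" are all assumptions made for the example, and the coordinates are invented rather than read from fig. 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A recognized item and its position in the invoice image (hypothetical type)."""
    label: str
    x: float  # coordinate along the horizontal axis
    y: float  # coordinate along the vertical axis


def spacing(text: Box, data: Box) -> float:
    """Step B3: row spacing and column spacing between invoice text and pending data.

    Assumption: the smaller of the two is used as the spacing.
    """
    row_spacing = abs(text.y - data.y)     # difference along the vertical axis
    column_spacing = abs(text.x - data.x)  # difference along the horizontal axis
    return min(row_spacing, column_spacing)


def select_valid_pending(text: Box, candidates: list[Box]) -> Box:
    """Step B4: the candidate with the minimum spacing is the valid pending data."""
    return min(candidates, key=lambda c: spacing(text, c))


# Invented coordinates: pending data 1 is printed slightly off the row of text a.
invoice_text_a = Box("invoice text a", x=10, y=100)
pending = [
    Box("pending data 1", x=200, y=104),
    Box("pending data 2", x=200, y=140),
    Box("pending data 3", x=200, y=180),
]
print(select_valid_pending(invoice_text_a, pending).label)  # pending data 1
```

Because the comparison is relative, a value printed a few pixels off its row is still matched to the nearest invoice text, which is the tolerance to misaligned printing that the embodiment describes.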
It should be noted that, in fig. 3, "invoice text d", which corresponds to several pending data, is typically "amount", and invoice texts a and b are consumption items such as "mobile phone" and "computer"; pending data 1 is then the price of the mobile phone, and pending data 2 is the price of the computer. However, the invoice field "invoicing amount" actually refers to the total amount. If invoice text c is "total amount", pending data 3, which corresponds to invoice text c, is the invoice data for the invoicing amount. In this case, "invoice text c" can be used as the invoice text corresponding to the target invoice field; that is, only which pending data has the minimum row or column spacing to invoice text c is considered.
In addition, an invoice contains many numbers, that is, pending data, such as the numbers for the invoice code, the invoice number, and the invoice check code. Some invoices state the meaning of a number directly, such as "invoice number: 12345678", while others print only the digits, such as "12345678", without indicating that they are an "invoice number". In other words, the invoice data on an invoice is relatively complete, while the invoice text may be incomplete. Moreover, the same invoice field may be represented differently on different invoices, that is, by different invoice texts; for example, the invoice field "invoice number" may appear in the invoice image as the invoice text "invoice number" or "No.". In this embodiment, the invoice data is identified first and the corresponding invoice field is determined afterwards, so all invoice information can be determined clearly and completely even when the invoice text on the invoice is incomplete.
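One way to handle different invoice texts for the same invoice field is an alias table. The Python sketch below is only an illustration under assumptions: the table `FIELD_ALIASES` and the helper `field_for_text` are hypothetical names, and the only aliases listed are the two mentioned in the text above.

```python
from typing import Optional

# Hypothetical alias table: invoice field -> invoice texts that may represent it.
FIELD_ALIASES = {
    "invoice number": ("invoice number", "no."),
}


def field_for_text(invoice_text: str) -> Optional[str]:
    """Map a recognized invoice text to its invoice field, or None if unknown."""
    # Drop surrounding whitespace and a trailing ASCII or full-width colon.
    normalized = invoice_text.strip().rstrip(":：").strip().lower()
    for field, aliases in FIELD_ALIASES.items():
        if normalized in aliases:
            return field
    return None


print(field_for_text("No."))              # invoice number
print(field_for_text("Invoice number:"))  # invoice number
print(field_for_text("Total"))            # None
```

When the lookup returns `None`, or when no invoice text is printed at all, the field can still be determined from the data format of the pending data, as the embodiment describes.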
Based on the above embodiment, step 1023, "selecting one pending data from the plurality of pending data as valid pending data", includes:
Step C1: Determine relevant identification information according to the pending data whose correspondence with an invoice field has already been determined and the corresponding invoice field.
Step C2: Take the invoice field that corresponds to a plurality of pending data as the target invoice field, and, when the invoice data corresponding to the target invoice field contains the identification information, take the pending data that matches the identification information as the valid pending data.
In this embodiment of the application, invoice data that has not yet been identified can be identified with the help of the invoice data and invoice fields that have already been identified. Specifically, relevant identification information is determined from the identified invoice data and invoice fields, and the pending data corresponding to the target invoice field is then identified according to that identification information. The identification information may be, for example, date information or a positional relationship. For example, the invoicing date usually has a distinctive format and is easy to identify; once the invoice data corresponding to the invoicing date has been identified, the year of invoicing (the identification information) can be determined. The 6th and 7th digits of the invoice code represent the year (for example, 18 represents 2018), so the pending data whose 6th and 7th digits match the year of the invoicing date can be selected as the valid pending data. Alternatively, the identification information may be a positional relationship. For example, the invoice code and the invoice number are arranged one above the other; after the invoice code has been identified, the 8-digit number below it can be taken as the invoice number.
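The year-based check in steps C1 and C2 can be sketched in Python as follows. This is an illustration under assumptions: `select_by_year` is a hypothetical helper, and the two candidate strings are invented values, not real invoice codes.

```python
from datetime import date


def select_by_year(invoicing_date: date, candidates: list[str]) -> list[str]:
    """Keep the candidates whose 6th and 7th digits equal the two-digit year.

    The identification information is the year of the already identified
    invoicing date (for example, 2018 gives the year code "18").
    """
    year_code = f"{invoicing_date.year % 100:02d}"
    # Digits 6-7 (1-based) are the slice [5:7] with 0-based indexing.
    return [c for c in candidates if len(c) >= 7 and c[5:7] == year_code]


# Invented pending data that both match the invoice code data format.
candidates = ["044031800111", "044031700111"]
print(select_by_year(date(2018, 5, 31), candidates))  # ['044031800111']
```

If more than one candidate survives the check, another kind of identification information, such as the positional relationship described above, would still be needed.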
Optionally, with the spread of electronic invoices, the same invoice may be submitted for reimbursement more than once. To solve this problem, in this embodiment, after "obtaining the verification result fed back by the invoice verification system" in step 103, the method further includes: when the verification result is that verification passed, setting a reimbursement flag bit for the invoice information, and updating the reimbursement flag bit of the invoice information after the target invoice corresponding to the invoice information has been reimbursed.
In this embodiment of the application, an invoice database can be established, the invoice information identified from the invoice image can be stored in the invoice database, and a reimbursement flag bit can be set for each piece of invoice information in the database; for example, the reimbursement flag bit of unreimbursed invoice information is 0, and the reimbursement flag bit of reimbursed invoice information is 1. Whether the invoice corresponding to a piece of invoice information has been reimbursed is determined from the reimbursement flag bit, so repeated reimbursement can be avoided.
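The reimbursement flag bit can be sketched with Python's standard `sqlite3` module. The schema, the table and column names, and the helpers `register_verified` and `try_reimburse` are assumptions made for illustration; the patent does not specify a database design.

```python
import sqlite3

# In-memory database standing in for the invoice database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE invoice (
        invoice_code   TEXT NOT NULL,
        invoice_number TEXT NOT NULL,
        reimbursed     INTEGER NOT NULL DEFAULT 0,  -- 0: not reimbursed, 1: reimbursed
        PRIMARY KEY (invoice_code, invoice_number)
    )
    """
)


def register_verified(code: str, number: str) -> None:
    """Store invoice information that passed verification, with the flag set to 0."""
    conn.execute(
        "INSERT OR IGNORE INTO invoice (invoice_code, invoice_number) VALUES (?, ?)",
        (code, number),
    )


def try_reimburse(code: str, number: str) -> bool:
    """Set the flag to 1; return False if the invoice is unknown or already reimbursed."""
    cur = conn.execute(
        "UPDATE invoice SET reimbursed = 1 "
        "WHERE invoice_code = ? AND invoice_number = ? AND reimbursed = 0",
        (code, number),
    )
    return cur.rowcount == 1


register_verified("044031800111", "12345678")
first = try_reimburse("044031800111", "12345678")   # flag goes from 0 to 1
second = try_reimburse("044031800111", "12345678")  # flag is already 1
print(first, second)  # True False
```

Checking and updating the flag in a single `UPDATE` statement means a second attempt to reimburse the same invoice is refused without a separate lookup.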
The invoice identification method provided by the embodiments of the application identifies an invoice automatically. It identifies the invoice fields and invoice data in the invoice image, automatically fills the invoice data into the invoice verification system to query and verify the authenticity of the invoice, and obtains the verification result fed back by the invoice verification system, thereby determining the authenticity of the invoice automatically. The method requires no manual verification, which can greatly reduce staff time and cost and improve the speed and efficiency of verifying invoice authenticity. In addition, determining the valid invoice fields reduces the amount of image recognition processing and improves recognition efficiency. Identifying the invoice data first further improves recognition accuracy; and because the invoice fields to which the pending data may belong are first determined from the data format of the pending data and then corrected using the invoice text, the invoice data corresponding to the target invoice field can be determined more accurately.
The flow of the invoice identification method has been described in detail above. The method can also be implemented by a corresponding device, whose structure and functions are described below.
Referring to fig. 4, the invoice identification device provided by an embodiment of the application includes:
an image acquisition module 41, configured to acquire an invoice image of a target invoice;
an identification module 42, configured to identify invoice information in the invoice image, where the invoice information includes an invoice field and invoice data corresponding to the invoice field;
and the verification module 43 is configured to input corresponding invoice data in the invoice information into the text field of the invoice verification system according to a correspondence between the text field of the invoice verification system and the invoice field of the invoice information, and obtain a verification result fed back by the invoice verification system.
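The correspondence used by the verification module can be pictured as a lookup table from invoice fields to the text fields of the verification form. The Python sketch below is illustrative only: the text-field names (`code_input` and so on) and the helper `build_form_payload` are hypothetical, the sample values are invented, and no real invoice verification system or its interface is modeled.

```python
# Hypothetical correspondence between invoice fields and the text fields
# of an invoice verification form; the text-field names are invented.
FIELD_TO_TEXT_FIELD = {
    "invoice code": "code_input",
    "invoice number": "number_input",
    "invoicing date": "date_input",
    "invoicing amount": "amount_input",
}


def build_form_payload(invoice_info: dict[str, str]) -> dict[str, str]:
    """Place each piece of invoice data under its matching text field."""
    missing = [f for f in FIELD_TO_TEXT_FIELD if f not in invoice_info]
    if missing:
        raise ValueError(f"missing invoice fields: {missing}")
    return {FIELD_TO_TEXT_FIELD[f]: invoice_info[f] for f in FIELD_TO_TEXT_FIELD}


# Invented invoice information as the identification module might return it.
invoice_info = {
    "invoice code": "044031800111",
    "invoice number": "12345678",
    "invoicing date": "2018-05-31",
    "invoicing amount": "9998.00",
}
payload = build_form_payload(invoice_info)
print(payload["code_input"], payload["number_input"])  # 044031800111 12345678
```

Submitting the payload and reading back the verification result depend on the particular verification system and are left out of the sketch.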
On the basis of the above embodiment, the identification module 42 includes:
an identification unit, configured to identify all pending data in the invoice image, determine one or more invoice fields corresponding to each pending data according to the data format of the pending data, and determine the one or more pending data corresponding to each invoice field;
a first determining unit, configured to take the pending data as the invoice data corresponding to the invoice field when the invoice field corresponds to one pending data;
and a second determining unit, configured to select one pending data from the plurality of pending data as valid pending data when the invoice field corresponds to a plurality of pending data, and take the valid pending data as the invoice data corresponding to the invoice field.
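The division of work among the three units can be sketched in Python as follows. The format rules in `FORMAT_RULES` are assumptions chosen for the example (the text above does not list the data formats), the sample values are invented, and the `choose` callback merely stands in for whichever selection strategy the second determining unit applies.

```python
import re
from typing import Callable

# Assumed data formats, for illustration only.
FORMAT_RULES = {
    "invoice code": re.compile(r"\d{10}|\d{12}"),
    "invoice number": re.compile(r"\d{8}"),
    "invoicing date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "invoicing amount": re.compile(r"\d+\.\d{2}"),
}


def group_by_field(pending_items: list[str]) -> dict[str, list[str]]:
    """Identification unit: match every pending data against every data format."""
    by_field: dict[str, list[str]] = {field: [] for field in FORMAT_RULES}
    for item in pending_items:
        for field, pattern in FORMAT_RULES.items():
            if pattern.fullmatch(item):
                by_field[field].append(item)
    return by_field


def resolve(by_field: dict[str, list[str]],
            choose: Callable[[str, list[str]], str]) -> dict[str, str]:
    """First unit handles one candidate; second unit selects among several."""
    invoice_data = {}
    for field, items in by_field.items():
        if len(items) == 1:
            invoice_data[field] = items[0]
        elif len(items) > 1:
            invoice_data[field] = choose(field, items)
    return invoice_data


pending = ["044031800111", "12345678", "2018-05-31", "3999.00", "5999.00", "9998.00"]
grouped = group_by_field(pending)
# Stand-in strategy: take the last amount. A real strategy would use the
# minimum spacing or the identification information described in the text.
result = resolve(grouped, choose=lambda field, items: items[-1])
print(result["invoicing amount"])  # 9998.00
```

Only the invoicing amount has several candidates in this example, so it is the only field that reaches the selection callback.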
On the basis of the above embodiment, the second determining unit selecting one pending data from the plurality of pending data as valid pending data includes:
marking the pending data whose correspondence with an invoice field has been determined as being in an identified state;
and excluding the pending data in the identified state from the plurality of pending data corresponding to the invoice field, and selecting one pending data from the remaining pending data as the valid pending data.
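The exclusion strategy can be sketched in a few lines of Python. The helper name `select_after_excluding` and the sample values are hypothetical; returning `None` when the result is still ambiguous is a choice made for the example.

```python
from typing import Optional


def select_after_excluding(candidates: list[str], identified: set[str]) -> Optional[str]:
    """Drop pending data already in the identified state, then select.

    Returns the remaining pending data if exactly one is left, otherwise None
    so that another strategy can be tried.
    """
    remaining = [c for c in candidates if c not in identified]
    return remaining[0] if len(remaining) == 1 else None


# Two 8-digit values match the invoice number format; one of them has
# already been assigned to another invoice field (invented values).
candidates = ["12345678", "87654321"]
identified = {"87654321"}
print(select_after_excluding(candidates, identified))  # 12345678
```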
On the basis of the above embodiment, the second determining unit selecting one pending data from the plurality of pending data as valid pending data includes:
taking the invoice field that corresponds to a plurality of pending data as the target invoice field, and determining the position in the invoice image of each pending data corresponding to the target invoice field;
identifying the invoice text corresponding to the target invoice field in the invoice image, and determining the position of the invoice text in the invoice image;
determining the spacing between the invoice text and each pending data according to the position of the invoice text in the invoice image and the position of each pending data in the invoice image, where the spacing includes a row spacing and/or a column spacing;
and taking the pending data corresponding to the minimum spacing among all the spacings as the valid pending data.
On the basis of the above embodiment, the second determining unit selecting one pending data from the plurality of pending data as valid pending data includes:
determining relevant identification information according to the pending data whose correspondence with an invoice field has been determined and the corresponding invoice field;
and taking the invoice field that corresponds to a plurality of pending data as the target invoice field, and, when the invoice data corresponding to the target invoice field contains the identification information, taking the pending data that matches the identification information as the valid pending data.
On the basis of the above embodiment, the identifying module 42 identifies invoice information in the invoice image including:
determining valid invoice fields required by an invoice verification system;
and identifying invoice information related to the effective invoice field in the invoice image, wherein the invoice field in the invoice information is the effective invoice field.
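Restricting identification to the valid invoice fields can be sketched in Python by applying only the format rules of the fields the verification system requires. The rules in `ALL_FORMAT_RULES`, the helper `match_valid_fields`, and the sample values are assumptions made for illustration.

```python
import re

# Assumed data formats for all known invoice fields, for illustration only.
ALL_FORMAT_RULES = {
    "invoice code": re.compile(r"\d{10}|\d{12}"),
    "invoice number": re.compile(r"\d{8}"),
    "check code": re.compile(r"\d{20}"),
    "invoicing date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def match_valid_fields(pending_items: list[str],
                       valid_fields: set[str]) -> dict[str, list[str]]:
    """Match pending data only against the formats of the valid invoice fields."""
    rules = {f: p for f, p in ALL_FORMAT_RULES.items() if f in valid_fields}
    matches: dict[str, list[str]] = {field: [] for field in rules}
    for item in pending_items:
        for field, pattern in rules.items():
            if pattern.fullmatch(item):
                matches[field].append(item)
    return matches


# Suppose the verification system only asks for these two fields.
valid_fields = {"invoice code", "invoice number"}
pending = ["044031800111", "12345678", "2018-05-31"]
print(match_valid_fields(pending, valid_fields))
# {'invoice code': ['044031800111'], 'invoice number': ['12345678']}
```

Fields outside the valid set are never matched, which reflects the reduced processing the embodiment attributes to determining the valid invoice fields first.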
On the basis of the above embodiment, referring to fig. 5, the apparatus further includes a marking module 44;
after the verification module 43 obtains the verification result fed back by the invoice verification system, the marking module 44 is configured to: set a reimbursement flag bit for the invoice information when the verification result is that verification passed, and update the reimbursement flag bit of the invoice information after the target invoice corresponding to the invoice information has been reimbursed.
The invoice identification device provided by the embodiments of the application identifies an invoice automatically. It identifies the invoice fields and invoice data in the invoice image, automatically fills the invoice data into the invoice verification system to query and verify the authenticity of the invoice, and obtains the verification result fed back by the invoice verification system, thereby determining the authenticity of the invoice automatically. The device requires no manual verification, which can greatly reduce staff time and cost and improve the speed and efficiency of verifying invoice authenticity. In addition, determining the valid invoice fields reduces the amount of image recognition processing and improves recognition efficiency. Identifying the invoice data first further improves recognition accuracy; and because the invoice fields to which the pending data may belong are first determined from the data format of the pending data and then corrected using the invoice text, the invoice data corresponding to the target invoice field can be determined more accurately.
The embodiment of the application also provides a computer readable storage medium, which stores computer readable instructions containing a program for executing the invoice recognition method described above, and the computer readable instructions can execute the method in any of the method embodiments described above.
The computer-readable storage medium may be any available medium or data storage device that can be accessed by a computer, including, but not limited to, magnetic storage (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MO)), optical storage (e.g., CD, DVD, BD, HVD), and semiconductor storage (e.g., ROM, EPROM, EEPROM, non-volatile memory (NAND flash), solid-state disks (SSD)).
Fig. 6 shows a block diagram of a computer device according to another embodiment of the application. The computer device 1100 may be a host server with computing capability, a personal computer (PC), a portable computer, a portable terminal, or the like. The embodiments of the application do not limit the specific implementation of the computer device.
The computer device 1100 includes at least one processor 1110, a communication interface 1120, a memory 1130, and a bus 1140. The processor 1110, the communication interface 1120, and the memory 1130 communicate with each other through the bus 1140.
The communication interface 1120 is used to communicate with network elements including, for example, virtual machine management centers, shared storage, and the like.
The processor 1110 is used to execute programs. The processor 1110 may be a central processing unit CPU, or an application specific integrated circuit ASIC (Application Specific Integrated Circuit), or one or more integrated circuits configured to implement embodiments of the present application.
Memory 1130 is used to store computer-readable instructions. Memory 1130 may include high-speed RAM memory or non-volatile memory (nonvolatile memory), such as at least one magnetic disk memory. Memory 1130 may also be a memory array. Memory 1130 may also be partitioned and the blocks may be combined into virtual volumes according to certain rules. The instructions stored in memory 1130 may be executable by processor 1110 to enable processor 1110 to perform the methods of any of the method embodiments described above.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (8)

1. A method of invoice identification, comprising:
acquiring an invoice image of a target invoice;
identifying invoice information in the invoice image, wherein the invoice information comprises an invoice field and invoice data corresponding to the invoice field;
inputting corresponding invoice data in invoice information into a text field of an invoice verification system according to a corresponding relation between the text field of the invoice verification system and an invoice field of the invoice information, and acquiring a verification result fed back by the invoice verification system;
wherein, the identifying invoice information in the invoice image includes:
identifying all pending data in the invoice image, determining one or more invoice fields corresponding to the pending data according to the data format of the pending data, and respectively determining one or more pending data corresponding to each invoice field;
when the invoice field corresponds to one pending data, taking the pending data as the invoice data corresponding to the invoice field;
when the invoice field corresponds to a plurality of the pending data, taking the invoice field corresponding to the plurality of the pending data as a target invoice field, and determining the position in the invoice image of each of the pending data corresponding to the target invoice field; identifying an invoice text corresponding to the target invoice field in the invoice image, and determining the position of the invoice text in the invoice image; determining the spacing between the invoice text and each of the pending data according to the position of the invoice text in the invoice image and the position of each of the pending data in the invoice image, wherein the spacing comprises a row spacing and/or a column spacing; taking the pending data corresponding to the minimum spacing among all the spacings as valid pending data; and taking the valid pending data as the invoice data corresponding to the invoice field.
2. The method according to claim 1, wherein the method further comprises:
marking the pending data whose correspondence with an invoice field has been determined as being in an identified state;
and excluding the pending data in the identified state from the plurality of pending data corresponding to the invoice field, and selecting one pending data from the remaining pending data as valid pending data.
3. The method according to claim 1, wherein the method further comprises:
determining relevant identification information according to the pending data whose correspondence with an invoice field has been determined and the corresponding invoice field;
and taking the invoice field corresponding to the plurality of pending data as a target invoice field, and, when the invoice data corresponding to the target invoice field contains the identification information, taking the pending data that matches the identification information as valid pending data.
4. The method of claim 1, wherein the identifying invoice information in the invoice image comprises:
determining valid invoice fields required by an invoice verification system;
and identifying invoice information related to the effective invoice field in the invoice image, wherein the invoice field in the invoice information is the effective invoice field.
5. The method of any one of claims 1-4, further comprising, after the obtaining the verification result fed back by the invoice verification system:
and setting a reimbursement flag bit for the invoice information when the verification result is verification passing, and updating the reimbursement flag bit of the invoice information after the target invoice corresponding to the invoice information is reimbursed.
6. An apparatus for invoice recognition, comprising:
the image acquisition module is used for acquiring an invoice image of the target invoice;
the identification module is used for identifying invoice information in the invoice image, wherein the invoice information comprises an invoice field and invoice data corresponding to the invoice field;
the verification module is used for inputting corresponding invoice data in the invoice information into the text field of the invoice verification system according to the corresponding relation between the text field of the invoice verification system and the invoice field of the invoice information, and acquiring a verification result fed back by the invoice verification system;
the identification module is further configured to: identifying all pending data in the invoice image, determining one or more invoice fields corresponding to the pending data according to the data format of the pending data, and respectively determining one or more pending data corresponding to each invoice field;
when the invoice field corresponds to one pending data, taking the pending data as the invoice data corresponding to the invoice field;
when the invoice field corresponds to a plurality of the pending data, taking the invoice field corresponding to the plurality of the pending data as a target invoice field, and determining the position in the invoice image of each of the pending data corresponding to the target invoice field; identifying an invoice text corresponding to the target invoice field in the invoice image, and determining the position of the invoice text in the invoice image; determining the spacing between the invoice text and each of the pending data according to the position of the invoice text in the invoice image and the position of each of the pending data in the invoice image, wherein the spacing comprises a row spacing and/or a column spacing; taking the pending data corresponding to the minimum spacing among all the spacings as valid pending data; and taking the valid pending data as the invoice data corresponding to the invoice field.
7. A computer readable storage medium having stored thereon computer readable instructions, which when executed by a processor, implement the steps of the method of any of claims 1 to 5.
8. A computer device comprising a memory storing computer readable instructions and a processor, wherein the processor when executing the computer readable instructions performs the steps of the method of any one of claims 1 to 5.
CN201910469612.8A 2019-05-31 2019-05-31 Invoice identification method and device, storage medium and computer equipment Active CN110263239B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910469612.8A CN110263239B (en) 2019-05-31 2019-05-31 Invoice identification method and device, storage medium and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910469612.8A CN110263239B (en) 2019-05-31 2019-05-31 Invoice identification method and device, storage medium and computer equipment

Publications (2)

Publication Number Publication Date
CN110263239A CN110263239A (en) 2019-09-20
CN110263239B true CN110263239B (en) 2023-08-22

Family

ID=67916338

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910469612.8A Active CN110263239B (en) 2019-05-31 2019-05-31 Invoice identification method and device, storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN110263239B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111104844B (en) * 2019-10-12 2023-11-14 中国平安财产保险股份有限公司 Multi-invoice information input method and device, electronic equipment and storage medium
CN111932766A (en) * 2020-08-11 2020-11-13 上海眼控科技股份有限公司 Invoice verification method and device, computer equipment and readable storage medium
CN112085885A (en) * 2020-09-24 2020-12-15 理光图像技术(上海)有限公司 Ticket recognition device and ticket information management system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013111782A (en) * 2011-11-25 2013-06-10 National Printing Bureau Printed matter capable of distinguishing authenticity
CN105046553A (en) * 2015-07-09 2015-11-11 胡昭 Cloud intelligent invoice recognition inspection system and method based on mobile phone
CN105528604A (en) * 2016-01-31 2016-04-27 华南理工大学 Bill automatic identification and processing system based on OCR
CN108122139A (en) * 2016-11-29 2018-06-05 阿里巴巴集团控股有限公司 A kind of invoice data processing method, equipment and system
CN109472918A (en) * 2018-10-12 2019-03-15 深圳壹账通智能科技有限公司 Invoice validation method, financing checking method, device, equipment and medium
CN109800747A (en) * 2018-12-14 2019-05-24 平安科技(深圳)有限公司 Medical invoice recognition methods, user equipment, storage medium and device

Also Published As

Publication number Publication date
CN110263239A (en) 2019-09-20

Similar Documents

Publication Publication Date Title
CN110188755B (en) Image recognition method and device and computer readable storage medium
CN110263239B (en) Invoice identification method and device, storage medium and computer equipment
US11062132B2 (en) System and method for identification of missing data elements in electronic documents
US20150039707A1 (en) Document processing
CN110675546B (en) Invoice picture identification and verification method, system, equipment and readable storage medium
WO2020233402A1 (en) Accounts payable order validation method, apparatus and device, and storage medium
WO2022041834A1 (en) Transaction data processing method and apparatus
CN111914729A (en) Voucher association method and device, computer equipment and storage medium
CN112465601A (en) Electronic order generation method and device and storage medium
CN109886076B (en) Invoice storage method
CN112270580B (en) Invoice issuing method, invoice issuing device, invoice issuing equipment and storage medium
CN109214362A (en) Bill processing method and relevant device
CN109324963B (en) Method for automatically testing profit result and terminal equipment
CN112579608A (en) Case data query method, system, device and computer readable storage medium
CN111768565A (en) Method for identifying and post-processing invoice codes in value-added tax invoices
CN111159183A (en) Report generation method, electronic device and computer readable storage medium
JP2017033351A (en) Enterprise information matching apparatus and program for enterprise information matching
CN111292068B (en) Contract information auditing method and device, electronic equipment and storage medium
CN114169306A (en) Method, device and equipment for generating electronic receipt and readable storage medium
CN114897590A (en) Form checking method and device, computer equipment and storage medium
CN107430746B (en) Credit transaction management system and method thereof
CN111340517A (en) Method, system and related equipment for rapidly inquiring authenticity of invoice
CN112529700A (en) Business handling and auditing method, system, equipment and readable storage medium
CN111445330A (en) Account checking method and device
JP6224669B2 (en) Payment application system, payment application method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant