CN109886076B - Invoice storage method - Google Patents

Info

Publication number
CN109886076B
Authority
CN
China
Prior art keywords
invoice
information
code
image
added
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811624838.2A
Other languages
Chinese (zh)
Other versions
CN109886076A (en)
Inventor
赵成军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aisino Corp
Original Assignee
Aisino Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aisino Corp filed Critical Aisino Corp
Priority to CN201811624838.2A priority Critical patent/CN109886076B/en
Publication of CN109886076A publication Critical patent/CN109886076A/en
Application granted granted Critical
Publication of CN109886076B publication Critical patent/CN109886076B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

An invoice storage method, comprising: step 1: acquiring an invoice image, and identifying the invoice information contained in the invoice image; step 2: encoding the identified invoice information, and storing the code in an invoice repository. The invention identifies existing invoices and encodes them uniformly so that invoice information can be stored efficiently; before new invoice information is added to the invoice repository, it is first verified against the invoice repository and/or the national tax bureau, which ensures the authenticity and validity of the invoice information.

Description

Invoice storage method
Technical Field
The invention relates to the technical field of computers, in particular to an invoice storage method.
Background
In supply chain finance, receivables financing is an important form of financing. Receivables financing depends on specific transactions between the supply and demand parties and presupposes that the trade is authentic. However, because a supply chain has many participating entities, traditional supply chain financial business generally suffers from asymmetric information and opaque credit, and the financial institution faces risks such as a trade background whose authenticity is difficult to establish and accounts receivable that are financed repeatedly. The invoice serves as a certificate of a transaction between enterprises and has important reference value for establishing trade authenticity. Fast, accurate identification and effective storage of invoices are therefore very important for the reliable operation of supply chain financial services. In the prior art, invoice identification, storage and retrieval suffer from low identification and storage efficiency, low success rates, high storage cost and inflexible information retrieval, which hinders the efficient and reliable operation of supply chain financial services.
Disclosure of Invention
The invention aims to provide an invoice storage method which is beneficial to efficient and reliable storage and subsequent inquiry and verification of invoices.
The invention provides an invoice storage method, which comprises the following steps:
step 1: acquiring an invoice image, and identifying invoice information contained in the invoice image;
step 2: the identified invoice information is encoded and the code is stored in an invoice repository.
Preferably, the invoice information contained in the invoice image is identified according to the following steps:
step 101: preprocessing the invoice image;
step 102: locating a seal part, a machine-printed information part and a fixed information part in the preprocessed invoice image;
step 103: associating each fixed information part with its corresponding machine-printed information part, and extracting images of the fixed information part and of the corresponding machine-printed information part;
step 104: identifying the character information of the seal part, the fixed information part and the associated machine-printed information part by optical character recognition.
Preferably, the preprocessing the invoice image comprises at least one of the following steps:
performing edge detection on the invoice image to obtain a target area of the invoice image;
performing tilt correction and scaling on the invoice image;
and denoising the invoice image.
Preferably, the identified invoice information is encoded according to the following steps:
step 201: defining the invoice information corresponding to each field of the code;
step 202: filling the identified invoice information into the corresponding code fields to obtain the corresponding code.
Preferably, the invoice information includes at least one of invoice type, invoice code, invoice number, invoicing date, validation code, sales unit information, purchase unit information, goods or taxable labor information.
Preferably, the invoice storage method further comprises:
step 3: identifying the invoice information of an invoice to be added, and verifying that invoice information against the invoice repository;
step 4: encoding the invoice information of the invoice to be added that passes verification, and storing the code in the invoice repository.
Preferably, the verifying the invoice information of the invoice to be added through the invoice repository comprises:
step 301: extracting the invoice code and the invoice number of the invoice to be added according to the invoice information of the invoice to be added, and judging the invoice category of the invoice to be added;
step 302: searching all codes of corresponding categories in the invoice database according to the invoice categories of the invoices to be added;
step 303: comparing the invoice code and the invoice number of the invoice to be added with the retrieved code respectively, if the invoice code and the invoice number of the invoice to be added are the same as the invoice code and the invoice number corresponding to the retrieved code respectively, judging that the invoice to be added is an abnormal invoice, and returning abnormal information; otherwise, the verification is passed.
Preferably, the invoice storage method further comprises the following steps after the step 1:
calling an API (application program interface) provided by the State tax administration to verify the identified invoice information and obtain the corresponding invoice status.
Preferably, the invoice repository is a Hadoop Distributed File System (HDFS), and storing the code in the invoice repository includes:
creating a folder for storing the code in the HDFS, and storing the code in the folder.
Preferably, the code is stored in the folder by means of an invoice data structure. The invoice data structure stores the following invoice information corresponding to the code: invoice status, invoice type, invoice code and invoice number. The invoice data structure also stores a timestamp corresponding to the code, and the timestamp represents the time at which the code was written into the invoice data structure.
Preferably, the invoice storage method further comprises:
performing an invoice query in the HDFS by:
retrieving in the HDFS according to the invoice code and the invoice number of the invoice to be inquired to obtain at least one piece of corresponding invoice information;
selecting, from the at least one piece of retrieved invoice information, the invoice information whose timestamp is closest to the current time as the valid invoice information; and
returning the invoice status of the valid invoice information.
The beneficial effects of the invention are as follows: existing invoices are identified and uniformly encoded so that invoice information can be stored efficiently; before new invoice information is added to the invoice repository, it is first verified against the invoice repository and/or the national tax bureau, which ensures the authenticity and validity of the invoice information; and invoice information is stored on HDFS, so that massive amounts of invoice information can be stored.
The method of the present invention has other features and advantages which will be apparent from or are set forth in detail in the accompanying drawings and the following detailed description, which are incorporated herein, and which together serve to explain certain principles of the invention.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing in more detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts.
Fig. 1 shows a flowchart of an invoice storage method according to an exemplary embodiment of the present invention.
Fig. 2 shows an example of an invoice image of a value-added tax-specific invoice according to an exemplary embodiment of the present invention.
Fig. 3 shows the basic architecture of HDFS.
Description of reference numerals:
1. name node; 2. data node; 3. data block; 4. client.
Detailed Description
The invention will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Fig. 1 shows a flowchart of an invoice storage method according to an exemplary embodiment of the present invention. As shown in fig. 1, the invoice storage method according to an embodiment of the present invention includes the following steps:
Step 1: acquiring an invoice image, and identifying the invoice information contained in the invoice image. Step 1 specifically comprises the following steps:
Step 101: obtaining an invoice image, and preprocessing the invoice image.
In particular, an invoice image may contain several colors. For example, in some invoice images red marks a seal, light yellow marks a fixed text area, and black or blue characters carry the variable, invoice-specific information. In addition, the invoice face generally has table borders, boundary lines and the like, which can serve as the basis for tilt correction or area positioning. Furthermore, the position of the fixed information on the invoice is generally fixed, whereas the position of the machine-printed part may vary. In general, however, the offset of a variable character relative to its corresponding fixed character is fixed.
Based on these characteristics of the invoice, the invoice image is preprocessed, mainly by the following steps:
(1) Performing edge detection on the invoice image to obtain the target area of the invoice image. The rectangular area in which the invoice is located is identified as the target area of the invoice image.
(2) Performing tilt correction and scaling on the invoice image. Since the invoice image is acquired by manual scanning or photographing, the acquired image may be skewed or upside down. Tilt correction is the premise and basis for locating the invoice face information in the invoice image. In the embodiment of the invention, the invoice image is tilt-corrected during preprocessing so as to improve the accuracy with which the invoice face information is located and, in turn, the accuracy of invoice identification. Specifically, an image correction technique from existing image processing technology can be used for the tilt correction, and it is not described further here. In addition, the invoice image may be scaled to a standard image size.
(3) Denoising the invoice image. Unless it is generated electronically, the invoice image is generally acquired by scanning or photographing, so some noise such as blurring and shading is inevitable. An image filtering method from image processing technology can therefore be used to denoise the invoice image.
It should be noted that each of steps (1) to (3) above helps improve the accuracy of invoice image recognition and generally also its efficiency. In an embodiment of the present invention, at least one of the above steps may be used to preprocess the invoice image. Preferably, the invoice image is preprocessed by combining steps (1) to (3); the embodiment of the present invention is not limited in this respect.
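The denoising in step (3) can be illustrated with a minimal sketch. The text only calls for "an image filtering method", so the 3x3 median filter below is a swapped-in example rather than the method of the invention; the function name `median_filter_3x3` and the list-of-rows image representation are assumptions made for illustration.

```python
from statistics import median


def median_filter_3x3(gray):
    """Return a denoised copy of a grayscale image given as a list of rows.

    Assumption: each pixel is an integer gray value; the filter choice
    (3x3 median) is illustrative and not prescribed by the source text.
    """
    height, width = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    for y in range(height):
        for x in range(width):
            # Collect the pixel and its neighbours, clipping at the borders.
            window = [
                gray[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            out[y][x] = int(median(window))
    return out
```

A median filter is a common choice for isolated speckle noise because it replaces an outlier pixel with a typical neighbouring value instead of averaging the outlier into its surroundings.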
Step 102: locating a seal part, a machine-printed information part and a fixed information part in the preprocessed invoice image.
In one embodiment, the face information of the invoice image is located in three parts, namely a seal part, a fixed information part and a machine-printed information part.
First, to locate the seal part, a seal outline can be preset whose extent is smaller than the outline of the invoice image. The preset seal outline can be a rectangle. All outlines identified in the invoice image are scanned with this rectangle; if an outline in the invoice image can be contained in the seal outline, the area corresponding to that outline is judged to be a seal part, and the character information inside the outline is the seal information.
Secondly, the fixed information part and the machine-printed information part in the invoice image are located based on prior knowledge and RGB characteristics. The RGB characteristics of the machine-printed information part differ from those of the fixed information part: in the fixed information part the R value is the largest and the G value is close to the B value, whereas in the machine-printed information part the B value is the largest and the G value is close to the R value. By combining prior knowledge, such as the invoice type, with an analysis of the RGB characteristics of each part of the invoice image, the fixed information part and the machine-printed information part can be located. Specifically, the RGB (red, green and blue) characteristics of each part of the invoice image are analyzed; a part in which the R value is the largest and the difference between the G value and the B value is smaller than a first threshold is located as a fixed information part, and a part in which the B value is the largest and the difference between the G value and the R value is smaller than a second threshold is located as a machine-printed information part.
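The RGB rule above can be written as a short classification function. This is a minimal sketch: the function name `classify_region`, the use of a region's mean RGB values as input, and the default threshold values of 30 are assumptions for illustration, since the text does not give concrete thresholds.

```python
def classify_region(r, g, b, first_threshold=30, second_threshold=30):
    """Classify an image region from its mean R, G and B values.

    Thresholds are hypothetical; the source only names a "first" and a
    "second" threshold without giving values.
    """
    # Fixed information: R is the largest and G is close to B.
    if r > g and r > b and abs(g - b) < first_threshold:
        return "fixed"
    # Machine-printed information: B is the largest and G is close to R.
    if b > r and b > g and abs(g - r) < second_threshold:
        return "machine_printed"
    return "unknown"
```

For example, `classify_region(200, 120, 110)` returns `"fixed"` and `classify_region(40, 50, 160)` returns `"machine_printed"`; a neutral gray region falls through to `"unknown"`.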
After each portion in the invoice image is located, the invoice image can be grayed. Because the color invoice image contains a large amount of color information, huge storage space is occupied. In order to reduce the consumption of the invoice image to the storage resources, image graying processing is carried out. Optionally, the invoice image may be grayed by a plurality of methods, such as a maximum value method, an average value method, a weighted average method, and the like in the image processing technology, which is easily understood by those skilled in the art and is not described herein again.
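The three graying methods named above can be sketched per pixel as follows. The function names are illustrative, and the weights in the weighted average (0.299, 0.587, 0.114, a commonly used luminance weighting) are an assumption, since the text does not specify them.

```python
def gray_max(r, g, b):
    """Maximum value method: the brightest channel becomes the gray value."""
    return max(r, g, b)


def gray_average(r, g, b):
    """Average value method: the integer mean of the three channels."""
    return (r + g + b) // 3


def gray_weighted(r, g, b):
    """Weighted average method with assumed luminance weights."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def to_gray(image, method=gray_weighted):
    """Gray an image given as rows of (r, g, b) tuples."""
    return [[method(*pixel) for pixel in row] for row in image]
```

Each method reduces three channel values to one, so the grayed image needs roughly one third of the storage of the color image.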
Step 103: associating each fixed information part with its corresponding machine-printed information part, and extracting images of the fixed information part and of the corresponding machine-printed information part.
Optionally, each fixed information part is associated with its corresponding machine-printed information part according to the principle of closest distance. Image extraction is then performed on the fixed information part and on the corresponding machine-printed information part.
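The closest-distance association can be sketched as below. Representing each part by the (x, y) coordinates of its center and using Euclidean distance are assumptions for illustration; the text only states the closest-distance principle.

```python
from math import hypot


def associate_nearest(fixed_parts, printed_parts):
    """Map each fixed part to the nearest machine-printed part.

    Both arguments map a label to the (x, y) center of the located part
    (a hypothetical representation chosen for this sketch).
    """
    pairs = {}
    for label, (fx, fy) in fixed_parts.items():
        pairs[label] = min(
            printed_parts,
            key=lambda name: hypot(
                printed_parts[name][0] - fx, printed_parts[name][1] - fy
            ),
        )
    return pairs
```

With the fixed parts `{"invoicing date": (100, 20), "amount": (100, 200)}` and the printed parts `{"2010-6-25": (160, 22), "49003.20": (170, 198)}`, the date label is paired with `"2010-6-25"` and the amount label with `"49003.20"`.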
Optionally, after the fixed information part and the machine-printed information part are located, each fixed information part and its corresponding machine-printed information part may be extracted as a whole. For example, for the fixed information part "invoicing date", an image area containing the four Chinese characters of "invoicing date" may be extracted according to its position in the invoice image. Alternatively, image extraction may be performed for each individual character of the fixed information part or the machine-printed information part. In that case, a gray threshold is set in advance according to the gray distribution of the character part and the background part in the invoice image, and the images of the fixed information part and the machine-printed information part are projected in the vertical direction using a projection-feature-based method. The part of the projection whose gray value is higher than the gray threshold is judged to be a character part, and the part whose gray value is lower than the gray threshold is treated as background. In this way, the fixed information part and the machine-printed information part can each be divided into a number of independent character images.
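The projection-based character segmentation can be sketched as follows. The sketch assumes, following the convention stated above, that character pixels have higher gray values than the background, and it uses the mean gray value of each column as the vertical projection; the function name `segment_columns` and that choice of projection are illustrative assumptions.

```python
def segment_columns(gray, gray_threshold):
    """Split a text-line image into character column ranges.

    `gray` is a list of rows of gray values. Returns (start, end) column
    pairs, end exclusive, for every run of columns whose projection
    exceeds the threshold.
    """
    height, width = len(gray), len(gray[0])
    # Vertical projection: mean gray value of every column (assumption).
    projection = [
        sum(gray[y][x] for y in range(height)) / height for x in range(width)
    ]
    segments, start = [], None
    for x, value in enumerate(projection):
        if value > gray_threshold and start is None:
            start = x  # a character run begins
        elif value <= gray_threshold and start is not None:
            segments.append((start, x))  # the run ended at a background column
            start = None
    if start is not None:
        segments.append((start, width))  # run reaches the right border
    return segments
```

Each returned column range can then be cropped from the part image to obtain one independent character image.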
Step 104: identifying the character information of the seal part, the fixed information part and the associated machine-printed information part by optical character recognition.
After image extraction, each part of the invoice image can be recognized using Optical Character Recognition (OCR) to obtain the character information corresponding to each part.
Specifically, the value-added tax special invoice and the value-added tax general invoice each have a standard layout, so the character information corresponding to their fixed information parts is also fixed. The character information corresponding to the fixed information part of the value-added tax special invoice includes: invoice code, invoice number, invoicing date and amount. Fig. 2 shows an example image of a value-added tax special invoice; the character information of its fixed information part and of the associated machine-printed information part is shown in the following table:
Table 1. Example character information of a value-added tax special invoice
Fixed information part          Associated machine-printed information part
(invoice code)                  3200012143
No. (invoice number)            0001704
Invoicing date                  2010-6-25
Amount                          49003.20
Similarly, the character information corresponding to the fixed information part of the value-added tax general invoice includes: invoice code, invoice number, invoicing date, verification code and the like.
For the value-added tax special invoice and the value-added tax general invoice, the following optional information can also be identified: seller information (including name, taxpayer identification number, address, telephone, bank of deposit and account number); buyer information (including name, taxpayer identification number, address, telephone, bank of deposit and account number); and goods or taxable labor information (including name, specification, model, unit, quantity, unit price, tax rate and tax amount).
Further, when the invoice image carries a two-dimensional code, the two-dimensional code information of the invoice is collected through OCR and converted into character information.
Step 2: encoding the identified invoice information and storing the code.
In step 1, the invoice information contained in the invoice image is identified, in particular the character information of the fixed information part and of the associated machine-printed information part. In step 2, a code is generated from the identified invoice information, and the code is stored in an invoice repository.
Optionally, the identified invoice information is encoded as follows:
Step 201: defining the invoice information corresponding to each field of the code. For example, the first field represents the invoice type, the second field the invoice code, the third field the invoice number, the fourth field the invoicing date, the fifth field the amount, and the sixth field the invoice two-dimensional code. The length of each field can be determined by the invoice information it holds; for example, the first field, representing the invoice type, can be 2 bytes long, and the second field, representing the invoice code, can be 4 bytes long.
Step 202: filling the identified invoice information into the corresponding code fields, in sequence, to obtain the corresponding code. The invoice type can be represented by flag values, for example 01 for a value-added tax special invoice and 02 for a value-added tax general invoice. If the invoice information corresponding to a field could not be identified, or the invoice does not contain that information, the field is left empty.
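Steps 201 and 202 can be illustrated with a fixed-length binary layout. The 2-byte invoice type and 4-byte invoice code follow the example lengths given above; the lengths of the remaining fields, the representation of the date as a YYYYMMDD integer and of the amount in cents, the zero-filling of empty fields, the omission of the two-dimensional code field, and all names are assumptions made for this sketch.

```python
import struct

# Field layout (big-endian, no padding):
#   invoice type    2 bytes  (from the text)
#   invoice code    4 bytes  (from the text)
#   invoice number  4 bytes  (assumed)
#   invoicing date  4 bytes, YYYYMMDD as an integer (assumed)
#   amount          8 bytes, in cents (assumed)
CODE_FORMAT = ">2sIIIQ"
INVOICE_TYPES = {"special": b"01", "general": b"02"}


def encode_invoice(info):
    """Fill identified invoice information into the code fields.

    Fields that are missing or failed recognition are zero-filled,
    which stands in for an "empty" field in this sketch.
    """
    return struct.pack(
        CODE_FORMAT,
        INVOICE_TYPES.get(info.get("type"), b"\x00\x00"),
        int(info.get("code") or 0),
        int(info.get("number") or 0),
        int(info.get("date") or 0),
        int(info.get("amount_cents") or 0),
    )
```

Encoding the values from the invoice example, `encode_invoice({"type": "special", "code": "3200012143", "number": "0001704", "date": "20100625", "amount_cents": 4900320})`, yields a 22-byte code. Storing the invoice number as an integer drops its leading zeros, so a decoder would need to pad it back to the printed width.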
Finally, the obtained code is stored in the invoice repository.
According to another embodiment of the present invention, new invoices may be added to the invoice repository. Specifically, after the above steps 1 and 2, the following steps 3 and 4 may be further performed, thereby adding a new invoice to the invoice repository:
Step 3: identifying the invoice information of the invoice to be added, and verifying that invoice information against the invoice repository.
The invoice information of the invoice to be added can be identified through steps 101 to 104 above, which are not described again here. In particular, if the image of the invoice to be added contains a two-dimensional code, the two-dimensional code is parsed to obtain the corresponding invoice information.
The invoice information of the invoice to be added is then verified against the invoice repository according to the following steps:
Step 301: extracting the invoice code and the invoice number of the invoice to be added from its invoice information, and determining the invoice category of the invoice to be added. The invoice categories may include value-added tax general invoices, value-added tax special invoices and the like.
Step 302: retrieving all codes of the corresponding category from the invoice repository according to the invoice category of the invoice to be added. The first field of the code represents the invoice category, so all codes of the corresponding category can be retrieved from the invoice repository through the first field. For example, if the invoice to be added is a value-added tax general invoice, all codes whose first field is 02 are retrieved from the invoice repository.
Step 303: comparing the invoice code and the invoice number of the invoice to be added with each retrieved code. If the invoice code and the invoice number of the invoice to be added are respectively the same as the invoice code and the invoice number of a retrieved code, the invoice to be added is judged to be an abnormal invoice, abnormal information is returned, and verification fails; otherwise, verification passes.
Because the invoice code and the invoice number together uniquely identify an invoice, if two invoices have identical invoice codes and invoice numbers, at least one of them is a false invoice. This is why an invoice to be added whose invoice code and invoice number both match a stored code is judged abnormal and cannot pass verification.
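Steps 301 to 303 amount to a duplicate check within one invoice category. The sketch below assumes that stored codes have already been decoded into dictionaries with `type`, `code` and `number` keys; that representation and the function name are illustrative.

```python
def verify_against_repository(candidate, repository):
    """Return (passed, message) for an invoice to be added.

    `candidate` and every entry of `repository` are dicts with the keys
    "type", "code" and "number" (a hypothetical decoded form).
    """
    # Step 302: retrieve all stored entries of the same invoice category.
    same_category = [r for r in repository if r["type"] == candidate["type"]]
    # Step 303: an identical invoice code and number marks the invoice abnormal.
    for record in same_category:
        if (
            record["code"] == candidate["code"]
            and record["number"] == candidate["number"]
        ):
            return False, "abnormal invoice: code and number already stored"
    return True, "verification passed"
```

Restricting the comparison to one category first, as step 302 does, keeps the number of comparisons in step 303 proportional to the size of that category rather than of the whole repository.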
Step 4: encoding the invoice information of the invoice to be added that passes verification, and storing the code in the invoice repository. The invoice to be added is encoded as described in step 202 above.
According to another embodiment of the present invention, in order to ensure that the invoice information in the invoice repository is true and valid, the following step is performed before the invoice information is stored in the invoice repository: calling an API (application program interface) provided by the State tax administration to verify the identified invoice information and obtain the corresponding invoice status.
Specifically, the invoice status is one of the following:
(1) Normal: the invoice verification information entered by the taxpayer is consistent with the electronic information of the tax authority, and the invoice is in a normal state.
(2) Voided: the entered information is consistent with the electronic information of the tax authority, but the invoice has been voided by the issuer and can no longer be used as a financial reimbursement voucher.
(3) Inconsistent: at least one item of the entered information is inconsistent with the electronic information of the tax authority. If the entered items are confirmed to match the invoice face, the taxpayer should contact the issuer, or the tax authority in charge of the issuer, for checking.
(4) Not found: the invoice cannot be retrieved in the electronic information of the tax authority, for example because the invoice is false, because the issuer issued it offline, because synchronization of the invoice electronic data lags (usually by at least 1 day), or because the checker entered the information incorrectly.
If the invoice status is "normal", the identified invoice information is encoded and the code is stored in the invoice repository.
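The status check before storage can be sketched as a simple gate. The text does not describe the tax administration API, so the sketch does not model it: `query_status` is a hypothetical callable supplied by the caller that wraps whatever interface is actually available, and the `InvoiceStatus` names are illustrative labels for the states described above.

```python
from enum import Enum


class InvoiceStatus(Enum):
    NORMAL = "normal"
    VOIDED = "voided"
    INCONSISTENT = "inconsistent"
    NOT_FOUND = "not_found"


def store_if_normal(info, query_status, repository):
    """Store `info` only when the external check reports a normal invoice.

    `query_status` is a hypothetical callable taking the identified
    invoice information and returning an InvoiceStatus.
    """
    status = query_status(info)
    if status is InvoiceStatus.NORMAL:
        repository.append(info)
        return True
    return False
```

Passing the check in as a callable keeps the storage logic independent of the concrete verification interface and makes it easy to test with a stub.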
According to another embodiment of the present invention, in order to solve the problem of insufficient storage space, the invoice code storage may be performed based on HDFS (Hadoop Distributed File System).
Specifically, the invoice repository is a distributed file system (HDFS), a folder for storing codes is created in the HDFS, and the codes are stored in the folder.
Fig. 3 shows the basic architecture of HDFS. HDFS follows a master-slave architecture and includes one name node (namenode) and a plurality of data nodes (datanodes). The name node performs file and directory operations on the file namespace, such as open, close and rename, and also determines the mapping of blocks to data nodes. The data nodes serve read and write requests from file system clients, and create and replicate blocks on instruction from the name node.
In the embodiment of the invention, HDFS serves as the invoice repository and the invoice codes are stored in HDFS. Specifically, the client sends a file write request to the name node, requesting the creation of a folder for storing the invoice codes.
For a file write request, the name node first verifies the client's request, then adds a piece of metadata about the file to the metadata structure it maintains, and returns to the client a message indicating that the new file can be created, together with the file's metadata, which carries the path information of the data nodes on which the client is to store the invoice information. After receiving this message, the client creates the folder on the corresponding data nodes and writes the invoice codes to them.
HDFS has many advantages: it is highly fault tolerant, can be deployed on inexpensive machines, and offers low cost and high data consistency. However, HDFS is designed for write-once, read-many access and does not support modifying data in place. This ensures data consistency but creates difficulties wherever data must be modified. In the present case, the invoice information that needs modifying is mainly the invoice status; it cannot be modified directly in HDFS, and data can only be appended.
Specifically, to accommodate HDFS storage, the code is stored in the folder by means of an invoice data structure. The invoice data structure is shown in Table 2 below:
Table 2. Invoice data structure
Invoice status | Timestamp | Invoice type | Invoice code | Invoice number | Other invoice information ...
As shown in Table 2, the invoice data structure stores the following invoice information corresponding to the code: invoice status, invoice type, invoice code and invoice number. The invoice data structure also stores a timestamp corresponding to the code, and the timestamp represents the time at which the code was written into the invoice data structure. The byte length of each part of the invoice data structure is fixed (e.g., the invoice status is represented by 2 bytes, the timestamp by 4 bytes, etc.).
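A fixed-length record header of this kind can be sketched with the `struct` module. The 2-byte status and 4-byte timestamp follow the example lengths above; encoding the status as two ASCII digits, the timestamp as unsigned epoch seconds, and the function names are assumptions for illustration.

```python
import struct

# Header layout (big-endian): invoice status 2 bytes, timestamp 4 bytes.
HEADER_FORMAT = ">2sI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 6 bytes


def pack_record(status, timestamp, code):
    """Prefix an invoice code with its status and write timestamp.

    `status` is a 2-byte value such as b"00" (hypothetical: normal),
    `timestamp` is unsigned epoch seconds, `code` is the encoded invoice.
    """
    return struct.pack(HEADER_FORMAT, status, timestamp) + code


def unpack_record(record):
    """Split a record back into (status, timestamp, code)."""
    status, timestamp = struct.unpack(HEADER_FORMAT, record[:HEADER_SIZE])
    return status, timestamp, record[HEADER_SIZE:]
```

Because every part has a fixed byte length, a reader can locate the status and timestamp of any record by offset alone, without parsing the rest of the code.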
In this embodiment, once an invoice information code has been entered into the invoice repository, it cannot be modified in place. The invoice data structure nevertheless allows a modification to be expressed by appending a new record.
For example, a code corresponding to invoice information in a normal state (hereinafter referred to as the first invoice information) is entered into the invoice repository, and HDFS stores the invoice status (normal), the timestamp (entry time), the invoice type, the invoice code, the invoice number and other necessary information of the first invoice information (for a value-added tax special invoice, for example, at least the amount excluding tax and the invoicing date are also included).
Suppose the invoice is later voided for some reason, or an invoice in an abnormal state turns out to have been entered as normal by mistake. The first invoice information is then modified as follows: second invoice information is appended in HDFS, which includes the invoice status (voided or another abnormal status), the timestamp (time of appending), and the invoice code and invoice number (identical to those of the first invoice information).
When an invoice needs to be queried, the corresponding invoice information is retrieved from the HDFS storage system by invoice code and invoice number. If more than one piece of invoice information is found for a given invoice code and invoice number, the piece whose timestamp is closest to the query time (that is, the most recently written one) is taken as the valid invoice information, its status is taken as the status of the invoice, and it is returned to the user.
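The append-and-query behaviour described above can be sketched as follows. This is an illustrative in-memory stand-in rather than an HDFS client: the class and field names (`InvoiceRecord`, `AppendOnlyInvoiceStore`, `query_status`) are hypothetical, and a Python list plays the role of the folder of stored records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InvoiceRecord:
    status: str      # e.g. "normal" or "invalid"
    timestamp: int   # time at which the record was written
    code: str        # invoice code
    number: str      # invoice number


class AppendOnlyInvoiceStore:
    """Records are only ever appended, never edited, mirroring the HDFS constraint."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)

    def query_status(self, code, number):
        """Return the status of the latest record for (code, number), or None."""
        matches = [r for r in self._records if r.code == code and r.number == number]
        if not matches:
            return None
        # The record with the latest timestamp is treated as the valid one.
        return max(matches, key=lambda r: r.timestamp).status
```

Entering a "normal" record and later appending an "invalid" record with the same invoice code and invoice number makes `query_status` report "invalid", while the earlier record remains in the store untouched.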
Further, in this embodiment, a folder may be created for each client to store that client's invoice information, and a unique mapping between the unique client identifier and the client's folder is recorded in the description file. When an invoice needs to be queried, the unique client identifier is obtained first, the folder storing that client's invoice information is located from the identifier, and the client's invoices are then queried within that folder, which further improves query efficiency.
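A minimal sketch of the client-to-folder mapping follows. The root path `/invoices`, the JSON serialization of the description file, and the names `ClientFolderIndex`, `folder_for` and `to_description` are assumptions for illustration only; the text does not specify the format of the description file.

```python
import json
import posixpath


class ClientFolderIndex:
    """Keeps a one-to-one mapping from unique client identifier to folder path."""

    def __init__(self, root="/invoices"):
        self._root = root
        self._folders = {}

    def folder_for(self, client_id):
        """Return the client's folder, registering it on first use."""
        if client_id not in self._folders:
            self._folders[client_id] = posixpath.join(self._root, client_id)
        return self._folders[client_id]

    def to_description(self):
        """Serialize the mapping, as it might be written to a description file."""
        return json.dumps(self._folders, sort_keys=True)
```

Resolving the folder first narrows a query to one client's records instead of scanning every stored invoice.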
According to this embodiment, flexible combined retrieval can be implemented to suit the application scenario: the required invoice information can be obtained quickly through a specific field of the invoice information or a combination of fields, which improves the flexibility and efficiency of invoice lookup. For example, complete invoice information can be queried by the stored invoice code and invoice number; all invoices issued by a seller can be found by the stored seller information field; all invoices of a buyer can be found by the stored buyer information field; or the set of invoices within a given time period can be found by the stored invoicing time. Furthermore, a field representing the invoice status and a timestamp field representing the time at which the invoice was entered are created for each piece of invoice information, which works around the inability to modify data in the HDFS and ensures that the accurate invoice status can be obtained.
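The combined retrieval described above can be illustrated with a simple field filter. This is a sketch under stated assumptions: invoices are represented as plain dictionaries, the field names (`code`, `number`, `seller`, `buyer`, `issue_date`) are hypothetical, and dates are ISO-format strings so that string comparison matches chronological order.

```python
def search(invoices, **criteria):
    """Return the invoices whose fields equal every given criterion."""
    return [
        inv for inv in invoices
        if all(inv.get(field) == value for field, value in criteria.items())
    ]


def search_by_period(invoices, start, end):
    """Return the invoices whose invoicing date lies within [start, end]."""
    return [inv for inv in invoices if start <= inv["issue_date"] <= end]


invoices = [
    {"code": "A1", "number": "001", "seller": "S1", "buyer": "B1", "issue_date": "2018-11-05"},
    {"code": "A1", "number": "002", "seller": "S1", "buyer": "B2", "issue_date": "2018-12-20"},
    {"code": "A2", "number": "001", "seller": "S2", "buyer": "B1", "issue_date": "2018-12-28"},
]

by_seller = search(invoices, seller="S1")               # all invoices issued by S1
by_key = search(invoices, code="A1", number="002")      # one complete invoice
in_december = search_by_period(invoices, "2018-12-01", "2018-12-31")
```

Criteria can be combined freely, for example `search(invoices, seller="S1", buyer="B2")`, which corresponds to retrieval by a combination of fields.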
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.

Claims (10)

1. An invoice storage method, comprising:
step 1: acquiring an invoice image, and identifying invoice information contained in the invoice image;
step 2: encoding the identified invoice information and storing the encoding in an invoice repository;
wherein the invoice information contained in the invoice image is identified according to the following steps:
step 101: preprocessing the invoice image;
step 102: locating a seal part, a machine-printed information part and a fixed information part, respectively, in the preprocessed invoice image;
step 103: associating the fixed information part with the corresponding machine-printed information part, and extracting an image of each character of the fixed information part and of the corresponding machine-printed information part, respectively;
step 104: recognizing the character information of the seal part, the fixed information part and the associated machine-printed information part, respectively, by an optical character recognition method;
wherein, after the seal part is located, the machine-printed information part and the fixed information part are located according to the following steps:
acquiring RGB (red, green and blue) features of all parts of the invoice image other than the seal part;
locating, as the fixed information part, the part of the invoice image whose RGB features satisfy the condition that the R value is the largest and the difference between the G value and the B value is smaller than a preset first threshold;
locating, as the machine-printed information part, the part of the invoice image whose RGB features satisfy the condition that the B value is the largest and the difference between the G value and the R value is smaller than a preset second threshold;
wherein extracting an image of each character of the fixed information part and of the corresponding machine-printed information part specifically comprises:
presetting a gray threshold according to the gray distribution of the character part and the background part in the invoice image;
projecting the images of the fixed information part and the machine-printed information part, respectively, in the vertical direction by a projection-feature-based method;
determining the part of the projection whose gray value is higher than the gray threshold to be a character part, and the part of the projection whose gray value is lower than the gray threshold to be a background part;
dividing the fixed information part and the machine-printed information part, respectively, into a plurality of independent character images.
2. The invoice storage method according to claim 1, wherein the preprocessing of invoice images comprises at least one of the following steps:
performing edge detection on the invoice image to obtain a target area of the invoice image;
performing tilt correction and scaling on the invoice image;
and denoising the invoice image.
3. The invoice storage method according to claim 1, characterized in that the identified invoice information is encoded according to the following steps:
step 201: defining invoice information corresponding to each field of the code;
step 202: filling the identified invoice information into the corresponding fields of the code to obtain the corresponding code.
4. The invoice storage method according to claim 1, wherein the invoice information comprises at least one of invoice type, invoice code, invoice number, invoicing date, validation code, sales unit information, purchase unit information, goods or taxable labor information.
5. The invoice storage method according to claim 1, further comprising:
step 3: identifying invoice information of an invoice to be added, and verifying the invoice information of the invoice to be added against the invoice repository;
step 4: encoding the invoice information of the invoice to be added that has passed verification, and storing the code in the invoice repository.
6. The invoice storage method according to claim 5, wherein the verifying the invoice information of the invoice to be added through the invoice repository comprises:
step 301: extracting the invoice code and the invoice number of the invoice to be added according to the invoice information of the invoice to be added, and judging the invoice category of the invoice to be added;
step 302: retrieving all codes of the corresponding category in the invoice repository according to the invoice category of the invoice to be added;
step 303: comparing the invoice code and the invoice number of the invoice to be added with the retrieved code respectively, if the invoice code and the invoice number of the invoice to be added are the same as the invoice code and the invoice number corresponding to the retrieved code respectively, judging that the invoice to be added is an abnormal invoice, and returning abnormal information; otherwise, the verification is passed.
7. The invoice storage method according to claim 1, further comprising performing the following steps after the step 1:
calling an application program interface (API) provided by the State Taxation Administration to verify the identified invoice information and obtain the corresponding invoice status.
8. The invoice storage method according to claim 1, wherein the invoice repository is a Hadoop Distributed File System (HDFS), and the storing the code in the invoice repository comprises:
creating a folder for storing the code in the HDFS, and storing the code in the folder.
9. The invoice storage method according to claim 8, wherein the code is stored in the folder using an invoice data structure, the invoice data structure being used for storing the following invoice information corresponding to the code: invoice status, invoice type, invoice code, invoice number, and other invoice information; the invoice data structure is also used for storing a timestamp corresponding to the code, the timestamp representing the time at which the code is written into the invoice data structure.
10. The invoice storage method according to claim 9, further comprising:
performing an invoice query in the HDFS by:
retrieving in the HDFS according to the invoice code and the invoice number of the invoice to be inquired to obtain at least one piece of corresponding invoice information;
selecting invoice information with the timestamp closest to the current time from the at least one piece of retrieved invoice information as valid invoice information; and
returning the invoice status of the valid invoice information.
CN201811624838.2A 2018-12-28 2018-12-28 Invoice storage method Active CN109886076B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811624838.2A CN109886076B (en) 2018-12-28 2018-12-28 Invoice storage method

Publications (2)

Publication Number Publication Date
CN109886076A CN109886076A (en) 2019-06-14
CN109886076B true CN109886076B (en) 2021-07-30

Family

ID=66925315

Country Status (1)

Country Link
CN (1) CN109886076B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant