CN111222412B - Method and device for generating value-added tax ordinary invoice reimbursement information based on image recognition - Google Patents
- Publication number
- CN111222412B (application CN201911210795.8A)
- Authority
- CN
- China
- Prior art keywords
- invoice
- image
- tax
- denoising
- reimbursement
- Legal status (assumed; not a legal conclusion)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Image Analysis (AREA)
- Character Input (AREA)
Abstract
Aiming at the problems of manual processing and low working efficiency of the invoice receipts from which value-added tax reimbursement information is generated, a method and a device for generating value-added tax ordinary invoice reimbursement information based on image recognition are provided, so as to improve the accuracy of the automatic invoice processing procedure. Specifically, the method and the device realize automatic generation of value-added tax reimbursement information by establishing a correspondence between electronic invoices and financial reimbursement subjects, acquiring an electronic invoice image, performing preprocessing, denoising, region positioning and template matching operations on it, and comparing the resulting information.
Description
Technical Field
The invention relates to the technical field of electronic processing of financial information, and in particular to a method and a device for generating value-added tax ordinary invoice reimbursement information based on image recognition.
Background
In recent years, with the rapid development of China's economy, the types and quantity of bills have risen year by year, and the value-added tax ordinary invoice is one of them. The use of large numbers of value-added tax invoices poses serious challenges to the corresponding invoice recognition and automatic invoice generation technologies.

In the automatic recognition of invoice images, one approach sets a recognition area by customizing a form template, sets recognition attributes, invokes special characters, performs option-area recognition, carries out post-processing according to the recognition attributes, and finally outputs a structured recognition result. Another approach, based on the TH-OCR technology, applies several preprocessing operations to the invoice, in particular deskewing, color-cast correction, color filtering, noise reduction, binarization and contrast enhancement of the recognition unit; these functions can be flexibly configured and freely combined to output the best image quality for later recognition.

At present, however, a large amount of value-added tax ordinary invoice reimbursement information must be generated: many enterprises and public institutions need reimbursement after routine purchasing, and their financial systems have to process large numbers of invoice receipts manually, which consumes considerable manpower and material resources and is inefficient. Automatic recognition and processing of the receipts can therefore effectively improve the working efficiency of financial departments. However, if the effective recognition rate of the automatic bill processing procedure is low, it not only brings business risk but also increases the workload of subsequent manual processing, so it is very necessary to improve the accuracy of automatic bill processing.
Disclosure of Invention
In view of the above technical problems, the invention provides a method and a device for generating value-added tax ordinary invoice reimbursement information based on image recognition, so as to improve the accuracy of the automatic invoice processing procedure. The technical scheme is as follows. According to a first aspect of the embodiments of the present disclosure, a method for generating value-added tax ordinary invoice reimbursement information by image recognition is provided, comprising: step 1, establishing a correspondence between electronic invoices and financial reimbursement subjects and generating an electronic invoice-reimbursement correspondence table, whose fields comprise: buyer taxpayer identification number, invoice code, invoice number, name of goods or taxable services, and invoice amount.

Step 2, acquiring an electronic invoice image and performing preprocessing, denoising, region positioning and template matching on it to obtain the seller name, seller taxpayer identification number, buyer name, buyer taxpayer identification number, invoice code, invoice number, name of goods or taxable services, invoice amount, reviewer information and drawer information in the electronic invoice image.

Step 3, comparing the recognized buyer taxpayer identification number with the buyer taxpayer identification number in the electronic invoice-reimbursement correspondence table; if they are consistent, proceeding to step 4, otherwise ending the generation of the invoice reimbursement information.

Step 4, comparing the recognized reviewer information with the drawer information; if they are consistent, ending the generation of the invoice reimbursement information, and if they are inconsistent, automatically filling the invoice code, invoice number, name of goods or taxable services and invoice amount from the recognition result of the electronic invoice image into the corresponding items of the electronic invoice-reimbursement correspondence table.
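The comparison logic of steps 3 and 4 can be sketched as follows. This is only an illustrative sketch: the field names, the recognized record and the correspondence-table row are assumptions, not the patented data layout.

```python
# Hypothetical sketch of steps 3 and 4 of the method. All field names are
# illustrative assumptions; the patent does not specify a data layout.

def fill_reimbursement(recognized: dict, table_row: dict) -> bool:
    """Return True when the table row was auto-filled, False when
    reimbursement-information generation ends early."""
    # Step 3: the recognized buyer taxpayer ID must match the table entry.
    if recognized["buyer_taxpayer_id"] != table_row["buyer_taxpayer_id"]:
        return False
    # Step 4: if reviewer and drawer information are consistent, generation
    # ends; only when they differ is the row filled in automatically.
    if recognized["reviewer"] == recognized["drawer"]:
        return False
    for field in ("invoice_code", "invoice_number",
                  "goods_or_service_name", "invoice_amount"):
        table_row[field] = recognized[field]
    return True
```

Note that, per step 4 as stated, consistent reviewer and drawer information ends generation, and only inconsistent information triggers the automatic fill.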
Step 2 specifically comprises the following steps:

S1, acquiring an image of the value-added tax ordinary invoice to obtain a 24-bit original color image, and extracting the R component of the original color image as the grayscale image to be recognized, where the gray value of each pixel of the grayscale image to be recognized lies in the range 0 to 255;

S2, performing regularized denoising on the grayscale image to reduce noise points and obtain a denoised grayscale image, and then performing binarization with adaptive threshold segmentation on the denoised grayscale image to obtain an adaptive-threshold binarized image of the value-added tax ordinary invoice;

S3, coarsely locating the regions of the buyer taxpayer identification number, invoice code, invoice number, invoicing date and amount according to prior information on the positions of these items, finely locating the regions by the horizontal projection and vertical crossing-count character-pitch method, and performing character segmentation and normalization on the finely located regions to obtain the buyer taxpayer identification number, invoice code, invoice number, invoicing date and amount to be recognized;

S4, recognizing the buyer taxpayer identification number, invoice code, invoice number, invoicing date and amount to be recognized by a template feature matching algorithm, and obtaining the recognition result.
Further, the method also comprises the following steps: in step S2, in order to denoise more effectively, the invention selects a non-local mean kernel, so that the similarity between pixels can be quantified according to an edge metric derived from blurred edge information. Specifically, the regularized denoising process comprises the following steps: for the pixel $\mu_{xy}$ at coordinates $(x, y)$, an edge detector $ED(\mu_{xy})$ is established as formula (1):

$$ED(\mu_{xy}) = \left| e_{\perp}(x, y) - e(x, y) \right|, \qquad (x, y) \in \Omega \tag{1}$$

$ED$ is close to 0 in smooth regions, becomes large near edges, and remains close to 0 in noisy regions. Based on the total-variation denoising model and the edge detector $ED(\mu_{xy})$, the following denoising model is proposed:

$$\min_{\mu} \int_{\Omega} \lvert \nabla\mu \rvert^{\psi(ED(\mu))}\, dx\, dy + \frac{\lambda}{2} \int_{\Omega} (f - \mu)^2\, dx\, dy \tag{2}$$

where $\lambda$ is the regularization parameter, $f = \mu^{*} + \omega$ ($\mu^{*}$ is the original unknown image, $\omega$ is Gaussian noise), and

$$\psi\big(ED(\mu)\big) = 1 + \frac{1}{1 + \delta\, ED(\mu)^{2}}$$

with $\delta$ a positive parameter controlling the gradual decay of $\psi(ED(\mu))$ from 2 to 1. The regularization parameter $\lambda$ mediates the approximation term: when $\lambda$ is sufficiently large, the second term of the model dominates, and when $\lambda \to 0$ the first term controls the whole objective function, so the choice of $\lambda$ is important when solving. The regularization parameter is chosen according to the variance $\sigma^{2}$ of the initially added noise, with the corresponding $\lambda$ inversely proportional to $\sigma^{2}$.

Further, in step S2, the regularized denoising process comprises obtaining the Lagrange equation of denoising model (2) by the gradient descent method:

$$\frac{\partial \mu}{\partial t} = \operatorname{div}\!\left( \frac{\Phi'(\lvert \nabla\mu \rvert)}{\lvert \nabla\mu \rvert}\, \nabla\mu \right) + \lambda\, (f - \mu) \tag{3}$$

where the diffusion function is $\Phi$, taken as $\Phi(s) = s^{\psi(ED(\mu))}$.

Further, in step S2, the Lagrange equation is solved by a method based on partial differential equations:

$$\frac{\partial \mu}{\partial t} = \Phi''(\lvert \nabla\mu \rvert)\, \mu_{NN} + \frac{\Phi'(\lvert \nabla\mu \rvert)}{\lvert \nabla\mu \rvert}\, \mu_{TT} + \lambda\, (f - \mu) \tag{4}$$

where $\mu_{NN}$ is the second derivative in the gradient direction $N$ and $\mu_{TT}$ is the second derivative in the direction $T$ perpendicular to $N$.

Further, in step S2, with $N = \nabla\mu / \lvert \nabla\mu \rvert = (\mu_x, \mu_y)^{t} / \lvert \nabla\mu \rvert$ ($t$ denoting the transpose operator), said $\mu_{NN}$ and $\mu_{TT}$ are given by:

$$\mu_{NN} = \frac{\mu_x^{2}\, \mu_{xx} + 2 \mu_x \mu_y\, \mu_{xy} + \mu_y^{2}\, \mu_{yy}}{\mu_x^{2} + \mu_y^{2}}, \qquad \mu_{TT} = \frac{\mu_y^{2}\, \mu_{xx} - 2 \mu_x \mu_y\, \mu_{xy} + \mu_x^{2}\, \mu_{yy}}{\mu_x^{2} + \mu_y^{2}} \tag{5}$$

where $\mu_{xx}$, $\mu_{yy}$ and $\mu_{xy}$ denote the second-order partial derivatives. The discrete model of equation (4) replaces these derivatives with finite differences and updates the image by the explicit iteration

$$\mu^{n+1} = \mu^{n} + \Delta t \left[ \Phi''(\lvert \nabla\mu^{n} \rvert)\, \mu^{n}_{NN} + \frac{\Phi'(\lvert \nabla\mu^{n} \rvert)}{\lvert \nabla\mu^{n} \rvert}\, \mu^{n}_{TT} + \lambda\, (f - \mu^{n}) \right]$$

and the iteration stop time is determined according to an energy check between the images before and after denoising.
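A minimal numerical sketch of such an explicit gradient-flow iteration is given below. It simplifies the edge-adaptive weight to plain total variation (that is, the exponent ψ is fixed at 1), and the step size, fidelity weight and smoothing parameter are illustrative assumptions, not values from the patent.

```python
import numpy as np

def tv_denoise(f: np.ndarray, lam: float = 0.1, dt: float = 0.05,
               iters: int = 30, eps: float = 0.1) -> np.ndarray:
    """Explicit finite-difference sketch of the total-variation flow
    mu_t = div(grad(mu)/|grad(mu)|) + lam*(f - mu). The edge-adaptive
    exponent psi(ED(mu)) is simplified to 1; eps regularizes |grad(mu)|
    to keep the explicit scheme stable. All parameters are illustrative."""
    mu = f.astype(np.float64).copy()
    for _ in range(iters):
        mx = np.gradient(mu, axis=1)                # d/dx (central differences)
        my = np.gradient(mu, axis=0)                # d/dy
        norm = np.sqrt(mx**2 + my**2 + eps)         # regularized gradient norm
        div = (np.gradient(mx / norm, axis=1)       # div(grad(mu)/|grad(mu)|)
               + np.gradient(my / norm, axis=0))
        mu += dt * (div + lam * (f - mu))           # one explicit time step
    return mu
```

In practice one would stop the loop with the energy check mentioned above rather than a fixed iteration count.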
The technical scheme provided by the embodiments of the present disclosure can have the following beneficial effects: by establishing the edge detector ED(μ_xy) for the pixel μ_xy at coordinates (x, y) and proposing a denoising model based on the total-variation denoising model and the edge detector ED(μ_xy), a better denoising effect is obtained in the recognition of value-added tax ordinary invoices; and by the horizontal projection and vertical crossing-count character-pitch method, the different regions in the invoice recognition process can be located more accurately, so that value-added tax ordinary invoices affected by noise such as seals can be processed more accurately.
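The horizontal-projection part of the region localization just mentioned can be sketched as follows. This is a simplified stand-in: the crossing-count and character-pitch refinement is not reproduced, and the ink threshold is an assumption.

```python
import numpy as np

def row_bands(binary: np.ndarray, min_ink: int = 1):
    """Locate horizontal text bands in a binarized image (0 = ink,
    255 = background) by horizontal projection: count dark pixels per row,
    then split at empty rows. A sketch of the coarse localization step;
    the vertical refinement works analogously per column."""
    ink = (binary == 0).sum(axis=1)     # dark-pixel count for each row
    rows = ink >= min_ink
    bands, start = [], None
    for i, on in enumerate(rows):
        if on and start is None:
            start = i                    # band begins
        elif not on and start is not None:
            bands.append((start, i))     # band ends before row i
            start = None
    if start is not None:
        bands.append((start, len(rows)))
    return bands
```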
According to a second aspect of the embodiments of the present disclosure, there is provided a device for generating value-added tax ordinary invoice reimbursement information by image recognition, the device comprising:

the data construction module, used for establishing a correspondence between electronic invoices and financial reimbursement subjects and generating an electronic invoice-reimbursement correspondence table, whose fields comprise: buyer taxpayer identification number, invoice code, invoice number, name of goods or taxable services, and invoice amount;

the data processing module, used for acquiring an electronic invoice image and performing preprocessing, denoising, region positioning and template matching on it to obtain the seller name, seller taxpayer identification number, buyer name, buyer taxpayer identification number, invoice code, invoice number, name of goods or taxable services, invoice amount, reviewer information and drawer information in the electronic invoice image;

the data matching module, used for comparing the recognized buyer taxpayer identification number with the buyer taxpayer identification number in the electronic invoice-reimbursement correspondence table, invoking the data generation module if they are consistent and otherwise ending the generation of the invoice reimbursement information;

the data generation module, used for comparing the recognized reviewer information with the drawer information, ending the generation of the invoice reimbursement information if they are consistent, and automatically filling the invoice code, invoice number, name of goods or taxable services and invoice amount from the recognition result of the electronic invoice image into the corresponding items of the electronic invoice-reimbursement correspondence table if they are inconsistent.

The data processing module comprises: the image acquisition module, configured to acquire a 24-bit original color image of the value-added tax ordinary invoice and extract its R component as the grayscale image to be recognized, where the gray value of each pixel of the grayscale image to be recognized lies in the range 0 to 255;

the image denoising module, configured to perform regularized denoising on the grayscale image to reduce noise points and obtain a denoised grayscale image, and then to perform binarization with adaptive threshold segmentation on the denoised grayscale image to obtain an adaptive-threshold binarized image of the value-added tax ordinary invoice;

the image positioning module, configured to coarsely locate the regions of the buyer taxpayer identification number, invoice code, invoice number, invoicing date and amount according to prior information on the positions of these items, to finely locate the regions by the horizontal projection and vertical crossing-count character-pitch method, and to obtain the buyer taxpayer identification number, invoice code, invoice number, invoicing date and amount to be recognized after character segmentation and normalization of the finely located regions;

the image recognition module, configured to recognize the buyer taxpayer identification number, invoice code, invoice number, invoicing date and amount to be recognized by a template feature matching algorithm and to obtain the recognition result.
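The adaptive threshold segmentation performed by the image denoising module above can be sketched as a local-mean threshold. This is one common choice; the patent does not fix the adaptive method, and the block size and offset are assumptions.

```python
import numpy as np

def adaptive_binarize(gray: np.ndarray, block: int = 15, c: int = 10) -> np.ndarray:
    """Local-mean adaptive threshold: a pixel becomes 255 (background) when
    it is brighter than its block x block neighborhood mean minus c,
    otherwise 0 (ink). One possible reading of "adaptive threshold
    segmentation"; block and c are illustrative."""
    g = gray.astype(np.float64)
    h, w = g.shape
    pad = block // 2
    p = np.pad(g, pad, mode="edge")
    # Integral image: window sums in O(1) per pixel.
    ii = np.zeros((h + 2 * pad + 1, w + 2 * pad + 1))
    ii[1:, 1:] = p.cumsum(0).cumsum(1)
    s = (ii[block:block + h, block:block + w] - ii[:h, block:block + w]
         - ii[block:block + h, :w] + ii[:h, :w])
    mean = s / (block * block)
    return np.where(g > mean - c, 255, 0).astype(np.uint8)
```

OpenCV's `adaptiveThreshold` with `ADAPTIVE_THRESH_MEAN_C` computes essentially the same thing.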
Further, the image denoising module is further configured to select a non-local mean kernel and to determine the denoising model proposed in this application from the total-variation denoising model and the edge detector, through the following steps: for the pixel $\mu_{xy}$ at coordinates $(x, y)$, an edge detector $ED(\mu_{xy})$ is established as formula (1):

$$ED(\mu_{xy}) = \left| e_{\perp}(x, y) - e(x, y) \right|, \qquad (x, y) \in \Omega \tag{1}$$

$ED$ is close to 0 in smooth regions, becomes large near edges, and remains close to 0 in noisy regions. Based on the total-variation denoising model and the edge detector $ED(\mu_{xy})$, the following denoising model is proposed:

$$\min_{\mu} \int_{\Omega} \lvert \nabla\mu \rvert^{\psi(ED(\mu))}\, dx\, dy + \frac{\lambda}{2} \int_{\Omega} (f - \mu)^2\, dx\, dy \tag{2}$$

where $\lambda$ is the regularization parameter, $f = \mu^{*} + \omega$ ($\mu^{*}$ is the original unknown image, $\omega$ is Gaussian noise), and

$$\psi\big(ED(\mu)\big) = 1 + \frac{1}{1 + \delta\, ED(\mu)^{2}}$$

with $\delta$ a positive parameter controlling the gradual decay of $\psi(ED(\mu))$ from 2 to 1. The regularization parameter $\lambda$ mediates the approximation term: when $\lambda$ is sufficiently large, the second term of the model dominates, and when $\lambda \to 0$ the first term controls the whole objective function, so the choice of $\lambda$ is important when solving. The regularization parameter is chosen according to the variance $\sigma^{2}$ of the initially added noise, with the corresponding $\lambda$ inversely proportional to $\sigma^{2}$.

Further, the image denoising module is further configured to obtain the Lagrange equation of denoising model (2) by the gradient descent method:

$$\frac{\partial \mu}{\partial t} = \operatorname{div}\!\left( \frac{\Phi'(\lvert \nabla\mu \rvert)}{\lvert \nabla\mu \rvert}\, \nabla\mu \right) + \lambda\, (f - \mu) \tag{3}$$

where the diffusion function is $\Phi$, taken as $\Phi(s) = s^{\psi(ED(\mu))}$.

Further, the image denoising module is further configured to solve the Lagrange equation by a method based on partial differential equations:

$$\frac{\partial \mu}{\partial t} = \Phi''(\lvert \nabla\mu \rvert)\, \mu_{NN} + \frac{\Phi'(\lvert \nabla\mu \rvert)}{\lvert \nabla\mu \rvert}\, \mu_{TT} + \lambda\, (f - \mu) \tag{4}$$

where $\mu_{NN}$ is the second derivative in the gradient direction $N$ and $\mu_{TT}$ is the second derivative in the direction $T$ perpendicular to $N$.

Further, the image denoising module is further configured to compute said $\mu_{NN}$ and $\mu_{TT}$, with $N = \nabla\mu / \lvert \nabla\mu \rvert = (\mu_x, \mu_y)^{t} / \lvert \nabla\mu \rvert$ ($t$ denoting the transpose operator), as:

$$\mu_{NN} = \frac{\mu_x^{2}\, \mu_{xx} + 2 \mu_x \mu_y\, \mu_{xy} + \mu_y^{2}\, \mu_{yy}}{\mu_x^{2} + \mu_y^{2}}, \qquad \mu_{TT} = \frac{\mu_y^{2}\, \mu_{xx} - 2 \mu_x \mu_y\, \mu_{xy} + \mu_x^{2}\, \mu_{yy}}{\mu_x^{2} + \mu_y^{2}} \tag{5}$$

where $\mu_{xx}$, $\mu_{yy}$ and $\mu_{xy}$ denote the second-order partial derivatives. The discrete model of equation (4) replaces these derivatives with finite differences and updates the image by the explicit iteration

$$\mu^{n+1} = \mu^{n} + \Delta t \left[ \Phi''(\lvert \nabla\mu^{n} \rvert)\, \mu^{n}_{NN} + \frac{\Phi'(\lvert \nabla\mu^{n} \rvert)}{\lvert \nabla\mu^{n} \rvert}\, \mu^{n}_{TT} + \lambda\, (f - \mu^{n}) \right]$$

and the iteration stop time is determined according to an energy check between the images before and after denoising.
The technical scheme provided by the embodiments of the present disclosure can have the following beneficial effects: by establishing the edge detector ED(μ_xy) for the pixel μ_xy at coordinates (x, y) and proposing a denoising model based on the total-variation denoising model and the edge detector ED(μ_xy), a better denoising effect is obtained in the recognition of value-added tax ordinary invoices; and the horizontal projection and vertical crossing-count character-pitch method locates the different regions in the invoice recognition process more accurately, so that value-added tax ordinary invoices affected by noise such as seals can be processed more accurately.
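The template feature matching used by the image recognition module can be illustrated as follows. The actual feature set is not specified in the text, so plain normalized pixel correlation stands in for it; template shapes and labels are assumptions.

```python
import numpy as np

def best_template(glyph: np.ndarray, templates: dict) -> str:
    """Toy template feature matching: score each stored template by
    normalized correlation with the segmented, size-normalized glyph and
    return the label of the best match. Pixel correlation is a stand-in
    for whatever features the real system uses."""
    g = glyph.astype(np.float64).ravel()
    g = (g - g.mean()) / (g.std() + 1e-9)       # zero-mean, unit-variance
    best, best_score = None, -np.inf
    for label, t in templates.items():
        v = t.astype(np.float64).ravel()
        v = (v - v.mean()) / (v.std() + 1e-9)
        score = float(np.dot(g, v)) / g.size     # correlation in [-1, 1]
        if score > best_score:
            best, best_score = label, score
    return best
```

The glyph and all templates must be normalized to the same size before matching, which is exactly what the character segmentation and normalization step provides.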
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flowchart illustrating a method for generating image-identified value-added tax general invoice reimbursement information, according to an exemplary embodiment.
Fig. 2 is a block diagram illustrating an image-identified value-added tax general invoice reimbursement information generation apparatus according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the appended claims.

When the present method and device recognize a value-added tax ordinary invoice, by establishing the edge detector ED(μ_xy) for the pixel μ_xy at coordinates (x, y) and proposing a denoising model based on the total-variation denoising model and the edge detector ED(μ_xy), a better denoising effect is obtained in the recognition of value-added tax ordinary invoices; and by the horizontal projection and vertical crossing-count character-pitch method, the different regions in the invoice recognition process can be located more accurately, so that value-added tax ordinary invoices affected by noise such as seals can be processed more accurately. The method and device for generating value-added tax ordinary invoice reimbursement information based on image recognition are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart illustrating a method for generating value-added tax ordinary invoice reimbursement information by image recognition according to an exemplary embodiment; the method may comprise the following steps. Step 1, establishing a correspondence between electronic invoices and financial reimbursement subjects and generating an electronic invoice-reimbursement correspondence table, whose fields comprise: buyer taxpayer identification number, invoice code, invoice number, name of goods or taxable services, and invoice amount.

Step 2, acquiring an electronic invoice image and performing preprocessing, denoising, region positioning and template matching on it to obtain the seller name, seller taxpayer identification number, buyer name, buyer taxpayer identification number, invoice code, invoice number, name of goods or taxable services, invoice amount, reviewer information and drawer information in the electronic invoice image.

Step 3, comparing the recognized buyer taxpayer identification number with the buyer taxpayer identification number in the electronic invoice-reimbursement correspondence table; if they are consistent, proceeding to step 4, otherwise ending the generation of the invoice reimbursement information. Step 4, comparing the recognized reviewer information with the drawer information; if they are consistent, ending the generation of the invoice reimbursement information, and if they are inconsistent, automatically filling the invoice code, invoice number, name of goods or taxable services and invoice amount from the recognition result of the electronic invoice image into the corresponding items of the electronic invoice-reimbursement correspondence table.
Step 2 specifically comprises the following steps:

S1, acquiring an image of the value-added tax ordinary invoice, namely acquiring a 24-bit original color image and extracting its R component as the grayscale image to be recognized, where the gray value of each pixel of the grayscale image to be recognized lies in the range 0 to 255; the image can be acquired by means such as camera shooting;

S2, performing regularized denoising on the grayscale image to reduce noise points and obtain a denoised grayscale image, and then performing binarization with adaptive threshold segmentation on the denoised grayscale image to obtain an adaptive-threshold binarized image of the value-added tax ordinary invoice;

S3, coarsely locating the regions of the buyer taxpayer identification number, invoice code, invoice number, invoicing date and amount according to prior information on the positions of these items, finely locating the regions by the horizontal projection and vertical crossing-count character-pitch method, and performing character segmentation and normalization on the finely located regions to obtain the buyer taxpayer identification number, invoice code, invoice number, invoicing date and amount to be recognized; performing region positioning on the invoice in this way makes it possible to better obtain the information that needs to be recognized;

S4, recognizing the buyer taxpayer identification number, invoice code, invoice number, invoicing date and amount to be recognized by a template feature matching algorithm, and obtaining the recognition result.

Further, the method also comprises the following steps: in step S2, in order to denoise more effectively, the invention selects a non-local mean kernel, so that the similarity between pixels can be quantified according to an edge metric derived from blurred edge information. Specifically, the regularized denoising process comprises the following steps: for the pixel $\mu_{xy}$ at coordinates $(x, y)$, an edge detector $ED(\mu_{xy})$ is established as formula (1):

$$ED(\mu_{xy}) = \left| e_{\perp}(x, y) - e(x, y) \right|, \qquad (x, y) \in \Omega \tag{1}$$

$ED$ is close to 0 in smooth regions, becomes large near edges, and remains close to 0 in noisy regions. Based on the total-variation denoising model and the edge detector $ED(\mu_{xy})$, the following denoising model is proposed:

$$\min_{\mu} \int_{\Omega} \lvert \nabla\mu \rvert^{\psi(ED(\mu))}\, dx\, dy + \frac{\lambda}{2} \int_{\Omega} (f - \mu)^2\, dx\, dy \tag{2}$$

where $\lambda$ is the regularization parameter, $f = \mu^{*} + \omega$ ($\mu^{*}$ is the original unknown image, $\omega$ is Gaussian noise), and

$$\psi\big(ED(\mu)\big) = 1 + \frac{1}{1 + \delta\, ED(\mu)^{2}}$$

with $\delta$ a positive parameter controlling the gradual decay of $\psi(ED(\mu))$ from 2 to 1. The regularization parameter $\lambda$ mediates the approximation term: when $\lambda$ is sufficiently large, the second term of the model dominates, and when $\lambda \to 0$ the first term controls the whole objective function, so the choice of $\lambda$ is important when solving. The regularization parameter is chosen according to the variance $\sigma^{2}$ of the initially added noise, with the corresponding $\lambda$ inversely proportional to $\sigma^{2}$.

Further, in step S2, the regularized denoising process comprises obtaining the Lagrange equation of denoising model (2) by the gradient descent method:

$$\frac{\partial \mu}{\partial t} = \operatorname{div}\!\left( \frac{\Phi'(\lvert \nabla\mu \rvert)}{\lvert \nabla\mu \rvert}\, \nabla\mu \right) + \lambda\, (f - \mu) \tag{3}$$

where the diffusion function is $\Phi$, taken as $\Phi(s) = s^{\psi(ED(\mu))}$.

Further, in step S2, the Lagrange equation is solved by a method based on partial differential equations:

$$\frac{\partial \mu}{\partial t} = \Phi''(\lvert \nabla\mu \rvert)\, \mu_{NN} + \frac{\Phi'(\lvert \nabla\mu \rvert)}{\lvert \nabla\mu \rvert}\, \mu_{TT} + \lambda\, (f - \mu) \tag{4}$$

where $\mu_{NN}$ is the second derivative in the gradient direction $N$ and $\mu_{TT}$ is the second derivative in the direction $T$ perpendicular to $N$.

Further, in step S2, with $N = \nabla\mu / \lvert \nabla\mu \rvert = (\mu_x, \mu_y)^{t} / \lvert \nabla\mu \rvert$ ($t$ denoting the transpose operator), said $\mu_{NN}$ and $\mu_{TT}$ are given by:

$$\mu_{NN} = \frac{\mu_x^{2}\, \mu_{xx} + 2 \mu_x \mu_y\, \mu_{xy} + \mu_y^{2}\, \mu_{yy}}{\mu_x^{2} + \mu_y^{2}}, \qquad \mu_{TT} = \frac{\mu_y^{2}\, \mu_{xx} - 2 \mu_x \mu_y\, \mu_{xy} + \mu_x^{2}\, \mu_{yy}}{\mu_x^{2} + \mu_y^{2}} \tag{5}$$

where $\mu_{xx}$, $\mu_{yy}$ and $\mu_{xy}$ denote the second-order partial derivatives. The discrete model of equation (4) replaces these derivatives with finite differences and updates the image by the explicit iteration

$$\mu^{n+1} = \mu^{n} + \Delta t \left[ \Phi''(\lvert \nabla\mu^{n} \rvert)\, \mu^{n}_{NN} + \frac{\Phi'(\lvert \nabla\mu^{n} \rvert)}{\lvert \nabla\mu^{n} \rvert}\, \mu^{n}_{TT} + \lambda\, (f - \mu^{n}) \right]$$

and the iteration stop time is determined according to an energy check between the images before and after denoising.
The following are device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the method of the present disclosure.
Fig. 2 is a diagram illustrating an image-identified value-added tax general invoice reimbursement information generation apparatus according to an exemplary embodiment. The generating means may be implemented as part or all of the terminal device by software, hardware or a combination of both. Referring to fig. 2, the apparatus includes: the device comprises a data construction module, a data processing module, a data matching module and a data generation module.
The data construction module is used for establishing a corresponding relation between the electronic invoice and the financial reimbursement subjects, generating an electronic invoice-reimbursement corresponding table, and the table fields comprise: buyer tax payer identification number, invoice code, invoice number, goods or tax service/service name, invoice amount.
The data processing module is used for acquiring an electronic invoice image, preprocessing the electronic invoice image, denoising, region positioning and template matching to obtain seller name, seller tax payer identification number, buyer name, buyer tax payer identification number, invoice code, invoice number, goods or tax payment/service name, invoice amount, rechecking information and drawer information in the electronic invoice image.
And the data matching module is used for comparing the identified tax payer identification number of the buyer with the tax payer identification number of the buyer in the electronic invoice-reimbursement corresponding table, if the comparison result is consistent, the data generating module is used, and if not, the generation of the invoice reimbursement information is finished.
And the data generation module is used for comparing the rechecking information obtained by identification with the drawer information, ending the generation of the invoice reimbursement information if the comparison result is consistent, and automatically filling the invoice code data, the invoice number data, the goods or tax service name data and the invoice amount data of the electronic invoice image identification result into the corresponding items in the electronic invoice-reimbursement corresponding table if the comparison result is inconsistent.
Wherein the data processing module comprises: the image acquisition module is configured to acquire an original value-added tax common invoice color image with 24 bits, extract an R component of the original value-added tax common invoice color image as a gray image, and the gray value of a pixel point on the gray image to be identified is 0 or 255;
the image denoising module is configured to perform regularization denoising treatment on the gray level image to reduce noise points, obtain a denoised gray level image, and then perform binarization treatment of adaptive threshold segmentation on the denoised gray level image to obtain a value-added tax common invoice adaptive threshold binarization image;
the image area positioning module is configured to roughly position areas of the tax payer identification number, the invoice code, the invoice number, the invoicing date and the amount according to the position priori information of the tax payer identification number, the invoice code, the invoice number, the invoicing date and the amount, accurately position the areas by adopting a horizontal projection and a method for vertically crossing the distance of a number body, and acquire the tax payer identification number, the invoice code, the invoice number, the invoicing date and the amount to be identified after carrying out character segmentation normalization on the accurately positioned areas;
the image recognition module is configured to recognize the to-be-recognized buyer tax payer identification number, invoice code, invoice number, invoicing date and amount by using a template feature matching algorithm; and obtaining a recognition result.
Further, the image denoising module is further configured to select a non-local mean kernel, determine a denoising model proposed in the application according to the total variation denoising model and the edge detector, and comprises the following steps: establishing a pixel mu in coordinates (x, y) xy Edge detector ED of (a) (μxy) The following formula (1): ED (u) xy )=|e ⊥ (x, y) -e (x, y) | (x, y) ∈Ω (1), ED is close to 0 in the smoothed region, ED gets larger near the edge, and ED is close to 0 in the noisy region; denoising model based on total variation and edge detector ED (μxy) The denoising model is presented as follows:where λ is the regularization parameter, f=μ * +ω(μ * Original unknown image, ω is gaussian noise), +.>Delta is a positive parameter for controlling the gradual decay of ψ (ED (μ)) from 2 to 1, the regularization parameter λ has the effect of mediating the approximation term, when λ is sufficiently large, the second term in the model is known to determine its effect, and when λ ->When 0, the first term controls the whole objective function, so that the selection of lambda is very important in solving and the selection of regularization parametersTaking the relation to the noise variance of the initial addition, the corresponding lambda expression is:
Further, the image denoising module is further configured to comprise the following step: obtaining the Lagrange equation of the denoising model (2) by a gradient descent method:

$\frac{\partial\mu}{\partial t} = \operatorname{div}\!\left(\Phi'(|\nabla\mu|)\,\frac{\nabla\mu}{|\nabla\mu|}\right) + \lambda(f-\mu) \qquad (3)$

wherein the diffusion function is $\Phi(s) = s^{\psi(ED(\mu))}$.
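A single gradient-descent update of this kind of edge-adaptive total variation model might look like the sketch below (a simplified illustration: the edge detector $ED$ is approximated here by the local gradient magnitude, boundaries are handled periodically, and all parameter values are assumptions):

```python
import numpy as np

def tv_denoise_step(u, f, lam=0.1, delta=1.0, dt=0.1, eps=1e-8):
    """One explicit gradient-descent step for an edge-adaptive total
    variation model, u_t = div(|grad u|^(p-2) grad u) + lam*(f - u),
    where the exponent p = psi(ED) decays from 2 (flat areas) toward 1
    (edges).  ED is approximated by the gradient magnitude; np.roll
    gives periodic boundary handling to keep the sketch short."""
    ux = (np.roll(u, -1, axis=1) - np.roll(u, 1, axis=1)) / 2.0
    uy = (np.roll(u, -1, axis=0) - np.roll(u, 1, axis=0)) / 2.0
    grad = np.sqrt(ux ** 2 + uy ** 2 + eps)
    p = 1.0 + 1.0 / (1.0 + delta * grad ** 2)   # psi(ED): 2 -> 1
    vx = grad ** (p - 2.0) * ux                  # flux components
    vy = grad ** (p - 2.0) * uy
    div = ((np.roll(vx, -1, axis=1) - np.roll(vx, 1, axis=1)) / 2.0
           + (np.roll(vy, -1, axis=0) - np.roll(vy, 1, axis=0)) / 2.0)
    return u + dt * (div + lam * (f - u))
```

In practice the step is iterated until an energy check on successive iterates indicates convergence, matching the stopping rule the document describes.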
Further, in step S2, $\mu_{NN}$ and $\mu_{TT}$ are given by $\mu_{NN} = N^{t}(\nabla^{2}\mu)N$ and $\mu_{TT} = T^{t}(\nabla^{2}\mu)T$, wherein $t$ is the transpose operator, $N$ is the gradient (normal) direction and $T$ the tangent direction, which expand to:

$\mu_{NN} = \frac{\mu_{x}^{2}\mu_{xx} + 2\mu_{x}\mu_{y}\mu_{xy} + \mu_{y}^{2}\mu_{yy}}{\mu_{x}^{2}+\mu_{y}^{2}},\qquad \mu_{TT} = \frac{\mu_{y}^{2}\mu_{xx} - 2\mu_{x}\mu_{y}\mu_{xy} + \mu_{x}^{2}\mu_{yy}}{\mu_{x}^{2}+\mu_{y}^{2}} \qquad (4)$

wherein $\mu_{xx}$, $\mu_{yy}$ and $\mu_{xy}$ denote the second partial derivatives; the discrete model of equation (4) is obtained by finite-difference discretization, and the iteration stop time is determined according to an energy check on the images before and after denoising.
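The directional second derivatives can be computed with central finite differences, for example as in this sketch (one common discretization, not necessarily the one used in the patent):

```python
import numpy as np

def directional_second_derivatives(u, eps=1e-8):
    """Second derivatives of u along the gradient (normal) direction N
    and the level-line (tangent) direction T:
      u_NN = (ux^2*uxx + 2*ux*uy*uxy + uy^2*uyy) / (ux^2 + uy^2)
      u_TT = (uy^2*uxx - 2*ux*uy*uxy + ux^2*uyy) / (ux^2 + uy^2)
    using central differences with periodic (np.roll) boundaries."""
    ux = (np.roll(u, -1, axis=1) - np.roll(u, 1, axis=1)) / 2.0
    uy = (np.roll(u, -1, axis=0) - np.roll(u, 1, axis=0)) / 2.0
    uxx = np.roll(u, -1, axis=1) - 2 * u + np.roll(u, 1, axis=1)
    uyy = np.roll(u, -1, axis=0) - 2 * u + np.roll(u, 1, axis=0)
    uxy = (np.roll(np.roll(u, -1, axis=0), -1, axis=1)
           - np.roll(np.roll(u, -1, axis=0), 1, axis=1)
           - np.roll(np.roll(u, 1, axis=0), -1, axis=1)
           + np.roll(np.roll(u, 1, axis=0), 1, axis=1)) / 4.0
    g2 = ux ** 2 + uy ** 2 + eps       # eps avoids division by zero
    u_nn = (ux ** 2 * uxx + 2 * ux * uy * uxy + uy ** 2 * uyy) / g2
    u_tt = (uy ** 2 * uxx - 2 * ux * uy * uxy + ux ** 2 * uyy) / g2
    return u_nn, u_tt
```

Where the gradient is nonzero, the identity $\mu_{NN} + \mu_{TT} = \mu_{xx} + \mu_{yy}$ (the Laplacian) is a convenient sanity check on the discretization.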
In summary, the device provided in this embodiment is configured to establish, for the pixel $\mu_{xy}$ at coordinates $(x,y)$, the edge detector $ED(\mu_{xy})$, to derive a denoising model based on total variation and the edge detector $ED(\mu_{xy})$, and to perform denoising with this model, so that a better denoising effect is obtained in the recognition of value-added tax plain invoices; by adopting the horizontal projection and vertical crossing-number distance method, the different regions in the invoice recognition process can be positioned more accurately, so that value-added tax plain invoices affected by noise such as seals can be processed more accurately.
The specific manner in which the various modules of the apparatus in the above embodiments perform their operations has been described in detail in the embodiments of the method and will not be repeated here.
The image recognition device for value-added tax plain invoices may be a mobile phone, a computer, a tablet device, or the like.
The value added tax plain invoice image recognition device may include one or more of the following components: a processing component, a memory, an image acquisition component, a power supply component, a multimedia component, an audio component, an input/output (I/O) interface, a sensor component, and a communication component.
The processing component generally controls the overall operation of the image recognition device of the value-added tax plain invoice, such as operations associated with display, telephone calls, data communication, camera operation, and recording operation. The processing component may include one or more processors to execute instructions so as to perform all or part of the steps of the methods described above. Further, the processing component may include one or more modules that facilitate interaction between the processing component and other components.
The memory is configured to store various types of data to support operation of the device. Examples of such data include instructions for any application or method operating on the device, contact data, phonebook data, messages, pictures, videos, and the like. The memory may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
The power supply assembly provides power to the various components of the device. The power components may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device.
The I/O interface provides an interface between the processing component and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to: a homepage button, a volume button, a start button, and a lock button.
The image acquisition component may be a CCD camera or a scanner component, and is used for acquiring the image of the value-added tax plain invoice to be recognized.
The communication component is configured to facilitate wired or wireless communication between the apparatus and other devices. The apparatus may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component further comprises a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements, for executing the methods described above.
In an exemplary embodiment, a non-transitory computer-readable storage medium is also provided, such as a memory comprising instructions executable by a processor of an apparatus to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
When the instructions in the non-transitory computer-readable storage medium are executed by a processor of an apparatus, the apparatus is caused to perform the image-recognition-based value-added tax plain invoice reimbursement information generation method.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following the general principles thereof and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
Claims (5)
1. A value-added tax ordinary invoice reimbursement information generation method based on image recognition, characterized by comprising the following steps:
step 1, establishing a correspondence between electronic invoices and financial reimbursement subjects, and generating an electronic invoice-reimbursement correspondence table, wherein the table fields comprise: buyer taxpayer identification number, invoice code, invoice number, goods or taxable labor/service name, and invoice amount;
step 2, acquiring an electronic invoice image, and performing preprocessing, denoising, region positioning and template matching on the electronic invoice image to obtain the seller name, seller taxpayer identification number, buyer name, buyer taxpayer identification number, invoice code, invoice number, goods or taxable labor/service name, invoice amount, reviewer information and drawer information in the electronic invoice image; step 2 specifically comprises: S1, acquiring an image of the value-added tax plain invoice with a camera to obtain an original 24-bit color image of the invoice, and extracting the R component of the original color image as the grayscale image to be recognized, wherein the gray value of each pixel on the grayscale image to be recognized is 0 or 255; S2, performing regularization denoising on the grayscale image to reduce noise points and obtain a denoised grayscale image, and then performing adaptive-threshold binarization on the denoised grayscale image to obtain an adaptive-threshold binarized image of the value-added tax plain invoice; S3, coarsely positioning the regions of the buyer taxpayer identification number, the invoice code, the invoice number, the invoicing date and the amount according to prior position information of these fields, accurately positioning the regions by the method of horizontal projection and vertical crossing-number distance, and performing character segmentation and normalization on the accurately positioned regions to obtain the buyer taxpayer identification number, invoice code, invoice number, invoicing date and amount to be recognized; S4, recognizing the buyer taxpayer identification number, invoice code, invoice number, invoicing date and amount to be recognized by using a template feature matching algorithm, and obtaining a recognition result; the regularization denoising process comprises: establishing, for the pixel $\mu_{xy}$ at coordinates $(x,y)$, the edge detector $ED(\mu_{xy})$ of formula (1):

$ED(\mu_{xy}) = |e_{\perp}(x,y) - e(x,y)|,\quad (x,y)\in\Omega \qquad (1)$

where $ED$ is close to 0 in smooth regions, becomes larger near edges, and is close to 0 at noise points; based on the total variation model and the edge detector $ED(\mu_{xy})$, the denoising model (2) is proposed:

$\min_{\mu}\ \int_{\Omega} |\nabla\mu|^{\psi(ED(\mu))}\,dx\,dy + \frac{\lambda}{2}\int_{\Omega} (f-\mu)^{2}\,dx\,dy \qquad (2)$

where $\lambda$ is the regularization parameter, $f=\mu^{*}+\omega$, wherein $\mu^{*}$ is the original unknown image and $\omega$ is Gaussian noise, $\psi(ED(\mu)) = 1 + \frac{1}{1+\delta\,ED(\mu)^{2}}$, and $\delta$ is a positive parameter controlling the gradual decay of $\psi(ED(\mu))$ from 2 to 1; and obtaining the Lagrange equation of the denoising model (2) by a gradient descent method:

$\frac{\partial\mu}{\partial t} = \operatorname{div}\!\left(\Phi'(|\nabla\mu|)\,\frac{\nabla\mu}{|\nabla\mu|}\right) + \lambda(f-\mu) \qquad (3)$

wherein the diffusion function is $\Phi(s) = s^{\psi(ED(\mu))}$;
step 3, comparing the recognized buyer taxpayer identification number with the buyer taxpayer identification number in the electronic invoice-reimbursement correspondence table; if they are consistent, proceeding to step 4, otherwise ending the generation of the invoice reimbursement information;
step 4, comparing the recognized reviewer information with the drawer information; if they are consistent, ending the generation of the invoice reimbursement information, and if they are inconsistent, automatically filling the invoice code data, invoice number data, goods or taxable labor/service name data and invoice amount data from the electronic invoice image recognition result into the corresponding items of the electronic invoice-reimbursement correspondence table.
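The comparison-and-fill logic of steps 3 and 4 can be sketched as follows (the dictionary representation and all field names are illustrative assumptions, not taken from the claim):

```python
def fill_reimbursement(record, table_row):
    """Sketch of steps 3-4: end generation when the recognized buyer
    taxpayer ID does not match the correspondence table, or when the
    reviewer information is consistent with the drawer information;
    otherwise auto-fill the corresponding table items."""
    if record["buyer_tax_id"] != table_row["buyer_tax_id"]:
        return False                      # step 3: IDs inconsistent, end
    if record["reviewer"] == record["drawer"]:
        return False                      # step 4: consistent, end
    for field in ("invoice_code", "invoice_number", "service_name", "amount"):
        table_row[field] = record[field]  # auto-fill corresponding items
    return True
```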
2. The value-added tax ordinary invoice reimbursement information generation method based on image recognition according to claim 1, wherein the Lagrange equation is solved by a method based on partial differential equations, wherein $\mu_{NN}$ is the second derivative in the normal direction $N$ and $\mu_{TT}$ is the second derivative in the direction $T$ perpendicular to $N$.
3. The value-added tax ordinary invoice reimbursement information generation method based on image recognition according to claim 2, wherein $\mu_{NN}$ and $\mu_{TT}$ are given by:

$\mu_{NN} = \frac{\mu_{x}^{2}\mu_{xx} + 2\mu_{x}\mu_{y}\mu_{xy} + \mu_{y}^{2}\mu_{yy}}{\mu_{x}^{2}+\mu_{y}^{2}},\qquad \mu_{TT} = \frac{\mu_{y}^{2}\mu_{xx} - 2\mu_{x}\mu_{y}\mu_{xy} + \mu_{x}^{2}\mu_{yy}}{\mu_{x}^{2}+\mu_{y}^{2}} \qquad (4)$

wherein $\mu_{xx}$, $\mu_{yy}$ and $\mu_{xy}$ denote the second partial derivatives and $t$ is the transpose operator; the discrete model of equation (4) is obtained by finite-difference discretization, and the iteration stop time is determined according to an energy check on the images before and after denoising.
4. An image recognition-based value-added tax ordinary invoice reimbursement information generation device is characterized in that the generation device comprises:
the data construction module is used for establishing a correspondence between electronic invoices and financial reimbursement subjects and generating an electronic invoice-reimbursement correspondence table, wherein the table fields comprise: buyer taxpayer identification number, invoice code, invoice number, goods or taxable labor/service name, and invoice amount;
the data processing module is used for acquiring an electronic invoice image and performing preprocessing, denoising, region positioning and template matching on the electronic invoice image to obtain the seller name, seller taxpayer identification number, buyer name, buyer taxpayer identification number, invoice code, invoice number, goods or taxable labor/service name, invoice amount, reviewer information and drawer information in the electronic invoice image; the data processing module comprises: an image acquisition module, used for acquiring an image of the value-added tax plain invoice to obtain an original 24-bit color image of the invoice and extracting the R component of the original color image as the grayscale image to be recognized, wherein the gray value of each pixel on the grayscale image to be recognized is 0 or 255; an image denoising module, used for performing regularization denoising on the grayscale image to reduce noise points and obtain a denoised grayscale image, and then performing adaptive-threshold binarization on the denoised grayscale image to obtain an adaptive-threshold binarized image of the value-added tax plain invoice; an image positioning module, used for coarsely positioning the regions of the buyer taxpayer identification number, the invoice code, the invoice number, the invoicing date and the amount according to prior position information of these fields, accurately positioning the regions by the method of horizontal projection and vertical crossing-number distance, and obtaining the buyer taxpayer identification number, invoice code, invoice number, invoicing date and amount to be recognized after character segmentation and normalization of the accurately positioned regions; an image recognition module, used for recognizing the buyer taxpayer identification number, invoice code, invoice number, invoicing date and amount to be recognized by using a template feature matching algorithm and obtaining a recognition result; the regularization denoising process in the image denoising module comprises: establishing, for the pixel $\mu_{xy}$ at coordinates $(x,y)$, the edge detector $ED(\mu_{xy})$ of formula (1):

$ED(\mu_{xy}) = |e_{\perp}(x,y) - e(x,y)|,\quad (x,y)\in\Omega \qquad (1)$

where $ED$ is close to 0 in smooth regions, becomes larger near edges, and is close to 0 at noise points; based on the total variation model and the edge detector $ED(\mu_{xy})$, the denoising model (2) is proposed:

$\min_{\mu}\ \int_{\Omega} |\nabla\mu|^{\psi(ED(\mu))}\,dx\,dy + \frac{\lambda}{2}\int_{\Omega} (f-\mu)^{2}\,dx\,dy \qquad (2)$

where $\lambda$ is the regularization parameter, $f=\mu^{*}+\omega$, wherein $\mu^{*}$ is the original unknown image and $\omega$ is Gaussian noise, $\psi(ED(\mu)) = 1 + \frac{1}{1+\delta\,ED(\mu)^{2}}$, and $\delta$ is a positive parameter controlling the gradual decay of $\psi(ED(\mu))$ from 2 to 1; the image denoising module further obtains the Lagrange equation of the denoising model (2) by a gradient descent method:

$\frac{\partial\mu}{\partial t} = \operatorname{div}\!\left(\Phi'(|\nabla\mu|)\,\frac{\nabla\mu}{|\nabla\mu|}\right) + \lambda(f-\mu) \qquad (3)$

wherein the diffusion function is $\Phi(s) = s^{\psi(ED(\mu))}$;
the data matching module is used for comparing the recognized buyer taxpayer identification number with the buyer taxpayer identification number in the electronic invoice-reimbursement correspondence table; if they are consistent, invoking the data generation module, otherwise ending the generation of the invoice reimbursement information;
and the data generation module is used for comparing the recognized reviewer information with the drawer information; if they are consistent, ending the generation of the invoice reimbursement information, and if they are inconsistent, automatically filling the invoice code data, invoice number data, goods or taxable labor/service name data and invoice amount data from the electronic invoice image recognition result into the corresponding items of the electronic invoice-reimbursement correspondence table.
5. The value-added tax plain invoice reimbursement information generation device based on image recognition according to claim 4, wherein the image denoising module further solves the Lagrange equation by a method based on partial differential equations, wherein $\mu_{NN}$ is the second derivative in the normal direction $N$ and $\mu_{TT}$ is the second derivative in the direction $T$ perpendicular to $N$, and $\mu_{NN}$ and $\mu_{TT}$ are given by:

$\mu_{NN} = \frac{\mu_{x}^{2}\mu_{xx} + 2\mu_{x}\mu_{y}\mu_{xy} + \mu_{y}^{2}\mu_{yy}}{\mu_{x}^{2}+\mu_{y}^{2}},\qquad \mu_{TT} = \frac{\mu_{y}^{2}\mu_{xx} - 2\mu_{x}\mu_{y}\mu_{xy} + \mu_{x}^{2}\mu_{yy}}{\mu_{x}^{2}+\mu_{y}^{2}} \qquad (4)$

wherein $\mu_{xx}$, $\mu_{yy}$ and $\mu_{xy}$ denote the second partial derivatives and $t$ is the transpose operator; the discrete model of equation (4) is obtained by finite-difference discretization, and the iteration stop time is determined according to an energy check on the images before and after denoising.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911210795.8A CN111222412B (en) | 2019-12-02 | 2019-12-02 | Method and device for generating value-added tax ordinary invoice reimbursement information based on image recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111222412A CN111222412A (en) | 2020-06-02 |
CN111222412B true CN111222412B (en) | 2023-08-01 |
Family
ID=70827731
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11030450B2 (en) * | 2018-05-31 | 2021-06-08 | Vatbox, Ltd. | System and method for determining originality of computer-generated images |
CN111861594A (en) * | 2020-07-21 | 2020-10-30 | 湖南中斯信息科技有限公司 | Invoice processing method, invoice processing device, storage medium and processor |
CN111784423B (en) * | 2020-07-31 | 2023-08-25 | 广东电网有限责任公司梅州供电局 | Invoice matching method and device, electronic equipment and storage medium |
CN113344690A (en) * | 2021-06-30 | 2021-09-03 | 中国工商银行股份有限公司 | Invoice reimbursement processing method and device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108549843A (en) * | 2018-03-22 | 2018-09-18 | 南京邮电大学 | A kind of VAT invoice recognition methods based on image procossing |
CN109543690A (en) * | 2018-11-27 | 2019-03-29 | 北京百度网讯科技有限公司 | Method and apparatus for extracting information |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090112743A1 (en) * | 2007-10-31 | 2009-04-30 | Mullins Christine M | System and method for reporting according to eu vat related legal requirements |
Non-Patent Citations (1)
Title |
---|
王淑英; 李瑞恒. 增值税专用发票的使用和管理 [Use and Management of Special VAT Invoices]. 河北水利 [Hebei Water Resources], 2012, (07), full text. *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111222412B (en) | Method and device for generating value-added tax ordinary invoice reimbursement information based on image recognition | |
US11676285B1 (en) | System, computing device, and method for document detection | |
AU2017302250B2 (en) | Optical character recognition in structured documents | |
US9122953B2 (en) | Methods and systems for character segmentation in automated license plate recognition applications | |
US20200184549A1 (en) | Image-based financial processing | |
US11386417B2 (en) | Payment methods and systems by scanning QR codes already present in a user device | |
US10037581B1 (en) | Methods systems and computer program products for motion initiated document capture | |
US8818107B2 (en) | Identification generation and authentication process application | |
US20140279303A1 (en) | Image capture and processing for financial transactions | |
JP2016517587A (en) | Classification of objects in digital images captured using mobile devices | |
US10339373B1 (en) | Optical character recognition utilizing hashed templates | |
KR101893679B1 (en) | Card number recognition method using deep learnig | |
CN111209792B (en) | Image recognition method and device for value-added tax common invoice | |
US10943226B2 (en) | Method and system of capturing an image of a card | |
CN109344926A (en) | Processing method, device and the equipment of service receipt information | |
US20150120517A1 (en) | Data lifting for duplicate elimination | |
Chen et al. | Video-based content recognition of bank cards with mobile devices | |
US11900755B1 (en) | System, computing device, and method for document detection and deposit processing | |
Pourreza-Shahri et al. | Automatic exposure selection and fusion for high-dynamic-range photography via smartphones | |
US20200027138A1 (en) | Lost and found management systems and methods | |
AU2013100314A4 (en) | Computer implemented frameworks and methodologies for providing financial management and payment solutions via mobile devices | |
Ettl et al. | Text and image area classification in mobile scanned digitised documents | |
CN117218654A (en) | Bank card number identification method and device, storage medium and electronic equipment | |
CN117635762A (en) | Cloud wardrobe generation method, device, equipment and storage medium | |
CN117253248A (en) | Bill identification method, device and storage medium based on large language model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||