CN113869190A - Data processing method and system based on image analysis - Google Patents

Data processing method and system based on image analysis

Publication number
CN113869190A
Authority
CN
China
Prior art keywords
invoice
difference
text
module
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111128539.1A
Other languages
Chinese (zh)
Inventor
季伯阳
季亚飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Boyang Investment Management Co ltd
Original Assignee
Shenzhen Boyang Investment Management Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Boyang Investment Management Co ltd
Priority to CN202111128539.1A
Publication of CN113869190A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/12Accounting
    • G06Q40/123Tax preparation or submission
    • G06Q40/125Finance or payroll

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)

Abstract

The invention relates to a data processing method and system based on image analysis, in the technical field of data processing. The method comprises: step S1, scanning an invoice to be reimbursed with a scanning device; step S2, acquiring the invoice image produced by the scanning device through an image acquisition module; step S3, adjusting the orientation of the invoice image through an adjusting module, the orientation being adjusted according to the aspect ratio A of the invoice image; step S4, dividing the orientation-adjusted invoice image into a plurality of keyword regions through a partition module; step S5, acquiring the text content of the keyword regions through an acquisition module; step S6, judging whether the invoice meets the requirement through a judging module; and step S7, storing the text content of invoices that meet the requirement through a storage module. The invention improves the data processing efficiency of invoice reimbursement.

Description

Data processing method and system based on image analysis
Technical Field
The invention relates to the technical field of data processing, in particular to a data processing method and system based on image analysis.
Background
An invoice is a document issued by a seller to a buyer in the course of economic activity; it states the name, quality, and agreed price of the product or service provided to the buyer. Except in the case of advance payment, an invoice must indicate that the buyer is to pay the seller under the agreed terms, and it must state the date and quantity. It is an important voucher for accounting.
During invoice-based financial reimbursement, financial staff must spend a great deal of time checking all kinds of reimbursement voucher data. The print on an invoice easily becomes unclear during printing and storage, which makes manual checking slow and inaccurate. In the prior art, invoice reimbursement data processing still cannot effectively recognize invoices whose print is unclear, so processing efficiency is low.
Disclosure of Invention
To this end, the invention provides a data processing method and system based on image analysis, which solve the prior-art problem of low invoice reimbursement efficiency caused by the inability to acquire invoice information accurately.
To achieve the above object, in one aspect, the present invention provides a data processing method based on image analysis, comprising,
step S1, scanning the invoice to be reimbursed through scanning equipment;
step S2, acquiring an invoice image scanned by the scanning equipment through an image acquisition module;
step S3, adjusting the orientation of the invoice image through an adjusting module, wherein the adjusting module adjusts the orientation according to the aspect ratio A of the invoice image;
step S4, carrying out area division on the invoice image with the direction adjusted through a partition module to form a plurality of keyword areas;
step S5, acquiring the text content of the keyword area through an acquisition module;
step S6, judging whether the invoice meets the requirement through a judging module;
step S7, storing the text content of invoices that meet the requirement through a storage module;
in step S6, when the judging module judges the text content in the first keyword region of the invoice image, the judging module makes a preliminary judgment according to the number of characters B in the first keyword region; if B meets the requirement, it makes the next judgment according to the number of difference characters D; if D meets the requirement, it makes a further judgment according to the number F of differing strokes in a difference character; and if F meets the requirement, it makes the final judgment on the first keyword region according to the difference length G of the differing strokes;
after the text content of the first keyword region meets the requirement, the judging module judges the second keyword region of the invoice image according to the number of characters C in the second keyword region; when C meets the requirement, it makes the next judgment according to the number of difference characters M; and if M meets the requirement, it makes the final judgment on the second keyword region according to the outlines of the difference characters.
Further, in step S3, when the adjusting module adjusts the invoice image, the adjusting module first obtains the aspect ratio A of the invoice image and adjusts the orientation of the invoice image according to A, wherein,
when A < 1, the adjusting module rotates the invoice image clockwise by 90 degrees so that A > 1;
when A > 1, the adjusting module locates the elliptical graphic region in the middle of the invoice image and adjusts the invoice image according to the position of that region: when the elliptical graphic region is in the upper part of the invoice image, no adjustment is made; when it is in the lower part, the adjusting module rotates the invoice image clockwise by 180 degrees.
Further, in step S4, when the partition module divides the adjusted invoice image into regions, it does so according to the positional relationships of the invoice's frame structure, designating the purchaser name region as the first keyword region, the purchaser taxpayer identification number region as the second keyword region, the price-and-tax total region as the third keyword region, the amount region as the fourth keyword region, and the tax amount region as the fifth keyword region.
Further, in step S6, when the judging module judges the text content of the first keyword region, it first obtains the number of characters B and compares it with the number of characters B0 in the preset name, and analyzes the text according to the comparison result, wherein,
when B ≠ B0, the judging module judges that the invoice is invalid and stops comparing the text content;
when B = B0, the judging module compares the text content of the first keyword region character by character, wherein,
the judging module compares the shape of each character in the first keyword region, in sequence, with the shape of the character at the same position in the preset name, takes each character whose shape differs as a difference character, and obtains the number of difference characters D; the judging module compares D with the preset number of difference characters D0 and judges the text content of the first keyword region according to the comparison result, wherein,
when D = 0, the judging module judges that the text content of the first keyword region meets the requirement;
when 0 < D ≤ D0, the judging module analyzes and compares the difference characters in detail;
when D > D0, the judging module judges that the invoice is invalid.
Further, when the judging module analyzes and compares the difference characters in detail, it compares the number F of differing strokes in a single difference character with the preset number of differing strokes F0, and judges according to the comparison result, wherein,
when F > F0, the judging module judges that the invoice is invalid;
when F ≤ F0, the judging module makes the next judgment according to the difference length G of the differing strokes, wherein,
when a differing stroke in a difference character differs in stroke direction, the judging module judges that the invoice is invalid;
when the differing strokes in a difference character differ only in stroke length, the judging module compares the difference length G with the preset difference length G0 and judges according to the comparison result, wherein,
when G ≤ G0, the judging module judges that the invoice print is unclear, treats the difference character as identical to the corresponding character in the preset name, and judges that the text content in the first keyword region meets the requirement;
when G > G0, the judging module judges that the difference character is different from the corresponding character in the preset name, and judges that the invoice is invalid.
Further, after the text content of the first keyword region meets the requirement, the judging module obtains the number of characters C in the second keyword region, compares C with the number of characters C0 in the preset tax number, and judges according to the comparison result, wherein,
when C ≠ C0, the judging module judges that the invoice is invalid;
when C = C0, the judging module compares each character of the second keyword region, in sequence, with the corresponding character of the preset tax number, takes each character whose glyph differs as a difference character, compares the number M of difference characters with the preset number of difference characters M0, and judges according to the comparison result, wherein,
when M > M0, the judging module judges that the invoice is invalid;
when 0 < M ≤ M0, the judging module makes the next judgment according to where the difference characters differ;
when M = 0, the judging module judges that the text content of the second keyword region meets the requirement.
Further, the judging module obtains the location of the difference in each difference character; when the outline of the difference character has the same shape as the corresponding character in the preset tax number but its shape curve is discontinuous, the judging module judges that the invoice print is unclear, treats the difference character as identical to the corresponding character in the preset tax number, and judges that the text content of the second keyword region meets the requirement;
when the outline of the difference character differs in shape from the corresponding character in the preset tax number, the judging module judges that the invoice is invalid.
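The judgment of the second keyword region can be illustrated with a minimal Python sketch. It is not the patented implementation: the GlyphComparison record, the function name, and the returned status strings are hypothetical, and the image analysis that fills in each record is assumed to happen elsewhere. The text does not say what happens when the outline matches and the curve is continuous, so the sketch reports that case as undetermined.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GlyphComparison:
    """Hypothetical result of comparing one scanned character with the preset one."""
    glyph_matches: bool        # glyph is identical to the preset character
    outline_matches: bool      # overall outline has the same shape
    curve_discontinuous: bool  # the shape curve shows breaks, e.g. faded print


def check_tax_number_region(chars: List[GlyphComparison], c0: int, m0: int) -> str:
    """Apply the C, M, and outline checks to the second keyword region."""
    if len(chars) != c0:                       # C != C0
        return "invalid"
    diffs = [c for c in chars if not c.glyph_matches]
    m = len(diffs)                             # number of difference characters M
    if m == 0:
        return "meets_requirement"
    if m > m0:
        return "invalid"
    for c in diffs:                            # 0 < M <= M0
        if not c.outline_matches:
            return "invalid"
        if not c.curve_discontinuous:
            # Case not covered by the text; flagged instead of guessed (assumption).
            return "undetermined"
    return "unclear_but_meets_requirement"
```

The thresholds c0 and m0 correspond to C0 and M0 in the text and would be supplied by the caller.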
Further, when the text content of both the first keyword region and the second keyword region meets the requirement, the storage module obtains the amount in figures P from the text content of the third keyword region, the amount in figures R from the fourth keyword region, and the amount in figures H from the fifth keyword region, and verifies the amount of the third keyword region, wherein,
when P = R + H, the storage module judges that the amount in figures is normal and stores P;
when P ≠ R + H, the storage module judges that the invoice print is unclear, stores both the amount in words and the amount in figures P from the third keyword region, and issues a prompt for both.
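The amount verification can be sketched as follows. This is an illustrative sketch only: the function name and the shape of the returned record are hypothetical, and the amounts are assumed to arrive as decimal strings recognized from the third, fourth, and fifth keyword regions. Decimal is used instead of float so that P = R + H is checked exactly.

```python
from decimal import Decimal
from typing import Dict


def verify_and_store_total(p: str, r: str, h: str, amount_in_words: str) -> Dict[str, object]:
    """Check P = R + H and build the record to store (hypothetical record layout)."""
    total, amount, tax = Decimal(p), Decimal(r), Decimal(h)
    if total == amount + tax:
        # Amount in figures is normal: store P only.
        return {"total": total, "needs_confirmation": False}
    # Mismatch: keep both the amount in words and P, and flag for a prompt.
    return {
        "total": total,
        "amount_in_words": amount_in_words,
        "needs_confirmation": True,
    }
```

A caller would show the prompt whenever needs_confirmation is true.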
Furthermore, when storing, the storage module also creates files; a user name and number are entered when a file is created, so that invoice images and invoice amounts are stored by category in the corresponding user's file, and the storage module totals and stores each user's invoice amounts by month for easy reference.
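The per-user monthly statistics can be sketched with an in-memory structure. The class name, method names, and the choice to group by the invoice date are assumptions made for illustration; the text does not specify which date defines the month or how files are persisted.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal


class InvoiceArchive:
    """Hypothetical in-memory stand-in for the per-user files kept by the storage module."""

    def __init__(self) -> None:
        # Key: (user name, user number, "YYYY-MM"); value: running monthly total.
        self._monthly = defaultdict(Decimal)

    def add(self, user_name: str, user_number: str, invoice_date: date, amount: str) -> None:
        """Add one invoice amount to the user's total for that month."""
        key = (user_name, user_number, invoice_date.strftime("%Y-%m"))
        self._monthly[key] += Decimal(amount)

    def monthly_total(self, user_name: str, user_number: str, year: int, month: int) -> Decimal:
        """Return the stored total for one user and month, or 0 if none."""
        key = (user_name, user_number, f"{year:04d}-{month:02d}")
        return self._monthly.get(key, Decimal("0"))
```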
In another aspect, the present invention also provides a data processing system based on image analysis, comprising,
the image acquisition module is used for acquiring the invoice image scanned by the external scanning equipment and is connected with the adjustment module;
the adjusting module is used for adjusting the direction of the scanned invoice image and is connected with the partitioning module;
the partition module is used for carrying out regional partition on the adjusted invoice image to form a plurality of keyword regions and is connected with the acquisition module;
the acquiring module is used for acquiring the text content of the keyword area and is connected with the judging module;
the judging module is used for judging whether the invoice meets the requirement according to the text content and is connected with the storage module;
and the storage module is used for storing the invoice image and the invoice content which meet the requirements.
Compared with the prior art, the invention has the following advantages. First, the invoice to be reimbursed is scanned by the scanning device to obtain the invoice image, from which the text content is obtained; scanning with a scanning device helps ensure that the invoice image is sharp enough, which improves the processing efficiency of invoice data. Second, a scanned invoice image may be in portrait orientation or upside down; adjusting the invoice image brings it upright, which makes it easier to acquire the text content of the invoice image.
In particular, the text content of the first keyword region is the purchaser's name. By comparing it with the preset name, the judging module can quickly determine whether the invoice is valid, which improves the data processing efficiency of invoice reimbursement. The judging module first judges by the number of characters B: if B equals the preset number, it proceeds to the next judgment; if not, the invoice was issued incorrectly and is judged invalid. This count check quickly filters out non-conforming invoices. When the count meets the requirement, the judging module judges further by the number of difference characters in the first keyword region: if there are none, the name meets the requirement; if the number is within the preset range, a further judgment is made; and if the number exceeds the preset value, the invoice is judged invalid. The preset value can be set according to the length of the preset name. Judging by the number of difference characters further improves the data processing efficiency of invoice reimbursement.
In particular, when the judging module finds that the number of difference characters in the first keyword region meets the requirement, it judges further by the direction and length of the differing strokes in each difference character. If the direction of a differing stroke differs from that of the corresponding stroke in the preset name, the characters are different and the invoice is judged invalid. If the direction is the same and only the length differs, the difference may be caused by unclear print; the judging module then judges by the difference length: if it is within the preset range, the difference character is judged to be the same as the corresponding character in the preset name; if it exceeds the preset value, the invoice is judged invalid. Judging the characters of the first keyword region precisely by difference length avoids the effect of unclear invoice print on invoice data processing, which further improves the data processing efficiency of invoice reimbursement.
In particular, the second keyword region contains the taxpayer identification number, which consists of digits and letters. Once the name in the first keyword region meets the requirement, the judging module further judges whether the invoice is valid from the text content of the second keyword region, which improves the data processing efficiency of invoice reimbursement. The judging module first obtains the number of characters C in the second keyword region and compares it with the number of characters in the preset tax number: if they differ, the invoice is judged invalid; if they are the same, the judging module obtains the number of difference characters between the second keyword region and the preset tax number. If that number exceeds the preset value, the invoice is judged invalid; if it is within the preset range, a further judgment is made by the outline of each difference character, so that unclear print does not disrupt invoice reimbursement data processing: if the outlines are the same and the strokes are merely broken, the invoice is judged unclear but its content meets the requirement; if the outlines differ, the invoice is judged invalid. Judging the characters of the second keyword region layer by layer further improves the data processing efficiency of invoice reimbursement.
In particular, after the invoice is judged valid, the storage module improves invoice reimbursement efficiency by checking and storing the invoice amount. When the amount is abnormal because the invoice print is unclear, a prompt lets the user confirm it, which ensures the accuracy of the invoice amount statistics. In addition, totaling by month makes it quick to calculate the amount each user needs reimbursed in each month, which further improves the data processing efficiency of invoice reimbursement.
Drawings
FIG. 1 is a schematic structural diagram of the data processing system based on image analysis according to the embodiment;
FIG. 2 is a schematic flow chart of the data processing method based on image analysis according to the embodiment.
Detailed Description
In order that the objects and advantages of the invention will be more clearly understood, the invention is further described below with reference to examples; it should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and do not limit the scope of the present invention.
Furthermore, it should be noted that, in the description of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "coupled," and "connected" are to be construed broadly: a connection may be, for example, fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediate medium, or an internal communication between two elements. The specific meanings of these terms in the present invention can be understood by those skilled in the art according to the specific situation.
Referring to FIG. 1, which is a schematic structural diagram of the data processing system based on image analysis according to this embodiment, the system includes,
the image acquisition module is used for acquiring the invoice image scanned by the external scanning equipment and is connected with the adjustment module;
the adjusting module is used for adjusting the direction of the scanned invoice image and is connected with the partitioning module;
the partition module is used for carrying out regional partition on the adjusted invoice image to form a plurality of keyword regions and is connected with the acquisition module;
the acquiring module is used for acquiring the text content of the keyword area and is connected with the judging module;
the judging module is used for judging whether the invoice meets the requirement according to the text content and is connected with the storage module;
and the storage module is used for storing the invoice image and the invoice content which meet the requirements.
Referring to FIG. 2, which is a schematic flow chart of the data processing method based on image analysis according to this embodiment, the method includes,
step S1, scanning the invoice to be reimbursed through scanning equipment;
step S2, acquiring an invoice image scanned by the scanning equipment through an image acquisition module;
step S3, adjusting the direction of the invoice image through an adjusting module;
step S4, carrying out area division on the invoice image with the direction adjusted through a partition module to form a plurality of keyword areas;
step S5, acquiring the text content of the keyword area through an acquisition module;
step S6, judging whether the invoice meets the requirement through a judging module;
and step S7, storing the text content of invoices that meet the requirement through a storage module.
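The chaining of steps S3 to S7 can be sketched as a small Python function in which each module is passed in as a callable. This is only an illustration of the data flow between modules: the function name, parameter names, and the dictionary of region names are hypothetical, and steps S1 and S2 (scanning and image acquisition) are assumed to have produced the image already.

```python
from typing import Any, Callable, Dict


def process_invoice(
    image: Any,
    adjust: Callable[[Any], Any],
    partition: Callable[[Any], Dict[str, Any]],
    acquire: Callable[[Any], str],
    judge: Callable[[Dict[str, str]], bool],
    store: Callable[[Any, Dict[str, str]], None],
) -> bool:
    """Run steps S3 to S7 on a scanned invoice image; return True if it was stored."""
    upright = adjust(image)                    # S3: adjust orientation
    regions = partition(upright)               # S4: divide into keyword regions
    texts = {name: acquire(region)             # S5: acquire text of each region
             for name, region in regions.items()}
    if not judge(texts):                       # S6: does the invoice meet the requirement?
        return False
    store(upright, texts)                      # S7: store only conforming invoices
    return True
```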
Specifically, in this embodiment the invoice to be reimbursed is first scanned by a scanning device to obtain the invoice image, from which the text content is obtained. Scanning helps ensure that the invoice image is sharp enough, which improves the processing efficiency of invoice data. It will be understood that the invoice image could also be obtained with an imaging device such as a camera. However, invoices are easily creased during storage, and their print may fade over time; capturing the image with a camera would then reduce the sharpness of the invoice image and thus the accuracy of the acquired text content. The preferred implementation is therefore to scan the invoice with a scanning device such as a scanner, which improves the sharpness of the acquired invoice image and hence the processing efficiency of invoice data.
Specifically, in step S3 of this embodiment, when the adjusting module adjusts the invoice image, the adjusting module first obtains the aspect ratio A of the invoice image and adjusts the orientation of the invoice image according to A, wherein,
when A < 1, the adjusting module rotates the invoice image clockwise by 90 degrees so that A > 1;
when A > 1, the adjusting module locates the elliptical graphic region in the middle of the invoice image and adjusts the invoice image according to the position of that region: when the elliptical graphic region is in the upper part of the invoice image, no adjustment is made; when it is in the lower part, the adjusting module rotates the invoice image clockwise by 180 degrees.
Specifically, in this embodiment, a scanned invoice image may be in portrait orientation or upside down; adjusting the invoice image brings it upright, which makes it easier to acquire the text content of the invoice image.
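The orientation adjustment of step S3 can be sketched in Python on an image represented as a list of pixel rows. Several points are assumptions made for illustration: A is taken as width divided by height, the locator for the elliptical graphic region is a hypothetical callable supplied by the caller (it returns the row index of the region's center, or None if nothing is found), and the case A = 1, which the text does not cover, is passed straight to the position check.

```python
from typing import Callable, List, Optional, Sequence

Image = List[list]


def rotate_cw_90(img: Sequence[Sequence]) -> Image:
    """Rotate a row-major image clockwise by 90 degrees."""
    return [list(row) for row in zip(*img[::-1])]


def rotate_180(img: Sequence[Sequence]) -> Image:
    """Rotate a row-major image by 180 degrees."""
    return [list(row[::-1]) for row in img[::-1]]


def adjust_orientation(
    img: Image,
    find_seal_row: Callable[[Image], Optional[int]],
) -> Image:
    """Bring the invoice image upright using the aspect ratio A and the seal position."""
    height, width = len(img), len(img[0])
    a = width / height                 # assumed definition of the aspect ratio A
    if a < 1:
        img = rotate_cw_90(img)        # portrait scan: rotate so that A > 1
        height = len(img)
    seal_row = find_seal_row(img)      # hypothetical locator of the elliptical region
    if seal_row is not None and seal_row >= height / 2:
        img = rotate_180(img)          # region in the lower part: image is upside down
    return img
```

After a 90 degree rotation the image is checked again for the position of the elliptical region, because a portrait scan can end up upside down once rotated.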
Specifically, in step S4 of this embodiment, when the partition module divides the adjusted invoice image into regions, it does so according to the positional relationships of the invoice's frame structure, designating the purchaser name region as the first keyword region, the purchaser taxpayer identification number region as the second keyword region, the price-and-tax total region as the third keyword region, the amount region as the fourth keyword region, and the tax amount region as the fifth keyword region.
Specifically, in this embodiment, because the invoice itself has a structural frame, the partition module divides the invoice image into regions according to that frame, so that the text content within each region can be acquired quickly.
Specifically, in step S6 of this embodiment, when the judging module judges the text content in the first keyword region, it first obtains the number of characters B and compares it with the number of characters B0 in the preset name, and analyzes the text according to the comparison result, wherein,
when B ≠ B0, the judging module judges that the invoice is invalid and stops comparing the text content;
when B = B0, the judging module compares the text content of the first keyword region character by character, wherein,
the judging module compares the shape of each character in the first keyword region, in sequence, with the shape of the character at the same position in the preset name, takes each character whose shape differs as a difference character, and obtains the number of difference characters D; the judging module compares D with the preset number of difference characters D0 and judges the text content of the first keyword region according to the comparison result, wherein,
when D = 0, the judging module judges that the text content of the first keyword region meets the requirement;
when 0 < D ≤ D0, the judging module analyzes and compares the difference characters in detail;
when D > D0, the judging module judges that the invoice is invalid.
Specifically, in this embodiment, the text content of the first keyword region is the purchaser's name. By comparing it with the preset name, the judging module can quickly determine whether the invoice is valid, which improves the data processing efficiency of invoice reimbursement. The judging module first judges by the number of characters B: if B equals the preset number, it proceeds to the next judgment; if not, the invoice was issued incorrectly and is judged invalid. This count check quickly filters out non-conforming invoices. When the count meets the requirement, the judging module judges further by the number of difference characters in the first keyword region: if there are none, the name meets the requirement; if the number is within the preset range, a further judgment is made; and if the number exceeds the preset value, the invoice is judged invalid. The preset value can be set according to the length of the preset name; for example, it can be set to 1 for a short name and to 2 for a long name, and so on.
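The count-based checks on the first keyword region (B against B0, then D against D0) can be sketched as follows. The function name and status strings are hypothetical. The embodiment compares character shapes in the image; here that comparison is abstracted into a same_shape callable, and plain character equality is used as a stand-in default, which is an assumption made purely for illustration.

```python
from typing import Callable, List, Tuple


def check_name_region(
    recognized: str,
    preset_name: str,
    d0: int,
    same_shape: Callable[[str, str], bool] = lambda a, b: a == b,
) -> Tuple[str, List[int]]:
    """Return a status and the positions of the difference characters."""
    if len(recognized) != len(preset_name):        # B != B0
        return "invalid", []
    diff_positions = [
        i for i, (a, b) in enumerate(zip(recognized, preset_name))
        if not same_shape(a, b)
    ]
    d = len(diff_positions)                        # number of difference characters D
    if d == 0:
        return "meets_requirement", diff_positions
    if d <= d0:                                    # 0 < D <= D0
        return "detailed_analysis", diff_positions
    return "invalid", diff_positions               # D > D0
```

The returned positions identify which characters would be passed on to the stroke-level comparison.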
Specifically, when the judging module analyzes and compares the difference characters in detail, it compares the number F of differing strokes in a single difference character with the preset number of differing strokes F0, and judges according to the comparison result, wherein,
when F > F0, the judging module judges that the invoice is invalid;
when F ≤ F0, the judging module makes the next judgment according to the difference length G of the differing strokes, wherein,
when a differing stroke in a difference character differs in stroke direction, the judging module judges that the invoice is invalid;
when the differing strokes in a difference character differ only in stroke length, the judging module compares the difference length G with the preset difference length G0 and judges according to the comparison result, wherein,
when G ≤ G0, the judging module judges that the invoice print is unclear, treats the difference character as identical to the corresponding character in the preset name, and judges that the text content in the first keyword region meets the requirement;
when G > G0, the judging module judges that the difference character is different from the corresponding character in the preset name, and judges that the invoice is invalid.
Specifically, in this embodiment, when the judging module determines that the number of difference characters in the first keyword region meets the requirement, it further judges according to the direction and length of the difference strokes in each difference character. If the direction of a difference stroke differs from that of the corresponding stroke in the preset name, the two are proved to be different characters and the invoice is judged to be invalid. If the direction is the same and only the length differs, the difference may be caused by an unclear font, so the judging module further judges according to the difference length: if the difference length is within the preset range, the difference character is judged to be the same as the corresponding character in the preset name; if the difference length is greater than the preset value, the difference character is judged to be a different character and the invoice is judged to be invalid. Judging the characters in the first keyword region by difference length in this way effectively avoids the influence of unclear invoice fonts on invoice data processing, thereby further improving the data processing efficiency of invoice reimbursement.
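The stroke-level decision for one difference character can be written as a short decision function. The sketch below is an assumption-laden illustration: the function name and parameters are hypothetical, the source gives no unit for the difference length G (pixels are assumed here), and extracting strokes from the image is outside its scope.

```python
def judge_difference_character(f, f0, same_direction, g, g0):
    """Return True if a difference character is accepted as matching the preset.

    f              -- number F of differing strokes in the character
    f0             -- preset difference stroke number F0
    same_direction -- True if the differing strokes keep the preset direction
    g              -- difference length G of the differing strokes (assumed pixels)
    g0             -- preset difference length G0
    """
    if f > f0:
        return False  # too many differing strokes: invoice invalid
    if not same_direction:
        return False  # a stroke points a different way: a different character
    # Only the stroke length differs: tolerate it up to G0 as unclear print.
    return g <= g0
```

For example, `judge_difference_character(1, 2, True, 3, 5)` accepts the character as unclear print, while the same call with `g=8` rejects it.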
Specifically, after the text content in the first keyword region meets the requirement, the determining module obtains the text number C of the second keyword region, compares the text number C of the second keyword region with the text number C0 of the preset tax number, and makes a determination according to the comparison result, wherein,
when C is not equal to C0, the judging module judges that the invoice is invalid;
when C = C0, the judging module compares the characters of the second keyword region with the preset tax number one by one in sequence, and takes each character whose graphic differs as a difference character; the judging module compares the number M of difference characters with the preset number M0 of difference characters, and judges according to the comparison result, wherein,
when M is larger than M0, the judging module judges that the invoice is invalid;
when M is more than 0 and less than or equal to M0, the judging module carries out next judgment according to the distinguishing position of the distinguishing characters;
and when M = 0, the judging module judges that the text content of the second keyword region meets the requirement.
Specifically, the judging module acquires the distinguishing position of the distinguishing character, when the outline of the distinguishing character is the same as the shape of the corresponding character in the preset tax number and the shape curve of the distinguishing character is discontinuous, the judging module judges that the invoice character is unclear and judges that the distinguishing character is the same as the corresponding character in the preset tax number, and the character content in the second keyword region meets the requirement;
and when the outline of the distinguishing character is different from the shape of the corresponding character in the preset tax number, the judging module judges that the invoice is invalid.
Specifically, in this embodiment, the second keyword region contains the taxpayer identification number, which consists of digits and letters. When the name in the first keyword region meets the requirement, the judging module further judges whether the invoice is valid according to the text content of the second keyword region. The judging module first obtains the text number C of the second keyword region and compares it with the number of characters in the preset tax number: if they differ, the invoice is judged to be invalid; if they are the same, the judging module obtains the number of difference characters between the text content of the second keyword region and the preset tax number. If that number is greater than the preset value, the invoice is judged to be invalid. If it is within the preset range, a further judgment is made according to the outline of each difference character, so that unclear print does not affect invoice data processing: if the outlines are the same and the character is merely broken, the invoice is judged to be unclear but its content meets the requirement; if the outlines differ, the invoice is judged to be invalid. Judging the characters of the second keyword region layer by layer in this way further improves the data processing efficiency of invoice reimbursement.
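The layered check on the tax-number region can be sketched as below. The type `GlyphComparison` and the function `check_tax_number_region` are hypothetical names introduced for illustration. The sketch assumes an upstream image step has already decided, per character, whether the glyph differs from the preset and whether its outline still matches; a differing glyph with a matching outline is treated as the "broken curve" case of unclear print.

```python
from typing import NamedTuple, Sequence


class GlyphComparison(NamedTuple):
    differs: bool          # glyph image differs from the preset character
    outline_matches: bool  # overall outline still has the preset shape


def check_tax_number_region(
    comparisons: Sequence[GlyphComparison], c0: int, m0: int
) -> bool:
    """Return True if the tax-number region meets the requirement."""
    # Layer 1: text number C must equal the preset count C0.
    if len(comparisons) != c0:
        return False

    # Layer 2: number M of difference characters against the preset M0.
    differing = [c for c in comparisons if c.differs]
    m = len(differing)
    if m == 0:
        return True
    if m > m0:
        return False

    # Layer 3: every difference character must keep the preset outline
    # (assumed to mean the print is merely broken, not a different character).
    return all(c.outline_matches for c in differing)
```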
Specifically, when the text content of the first keyword region and the text content of the second keyword region both meet the requirements, the storage module acquires the numerical amount P in the text content of the third keyword region, acquires the numerical amount R of the fourth keyword region and the numerical amount H of the fifth keyword region, and verifies the numerical amount of the third keyword region,
when P = R + H, the storage module judges that the numerical amount is normal, and stores the numerical amount P;
and when P is not equal to R + H, the storage module judges that the invoice characters are unclear, stores both the amount written in characters and the numerical amount P of the third keyword region, and prompts the user with both amounts.
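The amount check P = R + H can be illustrated with exact decimal arithmetic. This is a minimal sketch under stated assumptions: `verify_total` and the returned dictionary keys are hypothetical, the amounts are assumed to arrive as decimal strings from text recognition, and `Decimal` is chosen here to avoid binary floating-point rounding when comparing currency values.

```python
from decimal import Decimal


def verify_total(p: str, r: str, h: str, amount_in_words: str) -> dict:
    """Check the total P against R + H and decide what to store.

    p, r, h          -- amounts of the third, fourth and fifth keyword regions
    amount_in_words  -- the total as written in characters on the invoice
    """
    total = Decimal(p)
    if total == Decimal(r) + Decimal(h):
        # Amount is consistent: store the numerical amount only.
        return {"amount": total, "needs_confirmation": False}
    # Mismatch: keep both forms so the user can be prompted to confirm.
    return {
        "amount": total,
        "amount_in_words": amount_in_words,
        "needs_confirmation": True,
    }
```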
Specifically, when storing, the storage module is further used for establishing a file; a user name and a number are entered when the file is established, so that the invoice image and the invoice amount are classified and stored in the corresponding user file, and the storage module counts and stores the invoice amounts of the same user by month for easy reference.
Specifically, in this embodiment, after the invoice is judged to be valid, the storage module verifies and stores the invoice amount, which improves invoice reimbursement efficiency. When the acquired amount is abnormal because the invoice characters are unclear, the user is prompted to confirm it, which guarantees the accuracy of the invoice amount statistics. By counting in units of months, this embodiment can quickly calculate the amount each user is waiting to have reimbursed each month, thereby further improving the data processing efficiency of invoice reimbursement.
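The per-user monthly statistics can be sketched as a simple aggregation. The function name `monthly_totals` and the record layout `(user_id, invoice_date, amount)` are assumptions made for illustration; the source states only that amounts of the same user are counted by month and does not say which date determines the month.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal


def monthly_totals(records):
    """Sum invoice amounts per (user, month).

    records -- iterable of (user_id, invoice_date, amount) tuples, where
               invoice_date is a datetime.date and amount is a Decimal.
    """
    totals = defaultdict(Decimal)  # Decimal() starts each bucket at 0
    for user_id, invoice_date, amount in records:
        month_key = invoice_date.strftime("%Y-%m")
        totals[(user_id, month_key)] += amount
    return dict(totals)
```

Keying on the pair of user and month keeps one running total per user file per month, which matches the lookup described above.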
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. A data processing method based on image analysis is characterized by comprising the following steps,
step S1, scanning the invoice to be reimbursed through scanning equipment;
step S2, acquiring an invoice image scanned by the scanning equipment through an image acquisition module;
step S3, adjusting the direction of the invoice image through an adjusting module; when the direction of the invoice image is adjusted, the direction of the invoice image is adjusted according to the length-width ratio A of the invoice image by the adjusting module;
step S4, carrying out area division on the invoice image with the direction adjusted through a partition module to form a plurality of keyword areas;
step S5, acquiring the text content of the keyword area through an acquisition module;
step S6, judging whether the invoice meets the requirement through a judging module;
step S7, storing the text content of invoices that meet the requirement through a storage module;
in the step S6, when the determining module determines the text content in the first keyword region of the invoice image, the determining module performs preliminary determination on the first keyword region according to the text number B in the first keyword region, if the text number B in the first keyword region meets the requirement, performs the next determination according to the difference text number D, if the difference text number D meets the requirement, performs the determination according to the number F of strokes having a difference in the difference text, and if the number F of strokes having a difference meets the requirement, performs the final determination on the first keyword region of the invoice image according to the difference length G of the difference strokes;
after the text content of the first keyword area meets the requirement, the judging module judges the second keyword area of the invoice image according to the text number C of the second keyword area of the invoice image, when the text number C of the second keyword area meets the requirement, the next judgment is carried out according to the distinguishing text number M, and if the distinguishing text number M meets the requirement, the final judgment is carried out on the second keyword area of the invoice image according to the outline of the distinguishing text.
2. The data processing method based on image analysis according to claim 1, wherein in step S3, when the adjusting module adjusts the invoice image, the adjusting module first obtains an aspect ratio A of the invoice image and adjusts the orientation of the invoice image according to the aspect ratio A, wherein,
when A is less than 1, the adjusting module rotates the invoice image clockwise by 90 degrees so that A is greater than 1;
when A is larger than 1, the adjusting module obtains an elliptical graphic area in the middle of the invoice image, adjusts the invoice image according to the position of the elliptical graphic area, does not adjust when the elliptical graphic area is positioned above the invoice image, and rotates the invoice image 180 degrees clockwise when the elliptical graphic area is positioned below the invoice image.
3. The image-analysis-based data processing method of claim 1, wherein in step S4, the partition module partitions the adjusted invoice image according to the positional relationship of the invoice frame structure, partitions the purchaser name area into a first keyword area, partitions the purchaser taxpayer identification number area into a second keyword area, partitions the price and tax total area into a third keyword area, partitions the amount area into a fourth keyword area, and partitions the tax amount area into a fifth keyword area.
4. The image analysis-based data processing method of claim 1, wherein in the step S6, when the determining module determines the text content of the first keyword region, the determining module first obtains the text number B and compares the text number B with the text number B0 of the preset name, and performs text analysis according to the comparison result,
when B is not equal to B0, the judging module judges that the invoice is invalid and stops comparing the text content;
when B = B0, the judging module compares the text contents of the first keyword area one by one, wherein,
the judging module compares the single character shape of the first keyword region with the character shape with the same sequence in the preset name in sequence, takes the character with the different shape in the character content as the difference character, and obtains the difference character quantity D, the judging module compares the difference character quantity D with the preset difference character quantity D0, and judges the character content of the first keyword region according to the comparison result, wherein,
when D = 0, the judging module judges that the text content of the first keyword region meets the requirement;
when D is more than 0 and less than or equal to D0, the judging module analyzes and compares the difference characters in the character contents in detail;
when D > D0, the determination module determines that the invoice is invalid.
5. The image-analysis-based data processing method according to claim 4, wherein when the judging module performs detailed analysis and comparison on the difference text in the text content, the judging module compares the stroke number F with difference in the single difference text with a preset difference stroke number F0, and determines according to the comparison result, wherein,
when F is larger than F0, the judging module judges that the invoice is invalid;
when F is less than or equal to F0, the judging module carries out the next judgment according to the difference length G of the difference strokes, wherein,
when the stroke directions of the different strokes in the different characters are different, the judging module judges that the invoice is invalid;
when the difference strokes in the difference characters only have the difference in stroke length, the judging module compares the difference length G of the difference strokes with the preset difference length G0 and judges according to the comparison result, wherein,
when G is less than or equal to G0, the judging module judges that the invoice characters are unclear, judges that the difference characters are the same as the characters corresponding to the preset name, and judges that the character content in the first keyword region meets the requirement;
when G is larger than G0, the judging module judges that the difference characters are different from the corresponding characters in the preset name, and judges that the invoice is invalid.
6. The data processing method based on image analysis according to claim 5, wherein the judging module obtains the text number C of the second keyword region after the text content of the first keyword region meets the requirement, compares the text number C of the second keyword region with the text number C0 of the preset tax number, and makes a judgment according to the comparison result, wherein,
when C is not equal to C0, the judging module judges that the invoice is invalid;
when C = C0, the judging module compares the characters of the second keyword region with the preset tax number one by one in sequence, and takes each character whose graphic differs as a difference character; the judging module compares the number M of difference characters with the preset number M0 of difference characters, and judges according to the comparison result, wherein,
when M is larger than M0, the judging module judges that the invoice is invalid;
when M is more than 0 and less than or equal to M0, the judging module carries out next judgment according to the distinguishing position of the distinguishing characters;
and when M = 0, the judging module judges that the text content of the second keyword region meets the requirement.
7. The data processing method based on image analysis according to claim 6, wherein the judging module obtains the distinguishing position of the distinguishing character, and when the outline of the distinguishing character is the same as the shape of the corresponding character in the preset tax number but the shape curve of the distinguishing character is discontinuous, the judging module judges that the invoice character is unclear, judges that the distinguishing character is the same as the corresponding character in the preset tax number, and judges that the text content of the second keyword region meets the requirement;
and when the outline of the distinguishing character is different from the shape of the corresponding character in the preset tax number, the judging module judges that the invoice is invalid.
8. The data processing method based on image analysis according to claim 7, wherein when the text content of the first keyword region and the text content of the second keyword region both meet the requirements, the storage module obtains the numerical sum P in the text content of the third keyword region, obtains the numerical sum R of the fourth keyword region and the numerical sum H of the fifth keyword region, and verifies the numerical sum of the third keyword region, wherein,
when P = R + H, the storage module judges that the numerical amount is normal, and stores the numerical amount P;
and when P is not equal to R + H, the storage module judges that the invoice characters are unclear, stores both the amount written in characters and the numerical amount P of the third keyword region, and prompts the user with both amounts.
9. The data processing method based on image analysis as claimed in claim 8, wherein the storage module is further configured to create a file when storing, input a user name and a number when creating the file, store the invoice image and the invoice amount in a classified manner into the corresponding user file, and statistically store the invoice amount of the same user in months for reference.
10. A system applying the data processing method based on image analysis according to any one of claims 1 to 9, comprising,
the image acquisition module is used for acquiring the invoice image scanned by the external scanning equipment and is connected with the adjustment module;
the adjusting module is used for adjusting the direction of the scanned invoice image and is connected with the partitioning module;
the partition module is used for carrying out regional partition on the adjusted invoice image to form a plurality of keyword regions and is connected with the acquisition module;
the acquiring module is used for acquiring the text content of the keyword area and is connected with the judging module;
the judging module is used for judging whether the invoice meets the requirement according to the text content and is connected with the storage module;
and the storage module is used for storing the invoice image and the invoice content which meet the requirements.
CN202111128539.1A 2021-09-26 2021-09-26 Data processing method and system based on image analysis Pending CN113869190A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111128539.1A CN113869190A (en) 2021-09-26 2021-09-26 Data processing method and system based on image analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111128539.1A CN113869190A (en) 2021-09-26 2021-09-26 Data processing method and system based on image analysis

Publications (1)

Publication Number Publication Date
CN113869190A true CN113869190A (en) 2021-12-31

Family

ID=78994508

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111128539.1A Pending CN113869190A (en) 2021-09-26 2021-09-26 Data processing method and system based on image analysis

Country Status (1)

Country Link
CN (1) CN113869190A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114119491A (en) * 2021-10-29 2022-03-01 吉林医药学院 Data processing system based on medical image analysis
CN114119491B (en) * 2021-10-29 2022-09-13 吉林医药学院 Data processing system based on medical image analysis
CN117611363A (en) * 2023-10-25 2024-02-27 浙江爱信诺航天信息技术有限公司 Online verification method and medium for certificates

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination