CN115457567B - Digital missing recovery method, system, equipment and storage medium for bill amount - Google Patents

Publication number
CN115457567B
Authority
CN
China
Prior art keywords
character
character string
digit
digital
amount
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211408627.1A
Other languages
Chinese (zh)
Other versions
CN115457567A (en)
Inventor
吴春尧
王殿才
毛晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongke Wanguo Internet Technology Co ltd
Original Assignee
Beijing Zhongke Wanguo Internet Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Wanguo Internet Technology Co ltd filed Critical Beijing Zhongke Wanguo Internet Technology Co ltd
Priority to CN202211408627.1A priority Critical patent/CN115457567B/en
Publication of CN115457567A publication Critical patent/CN115457567A/en
Application granted granted Critical
Publication of CN115457567B publication Critical patent/CN115457567B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/412Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/42Document-oriented image-based pattern recognition based on the type of document

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Character Discrimination (AREA)

Abstract

The embodiments of the invention disclose a method, a system, a device and a storage medium for recovering missing digits of a bill amount. The approach makes full use of both the Chinese amount information and the numeric amount information obtained from bill image recognition. It first derives candidate numeric amount character strings from the numeric amount information, then compares each candidate with the numeric amount information converted from the Chinese amount information to determine the comparison base point at which the decimal point is located and to select the optimal numeric amount character string. The optimal string is then compared with the numeric amount information converted from the Chinese amount information, and missing amount digits are filled in position by position. For the various cases of missing bill amount digits, the valuable data in the Chinese amount information and the numeric amount information corroborate each other to recover digits position by position, which effectively improves the accuracy of bill amount recognition.

Description

Digital missing recovery method, system, equipment and storage medium for bill amount
Technical Field
The application relates to the technical field of bill information processing, in particular to a method, a system, equipment and a storage medium for recovering digital missing of bill amount.
Background
When a bill is recognized using OCR (Optical Character Recognition), the amount information is the core data of the bill. The face of a typical bill carries two kinds of amount information: Chinese amount information and numeric amount information. To ensure accurate recognition, the two are often cross-checked against each other to obtain the final bill amount recognition result.
With the spread of digitization, bill face information is often processed electronically, for example by scanning or photographing. Because the image is prone to shadows or distortion and the recognition capability of OCR algorithms is limited, the recognition result inevitably has missing data, such as a missing decimal point, missing leading or trailing data, and missing intermediate data.
At present, missing bill amount recognition data is handled by converting the numeric amount information into Chinese amount information and then performing a preliminary cross-check between the two to obtain the bill amount. Current approaches do not analyze missing bill amount digits deeply enough, cannot perform targeted recovery, and cannot guarantee the accuracy of bill amount recognition.
Disclosure of Invention
Therefore, the application provides a method, a system, a device and a storage medium for recovering missing digits of a bill amount, to solve the technical problem that the prior art cannot perform targeted recovery for the various cases of missing bill amount digits, which leads to low accuracy of bill amount recognition.
In order to achieve the above purpose, the present application provides the following technical solutions:
according to a first aspect of the embodiments of the present invention, an embodiment of the present application provides a method for recovering missing digits of a bill amount, where the method includes:
collecting a bill picture;
identifying the bill picture to obtain a bill amount identification result, wherein the bill amount identification result comprises: a Chinese amount string and a first numeric amount string;
converting the Chinese amount character string into a second digital amount character string;
counting a first character digit count of the first numeric amount character string and a second character digit count of the second numeric amount character string;
comparing the first character digit count with the second character digit count, and selecting the minimum digit count;
sequentially selecting decimal point candidate positions, starting from the last position of the first numeric amount character string and moving forward, and adding a decimal point at each, so as to generate a preset number of third numeric amount character strings and write them into a decimal point candidate set;
subtracting, position by position, the digit of the second numeric amount character string from the corresponding digit of each third numeric amount character string, and recording the number of times the difference is 0;
dividing the recorded count for each third numeric amount character string by the minimum digit count to obtain a reference ratio;
selecting the third numeric amount character string with the maximum reference ratio as a fourth numeric amount character string, and taking the position of the decimal point in the fourth numeric amount character string as a comparison base point;
comparing the digits of the fourth numeric amount character string with those of the second numeric amount character string position by position, based on the comparison base point;
judging whether the comparison results are consistent;
if the comparison results are not consistent, judging whether one of the two differing digits is 0;
and if one of the two differing digits is 0, determining that a bill amount digit is missing, retaining the digits that are identical at each position, filling the missing position with whichever of the two differing digits is not 0, and generating a recovered fifth numeric amount character string.
Further, if the comparison results are completely consistent, no bill amount digit is missing, and the fourth numeric amount character string or the first numeric amount character string is used as the recovered fifth numeric amount character string.
Preferably, the preset number of third numeric amount character strings is 3.
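The selection of the decimal point by reference ratio, as described in the steps above, can be sketched in Python as follows. This is a minimal illustration and not the patented implementation: the function names are hypothetical, the strings are assumed to contain only ASCII digits and at most one decimal point, and the position-by-position subtraction is assumed to be aligned at the decimal point (integer digits compared from the decimal point leftward, fractional digits rightward).

```python
def count_digits(amount: str) -> int:
    """Number of digit characters in an amount string (the decimal point is ignored)."""
    return sum(ch.isdigit() for ch in amount)


def zero_difference_count(candidate: str, second: str) -> int:
    """Count aligned positions where candidate digit minus second digit equals 0.

    Assumption: both strings are aligned at their decimal points.
    """
    c_int, _, c_frac = candidate.partition(".")
    s_int, _, s_frac = second.partition(".")
    count = 0
    # Integer part: walk leftward from the decimal point.
    for x, y in zip(reversed(c_int), reversed(s_int)):
        if int(x) - int(y) == 0:
            count += 1
    # Fractional part: walk rightward from the decimal point.
    for x, y in zip(c_frac, s_frac):
        if int(x) - int(y) == 0:
            count += 1
    return count


def select_fourth(candidates, first: str, second: str):
    """Return (fourth string, its reference ratio) from the decimal point candidate set."""
    min_digits = min(count_digits(first), count_digits(second))
    if min_digits == 0:
        raise ValueError("both amount strings must contain at least one digit")
    ratios = {c: zero_difference_count(c, second) / min_digits for c in candidates}
    fourth = max(candidates, key=ratios.get)
    return fourth, ratios[fourth]


if __name__ == "__main__":
    cands = ["123456.", "12345.6", "1234.56"]
    print(select_fourth(cands, "123456", "1234.56"))
```

When two candidates have the same ratio, `max` keeps the earlier one; the text does not say how ties should be resolved, so that behaviour is an artifact of the sketch.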
Preferably, the method further comprises:
sequentially selecting target characters from the first position of the Chinese amount character string backwards;
determining a first reference character based on the target character, wherein the first reference character is positioned at the adjacent next character position of the target character, and when the target character is the last character, the first reference character is empty;
judging the character type of the target character;
if the target character is a numeric character, judging the character type of the first reference character;
if the first reference character is a numeric character, determining and recording that a first digit missing problem exists in the character position of the Chinese amount character string between the target character and the first reference character, and circulating to reselect a next target character;
if the first reference character is a quantile character, circulating to reselect the next target character;
and if the first reference character is empty, determining and recording that a second digit missing problem exists in the character position of the Chinese amount character string behind the target character, and circulating to reselect the next target character.
Further, the method further comprises:
determining a second reference character based on the target character, wherein the second reference character is positioned at the adjacent previous character position of the target character, and when the target character is the first character, the second reference character is empty;
if the target character is a quantile character, judging the character type of the second reference character;
if the second reference character is a numeric character, looping to reselect the next target character;
if the second reference character is a quantile character, determining and recording that a third digit missing problem exists in the character position of the Chinese amount character string between the target character and the second reference character, and circulating to reselect a next target character;
and if the second reference character is empty, determining and recording that a fourth digit missing problem exists in the character position of the Chinese amount character string before the target character, and circulating to reselect the next target character.
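The two scans of the Chinese amount character string described above (next-character check for numeric characters, previous-character check for quantile characters) can be sketched as one loop. This is an illustration under stated assumptions: the character sets below are the standard uppercase financial numerals and place-value characters, chosen for the example rather than taken from the patent, and the function name and the `(index, problem number)` output format are hypothetical. Characters in neither set are simply skipped.

```python
# Assumed character sets (not specified verbatim in the text).
NUMERIC = set("零壹贰叁肆伍陆柒捌玖")
QUANTILE = set("拾佰仟万元角分")


def scan_chinese_amount(text: str):
    """Return a list of (index, problem number) pairs following the stated rules.

    Problem 1: numeric character followed by another numeric character.
    Problem 2: numeric character with nothing after it.
    Problem 3: quantile character preceded by another quantile character.
    Problem 4: quantile character with nothing before it.
    """
    problems = []
    for i, target in enumerate(text):
        first_ref = text[i + 1] if i + 1 < len(text) else None   # next character
        second_ref = text[i - 1] if i > 0 else None               # previous character
        if target in NUMERIC:
            if first_ref is None:
                problems.append((i, 2))
            elif first_ref in NUMERIC:
                problems.append((i, 1))
        elif target in QUANTILE:
            if second_ref is None:
                problems.append((i, 4))
            elif second_ref in QUANTILE:
                problems.append((i, 3))
    return problems


if __name__ == "__main__":
    print(scan_chinese_amount("壹仟贰佰叁拾肆元伍角陆分"))  # complete amount
    print(scan_chinese_amount("壹仟贰佰拾肆元伍角陆"))      # two characters lost
```

The rules are applied literally here, so legitimate sequences such as a quantile character directly followed by another quantile character are also reported; a production system would need to treat those cases explicitly.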
Further, the method further comprises:
when a bill amount digit is missing, judging the position of the missing digit in the fourth digital sum character string/the second digital sum character string;
if the missing digit is at the head of the fourth digital sum character string/the second digital sum character string, determining and recording that a fifth digit missing problem exists in the head of the fourth digital sum character string/the second digital sum character string, and circulating to reselect a next target character;
if the missing digit is at the last digit of the fourth digital sum character string/the second digital sum character string, determining and recording that a sixth digit missing problem exists in the last digit of the fourth digital sum character string/the second digital sum character string, and circulating to reselect a next target character;
if the missing digit is in the middle of the fourth digital sum character string/the second digital sum character string, determining and recording that a seventh digit missing problem exists in the middle of the fourth digital sum character string/the second digital sum character string, and circulating to reselect a next target character;
and generating a problem character string corresponding to the fifth digital sum character string according to the corresponding digital missing problems on all the recorded digital missing positions.
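The head, tail and middle classification above can be sketched as follows. The function names are hypothetical, and the format of the problem character string (one digit per missing position, in position order) is an assumption made for the example; the text does not define the format.

```python
def classify_missing_position(index: int, length: int) -> int:
    """Map a missing position to problem 5 (head), 6 (tail) or 7 (middle)."""
    if not 0 <= index < length:
        raise ValueError("index out of range")
    if index == 0:
        return 5
    if index == length - 1:
        return 6
    return 7


def problem_string(missing_indices, length: int) -> str:
    """Concatenate the problem numbers of all recorded missing positions (assumed format)."""
    return "".join(
        str(classify_missing_position(i, length)) for i in sorted(missing_indices)
    )


if __name__ == "__main__":
    # Positions 0, 3 and 5 of a six-digit amount are missing.
    print(problem_string([0, 3, 5], 6))
```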
Further, the method further comprises:
obtaining a credibility grade corresponding to the fifth numeric money character string by using the problem character string based on a preset corresponding relation between each digit missing problem and the credibility grade;
judging whether the reliability level corresponding to the obtained fifth numeric money character string is unique or not;
if the reliability level corresponding to the obtained fifth digital money character string is not unique, outputting the lowest reliability level;
and if the obtained credibility grade corresponding to the fifth digital money character string is unique, outputting the current credibility grade.
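The credibility lookup above reduces to taking the lowest grade among the grades of all recorded problems. A minimal sketch, assuming the problem character string holds one problem number per character, that grades are integers where a smaller value means lower credibility, and that the mapping is supplied by the caller (the actual preset correspondence is not given in the text):

```python
from typing import Mapping, Optional


def credibility_grade(problem_string: str, grade_of: Mapping[str, int]) -> Optional[int]:
    """Return the credibility grade for a recovered amount, or None if no problem was recorded."""
    grades = {grade_of[p] for p in problem_string}
    if not grades:
        return None
    # A single grade is returned as-is; several distinct grades collapse to the lowest.
    return min(grades)


if __name__ == "__main__":
    hypothetical_mapping = {"5": 1, "6": 2, "7": 3}  # invented values for illustration
    print(credibility_grade("576", hypothetical_mapping))
```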
According to a second aspect of the embodiments of the present invention, there is provided a system for recovering a digital missing of a bill amount, the system including:
the acquisition module is used for acquiring the bill pictures;
the identification module is used for identifying the bill picture to obtain a bill amount identification result, and the bill amount identification result comprises: a Chinese amount string and a first numeric amount string;
the conversion module is used for converting the Chinese amount character string into a second digital amount character string;
the counting module is used for counting a first character digit of the first digital amount character string and a second character digit of the second digital amount character string; comparing the values of the first character digit and the second character digit, and selecting a minimum digit value;
a decimal candidate acquisition module used for sequentially adding decimal points to the first numeric sum character string from the back of the last character to the front, generating a preset number of third numeric sum character strings and writing the third numeric sum character strings into a decimal candidate set;
the bit difference comparison module is used for subtracting the corresponding digit character number of the second digital sum character string from the character number of each third digital sum character string and recording the times of subtracting the digit character number of the second digital sum character string to be 0; dividing the recording result corresponding to each third numeric sum character string by the minimum value of the number of digits to obtain a reference ratio;
the integral number judgment module is used for selecting a third digital sum character string corresponding to the maximum reference ratio as a fourth digital sum character string, and the position of a decimal point in the fourth digital sum character string is a comparison base point;
the detection module is used for comparing the character numbers of the fourth digital sum character string and the second digital sum character string according to the comparison base point; judging whether the number of the bill amount is missing or not based on the comparison result;
the recovery module is used for judging whether one of the two different character numbers is 0 or not if the comparison results are inconsistent; and if one of the two different character numbers is 0, the bill amount digit is lost, the same character number is reserved according to the position, the character number which is not 0 in the two different character numbers is utilized to carry out the position complementing on the corresponding character lost position, and the recovered fifth numeric amount character string is generated.
Further, the recovery module is further configured to perform the following steps: and if the comparison result is completely consistent, the number of the bill sum is not lost, and the fourth digital sum character string or the second digital sum character string is used as the recovered fifth digital sum character string.
Preferably, the preset number of the third numeric money character string is 3.
Preferably, the conversion module is further configured to perform the following steps:
sequentially selecting target characters from the first position of the Chinese amount character string backwards;
determining a first reference character based on the target character, wherein the first reference character is positioned at the adjacent next character position of the target character, and when the target character is the last character, the first reference character is empty;
judging the character type of the target character;
if the target character is a numeric character, judging the character type of the first reference character;
if the first reference character is a numeric character, determining and recording that a first digit missing problem exists in the character position of the Chinese amount character string between the target character and the first reference character, and circulating to reselect a next target character;
if the first reference character is a quantile character, circulating to reselect the next target character;
and if the first reference character is empty, determining and recording that a second digit missing problem exists in the character position of the Chinese amount character string behind the target character, and circulating to reselect the next target character.
Preferably, the conversion module is further configured to perform the following steps:
determining a second reference character based on the target character, wherein the second reference character is positioned at the adjacent previous character position of the target character, and when the target character is the first character, the second reference character is empty;
if the target character is a quantile character, judging the character type of the second reference character;
if the second reference character is a numeric character, looping to reselect the next target character;
if the second reference character is a quantile character, determining and recording that a third digit missing problem exists in the character position of the Chinese amount character string between the target character and the second reference character, and circulating to reselect a next target character;
and if the second reference character is empty, determining and recording that a fourth digit missing problem exists in the character position of the Chinese amount character string before the target character, and circulating to reselect the next target character.
Preferably, the recovery module is further configured to perform the following steps:
when a bill amount digit is missing, judging the position of the missing digit in the fourth digital sum character string/the second digital sum character string;
if the missing digit is at the head of the fourth digital sum character string/the second digital sum character string, determining and recording that a fifth digit missing problem exists in the head of the fourth digital sum character string/the second digital sum character string, and circulating to reselect a next target character;
if the missing digit is at the last digit of the fourth digital sum character string/the second digital sum character string, determining and recording that a sixth digit missing problem exists in the last digit of the fourth digital sum character string/the second digital sum character string, and circulating to reselect a next target character;
if the missing digit is in the middle of the fourth digital sum character string/the second digital sum character string, determining and recording that a seventh digit missing problem exists in the middle of the fourth digital sum character string/the second digital sum character string, and circulating to reselect a next target character;
and generating a problem character string corresponding to the fifth digital sum character string according to the corresponding digital missing problems on all the recorded digital missing positions.
Further, the system further comprises a credibility judgment module for executing the following steps:
obtaining a credibility grade corresponding to the fifth numeric sum character string by using the problem character string based on a preset corresponding relation between each digit missing problem and the credibility grade;
judging whether the reliability level corresponding to the obtained fifth digital money character string is unique or not;
if the reliability level corresponding to the obtained fifth digital money character string is not unique, outputting the lowest reliability level;
and if the obtained credibility grade corresponding to the fifth numeric money character string is unique, outputting the current credibility grade.
According to a third aspect of the embodiments of the present invention, there is provided a device for recovering missing digits of a bill amount, the device including: a processor and a memory;
the memory is configured to store one or more program instructions;
the processor is configured to execute the one or more program instructions to perform the steps of the method for recovering missing digits of a bill amount as described in any one of the above.
According to a fourth aspect of the embodiments of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method for recovering missing digits of a bill amount as described in any one of the above.
Compared with the prior art, the method has the following beneficial effects:
the embodiments of the invention make full use of both the Chinese amount information and the numeric amount information obtained from bill image recognition. Candidate numeric amount character strings are first derived from the numeric amount information and compared with the numeric amount information converted from the Chinese amount information, which determines the comparison base point at which the decimal point is located and selects the optimal numeric amount character string. The optimal string is then compared with the numeric amount information converted from the Chinese amount information, and missing amount digits are filled in position by position. For the various cases of missing bill amount digits, the valuable data in the Chinese amount information and the numeric amount information corroborate each other, which effectively improves the accuracy of bill amount recognition.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description are merely exemplary, and a person of ordinary skill in the art may derive other implementation drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a logical structure of a digital missing recovery system for bill amount according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a method for recovering a missing digit of a bill amount according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart illustrating a method for recovering a missing digit of a bill amount according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a process of identifying the type of missing digits problem in a method for recovering missing digits of a bill amount according to another embodiment of the present invention;
FIG. 5 is a schematic flow chart illustrating the process of determining the reliability in the method for recovering the digital missing of the bill amount according to the embodiment of the present invention.
Detailed Description
The present invention is described below by way of specific embodiments, and other advantages and benefits of the present invention will become apparent to those skilled in the art from this disclosure. All other embodiments obtained by a person skilled in the art without creative effort, based on the embodiments of the present invention, fall within the protection scope of the present invention.
The embodiments of the invention aim to perform targeted recovery for the various cases of missing bill amount digits and thereby improve the accuracy of bill amount recognition.
Referring to FIG. 1, an embodiment of the present application provides a system for recovering missing digits of a bill amount, which specifically includes: an acquisition module 1, an identification module 2, a conversion module 3, a statistic module 4, a decimal place candidate acquisition module 5, a place difference comparison module 6, an integer number judgment module 7, a detection module 8 and a recovery module 9.
Specifically:
the acquisition module 1 is used for acquiring a bill picture;
the identification module 2 is used for identifying the bill picture to obtain a bill amount identification result, and the bill amount identification result comprises: a Chinese amount string and a first numeric amount string;
the conversion module 3 is used for converting the Chinese amount character string into a second numeric amount character string;
the statistic module 4 is used for counting a first character digit count of the first numeric amount character string and a second character digit count of the second numeric amount character string, comparing the two counts, and selecting the minimum digit count;
the decimal place candidate acquisition module 5 is used for adding decimal points to the first numeric amount character string one position at a time, starting behind the last character and moving forward, generating a preset number of third numeric amount character strings, and writing them into a decimal candidate set;
the place difference comparison module 6 is used for subtracting, position by position, the digit of the second numeric amount character string from the corresponding digit of each third numeric amount character string, recording the number of times the difference is 0, and dividing the recorded count for each third numeric amount character string by the minimum digit count to obtain a reference ratio;
the integer number judgment module 7 is used for selecting the third numeric amount character string with the maximum reference ratio as a fourth numeric amount character string, where the position of the decimal point in the fourth numeric amount character string is the comparison base point;
the detection module 8 is used for comparing the digits of the fourth numeric amount character string with those of the second numeric amount character string position by position, based on the comparison base point, and judging from the comparison result whether a bill amount digit is missing;
the recovery module 9 is used for judging, if the comparison results are inconsistent, whether one of the two differing digits is 0; and if one of the two differing digits is 0, determining that a bill amount digit is missing, retaining the digits that are identical at each position, filling the missing position with whichever of the two differing digits is not 0, and generating the recovered fifth numeric amount character string.
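The position-by-position comparison and recovery performed by the detection and recovery modules can be sketched as below. This is a minimal illustration, not the patented implementation: the function name is hypothetical, both strings are assumed to be aligned at the decimal point and padded with 0 where one string has no digit, and the handling of two differing non-zero digits (keep the digit of the fourth string and report the position) is an assumption, since the text only covers the case where one of the two digits is 0.

```python
def recover_amount(fourth: str, second: str):
    """Return (fifth string, list of unresolved positions) from two aligned amount strings."""
    f_int, _, f_frac = fourth.partition(".")
    s_int, _, s_frac = second.partition(".")
    width_int = max(len(f_int), len(s_int))
    width_frac = max(len(f_frac), len(s_frac))
    # Pad so that the decimal point (the comparison base point) lines up.
    a = f_int.rjust(width_int, "0") + f_frac.ljust(width_frac, "0")
    b = s_int.rjust(width_int, "0") + s_frac.ljust(width_frac, "0")

    digits, unresolved = [], []
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            digits.append(x)          # identical digits are kept
        elif x == "0":
            digits.append(y)          # fill the gap with the non-zero digit
        elif y == "0":
            digits.append(x)
        else:
            digits.append(x)          # assumption: keep the fourth string's digit
            unresolved.append(i)
    joined = "".join(digits)
    return joined[:width_int] + "." + joined[width_int:], unresolved


if __name__ == "__main__":
    print(recover_amount("234.56", "1234.56"))   # leading digit lost in the numeric string
    print(recover_amount("1234.56", "1204.56"))  # digit lost in the converted Chinese string
```

Returning the unresolved positions lets a caller decide how to treat conflicts that this rule cannot settle.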
Compared with the prior art, the system makes full use of both the Chinese amount information and the numeric amount information obtained from bill image recognition. It first derives candidate numeric amount character strings from the numeric amount information, compares them with the numeric amount information converted from the Chinese amount information to determine the comparison base point at which the decimal point is located, and selects the optimal numeric amount character string. It then compares the optimal string with the numeric amount information converted from the Chinese amount information and fills in missing amount digits position by position. For the various cases of missing bill amount digits, the valuable data in the two kinds of amount information corroborate each other, which effectively improves the accuracy of bill amount recognition.
Corresponding to the above-disclosed digital missing recovery system of the bill amount, the embodiment of the invention also discloses a digital missing recovery method of the bill amount. The method for recovering the digital missing of the bill amount disclosed in the embodiment of the present invention is described in detail below with reference to the system for recovering the digital missing of the bill amount described above.
As shown in fig. 2, the following describes in detail specific steps of a method for recovering a missing digit of a bill amount according to an embodiment of the present application.
The bill picture is collected through the acquisition module 1.
The bill picture is a bill image obtained electronically, for example by scanning or photographing.
The bill picture is identified through the identification module 2 to obtain a bill amount identification result.
In the embodiment of the present invention, bill amount recognition is performed on the collected bill image using OCR (Optical Character Recognition). When the bill amount is recognized by OCR, the recognized data exhibits three types of problems:
1. Missing decimal point: the decimal point is generally small and is easily lost because of blurring or insufficient light. When it is missing from the recognized numeric amount information (lowercase amount information), the integer digits often cannot be determined, which makes bill amount recognition more difficult;
2. Missing leading or trailing data: because of the limited capability of the detection model, blurred boundaries and other reasons, data at the front or back of the amount is often lost;
3. Missing or replaced intermediate data: in addition to the loss of leading and trailing data, intermediate data may be missing or replaced.
The bill amount identification result comprises: a Chinese amount string and a first numeric amount string.
The Chinese amount character string (Chinese capital amount information) includes two types of data: numeric characters and quantile characters. The numeric characters are mainly Chinese numeric information, namely the Chinese capital numerals for zero, one, two, three, four, five, six, seven, eight, and nine. The quantile characters are mainly Chinese quantile information, which includes shi (ten), bai (hundred), qian (thousand), wan (ten thousand), yuan, jiao, fen, etc. In addition, the Chinese amount character string (Chinese capital amount information) may further include Chinese capture information, which may be a circle, a round, or the like.
The Chinese amount character string is converted into a second numeric amount character string by the conversion module 3.
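The conversion step can be illustrated with a minimal sketch. It assumes the standard Chinese capital numerals and quantile characters, assumes the string contains a yuan character, and does not attempt the missing-digit handling described later; the function name `chinese_to_numeric` and the sample strings are hypothetical and are not taken from the source.

```python
# Assumed character sets: standard capital numerals and small place units.
DIGITS = {ch: value for value, ch in enumerate("零壹贰叁肆伍陆柒捌玖")}
SMALL_UNITS = {"拾": 10, "佰": 100, "仟": 1000}


def chinese_to_numeric(text: str) -> str:
    """Convert a well-formed Chinese capital amount into a numeric amount string."""
    yuan = 0      # completed integer part, in yuan
    section = 0   # value accumulated below the current wan boundary
    digit = 0     # most recent numeric character not yet given a place
    fen = 0       # fractional part in hundredths (jiao * 10 + fen)
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            section += digit * SMALL_UNITS[ch]
            digit = 0
        elif ch == "万":
            yuan += (section + digit) * 10000
            section = digit = 0
        elif ch in "元圆":
            yuan += section + digit
            section = digit = 0
        elif ch == "角":
            fen += digit * 10
            digit = 0
        elif ch == "分":
            fen += digit
            digit = 0
    # Drop trailing zeros so that "2 jiao" yields ".2" rather than ".20".
    fraction = f"{fen:02d}".rstrip("0")
    return f"{yuan}.{fraction}" if fraction else str(yuan)


print(chinese_to_numeric("叁万伍仟叁佰贰元贰角"))
```

Places for which no character is present simply contribute 0, which is one plausible way a string with missing characters could still yield a value such as "35302.2".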
The statistics module 4 counts the number of character digits $M_1$ of the first numeric amount character string and the number of character digits $M_2$ of the second numeric amount character string, compares the values of $M_1$ and $M_2$, and selects the minimum number of digits $\min(M_1, M_2)$.
The decimal candidate acquisition module 5 adds a decimal point to the first numeric amount character string at successive positions, starting after the last character and moving toward the front, to generate a preset number of third numeric amount character strings, which are written into a decimal point candidate set.
Further, in the embodiment of the present invention, the optimal value of the preset number is 3.
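The digit counting and candidate generation steps can be sketched as follows. This is a minimal illustration; the helper names `count_digits` and `decimal_candidates` are hypothetical, and the sketch assumes that the decimal point itself is not counted as a character digit.

```python
def count_digits(amount: str) -> int:
    """Count character digits, ignoring the decimal point (assumption)."""
    return sum(ch.isdigit() for ch in amount)


def decimal_candidates(first_amount: str, preset_number: int = 3) -> list[str]:
    """Insert a decimal point after the last digit, then one position earlier each time."""
    candidates = []
    for shift in range(preset_number):
        cut = len(first_amount) - shift
        if cut <= 0:  # no digit would remain in front of the decimal point
            break
        candidates.append(first_amount[:cut] + "." + first_amount[cut:])
    return candidates


m1 = count_digits("63537228")
m2 = count_digits("35302.2")
print(min(m1, m2), decimal_candidates("63537228"))
```

With a preset number of 3, the candidates cover amounts with zero, one, or two decimal places.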
The digit difference comparison module 6 subtracts, position by position, the character number of the second numeric amount character string from the character number at the corresponding digit of each third numeric amount character string, and records the number of times $n$ that the difference is 0. The recorded result $n$ for each third numeric amount character string is divided by the minimum number of digits $\min(M_1, M_2)$ to obtain a reference ratio $n/\min(M_1, M_2)$.
The integer judgment module 7 selects the third numeric amount character string corresponding to the maximum reference ratio as the fourth numeric amount character string.
The position of the decimal point in the fourth numeric money character string is a comparison base point.
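The selection of the comparison base point can be sketched as below. The sketch assumes that digits are aligned at the decimal point and that only positions where both strings have a digit are compared; the names `count_matches` and `select_fourth_amount` are hypothetical.

```python
from fractions import Fraction


def count_matches(candidate: str, second_amount: str) -> int:
    """Count positions, aligned at the decimal point, where the digits are equal."""
    cand_int, _, cand_frac = candidate.partition(".")
    ref_int, _, ref_frac = second_amount.partition(".")
    # Integer digits are aligned right to left, starting next to the decimal point.
    int_hits = sum(a == b for a, b in zip(reversed(cand_int), reversed(ref_int)))
    # Fraction digits are aligned left to right, starting next to the decimal point.
    frac_hits = sum(a == b for a, b in zip(cand_frac, ref_frac))
    return int_hits + frac_hits


def select_fourth_amount(candidates, second_amount, min_digits):
    """Return the candidate with the largest reference ratio, plus all ratios."""
    ratios = [Fraction(count_matches(c, second_amount), min_digits) for c in candidates]
    best = max(range(len(candidates)), key=ratios.__getitem__)
    return candidates[best], ratios


candidates = ["63537228.", "6353722.8", "635372.28"]
fourth, ratios = select_fourth_amount(candidates, "35302.2", 6)
print(fourth, ratios)
```

When two candidates tie, this sketch keeps the earlier one; the source does not state a tie-breaking rule.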
Based on the comparison base point, the detection module 8 compares the character numbers of the fourth numeric amount character string and the second numeric amount character string position by position, and judges from the comparison result whether digits of the bill amount are missing.
Further, if the comparison result is completely consistent, no digit of the bill amount is missing, and the fourth numeric amount character string or the second numeric amount character string is used as the recovered fifth numeric amount character string.
If the comparison result is inconsistent, the recovery module 9 judges whether one of the two differing character numbers is 0. If one of them is 0, a digit of the bill amount is missing: identical character numbers are retained position by position, the non-zero character number of each differing pair is used to fill the corresponding missing position, and the recovered fifth numeric amount character string is generated.
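The position-by-position recovery can be sketched as follows. The sketch assumes that a position absent from one string is treated as 0 and that a pair of different non-zero digits cannot be reconciled; the function name `recover_amount` and the `None` return convention are hypothetical.

```python
def recover_amount(fourth_amount: str, second_amount: str):
    """Merge two amounts aligned at the decimal point; return None on a conflict."""
    a_int, _, a_frac = fourth_amount.partition(".")
    b_int, _, b_frac = second_amount.partition(".")
    int_len = max(len(a_int), len(b_int))
    frac_len = max(len(a_frac), len(b_frac))
    # Pad with zeros so that both strings cover the same digit positions.
    a = a_int.rjust(int_len, "0") + a_frac.ljust(frac_len, "0")
    b = b_int.rjust(int_len, "0") + b_frac.ljust(frac_len, "0")
    digits = []
    for x, y in zip(a, b):
        if x == y:
            digits.append(x)                      # identical digits are retained
        elif x == "0" or y == "0":
            digits.append(y if x == "0" else x)   # fill with the non-zero digit
        else:
            return None                           # both non-zero and different
    joined = "".join(digits)
    if frac_len == 0:
        return joined
    return joined[:int_len] + "." + joined[int_len:]


print(recover_amount("635372.28", "35302.2"))
```

A `None` result corresponds to the case in which neither differing digit is 0, where the bill picture would need to be acquired again.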
For example, after OCR recognition is performed on the collected bill picture, the obtained Chinese amount character string is "Sanwanwu, sanwu, two-element corner" and the obtained first numeric amount character string is "63537228". The Chinese amount character string is converted into the second numeric amount character string "35302.2". The number of character digits of the first numeric amount character string is $M_1 = 8$ and that of the second numeric amount character string is $M_2 = 6$; therefore, the minimum number of digits is $\min(M_1, M_2) = 6$.
Adding decimal points to the first numeric amount character string "63537228", starting after the last character and moving toward the front, generates the following 3 third numeric amount character strings: $m_1$ = "63537228.", $m_2$ = "6353722.8", and $m_3$ = "635372.28", which form the decimal point candidate set. The character numbers of the second numeric amount character string "35302.2" are subtracted from the character numbers at the corresponding digits of "63537228.", "6353722.8", and "635372.28", respectively, and the number of times the difference is 0 is recorded for each, giving $n_1 = 1$, $n_2 = 1$, and $n_3 = 5$. Dividing each recorded result by the minimum number of digits $\min(M_1, M_2) = 6$ gives the reference ratios 1/6, 1/6, and 5/6, respectively.
The third numeric amount character string $m_3$ = "635372.28", which corresponds to the maximum reference ratio of 5/6, is selected as the fourth numeric amount character string. The position of the decimal point in the fourth numeric amount character string "635372.28" is the comparison base point. The comparison base point serves as the amount comparison base point: the data of the fourth numeric amount character string "635372.28" and the second numeric amount character string "35302.2" are identified before and after this base point. This is necessary because, when digits are missing from the Chinese amount information or the numeric amount information, the comparison position cannot be determined from the first or last digit of the original data; otherwise, no valid identification result can be obtained.
The fourth numeric amount character string "635372.28" and the second numeric amount character string "35302.2" are then compared position by position. The comparison result is clearly inconsistent, so it is judged whether one of the two differing character numbers is 0. If neither of the two differing character numbers is 0, the bill picture is acquired again. If one of them is 0, a digit of the bill amount is missing: identical character numbers are retained position by position, the non-zero character number of each differing pair is used to fill the corresponding missing position, and the recovered fifth numeric amount character string "635372.28" is generated.
In summary, the embodiments of the present invention disclose a method, a system, a device, and a storage medium for recovering missing digits of a bill amount. The embodiments make full use of the Chinese amount information and the numeric amount information obtained from bill image recognition: candidate numeric amount character strings are derived from the numeric amount information and compared with the numeric amount information converted from the Chinese amount information to determine the comparison base point where the decimal point is located and to select the optimal numeric amount character string, which is then compared with the numeric amount information converted from the Chinese amount information to recover the missing digits position by position. For the various cases of missing digits in bill amounts, the valuable data in the Chinese amount information and the numeric amount information are cross-checked against each other and used for position-by-position recovery, which effectively improves the accuracy of bill amount recognition.
In the embodiment of the present invention, the digit missing problems are also classified into specific types, so that the specific type of each digit missing problem can be identified, as described in detail below.
Specifically, the types of digit missing problems involved in embodiments of the present invention include a first, a second, a third, a fourth, a fifth, a sixth, and a seventh digit missing problem. For example, suppose the correct Chinese amount character string is the capital-numeral form of "635372.28" and the correct numeric amount character string is "635372.28". If the recognized Chinese amount character string lacks the quantile character between "three" and "seven", the character position between "three" and "seven" has the first digit missing problem, which is the missing of Chinese quantile information. If the recognized Chinese amount character string ends with the numeric character "two", the character position after "two" has the second digit missing problem, which is the missing of the last position of the Chinese amount information. If the recognized Chinese amount character string lacks the numeric character between "qian" and "bai", the character position between "qian" and "bai" has the third digit missing problem, which is the missing of Chinese numeric information. If the recognized Chinese amount character string begins with the quantile character "shi", the character position before "shi" has the fourth digit missing problem, which is the missing of the first position of the Chinese amount information.
According to the above-disclosed digital missing recovery system of the bill amount, the conversion module 3 in the digital missing recovery system of the bill amount is further configured to perform the following steps.
Further, referring to fig. 3, target characters are selected in sequence from the first character of the Chinese amount character string backwards; a first reference character is determined based on the target character, wherein the first reference character is located at the character position immediately after the target character, and when the target character is the last character, the first reference character is empty; the character type of the target character is judged; if the target character is a numeric character, the character type of the first reference character is judged; if the first reference character is a numeric character, it is determined and recorded that the Chinese amount character string has a first digit missing problem at the character position between the target character and the first reference character, and the loop continues with the next target character; if the first reference character is a quantile character, the loop continues with the next target character; and if the first reference character is empty, it is determined and recorded that the Chinese amount character string has a second digit missing problem at the character position after the target character, and the loop continues with the next target character.
Further, referring to fig. 3, a second reference character is determined based on the target character, wherein the second reference character is located at the character position immediately before the target character, and when the target character is the first character, the second reference character is empty; if the target character is a quantile character, the character type of the second reference character is judged; if the second reference character is a numeric character, the loop continues with the next target character; if the second reference character is a quantile character, it is determined and recorded that the Chinese amount character string has a third digit missing problem at the character position between the target character and the second reference character, and the loop continues with the next target character; and if the second reference character is empty, it is determined and recorded that the Chinese amount character string has a fourth digit missing problem at the character position before the target character, and the loop continues with the next target character.
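The two scans over the Chinese amount character string can be sketched together as below. The character sets are an assumption (standard capital numerals and quantile characters), and the function name `scan_chinese_amount` and the gap-index convention are hypothetical. The sketch applies the rules literally, so legitimate sequences such as a zero followed by another numeral would also be flagged.

```python
# Assumed character classes; the source lists them only in translated form.
NUMERIC = set("零壹贰叁肆伍陆柒捌玖")
QUANTILE = set("拾佰仟万元角分")


def scan_chinese_amount(text: str) -> list[tuple[int, int]]:
    """Return (problem_number, gap_index) pairs; gap i lies just before text[i]."""
    problems = []
    for i, target in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else None   # first reference character
        prev = text[i - 1] if i > 0 else None              # second reference character
        if target in NUMERIC:
            if nxt is None:
                problems.append((2, i + 1))   # nothing follows the last numeral
            elif nxt in NUMERIC:
                problems.append((1, i + 1))   # quantile missing between two numerals
        elif target in QUANTILE:
            if prev is None:
                problems.append((4, 0))       # nothing precedes the first quantile
            elif prev in QUANTILE:
                problems.append((3, i))       # numeral missing between two quantiles
    return problems


print(scan_chinese_amount("陆拾叁万伍仟叁柒拾贰元贰角捌分"))
```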
According to the above-disclosed digital missing recovery system for the amount of the bill, the recovery module 9 in the digital missing recovery system for the amount of the bill is further configured to perform the following steps.
Further, referring to fig. 4, when a digit of the bill amount is missing, the position of the missing digit in the fourth numeric amount character string/the second numeric amount character string is judged; if the missing digit is at the first digit of the fourth numeric amount character string/the second numeric amount character string, it is determined and recorded that a fifth digit missing problem exists at the first digit of that character string, and the loop continues with the next target character; if the missing digit is at the last digit of the fourth numeric amount character string/the second numeric amount character string, it is determined and recorded that a sixth digit missing problem exists at the last digit of that character string, and the loop continues with the next target character; if the missing digit is at a middle digit of the fourth numeric amount character string/the second numeric amount character string, it is determined and recorded that a seventh digit missing problem exists at the middle digit of that character string, and the loop continues with the next target character; and a problem character string corresponding to the fifth numeric amount character string is generated from the digit missing problems recorded for all missing digit positions.
For example, when the current fourth numeric amount character string is "635372.2" and the second numeric amount character string is "35302.28", the two character strings are compared position by position based on the comparison base point. The last character number "8" of the second numeric amount character string "35302.28" is inconsistent with the character number at the corresponding digit of the fourth numeric amount character string "635372.2", and the first character number "6" and the middle character number "7" of the fourth numeric amount character string "635372.2" are both inconsistent with the character numbers at the corresponding digits of the second numeric amount character string "35302.28". In each of the three inconsistent pairs, one of the two character numbers is 0, so it is determined that digits of the bill amount are missing. The missing digit corresponding to the character number "6" is at the first digit of the fourth numeric amount character string/the second numeric amount character string, so it is determined that the second numeric amount character string has the fifth digit missing problem at the first digit. The missing digit corresponding to the character number "8" is at the last digit, so it is determined that the fourth numeric amount character string has the sixth digit missing problem at the last digit. The missing digit corresponding to the character number "7" is at a middle digit, so it is determined that the second numeric amount character string has the seventh digit missing problem at the middle digit.
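The classification of the fifth, sixth, and seventh digit missing problems in this example can be sketched as follows. The sketch assumes alignment at the decimal point with absent positions treated as 0; the function name `classify_missing` and the tuple layout are hypothetical, and the encoding of the problem character string is not specified in the source.

```python
def classify_missing(fourth_amount: str, second_amount: str) -> list[tuple[int, str, str]]:
    """Return (problem_number, string_with_the_gap, recovered_digit) per missing digit."""
    a_int, _, a_frac = fourth_amount.partition(".")
    b_int, _, b_frac = second_amount.partition(".")
    int_len = max(len(a_int), len(b_int))
    frac_len = max(len(a_frac), len(b_frac))
    a = a_int.rjust(int_len, "0") + a_frac.ljust(frac_len, "0")
    b = b_int.rjust(int_len, "0") + b_frac.ljust(frac_len, "0")
    last = len(a) - 1
    problems = []
    for pos, (x, y) in enumerate(zip(a, b)):
        if x == y or (x != "0" and y != "0"):
            continue  # consistent, or a conflict that is not a missing digit
        if pos == 0:
            kind = 5  # missing at the first digit
        elif pos == last:
            kind = 6  # missing at the last digit
        else:
            kind = 7  # missing at a middle digit
        if y == "0":
            problems.append((kind, "second", x))
        else:
            problems.append((kind, "fourth", y))
    return problems


print(classify_missing("635372.2", "35302.28"))
```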
The embodiment of the invention can also complete the conversion of Chinese amount information with missing digits, and can report the first, second, third, fourth, fifth, sixth, and seventh digit missing problems.
Different types of digit missing problems affect the recovery of the bill amount differently. For example, missing Chinese quantile information (ten, hundred, thousand, etc.) has relatively less influence than missing Chinese numeric information (one, two, three, etc.): missing Chinese numeric information can be recovered only from the numeric amount information, whereas missing Chinese quantile information can be inferred from the preceding and following quantile information and then verified against the corresponding numeric amount information.
In the embodiment of the present invention, a corresponding reliability level is set for each digit missing problem identified above, so that a reliability level can be output for the recovered fifth numeric amount character string, as described in detail below.
Specifically, the preset correspondence between each digit missing problem and the reliability level is as follows: the first digit missing problem corresponds to reliability level one; the second digit missing problem corresponds to level two; the third digit missing problem corresponds to level three; the fourth digit missing problem corresponds to level four; the fifth digit missing problem corresponds to level five; the sixth digit missing problem corresponds to level six; and the seventh digit missing problem corresponds to level seven. In addition, when no digit is missing, the corresponding reliability level is level eight.
Among the reliability levels, reliability increases in the order level five, level four, level seven, level three, level one, level six, level two, level eight; level eight is the most reliable and level five is the least reliable.
Referring to fig. 1, the system for recovering missing digits of a bill amount provided by the embodiment of the present application further includes a reliability judging module 10. The reliability level of the fifth numeric amount character string is judged by the reliability judging module 10.
Specifically, the reliability judging module 10 is configured to obtain, from the problem character string and based on the preset correspondence between each digit missing problem and the reliability level, the reliability level corresponding to the fifth numeric amount character string, and to judge whether the obtained reliability level is unique. If it is not unique, the module outputs the lowest of the obtained reliability levels; if it is unique, the module outputs that reliability level.
Referring to fig. 5, corresponding to the above-disclosed system, the method for recovering missing digits of a bill amount disclosed in the embodiment of the present invention further includes the following steps: obtaining, from the problem character string and based on the preset correspondence between each digit missing problem and the reliability level, the reliability level corresponding to the fifth numeric amount character string; judging whether the obtained reliability level is unique; if the obtained reliability level is not unique, outputting the lowest reliability level; and if the obtained reliability level is unique, outputting the current reliability level.
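The reliability judgment can be sketched as a lookup followed by a minimum over the stated reliability order. The sketch assumes the problem character string has already been decoded into a list of problem numbers from 1 to 7; the names `RELIABILITY_ORDER` and `reliability_level` are hypothetical.

```python
# Levels from least reliable to most reliable, as stated in the description.
RELIABILITY_ORDER = [5, 4, 7, 3, 1, 6, 2, 8]


def reliability_level(problem_numbers: list[int]) -> int:
    """Map problem k to level k; with several levels, return the least reliable one."""
    # Level eight applies when no digit missing problem was recorded.
    levels = set(problem_numbers) or {8}
    return min(levels, key=RELIABILITY_ORDER.index)


print(reliability_level([5, 7, 6]))
```

Because the order is not numeric, the lowest level is found by its position in the ordered list rather than by comparing level numbers directly.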
By simulating how a human compares the Chinese amount information with the numeric amount information, the embodiment of the invention addresses the problems at their root: with the determination of the comparison base point at its core, it uses the Chinese amount information and the numeric amount information as fully as possible to identify valuable characters, recovers digits position by position, and reports the reliability of the recovered amount information.
In addition, the embodiment of the invention also provides a device for recovering the digital loss of the bill amount, which comprises: a processor and a memory; the memory for storing one or more program instructions; the processor is configured to execute one or more program instructions to perform the steps of a method for recovering a missing digit of a bill amount as described in any one of the above.
In addition, the embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for recovering missing digits of bill amount described in any one of the above are implemented.
In an embodiment of the invention, the processor may be an integrated circuit chip having signal processing capability. The Processor may be a general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete Gate or transistor logic device, discrete hardware component.
The processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The processor reads the information in the storage medium and completes the steps of the method in combination with its hardware.
The storage medium may be a memory, for example, which may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory.
The nonvolatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash Memory.
The volatile memory may be a Random Access Memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM).
The storage media described in connection with the embodiments of the invention are intended to comprise, without being limited to, these and any other suitable types of memory.
Those skilled in the art will appreciate that the functionality described in the present invention may be implemented in a combination of hardware and software in one or more of the examples described above. When software is used, the corresponding functionality may be stored on, or transmitted as one or more instructions or code over, a computer-readable medium. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer.
The present application has been described in considerable detail with reference to the foregoing general description and specific examples. It should be understood that several general adaptations or further innovations of these specific embodiments may also be made based on the technical idea of the present application; however, such conventional modifications and further innovations can also fall into the scope of the claims of the present application as long as they do not depart from the technical idea of the present application.

Claims (10)

1. A method for recovering missing digits of a bill amount, the method comprising:
collecting a bill picture;
identifying the bill picture to obtain a bill amount identification result, wherein the bill amount identification result comprises: a Chinese amount string and a first numeric amount string;
converting the Chinese amount character string into a second digital amount character string;
counting a first character digit of the first digital amount character string and a second character digit of the second digital amount character string;
comparing the values of the first character digit and the second character digit, and selecting the minimum value of the digits;
sequentially selecting a decimal point candidate position from the last position of the first numeric sum character string and adding decimal points, generating a preset number of third numeric sum character strings and writing the third numeric sum character strings into a decimal point candidate set;
subtracting the character number at the corresponding digit of the second digital sum character string from the character number of each third digital sum character string, and recording the number of times the difference is 0;
dividing the recording result corresponding to each third numeric money character string by the minimum value of the number of digits to obtain a reference ratio;
selecting a third numeric sum character string corresponding to the maximum reference ratio as a fourth numeric sum character string, and taking the place of the decimal point in the fourth numeric sum character string as a comparison base point;
comparing the character numbers of the fourth digital sum character string and the second digital sum character string in a position based on the comparison base point;
judging whether the comparison results are consistent;
if the comparison result is inconsistent, judging whether one of the two different character numbers is 0;
and if one of the two different character numbers is 0, the bill amount digit is missing, the same character number is reserved according to the digit, the digit complement is carried out on the corresponding character missing position by using the character number which is not 0 in the two different character numbers, and the recovered fifth digital amount character string is generated.
2. The method of claim 1, wherein the method further comprises:
and if the comparison result is completely consistent, the bill amount digit is not lost, and the fourth numeric amount character string or the second numeric amount character string is used as the recovered fifth numeric amount character string.
3. The method of claim 1, wherein the predetermined number of third numeric value strings is 3.
4. The method of claim 2, wherein the method further comprises:
sequentially selecting target characters from the first position of the Chinese amount character string backwards;
determining a first reference character based on the target character, wherein the first reference character is positioned at the adjacent next character position of the target character, and when the target character is the last character, the first reference character is empty;
judging the character type of the target character;
if the target character is a numeric character, judging the character type of the first reference character;
if the first reference character is a numeric character, determining and recording that a first digit missing problem exists in the character position of the Chinese amount character string between the target character and the first reference character, and circulating to reselect a next target character;
if the first reference character is a quantile character, circulating to reselect the next target character;
and if the first reference character is empty, determining and recording that a second digit missing problem exists in the character position of the Chinese amount character string behind the target character, and circulating to reselect the next target character.
5. The method of claim 4, wherein the method further comprises:
determining a second reference character based on the target character, wherein the second reference character is positioned at the adjacent previous character position of the target character, and when the target character is a first character, the second reference character is empty;
if the target character is a quantile character, judging the character type of the second reference character;
if the second reference character is a numeric character, looping to reselect the next target character;
if the second reference character is a quantile character, determining and recording that a third digit missing problem exists in the character position of the Chinese amount character string between the target character and the second reference character, and circulating to reselect a next target character;
and if the second reference character is empty, determining and recording that a fourth digit missing problem exists in the character position of the Chinese amount character string before the target character, and circulating to reselect the next target character.
6. The method of recovering from a missing digit of a ticket amount of claim 3, further comprising:
when a digit of the bill amount is missing, judging the position of the missing digit in the fourth digital money character string/the second digital money character string;
if the missing digit is at the head of the fourth digital sum character string/the second digital sum character string, determining and recording that a fifth digit missing problem exists in the head of the fourth digital sum character string/the second digital sum character string, and circulating to reselect a next target character;
if the missing digit is at the last digit of the fourth digital sum character string/the second digital sum character string, determining and recording that a sixth digit missing problem exists in the last digit of the fourth digital sum character string/the second digital sum character string, and circulating to reselect a next target character;
if the missing digit is in the middle of the fourth digital sum character string/the second digital sum character string, determining and recording that a seventh digit missing problem exists in the middle of the fourth digital sum character string/the second digital sum character string, and circulating to reselect a next target character;
and generating a problem character string corresponding to the fifth digital sum character string according to the corresponding digital missing problems on all the recorded digital missing positions.
7. The method for recovering a missing digit of a bill amount according to claim 6, further comprising:
obtaining, from the problem character string, the credibility grade corresponding to the fifth numeric amount character string, based on a preset correspondence between each digit-missing problem and a credibility grade;
determining whether the credibility grade obtained for the fifth numeric amount character string is unique;
if the credibility grade obtained for the fifth numeric amount character string is not unique, outputting the lowest credibility grade;
and if the credibility grade obtained for the fifth numeric amount character string is unique, outputting that credibility grade.
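The grade selection in claim 7 amounts to a table lookup followed by taking the minimum. The sketch below assumes that each problem is a one-character code and that grades are integers where a larger number means higher credibility; the mapping `GRADE_BY_PROBLEM` and its values are hypothetical, since the preset correspondence is not given in the claims.

```python
from typing import Mapping, Optional

# Hypothetical correspondence between problem codes and credibility grades
# (larger number = more credible); the real preset table is not given here.
GRADE_BY_PROBLEM = {"5": 2, "6": 3, "7": 1}


def credibility_grade(problem_string: str,
                      grade_by_problem: Mapping[str, int] = GRADE_BY_PROBLEM) -> Optional[int]:
    """Return the grade for a problem string, or None when no problem was recorded."""
    grades = {grade_by_problem[code] for code in problem_string}
    if not grades:
        return None
    # A single distinct grade is output as is; several grades collapse to the lowest.
    return min(grades)


print(credibility_grade("576"))  # three different grades, the lowest is output
print(credibility_grade("66"))   # one distinct grade, output unchanged
```

Returning `None` for an empty problem string is a choice of this sketch; the claim only covers the case where at least one problem has been recorded.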
8. A system for recovering a missing digit of a bill amount, the system comprising:
an acquisition module, configured to acquire a bill picture;
an identification module, configured to recognize the bill picture to obtain a bill amount recognition result, the bill amount recognition result comprising: a Chinese amount character string and a first numeric amount character string;
a conversion module, configured to convert the Chinese amount character string into a second numeric amount character string;
a statistical module, configured to count a first digit count of the first numeric amount character string and a second digit count of the second numeric amount character string, compare the first digit count with the second digit count, and select the smaller of the two as the minimum digit count;
a decimal candidate acquisition module, configured to insert a decimal point into the first numeric amount character string at successive positions, starting after the last character and moving toward the front, to generate a preset number of third numeric amount character strings, and to write the third numeric amount character strings into a decimal candidate set;
a bit difference comparison module, configured to subtract, position by position, the digit characters of the second numeric amount character string from the digit characters of each third numeric amount character string and to record the number of positions at which the difference is 0; and to divide the count recorded for each third numeric amount character string by the minimum digit count to obtain a reference ratio;
an integral number judgment module, configured to select the third numeric amount character string with the maximum reference ratio as a fourth numeric amount character string, the position of the decimal point in the fourth numeric amount character string being a comparison base point;
a detection module, configured to compare the digit characters of the fourth numeric amount character string and the second numeric amount character string, aligned on the comparison base point, and to determine, based on the comparison result, whether a digit of the bill amount is missing;
a recovery module, configured to determine, if the comparison results are inconsistent, whether one of the two differing digit characters is 0; and, if one of the two differing digit characters is 0, to determine that a digit of the bill amount is missing, retain the identical digit characters in their positions, fill the missing position with the non-zero one of the two differing digit characters, and generate the recovered fifth numeric amount character string.
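The chain from decimal candidates to recovery in claim 8 can be sketched in Python as follows. This is an illustration under several assumptions, not the patented implementation: the first amount string is taken to contain digits only, positions that exist in one string but not in the other are zero-padded before comparison, and all function names are hypothetical. Positions where both digits are non-zero and different are not covered by this claim; the sketch simply keeps the digit from the first string there.

```python
from typing import List, Tuple


def decimal_candidates(first_digits: str, preset_count: int = 3) -> List[str]:
    """Insert a decimal point after the last digit, then one position further left each time."""
    n = len(first_digits)
    return [first_digits[:n - k] + "." + first_digits[n - k:]
            for k in range(min(preset_count, n))]


def align(a: str, b: str) -> Tuple[str, str, int]:
    """Zero-pad two amounts so their digits line up on the decimal point (assumption)."""
    a_int, _, a_frac = a.partition(".")
    b_int, _, b_frac = b.partition(".")
    int_width = max(len(a_int), len(b_int))
    frac_width = max(len(a_frac), len(b_frac))
    padded_a = a_int.rjust(int_width, "0") + a_frac.ljust(frac_width, "0")
    padded_b = b_int.rjust(int_width, "0") + b_frac.ljust(frac_width, "0")
    return padded_a, padded_b, int_width


def reference_ratio(candidate: str, second: str, min_digits: int) -> float:
    """Number of aligned positions whose digit difference is 0, over the minimum digit count."""
    a, b, _ = align(candidate, second)
    return sum(x == y for x, y in zip(a, b)) / min_digits


def recover_amount(first_digits: str, second: str, preset_count: int = 3) -> str:
    """Pick the best decimal candidate, then fill gaps digit by digit."""
    min_digits = min(len(first_digits), len(second.replace(".", "")))
    fourth = max(decimal_candidates(first_digits, preset_count),
                 key=lambda c: reference_ratio(c, second, min_digits))
    a, b, int_width = align(fourth, second)
    recovered = []
    for x, y in zip(a, b):
        if x == y or y == "0":
            recovered.append(x)   # same digit, or the second string has the gap
        elif x == "0":
            recovered.append(y)   # the first string has the gap
        else:
            recovered.append(x)   # conflict outside the claim; keeping x is an assumption
    digits = "".join(recovered)
    return digits[:int_width] + "." + digits[int_width:]


print(recover_amount("123050", "1200.50"))  # prints 1230.50
```

In the usage line, the candidate "1230.50" matches "1200.50" at five of six aligned positions, so it becomes the fourth string; the single differing position pairs "3" with "0", and the non-zero digit fills the gap.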
9. A device for recovering a missing digit of a bill amount, the device comprising: a processor and a memory;
the memory being configured to store one or more program instructions;
the processor being configured to execute the one or more program instructions to perform the steps of the method for recovering a missing digit of a bill amount according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the method for recovering a missing digit of a bill amount according to any one of claims 1 to 7.
CN202211408627.1A 2022-11-11 2022-11-11 Digital missing recovery method, system, equipment and storage medium for bill amount Active CN115457567B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211408627.1A CN115457567B (en) 2022-11-11 2022-11-11 Digital missing recovery method, system, equipment and storage medium for bill amount

Publications (2)

Publication Number Publication Date
CN115457567A (en) 2022-12-09
CN115457567B (en) 2023-01-17

Family

ID=84295526

Country Status (1)

Country Link
CN (1) CN115457567B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5097517A (en) * 1987-03-17 1992-03-17 Holt Arthur W Method and apparatus for processing bank checks, drafts and like financial documents
CN113408536A (en) * 2021-06-23 2021-09-17 平安健康保险股份有限公司 Bill amount identification method and device, computer equipment and storage medium
CN115223188A (en) * 2022-07-29 2022-10-21 盐城金堤科技有限公司 Bill information processing method, device, electronic equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant