CN117540721A - Bank receipt information extraction method and system - Google Patents
- Publication number
- CN117540721A (CN202410028502.9A)
- Authority
- CN
- China
- Prior art keywords
- information
- keyword
- bank
- coordinate
- keyword information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/186—Templates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/313—Selection or weighting of terms for indexing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/02—Banking, e.g. interest calculation or account maintenance
Abstract
The invention provides a method and a system for extracting bank receipt information, comprising the following steps: performing data identification on acquired bank receipt data, and performing structured conversion on the unstructured data in the bank receipt data to obtain receipt string data corresponding to the bank receipt data; obtaining a bank name according to the receipt string data; calling, according to the bank name, the information extraction template corresponding to the bank name; and extracting information from the bank receipt data according to the information extraction template corresponding to the bank name. By pre-constructing receipt information extraction templates and extracting key index information according to the template, the invention can effectively improve the efficiency and accuracy of extracting information from unstructured bank receipts.
Description
Technical Field
The invention relates to the technical field of data identification, in particular to a method and a system for extracting bank receipt information.
Background
At present, a bank receipt is the original basis for an enterprise's accounting vouchers, and an enterprise obtains a corresponding receipt as proof when it receives a payment. The receipt content mainly comprises information such as the payment date, payee name, payee account number, payee bank name, amount and remarks. Many big data systems collect large numbers of unstructured bank receipt files, and the key indexes in these files must be extracted into structured data before big data technology can be used to analyze them.
In existing receipt information extraction, templates often have to be set up manually, which consumes considerable manpower, and unstructured receipt content suffers from problems such as index content boundaries that are difficult to identify and low accuracy, caused by factors such as occlusion and folded content. Traditional bank receipt index extraction technology often depends on the characteristics of the object being identified, so a personalized template must be designed for each bank receipt format. However, the electronic receipt formats of the major banks differ, so a large number of bank receipt templates must be customized, the identification process depends excessively on manual intervention, and identification efficiency is low. In addition, because some indexes in a receipt are displayed unclearly or their content varies in length, the content boundary cannot be defined accurately, so the extracted indexes are incomplete or contain redundant content, and accuracy is low.
Disclosure of Invention
In order to solve the above problems, the present invention provides a method for extracting information of a bank receipt, comprising:
performing data identification on the acquired bank receipt data, and performing structured conversion on the unstructured data in the bank receipt data to obtain receipt string data corresponding to the bank receipt data;
obtaining a bank name according to the receipt string data;
according to the bank name, calling an information extraction template corresponding to the bank name;
and extracting information from the bank receipt data according to the information extraction template corresponding to the bank name.
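The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: data identification is assumed to have already produced the receipt string data, each item is simplified to a `(value, (x, y))` pair, the bank name is found by looking for a known bank name in the strings, and a template is modelled as a plain callable. The names `get_bank_name`, `extract_receipt` and `templates` are hypothetical.

```python
def get_bank_name(strings, known_banks):
    """Step 2: return the first known bank name found in the receipt strings.

    Strings are scanned top to bottom, then left to right (an assumption).
    """
    ordered = sorted(strings, key=lambda s: (s[1][1], s[1][0]))
    for value, _coord in ordered:
        for bank in known_banks:
            if bank in value:
                return bank
    return None


def extract_receipt(strings, templates):
    """Steps 2 to 4: find the bank, call its template, extract the information.

    `templates` maps a bank name to a callable that takes the receipt
    strings and returns the extracted fields (hypothetical representation).
    """
    bank = get_bank_name(strings, templates)
    if bank is None:
        raise LookupError("no known bank name found in the receipt strings")
    template = templates[bank]        # step 3: call the matching template
    return bank, template(strings)    # step 4: extract according to it


# Usage with made-up data: one bank, one trivial template.
receipt_strings = [
    ("Amount", (10.0, 40.0)),
    ("100.00", (60.0, 40.0)),
    ("Example Bank electronic receipt", (10.0, 5.0)),
]
example_templates = {
    "Example Bank": lambda strs: {
        "amount": next(v for v, (x, y) in strs if y == 40.0 and x > 10.0)
    },
}
result = extract_receipt(receipt_strings, example_templates)
```

Keying the templates by bank name mirrors the text: once the bank is known, no per-receipt manual template selection is needed.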
Preferably, the information extraction template comprises the following construction process:
classifying according to different bank names in the historical bank receipt sample information to obtain receipt sample information under different bank names;
traversing and matching keyword information in the keyword library in turn in receipt sample information under different bank names based on a pre-constructed keyword library to obtain matched keyword information under different bank names and coordinate information of the keyword information;
performing data integration based on the keyword information matched under different bank names and the coordinate information of the keyword information to obtain an ordinate list and an abscissa two-dimensional array of the keyword information;
and constructing an information extraction template according to the ordinate list and the abscissa two-dimensional array of the keyword information.
Preferably, the constructing an information extraction template according to the ordinate list and the abscissa two-dimensional array of the keyword information includes:
sequentially taking, according to the ordinate list of the keyword information, each keyword information and its coordinate information, the front keyword information corresponding to each keyword information and its coordinate information, and the rear keyword information and its coordinate information;
calculating, based on these, the distance ratio between each keyword information in the ordinate list and its front keyword information and its rear keyword information respectively;
sequentially taking, according to the abscissa two-dimensional array of the keyword information, each keyword information and its coordinate information, and the rear keyword information and its coordinate information;
calculating, based on these, the distance ratio between each keyword information and its rear keyword information in the abscissa two-dimensional array;
and constructing the information extraction template according to the distance ratios between each keyword information in the ordinate list and its front and rear keyword information, and the distance ratio between each keyword information in the abscissa two-dimensional array and its rear keyword information.
Preferably, calculating the distance ratio between each keyword information in the ordinate list and its front keyword information, based on each keyword information and its coordinate information and on the corresponding front keyword information and its coordinate information, includes:
taking, as the distance ratio between the keyword information and the front keyword information in the ordinate list, the quotient of the absolute value of the difference between the ordinates of the keyword information and the front keyword information, divided by the row height of the keyword information.
Preferably, the information extraction for the bank receipt data according to the information extraction template corresponding to the bank name includes:
traversing corresponding keyword information and coordinate information thereof in the information extraction template, and keyword information and coordinate information thereof on the vertical upper side, the vertical lower side and the rear side of the keyword information based on the keyword library;
obtaining a first key point according to the keyword information and its coordinate information, and the keyword information on the vertical upper side of the keyword information and its coordinate information;
obtaining a second key point according to the keyword information on the vertical lower side of the keyword information and its coordinate information, and the keyword information on the rear side of the keyword information and its coordinate information;
taking the connecting line of the first key point and the second key point as a diagonal line to generate a rectangle;
and extracting information by taking the data in the rectangle as key extraction information.
Preferably, obtaining the first key point according to the keyword information and its coordinate information, and the keyword information on the vertical upper side of the keyword information and its coordinate information, includes:
taking the abscissa of the keyword information with the smallest abscissa among the keyword information as the abscissa of the first key point, and taking the ordinate of the keyword information on the vertical upper side as the ordinate of the first key point, to obtain the first key point.
Preferably, obtaining the second key point according to the keyword information on the vertical lower side of the keyword information and its coordinate information, and the keyword information on the rear side of the keyword information and its coordinate information, includes:
taking the largest ordinate among the keyword information on the vertical lower side of the keyword information as the ordinate of the second key point, and taking the smallest abscissa among the keyword information on the rear side of the keyword information as the abscissa of the second key point, to obtain the second key point.
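The key-point and rectangle construction described above can be illustrated with a short sketch. It assumes that every string is located by a single `(x, y)` point, that ordinates grow downward, and that a string counts as inside the rectangle when its point lies strictly between the upper and lower bounds and to the left of the right bound; the text does not specify these boundary rules. The function names are hypothetical.

```python
def key_points(keyword_xs, upper_y, lower_ys, rear_xs):
    """Return the first and second key points as (x, y) tuples.

    keyword_xs: abscissas of the keyword information
    upper_y:    ordinate of the keyword on the vertical upper side
    lower_ys:   ordinates of the keywords on the vertical lower side
    rear_xs:    abscissas of the keywords on the rear side
    """
    first = (min(keyword_xs), upper_y)       # smallest abscissa, upper ordinate
    second = (min(rear_xs), max(lower_ys))   # smallest rear abscissa, largest lower ordinate
    return first, second


def rectangle(first, second):
    """Axis-aligned rectangle whose diagonal joins the two key points."""
    (xa, ya), (xb, yb) = first, second
    return (min(xa, xb), min(ya, yb), max(xa, xb), max(ya, yb))


def strings_in_rectangle(strings, rect):
    """Values of the strings whose point falls inside the rectangle.

    Boundary handling is an assumption: the upper, lower and right edges
    are excluded so the neighbouring keywords themselves are not captured.
    """
    left, top, right, bottom = rect
    return [v for v, (x, y) in strings if left <= x < right and top < y < bottom]


# Usage with made-up coordinates.
sample = [
    ("Date", (10, 20)),
    ("Payer name", (10, 40)),
    ("Acme Ltd", (60, 40)),
    ("Payee name", (200, 40)),
    ("Payer account", (10, 60)),
]
p1, p2 = key_points([10], 20, [60], [200])
inside = strings_in_rectangle(sample, rectangle(p1, p2))
```

With these boundary rules the rectangle captures the keyword label together with its value, so a caller would still strip the label from the result.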
Based on the same inventive concept, the present invention further provides a system for extracting the information of the bank receipt, comprising:
a data conversion module, configured to perform data identification on the acquired bank receipt data and to perform structured conversion on the unstructured data in the bank receipt data to obtain receipt string data corresponding to the bank receipt data;
a name acquisition module, configured to obtain a bank name according to the receipt string data;
a template selection module, configured to call, according to the bank name, the information extraction template corresponding to the bank name;
and an information extraction module, configured to extract information from the bank receipt data according to the information extraction template corresponding to the bank name.
Preferably, the information extraction template in the template selection module comprises the following construction process:
classifying according to different bank names in the historical bank receipt sample information to obtain receipt sample information under different bank names;
traversing and matching keyword information in the keyword library in turn in receipt sample information under different bank names based on a pre-constructed keyword library to obtain matched keyword information under different bank names and coordinate information of the keyword information;
performing data integration based on the keyword information matched under different bank names and the coordinate information of the keyword information to obtain an ordinate list and an abscissa two-dimensional array of the keyword information;
and constructing an information extraction template according to the ordinate list and the abscissa two-dimensional array of the keyword information.
Preferably, the template selection module constructs an information extraction template according to the ordinate list and the abscissa two-dimensional array of the keyword information, and the method includes:
sequentially taking, according to the ordinate list of the keyword information, each keyword information and its coordinate information, the front keyword information corresponding to each keyword information and its coordinate information, and the rear keyword information and its coordinate information;
calculating, based on these, the distance ratio between each keyword information in the ordinate list and its front keyword information and its rear keyword information respectively;
sequentially taking, according to the abscissa two-dimensional array of the keyword information, each keyword information and its coordinate information, and the rear keyword information and its coordinate information;
calculating, based on these, the distance ratio between each keyword information and its rear keyword information in the abscissa two-dimensional array;
and constructing the information extraction template according to the distance ratios between each keyword information in the ordinate list and its front and rear keyword information, and the distance ratio between each keyword information in the abscissa two-dimensional array and its rear keyword information.
Preferably, in the template selection module, calculating the distance ratio between each keyword information in the ordinate list and its front keyword information, based on each keyword information and its coordinate information and on the corresponding front keyword information and its coordinate information, includes:
taking, as the distance ratio between the keyword information and the front keyword information in the ordinate list, the quotient of the absolute value of the difference between the ordinates of the keyword information and the front keyword information, divided by the row height of the keyword information.
Preferably, the information extraction module performs information extraction on the bank receipt data according to an information extraction template corresponding to the bank name, and includes:
traversing corresponding keyword information and coordinate information thereof in the information extraction template, and keyword information and coordinate information thereof on the vertical upper side, the vertical lower side and the rear side of the keyword information based on the keyword library;
obtaining a first key point according to the keyword information and its coordinate information, and the keyword information on the vertical upper side of the keyword information and its coordinate information;
obtaining a second key point according to the keyword information on the vertical lower side of the keyword information and its coordinate information, and the keyword information on the rear side of the keyword information and its coordinate information;
taking the connecting line of the first key point and the second key point as a diagonal line to generate a rectangle;
and extracting information by taking the data in the rectangle as key extraction information.
Preferably, the information extraction module obtaining the first key point according to the keyword information and its coordinate information, and the keyword information on the vertical upper side of the keyword information and its coordinate information, includes:
taking the abscissa of the keyword information with the smallest abscissa among the keyword information as the abscissa of the first key point, and taking the ordinate of the keyword information on the vertical upper side as the ordinate of the first key point, to obtain the first key point.
Preferably, the information extraction module obtaining the second key point according to the keyword information on the vertical lower side of the keyword information and its coordinate information, and the keyword information on the rear side of the keyword information and its coordinate information, includes:
taking the largest ordinate among the keyword information on the vertical lower side of the keyword information as the ordinate of the second key point, and taking the smallest abscissa among the keyword information on the rear side of the keyword information as the abscissa of the second key point, to obtain the second key point.
Compared with the closest prior art, the invention has the following beneficial effects:
The invention provides a method and a system for extracting bank receipt information, comprising the following steps: performing data identification on acquired bank receipt data, and performing structured conversion on the unstructured data in the bank receipt data to obtain receipt string data corresponding to the bank receipt data; obtaining a bank name according to the receipt string data; calling, according to the bank name, the information extraction template corresponding to the bank name; and extracting information from the bank receipt data according to the information extraction template corresponding to the bank name. By pre-constructing bank receipt information extraction templates and extracting key index information according to the template, the invention can effectively improve the efficiency and accuracy of extracting information from unstructured bank receipts.
Additional features of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a schematic flow chart of a method for extracting bank receipt information according to the present invention;
FIG. 2 is a schematic diagram of an overall framework of a method for extracting information of a bank receipt according to the present invention;
FIG. 3 is a schematic diagram of the construction process of an information extraction template of a method for extracting information of a bank receipt according to the present invention;
FIG. 4 is a drawing of original sample receipt information of a method for extracting receipt information from a bank according to the present invention;
FIG. 5 is a schematic diagram of the original sample receipt information extracted by the method for extracting the receipt information of the bank according to the present invention;
FIG. 6 is a schematic diagram of an information extraction flow of a method for extracting information from a bank receipt according to the present invention;
FIG. 7 is a schematic diagram of the information extraction result of a method for extracting information from a bank receipt according to the present invention;
FIG. 8 is a schematic diagram of the structural composition of a system for extracting information of a bank receipt according to the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
It should be noted that in this document, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Example 1:
the invention provides a method for extracting bank receipt information, which is shown in figure 1 and comprises the following steps:
step 1: performing data identification on the acquired bank receipt data, and performing structured conversion on the unstructured data in the bank receipt data to obtain receipt string data corresponding to the bank receipt data; each item of the receipt string data has the format [string value, string coordinates];
step 2: obtaining a bank name according to the receipt string data;
step 3: according to the bank name, calling an information extraction template corresponding to the bank name;
step 4: and extracting information from the bank receipt data according to the information extraction template corresponding to the bank name.
Specifically, the information extraction template called in step 3 is constructed through the following process:
Classifying according to different bank names in the historical bank receipt sample information to obtain receipt sample information under different bank names;
traversing the receipt sample information under each bank name and matching in turn the keyword information in a pre-constructed keyword library, to obtain the matched keyword information under each bank name and the coordinate information of that keyword information. The keyword library is preferably composed of the keywords commonly found in current bank receipts. The main fields include: payer full name, payer account number, payer account-opening bank, payee full name, payee account number, payee account-opening bank, actual payment date, actual payment amount, purpose, and so on. For these fields, the corresponding keywords and synonyms, compiled from the receipt content of the major banks, are as follows:
payer full name: payer name, name of the payer, account name;
payer account number: payer account number, payment account number;
payer account-opening bank: payer account-opening bank;
payee full name: payee name, payee;
payee account number: payee account number, collection account number;
payee account-opening bank: payee account-opening bank;
actual payment date: transaction date, billing date, transaction time;
actual payment amount: transaction amount, currency, amount;
purpose: remarks, abstract, use, appendix.
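One way to picture the keyword library and the traversal-matching step is the sketch below. It is an illustration only: the English keywords stand in for the original-language keywords on real receipts, the field identifiers and the function `match_keywords` are hypothetical, and naive substring matching (longest synonym first) is an assumption rather than the matching rule of the invention.

```python
# Field -> synonyms, following the list above (English stand-ins).
KEYWORD_LIBRARY = {
    "payer_name": ["payer name", "name of the payer", "account name"],
    "payer_account": ["payer account number", "payment account number"],
    "payer_bank": ["payer account-opening bank"],
    "payee_name": ["payee name", "payee"],
    "payee_account": ["payee account number", "collection account number"],
    "payee_bank": ["payee account-opening bank"],
    "payment_date": ["transaction date", "billing date", "transaction time"],
    "payment_amount": ["transaction amount", "currency", "amount"],
    "purpose": ["remarks", "abstract", "use", "appendix"],
}


def match_keywords(strings, library=KEYWORD_LIBRARY):
    """Map each field to the (synonym, coordinate) of its first match.

    `strings` is a list of (value, (x, y)) items. Synonyms are tried
    longest first so that "payee account number" is not mistaken for
    the shorter synonym "payee".
    """
    pairs = sorted(
        ((syn, field) for field, syns in library.items() for syn in syns),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
    matches = {}
    for value, coord in strings:
        text = value.lower()
        for synonym, field in pairs:
            if synonym in text:
                matches.setdefault(field, (synonym, coord))
                break  # one field per string
    return matches


# Usage with made-up receipt strings.
found = match_keywords([
    ("Payee account number", (10, 80)),
    ("Payee", (10, 60)),
    ("Transaction date", (10, 20)),
    ("6222 0000", (120, 80)),
])
```

Fields missing from the result are the ones for which no keyword matched, which corresponds to the case where the receipt is recorded and the library is supplemented manually.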
performing data integration based on the keyword information matched under different bank names and the coordinate information of the keyword information to obtain an ordinate list and an abscissa two-dimensional array of the keyword information;
and constructing an information extraction template according to the ordinate list and the abscissa two-dimensional array of the keyword information.
The construction of the information extraction template according to the ordinate list and the abscissa two-dimensional array of the keyword information comprises the following steps:
sequentially taking, according to the ordinate list of the keyword information, each keyword information and its coordinate information, the front keyword information corresponding to each keyword information and its coordinate information, and the rear keyword information and its coordinate information;
calculating, based on these, the distance ratio between each keyword information in the ordinate list and its front keyword information and its rear keyword information respectively;
sequentially taking, according to the abscissa two-dimensional array of the keyword information, each keyword information and its coordinate information, and the rear keyword information and its coordinate information;
calculating, based on these, the distance ratio between each keyword information and its rear keyword information in the abscissa two-dimensional array;
and constructing the information extraction template according to the distance ratios between each keyword information in the ordinate list and its front and rear keyword information, and the distance ratio between each keyword information in the abscissa two-dimensional array and its rear keyword information.
Calculating the distance ratio between each keyword information in the ordinate list and its front keyword information, based on each keyword information and its coordinate information and on the corresponding front keyword information and its coordinate information, includes:
taking, as the distance ratio between the keyword information and the front keyword information in the ordinate list, the quotient of the absolute value of the difference between the ordinates of the keyword information and the front keyword information, divided by the row height of the keyword information.
Calculating the distance ratio between each keyword information and its rear keyword information in the abscissa two-dimensional array, based on each keyword information and its coordinate information and on the rear keyword information and its coordinate information, includes:
taking, as the distance ratio between the keyword information and the rear keyword information in the abscissa two-dimensional array, the quotient of the absolute value of the difference between the abscissas of the keyword information and the rear keyword information, divided by the row height of the keyword information.
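The two distance ratios can be written compactly as follows. The symbols are notation introduced here for illustration and do not appear in the original text: for a keyword k, x_k and y_k are its abscissa and ordinate, h_k is its row height, front(k) is its front keyword in the ordinate list, and rear(k) is its rear keyword in the abscissa two-dimensional array.

```latex
r^{\mathrm{front}}_{k} = \frac{\lvert y_k - y_{\mathrm{front}(k)} \rvert}{h_k},
\qquad
r^{\mathrm{rear}}_{k} = \frac{\lvert x_k - x_{\mathrm{rear}(k)} \rvert}{h_k}
```

For example, a keyword with row height 10 whose ordinate differs from that of its front keyword by 10 has a front distance ratio of 1, and a keyword with the same row height whose abscissa differs from that of its rear keyword by 200 has a rear distance ratio of 20.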
The main content of the template is the template name of the receipt, the keyword name of each extraction index, the names of the keywords above, below and to the right of that keyword, and the corresponding distance ratio values (as shown in Table 1). The detailed steps are as follows:
(1) And judging the bank to which the current receipt belongs, identifying the bank keyword with the minimum coordinates (x, y), intercepting and acquiring punctuation marks or blanks forwards, and recording the character string as the bank to which the receipt belongs.
(2) Traversing the keyword word stock arranged in the first step, sequentially matching keywords in the receipt content, recording and matching the next keyword if the keywords are matched, recording the receipt and manually supplementing the keyword word stock if the keywords are not matched.
(3) Recording the upper left coordinates (x, y) and the lower left coordinates (x 1, y 1) of the keywords and the text boxes on each match;
(4) Sequentially sequencing all recorded keywords from small to large according to Y values to form a list Y;
(5) Ordering all recorded keywords from small to large according to the equality of y, and forming a two-dimensional array X;
(6) Sequentially taking each keyword and left and right adjacent keywords from a list Y, wherein the left word is the name of the keyword on the upper side, the absolute value of Y1 of the left word subtracted from the Y value of the keyword/the height of the keyword (the absolute value of Y-Y1) is taken as the distance proportion from the keyword on the upper side, the right side is recorded as the name of the keyword on the lower side, the absolute value of Y of the right word subtracted from the Y1 value of the keyword/the height of the keyword (the absolute value of Y-Y1) is taken as the distance proportion from the keyword on the lower side, and no value is recorded as null;
(7) Sequentially taking the next word of each keyword from the two-dimensional array X, marking the next word as the name of the right keyword, taking the absolute value of the subtraction of the X value of the keyword X and the X value of the right keyword/the keyword height (the absolute value of y-y 1) as the distance proportion with the right keyword, and marking the zero value as null;
(8) Judging whether the keywords and the position information thereof exist in a receipt template configuration table, if yes, storing the keywords and the position information thereof in the table, and if not, forming a new template.
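Steps (6) and (7) can be written compactly as below. The symbols are introduced here only for illustration and do not appear in the original: $(x_k, y_k)$ and $(x_k, y'_k)$ are the upper-left and lower-left corners of keyword $k$ (so $y'_k$ plays the role of y1 in step (3)), $h_k$ is its height, and $u$, $d$, $r$ denote its upper, lower, and right neighbors.

```latex
\begin{aligned}
h_k &= \lvert y_k - y'_k \rvert \\
\rho^{\mathrm{up}}_k &= \frac{\lvert y_k - y'_u \rvert}{h_k} \\
\rho^{\mathrm{down}}_k &= \frac{\lvert y_d - y'_k \rvert}{h_k} \\
\rho^{\mathrm{right}}_k &= \frac{\lvert x_k - x_r \rvert}{h_k}
\end{aligned}
```

Dividing by the keyword's own height makes each ratio independent of the scale at which the receipt was scanned, which is presumably why the template stores ratios rather than pixel distances.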
Table 1 Receipt template configuration table
Field code | Field name | Data type |
ID | Internal code | STRING |
TEMPLATE_NAME | Template name | STRING |
TEMPLATE_KEY | Keyword name | STRING |
UP_KEY | Upper keyword name | STRING |
UP_RANGE | Distance ratio to upper keyword | NUMBER |
DOWN_KEY | Lower keyword name | STRING |
DOWN_RANGE | Distance ratio to lower keyword | NUMBER |
right_KEY | Right keyword name | STRING |
right_RANGE | Distance ratio to right keyword | NUMBER |
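As an illustration of how one row of the configuration table might be held in memory, the following is a minimal sketch. The class name `TemplateRow`, the lower-case attribute names, and the use of `None` for "null" are assumptions made here; the source only specifies the field codes and data types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TemplateRow:
    """One row of the receipt template configuration table (hypothetical layout)."""
    id: str                              # ID: internal code
    template_name: str                   # TEMPLATE_NAME
    template_key: str                    # TEMPLATE_KEY: keyword to extract
    up_key: Optional[str] = None         # UP_KEY, None when there is no upper keyword
    up_range: Optional[float] = None     # UP_RANGE
    down_key: Optional[str] = None       # DOWN_KEY
    down_range: Optional[float] = None   # DOWN_RANGE
    right_key: Optional[str] = None      # right_KEY
    right_range: Optional[float] = None  # right_RANGE


# Example row with illustrative values.
row = TemplateRow(
    id="1", template_name="xx bank", template_key="payer name",
    up_key="billing date", up_range=0.83,
    down_key="payer account number", down_range=0.88,
    right_key="payee name", right_range=27.92,
)
```

Leaving the neighbor fields optional mirrors the rule that a missing neighbor is recorded as null.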
Step 3, including:
traversing, based on the keyword library, the corresponding keyword information and its coordinate information in the information extraction template, together with the keyword information vertically above, vertically below, and behind that keyword information and their coordinate information;
obtaining a first key point according to the keyword information and its coordinate information and the keyword information vertically above it and its coordinate information;
obtaining a second key point according to the keyword information vertically below the keyword information and its coordinate information and the keyword information behind the keyword information and its coordinate information;
taking the line connecting the first key point and the second key point as a diagonal to generate a rectangle;
and extracting the data inside the rectangle as the key extraction information.
Obtaining the first key point according to the keyword information and its coordinate information and the keyword information vertically above it and its coordinate information comprises:
taking the smallest abscissa of the keyword information as the abscissa of the first key point, and taking the ordinate of the keyword information vertically above as the ordinate of the first key point, to obtain the first key point.
Obtaining the second key point according to the keyword information vertically below the keyword information and its coordinate information and the keyword information behind the keyword information and its coordinate information includes:
taking the largest ordinate of the keyword information vertically below as the ordinate of the second key point, and taking the smallest abscissa of the keyword information behind as the abscissa of the second key point, to obtain the second key point.
Specifically, the information extraction process in step 3 includes: first, identify the "bank" keyword and find all information extraction templates under that bank's name; if no template is found, return and rebuild the information extraction template. Among the bank's templates, the one whose keyword names match the receipt most often is taken as the corresponding template. For each keyword to be extracted, find the names and positions of its upper, lower, and right keywords according to the template configuration table, and take the lower boundary of the upper keyword, the upper boundary of the lower keyword, the left boundary of the right keyword, and the left boundary of the keyword itself. These boundaries determine a region, and the content inside the region is extracted. The detailed steps are as follows:
(1) Find the template of the bank receipt: identify the "bank" keyword with the smallest coordinates (x, y), take the characters preceding it up to the nearest punctuation mark or blank to obtain the bank-name string, and look up the corresponding bank template configuration information in the template library; if none is found, return and rebuild the information extraction template;
(2) Among all templates under the bank, take the template whose keyword names match the receipt most often as the corresponding receipt template (one bank may have several different templates because of different businesses, different systems, and so on);
(3) Traverse each keyword in the template;
(4) Take the keyword, the keywords on its upper, lower, and right sides, and their coordinates;
(5) Take the minimum x of the keyword and the minimum y of the upper keyword to obtain a point (if there is no upper keyword or it is not found, use y = minimum y of the keyword - keyword row height × distance ratio to the upper keyword in the template configuration table);
(6) Take the maximum y of the lower keyword (if there is no lower keyword or it is not found, use y = maximum y of the keyword + keyword row height × distance ratio to the lower keyword in the template configuration table) and the minimum x of the right keyword (if there is no right keyword or it is not found, use x = minimum x of the keyword + keyword row height × distance ratio to the right keyword in the template configuration table) to obtain a point;
(7) Form a rectangle with the line between the two points as its diagonal;
(8) Take the data inside the rectangle as the key information;
(9) Repeat until all key information has been extracted.
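Steps (5) to (7) can be sketched as follows. This is an illustrative reading rather than the patented implementation: the `Box` layout `(x_min, y_min, x_max, y_max)`, the function names, and the choice to raise `ValueError` when a neighbor is missing and no ratio is configured (a case the source does not describe) are all assumptions.

```python
from typing import Optional, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def _fallback(base: float, height: float, ratio: Optional[float], sign: int) -> float:
    """Estimate a boundary from the stored distance ratio when the neighbor is missing."""
    if ratio is None:
        raise ValueError("neighbor not found and no distance ratio configured")
    return base + sign * height * ratio


def extraction_rectangle(
    keyword: Box,
    upper: Optional[Box],
    lower: Optional[Box],
    right: Optional[Box],
    up_ratio: Optional[float] = None,
    down_ratio: Optional[float] = None,
    right_ratio: Optional[float] = None,
) -> Box:
    """Return (left, top, right, bottom) of the rectangle spanned by the two points."""
    x_min, y_min, _, y_max = keyword
    height = y_max - y_min  # keyword row height

    # Step (5): first point = (min x of keyword, min y of upper keyword or fallback).
    top = upper[1] if upper is not None else _fallback(y_min, height, up_ratio, -1)

    # Step (6): second point = (min x of right keyword, max y of lower keyword),
    # each with its own ratio-based fallback.
    bottom = lower[3] if lower is not None else _fallback(y_max, height, down_ratio, +1)
    right_x = right[0] if right is not None else _fallback(x_min, height, right_ratio, +1)

    # Step (7): the two points are opposite corners of the rectangle.
    return (x_min, top, right_x, bottom)
```

A call passes `None` for any neighbor that was not found on the receipt, in which case the corresponding ratio from the template configuration table is used instead.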
Example 2:
An embodiment of the present invention is a method for extracting information from a bank receipt, as shown in fig. 2, which specifically includes:
S1: compile the keywords and synonyms used in bank receipts;
S2: convert the bank receipt into a structured character string through an OCR algorithm;
S3: automatically generate a template;
S4: extract the key indexes of the bank receipt according to the template;
Step S3, as shown in fig. 3, includes:
3.1 In the character string converted from the receipt, shown in fig. 4, the receipt's bank is identified as "xx bank" via the keyword "bank";
3.2 According to the keyword list from step S1, each keyword and its position in the receipt are extracted, as shown in fig. 5, and the result is as follows:
Payer name [29, 189], [144, 189], [144, 213], [29, 213]
Payer account number [29, 234], [376, 234], [376, 254], [29, 254]
Payer account-opening bank [29, 274], [367, 272], [367, 294], [29, 296]
Payee name [699, 189], [824, 189], [824, 212], [699, 212]
Payee account number [699, 234], [1046, 234], [1045, 261], [699, 255]
Payee account-opening bank [699, 274], [1040, 277], [1040, 297], [699, 294]
Amount [318, 317], [415, 317], [415, 337], [318, 337]
Remarks [29, 360], [360, 360], [360, 381], [29, 381]
Billing date [29, 148], [120, 148], [120, 169], [29, 169]
3.3 Take each keyword and its first coordinate point, namely the upper-left point; the result is as follows:
Payer name [29, 189]
Payer account number [29, 234]
Payer account-opening bank [29, 274]
Payee name [699, 189]
Payee account number [699, 234]
Payee account-opening bank [699, 274]
Amount [318, 317]
Remarks [29, 360]
Billing date [29, 148]
3.4 Sort all recorded keywords by y value in ascending order to form a list Y:
List Y
[
Billing date [29, 148],
Payer name [29, 189],
Payee name [699, 189],
Payer account number [29, 234],
Payee account number [699, 234],
Payer account-opening bank [29, 274],
Payee account-opening bank [699, 274],
Amount [318, 317],
Remarks [29, 360]
]
3.5 Group the recorded keywords that have equal y values and sort each group by x value in ascending order to form a two-dimensional array X:
Two-dimensional array X
[
[ Billing date [29, 148] ],
[ Payer name [29, 189], Payee name [699, 189] ],
[ Payer account number [29, 234], Payee account number [699, 234] ],
[ Payer account-opening bank [29, 274], Payee account-opening bank [699, 274] ],
[ Amount [318, 317] ],
[ Remarks [29, 360] ]
]
3.6 Traverse each keyword and take its two adjacent values from list Y: the one on the left is the upper value and the one on the right is the lower value. For example, for "payer name" the upper value is "billing date" and the lower value is "payer account number";
3.7 Take the next value after the keyword in the two-dimensional array X as the right value. For example, for "payer name" the right value is "payee name";
3.8 Traverse all keywords to form the bank template shown in Table 2 below;
Table 2 xx bank receipt template configuration information
Internal code | Bank name | Keyword to extract | Upper keyword | Distance ratio to upper keyword | Lower keyword | Distance ratio to lower keyword | Right keyword | Distance ratio to right keyword |
1 | xx bank | Payer name | Billing date | 0.83 | Payer account number | 0.88 | Payee name | 27.92 |
2 | xx bank | Payer account number | Payee name | 1.10 | Payer account-opening bank | 1.00 | Payee account number | 33.50 |
3 | xx bank | Payer account-opening bank | Payee account number | 0.65 | Amount | 1.15 | Payee account-opening bank | 33.50 |
4 | xx bank | Payee name | Billing date | 0.87 | Payer account number | 0.96 | ||
5 | xx bank | Payee account number | Payee name | 0.81 | Payer account-opening bank | 0.48 | ||
6 | xx bank | Payee account-opening bank | Payee account number | 0.56 | Amount | 0.87 | ||
7 | xx bank | Amount | Payee account-opening bank | 1.00 | Remarks | 1.15 | ||
8 | xx bank | Billing date | Payer name | 0.95 | ||||
9 | xx bank | Remarks | Amount | 1.10 |
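Steps 3.4 to 3.7 can be sketched with the coordinates from step 3.2. This is an illustration only: the function names and the `(x, y, x1, y1)` tuple layout (upper-left and lower-left corners) are assumptions, grouping requires exactly equal y values (real OCR output may need a tolerance), and the neighbors used in the final check are read directly from row 1 of Table 2 rather than derived from list Y.

```python
from itertools import groupby

# (x, y, x1, y1): upper-left and lower-left corners taken from step 3.2.
BOXES = {
    "billing date": (29, 148, 29, 169),
    "payer name": (29, 189, 29, 213),
    "payer account number": (29, 234, 29, 254),
    "payer account-opening bank": (29, 274, 29, 296),
    "payee name": (699, 189, 699, 212),
    "payee account number": (699, 234, 699, 255),
    "payee account-opening bank": (699, 274, 699, 294),
    "amount": (318, 317, 318, 337),
    "remarks": (29, 360, 29, 381),
}


def build_list_y(boxes):
    """Step 3.4: keyword names sorted by y, ties broken by x."""
    return sorted(boxes, key=lambda name: (boxes[name][1], boxes[name][0]))


def build_array_x(boxes):
    """Step 3.5: rows of keywords sharing the same y, each row sorted by x."""
    ordered = build_list_y(boxes)
    return [list(row) for _, row in groupby(ordered, key=lambda name: boxes[name][1])]


def height(box):
    """Keyword row height |y - y1|."""
    return abs(box[1] - box[3])


def up_ratio(box, upper):
    """|keyword y - upper keyword y1| / keyword height."""
    return abs(box[1] - upper[3]) / height(box)


def down_ratio(box, lower):
    """|lower keyword y - keyword y1| / keyword height."""
    return abs(lower[1] - box[3]) / height(box)


def right_ratio(box, right):
    """|keyword x - right keyword x| / keyword height."""
    return abs(box[0] - right[0]) / height(box)


list_y = build_list_y(BOXES)
array_x = build_array_x(BOXES)

payer = BOXES["payer name"]
up = up_ratio(payer, BOXES["billing date"])              # 20 / 24
down = down_ratio(payer, BOXES["payer account number"])  # 21 / 24
right = right_ratio(payer, BOXES["payee name"])          # 670 / 24
```

For "payer name" the three quotients are 20/24, 21/24, and 670/24, which agree to two decimal places with the 0.83, 0.88, and 27.92 listed in row 1 of Table 2.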
Step S4, as shown in fig. 6, includes:
4.1 Identify the "bank" keyword, determine through it that the receipt's bank is "xx bank", and find the bank's information extraction templates in the template library, as shown in Table 3 below:
Table 3 Information extraction templates
Internal code | Bank name | Keyword to extract | Upper keyword | Distance ratio to upper keyword | Lower keyword | Distance ratio to lower keyword | Right keyword | Distance ratio to right keyword |
1 | xx Bank 1 | Payer name | Billing date | 0.83 | Payer account number | 0.88 | Payee name | 27.92 |
2 | xx Bank 1 | Payer account number | Payee name | 1.10 | Payer account-opening bank | 1.00 | Payee account number | 33.50 |
3 | xx Bank 1 | Payer account-opening bank | Payee account number | 0.65 | Amount | 1.15 | Payee account-opening bank | 33.50 |
4 | xx Bank 1 | Payee name | Billing date | 0.87 | Payer account number | 0.96 | ||
5 | xx Bank 1 | Payee account number | Payee name | 0.81 | Payer account-opening bank | 0.48 | ||
6 | xx Bank 1 | Payee account-opening bank | Payee account number | 0.56 | Amount | 0.87 | ||
7 | xx Bank 1 | Amount | Payee account-opening bank | 1.00 | Remarks | 1.15 | ||
8 | xx Bank 1 | Billing date | Payer name | 0.95 | ||||
9 | xx Bank 1 | Remarks | Amount | 1.10 | ||||
10 | xx Bank 2 | Payer name | Billing date | 0.85 | Payer account number | 0.89 | Payee name | 27 |
11 | xx Bank 2 | Payer account number | Name | 1.12 | Payer account-opening bank | 1.08 | Payee account number | 33 |
12 | xx Bank 2 | Payer account-opening bank | Account number | 0.63 | Amount | 1.25 | Payee account-opening bank | 33.2 |
13 | xx Bank 2 | Payee name | Billing date | 0.87 | Payer account number | 0.96 | ||
14 | xx Bank 2 | Payee account number | Payee name | 0.81 | Payer account-opening bank | 0.48 | ||
15 | xx Bank 2 | Payee account-opening bank | Payee account number | 0.56 | Amount (in words) | 0.87 | ||
16 | xx Bank 2 | Amount (in words) | Payee account-opening bank | 1.00 | Purpose | 1.15 | ||
17 | xx Bank 2 | Billing date | Payer name | 0.95 | ||||
18 | xx Bank 2 | Purpose | Amount (in words) | 1.10 |
4.2 xx bank has two sets of templates, xx bank 1 and xx bank 2. Matching the template keywords against the receipt, xx bank 1 matches 8 keywords and xx bank 2 matches 6 keywords, so xx bank 1 is taken as the template of the receipt.
4.3 Traverse each keyword in the template.
4.4 Take the keyword, the names of the keywords on its upper, lower, and right sides, and their coordinates. For "payer name" the result is as follows:
Upper side: not found, because the keyword is occluded.
Keyword: payer name [29, 189], [144, 189], [144, 213], [29, 213]
Right side: payee name [699, 189], [824, 189], [824, 212], [699, 212]
Lower side: payer account number [29, 234], [376, 234], [376, 254], [29, 254]
4.5 Take the minimum x of the keyword as the abscissa (29). Because the upper keyword was not found, the ordinate is the minimum y of the keyword minus the row height multiplied by the distance ratio to the upper keyword in the configuration table, i.e. 189 - 24 × 0.83 ≈ 169, which gives the point [29, 169];
4.6 Take the minimum x of the right keyword as the abscissa and the maximum y of the lower keyword as the ordinate to obtain the point [699, 254];
4.7 Form a rectangle with the line between the two points as its diagonal; its four corners are [29, 169], [699, 169], [29, 254], [699, 254];
4.8 Take the data inside the rectangle as the key information. Here the data in the range is "payer name: xxx limited"; after the keyword and punctuation marks are removed, as shown in fig. 7, the finally extracted key information is: xxx limited.
4.9 Repeat the above steps until all key information has been extracted.
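The clean-up in step 4.8 (removing the keyword and punctuation from the text found inside the rectangle) might look like the sketch below. The function name `clean_value` and the punctuation set are assumptions; the source does not say which characters are stripped.

```python
import string

# ASCII punctuation plus a few full-width marks that often follow a field label (assumed set).
_STRIP_CHARS = string.punctuation + "：，。；、" + string.whitespace


def clean_value(raw: str, keyword: str) -> str:
    """Remove the first occurrence of the keyword, then trim punctuation and blanks at both ends."""
    without_keyword = raw.replace(keyword, "", 1)
    return without_keyword.strip(_STRIP_CHARS)


value = clean_value("payer name: xxx limited", "payer name")
```

Because only the ends of the string are trimmed, punctuation inside the value is preserved; a value that legitimately ends in a punctuation mark would lose it under this simplification.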
Example 3:
The invention provides a bank receipt information extraction system, whose schematic structural diagram is shown in fig. 8, comprising:
a data conversion module, used for carrying out data identification on the acquired bank receipt data and carrying out structural conversion on unstructured data in the bank receipt data to obtain receipt string data corresponding to the bank receipt data;
a name acquisition module, used for obtaining a bank name according to the receipt string data;
a template selection module, used for calling the information extraction template corresponding to the bank name according to the bank name;
and an information extraction module, used for extracting information from the bank receipt data according to the information extraction template corresponding to the bank name.
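The way the four modules hand data to one another can be sketched as below. Everything named here is hypothetical: `run_pipeline`, `select_template`, the injected callables, and the plain-string stand-in for the receipt string data (the real data would also carry coordinates). The template choice follows the "most matched keywords" rule from the embodiments above.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical template library: bank name -> template name -> keyword names.
TemplateLibrary = Dict[str, Dict[str, List[str]]]


def select_template(bank_templates: Dict[str, List[str]], found: Set[str]) -> Optional[str]:
    """Pick the template whose keywords match the receipt most often, or None if there is none."""
    if not bank_templates:
        return None
    return max(bank_templates, key=lambda name: len(found.intersection(bank_templates[name])))


def run_pipeline(
    receipt: object,
    convert: Callable[[object], str],     # data conversion module (e.g. OCR), injected
    get_bank_name: Callable[[str], str],  # name acquisition module, injected
    library: TemplateLibrary,
    extract: Callable[[str, str], object],  # information extraction module, injected
) -> Optional[object]:
    text = convert(receipt)
    bank = get_bank_name(text)

    # Template selection module: count which template keywords occur in the receipt text.
    bank_templates = library.get(bank, {})
    found = {kw for keywords in bank_templates.values() for kw in keywords if kw in text}
    template = select_template(bank_templates, found)
    if template is None:
        return None  # the source goes back to template construction in this case

    return extract(text, template)
```

Passing the modules in as callables keeps the sketch independent of any particular OCR engine or storage backend.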
The information extraction template in the template selection module comprises the following construction processes:
classifying according to different bank names in the historical bank receipt sample information to obtain receipt sample information under different bank names;
traversing and matching keyword information in the keyword library in turn in receipt sample information under different bank names based on a pre-constructed keyword library to obtain matched keyword information under different bank names and coordinate information of the keyword information;
Carrying out data integration based on the keyword information matched under different bank names and the coordinate information of the keyword information to obtain a vertical coordinate list and a horizontal coordinate two-dimensional array of the keyword information;
and constructing an information extraction template according to the ordinate list and the abscissa two-dimensional array of the keyword information.
In the template selection module, constructing the information extraction template according to the ordinate list and the abscissa two-dimensional array of the keyword information comprises:
sequentially taking each keyword information and coordinate information thereof, front side keyword information and coordinate information thereof corresponding to each keyword information, and rear side keyword and coordinate information thereof according to the ordinate list of the keyword information;
calculating the distance proportion between each keyword information in the ordinate list and the front keyword information and the rear keyword information respectively based on the keyword information and the coordinate information thereof, the front keyword information and the coordinate information thereof corresponding to each keyword information and the rear keyword and the coordinate information thereof;
according to the abscissa two-dimensional array of the keyword information, sequentially taking each keyword information and coordinate information thereof, and rear keyword information and coordinate information thereof;
Calculating the distance ratio of each keyword information and the rear keyword information in the abscissa two-dimensional array based on the each keyword information and the coordinate information thereof, the rear keyword information and the coordinate information thereof;
and constructing an information extraction template according to the distance ratio of each keyword information in the ordinate list to the front keyword information and the rear keyword information and the distance ratio of each keyword information in the abscissa two-dimensional array to the rear keyword information.
The calculating, in the template selection module, a distance ratio between each keyword information in the ordinate list and the front keyword information based on the each keyword information and the coordinate information thereof, and the front keyword information corresponding to each keyword information and the coordinate information thereof, includes:
taking the quotient of the absolute value of the difference between the ordinates of the keyword information and the front keyword information, divided by the row height of the keyword information, as the distance ratio between the keyword information and the front keyword information in the ordinate list.
The calculating, in the template selection module, a distance ratio between each keyword information and the rear keyword information in the abscissa two-dimensional array based on the each keyword information and the coordinate information thereof, the rear keyword information and the coordinate information thereof, includes:
taking the quotient of the absolute value of the difference between the abscissas of the keyword information and the rear keyword information, divided by the row height of the keyword information, as the distance ratio between the keyword information and the rear keyword information in the abscissa two-dimensional array.
The information extraction module performs information extraction on the bank receipt data according to the information extraction template corresponding to the bank name, and includes:
traversing corresponding keyword information and coordinate information thereof in the information extraction template, and keyword information and coordinate information thereof on the vertical upper side, the vertical lower side and the rear side of the keyword information based on the keyword library;
obtaining a first key point according to the key word information and the coordinate information thereof and the key word information and the coordinate information thereof on the vertical upper side of the key word information;
Obtaining a second key point according to the vertical lower side key information and the coordinate information thereof of the key information and the rear side key information and the coordinate information thereof of the key information;
taking the connecting line of the first key point and the second key point as a diagonal line to generate a rectangle;
and extracting information by taking the data in the rectangle as key extraction information.
The information extraction module obtains a first key point according to the key word information and the coordinate information thereof and the key word information and the coordinate information thereof on the vertical upper side of the key word information, and the method comprises the following steps:
and taking the smallest abscissa of the keyword information as the abscissa of the first key point, and taking the ordinate of the keyword information vertically above as the ordinate of the first key point, to obtain the first key point.
The information extraction module obtains a second key point according to the vertical lower side key information and the coordinate information thereof of the key information, the rear side key information and the coordinate information thereof of the key information, and the second key point comprises:
and taking the largest ordinate of the keyword information vertically below as the ordinate of the second key point, and taking the smallest abscissa of the keyword information behind as the abscissa of the second key point, to obtain the second key point.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (9)
1. The bank receipt information extraction method is characterized by comprising the following steps of:
carrying out data identification on the acquired bank receipt data, and carrying out structural conversion on unstructured data in the bank receipt data to obtain receipt string data corresponding to the bank receipt data;
obtaining a bank name according to the receipt string data;
according to the bank name, calling an information extraction template corresponding to the bank name;
and extracting information from the bank receipt data according to the information extraction template corresponding to the bank name.
2. The method of claim 1, wherein the information extraction template comprises a construction process of:
classifying according to different bank names in the historical bank receipt sample information to obtain receipt sample information under different bank names;
traversing and matching keyword information in the keyword library in turn in receipt sample information under different bank names based on a pre-constructed keyword library to obtain matched keyword information under different bank names and coordinate information of the keyword information;
carrying out data integration based on the keyword information matched under different bank names and the coordinate information of the keyword information to obtain a vertical coordinate list and a horizontal coordinate two-dimensional array of the keyword information;
And constructing an information extraction template according to the ordinate list and the abscissa two-dimensional array of the keyword information.
3. The method of claim 2, wherein constructing an information extraction template from the list of ordinates and the two-dimensional array of abscissas of the keyword information comprises:
sequentially taking each keyword information and coordinate information thereof, front side keyword information and coordinate information thereof corresponding to each keyword information, and rear side keyword and coordinate information thereof according to the ordinate list of the keyword information;
calculating the distance proportion between each keyword information in the ordinate list and the front keyword information and the rear keyword information respectively based on the keyword information and the coordinate information thereof, the front keyword information and the coordinate information thereof corresponding to each keyword information and the rear keyword and the coordinate information thereof;
according to the abscissa two-dimensional array of the keyword information, sequentially taking each keyword information and coordinate information thereof, and rear keyword information and coordinate information thereof;
calculating the distance ratio of each keyword information and the rear keyword information in the abscissa two-dimensional array based on the each keyword information and the coordinate information thereof, the rear keyword information and the coordinate information thereof;
And constructing an information extraction template according to the distance ratio of each keyword information in the ordinate list to the front keyword information and the rear keyword information and the distance ratio of each keyword information in the abscissa two-dimensional array to the rear keyword information.
4. The method of claim 3, wherein calculating a distance ratio of each keyword information in the ordinate list to the front keyword information based on the each keyword information and the coordinate information thereof, the front keyword information corresponding to each keyword information and the coordinate information thereof, respectively, comprises:
taking the quotient of the absolute value of the difference between the ordinates of the keyword information and the front keyword information, divided by the row height of the keyword information, as the distance ratio between the keyword information and the front keyword information in the ordinate list.
5. The method of claim 2, wherein the extracting information from the bank receipt data according to the information extraction template corresponding to the bank name comprises:
Traversing corresponding keyword information and coordinate information thereof in the information extraction template, and keyword information and coordinate information thereof on the vertical upper side, the vertical lower side and the rear side of the keyword information based on the keyword library;
obtaining a first key point according to the key word information and the coordinate information thereof and the key word information and the coordinate information thereof on the vertical upper side of the key word information;
obtaining a second key point according to the vertical lower side key information and the coordinate information thereof of the key information and the rear side key information and the coordinate information thereof of the key information;
taking the connecting line of the first key point and the second key point as a diagonal line to generate a rectangle;
and extracting information by taking the data in the rectangle as key extraction information.
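The last two steps of claim 5 (building a rectangle from a diagonal and keeping the data inside it) can be sketched as below. This is an illustration under stated assumptions: recognized text is available as tokens with a single reference coordinate each, and a token counts as inside the rectangle when that coordinate lies within its bounds. The `Point` and `Token` types and both function names are hypothetical.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Point(NamedTuple):
    x: float
    y: float


class Token(NamedTuple):
    """Hypothetical recognized text fragment with one reference coordinate."""
    text: str
    x: float
    y: float


def rectangle_from_diagonal(p1: Point, p2: Point) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) of the axis-aligned rectangle
    whose diagonal joins p1 and p2, regardless of which point comes first."""
    x_min, x_max = sorted((p1.x, p2.x))
    y_min, y_max = sorted((p1.y, p2.y))
    return x_min, y_min, x_max, y_max


def tokens_in_rectangle(tokens: Sequence[Token], p1: Point, p2: Point) -> List[str]:
    """Keep the text of every token whose coordinate falls inside the rectangle."""
    x_min, y_min, x_max, y_max = rectangle_from_diagonal(p1, p2)
    return [t.text for t in tokens
            if x_min <= t.x <= x_max and y_min <= t.y <= y_max]
```

Sorting the two coordinates of the diagonal means the sketch does not depend on whether the ordinate grows upward or downward in the receipt's coordinate system.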
6. The method of claim 5, wherein the obtaining a first key point according to the keyword information and the coordinate information thereof and the keyword information on the vertically upper side of the keyword information and the coordinate information thereof comprises:
taking the abscissa of the keyword information with the smallest abscissa among the keyword information as the abscissa of the first key point, and taking the ordinate of the keyword information on the vertically upper side as the ordinate of the first key point, to obtain the first key point.
7. The method of claim 5, wherein the obtaining a second key point according to the keyword information on the vertically lower side of the keyword information and the coordinate information thereof and the keyword information on the rear side of the keyword information and the coordinate information thereof comprises:
taking the ordinate of the vertically lower keyword information with the largest ordinate among the vertically lower keyword information of the keyword information as the ordinate of the second key point, and taking the abscissa of the rear keyword information with the smallest abscissa among the rear keyword information of the keyword information as the abscissa of the second key point, to obtain the second key point.
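The key point selection of claims 6 and 7 can be sketched as follows. The sketch assumes that "the keyword information" in claim 6 refers to a set of candidate occurrences of the current keyword, and that the upper, lower and rear neighbours have already been looked up in the template. The `Keyword` type and function names are hypothetical.

```python
from typing import NamedTuple, Sequence, Tuple


class Keyword(NamedTuple):
    """Hypothetical matched keyword with its coordinate."""
    text: str
    x: float
    y: float


def first_key_point(candidates: Sequence[Keyword], upper: Keyword) -> Tuple[float, float]:
    """Claim 6 (assumed reading): smallest abscissa among the keyword
    candidates, paired with the ordinate of the vertically upper keyword."""
    return (min(k.x for k in candidates), upper.y)


def second_key_point(lower: Sequence[Keyword], rear: Sequence[Keyword]) -> Tuple[float, float]:
    """Claim 7: smallest abscissa among the rear keywords, paired with the
    largest ordinate among the vertically lower keywords."""
    return (min(k.x for k in rear), max(k.y for k in lower))
```

Both functions raise `ValueError` on an empty sequence, so a caller would need to decide how to handle keywords that have no lower or rear neighbour; the claims do not address that case.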
8. A system for extracting information from a bank receipt, comprising:
a data conversion module, configured to perform data identification on acquired bank receipt data and to structurally convert unstructured data in the bank receipt data to obtain receipt string data corresponding to the bank receipt data;
a name acquisition module, configured to obtain a bank name according to the receipt string data;
a template selection module, configured to retrieve the information extraction template corresponding to the bank name according to the bank name; and
an information extraction module, configured to extract information from the bank receipt data according to the information extraction template corresponding to the bank name.
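The flow through the name acquisition, template selection and information extraction modules of claim 8 can be sketched as below. This is only an illustration: the claim does not say how the bank name is found in the receipt string data, so substring matching is an assumption, and the data conversion module is represented simply by taking the receipt string as input. All names (`find_bank_name`, `extract`, `Template`) are hypothetical.

```python
from typing import Callable, Dict, Iterable, Mapping, Optional

# Hypothetical template type: a callable that maps receipt string data to fields.
Template = Callable[[str], Dict[str, str]]


def find_bank_name(receipt_text: str, known_banks: Iterable[str]) -> Optional[str]:
    """Name acquisition (assumed: first known bank name contained in the text)."""
    for name in known_banks:
        if name in receipt_text:
            return name
    return None


def extract(receipt_text: str, templates: Mapping[str, Template]) -> Dict[str, str]:
    """Template selection followed by information extraction."""
    bank = find_bank_name(receipt_text, templates)  # iterates over the bank names
    if bank is None:
        raise LookupError("no information extraction template matches this receipt")
    return templates[bank](receipt_text)
```

Keeping one template per bank name mirrors the claim's premise that receipts from the same bank share a layout.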
9. The system of claim 8, wherein the information extraction template in the template selection module is constructed by:
classifying according to different bank names in the historical bank receipt sample information to obtain receipt sample information under different bank names;
traversing and matching keyword information in the keyword library in turn in receipt sample information under different bank names based on a pre-constructed keyword library to obtain matched keyword information under different bank names and coordinate information of the keyword information;
carrying out data integration based on the keyword information matched under different bank names and the coordinate information of the keyword information to obtain an ordinate list and an abscissa two-dimensional array of the keyword information;
and constructing an information extraction template according to the ordinate list and the abscissa two-dimensional array of the keyword information.
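The data integration step of claim 9 can be sketched as follows. The claims do not define the exact layout of the two structures, so this sketch assumes that the ordinate list holds the distinct row ordinates in ascending order and that the abscissa two-dimensional array holds, for each of those rows, the keywords ordered by abscissa. The function name and tuple layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Match = Tuple[str, float, float]  # hypothetical (keyword, abscissa, ordinate)


def integrate(matches: Sequence[Match]) -> Tuple[List[float], List[List[Tuple[str, float]]]]:
    """Group matched keywords into rows.

    Returns (ordinates, rows): ordinates is the sorted list of distinct row
    ordinates; rows[i] lists (keyword, abscissa) pairs found at ordinates[i],
    ordered left to right.
    """
    by_row: Dict[float, List[Tuple[float, str]]] = defaultdict(list)
    for text, x, y in matches:
        by_row[y].append((x, text))
    ordinates = sorted(by_row)
    rows = [[(text, x) for x, text in sorted(by_row[y])] for y in ordinates]
    return ordinates, rows
```

Grouping here uses exact ordinate equality; real recognition output would likely need a tolerance when deciding that two keywords share a row.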
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410028502.9A CN117540721B (en) | 2024-01-09 | 2024-01-09 | Bank receipt information extraction method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410028502.9A CN117540721B (en) | 2024-01-09 | 2024-01-09 | Bank receipt information extraction method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117540721A (en) | 2024-02-09 |
CN117540721B (en) | 2024-04-12 |
Family
ID=89782703
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410028502.9A Active CN117540721B (en) | 2024-01-09 | 2024-01-09 | Bank receipt information extraction method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117540721B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130290270A1 (en) * | 2012-04-26 | 2013-10-31 | Anu Pareek | Method and system of data extraction from a portable document format file |
CN108376365A (en) * | 2018-03-22 | 2018-08-07 | 中国银行股份有限公司 | A kind of Bank Number determines method and device |
CN111428599A (en) * | 2020-03-17 | 2020-07-17 | 北京公瑾科技有限公司 | Bill identification method, device and equipment |
CN113962197A (en) * | 2021-08-19 | 2022-01-21 | 上海哥特网络技术有限公司 | Medical laboratory test report standardization method and device, electronic equipment and storage medium |
CN116740444A (en) * | 2023-06-14 | 2023-09-12 | 中国银行股份有限公司 | Information acquisition method, device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN117540721B (en) | 2024-04-12 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109887153B (en) | Finance and tax processing method and system | |
CN108960223B (en) | Method for automatically generating voucher based on intelligent bill identification | |
RU2679209C2 (en) | Processing of electronic documents for invoices recognition | |
RU2695489C1 (en) | Identification of fields on an image using artificial intelligence | |
US20210366055A1 (en) | Systems and methods for generating accurate transaction data and manipulation | |
US11232300B2 (en) | System and method for automatic detection and verification of optical character recognition data | |
CN110889310B (en) | Financial document information intelligent extraction system and method | |
US20240046684A1 (en) | System for Information Extraction from Form-Like Documents | |
CN103177128A (en) | Method and system for processing bill crown word number information | |
CN111931780A (en) | Intelligent management method and equipment for accounting documents | |
CN112418812A (en) | Distributed full-link automatic intelligent clearance system, method and storage medium | |
JP2019204535A (en) | Accounting support system | |
Li et al. | Image pattern recognition in identification of financial bills risk management | |
CN114511866A (en) | Data auditing method, device, system, processor and machine-readable storage medium | |
CN110688998A (en) | Bill identification method and device | |
CN117540721B (en) | Bank receipt information extraction method and system | |
CN112668335A (en) | Method for identifying and extracting business license structured information by using named entity | |
CN116798061A (en) | Bill auditing and identifying method, device, terminal and storage medium | |
CN111104853A (en) | Image information input method and device, electronic equipment and storage medium | |
CN115934963A (en) | Business draft big data analysis method and application map for enterprise financial customer acquisition | |
US20220121881A1 (en) | Systems and methods for enabling relevant data to be extracted from a plurality of documents | |
CN111241955B (en) | Bill information extraction method and system | |
CN118229441B (en) | Electronic credential data application method and system based on intelligent association graph | |
CN117608565B (en) | Method and system for recommending AI type components in RPA based on screenshot analysis | |
CN111753841B (en) | Bill identification method and device based on route distribution | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||