CN117540721A - Bank receipt information extraction method and system - Google Patents

Bank receipt information extraction method and system

Info

Publication number
CN117540721A
Authority
CN
China
Prior art keywords
information
keyword
bank
coordinate
keyword information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410028502.9A
Other languages
Chinese (zh)
Other versions
CN117540721B (en)
Inventor
姬永杰
朱培冬
郝强
陈国强
贾军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dasy Technology Development Co ltd
Original Assignee
Beijing Dasy Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dasy Technology Development Co ltd
Priority to CN202410028502.9A
Publication of CN117540721A
Application granted
Publication of CN117540721B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Finance (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a system for extracting bank receipt information, the method comprising the following steps: performing data identification on acquired bank receipt data, and performing structural conversion on unstructured data in the bank receipt data to obtain receipt string data corresponding to the bank receipt data; obtaining a bank name according to the receipt string data; calling, according to the bank name, an information extraction template corresponding to the bank name; and extracting information from the bank receipt data according to the information extraction template corresponding to the bank name. By pre-constructing receipt information extraction templates and extracting key index information according to the template, the invention can effectively improve the efficiency and accuracy of information extraction from unstructured receipt information.

Description

Bank receipt information extraction method and system
Technical Field
The invention relates to the technical field of data identification, in particular to a method and a system for extracting bank receipt information.
Background
At present, a bank receipt is the original basis for an enterprise's accounting vouchers, and an enterprise obtains a corresponding receipt as proof when receiving payment. The receipt content mainly comprises information such as the payment date, payee name, payee account number, payee bank name, amount, and remarks. Many big data systems collect a large number of unstructured bank receipt files, and the key indexes in these files need to be extracted into structured data so that data analysis can be carried out using big data technology.
In existing receipt information extraction, templates often have to be set manually, which consumes considerable human resources, and for unstructured receipt content the boundaries of index content are difficult to identify and accuracy is low because of factors such as occlusion and content wrapping. Traditional bank receipt index extraction technology often depends on the characteristics of the object being identified, so a dedicated template has to be designed for each bank receipt format. However, the electronic receipt formats of the major banks all differ, so a large number of bank receipt templates have to be customized, the identification process depends excessively on manual intervention, and identification efficiency is low. In addition, because some indexes in a receipt are displayed unclearly or index content varies in length, the boundary of the content cannot be defined accurately, so the extracted indexes are incomplete or redundant and accuracy is low.
Disclosure of Invention
In order to solve the above problems, the present invention provides a method for extracting information of a bank receipt, comprising:
performing data identification on the acquired bank receipt data, and performing structural conversion on unstructured data in the bank receipt data to obtain receipt string data corresponding to the bank receipt data;
obtaining a bank name according to the receipt string data;
calling, according to the bank name, an information extraction template corresponding to the bank name;
and extracting information from the bank receipt data according to the information extraction template corresponding to the bank name.
Preferably, the information extraction template is constructed by the following process:
classifying historical bank receipt sample information by bank name to obtain receipt sample information under each bank name;
based on a pre-constructed keyword library, traversing the keyword information in the keyword library in turn and matching it against the receipt sample information under each bank name, to obtain the matched keyword information under each bank name and the coordinate information of the keyword information;
performing data integration based on the keyword information matched under each bank name and the coordinate information of the keyword information, to obtain an ordinate list and an abscissa two-dimensional array of the keyword information;
and constructing the information extraction template according to the ordinate list and the abscissa two-dimensional array of the keyword information.
Preferably, constructing the information extraction template according to the ordinate list and the abscissa two-dimensional array of the keyword information includes:
according to the ordinate list of the keyword information, sequentially taking each piece of keyword information and its coordinate information, together with the corresponding front keyword information and its coordinate information and the corresponding rear keyword information and its coordinate information;
calculating, based on these, the distance ratios between each piece of keyword information in the ordinate list and its front keyword information and rear keyword information, respectively;
according to the abscissa two-dimensional array of the keyword information, sequentially taking each piece of keyword information and its coordinate information, together with the rear keyword information and its coordinate information;
calculating, based on these, the distance ratio between each piece of keyword information in the abscissa two-dimensional array and its rear keyword information;
and constructing the information extraction template according to the distance ratios between each piece of keyword information in the ordinate list and its front and rear keyword information, and the distance ratio between each piece of keyword information in the abscissa two-dimensional array and its rear keyword information.
Preferably, calculating the distance ratio between each piece of keyword information in the ordinate list and its front keyword information, based on the keyword information and its coordinate information and the corresponding front keyword information and its coordinate information, includes:
taking the quotient of the absolute value of the difference between the ordinates of the keyword information and the front keyword information, divided by the row height of the keyword information, as the distance ratio between the keyword information and the front keyword information in the ordinate list.
Preferably, extracting information from the bank receipt data according to the information extraction template corresponding to the bank name includes:
based on the keyword library, traversing the corresponding keyword information and its coordinate information in the information extraction template, together with the keyword information and coordinate information on the vertical upper side, the vertical lower side, and the rear side of that keyword information;
obtaining a first key point according to the keyword information and its coordinate information and the vertical upper-side keyword information and its coordinate information;
obtaining a second key point according to the vertical lower-side keyword information and its coordinate information and the rear-side keyword information and its coordinate information;
generating a rectangle whose diagonal is the line connecting the first key point and the second key point;
and taking the data within the rectangle as the key extraction information.
Preferably, obtaining the first key point according to the keyword information and its coordinate information and the vertical upper-side keyword information and its coordinate information includes:
taking the smallest abscissa of the keyword information as the abscissa of the first key point, and taking the ordinate of the vertical upper-side keyword information as the ordinate of the first key point, to obtain the first key point.
Preferably, obtaining the second key point according to the vertical lower-side keyword information and its coordinate information and the rear-side keyword information and its coordinate information includes:
taking the largest ordinate among the vertical lower-side keyword information as the ordinate of the second key point, and taking the smallest abscissa among the rear-side keyword information as the abscissa of the second key point, to obtain the second key point.
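The key-point and rectangle steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the `(value, (x, y))` string layout, and the choice to exclude strings lying exactly on the upper, lower, and rear edges (so that the neighbouring keywords themselves are not returned) are assumptions made here for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# Assumed layout of one recognised string: (string value, (x, y)).
ReceiptString = Tuple[str, Point]


def first_key_point(keyword_xs: List[float], upper_y: float) -> Point:
    """Smallest keyword abscissa paired with the upper-side keyword's ordinate."""
    return (min(keyword_xs), upper_y)


def second_key_point(lower_ys: List[float], rear_xs: List[float]) -> Point:
    """Smallest rear-side abscissa paired with the largest lower-side ordinate."""
    return (min(rear_xs), max(lower_ys))


def strings_in_rectangle(strings: List[ReceiptString], p1: Point, p2: Point) -> List[str]:
    """Return the strings whose coordinates fall inside the rectangle with diagonal p1-p2.

    Assumption: the left edge is inclusive; the other three edges are exclusive.
    """
    left, right = sorted((p1[0], p2[0]))
    top, bottom = sorted((p1[1], p2[1]))
    return [
        value
        for value, (x, y) in strings
        if left <= x < right and top < y < bottom
    ]


if __name__ == "__main__":
    sample = [
        ("Date", (10, 10)),
        ("Payer name", (10, 40)),
        ("ACME Ltd", (80, 40)),
        ("Payee name", (300, 40)),
        ("Amount", (10, 70)),
    ]
    p1 = first_key_point([10], upper_y=10)
    p2 = second_key_point(lower_ys=[70], rear_xs=[300])
    print(strings_in_rectangle(sample, p1, p2))
```

In this sketch the keyword label itself falls inside the rectangle together with its value, so a caller would typically strip the label before storing the extracted field.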
Based on the same inventive concept, the present invention further provides a system for extracting bank receipt information, comprising:
a data conversion module, configured to perform data identification on acquired bank receipt data, and to perform structural conversion on unstructured data in the bank receipt data to obtain receipt string data corresponding to the bank receipt data;
a name acquisition module, configured to obtain a bank name according to the receipt string data;
a template selection module, configured to call, according to the bank name, an information extraction template corresponding to the bank name;
and an information extraction module, configured to extract information from the bank receipt data according to the information extraction template corresponding to the bank name.
Preferably, the information extraction template in the template selection module is constructed by the following process:
classifying historical bank receipt sample information by bank name to obtain receipt sample information under each bank name;
based on a pre-constructed keyword library, traversing the keyword information in the keyword library in turn and matching it against the receipt sample information under each bank name, to obtain the matched keyword information under each bank name and the coordinate information of the keyword information;
performing data integration based on the keyword information matched under each bank name and the coordinate information of the keyword information, to obtain an ordinate list and an abscissa two-dimensional array of the keyword information;
and constructing the information extraction template according to the ordinate list and the abscissa two-dimensional array of the keyword information.
Preferably, in the template selection module, constructing the information extraction template according to the ordinate list and the abscissa two-dimensional array of the keyword information includes:
according to the ordinate list of the keyword information, sequentially taking each piece of keyword information and its coordinate information, together with the corresponding front keyword information and its coordinate information and the corresponding rear keyword information and its coordinate information;
calculating, based on these, the distance ratios between each piece of keyword information in the ordinate list and its front keyword information and rear keyword information, respectively;
according to the abscissa two-dimensional array of the keyword information, sequentially taking each piece of keyword information and its coordinate information, together with the rear keyword information and its coordinate information;
calculating, based on these, the distance ratio between each piece of keyword information in the abscissa two-dimensional array and its rear keyword information;
and constructing the information extraction template according to the distance ratios between each piece of keyword information in the ordinate list and its front and rear keyword information, and the distance ratio between each piece of keyword information in the abscissa two-dimensional array and its rear keyword information.
Preferably, in the template selection module, calculating the distance ratio between each piece of keyword information in the ordinate list and its front keyword information, based on the keyword information and its coordinate information and the corresponding front keyword information and its coordinate information, includes:
taking the quotient of the absolute value of the difference between the ordinates of the keyword information and the front keyword information, divided by the row height of the keyword information, as the distance ratio between the keyword information and the front keyword information in the ordinate list.
Preferably, the information extraction module extracting information from the bank receipt data according to the information extraction template corresponding to the bank name includes:
based on the keyword library, traversing the corresponding keyword information and its coordinate information in the information extraction template, together with the keyword information and coordinate information on the vertical upper side, the vertical lower side, and the rear side of that keyword information;
obtaining a first key point according to the keyword information and its coordinate information and the vertical upper-side keyword information and its coordinate information;
obtaining a second key point according to the vertical lower-side keyword information and its coordinate information and the rear-side keyword information and its coordinate information;
generating a rectangle whose diagonal is the line connecting the first key point and the second key point;
and taking the data within the rectangle as the key extraction information.
Preferably, the information extraction module obtaining the first key point according to the keyword information and its coordinate information and the vertical upper-side keyword information and its coordinate information includes:
taking the smallest abscissa of the keyword information as the abscissa of the first key point, and taking the ordinate of the vertical upper-side keyword information as the ordinate of the first key point, to obtain the first key point.
Preferably, the information extraction module obtaining the second key point according to the vertical lower-side keyword information and its coordinate information and the rear-side keyword information and its coordinate information includes:
taking the largest ordinate among the vertical lower-side keyword information as the ordinate of the second key point, and taking the smallest abscissa among the rear-side keyword information as the abscissa of the second key point, to obtain the second key point.
Compared with the closest prior art, the invention has the following beneficial effects:
The invention provides a method and a system for extracting bank receipt information, the method comprising the following steps: performing data identification on acquired bank receipt data, and performing structural conversion on unstructured data in the bank receipt data to obtain receipt string data corresponding to the bank receipt data; obtaining a bank name according to the receipt string data; calling, according to the bank name, an information extraction template corresponding to the bank name; and extracting information from the bank receipt data according to the information extraction template corresponding to the bank name. By pre-constructing bank receipt information extraction templates and extracting key index information according to the template, the invention can effectively improve the efficiency and accuracy of information extraction from unstructured bank receipt information.
Additional features of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a schematic flow chart of a method for extracting bank receipt information according to the present invention;
FIG. 2 is a schematic diagram of an overall framework of a method for extracting information of a bank receipt according to the present invention;
FIG. 3 is a schematic diagram of the construction process of an information extraction template of a method for extracting information of a bank receipt according to the present invention;
FIG. 4 is a diagram of original sample receipt information used in a method for extracting bank receipt information according to the present invention;
FIG. 5 is a schematic diagram of the original sample receipt information after extraction by a method for extracting bank receipt information according to the present invention;
FIG. 6 is a schematic diagram of an information extraction flow of a method for extracting information from a bank receipt according to the present invention;
FIG. 7 is a schematic diagram of the information extraction result of a method for extracting information from a bank receipt according to the present invention;
FIG. 8 is a schematic diagram of the structural composition of a system for extracting bank receipt information according to the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
It should be noted that in this document, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Example 1:
The invention provides a method for extracting bank receipt information which, as shown in FIG. 1, comprises the following steps:
step 1: performing data identification on the acquired bank receipt data, and performing structural conversion on unstructured data in the bank receipt data to obtain receipt string data corresponding to the bank receipt data; each item of string data has the format [string value, string coordinates];
step 2: obtaining a bank name according to the receipt string data;
step 3: calling, according to the bank name, an information extraction template corresponding to the bank name;
step 4: extracting information from the bank receipt data according to the information extraction template corresponding to the bank name.
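As a rough illustration of how steps 2 to 4 fit together, the following Python sketch wires them into one function. It assumes step 1 has already produced the receipt string data; the `Template` class, the `known_banks` lookup, the top-to-bottom then left-to-right scan order, and the pluggable `extractor` callable are all hypothetical names and choices introduced here, not details given in the description.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# One recognised string in the step 1 format [string value, string coordinates].
# Treating the coordinates as a single (x, y) point is an assumption.
ReceiptString = Tuple[str, Tuple[float, float]]


@dataclass
class Template:
    bank_name: str
    fields: List[str]  # hypothetical: names of the indexes this template extracts


Extractor = Callable[[List[ReceiptString], Template], Dict[str, str]]


def get_bank_name(strings: List[ReceiptString], known_banks: List[str]) -> str:
    """Step 2: return the first known bank name found, scanning from the top-left."""
    for value, _coords in sorted(strings, key=lambda s: (s[1][1], s[1][0])):
        for bank in known_banks:
            if bank in value:
                return bank
    raise LookupError("no known bank name found in the receipt string data")


def extract_receipt(
    strings: List[ReceiptString],
    templates: Dict[str, Template],
    extractor: Extractor,
) -> Dict[str, str]:
    bank = get_bank_name(strings, list(templates))  # step 2
    template = templates[bank]                      # step 3: call the bank's template
    return extractor(strings, template)             # step 4: template-driven extraction


if __name__ == "__main__":
    receipt = [
        ("Payer name", (10, 50)),
        ("Example Bank electronic receipt", (10, 5)),
    ]
    templates = {"Example Bank": Template("Example Bank", ["payer name"])}
    # Placeholder extractor that returns an empty value per field.
    print(extract_receipt(receipt, templates, lambda s, t: {f: "" for f in t.fields}))
```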
Specifically, the information extraction template called in step 3 is constructed by the following process:
classifying historical bank receipt sample information by bank name to obtain receipt sample information under each bank name;
based on a pre-constructed keyword library, traversing the keyword information in the keyword library in turn and matching it against the receipt sample information under each bank name, to obtain the matched keyword information under each bank name and the coordinate information of the keyword information. The keyword library is preferably composed of keywords commonly found in current bank receipts. Its main fields include the payer's full name, payer account number, payer's account-opening bank, payee's full name, payee account number, payee's account-opening bank, actual payment date, actual payment amount, and purpose. For these fields, the corresponding keywords and synonyms found in the receipts of the major banks are organized as follows:
payer's full name: payer name, account name;
payer account number: payer account number, payment account number;
payer's account-opening bank: payer's account-opening bank;
payee's full name: payee name, payee;
payee account number: payee account number, collection account number;
payee's account-opening bank: payee's account-opening bank;
actual payment date: transaction date, billing date, transaction time;
actual payment amount: transaction amount, currency, amount;
purpose: remarks, abstract, use, appendix;
performing data integration based on the keyword information matched under each bank name and the coordinate information of the keyword information, to obtain an ordinate list and an abscissa two-dimensional array of the keyword information;
and constructing the information extraction template according to the ordinate list and the abscissa two-dimensional array of the keyword information.
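The matching and data-integration steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the method itself: `KEYWORD_LIBRARY` is a small hypothetical excerpt of a keyword library, matching is done by exact case-insensitive comparison, and keywords are placed in the same row of the abscissa array only when their y values are exactly equal (real recognition output would probably need a tolerance).

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

ReceiptString = Tuple[str, Tuple[float, float]]  # (string value, (x, y)), assumed layout


class Match(NamedTuple):
    field: str  # canonical field name in the keyword library
    text: str   # the keyword or synonym actually found on the receipt
    x: float
    y: float


# Hypothetical excerpt of a keyword library: field -> keywords and synonyms.
KEYWORD_LIBRARY: Dict[str, List[str]] = {
    "payer account number": ["payer account number", "payment account number"],
    "payee account number": ["payee account number", "collection account number"],
    "actual payment date": ["transaction date", "billing date", "transaction time"],
}


def match_keywords(
    strings: List[ReceiptString],
    library: Dict[str, List[str]] = KEYWORD_LIBRARY,
) -> List[Match]:
    """Traverse the library and record each keyword found, with its coordinates."""
    matches = []
    for field, synonyms in library.items():
        for value, (x, y) in strings:
            if value.strip().lower() in synonyms:
                matches.append(Match(field, value.strip(), x, y))
    return matches


def integrate(matches: List[Match]) -> Tuple[List[Match], List[List[Match]]]:
    """Return the ordinate list Y and the abscissa two-dimensional array X."""
    y_list = sorted(matches, key=lambda m: m.y)
    rows = defaultdict(list)
    for m in matches:
        rows[m.y].append(m)  # keywords with equal y share one row
    x_array = [sorted(rows[y], key=lambda m: m.x) for y in sorted(rows)]
    return y_list, x_array


if __name__ == "__main__":
    receipt = [
        ("Transaction date", (10, 80)),
        ("Payee account number", (300, 40)),
        ("Payment account number", (10, 40)),
    ]
    y_list, x_array = integrate(match_keywords(receipt))
    print([m.field for m in y_list])
    print([[m.field for m in row] for row in x_array])
```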
Constructing the information extraction template according to the ordinate list and the abscissa two-dimensional array of the keyword information comprises the following steps:
according to the ordinate list of the keyword information, sequentially taking each piece of keyword information and its coordinate information, together with the corresponding front keyword information and its coordinate information and the corresponding rear keyword information and its coordinate information;
calculating, based on these, the distance ratios between each piece of keyword information in the ordinate list and its front keyword information and rear keyword information, respectively;
according to the abscissa two-dimensional array of the keyword information, sequentially taking each piece of keyword information and its coordinate information, together with the rear keyword information and its coordinate information;
calculating, based on these, the distance ratio between each piece of keyword information in the abscissa two-dimensional array and its rear keyword information;
and constructing the information extraction template according to the distance ratios between each piece of keyword information in the ordinate list and its front and rear keyword information, and the distance ratio between each piece of keyword information in the abscissa two-dimensional array and its rear keyword information.
Calculating the distance ratio between each piece of keyword information in the ordinate list and its front keyword information, based on the keyword information and its coordinate information and the corresponding front keyword information and its coordinate information, includes:
taking the quotient of the absolute value of the difference between the ordinates of the keyword information and the front keyword information, divided by the row height of the keyword information, as the distance ratio between the keyword information and the front keyword information in the ordinate list.
Calculating the distance ratio between each piece of keyword information in the abscissa two-dimensional array and its rear keyword information, based on the keyword information and its coordinate information and the rear keyword information and its coordinate information, includes:
taking the quotient of the absolute value of the difference between the abscissas of the keyword information and the rear keyword information, divided by the row height of the keyword information, as the distance ratio between the keyword information and the rear keyword information in the abscissa two-dimensional array.
The main content of the template is the template name of the receipt, the keyword name of each extraction index, the names of the keywords above, below, and to the right of that keyword, and the corresponding distance ratio values (as shown in Table 1). The detailed steps are as follows:
(1) Determine the bank to which the current receipt belongs: identify the bank keyword with the minimum coordinates (x, y), read forward from it until a punctuation mark or blank is reached, and record the resulting character string as the bank to which the receipt belongs.
(2) Traverse the keyword library compiled above and match each keyword in turn against the receipt content. If a keyword is matched, record it and move on to the next keyword; if no keyword is matched, record the receipt and manually supplement the keyword library.
(3) For each matched keyword, record the keyword together with the upper-left coordinates (x, y) and lower-left coordinates (x1, y1) of its text box.
(4) Sort all recorded keywords in ascending order of y value to form a list Y.
(5) Group the recorded keywords whose y values are equal into rows, sorted in ascending order, to form a two-dimensional array X.
(6) From list Y, take each keyword in turn together with its left and right neighbours in the list. The left neighbour is recorded as the upper keyword name, and the distance ratio to the upper keyword is |y of the keyword - y1 of the left neighbour| / keyword height, where the keyword height is |y - y1|. The right neighbour is recorded as the lower keyword name, and the distance ratio to the lower keyword is |y1 of the keyword - y of the right neighbour| / keyword height. Where no neighbour exists, the value is recorded as null.
(7) From the two-dimensional array X, take the word following each keyword in turn and record it as the right keyword name. The distance ratio to the right keyword is |x of the keyword - x of the right keyword| / keyword height, where the keyword height is |y - y1|. Where no following word exists, the value is recorded as null.
(8) Determine whether the keyword and its position information already exist in the receipt template configuration table; if so, store them in the table; if not, form a new template.
Table 1. Receipt template configuration table

| Field code | Field name | Data type |
| --- | --- | --- |
| ID | Inner code | STRING |
| TEMPLATE_NAME | Template name | STRING |
| TEMPLATE_KEY | Keyword name | STRING |
| UP_KEY | Upper keyword name | STRING |
| UP_RANGE | Distance ratio to upper keyword | NUMBER |
| DOWN_KEY | Lower keyword name | STRING |
| DOWN_RANGE | Distance ratio to lower keyword | NUMBER |
| right_KEY | Right keyword name | STRING |
| right_RANGE | Distance ratio to right keyword | NUMBER |
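One row of the configuration table above could be held in memory as a small record type. The following is only a sketch: the class name `TemplateRow` and the lower-case attribute names are hypothetical mappings of the Table 1 field codes, and optional fields model the null values recorded when a neighbouring keyword is absent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TemplateRow:
    """One keyword entry of a receipt template (fields mirror Table 1)."""
    id: str                              # ID
    template_name: str                   # TEMPLATE_NAME
    template_key: str                    # TEMPLATE_KEY
    up_key: Optional[str] = None         # UP_KEY
    up_range: Optional[float] = None     # UP_RANGE
    down_key: Optional[str] = None       # DOWN_KEY
    down_range: Optional[float] = None   # DOWN_RANGE
    right_key: Optional[str] = None      # right_KEY
    right_range: Optional[float] = None  # right_RANGE


# A fully populated row and a row whose neighbours are all null.
full_row = TemplateRow("1", "xx bank", "Payer name",
                       "Billing date", 0.83,
                       "Payer account number", 0.88,
                       "Payee name", 27.92)
sparse_row = TemplateRow("8", "xx bank", "Billing date")
```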
Step 3 includes:
traversing, based on the keyword library, the corresponding keyword information and its coordinate information in the information extraction template, together with the keyword information on the vertically upper side, the vertically lower side and the rear side of that keyword information and the coordinate information thereof;
obtaining a first key point according to the keyword information and its coordinate information and the keyword information on the vertically upper side of the keyword information and its coordinate information;
obtaining a second key point according to the keyword information on the vertically lower side of the keyword information and its coordinate information and the keyword information on the rear side of the keyword information and its coordinate information;
generating a rectangle with the line connecting the first key point and the second key point as its diagonal;
and extracting the data inside the rectangle as the key extraction information.
Obtaining the first key point according to the keyword information and its coordinate information and the keyword information on the vertically upper side of the keyword information and its coordinate information comprises:
taking the smallest abscissa of the keyword information as the abscissa of the first key point, and taking the ordinate of the keyword information on the vertically upper side as the ordinate of the first key point, to obtain the first key point.
Obtaining the second key point according to the keyword information on the vertically lower side of the keyword information and its coordinate information and the keyword information on the rear side of the keyword information and its coordinate information comprises:
taking the largest ordinate of the keyword information on the vertically lower side as the ordinate of the second key point, and taking the smallest abscissa of the keyword information on the rear side as the abscissa of the second key point, to obtain the second key point.
Specifically, the information extraction process in step 3 is as follows. First, the "bank" keyword is identified and all information extraction templates under that bank name are found; if no template is found, the process returns to template construction. Among the templates found for the bank, the one that matches the largest number of keyword names is taken as the corresponding template. For each keyword to be extracted, the names and positions of the keywords above, below and to the right of it are found according to the template, and the lower boundary of the upper keyword, the upper boundary of the lower keyword, the left boundary of the right keyword and the left boundary of the keyword itself are taken according to the template configuration table. These boundaries delimit the region whose content is extracted. The detailed steps are as follows:
(1) Find the template of the bank receipt. Identify the "bank" keyword with the smallest coordinates (x, y), take the characters preceding it up to the nearest punctuation mark or blank to obtain a character string, and find the corresponding bank template configuration information in the template library; if no configuration information is found, return to construct an information extraction template;
(2) Among all templates under the bank, take the template that matches the largest number of keyword names as the corresponding receipt template (one bank may have several different templates because of different businesses, different systems and the like);
(3) Traverse each keyword in the template;
(4) Take the keyword, the keywords above, below and to the right of it, and their coordinates;
(5) Take the x minimum of the keyword and the y minimum of the upper keyword to obtain a point (if there is no upper keyword, or it is not found, y = the y minimum of the keyword - (the keyword row height × the distance ratio to the upper keyword in the template configuration table));
(6) Take the y maximum of the lower keyword (if there is no lower keyword, or it is not found, y = the y maximum of the keyword + (the keyword row height × the distance ratio to the lower keyword in the template configuration table)) and the x minimum of the right keyword (if there is no right keyword, or it is not found, x = the x minimum of the keyword + (the keyword row height × the distance ratio to the right keyword in the template configuration table)) to obtain a point;
(7) Form a rectangle with the line between the two points as its diagonal;
(8) Take out the data inside the rectangle as the key information;
(9) Repeat the above steps until all key information has been extracted.
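Steps (5) to (7) above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the `Box` type and the `key_points` function are hypothetical names, a box is assumed to be axis-aligned, and the sketch follows the numbered steps as written (y minimum of the upper keyword, y maximum of the lower keyword, x minimum of the right keyword), falling back to the stored distance ratios when a neighbouring keyword is missing.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Box:
    """Axis-aligned text box of a keyword (hypothetical helper type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def row_height(self) -> float:
        return self.y_max - self.y_min


def key_points(keyword: Box,
               upper: Optional[Box], lower: Optional[Box], right: Optional[Box],
               up_ratio: float, down_ratio: float, right_ratio: float
               ) -> Tuple[Point, Point]:
    """Return the two diagonal corner points of the extraction rectangle."""
    h = keyword.row_height
    # Step (5): first point, with the ratio-based fallback for a missing upper keyword.
    y1 = upper.y_min if upper is not None else keyword.y_min - h * up_ratio
    first = (keyword.x_min, y1)
    # Step (6): second point, with ratio-based fallbacks for missing lower/right keywords.
    y2 = lower.y_max if lower is not None else keyword.y_max + h * down_ratio
    x2 = right.x_min if right is not None else keyword.x_min + h * right_ratio
    return first, (x2, y2)


# Illustrative call: the upper keyword is missing, lower and right keywords are found.
payer_name = Box(29, 189, 144, 213)
first, second = key_points(payer_name, None,
                           Box(29, 234, 376, 254), Box(699, 189, 824, 212),
                           up_ratio=0.83, down_ratio=0.88, right_ratio=27.92)
```

The two returned points are opposite corners, so the rectangle of step (7) spans x from `first[0]` to `second[0]` and y from `first[1]` to `second[1]`.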
Example 2:
This embodiment of the invention is a bank receipt information extraction method, as shown in fig. 2, which specifically includes:
S1: compiling the keywords and synonyms used in bank receipts;
S2: converting the bank receipt into a structured character string through an OCR algorithm;
S3: automatically generating a template;
S4: extracting the key indexes of the bank receipt according to the template.
Step S3, as shown in fig. 3, includes:
3.1 In the character string converted from the receipt, shown in fig. 4, the bank to which the receipt belongs is found through the keyword "bank" to be "xx bank";
3.2 According to the keyword list of step S1, each keyword and its position in the receipt are extracted, as shown in fig. 5, with the following result:
Payer name [29, 189], [144, 189], [144, 213], [29, 213]
Payer account number [29, 234], [376, 234], [376, 254], [29, 254]
Payer account-opening bank [29, 274], [367, 272], [367, 294], [29, 296]
Payee name [699, 189], [824, 189], [824, 212], [699, 212]
Payee account number [699, 234], [1046, 234], [1045, 261], [699, 255]
Payee account-opening bank [699, 274], [1040, 277], [1040, 297], [699, 294]
Amount [318, 317], [415, 317], [415, 337], [318, 337]
Remarks [29, 360], [360, 360], [360, 381], [29, 381]
Billing date [29, 148], [120, 148], [120, 169], [29, 169]
3.3 For each keyword, the first coordinate point, namely the upper left coordinate point, is taken, with the following result:
Payer name [29, 189]
Payer account number [29, 234]
Payer account-opening bank [29, 274]
Payee name [699, 189]
Payee account number [699, 234]
Payee account-opening bank [699, 274]
Amount [318, 317]
Remarks [29, 360]
Billing date [29, 148]
3.4 All recorded keywords are sorted by y value from small to large to form list Y:
List Y
[
Billing date [29, 148],
Payer name [29, 189],
Payee name [699, 189],
Payer account number [29, 234],
Payee account number [699, 234],
Payer account-opening bank [29, 274],
Payee account-opening bank [699, 274],
Amount [318, 317],
Remarks [29, 360]
]
3.5 The recorded keywords with equal y values are grouped, and each group is sorted by x value from small to large, to form two-dimensional array X:
Two-dimensional array X
[
[ Billing date [29, 148] ],
[ Payer name [29, 189], Payee name [699, 189] ],
[ Payer account number [29, 234], Payee account number [699, 234] ],
[ Payer account-opening bank [29, 274], Payee account-opening bank [699, 274] ],
[ Amount [318, 317] ],
[ Remarks [29, 360] ]
]
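Steps 3.4 and 3.5 can be reproduced with a short Python sketch that uses the upper left points listed in step 3.3. The function names are hypothetical, and grouping by exactly equal y values is taken from the text; real OCR output might need a tolerance, which is not modelled here.

```python
from itertools import groupby
from typing import Dict, List, Tuple

Point = Tuple[int, int]
Entry = Tuple[str, Point]

points: Dict[str, Point] = {
    "Payer name": (29, 189),
    "Payer account number": (29, 234),
    "Payer account-opening bank": (29, 274),
    "Payee name": (699, 189),
    "Payee account number": (699, 234),
    "Payee account-opening bank": (699, 274),
    "Amount": (318, 317),
    "Remarks": (29, 360),
    "Billing date": (29, 148),
}


def build_list_y(pts: Dict[str, Point]) -> List[Entry]:
    """Sort keywords by y ascending; ties are broken by x ascending."""
    return sorted(pts.items(), key=lambda kv: (kv[1][1], kv[1][0]))


def build_array_x(pts: Dict[str, Point]) -> List[List[Entry]]:
    """Group keywords that share a y value into rows, each row sorted by x."""
    ordered = build_list_y(pts)
    return [list(row) for _, row in groupby(ordered, key=lambda kv: kv[1][1])]


list_y = build_list_y(points)
array_x = build_array_x(points)
```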
3.6 Each keyword is traversed and its two adjacent values are taken from list Y; the left one is the upper value and the right one is the lower value. For example, for "payer name" the upper value is "billing date" and the lower value is "payer account number";
3.7 The next value after the keyword is taken from two-dimensional array X as the right value. For example, for "payer name" the right value is "payee name";
3.8 All keywords are traversed to form the bank template shown in Table 2 below;
Table 2. xx bank receipt template configuration information

| Inner code | Bank name | Extraction-index keyword | Upper keyword | Distance ratio to upper keyword | Lower keyword | Distance ratio to lower keyword | Right keyword | Distance ratio to right keyword |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | xx bank | Payer name | Billing date | 0.83 | Payer account number | 0.88 | Payee name | 27.92 |
| 2 | xx bank | Payer account number | Payee name | 1.10 | Payer account-opening bank | 1.00 | Payee account number | 33.50 |
| 3 | xx bank | Payer account-opening bank | Payee account number | 0.65 | Amount | 1.15 | Payee account-opening bank | 33.50 |
| 4 | xx bank | Payee name | Billing date | 0.87 | Payer account number | 0.96 | | |
| 5 | xx bank | Payee account number | Payee name | 0.81 | Payer account-opening bank | 0.48 | | |
| 6 | xx bank | Payee account-opening bank | Payee account number | 0.56 | Amount | 0.87 | | |
| 7 | xx bank | Amount | Payee account-opening bank | 1.00 | Remarks | 1.15 | | |
| 8 | xx bank | Billing date | | | Payer name | 0.95 | | |
| 9 | xx bank | Remarks | Amount | 1.10 | | | | |
Step S4, as shown in fig. 6, includes:
4.1 The "bank" keyword is identified, the bank to which the receipt belongs is found through it to be "xx bank", and the corresponding information extraction templates of that bank are found in the template library, as shown in Table 3 below:
Table 3. Information extraction template

| Inner code | Bank name | Extraction-index keyword | Upper keyword | Distance ratio to upper keyword | Lower keyword | Distance ratio to lower keyword | Right keyword | Distance ratio to right keyword |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | xx Bank 1 | Payer name | Billing date | 0.83 | Payer account number | 0.88 | Payee name | 27.92 |
| 2 | xx Bank 1 | Payer account number | Payee name | 1.10 | Payer account-opening bank | 1.00 | Payee account number | 33.50 |
| 3 | xx Bank 1 | Payer account-opening bank | Payee account number | 0.65 | Amount | 1.15 | Payee account-opening bank | 33.50 |
| 4 | xx Bank 1 | Payee name | Billing date | 0.87 | Payer account number | 0.96 | | |
| 5 | xx Bank 1 | Payee account number | Payee name | 0.81 | Payer account-opening bank | 0.48 | | |
| 6 | xx Bank 1 | Payee account-opening bank | Payee account number | 0.56 | Amount | 0.87 | | |
| 7 | xx Bank 1 | Amount | Payee account-opening bank | 1.00 | Remarks | 1.15 | | |
| 8 | xx Bank 1 | Billing date | | | Payer name | 0.95 | | |
| 9 | xx Bank 1 | Remarks | Amount | 1.10 | | | | |
| 10 | xx Bank 2 | Payer name | Billing date | 0.85 | Payer account number | 0.89 | Payee name | 27 |
| 11 | xx Bank 2 | Payer account number | Name | 1.12 | Payer account-opening bank | 1.08 | Payee account number | 33 |
| 12 | xx Bank 2 | Payer account-opening bank | Account number | 0.63 | Amount | 1.25 | Payee account-opening bank | 33.2 |
| 13 | xx Bank 2 | Payee name | Billing date | 0.87 | Payer account number | 0.96 | | |
| 14 | xx Bank 2 | Payee account number | Payee name | 0.81 | Payer account-opening bank | 0.48 | | |
| 15 | xx Bank 2 | Payee account-opening bank | Payee account number | 0.56 | Amount (in words) | 0.87 | | |
| 16 | xx Bank 2 | Amount (in words) | Payee account-opening bank | 1.00 | Purpose | 1.15 | | |
| 17 | xx Bank 2 | Billing date | | | Payer name | 0.95 | | |
| 18 | xx Bank 2 | Purpose | Amount (in words) | 1.10 | | | | |
4.2 xx bank has two sets of templates, xx bank 1 and xx bank 2. When the template keywords are matched against the receipt, xx bank 1 matches 8 keywords and xx bank 2 matches 6 keywords, so xx bank 1 is taken as the template of the receipt.
4.3 Each keyword in the template is traversed.
4.4 The keyword, the names of the keywords above, below and to the right of it, and their corresponding coordinates are taken. The result for "payer name" is as follows:
Upper side: not found, because the keyword is blocked.
Keyword: payer name [29, 189], [144, 189], [144, 213], [29, 213]
Right side: payee name [699, 189], [824, 189], [824, 212], [699, 212]
Lower side: payer account number [29, 234], [376, 234], [376, 254], [29, 254]
4.5 The x minimum of the keyword is taken as the abscissa (29). Because the upper keyword is not found, the ordinate is the y minimum of the keyword minus the row height multiplied by the distance ratio to the upper keyword in the configuration table, i.e. 189 - 24 × 0.83 ≈ 169, which gives the point [29, 169];
4.6 The x minimum of the right keyword is taken as the abscissa and the y maximum of the lower keyword as the ordinate, which gives the point [699, 254];
4.7 A rectangle is formed with the line between the two points as its diagonal; its four corner points are [29, 169], [699, 169], [29, 254], [699, 254];
4.8 The data inside the rectangle is taken out as the key information. The data in this range is "payer name: xxx limited"; after the keyword and punctuation marks are removed, as shown in fig. 7, the key information finally extracted is: xxx limited.
4.9 All key information is extracted by repeating the above steps.
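The template choice of step 4.2 amounts to counting, for each template of the bank, how many of its keywords appear in the receipt and keeping the template with the highest count. A minimal Python sketch follows; the function name, the reduced keyword sets and the tie-breaking rule (the first template with the highest count wins) are assumptions made for illustration and do not reproduce the full tables above.

```python
from typing import Dict, Optional, Set


def select_template(templates: Dict[str, Set[str]],
                    receipt_keywords: Set[str]) -> Optional[str]:
    """Return the name of the template sharing the most keywords with the receipt.

    Returns None when no template matches any keyword, which corresponds to
    going back to template construction.
    """
    best_name: Optional[str] = None
    best_hits = 0
    for name, keys in templates.items():
        hits = len(keys & receipt_keywords)
        if hits > best_hits:
            best_name, best_hits = name, hits
    return best_name


# Reduced, illustrative keyword sets for two templates of the same bank.
templates = {
    "xx bank 1": {"Payer name", "Payee name", "Amount", "Remarks"},
    "xx bank 2": {"Payer name", "Payee name", "Amount (in words)", "Purpose"},
}
found = {"Payer name", "Payee name", "Amount", "Remarks", "Billing date"}
chosen = select_template(templates, found)
```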
Example 3:
The invention provides a bank receipt information extraction system, whose schematic structure is shown in fig. 8, comprising:
a data conversion module, configured to perform data identification on the acquired bank receipt data and to structurally convert the unstructured data in the bank receipt data to obtain receipt string data corresponding to the bank receipt data;
a name acquisition module, configured to obtain a bank name from the receipt string data;
a template selection module, configured to call the information extraction template corresponding to the bank name according to the bank name;
an information extraction module, configured to extract information from the bank receipt data according to the information extraction template corresponding to the bank name.
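How the four modules hand data to one another can be sketched as a simple pipeline. Everything in the following Python sketch is hypothetical: the class name, the callable signatures and the toy stand-ins used in the usage example are illustrative assumptions, not the modules of the system itself.

```python
from typing import Any, Callable, Dict, List


class ReceiptPipeline:
    """Chains the four modules: conversion, name acquisition, template selection, extraction."""

    def __init__(self,
                 convert: Callable[[Any], List[str]],
                 get_bank_name: Callable[[List[str]], str],
                 select_template: Callable[[str], Dict[str, Any]],
                 extract: Callable[[List[str], Dict[str, Any]], Dict[str, str]]) -> None:
        self.convert = convert
        self.get_bank_name = get_bank_name
        self.select_template = select_template
        self.extract = extract

    def run(self, receipt_data: Any) -> Dict[str, str]:
        receipt_strings = self.convert(receipt_data)       # data conversion module
        bank_name = self.get_bank_name(receipt_strings)    # name acquisition module
        template = self.select_template(bank_name)         # template selection module
        return self.extract(receipt_strings, template)     # information extraction module


def toy_extract(lines: List[str], template: Dict[str, Any]) -> Dict[str, str]:
    """Toy stand-in: reads 'key: value' lines instead of using coordinates."""
    out: Dict[str, str] = {}
    for key in template["keys"]:
        for line in lines:
            if line.startswith(key + ":"):
                out[key] = line[len(key) + 1:].strip()
    return out


pipeline = ReceiptPipeline(
    convert=lambda data: data.split("\n"),
    get_bank_name=lambda lines: next(l for l in lines if "bank" in l),
    select_template=lambda bank: {"bank": bank, "keys": ["Payer name"]},
    extract=toy_extract,
)
result = pipeline.run("xx bank\nPayer name: xxx limited")
```

Passing the modules in as callables keeps each stage replaceable, for example swapping the toy extraction stand-in for a coordinate-based one.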
The information extraction template in the template selection module is constructed as follows:
classifying according to different bank names in the historical bank receipt sample information to obtain receipt sample information under different bank names;
traversing and matching keyword information in the keyword library in turn in receipt sample information under different bank names based on a pre-constructed keyword library to obtain matched keyword information under different bank names and coordinate information of the keyword information;
carrying out data integration based on the keyword information matched under different bank names and the coordinate information of the keyword information to obtain a vertical coordinate list and a horizontal coordinate two-dimensional array of the keyword information;
and constructing an information extraction template according to the ordinate list and the abscissa two-dimensional array of the keyword information.
Constructing, in the template selection module, an information extraction template according to the ordinate list and the abscissa two-dimensional array of the keyword information comprises:
sequentially taking each keyword information and coordinate information thereof, front side keyword information and coordinate information thereof corresponding to each keyword information, and rear side keyword and coordinate information thereof according to the ordinate list of the keyword information;
calculating the distance ratio between each keyword information in the ordinate list and the front keyword information and the rear keyword information respectively based on the keyword information and the coordinate information thereof, the front keyword information and the coordinate information thereof corresponding to each keyword information and the rear keyword and the coordinate information thereof;
according to the abscissa two-dimensional array of the keyword information, sequentially taking each keyword information and coordinate information thereof, and rear side keyword information and coordinate information thereof;
calculating the distance ratio of each keyword information and the rear keyword information in the abscissa two-dimensional array based on the each keyword information and the coordinate information thereof, the rear keyword information and the coordinate information thereof;
and constructing an information extraction template according to the distance ratio of each keyword information in the ordinate list to the front keyword information and the rear keyword information and the distance ratio of each keyword information in the abscissa two-dimensional array to the rear keyword information.
Calculating, in the template selection module, the distance ratio between each piece of keyword information in the ordinate list and its front keyword information, based on each piece of keyword information and its coordinate information and on the corresponding front keyword information and its coordinate information, comprises:
taking, as the distance ratio between the keyword information and the front keyword information in the ordinate list, the quotient of the absolute value of the difference between the ordinates of the keyword information and the front keyword information divided by the row height of the keyword information.
Calculating, in the template selection module, the distance ratio between each piece of keyword information and its rear keyword information in the abscissa two-dimensional array, based on each piece of keyword information and its coordinate information and on the rear keyword information and its coordinate information, comprises:
taking, as the distance ratio between the keyword information and the rear keyword information in the abscissa two-dimensional array, the quotient of the absolute value of the difference between the abscissas of the keyword information and the rear keyword information divided by the row height of the keyword information.
Extracting, by the information extraction module, information from the bank receipt data according to the information extraction template corresponding to the bank name comprises:
traversing, based on the keyword library, the corresponding keyword information and its coordinate information in the information extraction template, together with the keyword information on the vertically upper side, the vertically lower side and the rear side of that keyword information and the coordinate information thereof;
obtaining a first key point according to the keyword information and its coordinate information and the keyword information on the vertically upper side of the keyword information and its coordinate information;
obtaining a second key point according to the keyword information on the vertically lower side of the keyword information and its coordinate information and the keyword information on the rear side of the keyword information and its coordinate information;
generating a rectangle with the line connecting the first key point and the second key point as its diagonal;
and extracting the data inside the rectangle as the key extraction information.
Obtaining, by the information extraction module, the first key point according to the keyword information and its coordinate information and the keyword information on the vertically upper side of the keyword information and its coordinate information comprises:
taking the smallest abscissa of the keyword information as the abscissa of the first key point, and taking the ordinate of the keyword information on the vertically upper side as the ordinate of the first key point, to obtain the first key point.
Obtaining, by the information extraction module, the second key point according to the keyword information on the vertically lower side of the keyword information and its coordinate information and the keyword information on the rear side of the keyword information and its coordinate information comprises:
taking the largest ordinate of the keyword information on the vertically lower side as the ordinate of the second key point, and taking the smallest abscissa of the keyword information on the rear side as the abscissa of the second key point, to obtain the second key point.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (9)

1. A bank receipt information extraction method, characterized by comprising the following steps:
carrying out data identification on the acquired bank receipt data, and carrying out structural conversion on unstructured data in the bank receipt data to obtain receipt string data corresponding to the bank receipt data;
obtaining a bank name according to the receipt string data;
calling, according to the bank name, an information extraction template corresponding to the bank name;
and extracting information from the bank receipt data according to the information extraction template corresponding to the bank name.
2. The method of claim 1, wherein the information extraction template comprises a construction process of:
classifying according to different bank names in the historical bank receipt sample information to obtain receipt sample information under different bank names;
traversing and matching keyword information in the keyword library in turn in receipt sample information under different bank names based on a pre-constructed keyword library to obtain matched keyword information under different bank names and coordinate information of the keyword information;
carrying out data integration based on the keyword information matched under different bank names and the coordinate information of the keyword information to obtain a vertical coordinate list and a horizontal coordinate two-dimensional array of the keyword information;
and constructing an information extraction template according to the ordinate list and the abscissa two-dimensional array of the keyword information.
3. The method of claim 2, wherein constructing an information extraction template from the list of ordinates and the two-dimensional array of abscissas of the keyword information comprises:
sequentially taking each keyword information and coordinate information thereof, front side keyword information and coordinate information thereof corresponding to each keyword information, and rear side keyword and coordinate information thereof according to the ordinate list of the keyword information;
calculating the distance proportion between each keyword information in the ordinate list and the front keyword information and the rear keyword information respectively based on the keyword information and the coordinate information thereof, the front keyword information and the coordinate information thereof corresponding to each keyword information and the rear keyword and the coordinate information thereof;
according to the abscissa two-dimensional array of the keyword information, sequentially taking each keyword information and coordinate information thereof, and rear side keyword information and coordinate information thereof;
calculating the distance ratio of each keyword information and the rear keyword information in the abscissa two-dimensional array based on the each keyword information and the coordinate information thereof, the rear keyword information and the coordinate information thereof;
and constructing an information extraction template according to the distance ratio of each keyword information in the ordinate list to the front keyword information and the rear keyword information and the distance ratio of each keyword information in the abscissa two-dimensional array to the rear keyword information.
4. The method of claim 3, wherein calculating the distance ratio between each piece of keyword information in the ordinate list and its front keyword information, based on each piece of keyword information and its coordinate information and on the corresponding front keyword information and its coordinate information, comprises:
taking, as the distance ratio between the keyword information and the front keyword information in the ordinate list, the quotient of the absolute value of the difference between the ordinates of the keyword information and the front keyword information divided by the row height of the keyword information.
5. The method of claim 2, wherein the extracting information from the bank receipt data according to the information extraction template corresponding to the bank name comprises:
traversing, based on the keyword library, the corresponding keyword information and its coordinate information in the information extraction template, together with the keyword information on the vertically upper side, the vertically lower side and the rear side of that keyword information and the coordinate information thereof;
obtaining a first key point according to the key word information and the coordinate information thereof and the key word information and the coordinate information thereof on the vertical upper side of the key word information;
obtaining a second key point according to the vertical lower side key information and the coordinate information thereof of the key information and the rear side key information and the coordinate information thereof of the key information;
taking the connecting line of the first key point and the second key point as a diagonal line to generate a rectangle;
and extracting information by taking the data in the rectangle as key extraction information.
6. The method of claim 5, wherein the obtaining a first key point according to the key information and the coordinate information thereof and the key information and the coordinate information thereof on the vertical upper side of the key information comprises:
taking the smallest abscissa of the keyword information as the abscissa of the first key point, and taking the ordinate of the keyword information on the vertically upper side as the ordinate of the first key point, to obtain the first key point.
7. The method of claim 5, wherein the obtaining a second key point according to the keyword information on the vertical lower side of the keyword information and its coordinate information and the keyword information on the rear side of the keyword information and its coordinate information comprises:
taking the largest ordinate among the keyword information on the vertical lower side of the keyword information as the ordinate of the second key point, and taking the smallest abscissa among the keyword information on the rear side of the keyword information as the abscissa of the second key point, to obtain the second key point.
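The key-point and rectangle steps of claims 5 to 7 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Token` type, the function names, the image-style coordinate system (ordinate growing downward), and the choice to treat the rectangle boundary as exclusive are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Token:
    text: str
    x: float  # abscissa of the token
    y: float  # ordinate of the token

Point = Tuple[float, float]

def first_key_point(keyword: List[Token], upper: Token) -> Point:
    # Smallest abscissa among the keyword's tokens, ordinate of the upper keyword.
    return (min(t.x for t in keyword), upper.y)

def second_key_point(lower: List[Token], rear: List[Token]) -> Point:
    # Smallest abscissa among rear keywords, largest ordinate among lower keywords.
    return (min(t.x for t in rear), max(t.y for t in lower))

def extract_in_rectangle(tokens: List[Token], p1: Point, p2: Point) -> List[str]:
    # The segment p1-p2 is the diagonal of an axis-aligned rectangle.
    left, right = sorted((p1[0], p2[0]))
    top, bottom = sorted((p1[1], p2[1]))
    # Assumption: tokens on the boundary (the keywords themselves) are excluded.
    return [t.text for t in tokens if left < t.x < right and top < t.y < bottom]

tokens = [
    Token("Payee", 10, 30), Token("ACME", 80, 30),
    Token("Amount", 10, 50), Token("1,000.00", 80, 50),
    Token("Currency", 200, 50), Token("CNY", 260, 50),
    Token("Date", 10, 70),
]
p1 = first_key_point([tokens[2]], tokens[0])
p2 = second_key_point([tokens[6]], [tokens[4]])
values = extract_in_rectangle(tokens, p1, p2)
```

In this toy layout the rectangle spans from the "Amount" keyword to the "Currency" keyword horizontally and from the "Payee" row to the "Date" row vertically, so only the amount value falls strictly inside it.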
8. A system for extracting information from a bank receipt, comprising:
a data conversion module, configured to perform data recognition on acquired bank receipt data and to structurally convert unstructured data in the bank receipt data to obtain receipt string data corresponding to the bank receipt data;
a name acquisition module, configured to obtain a bank name from the receipt string data;
a template selection module, configured to retrieve, according to the bank name, the information extraction template corresponding to the bank name;
and an information extraction module, configured to extract information from the bank receipt data according to the information extraction template corresponding to the bank name.
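The four modules of claim 8 can be pictured as a simple pipeline. The sketch below is only a schematic under stated assumptions: recognition is assumed to have already produced text lines, a template is modeled as a hypothetical callable that maps the receipt string to extracted fields, and all function names are invented for the example.

```python
from typing import Callable, Dict, List, Optional

Template = Callable[[str], Dict[str, str]]  # hypothetical template interface

def convert(raw_lines: List[str]) -> str:
    # Data conversion stand-in: join recognized, non-empty lines into one receipt string.
    return "\n".join(line.strip() for line in raw_lines if line.strip())

def acquire_bank_name(receipt_string: str, known_banks: List[str]) -> Optional[str]:
    # Name acquisition stand-in: first known bank name that appears in the string.
    for name in known_banks:
        if name in receipt_string:
            return name
    return None

def extract_receipt(raw_lines: List[str], templates: Dict[str, Template]) -> Dict[str, str]:
    receipt_string = convert(raw_lines)
    bank_name = acquire_bank_name(receipt_string, list(templates))
    if bank_name is None:
        raise LookupError("no template for this receipt")
    template = templates[bank_name]   # template selection
    return template(receipt_string)   # information extraction

def toy_template(receipt_string: str) -> Dict[str, str]:
    # Hypothetical template: take the first word after "Amount:".
    return {"amount": receipt_string.split("Amount:")[1].split()[0]}

result = extract_receipt(
    ["  Example Bank Receipt ", "", "Amount: 1000.00 CNY"],
    {"Example Bank": toy_template},
)
```

Keeping templates in a mapping keyed by bank name mirrors the claim's structure: adding support for a new bank means adding one entry rather than changing the pipeline.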
9. The system of claim 8, wherein the information extraction template in the template selection module is built by a process comprising:
classifying historical bank receipt sample information by bank name to obtain receipt sample information under each bank name;
based on a pre-constructed keyword library, traversing the receipt sample information under each bank name and matching the keyword information in the keyword library in turn, to obtain the matched keyword information under each bank name and the coordinate information of the keyword information;
integrating the keyword information matched under each bank name and the coordinate information of the keyword information to obtain an ordinate list and an abscissa two-dimensional array of the keyword information;
and constructing the information extraction template according to the ordinate list and the abscissa two-dimensional array of the keyword information.
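One possible shape for the template-building process of claim 9 is sketched below. The claim does not define the data layout, so this example assumes that the ordinate list holds one entry per keyword row and that the abscissa two-dimensional array holds, for each row, the abscissas of the keywords on that row. It also assumes that keywords on the same row share exactly the same ordinate, which real recognition output would only approximate. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Token = Tuple[str, float, float]  # (text, abscissa, ordinate)

def build_templates(
    samples: Iterable[Tuple[str, List[Token]]],
    keyword_library: Set[str],
) -> Dict[str, dict]:
    # Classify samples by bank name and collect matched keywords per row (ordinate).
    rows_by_bank = defaultdict(lambda: defaultdict(set))
    for bank_name, tokens in samples:
        for text, x, y in tokens:
            if text in keyword_library:
                rows_by_bank[bank_name][y].add((x, text))

    templates = {}
    for bank_name, rows in rows_by_bank.items():
        ordinates = sorted(rows)  # ordinate list, top to bottom
        # Abscissa 2D array: one inner list per row, sorted left to right.
        abscissas = [[x for x, _ in sorted(rows[y])] for y in ordinates]
        keywords = [[text for _, text in sorted(rows[y])] for y in ordinates]
        templates[bank_name] = {
            "ordinates": ordinates,
            "abscissas": abscissas,
            "keywords": keywords,
        }
    return templates

templates = build_templates(
    [
        ("Bank A", [("Payer", 10, 30), ("Payee", 200, 30), ("ACME", 80, 30), ("Amount", 10, 50)]),
        ("Bank B", [("Amount", 15, 40)]),
    ],
    {"Payer", "Payee", "Amount"},
)
```

Tokens that are not in the keyword library, such as the value "ACME" here, are ignored when the template is built; only keyword positions are recorded.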
CN202410028502.9A 2024-01-09 2024-01-09 Bank receipt information extraction method and system Active CN117540721B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410028502.9A CN117540721B (en) 2024-01-09 2024-01-09 Bank receipt information extraction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410028502.9A CN117540721B (en) 2024-01-09 2024-01-09 Bank receipt information extraction method and system

Publications (2)

Publication Number Publication Date
CN117540721A (en) 2024-02-09
CN117540721B (en) 2024-04-12

Family

ID=89782703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410028502.9A Active CN117540721B (en) 2024-01-09 2024-01-09 Bank receipt information extraction method and system

Country Status (1)

Country Link
CN (1) CN117540721B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130290270A1 (en) * 2012-04-26 2013-10-31 Anu Pareek Method and system of data extraction from a portable document format file
CN108376365A (en) * 2018-03-22 2018-08-07 中国银行股份有限公司 A kind of Bank Number determines method and device
CN111428599A (en) * 2020-03-17 2020-07-17 北京公瑾科技有限公司 Bill identification method, device and equipment
CN113962197A (en) * 2021-08-19 2022-01-21 上海哥特网络技术有限公司 Medical laboratory test report standardization method and device, electronic equipment and storage medium
CN116740444A (en) * 2023-06-14 2023-09-12 中国银行股份有限公司 Information acquisition method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN117540721B (en) 2024-04-12

Similar Documents

Publication Publication Date Title
CN108960223B (en) Method for automatically generating voucher based on intelligent bill identification
CN109887153B (en) Finance and tax processing method and system
RU2679209C2 (en) Processing of electronic documents for invoices recognition
RU2695489C1 (en) Identification of fields on an image using artificial intelligence
US11232300B2 (en) System and method for automatic detection and verification of optical character recognition data
CN110889310B (en) Financial document information intelligent extraction system and method
US20210366055A1 (en) Systems and methods for generating accurate transaction data and manipulation
CN103177128A (en) Method and system for processing bill crown word number information
CN107133571A (en) A kind of system and method that paper invoice is automatically generated to financial statement
US20240046684A1 (en) System for Information Extraction from Form-Like Documents
CN111931780A (en) Intelligent management method and equipment for accounting documents
CN111444793A (en) Bill recognition method, equipment, storage medium and device based on OCR
CN112418812A (en) Distributed full-link automatic intelligent clearance system, method and storage medium
JP2019204535A (en) Accounting support system
CN111914729A (en) Voucher association method and device, computer equipment and storage medium
CN114511866A (en) Data auditing method, device, system, processor and machine-readable storage medium
Li et al. Image pattern recognition in identification of financial bills risk management
CN117540721B (en) Bank receipt information extraction method and system
CN115934963A (en) Business draft big data analysis method and application map for enterprise financial customer acquisition
US20220121881A1 (en) Systems and methods for enabling relevant data to be extracted from a plurality of documents
CN111241955B (en) Bill information extraction method and system
CN117608565B (en) Method and system for recommending AI type components in RPA (robotic process automation) based on screenshot analysis
CN111753841B (en) Bill identification method and device based on route distribution
US20220327502A1 (en) Enhanced image transaction processing solution and architecture
US20230409644A1 (en) Systems and method for generating labelled datasets

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant