CN112784720A - Key information extraction method, device, equipment and medium based on bank receipt
- Publication number
- CN112784720A (application CN202110042586.8A)
- Authority
- CN
- China
- Prior art keywords
- field
- key information
- character
- fields
- bank receipt
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Character Input (AREA)
Abstract
The invention discloses a key information extraction method based on a bank receipt, comprising: identifying initial text fields of a bank receipt; performing a cleaning operation on the initial text fields to obtain target text fields; establishing dynamic links between adjacent target text fields to generate text field combinations; and identifying the type of each text field combination and extracting key information of the bank receipt from each text field combination through a machine learning model. In this scheme, cleaning the initial text fields after they are identified can avoid problems such as erroneous or incomplete fields in the extracted key information, and building text field combinations can strengthen the association between fields, which improves the integrity and accuracy of the key information. The invention also discloses a key information extraction device, equipment and medium based on a bank receipt, which achieve the same technical effects.
Description
Technical Field
The invention relates to the technical field of information identification, in particular to a key information extraction method, a key information extraction device, key information extraction equipment and a key information extraction medium based on bank receipt.
Background
In recent years, practical applications of deep-learning-based text recognition have become very mature. In the field of bank receipt recognition, existing recognition algorithms perform optimization work on the deep-learning-based text recognition results. The central idea of this optimization is to obtain the key information fields of a bank receipt by keyword template matching. However, when deep-learning-based text recognition is applied to bank receipts, recognized fields are often split apart, so key information fields of the bank receipt are lost and the recognition performance is not robust enough.
Therefore, how to improve the integrity and accuracy of the key information in the bank receipt is a problem to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a method, a device, equipment and a medium for extracting key information based on a bank receipt, so as to improve the integrity and accuracy of the key information in the bank receipt.
In order to achieve the above object, the present invention provides a key information extraction method based on bank receipt, which comprises:
identifying an initial text field of a bank receipt;
cleaning the initial character field to obtain a target character field;
establishing dynamic link between adjacent target character fields to generate character field combination;
and identifying the type of each text field combination, and extracting key information of the bank receipt from each text field combination through a machine learning model.
Wherein, performing the cleaning operation on the initial text field comprises:
identifying stop words in the initial text field and deleting the stop words.
Wherein, performing the cleaning operation on the initial text field comprises:
identifying non-standard text fields among the initial text fields, and modifying the non-standard text fields through a pre-stored common word library.
Wherein, modifying the non-standard text fields through the pre-stored common word library comprises:
modifying the non-standard text field through a pre-stored company name lexicon; and/or modifying the non-standard text field through a pre-stored format regular-expression rule.
Wherein, identifying the type of each text field combination comprises:
determining the type of each text field combination through any one of a keyword classification algorithm, a word segmentation classification algorithm and a specific template classification algorithm.
If the type of a text field combination cannot be determined through the keyword classification algorithm, the word segmentation classification algorithm or the template classification algorithm, the key information extraction method further comprises:
determining the type of the text field combination through a language classification model.
Wherein, establishing dynamic links between adjacent target text fields comprises:
determining the position of each target text field;
establishing dynamic links between target text fields that lie in the same horizontal direction and are adjacent in position, and/or between target text fields that lie in the same vertical direction and are adjacent in position, and/or between target text fields that are adjacent in position but lie in neither the same horizontal direction nor the same vertical direction.
In order to achieve the above object, the present invention further provides a key information extraction device based on bank receipt, including:
the identification module is used for identifying the initial character field of the bank receipt;
the field cleaning module is used for executing cleaning operation on the initial character field to obtain a target character field;
the link establishing module is used for establishing dynamic link between adjacent target character fields to generate a character field combination;
the type identification module is used for identifying the type of each character field combination;
and the extraction module is used for extracting the key information of the bank receipt from each character field combination through a machine learning model.
To achieve the above object, the present invention further provides an electronic device comprising:
a memory for storing a computer program;
and the processor is used for realizing the steps of the key information extraction method based on the bank receipt when the computer program is executed.
In order to achieve the above object, the present invention further provides a computer-readable storage medium, which stores a computer program, and the computer program, when executed by a processor, implements the steps of the key information extraction method based on bank receipt.
As can be seen from the above scheme, the key information extraction method based on a bank receipt provided by the embodiment of the invention comprises: identifying initial text fields of a bank receipt; performing a cleaning operation on the initial text fields to obtain target text fields; establishing dynamic links between adjacent target text fields to generate text field combinations; and identifying the type of each text field combination and extracting key information of the bank receipt from each text field combination through a machine learning model. In this scheme, cleaning the initial text fields after they are identified can avoid problems such as erroneous or incomplete fields in the extracted key information, and building text field combinations can strengthen the association between fields, which improves the integrity and accuracy of the key information. The invention also discloses a key information extraction device, equipment and medium based on a bank receipt, which achieve the same technical effects.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a key information extraction method based on a bank receipt disclosed by the embodiment of the invention;
Fig. 2 is a schematic structural diagram of a key information extraction device based on a bank receipt according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a method, a device, equipment and a medium for extracting key information based on a bank receipt, which are used for improving the integrity and the accuracy of the key information in the bank receipt.
Referring to fig. 1, a schematic flow diagram of a key information extraction method based on a bank receipt provided by the embodiment of the present invention is shown; the method comprises the following steps:
S101, identifying initial text fields of a bank receipt;
In this scheme, the text fields in the bank receipt are identified first. Identified text fields on which the cleaning operation has not yet been performed are referred to as initial text fields, and they can be identified by a deep-learning-based text recognition method. In a bank receipt, the text fields may be, for example: payee, company name, account number, bank card number, etc.
S102, cleaning the initial character field to obtain a target character field;
In this embodiment, to avoid inaccurate or incomplete key information, a cleaning operation is performed on the initial text field after it is identified; the cleaned text field is referred to as the target text field.
Specifically, performing the cleaning operation on the initial text field comprises: identifying stop words in the initial text field and deleting them. For example, a stop word library is preset, which contains the words designated as stop words; if a stop word from the library is detected in the initial text field, it is deleted directly, thereby cleaning the stop words out of the initial text field.
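The stop word cleaning described above can be sketched as follows. This is a minimal illustration under assumptions: the contents of `STOP_WORDS` and the sample fields are hypothetical, since the scheme does not list the entries of its stop word library.

```python
from typing import Iterable, List

# Hypothetical stop word library; the scheme does not specify its contents.
STOP_WORDS = ("[stamp]", "***")


def remove_stop_words(field: str, stop_words: Iterable[str] = STOP_WORDS) -> str:
    """Delete every stop word found in an initial text field."""
    for word in stop_words:
        field = field.replace(word, "")
    # Collapse the whitespace left behind by the deletions.
    return " ".join(field.split())


def clean_fields(initial_fields: Iterable[str]) -> List[str]:
    """Clean each initial field and drop fields that become empty."""
    cleaned = (remove_stop_words(f) for f in initial_fields)
    return [f for f in cleaned if f]


target_fields = clean_fields(["Payee [stamp]", "*** XX Company", "[stamp]"])
```

Fields that consist only of stop words are dropped here; whether the scheme keeps or discards such fields is not stated, so this is a design choice of the sketch.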
Further, performing the cleaning operation on the initial text field also comprises: identifying non-standard text fields among the initial text fields and modifying them through a pre-stored common word library. In this scheme, non-standard text fields include fields that were recognized incorrectly, incompletely, and so on. The common word library includes a company name lexicon and format regular-expression rules, so a non-standard text field can be modified through the pre-stored company name lexicon and/or through the pre-stored format regular-expression rules. For example, if the identified non-standard text field is a misrecognized form of "Zhejiang XX network", it is first modified through the company name lexicon and then modified again through the format regular-expression rules.
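One possible reading of this two-stage correction is sketched below. Everything concrete here is an assumption: the lexicon entries are invented, fuzzy matching with `difflib.get_close_matches` is a stand-in for whatever lookup the scheme uses, and the account-number pattern is only one example of a format rule.

```python
import difflib
import re

# Hypothetical company name lexicon; the real entries are not given in the scheme.
COMPANY_LEXICON = [
    "Zhejiang Example Network Co., Ltd.",
    "Hangzhou Sample Trading Co., Ltd.",
]

# Hypothetical format rule: a field made only of digits, common OCR
# look-alikes (O, o, I, l), spaces and hyphens is treated as an account number.
ACCOUNT_LIKE = re.compile(r"^[\dOoIl\s-]{8,}$")
LOOKALIKES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1"})


def fix_company_name(field: str, cutoff: float = 0.8) -> str:
    """Replace a field with its closest lexicon entry, if one is close enough."""
    matches = difflib.get_close_matches(field, COMPANY_LEXICON, n=1, cutoff=cutoff)
    return matches[0] if matches else field


def fix_account_number(field: str) -> str:
    """Normalize an account-like field: map look-alikes to digits, drop separators."""
    if not ACCOUNT_LIKE.match(field):
        return field
    return re.sub(r"[\s-]", "", field.translate(LOOKALIKES))


def correct_field(field: str) -> str:
    """Apply the lexicon first, then the format rule, as in the example above."""
    return fix_account_number(fix_company_name(field))
```

The similarity cutoff controls how aggressively fields are rewritten; a value that is too low would overwrite legitimate company names that merely resemble a lexicon entry.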
S103, establishing dynamic links between adjacent target character fields to generate character field combinations;
The process of establishing dynamic links between adjacent target text fields specifically comprises: determining the position of each target text field; and establishing dynamic links between target text fields that lie in the same horizontal direction and are adjacent in position, and/or between target text fields that lie in the same vertical direction and are adjacent in position, and/or between target text fields that are adjacent in position but lie in neither the same horizontal direction nor the same vertical direction.
Specifically, the position of each target text field in this scheme is its coordinate position in the bank receipt, from which the positional relationship between target text fields can be derived. Adjacent target text fields in this scheme are target text fields whose distance is smaller than a preset threshold. If the distance between two target text fields is too large, no dynamic link needs to be established between them. If the positional relationship shows that the distance between two target text fields is smaller than the preset threshold, the two fields are judged to be associated and a dynamic link is established between them. For example, if the field "payee" and the field "XX company" are adjacent, a dynamic link is established between them, and the generated text field combination is "payee XX company".
It should be noted that the dynamic link in this scheme may specifically be a text space link, which includes establishing dynamic links between adjacent target text fields in different directions. The dynamic links between adjacent text fields therefore include the shortest same-row coordinate distance link, the shortest vertical coordinate distance link, the shortest adjacent coordinate distance link, and so on, and can be customized according to actual requirements in practical applications. The shortest same-row coordinate distance link establishes dynamic links between target text fields that lie in the same horizontal direction and are adjacent in position; for example, if "payee" is adjacent to "XX company" in the horizontal direction, a link is established between them to generate the text field combination "payee XX company". The shortest vertical coordinate distance link establishes dynamic links between target text fields that lie in the same vertical direction and are adjacent in position; for example, if "payee" is adjacent to "XX company" in the vertical direction, a link is established between them to generate the text field combination "payee XX company". The shortest adjacent coordinate distance link establishes dynamic links between target text fields that are adjacent in position but lie in neither the same horizontal direction nor the same vertical direction; for example, if "XX company" is adjacent to "payee" in neither the horizontal nor the vertical direction but sits at the lower right corner next to "payee", the two can still be judged adjacent, and a link is established between them to generate the text field combination "payee XX company".
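The three link types can be illustrated with a small coordinate-based sketch. It assumes each target text field is reduced to a center point, that distance is Euclidean, and that `max_dist` and `tol` are hypothetical stand-ins for the preset threshold and for the tolerance that decides whether two fields share a row or column; none of these values comes from the scheme.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple


@dataclass(frozen=True)
class Field:
    text: str
    x: float  # center x coordinate on the receipt
    y: float  # center y coordinate, growing downward


def link_kind(a: Field, b: Field, tol: float = 5.0) -> str:
    """Name the link type from the relative position of two fields."""
    if abs(a.y - b.y) <= tol:
        return "same_row"
    if abs(a.x - b.x) <= tol:
        return "same_column"
    return "diagonal"


def build_links(fields: List[Field], max_dist: float = 120.0,
                tol: float = 5.0) -> List[Tuple[str, str]]:
    """Link every pair of fields closer than max_dist; return (combination, kind)."""
    links = []
    for i, a in enumerate(fields):
        for b in fields[i + 1:]:
            if hypot(a.x - b.x, a.y - b.y) >= max_dist:
                continue  # too far apart: no dynamic link
            kind = link_kind(a, b, tol)
            if kind == "same_row":
                first, second = sorted((a, b), key=lambda f: f.x)  # left to right
            else:
                first, second = sorted((a, b), key=lambda f: f.y)  # top to bottom
            links.append((f"{first.text} {second.text}", kind))
    return links


links = build_links([
    Field("Payee", 50, 100), Field("XX Company", 140, 101),
    Field("Payer", 50, 400), Field("YY Company", 120, 430),
])
```

In this example "Payee" and "XX Company" form a same-row link, while "YY Company" sits to the lower right of "Payer" and is linked diagonally; the two groups are too far apart to be linked to each other.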
Therefore, this scheme applies a text space link structure to the extraction of key information from bank receipts, and uses the adjacent text field dynamic link module to combine closely related fields, which improves the completeness and accuracy of key information extraction.
S104, identifying the type of each text field combination, and extracting key information of the bank receipt from each text field combination through a machine learning model.
In this scheme, the type of each text field combination can be determined through any one of a keyword classification algorithm, a word segmentation classification algorithm and a specific template classification algorithm. The keyword classification algorithm classifies a text field combination according to the keywords it contains; for example, if the keyword "payee account" appears in the text field combination, its type can be judged to be the payee account type. The word segmentation classification algorithm classifies a text field combination according to word segments, where a word segment is a part of a keyword; for example, if the word segment "account" appears in the text field combination, its type can be judged to be the account type. The specific template classification algorithm requires different templates to be preset, such as a foreign bank template or a local bank template. Because the fields differ between banks, once the templates are set, when key information is extracted from the bank receipt of a certain company, the text field combinations can be classified through the template corresponding to that company, which improves classification speed and accuracy. If the type of a text field combination cannot be determined through the keyword classification algorithm, the word segmentation classification algorithm or the template classification algorithm, it can be determined through the language classification model.
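A rough sketch of this classification step is given below. It chains the three algorithms into a cascade, which is an assumption (the scheme only says any one of them may be used), and the keyword table, segment table, template contents and the `fallback` callable standing in for the language classification model are all hypothetical. Whitespace splitting replaces a real word segmenter.

```python
from typing import Callable, Optional

# Hypothetical lookup tables; the scheme does not enumerate them.
KEYWORD_TYPES = {"payee account": "payee_account", "payer account": "payer_account"}
SEGMENT_TYPES = {"account": "account", "amount": "amount"}
TEMPLATES = {"local_bank": {"beneficiary": "payee"}}


def classify(combo: str, template: Optional[str] = None,
             fallback: Optional[Callable[[str], str]] = None) -> Optional[str]:
    """Return the type of a text field combination, or None if undetermined."""
    text = combo.lower()
    # 1. Keyword classification: a full keyword appears in the combination.
    for keyword, kind in KEYWORD_TYPES.items():
        if keyword in text:
            return kind
    # 2. Word segmentation classification: a single segment (part of a keyword) matches.
    for token in text.split():
        if token in SEGMENT_TYPES:
            return SEGMENT_TYPES[token]
    # 3. Template classification: labels specific to the selected bank template.
    for label, kind in TEMPLATES.get(template, {}).items():
        if label in text:
            return kind
    # 4. Fallback: defer to a language classification model, if one is supplied.
    return fallback(combo) if fallback else None
```

Ordering the checks from most to least specific means "payee account 6222..." is labeled `payee_account` by the keyword stage before the coarser segment stage can label it merely `account`.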
It should be noted that, after this classification, the specific type may still be undetermined. For example, when the word segmentation classification algorithm classifies a text field combination that contains the word segment "account", the type can be judged to be the account type, but this cannot determine whether it is the payee account type or the payer account type; in this case the type needs to be determined again. In this scheme, regular rules, keywords, user-defined templates and rule templates can be used for this precise classification.
When classifying with regular rules, the different types of fields need to be stored in advance; for example, a payee account and a payer account are stored in advance, and if the account in the text field combination matches the payee account, the type of the text field combination is determined to be the payee account. When classifying with keywords, if the character for "receive" is detected in the text field combination, the combination is judged to be a payee account. When classifying with a user-defined template, the user freely defines the keywords and the coordinate positions of fields in advance, based on experience, and the type of the text field combination is determined from them. When classifying with a rule template, the rule template needs to be preset, and the rules in it are used to identify the type of the text field combination. For example, if the rule is that the position of the payee is above the position of the payer, whether a text field combination belongs to the payee or the payer is determined by its position: if it is above, it belongs to the payee; if it is below, it belongs to the payer.
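The keyword and rule-template refinements can be sketched together. This is only an illustration: the function name, the English keywords, and the restriction to exactly two undecided account fields are assumptions, and the "payee above payer" rule is the example rule quoted above rather than a general property of receipts.

```python
from typing import Dict, List, Tuple


def refine_account_types(accounts: List[Tuple[str, float]],
                         payee_on_top: bool = True) -> Dict[str, str]:
    """Split generic 'account' combinations into payee and payer accounts.

    accounts: (text, y) pairs, where y is the vertical coordinate (growing downward).
    """
    refined: Dict[str, str] = {}
    undecided: List[Tuple[str, float]] = []
    for text, y in accounts:
        lowered = text.lower()
        # Keyword refinement: an explicit role word settles the type.
        if "payee" in lowered:
            refined[text] = "payee_account"
        elif "payer" in lowered:
            refined[text] = "payer_account"
        else:
            undecided.append((text, y))
    # Rule-template refinement: with two accounts left, decide by position.
    if len(undecided) == 2:
        top, bottom = sorted(undecided, key=lambda item: item[1])
        upper, lower = ("payee_account", "payer_account")
        if not payee_on_top:
            upper, lower = lower, upper
        refined[top[0]] = upper
        refined[bottom[0]] = lower
    return refined


result = refine_account_types([("Account 222", 400.0), ("Account 111", 100.0)])
```

Here "Account 111" lies higher on the receipt and is therefore labeled the payee account under the example rule; combinations that remain ambiguous are simply left out of the result.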
Furthermore, after the type of each text field combination is identified in the above manner, each text field combination and its related information are sent to the machine learning model, which extracts the key information of the bank receipt from each text field combination. The related information is specifically the type, coordinate position and adjacent fields of the text field combination. During offline training, the machine-learning-based key information extraction model can be trained on this related information of the fields. The machine learning model extracts the key information by extracting the corresponding specific content from each type of text field combination. For example, if the text field combination is "payee XX company", the key information extracted by the machine learning model is that the payee is "XX company"; in this way, it can be accurately determined that the payee in the bank receipt is XX company.
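The input handed to the model can be pictured as a small record holding the combination together with its type, coordinate position and adjacent fields. The sketch below is hypothetical throughout: `CombinationRecord` and its field names are invented, and `stand_in_extract` is a rule-based placeholder that only mimics the output shape of the "payee XX company" example, not a trained machine learning model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CombinationRecord:
    text: str                      # the text field combination
    kind: str                      # its identified type
    position: Tuple[float, float]  # coordinate position on the receipt
    neighbours: List[str] = field(default_factory=list)  # adjacent fields


# Hypothetical mapping from a type to the label text that precedes the value.
LABELS = {"payee": "Payee", "payer": "Payer"}


def stand_in_extract(record: CombinationRecord) -> Dict[str, str]:
    """Placeholder for the model: strip the label and keep the remaining content."""
    label = LABELS.get(record.kind)
    if label and record.text.startswith(label):
        return {record.kind: record.text[len(label):].strip()}
    return {}


record = CombinationRecord("Payee XX Company", "payee", (50.0, 100.0),
                           neighbours=["Payer YY Company"])
key_info = stand_in_extract(record)
```

A real model would presumably also use the position and neighbour information, which this placeholder carries along but ignores.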
In addition, to improve the accuracy of model extraction, if key information extracted from a bank receipt turns out to be wrong, the backend can automatically record the erroneous key information. By analyzing the recorded errors, the corresponding templates, rules and machine learning model can be modified, or the machine learning model can be retrained, and so on. This continuous training optimizes the parameters and structure of the model and achieves a better key information extraction effect. In this scheme, fields and related parameters are therefore selected from the existing field links using techniques such as keywords and rules and are passed to the machine learning model for key information extraction, so the method can cope with ever-expanding user groups and markets and keeps bank receipt recognition services robust and up to date.
In conclusion, the extraction process of the key information of the bank receipt can be completed through the operations of cleaning the character fields, dynamic character field linking, character field classification, key information extraction and the like, and the completeness and the accuracy of the key information are ensured.
The key information extraction device, equipment and medium provided by the embodiments of the present invention are introduced below; the description below and the key information extraction method described above may be referred to each other.
Referring to fig. 2, an embodiment of the present invention provides a key information extraction apparatus based on a bank receipt, including:
the identification module 100 is used for identifying an initial text field of a bank receipt;
a field cleaning module 200, configured to perform a cleaning operation on the initial text field to obtain a target text field;
a link establishing module 300, configured to establish a dynamic link between adjacent target text fields, and generate a text field combination;
a type identification module 400 for identifying the type of each text field combination;
and the extracting module 500 is used for extracting the key information of the bank receipt from each character field combination through a machine learning model.
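The five modules can be read as stages of one pipeline. The following sketch wires them together as plain callables; the class name, parameter names and the toy stand-ins passed in the usage example are all assumptions made for illustration, not the structure of the claimed device.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Typed = List[Tuple[str, Optional[str]]]


class KeyInfoExtractor:
    """Chains stand-ins for modules 100 to 500 in the order described above."""

    def __init__(self,
                 recognise: Callable[[Any], List[str]],       # identification module 100
                 clean: Callable[[str], str],                 # field cleaning module 200
                 link: Callable[[List[str]], List[str]],      # link establishing module 300
                 classify: Callable[[str], Optional[str]],    # type identification module 400
                 extract: Callable[[Typed], Dict[str, str]],  # extraction module 500
                 ) -> None:
        self.recognise = recognise
        self.clean = clean
        self.link = link
        self.classify = classify
        self.extract = extract

    def run(self, image: Any) -> Dict[str, str]:
        initial = self.recognise(image)                   # initial text fields
        targets = [self.clean(f) for f in initial]        # target text fields
        combos = self.link(targets)                       # text field combinations
        typed = [(c, self.classify(c)) for c in combos]   # combinations with types
        return self.extract(typed)                        # key information


# Toy stand-ins, used only to show the data flow between the stages.
extractor = KeyInfoExtractor(
    recognise=lambda image: ["Payee*", "XX Company"],
    clean=lambda f: f.replace("*", ""),
    link=lambda fields: [" ".join(fields)],
    classify=lambda combo: "payee" if combo.startswith("Payee") else None,
    extract=lambda typed: {k: c.split(" ", 1)[1] for c, k in typed if k},
)
result = extractor.run(image=None)
```

Passing the stages in as callables keeps each module replaceable, for example swapping the rule-based classifier for a language classification model without touching the other stages.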
Wherein, the field cleaning module comprises:
and the deleting unit is used for identifying the stop words in the initial character field and deleting the stop words.
Wherein, the field cleaning module comprises:
the identification unit is used for identifying the non-standard character field in the initial character field;
and the changing unit is used for changing the non-standard character field through a pre-stored common word stock.
Wherein the changing unit is specifically configured to: modify the non-standard text field through a pre-stored company name lexicon; and/or modify the non-standard text field through a pre-stored format regular-expression rule.
Wherein the type identification module is specifically configured to: determine the type of each text field combination through any one of a keyword classification algorithm, a word segmentation classification algorithm and a specific template classification algorithm; and, if the type of a text field combination cannot be determined through the keyword classification algorithm, the word segmentation classification algorithm or the template classification algorithm, determine the type of the text field combination through a language classification model.
Wherein, the link establishment module comprises:
the position determining unit is used for determining the position of each target character field;
and the link establishing unit is used for establishing dynamic links among target character fields which belong to the same horizontal direction and are adjacent in position, and/or establishing dynamic links among target character fields which belong to the same vertical direction and are adjacent in position, and/or establishing dynamic links among target character fields which do not belong to the same horizontal direction and vertical direction and are adjacent in position.
Fig. 3 is a schematic structural diagram of an electronic device disclosed in the embodiment of the present invention; the electronic device comprises:
a memory 11 for storing a computer program;
and a processor 12, configured to implement the steps of the key information extraction method based on the bank receipt according to any of the above-mentioned method embodiments when executing the computer program.
In this embodiment, the device may be a PC (Personal Computer), or may be a terminal device such as a smart phone, a tablet Computer, a palmtop Computer, or a portable Computer.
The device may include a memory 11, a processor 12, and a bus 13.
The memory 11 includes at least one type of readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 11 may in some embodiments be an internal storage unit of the device, for example a hard disk of the device. The memory 11 may also be an external storage device of the device in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), etc. provided on the device. Further, the memory 11 may also include both an internal storage unit of the device and an external storage device. The memory 11 may be used not only to store application software installed in the device and various types of data, but also to temporarily store data that has been output or will be output.
The bus 13 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 3, but this does not mean only one bus or one type of bus.
Further, the device may also include a network interface 14, which may optionally include a wired interface and/or a wireless interface (e.g., a Wi-Fi interface, a Bluetooth interface, etc.) and is generally used to establish a communication connection between the device and other electronic devices.
Optionally, the device may further comprise a user interface 15. The user interface 15 may comprise a display and an input unit such as a keyboard, and may optionally further comprise a standard wired interface and a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used to display information processed in the device and to display a visualized user interface.
Fig. 3 shows only a device with components 11-15. Those skilled in the art will understand that the structure shown in Fig. 3 does not limit the device, which may comprise fewer or more components than shown, combine some components, or arrange the components differently.
The embodiment of the invention also discloses a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and when being executed by a processor, the computer program realizes the steps of the key information extraction method based on the bank receipt in any method embodiment.
The storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiments in this description are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A key information extraction method based on a bank receipt, characterized by comprising the following steps:
identifying an initial character field of a bank receipt;
cleaning the initial character field to obtain a target character field;
establishing dynamic links between adjacent target character fields to generate character field combinations;
and identifying the type of each character field combination, and extracting key information of the bank receipt from each character field combination through a machine learning model.
2. The key information extraction method according to claim 1, wherein cleaning the initial character field comprises:
identifying stop words in the initial character field and deleting the stop words.
3. The key information extraction method according to claim 1, wherein cleaning the initial character field comprises:
identifying non-standard character fields among the initial character fields, and changing the non-standard character fields through a pre-stored common word library.
4. The key information extraction method according to claim 3, wherein changing the non-standard character fields through the pre-stored common word library comprises:
changing the non-standard character fields through a pre-stored company-name word library; and/or changing the non-standard character fields through a pre-stored regular-expression format rule.
5. The key information extraction method according to claim 1, wherein identifying the type of each character field combination comprises:
determining the type of each character field combination through any one of a keyword classification algorithm, a word segmentation classification algorithm and a specific template classification algorithm.
6. The key information extraction method according to claim 5, wherein if the type of the character field combination cannot be determined by the keyword classification algorithm, the word segmentation classification algorithm, or the specific template classification algorithm, the method further comprises:
determining the type of the character field combination through a language classification model.
7. The key information extraction method according to any one of claims 1 to 6, wherein establishing dynamic links between adjacent target character fields comprises:
determining the position of each target character field;
and establishing dynamic links among target character fields which belong to the same horizontal direction and are adjacent in position, and/or establishing dynamic links among target character fields which belong to the same vertical direction and are adjacent in position, and/or establishing dynamic links among target character fields which do not belong to the same horizontal direction and vertical direction and are adjacent in position.
8. A key information extraction device based on a bank receipt, characterized by comprising:
the identification module is used for identifying the initial character field of the bank receipt;
the field cleaning module is used for executing cleaning operation on the initial character field to obtain a target character field;
the link establishing module is used for establishing dynamic link between adjacent target character fields to generate a character field combination;
the type identification module is used for identifying the type of each character field combination;
and the extraction module is used for extracting the key information of the bank receipt from each character field combination through a machine learning model.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the method for extracting key information based on bank receipt according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the key information extraction method based on a bank receipt according to any one of claims 1 to 7.
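As an informal illustration of the cleaning and type-identification steps recited in the claims above, the following Python sketch strips stop tokens, normalises dates with a regular-expression rule, corrects near-miss company names against a small lexicon, and then tries rule-based classifiers before falling back to a model. All names and data are hypothetical and not taken from the patent: `STOP_TOKENS`, `COMPANY_LEXICON`, `DATE_RULE`, the three rule functions, the type labels, and the `fallback_model` callable standing in for a language classification model. Trying the rule-based classifiers in a fixed order is one possible reading of the claims, not the claimed method itself.

```python
import re
from difflib import get_close_matches
from typing import Callable, Optional

# Hypothetical resources; a real system would load these from storage.
STOP_TOKENS = {"*", "#", "|"}
COMPANY_LEXICON = ["Acme Trading Co., Ltd.", "Globex Bank"]
DATE_RULE = re.compile(r"^(\d{4})[./](\d{1,2})[./](\d{1,2})$")


def clean_field(text: str) -> str:
    """Turn an initial character field into a target character field."""
    # Delete stop tokens.
    text = " ".join(t for t in text.split() if t not in STOP_TOKENS)
    # Format rule: rewrite dates such as 2021/1/13 as 2021-01-13.
    m = DATE_RULE.match(text)
    if m:
        year, month, day = m.groups()
        return f"{year}-{int(month):02d}-{int(day):02d}"
    # Lexicon rule: replace a near-miss company name with the stored form.
    match = get_close_matches(text, COMPANY_LEXICON, n=1, cutoff=0.85)
    return match[0] if match else text


KEYWORDS = {"payer": "payer_info", "payee": "payee_info", "amount": "amount_info"}
TOKEN_TYPES = {"cny": "amount_info", "rmb": "amount_info"}
TEMPLATES = [(re.compile(r"^\d{4}-\d{2}-\d{2}$"), "date_info")]


def by_keyword(combo: str) -> Optional[str]:
    """Keyword rule: substring lookup on the lower-cased combination."""
    lowered = combo.lower()
    for keyword, label in KEYWORDS.items():
        if keyword in lowered:
            return label
    return None


def by_segmentation(combo: str) -> Optional[str]:
    """Segmentation rule: split into word tokens and look each one up."""
    for token in re.findall(r"\w+", combo.lower()):
        if token in TOKEN_TYPES:
            return TOKEN_TYPES[token]
    return None


def by_template(combo: str) -> Optional[str]:
    """Template rule: match the whole combination against fixed patterns."""
    for pattern, label in TEMPLATES:
        if pattern.match(combo):
            return label
    return None


def identify_type(combo: str, fallback_model: Callable[[str], str]) -> str:
    """Try the rule-based classifiers, then fall back to the model."""
    for rule in (by_keyword, by_segmentation, by_template):
        label = rule(combo)
        if label is not None:
            return label
    return fallback_model(combo)


if __name__ == "__main__":
    print(clean_field("* 2021/1/13"))
    print(identify_type("CNY 1,200.00", lambda s: "unknown"))
```

The fallback is passed in as a plain callable so that any trained classifier could be substituted without changing the rule-based path.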
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110042586.8A CN112784720A (en) | 2021-01-13 | 2021-01-13 | Key information extraction method, device, equipment and medium based on bank receipt |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112784720A (en) | 2021-05-11 |
Family
ID=75755718
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110042586.8A Pending CN112784720A (en) | 2021-01-13 | 2021-01-13 | Key information extraction method, device, equipment and medium based on bank receipt |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112784720A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108197163A (en) * | 2017-12-14 | 2018-06-22 | 上海银江智慧智能化技术有限公司 | A kind of structuring processing method based on judgement document |
CN109698798A (en) * | 2018-12-14 | 2019-04-30 | 北京锐安科技有限公司 | A kind of recognition methods of application, device, server and storage medium |
CN110991456A (en) * | 2019-12-05 | 2020-04-10 | 北京百度网讯科技有限公司 | Bill identification method and device |
CN111191715A (en) * | 2019-12-27 | 2020-05-22 | 深圳市商汤科技有限公司 | Image processing method and device, electronic equipment and storage medium |
KR20200063750A (en) * | 2018-11-28 | 2020-06-05 | 삼성생명보험주식회사 | A computing device for extracting item from a document image |
CN111460164A (en) * | 2020-05-22 | 2020-07-28 | 南京大学 | Intelligent barrier judgment method for telecommunication work order based on pre-training language model |
CN111696675A (en) * | 2020-05-22 | 2020-09-22 | 平安国际智慧城市科技股份有限公司 | User data classification method and device based on Internet of things data and computer equipment |
CN111767334A (en) * | 2020-06-30 | 2020-10-13 | 北京百度网讯科技有限公司 | Information extraction method and device, electronic equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
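Yan Longchuan: "Data Science and Engineering Technology Series: Python Text Analysis" (《数据科学与工程技术丛书 Python文本分析》), vol. 2, China Machine Press (机械工业出版社), pages: 371 * |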
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113469005A (en) * | 2021-06-24 | 2021-10-01 | 金蝶软件(中国)有限公司 | Recognition method of bank receipt, related device and storage medium |
CN113741995A (en) * | 2021-08-09 | 2021-12-03 | 太逗科技集团有限公司 | Method, device, equipment and medium for automatically confirming receipt by bypassing bank control |
CN114419640A (en) * | 2022-02-25 | 2022-04-29 | 北京百度网讯科技有限公司 | Text processing method and device, electronic equipment and storage medium |
CN114419640B (en) * | 2022-02-25 | 2023-08-11 | 北京百度网讯科技有限公司 | Text processing method, device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||