CN115294588B - Data processing method and system based on RPA flow robot - Google Patents
- Publication number
- CN115294588B (application CN202210983630.XA)
- Authority
- CN
- China
- Prior art keywords
- manager
- characters
- confidence
- area
- handwriting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/412—Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/1444—Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/24—Character recognition characterised by the processing or recognition method
- G06V30/242—Division of the character sequences into groups prior to recognition; Selection of dictionaries
- G06V30/244—Division of the character sequences into groups prior to recognition; Selection of dictionaries using graphical properties, e.g. alphabet type or font
- G06V30/2455—Discrimination between machine-print, hand-print and cursive writing
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Character Discrimination (AREA)
Abstract
The invention relates to a table data processing method and system based on an RPA flow robot. The method comprises: S1, reading the content of a table and determining whether it contains handwritten characters; S2, dividing the area of the table into modules according to the type of the table, with the area where handwritten characters are located designated a fuzzy area; S3, assigning a confidence value to the text read from the table according to the proportion of handwritten characters in that text; S4, comparing the confidence value of the fuzzy-area text with a preset confidence value and outputting prompt information to a manager. The scheme automatically converts and extracts the text content of table files. It also distinguishes handwritten characters in the table and automatically pre-judges the credibility of the extracted content; when the credibility is low, the manager is prompted to manually review, modify and confirm the extracted text. The scheme can partially replace manual operation in extracting table information and can significantly improve working efficiency.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to a table data processing method and system based on an RPA flow robot.
Background
RPA (robotic process automation) refers to process rules designed in advance by a developer so that a robot can simulate manual operations such as text input, copy and paste, mouse movement and clicking, thereby replacing or assisting humans in completing repetitive work.
For example, the Chinese patent with application number CN202111033494.X discloses a data processing method and device based on an RPA robot, which can also be used in the financial field. The method includes: acquiring the basic functional components of an RPA robot and the corresponding business flow messages by calling an RPA robot interface; classifying and abstracting the basic functional components and the business flow messages according to the message specification corresponding to the RPA robot to obtain a structured data dictionary, and displaying the dictionary to a user; and receiving a basic function module selection instruction and a business process execution instruction sent after the user makes a selection from the structured data dictionary, and generating an RPA development requirement. That application can effectively improve the efficiency of mining and extracting RPA requirements.
In a power system, strengthening the information management of business processes requires extracting and centrally managing the information in paper forms, electronic forms in various formats, and online forms. Existing information extraction relies mainly on manual scanning and manual copying to enter form information into a management system. This requires a large amount of repetitive labor and wastes human resources; manual operation is also prone to omissions and is inefficient, so improvement is needed.
Disclosure of Invention
In view of the above, the invention provides a table data processing method and system based on an RPA flow robot, which can partially replace manual operation in extracting table information and can improve working efficiency.
The technical scheme for solving the above technical problem is as follows:
A table data processing method based on an RPA flow robot comprises the following steps:
S1, recognizing and preprocessing a table, converting the contents of the table into readable contents, reading the converted contents, determining whether they contain handwritten characters, and defining the type of the table accordingly;
S2, dividing the area of the table into modules according to the type of the table, with the area where handwritten characters are located designated a fuzzy area and the other areas designated trusted areas;
S3, assigning a confidence value to the text read from the table according to the proportion of handwritten characters in that text;
S4, comparing the confidence value of the fuzzy-area text with a preset confidence value, and, when the confidence value is smaller than the preset confidence value, outputting prompt information to a manager to prompt the manager to modify and confirm the recognized fuzzy-area information.
As a preferred scheme: in S1, recognizing and preprocessing the table comprises reading the suffix of the table file, determining the format of the table, and performing OCR (optical character recognition) on tables in picture or PDF format to obtain readable character information.
As a preferred scheme: during OCR recognition, whether each character is handwritten or machine-printed is determined according to the straightness of its strokes; each recognized character is marked and counted, and the proportion of handwritten characters among all characters is calculated from the totals.
As a preferred scheme: when handwritten characters are recognized in the form, the writer is also identified and marked. Together with the prompt information, the writer's identification information is output to the manager and a selectable operation window pops up, prompting the manager to assign a value to the recognition degree of the writer's handwriting. After multiple assignments, the writer's average recognition-degree value is calculated; this average is combined with the confidence value of the fuzzy-area text to obtain a corrected confidence value, which is compared with the preset confidence value. Prompt information is output to the manager when the corrected confidence value is smaller than the preset confidence value.
As a preferred scheme: when a fill-in area of the form is recognized as unfilled, a missing-information notice is output to the manager, and the manager is prompted to modify and confirm.
As a preferred scheme: a selectable operation window pops up for the manager, in which the manager selects the fill-in areas that require prompting; when a selected fill-in area contains no content, a missing-information notice is output to the manager to prompt modification and confirmation.
As a preferred scheme: when prompt information is output to the manager, the prompted content is marked and displayed using colors and underlines.
A form data processing system based on an RPA flow robot, comprising:
a preprocessing module for identifying and classifying form files according to their suffixes;
an OCR recognition module for performing OCR recognition on picture-type or PDF-type form files and distinguishing machine-printed characters from handwritten characters according to stroke straightness;
an identification module for laying out and marking the recognized characters output by the OCR recognition module, defining the area where handwritten characters are located as a fuzzy area and the other areas as trusted areas, and marking and displaying the fuzzy area using colors or underlines;
a reading module for reading the characters recognized by the OCR recognition module, calculating the proportion of handwritten characters among all recognized characters, and outputting the statistical result;
an assignment module for assigning a confidence value to the recognized content of the form file according to the proportion of handwritten characters, where a higher proportion gives a lower confidence value;
a comparison and prompt module for comparing the confidence value of the text content of the form with a preset confidence value and, when the confidence value is smaller than the preset confidence value, outputting prompt information to a manager to prompt the manager to modify and confirm the recognized fuzzy-area information.
As a preferred scheme: the assignment module further comprises a correction unit; the reading module reads the signed name in the recognized text content; the correction unit lets the manager assign a recognition-degree value to the name that was read, and combines the recognition-degree value with the confidence value to obtain a corrected confidence value.
Compared with the prior art, the technical scheme of the application has the following beneficial effects. The scheme can distinguish and define the types of table files, directly extract the text content of readable table files, and automatically convert and extract the text content of table files that are not directly readable. It can also distinguish handwritten characters in the table and automatically pre-judge the credibility of the extracted content according to the proportion of handwritten characters in the extracted text; when the pre-judged credibility is low, the manager is prompted to manually review, modify and confirm the extracted text, which helps prevent erroneous information from being entered into the management system. The scheme can partially replace manual operation in extracting table information and can significantly improve working efficiency.
Drawings
Fig. 1 is a flow chart of a method in a first embodiment.
Detailed Description
Embodiment one:
Referring to Fig. 1, a table data processing method based on an RPA flow robot includes the following steps:
S1, recognizing and preprocessing a table, converting the contents of the table into readable contents, reading the converted contents, determining whether they contain handwritten characters, and defining the type of the table accordingly;
In practice, table files come in many formats. The permitted formats are therefore specified in advance according to the table formats used in the business; for example, only table files with the suffix doc, docx, wps, xls, jpg, png, pdf, htm or html may be used in the business flow. The table files are classified by suffix into two broad types: the first type is directly readable and the second type is not.
doc, docx, wps and xls are standard document formats, and the contents of such form files can be read accurately without recognition; htm and html are web page document formats, and the contents of such form files can also be read directly.
jpg, png and pdf are picture and PDF formats. The contents of such form files cannot be read directly, so they must first be recognized and converted into a readable format before being read. Because form files in picture and PDF formats generally contain handwriting and signatures, that is, handwritten content, recognition errors are hard to avoid when that content is recognized and output. The more handwritten content a form file contains, the more frequent the recognition errors.
In this embodiment, when the suffix of the table file is doc, docx, wps, xls, htm or html, the content of the table file is read directly and output; when the suffix is jpg, png or pdf, the character information in the table is recognized by OCR (optical character recognition) and output. During recognition, the stroke straightness of each character is evaluated to distinguish machine-printed characters from handwritten ones.
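The suffix-based classification described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the suffix lists come from the embodiment, while the function name `classify_form_file`, the return labels and the rejection of unlisted formats are assumptions made for the example.

```python
from pathlib import Path

# Suffix lists taken from the embodiment; all names here are illustrative.
DIRECTLY_READABLE = {".doc", ".docx", ".wps", ".xls", ".htm", ".html"}
NEEDS_OCR = {".jpg", ".png", ".pdf"}


def classify_form_file(filename: str) -> str:
    """Return 'direct' for directly readable files and 'ocr' for files that need OCR."""
    suffix = Path(filename).suffix.lower()
    if suffix in DIRECTLY_READABLE:
        return "direct"
    if suffix in NEEDS_OCR:
        return "ocr"
    # Assumption: formats outside the pre-specified list are rejected. The source
    # only states that the permitted formats are specified in advance.
    raise ValueError(f"unsupported form file format: {filename!r}")
```

A file such as `report.docx` would be read directly, while `scan.pdf` would be routed to OCR.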
Stroke straightness is judged as follows. A coordinate system is established and several strokes of the character are selected (in the OCR image, white areas are blank and black areas are strokes). Several points are selected on each stroke, that is, within a continuous black area, and their coordinates are determined. The judgment is made from the differences between the horizontal and vertical coordinates of three adjacent points: if the horizontal and vertical coordinates of three adjacent points change by equal amounts, the character is considered machine-printed; otherwise it is considered handwritten.
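One way to read the three-point test is sketched below. It assumes the points are sampled at equal spacing along each stroke, and it introduces a tolerance `tol` (in pixels) that the source does not mention; the function names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def stroke_is_straight(points: Sequence[Point], tol: float = 0.5) -> bool:
    """True if every three adjacent sample points change by (nearly) equal steps."""
    for (x1, y1), (x2, y2), (x3, y3) in zip(points, points[1:], points[2:]):
        dx1, dy1 = x2 - x1, y2 - y1  # step from the first point to the second
        dx2, dy2 = x3 - x2, y3 - y2  # step from the second point to the third
        # Equal coordinate changes mean the three points lie on a straight line.
        if abs(dx1 - dx2) > tol or abs(dy1 - dy2) > tol:
            return False
    return True  # also True for fewer than three points (nothing to compare)


def is_printed(strokes: Sequence[Sequence[Point]], tol: float = 0.5) -> bool:
    """Treat a character as machine-printed only if all sampled strokes are straight."""
    return all(stroke_is_straight(stroke, tol) for stroke in strokes)
```

Under this reading, a stroke sampled as `(0, 0), (1, 1), (2, 2)` counts as straight, while `(0, 0), (1, 2), (2, 1)` does not.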
When a form file contains no handwritten characters, it is defined as a "class one" file; when a form file contains handwritten characters, it is defined as a "class two" file.
S2, dividing the area of the table into modules according to the type of the table, with the area where handwritten characters are located designated a fuzzy area and the other areas designated trusted areas;
A "class one" file is classified as trusted in its entirety; for a "class two" file, the handwritten areas of its content are designated fuzzy areas and the other areas trusted areas.
S3, assigning a confidence value to the text read from the table according to the proportion of handwritten characters in that text;
OCR recognition is performed on class two files. During OCR, handwritten and machine-printed characters are distinguished and counted, the proportion of handwritten characters among all characters is calculated, and the statistical result is output. The more handwritten content a form file contains, the more frequently that content is misrecognized, so the proportion of handwritten characters directly affects the overall recognition accuracy of the file. This can be stated simply: the higher the proportion of handwritten characters, the lower the overall recognition accuracy. A recognition confidence value is assigned to the form file according to the proportion of handwritten characters, for example: with no handwritten characters, the confidence is 10; when the proportion is below 10%, the confidence is 9; when the proportion is 10%-20%, the confidence is 8; and so on, until at 80%-90% the confidence is 1; when the proportion is above 90%, the confidence is 0.
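The example scale above can be written as a small function. This is a sketch under stated assumptions: each 10% band is taken to include its lower bound (so exactly 10% maps to 8 and exactly 90% maps to 0), which the source leaves ambiguous, and the name `assign_confidence` is illustrative. Integer arithmetic on the character counts avoids floating-point rounding at the band edges.

```python
def assign_confidence(handwritten: int, total: int) -> int:
    """Map the share of handwritten characters to a confidence value from 0 to 10."""
    if total <= 0 or not 0 <= handwritten <= total:
        raise ValueError("counts must satisfy 0 <= handwritten <= total and total > 0")
    if handwritten == 0:
        return 10  # no handwritten characters at all
    # Index of the 10% band the proportion falls into (0 for below 10%, 1 for 10%-20%, ...).
    band = (handwritten * 10) // total
    # Band 0 gives 9, band 1 gives 8, ..., band 8 gives 1; anything higher gives 0.
    return max(0, 9 - band)
```

For a form with 3 handwritten characters out of 20 (15%), the function returns 8.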
S4, comparing the confidence value of the fuzzy-area text with a preset confidence value, and, when the confidence value is smaller than the preset confidence value, outputting prompt information to a manager to prompt the manager to modify and confirm the recognized fuzzy-area information.
The prompting rule is preset. For example, only when the confidence of the content read from a form file is lower than 8 is the text output and the manager prompted to review, modify and confirm it. Once the manager clicks to confirm, the text read from the form file is treated as valid content and the subsequent flow can proceed; otherwise the subsequent flow cannot continue.
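The gating rule in this step can be sketched as follows, using the example threshold of 8. The `ExtractionResult` type and the function names are hypothetical and only illustrate the comparison and the confirm-before-continue behaviour.

```python
from dataclasses import dataclass


@dataclass
class ExtractionResult:
    text: str                # text read from the form file
    confidence: int          # confidence value assigned in S3
    confirmed: bool = False  # set once the manager has clicked to confirm


def needs_review(result: ExtractionResult, preset: int = 8) -> bool:
    """The manager is prompted only when the confidence is below the preset value."""
    return result.confidence < preset


def can_proceed(result: ExtractionResult, preset: int = 8) -> bool:
    """The flow continues if no review is needed or the manager has confirmed."""
    return not needs_review(result, preset) or result.confirmed
```

A result with confidence 6 blocks the flow until `confirmed` is set, whereas a result with confidence 9 passes straight through.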
In practice, handwriting and its straightness differ between writers, and both directly affect the accuracy of OCR recognition: the clearer the handwriting and the straighter the strokes, the higher the recognition accuracy. Recognition accuracy therefore differs between writers, so the confidence of the text read from a form file can be corrected according to the writer's handwriting recognition accuracy rather than relying only on the proportion of handwritten characters. For a writer whose handwriting is recognized with high accuracy, no assignment or prompt is needed for form files containing that handwriting, which reduces unnecessary prompts and improves the efficiency of form data processing.
The specific implementation is as follows. In step S4, when the confidence of the content read from the form file is lower than the preset value, the text read is output, including the writer's signature, and the manager is prompted to manually check, modify and confirm it so as to correct misrecognized characters. After the manager finishes the modifications and clicks to confirm, the text is treated as valid content and the next flow can be executed. If the output text contains the writer's signature, then after the manager clicks to confirm, a recognition-degree assignment box pops up, prompting the manager to enter a value for the recognition degree (that is, the recognition accuracy) of the writer's handwriting. The manager can assign the value according to the number of characters corrected manually: the more characters corrected, the lower the value. If the writer's handwriting can be recognized accurately, the recognition-degree value is greater than 1.
Recognition-degree values are assigned for the handwriting in several form files from the same writer, and the writer's average recognition-degree value is calculated. The average is combined with the confidence value of the fuzzy-area text to obtain a corrected confidence value, which is compared with the preset confidence value; prompt information is output to the manager when the former is smaller than the latter.
Thus, when a writer's handwriting can be recognized accurately, the corrected confidence value is necessarily greater than the preset confidence value. For form text containing that writer's handwriting, the system does not output the recognized text for review but automatically treats it as valid content and proceeds to the next flow, without the manager having to check, modify and confirm. This reduces unnecessary prompts and improves working efficiency.
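The correction step can be illustrated with the sketch below. The source does not state which operation combines the average recognition-degree value with the confidence value; the product used here is purely an assumption, chosen because a recognition-degree value above 1 then raises the confidence. The function name and the fallback for a writer with no history are also assumptions.

```python
from statistics import mean
from typing import Sequence


def corrected_confidence(region_confidence: float,
                         recognition_values: Sequence[float]) -> float:
    """Combine a writer's average recognition-degree value with the fuzzy-area confidence."""
    if not recognition_values:
        # Assumption: with no earlier assignments for this writer, use the
        # uncorrected confidence value.
        return region_confidence
    average = mean(recognition_values)  # average over the manager's assignments
    # Assumption: the two values are multiplied; the source does not specify the operation.
    return region_confidence * average
```

With a fuzzy-area confidence of 6 and two earlier assignments of 1.5, the corrected value is 9.0, which would no longer fall below the example threshold of 8.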
In this embodiment, a fill-in area is recognized in the form when OCR detects a long underlined segment. If no text is recognized in that area, it is considered unfilled; a missing-information notice is output to the manager, who is prompted to modify and confirm. Specifically, a selectable operation window pops up for the manager, in which the manager selects the fill-in areas that require prompting.
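The missing-field check can be sketched as below. It assumes the OCR stage has already produced a mapping from fill-in area names to the text recognized in each area, and that the manager's selection is available as a set of area names; both data shapes and the function name are hypothetical.

```python
from typing import Dict, List, Set


def find_missing_fields(fields: Dict[str, str], required: Set[str]) -> List[str]:
    """Return the manager-selected fill-in areas in which no text was recognized."""
    # An area counts as unfilled if it is absent or holds only whitespace.
    return sorted(name for name in required if not fields.get(name, "").strip())
```

For example, with recognized fields `{"name": "Li", "date": ""}` and required areas `{"name", "date"}`, the function reports `["date"]` as missing.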
In this embodiment, when prompt information is output to the manager, the fuzzy areas and missing areas are marked and displayed using colors and underlines, so that the manager can quickly and intuitively find the areas that need checking and modification, which improves operating efficiency.
Embodiment two:
A form data processing system based on an RPA flow robot, comprising:
a preprocessing module for identifying and classifying form files according to their suffixes;
an OCR recognition module for performing OCR recognition on picture-type or PDF-type form files and distinguishing machine-printed characters from handwritten characters according to stroke straightness;
an identification module for laying out and marking the recognized characters output by the OCR recognition module, defining the area where handwritten characters are located as a fuzzy area and the other areas as trusted areas, and marking and displaying the fuzzy area using colors or underlines;
a reading module for reading the characters recognized by the OCR recognition module, calculating the proportion of handwritten characters among all recognized characters, and outputting the statistical result;
an assignment module for assigning a confidence value to the recognized content of the form file according to the proportion of handwritten characters, where a higher proportion gives a lower confidence value;
a comparison and prompt module for comparing the confidence value of the text content of the form with a preset confidence value and, when the confidence value is smaller than the preset confidence value, outputting prompt information to a manager to prompt the manager to modify and confirm the recognized fuzzy-area information.
In this embodiment: the assignment module further comprises a correction unit; the reading module reads the signed name in the recognized text content; the correction unit lets the manager assign a recognition-degree value to the name that was read, and combines the recognition-degree value with the confidence value to obtain a corrected confidence value.
The foregoing describes only preferred embodiments of the invention and is not intended to limit the invention to the precise form disclosed; any modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within its scope.
Claims (7)
1. A table data processing method based on an RPA flow robot, characterized by comprising the following steps:
S1, recognizing and preprocessing a table, converting the contents of the table into readable contents, reading the converted contents, determining whether they contain handwritten characters, and defining the type of the table accordingly;
S2, dividing the area of the table into modules according to the type of the table, with the area where handwritten characters are located designated a fuzzy area and the other areas designated trusted areas;
S3, assigning a confidence value to the text read from the table according to the proportion of handwritten characters in that text;
S4, comparing the confidence value of the fuzzy-area text with a preset confidence value, and, when the confidence value is smaller than the preset confidence value, outputting prompt information to a manager to prompt the manager to modify and confirm the recognized fuzzy-area information; wherein, when handwritten characters are recognized in the table, the writer is identified and marked; together with the prompt information, the writer's identification information is output to the manager and a selectable operation window pops up, prompting the manager to assign a value to the recognition degree of the writer's handwriting to obtain a recognition-degree value; after multiple assignments, the writer's average recognition-degree value is calculated; the average recognition-degree value is combined with the confidence value of the fuzzy-area text to obtain a corrected confidence value; the corrected confidence value is compared with the preset confidence value, and prompt information is output to the manager when the corrected confidence value is smaller than the preset confidence value.
2. The table data processing method based on the RPA flow robot according to claim 1, wherein: in S1, recognizing and preprocessing the table comprises reading the suffix of the table file, determining the format of the table, and performing OCR (optical character recognition) on tables in picture or PDF format to obtain readable character information.
3. The table data processing method based on the RPA flow robot according to claim 2, wherein: during OCR recognition, whether each character is handwritten or machine-printed is determined according to the straightness of its strokes; each recognized character is marked and counted, and the proportion of handwritten characters among all characters is calculated from the totals.
4. The table data processing method based on the RPA flow robot according to claim 1, wherein: when a fill-in area of the form is recognized as unfilled, a missing-information notice is output to the manager, and the manager is prompted to modify and confirm.
5. The table data processing method based on the RPA flow robot according to claim 1, wherein: a selectable operation window pops up for the manager, in which the manager selects the fill-in areas that require prompting; when a selected fill-in area contains no content, a missing-information notice is output to the manager to prompt modification and confirmation.
6. The table data processing method based on the RPA flow robot according to claim 1, wherein: when prompt information is output to the manager, the prompted content is marked and displayed using colors and underlines.
7. A form data processing system based on an RPA flow robot, comprising:
a preprocessing module for identifying and classifying form files according to their suffixes;
an OCR recognition module for performing OCR recognition on picture-type or PDF-type form files and distinguishing machine-printed characters from handwritten characters according to stroke straightness;
an identification module for laying out and marking the recognized characters output by the OCR recognition module, defining the area where handwritten characters are located as a fuzzy area and the other areas as trusted areas, and marking and displaying the fuzzy area using colors or underlines;
a reading module for reading the characters recognized by the OCR recognition module, calculating the proportion of handwritten characters among all recognized characters, and outputting the statistical result;
an assignment module for assigning a confidence value to the recognized content of the form file according to the proportion of handwritten characters, where a higher proportion gives a lower confidence value; the assignment module further comprises a correction unit; the reading module reads the signed name in the recognized text content; the correction unit lets the manager assign a recognition-degree value to the name that was read, and combines the recognition-degree value with the confidence value to obtain a corrected confidence value;
a comparison and prompt module for comparing the confidence value of the text content of the form with a preset confidence value and, when the confidence value is smaller than the preset confidence value, outputting prompt information to a manager to prompt the manager to modify and confirm the recognized fuzzy-area information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210983630.XA CN115294588B (en) | 2022-08-17 | 2022-08-17 | Data processing method and system based on RPA flow robot |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115294588A (en) | 2022-11-04 |
CN115294588B (en) | 2024-04-19 |
Family
ID=83829855
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210983630.XA Active CN115294588B (en) | 2022-08-17 | 2022-08-17 | Data processing method and system based on RPA flow robot |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115294588B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH07160831A (en) * | 1993-12-09 | 1995-06-23 | Fuji Facom Corp | Reject method for handwritten character recognition result |
CN107545391A (en) * | 2017-09-07 | 2018-01-05 | 安徽共生物流科技有限公司 | A kind of logistics document intellectual analysis and automatic storage method based on image recognition |
CN112149399A (en) * | 2020-09-25 | 2020-12-29 | 北京来也网络科技有限公司 | Table information extraction method, device, equipment and medium based on RPA and AI |
CN112639818A (en) * | 2018-08-27 | 2021-04-09 | 京瓷办公信息系统株式会社 | OCR system |
CN113191309A (en) * | 2021-05-19 | 2021-07-30 | 杭州点望科技有限公司 | Method and system for recognizing, scoring and correcting handwritten Chinese characters |
CN113378822A (en) * | 2021-07-08 | 2021-09-10 | 中教云智数字科技有限公司 | System for marking handwritten answer area by using special mark frame in test paper |
CN113377958A (en) * | 2021-07-07 | 2021-09-10 | 北京百度网讯科技有限公司 | Document classification method and device, electronic equipment and storage medium |
CN113919303A (en) * | 2021-11-02 | 2022-01-11 | 中国工商银行股份有限公司 | Method and device for automatically generating service information table |
CN113936130A (en) * | 2021-09-29 | 2022-01-14 | 未鲲(上海)科技服务有限公司 | Document information intelligent acquisition and error correction method, system and equipment based on OCR technology |
CN114417798A (en) * | 2022-01-19 | 2022-04-29 | 广州天维信息技术股份有限公司 | Document structured extraction method and device, computer equipment and storage medium |
CN114581928A (en) * | 2021-12-29 | 2022-06-03 | 壹链盟生态科技有限公司 | Form identification method and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11157783B2 (en) * | 2019-12-02 | 2021-10-26 | UiPath, Inc. | Training optical character detection and recognition models for robotic process automation |
Non-Patent Citations (1)
Title |
---|
"Application of Artificial Intelligence in Financial Shared Service Management"; 董屹岭 (Dong Yiling); 《中国新技术新产品》; 20210810 (No. 8); 130-132 * |
Also Published As
Publication number | Publication date |
---|---|
CN115294588A (en) | 2022-11-04 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US10572725B1 (en) | Form image field extraction | |
US5555101A (en) | Forms creation and interpretation system | |
CN101443790B (en) | Efficient processing of non-reflow content in a digital image | |
US6333994B1 (en) | Spatial sorting and formatting for handwriting recognition | |
US7136082B2 (en) | Method and apparatus to convert digital ink images for use in a structured text/graphics editor | |
RU2357284C2 (en) | Method of processing digital hand-written notes for recognition, binding and reformatting digital hand-written notes and system to this end | |
US7668372B2 (en) | Method and system for collecting data from a plurality of machine readable documents | |
CN101542504B (en) | Shape clustering in post optical character recognition processing | |
US8340425B2 (en) | Optical character recognition with two-pass zoning | |
US20040193520A1 (en) | Automated understanding and decomposition of table-structured electronic documents | |
US20080235263A1 (en) | Automating Creation of Digital Test Materials | |
WO2006002009A2 (en) | Document management system with enhanced intelligent document recognition capabilities | |
US11501549B2 (en) | Document processing using hybrid rule-based artificial intelligence (AI) mechanisms | |
US20050160194A1 (en) | Method of limiting amount of waste paper generated from printed documents | |
CN110096275B (en) | Page processing method and device | |
US11568666B2 (en) | Method and system for human-vision-like scans of unstructured text data to detect information-of-interest | |
US11615244B2 (en) | Data extraction and ordering based on document layout analysis | |
US20140334731A1 (en) | Methods and systems for evaluating handwritten documents | |
CN104462068A (en) | Character conversion system and method | |
CN112801084A (en) | Image processing method and device, electronic equipment and storage medium | |
CN112487859A (en) | Information processing apparatus, information processing method, and computer readable medium | |
US8687239B2 (en) | Relevance based print integrity verification | |
CN115294588B (en) | Data processing method and system based on RPA flow robot | |
CN113723063A (en) | Method for converting RTF (Rich Text Format) into HTML (HyperText Markup Language) and realizing effect on PDF (Portable Document Format) file | |
JP6856916B1 (en) | Information processing equipment, information processing methods and information processing programs |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |