CN115294588B - Data processing method and system based on RPA flow robot - Google Patents

Publication number
CN115294588B
Authority
CN
China
Legal status
Active
Application number
CN202210983630.XA
Other languages
Chinese (zh)
Other versions
CN115294588A (en)
Inventor
徐辉
姜勇
黄仁亮
伍小冬
朱雪琼
黄蒙蒙
Current Assignee
Hubei Infotech Co ltd
Original Assignee
Hubei Infotech Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/1444 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/24 Character recognition characterised by the processing or recognition method
    • G06V30/242 Division of the character sequences into groups prior to recognition; Selection of dictionaries
    • G06V30/244 Division of the character sequences into groups prior to recognition; Selection of dictionaries using graphical properties, e.g. alphabet type or font
    • G06V30/2455 Discrimination between machine-print, hand-print and cursive writing

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Character Discrimination (AREA)

Abstract

The invention relates to a table data processing method and system based on an RPA flow robot. The method comprises: S1, reading the content of a table and determining whether it contains handwriting; S2, dividing the area of the table into modules according to the type of the table, with the area containing handwritten characters designated a fuzzy area; S3, assigning a confidence value to the text read from the table according to the proportion of handwritten characters in that text; S4, comparing the confidence value of the text in the fuzzy area with a preset confidence value and outputting prompt information to a manager. The scheme automatically converts and extracts the text content of a table file. It also distinguishes handwriting in the table and automatically pre-judges the credibility of the extracted content; when the credibility is low, it prompts a manager to manually review, modify and confirm the extracted text. The scheme can partially replace manual operation in extracting table information and can significantly improve working efficiency.

Description

Data processing method and system based on RPA flow robot
Technical Field
The invention relates to the technical field of data processing, in particular to a table data processing method and system based on an RPA flow robot.
Background
Robotic process automation (RPA) refers to process rules designed in advance by a developer so that a robot can simulate manual operations such as text input, copy and paste, mouse movement and clicking, thereby replacing or assisting humans in completing repetitive work.
For example, the Chinese patent with application number CN202111033494.X discloses a data processing method and device based on an RPA robot, which can also be used in the financial field. The method includes: acquiring the basic functional components and corresponding business flow messages of an RPA robot by calling an RPA robot interface; classifying and abstracting the basic functional components and business flow messages according to the message specification corresponding to the RPA robot to obtain a structured data dictionary, and displaying it to a user; and receiving a basic function module selection instruction and a business process execution instruction sent after the user makes selections from the structured data dictionary, and generating an RPA development requirement. That application can effectively improve the efficiency of RPA requirement mining and extraction.
In a power system, to strengthen information management of business processes, information in paper forms, electronic forms in various formats, and online forms needs to be extracted and centrally managed. Existing information extraction relies mainly on manual scanning and manual copying to enter form information into a management system. This requires a large amount of repetitive labor, wastes human resources, is prone to omissions, and is inefficient, so improvement is needed.
Disclosure of Invention
Based on the above, the invention provides a table data processing method and system based on an RPA flow robot, which can partially replace manual operation in extracting table information and can improve working efficiency.
The technical scheme for solving the technical problems is as follows:
A table data processing method based on an RPA flow robot comprises the following steps:
S1, recognizing and preprocessing a table, converting the content of the table into readable content, reading the converted content, determining whether the content contains handwriting, and defining the type of the table accordingly;
S2, dividing the area of the table into modules according to the type of the table, with the area containing handwritten characters designated a fuzzy area and the other areas designated trusted areas;
S3, assigning a confidence value to the text read from the table according to the proportion of handwritten characters in that text;
S4, comparing the confidence value of the text in the fuzzy area with a preset confidence value, and when the confidence value is smaller than the preset confidence value, outputting prompt information to a manager to prompt the manager to modify and confirm the recognized information of the fuzzy area.
As a preferred scheme: in S1, recognizing and preprocessing the table comprises reading the suffix of the table file, determining the format of the table, and performing OCR (optical character recognition) on tables in picture or PDF format to obtain readable character information.
As a preferred scheme: during OCR, whether a character is handwritten or machine-printed is determined according to the straightness of its strokes; each recognized character is marked and counted, and the proportion of handwritten characters among all characters is calculated from the totals.
As a preferred scheme: when handwritten characters are recognized in the form, the method further comprises identifying and marking the writer; outputting the writer's identification information to the manager together with the prompt information; popping up a selectable operation window prompting the manager to assign a value to the recognizability of the writer's handwriting, yielding a recognition value; calculating the writer's average recognition value after multiple assignments; combining the average recognition value with the text confidence value of the fuzzy area to obtain a corrected confidence; comparing the corrected confidence with the preset confidence value; and outputting prompt information to the manager when the corrected confidence is smaller than the preset confidence value.
As a preferred scheme: when a filling area in the form is recognized as unfilled, missing-content information is output to the manager, and the manager is prompted to modify and confirm.
As a preferred scheme: a selectable operation window is popped up to the manager, who selects the filling areas for which prompts are required; when a required filling area contains no content, missing-content information is output to the manager, prompting the manager to modify and confirm.
As a preferred scheme: when prompt information is output to the manager, the prompted content is marked and displayed using colors and underlines.
A form data processing system based on an RPA flow robot, comprising:
a preprocessing module, used for identifying and classifying table files according to their suffixes;
an OCR recognition module, used for performing OCR on picture-type or PDF-type form files and distinguishing machine-printed from handwritten characters according to the straightness of strokes;
an identification module, used for laying out and marking the recognized characters output by the OCR module, defining the area containing handwritten characters as a fuzzy area and the other areas as trusted areas, and marking and displaying the fuzzy area using colors or underlines;
a reading module, used for reading the characters recognized by the OCR module, calculating the proportion of handwritten characters among all recognized characters, and outputting a statistical result;
an assignment module, used for assigning a confidence value to the recognized content of the table file according to the proportion of handwriting, where the higher the proportion of handwriting, the lower the confidence;
and a comparison prompt module, used for comparing the confidence value of the text content of the form with a preset confidence value, and when the confidence value is smaller than the preset confidence value, outputting prompt information to a manager to prompt the manager to modify and confirm the recognized information of the fuzzy area.
As a preferred scheme: the assignment module further comprises a correction unit; the reading module is used for reading the name in the signature from the recognized text content; the correction unit is used for letting a manager assign a recognition value to the read name, and for combining the recognition value with the confidence value to obtain a corrected confidence value.
Compared with the prior art, the technical scheme of the application has the following beneficial technical effects: the scheme can distinguish and define the types of table files, directly extract the text content of readable table files, and automatically convert and extract the text content of table files that are not directly readable. It can also distinguish handwritten characters in the table and automatically pre-judge the credibility of the extracted content according to the proportion of handwritten characters in the extracted text; when the pre-judged credibility is low, a manager is prompted to manually review, modify and confirm the extracted text, which helps avoid entering erroneous information into the management system. The scheme can partially replace manual operation in extracting table information and can significantly improve working efficiency.
Drawings
Fig. 1 is a flow chart of a method in a first embodiment.
Detailed Description
Embodiment one:
Referring to fig. 1, a table data processing method based on an RPA flow robot includes the following steps:
S1, recognizing and preprocessing a table, converting the content of the table into readable content, reading the converted content, determining whether the content contains handwriting, and defining the type of the table accordingly;
In practice, table files come in many formats. The permitted formats are specified in advance according to the table formats used in the business; for example, only table files with the suffixes doc, docx, wps, xls, jpg, png, pdf, htm and html may be used in the business flow. The table files are classified by suffix into two broad types: the first type is directly readable and the second type is not.
doc, docx, wps and xls are standard document formats, and the content of such form files can be read accurately without recognition; htm and html are web page document formats, and the content of such form files can also be read directly.
jpg, png and pdf files are form files in picture and PDF formats. Their content cannot be read directly, so it must first be recognized and converted into a readable format and then read. Because form files in picture and PDF formats generally contain handwriting and signatures, recognition errors are hard to avoid when the handwritten content is recognized and output. The more handwritten content a table file contains, the more frequently recognition errors occur.
In this embodiment, when the suffix of the table file identifies it as being in doc, docx, wps, xls, htm or html format, the content of the table file is read and output directly; when the suffix is jpg, png or pdf, the character information in the table is recognized and output by performing OCR (optical character recognition) on the content of the table file. During recognition, the straightness of each character's strokes is evaluated to distinguish machine-printed characters from handwritten ones.
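The suffix-based classification described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `classify_table_file` and the return labels `"direct"` and `"ocr"` are assumptions introduced here; only the suffix lists come from the embodiment.

```python
from pathlib import Path

# Suffix lists taken from the embodiment; all identifiers are illustrative.
DIRECTLY_READABLE = {".doc", ".docx", ".wps", ".xls", ".htm", ".html"}
NEEDS_OCR = {".jpg", ".png", ".pdf"}


def classify_table_file(filename: str) -> str:
    """Return "direct" for directly readable files and "ocr" for files
    that must be recognized first; reject any other suffix."""
    suffix = Path(filename).suffix.lower()
    if suffix in DIRECTLY_READABLE:
        return "direct"
    if suffix in NEEDS_OCR:
        return "ocr"
    raise ValueError(f"unsupported table format: {suffix or '(none)'}")
```

Rejecting unknown suffixes reflects the statement that only the listed formats may be used in the business flow; a real deployment might instead route such files to a manager.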
Stroke straightness is judged as follows: a coordinate system is established; several strokes of the character are selected (in the OCR image, white areas are blank and black areas are strokes); several points are selected on each stroke (that is, within a continuous black area) and their coordinates are determined; the differences between the horizontal and vertical coordinates of three adjacent points are then examined. If the coordinates of three adjacent points change by equal amounts, the character is considered machine-printed; otherwise it is considered handwritten.
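One way to read the three-point test is sketched below. It assumes the points on a stroke are sampled at roughly even spacing and introduces a tolerance `tol` that the source does not mention; both are assumptions, as are the function names.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def stroke_is_straight(points: Sequence[Point], tol: float = 0.5) -> bool:
    """Check that every three adjacent points show an equal coordinate change.

    `tol` is a hypothetical tolerance for pixel noise, not a value from the source.
    """
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        dx1, dy1 = x1 - x0, y1 - y0
        dx2, dy2 = x2 - x1, y2 - y1
        if abs(dx1 - dx2) > tol or abs(dy1 - dy2) > tol:
            return False
    return True


def character_is_machine_printed(strokes: Sequence[Sequence[Point]],
                                 tol: float = 0.5) -> bool:
    """Treat a character as machine-printed only if all sampled strokes are straight."""
    return all(stroke_is_straight(stroke, tol) for stroke in strokes)
```

A stroke with fewer than three sampled points passes trivially here, so a caller would need to sample enough points for the test to be meaningful.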
When a form file contains no handwriting, it is defined as a "class one" file; when a form file contains handwritten characters, it is defined as a "class two" file.
S2, carrying out module division on the area of the table according to the type of the table, dividing the area where the handwritten word is located into a fuzzy area, and the other areas are trusted areas;
For a "class one" file, the whole file is designated trusted; for a "class two" file, the handwritten areas in its content are designated fuzzy areas and the other areas are designated trusted areas.
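The file typing and region division of steps S1 and S2 might be sketched as below. The `RecognizedChar` record and its `region` field are hypothetical; the source does not say how areas are represented, so a region is modeled here simply as a named cell or field.

```python
from typing import Dict, List, NamedTuple


class RecognizedChar(NamedTuple):
    text: str
    region: str        # hypothetical name of the table cell or field
    handwritten: bool  # result of the handwriting/machine-print check


def define_file_class(chars: List[RecognizedChar]) -> str:
    """A file with any handwriting is "class two", otherwise "class one"."""
    return "class two" if any(c.handwritten for c in chars) else "class one"


def divide_regions(chars: List[RecognizedChar]) -> Dict[str, str]:
    """Mark a region "fuzzy" if it holds any handwritten character, else "trusted"."""
    regions: Dict[str, str] = {}
    for c in chars:
        if c.handwritten:
            regions[c.region] = "fuzzy"
        else:
            regions.setdefault(c.region, "trusted")
    return regions
```

For a class one file every region comes out as trusted, which matches treating the whole file as trusted.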
S3, carrying out confidence assignment on the read text content in the table according to the occupation ratio of the handwriting words in the text in the table;
OCR is performed on class two files. During OCR, handwritten and machine-printed characters are distinguished and counted, the proportion of handwritten characters among all characters is calculated, and a statistical result is output. The more handwritten content a form file contains, the more frequently recognition errors occur, so the proportion of handwriting directly affects the overall recognition accuracy for the file. This can be stated simply: the higher the proportion of handwritten characters, the lower the overall recognition accuracy. A recognition confidence value is assigned to the table file according to the proportion of handwriting, for example: with no handwriting, the confidence is 10; when the proportion of handwriting is below 10%, the confidence is 9; when it is 10%-20%, the confidence is 8; and so on, until at 80%-90% the confidence is 1; when the proportion of handwriting exceeds 90%, the confidence is 0.
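The example scale can be written as a small function. This is a sketch under stated assumptions: the source leaves the band boundaries ambiguous, so a proportion that falls exactly on a boundary (10%, 20%, ..., 90%) is placed in the higher band here, and an empty character count is treated as "no handwriting". The function name is illustrative.

```python
def assign_confidence(handwritten_count: int, total_count: int) -> int:
    """Map the proportion of handwritten characters to a confidence of 0-10."""
    if handwritten_count < 0 or handwritten_count > total_count:
        raise ValueError("handwritten_count must be between 0 and total_count")
    if handwritten_count == 0:
        return 10  # no handwriting (also covers an empty file, by assumption)
    # Integer arithmetic avoids floating-point trouble at the band edges.
    band = (handwritten_count * 10) // total_count  # 0 for <10%, 1 for 10-20%, ...
    return max(0, 9 - band)
```

With 15 handwritten characters out of 100, the band is 1 and the confidence is 8, matching the 10%-20% example in the text.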
S4, comparing the confidence coefficient value of the text of the fuzzy area with a preset confidence coefficient value, and outputting prompt information to a manager when the confidence coefficient value is smaller than the preset confidence coefficient value, so as to prompt the manager to modify and confirm the information of the identified fuzzy area.
Prompting rules are preset. For example, only when the confidence of the content read from a form file is lower than 8 is the read text output and the manager prompted to review, modify and confirm it. After the manager clicks to confirm, the text read from the form file is treated as valid content and the subsequent flow can proceed; otherwise the subsequent flow cannot continue.
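The gating rule can be expressed as a short decision function. The names `next_action`, `"proceed"` and `"prompt_manager"` are illustrative; the default threshold of 8 is the example value from the text.

```python
def next_action(confidence: int, manager_confirmed: bool = False,
                preset: int = 8) -> str:
    """Decide whether the flow may continue or must wait for the manager."""
    if confidence >= preset:
        return "proceed"          # content is accepted without review
    if manager_confirmed:
        return "proceed"          # reviewed content is treated as valid
    return "prompt_manager"       # flow is blocked until confirmation
```

For example, `next_action(7)` blocks the flow, while `next_action(7, manager_confirmed=True)` lets it continue.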
In fact, handwriting and its straightness differ between writers, and both directly affect OCR accuracy: the clearer and straighter the handwriting, the higher the recognition accuracy. Recognition accuracy therefore differs between writers, so the confidence of the text read from a table file can be corrected according to the recognition accuracy for the writer's handwriting rather than relying only on the proportion of handwriting. For a writer whose handwriting is recognized with high accuracy, no assignment or prompt is needed for the content of table files containing that handwriting, which reduces unnecessary prompts and improves table data processing efficiency.
The specific implementation is as follows. In step S4, when the confidence of the content read from the form file is lower than the preset value, the read text, which includes the writer's signature, is output, and the manager is prompted to manually check, modify and confirm it so as to correct any misrecognized text. Once the manager finishes modifying and clicks to confirm, the text is treated as valid content and the next flow can be executed. If the output text contains the writer's signature, then after the manager clicks to confirm, a recognition-value assignment box pops up, prompting the manager to enter a value for the recognizability (that is, the recognition accuracy) of the writer's handwriting. The manager can assign the value according to the number of manually corrected characters: the more characters corrected, the lower the value. If the writer's handwriting can be recognized accurately, the recognition value is greater than 1.
Recognition values are assigned for the handwriting in multiple table files from the same writer, and the writer's average recognition value is calculated. The average recognition value is combined with the text confidence value of the fuzzy area to obtain a corrected confidence, which is compared with the preset confidence value; prompt information is output to the manager when the corrected confidence is smaller than the preset value.
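The source does not state how the average recognition value and the confidence value are combined. The sketch below assumes multiplication, which is consistent with a recognition value above 1 raising the confidence; this choice, the class `WriterRecognitionLog`, and the handling of writers with no history are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import DefaultDict, List, Optional


class WriterRecognitionLog:
    """Stores the recognition values a manager has assigned per writer."""

    def __init__(self) -> None:
        self._values: DefaultDict[str, List[float]] = defaultdict(list)

    def record(self, writer: str, value: float) -> None:
        self._values[writer].append(value)

    def average(self, writer: str) -> Optional[float]:
        values = self._values.get(writer)
        return mean(values) if values else None


def corrected_confidence(confidence: float,
                         avg_recognition: Optional[float]) -> float:
    """Assumed combination rule: multiply confidence by the average value."""
    if avg_recognition is None:
        return confidence  # no history for this writer: leave unchanged
    return confidence * avg_recognition
```

Under this assumption, a file with confidence 5 from a writer whose average recognition value is 2.0 gets a corrected confidence of 10.0 and would pass a preset threshold of 8 without a prompt.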
Therefore, when a writer's handwriting can be recognized accurately, the corrected confidence is necessarily greater than the preset confidence. For table text containing that writer's handwriting, the system does not output the recognized text for review but automatically treats it as valid content and proceeds to the next flow. The manager's checking, modifying and confirming steps are not needed, which reduces unnecessary prompts and improves working efficiency.
In this embodiment, when OCR recognizes a long underlined section in the form, that area is considered a filling area. If no text is recognized in the area, it is considered unfilled; missing-content information is output to the manager, and the manager is prompted to modify and confirm. Specifically, a selectable operation window is popped up to the manager, who selects the filling areas for which prompts are required.
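The missing-content check might look like the sketch below. It assumes the filling areas have already been located and named, and that the manager's selection is available as a list of area names; detecting the underlined sections themselves is outside this example. All identifiers are hypothetical.

```python
from typing import Dict, Iterable, List


def find_unfilled_areas(recognized: Dict[str, str],
                        required: Iterable[str]) -> List[str]:
    """Return the manager-selected filling areas in which no text was recognized.

    `recognized` maps a filling-area name to the text OCR found there.
    """
    return [name for name in required
            if not recognized.get(name, "").strip()]
```

An area that is absent from `recognized` or contains only whitespace is reported as unfilled, so the manager can be prompted for exactly those areas.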
In this embodiment, when a prompt message is output to a manager, the fuzzy area and the missing area are marked and displayed by color and underline, so that the manager can quickly and intuitively find the area needing to be checked and modified, and the operation efficiency can be improved.
Embodiment two:
A form data processing system based on an RPA flow robot, comprising:
a preprocessing module, used for identifying and classifying table files according to their suffixes;
an OCR recognition module, used for performing OCR on picture-type or PDF-type form files and distinguishing machine-printed from handwritten characters according to the straightness of strokes;
an identification module, used for laying out and marking the recognized characters output by the OCR module, defining the area containing handwritten characters as a fuzzy area and the other areas as trusted areas, and marking and displaying the fuzzy area using colors or underlines;
a reading module, used for reading the characters recognized by the OCR module, calculating the proportion of handwritten characters among all recognized characters, and outputting a statistical result;
an assignment module, used for assigning a confidence value to the recognized content of the table file according to the proportion of handwriting, where the higher the proportion of handwriting, the lower the confidence;
and a comparison prompt module, used for comparing the confidence value of the text content of the form with a preset confidence value, and when the confidence value is smaller than the preset confidence value, outputting prompt information to a manager to prompt the manager to modify and confirm the recognized information of the fuzzy area.
In this embodiment: the assignment module further comprises a correction unit; the reading module is used for reading the name in the signature from the recognized text content; the correction unit is used for letting a manager assign a recognition value to the read name, and for combining the recognition value with the confidence value to obtain a corrected confidence value.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims (7)

1. The table data processing method based on the RPA flow robot is characterized by comprising the following steps of:
S1, recognizing and preprocessing a table, converting the content of the table into readable content, reading the converted content, determining whether the content contains handwriting, and defining the type of the table accordingly;
S2, dividing the area of the table into modules according to the type of the table, with the area containing handwritten characters designated a fuzzy area and the other areas designated trusted areas;
S3, assigning a confidence value to the text read from the table according to the proportion of handwritten characters in that text;
S4, comparing the confidence value of the text in the fuzzy area with a preset confidence value, and when the confidence value is smaller than the preset confidence value, outputting prompt information to a manager to prompt the manager to modify and confirm the recognized information of the fuzzy area; when handwritten characters are recognized in the table, identifying and marking the writer, outputting the writer's identification information to the manager together with the prompt information, popping up a selectable operation window prompting the manager to assign a value to the recognizability of the writer's handwriting to obtain a recognition value, calculating the writer's average recognition value after multiple assignments, combining the average recognition value with the text confidence value of the fuzzy area to obtain a corrected confidence, comparing the corrected confidence with the preset confidence value, and outputting prompt information to the manager when the corrected confidence is smaller than the preset confidence value.
2. The table data processing method based on the RPA flow robot according to claim 1, wherein: in S1, recognizing and preprocessing the table comprises reading the suffix of the table file, determining the format of the table, and performing OCR (optical character recognition) on tables in picture or PDF format to obtain readable character information.
3. The table data processing method based on the RPA flow robot according to claim 2, wherein: during OCR, whether a character is handwritten or machine-printed is determined according to the straightness of its strokes; each recognized character is marked and counted, and the proportion of handwritten characters among all characters is calculated from the totals.
4. The table data processing method based on the RPA flow robot according to claim 1, wherein: when a filling area in the form is recognized as unfilled, missing-content information is output to the manager, and the manager is prompted to modify and confirm.
5. The table data processing method based on the RPA flow robot according to claim 1, wherein: a selectable operation window is popped up to the manager, who selects the filling areas for which prompts are required; when a required filling area contains no content, missing-content information is output to the manager, prompting the manager to modify and confirm.
6. The table data processing method based on the RPA flow robot according to claim 1, wherein: when prompt information is output to the manager, the prompted content is marked and displayed using colors and underlines.
7. A form data processing system based on an RPA flow robot, comprising:
a preprocessing module, used for identifying and classifying table files according to their suffixes;
an OCR recognition module, used for performing OCR on picture-type or PDF-type form files and distinguishing machine-printed from handwritten characters according to the straightness of strokes;
an identification module, used for laying out and marking the recognized characters output by the OCR module, defining the area containing handwritten characters as a fuzzy area and the other areas as trusted areas, and marking and displaying the fuzzy area using colors or underlines;
a reading module, used for reading the characters recognized by the OCR module, calculating the proportion of handwritten characters among all recognized characters, and outputting a statistical result;
an assignment module, used for assigning a confidence value to the recognized content of the table file according to the proportion of handwriting, where the higher the proportion of handwriting, the lower the confidence; the assignment module further comprises a correction unit, the reading module is used for reading the name in the signature from the recognized text content, and the correction unit is used for letting a manager assign a recognition value to the read name and for combining the recognition value with the confidence value to obtain a corrected confidence value;
and a comparison prompt module, used for comparing the confidence value of the text content of the form with a preset confidence value, and when the confidence value is smaller than the preset confidence value, outputting prompt information to a manager to prompt the manager to modify and confirm the recognized information of the fuzzy area.
CN202210983630.XA 2022-08-17 2022-08-17 Data processing method and system based on RPA flow robot Active CN115294588B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210983630.XA CN115294588B (en) 2022-08-17 2022-08-17 Data processing method and system based on RPA flow robot

Publications (2)

Publication Number | Publication Date
CN115294588A (en) | 2022-11-04
CN115294588B (en) | 2024-04-19

Family

ID=83829855

Family Applications (1)

Application Number | Status | Publication Number | Priority Date | Filing Date | Title
CN202210983630.XA | Active | CN115294588B (en) | 2022-08-17 | 2022-08-17 | Data processing method and system based on RPA flow robot

Country Status (1)

Country Link
CN (1) CN115294588B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07160831A (en) * 1993-12-09 1995-06-23 Fuji Facom Corp Reject method for handwritten character recognition result
CN107545391A (en) * 2017-09-07 2018-01-05 安徽共生物流科技有限公司 A kind of logistics document intellectual analysis and automatic storage method based on image recognition
CN112149399A (en) * 2020-09-25 2020-12-29 北京来也网络科技有限公司 Table information extraction method, device, equipment and medium based on RPA and AI
CN112639818A (en) * 2018-08-27 2021-04-09 京瓷办公信息系统株式会社 OCR system
CN113191309A (en) * 2021-05-19 2021-07-30 杭州点望科技有限公司 Method and system for recognizing, scoring and correcting handwritten Chinese characters
CN113378822A (en) * 2021-07-08 2021-09-10 中教云智数字科技有限公司 System for marking handwritten answer area by using special mark frame in test paper
CN113377958A (en) * 2021-07-07 2021-09-10 北京百度网讯科技有限公司 Document classification method and device, electronic equipment and storage medium
CN113919303A (en) * 2021-11-02 2022-01-11 中国工商银行股份有限公司 Method and device for automatically generating service information table
CN113936130A (en) * 2021-09-29 2022-01-14 未鲲(上海)科技服务有限公司 Document information intelligent acquisition and error correction method, system and equipment based on OCR technology
CN114417798A (en) * 2022-01-19 2022-04-29 广州天维信息技术股份有限公司 Document structured extraction method and device, computer equipment and storage medium
CN114581928A (en) * 2021-12-29 2022-06-03 壹链盟生态科技有限公司 Form identification method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11157783B2 (en) * 2019-12-02 2021-10-26 UiPath, Inc. Training optical character detection and recognition models for robotic process automation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
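"Application of Artificial Intelligence in Financial Shared Service Management" (人工智能在财务共享服务管理中的应用); Dong Yiling (董屹岭); China New Technologies and New Products (《中国新技术新产品》); 2021-08-10 (No. 8); pp. 130-132 *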

Also Published As

Publication number Publication date
CN115294588A (en) 2022-11-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant