CN110210455B - Printing content formatting extraction method - Google Patents
- Publication number
- CN110210455B (application CN201910526081.1A)
- Authority
- CN
- China
- Prior art keywords
- extraction
- printing
- elements
- template
- extracting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/412—Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
Abstract
The invention relates to the technical field of document printing, and in particular to a method for the formatted extraction of printed content, comprising the following steps: S1, intercepting the printed content of a print document and converting it into printing elements to generate a printing element set; S2, designing extraction elements according to a sampled printing element set to generate an extraction template; and S3, inputting the printing element set and the extraction template into an extraction engine, which computes a formatted extraction result. The method overcomes the shortcomings of pure-text extraction and can extract content from complex forms flexibly, efficiently and accurately. It supplements and optimizes OCR-based extraction. It also improves on precise-coordinate extraction: nesting basic extraction elements inside container extraction elements makes it possible to handle complex forms. A visual template design interface simplifies template design and improves design efficiency.
Description
Technical Field
The invention relates to the technical field of document printing, in particular to a method for formatting and extracting printing content.
Background
At present, printed output is an indispensable way of delivering content in many industries. However, printed output is suited only to being read by people: the content cannot readily be reformatted, which hinders secondary processing of the data. In the current era of big data, there is an urgent need for a way to reformat the printed output of other systems, so that the data it discloses can be reused cheaply and efficiently without requiring authorization to a data interface. Such a method would also provide a basic data acquisition solution for applications such as big data computing and artificial intelligence.
There are three main ways of extracting content. First, the printed content is obtained as pure text, and the text is split and searched for specific keywords. Second, the printed content is converted entirely into pictures, and the content is extracted using OCR. Third, the print format specification is parsed to obtain the exact content together with its coordinate information, and the content is extracted by partitioning on coordinates.
Each of the three approaches has advantages and disadvantages. The advantage of the first is that the underlying data is simple to obtain; its disadvantage is that complex information cannot be extracted accurately, and parsing errors arise easily with large amounts of irregular table data (for example, tables with missing row or column data). The advantage of the second is that the extraction area can be defined freely, and all types of printed content can be converted uniformly into pictures for processing; its disadvantage is that the content recognized by general-purpose OCR is not very accurate, while obtaining higher accuracy and performance requires training the OCR on large amounts of data, which is technically difficult to implement. The advantage of the third is that the content is exact and needs no recognition, and that it is easy to partition by coordinates; its disadvantage is that scattered data is inconvenient to combine, and data that is originally picture content cannot be processed.
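The third approach, partitioning by coordinates, can be illustrated with a minimal Python sketch. The `PrintElement` type, the `elements_in_region` helper and the sample values are hypothetical and are not taken from the patent; the sketch only assumes that each piece of printed text carries an upper-left corner plus a width and a height.

```python
from dataclasses import dataclass


@dataclass
class PrintElement:
    """One piece of printed text with its bounding box (hypothetical layout)."""
    text: str
    x: float       # left edge on the page
    y: float       # top edge on the page
    width: float
    height: float


def elements_in_region(elements, x, y, width, height):
    """Return the elements lying fully inside the rectangle, in reading order."""
    inside = [
        e for e in elements
        if e.x >= x and e.y >= y
        and e.x + e.width <= x + width
        and e.y + e.height <= y + height
    ]
    # Sort top-to-bottom, then left-to-right.
    return sorted(inside, key=lambda e: (e.y, e.x))


elements = [
    PrintElement("Invoice No.", 10, 10, 60, 12),
    PrintElement("A-1024", 80, 10, 40, 12),
    PrintElement("Total", 10, 200, 30, 12),
]
header = elements_in_region(elements, 0, 0, 200, 50)
header_text = " ".join(e.text for e in header)
```

Because the text is already exact, no recognition step is needed; the only decision is which rectangle each element belongs to.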
Disclosure of Invention
The invention aims to provide a method for the formatted extraction of printed content that solves the problem, identified in the background, that complex content is difficult to extract. The main problems addressed are: the number of rows in an extracted form is uncertain and cannot be determined accurately before extraction; form rows differ in size, which affects extraction by partitioned area; form data is displayed across several pages; interfering information must be removed from the extracted content; the extraction mode must switch flexibly between text and pictures in mixed content; and the position of the information to be extracted floats.
In order to achieve the purpose, the invention provides the following technical scheme:
A method for formatting and extracting print content comprises the following steps:
S1, intercepting the printed content of a print document and converting it into printing elements (each comprising the text content, the x and y coordinates of its upper-left corner on the corresponding page, and the height and width of the displayed text), and generating a printing element set (comprising the name of the print document, the total number of printed pages, the index of each page, the height and width of each page, the printing elements contained in each page, and a separate page picture for each page);
S2, designing extraction elements according to a sampled printing element set (each mainly comprising the extraction element type, a keyword, an extraction range given by x and y coordinates, height and width, and other type-specific attribute information; extraction elements can be nested), and generating an extraction template;
S3, inputting the printing element set and the extraction template into an extraction engine, which computes a formatted extraction result (comprising all data extracted by the extraction elements, with each keyword and its extracted content forming a key-value pair).
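The printing element set described in step S1 could be modelled roughly as follows. This is a sketch under assumptions: the class and field names are invented for illustration, and the patent does not specify any serialization or in-memory layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PrintElement:
    text: str       # text content
    x: float        # x coordinate of the upper-left corner on the page
    y: float        # y coordinate of the upper-left corner on the page
    width: float    # width of the displayed text
    height: float   # height of the displayed text


@dataclass
class Page:
    index: int                        # index number of the page
    width: float
    height: float
    elements: List[PrintElement] = field(default_factory=list)
    picture_path: Optional[str] = None   # separate page picture (assumed to be a file path)


@dataclass
class PrintElementSet:
    document_name: str
    pages: List[Page] = field(default_factory=list)

    @property
    def total_pages(self) -> int:
        """Total number of printed pages."""
        return len(self.pages)


doc = PrintElementSet(
    document_name="invoice",
    pages=[Page(index=0, width=595, height=842,
                elements=[PrintElement("A-1024", 80, 10, 40, 12)],
                picture_path="page-0.jpg")],
)
```

Deriving `total_pages` from the page list, rather than storing it, is a design choice of this sketch that keeps the two values from drifting apart.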
As a further scheme of the invention: in step S2, the extraction template comprises an extraction template name, a plurality of extraction elements, and a set of processing scripts; the extraction elements comprise basic extraction elements or container extraction elements, which can be nested and combined.
As a still further scheme of the invention: the basic extraction elements comprise text extraction elements or barcode extraction elements; a text extraction element comprises an extraction key value and a set of coordinates, where the coordinates define an area relative to the current page from which printing elements are extracted, and the extraction key value is used to form a key-value pair with the extracted content.
As a still further scheme of the invention: the container extraction elements comprise a form extraction element; a form extraction element contains a plurality of basic text extraction elements, and the coordinates of those text extraction elements are relative to the parent form extraction element.
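The relationship between a form extraction element and its child text extraction elements can be sketched as below. The class names and the `absolute_region` helper are hypothetical; the only point taken from the text is that child coordinates are relative to the parent container, so they must be offset by the container's position to obtain page coordinates.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TextExtraction:
    key: str        # extraction key used to build the key-value pair
    x: float        # relative to the page, or to the parent form if nested
    y: float
    width: float
    height: float


@dataclass
class FormExtraction:
    key: str
    x: float        # relative to the page
    y: float
    width: float
    height: float
    children: List[TextExtraction] = field(default_factory=list)


def absolute_region(form: FormExtraction,
                    child: TextExtraction) -> Tuple[float, float, float, float]:
    """Translate a child's form-relative rectangle into page coordinates."""
    return (form.x + child.x, form.y + child.y, child.width, child.height)


form = FormExtraction("items", x=20, y=100, width=400, height=300, children=[
    TextExtraction("name", x=0, y=0, width=200, height=15),
    TextExtraction("amount", x=300, y=0, width=100, height=15),
])
regions = {c.key: absolute_region(form, c) for c in form.children}
```

Keeping child coordinates relative means a form can be moved on the page by changing only the container's `x` and `y`.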
As a still further scheme of the invention: step S1 is implemented as follows:
S1-1, converting the print document into an EMF file using a formatted virtual printer;
S1-2, parsing the EMF file, extracting coordinates and content, and generating a printing element document;
S1-3, parsing each printed page and converting it into a page picture.
As a still further scheme of the invention: step S2 is implemented as follows:
S2-1, working in the quick slide printing formatting extraction template design client;
S2-2, importing sample data of a printing element set;
S2-3, dragging extraction elements into place with the mouse in a visual interface, and setting the related extraction parameters;
S2-4, running a test extraction and checking the extraction results; if they are unsatisfactory, repeating steps S2-2 to S2-4 until the extraction results for several print samples of the same format are satisfactory;
S2-5, saving the print extraction template, uploading it to a printing formatting extraction server, and binding it to a print type.
As a still further scheme of the invention: step S3 is implemented as follows:
S3-1, uploading the generated printing element document and page pictures to the printing formatting extraction server;
S3-2, the printing formatting extraction server retrieves the designed print extraction template according to the uploaded print type;
S3-3, the extraction engine automatically performs the formatted extraction on the known input information and stores the extraction result in a database.
As a still further scheme of the invention: in step S3-3, the extraction engine operates as follows:
S3-3-1, traversing all pages, and packaging the printing elements of the current page together with its page picture as the input parameters for the following steps;
S3-3-2, traversing all top-level extraction elements on the current page and performing the extraction:
S3-3-2-1, if the extraction element is a basic extraction element, such as a text extraction element or a barcode extraction element, pairing its extraction result directly with its keyword to form a key-value pair, and returning the key-value pair;
S3-3-2-2, if the extraction element is a container extraction element, such as a form extraction element, traversing all of its sub extraction elements, extracting each one, collecting their extraction results into a queue, and pairing the queue with the keyword of the container extraction element to form a key-value pair, which is returned;
S3-3-3, converting all returned key-value pairs into a formatted extraction result in JSON format;
S3-3-4, passing the formatted extraction result as a parameter to a processing script, which either performs secondary processing or returns the result unchanged.
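The engine steps S3-3-1 to S3-3-4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: pages and templates are plain dictionaries with invented field names (`type`, `key`, `x`, `y`, `w`, `h`, `children`), an element counts as inside a region when its upper-left corner falls within it, and barcode extraction, which would need the page picture, is omitted.

```python
import json


def texts_in(elements, x, y, w, h):
    """Join the text of elements whose upper-left corner lies in the rectangle."""
    hits = [e for e in elements if x <= e["x"] < x + w and y <= e["y"] < y + h]
    hits.sort(key=lambda e: (e["y"], e["x"]))
    return " ".join(e["text"] for e in hits)


def extract(element, page, origin=(0, 0)):
    """Return (keyword, value) for one extraction element on one page."""
    x, y = origin[0] + element["x"], origin[1] + element["y"]
    if element["type"] == "form":              # container element (S3-3-2-2)
        queue = []
        for child in element["children"]:      # child coordinates are relative
            key, value = extract(child, page, (x, y))
            queue.append({key: value})
        return element["key"], queue
    # basic text element (S3-3-2-1)
    return element["key"], texts_in(page["elements"], x, y, element["w"], element["h"])


def run_engine(pages, template, script=None):
    results = []
    for page in pages:                                               # S3-3-1
        results.append(dict(extract(el, page) for el in template["elements"]))  # S3-3-2
    formatted = json.dumps(results, ensure_ascii=False)              # S3-3-3
    return script(formatted) if script else formatted                # S3-3-4


pages = [{"elements": [
    {"text": "A-1024", "x": 80, "y": 10},
    {"text": "Widget", "x": 25, "y": 105},
    {"text": "9.50", "x": 330, "y": 105},
]}]
template = {"name": "invoice", "elements": [
    {"type": "text", "key": "invoice_no", "x": 70, "y": 0, "w": 100, "h": 30},
    {"type": "form", "key": "items", "x": 20, "y": 100, "w": 400, "h": 300, "children": [
        {"type": "text", "key": "name", "x": 0, "y": 0, "w": 200, "h": 15},
        {"type": "text", "key": "amount", "x": 300, "y": 0, "w": 100, "h": 15},
    ]},
]}
result = run_engine(pages, template)
```

The recursion in `extract` mirrors the nesting of extraction elements: a container simply shifts the origin for its children and collects their key-value pairs into a list.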
Compared with the prior art, the invention has the beneficial effects that:
The method for the formatted extraction of printed content solves the problem that complex content is difficult to extract. The main problems addressed are: the number of rows in an extracted form is uncertain and cannot be determined accurately before extraction; form rows differ in size, which affects extraction by partitioned area; form data is displayed across several pages; interfering information must be removed from the extracted content; the extraction mode must switch flexibly between text and pictures in mixed content; and the position of the information to be extracted floats.
The method overcomes the shortcomings of pure-text extraction and can extract content from complex forms flexibly, efficiently and accurately. It supplements and optimizes OCR-based extraction, and restricting OCR to a precisely defined area improves its computational efficiency. It also improves on precise-coordinate extraction: nesting basic extraction elements inside container extraction elements makes it possible to handle complex forms and the various difficulties of extracting form content. A visual template design interface simplifies template design and improves design efficiency.
Drawings
FIG. 1 is a block flow diagram of an embodiment of the present invention.
Detailed Description
The technical solution of the present patent will be described in further detail with reference to the following embodiments.
Referring to FIG. 1, in an embodiment of the present invention, a method for formatting and extracting print content comprises the following steps:
S1, intercepting the printed content of a print document and converting it into printing elements (each comprising the text content, the x and y coordinates of its upper-left corner on the corresponding page, and the height and width of the displayed text), and generating a printing element set (comprising the name of the print document, the total number of printed pages, the index of each page, the height and width of each page, the printing elements contained in each page, and a separate page picture for each page);
S2, designing extraction elements according to a sampled printing element set (each mainly comprising the extraction element type, a keyword, an extraction range given by x and y coordinates, height and width, and other type-specific attribute information; extraction elements can be nested), and generating an extraction template;
S3, inputting the printing element set and the extraction template into an extraction engine, which computes a formatted extraction result (comprising all data extracted by the extraction elements, with each keyword and its extracted content forming a key-value pair).
Further, in step S2, the extraction template comprises an extraction template name, a plurality of extraction elements, and a set of processing scripts; the extraction elements comprise basic extraction elements or container extraction elements, which can be nested and combined.
Specifically, the basic extraction elements comprise text extraction elements or barcode extraction elements; a text extraction element comprises an extraction key value and a set of coordinates, where the coordinates define an area relative to the current page from which printing elements are extracted, and the extraction key value is used to form a key-value pair with the extracted content.
Specifically, the container extraction elements comprise a form extraction element; a form extraction element contains a plurality of basic text extraction elements, and the coordinates of those text extraction elements are relative to the parent form extraction element.
Specifically, step S1 is implemented as follows:
S1-1, converting the print document into an EMF file using a formatted virtual printer, specifically by printing with the quick transport formatted virtual printer;
S1-2, parsing the EMF file, extracting coordinates and content, and generating a printing element document (a jhcef format file);
S1-3, parsing each printed page and converting it into a page picture, for example a jpg picture.
Specifically, step S2 is implemented as follows:
S2-1, working in the quick slide printing formatting extraction template design client;
S2-2, importing sample data of a printing element set;
S2-3, dragging extraction elements into place with the mouse in a visual interface, and setting the related extraction parameters;
S2-4, running a test extraction and checking the extraction results; if they are unsatisfactory, repeating steps S2-2 to S2-4 until the extraction results for several print samples of the same format are satisfactory;
S2-5, saving the print extraction template, uploading it to a printing formatting extraction server, and binding it to a print type.
Specifically, step S3 is implemented as follows:
S3-1, uploading the generated printing element document and page pictures to the printing formatting extraction server;
S3-2, the printing formatting extraction server retrieves the designed print extraction template according to the uploaded print type;
S3-3, the extraction engine automatically performs the formatted extraction on the known input information and stores the extraction result in a database, the formatted extraction result document being in jhcer format.
Further, in step S3-3, the extraction engine operates as follows:
S3-3-1, traversing all pages, and packaging the printing elements of the current page together with its page picture as the input parameters for the following steps;
S3-3-2, traversing all top-level extraction elements on the current page and performing the extraction:
S3-3-2-1, if the extraction element is a basic extraction element, such as a text extraction element or a barcode extraction element, pairing its extraction result directly with its keyword to form a key-value pair, and returning the key-value pair;
S3-3-2-2, if the extraction element is a container extraction element, such as a form extraction element, traversing all of its sub extraction elements, extracting each one, collecting their extraction results into a queue, and pairing the queue with the keyword of the container extraction element to form a key-value pair, which is returned;
S3-3-3, converting all returned key-value pairs into a formatted extraction result in JSON format;
S3-3-4, passing the formatted extraction result as a parameter to a processing script, which either performs secondary processing or returns the result unchanged.
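As an illustration of step S3-3-4, a processing script might look like the sketch below. The patent does not specify a scripting language or a result schema, so everything here is assumed: the formatted result is taken to be a JSON array with one object per page, and a container result is taken to be a list of single-pair objects. `passthrough` and `merge_rows` are hypothetical names.

```python
import json


def passthrough(formatted: str) -> str:
    """Return the formatted result unchanged."""
    return formatted


def merge_rows(formatted: str) -> str:
    """Secondary processing: trim strings and merge each container's pairs into one object."""
    pages = json.loads(formatted)
    for page in pages:
        for key, value in list(page.items()):
            if isinstance(value, list):        # container result: list of one-pair dicts
                merged = {}
                for pair in value:
                    merged.update(pair)
                page[key] = merged
            elif isinstance(value, str):       # basic result: strip stray whitespace
                page[key] = value.strip()
    return json.dumps(pages, ensure_ascii=False)


raw = json.dumps([
    {"invoice_no": " A-1024 ", "items": [{"name": "Widget"}, {"amount": "9.50"}]}
])
processed = json.loads(merge_rows(raw))
```

Both functions take and return a JSON string, so either can be bound to a template without the engine needing to know which kind of script it is calling.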
The invention combines the advantages of the existing approaches, applying the appropriate combination in the appropriate setting to achieve the best formatted extraction. An extraction template is designed according to the printing elements and their coordinates. The extraction template comprises a plurality of extraction elements and a set of processing scripts. The extraction elements are divided into text extraction elements, form extraction elements and barcode extraction elements. The text extraction element is the most basic extraction element; it comprises a set of coordinates that define an area relative to the current page from which printing elements are extracted, and an extraction key value that is used to form a key-value pair with the extracted content. The form extraction element is a container extraction element into which a plurality of basic text extraction elements are placed, with coordinates relative to the parent form extraction element. Using the visual interface, a user can set up the extraction template conveniently by clicking and dragging with the mouse. The printing elements and the extraction template are then passed to the extraction engine, which computes an extraction result in JSON format.
The method overcomes the shortcomings of pure-text extraction and can extract content from complex forms flexibly, efficiently and accurately. It supplements and optimizes OCR-based extraction, and restricting OCR to a precisely defined area improves its computational efficiency. It also improves on precise-coordinate extraction: nesting basic extraction elements inside container extraction elements makes it possible to handle complex forms and the various difficulties of extracting form content. A visual template design interface simplifies template design and improves design efficiency.
While the preferred embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.
Claims (1)
1. A method for formatting and extracting print contents, characterized by comprising the following steps:
S1, intercepting and converting printing contents of a printing document into printing elements to generate a printing element set;
S2, designing extraction elements according to the sampled printing element set to generate an extraction template;
S3, inputting a printing element set and an extraction template, and performing operation by using an extraction engine to generate a formatted extraction result;
in step S2, the extraction template includes an extraction template name, a plurality of extraction elements, and a set of processing scripts; the extraction element comprises a container extraction element;
the container extraction element comprises a form extraction element; the form extraction element is provided with a plurality of basic text extraction elements, and the coordinates of the text extraction elements are relative to the parent container form extraction element;
the specific implementation method of step S1 is:
S1-1, converting the printed document into an EMF file by using a formatted virtual printer;
S1-2, analyzing the EMF file, extracting coordinates and contents, and generating a printing element document;
S1-3, analyzing each printed page and converting the page into a page picture;
the specific implementation method of step S2 is:
S2-1, processing by using a printing formatting extraction template design client;
S2-2, importing printing element set sample data;
S2-3, dragging and setting extraction elements by using a mouse with the aid of a visual interface, and setting related extraction parameters;
S2-4, testing extraction and checking extraction results, and if not satisfied, repeating steps S2-2 to S2-4 until the extraction results of a plurality of printing samples in the same format are satisfactory;
S2-5, storing the printing extraction template, uploading the printing extraction template to a printing formatting extraction server, and binding the printing type;
the specific implementation method of step S3 is:
S3-1, uploading the generated printing element document and page picture to a printing formatting extraction server;
S3-2, the printing formatting extraction server calls the designed printing extraction template according to the uploaded related printing types;
S3-3, the extraction engine automatically performs formatted extraction according to the known input information operation, and stores the extraction result in a database;
in step S3-3, the extraction engine operates as follows:
S3-3-1, traversing all pages, and packaging the printing elements of the current page and the page pictures together as the following input parameters;
S3-3-2, traversing all extraction elements on the current page, and performing extraction operation;
if the extraction element is a container extraction element, traversing all the sub-extraction elements, extracting, forming a queue by the extraction results of the sub-extraction elements, and forming a key-value pair to return by matching with the keywords of the container extraction element;
S3-3-3, converting all returned key-value pairs into formatted extraction results in JSON format;
S3-3-4, transmitting the formatted extraction result to a processing script in a parameter form, and performing secondary processing by the processing script or directly returning the result without any change.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910526081.1A CN110210455B (en) | 2019-06-18 | 2019-06-18 | Printing content formatting extraction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910526081.1A CN110210455B (en) | 2019-06-18 | 2019-06-18 | Printing content formatting extraction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110210455A (en) | 2019-09-06 |
CN110210455B (en) | 2022-03-01 |
Family
ID=67793281
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910526081.1A Active CN110210455B (en) | 2019-06-18 | 2019-06-18 | Printing content formatting extraction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110210455B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112035076A (en) * | 2020-08-25 | 2020-12-04 | 上海中通吉网络技术有限公司 | JSON-based printing analysis method, device, equipment and printing system |
CN113360106B (en) * | 2021-06-30 | 2022-12-09 | 建信金融科技有限责任公司 | Webpage printing method and device |
CN114035755A (en) * | 2021-11-16 | 2022-02-11 | 上海中通吉网络技术有限公司 | Picture processing method and printing method |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102713884A (en) * | 2010-01-29 | 2012-10-03 | 惠普发展公司,有限责任合伙企业 | Remote printing |
CN102819532A (en) * | 2011-06-07 | 2012-12-12 | 解玉麟 | Obtaining and transferring method of web form data |
US8606010B2 (en) * | 2011-03-18 | 2013-12-10 | Seiko Epson Corporation | Identifying text pixels in scanned images |
CN103890748A (en) * | 2011-10-17 | 2014-06-25 | 谷歌公司 | Roving printing in a cloud-based print service |
CN104657091A (en) * | 2013-11-20 | 2015-05-27 | 航天信息股份有限公司 | Method for formatted printing of template data in tax control system |
CN106445426A (en) * | 2016-08-31 | 2017-02-22 | 深圳市华阳信通科技发展有限公司 | Printing driver-based text data acquisition and printing control method and system |
JP2017041073A (en) * | 2015-08-19 | 2017-02-23 | 株式会社スプラインネットワーク | Print data management system, information processing apparatus, print data acquisition program, and method |
Family Cites Families (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6454381B1 (en) * | 2001-04-27 | 2002-09-24 | Hewlett-Packard Company | Method and apparatus for providing ink container extraction characteristics to a printing system |
AU2002952711A0 (en) * | 2002-11-18 | 2002-11-28 | Typefi Systems Pty Ltd | A method of formatting documents |
JP4328604B2 (en) * | 2003-11-21 | 2009-09-09 | キヤノン株式会社 | Image processing method, image processing apparatus, and program |
JP4095617B2 (en) * | 2005-02-28 | 2008-06-04 | キヤノン株式会社 | Document processing apparatus, document processing method, and computer program |
US8150156B2 (en) * | 2006-01-04 | 2012-04-03 | International Business Machines Corporation | Automated processing of paper forms using remotely-stored templates |
US8085980B2 (en) * | 2008-08-13 | 2011-12-27 | Lockheed Martin Corporation | Mail piece identification using bin independent attributes |
CN102830947A (en) * | 2012-08-13 | 2012-12-19 | 南京莱斯信息技术股份有限公司 | Report printing control implemented based on report printing template format |
US9052863B2 (en) * | 2012-08-14 | 2015-06-09 | Seiko Epson Corporation | ePOS printing |
US9864741B2 (en) * | 2014-09-23 | 2018-01-09 | Prysm, Inc. | Automated collective term and phrase index |
CN105589686B (en) * | 2014-11-14 | 2021-03-02 | 航天信息股份有限公司 | Template-based information input and printing method and device under WinCE platform |
US10324926B2 (en) * | 2015-05-15 | 2019-06-18 | Microsoft Technology Licensing, Llc | System and method for extracting and sharing application-related user data |
CN105653216A (en) * | 2015-12-25 | 2016-06-08 | 珠海奔图电子有限公司 | Printing control system and method |
CN107025452A (en) * | 2016-01-29 | 2017-08-08 | 富士通株式会社 | Image-recognizing method and image recognition apparatus |
US9436760B1 (en) * | 2016-02-05 | 2016-09-06 | Quid, Inc. | Measuring accuracy of semantic graphs with exogenous datasets |
JP6887233B2 (en) * | 2016-09-02 | 2021-06-16 | 株式会社アイリックコーポレーション | Insurance policy image analysis system, description content analysis device, mobile terminal and program for mobile terminal |
CN108334627B (en) * | 2018-02-12 | 2022-09-23 | 北京百度网讯科技有限公司 | Method and device for searching new media content and computer equipment |
CN109543690B (en) * | 2018-11-27 | 2020-04-07 | 北京百度网讯科技有限公司 | Method and device for extracting information |
CN109657669B (en) * | 2018-12-13 | 2023-02-14 | 江西金格科技有限公司 | Intelligent electronic seal extraction method based on image processing |
CN109840278A (en) * | 2019-01-28 | 2019-06-04 | 平安科技(深圳)有限公司 | Histogram data switching control method, device, computer equipment and storage medium |
Non-Patent Citations (2)
Title |
---|
《一招解决PDF打印和内容提取问题》;我心飞翔;《软件与系统》;20110331;全文 * |
《基于打印指令的打印数据文本信息的提取和追加》;李培然;《中国优秀硕士学位论文全文数据库 信息科技辑》;20160715(第7期);全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN110210455A (en) | 2019-09-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||