CN110210455B - Printing content formatting extraction method - Google Patents

Printing content formatting extraction method

Info

Publication number
CN110210455B
Authority
CN
China
Prior art keywords
extraction
printing
elements
template
extracting
Legal status
Active
Application number
CN201910526081.1A
Other languages
Chinese (zh)
Other versions
CN110210455A (en)
Inventor
夏莫戛
张文静
甘玉涛
樊利红
Current Assignee
Shijiazhuang Jiehong Technology Co ltd
Original Assignee
Shijiazhuang Jiehong Technology Co ltd
Priority date
2019-06-18
Filing date
2019-06-18
Publication date
2022-03-01
2019-06-18 Application filed by Shijiazhuang Jiehong Technology Co ltd
2019-06-18 Priority to CN201910526081.1A
2019-09-06 Publication of CN110210455A
Application granted
2022-03-01 Publication of CN110210455B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables

Abstract

The invention relates to the technical field of document printing, and in particular to a method for extracting print content in a formatted form. The method comprises the following steps: S1, intercepting the print content of a printed document and converting it into print elements to generate a print element set; S2, designing extraction elements from a sampled print element set to generate an extraction template; and S3, feeding the print element set and the extraction template into an extraction engine, which computes a formatted extraction result. The method overcomes the limitations of extracting plain text content and can extract content from complex forms flexibly, efficiently and accurately. It also supplements and optimizes OCR-based extraction. Extraction by precise coordinates is improved, and nesting basic extraction elements inside container extraction elements makes it possible to handle complex extraction layouts. A visual template design interface greatly simplifies template design and improves design efficiency.

Description

Printing content formatting extraction method
Technical Field
The invention relates to the technical field of document printing, and in particular to a method for extracting print content in a formatted form.
Background
At present, printed output is an indispensable way of delivering content in many industries. However, printed content is suitable only for people to view and read: it cannot readily be re-formatted, which hinders secondary processing of the data. In the current era of big data, there is an urgent need for a way to re-format the printed output of other systems, so that the useful data they already disclose can be reused cheaply and efficiently without authorization to access their data interfaces. Such a method would also provide a basic data acquisition solution for applications such as big data computing and artificial intelligence.
There are three main ways of extracting such content. The first obtains the print content as plain text and performs character segmentation and search-and-match against specific keywords. The second converts the print content entirely into images and extracts the content with OCR (optical character recognition). The third parses the print format specification to obtain the exact content together with its coordinates, and extracts content by partitioning the page by coordinates.
Each approach has advantages and disadvantages. The first approach obtains the underlying data simply, but it cannot extract complex information accurately and easily produces parsing errors on the many non-standard tables (for example, tables with missing row or column data). The second approach allows extraction areas to be defined freely, and all types of print content can be converted uniformly into images for processing. Its drawbacks are that general-purpose OCR recognizes content with limited accuracy, and higher accuracy and performance can be obtained only by training the OCR on large amounts of data, which is technically difficult. The third approach yields exact content that needs no recognition step and is easy to partition by coordinates. Its drawbacks are that scattered data is inconvenient to combine, and content that was originally an image cannot be processed.
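The missing-data weakness of the first approach, and why coordinates avoid it, can be seen in a small Python sketch. The table columns, x boundaries and sample row are hypothetical illustrations, not data from the patent: a row whose quantity cell is blank is split once by whitespace (first approach) and once by assigning each text run to the column whose x range contains it (coordinate partitioning, as in the third approach).

```python
from dataclasses import dataclass

@dataclass
class TextRun:
    text: str
    x: float  # left edge of the run on the page (hypothetical units)

# Hypothetical column boundaries [start, end) for a 3-column table.
COLUMNS = {"item": (0, 100), "qty": (100, 160), "price": (160, 240)}

def split_by_whitespace(line: str) -> dict:
    """First approach: split plain text and assign tokens to columns in order."""
    return dict(zip(COLUMNS, line.split()))

def split_by_coordinates(runs: list[TextRun]) -> dict:
    """Coordinate approach: put each run in the column whose x range contains it."""
    row = {name: "" for name in COLUMNS}
    for run in runs:
        for name, (start, end) in COLUMNS.items():
            if start <= run.x < end:
                row[name] = run.text
                break
    return row

# A printed row whose "qty" cell was left blank.
print(split_by_whitespace("Paper          12.50"))
# {'item': 'Paper', 'qty': '12.50'}  (the price lands in the qty column)
print(split_by_coordinates([TextRun("Paper", 10), TextRun("12.50", 170)]))
# {'item': 'Paper', 'qty': '', 'price': '12.50'}
```

Whitespace splitting silently shifts the price into the quantity column, whereas the coordinate version leaves the empty cell empty.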
Disclosure of Invention
The invention aims to provide a print content formatting extraction method that solves the difficulty of extracting complex content described in the background. Specifically, it addresses the following problems: the number of rows in a table to be extracted is variable and cannot be determined before extraction; table rows differ in height, which disturbs extraction by partitioned areas; table data is split across pages; interfering information must be removed from the extracted content; the extraction mode must switch flexibly for mixed text and images; and the information to be extracted may float (change position) on the page.
To achieve this purpose, the invention provides the following technical solution:
A method for formatting and extracting print content comprises the following steps:
S1, intercepting the print content of a printed document and converting it into print elements (each comprising the text content, the x and y coordinates of its top-left corner on the corresponding page, and the height and width of the displayed text), and generating a print element set (comprising the name of the printed document, the total number of printed pages, the index of each page, the height and width of each page, the print elements contained in each page, and a separate picture of each page);
S2, designing extraction elements from a sampled print element set and generating an extraction template; an extraction element mainly comprises its type, a keyword, an extraction range (x and y coordinates, height and width; extraction elements may be nested) and other type-specific attributes;
S3, feeding the print element set and the extraction template into an extraction engine, which computes a formatted extraction result (comprising all data extracted by the extraction elements, with each keyword and its extracted content forming a key-value pair).
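The print element and print element set described in step S1 can be pictured as plain data classes. This Python sketch only illustrates the fields listed above; the class and field names, and storing the page picture as a file path, are our assumptions rather than the patent's file format.

```python
from dataclasses import asdict, dataclass, field

@dataclass
class PrintElement:
    text: str      # text content
    x: float       # top-left corner on the page
    y: float
    width: float   # size of the displayed text
    height: float

@dataclass
class Page:
    index: int
    width: float
    height: float
    elements: list[PrintElement] = field(default_factory=list)
    picture: str = ""  # hypothetical: path of the separate page picture

@dataclass
class PrintElementSet:
    document_name: str
    pages: list[Page] = field(default_factory=list)

    @property
    def total_pages(self) -> int:
        # Derived from the page list so the two can never disagree.
        return len(self.pages)

doc = PrintElementSet("invoice.doc", [
    Page(0, 595, 842, [PrintElement("Total", 400, 700, 40, 12)], "page0.jpg"),
])
print(doc.total_pages)                                 # 1
print(asdict(doc)["pages"][0]["elements"][0]["text"])  # Total
```

Computing the total page count instead of storing it is a design choice of this sketch; the patent lists it as a stored field.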
As a further aspect of the invention, in step S2 the extraction template comprises an extraction template name, a plurality of extraction elements and a set of processing scripts; the extraction elements are basic extraction elements or container extraction elements, which may be combined by nesting.
As a still further aspect of the invention, the basic extraction elements include text extraction elements and barcode extraction elements. A text extraction element comprises an extraction key value and a set of coordinates: the coordinates delimit an area relative to the current page, the print elements in that area are extracted, and the extraction key value is paired with the extracted content to form a key-value pair.
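The area test behind a text extraction element can be sketched as follows. The patent does not say when a print element counts as being "in the area" or how several hits are combined; this Python sketch assumes that the element's bounding box must lie entirely within the area and that hits are joined in reading order. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PrintElement:
    text: str
    x: float
    y: float
    width: float
    height: float

@dataclass
class TextExtractionElement:
    key: str       # extraction key value, used as the key of the output pair
    x: float       # extraction area, relative to the current page
    y: float
    width: float
    height: float

    def contains(self, e: PrintElement) -> bool:
        # Assumption: the element's whole bounding box must lie inside the area.
        return (self.x <= e.x and self.y <= e.y
                and e.x + e.width <= self.x + self.width
                and e.y + e.height <= self.y + self.height)

    def extract(self, elements: list[PrintElement]) -> tuple[str, str]:
        # Assumption: several hits are joined top-to-bottom, then left-to-right.
        hits = sorted((e for e in elements if self.contains(e)), key=lambda e: (e.y, e.x))
        return self.key, " ".join(e.text for e in hits)

page = [
    PrintElement("No.", 10, 10, 20, 10),
    PrintElement("2019-06-18", 300, 10, 60, 10),  # outside the area
    PrintElement("A123", 40, 10, 30, 10),
]
print(TextExtractionElement("invoice_no", 0, 0, 100, 30).extract(page))
# ('invoice_no', 'No. A123')
```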
As a still further aspect of the invention, the container extraction elements include a form extraction element. A form extraction element contains a plurality of basic text extraction elements, whose coordinates are relative to their parent container, the form extraction element.
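Because the child text extraction elements of a form use coordinates relative to their parent, an engine has to translate them into page coordinates before testing print elements. A minimal Python sketch of that translation, with hypothetical class names, assuming a plain offset with no scaling or rotation:

```python
from dataclasses import dataclass, field

@dataclass
class Box:
    x: float
    y: float
    width: float
    height: float

@dataclass
class TextExtractionElement:
    key: str
    box: Box  # relative to the parent form extraction element

@dataclass
class FormExtractionElement:
    key: str
    box: Box  # relative to the page
    children: list[TextExtractionElement] = field(default_factory=list)

    def absolute_child_boxes(self) -> dict[str, Box]:
        """Translate each child's area from form-relative to page coordinates."""
        return {
            child.key: Box(self.box.x + child.box.x, self.box.y + child.box.y,
                           child.box.width, child.box.height)
            for child in self.children
        }

form = FormExtractionElement("line", Box(50, 200, 200, 20), [
    TextExtractionElement("item", Box(0, 0, 100, 20)),
    TextExtractionElement("price", Box(120, 0, 60, 20)),
])
print(form.absolute_child_boxes()["price"])  # Box(x=170, y=200, width=60, height=20)
```

One consequence of this design is that moving the form's area moves all of its children with it, so only the container's position needs adjusting.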
As a still further aspect of the invention, step S1 is implemented as follows:
S1-1, converting the printed document into an EMF (Enhanced Metafile) file by means of a formatting virtual printer;
S1-2, parsing the EMF file, extracting coordinates and content, and generating a print element document;
S1-3, parsing each printed page and converting it into a page picture.
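Step S1-2 does not say how the EMF file is parsed. As one possible approach, Unicode text in an EMF file is commonly stored in EMR_EXTTEXTOUTW records (record type 84). The Python sketch below walks the record stream and pulls out each string with its reference point, using field offsets as we read them in Microsoft's [MS-EMF] specification. It ignores fonts, glyph-index output and the conversion from logical to page coordinates, and it is demonstrated on a synthetic record rather than a real spool file; the function names are hypothetical.

```python
import struct

EMR_EOF = 14           # end-of-file record type in [MS-EMF]
EMR_EXTTEXTOUTW = 84   # Unicode text-output record type in [MS-EMF]

def iter_records(data: bytes):
    """Yield (record_type, record_bytes) for each record in an EMF byte stream."""
    offset = 0
    while offset + 8 <= len(data):
        rtype, size = struct.unpack_from("<II", data, offset)
        if size < 8 or offset + size > len(data):
            break  # malformed or truncated record: stop instead of looping forever
        yield rtype, data[offset:offset + size]
        if rtype == EMR_EOF:
            break
        offset += size

def extract_text_runs(data: bytes) -> list[tuple[str, int, int]]:
    """Return (text, ref_x, ref_y) for every EMR_EXTTEXTOUTW record."""
    runs = []
    for rtype, rec in iter_records(data):
        if rtype != EMR_EXTTEXTOUTW:
            continue
        # Type, Size (8) + Bounds (16) + iGraphicsMode, exScale, eyScale (12) = 36,
        # then EmrText: Reference.x, Reference.y, Chars, offString (from record start).
        ref_x, ref_y, n_chars, off_string = struct.unpack_from("<iiII", rec, 36)
        text = rec[off_string:off_string + 2 * n_chars].decode("utf-16-le")
        runs.append((text, ref_x, ref_y))
    return runs

def make_exttextoutw(text: str, x: int, y: int) -> bytes:
    """Build a synthetic EMR_EXTTEXTOUTW record (demonstration only)."""
    encoded = text.encode("utf-16-le")
    n_chars = len(encoded) // 2                        # count of UTF-16 code units
    padded = encoded + b"\x00" * (-len(encoded) % 4)   # keep records 32-bit aligned
    fixed = 76                                         # size of the fixed part
    rec = struct.pack("<II", EMR_EXTTEXTOUTW, fixed + len(padded))
    rec += struct.pack("<iiii", 0, 0, 0, 0)            # Bounds
    rec += struct.pack("<Iff", 1, 0.0, 0.0)            # iGraphicsMode, exScale, eyScale
    rec += struct.pack("<iiIII", x, y, n_chars, fixed, 0)  # Reference, Chars, offString, Options
    rec += struct.pack("<iiii", 0, 0, 0, 0)            # Rectangle
    rec += struct.pack("<I", 0)                        # offDx (no spacing array)
    return rec + padded

eof = struct.pack("<IIIII", EMR_EOF, 20, 0, 16, 20)
print(extract_text_runs(make_exttextoutw("Total", 400, 700) + eof))
# [('Total', 400, 700)]
```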
As a still further aspect of the invention, step S2 is implemented as follows:
S2-1, using the quick slide print formatting extraction template design client;
S2-2, importing sample data of the print element set;
S2-3, dragging and placing extraction elements with the mouse in a visual interface, and setting the related extraction parameters;
S2-4, running a test extraction and checking the results; if they are unsatisfactory, repeating steps S2-2 to S2-4 until the extraction results for several print samples of the same format are satisfactory;
S2-5, saving the print extraction template, uploading it to the print formatting extraction server, and binding it to a print type.
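Step S2-4 leaves the judgement of a "satisfactory" result to the template designer. As a hypothetical aid, the Python sketch below runs a draft extractor over several same-format samples and lists, per sample, the required keys that came back empty. Treating "non-empty" as "satisfactory", and modelling samples as plain dictionaries, are our assumptions.

```python
from typing import Callable

def unsatisfied_samples(extract: Callable[[dict], dict],
                        samples: dict[str, dict],
                        required_keys: list[str]) -> dict[str, list[str]]:
    """Run a draft template over several samples and report empty required keys."""
    report: dict[str, list[str]] = {}
    for name, sample in samples.items():
        result = extract(sample)
        missing = [k for k in required_keys if not result.get(k)]
        if missing:
            report[name] = missing
    return report

def draft_template(sample: dict) -> dict:
    # Toy stand-in for the extraction engine running a draft template.
    return {"invoice_no": sample.get("no", ""), "total": sample.get("sum", "")}

samples = {"jan.emf": {"no": "A1", "sum": "10"}, "feb.emf": {"no": "A2"}}
print(unsatisfied_samples(draft_template, samples, ["invoice_no", "total"]))
# {'feb.emf': ['total']}
```

An empty report would correspond to the point at which the designer stops iterating and moves on to step S2-5.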
As a still further aspect of the invention, step S3 is implemented as follows:
S3-1, uploading the generated print element document and page pictures to the print formatting extraction server;
S3-2, the print formatting extraction server retrieving the designed print extraction template according to the print type of the upload;
S3-3, the extraction engine automatically performing formatted extraction on the known input information and storing the extraction result in a database.
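Steps S2-5 and S3-2 amount to a lookup from print type to a stored template. The patent does not specify the storage; the following Python sketch stands in for the server with an in-memory dictionary of JSON-serialized templates, and its class and method names are hypothetical.

```python
import json

class TemplateRegistry:
    """In-memory stand-in for the server's template store (hypothetical)."""

    def __init__(self) -> None:
        self._by_print_type: dict[str, str] = {}  # print type -> serialized template

    def upload(self, print_type: str, template: dict) -> None:
        # S2-5: store the template and bind it to a print type.
        self._by_print_type[print_type] = json.dumps(template)

    def lookup(self, print_type: str) -> dict:
        # S3-2: select the template bound to the uploaded document's print type.
        if print_type not in self._by_print_type:
            raise KeyError(f"no extraction template bound to print type {print_type!r}")
        return json.loads(self._by_print_type[print_type])

registry = TemplateRegistry()
registry.upload("invoice", {"name": "invoice-v1", "elements": []})
print(registry.lookup("invoice")["name"])  # invoice-v1
```

Serializing on upload means a caller cannot accidentally mutate a stored template through a shared reference; a real server would use a database instead.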
As a still further aspect of the invention, in step S3-3 the extraction engine operates as follows:
S3-3-1, traversing all pages, and packaging the print elements and the page picture of the current page together as the input parameters for the following steps;
S3-3-2, traversing all top-level extraction elements on the current page and performing extraction:
S3-3-2-1, if the extraction element is a basic extraction element, such as a text extraction element or a barcode extraction element, pairing its extraction result directly with its keyword to form a key-value pair, and returning the pair;
S3-3-2-2, if the extraction element is a container extraction element, such as a form extraction element, traversing and extracting all of its child extraction elements, collecting their results into a queue, pairing the queue with the container's keyword to form a key-value pair, and returning the pair;
S3-3-3, converting all returned key-value pairs into a formatted extraction result in JSON format;
S3-3-4, passing the formatted extraction result as a parameter to the processing script, which either performs secondary processing or returns the result unchanged.
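Steps S3-3-1 to S3-3-4 can be sketched as a small recursive engine in Python. This is an illustration under stated assumptions, not the patent's implementation: only text and form extraction elements are modelled (barcode elements and page pictures are omitted), a print element is taken to be inside an area when its whole bounding box is, every top-level element is applied to every page, and the processing script is modelled as an optional string-to-string function. All names are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Optional, Union

@dataclass
class PrintElement:
    text: str
    x: float
    y: float
    width: float
    height: float

@dataclass
class TextElement:
    key: str
    x: float  # relative to the page, or to the parent form
    y: float
    width: float
    height: float

@dataclass
class FormElement:
    key: str
    x: float
    y: float
    width: float
    height: float
    children: list[TextElement] = field(default_factory=list)

Element = Union[TextElement, FormElement]

def _in_area(e: PrintElement, x: float, y: float, w: float, h: float) -> bool:
    return x <= e.x and y <= e.y and e.x + e.width <= x + w and e.y + e.height <= y + h

def _extract_text(el: TextElement, elements: list[PrintElement],
                  ox: float = 0, oy: float = 0) -> str:
    # (ox, oy) is the parent form's offset; zero for top-level elements.
    hits = [e for e in elements if _in_area(e, ox + el.x, oy + el.y, el.width, el.height)]
    return " ".join(e.text for e in sorted(hits, key=lambda e: (e.y, e.x)))

def _extract(el: Element, elements: list[PrintElement]) -> tuple[str, object]:
    if isinstance(el, TextElement):  # S3-3-2-1: basic element
        return el.key, _extract_text(el, elements)
    queue = [{c.key: _extract_text(c, elements, el.x, el.y)} for c in el.children]
    return el.key, queue             # S3-3-2-2: container element

def run_engine(pages: list[list[PrintElement]], template: list[Element],
               script: Optional[Callable[[str], str]] = None) -> str:
    result: dict = {"pages": []}
    for elements in pages:           # S3-3-1: one page at a time
        result["pages"].append(dict(_extract(el, elements) for el in template))  # S3-3-2
    formatted = json.dumps(result, ensure_ascii=False)  # S3-3-3: JSON result
    return script(formatted) if script is not None else formatted  # S3-3-4

page = [
    PrintElement("No. A123", 20, 10, 60, 10),
    PrintElement("Paper", 55, 205, 40, 10),
    PrintElement("12.50", 175, 205, 30, 10),
]
template = [
    TextElement("invoice_no", 0, 0, 100, 30),
    FormElement("line", 50, 200, 200, 20, [
        TextElement("item", 0, 0, 100, 20),    # relative to the form
        TextElement("price", 120, 0, 60, 20),
    ]),
]
print(run_engine([page], template))
# {"pages": [{"invoice_no": "No. A123", "line": [{"item": "Paper"}, {"price": "12.50"}]}]}
```

Keeping the processing script as a plain function on the JSON string mirrors step S3-3-4, where the script may transform the result or pass it through unchanged.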
Compared with the prior art, the invention has the following beneficial effects:
The print content formatting extraction method solves the difficulty of extracting complex content, in particular: the variable number of table rows, which cannot be determined before extraction; differing table row heights, which disturb extraction by partitioned areas; extraction of table data split across pages; removal of interfering information from extracted content; flexible switching of extraction mode for mixed text and images; and locating information whose position floats on the page.
The method overcomes the limitations of extracting plain text content and can extract content from complex forms flexibly, efficiently and accurately. It supplements and optimizes OCR-based extraction, and restricting OCR to precisely defined areas effectively improves its computational efficiency. Extraction by precise coordinates is improved, and nesting basic extraction elements inside container extraction elements handles complex extraction layouts and the various hard problems of extracting table content. The visual template design interface greatly simplifies template design and improves design efficiency.
Drawings
FIG. 1 is a block flow diagram of an embodiment of the present invention.
Detailed Description
The technical solution of the present patent will be described in further detail with reference to the following embodiments.
Referring to FIG. 1, in an embodiment of the present invention, a method for formatting and extracting print content comprises the following steps:
S1, intercepting the print content of a printed document and converting it into print elements (each comprising the text content, the x and y coordinates of its top-left corner on the corresponding page, and the height and width of the displayed text), and generating a print element set (comprising the name of the printed document, the total number of printed pages, the index of each page, the height and width of each page, the print elements contained in each page, and a separate picture of each page);
S2, designing extraction elements from a sampled print element set and generating an extraction template; an extraction element mainly comprises its type, a keyword, an extraction range (x and y coordinates, height and width; extraction elements may be nested) and other type-specific attributes;
S3, feeding the print element set and the extraction template into an extraction engine, which computes a formatted extraction result (comprising all data extracted by the extraction elements, with each keyword and its extracted content forming a key-value pair).
Further, in step S2, the extraction template comprises an extraction template name, a plurality of extraction elements and a set of processing scripts; the extraction elements are basic extraction elements or container extraction elements, which may be combined by nesting.
Specifically, the basic extraction elements include text extraction elements and barcode extraction elements. A text extraction element comprises an extraction key value and a set of coordinates: the coordinates delimit an area relative to the current page, the print elements in that area are extracted, and the extraction key value is paired with the extracted content to form a key-value pair.
Specifically, the container extraction elements include a form extraction element. A form extraction element contains a plurality of basic text extraction elements, whose coordinates are relative to their parent container, the form extraction element.
Specifically, step S1 is implemented as follows:
S1-1, converting the printed document into an EMF file by means of a formatting virtual printer; specifically, printing with the quick transport formatting virtual printer;
S1-2, parsing the EMF file, extracting coordinates and content, and generating a print element document (a file in jhcef format);
S1-3, parsing each printed page and converting it into a page picture; specifically, the page may be converted into a JPG picture.
Specifically, step S2 is implemented as follows:
S2-1, using the quick slide print formatting extraction template design client;
S2-2, importing sample data of the print element set;
S2-3, dragging and placing extraction elements with the mouse in a visual interface, and setting the related extraction parameters;
S2-4, running a test extraction and checking the results; if they are unsatisfactory, repeating steps S2-2 to S2-4 until the extraction results for several print samples of the same format are satisfactory;
S2-5, saving the print extraction template, uploading it to the print formatting extraction server, and binding it to a print type.
Specifically, step S3 is implemented as follows:
S3-1, uploading the generated print element document and page pictures to the print formatting extraction server;
S3-2, the print formatting extraction server retrieving the designed print extraction template according to the print type of the upload;
S3-3, the extraction engine automatically performing formatted extraction on the known input information and storing the extraction result in a database, the formatted extraction result document being in jhcer format.
Further, in step S3-3, the extraction engine operates as follows:
S3-3-1, traversing all pages, and packaging the print elements and the page picture of the current page together as the input parameters for the following steps;
S3-3-2, traversing all top-level extraction elements on the current page and performing extraction:
S3-3-2-1, if the extraction element is a basic extraction element, such as a text extraction element or a barcode extraction element, pairing its extraction result directly with its keyword to form a key-value pair, and returning the pair;
S3-3-2-2, if the extraction element is a container extraction element, such as a form extraction element, traversing and extracting all of its child extraction elements, collecting their results into a queue, pairing the queue with the container's keyword to form a key-value pair, and returning the pair;
S3-3-3, converting all returned key-value pairs into a formatted extraction result in JSON format;
S3-3-4, passing the formatted extraction result as a parameter to the processing script, which either performs secondary processing or returns the result unchanged.
The invention combines the advantages of the existing approaches, applying the appropriate combination in the appropriate situation to achieve the best formatted extraction. An extraction template is designed from print elements that carry coordinates. The template comprises a plurality of extraction elements and a set of processing scripts. The extraction elements are divided into text extraction elements, form extraction elements and barcode extraction elements. The text extraction element is the most basic: it comprises a set of coordinates defining an area relative to the current page, from which the print elements in that area are extracted, and an extraction key value used to form a key-value pair with the extracted content. The form extraction element is a container extraction element into which several basic text extraction elements are placed, their coordinates being relative to the parent form extraction element. Using the visual interface, a user can set up the extraction template conveniently by clicking and dragging with the mouse. The print elements and the extraction template are then passed to the extraction engine, which computes an extraction result in JSON format. The method overcomes the limitations of extracting plain text content and can extract content from complex forms flexibly, efficiently and accurately. It supplements and optimizes OCR-based extraction, and restricting OCR to precisely defined areas effectively improves its computational efficiency. Extraction by precise coordinates is improved, and nesting basic extraction elements inside container extraction elements handles complex extraction layouts and the various hard problems of extracting table content. The visual template design interface greatly simplifies template design and improves design efficiency.
While the preferred embodiments of the present invention have been described in detail, the invention is not limited to them, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the invention.

Claims (1)

1. A method for formatting and extracting print content, characterized by comprising the following steps:
S1, intercepting the print content of a printed document and converting it into print elements to generate a print element set;
S2, designing extraction elements from a sampled print element set to generate an extraction template;
S3, feeding the print element set and the extraction template into an extraction engine, which computes a formatted extraction result;
wherein, in step S2, the extraction template comprises an extraction template name, a plurality of extraction elements and a set of processing scripts, and the extraction elements comprise a container extraction element;
the container extraction element comprises a form extraction element; the form extraction element contains a plurality of basic text extraction elements, whose coordinates are relative to their parent container, the form extraction element;
step S1 is implemented as follows:
S1-1, converting the printed document into an EMF file by means of a formatting virtual printer;
S1-2, parsing the EMF file, extracting coordinates and content, and generating a print element document;
S1-3, parsing each printed page and converting it into a page picture;
step S2 is implemented as follows:
S2-1, using a print formatting extraction template design client;
S2-2, importing sample data of the print element set;
S2-3, dragging and placing extraction elements with the mouse in a visual interface, and setting the related extraction parameters;
S2-4, running a test extraction and checking the results; if they are unsatisfactory, repeating steps S2-2 to S2-4 until the extraction results for several print samples of the same format are satisfactory;
S2-5, saving the print extraction template, uploading it to a print formatting extraction server, and binding it to a print type;
step S3 is implemented as follows:
S3-1, uploading the generated print element document and page pictures to the print formatting extraction server;
S3-2, the print formatting extraction server retrieving the designed print extraction template according to the print type of the upload;
S3-3, the extraction engine automatically performing formatted extraction on the known input information, and storing the extraction result in a database;
in step S3-3, the extraction engine operates as follows:
S3-3-1, traversing all pages, and packaging the print elements and the page picture of the current page together as the input parameters for the following steps;
S3-3-2, traversing all extraction elements on the current page and performing extraction;
if an extraction element is a container extraction element, traversing and extracting all of its child extraction elements, collecting their results into a queue, and pairing the queue with the container's keyword to form a key-value pair that is returned;
S3-3-3, converting all returned key-value pairs into a formatted extraction result in JSON format;
S3-3-4, passing the formatted extraction result as a parameter to the processing script, which either performs secondary processing or returns the result unchanged.

Priority Applications (1)

CN201910526081.1A (granted as CN110210455B): priority date 2019-06-18, filing date 2019-06-18, title "Printing content formatting extraction method"

Publications (2)

CN110210455A, published 2019-09-06
CN110210455B, published 2022-03-01

Family

ID=67793281

Country Status (1)

Country Link
CN (1) CN110210455B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant