CN116704540A - Technology for marking paper file content and converting paper file content into OFD file with high fidelity - Google Patents


Info

Publication number
CN116704540A
Authority
CN
China
Prior art keywords
picture
file
ofd
content
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310996818.2A
Other languages
Chinese (zh)
Inventor
严伟
何冉冉
何中
朱聪聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Zhongwei Technology Software System Co ltd
Original Assignee
Jiangsu Zhongwei Technology Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2023-08-09
Filing date
2023-08-09
Publication date
2023-09-05
Application filed by Jiangsu Zhongwei Technology Software System Co ltd
Priority to CN202310996818.2A
Publication of CN116704540A

Classifications

    • G06V30/42 Document-oriented image-based pattern recognition based on the type of document
    • G06F16/116 Details of conversion of file system types or formats
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G06F16/5846 Retrieval characterised by using metadata automatically derived from the content using extracted text
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V30/1444 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • G06V30/15 Cutting or merging image elements, e.g. region growing, watershed or clustering-based techniques
    • G06V30/41 Analysis of document content
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Library & Information Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Character Input (AREA)

Abstract

The invention provides a technology for identifying the content of a paper file and converting it into an OFD file with high fidelity. The method proceeds as follows:

1. It reads the picture to be processed and detects and recognizes the content and position of the text in it.
2. It identifies line elements in the picture.
3. It identifies the position and content of graphic elements.
4. It builds an information base of the recognition results for the text, lines, and graphics.
5. It creates an OFD file and converts the recognized picture information into it in one-to-one correspondence.
6. Once conversion is complete, it overlays transparent information layers on the text, lines, and graphics in the OFD file.

The invention turns paper files into electronic files, laying a solid foundation for their use once recognized. The content is extracted to facilitate digital use, while the original appearance is preserved with high fidelity so that the state of the file before processing is retained. Because transparent information layers are overlaid, various tools can quickly find and locate specific positions in the file during use.

Description

Technology for marking paper file content and converting paper file content into OFD file with high fidelity
Technical Field
The invention relates to the field of picture conversion, and in particular to a technology for identifying the content of paper files and converting it into OFD files with high fidelity.
Background
With the emergence of tablet computers, e-book readers, and similar technologies, reading has gradually shifted from paper files to electronic files. Existing paper files are vast in number, so a suitable technology for converting them into electronic files is needed to meet readers' needs.
A common technology for converting a paper file into an electronic file is OCR (Optical Character Recognition). At its core, OCR recognizes character images one at a time, judging each by the contour of the character image. Existing picture conversion also has the following problems:
(1) An OFD file generated from a picture cannot extract the text and line information in the picture; it can only display it.
(2) Graphics in the picture cannot be identified or located, so when graphic information needs to be queried in the picture file, it cannot be found quickly.
(3) Manually extracting text, straight lines, graphic information, and other content from a picture file to generate an OFD file takes considerable time and effort.
Disclosure of Invention
The present invention aims to provide a technique for identifying the content of a paper file and converting the content into an OFD file with high fidelity, so as to solve the problems set forth in the background art.
To achieve the above purpose, the present invention provides the following technical solution: a technology for identifying the content of a paper file and converting it into an OFD file with high fidelity, comprising the following steps:
step S1: scanning the paper file as a picture, reading the picture to be processed, and detecting and recognizing the content, font size, and position of the text in the picture;
step S2: identifying line elements in the picture;
step S3: identifying the position and content of graphic elements in the picture;
step S4: establishing an information base of the recognition results for the text, line, and graphic information in the picture;
step S5: creating an OFD file, and converting the recognized text, lines, and other information into the OFD file in one-to-one correspondence with the original, in picture form;
step S6: after the conversion is completed, overlaying a transparent information layer on each text, line, and graphic element in the OFD file.
Preferably, the text detection includes the steps of:
step 1: obtaining the parts of the picture that contain text, and detecting the text using a text detection algorithm, including but not limited to CTPN, DBNet, Faster R-CNN, YOLO, and other text detection algorithms;
step 2: after detection, returning the position and height of the current text;
step 3: recognizing the text content in the picture using a text recognition algorithm, including but not limited to CRNN, 2D-CTC, SVTR, SVR, and other algorithms usable for text recognition;
step 4: calculating the font size of the current text from the character height in pixels, in proportion to the physical size of the paper.
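Step 4 leaves the conversion from pixel height to font size implicit. Below is a minimal Python sketch. It rests on two assumptions that the patent does not state: the scan covers the full paper height (A4 portrait, 297 mm, by default), and the height of a detected text box approximates the font's em size. The function name `font_size_pt` is hypothetical.

```python
MM_PER_INCH = 25.4
PT_PER_INCH = 72.0


def font_size_pt(char_height_px: float, image_height_px: int,
                 paper_height_mm: float = 297.0) -> float:
    """Estimate a font size in points from a detected text-box height.

    Assumes the image spans the full paper height (A4 portrait by default)
    and that the box height approximates the font's em size.
    """
    if image_height_px <= 0:
        raise ValueError("image height must be positive")
    # Physical height covered by one pixel, in millimetres.
    mm_per_px = paper_height_mm / image_height_px
    char_height_mm = char_height_px * mm_per_px
    # 1 inch = 25.4 mm = 72 typographic points.
    return char_height_mm / MM_PER_INCH * PT_PER_INCH
```

Consider an A4 page scanned at 300 dpi, which is about 3508 pixels tall. A 50-pixel box on that page works out to roughly 12 pt. A fuller implementation would probably also snap the estimate to the nearest standard font size.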
Preferably, detection of line elements comprises the following steps:
step 1: segmenting and detecting the line positions in the picture using an image segmentation algorithm, including but not limited to U-Net, Mask R-CNN, SegNet, FCN, and other algorithms usable for image segmentation;
step 2: and returning the position information of the current line after the detection is completed.
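The patent does not say how the segmentation output becomes line positions. One simple post-processing step is to scan for long horizontal and vertical runs of line pixels. The sketch below assumes the segmentation model yields a binary mask in which 1 marks a line pixel. The helper names and the `min_length` threshold are hypothetical.

```python
from typing import List, Tuple

# (x0, y0, x1, y1) in inclusive pixel coordinates.
Segment = Tuple[int, int, int, int]


def horizontal_runs(mask: List[List[int]], min_length: int = 20) -> List[Segment]:
    """Return horizontal runs of line pixels at least min_length long."""
    segments: List[Segment] = []
    for y, row in enumerate(mask):
        x, width = 0, len(row)
        while x < width:
            if row[x]:
                start = x
                while x < width and row[x]:
                    x += 1
                if x - start >= min_length:
                    segments.append((start, y, x - 1, y))
            else:
                x += 1
    return segments


def vertical_runs(mask: List[List[int]], min_length: int = 20) -> List[Segment]:
    """Return vertical runs by scanning the transposed mask."""
    transposed = [list(col) for col in zip(*mask)]
    # In the transposed mask, rows are original x and columns are original y.
    return [(y0, x0, y1, x1)
            for (x0, y0, x1, y1) in horizontal_runs(transposed, min_length)]
```

Thick lines produce one run per pixel row or column, so a fuller implementation would merge adjacent parallel runs. Diagonal lines would need a different technique, such as a Hough transform.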
Preferably, graphic recognition in the picture comprises the following steps:
step 1: cropping the parts of the picture that contain graphics, and detecting the graphic elements using an object detection algorithm, including but not limited to Fast R-CNN, YOLO, and other object detection algorithms;
step 2: and returning the position information of the current graph after the detection is completed.
Preferably, the establishment of the information base includes the following steps:
step 1: creating a picture content information base;
step 2: classifying the recognized text, line, and graphic information and importing it into the information base, where graphic content is classified and identified before being imported, and the information base stores each type of content separately.
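The patent does not specify how the information base is structured. Below is a minimal in-memory sketch that stores text, line, and graphic results separately. It assumes each recognized element carries a kind, a pixel bounding box, and its recognized content. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (x0, y0, x1, y1) in image pixels.
BBox = Tuple[float, float, float, float]


@dataclass
class Element:
    kind: str                  # "text", "line" or "graphic"
    bbox: BBox
    content: str = ""          # recognized text, or a label for a line/graphic
    font_size_pt: float = 0.0  # only meaningful for text


@dataclass
class InfoBase:
    page_size_px: Tuple[int, int]
    # One list per content type, so each kind is stored separately.
    elements: Dict[str, List[Element]] = field(
        default_factory=lambda: {"text": [], "line": [], "graphic": []})

    def add(self, element: Element) -> None:
        """Classify an element by its kind and store it."""
        if element.kind not in self.elements:
            raise ValueError(f"unknown element kind: {element.kind}")
        self.elements[element.kind].append(element)
```

Keeping the kinds in separate lists mirrors the "stores each type of content separately" wording. A database table with a `kind` column would serve the same purpose.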
Preferably, the picture information conversion includes the steps of:
step 1: identifying the position and size of the picture;
step 2: creating a normalized blank OFD page according to the identified picture size;
step 3: importing the corresponding pictures from the established information base in OFD file format, and converting the text, line, and picture information into the blank OFD file in one-to-one correspondence.
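Steps 1 and 2 size the blank OFD page from the picture. OFD expresses page geometry in millimetres, so the pixel size has to be converted using the scan resolution. The sketch below assumes a known dpi and builds a page holding one full-page `ImageObject`. Several details are assumptions to verify against GB/T 33190-2016: the namespace `http://www.ofdspec.org/2016`, the element names (`Page`, `Area`, `PhysicalBox`, `Content`, `Layer`, `ImageObject`), and the unit-square `CTM` convention. The function `build_page` and the ID values are hypothetical.

```python
import xml.etree.ElementTree as ET

OFD_NS = "http://www.ofdspec.org/2016"  # assumed OFD namespace
ET.register_namespace("ofd", OFD_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the OFD namespace."""
    return f"{{{OFD_NS}}}{tag}"


def px_to_mm(px: float, dpi: float) -> float:
    """Convert a pixel length to millimetres at the given scan resolution."""
    return px / dpi * 25.4


def build_page(width_px: int, height_px: int, dpi: float,
               image_resource_id: int, first_id: int = 1) -> ET.Element:
    """Build a page sized to the scanned picture, with the picture as background."""
    w = round(px_to_mm(width_px, dpi), 2)
    h = round(px_to_mm(height_px, dpi), 2)
    page = ET.Element(q("Page"))
    area = ET.SubElement(page, q("Area"))
    # Page box in millimetres: "x y width height".
    ET.SubElement(area, q("PhysicalBox")).text = f"0 0 {w} {h}"
    content = ET.SubElement(page, q("Content"))
    layer = ET.SubElement(content, q("Layer"), ID=str(first_id))
    # The image is drawn into a unit square; the CTM scales it to the page size.
    ET.SubElement(layer, q("ImageObject"), ID=str(first_id + 1),
                  ResourceID=str(image_resource_id),
                  Boundary=f"0 0 {w} {h}", CTM=f"{w} 0 0 {h} 0 0")
    return page
```

For example, `ET.tostring(build_page(2480, 3508, 300, 10), encoding="unicode")` yields a page of about 209.97 × 297.01 mm, which is A4 at 300 dpi.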
Preferably, overlaying the transparent information layer on the OFD file comprises the following steps:
step 1: when a picture is imported, generating a file in the standard OFD format from the information base of recognition results and setting the transparency of the picture layer;
step 2: overlaying transparent text for the characters, and transparent text labels for the lines, on the picture according to the recognition results, with the font size of the overlaid text matching that of the text in the picture;
step 3: overlaying transparent text describing the specific content of each identified graphic on that graphic.
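The transparent overlay in steps 1 to 3 can be sketched as an invisible OFD `TextObject` placed over each recognized region. The patent does not say how the invisible layer is encoded, so this sketch assumes the following:

- Invisibility comes from the `Alpha` attribute (0 to 255) that OFD graphic units carry.
- A font resource with the hypothetical ID `font_id` is already declared in the document.
- The baseline sits one font size below the top of the box.

Readers may treat invisible text differently, so this is one possible encoding, not the method. The same call can overlay a label such as "line" or a graphic description.

```python
import xml.etree.ElementTree as ET

OFD_NS = "http://www.ofdspec.org/2016"  # assumed OFD namespace


def q(tag: str) -> str:
    """Qualify a tag name with the OFD namespace."""
    return f"{{{OFD_NS}}}{tag}"


def px_to_mm(px: float, dpi: float) -> float:
    """Convert a pixel length to millimetres at the given scan resolution."""
    return px / dpi * 25.4


def add_transparent_text(layer: ET.Element, obj_id: int, text: str,
                         bbox_px: tuple, font_size_pt: float,
                         dpi: float, font_id: int) -> ET.Element:
    """Overlay fully transparent, searchable text on a recognized region."""
    x0, y0, x1, y1 = (px_to_mm(v, dpi) for v in bbox_px)
    # OFD sizes are in millimetres; keep the recognized font size (pt -> mm).
    size_mm = font_size_pt * 25.4 / 72
    obj = ET.SubElement(
        layer, q("TextObject"), ID=str(obj_id), Font=str(font_id),
        Size=f"{size_mm:.2f}",
        Boundary=f"{x0:.2f} {y0:.2f} {x1 - x0:.2f} {y1 - y0:.2f}",
        Alpha="0")  # 0 = fully transparent (assumed attribute)
    # X/Y are relative to the Boundary origin; Y approximates the baseline.
    code = ET.SubElement(obj, q("TextCode"), X="0", Y=f"{size_mm:.2f}")
    code.text = text
    return obj
```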
Preferably, picture search and positioning is characterized in that a keyword is entered in the search box of the OFD file and search is clicked. The full text is then searched, and when the search finishes, the view automatically jumps to the specific position in the picture in the OFD file.
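An OFD reader would normally run this search over the overlaid transparent text itself. The sketch below emulates the lookup against a flat index of recognized elements. It assumes each entry records its page number, element kind, bounding box, and content. The names `Entry`, `search`, and `first_hit` are hypothetical, and matching is a simple case-insensitive substring test.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    page: int                                # 1-based page number in the OFD file
    kind: str                                # "text", "line" or "graphic"
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1) on that page
    content: str                             # recognized text or graphic label


def search(index: Iterable[Entry], keyword: str) -> List[Entry]:
    """Return every entry whose content contains the keyword, in reading order."""
    needle = keyword.casefold()
    hits = [e for e in index if needle in e.content.casefold()]
    # Reading order: page, then top-to-bottom, then left-to-right.
    return sorted(hits, key=lambda e: (e.page, e.bbox[1], e.bbox[0]))


def first_hit(index: Iterable[Entry], keyword: str) -> Optional[Entry]:
    """The position a viewer would jump to, or None if nothing matches."""
    hits = search(index, keyword)
    return hits[0] if hits else None
```

A viewer would then scroll to `hit.page` and highlight `hit.bbox`. The same lookup serves text, line, and graphic queries because every kind of element carries searchable content.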
Preferably, the picture conversion method is also applicable to format data stream files and PDF files, and the overlaid transparent information layer is applicable to format data stream files.
Compared with the prior art, the invention has the beneficial effects that:
(1) The invention converts paper files into electronic files, providing a solid foundation for their use once recognized.
(2) While a paper file is processed into an electronic file, its content is extracted, which facilitates digital use. The original appearance is preserved with high fidelity, so the state of the file before processing is retained.
(3) The invention converts the picture into an OFD file and overlays transparent, content-identifying text on the text, lines, graphics, and other elements of the picture. When content in the picture is searched in an OFD reader, the position of the matching text is located automatically. If the searched content is a graphic in the picture, the position of the graphic is located instead. In this way, various tools can quickly find specific positions in the file during use.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
When a paper file whose content is text is converted into an OFD file, the following method is used, comprising the following steps:
step S1: scanning the paper file as a picture, reading the picture to be processed, obtaining the parts of the picture that contain text, and detecting the text using a text detection algorithm, including but not limited to CTPN, DBNet, Faster R-CNN, YOLO, and other text detection and object detection algorithms;
step S2: after detection, returning the position and height of the current text;
step S3: recognizing the text content in the picture using a text recognition algorithm, including but not limited to CRNN, 2D-CTC, SVTR, SVR, and other algorithms usable for text recognition;
step S4: calculating the font size of the current text from the character height in pixels, in proportion to the physical size of the paper;
step S5: creating a picture content information base, importing the recognized text information into it, and identifying the text content before importing it as well, with the information base storing each type of content separately;
step S6: identifying the position and size of the picture, and creating a normalized blank OFD page according to the identified picture size;
step S7: importing the corresponding pictures from the established information base in OFD file format, and converting the recognized text information into the OFD file in one-to-one correspondence with the original, in picture form;
step S8: when a picture is imported, generating a file in the standard OFD format from the information base of recognition results and setting the transparency of the picture layer;
step S9: overlaying transparent text on the picture according to the recognition results, with the font size of the overlaid text matching that of the text in the picture.
Example 2
When a paper file whose content is lines is converted into an OFD file, the following method is used, comprising the following steps:
step S1: scanning the paper file as a picture, reading the picture to be processed, and segmenting and detecting the line positions in the picture using an image segmentation algorithm, including but not limited to U-Net, Mask R-CNN, SegNet, FCN, and other algorithms usable for image segmentation;
step S2: after detection, returning the position information of the current line;
step S3: creating a picture content information base, importing the recognized line information into it, and classifying and identifying the picture content before importing it as well, with the information base storing the recognized content;
step S4: identifying the position and size of the picture, and creating a normalized blank OFD page according to the identified picture size;
step S5: importing the corresponding pictures from the established information base in OFD file format, and converting the recognized line information into the OFD file in one-to-one correspondence with the original, in picture form;
step S6: when a picture is imported, generating a file in the standard OFD format from the information base of recognition results and setting the transparency of the picture layer;
step S7: overlaying transparent text labels for the lines on the picture according to the recognition results.
Example 3
When a paper file whose content is graphics is converted into an OFD file, the following method is used, comprising the following steps:
step S1: scanning the paper file as a picture, reading the picture to be processed, cropping the parts of the picture that contain graphics, and detecting the graphic elements using an object detection algorithm, including but not limited to Fast R-CNN, YOLO, and other object detection algorithms;
step S2: after detection, returning the position information of the current graphic;
step S3: creating a picture content information base, importing the recognized graphic information into it, and classifying and identifying the picture content before importing it as well, with the information base storing each type of content separately;
step S4: identifying the position and size of the picture, and creating a normalized blank OFD page according to the identified picture size;
step S5: importing the corresponding pictures from the established information base in OFD file format, and converting the recognized graphic information into the OFD file in one-to-one correspondence with the original, in picture form;
step S6: when a picture is imported, generating a file in the standard OFD format from the information base of recognition results and setting the transparency of the picture layer;
step S7: overlaying transparent text describing the graphic content on the picture according to the recognition results.
To search the converted picture content for text, the user enters the text keywords in the search box of the OFD file and clicks search. The full text is searched, and when the search finishes, the view automatically jumps to the specific text position in the picture in the OFD file.
To search the converted picture content for a line, the user enters the line information in the search box of the OFD file and clicks search. The full text is searched, and when the search finishes, the view automatically jumps to the specific line position in the picture in the OFD file.
To search for a graphic in the picture, the user enters the content information of the graphic in the search box of the OFD file and clicks search. The full text is searched, and when the search finishes, the view automatically jumps to the specific position of the graphic in the picture in the OFD file.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (9)

1. The technology for identifying the content of the paper file and converting the content into the OFD file with high fidelity is characterized by comprising the following steps:
step S1: scanning the paper file as a picture, reading the picture to be processed, and detecting and recognizing the content, font size, and position of the text in the picture;
step S2: identifying line elements in the picture;
step S3: identifying the position and content of graphic elements in the picture;
step S4: establishing an information base of the recognition results for the text, line, and graphic information in the picture;
step S5: creating an OFD file, and converting the recognized text, lines, and other information into the OFD file in one-to-one correspondence with the original, in picture form;
step S6: after the conversion is completed, overlaying a transparent information layer on each text, line, and graphic element in the OFD file.
2. The technique for identifying and converting paper document content into OFD documents with high fidelity according to claim 1, wherein: the text detection comprises the following steps:
step 1: obtaining the parts of the picture that contain text, and detecting the text using a text detection algorithm, including but not limited to CTPN, DBNet, Faster R-CNN, YOLO, and other text detection algorithms;
step 2: after detection, returning the position and height of the current text;
step 3: recognizing the text content in the picture using a text recognition algorithm, including but not limited to CRNN, 2D-CTC, SVTR, SVR, and other algorithms usable for text recognition;
step 4: calculating the font size of the current text from the character height in pixels, in proportion to the physical size of the paper.
3. The technique for identifying and converting paper document content into OFD documents with high fidelity according to claim 1, wherein: the detection of the line element comprises the following steps:
step 1: segmenting and detecting the line positions in the picture using an image segmentation algorithm, including but not limited to U-Net, Mask R-CNN, SegNet, FCN, and other algorithms usable for image segmentation;
step 2: and returning the position information of the current line after the detection is completed.
4. The technique for identifying and converting paper document content into OFD documents with high fidelity according to claim 1, wherein graphic recognition in the picture comprises the following steps:
step 1: cropping the parts of the picture that contain graphics, and detecting the graphic elements using an object detection algorithm, including but not limited to Fast R-CNN, YOLO, and other object detection algorithms;
step 2: and returning the position information of the current graph after the detection is completed.
5. The technique for identifying and converting paper document content into OFD documents with high fidelity according to claim 1, wherein: the establishment of the information base comprises the following steps:
step 1: creating a picture content information base;
step 2: classifying the recognized text, line, and graphic information and importing it into the information base, where graphic content is classified and identified before being imported, and the information base stores each type of content separately.
6. The technique for identifying and converting paper document content into OFD documents with high fidelity according to claim 1, wherein: the picture information conversion includes the steps of:
step 1: identifying the position and size of the picture;
step 2: creating a normalized blank OFD page according to the identified picture size;
step 3: importing the corresponding pictures from the established information base in OFD file format, and converting the text, line, and picture information into the blank OFD file in one-to-one correspondence.
7. The technique for identifying and converting paper document content into OFD documents with high fidelity according to claim 1, wherein: the covering of the transparent information layer on the OFD file comprises the following steps:
step 1: when a picture is imported, generating a file in the standard OFD format from the information base of recognition results and setting the transparency of the picture layer;
step 2: overlaying transparent text for the characters, and transparent text labels for the lines, on the picture according to the recognition results, with the font size of the overlaid text matching that of the text in the picture;
step 3: overlaying transparent text describing the specific content of each identified graphic on that graphic.
8. The technology for identifying the content of a paper file and converting it into an OFD file with high fidelity, characterized in that a keyword is entered in the search box of the OFD file and search is clicked. The full text is then searched, and when the search finishes, the view automatically jumps to the specific position in the picture in the OFD file.
9. The technique for identifying and converting paper document content into OFD documents with high fidelity according to claim 1, wherein the picture conversion method is also applicable to format data stream files and PDF files, and the overlaid transparent information layer is applicable to format data stream files.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310996818.2A CN116704540A (en) 2023-08-09 2023-08-09 Technology for marking paper file content and converting paper file content into OFD file with high fidelity

Publications (1)

Publication Number Publication Date
CN116704540A true CN116704540A (en) 2023-09-05

Family

ID=87836113

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310996818.2A Pending CN116704540A (en) 2023-08-09 2023-08-09 Technology for marking paper file content and converting paper file content into OFD file with high fidelity

Country Status (1)

Country Link
CN (1) CN116704540A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108415887A (en) * 2018-02-09 2018-08-17 武汉大学 Method for converting a PDF file into an OFD file
CN109829139A (en) * 2019-01-30 2019-05-31 中国软件与技术服务股份有限公司 Method and apparatus for converting a DOC/DOCX flow file into an OFD layout file
CN111898433A (en) * 2020-06-22 2020-11-06 百望股份有限公司 Paper bill digitization method and device
CN114463758A (en) * 2022-01-28 2022-05-10 南京云档信息科技有限公司 OCR double-layer file generation method capable of retaining native content

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230905