CN116704540A - Technology for marking paper file content and converting paper file content into OFD file with high fidelity - Google Patents
- Publication number
- CN116704540A (application CN202310996818.2A)
- Authority
- CN
- China
- Prior art keywords
- picture
- file
- ofd
- content
- information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/42—Document-oriented image-based pattern recognition based on the type of document
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/11—File system administration, e.g. details of archiving or snapshots
- G06F16/116—Details of conversion of file system types or formats
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/51—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/5846—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/1444—Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/15—Cutting or merging image elements, e.g. region growing, watershed or clustering-based techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Library & Information Science (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Character Input (AREA)
Abstract
The invention provides a technology for identifying the content of a paper file and converting it into an OFD file with high fidelity. The method reads a picture to be processed and detects and recognizes the content and position of the characters in the picture; identifies line elements in the picture; identifies the position and content of graphic elements in the picture; establishes an information base of the recognition results for the text, lines and graphics in the picture; creates an OFD file and converts the recognized picture information into it in one-to-one correspondence; and, after the conversion is completed, overlays transparent information layers on the text, lines and graphics in the OFD file. The invention turns paper files into electronic files, laying a solid foundation for their use once recognized. The extracted content facilitates digital use while the original appearance is preserved with high fidelity, retaining the state of the file before processing, and the transparent information layers allow various tools to quickly find and locate specific positions in the file during use.
Description
Technical Field
The invention relates to the field of picture conversion, and in particular to a technology for identifying the content of paper files and converting it into OFD files with high fidelity.
Background
With the emergence of technologies such as tablet computers and e-book readers, reading has gradually shifted from paper files to electronic files. Paper files currently exist in vast quantities, so technology for converting them into electronic files is needed to meet readers' needs.
A common technology for converting a paper file into an electronic file is OCR (Optical Character Recognition), whose core is recognizing character images one by one based on their outlines. Existing picture conversion also has the following problems:
(1) An OFD file generated from a picture can only display it; the text and line information in the picture cannot be extracted;
(2) The content of graphics in the picture cannot be identified or located, so graphic information cannot be found quickly when a picture file is queried;
(3) Manually extracting text, straight lines, graphics and other content from a picture file to generate an OFD file takes considerable time and effort.
Disclosure of Invention
The present invention aims to provide a technology for identifying the content of a paper file and converting it into an OFD file with high fidelity, so as to solve the problems described in the Background.
To achieve the above purpose, the present invention provides the following technical solution: a technology for identifying the content of a paper file and converting it into an OFD file with high fidelity, characterized by comprising the following steps:
step S1: scanning the paper file into a picture, reading the picture to be processed, and detecting and recognizing the content, font size and position of the text in the picture;
step S2: identifying line elements in the picture;
step S3: identifying the position and content of graphic elements in the picture;
step S4: establishing an information base of the recognition results for the text, lines and graphics identified in the picture;
step S5: creating an OFD file, and converting the recognized characters, lines and other information into the OFD file in one-to-one correspondence, reproducing the original exactly in picture form;
step S6: after the conversion is completed, overlaying a transparent information layer on each character, line and graphic in the OFD file.
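Steps S1 to S4 amount to running three independent recognizers over the same scanned picture and pooling their results. The Python sketch below shows only that orchestration, under stated assumptions: `Element`, `InfoBase` and `build_info_base` are hypothetical names, and each detector is any callable returning elements (for example, a wrapper around one of the detection models named in the following steps). It is not the patent's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# (x0, y0, x1, y1) in picture pixels; a hypothetical convention for this sketch.
Box = Tuple[float, float, float, float]


@dataclass
class Element:
    kind: str     # "text", "line" or "graphic"
    box: Box      # position of the element in the scanned picture
    content: str  # recognized text, or a description of a line/graphic


@dataclass
class InfoBase:
    """Hypothetical in-memory information base (step S4), one list per element kind."""
    items: Dict[str, List[Element]] = field(
        default_factory=lambda: {"text": [], "line": [], "graphic": []})

    def add(self, element: Element) -> None:
        self.items[element.kind].append(element)


# A detector wraps one recognizer (step S1, S2 or S3) and returns its elements.
Detector = Callable[[object], List[Element]]


def build_info_base(picture: object, detectors: List[Detector]) -> InfoBase:
    """Run every detector on the same picture and pool the results (steps S1 to S4)."""
    base = InfoBase()
    for detect in detectors:
        for element in detect(picture):
            base.add(element)
    return base


if __name__ == "__main__":
    # Stub detectors standing in for real text and line models.
    def fake_text(_picture):
        return [Element("text", (10, 10, 110, 30), "Contract No.")]

    def fake_lines(_picture):
        return [Element("line", (0, 50, 500, 52), "horizontal line")]

    base = build_info_base(None, [fake_text, fake_lines])
    print({kind: len(items) for kind, items in base.items.items()})
```

Keeping every recognizer behind the same callable signature makes the text, line and graphic models interchangeable, which fits the "including but not limited to" wording of the steps.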
Preferably, the text detection comprises the following steps:
step 1: obtaining the regions of the picture that contain text, and detecting the text using a text detection algorithm, including but not limited to CTPN, DBNet, Faster R-CNN, YOLO and other text detection algorithms;
step 2: after detection, returning the position and height of each detected text item;
step 3: recognizing the text content in the picture using a text recognition algorithm, including but not limited to CRNN, 2D-CTC, SVTR, SVR and other algorithms usable for text recognition;
step 4: calculating the current font size by converting the character height in pixels according to the ratio between the picture's pixel size and the paper size.
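Step 4 converts a height measured in pixels into a font size. A minimal sketch, assuming the scan resolution can be recovered from the picture height and the known paper height, and that the detected box height is a usable proxy for the font's em size. In practice a calibration factor (here the hypothetical `box_to_em` parameter) is usually needed, because detection boxes do not match the em square exactly. The Chinese size names map to their conventional point values; all function names are illustrative.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72.0  # 1 pt = 1/72 inch

# Conventional Chinese font-size names (字号) and their sizes in points.
CHINESE_FONT_SIZES = {
    "初号": 42.0, "小初": 36.0, "一号": 26.0, "小一": 24.0,
    "二号": 22.0, "小二": 18.0, "三号": 16.0, "小三": 15.0,
    "四号": 14.0, "小四": 12.0, "五号": 10.5, "小五": 9.0,
    "六号": 7.5, "小六": 6.5, "七号": 5.5, "八号": 5.0,
}


def dpi_from_paper(picture_height_px: int, paper_height_mm: float) -> float:
    """Scan resolution implied by the picture height and the physical paper height."""
    return picture_height_px / (paper_height_mm / MM_PER_INCH)


def font_size_pt(char_height_px: float, dpi: float, box_to_em: float = 1.0) -> float:
    """Convert a detected character height in pixels to points.

    box_to_em is a hypothetical calibration factor relating the detector's box
    height to the font's em size; 1.0 assumes they are equal.
    """
    return char_height_px * box_to_em / dpi * POINTS_PER_INCH


def nearest_named_size(size_pt: float) -> str:
    """Pick the Chinese size name whose point value is closest to size_pt."""
    return min(CHINESE_FONT_SIZES, key=lambda name: abs(CHINESE_FONT_SIZES[name] - size_pt))


if __name__ == "__main__":
    dpi = dpi_from_paper(3508, 297.0)  # an A4 page scanned at roughly 300 dpi
    size = font_size_pt(50, dpi)       # a 50-pixel-high character
    print(round(dpi, 1), round(size, 2), nearest_named_size(size))
```

For a 3508-pixel-high A4 scan this gives roughly 300 dpi, and a 50-pixel character comes out at about 12 pt, i.e. 小四.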
Preferably, the detection of line elements comprises the following steps:
step 1: segmenting and detecting line positions in the picture with an image segmentation algorithm, including but not limited to U-Net, Mask R-CNN, SegNet, FCN and other algorithms usable for image segmentation;
step 2: after detection, returning the position information of each line.
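Step 2 returns line positions, but a segmentation model such as U-Net outputs a per-pixel mask rather than coordinates. One simple post-processing step, sketched below as an assumption rather than the patent's method, scans the mask for long horizontal runs of line pixels, merges runs in adjacent rows into one stroke, and finds vertical lines by transposing the mask. The mask format (nested lists of 0/1) and the `min_length` threshold are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]               # 1 = pixel the segmentation model labels as "line"
Segment = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive pixel coordinates


def row_runs(mask: Mask, min_length: int) -> List[Tuple[int, int, int]]:
    """Return (row, start, end) for each horizontal run of 1s at least min_length long."""
    runs = []
    for y, row in enumerate(mask):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                if x - start >= min_length:
                    runs.append((y, start, x - 1))
            else:
                x += 1
    return runs


def horizontal_lines(mask: Mask, min_length: int = 20) -> List[Segment]:
    """Merge long runs in consecutive, overlapping rows into one line box."""
    lines: List[List[int]] = []  # mutable [x0, y0, x1, y1] while merging
    for y, x0, x1 in row_runs(mask, min_length):
        for line in lines:
            # Same stroke if the line ended on the previous row and the spans overlap.
            if line[3] == y - 1 and x0 <= line[2] and x1 >= line[0]:
                line[0], line[2], line[3] = min(line[0], x0), max(line[2], x1), y
                break
        else:  # no existing stroke continues here: start a new line
            lines.append([x0, y, x1, y])
    return [(a, b, c, d) for a, b, c, d in lines]


def vertical_lines(mask: Mask, min_length: int = 20) -> List[Segment]:
    """Find vertical lines as horizontal lines of the transposed mask."""
    transposed = [list(column) for column in zip(*mask)]
    # Transposed columns (c) are original rows (y); transposed rows (r) are original columns (x).
    return [(r0, c0, r1, c1) for c0, r0, c1, r1 in horizontal_lines(transposed, min_length)]
```

Real table rules may be slightly skewed or broken by noise, so a production version would likely add gap tolerance or a Hough-style fit; the sketch only shows how a mask becomes line coordinates.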
Preferably, the recognition of graphics in the picture comprises the following steps:
step 1: cropping the regions of the picture that contain graphics, and detecting graphic elements using an object detection algorithm, including but not limited to Fast R-CNN, YOLO and other object detection algorithms;
step 2: after detection, returning the position information of each graphic.
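Object detectors such as YOLO typically emit several overlapping candidate boxes for the same graphic, so the positions returned in step 2 are usually the output of confidence filtering followed by non-maximum suppression (NMS). The sketch below is a plain class-aware NMS. Detections are assumed to be `(box, score, label)` tuples with boxes as `(x0, y0, x1, y1)` pixels, and the thresholds are illustrative defaults, not values from the patent.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in picture pixels
Detection = Tuple[Box, float, str]       # (box, confidence score, graphic label)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections: List[Detection],
                        score_threshold: float = 0.5,
                        iou_threshold: float = 0.5) -> List[Detection]:
    """Keep the highest-scoring box of each overlapping same-label cluster."""
    candidates = sorted((d for d in detections if d[1] >= score_threshold),
                        key=lambda d: d[1], reverse=True)
    kept: List[Detection] = []
    for det in candidates:
        # Suppress a box only if it overlaps an already kept box with the same label.
        if all(det[2] != k[2] or iou(det[0], k[0]) < iou_threshold for k in kept):
            kept.append(det)
    return kept
```

Suppressing per label means a seal printed over a chart is not discarded just because the two boxes overlap.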
Preferably, establishing the information base comprises the following steps:
step 1: creating a picture content information base;
step 2: classifying the recognized text, line and graphic information and importing each into the information base, where graphic content is classified and labeled before import, and the information base stores each type of content separately.
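The patent does not say how the information base is stored. One possible layout, sketched here with Python's built-in `sqlite3` module, keeps every recognized element in a single table with a `kind` column (text, line or graphic) and an index on it, so each type can be stored and queried separately. The table name, columns and helper functions are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS element (
    id      INTEGER PRIMARY KEY,
    page    INTEGER NOT NULL,
    kind    TEXT NOT NULL CHECK (kind IN ('text', 'line', 'graphic')),
    x0 REAL, y0 REAL, x1 REAL, y1 REAL,   -- position in picture pixels
    content TEXT,                         -- recognized text or a description
    size_pt REAL                          -- font size for text, NULL otherwise
);
CREATE INDEX IF NOT EXISTS element_kind ON element (kind);
"""

# (kind, (x0, y0, x1, y1), content, size_pt)
Record = Tuple[str, Tuple[float, float, float, float], str, Optional[float]]


def open_info_base(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def import_elements(conn: sqlite3.Connection, page: int, records: Iterable[Record]) -> None:
    """Import classified records; each row is tagged with its kind."""
    rows = [(page, kind, *box, content, size) for kind, box, content, size in records]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO element (page, kind, x0, y0, x1, y1, content, size_pt) "
            "VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)


def elements_of_kind(conn: sqlite3.Connection, kind: str) -> List[tuple]:
    return conn.execute(
        "SELECT page, x0, y0, x1, y1, content FROM element WHERE kind = ? ORDER BY id",
        (kind,)).fetchall()
```

A single table with a `CHECK` constraint keeps cross-type queries (for example, everything on one page) simple; three separate tables would match "stores each type separately" more literally and would work equally well.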
Preferably, the picture information conversion comprises the following steps:
step 1: identifying the position and size of the picture;
step 2: creating a normalized blank OFD page matching the identified picture size;
step 3: importing the corresponding picture from the established information base in accordance with the OFD file format, and converting the text, line and picture information into the blank OFD file in one-to-one correspondence.
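OFD (GB/T 33190-2016) expresses page geometry in millimetres, so creating a page matching the picture size means converting pixels to millimetres using the scan resolution. The sketch below builds only a page content XML with a physical box and one full-page image object. The namespace URI, object IDs, `ResourceID` and the CTM convention are assumptions to be checked against the standard, and a real OFD package also needs `OFD.xml`, `Document.xml` and a resource file declaring the image.

```python
import xml.etree.ElementTree as ET

OFD_NS = "http://www.ofdspec.org/2016"  # assumed OFD namespace URI
MM_PER_INCH = 25.4


def q(tag: str) -> str:
    """Qualify a tag name with the OFD namespace for ElementTree."""
    return f"{{{OFD_NS}}}{tag}"


def px_to_mm(px: float, dpi: float) -> float:
    return px / dpi * MM_PER_INCH


def page_content_xml(width_px: int, height_px: int, dpi: float,
                     image_resource_id: int = 1) -> str:
    """A blank page sized to the picture, with the scan placed as a full-page image."""
    ET.register_namespace("ofd", OFD_NS)
    w = round(px_to_mm(width_px, dpi), 3)
    h = round(px_to_mm(height_px, dpi), 3)
    page = ET.Element(q("Page"))
    area = ET.SubElement(page, q("Area"))
    ET.SubElement(area, q("PhysicalBox")).text = f"0 0 {w} {h}"
    layer = ET.SubElement(ET.SubElement(page, q("Content")), q("Layer"), ID="2")
    # The CTM scales the image's unit square up to the page size (assumed OFD convention).
    ET.SubElement(layer, q("ImageObject"), ID="3", ResourceID=str(image_resource_id),
                  Boundary=f"0 0 {w} {h}", CTM=f"{w} 0 0 {h} 0 0")
    return ET.tostring(page, encoding="unicode")


if __name__ == "__main__":
    print(page_content_xml(2480, 3508, 300))  # an A4 scan at 300 dpi
```

Deriving the page box from the pixel size and resolution, rather than snapping to a fixed paper size, keeps the image and the later overlay in exactly the same coordinate system.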
Preferably, overlaying the transparent information layer on the OFD file comprises the following steps:
step 1: when the picture is imported, generating a file in the standard OFD format from the information base of recognition results and setting the transparency of the picture layer;
step 2: overlaying transparent characters on the text and transparent descriptive characters on the lines in the picture according to the recognition results, where the font size of the overlaid characters matches that of the text in the picture;
step 3: overlaying transparent text describing the specific content of each graphic on the recognized graphic.
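Steps 2 and 3 overlay invisible text that an OFD reader can still search. A minimal sketch, assuming the overlay is a `TextObject` made fully transparent through the `Alpha` attribute that OFD graphic units carry (0 = transparent, 255 = opaque), with the font size converted from points to millimetres. The same helper can carry a description such as "red official seal" for a graphic or line. The IDs, font resource ID, baseline placement and uniform `DeltaX` spacing are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

OFD_NS = "http://www.ofdspec.org/2016"  # assumed OFD namespace URI
MM_PER_POINT = 25.4 / 72                 # 1 pt = 1/72 inch


def transparent_text(obj_id: int, font_id: int, text: str,
                     box_mm: tuple, size_pt: float) -> ET.Element:
    """Invisible but searchable TextObject covering a recognized element.

    box_mm is (x, y, width, height) on the page in millimetres; Alpha="0" is the
    assumed way to make the glyphs fully transparent.
    """
    x, y, w, h = box_mm
    size_mm = round(size_pt * MM_PER_POINT, 3)
    obj = ET.Element(f"{{{OFD_NS}}}TextObject", ID=str(obj_id), Font=str(font_id),
                     Size=str(size_mm), Boundary=f"{x} {y} {w} {h}", Alpha="0")
    # Baseline placed near the bottom of the box (an approximation) so glyphs overlie the scan.
    code = ET.SubElement(obj, f"{{{OFD_NS}}}TextCode", X="0", Y=str(min(size_mm, h)))
    if len(text) > 1:
        # Spread the characters evenly across the box width: one offset per gap.
        step = round(w / len(text), 3)
        code.set("DeltaX", " ".join([str(step)] * (len(text) - 1)))
    code.text = text
    return obj
```

Spreading the characters over the detected box keeps a reader's search highlight aligned with the scanned glyphs even when the overlay font's natural widths differ from the original typeface.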
Preferably, picture searching and positioning is performed as follows: a keyword is entered in the search box of the OFD file, the search is clicked to search the full text, and after the search finishes the view automatically jumps to the specific position in the picture in the OFD file.
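In practice the search and jump are performed by the OFD reader over the transparent layer. The sketch below models that lookup over hypothetical overlay records: a case-insensitive substring match on the overlay text, with hits ordered by page and then top-to-bottom, left-to-right position, so the first hit is the jump target. `OverlayItem`, `search` and `jump_target` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height) in mm on the OFD page


@dataclass(frozen=True)
class OverlayItem:
    page: int
    kind: str     # "text", "line" or "graphic"
    box: Box
    content: str  # the transparent overlay text for this element


def search(items: List[OverlayItem], keyword: str) -> List[OverlayItem]:
    """Case-insensitive substring search, ordered by page, then top-to-bottom, left-to-right."""
    needle = keyword.casefold()
    hits = [item for item in items if needle in item.content.casefold()]
    return sorted(hits, key=lambda item: (item.page, item.box[1], item.box[0]))


def jump_target(items: List[OverlayItem], keyword: str) -> Optional[Tuple[int, Box]]:
    """Page and box the viewer would scroll to, or None when nothing matches."""
    hits = search(items, keyword)
    return (hits[0].page, hits[0].box) if hits else None
```

Because lines and graphics carry descriptive overlay text, the same lookup finds them as well as ordinary text, which is what the text, line and graphic search cases rely on.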
Preferably, the picture conversion method is also applicable to format data stream files and PDF files, and the overlaid transparent information layer is applicable to format data stream files.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention processes paper files into electronic files, laying a solid foundation for using paper files once they have been recognized as electronic files;
(2) While the paper file is processed into an electronic file, its content is extracted, which facilitates digital use, and its original form is preserved with high fidelity, retaining the state of the file before processing;
(3) The picture is converted into an OFD file and transparent recognized-content text is overlaid on its characters, lines, graphics and other elements, so that when content is searched in an OFD reader, the position of the matching text in the picture is located automatically; if the searched content is graphic content, the position of the graphic is located instead. In this way, various tools can quickly find and navigate to specific positions in the file during use.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
When a paper file is converted into an OFD file and the file content is text, the following method is used, comprising the following steps:
step S1: scanning the paper file into a picture, reading the picture to be processed, obtaining the regions of the picture that contain text, and detecting the text using a text detection algorithm, including but not limited to CTPN, DBNet, Faster R-CNN, YOLO and other text detection and object detection algorithms;
step S2: after detection, returning the position and height of each detected text item;
step S3: recognizing the text content in the picture using a text recognition algorithm, including but not limited to CRNN, 2D-CTC, SVTR, SVR and other algorithms usable for text recognition;
step S4: calculating the current font size by converting the character height in pixels according to the ratio between the picture's pixel size and the paper size;
step S5: creating a picture content information base, recognizing the text content and importing the recognized text information into the information base, which stores each type of content separately;
step S6: identifying the position and size of the picture, and creating a normalized blank OFD page matching the identified picture size;
step S7: importing the corresponding picture from the established information base in accordance with the OFD file format, and converting the recognized text information into the OFD file in one-to-one correspondence, reproducing the original exactly in picture form;
step S8: when the picture is imported, generating a file in the standard OFD format from the information base of recognition results and setting the transparency of the picture layer;
step S9: overlaying transparent characters on the picture according to the recognition results, where the font size of the overlaid characters matches that of the text in the picture.
Example 2
When a paper file is converted into an OFD file and the file content is lines, the following method is used, comprising the following steps:
step S1: scanning the paper file into a picture, reading the picture to be processed, and segmenting and detecting line positions in the picture with an image segmentation algorithm, including but not limited to U-Net, Mask R-CNN, SegNet, FCN and other algorithms usable for image segmentation;
step S2: after detection, returning the position information of each line;
step S3: creating a picture content information base, classifying and identifying the picture content, importing the recognized line information into the information base, and storing the recognized content in it;
step S4: identifying the position and size of the picture, and creating a normalized blank OFD page matching the identified picture size;
step S5: importing the corresponding picture from the established information base in accordance with the OFD file format, and converting the recognized line information into the OFD file in one-to-one correspondence, reproducing the original exactly in picture form;
step S6: when the picture is imported, generating a file in the standard OFD format from the information base of recognition results and setting the transparency of the picture layer;
step S7: overlaying transparent characters describing the lines on the picture according to the recognition results.
Example 3
When a paper file is converted into an OFD file and the file content is graphics, the following method is used, comprising the following steps:
step S1: scanning the paper file into a picture, reading the picture to be processed, cropping the regions of the picture that contain graphics, and detecting graphic elements using an object detection algorithm, including but not limited to Fast R-CNN, YOLO and other object detection algorithms;
step S2: after detection, returning the position information of each graphic;
step S3: creating a picture content information base, classifying and labeling the graphic content, and importing the recognized graphic information into the information base, which stores each type of content separately;
step S4: identifying the position and size of the picture, and creating a normalized blank OFD page matching the identified picture size;
step S5: importing the corresponding picture from the established information base in accordance with the OFD file format, and converting the recognized graphic information into the OFD file in one-to-one correspondence, reproducing the original exactly in picture form;
step S6: when the picture is imported, generating a file in the standard OFD format from the information base of recognition results and setting the transparency of the picture layer;
step S7: overlaying transparent text describing the graphic content on the picture according to the recognition results.
When the converted picture content is searched for text information, the text keyword is entered in the search box of the OFD file and the search is clicked; the full text is searched, and after the search finishes the view automatically jumps to the specific position of the text in the picture in the OFD file.
When the converted picture content is searched for line information, the line information is entered in the search box of the OFD file and the search is clicked; the full text is searched, and after the search finishes the view automatically jumps to the specific position of the line in the picture in the OFD file.
When the converted picture content is searched for graphic information, the content information of the graphic is entered in the search box of the OFD file and the search is clicked; the full text is searched, and after the search finishes the view automatically jumps to the specific position of the graphic in the picture in the OFD file.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Claims (9)
1. A technology for identifying the content of a paper file and converting it into an OFD file with high fidelity, characterized by comprising the following steps:
step S1: scanning the paper file into a picture, reading the picture to be processed, and detecting and recognizing the content, font size and position of the text in the picture;
step S2: identifying line elements in the picture;
step S3: identifying the position and content of graphic elements in the picture;
step S4: establishing an information base of the recognition results for the text, lines and graphics identified in the picture;
step S5: creating an OFD file, and converting the recognized characters, lines and other information into the OFD file in one-to-one correspondence, reproducing the original exactly in picture form;
step S6: after the conversion is completed, overlaying a transparent information layer on each character, line and graphic in the OFD file.
2. The technology for identifying paper document content and converting it into an OFD document with high fidelity according to claim 1, wherein the text detection comprises the following steps:
step 1: obtaining the regions of the picture that contain text, and detecting the text using a text detection algorithm, including but not limited to CTPN, DBNet, Faster R-CNN, YOLO and other text detection algorithms;
step 2: after detection, returning the position and height of each detected text item;
step 3: recognizing the text content in the picture using a text recognition algorithm, including but not limited to CRNN, 2D-CTC, SVTR, SVR and other algorithms usable for text recognition;
step 4: calculating the current font size by converting the character height in pixels according to the ratio between the picture's pixel size and the paper size.
3. The technology for identifying paper document content and converting it into an OFD document with high fidelity according to claim 1, wherein the detection of line elements comprises the following steps:
step 1: segmenting and detecting line positions in the picture with an image segmentation algorithm, including but not limited to U-Net, Mask R-CNN, SegNet, FCN and other algorithms usable for image segmentation;
step 2: after detection, returning the position information of each line.
4. The technology for identifying paper document content and converting it into an OFD document with high fidelity according to claim 1, wherein the recognition of graphics in the picture comprises the following steps:
step 1: cropping the regions of the picture that contain graphics, and detecting graphic elements using an object detection algorithm, including but not limited to Fast R-CNN, YOLO and other object detection algorithms;
step 2: after detection, returning the position information of each graphic.
5. The technology for identifying paper document content and converting it into an OFD document with high fidelity according to claim 1, wherein establishing the information base comprises the following steps:
step 1: creating a picture content information base;
step 2: classifying the recognized text, line and graphic information and importing each into the information base, where graphic content is classified and labeled before import, and the information base stores each type of content separately.
6. The technology for identifying paper document content and converting it into an OFD document with high fidelity according to claim 1, wherein the picture information conversion comprises the following steps:
step 1: identifying the position and size of the picture;
step 2: creating a normalized blank OFD page matching the identified picture size;
step 3: importing the corresponding picture from the established information base in accordance with the OFD file format, and converting the text, line and picture information into the blank OFD file in one-to-one correspondence.
7. The technology for identifying paper document content and converting it into an OFD document with high fidelity according to claim 1, wherein overlaying the transparent information layer on the OFD file comprises the following steps:
step 1: when the picture is imported, generating a file in the standard OFD format from the information base of recognition results and setting the transparency of the picture layer;
step 2: overlaying transparent characters on the text and transparent descriptive characters on the lines in the picture according to the recognition results, where the font size of the overlaid characters matches that of the text in the picture;
step 3: overlaying transparent text describing the specific content of each graphic on the recognized graphic.
8. The technology for identifying the content of a paper file and converting it into an OFD file with high fidelity, characterized in that picture searching and positioning is performed as follows: a keyword is entered in the search box of the OFD file, the search is clicked to search the full text, and after the search finishes the view automatically jumps to the specific position in the picture in the OFD file.
9. The technology for identifying paper document content and converting it into an OFD document with high fidelity according to claim 1, wherein the picture conversion method is also applicable to format data stream files and PDF files, and the overlaid transparent information layer is applicable to format data stream files.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310996818.2A CN116704540A (en) | 2023-08-09 | 2023-08-09 | Technology for marking paper file content and converting paper file content into OFD file with high fidelity |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310996818.2A CN116704540A (en) | 2023-08-09 | 2023-08-09 | Technology for marking paper file content and converting paper file content into OFD file with high fidelity |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116704540A (en) | 2023-09-05 |
Family
ID=87836113
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310996818.2A Pending CN116704540A (en) | 2023-08-09 | 2023-08-09 | Technology for marking paper file content and converting paper file content into OFD file with high fidelity |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116704540A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108415887A (en) * | 2018-02-09 | 2018-08-17 | 武汉大学 | A kind of method that pdf document is converted to OFD files |
CN109829139A (en) * | 2019-01-30 | 2019-05-31 | 中国软件与技术服务股份有限公司 | The method and apparatus that a kind of stream-oriented file of DOC/DOCX format is converted into the layout files of OFD format |
CN111898433A (en) * | 2020-06-22 | 2020-11-06 | 百望股份有限公司 | Paper bill digitization method and device |
CN114463758A (en) * | 2022-01-28 | 2022-05-10 | 南京云档信息科技有限公司 | OCR double-layer file generation method capable of retaining native content |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20230905 |