CN110705515A - Hospital paper archive filing method and system based on OCR character recognition - Google Patents

Hospital paper archive filing method and system based on OCR character recognition

Info

Publication number
CN110705515A
Authority
CN
China
Prior art keywords
file
paper
archive
hospital
character recognition
Prior art date
Legal status
Pending
Application number
CN201910992909.2A
Other languages
Chinese (zh)
Inventor
罗述岭
吴玉雁
Current Assignee
Shandong Health And Medical Big Data Co Ltd
Original Assignee
Shandong Health And Medical Big Data Co Ltd
Priority date
2019-10-18
Filing date
2019-10-18
Publication date
2020-01-17
Application filed by Shandong Health And Medical Big Data Co Ltd
Priority to CN201910992909.2A
Publication of CN110705515A
Current legal status: Pending


Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
                    • G06F16/90 Details of database functions independent of the retrieved data types
                        • G06F16/93 Document management systems
            • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
                • G06V10/00 Arrangements for image or video recognition or understanding
                    • G06V10/20 Image preprocessing
                        • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
                            • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
                • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
                    • G06V30/40 Document-oriented image-based pattern recognition
                        • G06V30/41 Analysis of document content
                            • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
                            • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
        • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
            • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
                • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/00ICT specially adapted for the handling or processing of patient-related medical or healthcare data

Abstract

The invention discloses a hospital paper archive filing method and system based on OCR character recognition, belonging to the technical field of pattern recognition. The method comprises: establishing a keyword dictionary base; scanning the paper archive; locating and segmenting the characters in the scanned image; recognizing the keywords in the archive; identifying the archive type from its header title; further recognizing the predefined important fields for that type; and entering the recognized data into a database. The method saves time and labor cost, allows hospital paper archives to be entered into a database efficiently, and has good prospects for wider application.

Description

Hospital paper archive filing method and system based on OCR character recognition
Technical Field
The invention relates to the technical field of pattern recognition, and particularly provides a hospital paper archive filing method and system based on OCR character recognition.
Background
As deep learning has matured, the accuracy of character recognition has reached the level required for industrial and commercial use. Character recognition is already widely applied in scenarios such as license plate recognition, identity card recognition and express-delivery address recognition, where it has greatly improved efficiency, and more and more applications adopt it to reduce staff workload and save cost.
Hospital informatization has improved greatly, particularly in the creation of electronic records. Digitizing archive information makes it easier both to store and to read. However, some archive information still has to be transcribed from paper archives into the database, and this entry work is currently done entirely by hand.
In practice, hospital archives are of many types, ranging from daily nursing information to surgical information, and a hospital produces a very large number of paper archives every day. Having staff at each post enter archive information separately is very time-consuming, and because the digital filing of paper archives relies mainly on manual entry, labor cost is high and efficiency is low. To save time and labor cost, hospitals need a means of entering paper archives by character recognition.
Disclosure of Invention
In view of these problems, the technical task of the invention is to provide a hospital paper archive filing method based on OCR character recognition that saves time and labor cost and enters hospital paper archives into a database efficiently.
The invention further provides a hospital paper archive filing system based on OCR character recognition.
To achieve this purpose, the invention provides the following technical solution:
a hospital paper archive filing method based on OCR character recognition, characterized by: establishing a keyword dictionary base; scanning paper archives; locating and segmenting the characters in the scanned files; recognizing the keywords in the archives; identifying the archive type from the header title of each archive; further recognizing the predefined important fields for that archive type; and entering the recognized data into the database.
The hospital paper archive filing method based on OCR character recognition saves time and labor cost for the hospital paper archive filing process, and can greatly improve the efficiency and accuracy of paper archive filing.
Preferably, the hospital paper archive filing method based on OCR character recognition specifically comprises the following steps:
S1, establishing a keyword dictionary base that defines the archive information specification;
S2, scanning the paper archive into a storage system and preprocessing the scanned file;
S3, locating and segmenting characters: cutting the scanned image of the hospital archive into single-character images by horizontal and vertical projection;
S4, recognizing key fields and verifying their content;
S5, recording the acquired content in a database for storage.
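Read together, S2 to S5 form a linear pipeline in which an archive that fails validation in S4 never reaches the database. The Python sketch below only illustrates that data flow: the function `file_archive`, the `ArchiveResult` record and the injected stage callables are hypothetical names, and in a real system each stage would be the preprocessing, projection segmentation, CNN recognition and keyword matching described in this document.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class ArchiveResult:
    """Outcome of filing one scanned archive (hypothetical record)."""
    archive_type: Optional[str]
    fields: dict = field(default_factory=dict)
    rejected: bool = False
    reason: str = ""


def file_archive(
    image: Any,
    preprocess: Callable[[Any], Any],
    segment: Callable[[Any], list],
    recognize: Callable[[list], str],
    extract: Callable[[str], ArchiveResult],
    store: Callable[[ArchiveResult], None],
) -> ArchiveResult:
    clean = preprocess(image)   # S2: denoise and correct text orientation
    glyphs = segment(clean)     # S3: single-character images via projection
    text = recognize(glyphs)    # S4: recognition with the trained model
    result = extract(text)      # S4: keyword matching (S1 dictionary) and format checks
    if not result.rejected:     # S5: only archives that passed validation are stored
        store(result)
    return result
```

Injecting the stages as callables keeps the orchestration testable with stubs while each stage is developed separately.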
Preferably, the archive information specification includes the archive header name used to classify the archive, the key field names, and an indication of whether each check box is checked, where a checked box is represented by √.
Preferably, the paper archive is scanned into a computer system as JPG image files, and the scanned files are preprocessed to remove noise and correct the text orientation.
Preferably, during character locating and segmentation, for a form-type archive the header name is segmented separately by horizontal and vertical projection, the table part is divided into cells, and the text of each cell is then cut into single-character images by horizontal and vertical projection; for a text-type archive, the scanned file is cut directly into single-character images by horizontal and vertical projection.
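One possible in-memory form of the archive information specification is sketched below. The class names, the example header "Nursing Record" with its fields and check box, and the use of a regular expression to express a field's fixed format are illustrative assumptions, not details given in the patent.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

CHECK_MARK = "√"  # symbol that marks a checked box in the specification


@dataclass
class FieldSpec:
    name: str                      # key field name as printed on the archive
    pattern: Optional[str] = None  # regex for a fixed-format field; None = free text

    def is_valid(self, value: str) -> bool:
        """Fixed-format fields must match their pattern in full."""
        return self.pattern is None or re.fullmatch(self.pattern, value) is not None


@dataclass
class ArchiveSpec:
    header: str                                          # header title identifying the archive type
    fields: list[FieldSpec] = field(default_factory=list)
    checkboxes: list[str] = field(default_factory=list)  # labels of check boxes to record

    def keywords(self) -> set[str]:
        """All keywords that S4 looks for in an archive of this type."""
        return {f.name for f in self.fields} | set(self.checkboxes)


# Hypothetical keyword dictionary base: header title -> specification
DICTIONARY = {
    "Nursing Record": ArchiveSpec(
        header="Nursing Record",
        fields=[FieldSpec("Patient Name"),
                FieldSpec("Bed No.", r"\d{1,3}"),
                FieldSpec("Date", r"\d{4}-\d{2}-\d{2}")],
        checkboxes=["Fall Risk"],
    ),
}
```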
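The patent does not say how noise is removed or how the orientation is corrected. As one common approach, assumed here rather than taken from the source, the sketch below binarizes a grayscale page with a fixed threshold, removes isolated specks with a 3×3 median filter, and chooses between 0° and 90° by comparing horizontal projection profiles: horizontal text lines make the row sums alternate sharply between ink and blank, so their variance is high. It does not detect upside-down pages, which would need another cue such as recognizer confidence.

```python
import numpy as np


def binarize(gray: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Return 1 for ink (dark) pixels and 0 for background."""
    return (gray < threshold).astype(np.uint8)


def median_denoise(binary: np.ndarray) -> np.ndarray:
    """3x3 median filter: removes isolated specks (salt-and-pepper noise)."""
    h, w = binary.shape
    padded = np.pad(binary, 1, mode="constant")
    # Stack the 9 shifted views that make up each pixel's 3x3 neighbourhood
    stack = np.stack([padded[dy:dy + h, dx:dx + w]
                      for dy in range(3) for dx in range(3)])
    return np.median(stack, axis=0).astype(np.uint8)


def correct_orientation(binary: np.ndarray) -> np.ndarray:
    """Keep the rotation (0 or 90 degrees) whose row-sum profile varies most."""
    candidates = [binary, np.rot90(binary)]
    scores = [c.sum(axis=1).var() for c in candidates]
    return candidates[int(np.argmax(scores))]


def preprocess(gray: np.ndarray) -> np.ndarray:
    return correct_orientation(median_denoise(binarize(gray)))
```

The threshold of 128 is a placeholder; scans with uneven lighting would usually need an adaptive threshold instead.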
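The projection cut can be sketched as follows, assuming a binarized page in which ink pixels are 1. Rows without ink separate text lines (horizontal projection); within each line, columns without ink separate characters (vertical projection). The function names are hypothetical. A Chinese character whose left and right components are separated by a gap (such as 明) can be split in two by this naive cut; practical systems usually merge narrow neighbouring runs using an expected character width, which is omitted here.

```python
import numpy as np


def runs(profile: np.ndarray, min_ink: int = 1) -> list[tuple[int, int]]:
    """Return [start, end) index ranges where the projection profile has ink."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value >= min_ink and start is None:
            start = i                      # entering an inked run
        elif value < min_ink and start is not None:
            spans.append((start, i))       # leaving an inked run
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_characters(binary: np.ndarray) -> list[np.ndarray]:
    """Cut a binarized page (1 = ink) into single-character images."""
    chars = []
    # Horizontal projection: ink per row separates the text lines
    for top, bottom in runs(binary.sum(axis=1)):
        line = binary[top:bottom]
        # Vertical projection inside the line separates the characters
        for left, right in runs(line.sum(axis=0)):
            chars.append(line[:, left:right])
    return chars
```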
Preferably, the key field identification and content verification comprises the following processes:
1) training a character recognition model with a deep-learning convolutional neural network;
2) feeding each segmented single-character image into the model for recognition;
3) for a form-type archive, performing lexical analysis on the recognized content, comparing the resulting words with the keywords in the dictionary, associating the text in the corresponding cell with each matched keyword (stored as a key-value map by default), validating content that has a fixed format, and rejecting the archive if that content does not conform;
4) for a text-type archive, performing lexical analysis on the recognized content, comparing the resulting words with the keywords in the dictionary, associating the content that follows each keyword with that keyword, delimiting the content by its fixed format or by its font, validating content that has a fixed format, and rejecting the archive if that content does not conform.
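For a form-type archive, step 3) above amounts to looking up each recognized cell text in the dictionary and pairing it with its content cell. The sketch below assumes that the content sits in the cell immediately to the right of its keyword cell, uses whitespace and colon trimming as a crude stand-in for real lexical analysis (a Chinese word segmenter would normally be used), and represents each fixed format as a regular expression; all three are assumptions. The returned dictionary plays the role of the key-value map.

```python
import re
from typing import Optional


def normalize(text: str) -> str:
    """Stand-in for lexical analysis: trim spaces and trailing colons (ASCII or full-width)."""
    return text.strip().rstrip(":：").strip()


def associate_form(rows: list[list[str]], patterns: dict[str, Optional[str]]):
    """Pair each keyword cell with the cell to its right and validate fixed formats.

    patterns maps keyword -> regex for a fixed-format field (None = free text).
    Returns (fields, errors); a non-empty error list means the archive is rejected.
    """
    fields: dict[str, str] = {}
    errors: list[str] = []
    for row in rows:
        for i, cell in enumerate(row[:-1]):  # the last cell has no right neighbour
            key = normalize(cell)
            if key not in patterns:
                continue
            value = row[i + 1].strip()
            fields[key] = value
            pattern = patterns[key]
            if pattern is not None and not re.fullmatch(pattern, value):
                errors.append(f"{key}: {value!r} does not match {pattern}")
    return fields, errors
```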
A hospital paper archive filing system based on OCR character recognition comprises a keyword field establishing module, a paper archive scanning module, a character locating and segmentation module, a key field recognition and content verification module, and a data storage module:
the keyword field establishing module establishes a keyword dictionary base that defines the archive information specification;
the paper archive scanning module records paper archives into the storage system and preprocesses the scanned files;
the character locating and segmentation module cuts the scanned image of the hospital archive into single-character images by horizontal and vertical projection;
the key field recognition and content verification module recognizes the key fields and verifies their content;
the data storage module records the acquired content in a database for storage.
Preferably, the archive information specification established by the keyword field establishing module includes the archive header name used to classify the archive, the key field names, and an indication of whether each check box is checked, where a checked box is represented by √.
Preferably, the paper archive scanning module scans the paper archive into the computer system as JPG image files and preprocesses the scanned files to remove noise and correct the text orientation.
Preferably, during character locating and segmentation, for a form-type archive the character locating and segmentation module segments the header name separately by horizontal and vertical projection, divides the table part into cells, and then cuts the text of each cell into single-character images by horizontal and vertical projection; for a text-type archive, it cuts the scanned file directly into single-character images by horizontal and vertical projection.
Compared with the prior art, the hospital paper archive filing method based on OCR character recognition has the following notable beneficial effects: it saves time and labor cost in the filing process, can greatly improve the efficiency and accuracy of paper archive filing, and has good prospects for wider application.
Detailed Description
The hospital paper archive filing method and system based on OCR character recognition of the present invention are described in further detail below with reference to an embodiment.
Examples
The hospital paper archive filing method based on OCR character recognition of the invention comprises: establishing a keyword dictionary base; scanning the paper archive; locating and segmenting the characters in the scanned image; recognizing the keywords in the archive; identifying the archive type from its header title; further recognizing the predefined important fields for that type; and entering the recognized data into the database.
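Identifying the archive type from the header title has to tolerate recognition errors in the title itself. A minimal sketch, assuming fuzzy string matching with the standard-library `difflib.get_close_matches` (a technique chosen here for illustration, not named in the patent) and a hypothetical mapping from header titles to archive types; a real deployment would use the Chinese header titles from the keyword dictionary base.

```python
import difflib
from typing import Optional

# Hypothetical header titles from the keyword dictionary base (S1)
ARCHIVE_TYPES = {
    "Nursing Record": "nursing",
    "Surgical Record": "surgery",
    "Admission Record": "admission",
}


def classify_header(header_text: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the archive type whose header title best matches the recognized text."""
    match = difflib.get_close_matches(header_text.strip(), list(ARCHIVE_TYPES),
                                      n=1, cutoff=cutoff)
    # No title is similar enough: leave the archive unclassified for manual review
    return ARCHIVE_TYPES[match[0]] if match else None
```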
The hospital paper archive filing method based on OCR character recognition specifically comprises the following steps:
S1, establishing a keyword dictionary base that defines the archive information specification.
The archive information specification includes the archive header name used to classify the archive, the key field names, and an indication of whether each check box is checked, where a checked box is represented by √.
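Recording whether a box carries √ does not require recognizing the tick itself. One simple heuristic, assumed here and not described in the patent, is to crop the box, ignore its printed frame, and treat the box as checked when enough ink remains inside. The function name and the `border` and `min_ink_ratio` defaults are illustrative and would need tuning on real scans.

```python
import numpy as np


def is_checked(box: np.ndarray, border: int = 2, min_ink_ratio: float = 0.1) -> bool:
    """Decide whether a cropped, binarized check box (1 = ink) contains a mark.

    The printed frame is excluded by trimming `border` pixels on each side;
    the box counts as checked when the remaining interior has enough ink.
    """
    h, w = box.shape
    inner = box[border:h - border, border:w - border]
    if inner.size == 0:
        return False
    return bool(inner.mean() >= min_ink_ratio)
```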
S2, scanning the paper archive into a storage system and preprocessing the scanned file.
When the paper archive is scanned, it is recorded into the computer system as JPG image files, which are then preprocessed to remove noise and correct the text orientation.
S3, locating and segmenting characters: the scanned image of the hospital archive is cut into single-character images by horizontal and vertical projection.
During character locating and segmentation, for a form-type archive the header name is segmented separately by horizontal and vertical projection, the table part is divided into cells, and the text of each cell is then cut into single-character images by horizontal and vertical projection; for a text-type archive, the scanned file is cut directly into single-character images by horizontal and vertical projection.
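For form-type archives, the projection idea can also locate the table's ruling lines: a row or column whose ink count covers most of the table width or height is taken to be a printed line, and each cell is the region between two consecutive lines. This sketch assumes a binarized, deskewed table crop (1 = ink), a fully ruled grid, and a hypothetical 80% threshold. The header name above the table would be segmented separately, and each cell image would then go through the character projection cut of S3.

```python
import numpy as np


def line_positions(profile: np.ndarray, length: int, ratio: float = 0.8) -> list[int]:
    """Indices whose ink count spans most of the table: these are ruling lines."""
    return [i for i, v in enumerate(profile) if v >= ratio * length]


def group(indices: list[int]) -> list[tuple[int, int]]:
    """Merge adjacent indices (thick lines) into (first, last) pairs."""
    groups: list[tuple[int, int]] = []
    for i in indices:
        if groups and i == groups[-1][1] + 1:
            groups[-1] = (groups[-1][0], i)
        else:
            groups.append((i, i))
    return groups


def split_cells(table: np.ndarray) -> list[list[np.ndarray]]:
    """Split a binarized table (1 = ink) into a grid of cell images."""
    h, w = table.shape
    rows = group(line_positions(table.sum(axis=1), w))  # horizontal ruling lines
    cols = group(line_positions(table.sum(axis=0), h))  # vertical ruling lines
    grid = []
    # Each cell lies strictly between two consecutive ruling lines
    for (_, top), (bottom, _) in zip(rows, rows[1:]):
        grid.append([table[top + 1:bottom, left + 1:right]
                     for (_, left), (right, _) in zip(cols, cols[1:])])
    return grid
```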
S4, recognizing key fields and verifying their content.
The method comprises the following steps:
1) training a character recognition model with a deep-learning convolutional neural network;
2) feeding each segmented single-character image into the model for recognition;
3) for a form-type archive, performing lexical analysis on the recognized content, comparing the resulting words with the keywords in the dictionary, associating the text in the corresponding cell with each matched keyword (stored as a key-value map by default), validating content that has a fixed format, and rejecting the archive if that content does not conform;
4) for a text-type archive, performing lexical analysis on the recognized content, comparing the resulting words with the keywords in the dictionary, associating the content that follows each keyword with that keyword, delimiting the content by its fixed format or by its font, validating content that has a fixed format, and rejecting the archive if that content does not conform.
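For a text-type archive, step 4) above can be approximated by treating every dictionary keyword as a delimiter: each keyword receives the text between it and the next keyword found. This is a sketch under assumptions: the function name is hypothetical, matching is exact and case-sensitive (so a short keyword may also fire inside a longer word), and the font-based delimiting mentioned in the patent is not covered.

```python
import re
from typing import Optional


def associate_text(text: str, patterns: dict[str, Optional[str]]):
    """Assign to each keyword the text that follows it, up to the next keyword.

    patterns maps keyword -> regex for fixed-format content (None = free text).
    Returns (fields, errors); a non-empty error list means the archive is rejected.
    """
    if not patterns:
        return {}, []
    # Try longer keywords first so that a shorter one cannot shadow a longer one
    alternation = "|".join(re.escape(k) for k in sorted(patterns, key=len, reverse=True))
    hits = list(re.finditer(alternation, text))
    fields: dict[str, str] = {}
    errors: list[str] = []
    for hit, nxt in zip(hits, hits[1:] + [None]):
        end = nxt.start() if nxt is not None else len(text)
        # Content runs from the end of this keyword to the start of the next one
        value = text[hit.end():end].strip(" :：;；,，\n")
        key = hit.group(0)
        fields[key] = value
        pattern = patterns[key]
        if pattern is not None and not re.fullmatch(pattern, value):
            errors.append(f"{key}: {value!r} does not match {pattern}")
    return fields, errors
```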
S5, recording the acquired content in a database for storage.
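The patent leaves the database schema open. As an assumption for illustration, the sketch below uses Python's built-in `sqlite3` module and stores each validated archive as one row holding its type and its key-value map serialized as JSON; the function `store_archive`, the table name `archive` and its columns are hypothetical.

```python
import json
import sqlite3


def store_archive(conn: sqlite3.Connection, archive_type: str, fields: dict[str, str]) -> int:
    """Insert one validated archive and return its row id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS archive ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " archive_type TEXT NOT NULL,"
        " fields TEXT NOT NULL)"
    )
    cur = conn.execute(
        "INSERT INTO archive (archive_type, fields) VALUES (?, ?)",
        # ensure_ascii=False keeps Chinese field values readable in the database
        (archive_type, json.dumps(fields, ensure_ascii=False)),
    )
    conn.commit()
    return cur.lastrowid
```

Storing the map as JSON keeps a single schema for all archive types; a deployment that needs to query individual fields would more likely use one table, or one column per key field, for each archive type.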
The hospital paper archive filing system based on OCR character recognition of the invention comprises a keyword field establishing module, a paper archive scanning module, a character locating and segmentation module, a key field recognition and content verification module, and a data storage module.
The keyword field establishing module establishes a keyword dictionary base that defines the archive information specification.
The archive information specification established by the keyword field establishing module includes the archive header name used to classify the archive, the key field names, and an indication of whether each check box is checked, where a checked box is represented by √.
The paper archive scanning module records the paper archive into the storage system and preprocesses the scanned file.
When a paper archive is scanned, the paper archive scanning module records it into the computer system as JPG image files and preprocesses them to remove noise and correct the text orientation.
The character locating and segmentation module cuts the scanned image of the hospital archive into single-character images by horizontal and vertical projection.
During character locating and segmentation, for a form-type archive the character locating and segmentation module segments the header name separately by horizontal and vertical projection, divides the table part into cells, and then cuts the text of each cell into single-character images by horizontal and vertical projection; for a text-type archive, it cuts the scanned file directly into single-character images by horizontal and vertical projection.
The key field recognition and content verification module recognizes the key fields and verifies their content.
The data storage module is used for recording the acquired content into a database for storage.
The embodiment described above is merely a preferred embodiment of the present invention; ordinary changes and substitutions made by those skilled in the art within the technical scope of the present invention fall within its scope of protection.

Claims (10)

1. A hospital paper archive filing method based on OCR character recognition, characterized by: establishing a keyword dictionary base; scanning paper archives; locating and segmenting the characters in the scanned files; recognizing the keywords in the archives; identifying the archive type from the header title of each archive; further recognizing the predefined important fields for that archive type; and entering the recognized data into the database.
2. The hospital paper archive filing method based on OCR character recognition according to claim 1, wherein the method specifically comprises the following steps:
S1, establishing a keyword dictionary base that defines the archive information specification;
S2, scanning the paper archive into a storage system and preprocessing the scanned file;
S3, locating and segmenting characters: cutting the scanned image of the hospital archive into single-character images by horizontal and vertical projection;
S4, recognizing key fields and verifying their content;
S5, recording the acquired content in a database for storage.
3. The hospital paper archive filing method based on OCR character recognition according to claim 2, wherein the archive information specification includes the archive header name used to classify the archive, the key field names, and an indication of whether each check box is checked.
4. The hospital paper archive filing method based on OCR character recognition according to claim 3, wherein the paper archive is scanned into a computer system as JPG image files, and the scanned files are preprocessed to remove noise and correct the text orientation.
5. The hospital paper archive filing method based on OCR character recognition according to claim 4, characterized in that, during character locating and segmentation, for a form-type archive the header name is segmented separately by horizontal and vertical projection, the table part is divided into cells, and the text of each cell is then cut into single-character images by horizontal and vertical projection; for a text-type archive, the scanned file is cut directly into single-character images by horizontal and vertical projection.
6. The hospital paper archive filing method based on OCR character recognition according to claim 5, wherein key field recognition and content verification comprise the following process:
1) training a character recognition model with a deep-learning convolutional neural network;
2) feeding each segmented single-character image into the model for recognition;
3) for a form-type archive, performing lexical analysis on the recognized content, comparing the resulting words with the keywords in the dictionary, and associating the text in the corresponding cell with each matched keyword;
4) for a text-type archive, performing lexical analysis on the recognized content, comparing the resulting words with the keywords in the dictionary, associating the content that follows each keyword with that keyword, and delimiting the content by its fixed format or by its font.
7. A hospital paper archive filing system based on OCR character recognition, characterized in that the system comprises a keyword field establishing module, a paper archive scanning module, a character locating and segmentation module, a key field recognition and content verification module, and a data storage module, wherein:
the keyword field establishing module establishes a keyword dictionary base that defines the archive information specification;
the paper archive scanning module records paper archives into the storage system and preprocesses the scanned files;
the character locating and segmentation module cuts the scanned image of the hospital archive into single-character images by horizontal and vertical projection;
the key field recognition and content verification module recognizes the key fields and verifies their content;
the data storage module records the acquired content in a database for storage.
8. The hospital paper archive filing system based on OCR character recognition according to claim 7, characterized in that the archive information specification established by the keyword field establishing module includes the archive header name used to classify the archive, the key field names, and an indication of whether each check box is checked.
9. The hospital paper archive filing system based on OCR character recognition according to claim 8, characterized in that, when a paper archive is scanned, the paper archive scanning module records it into the computer system as JPG image files and preprocesses them to remove noise and correct the text orientation.
10. The hospital paper archive filing system based on OCR character recognition according to claim 9, characterized in that, during character locating and segmentation, for a form-type archive the character locating and segmentation module segments the header name separately by horizontal and vertical projection, divides the table part into cells, and then cuts the text of each cell into single-character images by horizontal and vertical projection; for a text-type archive, it cuts the scanned file directly into single-character images by horizontal and vertical projection.
CN201910992909.2A 2019-10-18 2019-10-18 Hospital paper archive filing method and system based on OCR character recognition Pending CN110705515A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910992909.2A CN110705515A (en) 2019-10-18 2019-10-18 Hospital paper archive filing method and system based on OCR character recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910992909.2A CN110705515A (en) 2019-10-18 2019-10-18 Hospital paper archive filing method and system based on OCR character recognition

Publications (1)

Publication Number Publication Date
CN110705515A (en) 2020-01-17

Family

ID=69201569

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910992909.2A Pending CN110705515A (en) 2019-10-18 2019-10-18 Hospital paper archive filing method and system based on OCR character recognition

Country Status (1)

Country Link
CN (1) CN110705515A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120041786A1 (en) * 2009-04-29 2012-02-16 Onemednet Corporation Methods, systems, and devices for managing medical images and records
CN104408678A (en) * 2014-10-31 2015-03-11 中国科学院苏州生物医学工程技术研究所 Electronic medical record system for personal use
CN104715436A (en) * 2013-12-13 2015-06-17 北京美智医疗科技有限公司 Medical information collecting and filing method and system
US20180011974A1 (en) * 2010-09-01 2018-01-11 Apixio, Inc. Systems and methods for improved optical character recognition of health records
CN108121966A (en) * 2017-12-21 2018-06-05 欧浦智网股份有限公司 A kind of list method for automatically inputting, electronic equipment and storage medium based on OCR technique
CN108805076A (en) * 2018-06-07 2018-11-13 浙江大学 The extracting method and system of environmental impact assessment report table word
CN109658062A (en) * 2018-12-13 2019-04-19 广州华资软件技术有限公司 A kind of electronic record intelligent processing method based on deep learning
CN110263740A (en) * 2019-06-26 2019-09-20 四川新网银行股份有限公司 Different type block letter document dubbing method based on OCR technique

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
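李鹏 (Li Peng) et al.: "Research and Application of a Paperless Medical Record Filing System" (无纸化病案归档系统的研究与应用), 《中国数字医学》 (China Digital Medicine) *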

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111405138B (en) * 2020-03-15 2022-02-01 上海派申网络技术有限公司 Cross-border electric commercial file scanning automatic filing equipment
CN111405138A (en) * 2020-03-15 2020-07-10 上海派申网络技术有限公司 Cross-border electric commercial file scanning automatic filing equipment
CN111860524A (en) * 2020-07-28 2020-10-30 上海兑观信息科技技术有限公司 Intelligent classification device and method for digital files
CN112052749A (en) * 2020-08-20 2020-12-08 中国建设银行股份有限公司 Archive filing method and device, electronic equipment and computer readable storage medium
CN112686262A (en) * 2020-12-28 2021-04-20 广州博士信息技术研究院有限公司 Method for extracting structured data and rapidly archiving handbooks based on image recognition technology
CN112766266A (en) * 2021-01-29 2021-05-07 云从科技集团股份有限公司 Text direction correction method, system and device based on staged probability statistics
CN112836073A (en) * 2021-02-02 2021-05-25 嘉应学院 Historical literature digitization method, system, device and storage medium
CN113342883A (en) * 2021-05-25 2021-09-03 国网上海市电力公司 Power equipment detection data structuring method, device, medium and equipment
CN115101186A (en) * 2022-07-25 2022-09-23 武汉大学人民医院(湖北省人民医院) Hospital treatment information management method and device based on big data
CN115101186B (en) * 2022-07-25 2022-11-08 武汉大学人民医院(湖北省人民医院) Hospital treatment information management method and device based on big data
CN115019326A (en) * 2022-08-02 2022-09-06 北京杭升科技有限公司 Archive recording system, method, device and storage medium
CN115019326B (en) * 2022-08-02 2023-08-22 北京杭升科技有限公司 File entry system, method, device and storage medium
CN115794496A (en) * 2023-02-07 2023-03-14 中信天津金融科技服务有限公司 Archive storage method and system based on information extraction

Similar Documents

Publication Publication Date Title
CN110705515A (en) Hospital paper archive filing method and system based on OCR character recognition
Kleber et al. Cvl-database: An off-line database for writer retrieval, writer identification and word spotting
US9633257B2 (en) Method and system of pre-analysis and automated classification of documents
US20160055376A1 (en) Method and system for identification and extraction of data from structured documents
US8452132B2 (en) Automatic file name generation in OCR systems
US9141853B1 (en) System and method for extracting information from documents
US20110188759A1 (en) Method and System of Pre-Analysis and Automated Classification of Documents
US20070033118A1 (en) Document Scanning and Data Derivation Architecture.
US8208737B1 (en) Methods and systems for identifying captions in media material
CN106846961B (en) Electronic test paper processing method and device
CN112052749A (en) Archive filing method and device, electronic equipment and computer readable storage medium
CN110543475A (en) financial statement data automatic identification and analysis method based on machine learning
KR101019627B1 (en) System and Method for Construction Automatic Bibliography based Pattern, and Recording Medium therefor
CN101833545A (en) Method for indexing data in digital recourse processing process
Couasnon et al. Making handwritten archives documents accessible to public with a generic system of document image analysis
TWI396990B (en) Citation record extraction system and method, and program product
CN111091003A (en) Parallel extraction method based on knowledge graph query
CN115774805A (en) File intelligent query method and system based on digital processing
CN113935296A (en) Method for extracting paper bank flow information by using sliding template technology
CN114495138A (en) Intelligent document identification and feature extraction method, device platform and storage medium
JP4347675B2 (en) Form OCR program, method and apparatus
CN113269101A (en) Bill identification method, device and equipment
CN102207947A (en) Direct speech material library generation method
CN105808783B (en) A kind of large file difference analysis method of difference Domain Name Form registering sites
CN115640952B (en) Method and system for importing and uploading data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200117