KR20180033786A - System of searching and providing original document image file and method thereof - Google Patents
System of searching and providing original document image file and method thereof
- Publication number
- KR20180033786A
- Authority
- KR
- South Korea
- Prior art keywords
- original document
- image file
- module
- xml
- document image
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/178—Techniques for file synchronisation in file systems
- G06F16/1794—Details of file format conversion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/80—Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
- G06F16/84—Mapping; Conversion
- G06F16/86—Mapping to a database
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Processing Or Creating Images (AREA)
Abstract
A system and method for searching and providing an original document image file are disclosed. The system comprises: an original document image file database in which original document image files are stored in advance; an OCR recognition / metadata extraction module that scans an original document image file stored in the original document image file database, performs OCR (optical character reader) recognition on it, and extracts metadata indicating the format of the original document image file; an XML conversion module that converts the original document data of the OCR-recognized original document image file into an XML document using the extracted metadata; an XML document file database in which the XML document converted by the XML conversion module is stored; an XML document retrieval module that searches the XML documents stored in the XML document file database for the original document data requested by a user terminal; and an original document image file providing module that retrieves, from the original document image file database, the original document image file corresponding to the original document data found by the XML document retrieval module and provides it to the user terminal.
Description
The present invention relates to a system and method for searching and providing a file, and more particularly, to a system and method for searching and providing an original document image file.
In government offices, securities companies, and banks, documents such as contracts, personal information documents, and application forms are all kept on file. These documents are scanned with a scanner and stored as image files. In addition to paper documents, fax documents and e-mails containing personal information are also kept as scanned image files.
Because the volume of documents is very large and keeps accumulating, keeping them as document files is impractical, so they are simply stored as scanned image files.
However, when a personal information document or other document needs to be found later, the characters in the image file cannot be read, so search capability is poor and the files are of little practical use.
When a scanned image file is stored, only basic information such as the document type, title, date, and related personal information is stored with it. The scanned image file can therefore be searched only within the limits of that basic information.
Because scanned image files hold varied content, broad or specific searches are limited, for example collecting all real estate contract documents or finding the real estate contract of a specific individual. One can look up person A's document dated September 23, 2016, but a broader and more varied search based on the text is not possible.
Of course, scanned image files could be processed with OCR (optical character reader) recognition and stored as document files such as PDF. However, there is a very high risk that the personal information and security-related content in these documents would be illegally leaked through the document files or exposed by hacking. Most of these documents contain personal information.
In addition, when each document is stored separately as a document file such as a PDF, it takes considerable time to open and read each file to check its contents, so accessibility is very poor.
Accordingly, there is a demand for a solution that resolves both the inconvenience and limited searchability of the existing scanned image file approach and the security risk and poor accessibility that arise when documents are stored as document files.
An object of the present invention is to provide a system for searching and providing an original document image file.
It is another object of the present invention to provide a method for searching and providing an original document image file.
According to one aspect of the present invention, there is provided a system for searching and providing an original document image file, comprising: an original document image file database in which original document image files are stored in advance; an OCR recognition / metadata extraction module that scans an original document image file stored in the original document image file database, performs OCR (optical character reader) recognition on it, and extracts metadata indicating the format of the original document image file; an XML conversion module that converts the original document data of the OCR-recognized original document image file into an XML document using the extracted metadata; an XML document file database in which the XML document converted by the XML conversion module is stored; an XML document retrieval module that searches the XML documents stored in the XML document file database for the original document data requested by a user terminal; and an original document image file providing module that retrieves, from the original document image file database, the original document image file corresponding to the original document data found by the XML document retrieval module and provides it to the user terminal.
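The XML conversion step can be illustrated with a minimal Python sketch. It is not the patented implementation: the element names (`document`, `metadata`, `field`, `body`), the `imageId` attribute, and the function `build_xml_document` are assumptions made for illustration, and the OCR text is taken as an already recognized string.

```python
import xml.etree.ElementTree as ET


def build_xml_document(image_id: str, ocr_text: str, metadata: dict) -> str:
    """Wrap OCR text and extracted metadata in a small XML document.

    The imageId attribute (hypothetical name) links the XML document
    back to the original image file it was derived from.
    """
    root = ET.Element("document", attrib={"imageId": image_id})
    meta = ET.SubElement(root, "metadata")
    for key, value in sorted(metadata.items()):
        field = ET.SubElement(meta, "field", attrib={"name": key})
        field.text = value
    body = ET.SubElement(root, "body")
    body.text = ocr_text
    return ET.tostring(root, encoding="unicode")


xml_doc = build_xml_document(
    "IMG-0001",
    "Account opening agreement between the bank and the customer",
    {"docType": "contract", "date": "2016-09-23"},
)
parsed = ET.fromstring(xml_doc)
print(parsed.get("imageId"))
```

Keeping the image identifier on the root element is one simple way to let a later search over XML documents lead back to the stored image file.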
The system may further include a tag extraction module that automatically extracts tags for searching the contents of the original document image file on which OCR recognition has been performed by the OCR recognition / metadata extraction module.
In this case, the XML document retrieval module may be configured to search the XML documents, using the tags extracted by the tag extraction module, for the original document data requested by the user terminal.
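As a rough illustration of tag-based retrieval, the sketch below builds an inverted index from tags to document identifiers and returns the documents that carry every requested tag. The index structure, the function names, and the sample identifiers are assumptions; the source does not specify how tags are stored or matched.

```python
from collections import defaultdict


def build_tag_index(doc_tags):
    """Map each tag to the set of document IDs that carry it."""
    index = defaultdict(set)
    for doc_id, tags in doc_tags.items():
        for tag in tags:
            index[tag].add(doc_id)
    return index


def search_by_tags(index, query_tags):
    """Return sorted IDs of documents that carry every requested tag."""
    candidate_sets = [index.get(tag, set()) for tag in query_tags]
    if not candidate_sets:
        return []
    return sorted(set.intersection(*candidate_sets))


# Hypothetical tags, as they might come from a tag extraction step.
doc_tags = {
    "IMG-1": ["contract", "loan"],
    "IMG-2": ["card"],
    "IMG-3": ["contract", "account"],
}
tag_index = build_tag_index(doc_tags)
print(search_by_tags(tag_index, ["contract"]))
```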
According to another aspect of the present invention, there is provided a method for searching and providing an original document image file, comprising: an OCR recognition / metadata extraction module scanning an original document image file stored in advance in an original document image file database, performing OCR recognition, and extracting metadata representing the format of the original document image file; an XML conversion module converting the original document data of the OCR-recognized original document image file into an XML document and storing the converted XML document in an XML document file database; an XML document retrieval module searching the XML documents stored in the XML document file database for the original document data requested by a user terminal; and an original document image file providing module retrieving, from the original document image file database, the original document image file corresponding to the original document data found by the XML document retrieval module and providing it to the user terminal.
Here, the step in which the XML conversion module converts the original document data of the OCR-recognized original document image file into an XML document and stores the converted XML document in the XML document file database may further include a tag extraction module automatically extracting tags for searching the contents of the original document image file on which OCR recognition has been performed by the OCR recognition / metadata extraction module.
The step in which the XML document retrieval module searches the XML documents stored in the XML document file database for the original document data requested by the user terminal may include the XML document retrieval module searching the XML documents, using the tags extracted by the tag extraction module, for the original document data requested by the user terminal.
According to the system and method described above, the existing original document image file database is kept as it is, a separate document file database is built by performing OCR recognition on the original document image files, and the original document image file is selected from the original document image file database and read using the search results from that database. A desired original document image file can thus be found among a large volume of original document image files using a variety of search methods and then read.
In addition, the XML document file database provides only search results and is configured so that the document body cannot be accessed directly to obtain information, which prevents personal information or security information contained in the document body from being leaked.
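One way to picture this separation is a search component that returns only references to image files and never the recognized text. The sketch below assumes a hypothetical `XmlIndex` class, a `SearchHit` record holding only an image identifier, and an in-memory dictionary standing in for the original document image file database; none of these names come from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchHit:
    image_id: str  # a reference only; the document text is not included


class XmlIndex:
    """Holds recognized text privately and exposes search results only."""

    def __init__(self):
        self._bodies = {}

    def add(self, image_id, body_text):
        self._bodies[image_id] = body_text

    def search(self, term):
        term = term.lower()
        return [
            SearchHit(image_id)
            for image_id, body in sorted(self._bodies.items())
            if term in body.lower()
        ]


def fetch_images(hits, image_store):
    """Look up the original image files for the returned references."""
    return [image_store[hit.image_id] for hit in hits]


index = XmlIndex()
index.add("IMG-1", "Loan contract for customer A")
index.add("IMG-2", "Cash card application")
image_store = {"IMG-1": b"<scan of IMG-1>", "IMG-2": b"<scan of IMG-2>"}

hits = index.search("loan")
images = fetch_images(hits, image_store)
```

In this sketch the caller receives the scanned image, which is what the user terminal is meant to see, while the character data used for matching stays inside the index.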
FIG. 1 is a block diagram of a system for searching and providing an original document image file according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a method of searching for and providing an original document image file according to an embodiment of the present invention.
While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will be described in detail herein. It should be understood, however, that the invention is not intended to be limited to the particular embodiments, but includes all modifications, equivalents, and alternatives falling within the spirit and scope of the invention. Like reference numerals are used for like elements in describing each drawing.
The terms first, second, A, B, etc. may be used to describe various elements, but the elements should not be limited by the terms. The terms are used only for the purpose of distinguishing one component from another. For example, without departing from the scope of the present invention, the first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component. The term "and/or" includes any combination of a plurality of related listed items or any one of a plurality of related listed items.
It is to be understood that when an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. On the other hand, when an element is referred to as being "directly connected" or "directly coupled" to another element, it should be understood that there are no other elements in between.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The singular expressions include plural expressions unless the context clearly dictates otherwise. In the present application, the terms "comprises" or "having" and the like are used to specify that there is a feature, a number, a step, an operation, an element, a component, or a combination thereof described in the specification, but do not preclude the presence or addition of one or more other features, numbers, steps, operations, elements, components, or combinations thereof.
Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Terms such as those defined in commonly used dictionaries are to be interpreted as having a meaning consistent with their meaning in the context of the related art and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in the present application.
Hereinafter, preferred embodiments according to the present invention will be described in detail with reference to the accompanying drawings.
FIG. 1 is a block diagram of a system for searching and providing an original document image file according to an embodiment of the present invention.
Referring to FIG. 1, an original document image file searching and providing
The
In particular, there is no risk that the document text or the personal information it contains will be leaked, and because each document is implemented as an XML document, searching is very convenient and easy.
Hereinafter, the detailed configuration will be described.
The original document
The original document
The
The OCR recognition /
In addition, the OCR recognition /
The OCR recognition /
Meanwhile, the
The tags can be defined by predetermined algorithms and can be determined according to the characteristics of the documents. Tags can be extracted automatically from the contents of a document. In the case of a bank, for example, a title such as an account opening agreement or a cash card application, the contract month, a name, and the like can be automatically designated as tags. In addition, tags may consist of predetermined keywords such as contract, opening, account, card, loan, credit loan, and security.
These tags can be used to perform searches over vast amounts of document content. Tags can also be used for searches in big data analysis.
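Keyword-based tag extraction of this kind can be sketched in a few lines of Python. The keyword list is taken from the examples above; the whole-word matching rule and the function name `extract_tags` are assumptions, since the source does not describe the extraction algorithm.

```python
import re

# Predetermined keywords, taken from the examples in the text.
KEYWORDS = ("contract", "opening", "account", "card", "loan", "credit loan", "security")


def extract_tags(ocr_text, keywords=KEYWORDS):
    """Return the keywords that occur as whole words in the OCR text."""
    text = ocr_text.lower()
    tags = []
    for keyword in keywords:
        # \b anchors keep "card" from matching inside a longer word.
        if re.search(r"\b" + re.escape(keyword) + r"\b", text):
            tags.append(keyword)
    return tags


print(extract_tags("Credit loan contract for a new account"))
# -> ['contract', 'account', 'loan', 'credit loan']
```

A production system would likely also derive tags from titles, dates, and names, as the text suggests, which this keyword-only sketch does not attempt.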
The
The XML document converted by the
Meanwhile, the XML
The XML
The original document image
The XML
The XML
For example, the contents of an XML document can be retrieved by using search terms such as X, XX, contract, and loan. Multiple XML documents containing these search terms can be retrieved.
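The following sketch shows one possible reading of such a multi-term search: every stored XML document whose body contains all the search terms is matched, and only the identifier of the corresponding image file is returned. The XML layout (`document`, `body`, `imageId`), the sample documents, and the case-insensitive substring matching are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored XML documents; the layout is assumed for illustration.
XML_DOCS = [
    "<document imageId='IMG-1'><body>Loan contract for customer X</body></document>",
    "<document imageId='IMG-2'><body>Cash card application for customer XX</body></document>",
    "<document imageId='IMG-3'><body>Loan contract for customer XX</body></document>",
]


def find_matching_image_ids(xml_docs, terms):
    """Return image IDs of XML documents whose body contains every term."""
    hits = []
    for raw in xml_docs:
        root = ET.fromstring(raw)
        body = (root.findtext("body") or "").lower()
        if all(term.lower() in body for term in terms):
            hits.append(root.get("imageId"))
    return hits


print(find_matching_image_ids(XML_DOCS, ["loan", "contract"]))
# -> ['IMG-1', 'IMG-3']
```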
On the other hand, the retrieval method using metadata can be utilized for big data analysis. Since the content of the text is not provided in the form of a readable character code even in the analysis of the big data, the XML
That is, the content of the XML document itself is configured not to leak out of the XML
FIG. 2 is a flowchart illustrating a method of searching for and providing an original document image file according to an embodiment of the present invention.
Referring to FIG. 2, an optical character reader (OCR)
Next, the
Next, the
Next, the XML
At this time, the XML
Next, the original document image
It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention as defined in the following claims.
110: Original document image file database
120: OCR recognition / metadata extraction module
130: XML conversion module
140: XML document file database
150: tag extraction module
160: XML document retrieval module
170: Original document image file provision module
Claims (4)
An OCR recognition / metadata extraction module for scanning an original document image file stored in the original document image file database, performing OCR (optical character reader) recognition, and extracting metadata indicating the format of the original document image file;
An XML conversion module for converting, using the extracted metadata, the original document data of the original document image file on which OCR recognition has been performed by the OCR recognition / metadata extraction module into an XML document;
An XML document file database in which the XML document converted by the XML conversion module is stored;
An XML document retrieval module for searching the XML document stored in the XML document file database for original document data requested by a user terminal; and
An original document image file providing module for retrieving, from the original document image file database, the original document image file corresponding to the original document data found by the XML document retrieval module and providing the original document image file to the user terminal.
Further comprising a tag extraction module for automatically extracting a tag for searching the contents of the original document image file on which OCR recognition has been performed by the OCR recognition / metadata extraction module,
Wherein the XML document retrieval module
searches the XML document, using the tag extracted by the tag extraction module, for the original document data requested by the user terminal.
An XML conversion module converting the original document data of the original document image file on which OCR recognition has been performed by the OCR recognition / metadata extraction module into an XML document and storing the converted XML document in an XML document file database;
An XML document retrieval module searching the XML document stored in the XML document file database for the original document data requested by a user terminal; and
An original document image file providing module retrieving, from the original document image file database, the original document image file corresponding to the original document data found by the XML document retrieval module and providing the retrieved original document image file to the user terminal.
Wherein the step of converting the original document data of the original document image file on which OCR recognition has been performed by the OCR recognition / metadata extraction module into the XML document and storing the converted XML document in the XML document file database
further comprises a tag extraction module automatically extracting a tag for searching the contents of the original document image file on which OCR recognition has been performed by the OCR recognition / metadata extraction module, and
Wherein, in the step of the XML document retrieval module searching the XML document stored in the XML document file database for the original document data requested by the user terminal,
the XML document retrieval module searches the XML document, using the tag extracted by the tag extraction module, for the original document data requested by the user terminal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020160123177A KR20180033786A (en) | 2016-09-26 | 2016-09-26 | System of searching and providing original document image file and method thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020160123177A KR20180033786A (en) | 2016-09-26 | 2016-09-26 | System of searching and providing original document image file and method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20180033786A (en) | 2018-04-04 |
Family
ID=61975260
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020160123177A KR20180033786A (en) | 2016-09-26 | 2016-09-26 | System of searching and providing original document image file and method thereof |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR20180033786A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102350111B1 (en) * | 2021-04-02 | 2022-01-12 | (주)광개토연구소 | Method for issuing combined web page including both at least one specific identification phrase and at least one locator capable of allowing at least one docking result data corresponding to the specific identification phrase to be accessed and server using the same |
-
2016
- 2016-09-26 KR KR1020160123177A patent/KR20180033786A/en not_active Application Discontinuation
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102350111B1 (en) * | 2021-04-02 | 2022-01-12 | (주)광개토연구소 | Method for issuing combined web page including both at least one specific identification phrase and at least one locator capable of allowing at least one docking result data corresponding to the specific identification phrase to be accessed and server using the same |
WO2022211163A1 (en) * | 2021-04-02 | 2022-10-06 | (주)광개토연구소 | Method for issuing combined web page including at least one specific identification phrase extracted from specific input data of user and at least one locator which can access at least one docking result data piece corresponding to same at least one specific identification phrase, and combined web page issuance server using same |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107608958B (en) | Contract text risk information mining method and system based on unified modeling of clauses | |
US20190286898A1 (en) | System and method for data extraction and searching | |
AU2007325200B2 (en) | Digital image archiving and retrieval using a mobile device system | |
KR100990018B1 (en) | Method for adding metadata to data | |
US8244037B2 (en) | Image-based data management method and system | |
JP4740916B2 (en) | Image document processing apparatus, image document processing program, and recording medium recording image document processing program | |
JP4364914B2 (en) | Image document processing apparatus, image document processing method, program, and recording medium | |
CN101297319B (en) | Embedding hot spots in electronic documents | |
Déjean et al. | A system for converting PDF documents into structured XML format | |
US8290269B2 (en) | Image document processing device, image document processing method, program, and storage medium | |
US20120284250A1 (en) | Enhanced search engine | |
EP2442273A1 (en) | Object identification image database creating method, creating apparatus and creating process program | |
Ugale et al. | Document management system: A notion towards paperless office | |
WO2007023992A1 (en) | Method and system for image matching in a mixed media environment | |
CN101493896B (en) | Document image processing apparatus and method | |
US20090030882A1 (en) | Document image processing apparatus and document image processing method | |
KR102373884B1 (en) | Image data processing method for searching images by text | |
Maiti et al. | Capturing, eliciting, and prioritizing (CEP) NFRs in agile software engineering | |
Kavallieratou et al. | The GRUHD database of Greek unconstrained handwriting | |
Schröder et al. | Supporting land reuse of former open pit mining sites using text classification and active learning | |
US20060210171A1 (en) | Image processing apparatus | |
EP1917637A1 (en) | Data organization and access for mixed media document system | |
US7286722B2 (en) | Memo image managing apparatus, memo image managing system and memo image managing method | |
WO2007023991A1 (en) | Embedding hot spots in electronic documents | |
KR20180033786A (en) | System of searching and providing original document image file and method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E902 | Notification of reason for refusal | ||
E601 | Decision to refuse application |