CN116186133A - Electronic document management method integrating forward index and backward index - Google Patents
- Publication number
- CN116186133A
- Authority
- CN
- China
- Prior art keywords
- document
- electronic document
- database
- index database
- data
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides an electronic document management method that fuses a forward index and an inverted index. A database whose search engine uses a forward index and a database whose search engine uses an inverted index are selected, and a unified database API is encapsulated to connect and fuse the two databases. When an electronic document is stored, its structured data is stored in the forward-index database and its text data in the inverted-index database, and the data in the two databases are correlated through the ID of the electronic document. When a document is searched, depending on the requirement, the search is either performed in the forward-index database using the document's structural information, or performed as an efficient full-text keyword search in the inverted-index database. The invention thus provides both the structured management and storage functions required of electronic document management and efficient retrieval over massive text content.
Description
Technical Field
The invention relates to the field of computer software, and in particular to an electronic document management method that fuses forward and inverted indexes.
Background
With the development of information technology, more and more enterprises are adopting electronic document management systems as their main document management solution. However, the functions of current electronic document management systems are biased toward management, and little attention is paid to efficient retrieval over massive text content. Even where such systems offer retrieval, they generally rely on relational, structured databases (whose search engines use forward indexes), which makes efficient retrieval over large volumes of text difficult. Conversely, a simple management system built on an inverted-index database can retrieve massive text efficiently but has difficulty managing documents in a structured way.
Disclosure of Invention
The object of the invention is to provide an electronic document management method that fuses forward and inverted indexes.
The technical solution for achieving this object is as follows. An electronic document management method fusing forward and inverted indexes includes the following steps:
Step 1: select a database whose search engine uses a forward index and a database whose search engine uses an inverted index, and design and implement a unified access interface that supports uniform access operations on both databases, thereby fusing and connecting them;
Step 2: when an electronic document is stored, store its structured data in the forward-index database and its text data in the inverted-index database, and correlate the data in the two databases through the ID of the electronic document;
Step 3: when a document is searched, depending on the requirement, either search the forward-index database using the document's structural information, or perform an efficient full-text keyword search of the document in the inverted-index database.
Further, the specific method of step 2 (storing the structured data of the electronic document in the forward-index database and its text data in the inverted-index database, correlated through the document ID) is as follows:
(1) Before data is entered, initialize the table structure of the forward-index database. The table structure comprises a directory table and an electronic document table. The directory table is self-referencing: its parent-directory attribute references the table's own primary key. The parent-directory attribute of the electronic document table is a foreign key that references the primary key of the directory table;
(2) Determine the category of the document to be stored, including its primary directory, secondary directory and its own name; upload and parse the document to obtain its title and full-text content; and generate a global ID for the document;
(3) Query the directory table for the ID of the document's direct parent directory and, if it does not exist, create the relevant directory records. Enter the document's ID, title and parent-directory ID into the forward-index database, and enter the document's ID, title and full-text content into the inverted-index database after word segmentation. The data in the two databases are thereby correlated through the ID of the electronic document.
Further, the specific method of step 3 (searching either the forward-index database by structural information or the inverted-index database by keyword, depending on the requirement) is as follows:
(1) If the specific name and category information of the document are known, navigate to the document level by level through its categories, i.e., search in the forward-index database;
(2) If the specific name and category information of the document are not known, search for the document in the inverted-index database using a keyword contained in the document.
An electronic document management system fusing forward and inverted indexes implements electronic document management based on the above electronic document management method fusing forward and inverted indexes.
A computer device comprises a memory, a processor and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements electronic document management fusing forward and inverted indexes based on the above method.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements electronic document management fusing forward and inverted indexes based on the above method.
Compared with the prior art, the invention has the notable advantage of providing both the structured management and storage functions of electronic document management and efficient retrieval over massive text content.
Drawings
Fig. 1 is a schematic diagram of a forward index.
Fig. 2 is a schematic diagram of an inverted index.
Fig. 3 is a schematic diagram of the electronic document management method fusing forward and inverted indexes.
Fig. 4 is a flow chart of the electronic document management method fusing forward and inverted indexes.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
A forward index looks up values by key, as shown in Fig. 1. When a new document is added, a new block of space is allocated for it and appended to the end of the existing index file; when a document is deleted, its index information is located directly and deleted. Conventional relational database search engines typically use forward indexes. An inverted index works the other way round: it looks up keys by value, i.e., finds documents by attribute value, as shown in Fig. 2. Given a keyword, the inverted index quickly returns the documents that contain that word.
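To make the contrast concrete, here is a minimal in-memory sketch, assuming a simple lowercase whitespace tokenizer (the patent does not specify one). The forward index maps each document ID to its terms; the inverted index is derived from it by mapping each term to the set of IDs that contain it. The function names `build_forward_index` and `invert` are illustrative only.

```python
from collections import defaultdict


def build_forward_index(docs: dict[int, str]) -> dict[int, list[str]]:
    """Forward index: document ID (the key) -> terms of that document."""
    return {doc_id: text.lower().split() for doc_id, text in docs.items()}


def invert(forward: dict[int, list[str]]) -> dict[str, set[int]]:
    """Inverted index: term (the value) -> IDs of the documents containing it."""
    inverted = defaultdict(set)
    for doc_id, terms in forward.items():
        for term in terms:
            inverted[term].add(doc_id)
    return dict(inverted)


docs = {
    1: "Java technical specification",
    2: "Python coding specification",
    3: "Java virtual machine internals",
}
forward = build_forward_index(docs)
inverted = invert(forward)

print(forward[1])                         # ['java', 'technical', 'specification']
print(sorted(inverted["java"]))           # [1, 3]
print(sorted(inverted["specification"]))  # [1, 2]
```

Adding a document to the forward index touches a single entry, whereas the inverted index must update one posting set per distinct term; that extra write-time work is what buys the fast keyword lookup.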
Accordingly, the present invention proposes an electronic document management method fusing a forward index and an inverted index which, as shown in Fig. 3, comprises the following steps:
Step 1: select a database that uses a forward-index search engine (which can be understood as a conventional relational database) and a database that uses an inverted-index search engine. Encapsulate the calls to both databases so that a single application system can connect to both types of database.
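Step 1 only requires that the two databases sit behind one encapsulated interface. The following is a minimal sketch of that idea using Python abstract base classes; the class and method names (`ForwardStore`, `InvertedStore`, `DocumentRepository`, `put_meta`, and so on) are hypothetical and are not taken from the patent or from any particular database product. In-memory stand-ins replace the real databases so that the sketch runs on its own.

```python
from abc import ABC, abstractmethod


class ForwardStore(ABC):
    """Structured metadata store (stands in for the relational database)."""

    @abstractmethod
    def put_meta(self, doc_id: int, title: str, parent_dir_id: int) -> None: ...

    @abstractmethod
    def get_meta(self, doc_id: int) -> dict: ...


class InvertedStore(ABC):
    """Full-text store (stands in for the inverted-index database)."""

    @abstractmethod
    def put_text(self, doc_id: int, title: str, body: str) -> None: ...

    @abstractmethod
    def search(self, keyword: str) -> set[int]: ...


class DocumentRepository:
    """Unified access interface: callers never talk to either database directly."""

    def __init__(self, forward: ForwardStore, inverted: InvertedStore) -> None:
        self.forward = forward
        self.inverted = inverted

    def save(self, doc_id: int, title: str, parent_dir_id: int, body: str) -> None:
        # The shared doc_id is what links the two stores.
        self.forward.put_meta(doc_id, title, parent_dir_id)
        self.inverted.put_text(doc_id, title, body)

    def find_by_keyword(self, keyword: str) -> list[dict]:
        # Hits come from the inverted store; metadata from the forward store.
        return [self.forward.get_meta(i) for i in sorted(self.inverted.search(keyword))]


class MemoryForward(ForwardStore):
    def __init__(self) -> None:
        self.rows: dict[int, dict] = {}

    def put_meta(self, doc_id, title, parent_dir_id):
        self.rows[doc_id] = {"id": doc_id, "title": title, "parent_dir_id": parent_dir_id}

    def get_meta(self, doc_id):
        return self.rows[doc_id]


class MemoryInverted(InvertedStore):
    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = {}

    def put_text(self, doc_id, title, body):
        for term in f"{title} {body}".lower().split():
            self.postings.setdefault(term, set()).add(doc_id)

    def search(self, keyword):
        return set(self.postings.get(keyword.lower(), set()))


repo = DocumentRepository(MemoryForward(), MemoryInverted())
repo.save(1, "Java Technical Specification", 2, "no magic values in code")
print(repo.find_by_keyword("magic"))
```

In a real deployment the two concrete classes would wrap a relational database driver and a full-text engine client respectively; the application code that uses `DocumentRepository` would not change.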
Step 2: when a document is stored, divide the data to be entered and write each type of data into the corresponding database. Specifically:
(1) Before entering data, first confirm whether the table structure of the forward-index database, i.e., the relational database, has been initialized. It mainly comprises a directory table and an electronic document table; if these do not exist, create them first. The directory table is self-referencing: its parent-directory attribute references the table's own primary key. The parent-directory attribute of the electronic document table is a foreign key that references the primary key of the directory table.
(2) Determine the category of the document to be stored, including its primary directory, secondary directory, its own name, and so on. Upload and parse the file to obtain its title and full-text content, and generate a global ID for the document.
(3) Query the directory table for the ID of the document's direct parent directory; if it does not exist, create the relevant directory records. Enter the document's ID, title and parent-directory ID into the forward-index database, and enter the document's ID, title and full-text content into the inverted-index database after word segmentation. The data in the two types of database can thus be correlated through the ID of the electronic document.
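The storage procedure of step 2 can be sketched with Python's built-in `sqlite3` module standing in for the forward-index (relational) database and a small in-memory class standing in for the inverted-index database. The table and column names (`directory`, `document`, `parent_id`), the `InvertedDB` class and the helper functions are assumptions for illustration only. Whitespace splitting stands in for real word segmentation, which Chinese text would require.

```python
import sqlite3
from dataclasses import dataclass, field

SCHEMA = """
CREATE TABLE IF NOT EXISTS directory (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES directory(id)          -- self-reference
);
CREATE TABLE IF NOT EXISTS document (
    id        INTEGER PRIMARY KEY,                       -- global document ID
    title     TEXT NOT NULL,
    parent_id INTEGER NOT NULL REFERENCES directory(id)  -- foreign key
);
"""


@dataclass
class InvertedDB:
    """Hypothetical stand-in for the inverted-index database."""
    postings: dict[str, set[int]] = field(default_factory=dict)     # term -> doc IDs
    docs: dict[int, tuple[str, str]] = field(default_factory=dict)  # ID -> (title, text)


def init_schema(conn: sqlite3.Connection) -> None:
    """(1) Create the directory and document tables if they do not exist yet."""
    conn.executescript(SCHEMA)


def ensure_directory(conn: sqlite3.Connection, path: list[str]) -> int:
    """(3) Return the ID of the last directory in `path`, creating missing levels."""
    parent_id = None  # top-level directories have no parent
    for name in path:
        row = conn.execute(
            "SELECT id FROM directory WHERE name = ? AND parent_id IS ?",
            (name, parent_id),
        ).fetchone()
        if row is None:
            cur = conn.execute(
                "INSERT INTO directory (name, parent_id) VALUES (?, ?)",
                (name, parent_id),
            )
            parent_id = cur.lastrowid
        else:
            parent_id = row[0]
    return parent_id


def store_document(conn: sqlite3.Connection, inv: InvertedDB, doc_id: int,
                   path: list[str], title: str, body: str) -> None:
    """(3) Split one document between the two databases, linked by doc_id."""
    dir_id = ensure_directory(conn, path)
    conn.execute(
        "INSERT INTO document (id, title, parent_id) VALUES (?, ?, ?)",
        (doc_id, title, dir_id),
    )
    conn.commit()
    inv.docs[doc_id] = (title, body)
    for term in set(f"{title} {body}".lower().split()):  # naive segmentation
        inv.postings.setdefault(term, set()).add(doc_id)


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
init_schema(conn)
inv = InvertedDB()
store_document(conn, inv, 101, ["Industrial Technology", "Computer Technology"],
               "Java Technical Specification",
               "No magic value may appear directly in code")
print(conn.execute("SELECT id, title, parent_id FROM document").fetchall())
print(inv.postings["magic"])
```

Directories are looked up by name within their parent, so same-named directories under different parents do not collide. The query uses `parent_id IS ?` rather than `= ?` because in SQL `= NULL` never matches, and top-level directories have a NULL parent.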
Step 3: when searching for a document, depending on the requirement, either search the forward-index database using the document's structural information, or perform an efficient full-text search in the inverted-index database using keywords from the text. This covers the following cases:
(1) If the summary information of the document (its name and category) is known, the document can be conveniently located level by level through its categories in the forward-index database.
(2) If the specific name and category information of the document are not known, the document can be quickly found in the inverted-index database using a keyword it contains.
(3) A more typical application of inverted-index retrieval is finding all documents in the system that contain a given keyword: entering that keyword quickly retrieves all relevant documents from a massive body of text.
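The two retrieval paths of step 3 can be sketched as follows, assuming the same layout as a relational database with hypothetical `directory` and `document` tables on the forward side, and an ID-to-text map plus a term-to-ID posting map on the inverted side. The function names `browse` and `keyword_search` are illustrative, and the small fixture at the bottom exists only to make the sketch runnable.

```python
import sqlite3
from typing import Optional


def browse(conn: sqlite3.Connection, path: list[str], title: str) -> Optional[int]:
    """Case (1): descend the directory tree level by level, then match the title."""
    parent_id = None
    for name in path:
        row = conn.execute(
            "SELECT id FROM directory WHERE name = ? AND parent_id IS ?",
            (name, parent_id),
        ).fetchone()
        if row is None:
            return None  # this category path does not exist
        parent_id = row[0]
    row = conn.execute(
        "SELECT id FROM document WHERE parent_id IS ? AND title = ?",
        (parent_id, title),
    ).fetchone()
    return row[0] if row else None


def keyword_search(postings: dict[str, set[int]], keyword: str) -> set[int]:
    """Cases (2) and (3): a single posting-list lookup, no scan over the texts."""
    return set(postings.get(keyword.lower(), set()))


# Minimal fixture standing in for the two databases.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE directory (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT, parent_id INTEGER);
INSERT INTO directory VALUES (1, 'Industrial Technology', NULL), (2, 'Computer Technology', 1);
INSERT INTO document VALUES (101, 'Java Technical Specification', 2);
""")
full_text = {101: "No magic value may appear directly in code"}  # inverted side: ID -> text
postings: dict[str, set[int]] = {}
for doc_id, text in full_text.items():
    for term in text.lower().split():
        postings.setdefault(term, set()).add(doc_id)

doc_id = browse(conn, ["Industrial Technology", "Computer Technology"],
                "Java Technical Specification")
print(doc_id, full_text[doc_id])          # metadata path, then full text by the shared ID
print(keyword_search(postings, "Magic"))  # {101}
```

The browse path returns only an ID; the full text is then fetched from the inverted side with that same ID, which is exactly the cross-database correlation that step 2 sets up.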
Examples
To verify the effectiveness of the proposed scheme, the following experiment was performed.
Step 1: first select a relational database whose search engine uses a forward index, then select a database whose underlying structure is an inverted index. Write a program that accesses both databases and implements the corresponding storage and retrieval methods.
Step 2: a technical document, "Java Technical Specification", is to be entered into the electronic document management system. Its content totals 100,000 words and includes the sentence: "No magic value may appear directly in code." The procedure is as follows:
(1) First confirm whether the tables have been created in the relational database. If not, first create the directory table, a self-referencing table whose parent-directory attribute references its own primary key; then create the electronic document table, whose parent-directory attribute is a foreign key referencing the primary key of the directory table.
(2) Determine the document's classification: the primary directory is "Industrial Technology" and the secondary directory is "Computer Technology". Set its name to "Java Technical Specification". Upload the file and parse it to obtain the 100,000-word full-text content, and generate a globally unique ID for the document using the Snowflake algorithm.
(3) If the "Computer Technology" directory does not yet exist in the directory table, create it with "Industrial Technology" as its parent directory; if the "Industrial Technology" directory does not exist either, create it first. After obtaining the ID of the "Computer Technology" directory, store the ID and title of "Java Technical Specification" together with its parent-directory ID (that of the "Computer Technology" directory) in the forward-index database, and store its ID, title and full-text content in the inverted-index database. Thus, once the ID of "Java Technical Specification" has been obtained from either database, the same document can be queried in the other.
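The example generates the document's global ID with the Snowflake algorithm. Below is a minimal single-process sketch of the commonly used layout: a 41-bit millisecond timestamp relative to a custom epoch, a 10-bit worker ID and a 12-bit per-millisecond sequence, packed into a 63-bit positive integer. The epoch value and the `Snowflake` class name are arbitrary assumptions; the patent does not state which Snowflake variant or parameters were used.

```python
import threading
import time

EPOCH_MS = 1_640_995_200_000  # 2022-01-01T00:00:00Z, an arbitrary custom epoch


class Snowflake:
    """IDs laid out as: 41-bit timestamp | 10-bit worker ID | 12-bit sequence."""

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = time.time_ns() // 1_000_000
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit wrap-around
                if self.sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one
                    while now <= self.last_ms:
                        now = time.time_ns() // 1_000_000
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence


gen = Snowflake(worker_id=1)
a, b = gen.next_id(), gen.next_id()
print(a < b, (a >> 12) & 0x3FF)  # True 1
```

Because the timestamp occupies the high bits, IDs from one generator increase over time, which keeps primary-key inserts roughly ordered. Assigning distinct worker IDs across nodes is left out of this sketch.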
Step 3: when the electronic document "Java Technical Specification" needs to be found, different retrieval methods can be used in different scenarios:
(1) If its specific category is known, the document's summary information can be found directly via "Industrial Technology > Computer Technology > Java Technical Specification"; this lookup is performed in the relational database underlying the forward-index search engine. Finding the summary information yields the document's ID, with which its full-text content is then fetched from the inverted-index database.
(2) If the document's specific category is not known, related documents can be retrieved by searching for "Java", "technical specification" or "magic value", and the target document located among them. This search is performed in the database whose underlying structure is an inverted index, because searching titles and full text in a relational database is very time-consuming when the volume of text is massive.
(3) If the goal is not one specific document but all documents related to a certain field, a full-text search can be performed directly. For example, entering "code" to find documents in the programming field retrieves "Java Technical Specification", since its content contains "code", along with any other documents whose title or content contains "code". This search is performed in the inverted-index database, which is very efficient at retrieving from massive amounts of text.
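Cases (2) and (3) differ mainly in how the search terms are combined. The sketch below assumes simple boolean semantics over posting sets: OR (union) across alternative queries such as "Java", "technical specification" or "magic value", and AND (intersection) across the words of one query. Treating a multi-word phrase as a bag of its words is a simplification; production engines typically keep word positions to match exact phrases. The `linear_scan` function is included only to show what a search without an inverted index must do, namely inspect every document; all function names are hypothetical.

```python
from functools import reduce


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def build_postings(docs: dict[int, str]) -> dict[str, set[int]]:
    postings: dict[str, set[int]] = {}
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings.setdefault(term, set()).add(doc_id)
    return postings


def match_all(postings: dict[str, set[int]], query: str) -> set[int]:
    """Every word of the query must occur (intersection of posting sets)."""
    sets = [postings.get(t, set()) for t in tokenize(query)]
    return set(reduce(set.intersection, sets)) if sets else set()


def search_any(postings: dict[str, set[int]], queries: list[str]) -> set[int]:
    """Any one of the queries may match (union of per-query results)."""
    return set().union(*(match_all(postings, q) for q in queries))


def linear_scan(docs: dict[int, str], word: str) -> set[int]:
    """Search without an inverted index: tokenize and inspect every document."""
    return {i for i, text in docs.items() if word in tokenize(text)}


docs = {
    101: "Java Technical Specification no magic value may appear directly in code",
    102: "Python style guide code layout and naming",
    103: "Annual financial report",
}
postings = build_postings(docs)
print(search_any(postings, ["Java", "technical specification", "magic value"]))  # {101}
print(sorted(match_all(postings, "code")))                     # [101, 102]
print(linear_scan(docs, "code") == match_all(postings, "code"))  # True
```

Both approaches return the same answer; the difference is cost. The posting lookup touches only the documents that contain the term, while the scan reads every document, which is the bottleneck the description attributes to relational full-text search over massive text.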
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations are described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this description.
The above examples represent only a few embodiments of the present application; although they are described specifically and in detail, they are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art could make various modifications and improvements without departing from the concept of the present application, and these fall within its scope of protection. Accordingly, the scope of protection of the present application shall be determined by the appended claims.
Claims (6)
1. An electronic document management method fusing forward and inverted indexes, characterized by comprising the following steps:
step 1, selecting a database whose search engine uses a forward index and a database whose search engine uses an inverted index, and encapsulating a unified database API to fuse and connect the two databases;
step 2, when an electronic document is stored, storing structured data of the electronic document in the forward-index database and text data of the electronic document in the inverted-index database, and correlating the data in the two databases through the ID of the electronic document;
step 3, when a document is searched, depending on the requirement, either searching in the forward-index database using the structural information of the document, or performing an efficient full-text keyword search of the document in the inverted-index database.
2. The electronic document management method fusing forward and inverted indexes according to claim 1, wherein in step 2 the storing of structured data in the forward-index database and text data in the inverted-index database, correlated through the ID of the electronic document, specifically comprises:
(1) before data is entered, initializing the table structure of the forward-index database, the table structure comprising a directory table and an electronic document table, wherein the directory table is self-referencing and its parent-directory attribute references the table's own primary key, and the parent-directory attribute of the electronic document table is a foreign key referencing the primary key of the directory table;
(2) determining the category of the document to be stored, including a primary directory, a secondary directory and its own name; uploading and parsing the document to obtain its title and full-text content; and generating a global ID for the document;
(3) querying the directory table for the ID of the document's direct parent directory and, if it does not exist, creating the relevant directory records; entering the ID, title and parent-directory ID of the document into the forward-index database; and entering the ID, title and full-text content of the document into the inverted-index database after word segmentation, so that the data in the two databases are correlated through the ID of the electronic document.
3. The electronic document management method fusing forward and inverted indexes according to claim 1, wherein in step 3 the searching, depending on the requirement, of either the forward-index database by structural information or the inverted-index database by keyword specifically comprises:
(1) if the specific name and category information of the document are known, locating the document level by level through its categories, i.e., searching in the forward-index database;
(2) if the specific name and category information of the document are not known, retrieving the document through the inverted-index database using a keyword contained in the document.
4. An electronic document management system fusing forward and inverted indexes, which implements electronic document management based on the electronic document management method fusing forward and inverted indexes according to any one of claims 1 to 3.
5. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements electronic document management based on the electronic document management method fusing forward and inverted indexes according to any one of claims 1 to 3.
6. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements electronic document management based on the electronic document management method fusing forward and inverted indexes according to any one of claims 1 to 3.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211038247.3A (CN115442242A) | 2022-08-29 | 2022-08-29 | Workflow arrangement system and method based on importance ordering |
CN2022110382473 | 2022-08-29 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116186133A | 2023-05-30 |
Family
ID=84245199
Family Applications (5)
Application Number | Publication | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202211038247.3A | CN115442242A | Pending | 2022-08-29 | 2022-08-29 | Workflow arrangement system and method based on importance ordering |
CN202211625573.4A | CN115858168B | Active | 2022-08-29 | 2022-12-16 | Earth application model arrangement system and method based on importance ranking |
CN202211731819.6A | CN116127190B | Active | 2022-08-29 | 2022-12-30 | Digital earth resource recommendation system and method |
CN202211729747.1A | CN116186133A | Pending | 2022-08-29 | 2022-12-30 | Electronic document management method integrating forward index and backward index |
CN202310035450.3A | CN116542326A | Pending | 2022-08-29 | 2023-01-10 | Knowledge representation method and system based on time sequence convolution |
Country Status (1)
Country | Link |
---|---|
CN (5) | CN115442242A (en) |
Also Published As
Publication number | Publication date |
---|---|
CN116127190B (en) | 2023-07-28 |
CN116542326A (en) | 2023-08-04 |
CN115858168B (en) | 2023-10-13 |
CN116127190A (en) | 2023-05-16 |
CN115858168A (en) | 2023-03-28 |
CN115442242A (en) | 2022-12-06 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |