CN116186133A - Electronic document management method integrating forward index and backward index - Google Patents

Publication number
CN116186133A
Authority
CN
China
Prior art keywords
document
electronic document
database
index database
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211729747.1A
Other languages
Chinese (zh)
Inventor
任岩
顾爽
潘月浩
张露
徐夏
陶昊然
金晨
蒙森荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Aerospace Information Research Institute
Original Assignee
Suzhou Aerospace Information Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Aerospace Information Research Institute

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides an electronic document management method that fuses a forward index and an inverted index. The method comprises: selecting one database whose search engine uses a forward index and one database whose search engine uses an inverted index, and encapsulating a unified database API to fuse and connect the two databases; when an electronic document is stored, storing its structured data in the forward index database and its text data in the inverted index database, and associating the data in the two databases through the ID of the electronic document; and, when a document is searched for, depending on the requirement, either searching the forward index database by the structural information of the document or performing efficient full-text retrieval in the inverted index database by keywords. The invention thus satisfies the structured management and storage functions of electronic document management while also providing efficient retrieval of massive text content.

Description

Electronic document management method integrating forward index and backward index
Technical Field
The invention relates to the field of computer software, and in particular to an electronic document management method integrating forward and inverted indexes.
Background
With the development of information technology, more and more enterprises are adopting electronic document management systems as their primary means of managing documents. However, current systems focus mainly on management and pay little attention to efficient retrieval of massive text content. Although many of them offer a retrieval function, they are generally built on relational (structured) databases, whose search engines use a forward index, and therefore struggle to retrieve efficiently from large volumes of text. Conversely, a management system built solely on an inverted index database can retrieve massive text efficiently, but it is difficult for such a system to manage documents in a structured way.
Disclosure of Invention
The invention aims to provide an electronic document management method integrating forward and inverted indexes.
The technical solution for realizing the purpose of the invention is as follows: an electronic document management method integrating forward and inverted indexes includes the following steps:
step 1, selecting a database whose search engine uses a forward index and a database whose search engine uses an inverted index, and designing and coding a unified access interface that supports uniform access operations on both databases, thereby fusing and connecting the two databases;
step 2, when an electronic document is stored, storing the structured data of the electronic document in the forward index database, storing the text data of the electronic document in the inverted index database, and associating the data in the two databases through the ID of the electronic document;
step 3, when a document is searched for, depending on the requirement, either searching the forward index database by the structural information of the document or performing efficient full-text retrieval in the inverted index database by keywords.
Further, in step 2, when the electronic document is stored, the structured data of the electronic document is stored in the forward index database, the text data of the electronic document is stored in the inverted index database, and the data in the two databases are associated through the ID of the electronic document; the specific method is as follows:
(1) Before data is entered, the table structure of the forward index database is initialized. The table structure comprises a directory table and an electronic document table; the directory table is a self-referencing table whose parent directory attribute references the primary key of the same table, and the parent directory attribute of the electronic document table is a foreign key that references the primary key of the directory table;
(2) Determining the category of the document to be stored, including its primary directory, secondary directory and own name; uploading and parsing the document to obtain its title and full-text content; and generating a global ID for the document;
(3) Querying the directory table for the ID of the document's direct parent directory and, if it does not exist, creating the relevant directory data in the directory table; entering the ID, title and parent directory ID of the document into the forward index database; and entering the ID, title and full-text content of the document, after word segmentation, into the inverted index database, so that the data in the two databases are associated through the ID of the electronic document.
Further, in step 3, when a document is searched for, depending on the requirement, either the forward index database is searched by the structural information of the document or efficient full-text retrieval is performed in the inverted index database by keywords; the specific method is as follows:
(1) If the specific name and category information of the document are known, the document is found level by level according to its category, that is, it is looked up in the forward index database;
(2) If the specific name and category information of the document are not known, the document is retrieved through the inverted index database according to a keyword in the document.
An electronic document management system integrating forward and inverted indexes implements electronic document management integrating forward and inverted indexes based on the above electronic document management method.
A computer device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements electronic document management integrating forward and inverted indexes based on the above electronic document management method.
A computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements electronic document management integrating forward and inverted indexes based on the above electronic document management method.
Compared with the prior art, the invention has the notable advantage that it both satisfies the structured management and storage functions of electronic document management and provides efficient retrieval of massive text content.
Drawings
FIG. 1 is a schematic diagram of a forward index.
FIG. 2 is a schematic diagram of an inverted index.
FIG. 3 is a schematic diagram of the electronic document management method integrating forward and inverted indexes.
FIG. 4 is a flow chart of the electronic document management method integrating forward and inverted indexes.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
A forward index finds a value by its key, as shown in FIG. 1. When a new document is added, a new block of space is created for it and appended to the end of the original index file; when a document is deleted, the index information corresponding to the document is located directly and removed. The search engines of conventional relational databases typically use forward indexes. An inverted index instead finds the key by a value, that is, it finds documents by an attribute value, as shown in FIG. 2. Given a keyword, the inverted index quickly returns the documents that contain that word.
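The difference between the two index directions can be illustrated with a minimal Python sketch. The sample documents and the names `forward_index` and `build_inverted_index` are illustrative assumptions, not part of the patent; real databases use far more elaborate on-disk structures.

```python
from collections import defaultdict

# Forward index: document ID (key) -> words contained in the document (value).
forward_index = {
    1: ["java", "coding", "specification"],
    2: ["python", "coding", "guide"],
}


def build_inverted_index(forward):
    """Invert a forward index: word -> set of IDs of documents containing it."""
    inverted = defaultdict(set)
    for doc_id, words in forward.items():
        for word in words:
            inverted[word].add(doc_id)
    return inverted


inverted_index = build_inverted_index(forward_index)

# Forward lookup: from a known document ID to its content.
words_of_doc_1 = forward_index[1]

# Inverted lookup: from a keyword to every document that contains it.
docs_with_coding = sorted(inverted_index["coding"])
```

With the forward index alone, answering "which documents contain `coding`?" would require scanning every document; the inverted index answers it with a single dictionary lookup.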
Accordingly, the present invention proposes an electronic document management method that fuses a forward index and an inverted index, as shown in FIG. 3, comprising the following steps:
Step 1, select a database that uses a forward index search engine (which can be understood as a conventional relational database) and a database that uses an inverted index search engine. Encapsulate the database calls so that a single application system can connect to both types of database.
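One way to picture the encapsulation in step 1 is a thin facade class that hides both backends behind a single interface. The sketch below is only an illustration under stated assumptions: `ForwardBackend` and `InvertedBackend` are in-memory stand-ins for the two real databases, and every class and method name is hypothetical rather than taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class ForwardBackend:
    """Stand-in for the relational database: document ID -> structured row."""
    rows: dict = field(default_factory=dict)

    def put(self, doc_id, title, parent_dir_id):
        self.rows[doc_id] = {"title": title, "parent_dir_id": parent_dir_id}

    def get(self, doc_id):
        return self.rows.get(doc_id)


@dataclass
class InvertedBackend:
    """Stand-in for the inverted index database: term -> set of document IDs."""
    postings: dict = field(default_factory=dict)

    def put(self, doc_id, text):
        # Whitespace splitting stands in for real word segmentation.
        for term in text.lower().split():
            self.postings.setdefault(term, set()).add(doc_id)

    def search(self, keyword):
        return self.postings.get(keyword.lower(), set())


class UnifiedDocumentAPI:
    """Single entry point that the application uses for both databases."""

    def __init__(self, forward, inverted):
        self.forward = forward
        self.inverted = inverted

    def save(self, doc_id, title, parent_dir_id, full_text):
        # Structured data goes to the forward side, text to the inverted side;
        # the shared doc_id is what links the two records.
        self.forward.put(doc_id, title, parent_dir_id)
        self.inverted.put(doc_id, f"{title} {full_text}")

    def find_by_keyword(self, keyword):
        # Keyword -> IDs via the inverted side, then IDs -> rows via the forward side.
        ids = sorted(self.inverted.search(keyword))
        return [{"id": doc_id, **self.forward.get(doc_id)} for doc_id in ids]


api = UnifiedDocumentAPI(ForwardBackend(), InvertedBackend())
api.save(1001, "Java Technical Specification", 20,
         "No magic value may appear directly in the code")
hits = api.find_by_keyword("magic")
```

The application code only ever talks to `UnifiedDocumentAPI`, so either backend could be swapped for a real database client without changing callers.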
Step 2, when a document is stored, the data to be entered is divided, and each type of data is entered into the corresponding database as required. The specific steps are as follows:
(1) Before data is entered, first confirm whether the table structure of the forward index database, i.e., the relational database, has been initialized. The structure mainly comprises a directory table and an electronic document table; if these do not exist, they must be created first. The directory table is a self-referencing table whose parent directory attribute references the primary key of the same table; the parent directory attribute of the electronic document table is a foreign key that references the primary key of the directory table.
(2) Determine the category of the document to be stored, including its primary directory, secondary directory, own name, and so on. Upload and parse the file to obtain its title and full-text content. Generate a global ID for the document.
(3) Query the directory table for the ID of the document's direct parent directory; if it does not exist, create the relevant directory data in the directory table. Enter the ID, title and parent directory ID of the document into the forward index database; enter the ID, title and full-text content of the document, after word segmentation, into the inverted index database. In this way, the data in the two types of database are associated through the ID of the electronic document.
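The storage procedure in step 2 can be sketched with Python's built-in `sqlite3` module acting as the relational database and a plain dictionary acting as the inverted index database. This is a minimal sketch under several assumptions: the table and column names are hypothetical, `uuid4` is swapped in as a simple global ID generator, and whitespace splitting stands in for real word segmentation.

```python
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE IF NOT EXISTS directory (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES directory(id)           -- self-referencing table
);
CREATE TABLE IF NOT EXISTS document (
    id        TEXT PRIMARY KEY,
    title     TEXT NOT NULL,
    parent_id INTEGER NOT NULL REFERENCES directory(id)  -- foreign key
);
"""


def get_or_create_directory(conn, name, parent_id):
    """Return the ID of the named directory, creating the row if it is missing."""
    row = conn.execute(
        "SELECT id FROM directory WHERE name = ? AND parent_id IS ?",
        (name, parent_id),
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO directory (name, parent_id) VALUES (?, ?)", (name, parent_id)
    )
    return cur.lastrowid


def store_document(conn, inverted, path, title, full_text):
    """Store structured data in SQLite and text in the inverted index."""
    parent_id = None
    for name in path:  # walk primary directory, secondary directory, ...
        parent_id = get_or_create_directory(conn, name, parent_id)
    doc_id = uuid.uuid4().hex  # assumption: any globally unique ID works here
    conn.execute(
        "INSERT INTO document (id, title, parent_id) VALUES (?, ?, ?)",
        (doc_id, title, parent_id),
    )
    for term in f"{title} {full_text}".lower().split():
        inverted.setdefault(term, set()).add(doc_id)
    return doc_id


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
inverted = {}
doc_id = store_document(
    conn,
    inverted,
    ["Industrial Technology", "Computer Technology"],
    "Java Technical Specification",
    "No magic value may appear directly in the code",
)
```

Because the same `doc_id` is written to both stores, a hit in either one can be resolved in the other.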
Step 3, when a document is searched for, depending on the requirement, the search can be performed in the forward index database using the structural information of the document, or efficient full-text retrieval can be performed in the inverted index database using keywords in the text. This covers the following cases:
(1) If the summary information of the document is known, the document can be conveniently found level by level through its category; this lookup is performed in the forward index database.
(2) If the specific name and category information of the document are not known, the document can be quickly retrieved from the inverted index database according to a keyword in the document.
(3) A more typical application of inverted index retrieval is finding all documents in the system that contain a certain keyword: entering that keyword quickly retrieves all relevant documents from a massive amount of text.
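The two retrieval paths in step 3 can be sketched as follows. The data is hard-coded for illustration, `sqlite3` stands in for the relational database, a dictionary stands in for the inverted index database, and the function names `find_by_path` and `find_by_keyword` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE directory (id INTEGER PRIMARY KEY, name TEXT,
                        parent_id INTEGER REFERENCES directory(id));
CREATE TABLE document (id TEXT PRIMARY KEY, title TEXT,
                       parent_id INTEGER REFERENCES directory(id));
INSERT INTO directory VALUES (1, 'Industrial Technology', NULL),
                             (2, 'Computer Technology', 1);
INSERT INTO document VALUES ('d1', 'Java Technical Specification', 2),
                            ('d2', 'Python Style Guide', 2);
""")

# Stand-in for the inverted index database: term -> document IDs.
inverted = {"java": {"d1"}, "python": {"d2"}, "code": {"d1", "d2"}}


def find_by_path(conn, path, title):
    """Case (1): walk the directory tree level by level, then match the title."""
    parent_id = None
    for name in path:
        row = conn.execute(
            "SELECT id FROM directory WHERE name = ? AND parent_id IS ?",
            (name, parent_id),
        ).fetchone()
        if row is None:
            return None
        parent_id = row[0]
    row = conn.execute(
        "SELECT id FROM document WHERE title = ? AND parent_id = ?",
        (title, parent_id),
    ).fetchone()
    return row[0] if row else None


def find_by_keyword(conn, inverted, keyword):
    """Cases (2) and (3): keyword -> IDs, then resolve IDs to titles."""
    titles = []
    for doc_id in sorted(inverted.get(keyword.lower(), set())):
        row = conn.execute(
            "SELECT title FROM document WHERE id = ?", (doc_id,)
        ).fetchone()
        titles.append(row[0])
    return titles


by_path = find_by_path(
    conn, ["Industrial Technology", "Computer Technology"],
    "Java Technical Specification",
)
by_keyword = find_by_keyword(conn, inverted, "code")
```

The path lookup touches only the small structured tables, while the keyword lookup never scans document text; each query goes to the store that suits it.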
Examples
To verify the effectiveness of the scheme of the invention, the following experiment was performed.
Step 1, first select a relational database whose search engine uses a forward index, and then select a database whose underlying structure is an inverted index. Write a program that accesses both databases and implements the corresponding storage and retrieval methods.
Step 2, a technical document, "Java Technical Specification", needs to be entered into the electronic document management system. Its content runs to 100,000 words and includes the sentence: "No magic value is allowed to appear directly in the code." The steps are as follows:
(1) First confirm whether the tables have been created in the relational database. If not, first create the directory table, which is a self-referencing table whose parent directory attribute references the primary key of the same table; then create the electronic document table, whose parent directory attribute is a foreign key that references the primary key of the directory table.
(2) Determine the document's classification: the primary directory is the industrial technology class and the secondary directory is the computer technology class. Determine its name as "Java Technical Specification". Upload the file and, after parsing it, obtain the 100,000 words of full-text content. Generate a globally unique ID for the document using the snowflake algorithm.
(3) If the ID of the computer technology class does not exist in the directory table, create a new entry for it in the directory table with the industrial technology class as its parent directory; if the industrial technology class directory does not exist either, it must be created first. After the ID of the computer technology class directory is obtained, store the ID and title of "Java Technical Specification" and the ID of its parent directory (the computer technology class directory) in the forward index database; store the ID, title and full-text content of "Java Technical Specification" in the inverted index database. In this way, once the ID of "Java Technical Specification" has been obtained from either type of database, the same document can be queried in the other.
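The example above names the snowflake algorithm for ID generation but gives no parameters. The following is a minimal sketch of a Snowflake-style generator, assuming the commonly used layout of a millisecond timestamp, a 10-bit worker ID and a 12-bit per-millisecond sequence; the custom epoch, the bit widths and the class name are assumptions, not values from the patent.

```python
import threading
import time


class SnowflakeGenerator:
    """Snowflake-style IDs: [timestamp | worker ID | sequence] packed into an int."""

    EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch (September 2020)
    WORKER_BITS = 10
    SEQUENCE_BITS = 12

    def __init__(self, worker_id):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            # Never go backwards, even if the system clock does.
            now = max(int(time.time() * 1000), self.last_ms)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & ((1 << self.SEQUENCE_BITS) - 1)
                if self.sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQUENCE_BITS))
                | (self.worker_id << self.SEQUENCE_BITS)
                | self.sequence
            )


generator = SnowflakeGenerator(worker_id=1)
ids = [generator.next_id() for _ in range(1000)]
```

IDs produced by one generator are unique and increase over time, which makes them usable as the shared key that links a document's rows in the two databases. Uniqueness across machines depends on each machine being given a distinct `worker_id`.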
Step 3, when the electronic document "Java Technical Specification" needs to be found, different retrieval methods can be used for different scenarios. This covers the following cases:
(1) If its specific category is known, the document summary information can be found directly via "industrial technology class - computer technology class - Java Technical Specification"; this lookup is performed by the relational database, whose underlying search engine uses a forward index. Once the summary information is found, the document ID is obtained, and the full-text content of the document is then fetched from the inverted index database by that ID.
(2) If the specific category of the electronic document is not known, the relevant documents can be obtained by searching for "Java", "technical specification" or "magic value", and the document is then located among them. This retrieval is performed by the database whose underlying structure is an inverted index, because with massive text, searching titles and full text through the relational database is very time-consuming.
(3) If the goal is not to find one specific document but all documents related to a certain field, full-text retrieval can be performed directly. For example, entering "code" finds documents related to programming: since "Java Technical Specification" contains "code", it is retrieved, along with any other documents that contain "code" in their title or content. This retrieval is performed in the inverted index database, which is very efficient for large amounts of text.
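The ID-based round trip described in this example can be sketched with in-memory stand-ins for both databases. All variable and function names here are hypothetical, and the stored text is shortened to the single sentence quoted in the example.

```python
# Forward side: structured rows keyed by document ID (stand-in for the relational database).
forward_rows = {
    "d1": {
        "title": "Java Technical Specification",
        "category": ("Industrial Technology", "Computer Technology"),
    },
}

# Inverted side: stored documents plus a term -> IDs postings map.
inverted_docs = {
    "d1": {
        "title": "Java Technical Specification",
        "full_text": "No magic value is allowed to appear directly in the code.",
    },
}
postings = {"java": {"d1"}, "magic": {"d1"}, "code": {"d1"}}


def full_text_for(category, title):
    """Forward lookup by category and title, then fetch full text by ID."""
    for doc_id, row in forward_rows.items():
        if row["category"] == category and row["title"] == title:
            return inverted_docs[doc_id]["full_text"]
    return None


def categories_for_keyword(keyword):
    """Keyword lookup on the inverted side, then fetch the category by ID."""
    ids = sorted(postings.get(keyword.lower(), set()))
    return [forward_rows[doc_id]["category"] for doc_id in ids]


text = full_text_for(
    ("Industrial Technology", "Computer Technology"),
    "Java Technical Specification",
)
categories = categories_for_keyword("magic")
```

Either direction works because both stores hold the same document ID, which is the only coupling between them.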
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this description.
The above examples represent only a few embodiments of the present application. They are described in some detail but are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, and these would fall within its scope. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (6)

1. An electronic document management method integrating forward and inverted indexes, characterized by comprising the following steps:
step 1, selecting a database whose search engine uses a forward index and a database whose search engine uses an inverted index, and encapsulating a unified database API to fuse and connect the two databases;
step 2, when an electronic document is stored, storing the structured data of the electronic document in the forward index database, storing the text data of the electronic document in the inverted index database, and associating the data in the two databases through the ID of the electronic document;
step 3, when a document is searched for, depending on the requirement, either searching the forward index database by the structural information of the document or performing efficient full-text retrieval in the inverted index database by keywords.
2. The electronic document management method integrating forward and inverted indexes as claimed in claim 1, wherein in step 2, when the electronic document is stored, the structured data of the electronic document is stored in the forward index database, the text data of the electronic document is stored in the inverted index database, and the data in the two databases are associated through the ID of the electronic document, the specific method comprising:
(1) Before data is entered, initializing the table structure of the forward index database, wherein the table structure comprises a directory table and an electronic document table, the directory table is a self-referencing table whose parent directory attribute references the primary key of the same table, and the parent directory attribute of the electronic document table is a foreign key that references the primary key of the directory table;
(2) Determining the category of the document to be stored, including its primary directory, secondary directory and own name; uploading and parsing the document to obtain its title and full-text content; and generating a global ID for the document;
(3) Querying the directory table for the ID of the document's direct parent directory and, if it does not exist, creating the relevant directory data in the directory table; entering the ID, title and parent directory ID of the document into the forward index database; and entering the ID, title and full-text content of the document, after word segmentation, into the inverted index database, so that the data in the two databases are associated through the ID of the electronic document.
3. The electronic document management method integrating forward and inverted indexes as claimed in claim 1, wherein in step 3, when a document is searched for, depending on the requirement, either the forward index database is searched by the structural information of the document or efficient full-text retrieval is performed in the inverted index database by keywords, the specific method comprising:
(1) If the specific name and category information of the document are known, the document is found level by level according to its category, that is, it is looked up in the forward index database;
(2) If the specific name and category information of the document are not known, the document is retrieved through the inverted index database according to a keyword in the document.
4. An electronic document management system integrating forward and inverted indexes, wherein the system implements electronic document management integrating forward and inverted indexes based on the electronic document management method as claimed in any one of claims 1 to 3.
5. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements electronic document management integrating forward and inverted indexes based on the electronic document management method as claimed in any one of claims 1 to 3.
6. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements electronic document management integrating forward and inverted indexes based on the electronic document management method as claimed in any one of claims 1 to 3.
CN202211729747.1A 2022-08-29 2022-12-30 Electronic document management method integrating forward index and backward index Pending CN116186133A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211038247.3A CN115442242A (en) 2022-08-29 2022-08-29 Workflow arrangement system and method based on importance ordering
CN2022110382473 2022-08-29

Publications (1)

Publication Number Publication Date
CN116186133A true CN116186133A (en) 2023-05-30

Family

ID=84245199

Family Applications (5)

Application Number Title Priority Date Filing Date
CN202211038247.3A Pending CN115442242A (en) 2022-08-29 2022-08-29 Workflow arrangement system and method based on importance ordering
CN202211625573.4A Active CN115858168B (en) 2022-08-29 2022-12-16 Earth application model arrangement system and method based on importance ranking
CN202211731819.6A Active CN116127190B (en) 2022-08-29 2022-12-30 Digital earth resource recommendation system and method
CN202211729747.1A Pending CN116186133A (en) 2022-08-29 2022-12-30 Electronic document management method integrating forward index and backward index
CN202310035450.3A Pending CN116542326A (en) 2022-08-29 2023-01-10 Knowledge representation method and system based on time sequence convolution

Family Applications Before (3)

Application Number Title Priority Date Filing Date
CN202211038247.3A Pending CN115442242A (en) 2022-08-29 2022-08-29 Workflow arrangement system and method based on importance ordering
CN202211625573.4A Active CN115858168B (en) 2022-08-29 2022-12-16 Earth application model arrangement system and method based on importance ranking
CN202211731819.6A Active CN116127190B (en) 2022-08-29 2022-12-30 Digital earth resource recommendation system and method

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202310035450.3A Pending CN116542326A (en) 2022-08-29 2023-01-10 Knowledge representation method and system based on time sequence convolution

Country Status (1)

Country Link
CN (5) CN115442242A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117421487A (en) * 2023-12-19 2024-01-19 西安康奈网络科技有限公司 Multiple network information screening management system based on artificial intelligence

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117370424B (en) * 2023-12-07 2024-02-13 深圳市易图资讯股份有限公司 Mobile application comment data analysis mining method and system for economic information analysis

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110078516A1 (en) * 2009-09-28 2011-03-31 International Business Machines Corporation Method and a system for performing a two-phase commit protocol
CN109684537A (en) * 2018-10-29 2019-04-26 昆明理工大学 A kind of the knowledge resource intelligently pushing system and its method for pushing of Business Process-oriented
US11537446B2 (en) * 2019-08-14 2022-12-27 Microsoft Technology Licensing, Llc Orchestration and scheduling of services
CN110795219B (en) * 2019-10-24 2022-03-18 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Resource scheduling method and system suitable for multiple computing frameworks
CN111079009B (en) * 2019-12-11 2023-05-26 中国地质大学(武汉) User interest detection method and system for government map service
CN111752723B (en) * 2020-06-06 2021-05-04 中国科学院电子学研究所苏州研究院 Visual multi-source service management system and implementation method thereof
CN112463363B (en) * 2020-11-06 2022-08-26 苏州浪潮智能科技有限公司 Resource arranging method, device, equipment and storage medium
CN114138486B (en) * 2021-12-02 2024-03-26 中国人民解放军国防科技大学 Method, system and medium for arranging containerized micro-services for cloud edge heterogeneous environment
CN114422582B (en) * 2022-01-20 2023-05-16 中国科学院软件研究所 Dynamic service combination method and device for technological resources
CN114638021A (en) * 2022-03-18 2022-06-17 北京邮电大学 Internet of things lightweight block chain system security evaluation method
CN114756170B (en) * 2022-04-02 2023-03-24 苏州空天信息研究院 Storage isolation system and method for container application

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117421487A (en) * 2023-12-19 2024-01-19 西安康奈网络科技有限公司 Multiple network information screening management system based on artificial intelligence
CN117421487B (en) * 2023-12-19 2024-03-08 西安康奈网络科技有限公司 Multiple network information screening management system based on artificial intelligence

Also Published As

Publication number Publication date
CN116127190B (en) 2023-07-28
CN116542326A (en) 2023-08-04
CN115858168B (en) 2023-10-13
CN116127190A (en) 2023-05-16
CN115858168A (en) 2023-03-28
CN115442242A (en) 2022-12-06

Similar Documents

Publication Publication Date Title
US6898592B2 (en) Scoping queries in a search engine
US8051045B2 (en) Archive indexing engine
RU2409847C2 (en) Mapping system model to database object
JP6006267B2 (en) System and method for narrowing a search using index keys
US7228299B1 (en) System and method for performing file lookups based on tags
US7849063B2 (en) Systems and methods for indexing content for fast and scalable retrieval
CN116186133A (en) Electronic document management method integrating forward index and backward index
US7133867B2 (en) Text and attribute searches of data stores that include business objects
US9330178B2 (en) Search engine
US20050120004A1 (en) Systems and methods for indexing content for fast and scalable retrieval
US20150293958A1 (en) Scalable data structures
US20080114733A1 (en) User-structured data table indexing
US20050076018A1 (en) Sorting result buffer
CN105117433A (en) Method and system for statistically querying HBase based on analysis performed by Hive on HFile
US7333994B2 (en) System and method for database having relational node structure
Whang et al. Odysseus: a high-performance ORDBMS tightly-coupled with IR features
US7475059B2 (en) Adapting business objects for searches and searching adapted business objects
US20080177701A1 (en) System and method for searching a volume of files
US8495025B2 (en) Foldering by stable query
CN115080684B (en) Network disk document indexing method and device, network disk and storage medium
US7299224B2 (en) Method and infrastructure for processing queries in a database
US8805820B1 (en) Systems and methods for facilitating searches involving multiple indexes
CN113779068A (en) Data query method, device, equipment and storage medium
Watson et al. Exploring the design space of metadata-focused file management systems
KR20190089420A (en) Data construction and management system of sub index storage method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination