CN112579649A - Index technology-based K-V inversion retrieval method - Google Patents
- Publication number
- CN112579649A (application number CN202011550712.2A)
- Authority
- CN
- China
- Prior art keywords
- main body
- index
- identification code
- unique identification
- global unique
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a method for realizing K-V reverse (inversion) retrieval based on an index technology. In a database, index serial numbers are constructed in one-to-one correspondence with the keywords of the same subject (main body); the keywords of different subjects may be completely the same, partially the same, or completely different. A subject global unique identification code is established that uniquely corresponds to the index serial numbers of the keywords in the same subject. Index relationships are then established between the subject global unique identification code and the subject, and between the code and each keyword of the subject, so that one subject corresponds to one subject global unique identification code. Indexing by this code retrieves the corresponding target subject and all keywords it contains. This improves text analysis efficiency, gives quick access to the rough content of the target subject, and facilitates label analysis of the text.
Description
Technical Field
The invention relates to the field of retrieval, in particular to a method for realizing K-V inversion retrieval based on an indexing technology.
Background
In a relational database, an index is a separate physical storage structure that orders the values of one or more columns of a table. It consists of the set of values from those columns together with a list of logical pointers to the data pages in the table that physically hold those values. An index is comparable to the table of contents of a book: the required content can be found quickly from the page number listed there. The index provides pointers to the data values stored in the specified columns and sorts these pointers according to the specified sort order. The database uses the index to find a particular value and then follows the pointer to the row containing that value. This lets SQL statements against the table execute faster and gives quick access to specific information in the table. When a table holds a large number of records, there are two ways to query it. The first is a full table scan: all records are fetched one by one and compared against the query conditions, and the records that satisfy them are returned. This consumes a large amount of database system time and causes a large number of disk I/O operations. The second is to build an index on the table, find the index values that satisfy the query condition, and then quickly locate the corresponding records in the table through the ROWID (equivalent to a page number) stored in the index.
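The difference between the two query strategies can be sketched in a few lines of Python. This is only an illustration, not how any particular database engine implements indexes: the `rows` table, the `city` column, and the dictionary standing in for the index are assumptions made for the example, and list positions play the role of the ROWID.

```python
from collections import defaultdict

# Hypothetical table: each list position plays the role of a ROWID.
rows = [
    {"name": "Li", "city": "Chengdu"},
    {"name": "Wang", "city": "Beijing"},
    {"name": "Zhao", "city": "Chengdu"},
]

def full_scan(table, column, value):
    """Compare every record against the query condition, one by one."""
    return [row for row in table if row[column] == value]

def build_index(table, column):
    """Map each column value to the ROWIDs (list positions) that hold it."""
    index = defaultdict(list)
    for rowid, row in enumerate(table):
        index[row[column]].append(rowid)
    return index

def index_lookup(table, index, value):
    """Find the ROWIDs in the index, then fetch only those rows."""
    return [table[rowid] for rowid in index.get(value, [])]

city_index = build_index(rows, "city")
scan_result = full_scan(rows, "city", "Chengdu")
index_result = index_lookup(rows, city_index, "Chengdu")
```

Both calls return the same rows; the indexed lookup touches only the matching records, whereas the scan inspects every record in the table.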
In a K-V (key-value) database, indexing supports only one-way retrieval: a target subject can be retrieved from its keywords, but the keywords contained in a given target subject cannot be retrieved from the subject.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a method for realizing K-V reverse retrieval based on an index technology. The same ID is inserted between the index keys and the target subject, so that retrieval through the ID can run in both directions. In particular, reverse retrieval becomes possible: all keywords contained in the target subject can be obtained through the ID, and the rough content of the target subject can thus be learned.
The purpose of the invention is realized by the following technical scheme:
A method for realizing K-V inversion retrieval based on an index technology comprises the following steps:
S100: Constructing database index relationships
Constructing, in a database, index serial numbers in one-to-one correspondence with the keywords of the same subject, wherein the keywords of different subjects may be completely the same, partially the same, or completely different;
establishing a subject global unique identification code that uniquely corresponds to the index serial numbers of the keywords in the same subject;
establishing an index relationship between the subject global unique identification code and the subject, and an index relationship between the subject global unique identification code and each keyword of the subject, so that one subject corresponds to one subject global unique identification code;
S200: Forward indexing
Indexing by keyword to retrieve the corresponding target subjects, that is, all target subjects containing the keyword;
S300: Reverse indexing
Indexing by the subject global unique identification code to retrieve the corresponding target subject and all keywords contained in the target subject.
In this scheme, a subject global unique identification code is added between the keywords and the target subject, which enables both forward and reverse indexing: the target subject can be indexed through the code, and all keywords corresponding to the target subject can also be indexed through it.
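Steps S100 to S300 can be sketched as a small in-memory structure. This is a minimal sketch under stated assumptions, not the patented implementation: the class and method names are hypothetical, plain dictionaries stand in for the K-V database, the identification code is a random UUID rather than a content hash, and the per-keyword index serial numbers are omitted for brevity.

```python
import uuid
from collections import defaultdict

class BidirectionalIndex:
    """Toy in-memory index; class and method names are illustrative only."""

    def __init__(self):
        self.keyword_to_guids = defaultdict(set)  # forward: keyword -> codes
        self.guid_to_subject = {}                 # code -> subject
        self.guid_to_keywords = {}                # reverse: code -> keywords

    def add_subject(self, subject, keywords):
        """S100: register a subject and its keywords under one code."""
        guid = uuid.uuid4().hex  # assumption: any unique string would do
        self.guid_to_subject[guid] = subject
        self.guid_to_keywords[guid] = list(keywords)
        for keyword in keywords:
            self.keyword_to_guids[keyword].add(guid)
        return guid

    def forward(self, keyword):
        """S200: return all subjects that contain the keyword."""
        guids = self.keyword_to_guids.get(keyword, set())
        return sorted(self.guid_to_subject[g] for g in guids)

    def reverse(self, guid):
        """S300: return the subject and all keywords it contains."""
        return self.guid_to_subject[guid], self.guid_to_keywords[guid]

index = BidirectionalIndex()
guid_a = index.add_subject("news_a.txt", ["flood", "rescue", "Sichuan"])
guid_b = index.add_subject("news_b.txt", ["flood", "insurance"])
```

Here `index.forward("flood")` finds both files, while `index.reverse(guid_b)` returns `news_b.txt` together with its keyword list, which is the lookup direction a plain keyword-to-subject index cannot serve.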
Further, the same keyword may have multiple index serial numbers in the database.
Furthermore, each keyword in the database is stored as its own cluster, that is, one keyword is the minimum storage unit.
Furthermore, the database uses a multi-layer key-value storage structure. Because the whole storage structure is multi-layer K-V, the relationships among keywords, subjects, and unique identification codes form a multi-level index in both the forward and the reverse direction, and the multi-level index together with hash values improves data retrieval efficiency level by level.
Further, the subject global unique identification code is a hash string obtained by hashing designated content.
Furthermore, the index serial number has a structure of an identification segment plus a sequence number, wherein the identification segment represents the corresponding subject and the sequence number indicates the ordinal number of the keyword.
The invention has the following beneficial effects: a subject global unique identification code is added between the keywords and the target subject, enabling forward and, in particular, reverse indexing. This improves analysis of the target subject, especially of news text: once a news text is obtained, its keyword information can be retrieved quickly, which supports operations such as label identification of the text.
Drawings
FIG. 1 is a schematic view of the present invention.
Detailed Description
The technical solution of the present invention is further described in detail with reference to the following specific examples, but the scope of the present invention is not limited to the following.
A method for realizing K-V inversion retrieval based on an index technology comprises the following steps:
S100: Constructing database index relationships
Constructing, in a database, index serial numbers in one-to-one correspondence with the keywords of the same subject, wherein the keywords of different subjects may be completely the same, partially the same, or completely different;
establishing a subject global unique identification code that uniquely corresponds to the index serial numbers of the keywords in the same subject;
establishing an index relationship between the subject global unique identification code and the subject, and an index relationship between the subject global unique identification code and each keyword of the subject, so that one subject corresponds to one subject global unique identification code;
S200: Forward indexing
Indexing by keyword to retrieve the corresponding target subjects, that is, all target subjects containing the keyword;
S300: Reverse indexing
Indexing by the subject global unique identification code to retrieve the corresponding target subject and all keywords contained in the target subject.
Optionally, in the method for realizing K-V inversion retrieval based on an index technology, the same keyword may have a plurality of index serial numbers in the database.
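One way to picture a keyword with several index serial numbers is a keyword that occurs in more than one subject, with each occurrence receiving its own number. The sketch below assumes this reading; the global counter, the dictionary names, and the sample subjects are hypothetical and not taken from the source.

```python
from collections import defaultdict
from itertools import count

_next_serial = count(1)               # hypothetical global serial counter
keyword_serials = defaultdict(list)   # keyword -> its index serial numbers
serial_owner = {}                     # index serial number -> subject

def register(subject, keywords):
    """Assign a fresh index serial number to every keyword of a subject."""
    for keyword in keywords:
        serial = next(_next_serial)
        keyword_serials[keyword].append(serial)
        serial_owner[serial] = subject

register("news_a", ["flood", "rescue"])
register("news_b", ["insurance", "flood"])

# "flood" now carries two serial numbers, one per subject that contains it.
flood_owners = [serial_owner[s] for s in keyword_serials["flood"]]
```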
Optionally, in the method for realizing K-V inversion retrieval based on an index technology, each keyword in the database is stored as its own cluster, that is, one keyword is the minimum storage unit, so that the result of a reverse index lookup is data in units of individual keywords rather than a passage of text composed of the keywords.
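The effect of treating one keyword as the minimum storage unit can be illustrated with one record per keyword. This is a sketch under assumptions: the `store` dictionary stands in for a K-V database, and the `<code>/kw/<position>` key layout is invented for the example.

```python
store = {}  # stand-in for a K-V database

def put_keywords(guid, keywords):
    """Store each keyword under its own key: one keyword per record."""
    for position, keyword in enumerate(keywords):
        store[f"{guid}/kw/{position:04d}"] = keyword

def get_keywords(guid):
    """Prefix scan over the subject's keyword records, in stored order."""
    prefix = f"{guid}/kw/"
    return [store[key] for key in sorted(store) if key.startswith(prefix)]

put_keywords("g1", ["flood", "rescue", "Sichuan"])
put_keywords("g2", ["insurance"])
keywords_g1 = get_keywords("g1")
```

The reverse lookup yields a list of discrete keywords that can be used directly as labels, with no need to split a stored text passage.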
Optionally, in the method for realizing K-V inversion retrieval based on an index technology, the storage structure of the database uses multi-layer key-value; in other words, the method is implemented on a database based on a key-value structure. Because the whole storage structure is multi-layer K-V, the relationships among keywords, subjects, and unique identification codes form a multi-level index in both the forward and the reverse direction, and the multi-level index together with hash values improves data retrieval efficiency level by level.
Optionally, in the method for realizing K-V inversion retrieval based on an index technology, the subject global unique identification code is a hash string obtained by hashing designated content.
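A hash-based identification code might be derived as follows. The source does not say which hash function is used or what the designated content is, so SHA-256 and the sample content strings here are assumptions chosen for illustration.

```python
import hashlib

def subject_guid(content: str) -> str:
    """Derive an identifier by hashing the designated content (SHA-256 assumed)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

# Hypothetical designated content: file name, date, and keywords joined together.
guid_1 = subject_guid("news_a.txt|2020-12-24|flood rescue")
guid_2 = subject_guid("news_a.txt|2020-12-24|flood rescue")
guid_3 = subject_guid("news_b.txt|2020-12-24|flood insurance")
```

Identical content always yields the same code, so the code can be recomputed rather than looked up. Uniqueness holds only up to hash collisions, which are negligible in practice for a hash of this length.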
Optionally, in the method for realizing K-V inversion retrieval based on an index technology, the index serial number has a structure of an identification segment plus a sequence number, wherein the identification segment represents the corresponding subject and the sequence number indicates the ordinal number of the keyword.
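A two-part index serial number could be encoded and parsed as below. The source gives no concrete format, so the colon separator, the six-digit zero padding, and the names `IndexSerial`, `encode`, and `decode` are all assumptions.

```python
from typing import NamedTuple

class IndexSerial(NamedTuple):
    subject_tag: str  # identification segment: which subject the keyword belongs to
    position: int     # sequence number of the keyword within that subject

def encode(serial: IndexSerial) -> str:
    """Join the two parts; zero padding keeps string order equal to numeric order."""
    return f"{serial.subject_tag}:{serial.position:06d}"

def decode(text: str) -> IndexSerial:
    """Split on the last colon so the subject tag may itself contain colons."""
    subject_tag, position = text.rsplit(":", 1)
    return IndexSerial(subject_tag, int(position))

encoded = encode(IndexSerial("a3f9", 12))
decoded = decode(encoded)
```

With this layout, all serial numbers of one subject share a common prefix, so a key-ordered store can list a subject's keywords with a single prefix scan.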
Referring to FIG. 1, which schematically illustrates the principle of the present invention, the working principle is as follows:
First, an index relation table is created under a K-V database structure. For the first keyword group key_1 to key_n, index IDs Serl_1 to Serl_n are created in one-to-one correspondence. A global unique identification code Value3 is created for Serl_1 to Serl_n, which all share the same Value3. Finally, an index relationship is created between Value3 and the target file it points to. These index relationships include both forward and reverse indexes.
Similarly, for the second keyword group key_1 to key_m, index IDs Serl_1 to Serl_m are created in one-to-one correspondence. A global unique identification code Value4 is created for Serl_1 to Serl_m, which all share the same Value4. Finally, an index relationship is created between Value4 and the target file it points to. These index relationships likewise include both forward and reverse indexes.
In the forward index, the target file can be obtained by searching step by step from the keywords, that is, along the path marked 1 in FIG. 1.
In the reverse index, all keywords contained in the target file can be indexed from the global unique identification code Value4 or Value3.
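The working principle can be traced with a small worked example that keeps the names key, Serl, Value3, and Value4 from the description. Everything else is an assumption made for illustration: the dictionaries standing in for the K-V levels, the `G1`/`G2` prefixes used to keep the two groups' index IDs apart, and the target file names.

```python
from collections import defaultdict

keyword_to_serials = defaultdict(list)  # level 1: keyword -> index IDs
serial_to_guid = {}                     # level 2: index ID -> Value3 / Value4
guid_to_file = {}                       # level 3: identification code -> target file
guid_to_serials = defaultdict(list)     # reverse: identification code -> index IDs
serial_to_keyword = {}                  # reverse: index ID -> keyword

def create_group(prefix, keywords, guid, target_file):
    """Create one index ID per keyword; all IDs of the group share one code."""
    guid_to_file[guid] = target_file
    for n, keyword in enumerate(keywords, start=1):
        serial = f"{prefix}.Serl_{n}"
        keyword_to_serials[keyword].append(serial)
        serial_to_guid[serial] = guid
        guid_to_serials[guid].append(serial)
        serial_to_keyword[serial] = keyword

def forward(keyword):
    """Step by step: keyword -> index IDs -> identification codes -> files."""
    files = []
    for serial in keyword_to_serials.get(keyword, []):
        target = guid_to_file[serial_to_guid[serial]]
        if target not in files:
            files.append(target)
    return files

def reverse(guid):
    """Identification code -> index IDs -> keywords of the target file."""
    return [serial_to_keyword[s] for s in guid_to_serials.get(guid, [])]

create_group("G1", ["key_1", "key_2"], "Value3", "target_file_1")
create_group("G2", ["key_1", "key_3"], "Value4", "target_file_2")
```

In this example `forward("key_1")` reaches both target files because the keyword belongs to both groups, and `reverse("Value4")` lists the keywords of the second target file.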
The foregoing describes preferred embodiments of the invention. It is to be understood that the invention is not limited to the precise form disclosed herein, and that various other combinations, modifications, and environments may be used within the scope of the concept disclosed herein, whether as described above or as apparent to those skilled in the relevant art. Modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (6)
1. A method for realizing K-V inversion retrieval based on an index technology, characterized by comprising the following steps:
S100: Constructing database index relationships
Constructing, in a database, index serial numbers in one-to-one correspondence with the keywords of the same subject;
establishing a subject global unique identification code that uniquely corresponds to the index serial numbers of the keywords in the same subject;
establishing an index relationship between the subject global unique identification code and the subject, and an index relationship between the subject global unique identification code and each keyword of the subject, so that one subject corresponds to one subject global unique identification code;
S200: Forward indexing
Indexing by keyword to retrieve the corresponding target subjects, that is, all target subjects containing the keyword;
S300: Reverse indexing
Indexing by the subject global unique identification code to retrieve the corresponding target subject and all keywords contained in the target subject.
2. The method for realizing K-V inversion retrieval based on an index technology as claimed in claim 1, wherein the same keyword may have multiple index serial numbers in the database.
3. The method for realizing K-V inversion retrieval based on an index technology as claimed in claim 2, wherein each keyword in the database is stored as its own cluster, that is, one keyword is a minimum storage unit.
4. The method for realizing K-V inversion retrieval based on an index technology as claimed in claim 3, wherein the storage structure of the database uses multi-layer key-value.
5. The method for realizing K-V inversion retrieval based on an index technology as claimed in claim 4, wherein the subject global unique identification code is a hash string obtained by hashing designated content.
6. The method for realizing K-V inversion retrieval based on an index technology as claimed in claim 5, wherein the index serial number has a structure of an identification segment plus a sequence number, wherein the identification segment represents the corresponding subject and the sequence number indicates the ordinal number of the keyword.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011550712.2A CN112579649A (en) | 2020-12-24 | 2020-12-24 | Index technology-based K-V inversion retrieval method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112579649A (en) | 2021-03-30 |
Family ID=75139548
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202011550712.2A | Pending | CN112579649A (en) | 2020-12-24 | 2020-12-24 | Index technology-based K-V inversion retrieval method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112579649A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1295295A (en) * | 1999-11-04 | 2001-05-16 | 英业达集团(西安)电子技术有限公司 | Word looking-up method for electronic dictionary with fast polling index structure |
CN101072205A (en) * | 2007-06-21 | 2007-11-14 | 腾讯科技(深圳)有限公司 | Chat information searching method and system |
US20130074148A1 (en) * | 2010-05-20 | 2013-03-21 | Oedses Klaas Van Megchelen | Method and system for compiling a unique sample code for specific web content |
CN103116586A (en) * | 2011-11-17 | 2013-05-22 | 中国电信股份有限公司 | Document reading achieving method, terminal, document conversion server and processing system |
CN103699569A (en) * | 2013-09-06 | 2014-04-02 | 安徽科大讯飞信息科技股份有限公司 | Index structure and index method |
CN104794123A (en) * | 2014-01-20 | 2015-07-22 | 阿里巴巴集团控股有限公司 | Method and device for establishing NoSQL database index for semi-structured data |
CN110019647A (en) * | 2017-10-25 | 2019-07-16 | 华为技术有限公司 | A kind of keyword search methodology, device and search engine |
CN110309146A (en) * | 2019-05-09 | 2019-10-08 | 全知科技(杭州)有限责任公司 | A kind of codomain data directory library method for building up for supporting two-way index |
Non-Patent Citations (3)
Title |
---|
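J. Giridharan et al.: "Inverted index and interval lists for keyword search", 2014 International Conference on Green Computing Communication and Electrical Engineering (ICGCCEE) * |
Wan Liyong (万里勇): "Research on XML Query Optimization Based on Index Technology" (基于索引技术的XML查询优化研究), China Master's Theses Full-text Database, Information Science and Technology * |
生有涯,知无涯: "What Are Forward Index and Reverse Index (Inverted Index)?" (什么是正向索引、反向索引(倒排索引)?), CSDN (HTTPS://BLOG.CSDN.NET/QQ_38923792/ARTICLE/DETAILS/104027587) * |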
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11204905B2 (en) | Trie-based indices for databases | |
US7356549B1 (en) | System and method for cross-reference linking of local partitioned B-trees | |
US6968338B1 (en) | Extensible database framework for management of unstructured and semi-structured documents | |
US6266660B1 (en) | Secondary index search | |
US20120131022A1 (en) | Methods and systems for merging data sets | |
EP1999565A2 (en) | Hyperspace index | |
CN100433019C (en) | Data storage and retrieving method and system | |
US7363284B1 (en) | System and method for building a balanced B-tree | |
CN101493824A (en) | Data retrieval method and device for database | |
US20120265765A1 (en) | Self-indexer and self indexing system | |
CN111522820A (en) | Data storage structure, storage retrieval method, system, device and storage medium | |
Nørvåg | Supporting temporal text-containment queries in temporal document databases | |
CN112579649A (en) | Index technology-based K-V inversion retrieval method | |
Putz | Using a relational database for an inverted text index | |
US7870138B2 (en) | File storage and retrieval method | |
CN1287316C (en) | Method and system for compressing column becoming longer in period of indexing high key code generation | |
Nørvåg | Space-efficient support for temporal text indexing in a document archive context | |
Tsuchida et al. | Implementing vertical splitting for large scale multidimensional datasets and its evaluations | |
Phanluong | A simple and efficient method for computing data cubes | |
WO2001025962A1 (en) | Database organization for increasing performance by splitting tables | |
Mertens | A low-resource approach to SemTab 2022 | |
Praveena et al. | IndexingStrategies for Performance Optimization of Relational Databases | |
CN117131012B (en) | Sustainable and extensible lightweight multi-version ordered key value storage system | |
US20230161744A1 (en) | Method of processing data to be written to a database | |
CN108959308A (en) | A kind of reply can supplemental data indexing means |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210330 |