CN112579649A

CN112579649A - Index technology-based K-V inversion retrieval method

Info

Publication number: CN112579649A
Application number: CN202011550712.2A
Authority: CN
Inventors: 周道华; 李武鸿; 杨陈; 周涛; 曾俊; 黄泓蓓; 黄维; 伏彦林; 刘杰; 王小腊; 洪江; 彭容; 罗玉; 周林; 张明娟; 许江泽; 吴婷婷; 詹飞
Original assignee: Chengdu Zhongke Daqi Software Co ltd
Current assignee: Chengdu Zhongke Daqi Software Co ltd
Priority date: 2020-12-24
Filing date: 2020-12-24
Publication date: 2021-03-30

Abstract

The invention relates to a method for realizing K-V reverse retrieval based on an index technology. The method comprises: constructing, in a database, index serial numbers that correspond one-to-one to the keywords in the same subject, wherein the keywords of different subjects may be completely the same, partially the same, or completely different; establishing a subject global unique identification code that uniquely corresponds to all the index serial numbers of the keywords in the same subject; and establishing an index relationship between the subject global unique identification code and the subject, and an index relationship between the subject global unique identification code and each keyword of the subject, so that one subject corresponds to one subject global unique identification code. Indexing through the subject global unique identification code retrieves the corresponding target subject and all the keywords it contains. This improves text analysis efficiency, allows the general information of the target subject to be obtained quickly, and facilitates label analysis of the text.

Description

Index technology-based K-V inversion retrieval method

Technical Field

The invention relates to the field of retrieval, and in particular to a method for realizing K-V reverse retrieval based on an indexing technology.

Background

In a relational database, an index is a separate physical storage structure that orders the values of one or more columns of a table. It consists of the values of those columns together with a list of logical pointers to the data pages in the table that physically hold those values. An index is comparable to the table of contents of a book: the required content can be found quickly from the page number given there. The index provides pointers to the data values stored in the specified columns of the table and sorts these pointers in the specified order. The database uses the index to find a particular value and then follows the pointer to the row containing that value. This lets SQL statements on the table execute faster and gives quick access to specific information in the table.

When a table holds a large number of records, there are two ways to query it. The first is a full-table scan: all records are fetched one by one and compared with the query conditions, and the records that satisfy the conditions are returned. This consumes a large amount of database system time and causes a large number of disk I/O operations. The second is to build an index on the table, find the index values that satisfy the query conditions in the index, and then quickly locate the corresponding records in the table through the ROWID (comparable to the page number) stored in the index.
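The two query strategies described above can be contrasted with a minimal Python sketch. The table contents and the function names `full_scan` and `index_lookup` are hypothetical, and a Python dict is a hash-based lookup rather than the sorted structure a relational database would typically use, so this only illustrates the idea of following stored row IDs instead of comparing every record.

```python
# Hypothetical table: a list of (rowid, name) records.
records = [(rowid, f"user_{rowid}") for rowid in range(1000)]


def full_scan(name):
    """Full-table scan: compare every record with the query condition."""
    return [rowid for rowid, value in records if value == name]


# Build the index once: column value -> list of row IDs holding that value.
index = {}
for rowid, value in records:
    index.setdefault(value, []).append(rowid)


def index_lookup(name):
    """Indexed query: find the value in the index, then return its row IDs."""
    return index.get(name, [])
```

Both functions return the same row IDs; the difference is that `full_scan` touches every record on each query, while `index_lookup` pays the cost once when the index is built.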

In a K-V (key-value) database, existing indexing technology supports only one-way retrieval: a target subject can be retrieved from its keywords, but the keywords contained in a target subject cannot be retrieved from that subject.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a method for realizing K-V reverse retrieval based on an index technology. The same ID is inserted between the index keys and the target subject, so that retrieval can run in both directions through that ID. In particular, reverse retrieval becomes possible: all keywords contained in the target subject can be obtained through the ID, which gives the general information of the target subject.

The purpose of the invention is realized by the following technical scheme:

A method for realizing K-V reverse retrieval based on an index technology comprises the following steps:

S100: Constructing database index relationships

Constructing, in a database, index serial numbers that correspond one-to-one to the keywords in the same subject, wherein the keywords of different subjects may be completely the same, partially the same, or completely different;

establishing a subject global unique identification code that uniquely corresponds to all the index serial numbers of the keywords in the same subject;

establishing an index relationship between the subject global unique identification code and the subject, and an index relationship between the subject global unique identification code and each keyword of the subject, so that one subject corresponds to one subject global unique identification code;

S200: Forward indexing

Indexing through a keyword to retrieve the corresponding target subjects, that is, all target subjects containing that keyword;

S300: Reverse indexing

Indexing through the subject global unique identification code to retrieve the corresponding target subject and all the keywords it contains.
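Steps S100 to S300 can be pictured with a small in-memory sketch in Python. This is only an illustration under stated assumptions, not the patented implementation: the class name `BidirectionalIndex`, its attribute names, and the sample subjects are all hypothetical, and the per-keyword index serial numbers are left out to keep the example short.

```python
from collections import defaultdict


class BidirectionalIndex:
    """Toy in-memory model of steps S100 to S300 (all names are illustrative)."""

    def __init__(self):
        self.keyword_to_ids = defaultdict(set)  # keyword -> subject IDs (forward)
        self.id_to_subject = {}                 # subject ID -> subject content
        self.id_to_keywords = {}                # subject ID -> keywords (reverse)

    def add_subject(self, subject_id, subject, keywords):
        """S100: register one subject under one ID and link all its keywords."""
        self.id_to_subject[subject_id] = subject
        self.id_to_keywords[subject_id] = list(keywords)
        for keyword in keywords:
            self.keyword_to_ids[keyword].add(subject_id)

    def forward(self, keyword):
        """S200: return every subject that contains the keyword."""
        ids = sorted(self.keyword_to_ids.get(keyword, ()))
        return [self.id_to_subject[i] for i in ids]

    def reverse(self, subject_id):
        """S300: return the subject and all of its keywords."""
        return self.id_to_subject[subject_id], self.id_to_keywords[subject_id]


idx = BidirectionalIndex()
idx.add_subject("id-a", "news article A", ["flood", "rescue"])
idx.add_subject("id-b", "news article B", ["flood", "budget"])
forward_hits = idx.forward("flood")   # both articles share this keyword
reverse_hit = idx.reverse("id-b")     # article B plus its own keywords
```

The two subjects here share one keyword and differ in the other, which corresponds to the "partially the same" case in S100.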

In this scheme, a unique subject global unique identification code is added between the keywords and the target subject, and both forward and reverse indexing go through it: the target subject can be indexed through the subject global unique identification code, and so can all the keywords corresponding to the target subject.

Further, the same keyword may have multiple index serial numbers in the database.

Furthermore, each keyword in the database is stored as a cluster, that is, one keyword is the minimum storage unit.

Furthermore, the database uses a multi-layer key-value storage structure. Because the whole storage structure is multi-layer K-V, the relationships among the keywords, the subject, and the unique identification code form a multi-level index in both the forward and the reverse direction, and the multi-level index together with hash values improves data retrieval efficiency level by level.
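One way to picture a multi-layer K-V structure is as nested dictionaries, where each lookup level narrows the search. The layout below is an assumption made for illustration; the source does not specify the layers, and the names `store`, `lookup`, and the sample keys are hypothetical. It also shows one keyword ("flood") holding two index serial numbers, one per subject.

```python
# Hypothetical layout:
#   store["keywords"][keyword][index serial number] -> subject ID   (forward)
#   store["subjects"][subject ID]["keywords"][serial] -> keyword    (reverse)
store = {
    "keywords": {
        "flood": {"A-1": "id-a", "B-1": "id-b"},
        "rescue": {"A-2": "id-a"},
    },
    "subjects": {
        "id-a": {
            "subject": "news article A",
            "keywords": {"A-1": "flood", "A-2": "rescue"},
        },
        "id-b": {
            "subject": "news article B",
            "keywords": {"B-1": "flood"},
        },
    },
}


def lookup(*path):
    """Walk the nested K-V layers one key at a time."""
    node = store
    for key in path:
        node = node[key]
    return node
```

For example, `lookup("keywords", "flood", "B-1")` walks three levels in the forward direction, and `lookup("subjects", "id-a", "keywords")` returns the keyword layer of one subject in the reverse direction.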

Further, the subject global unique identification code is a hash string obtained by hashing designated content.
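A hash-derived identification code could be produced as in the sketch below. The source names neither the hash function nor the designated content, so both the use of SHA-256 and the choice to hash the subject text itself are assumptions; the function name `subject_id` is hypothetical.

```python
import hashlib


def subject_id(content: str) -> str:
    """Derive an identification code by hashing the designated content.

    SHA-256 is an assumed choice; the source does not name a hash function.
    """
    return hashlib.sha256(content.encode("utf-8")).hexdigest()
```

The same content always yields the same 64-character hexadecimal string, so the code can be recomputed from the content rather than stored separately.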

Furthermore, the index serial number has a structure of an identification segment plus a sequence number, wherein the identification segment represents the corresponding subject and the sequence number represents the ordinal number of the keyword.
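The identification-segment-plus-sequence-number structure can be sketched as a pair of helper functions. The hyphen separator, the four-digit zero padding, and the names `make_serial` and `parse_serial` are assumptions for illustration; the source does not define a concrete format.

```python
def make_serial(ident: str, seq: int) -> str:
    """Join an identification segment and a sequence number into one serial."""
    return f"{ident}-{seq:04d}"


def parse_serial(serial: str):
    """Split a serial back into (identification segment, sequence number)."""
    # rpartition splits on the last hyphen, so the segment may contain hyphens.
    ident, _, seq = serial.rpartition("-")
    return ident, int(seq)
```

With this format, `make_serial("docA", 3)` gives `"docA-0003"`, and parsing that string recovers the subject segment `"docA"` and the keyword ordinal `3`.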

The invention has the following beneficial effects: a unique subject global unique identification code is added between the keywords and the target subject, so that forward indexing and, in particular, reverse indexing can be realized. This improves the analysis of target subjects, especially news texts: once a news text is obtained, its keyword information can be obtained quickly, which supports operations such as label identification of the text.

Drawings

FIG. 1 is a schematic view of the present invention.

Detailed Description

The technical solution of the present invention is further described in detail with reference to the following specific examples, but the scope of the present invention is not limited to the following.

S100: Constructing database index relationships

S200: Forward indexing

S300: Reverse indexing

Optionally, in the method for realizing K-V reverse retrieval based on the index technology, the same keyword may have multiple index serial numbers in the database.

Optionally, in the method for realizing K-V reverse retrieval based on the index technology, each keyword in the database is stored as a cluster, that is, one keyword is the minimum storage unit, so that the result of reverse indexing is data in units of individual keywords rather than a passage of text composed of the keywords.

Optionally, in the method for realizing K-V reverse retrieval based on the index technology, the storage structure of the database uses multi-layer key-value; in other words, the method is implemented on a database with a key-value structure. Because the whole storage structure is multi-layer K-V, the relationships among the keywords, the subject, and the unique identification code form a multi-level index in both the forward and the reverse direction, and the multi-level index together with hash values improves data retrieval efficiency level by level.

Optionally, in the method for realizing K-V reverse retrieval based on the index technology, the subject global unique identification code is a hash string obtained by hashing designated content.

Optionally, in the method for realizing K-V reverse retrieval based on the index technology, the index serial number has a structure of an identification segment plus a sequence number, wherein the identification segment represents the corresponding subject and the sequence number represents the ordinal number of the keyword.

Referring to FIG. 1, which schematically illustrates the principle of the present invention, the working principle is as follows:

First, an index relation table is created under a K-V database structure. For a first keyword group key_1 to key_n, index IDs Serl_1 to Serl_n are created in one-to-one correspondence. A global unique identification code Value3 is then created for Serl_1 to Serl_n, all of which share the same Value3. Finally, an index relationship is created between Value3 and the target file it points to. These index relationships all include forward indexes and reverse indexes.

Similarly, for a second keyword group key_1 to key_m, index IDs Serl_1 to Serl_m are created in one-to-one correspondence. A global unique identification code Value4 is then created for Serl_1 to Serl_m, all of which share the same Value4. Finally, an index relationship is created between Value4 and the target file it points to. These index relationships all include forward indexes and reverse indexes.

In the forward index, the target file can be obtained by searching level by level from the keywords, that is, along the path marked with the character identifier 1 in FIG. 1.

In the reverse index, all keywords contained in the target file can be indexed from the global unique identification code Value3 or Value4.
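The chain described for FIG. 1 (keyword, then index ID, then identification code, then target file) can be traced with flat lookup tables. This is a sketch for the first keyword group only: n is set to 3 arbitrarily, and the file name `target_file_1`, the table names, and the functions `forward` and `reverse` are hypothetical. A second group reusing the names key_1 and Serl_1 would need distinguishing prefixes, which this sketch does not model.

```python
# Forward-direction tables for the first keyword group (n = 3, chosen arbitrarily).
key_to_serial = {"key_1": "Serl_1", "key_2": "Serl_2", "key_3": "Serl_3"}
serial_to_value = {serial: "Value3" for serial in key_to_serial.values()}
value_to_file = {"Value3": "target_file_1"}  # hypothetical file name

# Reverse-direction tables derived from the forward ones.
serial_to_key = {serial: key for key, serial in key_to_serial.items()}
value_to_serials = {}
for serial, value in serial_to_value.items():
    value_to_serials.setdefault(value, []).append(serial)


def forward(key):
    """Keyword -> index ID -> identification code -> target file."""
    return value_to_file[serial_to_value[key_to_serial[key]]]


def reverse(value):
    """Identification code -> index IDs -> all keywords of the target file."""
    return [serial_to_key[serial] for serial in value_to_serials[value]]
```

Here `forward("key_2")` follows three lookups to reach the target file, and `reverse("Value3")` recovers every keyword in the group from the shared identification code.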

The foregoing describes preferred embodiments of the invention. It is to be understood that the invention is not limited to the precise form disclosed herein; various other combinations, modifications, and environments may be used within the scope of the concept disclosed herein, whether as described above or as apparent to those skilled in the relevant art. Modifications and variations made by those skilled in the art without departing from the spirit and scope of the invention fall within the scope of the appended claims.

Claims

1. A method for realizing K-V reverse retrieval based on an index technology, characterized by comprising the following steps:

S100: Constructing database index relationships

Constructing, in a database, index serial numbers that correspond one-to-one to the keywords in the same subject;

S200: Forward indexing

S300: Reverse indexing

2. The method for realizing K-V reverse retrieval based on an index technology as claimed in claim 1, wherein the same keyword may have multiple index serial numbers in the database.

3. The method for realizing K-V reverse retrieval based on an index technology as claimed in claim 2, wherein each keyword in the database is stored as a cluster, that is, one keyword is the minimum storage unit.

4. The method for realizing K-V reverse retrieval based on an index technology as claimed in claim 3, wherein the storage structure of the database uses multi-layer key-value.

5. The method for realizing K-V reverse retrieval based on an index technology as claimed in claim 4, wherein the subject global unique identification code is a hash string obtained by hashing specified content.

6. The method for realizing K-V reverse retrieval based on an index technology as claimed in claim 5, wherein the index serial number has a structure of an identification segment plus a sequence number, wherein the identification segment represents the corresponding subject and the sequence number represents the ordinal number of the keyword.