CN112579649A - Index technology-based K-V inversion retrieval method - Google Patents


Publication number
CN112579649A
Authority
CN
China
Prior art keywords
main body
index
identification code
unique identification
global unique
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011550712.2A
Other languages
Chinese (zh)
Inventor
周道华
李武鸿
杨陈
周涛
曾俊
黄泓蓓
黄维
伏彦林
刘杰
王小腊
洪江
彭容
罗玉
周林
张明娟
许江泽
吴婷婷
詹飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Zhongke Daqi Software Co ltd
Original Assignee
Chengdu Zhongke Daqi Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-12-24
Filing date
2020-12-24
Publication date
2021-03-30
Application filed by Chengdu Zhongke Daqi Software Co ltd
Priority to CN202011550712.2A
Publication of CN112579649A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for realizing K-V inversion retrieval based on an index technology. In a database, index serial numbers are constructed in one-to-one correspondence with the keywords of the same main body, where the keywords of different main bodies may be completely the same, partially the same, or completely different. A main body global unique identification code is established that corresponds uniquely to the index serial numbers of all the keywords in the same main body. An index relationship is then established between the main body global unique identification code and the main body, and between the main body global unique identification code and each keyword of the main body, so that one main body corresponds to one main body global unique identification code. Indexing through the main body global unique identification code retrieves the corresponding target main body and all the keywords it contains, which improves text analysis efficiency, gives a quick overview of the target main body, and facilitates label analysis of the text.

Description

Index technology-based K-V inversion retrieval method
Technical Field
The invention relates to the field of retrieval, in particular to a method for realizing K-V inversion retrieval based on an indexing technology.
Background
In a relational database, an index is a separate physical storage structure that sorts the values of one or more columns of a database table. It consists of the set of values of those columns together with a list of logical pointers to the data pages in the table that physically hold those values. An index is comparable to the table of contents of a book: the required content can be found quickly from the page number listed there. The index provides pointers to the data values stored in the specified columns of the table and sorts these pointers according to a specified sort order. The database uses the index to find a particular value and then follows the pointer to the row containing that value. This lets SQL statements on the table execute faster and gives quick access to specific information in the table. When a table holds a large number of records, there are two ways to query it. The first is a full-table scan: all records are read and compared with the query conditions one by one, and the records that meet the conditions are returned, which consumes a large amount of database system time and causes a large number of disk I/O operations. The second is to build an index on the table, find the index values that meet the query conditions in the index, and then quickly locate the corresponding records in the table through the ROWID (comparable to a page number) stored in the index.
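As a rough illustration of the two query modes described above, the following Python sketch contrasts a full-table scan with an index lookup. The table contents, the column names and the dict-based index are assumptions made for illustration only; a real relational database would typically use a B-tree or a similar on-disk structure rather than a Python dict.

```python
from collections import defaultdict

# Hypothetical table: the list position plays the role of the ROWID.
rows = [
    {"name": "alice", "city": "Chengdu"},
    {"name": "bob", "city": "Beijing"},
    {"name": "carol", "city": "Chengdu"},
]

def full_scan(table, column, value):
    """Compare every record with the query condition, one by one."""
    return [rowid for rowid, row in enumerate(table) if row[column] == value]

def build_index(table, column):
    """Map each column value to the ROWIDs of the rows that hold it."""
    index = defaultdict(list)
    for rowid, row in enumerate(table):
        index[row[column]].append(rowid)
    return index

def index_lookup(index, value):
    """Find the ROWIDs through the index instead of touching every row."""
    return list(index.get(value, []))

city_index = build_index(rows, "city")
scan_result = full_scan(rows, "city", "Chengdu")
index_result = index_lookup(city_index, "Chengdu")
```

Both calls return the same ROWIDs; the difference is that the scan touches every row, while the lookup touches only the index entry for the requested value.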
In a K-V (key-value) database, the indexing technology supports only one-way retrieval: a target subject can be retrieved from its keywords, but the keywords that a target subject contains cannot be retrieved from the target subject.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a method for realizing K-V inversion retrieval based on an index technology. The same ID is inserted between the index key and the target main body, so that retrieval can run in both directions: besides the usual forward lookup, all the keywords contained in a target main body can be obtained through the ID (inversion retrieval), which gives a quick overview of the target main body.
The purpose of the invention is achieved by the following technical solution:
A method for realizing K-V inversion retrieval based on an index technology comprises the following steps:
S100: Constructing database index relationships
Constructing, in a database, index serial numbers in one-to-one correspondence with the keywords of the same main body, wherein the keywords of different main bodies may be completely the same, partially the same, or completely different;
establishing a main body global unique identification code that corresponds uniquely to the index serial numbers of all the keywords in the same main body;
establishing an index relationship between the main body global unique identification code and the main body, and an index relationship between the main body global unique identification code and each keyword of the main body, so that one main body corresponds to one main body global unique identification code;
S200: Forward indexing
Indexing through a keyword to retrieve the corresponding target main bodies, that is, all the target main bodies containing the keyword;
S300: Reverse indexing
Indexing through the main body global unique identification code to retrieve the corresponding target main body and all the keywords contained in the target main body.
According to this scheme, a unique main body global unique identification code is added between the keywords and the target main body. Through this code, indexing can run forward or in reverse: the target main body can be indexed through the code, and so can all the keywords corresponding to the target main body.
Further, the same keyword may have multiple index serial numbers in the database.
Further, each keyword in the database is stored as its own cluster, that is, one keyword is the minimum storage unit.
Further, the storage structure of the database uses multi-layer key-value. Because the whole storage structure is a multi-layer K-V structure, the relationship among the keywords, the main body and the unique identification code forms a multi-level index in both the forward and the reverse direction, and the multi-level index and the hash value together improve retrieval efficiency level by level.
Further, the main body global unique identification code is a hash string obtained by hashing designated content.
Further, the index serial number has the structure of an identification segment plus a sequence number, wherein the identification segment represents the corresponding main body and the sequence number represents the ordinal position of the keyword.
The invention has the following beneficial effects: a unique main body global unique identification code is added between the keywords and the target main body, which enables forward and, in particular, reverse indexing. This improves the analysis of target main bodies, especially news texts: once a news text is obtained, its keyword information can be obtained quickly, which supports operations such as label identification of the text.
Drawings
FIG. 1 is a schematic view of the present invention.
Detailed Description
The technical solution of the present invention is further described in detail with reference to the following specific examples, but the scope of the present invention is not limited to the following.
A method for realizing K-V inversion retrieval based on an index technology comprises the following steps:
S100: Constructing database index relationships
Constructing, in a database, index serial numbers in one-to-one correspondence with the keywords of the same main body, wherein the keywords of different main bodies may be completely the same, partially the same, or completely different;
establishing a main body global unique identification code that corresponds uniquely to the index serial numbers of all the keywords in the same main body;
establishing an index relationship between the main body global unique identification code and the main body, and an index relationship between the main body global unique identification code and each keyword of the main body, so that one main body corresponds to one main body global unique identification code;
S200: Forward indexing
Indexing through a keyword to retrieve the corresponding target main bodies, that is, all the target main bodies containing the keyword;
S300: Reverse indexing
Indexing through the main body global unique identification code to retrieve the corresponding target main body and all the keywords contained in the target main body.
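The three steps above can be sketched in a few lines of Python. This is a minimal in-memory model under stated assumptions, not the patented implementation: the class name `InversionIndex`, its method names, the sample file names and the use of `uuid4` to produce the identification code are all hypothetical choices made for illustration.

```python
from collections import defaultdict
import uuid

class InversionIndex:
    """Toy in-memory model of steps S100 to S300."""

    def __init__(self):
        self.keyword_to_guids = defaultdict(set)  # forward direction
        self.guid_to_body = {}                    # identification code -> main body
        self.guid_to_keywords = {}                # identification code -> its keywords

    def add(self, body, keywords):
        """S100: register a main body and link its keywords through one code."""
        guid = uuid.uuid4().hex  # assumption: any globally unique string works here
        self.guid_to_body[guid] = body
        self.guid_to_keywords[guid] = list(keywords)
        for keyword in keywords:
            self.keyword_to_guids[keyword].add(guid)
        return guid

    def forward(self, keyword):
        """S200: retrieve all main bodies that contain the keyword."""
        guids = sorted(self.keyword_to_guids.get(keyword, ()))
        return [self.guid_to_body[g] for g in guids]

    def reverse(self, guid):
        """S300: retrieve the main body and all of its keywords from the code."""
        return self.guid_to_body[guid], list(self.guid_to_keywords[guid])

index = InversionIndex()
news_id = index.add("news_001.txt", ["index", "database", "retrieval"])
blog_id = index.add("blog_002.txt", ["index", "hash"])
```

Here `index.forward("index")` yields both sample main bodies, while `index.reverse(news_id)` yields the first main body together with its three keywords. Keeping one identification code per main body is what makes the reverse direction a single lookup.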
Optionally, in the method for realizing K-V inversion retrieval based on an index technology, the same keyword may have multiple index serial numbers in the database.
Optionally, each keyword in the database is stored as its own cluster, that is, one keyword is the minimum storage unit, so that the result of a reverse index is data organized keyword by keyword rather than a piece of text composed of the keywords.
Optionally, the storage structure of the database uses multi-layer key-value; in other words, the method is implemented on a database with a key-value structure. Because the whole storage structure is a multi-layer K-V structure, the relationship among the keywords, the main body and the unique identification code forms a multi-level index in both the forward and the reverse direction, and the multi-level index and the hash value together improve retrieval efficiency level by level.
Optionally, the main body global unique identification code is a hash string obtained by hashing designated content.
Optionally, the index serial number has the structure of an identification segment plus a sequence number, wherein the identification segment represents the corresponding main body and the sequence number represents the ordinal position of the keyword.
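The last two options (a hash-based identification code and a serial number made of an identification segment plus a sequence number) might look roughly as follows. The source does not name a hash function, a segment length or a textual layout, so SHA-256, the 8-character segment, the `segment-0001` format and the names `make_guid` and `IndexSerial` are all assumptions introduced for this sketch.

```python
import hashlib
from dataclasses import dataclass

def make_guid(content: str) -> str:
    """Hash the designated content into a fixed-length hex string.

    SHA-256 is an assumption; the source only says the code is a hash string.
    """
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

@dataclass(frozen=True)
class IndexSerial:
    """Index serial number = identification segment + sequence number."""
    body_segment: str  # identifies the main body
    sequence: int      # ordinal position of the keyword in that main body

    def encode(self) -> str:
        # Hypothetical layout: "<segment>-<zero-padded sequence>"
        return f"{self.body_segment}-{self.sequence:04d}"

    @classmethod
    def decode(cls, text: str) -> "IndexSerial":
        segment, _, seq = text.rpartition("-")
        return cls(segment, int(seq))

guid = make_guid("news_001.txt")
keywords = ["index", "database"]
# Assumption: the first 8 hex characters of the code serve as the segment.
serials = [IndexSerial(guid[:8], n) for n, _ in enumerate(keywords, start=1)]
encoded = [s.encode() for s in serials]
```

Because the segment is derived from the identification code, every serial number of a main body can be traced back to that main body without an extra lookup; a production design would need a segment long enough to keep collisions acceptably unlikely.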
FIG. 1 schematically illustrates the principle of the present invention. The working principle is as follows:
first, an index relation table is created under a K-V database structure. For a first keyword group key_1 to key_n, index IDs Serl_1 to Serl_n are created in one-to-one correspondence. A global unique identification code Value3 is then created for Serl_1 to Serl_n, all of which share the same Value3. Finally, an index relationship is created between Value3 and the target file it points to. These index relationships include both forward and reverse indexes;
similarly, for a second keyword group key_1 to key_m, index IDs Serl_1 to Serl_m are created in one-to-one correspondence. A global unique identification code Value4 is then created for Serl_1 to Serl_m, all of which share the same Value4. Finally, an index relationship is created between Value4 and the target file it points to. These index relationships include both forward and reverse indexes;
in the forward index, the target file can be obtained by searching level by level from the keywords, that is, along the path marked with character identifier 1 in FIG. 1;
in the reverse index, all the keywords contained in the target file can be indexed from the global unique identification code Value3 or Value4.
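The level-by-level lookup described for FIG. 1 can be modelled with a layered key-value store in which every layer is itself a mapping. The sketch below is an assumption-laden illustration: the layer names, the `register`, `forward` and `reverse` helpers, the `Value3:Serl_1` naming and the file names `file_A` and `file_B` are hypothetical; only `key_*`, `Serl_*`, `Value3` and `Value4` come from the description.

```python
# Layered K-V store: each layer is itself a key-value mapping.
store = {
    "key_to_serials": {},    # keyword -> list of index IDs (Serl_*)
    "serial_to_value": {},   # index ID -> global unique identification code
    "value_to_file": {},     # identification code -> target file
    "value_to_serials": {},  # identification code -> its index IDs (reverse)
    "serial_to_key": {},     # index ID -> keyword (reverse)
}

def register(store, value, target_file, keywords):
    """Create one index ID per keyword and bind them all to the same code."""
    store["value_to_file"][value] = target_file
    store["value_to_serials"][value] = []
    for n, keyword in enumerate(keywords, start=1):
        serial = f"{value}:Serl_{n}"  # hypothetical naming, prefixed to stay unique
        store["key_to_serials"].setdefault(keyword, []).append(serial)
        store["serial_to_value"][serial] = value
        store["serial_to_key"][serial] = keyword
        store["value_to_serials"][value].append(serial)

def forward(store, keyword):
    """keyword -> index IDs -> identification codes -> target files."""
    files = []
    for serial in store["key_to_serials"].get(keyword, []):
        target = store["value_to_file"][store["serial_to_value"][serial]]
        if target not in files:
            files.append(target)
    return files

def reverse(store, value):
    """identification code -> target file and every keyword it contains."""
    serials = store["value_to_serials"][value]
    return store["value_to_file"][value], [store["serial_to_key"][s] for s in serials]

register(store, "Value3", "file_A", ["key_1", "key_2", "key_3"])
register(store, "Value4", "file_B", ["key_2", "key_4"])
```

In this sample data `key_2` belongs to both groups, so it holds two index IDs, one per target file, and a forward lookup on it reaches both files. A reverse lookup on `Value4` returns `file_B` together with its keywords as separate items rather than as one piece of text.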
The foregoing describes preferred embodiments of the invention. It is to be understood that the invention is not limited to the precise form disclosed herein, and that various other combinations, modifications and environments may be used within the scope of the concept disclosed herein, whether as described above or as apparent to those skilled in the relevant art. Modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. A method for realizing K-V inversion retrieval based on an index technology is characterized by comprising the following steps:
S100: Constructing database index relationships
Constructing, in a database, index serial numbers in one-to-one correspondence with the keywords of the same main body;
establishing a main body global unique identification code that corresponds uniquely to the index serial numbers of all the keywords in the same main body;
establishing an index relationship between the main body global unique identification code and the main body, and an index relationship between the main body global unique identification code and each keyword of the main body, so that one main body corresponds to one main body global unique identification code;
S200: Forward indexing
Indexing through a keyword to retrieve the corresponding target main bodies, that is, all the target main bodies containing the keyword;
S300: Reverse indexing
Indexing through the main body global unique identification code to retrieve the corresponding target main body and all the keywords contained in the target main body.
2. The method for realizing K-V inversion retrieval based on index technology as claimed in claim 1, wherein the same keyword may have multiple index serial numbers in the database.
3. The method for realizing K-V inversion retrieval based on index technology as claimed in claim 2, wherein each keyword in the database is stored as its own cluster, that is, one keyword is the minimum storage unit.
4. The method for realizing K-V inversion retrieval based on index technology as claimed in claim 3, wherein the storage structure of the database adopts multi-layer key-value.
5. The method for realizing K-V inversion retrieval based on index technology as claimed in claim 4, wherein the main body global unique identification code is a hash string obtained by hashing designated content.
6. The method for realizing K-V inversion retrieval based on index technology as claimed in claim 5, wherein the index serial number has the structure of an identification segment plus a sequence number, wherein the identification segment represents the corresponding main body and the sequence number represents the ordinal position of the keyword.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011550712.2A CN112579649A (en) 2020-12-24 2020-12-24 Index technology-based K-V inversion retrieval method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011550712.2A CN112579649A (en) 2020-12-24 2020-12-24 Index technology-based K-V inversion retrieval method

Publications (1)

Publication Number Publication Date
CN112579649A (en) 2021-03-30

Family

ID=75139548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011550712.2A Pending CN112579649A (en) 2020-12-24 2020-12-24 Index technology-based K-V inversion retrieval method

Country Status (1)

Country Link
CN (1) CN112579649A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1295295A (en) * 1999-11-04 2001-05-16 英业达集团(西安)电子技术有限公司 Word looking-up method for electronic dictionary with fast polling index structure
CN101072205A (en) * 2007-06-21 2007-11-14 腾讯科技(深圳)有限公司 Chat information searching method and system
US20130074148A1 (en) * 2010-05-20 2013-03-21 Oedses Klaas Van Megchelen Method and system for compiling a unique sample code for specific web content
CN103116586A (en) * 2011-11-17 2013-05-22 中国电信股份有限公司 Document reading achieving method, terminal, document conversion server and processing system
CN103699569A (en) * 2013-09-06 2014-04-02 安徽科大讯飞信息科技股份有限公司 Index structure and index method
CN104794123A (en) * 2014-01-20 2015-07-22 阿里巴巴集团控股有限公司 Method and device for establishing NoSQL database index for semi-structured data
CN110019647A (en) * 2017-10-25 2019-07-16 华为技术有限公司 A kind of keyword search methodology, device and search engine
CN110309146A (en) * 2019-05-09 2019-10-08 全知科技(杭州)有限责任公司 A kind of codomain data directory library method for building up for supporting two-way index

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
J. Giridharan et al.: "Inverted index and interval lists for keyword search", 2014 International Conference on Green Computing Communication and Electrical Engineering (ICGCCEE) *
万里勇: "Research on XML query optimization based on index technology", China Master's Theses Full-text Database, Information Science and Technology *
生有涯,知无涯: "What are forward indexes and reverse indexes (inverted indexes)?", CSDN (https://blog.csdn.net/qq_38923792/article/details/104027587) *

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210330)