CN101782922A - Multi-level bucket hashing index method for searching mass data - Google Patents

Info

Publication number
CN101782922A
Authority
CN
China
Prior art keywords
bucket
index
disk
retrieval
bytes
Legal status
Granted
Application number
CN200910256103A
Other languages
Chinese (zh)
Other versions
CN101782922B (en)
Inventor
王希常
马磊
刘江
Current Assignee
SHANDONG SHANDA OUMA SOFTWARE CO., LTD.
Original Assignee
SHANDONG SHANDA OUMA SOFTWARE CO Ltd
Priority date
2009-12-29
Filing date
2009-12-29
Publication date
2010-07-21

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a multi-level bucket hash indexing method for searching mass data, in the field of mass data storage. It is characterized in that: (1) a bucket mapping table is provided to reduce the disk space occupied by the hash index file; (2) the hash index uses multi-level buckets: the disk block size is an integer multiple of the sector size, each disk block holds one or more basic buckets and may contain an in-block overflow bucket, and global overflow buckets may also be provided; (3) a data cache structure for the index file is provided in the form of a data cache mapping table, and the cache is managed through a doubly linked list on the cache mapping table. The mapping table reduces the disk space occupied by the index file; disk blocks are sized as an integer multiple of the disk sector size, and the data cache structure reduces the number of disk reads and writes, improving memory utilization and data search efficiency.
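The block layout in point (2) can be illustrated with some address arithmetic. This is a minimal sketch, not the patented on-disk format: it assumes 512-byte sectors, fixed-size basic buckets packed at the start of each block, and the in-block overflow bucket occupying the rest of the block. `BlockLayout`, `bucket_size` and `overflow_size` are hypothetical names.

```python
from dataclasses import dataclass

SECTOR_SIZE = 512  # bytes; assumed sector size


@dataclass(frozen=True)
class BlockLayout:
    sectors_per_block: int   # block size is an integer multiple of the sector size
    bucket_size: int         # bytes per basic bucket (hypothetical)
    overflow_size: int       # bytes reserved for the in-block overflow bucket (hypothetical)

    @property
    def block_size(self) -> int:
        return self.sectors_per_block * SECTOR_SIZE

    @property
    def buckets_per_block(self) -> int:
        # basic buckets fill the block, minus the space kept for in-block overflow
        return (self.block_size - self.overflow_size) // self.bucket_size

    def bucket_offset(self, block_no: int, slot: int) -> int:
        """Byte offset in the index file of basic bucket `slot` inside block `block_no`."""
        if not 0 <= slot < self.buckets_per_block:
            raise ValueError("slot out of range for this block layout")
        return block_no * self.block_size + slot * self.bucket_size

    def overflow_offset(self, block_no: int) -> int:
        """Byte offset of the in-block overflow area, placed after the basic buckets."""
        return block_no * self.block_size + self.buckets_per_block * self.bucket_size


layout = BlockLayout(sectors_per_block=8, bucket_size=512, overflow_size=1024)
print(layout.block_size, layout.buckets_per_block, layout.bucket_offset(2, 3))  # 4096 6 9728
```

Here an 8-sector (4096-byte) block with 512-byte buckets and 1024 bytes reserved for overflow holds 6 basic buckets, and basic bucket 3 of block 2 starts at byte 2 × 4096 + 3 × 512 = 9728. Because the block size is a whole number of sectors, every block starts on a sector boundary, so each block read or write covers whole sectors only.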

Description

Multi-level bucket hash indexing method for searching mass data
Technical field
The present invention relates to a multi-level bucket hash indexing method for searching mass data, and belongs to the technical field of data storage and retrieval.
Background art
Retrieval efficiency is an important indicator for mass data storage and service application systems, and indexing technology plays an important role in organizing and retrieving data. Large databases and data-storage application systems currently all support hash table indexing. As data sources grow rapidly, obtaining information of interest quickly and accurately has become a major concern; the sheer volume of data places higher demands on retrieval technology, and information retrieval, filtering and extraction techniques have gradually become research priorities. A very important advantage of a hash index is that its retrieval cost does not grow with the volume of data; the main factors affecting hash performance are the number of disk reads and writes and hash collisions. There are currently two main kinds of hash index: static hash indexes and dynamic hash indexes.
Summary of the invention
To address these shortcomings of the prior art, the invention provides a multi-level bucket hash indexing method for searching mass data.
The multi-level bucket hash indexing method for searching mass data comprises a creation method and a search method for the hash index. The hash index is created as follows:
1) Determine a key for the information to be indexed;
2) Build a mapping table of index buckets in computer memory, i.e. a table mapping the hash value h of the key to the storage location c of the index bucket on disk;
3) Determine whether the index bucket is on disk, i.e. whether the value of the storage location equals the maximum value representable in 8 bytes. If it does, no index bucket has been stored on disk yet; continue with step 4). If it does not, an index bucket is already stored on disk; go to step 7);
4) When no index bucket is stored on disk, create a new disk block d on disk to store the information, set up a new index bucket, and determine the sequence number of the new index bucket within disk block d;
5) Update the mapping table so that c = d;
6) Update the disk and store the key, repeating the storage process for subsequent keys;
7) When an index bucket is already stored on disk, determine the sequence number of that index bucket within its disk block;
8) Determine whether the index bucket has enough space to store the new key. If it does, go to step 6). If it does not, the key overflows from this index bucket and is stored in the in-block overflow bucket of the disk block; if the in-block overflow bucket also lacks space, the key overflows from it and is stored in the global overflow bucket.
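The creation steps above can be sketched with an in-memory stand-in for the disk. This is a hedged illustration, not the patent's implementation: bucket capacities, the `(d, slot)` encoding of the location c, and packing new buckets into the last block until it is full (rather than opening a new block for every bucket) are all assumptions, and `MultiLevelHashIndex` and `Block` are hypothetical names.

```python
import zlib

SENTINEL = 2**64 - 1              # "maximum value of 8 bytes": no bucket on disk yet
BUCKETS_PER_BLOCK = 4             # hypothetical: basic buckets per disk block
BUCKET_CAPACITY = 2               # hypothetical: keys per basic bucket
IN_BLOCK_OVERFLOW_CAPACITY = 4    # hypothetical: keys per in-block overflow bucket


class Block:
    """A simulated disk block: basic buckets plus one shared in-block overflow bucket."""

    def __init__(self):
        self.buckets = []          # basic buckets allocated so far (each a list of keys)
        self.overflow = []         # in-block overflow bucket


class MultiLevelHashIndex:
    def __init__(self, num_slots=1024):
        self.mapping = [SENTINEL] * num_slots   # bucket mapping table: h -> location c
        self.blocks = []                          # simulated disk, indexed by block number
        self.global_overflow = []                 # global overflow bucket

    def hash_key(self, key):
        # crc32 keeps hash values stable across runs (Python's str hash is salted)
        return zlib.crc32(key.encode("utf-8")) % len(self.mapping)

    def insert(self, key):
        h = self.hash_key(key)                    # steps 1-2: key -> hash value h
        c = self.mapping[h]
        if c == SENTINEL:                         # step 3: bucket not on disk yet
            # step 4: place a new bucket, opening a new block d when the last is full
            if not self.blocks or len(self.blocks[-1].buckets) == BUCKETS_PER_BLOCK:
                self.blocks.append(Block())
            d = len(self.blocks) - 1
            slot = len(self.blocks[d].buckets)
            self.blocks[d].buckets.append([])
            c = d * BUCKETS_PER_BLOCK + slot      # hypothetical encoding of (d, slot)
            self.mapping[h] = c                   # step 5: update the mapping table
        d, slot = divmod(c, BUCKETS_PER_BLOCK)    # step 7: locate the bucket in its block
        block = self.blocks[d]
        if len(block.buckets[slot]) < BUCKET_CAPACITY:      # step 8, then step 6
            block.buckets[slot].append(key)
            return "bucket"
        if len(block.overflow) < IN_BLOCK_OVERFLOW_CAPACITY:
            block.overflow.append(key)            # overflow within the same disk block
            return "in-block overflow"
        self.global_overflow.append(key)          # last level: global overflow bucket
        return "global overflow"


idx = MultiLevelHashIndex(num_slots=1)            # one slot: every key collides
print([idx.insert(f"k{i}") for i in range(7)])
```

With a single hash slot every key collides, so the first two keys fill the basic bucket, the next four land in the in-block overflow bucket, and the seventh reaches the global overflow bucket. Only hash values that actually receive a key get a bucket on disk, which is how the mapping table avoids empty buckets in the index file.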
The hash index is searched as follows:
1) Determine the key for the information to be searched;
2) Read the mapping table;
3) Determine whether the index bucket to be searched is on disk, i.e. whether the value of the storage location equals the maximum value representable in 8 bytes. If it does, the index bucket to be searched has not been stored on disk and the search ends; if it does not, the index bucket is stored on disk; go to step 4);
4) Since the value is not the 8-byte maximum, obtain from the mapping table the number of the index bucket to be searched and the number of the disk block in which that bucket resides;
5) Search within the bucket. If the key is found, the search ends; if not, search the in-block overflow bucket of the disk block;
6) If the key is found in the in-block overflow bucket, the search ends; if not, search the global overflow bucket, after which the search ends.
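The search steps above can be sketched as a single lookup function. This is a minimal sketch under stated assumptions: the mapping table stores a location encoded as `d * BUCKETS_PER_BLOCK + slot`, each block is modelled as a pair `(basic_buckets, in_block_overflow)`, and crc32 stands in for the unspecified hash function; `search` and these layouts are hypothetical.

```python
import zlib

SENTINEL = 2**64 - 1       # "maximum value of 8 bytes": bucket was never stored on disk
BUCKETS_PER_BLOCK = 4      # hypothetical; must match the layout used when the index was built


def search(key, mapping, blocks, global_overflow):
    """Return the level at which `key` is found, or None.

    Hypothetical layout: mapping[h] holds d * BUCKETS_PER_BLOCK + slot, and
    blocks[d] is a pair (basic_buckets, in_block_overflow).
    """
    h = zlib.crc32(key.encode("utf-8")) % len(mapping)   # step 1: key -> hash value
    c = mapping[h]                                        # step 2: read the mapping table
    if c == SENTINEL:                                     # step 3: no bucket on disk
        return None
    d, slot = divmod(c, BUCKETS_PER_BLOCK)                # step 4: block and bucket numbers
    basic_buckets, in_block_overflow = blocks[d]          # one block read covers both levels
    if key in basic_buckets[slot]:                        # step 5: search the bucket
        return "bucket"
    if key in in_block_overflow:                          # step 6: in-block overflow bucket
        return "in-block overflow"
    if key in global_overflow:                            # step 6: global overflow bucket
        return "global overflow"
    return None


blocks = [([["a", "b"]], ["c"])]
print(search("c", [0], blocks, ["d"]))
```

Because the in-block overflow bucket lives in the same disk block as the basic bucket, a collision resolved at that level costs no extra disk read; only keys pushed out to the global overflow bucket require touching another part of the file.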
When mass data are stored and retrieved, the index file itself is large and the hash index occupies considerable space. To keep the hash index file as small as possible and to improve disk utilization and file read performance, the invention provides a bucket mapping table, which avoids empty buckets in the hash index file. To reduce the number of disk reads during data retrieval, the invention provides cache management, which improves memory utilization: when a bucket's data are already in memory, they are accessed directly from memory and no disk read is needed. To reduce the performance degradation caused by hash collisions, the invention provides in-block overflow buckets and global overflow buckets based on the disk block structure, which reduces the disk reads and writes caused by collisions and improves efficiency. Experimental results show that the invention has high practical value.
The present invention makes full use of disk and memory, reduces the number of disk reads and writes, and improves the efficiency of mass data storage and retrieval.
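The cache management described above, with a doubly linked list maintained over a cache mapping table, can be sketched as follows. The patent does not name a replacement policy; this sketch assumes least-recently-used (LRU) eviction, uses a dict as the cache mapping table, and keeps an explicit doubly linked list in recency order. `BlockCache`, `_Node` and the `read_block` callback are hypothetical.

```python
from typing import Callable, Dict, Optional


class _Node:
    """Doubly linked list node holding one cached disk block."""
    __slots__ = ("block_no", "data", "prev", "next")

    def __init__(self, block_no: int, data: bytes):
        self.block_no = block_no
        self.data = data
        self.prev: Optional["_Node"] = None
        self.next: Optional["_Node"] = None


class BlockCache:
    """Cache mapping table (dict) plus a doubly linked list kept in LRU order (assumed)."""

    def __init__(self, capacity: int, read_block: Callable[[int], bytes]):
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.read_block = read_block          # hypothetical disk-read callback
        self.table: Dict[int, _Node] = {}     # cache mapping table: block number -> node
        self.head = _Node(-1, b"")            # sentinel on the most recently used side
        self.tail = _Node(-1, b"")            # sentinel on the least recently used side
        self.head.next, self.tail.prev = self.tail, self.head
        self.disk_reads = 0

    def _unlink(self, node: _Node) -> None:
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node: _Node) -> None:
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, block_no: int) -> bytes:
        node = self.table.get(block_no)
        if node is not None:                  # hit: served from memory, no disk read
            self._unlink(node)
            self._push_front(node)
            return node.data
        data = self.read_block(block_no)      # miss: exactly one disk read
        self.disk_reads += 1
        node = _Node(block_no, data)
        self.table[block_no] = node
        self._push_front(node)
        if len(self.table) > self.capacity:   # evict the least recently used block
            lru = self.tail.prev
            self._unlink(lru)
            del self.table[lru.block_no]
        return data


cache = BlockCache(2, lambda n: bytes([n]))   # fake "disk": block n contains byte n
for b in (1, 2, 1, 3):
    cache.get(b)
print(cache.disk_reads, sorted(cache.table))
```

Each access touches the dictionary and a constant number of list pointers, so a cache hit is O(1) on average and never reaches the disk; the `disk_reads` counter makes the saved reads visible.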
Brief description of the drawings
Fig. 1 is the index creation flowchart.
Fig. 2 is the index search flowchart.
Embodiment:
In this embodiment, the hash index is created following steps 1) to 8) of the creation method and searched following steps 1) to 6) of the search method, exactly as set out in the summary of the invention.

Claims (1)

1. A multi-level bucket hash indexing method for searching mass data, characterized in that the method comprises a creation method and a search method for the hash index, the hash index being created as follows:
1) Determine a key for the information to be indexed;
2) Build a mapping table of index buckets in computer memory, i.e. a table mapping the hash value h of the key to the storage location c of the index bucket on disk;
3) Determine whether the index bucket is on disk, i.e. whether the value of the storage location equals the maximum value representable in 8 bytes. If it does, no index bucket has been stored on disk yet; continue with step 4). If it does not, an index bucket is already stored on disk; go to step 7);
4) When no index bucket is stored on disk, create a new disk block d on disk to store the information, set up a new index bucket, and determine the sequence number of the new index bucket within disk block d;
5) Update the mapping table so that c = d;
6) Update the disk and store the key, repeating the storage process for subsequent keys;
7) When an index bucket is already stored on disk, determine the sequence number of that index bucket within its disk block;
8) Determine whether the index bucket has enough space to store the new key. If it does, go to step 6). If it does not, the key overflows from this index bucket and is stored in the in-block overflow bucket of the disk block; if the in-block overflow bucket also lacks space, the key overflows from it and is stored in the global overflow bucket.
The hash index is searched as follows:
1) Determine the key for the information to be searched;
2) Read the mapping table;
3) Determine whether the index bucket to be searched is on disk, i.e. whether the value of the storage location equals the maximum value representable in 8 bytes. If it does, the index bucket to be searched has not been stored on disk and the search ends; if it does not, the index bucket is stored on disk; go to step 4);
4) Since the value is not the 8-byte maximum, obtain from the mapping table the number of the index bucket to be searched and the number of the disk block in which that bucket resides;
5) Search within the bucket. If the key is found, the search ends; if not, search the in-block overflow bucket of the disk block;
6) If the key is found in the in-block overflow bucket, the search ends; if not, search the global overflow bucket, after which the search ends.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009102561033A CN101782922B (en) 2009-12-29 2009-12-29 Multi-level bucket hashing index method for searching mass data

Publications (2)

Publication Number Publication Date
CN101782922A 2010-07-21
CN101782922B 2012-01-18

Family

ID=42522921

Country Status (1)

Country Link
CN (1) CN101782922B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100468400C (en) * 2005-09-30 2009-03-11 腾讯科技(深圳)有限公司 Method and system for improving information search speed
CN101359325B (en) * 2007-08-01 2010-06-16 北京启明星辰信息技术股份有限公司 Multi-key-word matching method for rapidly analyzing content
CN101464901B (en) * 2009-01-16 2012-03-21 华中科技大学 Object search method in object storage device

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102413067A (en) * 2010-09-28 2012-04-11 微软公司 Techniques to support large numbers of subscribers to a real-time event
CN102413067B (en) * 2010-09-28 2014-11-12 微软公司 Method and device for supporting large numbers of subscribers to a real-time event
CN101937474A (en) * 2010-10-14 2011-01-05 广州从兴电子开发有限公司 Mass data query method and device
CN102890675A (en) * 2011-07-18 2013-01-23 阿里巴巴集团控股有限公司 Method and device for storing and finding data
CN102890675B (en) * 2011-07-18 2015-05-13 阿里巴巴集团控股有限公司 Method and device for storing and finding data
CN102779180A (en) * 2012-06-29 2012-11-14 华为技术有限公司 Operation processing method of data storage system and data storage system
CN102779180B (en) * 2012-06-29 2015-09-09 华为技术有限公司 The operation processing method of data-storage system, data-storage system
CN104182409B (en) * 2013-05-24 2018-01-19 腾讯科技(深圳)有限公司 A kind of method and device optimized to multistage Hash
CN104182409A (en) * 2013-05-24 2014-12-03 腾讯科技(深圳)有限公司 Method and device for optimizing multi-order hash
WO2014177080A1 (en) * 2013-11-06 2014-11-06 中兴通讯股份有限公司 Method and device for processing resource object storage
CN104639570A (en) * 2013-11-06 2015-05-20 南京中兴新软件有限责任公司 Resource object storage processing method and device
WO2016086580A1 (en) * 2014-12-04 2016-06-09 中兴通讯股份有限公司 Method and apparatus for analyzing user behaviors
CN105320775A (en) * 2015-11-11 2016-02-10 中科曙光信息技术无锡有限公司 Data access method and apparatus
CN105320775B (en) * 2015-11-11 2019-05-14 中科曙光信息技术无锡有限公司 The access method and device of data
CN105975587A (en) * 2016-05-05 2016-09-28 诸葛晴凤 Method for organizing and accessing memory database index with high performance
CN105975587B (en) * 2016-05-05 2019-05-10 诸葛晴凤 A kind of high performance memory database index organization and access method
CN108572958A (en) * 2017-03-07 2018-09-25 腾讯科技(深圳)有限公司 Data processing method and device
CN108255958A (en) * 2017-12-21 2018-07-06 百度在线网络技术(北京)有限公司 Data query method, apparatus and storage medium
CN111338569A (en) * 2020-02-16 2020-06-26 西安奥卡云数据科技有限公司 Object storage back-end optimization method based on direct mapping
CN112612419A (en) * 2020-12-25 2021-04-06 西安交通大学 Data storage structure, storage method, reading method, equipment and medium of NVM (non-volatile memory)
CN112612419B (en) * 2020-12-25 2022-10-25 西安交通大学 Data storage structure, storage method, reading method, device and medium of NVM (non-volatile memory)

Also Published As

Publication number Publication date
CN101782922B (en) 2012-01-18

Similar Documents

Publication Publication Date Title
CN101782922B (en) Multi-level bucket hashing index method for searching mass data
CN110825748B (en) High-performance and easily-expandable key value storage method by utilizing differentiated indexing mechanism
US9047301B2 (en) Method for optimizing the memory usage and performance of data deduplication storage systems
CN102024047B (en) Data searching method and device thereof
US7689574B2 (en) Index and method for extending and querying index
Teng et al. LSbM-tree: Re-enabling buffer caching in data management for mixed reads and writes
CN113821171B (en) Key value storage method based on hash table and LSM tree
CN105468642A (en) Data storage method and apparatus
CN101464901B (en) Object search method in object storage device
WO2013152678A1 (en) Method and device for metadata query
CN105117417A (en) Read-optimized memory database Trie tree index method
WO2010035124A1 (en) File system for storage device which uses different cluster sizes
CN102622434A (en) Data storage method, data searching method and device
CN109388341A (en) A kind of system storage optimization method based on Device Mapper
US20140222777A1 (en) Relating to use of columnar databases
CN107766258B (en) Memory storage method and device and memory query method and device
KR20120103095A (en) Memory system and memory mapping method thereof
CN109299143B (en) Knowledge fast indexing method of data interoperation test knowledge base based on Redis cache
CN103399915A (en) Optimal reading method for index file of search engine
CN109213760B (en) High-load service storage and retrieval method for non-relational data storage
CN107273443B (en) Mixed indexing method based on metadata of big data model
CN118152434A (en) Data management method and computing device
CN103853772B (en) High-efficiency reverse index organizing method
CN103902693A (en) Method of read-optimized memory database T-tree index structure
CN113535092B (en) Storage engine, method and readable medium for reducing memory metadata

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee
CP03 Change of name, title or address

Address after: No. 128 Bole Road, High-tech Zone, Ji'nan, Shandong 250101

Patentee after: SHANDONG SHANDA OUMA SOFTWARE CO., LTD.

Address before: No. 1318 Tianchen Avenue, High-tech Zone, Ji'nan, Shandong Province 250101

Patentee before: Shandong Shanda Ouma Software Co., Ltd.