CN101782922B - Multi-level bucket hashing index method for searching mass data - Google Patents

Publication number
CN101782922B
Authority
CN
China
Prior art keywords
bucket
index
disk
retrieval
bytes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2009102561033A
Other languages
Chinese (zh)
Other versions
CN101782922A (en)
Inventor
王希常
马磊
刘江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANDONG SHANDA OUMA SOFTWARE CO., LTD.
Original Assignee
SHANDONG SHANDA OUMA SOFTWARE CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANDONG SHANDA OUMA SOFTWARE CO Ltd
Priority to CN2009102561033A
Publication of CN101782922A
Application granted
Publication of CN101782922B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a multi-level bucket hash indexing method for mass data retrieval, in the field of mass data storage, characterized in that: (1) a bucket mapping table is provided to reduce the space the hash index file occupies on disk; (2) the hash index uses multi-level buckets: the disk block size is an integer multiple of the sector size, each disk block holds one or more basic buckets and may also hold an in-block overflow bucket, and global overflow buckets may also be provided; (3) a data cache structure is provided for the index file, with a cache mapping table, and the cache is managed through a doubly linked list over the cache mapping table. The mapping table reduces the space the index file occupies on disk. Because the disk block size is an integer multiple of the disk sector size and a data cache structure is used, the number of disk reads and writes is reduced, and memory utilization and data retrieval efficiency are improved.

Description

A multi-level bucket hash indexing method for mass data retrieval
Technical field
The present invention relates to a multi-level bucket hash indexing method for mass data retrieval, and belongs to the technical field of data storage and retrieval.
Background
Retrieval efficiency is an important metric for mass data storage and service application systems, and indexing plays an important role in organizing and searching the data space. Large databases and data storage application systems currently all support hash table indexing. As data sources grow rapidly, obtaining the information of interest quickly and accurately has become the main concern. Characteristics such as massive volume therefore place higher demands on retrieval technology, and information retrieval, filtering, and extraction techniques have gradually become a focus of research. An important advantage of a hash index is that its retrieval cost does not grow with the data volume. The main factors that affect hash performance are the number of disk reads and writes and hash collisions. Hash indexes currently take two main forms: static hash indexes and dynamic hash indexes.
Summary of the invention
To address the deficiencies of the prior art, the present invention provides a multi-level bucket hash indexing method for mass data retrieval.
The multi-level bucket hash indexing method for mass data retrieval comprises a method for creating the hash index and a method for searching it. The hash index is created as follows:
1) Determine a key for the information to be indexed;
2) In computer memory, build the mapping table of index buckets, which maps the hash value h of a key to the storage location c of its index bucket on disk;
3) Determine whether the index bucket exists on disk by checking whether the storage location value equals the maximum value of 8 bytes. If it does, no index bucket has been stored on disk, so continue with step 4); if it does not, an index bucket is already stored on disk, so go to step 7);
4) When no index bucket has been stored on disk, create a new disk block d on disk and store the information in it, set up a new index bucket, and determine the sequence number of the new index bucket within disk block d;
5) Update the mapping table so that c = d;
6) Update the disk and repeat the storing process;
7) When an index bucket is already stored on disk, determine the sequence number of that index bucket within its disk block;
8) Determine whether the index bucket has enough space to store the new key. If it does, go to step 6). If it does not, the key overflows from the index bucket and is stored in the in-block overflow bucket of the disk block. If the in-block overflow bucket does not have enough space either, the key overflows from it and is stored in the global overflow bucket.
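The creation steps above can be sketched in Python roughly as follows. This is a minimal in-memory illustration, not the patented implementation: the class and field names, the CRC32 hash, the bucket capacities, and the simplification that each new bucket gets its own block (so its sequence number is always 0) are all assumptions made for the example, and the "disk" is just a Python list.

```python
import zlib
from dataclasses import dataclass, field

EMPTY = 0xFFFFFFFFFFFFFFFF  # maximum 8-byte value, meaning "no bucket on disk"


@dataclass
class DiskBlock:
    """In-memory stand-in for one disk block (assumed layout)."""
    bucket: list = field(default_factory=list)    # basic bucket, sequence number 0
    overflow: list = field(default_factory=list)  # in-block overflow bucket


class BucketIndex:
    def __init__(self, num_slots=1024, bucket_cap=4, overflow_cap=4):
        self.mapping = [EMPTY] * num_slots  # hash value h -> location c
        self.blocks = []                    # simulated disk
        self.global_overflow = []           # global overflow bucket
        self.bucket_cap = bucket_cap        # assumed capacity of a basic bucket
        self.overflow_cap = overflow_cap    # assumed capacity of an overflow bucket

    def _hash(self, key):
        # CRC32 is only a placeholder; the source does not name a hash function.
        return zlib.crc32(key.encode("utf-8")) % len(self.mapping)

    def insert(self, key):
        """Store key and return the name of the level that received it."""
        h = self._hash(key)                         # steps 1-2
        c = self.mapping[h]
        if c == EMPTY:                              # step 3: no bucket yet
            self.blocks.append(DiskBlock([key]))    # step 4: new block d
            self.mapping[h] = len(self.blocks) - 1  # step 5: c = d
            return "basic"
        block = self.blocks[c]                      # step 7: existing bucket
        if len(block.bucket) < self.bucket_cap:     # step 8: room in the bucket?
            block.bucket.append(key)
            return "basic"
        if len(block.overflow) < self.overflow_cap:
            block.overflow.append(key)              # overflow inside the block
            return "in-block overflow"
        self.global_overflow.append(key)            # last resort
        return "global overflow"
```

With a single mapping slot and capacities of 1, three successive inserts land in the basic bucket, the in-block overflow bucket, and the global overflow bucket in turn, which mirrors the order of fallbacks in step 8).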
The hash index is searched as follows:
1) Determine a key for the information to be retrieved;
2) Read the mapping table;
3) Determine whether the index bucket to be searched exists on disk by checking whether the storage location value equals the maximum value of 8 bytes. If it does, no such index bucket has been stored on disk and the retrieval ends; if it does not, the index bucket is stored on disk, so go to step 4);
4) Since the value is not equal to the maximum value of 8 bytes, obtain from the mapping table the number of the index bucket to be searched and the number of the disk block that holds it;
5) Search within the bucket. If the key is found, the retrieval ends; if it is not found, search the in-block overflow bucket of the disk block;
6) If the key is found in the in-block overflow bucket, the retrieval ends; if it is not found there, search the global overflow bucket, after which the retrieval ends.
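The retrieval steps above can be sketched as a single lookup function. This is an illustrative sketch under assumptions: the function name, the dictionary layout of a block, and the CRC32 hash are hypothetical, and the mapping table, blocks, and global overflow bucket are plain in-memory Python objects rather than on-disk structures.

```python
import zlib

EMPTY = 0xFFFFFFFFFFFFFFFF  # maximum 8-byte value, meaning "no bucket on disk"


def lookup(key, mapping, blocks, global_overflow):
    """Return the level where key is found, or None if it is absent."""
    # Step 1: derive the key's hash value (CRC32 is a placeholder hash).
    h = zlib.crc32(key.encode("utf-8")) % len(mapping)
    c = mapping[h]                      # step 2: read the mapping table
    if c == EMPTY:                      # step 3: no bucket stored, stop
        return None
    block = blocks[c]                   # step 4: locate the block and bucket
    if key in block["bucket"]:          # step 5: search the basic bucket
        return "basic"
    if key in block["overflow"]:        # step 6: in-block overflow bucket
        return "in-block overflow"
    if key in global_overflow:          # step 6: global overflow bucket
        return "global overflow"
    return None


# Hand-built example with one mapping slot, so every key hashes to slot 0.
mapping = [0]
blocks = [{"bucket": ["a"], "overflow": ["b"]}]
global_overflow = ["c"]
```

A key found in the basic bucket or the in-block overflow bucket needs only the one disk block; only keys that overflowed twice require a further access to the global overflow bucket.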
When mass data is stored and retrieved, the index file itself is large and the hash index occupies considerable space. To keep the hash index file as small as possible and to improve disk utilization and file read performance, the invention provides a bucket mapping table, which avoids empty buckets in the hash index file. To reduce the number of disk reads during data retrieval, the invention provides cache management, which improves memory utilization: when a bucket's data is already in memory, it is accessed directly from memory and the disk read is avoided. To limit the performance degradation caused by hash collisions, the invention provides in-block overflow buckets and global overflow buckets based on the disk block structure, which reduces the disk reads and writes caused by collisions and improves efficiency. Experiments show that the invention has high practical value.
The invention makes full use of disk and memory, reduces the number of disk reads and writes, and improves the efficiency of mass data storage and retrieval.
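The cache management mentioned above can be illustrated with a small read-through cache. The abstract describes a cache mapping table managed through a doubly linked list; the sketch below substitutes Python's `OrderedDict`, which combines both roles. The least-recently-used eviction policy, the class name, the capacity, and the `read_block` callable are assumptions for the example, since the source does not state them.

```python
from collections import OrderedDict


class BlockCache:
    def __init__(self, read_block, capacity=2):
        self.read_block = read_block  # hypothetical callable: block number -> data
        self.capacity = capacity      # assumed maximum number of cached blocks
        self.entries = OrderedDict()  # cache mapping table, oldest entry first
        self.disk_reads = 0           # counts simulated disk reads

    def get(self, block_no):
        if block_no in self.entries:
            # Cache hit: serve from memory and mark as most recently used.
            self.entries.move_to_end(block_no)
            return self.entries[block_no]
        # Cache miss: read from "disk" and remember the block.
        self.disk_reads += 1
        data = self.read_block(block_no)
        self.entries[block_no] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return data
```

For example, `BlockCache(lambda n: f"block-{n}")` would serve a repeated `get(0)` from memory, so only the first call counts as a disk read.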
Description of the drawings
Fig. 1 is a flowchart of index creation.
Fig. 2 is a flowchart of index retrieval.
Embodiment:
A multi-level bucket hash indexing method for mass data retrieval comprises a method for creating the hash index and a method for searching it. The hash index is created as follows:
1) Determine a key for the information to be indexed;
2) In computer memory, build the mapping table of index buckets, which maps the hash value h of a key to the storage location c of its index bucket on disk;
3) Determine whether the index bucket exists on disk by checking whether the storage location value equals the maximum value of 8 bytes. If it does, no index bucket has been stored on disk, so continue with step 4); if it does not, an index bucket is already stored on disk, so go to step 7);
4) When no index bucket has been stored on disk, create a new disk block d on disk and store the information in it, set up a new index bucket, and determine the sequence number of the new index bucket within disk block d;
5) Update the mapping table so that c = d;
6) Update the disk and repeat the storing process;
7) When an index bucket is already stored on disk, determine the sequence number of that index bucket within its disk block;
8) Determine whether the index bucket has enough space to store the new key. If it does, go to step 6). If it does not, the key overflows from the index bucket and is stored in the in-block overflow bucket of the disk block. If the in-block overflow bucket does not have enough space either, the key overflows from it and is stored in the global overflow bucket.
The hash index is searched as follows:
1) Determine a key for the information to be retrieved;
2) Read the mapping table;
3) Determine whether the index bucket to be searched exists on disk by checking whether the storage location value equals the maximum value of 8 bytes. If it does, no such index bucket has been stored on disk and the retrieval ends; if it does not, the index bucket is stored on disk, so go to step 4);
4) Since the value is not equal to the maximum value of 8 bytes, obtain from the mapping table the number of the index bucket to be searched and the number of the disk block that holds it;
5) Search within the bucket. If the key is found, the retrieval ends; if it is not found, search the in-block overflow bucket of the disk block;
6) If the key is found in the in-block overflow bucket, the retrieval ends; if it is not found there, search the global overflow bucket, after which the retrieval ends.

Claims (1)

1. A multi-level bucket hash indexing method for mass data retrieval, characterized in that the method comprises a method for creating the hash index and a method for searching it, the hash index being created as follows:
1) determine a key for the information to be indexed;
2) in computer memory, build the mapping table of index buckets, which maps the hash value h of a key to the storage location c of its index bucket on disk;
3) determine whether the index bucket exists on disk by checking whether the storage location value equals the maximum value of 8 bytes; if it does, no index bucket has been stored on disk, so continue with step 4); if it does not, an index bucket is already stored on disk, so go to step 7);
4) when no index bucket has been stored on disk, create a new disk block d on disk and store the information in it, set up a new index bucket, and determine the sequence number of the new index bucket within disk block d;
5) update the mapping table so that c = d;
6) update the disk and repeat the storing process;
7) when an index bucket is already stored on disk, determine the sequence number of that index bucket within its disk block;
8) determine whether the index bucket has enough space to store the new key; if it does, go to step 6); if it does not, the key overflows from the index bucket and is stored in the in-block overflow bucket of the disk block; if the in-block overflow bucket does not have enough space either, the key overflows from it and is stored in the global overflow bucket;
the hash index being searched as follows:
1) determine a key for the information to be retrieved;
2) read the mapping table;
3) determine whether the index bucket to be searched exists on disk by checking whether the storage location value equals the maximum value of 8 bytes; if it does, no such index bucket has been stored on disk and the retrieval ends; if it does not, the index bucket is stored on disk, so go to step 4);
4) since the value is not equal to the maximum value of 8 bytes, obtain from the mapping table the number of the index bucket to be searched and the number of the disk block that holds it;
5) search within the bucket; if the key is found, the retrieval ends; if it is not found, search the in-block overflow bucket of the disk block;
6) if the key is found in the in-block overflow bucket, the retrieval ends; if it is not found there, search the global overflow bucket, after which the retrieval ends.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009102561033A CN101782922B (en) 2009-12-29 2009-12-29 Multi-level bucket hashing index method for searching mass data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009102561033A CN101782922B (en) 2009-12-29 2009-12-29 Multi-level bucket hashing index method for searching mass data

Publications (2)

Publication Number Publication Date
CN101782922A CN101782922A (en) 2010-07-21
CN101782922B true CN101782922B (en) 2012-01-18

Family

ID=42522921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009102561033A Active CN101782922B (en) 2009-12-29 2009-12-29 Multi-level bucket hashing index method for searching mass data

Country Status (1)

Country Link
CN (1) CN101782922B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8379525B2 (en) * 2010-09-28 2013-02-19 Microsoft Corporation Techniques to support large numbers of subscribers to a real-time event
CN101937474A (en) * 2010-10-14 2011-01-05 广州从兴电子开发有限公司 Mass data query method and device
CN102890675B (en) * 2011-07-18 2015-05-13 阿里巴巴集团控股有限公司 Method and device for storing and finding data
CN102779180B (en) * 2012-06-29 2015-09-09 华为技术有限公司 The operation processing method of data-storage system, data-storage system
CN104182409B (en) * 2013-05-24 2018-01-19 腾讯科技(深圳)有限公司 A kind of method and device optimized to multistage Hash
CN104639570A (en) * 2013-11-06 2015-05-20 南京中兴新软件有限责任公司 Resource object storage processing method and device
CN105653568A (en) * 2014-12-04 2016-06-08 中兴通讯股份有限公司 Method and apparatus analyzing user behaviors
CN105320775B (en) * 2015-11-11 2019-05-14 中科曙光信息技术无锡有限公司 The access method and device of data
CN105975587B (en) * 2016-05-05 2019-05-10 诸葛晴凤 A kind of high performance memory database index organization and access method
CN108572958B (en) * 2017-03-07 2022-07-29 腾讯科技(深圳)有限公司 Data processing method and device
CN108255958B (en) * 2017-12-21 2022-05-03 百度在线网络技术(北京)有限公司 Data query method, device and storage medium
CN111338569A (en) * 2020-02-16 2020-06-26 西安奥卡云数据科技有限公司 Object storage back-end optimization method based on direct mapping
CN112612419B (en) * 2020-12-25 2022-10-25 西安交通大学 Data storage structure, storage method, reading method, device and medium of NVM (non-volatile memory)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1940922A (en) * 2005-09-30 2007-04-04 腾讯科技(深圳)有限公司 Method and system for improving information search speed
CN101359325A (en) * 2007-08-01 2009-02-04 北京启明星辰信息技术有限公司 Multi-key-word matching method for rapidly analyzing content
CN101464901A (en) * 2009-01-16 2009-06-24 华中科技大学 Object search method in object storage device

Also Published As

Publication number Publication date
CN101782922A (en) 2010-07-21

Similar Documents

Publication Publication Date Title
CN101782922B (en) Multi-level bucket hashing index method for searching mass data
CN110825748B (en) High-performance and easily-expandable key value storage method by utilizing differentiated indexing mechanism
US7689574B2 (en) Index and method for extending and querying index
US9047301B2 (en) Method for optimizing the memory usage and performance of data deduplication storage systems
US10678654B2 (en) Systems and methods for data backup using data binning and deduplication
CN102364474B (en) Metadata storage system for cluster file system and metadata management method
CN102024047B (en) Data searching method and device thereof
CN104346357B (en) The file access method and system of a kind of built-in terminal
CN101464901B (en) Object search method in object storage device
US20100082537A1 (en) File system for storage device which uses different cluster sizes
CN102622434B (en) Data storage method, data searching method and device
CN105117417A (en) Read-optimized memory database Trie tree index method
CN103838853A (en) Mixed file system based on different storage media
CN116257523A (en) Column type storage indexing method and device based on nonvolatile memory
CN107766258B (en) Memory storage method and device and memory query method and device
CN105912696A (en) DNS (Domain Name System) index creating method and query method based on logarithm merging
CN109299143B (en) Knowledge fast indexing method of data interoperation test knowledge base based on Redis cache
CN103399915A (en) Optimal reading method for index file of search engine
Zhang et al. FlameDB: A key-value store with grouped level structure and heterogeneous Bloom filter
CN107273443B (en) Mixed indexing method based on metadata of big data model
CN103853772B (en) High-efficiency reverse index organizing method
CN110413724B (en) Data retrieval method and device
CN103902693A (en) Method of read-optimized memory database T-tree index structure
CN109213760B (en) High-load service storage and retrieval method for non-relational data storage
CN114996270A (en) Method and device for inquiring paging data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee
CP03 Change of name, title or address

Address after: No. 128 Bole Road, High-tech Zone, Ji'nan, Shandong 250101

Patentee after: SHANDONG SHANDA OUMA SOFTWARE CO., LTD.

Address before: No. 1318 Tianchen Avenue, High-tech Zone, Ji'nan City, Shandong Province 250101

Patentee before: Shandong Shanda Ouma Software Co., Ltd.