CN101782922B - Multi-level bucket hashing index method for searching mass data - Google Patents
Multi-level bucket hashing index method for searching mass data
- Publication number
- CN101782922B
- Authority
- CN
- China
- Prior art keywords
- bucket
- index
- disk
- retrieval
- bytes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a multi-level bucket hash indexing method for mass data retrieval, in the field of mass data storage, characterized in that: (1) a bucket mapping table is provided to reduce the disk space occupied by the hash index file; (2) the hash index uses multi-level buckets: the size of a disk block is an integer multiple of the sector size, each disk block holds one or more basic buckets and may hold an in-block overflow bucket, and global overflow buckets may also be provided; (3) a data cache structure is provided for the index file, with a data cache mapping table, and the cache is managed through a doubly linked list over the cache mapping table. The mapping table reduces the disk space occupied by the index file. Because the disk block size is an integer multiple of the disk sector size and a data cache structure is used, the number of disk reads and writes is reduced, and memory utilization and data retrieval efficiency are improved.
Description
Technical field
The present invention relates to a multi-level bucket hash indexing method for mass data retrieval, and belongs to the technical field of data storage and retrieval.
Background technology
Retrieval efficiency is an important metric for mass data storage and service application systems, and indexing technology plays an important role in organizing and searching the data storage space. Large databases and data storage application systems currently all support hash table indexing. As data sources grow rapidly, obtaining the information of interest quickly and accurately has become a major concern. Characteristics such as massive scale therefore place higher demands on retrieval technology, and various information retrieval, filtering and extraction techniques have gradually become a focus of research. An important advantage of a hash index is that retrieval time does not increase as the data volume grows; the main factors that affect hash performance are the number of disk reads and writes and hash collisions. There are currently two main kinds of hash index: static hash indexes and dynamic hash indexes.
Summary of the invention
In view of the deficiencies of the prior art, the present invention provides a multi-level bucket hash indexing method for mass data retrieval.
A multi-level bucket hash indexing method for mass data retrieval comprises a creation method and a search method for the hash index. The creation method of the hash index is as follows:
1) Determine a keyword for the information to be indexed;
2) In computer memory, build the mapping table of index buckets, which maps the hash value h of the keyword to the storage location c of the index bucket on disk;
3) Determine whether the index bucket is on disk, i.e. whether the value of the storage location equals the maximum value of 8 bytes. If it equals the maximum value of 8 bytes, no index bucket has been stored on disk; continue with step 4). If it does not, an index bucket is already stored on disk; go to step 7);
4) When no index bucket is stored on disk, create a new disk block d on disk and store the information, set up a new index bucket, and determine the sequence number of the new index bucket within disk block d;
5) Update the mapping table so that c = d;
6) Update the disk and store the data again;
7) When an index bucket is already stored on disk, determine the sequence number of that index bucket within its disk block;
8) Determine whether the index bucket has enough space to store the new keyword. If it does, go to step 6). If it does not, the keyword overflows from this index bucket and is stored in the in-block overflow bucket of the disk block. If the in-block overflow bucket does not have enough space either, the keyword overflows from the in-block overflow bucket and is stored in the global overflow bucket.
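The creation steps can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the names `HashIndex`, `hash_key`, `BUCKET_CAPACITY`, `BLOCK_OVERFLOW_CAPACITY` and `NUM_SLOTS` are hypothetical, the hash function is a simple modulo chosen for illustration, Python lists stand in for disk blocks, and each disk block is assumed to hold exactly one basic bucket and one in-block overflow bucket.

```python
EMPTY = 0xFFFFFFFFFFFFFFFF   # maximum 8-byte value: no index bucket stored on disk yet

BUCKET_CAPACITY = 2          # assumed number of keywords per basic bucket
BLOCK_OVERFLOW_CAPACITY = 2  # assumed number of keywords per in-block overflow bucket
NUM_SLOTS = 8                # assumed number of hash values in the mapping table


def hash_key(key: int) -> int:
    """Illustrative hash function (an assumption, not taken from the source)."""
    return key % NUM_SLOTS


class HashIndex:
    def __init__(self):
        # Step 2: mapping table from hash value h to storage location c.
        self.mapping = [EMPTY] * NUM_SLOTS
        self.blocks = []           # stand-in for disk blocks
        self.global_overflow = []  # global overflow bucket

    def insert(self, key: int) -> str:
        h = hash_key(key)                       # step 1: keyword -> hash value
        if self.mapping[h] == EMPTY:            # step 3: is a bucket on disk?
            d = len(self.blocks)                # step 4: create new disk block d
            self.blocks.append({"bucket": [], "overflow": []})
            self.mapping[h] = d                 # step 5: c = d
        block = self.blocks[self.mapping[h]]    # step 7: locate the existing bucket
        # Step 8: basic bucket, then in-block overflow, then global overflow.
        if len(block["bucket"]) < BUCKET_CAPACITY:
            block["bucket"].append(key)         # step 6 would write the block back
            return "bucket"
        if len(block["overflow"]) < BLOCK_OVERFLOW_CAPACITY:
            block["overflow"].append(key)
            return "block overflow"
        self.global_overflow.append(key)
        return "global overflow"
```

With these assumed capacities, inserting five keywords that share one hash value fills the basic bucket first, then the in-block overflow bucket, and sends the fifth keyword to the global overflow bucket, while only one disk block is ever allocated.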
The search method of the hash index is as follows:
1) Determine a keyword for the information to be looked up;
2) Read the mapping table;
3) Determine whether the index bucket to be searched is on disk, i.e. whether the value of the storage location equals the maximum value of 8 bytes. If it equals the maximum value of 8 bytes, the index bucket to be searched is not stored on disk and the search ends. If it does not, the index bucket to be searched is stored on disk; go to step 4);
4) If the value is not equal to the maximum value of 8 bytes, obtain from the mapping table the number of the index bucket to be searched and the number of the disk block that contains it;
5) Search within the bucket. If the keyword is found, the search ends; if it is not found, search the in-block overflow bucket of the disk block;
6) If the keyword is found in the in-block overflow bucket, the search ends; if it is not found there, search the global overflow bucket, after which the search ends.
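The search order above (basic bucket, then in-block overflow bucket, then global overflow bucket) can be sketched as follows. This is an illustration under assumptions: the function name `lookup`, the modulo hash, the dictionary layout of a block and the sample data are all hypothetical, and Python lists stand in for disk blocks.

```python
EMPTY = 0xFFFFFFFFFFFFFFFF  # maximum 8-byte value: no index bucket stored on disk
NUM_SLOTS = 8               # assumed size of the mapping table


def lookup(key, mapping, blocks, global_overflow):
    """Return where the keyword was found, or None if it is not indexed."""
    h = key % NUM_SLOTS          # step 1: keyword -> hash value (assumed hash)
    c = mapping[h]               # step 2: read the mapping table
    if c == EMPTY:               # step 3: no bucket on disk, search ends
        return None
    block = blocks[c]            # step 4: disk block holding the bucket
    if key in block["bucket"]:   # step 5: search the basic bucket
        return "bucket"
    if key in block["overflow"]: # step 6: search the in-block overflow bucket
        return "block overflow"
    if key in global_overflow:   # step 6, continued: global overflow bucket
        return "global overflow"
    return None


# Hand-built sample index: one allocated block for hash value 3.
mapping = [EMPTY] * NUM_SLOTS
mapping[3] = 0
blocks = [{"bucket": [3, 11], "overflow": [19]}]
global_overflow = [27]
```

A lookup for a hash value whose mapping entry is still the 8-byte maximum returns immediately without touching any block, which is the case the mapping table is meant to make cheap.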
When storing and retrieving mass data, the index file itself is large and the hash index takes up considerable space. To keep the hash index file as small as possible and to improve disk utilization and file read performance, the invention provides a bucket mapping table, which avoids empty buckets in the hash index file. To reduce the number of disk reads during data retrieval, the invention provides cache management, which improves memory utilization: when the data of a bucket is already in memory, it is accessed directly from memory and the disk read is avoided. To reduce the performance degradation caused by hash collisions, the invention provides in-block overflow buckets and global overflow buckets based on the disk block structure, which reduces the disk reads and writes caused by collisions and improves efficiency. Experiments show that the invention has high practical value.
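To make the role of the bucket mapping table concrete, the sketch below stores each entry as one unsigned 8-byte integer and uses the maximum 8-byte value to mark hash values that have no bucket on disk. The helper names, the little-endian byte order and the example sizes are assumptions made for illustration; the source only states that the storage location is compared against the maximum value of 8 bytes.

```python
import struct

EMPTY = 0xFFFFFFFFFFFFFFFF    # maximum 8-byte value: no bucket allocated
ENTRY = struct.Struct("<Q")   # one unsigned 8-byte integer (byte order assumed)


def pack_mapping(mapping):
    """Serialize the mapping table, 8 bytes per hash value."""
    return b"".join(ENTRY.pack(c) for c in mapping)


def unpack_mapping(raw):
    """Read the mapping table back from its serialized form."""
    return [ENTRY.unpack_from(raw, off)[0] for off in range(0, len(raw), ENTRY.size)]


def allocated_blocks(mapping):
    """Count the hash values that actually have a bucket on disk."""
    return sum(1 for c in mapping if c != EMPTY)


# Example: 1000 possible hash values, only three of them in use.
mapping = [EMPTY] * 1000
mapping[7], mapping[42], mapping[999] = 0, 1, 2
raw = pack_mapping(mapping)
```

In this example the table costs 8 bytes per hash value, and the index file needs three disk blocks instead of one block for each of the 1000 hash values, which is the sense in which empty buckets are avoided.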
The present invention makes full use of disk and memory, reduces the number of disk reads and writes, and improves the efficiency of mass data storage and retrieval.
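The cache management mentioned above, a cache mapping table managed through a doubly linked list, can be sketched as follows. The source does not state a replacement policy; least-recently-used eviction is assumed here, and the names `BlockCache`, `Node` and `read_block` are hypothetical.

```python
class Node:
    """One cached disk block, linked into a doubly linked list."""
    __slots__ = ("block_no", "data", "prev", "next")

    def __init__(self, block_no=None, data=None):
        self.block_no = block_no
        self.data = data
        self.prev = None
        self.next = None


class BlockCache:
    def __init__(self, capacity, read_block):
        self.capacity = capacity      # assumed to be at least 1
        self.read_block = read_block  # called on a miss; stands in for a disk read
        self.table = {}               # cache mapping table: block number -> node
        self.head = Node()            # sentinel on the most recently used side
        self.tail = Node()            # sentinel on the least recently used side
        self.head.next = self.tail
        self.tail.prev = self.head

    def _unlink(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _push_front(self, node):
        node.next = self.head.next
        node.prev = self.head
        self.head.next.prev = node
        self.head.next = node

    def get(self, block_no):
        node = self.table.get(block_no)
        if node is not None:          # hit: served from memory, no disk read
            self._unlink(node)
            self._push_front(node)
            return node.data
        data = self.read_block(block_no)      # miss: read the block
        if len(self.table) >= self.capacity:  # evict the least recently used block
            victim = self.tail.prev
            self._unlink(victim)
            del self.table[victim.block_no]
        node = Node(block_no, data)
        self.table[block_no] = node
        self._push_front(node)
        return data
```

The dictionary gives constant-time lookup by block number, and the linked list lets a block be moved to the front or evicted from the back without scanning the cache.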
Description of drawings
Fig. 1 is a flowchart of index creation.
Fig. 2 is a flowchart of index search.
Embodiment:
A multi-level bucket hash indexing method for mass data retrieval comprises a creation method and a search method for the hash index. The creation method of the hash index is as follows:
1) Determine a keyword for the information to be indexed;
2) In computer memory, build the mapping table of index buckets, which maps the hash value h of the keyword to the storage location c of the index bucket on disk;
3) Determine whether the index bucket is on disk, i.e. whether the value of the storage location equals the maximum value of 8 bytes. If it equals the maximum value of 8 bytes, no index bucket has been stored on disk; continue with step 4). If it does not, an index bucket is already stored on disk; go to step 7);
4) When no index bucket is stored on disk, create a new disk block d on disk and store the information, set up a new index bucket, and determine the sequence number of the new index bucket within disk block d;
5) Update the mapping table so that c = d;
6) Update the disk and store the data again;
7) When an index bucket is already stored on disk, determine the sequence number of that index bucket within its disk block;
8) Determine whether the index bucket has enough space to store the new keyword. If it does, go to step 6). If it does not, the keyword overflows from this index bucket and is stored in the in-block overflow bucket of the disk block. If the in-block overflow bucket does not have enough space either, the keyword overflows from the in-block overflow bucket and is stored in the global overflow bucket.
The search method of the hash index is as follows:
1) Determine a keyword for the information to be looked up;
2) Read the mapping table;
3) Determine whether the index bucket to be searched is on disk, i.e. whether the value of the storage location equals the maximum value of 8 bytes. If it equals the maximum value of 8 bytes, the index bucket to be searched is not stored on disk and the search ends. If it does not, the index bucket to be searched is stored on disk; go to step 4);
4) If the value is not equal to the maximum value of 8 bytes, obtain from the mapping table the number of the index bucket to be searched and the number of the disk block that contains it;
5) Search within the bucket. If the keyword is found, the search ends; if it is not found, search the in-block overflow bucket of the disk block;
6) If the keyword is found in the in-block overflow bucket, the search ends; if it is not found there, search the global overflow bucket, after which the search ends.
Claims (1)
1. A multi-level bucket hash indexing method for mass data retrieval, characterized in that the method comprises a creation method and a search method for the hash index, the creation method of the hash index being as follows:
1) Determine a keyword for the information to be indexed;
2) In computer memory, build the mapping table of index buckets, which maps the hash value h of the keyword to the storage location c of the index bucket on disk;
3) Determine whether the index bucket is on disk, i.e. whether the value of the storage location equals the maximum value of 8 bytes; if it equals the maximum value of 8 bytes, no index bucket has been stored on disk, and step 4) follows; if it does not, an index bucket is already stored on disk, and step 7) follows;
4) When no index bucket is stored on disk, create a new disk block d on disk and store the information, set up a new index bucket, and determine the sequence number of the new index bucket within disk block d;
5) Update the mapping table so that c = d;
6) Update the disk and store the data again;
7) When an index bucket is already stored on disk, determine the sequence number of that index bucket within its disk block;
8) Determine whether the index bucket has enough space to store the new keyword; if it does, go to step 6); if it does not, the keyword overflows from this index bucket and is stored in the in-block overflow bucket of the disk block; if the in-block overflow bucket does not have enough space either, the keyword overflows from the in-block overflow bucket and is stored in the global overflow bucket;
the search method of the hash index being as follows:
1) Determine a keyword for the information to be looked up;
2) Read the mapping table;
3) Determine whether the index bucket to be searched is on disk, i.e. whether the value of the storage location equals the maximum value of 8 bytes; if it equals the maximum value of 8 bytes, the index bucket to be searched is not stored on disk and the search ends; if it does not, the index bucket to be searched is stored on disk, and step 4) follows;
4) If the value is not equal to the maximum value of 8 bytes, obtain from the mapping table the number of the index bucket to be searched and the number of the disk block that contains it;
5) Search within the bucket; if the keyword is found, the search ends; if it is not found, search the in-block overflow bucket of the disk block;
6) If the keyword is found in the in-block overflow bucket, the search ends; if it is not found there, search the global overflow bucket, after which the search ends.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009102561033A CN101782922B (en) | 2009-12-29 | 2009-12-29 | Multi-level bucket hashing index method for searching mass data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101782922A (en) | 2010-07-21 |
CN101782922B (en) | 2012-01-18 |
Family
ID=42522921
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2009102561033A Active CN101782922B (en) | 2009-12-29 | 2009-12-29 | Multi-level bucket hashing index method for searching mass data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101782922B (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8379525B2 (en) * | 2010-09-28 | 2013-02-19 | Microsoft Corporation | Techniques to support large numbers of subscribers to a real-time event |
CN101937474A (en) * | 2010-10-14 | 2011-01-05 | 广州从兴电子开发有限公司 | Mass data query method and device |
CN102890675B (en) * | 2011-07-18 | 2015-05-13 | 阿里巴巴集团控股有限公司 | Method and device for storing and finding data |
CN102779180B (en) * | 2012-06-29 | 2015-09-09 | 华为技术有限公司 | The operation processing method of data-storage system, data-storage system |
CN104182409B (en) * | 2013-05-24 | 2018-01-19 | 腾讯科技(深圳)有限公司 | A kind of method and device optimized to multistage Hash |
CN104639570A (en) * | 2013-11-06 | 2015-05-20 | 南京中兴新软件有限责任公司 | Resource object storage processing method and device |
CN105653568A (en) * | 2014-12-04 | 2016-06-08 | 中兴通讯股份有限公司 | Method and apparatus analyzing user behaviors |
CN105320775B (en) * | 2015-11-11 | 2019-05-14 | 中科曙光信息技术无锡有限公司 | The access method and device of data |
CN105975587B (en) * | 2016-05-05 | 2019-05-10 | 诸葛晴凤 | A kind of high performance memory database index organization and access method |
CN108572958B (en) * | 2017-03-07 | 2022-07-29 | 腾讯科技(深圳)有限公司 | Data processing method and device |
CN108255958B (en) * | 2017-12-21 | 2022-05-03 | 百度在线网络技术(北京)有限公司 | Data query method, device and storage medium |
CN111338569A (en) * | 2020-02-16 | 2020-06-26 | 西安奥卡云数据科技有限公司 | Object storage back-end optimization method based on direct mapping |
CN112612419B (en) * | 2020-12-25 | 2022-10-25 | 西安交通大学 | Data storage structure, storage method, reading method, device and medium of NVM (non-volatile memory) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1940922A (en) * | 2005-09-30 | 2007-04-04 | 腾讯科技(深圳)有限公司 | Method and system for improving information search speed |
CN101359325A (en) * | 2007-08-01 | 2009-02-04 | 北京启明星辰信息技术有限公司 | Multi-key-word matching method for rapidly analyzing content |
CN101464901A (en) * | 2009-01-16 | 2009-06-24 | 华中科技大学 | Object search method in object storage device |
Also Published As
Publication number | Publication date |
---|---|
CN101782922A (en) | 2010-07-21 |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
C56 | Change in the name or address of the patentee | |
CP03 | Change of name, title or address | Address after: No. 128 Bole Road, Hi-tech Zone, Ji'nan, Shandong, 250101. Patentee after: SHANDONG SHANDA OUMA SOFTWARE CO., LTD. Address before: No. 1318 Tianchen Avenue, High-tech Zone, Ji'nan City, Shandong Province, 250101. Patentee before: Shandong Shanda Ouma Software Co., Ltd. |