CN101782922A - Multi-level bucket hashing index method for searching mass data - Google Patents
- Publication number
- CN101782922A
- Authority
- CN
- China
- Prior art keywords
- bucket
- index
- disk
- retrieval
- bytes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a multi-level bucket hash index method for mass data retrieval, in the field of mass data storage. Its features are: (1) a bucket mapping table reduces the space that the hash index file occupies on disk; (2) the hash index uses multi-level buckets: the size of each disk block is an integral multiple of the sector size, and each disk block holds one or more basic buckets, may hold an in-block overflow bucket, and may be backed by global overflow buckets; (3) a data cache structure for the index file provides a cache mapping table, and the cache is managed through a doubly linked list over that mapping table. The mapping table reduces the disk space occupied by the index file. Because the disk block size is an integral multiple of the disk sector size and block data is cached, the number of disk reads and writes is reduced, and both memory utilization and data retrieval efficiency are improved.
Description
Technical field
The present invention relates to a multi-level bucket hash indexing method for mass data retrieval, and belongs to the technical field of data storage and retrieval.
Background technology
Retrieval efficiency is an important indicator for mass data storage and service application systems, and indexing plays an important role in organizing data storage space and in retrieval. Large databases and data storage application systems currently all support hash table indexing. As data sources grow rapidly, how to obtain the information of interest quickly and accurately has become a central concern. The sheer volume of data therefore places higher demands on retrieval technology, and techniques for information retrieval, filtering, and extraction have gradually become a research focus. An important advantage of a hash index is that its retrieval efficiency does not degrade as the data volume grows; the main factors affecting hash performance are the number of disk reads and writes and hash collisions. There are currently two main kinds of hash index: static hash indexes and dynamic hash indexes.
Summary of the invention
In view of the deficiencies of the prior art, the invention provides a multi-level bucket hash indexing method for mass data retrieval.
The multi-level bucket hash indexing method for mass data retrieval comprises a creation method and a retrieval method for the hash index. The creation method is as follows:
1) Determine a key for the information to be indexed;
2) In computer memory, build the mapping table of index buckets, which relates the hash value h of the key to the storage location c of the index bucket on disk;
3) Determine whether the index bucket is already on disk, i.e. whether the storage location value equals the maximum value of 8 bytes. If it equals the maximum value of 8 bytes, no index bucket has been stored on disk; continue with step 4). If it does not, an index bucket has already been stored on disk; go to step 7);
4) When no index bucket has been stored on disk, create a new disk block d on disk and store the information, set up a new index bucket, and determine the sequence number of the new index bucket within disk block d;
5) Update the mapping table so that c = d;
6) Update the disk to store the data;
7) When an index bucket has already been stored on disk, determine the sequence number of this index bucket within its disk block;
8) Determine whether this index bucket has enough space to store the new key. If it does, go to step 6). If it does not, the key overflows from this index bucket and is stored in the overflow bucket within the disk block. If the in-block overflow bucket does not have enough space either, the key overflows from it and is stored in the global overflow bucket.
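The creation steps above can be sketched in Python as follows. This is a minimal in-memory illustration, not the patented implementation: the class name MultiLevelBucketIndex, the CRC32-based hash, the capacities, and the choice of exactly one basic bucket per disk block (so that c = d directly and the bucket's sequence number is always 0) are assumptions made for the example, and the "disk" is simulated by a Python list.

```python
import zlib

# Maximum value of 8 bytes: marks "no index bucket stored on disk yet".
EMPTY = 0xFFFFFFFFFFFFFFFF


class MultiLevelBucketIndex:
    """Hypothetical sketch of the creation method (steps 1 to 8)."""

    def __init__(self, table_size=8, bucket_capacity=2, overflow_capacity=2):
        self.table_size = table_size
        self.bucket_capacity = bucket_capacity
        self.overflow_capacity = overflow_capacity
        self.mapping = [EMPTY] * table_size  # step 2: hash value h -> location c
        self.blocks = []                     # simulated disk blocks
        self.global_overflow = []            # last-resort overflow bucket

    def hash_value(self, key):
        # Assumed hash function; the source does not name one.
        return zlib.crc32(key.encode("utf-8")) % self.table_size

    def insert(self, key):
        """Store key and report which bucket level received it."""
        h = self.hash_value(key)                      # steps 1-2
        c = self.mapping[h]
        if c == EMPTY:                                # step 3: nothing on disk
            d = len(self.blocks)                      # step 4: new disk block d
            self.blocks.append({"bucket": [key], "overflow": []})
            self.mapping[h] = d                       # step 5: c = d
            return "bucket"                           # step 6 would write block d
        block = self.blocks[c]                        # step 7: existing bucket
        if len(block["bucket"]) < self.bucket_capacity:      # step 8
            block["bucket"].append(key)
            return "bucket"
        if len(block["overflow"]) < self.overflow_capacity:
            block["overflow"].append(key)
            return "in-block overflow"
        self.global_overflow.append(key)
        return "global overflow"


index = MultiLevelBucketIndex(table_size=1)  # one slot, so every key collides
levels = [index.insert(k) for k in ["a", "b", "c", "d", "e"]]
print(levels)
```

With table_size=1 every key collides, so the five inserts fill the basic bucket first, then the in-block overflow bucket, and only the last key reaches the global overflow bucket.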
The retrieval method of the hash index is as follows:
1) Determine a key for the information to be retrieved;
2) Read the mapping table;
3) Determine whether the index bucket to be retrieved is on disk, i.e. whether the storage location value equals the maximum value of 8 bytes. If it does, no such index bucket is stored on disk and retrieval ends. If it does not, the index bucket is stored on disk; go to step 4);
4) Since the value does not equal the maximum value of 8 bytes, obtain from the mapping table the number of the index bucket to be retrieved and the number of the disk block that holds it;
5) Search within the bucket. If the key is found, retrieval ends; if it is not found, search the overflow bucket within the disk block;
6) If the key is found in the in-block overflow bucket, retrieval ends; if it is not found there, search the global overflow bucket, and retrieval ends.
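The retrieval steps above can be sketched in the same way. The function name lookup, the dictionary layout of a block, and the hand-built sample data are assumptions for illustration only; the caller is assumed to have already computed the hash value h of the key.

```python
# Maximum value of 8 bytes: marks "no index bucket stored on disk".
EMPTY = 0xFFFFFFFFFFFFFFFF


def lookup(key, h, mapping, blocks, global_overflow):
    """Hypothetical sketch of retrieval: return the level where key is found, or None."""
    c = mapping[h]                     # step 2: read the mapping table
    if c == EMPTY:                     # step 3: no bucket on disk, retrieval ends
        return None
    block = blocks[c]                  # step 4: locate the disk block
    if key in block["bucket"]:         # step 5: search the basic bucket
        return "bucket"
    if key in block["overflow"]:       # step 6: search the in-block overflow bucket
        return "in-block overflow"
    if key in global_overflow:         # step 6: fall back to the global overflow bucket
        return "global overflow"
    return None


# Hand-built sample state: hash value 0 maps to block 0, hash value 1 is unused.
mapping = [0, EMPTY]
blocks = [{"bucket": ["a", "b"], "overflow": ["c"]}]
global_overflow = ["e"]

for key in ["a", "c", "e", "z"]:
    print(key, lookup(key, 0, mapping, blocks, global_overflow))
```

Only a key that misses both the basic bucket and the in-block overflow bucket needs the global overflow bucket, so most lookups touch a single disk block.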
When storing and retrieving mass data, the index file itself is large and the hash index occupies considerable space. To keep the hash index file as small as possible and to improve disk utilization and file read performance, the invention provides a bucket mapping table, which avoids empty buckets in the hash index file. To reduce the number of disk reads during retrieval, the invention provides cache management, which improves memory utilization: when a bucket's data is already in memory, it is accessed directly from memory and the disk read is avoided. To limit the performance degradation caused by hash collisions, the invention provides in-block overflow buckets and global overflow buckets based on the disk block structure, which reduce the disk reads and writes caused by collisions and improve efficiency. Experiments show that the invention has high practical value.
The invention makes full use of disk and memory, reduces the number of disk reads and writes, and improves the efficiency of mass data storage and retrieval.
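The cache management described above can be illustrated with a small block cache. This is a sketch under stated assumptions: the class name BlockCache, the read_block callback, and the least-recently-used eviction policy are not given in the source, which only says that the cache is managed through a doubly linked list over a cache mapping table. Python's OrderedDict is used here as a stand-in for that combination of a mapping and an ordered list.

```python
from collections import OrderedDict


class BlockCache:
    """Hypothetical cache of disk blocks keyed by block number."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block   # assumed callable: block number -> block data
        self.entries = OrderedDict()   # cache mapping table, least recently used first
        self.disk_reads = 0

    def get(self, block_no):
        if block_no in self.entries:
            # Hit: serve from memory and mark the block as most recently used.
            self.entries.move_to_end(block_no)
            return self.entries[block_no]
        # Miss: one disk read, then cache the block.
        self.disk_reads += 1
        data = self.read_block(block_no)
        self.entries[block_no] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used block
        return data


cache = BlockCache(capacity=2, read_block=lambda n: f"block-{n}")
for n in [0, 1, 0, 2, 0]:
    cache.get(n)
print(cache.disk_reads, list(cache.entries))
```

In this run, five requests cost three simulated disk reads, because the two repeated requests for block 0 are served from memory.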
Description of drawings
Fig. 1 is a flowchart of index creation.
Fig. 2 is a flowchart of index retrieval.
Embodiment:
A multi-level bucket hash indexing method for mass data retrieval comprises a creation method and a retrieval method for the hash index. The creation method is as follows:
1) Determine a key for the information to be indexed;
2) In computer memory, build the mapping table of index buckets, which relates the hash value h of the key to the storage location c of the index bucket on disk;
3) Determine whether the index bucket is already on disk, i.e. whether the storage location value equals the maximum value of 8 bytes. If it equals the maximum value of 8 bytes, no index bucket has been stored on disk; continue with step 4). If it does not, an index bucket has already been stored on disk; go to step 7);
4) When no index bucket has been stored on disk, create a new disk block d on disk and store the information, set up a new index bucket, and determine the sequence number of the new index bucket within disk block d;
5) Update the mapping table so that c = d;
6) Update the disk to store the data;
7) When an index bucket has already been stored on disk, determine the sequence number of this index bucket within its disk block;
8) Determine whether this index bucket has enough space to store the new key. If it does, go to step 6). If it does not, the key overflows from this index bucket and is stored in the overflow bucket within the disk block. If the in-block overflow bucket does not have enough space either, the key overflows from it and is stored in the global overflow bucket.
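Two storage details of this embodiment, the 8-byte storage location with its all-ones "not on disk" marker and the disk block size being an integral multiple of the sector size, can be illustrated as follows. The 512-byte sector size, the little-endian encoding, and the function names are assumptions for the example; the source does not specify an on-disk format.

```python
import struct

SECTOR_SIZE = 512            # assumed sector size in bytes
EMPTY = 0xFFFFFFFFFFFFFFFF   # maximum value of 8 bytes: no bucket on disk


def block_size_for(payload_bytes, sector_size=SECTOR_SIZE):
    """Round a payload size up to a whole number of sectors (at least one)."""
    sectors = -(-payload_bytes // sector_size)  # ceiling division
    return max(sectors, 1) * sector_size


def pack_mapping(entries):
    """Serialize mapping table entries as unsigned 8-byte little-endian integers."""
    return struct.pack(f"<{len(entries)}Q", *entries)


def unpack_mapping(raw):
    """Inverse of pack_mapping."""
    return list(struct.unpack(f"<{len(raw) // 8}Q", raw))


table = [EMPTY, 3, EMPTY]
raw = pack_mapping(table)
print(len(raw), unpack_mapping(raw) == table, block_size_for(513))
```

Each mapping entry costs 8 bytes whether or not a bucket exists, while disk blocks are allocated only for hash values that actually hold keys, which is how the mapping table avoids storing empty buckets in the index file.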
The retrieval method of the hash index is as follows:
1) Determine a key for the information to be retrieved;
2) Read the mapping table;
3) Determine whether the index bucket to be retrieved is on disk, i.e. whether the storage location value equals the maximum value of 8 bytes. If it does, no such index bucket is stored on disk and retrieval ends. If it does not, the index bucket is stored on disk; go to step 4);
4) Since the value does not equal the maximum value of 8 bytes, obtain from the mapping table the number of the index bucket to be retrieved and the number of the disk block that holds it;
5) Search within the bucket. If the key is found, retrieval ends; if it is not found, search the overflow bucket within the disk block;
6) If the key is found in the in-block overflow bucket, retrieval ends; if it is not found there, search the global overflow bucket, and retrieval ends.
Claims (1)
1. A multi-level bucket hash indexing method for mass data retrieval, characterized in that the method comprises a creation method and a retrieval method for the hash index, the creation method being as follows:
1) Determine a key for the information to be indexed;
2) In computer memory, build the mapping table of index buckets, which relates the hash value h of the key to the storage location c of the index bucket on disk;
3) Determine whether the index bucket is already on disk, i.e. whether the storage location value equals the maximum value of 8 bytes. If it equals the maximum value of 8 bytes, no index bucket has been stored on disk; continue with step 4). If it does not, an index bucket has already been stored on disk; go to step 7);
4) When no index bucket has been stored on disk, create a new disk block d on disk and store the information, set up a new index bucket, and determine the sequence number of the new index bucket within disk block d;
5) Update the mapping table so that c = d;
6) Update the disk to store the data;
7) When an index bucket has already been stored on disk, determine the sequence number of this index bucket within its disk block;
8) Determine whether this index bucket has enough space to store the new key. If it does, go to step 6). If it does not, the key overflows from this index bucket and is stored in the overflow bucket within the disk block. If the in-block overflow bucket does not have enough space either, the key overflows from it and is stored in the global overflow bucket.
The retrieval method of the hash index is as follows:
1) Determine a key for the information to be retrieved;
2) Read the mapping table;
3) Determine whether the index bucket to be retrieved is on disk, i.e. whether the storage location value equals the maximum value of 8 bytes. If it does, no such index bucket is stored on disk and retrieval ends. If it does not, the index bucket is stored on disk; go to step 4);
4) Since the value does not equal the maximum value of 8 bytes, obtain from the mapping table the number of the index bucket to be retrieved and the number of the disk block that holds it;
5) Search within the bucket. If the key is found, retrieval ends; if it is not found, search the overflow bucket within the disk block;
6) If the key is found in the in-block overflow bucket, retrieval ends; if it is not found there, search the global overflow bucket, and retrieval ends.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009102561033A CN101782922B (en) | 2009-12-29 | 2009-12-29 | Multi-level bucket hashing index method for searching mass data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009102561033A CN101782922B (en) | 2009-12-29 | 2009-12-29 | Multi-level bucket hashing index method for searching mass data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101782922A (en) | 2010-07-21 |
CN101782922B (en) | 2012-01-18 |
Family
ID=42522921
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009102561033A Active CN101782922B (en) | 2009-12-29 | 2009-12-29 | Multi-level bucket hashing index method for searching mass data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101782922B (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101937474A (en) * | 2010-10-14 | 2011-01-05 | 广州从兴电子开发有限公司 | Mass data query method and device |
CN102413067A (en) * | 2010-09-28 | 2012-04-11 | 微软公司 | Techniques to support large numbers of subscribers to a real-time event |
CN102779180A (en) * | 2012-06-29 | 2012-11-14 | 华为技术有限公司 | Operation processing method of data storage system and data storage system |
CN102890675A (en) * | 2011-07-18 | 2013-01-23 | 阿里巴巴集团控股有限公司 | Method and device for storing and finding data |
WO2014177080A1 (en) * | 2013-11-06 | 2014-11-06 | 中兴通讯股份有限公司 | Method and device for processing resource object storage |
CN104182409A (en) * | 2013-05-24 | 2014-12-03 | 腾讯科技(深圳)有限公司 | Method and device for optimizing multi-order hash |
CN105320775A (en) * | 2015-11-11 | 2016-02-10 | 中科曙光信息技术无锡有限公司 | Data access method and apparatus |
WO2016086580A1 (en) * | 2014-12-04 | 2016-06-09 | 中兴通讯股份有限公司 | Method and apparatus for analyzing user behaviors |
CN105975587A (en) * | 2016-05-05 | 2016-09-28 | 诸葛晴凤 | Method for organizing and accessing memory database index with high performance |
CN108255958A (en) * | 2017-12-21 | 2018-07-06 | 百度在线网络技术(北京)有限公司 | Data query method, apparatus and storage medium |
CN108572958A (en) * | 2017-03-07 | 2018-09-25 | 腾讯科技(深圳)有限公司 | Data processing method and device |
CN111338569A (en) * | 2020-02-16 | 2020-06-26 | 西安奥卡云数据科技有限公司 | Object storage back-end optimization method based on direct mapping |
CN112612419A (en) * | 2020-12-25 | 2021-04-06 | 西安交通大学 | Data storage structure, storage method, reading method, equipment and medium of NVM (non-volatile memory) |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100468400C (en) * | 2005-09-30 | 2009-03-11 | 腾讯科技(深圳)有限公司 | Method and system for improving information search speed |
CN101359325B (en) * | 2007-08-01 | 2010-06-16 | 北京启明星辰信息技术股份有限公司 | Multi-key-word matching method for rapidly analyzing content |
CN101464901B (en) * | 2009-01-16 | 2012-03-21 | 华中科技大学 | Object search method in object storage device |
-
2009
- 2009-12-29 CN CN2009102561033A patent/CN101782922B/en active Active
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102413067A (en) * | 2010-09-28 | 2012-04-11 | 微软公司 | Techniques to support large numbers of subscribers to a real-time event |
CN102413067B (en) * | 2010-09-28 | 2014-11-12 | 微软公司 | Method and device for supporting large numbers of subscribers to a real-time event |
CN101937474A (en) * | 2010-10-14 | 2011-01-05 | 广州从兴电子开发有限公司 | Mass data query method and device |
CN102890675A (en) * | 2011-07-18 | 2013-01-23 | 阿里巴巴集团控股有限公司 | Method and device for storing and finding data |
CN102890675B (en) * | 2011-07-18 | 2015-05-13 | 阿里巴巴集团控股有限公司 | Method and device for storing and finding data |
CN102779180A (en) * | 2012-06-29 | 2012-11-14 | 华为技术有限公司 | Operation processing method of data storage system and data storage system |
CN102779180B (en) * | 2012-06-29 | 2015-09-09 | 华为技术有限公司 | The operation processing method of data-storage system, data-storage system |
CN104182409B (en) * | 2013-05-24 | 2018-01-19 | 腾讯科技(深圳)有限公司 | A kind of method and device optimized to multistage Hash |
CN104182409A (en) * | 2013-05-24 | 2014-12-03 | 腾讯科技(深圳)有限公司 | Method and device for optimizing multi-order hash |
WO2014177080A1 (en) * | 2013-11-06 | 2014-11-06 | 中兴通讯股份有限公司 | Method and device for processing resource object storage |
CN104639570A (en) * | 2013-11-06 | 2015-05-20 | 南京中兴新软件有限责任公司 | Resource object storage processing method and device |
WO2016086580A1 (en) * | 2014-12-04 | 2016-06-09 | 中兴通讯股份有限公司 | Method and apparatus for analyzing user behaviors |
CN105320775A (en) * | 2015-11-11 | 2016-02-10 | 中科曙光信息技术无锡有限公司 | Data access method and apparatus |
CN105320775B (en) * | 2015-11-11 | 2019-05-14 | 中科曙光信息技术无锡有限公司 | The access method and device of data |
CN105975587A (en) * | 2016-05-05 | 2016-09-28 | 诸葛晴凤 | Method for organizing and accessing memory database index with high performance |
CN105975587B (en) * | 2016-05-05 | 2019-05-10 | 诸葛晴凤 | A kind of high performance memory database index organization and access method |
CN108572958A (en) * | 2017-03-07 | 2018-09-25 | 腾讯科技(深圳)有限公司 | Data processing method and device |
CN108255958A (en) * | 2017-12-21 | 2018-07-06 | 百度在线网络技术(北京)有限公司 | Data query method, apparatus and storage medium |
CN111338569A (en) * | 2020-02-16 | 2020-06-26 | 西安奥卡云数据科技有限公司 | Object storage back-end optimization method based on direct mapping |
CN112612419A (en) * | 2020-12-25 | 2021-04-06 | 西安交通大学 | Data storage structure, storage method, reading method, equipment and medium of NVM (non-volatile memory) |
CN112612419B (en) * | 2020-12-25 | 2022-10-25 | 西安交通大学 | Data storage structure, storage method, reading method, device and medium of NVM (non-volatile memory) |
Also Published As
Publication number | Publication date |
---|---|
CN101782922B (en) | 2012-01-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C56 | Change in the name or address of the patentee | ||
CP03 | Change of name, title or address | Address after: No. 128 Bole Road, High-tech Zone, Jinan, Shandong 250101. Patentee after: SHANDONG SHANDA OUMA SOFTWARE CO., LTD. Address before: No. 1318 Tianchen Avenue, High-tech Zone, Jinan, Shandong 250101. Patentee before: Shandong Shanda Ouma Software Co., Ltd. |