CN117312239A - Method for storing data index based on combination mode - Google Patents
- Publication number
- CN117312239A (application number CN202311288969.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- index
- storing
- combination
- indexes based
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/11—File system administration, e.g. details of archiving or snapshots
- G06F16/122—File system administration, e.g. details of archiving or snapshots using management policies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1744—Redundancy elimination performed by the file system using compression, e.g. sparse files
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for storing a data index based on a combination of storage modes, comprising the following steps. Step S1: acquire the data stored in a database and the number of data records. Step S2: evaluate the number of data records. Step S3: if the number of records is less than or equal to 20 million, store the data index in memory as a HashMap. Step S4: if the number of records is greater than 20 million, store the data in a SlimTrie, write the SlimTrie data index to an index file at a specified time period, and limit the size of that file. By quickly determining the storage mode, the invention reduces the amount of data scanned and the access time, which improves query efficiency; storing the index data with a combination of SlimTrie and HashMap reduces storage space, provides a better user experience, and saves cost.
Description
Technical Field
The present invention relates to a method for storing data indexes, and more particularly, to a method for storing data indexes based on a combination manner.
Background
In data processing, locating data quickly is a key technology. An index is a common way of doing so: through the index, the position of a given piece of data can be found quickly. Indexing techniques are widely used in databases, geographic information systems, logistics management, and similar fields. As the demand for real-time data grows, files are also becoming smaller and more numerous. In short-video and live-streaming services, for example, a video is often only a few hundred KB in size, or even a few tens of KB. In the prior art, a tree index, for example, uses the intermediate nodes and branches of the tree to divide the full set of keys into smaller parts; the intermediate nodes store only keys, and all data is stored in the leaf nodes, so the complete keys must be held in memory. Existing methods for storing a data index have the following defects:
1. if 100 TB of data is stored entirely as 10 KB files, the index data needed just to manage those files can reach 10,000 GB of memory. When managing very large amounts of metadata, the index data for the files therefore occupies a huge amount of memory;
2. because the complete keys must be stored in memory, traversal is time-consuming and lookup performance is poor.
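The figures in item 1 can be checked with a few lines of arithmetic. This is only a sketch: the source does not say whether TB, GB, and KB are decimal or binary units, so decimal units are assumed here, and all variable names are illustrative.

```python
# Worked arithmetic for the 100 TB example (decimal units are an assumption).
KB = 10**3
GB = 10**9
TB = 10**12

total_data = 100 * TB        # total data volume from the text
file_size = 10 * KB          # every file is assumed to be 10 KB
index_memory = 10_000 * GB   # index memory quoted in the text

# Number of files needed to hold the data, and index bytes implied per file.
file_count = total_data // file_size
index_bytes_per_file = index_memory // file_count

print(file_count)            # number of 10 KB files
print(index_bytes_per_file)  # implied index bytes per file
```

Under these assumptions the example corresponds to ten billion files and about 1 KB of index data per file, which is the overhead the method sets out to reduce.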
Disclosure of Invention
To overcome the above defects of the prior art, the invention provides a method for storing data indexes based on a combination mode.
In order to solve the technical problems, the invention adopts the following technical scheme: a method of storing a data index based on a combination, comprising:
step S1: acquiring the data stored in a database and the number of data records;
step S2: evaluating the number of data records;
step S3: if the number of data records is less than or equal to 20 million, storing the data index in memory as a HashMap;
step S4: if the number of data records is greater than 20 million, storing the data index in a SlimTrie, writing the SlimTrie index data to an index file at a specified time period, and limiting the size of the index file.
Further, in step S3 of the method for storing data indexes based on the combination mode, the complete keys are stored in the in-memory HashMap; each stored key is mapped to a hash value by a hash function, and the index is built from those hash values.
Further, in step S4 of the method for storing data indexes based on the combination mode of the present invention:
the SlimTrie is generated from a standard Trie: invalid nodes in the standard Trie are pruned, and the memory overhead of the Trie is then compressed to obtain the SlimTrie;
pruning the invalid nodes in the standard Trie means pruning its single-branch nodes;
compressing the memory overhead of the Trie to obtain the SlimTrie means storing the data structure of the entire Trie in a compact array, thereby compressing the Trie;
the SlimTrie uses one index entry to identify several adjacent small files, thereby optimizing for small files;
the SlimTrie achieves distributed storage through key-value storage, in which the key is an auto-increment id and the value stores an id generated from the zxy tile coordinates by the following bit operation:
z<<58|x<<29|y;
where z represents the zoom level, x represents the abscissa and y represents the ordinate.
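The bit operation above can be sketched as follows. The expression itself comes from the text; the field widths are an inference from the shift amounts (29 bits each for x and y, and 6 bits for z if the id is to fit in 64 bits), and the function names and the decoder are hypothetical additions for illustration.

```python
# Sketch of the z<<58 | x<<29 | y tile id, assuming a 64-bit id layout.
X_SHIFT = 29
Z_SHIFT = 58
XY_MASK = (1 << 29) - 1   # x and y each get 29 bits (inferred from the shifts)
Z_MASK = (1 << 6) - 1     # 6 bits remain for z in a 64-bit value (assumption)


def tile_id(z: int, x: int, y: int) -> int:
    """Pack zoom level z, abscissa x and ordinate y into one integer id."""
    if not 0 <= z <= Z_MASK:
        raise ValueError("z does not fit in 6 bits")
    if not (0 <= x <= XY_MASK and 0 <= y <= XY_MASK):
        raise ValueError("x or y does not fit in 29 bits")
    return z << Z_SHIFT | x << X_SHIFT | y


def decode_tile_id(value: int) -> tuple[int, int, int]:
    """Inverse of tile_id: recover (z, x, y) from a packed id."""
    z = (value >> Z_SHIFT) & Z_MASK
    x = (value >> X_SHIFT) & XY_MASK
    y = value & XY_MASK
    return z, x, y


packed = tile_id(12, 3372, 1552)
print(decode_tile_id(packed))
```

The range checks matter because an x or y value wider than 29 bits would spill into the neighbouring field and produce an id that decodes to a different tile.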
Further, in step S4 of the method for storing data indexes based on the combination mode, storing the SlimTrie index data in the index file at the specified time period includes:
in practice, the time period is set to 5 to 10 minutes.
Further, in step S4 of the method for storing data indexes based on the combination mode, limiting the size of the index file includes:
the size of an index file is limited to 128M;
if the index file size exceeds 128M, the SlimTrie data index is stored in the next index file.
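The periodic flush and the size limit can be sketched as a small writer class. The text gives only the 5 to 10 minute period and the 128M limit; the class name, the file naming scheme, the reading of "128M" as 128 MiB, and the choice to roll over before a write would exceed the limit are all assumptions made for this illustration.

```python
import os
import time

MAX_INDEX_FILE_BYTES = 128 * 1024 * 1024  # "128M" read as 128 MiB (assumption)
FLUSH_PERIOD_SECONDS = 5 * 60             # 5 minutes, within the stated range


class IndexFileWriter:
    """Hypothetical writer: buffers serialized index entries in memory,
    flushes them periodically, and rolls over to the next index file
    when the current one would exceed the size limit."""

    def __init__(self, directory, max_bytes=MAX_INDEX_FILE_BYTES,
                 period=FLUSH_PERIOD_SECONDS, clock=time.monotonic):
        self.directory = directory
        self.max_bytes = max_bytes
        self.period = period
        self.clock = clock
        self.file_no = 0
        self.buffer = []
        self.last_flush = clock()

    def _path(self):
        # Illustrative naming scheme: index-000000.idx, index-000001.idx, ...
        return os.path.join(self.directory, f"index-{self.file_no:06d}.idx")

    def add(self, entry: bytes):
        """Buffer one entry; flush if the time period has elapsed."""
        self.buffer.append(entry)
        if self.clock() - self.last_flush >= self.period:
            self.flush()

    def flush(self):
        """Append buffered entries to disk, moving to the next file as needed."""
        for entry in self.buffer:
            path = self._path()
            size = os.path.getsize(path) if os.path.exists(path) else 0
            if size > 0 and size + len(entry) > self.max_bytes:
                self.file_no += 1
                path = self._path()
            with open(path, "ab") as f:
                f.write(entry)
        self.buffer.clear()
        self.last_flush = self.clock()
```

Reopening the file for every entry keeps the sketch short; a real writer would keep the file handle open and track its size in memory.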
The invention discloses a method for storing data indexes based on a combination mode, in which data index storage is implemented by combining HashMap and SlimTrie. The method quickly determines the storage mode from the volume of the data index and has the following beneficial effects:
1. by quickly determining the storage mode, the amount of data scanned and the access time are reduced, and because the SlimTrie supports distributed storage, query efficiency is improved;
2. storing the data index with a combination of SlimTrie and HashMap reduces memory usage, provides a better user experience for clients, and saves cost.
Drawings
FIG. 1 is a flow chart of an embodiment of the method of the present invention for selecting a storage mode according to data volume.
FIG. 2 is a diagram of the distributed SlimTrie storage of the present invention.
FIG. 3 is a performance comparison chart of the present invention.
Detailed Description
The invention will be described in further detail with reference to the drawings and the detailed description.
The technical principle of the invention is as follows:
Fig. 1 shows the implementation flowchart by which the method for storing data indexes based on the combination mode selects a storage mode according to data volume. The data index is stored using a combination of HashMap and SlimTrie, as follows:
Step S1: acquiring the data stored in a database and the number of data records;
step S2: evaluating the number of data records;
step S3: if the number of data records is less than or equal to 20 million, storing the data index in memory as a HashMap: each key to be stored is first mapped to a hash value by a hash function, the index is then built, and data can be located with a single lookup through the hash index; query efficiency is optimal when the number of records stored in the HashMap is within 20 million;
step S4: if the number of data records is greater than 20 million, storing the index data in a SlimTrie; writing the SlimTrie index data to an index file at a specified time period, where in practice the period is set to 5 to 10 minutes and experiments show that performance is best with a period of 5 minutes; limiting the size of the index file to a maximum of 128M; if the index file size exceeds 128M, the SlimTrie data index is stored in the next index file.
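Steps S1 to S4 amount to a threshold test on the record count. The sketch below shows that decision and the HashMap branch, using a Python dict as the hash table; the function names and the `(key, location)` record shape are hypothetical, and the SlimTrie branch is represented only by its name.

```python
HASHMAP_MAX_RECORDS = 20_000_000  # threshold from steps S3 and S4


def choose_index_backend(record_count: int) -> str:
    """Step S2: pick the storage mode from the number of data records."""
    return "hashmap" if record_count <= HASHMAP_MAX_RECORDS else "slimtrie"


def build_hash_index(records):
    """Step S3: keep complete keys in an in-memory hash table.

    `records` is an iterable of (key, location) pairs; the dict hashes each
    key internally, so a lookup locates the data in a single probe.
    """
    index = {}
    for key, location in records:
        index[key] = location
    return index


records = [("video/0001.mp4", 0), ("video/0002.mp4", 4096)]
backend = choose_index_backend(len(records))
if backend == "hashmap":
    index = build_hash_index(records)
    print(index["video/0002.mp4"])
```

Exactly 20 million records still selects the HashMap branch, matching the "less than or equal to" wording of step S3.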
Fig. 2 is a distributed SlimTrie storage diagram of a method for storing data indexes based on a combination mode according to a first exemplary embodiment of the present invention, which is a preferred embodiment of the method shown in Fig. 1. In this embodiment, for step S4, the index data is stored in a SlimTrie as shown in Fig. 2, as follows:
the SlimTrie supports distributed storage by adopting a key-value structure, in which the key is an auto-increment id and the value stores an id generated from the zxy tile coordinates by the following bit operation:
z<<58|x<<29|y;
where z represents the zoom level, x represents the abscissa and y represents the ordinate. A key-value database stores data as key-value pairs, with each key corresponding to a unique value. For example, the index data is stored on 3 nodes, each holding 20 million index entries; if one node stops running, the other two nodes can still locate the index position quickly. This makes index queries highly available and keeps the service running normally.
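The three-node example can be illustrated with a toy in-memory model. The text does not say whether the nodes hold full copies or shards; this sketch assumes full replication, since that is the simplest arrangement in which the remaining nodes can still answer a query. The class names, the `up` flag, and the write-to-all policy are hypothetical.

```python
class Node:
    """One storage node: a key-value map plus a liveness flag."""

    def __init__(self):
        self.store = {}
        self.up = True


class ReplicatedIndex:
    """Toy key-value index replicated to every node (assumption)."""

    def __init__(self, node_count=3):
        self.nodes = [Node() for _ in range(node_count)]
        self.next_id = 1  # auto-increment key, as described in the text

    def put(self, tile_id_value: int) -> int:
        """Store a tile id under the next auto-increment key on all nodes."""
        key = self.next_id
        self.next_id += 1
        for node in self.nodes:
            node.store[key] = tile_id_value
        return key

    def get(self, key: int):
        """Answer from the first node that is still running."""
        for node in self.nodes:
            if node.up:
                return node.store.get(key)
        raise RuntimeError("no index node is available")


idx = ReplicatedIndex()
key = idx.put(3 << 58 | 5 << 29 | 7)
idx.nodes[0].up = False          # simulate one node stopping
print(idx.get(key) == (3 << 58 | 5 << 29 | 7))
```

A production system would also need to handle writes while a node is down and bring it back in sync afterwards, which this sketch leaves out.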
Fig. 3 is a performance comparison chart of a method for storing data indexes based on a combination mode according to a second embodiment of the present invention, explained as follows:
suppose there are n keys, each of length k. "O()" denotes complexity, that is, how quickly the running time or space grows as the input size increases.
The SlimTrie is built on the basis of a standard Trie. Its generation is divided into three parts. First, an index of all keys is built with a Trie, and the Trie is then pruned, which reduces the size of the index data from O(n × k) to O(n). Second, a tree structure is normally implemented in memory with pointers, but a pointer occupies 8 bytes on a 64-bit system, so the memory overhead is at least 8*n bytes; the data structure of the whole Trie (internal nodes, user data and Step information) is therefore stored in a single compact array to reduce memory overhead. Third, small files are optimized by letting several adjacent small files share one index entry, which balances IO overhead against space overhead.
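The pruning step can be illustrated with a small pointer-based trie. This is a sketch of single-branch pruning only, not of the SlimTrie itself: it does not pack nodes into a compact array, and the `step` field is a simplified stand-in for the Step information mentioned above. All names are hypothetical.

```python
class TrieNode:
    __slots__ = ("children", "value", "step")

    def __init__(self):
        self.children = {}   # branch character -> child node
        self.value = None    # user data stored at the end of a key
        self.step = 1        # key characters consumed when entering this node


def build_trie(items):
    """Build a standard trie: one node per key character."""
    root = TrieNode()
    for key, value in items:
        node = root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.value = value
    return root


def prune(node):
    """Remove single-branch nodes that carry no value, recording in `step`
    how many characters each removed chain covered."""
    for ch, child in list(node.children.items()):
        while child.value is None and len(child.children) == 1:
            (grandchild,) = child.children.values()
            grandchild.step += child.step
            child = grandchild
        node.children[ch] = child
        prune(child)
    return node


def lookup(root, key):
    """Follow only the branch characters, skipping `step` positions each hop."""
    node, pos = root, 0
    while pos < len(key):
        node = node.children.get(key[pos])
        if node is None:
            return None
        pos += node.step
    return node.value if pos == len(key) else None


def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children.values())


trie = build_trie([("abcd", 1), ("abxy", 2), ("abxz", 3)])
before = count_nodes(trie)
prune(trie)
after = count_nodes(trie)
print(before, after, lookup(trie, "abxz"))
```

Because the skipped characters are no longer stored, a key that is not in the set can match an existing path in this sketch (for example `"azcd"` follows the same branches as `"abcd"`), so a caller would need to verify a hit against the full key kept with the data.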
To index n keys, at least log2(n) bits are needed to distinguish n different keys. From this it follows that the memory overhead of each file's index is independent of the key length, so the index size can be controlled by limiting n, that is, by splitting the entire key set into multiple subsets of a specified size. Each index entry occupies 10 bytes in total: the key information occupies 6 bytes and the value occupies 4 bytes. Because the space overhead of the SlimTrie depends only on the number of keys n and not on the key length, the space overhead of the SlimTrie in Fig. 3 is O(n) and its query time is O(log(n)). Here log() denotes the base-2 logarithm: each time the input size doubles, the number of operations increases by only one. In the complexity comparison, the SlimTrie has the lowest space overhead and query time, and it outperforms the other data structures compared in both space and performance.
The above embodiments do not limit the present invention, and the invention is not restricted to the above examples; its scope is defined by the following claims.
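The two quantities in this paragraph, the minimum number of bits needed to tell n keys apart and the 10 bytes per index entry, can be worked through for the 20 million threshold used elsewhere in the text. The function names are illustrative, and applying the 10-byte entry size to exactly 20 million keys is an example chosen here, not a figure from the source.

```python
ENTRY_BYTES = 6 + 4  # 6 bytes of key information + 4 bytes of value (from the text)


def min_bits_to_distinguish(n: int) -> int:
    """Smallest number of bits b with 2**b >= n, i.e. ceil(log2(n))."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1).bit_length()


def index_bytes(n: int) -> int:
    """Index size for n keys at 10 bytes per entry, independent of key length."""
    return n * ENTRY_BYTES


n = 20_000_000
print(min_bits_to_distinguish(n))  # bits needed to tell 20 million keys apart
print(index_bytes(n))              # bytes of index at 10 bytes per entry
```

Using integer `bit_length` rather than floating-point `log2` avoids rounding problems when n is an exact power of two.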
Claims (10)
1. A method for storing a data index based on a combination, comprising the steps of:
step S1: acquiring the data stored in a database and the number of data records;
step S2: evaluating the number of data records;
step S3: if the number of data records is less than or equal to 20 million, storing the data index in memory as a HashMap;
step S4: if the number of data records is greater than 20 million, storing the data in a SlimTrie, writing the SlimTrie index data to an index file at a specified time period, and limiting the size of the index file.
2. The method for storing data indexes based on a combination of claim 1, wherein: in step S3, the complete keys are stored in the in-memory HashMap, each stored key is mapped to a hash value by a hash function, and the index is built.
3. The method for storing data indexes based on a combination of claim 1, wherein: the SlimTrie in step S4 is generated from a standard Trie; after invalid nodes in the standard Trie are pruned, the memory overhead of the Trie is compressed to obtain the SlimTrie.
4. A method of storing data indexes based on a combination as claimed in claim 3, wherein: pruning the invalid nodes in the standard Trie means pruning the single-branch nodes in the standard Trie.
5. A method of storing data indexes based on a combination as claimed in claim 3, wherein: compressing the memory overhead of the Trie to obtain the SlimTrie means storing the data structure of the whole Trie in a compact array, thereby compressing the Trie.
6. A method of storing data indexes based on a combination as claimed in claim 3, wherein: the SlimTrie identifies several adjacent small files with one index entry, thereby optimizing for small files.
7. The method for storing data indexes based on a combination of claim 1, wherein: the SlimTrie in step S4 achieves distributed storage through key-value storage.
8. The method of storing data indexes based on a combination of claim 7, wherein: the key in the key-value storage is an auto-increment id, the value stores an id generated from the zxy tile coordinates, and the id is generated by the following bit operation:
z<<58|x<<29|y;
where z represents the zoom level, x represents the abscissa and y represents the ordinate.
9. The method for storing data indexes based on a combination of claim 1, wherein: in practice, the time period in said step S4 is set to 5 to 10 minutes.
10. The method for storing data indexes based on a combination of claim 1, wherein: in step S4, the size of the index file is limited to a maximum of 128M; if the index file size exceeds 128M, the SlimTrie data index is stored in the next index file.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311288969.9A CN117312239A (en) | 2023-10-08 | 2023-10-08 | Method for storing data index based on combination mode |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311288969.9A CN117312239A (en) | 2023-10-08 | 2023-10-08 | Method for storing data index based on combination mode |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117312239A (en) | 2023-12-29 |
Family
ID=89280834
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311288969.9A Pending CN117312239A (en) | 2023-10-08 | 2023-10-08 | Method for storing data index based on combination mode |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117312239A (en) |
-
2023
- 2023-10-08 CN CN202311288969.9A patent/CN117312239A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110413611B (en) | Data storage and query method and device | |
CN111125089B (en) | Time sequence data storage method, device, server and storage medium | |
US11899641B2 (en) | Trie-based indices for databases | |
KR101792168B1 (en) | Managing storage of individually accessible data units | |
KR100856245B1 (en) | File system device and method for saving and seeking file thereof | |
CN111177302B (en) | Service bill processing method, device, computer equipment and storage medium | |
US7895171B2 (en) | Compressibility estimation of non-unique indexes in a database management system | |
CN1838124A (en) | Method for rapidly positioning grid + T tree index in mass data memory database | |
CN112148928B (en) | Cuckoo filter based on fingerprint family | |
CN111046034A (en) | Method and system for managing memory data and maintaining data in memory | |
CN103077197A (en) | Data storing method and device | |
CN108009265B (en) | Spatial data indexing method in cloud computing environment | |
US11868328B2 (en) | Multi-record index structure for key-value stores | |
CN112380004B (en) | Memory management method, memory management device, computer readable storage medium and electronic equipment | |
CN111143373A (en) | Data processing method and device, electronic equipment and storage medium | |
CN117312239A (en) | Method for storing data index based on combination mode | |
CN110221778A (en) | Processing method, system, storage medium and the electronic equipment of hotel's data | |
CN114416741A (en) | KV data writing and reading method and device based on multi-level index and storage medium | |
US20060230054A1 (en) | On-line organization of data sets | |
US20130218851A1 (en) | Storage system, data management device, method and program | |
CN111382120B (en) | Small file management method, system and computer equipment | |
CN114398373A (en) | File data storage and reading method and device applied to database storage | |
CN117540056B (en) | Method, device, computer equipment and storage medium for data query | |
CN113946580B (en) | Massive heterogeneous log data retrieval middleware | |
CN118132513A (en) | Fingerprint data determining method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||