CN108153907B - Dictionary storage management method for realizing space optimization through 16-bit Trie tree - Google Patents
- Publication number
- CN108153907B (application CN201810046757.2A)
- Authority
- CN
- China
- Prior art keywords
- node
- dictionary
- data
- byte
- trie tree
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2219—Large Object storage; Management thereof
Abstract
The invention provides a dictionary storage management method that achieves space optimization through a 16-bit Trie tree. The method comprises the following steps: acquiring complete dictionary data; creating a management linked list and associating the dictionary data with the root node of the linked list; and, according to the dictionary data, constructing a 16-bit Trie tree from the root node of the linked list to store and manage the dictionary data. Compared with the prior art, the dictionary data of the 16-bit Trie tree is constructed under a mapping table structure, so the Trie tree is more compact in space while its complexity remains essentially unchanged, and dictionary construction, indexing, modification and deletion become faster. Compared with existing Trie trees, the proposed algorithm not only allows the dictionary to be modified and traversed at any time but also sorts the entries while the 16-bit Trie tree is being constructed, which reduces the cost of updating the dictionary and maintains high indexing efficiency.
Description
Technical Field
The invention belongs to the field of data structure and information management, and particularly relates to a dictionary storage management method for realizing space optimization through a 16-bit Trie tree.
Background
With the rapid development of the internet, the widespread adoption of smart mobile devices and, in particular, the arrival of the big data era, people are exposed to an ever-growing volume of information and need more of it than before. Faced with information that is too large and too varied, however, users can feel overwhelmed, and storing and managing such large-scale data files efficiently has become a new challenge. The Trie tree is a data structure often used for large-scale data file storage management. The traditional Trie tree is an efficient index tree that provides an effective organization for data retrieval. Its core idea is to trade space for time: it uses the common prefixes of character strings to reduce query time and to minimize unnecessary string comparisons.
Conventionally, a Trie tree is represented either by a two-dimensional array (matrix form) or by linked lists (list form). The matrix form contains many empty elements; it is a sparse data structure and consumes a large amount of space. The list form is more compact in space but slower to search. Aoe later proposed implementing the Trie with a double array. Although the double-array Trie algorithm effectively reduces the space wasted by the Trie structure, some problems remain. First, insertion is slow compared with dynamic retrieval methods, so frequent updates cannot be handled well. Second, the space efficiency of the double array decreases as the number of deletions increases, because the empty elements created by deletions are retained. In addition, in a dictionary storage management method based on a double-array Trie tree, every modification, traversal or similar operation must start at the root node and may require moving a large amount of already-constructed data.
Disclosure of Invention
To solve the above technical problems, the present invention provides a dictionary storage management method that achieves space optimization through a 16-bit Trie tree, comprising the following steps: acquiring complete dictionary data; creating a management linked list and associating the dictionary data with the root node of the linked list; and, according to the dictionary data, constructing a 16-bit Trie tree from the root node of the linked list to store and manage the dictionary data. Compared with the prior art, the dictionary data of the 16-bit Trie tree is constructed under a mapping table structure, so the Trie tree is more compact in space while its complexity remains essentially unchanged, and dictionary construction, indexing, modification and deletion become faster. Compared with existing Trie trees, the proposed algorithm not only allows the dictionary to be modified and traversed at any time but also sorts the entries while the 16-bit Trie tree is being constructed, which reduces the cost of updating the dictionary and maintains high indexing efficiency.
Accordingly, an embodiment of the invention discloses a dictionary storage management method that achieves space optimization through a 16-bit Trie tree. The method comprises the following steps: acquiring complete dictionary data; constructing a 16-bit Trie tree to store and manage the dictionary data; and constructing the LEAFSInfoMap mapping table of 16-bit Trie tree node (non-root node) information.
Preferably, constructing the 16-bit Trie tree includes the following steps:
a. acquiring the first byte of the dictionary data, looking up the corresponding root node according to the value of that byte, and associating the root node with the byte;
b. on the basis of step a, taking that node as the parent node and as the initial state;
c. taking the high four bits of the next byte of the dictionary data as a child node and associating it with the parent node;
d. taking the high-four-bit node as the parent node, taking the low four bits of the same byte as a child node, and associating it with that parent node;
e. acquiring the next byte of the dictionary data and repeating steps c and d until the last byte of the dictionary data has been acquired and step d has been executed for it;
f. for the last byte of the dictionary data, taking the low-four-bit node as the final state, which completes the construction of the tree.
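The byte handling in steps a to f can be illustrated with a minimal Python sketch. It only shows how one key is decomposed into a root byte followed by a path of 4-bit values; the function name `split_key` is hypothetical and does not appear in the patent.

```python
def split_key(key: bytes) -> tuple[int, list[int]]:
    """Return (root byte, nibble path) for one dictionary key.

    The first byte selects the root node (step a); every later byte
    contributes its high four bits (step c) and then its low four
    bits (step d) to the path below that root.
    """
    if not key:
        raise ValueError("key must contain at least one byte")
    path = []
    for byte in key[1:]:
        path.append(byte >> 4)    # high four bits, value 0..15
        path.append(byte & 0x0F)  # low four bits, value 0..15
    return key[0], path


root, path = split_key(b"\xe4\xb8\xad")
print(hex(root), [hex(n) for n in path])
```

A key of n bytes therefore maps to one root node plus 2(n - 1) non-root nodes, and the last entry of the path corresponds to the final state of step f.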
Preferably, the node information includes:
LEAFSInfo: child node list information;
nodeValue: the value represented by the current node;
endKey: whether the current node is a terminal node;
leafs: child node pointers and data pointer.
Preferably, the child node list information of the current node in the 16-bit Trie tree is represented by a 16-bit LEAFSInfo.
Preferably, the node value in the 16-bit Trie tree is represented by nodeValue, which ranges from 0 to 15.
Preferably, the child node pointers and the data pointer stored by a node in the 16-bit Trie tree are held in the array leafs, where each child node pointer points to a child node of the current node and the data pointer points to the data corresponding to the current node. When neither a child node nor a data pointer exists, the number of leafs elements is 0; when child nodes exist, the number of leafs elements is the number of child nodes plus 1 (the data pointer occupies one element).
Preferably, whether a node is a terminal node is represented by endKey: 0 denotes a non-terminal node and 1 denotes a terminal node.
Preferably, the LEAFSInfoMap mapping table enumerates the possible combinations of node (non-root node) information in the dictionary data, and is represented by a two-dimensional array of size 65536 × 17.
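The node fields listed above can be sketched as a small Python data class. This is an illustrative layout only: the snake_case attribute names and the range checks are assumptions, and Python object references stand in for the pointers of the original design.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Node:
    """One non-root node of the 16-bit Trie (illustrative sketch)."""
    node_value: int                 # nodeValue: 4-bit value, 0..15
    leafs_info: int = 0             # LEAFSInfo: 16-bit bitmap of existing children
    end_key: int = 0                # endKey: 1 for a terminal node, else 0
    leafs: List[Any] = field(default_factory=list)  # child references and data reference

    def __post_init__(self) -> None:
        if not 0 <= self.node_value <= 15:
            raise ValueError("nodeValue must be in 0..15")
        if not 0 <= self.leafs_info <= 0xFFFF:
            raise ValueError("LEAFSInfo must fit in 16 bits")
        if self.end_key not in (0, 1):
            raise ValueError("endKey must be 0 or 1")


node = Node(node_value=3)
print(node)
```

Because `leafs` starts empty and grows only when children or data are attached, a node without children or data holds no pointer slots at all.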
In the storage management method that achieves space optimization through a 16-bit Trie tree, the dictionary data of the 16-bit Trie tree is constructed under a mapping table structure, so the Trie tree is more compact in space while its complexity remains essentially unchanged, and dictionary construction, indexing, modification and deletion become faster. Compared with existing Trie trees, the proposed algorithm not only allows the dictionary to be modified and traversed at any time but also sorts the entries while the 16-bit Trie tree is being constructed, which reduces the cost of updating the dictionary and maintains high indexing efficiency.
It is to be understood that both the foregoing general description and the following detailed description are explanatory and exemplary and are intended to provide further explanation of the invention as claimed.
Drawings
Fig. 1 is a flowchart of a dictionary storage management method for implementing space optimization by a 16-bit Trie in an embodiment of the present invention.
Fig. 2 is a forest structure diagram of dictionary data storage for implementing a 16-bit Trie in an embodiment of the present invention.
Fig. 3 is a schematic flow chart of dictionary data storage for implementing a 16-bit Trie in the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the invention provides a dictionary storage management method for realizing space optimization through a 16-bit Trie tree.
Fig. 1 is a flowchart of a dictionary storage management method for implementing space optimization by using a 16-bit Trie according to an embodiment of the present invention.
Step S110, complete dictionary data is acquired.
For example, the dictionary stores the entry (key) data as shown in table 1 below:
(Anhui) | clearing away heat |
Anhui province | TSINGHUA University |
China | Qinghua garden |
Chinese song | Qinghuayuan (Chinese character of 'Qinghua') |
Chinese dance music | South mountain road |
China folk song | Nanjing |
Chinese opera | Nanjing people |
TABLE 1
Step S120, a 16-bit Trie tree is constructed to store and manage the dictionary data.
As shown in Fig. 2, the entries in Table 1 share some common prefixes (i.e., the same parent nodes), and a forest of trees can be formed according to these prefixes. The nodes of each tree are described as follows:
the dotted circle represents a terminal node of the tree (i.e., endKey is 1);
the solid circle represents a non-terminal node of the tree (i.e., endKey is 0);
the word formed along the path from the root node of the tree to a terminal node is a complete entry in the dictionary;
the word formed along the path from the root node of the tree to a non-terminal node is a common prefix of some entries in the dictionary.
Step S121, the construction of the above 16-bit Trie tree comprises the following steps:
Step S122, a. acquiring the first byte of the dictionary data, looking up the corresponding root node according to the value of that byte, and associating the root node with the byte;
Step S123, b. on the basis of step a, taking that node as the parent node and as the initial state;
Step S124, c. taking the high four bits of the next byte of the dictionary data as a child node and associating it with the parent node;
Step S125, d. taking the high-four-bit node as the parent node, taking the low four bits of the same byte as a child node, and associating it with that parent node;
Step S126, e. acquiring the next byte of the dictionary data and repeating steps c and d until the last byte of the dictionary data has been acquired and step d has been executed for it;
Step S127, f. for the last byte of the dictionary data, taking the low-four-bit node as the final state, which completes the construction of the tree.
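Steps S122 to S127 can be sketched end to end in Python as follows. This is a minimal illustration, not the patent's implementation: the forest is assumed to be a list of 256 root slots indexed by the first byte, `leafs` holds child nodes only (the data pointer is omitted), and a child's slot is computed by counting the lower set bits of LEAFSInfo instead of consulting a lookup table. The names `insert`, `contains`, `_child` and `_slot` are hypothetical.

```python
class Node:
    __slots__ = ("node_value", "leafs_info", "end_key", "leafs")

    def __init__(self, node_value):
        self.node_value = node_value  # root: first byte; otherwise 0..15
        self.leafs_info = 0           # 16-bit bitmap of existing children
        self.end_key = 0              # 1 if a dictionary entry ends here
        self.leafs = []               # children, ordered by ascending value


def _slot(info, value):
    """Index of child `value` in leafs: the number of set bits below it."""
    return bin(info & ((1 << value) - 1)).count("1")


def _child(node, value, create):
    bit = 1 << value
    if node.leafs_info & bit:
        return node.leafs[_slot(node.leafs_info, value)]
    if not create:
        return None
    child = Node(value)
    node.leafs.insert(_slot(node.leafs_info, value), child)
    node.leafs_info |= bit
    return child


def _nibbles(key):
    for byte in key[1:]:
        yield byte >> 4      # step c: high four bits
        yield byte & 0x0F    # step d: low four bits


def insert(roots, key):
    """Insert a non-empty bytes key into the forest `roots`."""
    if roots[key[0]] is None:             # steps a and b: pick the root
        roots[key[0]] = Node(key[0])
    node = roots[key[0]]
    for nib in _nibbles(key):             # step e: repeat c and d
        node = _child(node, nib, create=True)
    node.end_key = 1                      # step f: final state


def contains(roots, key):
    """Return True only if the non-empty key is a complete entry."""
    node = roots[key[0]]
    for nib in _nibbles(key):
        if node is None:
            return False
        node = _child(node, nib, create=False)
    return node is not None and node.end_key == 1


roots = [None] * 256
for word in (b"ab", b"abc", b"b"):
    insert(roots, word)
print(contains(roots, b"abc"), contains(roots, b"a"))
```

In this sketch a key that is only a prefix of stored entries, such as `b"a"`, reaches a node whose `end_key` is 0 and is therefore reported as absent.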
Therefore, in the constructed 16-bit Trie tree, the first byte of the dictionary data serves as a root node, and each subsequent byte is split into its high four bits and low four bits, forming a high-four-bit node and a low-four-bit node. Because the value of every non-root node therefore lies in the range 0-15, the entries of the dictionary forest can be managed through the LEAFSInfoMap mapping table and the LEAFSInfo of the current node. The node information is described in detail below.
Each node (non-root node) has at most 16 child nodes, and the child node information of a node is represented by a 16-bit LEAFSInfo. For example, a LEAFSInfo value of 0x0009 (binary 0000000000001001) indicates that the node has only the child nodes with values 0 and 3.
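The bitmap reading of LEAFSInfo can be checked with a short Python sketch, assuming that bit v of LEAFSInfo is set exactly when the child with value v exists (which matches the 0x0009 example). The helper names are hypothetical.

```python
def child_values(leafs_info: int) -> list[int]:
    """List the child values encoded in a 16-bit LEAFSInfo bitmap."""
    return [v for v in range(16) if (leafs_info >> v) & 1]


def to_leafs_info(values) -> int:
    """Build the LEAFSInfo bitmap for a collection of child values."""
    info = 0
    for v in values:
        info |= 1 << v
    return info


print(child_values(0x0009))           # [0, 3]
print(hex(to_leafs_info([0, 3])))     # 0x9
```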
Each node uses nodeValue to indicate the value it represents, which ranges from 0 to 15.
In each node, endKey indicates whether the current node is a terminal node: 0 denotes a non-terminal node and 1 denotes a terminal node.
Each node uses the leafs pointer array to hold its existing child node pointers and its data pointer. Unlike in a classical trie, the leafs pointer array is constructed dynamically: its size changes with the number of child nodes, up to a maximum of 17 elements. This avoids the space wasted in a classical trie, where every node holds a fixed number of child node pointers.
Step S130, the LEAFSInfoMap mapping table of 16-bit Trie tree node (non-root node) information is constructed. The table is a two-dimensional array of size 65536 × 17. The 65536 rows cover all possible values of the 16-bit LEAFSInfo (i.e., 2 to the 16th power). Within the row for a given LEAFSInfo, the elements LEAFSInfoMap[LEAFSInfo][0] to LEAFSInfoMap[LEAFSInfo][15] hold the information for the 16 possible child nodes, and the last element, LEAFSInfoMap[LEAFSInfo][16], holds the number of child nodes for that LEAFSInfo.
For example, if the value of LEAFSInfoMap[LEAFSInfo][0] is not zero (i.e., the current node has a child node with value 0), the child node with value 0 of the current node is located at position leafs[LEAFSInfoMap[LEAFSInfo][0] - 1]; otherwise, that child node does not exist.
The pseudo code of the step is as follows:
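A minimal Python sketch of how such a table could be built is given here; it is not the patent's own pseudo code. It assumes that each non-zero entry stores the 1-based position of the child in `leafs`, with children ordered by ascending value, which is consistent with the `- 1` offset in the example above; the ordering itself is an assumption. The name `build_leafs_info_map` is hypothetical.

```python
def build_leafs_info_map():
    """Build a 65536 x 17 table indexed as table[LEAFSInfo][value].

    Columns 0..15: 1-based position of child `value` in leafs,
    or 0 if that child does not exist. Column 16: number of children.
    """
    table = []
    for info in range(1 << 16):
        row = bytearray(17)
        rank = 0
        for value in range(16):
            if info & (1 << value):
                rank += 1
                row[value] = rank
        row[16] = rank
        table.append(bytes(row))  # one byte per entry keeps the table compact
    return table


LEAFS_INFO_MAP = build_leafs_info_map()
row = LEAFS_INFO_MAP[0x0009]
print(row[0], row[3], row[16])
```

With one byte per entry the table occupies 65536 × 17 bytes, slightly over 1 MiB, and it is built once and shared by all nodes.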
It can be seen that the LEAFSInfoMap mapping table enumerates all possible combinations of node (non-root node) information in the dictionary data. With the LEAFSInfoMap mapping table and the LEAFSInfo of a node, one can determine for every possible child node under that node whether it exists and find the data associated with it. Therefore, when a child node is inserted or deleted under a node, only the node's LEAFSInfo needs to be updated and the corresponding child node pointer added to or removed from the node's leafs array, which strengthens the storage management of the 16-bit Trie tree.
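The insert and delete update described above can be sketched in Python as follows. The sketch is hedged in two ways: `leafs` holds only child entries (the data pointer is left out), and the slot index is computed by counting lower set bits of LEAFSInfo, which yields the same index as a table entry minus one under the ascending-order assumption. The function names are hypothetical.

```python
def slot(info: int, value: int) -> int:
    """0-based position of child `value` among the children in leafs."""
    return bin(info & ((1 << value) - 1)).count("1")


def add_child(info: int, leafs: list, value: int, child) -> int:
    """Insert `child` into leafs in place and return the updated LEAFSInfo."""
    bit = 1 << value
    if info & bit:
        raise KeyError(f"child {value} already exists")
    leafs.insert(slot(info, value), child)
    return info | bit


def remove_child(info: int, leafs: list, value: int) -> int:
    """Remove child `value` from leafs in place and return the updated LEAFSInfo."""
    bit = 1 << value
    if not (info & bit):
        raise KeyError(f"child {value} does not exist")
    del leafs[slot(info, value)]
    return info & ~bit


info, leafs = 0, []
info = add_child(info, leafs, 3, "child-3")
info = add_child(info, leafs, 0, "child-0")
print(hex(info), leafs)
info = remove_child(info, leafs, 0)
print(hex(info), leafs)
```

Only the one node's bitmap and pointer array change; no other part of the tree has to be moved.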
From the above detailed description of the embodiments, it can be clearly understood that the storage management method that achieves space optimization through a 16-bit Trie tree constructs the dictionary data of the 16-bit Trie tree under a mapping table structure, so the Trie tree is more compact in space while its complexity remains essentially unchanged, and dictionary construction, indexing, modification and deletion become faster. Compared with existing Trie trees, the proposed algorithm not only allows the dictionary to be modified and traversed at any time but also sorts the entries while the 16-bit Trie tree is being constructed, which reduces the cost of updating the dictionary and maintains high indexing efficiency.
Claims (4)
1. A dictionary storage management method for realizing space optimization through a 16-bit Trie tree is characterized by comprising the following steps:
acquiring complete dictionary data;
constructing a 16-bit Trie tree to realize storage management of dictionary data;
constructing a 16-bit Trie tree node information LEAFSInfoMap mapping table;
the node information is non-root node information, and comprises the following steps: LEAFSInfo, nodeValue, endKey, leafs;
the LEAFSInfo is child node list information;
the information value of the LEAFSInfo is 16 bits;
the nodeValue is a value represented by the current node;
the nodeValue has a value of 0 to 15;
the endKey is a terminal node judgment function;
the leafs are child node pointers and data pointers;
the child node pointer points to a child node corresponding to the current node;
the data pointer points to data corresponding to the current node;
the data information of all child nodes under the node can be inquired through the LEAFSInfoMap and the LEAFSInfo;
the LEAFSInfoMap mapping table is a possible combination of the node information in the dictionary data, and the LEAFSInfoMap is represented by a 65536 × 17 two-dimensional array.
2. The method of claim 1, wherein the constructing the 16-bit Trie comprises the following steps:
a. acquiring a first byte of dictionary data information, searching a corresponding root node according to the value of the byte, and associating the root node with the byte;
b. on the basis of the step a, taking the byte as a parent node and taking the byte as a starting state;
c. associating the upper four bits of the next byte of dictionary data information with the parent node as its child node;
d. taking the high four bits as a father node, taking the low four bits of the byte as a child node, and associating the high four bits with the father node;
e. acquiring the next byte of the dictionary data again, repeating the steps c and d until the last byte of the dictionary data is acquired, and executing the step d;
f. and obtaining the last byte of the dictionary data, and taking the lower four bits as a final state to complete the construction of the tree.
3. The method as claimed in claim 1, wherein the number of leafs elements is 0 when neither the child node nor the data pointer exists, and is the number of child nodes plus 1 when child nodes exist, the data pointer occupying one element.
4. The method for managing dictionary storage based on 16-bit Trie implementation of space optimization according to claim 1, wherein when the endKey is 0, the endKey is represented as a non-terminal node, and when the endKey is 1, the endKey is represented as a terminal node.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810046757.2A CN108153907B (en) | 2018-01-18 | 2018-01-18 | Dictionary storage management method for realizing space optimization through 16-bit Trie tree |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108153907A CN108153907A (en) | 2018-06-12 |
CN108153907B true CN108153907B (en) | 2021-01-22 |
Family
ID=62461799
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810046757.2A Active CN108153907B (en) | 2018-01-18 | 2018-01-18 | Dictionary storage management method for realizing space optimization through 16-bit Trie tree |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108153907B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109902033B (en) * | 2019-02-13 | 2023-03-14 | 山东华芯半导体有限公司 | LBA (logical Block addressing) distribution method and mapping method of namespace applied to NVMe SSD (network video management entity) controller |
CN110489516B (en) * | 2019-08-15 | 2022-03-18 | 厦门铅笔头信息科技有限公司 | Method for quickly establishing prefix index for massive structured data |
CN113329031B (en) * | 2019-10-10 | 2023-06-13 | 深圳前海微众银行股份有限公司 | Method and device for generating state tree of block |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101499094A (en) * | 2009-03-10 | 2009-08-05 | 焦点科技股份有限公司 | Data compression storing and retrieving method and system |
CN101788990A (en) * | 2009-01-23 | 2010-07-28 | 北京金远见电脑技术有限公司 | Global optimization and construction method and system of TRIE double-array |
CN103365991A (en) * | 2013-07-03 | 2013-10-23 | 深圳市华傲数据技术有限公司 | Method for realizing dictionary memory management of Trie tree based on one-dimensional linear space |
CN103365992A (en) * | 2013-07-03 | 2013-10-23 | 深圳市华傲数据技术有限公司 | Method for realizing dictionary search of Trie tree based on one-dimensional linear space |
EP3145134A1 (en) * | 2014-06-10 | 2017-03-22 | Huawei Technologies Co., Ltd. | Lookup device, lookup configuration method and lookup method |
Also Published As
Publication number | Publication date |
---|---|
CN108153907A (en) | 2018-06-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP01 | Change in the name or title of a patent holder | ||
Address after: 310018, No. 258, source street, Xiasha Higher Education Park, Hangzhou, Zhejiang Patentee after: China Jiliang University Patentee after: Hangzhou code pigeon Intelligent Technology Co.,Ltd. Address before: 310018, No. 258, source street, Xiasha Higher Education Park, Hangzhou, Zhejiang Patentee before: China Jiliang University Patentee before: HANGZHOU DAIMAGE INTELLIGENT TECHNOLOGY Co.,Ltd. |