CN108153907B - Dictionary storage management method for realizing space optimization through 16-bit Trie tree - Google Patents

Publication number
CN108153907B
CN108153907B CN201810046757.2A CN201810046757A CN108153907B CN 108153907 B CN108153907 B CN 108153907B CN 201810046757 A CN201810046757 A CN 201810046757A CN 108153907 B CN108153907 B CN 108153907B
Authority
CN
China
Prior art keywords
node
dictionary
data
byte
trie tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810046757.2A
Other languages
Chinese (zh)
Other versions
CN108153907A (en)
Inventor
肖英
屈晓芳
张宇
龚德浪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou code pigeon Intelligent Technology Co.,Ltd.
China Jiliang University
Original Assignee
Hangzhou Daimage Intelligent Technology Co ltd
China Jiliang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Daimage Intelligent Technology Co ltd and China Jiliang University
Priority to CN201810046757.2A
Publication of CN108153907A
Application granted
Publication of CN108153907B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2219 Large Object storage; Management thereof

Abstract

The invention provides a dictionary storage management method that achieves space optimization through a 16-bit Trie tree. The method comprises the following steps: acquiring complete dictionary data; creating a management linked list and associating the dictionary data with the root node of the linked list; and, according to the dictionary data, constructing a 16-bit Trie tree from the root node of the linked list to realize storage management of the dictionary data. Because the dictionary data of the 16-bit Trie tree is organized around a mapping table structure, the Trie tree is more compact in space while its complexity remains essentially unchanged, and dictionary construction, indexing, modification and deletion become faster. Compared with existing Trie trees, the proposed algorithm can modify and traverse the dictionary at any time and also keeps the entries sorted while the 16-bit Trie tree is being constructed, which reduces the cost of updating the dictionary and maintains high indexing efficiency.

Description

Dictionary storage management method for realizing space optimization through 16-bit Trie tree
Technical Field
The invention belongs to the field of data structure and information management, and particularly relates to a dictionary storage management method for realizing space optimization through a 16-bit Trie tree.
Background
In modern society, with the rapid development of the internet and the widespread adoption of smart mobile devices, and especially with the arrival of the big data era, the demand for information of all kinds keeps growing. Faced with information that is both voluminous and varied, however, users can feel overwhelmed, and storing and managing such large-scale data files efficiently has become a new challenge. The Trie tree is a data structure often used in large-scale data file storage management. The traditional Trie tree is an efficient index tree that provides an effective organization for data retrieval. Its core idea is to trade space for time: it uses the common prefixes of strings to reduce query time, which improves efficiency and minimizes meaningless string comparisons.
Conventionally, a Trie tree is represented either by a two-dimensional array (matrix form) or by a linked list (list form). The matrix form contains many empty elements and is a sparse data structure, so its space consumption is large. The list form is more compact in space but slower in retrieval. Later, Aoe proposed implementing the Trie with double arrays. Although the double-array Trie algorithm effectively reduces the space wasted by the Trie structure, some problems remain. First, insertion is slow compared with dynamic retrieval methods, so frequent updates cannot be handled. Second, the space efficiency of the double array decreases as the number of deletions increases, because the empty elements created by deletions are retained. In addition, dictionary storage management based on a double-array Trie has a further problem: every operation such as modification or traversal must start at the root node and may require moving large amounts of already constructed data.
Disclosure of Invention
In order to solve the above technical problems, the present invention provides a dictionary storage management method that achieves space optimization through a 16-bit Trie tree, comprising the following steps: acquiring complete dictionary data; creating a management linked list and associating the dictionary data with the root node of the linked list; and, according to the dictionary data, constructing a 16-bit Trie tree from the root node of the linked list to realize storage management of the dictionary data. Because the dictionary data of the 16-bit Trie tree is organized around a mapping table structure, the Trie tree is more compact in space while its complexity remains essentially unchanged, and dictionary construction, indexing, modification and deletion become faster. Compared with existing Trie trees, the proposed algorithm can modify and traverse the dictionary at any time and also keeps the entries sorted while the 16-bit Trie tree is being constructed, which reduces the cost of updating the dictionary and maintains high indexing efficiency.
Accordingly, an embodiment of the invention discloses a dictionary storage management method that achieves space optimization through a 16-bit Trie tree. The method comprises the following steps: acquiring complete dictionary data; constructing a 16-bit Trie tree to realize storage management of the dictionary data; and constructing the LEAFSInfoMap mapping table of 16-bit Trie tree node (non-root node) information.
Preferably, the creating of the 16-bit Trie includes the following steps:
a. acquiring the first byte of the dictionary data, finding the corresponding root node according to the value of that byte, and associating the root node with the byte;
b. on the basis of step a, taking that node as the parent node and as the initial state;
c. taking the high four bits of the next byte of the dictionary data as a child node and associating it with the parent node;
d. taking the high-four-bit node as the parent node, taking the low four bits of the same byte as a child node, and associating the child node with that parent node;
e. acquiring the next byte of the dictionary data and repeating steps c and d until the last byte of the dictionary data has been acquired and step d has been executed for it;
f. for the last byte of the dictionary data, taking the low-four-bit node as the final state, which completes the construction of the tree.
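Steps a to f amount to turning each key into one root byte followed by a sequence of 4-bit values. The following is a minimal sketch of that decomposition only, not the patent's own code; the function name `key_to_path` is hypothetical.

```python
from typing import List, Tuple


def key_to_path(key: bytes) -> Tuple[int, List[int]]:
    """Split a key into (root byte, list of 4-bit node values).

    Hypothetical helper: the first byte selects the root node (step a),
    and every later byte contributes two nodes, its high four bits
    (step c) followed by its low four bits (step d).
    """
    if not key:
        raise ValueError("key must contain at least one byte")
    root = key[0]
    nibbles: List[int] = []
    for b in key[1:]:
        nibbles.append(b >> 4)     # step c: high four bits
        nibbles.append(b & 0x0F)   # step d: low four bits
    return root, nibbles


root, path = key_to_path(bytes([0x41, 0x3C, 0xF0]))
print(hex(root), path)  # 0x41 [3, 12, 15, 0]
```

A key of n bytes therefore produces a path of 2(n - 1) non-root nodes, each with a value from 0 to 15.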
Preferably, the node information includes:
LEAFSInfo: child node list information;
nodeValue: the value represented by the current node;
endKey: whether the current node is a terminal node;
leafs: child node pointers and data pointer.
Preferably, the child node list information of the current node in the 16-bit Trie is represented by LEAFSInfo, which is 16 bits in size.
Preferably, the value of a node in the 16-bit Trie is represented by nodeValue, which ranges from 0 to 15.
Preferably, the child node pointers and the data pointer currently stored by a node in the 16-bit Trie are held in the array leafs. Each child node pointer points to a child node of the current node, and the data pointer points to the data associated with the current node. When the node has neither child nodes nor a data pointer, leafs has 0 elements; when child nodes and a data pointer are both present, the number of elements is the number of child nodes plus 1 (the data pointer occupies one element).
Preferably, whether a node is a terminal node is indicated by endKey: 0 denotes a non-terminal node and 1 denotes a terminal node.
Preferably, the LEAFSInfoMap mapping table enumerates all possible combinations of node (non-root node) information in the dictionary data and is represented by a two-dimensional array of size 65536 × 17.
In the storage management method that achieves space optimization through a 16-bit Trie tree, the dictionary data of the 16-bit Trie tree is organized around the mapping table structure, so the Trie tree is more compact in space while its complexity remains essentially unchanged, and dictionary construction, indexing, modification and deletion become faster. Compared with existing Trie trees, the proposed algorithm can modify and traverse the dictionary at any time and also keeps the entries sorted while the 16-bit Trie tree is being constructed, which reduces the cost of updating the dictionary and maintains high indexing efficiency.
It is to be understood that both the foregoing general description and the following detailed description are explanatory and exemplary and are intended to provide further explanation of the invention as claimed.
Drawings
Fig. 1 is a flowchart of a dictionary storage management method for implementing space optimization by a 16-bit Trie in an embodiment of the present invention.
Fig. 2 is a forest structure diagram of dictionary data storage for implementing a 16-bit Trie in an embodiment of the present invention.
Fig. 3 is a schematic flow chart of dictionary data storage for implementing a 16-bit Trie in the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the invention provides a dictionary storage management method for realizing space optimization through a 16-bit Trie tree.
Fig. 1 is a flowchart of a dictionary storage management method for implementing space optimization by using a 16-bit Trie according to an embodiment of the present invention.
Step S110, complete dictionary data is acquired.
For example, the dictionary stores the entry (key) data as shown in table 1 below:
(Anhui) clearing away heat
Anhui province TSINGHUA University
China Qinghua garden
Chinese song Qinghuayuan (Chinese character of 'Qinghua')
Chinese dance music South mountain road
China folk song Nanjing
Chinese opera Nanjing people
TABLE 1
Step S120, a 16-bit Trie tree is constructed to realize storage management of the dictionary data.
As shown in fig. 2, the entries in table 1 share some common prefixes (i.e. the same parent nodes), and a forest can be formed from these prefixes. The nodes of each tree are described as follows:
the dotted circle represents a terminal node of the tree (i.e. the endKey is 1);
the solid line circle represents a non-terminal node of the tree (i.e., the endKey is 0);
the word formed from the root node of the tree to a terminal node is a complete entry in the dictionary;
the word formed from the root node of the tree to a non-terminal node is a common prefix of some entries in the dictionary.
Step S121, the construction of the above 16-bit Trie tree comprises the following steps:
Step S122, a. acquiring the first byte of the dictionary data, finding the corresponding root node according to the value of that byte, and associating the root node with the byte;
Step S123, b. on the basis of step a, taking that node as the parent node and as the initial state;
Step S124, c. taking the high four bits of the next byte of the dictionary data as a child node and associating it with the parent node;
Step S125, d. taking the high-four-bit node as the parent node, taking the low four bits of the same byte as a child node, and associating the child node with that parent node;
Step S126, e. acquiring the next byte of the dictionary data and repeating steps c and d until the last byte of the dictionary data has been acquired and step d has been executed for it;
Step S127, f. for the last byte of the dictionary data, taking the low-four-bit node as the final state, which completes the construction of the tree.
Therefore, in the constructed 16-bit Trie tree, the first byte of the dictionary data serves as the root node, and each subsequent byte is split into its high four bits and low four bits, which form a high-four-bit node and a low-four-bit node. Because the value of every non-root node therefore lies between 0 and 15, the entries of the dictionary forest can be managed through the LEAFSInfoMap mapping table and the LEAFSInfo of the current node. The node information is described in detail below.
Each node (non-root node) has at most 16 child nodes, and the information about which child nodes exist is represented by LEAFSInfo, which is 16 bits in size. For example, a LEAFSInfo value of 0x0009 (binary 0000000000001001) indicates that the node has only the child nodes with values 0 and 3.
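The bitmap reading of LEAFSInfo can be checked with a few lines of code. This is an illustrative sketch that assumes bit v of LEAFSInfo (counting from the least significant bit) marks the presence of the child with value v, which is consistent with the 0x0009 example; the function names are hypothetical.

```python
from typing import List


def has_child(leafs_info: int, value: int) -> bool:
    """True if the bit for child `value` (0..15) is set in the 16-bit bitmap."""
    return bool((leafs_info >> value) & 1)


def child_values(leafs_info: int) -> List[int]:
    """List the values of all children recorded in the bitmap, in ascending order."""
    return [v for v in range(16) if has_child(leafs_info, v)]


print(format(0x0009, "016b"))   # 0000000000001001
print(child_values(0x0009))     # [0, 3]
```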
Each node records the value it represents in nodeValue, which ranges from 0 to 15.
Each node uses endKey to indicate whether it is a terminal node: 0 denotes a non-terminal node and 1 denotes a terminal node.
Each node uses the leafs pointer array to hold the pointers to its existing child nodes and its data pointer. In contrast to the classical trie, the leafs pointer array is built dynamically: its size changes with the number of child nodes, up to a maximum of 17 elements. This avoids the space wasted by the fixed number of child node pointers per node in the classical trie.
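Putting the four node fields together, a small in-memory version of the structure might look like the sketch below. It is an illustration under stated assumptions rather than the patented implementation: the class and function names are hypothetical, root nodes are kept in a plain dictionary keyed by the first byte, child pointers are assumed to be stored in ascending value order with the data pointer last, and a child's slot is computed by counting lower set bits instead of consulting the mapping table. Keys are assumed to be non-empty.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Node:
    node_value: int        # nodeValue (0..15 for non-root nodes)
    leafs_info: int = 0    # LEAFSInfo: 16-bit bitmap of existing children
    end_key: int = 0       # endKey: 1 if a dictionary entry ends at this node
    # leafs: child pointers in ascending value order, then (assumed) the data pointer
    leafs: List[Any] = field(default_factory=list)


def slot(leafs_info: int, value: int) -> int:
    """Index of child `value` in leafs: the number of existing smaller-valued children."""
    return bin(leafs_info & ((1 << value) - 1)).count("1")


def get_child(node: Node, value: int) -> Optional[Node]:
    if not (node.leafs_info >> value) & 1:
        return None
    return node.leafs[slot(node.leafs_info, value)]


def get_or_add_child(node: Node, value: int) -> Node:
    child = get_child(node, value)
    if child is None:
        child = Node(value)
        node.leafs.insert(slot(node.leafs_info, value), child)  # array grows by one
        node.leafs_info |= 1 << value
    return child


class Trie16:
    def __init__(self) -> None:
        self.roots: Dict[int, Node] = {}   # forest: one root per first byte (assumption)

    def insert(self, key: bytes, data: Any) -> None:
        node = self.roots.setdefault(key[0], Node(key[0]))
        for b in key[1:]:
            node = get_or_add_child(node, b >> 4)     # high four bits
            node = get_or_add_child(node, b & 0x0F)   # low four bits
        if node.end_key:
            node.leafs[-1] = data       # overwrite the existing data pointer
        else:
            node.end_key = 1
            node.leafs.append(data)     # the data pointer occupies one extra element

    def lookup(self, key: bytes) -> Any:
        node = self.roots.get(key[0])
        for b in key[1:]:
            for value in (b >> 4, b & 0x0F):
                if node is None:
                    return None
                node = get_child(node, value)
        if node is None or not node.end_key:
            return None
        return node.leafs[-1]


trie = Trie16()
trie.insert(b"nan", "entry 1")
trie.insert(b"nanjing", "entry 2")
print(trie.lookup(b"nanjing"))  # entry 2
print(trie.lookup(b"nanj"))     # None
```

A node with k children and a data pointer holds k + 1 elements in `leafs`, so the array never exceeds 17 elements and a leaf-level terminal node holds only its data pointer.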
Step S130, the LEAFSInfoMap mapping table of 16-bit Trie tree node (non-root node) information is constructed. The table is a two-dimensional array of size 65536 × 17. The 65536 rows cover all possible values of the 16-bit LEAFSInfo (2 to the 16th power). Within the row for a given LEAFSInfo value, the elements LEAFSInfoMap[LEAFSInfo][0] to LEAFSInfoMap[LEAFSInfo][15] hold the information for the 16 possible child nodes, and the last element, LEAFSInfoMap[LEAFSInfo][16], holds the number of child nodes for that LEAFSInfo value.
For example, if the value of LEAFSInfoMap[LEAFSInfo][0] is not zero (that is, the current node has a child node with value 0), the child node with value 0 is located at position LEAFSInfoMap[LEAFSInfo][0] - 1 of the leafs array; otherwise the current node has no such child node.
The pseudo code of the step is as follows:
[Pseudocode figures GDA0001706288870000041 and GDA0001706288870000051; images not reproduced in this text]
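Since the pseudocode is available only as images, the following is an independent sketch of how a table with the properties described in step S130 could be built. It assumes that each of the first 16 elements of a row stores the 1-based position of the corresponding child in the leafs array (0 meaning the child is absent) and that element 16 stores the child count; the function and variable names are hypothetical.

```python
from typing import Any, List, Optional


def build_leafs_info_map() -> List[List[int]]:
    """Build a 65536 x 17 table indexed by LEAFSInfo (assumed layout, see above)."""
    table: List[List[int]] = []
    for info in range(1 << 16):
        row = [0] * 17
        count = 0
        for value in range(16):
            if (info >> value) & 1:
                count += 1
                row[value] = count   # 1-based position of this child in leafs
        row[16] = count              # total number of children
        table.append(row)
    return table


LEAFS_INFO_MAP = build_leafs_info_map()


def find_child(leafs_info: int, leafs: List[Any], value: int) -> Optional[Any]:
    """Look up the child with the given value through the table."""
    pos = LEAFS_INFO_MAP[leafs_info][value]
    return leafs[pos - 1] if pos else None


print(LEAFS_INFO_MAP[0x0009][0], LEAFS_INFO_MAP[0x0009][3], LEAFS_INFO_MAP[0x0009][16])  # 1 2 2
print(find_child(0x0009, ["child 0", "child 3"], 3))  # child 3
```

The table is computed once and shared by all nodes, so locating a child costs two array accesses regardless of how many children the node has.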
It can be seen that the LEAFSInfoMap mapping table covers all possible combinations of node (non-root node) information in the dictionary data. With the LEAFSInfoMap mapping table and the LEAFSInfo of a node, it is possible to query, for every child node under that node, whether it exists and which data is associated with it. Therefore, when a child node is inserted or deleted under the node, only the LEAFSInfo needs to be updated and the corresponding child node pointer added to or removed from the node's leafs array, which strengthens the storage management of the 16-bit Trie tree.
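The local nature of an update can be illustrated as follows. This sketch works on a bare (LEAFSInfo, leafs) pair and computes the slot by counting lower set bits, which yields the same position that a precomputed table row would give; the function names are hypothetical, and the ordering of child pointers by ascending value is an assumption.

```python
from typing import Any, List


def position(leafs_info: int, value: int) -> int:
    """0-based slot of child `value`: the number of existing children with a smaller value."""
    return bin(leafs_info & ((1 << value) - 1)).count("1")


def add_child(leafs_info: int, leafs: List[Any], value: int, child: Any) -> int:
    """Insert (or replace) a child pointer and return the updated LEAFSInfo."""
    if (leafs_info >> value) & 1:
        leafs[position(leafs_info, value)] = child
    else:
        leafs.insert(position(leafs_info, value), child)
        leafs_info |= 1 << value
    return leafs_info


def remove_child(leafs_info: int, leafs: List[Any], value: int) -> int:
    """Delete a child pointer if present and return the updated LEAFSInfo."""
    if (leafs_info >> value) & 1:
        del leafs[position(leafs_info, value)]
        leafs_info &= ~(1 << value)
    return leafs_info


info, leafs = 0, []
info = add_child(info, leafs, 3, "child 3")
info = add_child(info, leafs, 0, "child 0")
print(hex(info), leafs)   # 0x9 ['child 0', 'child 3']
info = remove_child(info, leafs, 0)
print(hex(info), leafs)   # 0x8 ['child 3']
```

Only the affected node changes; no other part of the tree has to be moved, in contrast to the double-array layout discussed in the Background.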
From the above detailed description of the embodiments, it can be clearly understood that the storage management method of the present invention organizes the dictionary data of the 16-bit Trie tree around a mapping table structure, so the Trie tree is more compact in space while its complexity remains essentially unchanged, and dictionary construction, indexing, modification and deletion become faster. Compared with existing Trie trees, the proposed algorithm can modify and traverse the dictionary at any time and also keeps the entries sorted while the 16-bit Trie tree is being constructed, which reduces the cost of updating the dictionary and maintains high indexing efficiency.

Claims (4)

1. A dictionary storage management method for realizing space optimization through a 16-bit Trie tree is characterized by comprising the following steps:
acquiring complete dictionary data;
constructing a 16-bit Trie tree to realize storage management of dictionary data;
constructing a 16-bit Trie tree node information LEAFSInfoMap mapping table;
the node information is non-root node information and comprises: LEAFSInfo, nodeValue, endKey, leafs;
the LEAFSInfo is child node list information;
the information value of the LEAFSInfo is 16 bits;
the nodeValue is a value represented by the current node;
the nodeValue has a value of 0 to 15;
the endKey indicates whether the current node is a terminal node;
the leafs are child node pointers and data pointers;
the child node pointer points to a child node corresponding to the current node;
the data pointer points to data corresponding to the current node;
the data information of all child nodes under the node can be inquired through the LEAFSInfoMap and the LEAFSInfo;
the LEAFSInfoMap mapping table is a possible combination of the node information in the dictionary data, and the LEAFSInfoMap is represented by a 65536 × 17 two-dimensional array.
2. The method of claim 1, wherein the constructing the 16-bit Trie comprises the following steps:
a. acquiring a first byte of dictionary data information, searching a corresponding root node according to the value of the byte, and associating the root node with the byte;
b. on the basis of the step a, taking the byte as a parent node and taking the byte as a starting state;
c. associating the upper four bits of the next byte of dictionary data information with the parent node as its child node;
d. taking the high four bits as a father node, taking the low four bits of the byte as a child node, and associating the high four bits with the father node;
e. acquiring the next byte of the dictionary data again, repeating the steps c and d until the last byte of the dictionary data is acquired, and executing the step d;
f. obtaining the last byte of the dictionary data, and taking the lower four bits as a final state to complete the construction of the tree.
3. The method as claimed in claim 1, wherein the number of elements in leafs is 0 when both the child node and the data pointer are absent, and is the number of child nodes plus 1 when child nodes and the data pointer are present, the data pointer occupying one element.
4. The method for managing dictionary storage based on 16-bit Trie implementation of space optimization according to claim 1, wherein an endKey of 0 indicates a non-terminal node and an endKey of 1 indicates a terminal node.
CN201810046757.2A 2018-01-18 2018-01-18 Dictionary storage management method for realizing space optimization through 16-bit Trie tree Active CN108153907B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810046757.2A CN108153907B (en) 2018-01-18 2018-01-18 Dictionary storage management method for realizing space optimization through 16-bit Trie tree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810046757.2A CN108153907B (en) 2018-01-18 2018-01-18 Dictionary storage management method for realizing space optimization through 16-bit Trie tree

Publications (2)

Publication Number Publication Date
CN108153907A CN108153907A (en) 2018-06-12
CN108153907B true CN108153907B (en) 2021-01-22

Family

ID=62461799

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810046757.2A Active CN108153907B (en) 2018-01-18 2018-01-18 Dictionary storage management method for realizing space optimization through 16-bit Trie tree

Country Status (1)

Country Link
CN (1) CN108153907B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902033B (en) * 2019-02-13 2023-03-14 山东华芯半导体有限公司 LBA (logical Block addressing) distribution method and mapping method of namespace applied to NVMe SSD (network video management entity) controller
CN110489516B (en) * 2019-08-15 2022-03-18 厦门铅笔头信息科技有限公司 Method for quickly establishing prefix index for massive structured data
CN113329031B (en) * 2019-10-10 2023-06-13 深圳前海微众银行股份有限公司 Method and device for generating state tree of block

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101499094A (en) * 2009-03-10 2009-08-05 焦点科技股份有限公司 Data compression storing and retrieving method and system
CN101788990A (en) * 2009-01-23 2010-07-28 北京金远见电脑技术有限公司 Global optimization and construction method and system of TRIE double-array
CN103365991A (en) * 2013-07-03 2013-10-23 深圳市华傲数据技术有限公司 Method for realizing dictionary memory management of Trie tree based on one-dimensional linear space
CN103365992A (en) * 2013-07-03 2013-10-23 深圳市华傲数据技术有限公司 Method for realizing dictionary search of Trie tree based on one-dimensional linear space
EP3145134A1 (en) * 2014-06-10 2017-03-22 Huawei Technologies Co., Ltd. Lookup device, lookup configuration method and lookup method

Also Published As

Publication number Publication date
CN108153907A (en) 2018-06-12

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 310018, No. 258, source street, Xiasha Higher Education Park, Hangzhou, Zhejiang

Patentee after: China Jiliang University

Patentee after: Hangzhou code pigeon Intelligent Technology Co.,Ltd.

Address before: 310018, No. 258, source street, Xiasha Higher Education Park, Hangzhou, Zhejiang

Patentee before: China Jiliang University

Patentee before: HANGZHOU DAIMAGE INTELLIGENT TECHNOLOGY Co.,Ltd.