CN112784117B - Advanced radix tree construction method and construction system for mass data - Google Patents

Advanced radix tree construction method and construction system for mass data

Publication number
CN112784117B
Authority
CN
China
Prior art keywords
node
nodes
data
sub
radix tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110012488.XA
Other languages
Chinese (zh)
Other versions
CN112784117A (en)
Inventor
蒋文保
祁亚楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Information Science and Technology University
Original Assignee
Beijing Information Science and Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Information Science and Technology University
Priority to CN202110012488.XA
Publication of CN112784117A
Application granted
Publication of CN112784117B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9027 Trees
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides an advanced radix tree construction method and construction system for mass data, which address the technical problem that the processing efficiency of a radix tree decreases as mass data is continuously stored. The method comprises the following steps: establishing a radix tree data structure and storing data content in the nodes of the radix tree data structure; and, during data storage, compressing or branching the leaf nodes connected to a sub-node according to the weight of the sub-node and/or the change in the local tree shape around the sub-node. Compression or branching changes the topology among the local nodes at the ends of branches and reduces the branch depth of each complete data item as far as possible. The resulting advanced radix tree data structure has a low node depth, the tree topology is effectively balanced, the level depth of the nodes across the whole tree is further reduced, and the retrieval efficiency for mass data is greatly improved.

Description

Advanced radix tree construction method and construction system for mass data
Technical Field
The invention relates to the technical field of data storage, and in particular to an advanced radix tree construction method and construction system for mass data.
Background
In the prior art, storing mass data requires a data structure optimized for both retrieval efficiency and storage efficiency. Mass data of the character string type (or data that can be converted into character strings) is commonly stored in a radix tree (Radix Tree) data structure. As a tree structure, a radix tree is a nonlinear data structure with a hierarchy defined by branching relationships. As shown in the radix tree storage structure of FIG. 1, the radix tree data structure splits each character string key into one or more character groups, and each character string key corresponds to one branch. Nodes do not need to be split character by character; instead, the characters stored in the nodes along the branch from the root node to the current node are spliced together to give the (content) value of the character string key. The specific flow for constructing the Radix Tree structure is as follows:
1) For all character string keys to be stored, identifying the one or more characters, counted from the first character, that coincide between keys;
2) Splitting each character string key, in character order, into the characters that coincide with other character strings and the characters that do not;
3) In character order, storing the one or more characters shared by the character string keys (i.e., the overlapping characters) in a node and storing the remaining characters of each character string key in child nodes of that node; if the remaining characters of the character string keys still overlap, storing the overlapping characters in a further child node, and so on until no overlapping characters remain, at which point the last remaining characters of each key are stored in a leaf node.
Finally, splicing the characters stored along each branch, in order from the first node to the leaf node, yields the (content) value of the character string key.
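The construction flow above can be sketched in Python as a conventional radix tree. This is a minimal illustration of the prior-art structure only, not the patented method; the names `Node`, `insert`, and `keys` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
import os


@dataclass
class Node:
    label: str = ""                               # characters stored in this node
    children: dict = field(default_factory=dict)  # first character -> child Node
    is_key: bool = False                          # True if a stored key ends here


def insert(root: Node, key: str) -> None:
    """Insert `key`, splitting a node wherever only part of its label overlaps."""
    node = root
    while True:
        if not key:
            node.is_key = True
            return
        child = node.children.get(key[0])
        if child is None:
            # No overlap with any existing child: the rest of the key is a leaf.
            node.children[key[0]] = Node(label=key, is_key=True)
            return
        common = os.path.commonprefix([child.label, key])
        if len(common) < len(child.label):
            # Keep the overlapping characters in `child`; move the rest down.
            rest = Node(label=child.label[len(common):],
                        children=child.children, is_key=child.is_key)
            child.label = common
            child.children = {rest.label[0]: rest}
            child.is_key = False
        key = key[len(common):]
        node = child


def keys(node: Node, prefix: str = "") -> list:
    """Splice the labels along each branch to recover the stored keys."""
    full = prefix + node.label
    found = [full] if node.is_key else []
    for first_char in sorted(node.children):
        found.extend(keys(node.children[first_char], full))
    return found
```

Inserting "romane", "romanus", and "romulus" into an empty root leaves the shared characters "rom" in one node directly under the root, and `keys(root)` splices the branches back into the three original strings.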
The Radix Tree data storage structure meets the storage and retrieval requirements of mass data with local similarity well. However, because of the lengths and characteristics of the character strings, a Radix Tree tends to grow branches locally, so the overall branch structure becomes unbalanced and retrieval efficiency decreases as the data volume grows.
Disclosure of Invention
In view of the above problems, the embodiments of the present invention provide a method and a system for constructing an advanced radix tree for massive data, which solve the technical problem that the processing efficiency of the radix tree is reduced due to the continuous storage of the existing massive data.
The advanced radix tree construction method for mass data of the embodiment of the invention comprises the following steps:
establishing a radix tree data structure, and storing data content in the nodes of the radix tree data structure;
and, during data storage, compressing or branching the leaf nodes connected to a sub-node according to the weight of the sub-node and/or the change in the local tree shape around the sub-node.
The advanced radix tree construction system for mass data of the embodiment of the invention comprises:
a memory for storing the program code of the processing procedure of the advanced radix tree construction method for mass data;
and a processor for executing the program code.
In another form, the advanced radix tree construction system for mass data of the embodiment of the invention comprises:
a basic storage device for establishing a radix tree data structure and storing data content in the nodes of the radix tree data structure;
and an advanced adjusting device for compressing or branching the leaf nodes connected to a sub-node, during data storage, according to the weight of the sub-node and/or the change in the local tree shape around the sub-node.
According to the advanced radix tree construction method and construction system for mass data of the embodiments of the invention, compression or branching changes the topology among the local nodes at the ends of branches. While the data content is stored losslessly, the branch depth of each complete data item is reduced as far as possible and the number of sub-nodes at the same level is likely to increase. The resulting advanced radix tree data structure has a low node depth, the tree topology is effectively balanced, the level depth of the nodes across the whole tree is further reduced, and the retrieval efficiency for mass data is greatly improved.
Drawings
FIG. 1 is a schematic diagram of a radix tree data structure for data content storage.
FIG. 2 is a flow chart of an advanced radix tree construction method for mass data according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of the topology of leaf node compression in an advanced radix tree construction method for mass data according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of the topology of leaf node branching in an advanced radix tree construction method for mass data according to an embodiment of the present invention.
FIG. 5 is a flow chart of data storage in an advanced radix tree construction method for mass data according to an embodiment of the present invention.
FIG. 6 is a schematic diagram of the topology of the radix tree storage structure and the advanced radix tree storage structure during data content storage.
FIG. 7 is a flow chart of data deletion in an advanced radix tree construction method for mass data according to an embodiment of the present invention.
FIG. 8 is a schematic diagram of the topology of the radix tree storage structure and the advanced radix tree storage structure during data content deletion.
FIG. 9 is a flow chart of data updating in an advanced radix tree construction method for mass data according to an embodiment of the present invention.
FIG. 10 is a schematic diagram of the topology of the radix tree storage structure and the advanced radix tree storage structure during data content updating.
FIG. 11 is a schematic diagram of a data storage structure of ID data of a trust anchor system established using an advanced radix tree construction method for mass data according to an embodiment of the present invention.
FIG. 12 is a schematic diagram of the architecture of an advanced radix tree construction system for mass data according to an embodiment of the present invention.
Detailed Description
The present invention will be further described with reference to the drawings and the detailed description below, in order to make the objects, technical solutions and advantages of the present invention more apparent. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
An advanced radix tree construction method for mass data according to an embodiment of the present invention is shown in FIG. 2. In FIG. 2, this embodiment includes:
Step 100: A radix tree data structure is established and data content is stored in nodes of the radix tree data structure.
In this technical scheme, the nodes that store data content are formed on the basis of the radix tree data structure. Those skilled in the art will appreciate that, in the radix tree data structure, a branch is formed by the starting root node, the hierarchical sub-nodes, and the terminal leaf node, and that one branch holds the complete content of one data item; the lowest-level sub-node among the hierarchical sub-nodes is the terminal sub-node. Depending on the characteristics of the specific application field, the data includes, but is not limited to, index data or key data, and the data content includes, but is not limited to, numeric data or character data.
Step 200: During data storage, the leaf nodes connected to a sub-node are compressed or branched according to the weight of the sub-node and/or the change in the local tree shape around the sub-node.
Those skilled in the art will appreciate that, as data is added or removed during data storage, both the importance of a terminal sub-node within the local topology and the local tree structure in which the terminal sub-node sits will change. Such changes can be quantified as changes in the weight of the terminal sub-node and changes in the local tree shape.
Compressing a leaf node means merging the data content of the leaf node losslessly into the corresponding terminal sub-node. Branching leaf nodes means pairing each leaf node with an independent copy of the corresponding terminal sub-node and merging the data content of the leaf node losslessly into that copy.
In the advanced radix tree construction method for mass data of the embodiment, compression or branching changes the topology among the local nodes at the ends of branches. While lossless storage of the data content is guaranteed, the branch depth of each complete data item is reduced as far as possible and the number of sub-nodes at the same level increases to some extent. The resulting advanced radix tree data structure has a low node depth, the tree topology is effectively balanced, the level depth of the nodes across the whole tree is further reduced, and the retrieval efficiency for mass data is greatly improved.
The topology of leaf node compression in the advanced radix tree construction method for mass data according to an embodiment of the invention is shown in FIG. 3. Referring to FIG. 2, in one embodiment of the present invention, compressing leaf nodes includes:
Step 210: When it is determined in the radix tree data structure that the terminal sub-node of a branch retains only one leaf node, the leaf node is merged into the terminal sub-node.
Those skilled in the art will appreciate that a branch may include several sub-nodes connected in series to form a hierarchy, and that the lowest-level sub-node, the one connected to the leaf nodes, is the terminal sub-node. The node types of adjacent nodes are relative: if the leaf nodes or next-level sub-nodes of the current sub-node disappear after data deletion, the current sub-node becomes a leaf node and the sub-node one level above it becomes the terminal sub-node.
The leaf node compression process in the above embodiment may be carried out as an optimization pass over the radix tree data structure once data storage has reached a steady state.
By compressing and simplifying the storage topology of the data content, the advanced radix tree construction method for mass data of the embodiment effectively reduces branch depth, so that the tree structure tends toward balance. Retrieval efficiency improves while the format and integrity of the data content are preserved.
As shown in FIG. 2, in one embodiment of the present invention, the specific process of merging a leaf node into the terminal sub-node includes:
Step 211: Setting a connection symbol. The connection symbol is set according to the character string format and may be a symbol or symbol sequence, a character or character sequence, or a virtual connection identifier.
Step 212: Copying the data content of the leaf node into the terminal sub-node, appending it, after the connection symbol, to the tail of the original data content of the terminal sub-node;
Step 213: Deleting the leaf node.
The topology of leaf node branching in the advanced radix tree construction method for mass data according to an embodiment of the present invention is shown in FIG. 4. Referring to FIG. 2, in one embodiment of the present invention, branching leaf nodes includes:
Step 220: When the determined terminal sub-node has more than one leaf node, the selected leaf nodes are each merged into a copy of the determined terminal sub-node.
A determined terminal sub-node is a terminal sub-node that has been explicitly selected for optimization. In one specific embodiment of this explicit selection, if more than one leaf node remains after a leaf node has been deleted, the remaining leaf nodes are branched; if exactly one leaf node remains after the deletion, the remaining leaf node is compressed.
Forming a copy of the determined terminal sub-node includes copying its storage structure, its data content, its topological relationship with the upper-level sub-node, and so on. The original terminal sub-node also counts as one of the copies. The number of copies of the determined terminal sub-node (including the original) is less than or equal to the number of leaf nodes.
In one embodiment of the present invention, once a leaf node has been merged into a copy of the determined terminal sub-node, that copy becomes a leaf node which, by default, is not compressed or branched again.
In one embodiment of the present invention, merging the selected leaf nodes into copies of the determined terminal sub-node includes merging all leaf nodes of the determined terminal sub-node, one by one, each into its own copy.
In one embodiment of the present invention, merging the selected leaf nodes into copies of the determined terminal sub-node includes merging only some of the leaf nodes of the determined terminal sub-node, one by one, each into its own copy.
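The branching operation of step 220 can be sketched for the embodiment in which all leaf nodes are merged. This is an illustrative reading of the text, not the patent's implementation: `Node`, `branch_leaves`, and the string connection symbol are hypothetical, and each merged copy is created as a new node rather than reusing the original terminal sub-node as one of the copies.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    content: str                                   # data content stored in the node
    children: list = field(default_factory=list)   # child nodes, empty for a leaf


def branch_leaves(parent: Node, terminal: Node, connector: str = "") -> bool:
    """Replace `terminal` under `parent` with one merged copy per leaf."""
    leaves = terminal.children
    if len(leaves) < 2 or any(leaf.children for leaf in leaves):
        return False                    # branching needs two or more plain leaves
    position = parent.children.index(terminal)
    # Each copy carries the terminal content plus one leaf's content, so the
    # spliced value of every branch is preserved.
    merged = [Node(terminal.content + connector + leaf.content) for leaf in leaves]
    parent.children[position:position + 1] = merged
    return True
```

After the call, every former leaf sits one level higher as a child of `parent`, so the depth of those branches drops by one while the number of nodes at the parent's child level grows.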
The leaf node branching process in the above embodiment may be carried out as an optimization pass over the radix tree data structure once data storage has reached a steady state.
By changing the local tree topology through branching, the advanced radix tree construction method for mass data of the embodiment effectively reduces branch depth, so that the tree structure tends toward balance. Retrieval efficiency improves while the format and integrity of the data content are preserved.
Data storage in the advanced radix tree construction method for mass data according to an embodiment of the present invention is shown in FIG. 5. In FIG. 5, the process of storing newly added data includes:
Step 231: Determining the branch in which the inserted data is to be stored according to the existing radix tree topology.
Those skilled in the art will appreciate that newly inserted data is stored level by level along the branches of the radix tree topology, according to how well it matches the root node and the sub-nodes at each level. The root node is the starting point of all branches. As the data content is stored along the corresponding sequence of sub-nodes in the radix tree topology, the constituent nodes of the branch can be determined, such as the leaf node, the terminal sub-node, and the penultimate sub-node that is one level above and adjacent to the terminal sub-node.
Step 232: Judging whether, among the direct sub-nodes of the root node, there is a unique sub-node whose data content matches by degree of coincidence.
A match by degree of coincidence means that the overlap between the data content in the direct sub-node and the current data content of the inserted data meets a threshold. The current data content is the data content that remains after storage via the root node.
Step 233: Judging whether the number of sub-nodes below the root node is less than or equal to the branching degree of the branch of the inserted data.
One way to quantify the branching degree of a branch is to count the constituent nodes of the branch.
Step 234: Judging whether the weight of the unique sub-node is less than the weight of the inserted node.
The weights include, but are not limited to, weights set for data content or weights set for nodes.
Step 235: If the judgments of steps 232 to 234 all hold, adding a direct sub-node under the root node to store the current data content.
Step 236: Otherwise, setting a leaf node under the unique sub-node to store the data content that does not match the unique sub-node.
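The decision in steps 233 to 236 can be sketched as below, assuming step 232 has already found the unique matching sub-node. The patent does not define how weights or the branching degree are computed, so they are passed in as plain numbers; `Node`, `place_insert`, and all parameter names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    content: str
    weight: float = 0.0                            # assumed numeric node weight
    children: list = field(default_factory=list)


def place_insert(root: Node, matched: Node, current: str, unmatched: str,
                 branch_degree: int, insert_weight: float) -> str:
    """Store inserted data either beside `matched` or beneath it.

    `current` is the data content left after the root node; `unmatched` is the
    part of it that does not overlap with `matched`.
    """
    # Steps 233 and 234: the root is not yet too wide, and the matching
    # sub-node weighs less than the inserted node.
    if len(root.children) <= branch_degree and matched.weight < insert_weight:
        # Step 235: add a direct sub-node of the root holding the current content.
        root.children.append(Node(current, insert_weight))
        return "direct"
    # Step 236: otherwise hang the non-matching content under `matched` as a leaf.
    matched.children.append(Node(unmatched, insert_weight))
    return "leaf"
```

The "direct" outcome keeps the new data one level below the root at the cost of repeating the overlapping content, while the "leaf" outcome is the ordinary radix tree placement.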
By measuring the change in local branches and the sub-node weights while inserted data is stored, the advanced radix tree construction method for mass data of the embodiment reduces the branch depth of newly inserted data and improves the balance of the local tree topology. The advanced radix tree storage structure is thus formed incrementally, which improves retrieval efficiency.
The topological differences between the radix tree storage structure and the advanced radix tree storage structure during data content storage are shown in FIG. 6. FIG. 6 shows how the levels of the sub-nodes that store the data content differ between the two tree structures.
In the advanced radix tree construction method for mass data according to an embodiment of the present invention, as shown in FIG. 7, the process of deleting existing data includes:
Step 241: Determining the node currently to be deleted and its parent node;
Typically, the node currently to be deleted is a leaf node, or a sub-node that has gradually become a leaf node.
Step 242: Judging whether the node to be deleted is a leaf node and whether, after the deletion, the parent node has exactly one leaf node left;
Step 243: If so, merging that unique leaf node into the parent node.
By compressing nodes according to the types of the parent node and the child node, the advanced radix tree construction method for mass data of the embodiment reduces the depth of the branch topology and improves the balance of the local tree topology. The advanced radix tree storage structure is thus formed as the data volume decreases, which improves retrieval efficiency.
The topological differences between the radix tree storage structure and the advanced radix tree storage structure during deletion of existing data are shown in FIG. 8. FIG. 8 shows how the node levels change in the two tree structures.
Topology optimization in the advanced radix tree construction method for mass data according to an embodiment of the present invention is shown in FIG. 9. In FIG. 9, the topology optimization process includes:
Step 251: Determining the node to be updated.
Step 252: Judging whether, among the sub-nodes below the node to be updated, there is an associated sub-node that is connected only to leaf nodes.
Step 253: Judging whether the number of sub-nodes below the node to be updated is less than or equal to the branching degree of the branches below the node to be updated.
Step 254: Judging whether the weight of the associated sub-node is less than the weight of the node to be updated.
Step 255: If the judgments of steps 252 to 254 all hold, branching the leaf nodes below the associated sub-node.
Step 256: Otherwise, keeping the topology of the leaf nodes below the associated sub-node unchanged.
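The optimization pass of steps 251 to 256 can be sketched as follows. Several points are assumptions rather than statements from the patent: weights and the branching degree are plain numbers supplied by the caller, an associated sub-node must hold at least two leaves to be branched, and the names `Node` and `optimize` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    content: str
    weight: float = 0.0                            # assumed numeric node weight
    children: list = field(default_factory=list)


def optimize(node: Node, branch_degree: int, connector: str = "") -> int:
    """Branch the leaves of qualifying sub-nodes of `node`; return how many changed."""
    width = len(node.children)                     # step 253 uses the original width
    result, changed = [], 0
    for sub in node.children:
        # Step 252: an associated sub-node is connected only to leaf nodes.
        only_leaves = (len(sub.children) >= 2
                       and all(not leaf.children for leaf in sub.children))
        if only_leaves and width <= branch_degree and sub.weight < node.weight:
            # Step 255: one merged copy of `sub` per leaf replaces `sub`.
            result.extend(Node(sub.content + connector + leaf.content, sub.weight)
                          for leaf in sub.children)
            changed += 1
        else:
            result.append(sub)                     # step 256: keep the topology
    node.children = result
    return changed
```

A sub-node that outweighs the node being updated is left untouched, so only comparatively light sub-nodes have their leaves lifted one level.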
The leaf node branching process in the above embodiment may be carried out as an optimization pass over the radix tree data structure once data storage has reached a steady state.
Through active branching, the advanced radix tree construction method for mass data of the embodiment improves the local branch topology, reduces branch depth, and improves the balance of the local tree topology. The advanced radix tree storage structure is thus formed actively, which improves retrieval efficiency.
The topological differences between the radix tree storage structure and the advanced radix tree storage structure during data content updating are shown in FIG. 10. FIG. 10 shows how the stored data content and the node levels differ between the two tree structures.
As shown in FIG. 2, in an embodiment of the present invention, the specific process of establishing the radix tree data structure and storing data content in its nodes includes:
Step 10: Determining split positions according to the repetition degree of local formats in the data sequence, and splitting each data item into consecutive data segments.
The repeated local formats include, but are not limited to, format characters in the data, format strings, and virtual connection identifiers (e.g., identifiers such as pointers to stacks in a memory state). The data sequence comprises multiple data items, the number of which may increase or decrease, and each data item is split into consecutive data segments.
Step 20: Determining the nodes of the radix tree and the data content of the nodes according to the repetition degree of the data segments in the data sequence.
The repetition degree of a data segment is related to the weight of the node in which the data segment is stored. The nodes of the radix tree are formed at the same time as their data content is determined.
Step 30: Forming the branch structure of the radix tree from the nodes according to the order in which the repeated data segments occur within the data.
The order in which the data segments repeat determines the topology in which the nodes form branches. Repeated data segments form nodes, the order of the nodes forms the node sequence of the branch corresponding to each data item, and the node at the bottom of the sequence is the leaf node, which has the lowest repetition degree.
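Steps 10 to 30 can be sketched for the simple case in which the repeated local format is a single delimiter character. The delimiter "/", the sample records, and the names `split_segments` and `build_tree` are assumptions made for illustration; the tree is held in nested dictionaries rather than in the node structure of the patent.

```python
def split_segments(record: str, delimiter: str = "/") -> list:
    """Step 10: split one data item at each occurrence of the format character."""
    return [segment for segment in record.split(delimiter) if segment]


def build_tree(records: list, delimiter: str = "/") -> dict:
    """Steps 20 and 30: shared segments become shared nodes, in segment order."""
    root: dict = {}
    for record in records:
        level = root
        for segment in split_segments(record, delimiter):
            # A segment repeated across records reuses the same node.
            level = level.setdefault(segment, {})
    return root


# Hypothetical records shaped like country/city/organization/user.
tree = build_tree([
    "CN/Beijing/OrgA/alice",
    "CN/Beijing/OrgA/bob",
    "CN/Shanghai/OrgB/carol",
])
```

In the resulting tree the most frequently repeated segment, "CN", sits at the top, and the least repeated segments, the user names, end up as leaves.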
For a formatted data set, the advanced radix tree construction method for mass data of the embodiment forms the nodes and node topology of the radix tree and builds the complete branch of each data item from those nodes. This makes full use of the construction characteristics of the radix tree to suit formatted data sets.
The data storage structure of DNS data of the trust anchor system established using the advanced radix tree construction method for mass data of the above embodiment is shown in FIG. 11. In FIG. 11, applying the method of the above embodiment includes:
splitting the DNS data into consecutive data segments according to the DNS data format;
establishing a radix tree data structure, and storing the consecutive data segments in nodes of the radix tree data structure to form DNS data branches; the nodes in the DNS data branches form a hierarchy in which:
the root node is the first layer and mainly stores country information, the second layer mainly stores city information, the third layer mainly stores organization information, the fourth layer mainly stores user names, and the fifth layer mainly stores the public key information corresponding to each user;
and optimizing the topology of the radix tree data structure through the optimization process to form the advanced radix tree data storage structure, so that the branch depth of the DNS data is reduced.
An embodiment of the present invention provides an advanced radix tree construction system for mass data, including:
a memory for storing the program code of the processing procedure of the advanced radix tree construction method for mass data of the above embodiment;
and a processor for executing the program code of the processing procedure of the advanced radix tree construction method for mass data of the above embodiment.
The processor may be a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), an MCU (Microcontroller Unit) system board, an SoC (System on a Chip) system board, or a PLC (Programmable Logic Controller) minimum system including I/O.
An advanced radix tree construction system for mass data according to an embodiment of the present invention is shown in FIG. 12. In FIG. 12, the present embodiment includes:
a basic storage device 10 for establishing a radix tree data structure and storing data content in nodes of the radix tree data structure;
and an advanced adjusting device 20 for compressing or branching, during data storage, the leaf nodes connected to a terminal sub-node according to the weight of the terminal sub-node and/or the change in the local tree shape around the terminal sub-node.
As shown in FIG. 12, in an embodiment of the present invention, the basic storage device 10 includes:
a format data splitting module 01, configured to determine split positions according to the repetition degree of local formats in the data sequence, and to split each data item into consecutive data segments;
a node forming module 02, configured to determine the nodes of the radix tree and the data content of the nodes according to the repetition degree of the data segments in the data sequence;
a branch forming module 03, configured to form the branch structure of the radix tree from the nodes according to the order in which the repeated data segments occur within the data.
As shown in FIG. 12, in an embodiment of the present invention, the advanced adjusting device 20 includes:
an edge node compression module 21, configured to merge the leaf node into the terminal sub-node when it is determined in the radix tree data structure that the terminal sub-node of a branch retains only one leaf node;
an edge node optimizing module 22, configured to merge the selected leaf nodes each into a copy of the determined terminal sub-node when the determined terminal sub-node has more than one leaf node.
As shown in FIG. 12, in an embodiment of the present invention, the edge node compression module 21 includes:
a preset unit b11, configured to set a connection symbol;
a link unit b12, configured to copy the data content of the leaf node into the terminal sub-node, appending it, after the connection symbol, to the tail of the original data content of the terminal sub-node;
a clearing unit b13, configured to delete the leaf node.
As shown in fig. 12, in an embodiment of the present invention, the advanced adjusting apparatus 20 further includes:
the basic storage module B31 is used for determining the storage branches of the inserted data according to the existing radix tree topology structure;
a first matching module B32, configured to, when it is determined that there is a unique sub-node with matched data content overlapping ratio in the direct sub-node of the root node;
the first shaping module B33 is used for judging that the number of sub-nodes below the root node is less than or equal to the branch branching degree of the inserted data;
the first weighting module B34 is configured to determine that the weight of the unique sub-node is less than the weight of the inserted node;
a first optimizing module B35, configured to set the root node to add a direct sub-node to store the current data content;
the first configuration module B36 is configured to set a leaf node of the unique sub-node, and store data content that does not match the unique sub-node.
As shown in fig. 12, in an embodiment of the present invention, the advanced adjusting apparatus 20 further includes:
a hierarchy determining module B41, configured to determine the node currently to be deleted and its parent node;
a second matching module B42, configured to determine that the node to be deleted is a leaf node and that, after the deletion, the parent node has only one remaining leaf node;
a second optimization module B43, configured to merge that remaining leaf node into the parent node.
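A minimal Python sketch of this deletion path is given here for illustration. It assumes that merging the remaining leaf into the parent reuses the connection-symbol concatenation described for leaf compression; `Node`, `delete_leaf`, and the `"|"` symbol are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison, so remove() targets the exact node
class Node:
    content: str
    children: list = field(default_factory=list)


def delete_leaf(parent: Node, target: Node, connector: str = "|") -> None:
    """Delete leaf `target`; if one leaf remains under `parent`, merge it upward."""
    if target.children:
        raise ValueError("only leaf nodes are handled in this sketch")
    parent.children.remove(target)
    if len(parent.children) == 1 and not parent.children[0].children:
        survivor = parent.children.pop()
        # Assumed merge rule: append the survivor's content via the connector.
        parent.content = parent.content + connector + survivor.content


parent = Node("ab", [Node("c"), Node("d")])
delete_leaf(parent, parent.children[0])
```

Here deleting `"c"` leaves `"d"` as the only leaf, so it is folded into the parent and the parent becomes a leaf itself.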
As shown in fig. 12, in an embodiment of the present invention, the advanced adjusting apparatus 20 further includes:
a basic positioning module B51, configured to determine the node to be updated;
a third matching module B52, configured to determine that, among the sub-nodes under the node to be updated, there is an associated sub-node connected only to leaf nodes;
a third shaping module B53, configured to determine that the number of sub-nodes under the node to be updated is less than or equal to the branching degree of the branch under the node to be updated;
a third balancing module B54, configured to determine that the weight of the associated sub-node is less than the weight of the node to be updated;
a third optimization module B55, configured to, when the above determinations hold, perform the branch change on the leaf nodes under the associated sub-node;
a third configuration module B56, configured to otherwise maintain the leaf node topology under the associated sub-node.
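The conditions checked by modules B52 to B54 can be expressed as a small Python predicate. The sketch only selects which associated sub-nodes qualify for the branch change; it does not perform the change itself. Weights and the branching degree are assumed to be plain numbers supplied by the caller, and `Node`, `has_only_leaf_children`, and `branch_change_targets` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    content: str
    weight: int = 1  # assumed numeric weight
    children: list = field(default_factory=list)


def has_only_leaf_children(node: Node) -> bool:
    """True if the node has children and every child is a leaf."""
    return bool(node.children) and all(not c.children for c in node.children)


def branch_change_targets(node: Node, branching_degree: int) -> list:
    """Return the associated sub-nodes whose leaves qualify for the branch change."""
    if len(node.children) > branching_degree:
        return []  # too many sub-nodes: keep the existing topology
    return [
        c for c in node.children
        if has_only_leaf_children(c) and c.weight < node.weight
    ]


a = Node("a", weight=2, children=[Node("x"), Node("y")])
b = Node("b", weight=8, children=[Node("z")])
c = Node("c", weight=1)  # a leaf, so not an associated sub-node
to_update = Node("t", weight=5, children=[a, b, c])
names = [n.content for n in branch_change_targets(to_update, branching_degree=4)]
```

Only `a` qualifies: `b` outweighs the node to be updated and `c` has no leaves of its own, so their topology is maintained.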
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (9)

1. An advanced radix tree construction method for mass data, comprising:
establishing a radix tree data structure, and storing data content in nodes of the radix tree data structure, wherein the data type of the data content is a character string;
performing, during data storage, leaf node compression or branch change on the leaf nodes connected to a sub-node according to the weight of the sub-node and/or a change in the local tree shape of the sub-node; wherein the leaf node compression or branch change of the sub-node connection comprises a topology optimization process:
determining a node to be updated;
when it is determined that, among the sub-nodes under the node to be updated, there is an associated sub-node connected only to leaf nodes,
and when the number of sub-nodes under the node to be updated is less than or equal to the branching degree of the branch under the node to be updated,
and when it is determined that the weight of the associated sub-node is less than the weight of the node to be updated,
performing the branch change on the leaf nodes under the associated sub-node; otherwise, maintaining the leaf node topology under the associated sub-node.
2. The advanced radix tree construction method for mass data of claim 1, wherein said leaf node compression comprises:
when it is determined in the radix tree data structure that the terminal sub-node of a branch maintains only one leaf node, incorporating the leaf node into the terminal sub-node.
3. The advanced radix tree construction method for mass data of claim 2, wherein said incorporating the leaf node into the terminal sub-node comprises:
setting a connection symbol;
copying the data content of the leaf node into the terminal node, and appending it to the tail of the terminal node's original data content by means of the connection symbol;
deleting the leaf node.
4. The advanced radix tree construction method for mass data of claim 1, wherein said branch change of the leaf nodes comprises:
when the determined terminal node has more than one leaf node, selectively incorporating a leaf node into a copy of the determined terminal node.
5. The advanced radix tree construction method for mass data of claim 1, wherein the leaf node compression or branch change of the sub-node connection comprises an incremental data storage process:
determining the storage branch of the inserted data according to the existing radix tree topology;
when it is determined that, among the direct sub-nodes of the root node, there is a unique sub-node whose data content matches the inserted data by overlap,
and when the number of sub-nodes under the root node is less than or equal to the branching degree of the inserted data's branch,
and when it is determined that the weight of the unique sub-node is less than the weight of the inserted node,
having the root node add a direct sub-node to store the current data content; otherwise, setting a leaf node under the unique sub-node to store the data content that does not match the unique sub-node.
6. The advanced radix tree construction method for mass data of claim 1, wherein the leaf node compression or branch change of the sub-node connection comprises an existing data deletion process:
determining the node currently to be deleted and its parent node;
when it is determined that the node to be deleted is a leaf node and that, after the deletion, the parent node has only one remaining leaf node,
merging that remaining leaf node into the parent node.
7. The advanced radix tree construction method for mass data according to claim 1, wherein establishing the radix tree data structure and storing the data content in the nodes of the radix tree data structure comprises:
determining split positions according to the repetition of local formats in the data sequence, and splitting the data at those positions into consecutive data fragments;
determining the nodes of the radix tree and the data content of each node according to the repetition of the data fragments in the data sequence;
forming the branch structure of the radix tree from the nodes according to the order of the repeated data fragments among the data fragments.
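For illustration only, the fragment-based construction above can be sketched in Python. The sketch assumes that the recurring local format is a fixed delimiter (`"/"` here) and represents the tree as nested dictionaries; the delimiter, the function names, and the sample keys are all hypothetical and not taken from the claims.

```python
def split_fragments(key: str, delimiter: str = "/") -> list:
    """Split a key at each delimiter occurrence into consecutive fragments."""
    return [part for part in key.split(delimiter) if part]


def build_fragment_tree(keys, delimiter: str = "/") -> dict:
    """Build a nested-dict tree in which repeated leading fragments share nodes."""
    root = {}
    for key in keys:
        level = root
        for fragment in split_fragments(key, delimiter):
            # A fragment already seen at this position reuses the existing node.
            level = level.setdefault(fragment, {})
    return root


tree = build_fragment_tree(["/com/example/www", "/com/example/mail", "/com/sample"])
```

Keys that repeat the same leading fragments, such as `com` and `example`, end up sharing one path, and the order of the fragments determines the branch structure.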
8. An advanced radix tree construction system for mass data, comprising:
a memory for storing program code implementing the advanced radix tree construction method for mass data as claimed in any one of claims 1 to 7;
and a processor for executing the program code.
9. An advanced radix tree construction system for mass data, comprising:
a basic storage device, configured to establish a radix tree data structure and store data content in nodes of the radix tree data structure, wherein the data type of the data content is a character string;
an advanced adjusting apparatus, configured to perform, during data storage, leaf node compression or branch change on the leaf nodes connected to a sub-node according to the weight of the sub-node and/or a change in the local tree shape of the sub-node; the advanced adjusting apparatus further comprising:
a basic positioning module, configured to determine a node to be updated;
a third matching module, configured to determine that, among the sub-nodes under the node to be updated, there is an associated sub-node connected only to leaf nodes;
a third shaping module, configured to determine that the number of sub-nodes under the node to be updated is less than or equal to the branching degree of the branch under the node to be updated;
a third balancing module, configured to determine that the weight of the associated sub-node is less than the weight of the node to be updated;
a third optimization module, configured to perform the branch change on the leaf nodes under the associated sub-node;
a third configuration module, configured to otherwise maintain the leaf node topology under the associated sub-node.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110012488.XA CN112784117B (en) 2021-01-06 2021-01-06 Advanced radix tree construction method and construction system for mass data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110012488.XA CN112784117B (en) 2021-01-06 2021-01-06 Advanced radix tree construction method and construction system for mass data

Publications (2)

Publication Number Publication Date
CN112784117A CN112784117A (en) 2021-05-11
CN112784117B true CN112784117B (en) 2023-06-02

Family

ID=75755685

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110012488.XA Active CN112784117B (en) 2021-01-06 2021-01-06 Advanced radix tree construction method and construction system for mass data

Country Status (1)

Country Link
CN (1) CN112784117B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105119830A (en) * 2015-09-14 2015-12-02 珠海赞同科技有限公司 Load balancing software defined networking routing aggregation method based on packing optimization
CN106716412A (en) * 2014-09-25 2017-05-24 甲骨文国际公司 System and method for supporting zero-copy binary radix tree in a distributed computing environment
WO2019098871A1 (en) * 2017-11-20 2019-05-23 Huawei Technologies Co., Ltd. Data storage system and method of providing a data storage system
CN111373389A (en) * 2017-11-20 2020-07-03 华为技术有限公司 Data storage system and method for providing a data storage system
CN111400306A (en) * 2020-02-20 2020-07-10 上海交通大学 RDMA (remote direct memory Access) -and non-volatile memory-based radix tree access system
CN112000847A (en) * 2020-08-19 2020-11-27 东北大学 GPU parallel-based adaptive radix tree dynamic indexing method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9697220B2 (en) * 2013-12-13 2017-07-04 Oracle International Corporation System and method for supporting elastic data metadata compression in a distributed data grid
US10917338B2 (en) * 2018-03-06 2021-02-09 Clarkson University System and method for building a hierarchical data structure

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106716412A (en) * 2014-09-25 2017-05-24 甲骨文国际公司 System and method for supporting zero-copy binary radix tree in a distributed computing environment
CN105119830A (en) * 2015-09-14 2015-12-02 珠海赞同科技有限公司 Load balancing software defined networking routing aggregation method based on packing optimization
WO2019098871A1 (en) * 2017-11-20 2019-05-23 Huawei Technologies Co., Ltd. Data storage system and method of providing a data storage system
CN111316255A (en) * 2017-11-20 2020-06-19 华为技术有限公司 Data storage system and method for providing a data storage system
CN111373389A (en) * 2017-11-20 2020-07-03 华为技术有限公司 Data storage system and method for providing a data storage system
CN111400306A (en) * 2020-02-20 2020-07-10 上海交通大学 RDMA (remote direct memory Access) -and non-volatile memory-based radix tree access system
CN112000847A (en) * 2020-08-19 2020-11-27 东北大学 GPU parallel-based adaptive radix tree dynamic indexing method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
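Composite Radix Tree-A Storage Method for Efficient Retrieval of Massive Data; Yanan Qi et al.; 2021 China Automation Congress (CAC); pp. 5205-5210 *
Research on Dynamic Name Lookup Methods in NDN Networks; Liu Yuan; China Master's Theses Full-text Database (Information Science and Technology); No. 03, 2019; I139-5 *
The adaptive radix tree: ARTful indexing for main-memory databases; Viktor Leis et al.; 2013 IEEE 29th International Conference on Data Engineering (ICDE); pp. 38-48 *
Liu Yuan. Research on Dynamic Name Lookup Methods in NDN Networks. China Master's Theses Full-text Database (Information Science and Technology). 2019, (No. 03, 2019), I139-5. *
A Survey of Data Consistency Research for Non-Volatile Memory; Xiao Renzhi et al.; Journal of Computer Research and Development; Vol. 57, No. 01; pp. 85-101 *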

Also Published As

Publication number Publication date
CN112784117A (en) 2021-05-11

Similar Documents

Publication Publication Date Title
US5566332A (en) Method and combination for minimizing data conversions when data is transferred between a first database storing data in a first format and a second database storing data in a second format
US5278978A (en) Method and system for describing and exchanging data between heterogeneous database systems with data converted by the receiving database system
EP2895968B1 (en) Optimal data representation and auxiliary structures for in-memory database query processing
CN102411580B (en) The search method of XML document and device
CN110599169B (en) Data processing method, device, terminal and medium
CN103514201A (en) Method and device for querying data in non-relational database
CN103198149A (en) Method and system for query error correction
WO2021068346A1 (en) Method and device for location querying based on geohash algorithm, computer device, and storage medium
CN101286935A (en) Route searching method based on IP address scope
CN105468719A (en) Query error correction method and device, and computation equipment
US20130307710A1 (en) Compression match enumeration
CN102567419B (en) Mass data storage device and method based on tree structure
CN110071871A (en) A kind of large model pool ip address matching process
CN114372177A (en) Excel table data matching method
CN112102840A (en) Semantic recognition method, device, terminal and storage medium
CN112784117B (en) Advanced radix tree construction method and construction system for mass data
CN104077272B (en) A kind of method and apparatus of dictionary compression
CN108153907A (en) The memory management method of space optimization is realized by 16 Trie trees
CN116301656A (en) Data storage method, system and equipment based on log structure merging tree
CN109361399A (en) A kind of method, apparatus, equipment and storage medium obtaining byte sequence
CN110851448B (en) Local data storage optimization method based on block chain intelligent contract
CN110474776B (en) Credible block data dynamic deleting method
CN112765099A (en) Data file processing method and processing device
CN115221360A (en) Tree structure configuration method and system
CN110807092A (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant