CN112769896B - Distributed node optimization method and system, electronic equipment and storage medium - Google Patents
- Publication number
- CN112769896B (application CN202011518575.4A)
- Authority
- CN
- China
- Prior art keywords
- distributed node
- data points
- dimension
- distributed
- multiple dimensions
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
Abstract
The invention provides a distributed node optimization method and system, an electronic device and a storage medium. The method comprises the following steps: S1, obtaining the distribution of the data points of a distributed node in multiple dimensions based on the variances of the data points in those dimensions; S2, sorting the multiple dimensions by variance based on the distribution of the data points in the multiple dimensions; and S3, segmenting the data points in the corresponding dimensions sequentially in the order of the variances. Through segmentation dimension selection optimization, the invention reduces the time consumed by frequent node updates, improves search efficiency, makes full use of the high-performance matching of the nf-hipac method, and keeps the current firewall system running smoothly.
Description
Technical Field
The present invention relates to the field of information optimization technologies, and in particular, to a distributed node optimization method and system, an electronic device, and a storage medium.
Background
A k-d tree is a binary tree in which every node is a k-dimensional point. Each node represents a hyperplane that is perpendicular to the coordinate axis of the current dividing dimension and divides the space into two parts along that dimension: one part lies in the left subtree and the other in the right subtree. If the dividing dimension of the current node is d, then the d-coordinates of all points in the left subtree are smaller than the current value, and the d-coordinates of all points in the right subtree are greater than or equal to the current value. This definition holds for every child node. How the tree is constructed is the most critical issue, just as it is for partition trees, segment trees, tries, and other data structures or methods (such as KMP) that solve their problems efficiently. The efficiency of the k-d tree comes from how well the tree is built, that is, from the rule by which the points in the point set E are organized into a binary tree. The construction process of the k-d tree largely determines the search efficiency of the tree, so search can be optimized if the data characteristics are fully considered during construction.
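The left/right invariant described above can be illustrated with a minimal Python sketch. It assumes the classical construction that cycles through the dimensions by depth and splits at the median; the names `KDNode`, `build_kdtree` and `contains` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    point: Point                      # the k-dimensional point stored in this node
    dim: int                          # dividing dimension d of this node
    left: Optional["KDNode"] = None   # points whose d-coordinate is < point[d]
    right: Optional["KDNode"] = None  # points whose d-coordinate is >= point[d]


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[KDNode]:
    """Build a k-d tree, cycling through the dimensions by depth (an assumption)."""
    if not points:
        return None
    k = len(points[0])
    dim = depth % k
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2
    # Shift the split left over equal coordinates so that the invariant
    # "left < value <= right" holds even with duplicate values.
    while mid > 0 and ordered[mid - 1][dim] == ordered[mid][dim]:
        mid -= 1
    return KDNode(
        point=ordered[mid],
        dim=dim,
        left=build_kdtree(ordered[:mid], depth + 1),
        right=build_kdtree(ordered[mid + 1:], depth + 1),
    )


def contains(node: Optional[KDNode], target: Point) -> bool:
    """Exact-match search that follows the left/right invariant."""
    while node is not None:
        if node.point == target:
            return True
        if target[node.dim] < node.point[node.dim]:
            node = node.left
        else:
            node = node.right
    return False
```

Because the search descends into only one subtree per level, the depth of the tree, and therefore the quality of the construction, bounds the cost of a lookup.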
The k-d tree is suited to searching multi-dimensional discrete data. The current firewall system uses the nf-hipac method, which is suited to matching multi-dimensional interval data. Whereas the k-d tree is a binary tree, nf-hipac uses an M-ary tree: each layer represents one dimension, the current dimension is divided into M intervals, each interval represents a node, each node is divided again in the second dimension, and so on. The k-d tree method can reach a time complexity of O(log n) (O(n) in the worst case). The nf-hipac method uses binary search in each dimension, so its time complexity is O(m log n), where m is the number of dimensions and n is the number of rules. Three points can be optimized in the process of constructing and deleting the k-d tree:
1. Segmentation dimension selection optimization.
2. Deletion operation optimization.
3. Fixed length rules exceeding 32 bits employ the nf-hipac approach.
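The per-dimension binary search described for the M-ary tree can be sketched as follows. This is an illustration of the idea only, not code from nf-hipac: the `IntervalNode` class, the `match` function and the rule names are assumptions made for the example.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Sequence, Union


@dataclass
class IntervalNode:
    # Sorted lower bounds; interval i covers [bounds[i], bounds[i + 1]).
    bounds: List[int]
    # One child per interval: the node of the next dimension, or a rule name.
    children: List[Union["IntervalNode", str]]


def match(root: Union[IntervalNode, str], packet: Sequence[int]) -> Optional[str]:
    """Walk one layer per dimension, using binary search inside each layer."""
    node = root
    for value in packet:
        if not isinstance(node, IntervalNode):
            break  # a rule was reached before all dimensions were consumed
        idx = bisect_right(node.bounds, value) - 1  # O(log M) per dimension
        if idx < 0:
            return None  # value lies below the first interval
        node = node.children[idx]
    return node if isinstance(node, str) else None


# Hypothetical two-dimensional rule set.
second_dim = IntervalNode(bounds=[0, 80, 81], children=["DROP", "ACCEPT", "DROP"])
root = IntervalNode(bounds=[0, 1024], children=[second_dim, "DROP"])
```

With m layers and one binary search per layer, a lookup in this sketch performs on the order of m log n comparisons, which matches the complexity stated above.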
Disclosure of Invention
The invention provides a distributed node optimization method and system, electronic equipment and a storage medium, which can reduce time consumption caused by frequently updating nodes and improve searching efficiency.
The invention provides a distributed node optimization method, which comprises the following steps:
S1, obtaining distribution conditions of data points of a distributed node in multiple dimensions based on variances of the data points of the distributed node in the multiple dimensions;
S2, sorting the multiple dimensions according to the variance based on the distribution condition of the data points in the multiple dimensions;
S3, sequentially segmenting the data points in the corresponding dimension according to the order of the variance.
Preferably, in the distributed node optimization method, the step S1 further includes:
marking the data points to be deleted;
and returning to the step S1 when the number of the data points to be deleted reaches a preset threshold value.
Preferably, the distributed node optimization method further includes:
maintaining a weight field in each distributed node, recording the number of NULL pointers of the child nodes of the distributed node, and executing reconstruction on the distributed node when the numbers of NULL pointers of its child nodes differ by about 2.
Preferably, the distributed node optimization method further includes:
when a fixed-length rule of the distributed node exceeds 32 bits, splitting the distributed node and converting it into a high dimension and a low dimension under the current dimension.
Preferably, in the distributed node optimization method, when the distributed node lies inside the interval, the in-interval match is a direct hit and low-dimensional matching is skipped.
Preferably, in the distributed node optimization method, when the distributed node lies on an edge, low-dimensional matching continues until the distributed node lies inside an interval or the matching is complete.
The invention provides a distributed node optimization system, which comprises:
the distribution situation analysis module is used for obtaining the distribution situation of the data points in a plurality of dimensions based on the variances of the data points of the distributed nodes in the plurality of dimensions;
the sorting module is used for sorting the multiple dimensions according to the variance based on the distribution condition of the data points in the multiple dimensions;
and the segmentation module is used for sequentially segmenting the data points in the corresponding dimension according to the order of the variances.
Preferably, the distributed node optimization system further includes:
the marking module is used for marking the data points to be deleted;
and the updating module is used for returning to the distribution condition analysis module when the number of the data points to be deleted reaches a preset threshold value.
The invention provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the distributed node optimization method when executing the program.
The present invention provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the distributed node optimization method.
According to the distributed node optimization method and system, the electronic device and the storage medium provided by the invention, the distribution of the data points in each dimension is compared before construction starts: the larger the variance of the data points' coordinate values in a dimension, the more dispersed the distribution, and the smaller the variance, the more concentrated the distribution. Segmentation starts from the dimension with the largest variance, which gives a good segmentation effect and good balance. Through segmentation dimension selection optimization, the invention reduces the time consumed by frequent node updates, improves search efficiency, makes full use of the high-performance matching of the nf-hipac method, and keeps the current firewall system running smoothly.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a distributed node optimization method provided by the present invention;
FIG. 2 is a schematic diagram of a distributed node optimization system provided by the present invention;
FIG. 3 is a schematic diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Fig. 1 is a schematic diagram of a distributed node optimization method provided by the present invention, as shown in fig. 1, where the method includes:
S1, obtaining distribution conditions of data points of a distributed node in multiple dimensions based on variances of the data points of the distributed node in the multiple dimensions;
S2, sorting the multiple dimensions according to the variance based on the distribution condition of the data points in the multiple dimensions;
S3, sequentially segmenting the data points in the corresponding dimension according to the order of the variance.
The distributed node optimization method includes segmentation dimension selection optimization. Before construction starts, the distribution of the data points in each dimension is compared: the larger the variance of the data points' coordinate values in a dimension, the more dispersed the distribution, and the smaller the variance, the more concentrated the distribution. Segmentation starts from the dimension with the largest variance, which gives a good segmentation effect and good balance. Because depth-first matching is not constructed recursively, the variance does not have to be calculated at every step to determine the segmentation dimension. In depth-first matching, once a dimension has been chosen, the space is cut directly along it (into slices in the three-dimensional case, into lines in the two-dimensional case), and that dimension no longer needs to be considered for the next cut, because the match has already arrived at its exact interval in that dimension. Therefore only a single dimension order needs to be found, and the variance does not need to be recomputed at every step as it is for a k-d tree. The dimension in which the rules are least evenly distributed across the intervals can be used as the first dimension, and so on. The reason is that, when the rule distribution is uneven, intersecting the rule set of an interval with the rule set of the next dimension is relatively likely to leave only the Default Rule, which causes a fast failure and reduces the number of child nodes. For an M-ary tree, cutting off children of nodes close to the root restrains excessive growth of the lower layers.
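Steps S1 to S3 can be sketched in Python as follows. The sketch assumes that the data points are plain numeric tuples, that the population variance is the statistic used, and that a cut is made at the median; the function names `dimension_order` and `split_at_median` are hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dimension_order(points: Sequence[Point]) -> Tuple[List[int], List[float]]:
    """S1 and S2: compute the variance per dimension, then sort dimensions by it."""
    k = len(points[0])
    variances = [pvariance([p[d] for p in points]) for d in range(k)]
    # Largest variance first: the most dispersed dimension is segmented first.
    order = sorted(range(k), key=lambda d: variances[d], reverse=True)
    return order, variances


def split_at_median(points: Sequence[Point], dim: int) -> Tuple[List[Point], List[Point]]:
    """S3 (one step): cut the points in two along the given dimension."""
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


points = [(1, 10, 5), (2, 20, 5), (3, 30, 5), (4, 40, 5)]
order, variances = dimension_order(points)
low_half, high_half = split_at_median(points, order[0])
```

In this example the second coordinate is the most dispersed and the third is constant, so the resulting order is dimension 1, then 0, then 2. The order is computed once up front, which is the point made above about not recalculating the variance at every step.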
Preferably, the step S1 further includes:
marking the data points to be deleted;
and returning to the step S1 when the number of the data points to be deleted reaches a preset threshold value. Deleting nf-hipac tree nodes has a great influence on the segmentation of the relevant dimension, so an optimization similar to that of the k-d tree is adopted: a node is only marked when it is deleted, and the relevant dimension is segmented again once the number of deletions reaches a certain threshold.
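The mark-then-rebuild behaviour can be sketched as follows. The class `LazyDeleteIndex` and its fields are hypothetical, and the rebuild here only drops the marked points and counts the rebuilds; a real implementation would re-run the segmentation from step S1 at that point.

```python
from typing import Dict, Iterable, Tuple

Point = Tuple[float, ...]


class LazyDeleteIndex:
    """Marks points as deleted and rebuilds only when a threshold is reached."""

    def __init__(self, points: Iterable[Point], threshold: int) -> None:
        # Maps each stored point to its "marked as deleted" flag.
        self.entries: Dict[Point, bool] = {p: False for p in points}
        self.threshold = threshold
        self.pending = 0    # number of marked but not yet removed points
        self.rebuilds = 0   # how many times the structure was rebuilt

    def contains(self, point: Point) -> bool:
        # A marked point is still stored but no longer visible to lookups.
        return self.entries.get(point) is False

    def delete(self, point: Point) -> bool:
        if self.entries.get(point) is not False:
            return False  # unknown point, or already marked
        self.entries[point] = True
        self.pending += 1
        if self.pending >= self.threshold:
            self._rebuild()
        return True

    def _rebuild(self) -> None:
        # Physically drop the marked points; re-segmentation would happen here.
        self.entries = {p: False for p, dead in self.entries.items() if not dead}
        self.pending = 0
        self.rebuilds += 1
```

Batching the physical removal means that a burst of deletions triggers one re-segmentation instead of one per deleted point.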
The method further includes:
maintaining a weight field in each distributed node, recording the number of NULL pointers of the child nodes of the distributed node, and executing reconstruction on the distributed node when the numbers of NULL pointers of its child nodes differ by about 2.
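The text leaves the exact bookkeeping open, so the following Python sketch rests on one possible reading, stated here as an assumption: the weight of a node is the number of NULL child pointers in the subtree rooted at it, and a rebuild is triggered when the weights of the two children differ by 2 or more. The names `Node`, `null_count`, `refresh` and `needs_rebuild` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    weight: int = 2  # a leaf has two NULL child pointers


def null_count(node: Optional[Node]) -> int:
    """An absent child is itself one NULL pointer (assumed convention)."""
    return 1 if node is None else node.weight


def refresh(node: Node) -> None:
    """Recompute the weight field from the children after an update."""
    node.weight = null_count(node.left) + null_count(node.right)


def needs_rebuild(node: Node, tolerance: int = 2) -> bool:
    """True when the NULL-pointer counts of the two children differ too much."""
    return abs(null_count(node.left) - null_count(node.right)) >= tolerance


# A right-leaning chain 1 -> 2 -> 3 becomes unbalanced at its root.
c = Node(3)
b = Node(2, right=c)
refresh(b)
a = Node(1, right=b)
refresh(a)
```

In a binary tree a subtree with n nodes has n + 1 NULL pointers, so under this reading the check is equivalent to comparing subtree sizes.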
The method further includes:
when a fixed-length rule of the distributed node exceeds 32 bits, splitting the distributed node and converting it into a high dimension and a low dimension under the current dimension.
When the distributed node lies inside the interval, the in-interval match is a direct hit and low-dimensional matching is skipped.
When the distributed node lies on an edge, low-dimensional matching continues until the distributed node lies inside an interval or the matching is complete.
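The high/low split can be illustrated with a 64-bit field matched against a single inclusive range. The 64-bit width, the single range and the names `split64` and `match_range` are assumptions for this sketch; the text only states that rules longer than 32 bits are split into a high and a low dimension.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def split64(value: int) -> Tuple[int, int]:
    """Split a 64-bit value into its high and low 32-bit halves."""
    return value >> 32, value & MASK32


def match_range(value: int, lo: int, hi: int) -> Tuple[bool, bool]:
    """Match value against the inclusive range [lo, hi].

    Returns (matched, low_dimension_consulted).
    """
    h, l = split64(value)
    lo_h, lo_l = split64(lo)
    hi_h, hi_l = split64(hi)
    if lo_h < h < hi_h:
        # Strictly inside the high-dimension interval: direct hit,
        # the low dimension is skipped.
        return True, False
    if h < lo_h or h > hi_h:
        # Outside the high-dimension interval: miss, no low-dimension work.
        return False, False
    # On an edge of the high-dimension interval: the low dimension decides.
    matched = (h != lo_h or l >= lo_l) and (h != hi_h or l <= hi_l)
    return matched, True


lo = (1 << 32) | 5
hi = (3 << 32) | 10
```

Only values whose high half equals one of the range's boundary values pay for the second comparison; every other value is resolved on the high dimension alone.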
The k-d tree uses breadth-wise segmentation: it segments on each dimension in turn until no further segmentation is possible. nf-hipac uses depth-wise segmentation: it segments directly in one dimension before segmenting in the next. Although the methods differ, the way the k-d tree optimizes its choice of segmentation is worth borrowing: by fully considering the data characteristics, collecting statistics on the data distribution in each dimension, and selecting an optimal segmentation order, search can be optimized.
The way the k-d tree deletes nodes is also worth borrowing: a deletion is not executed immediately; instead the current node is marked as deleted, and the deletion is actually executed only when the number of marked nodes reaches a threshold. Because the current firewall system updates a node by deleting it first and then adding it, this optimization can greatly reduce the performance loss caused by frequently re-segmenting intervals during node updates.
According to the invention, through the optimization of the segmentation dimension selection, the node deletion and the fixed length rule optimization exceeding 32 bits, the time consumption caused by frequently updating the nodes is reduced, the searching efficiency is improved, the high-performance matching of the nf-hipac method is fully utilized, and the running smoothness of the current firewall system can be ensured.
The distributed node optimization system provided by the invention is described below, and the distributed node optimization system described below and the distributed node optimization method described above can be referred to correspondingly.
Fig. 2 is a schematic diagram of a distributed node optimization system provided in the present invention, as shown in fig. 2, the system includes:
the distribution situation analysis module 10 is configured to obtain a distribution situation of a data point of a distributed node in multiple dimensions based on variances of the data point in the multiple dimensions;
a sorting module 20, configured to sort the multiple dimensions according to the variance based on the distribution of the data points in the multiple dimensions;
and the segmentation module 30 is used for sequentially segmenting the data points in the corresponding dimension according to the order of the variance.
The distributed node optimization system further includes:
the marking module is used for marking the data points to be deleted;
and the updating module is used for returning to the distribution condition analysis module 10 when the number of the data points to be deleted reaches a preset threshold value.
FIG. 3 illustrates a schematic diagram of the physical structure of an electronic device, which may include: a processor 310, a communication interface 320, a memory 330 and a communication bus 340, wherein the processor 310, the communication interface 320 and the memory 330 communicate with each other through the communication bus 340. The processor 310 may invoke logic instructions in the memory 330 to perform a distributed node optimization method comprising:
S1, obtaining distribution conditions of data points of a distributed node in multiple dimensions based on variances of the data points of the distributed node in the multiple dimensions;
S2, sorting the multiple dimensions according to the variance based on the distribution condition of the data points in the multiple dimensions;
S3, sequentially segmenting the data points in the corresponding dimension according to the order of the variance.
Further, the logic instructions in the memory 330 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, are capable of performing a distributed node optimization method comprising:
S1, obtaining distribution conditions of data points of a distributed node in multiple dimensions based on variances of the data points of the distributed node in the multiple dimensions;
S2, sorting the multiple dimensions according to the variance based on the distribution condition of the data points in the multiple dimensions;
S3, sequentially segmenting the data points in the corresponding dimension according to the order of the variance.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs a distributed node optimization method comprising:
S1, obtaining distribution conditions of data points of a distributed node in multiple dimensions based on variances of the data points of the distributed node in the multiple dimensions;
S2, sorting the multiple dimensions according to the variance based on the distribution condition of the data points in the multiple dimensions;
S3, sequentially segmenting the data points in the corresponding dimension according to the order of the variance.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (9)
1. A method for optimizing a distributed node, comprising:
S1, obtaining distribution conditions of data points of a distributed node in multiple dimensions based on variances of the data points of the distributed node in the multiple dimensions;
S2, sorting the multiple dimensions according to the variance based on the distribution condition of the data points in the multiple dimensions;
S3, sequentially segmenting the data points in corresponding dimensions according to the order of the variances;
and maintaining a weight field in each distributed node, recording the number of NULL pointers of the child nodes of the distributed node, and executing reconstruction on the distributed node when the numbers of NULL pointers of its child nodes differ by about 2.
2. The distributed node optimization method according to claim 1, wherein the step S1 further comprises:
marking the data points to be deleted;
and returning to the step S1 when the number of the data points to be deleted reaches a preset threshold value.
3. The distributed node optimization method of claim 1, further comprising:
when the distributed node exceeds a fixed length rule of 32 bits, splitting the distributed node, and converting the distributed node into a high dimension and a low dimension under the current dimension;
the method is characterized in that an nf-hipac method is adopted to realize M-ary tree, each layer represents one dimension, the current dimension is divided into M sections, and each section represents a node.
4. A distributed node optimization method according to claim 3, comprising: when the distributed node is in the interval, for matching in the interval, low-dimensional direct matching is skipped.
5. A distributed node optimization method as claimed in claim 3 wherein when the distributed node is at an edge, then processing low-dimensional matches continues until the distributed node is inside an interval or matches are complete.
6. A distributed node optimization system, comprising:
the distribution situation analysis module is used for obtaining the distribution situation of the data points in a plurality of dimensions based on the variances of the data points of the distributed nodes in the plurality of dimensions;
the sorting module is used for sorting the multiple dimensions according to the variance based on the distribution condition of the data points in the multiple dimensions;
the segmentation module is used for sequentially segmenting the data points in the corresponding dimension according to the order of the variance;
and maintaining a weight field in each distributed node, recording the number of NULL pointers of the child nodes of the distributed node, and executing reconstruction on the distributed node when the numbers of NULL pointers of its child nodes differ by about 2.
7. The distributed node optimization system of claim 6 further comprising:
the marking module is used for marking the data points to be deleted;
and the updating module is used for returning to the distribution condition analysis module when the number of the data points to be deleted reaches a preset threshold value.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the distributed node optimization method according to any of claims 1-5 when executing the program.
9. A non-transitory computer readable storage medium, having stored thereon a computer program, which when executed by a processor, implements the steps of the distributed node optimization method according to any of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011518575.4A CN112769896B (en) | 2020-12-21 | 2020-12-21 | Distributed node optimization method and system, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011518575.4A CN112769896B (en) | 2020-12-21 | 2020-12-21 | Distributed node optimization method and system, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112769896A (en) | 2021-05-07 |
CN112769896B (en) | 2023-07-18 |
Family
ID=75695097
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011518575.4A Active CN112769896B (en) | 2020-12-21 | 2020-12-21 | Distributed node optimization method and system, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112769896B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111177190A (en) * | 2018-11-13 | 2020-05-19 | 杭州海康威视数字技术股份有限公司 | Data processing method and device, electronic equipment and readable storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8645380B2 (en) * | 2010-11-05 | 2014-02-04 | Microsoft Corporation | Optimized KD-tree for scalable search |
US9058347B2 (en) * | 2012-08-30 | 2015-06-16 | Facebook, Inc. | Prospective search of objects using K-D forest |
Non-Patent Citations (2)
Title |
---|
Supporting Multi-Dimensional Range Query in HD Tree; Yunfeng Gu et al.; 2010 14th IEEE/ACM Symposium on Distributed Simulation and Real-Time Applications; 2010-12-31; pp. 71-79 * |
Design and Implementation of an Embedded Linux Firewall; Rao Ming et al.; Intelligent Computer and Applications; 2011-10-30; Vol. 1, No. 3; pp. 35-39 * |
Also Published As
Publication number | Publication date |
---|---|
CN112769896A (en) | 2021-05-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105389349B (en) | Dictionary update method and device | |
US20130151562A1 (en) | Method of calculating feature-amount of digital sequence, and apparatus for calculating feature-amount of digital sequence | |
EP3968178A1 (en) | Log parsing method and device, server and storage medium | |
CN107895038B (en) | Link prediction relation recommendation method and device | |
CN110719106B (en) | Social network graph compression method and system based on node classification and sorting | |
CN111177491A (en) | Regular expression matching method and device, electronic equipment and storage medium | |
US20220005546A1 (en) | Non-redundant gene set clustering method and system, and electronic device | |
WO2016043757A1 (en) | Data to be backed up in a backup system | |
WO2021141656A1 (en) | System and method to dynamically generate a set of api endpoints | |
CN111984732B (en) | Method, node and blockchain network for implementing decentralization search on blockchain | |
CN110659175A (en) | Log trunk extraction method, log trunk classification method, log trunk extraction equipment and log trunk storage medium | |
Khan et al. | Set-based unified approach for attributed graph summarization | |
CN112769896B (en) | Distributed node optimization method and system, electronic equipment and storage medium | |
JP5964781B2 (en) | SEARCH DEVICE, SEARCH METHOD, AND SEARCH PROGRAM | |
CN111491300A (en) | Risk detection method, device, equipment and storage medium | |
CN110807061A (en) | Method for searching frequent subgraphs of uncertain graphs based on layering | |
CN107729898B (en) | Method and device for detecting text lines in text image | |
CN113065419A (en) | Pattern matching algorithm and system based on flow high-frequency content | |
CN110209837B (en) | Device information dictionary generation method and computing device | |
CN111460088A (en) | Similar text retrieval method, device and system | |
CN115801020B (en) | Definite finite state automaton compression method, matching method, device and medium | |
US20230195705A1 (en) | Branching for tree structure in database system | |
CN117729176B (en) | Method and device for aggregating application program interfaces based on network address and response body | |
Le et al. | Improving efficiency of incremental mining by trie structure and pre-large itemsets | |
CN116361517A (en) | Enterprise word size duplicate checking method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||