CN112769896B - Distributed node optimization method and system, electronic equipment and storage medium - Google Patents

Distributed node optimization method and system, electronic equipment and storage medium

Info

Publication number
CN112769896B
Authority
CN
China
Prior art keywords
distributed node
data points
dimension
distributed
multiple dimensions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011518575.4A
Other languages
Chinese (zh)
Other versions
CN112769896A (en)
Inventor
陈桂亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING ZHONGKE WANGWEI INFORMATION TECHNOLOGY CO LTD
Original Assignee
BEIJING ZHONGKE WANGWEI INFORMATION TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-12-21
Filing date
2020-12-21
Publication date
2023-07-18
Application filed by BEIJING ZHONGKE WANGWEI INFORMATION TECHNOLOGY CO LTD
Priority to CN202011518575.4A
Publication of CN112769896A
Application granted
Publication of CN112769896B


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Abstract

The invention provides a distributed node optimization method and system, an electronic device, and a storage medium. The method comprises the following steps: S1, obtaining the distribution of the data points of a distributed node in multiple dimensions based on the variances of the data points in those dimensions; S2, sorting the dimensions by variance based on the distribution of the data points in the multiple dimensions; and S3, segmenting the data points in the corresponding dimensions sequentially, in order of variance. By optimizing the selection of the segmentation dimension, the invention reduces the time consumed by frequent node updates, improves search efficiency, makes full use of the high-performance matching of the nf-hipac method, and keeps the current firewall system running smoothly.

Description

Distributed node optimization method and system, electronic equipment and storage medium
Technical Field
The present invention relates to the field of information optimization technologies, and in particular, to a distributed node optimization method and system, an electronic device, and a storage medium.
Background
A k-d tree is a binary tree in which every node is a k-dimensional point. Each node represents a hyperplane that is perpendicular to the coordinate axis of the current splitting dimension and divides the space into two parts along that dimension, one part in the left subtree and the other in the right subtree. If the splitting dimension of the current node is d, the d-coordinates of all points in the left subtree are smaller than the current node's value, and the d-coordinates of all points in the right subtree are greater than or equal to it; this holds for every child node. How the tree is constructed is the most critical issue, as it is for partition trees, segment trees, tries, and other data structures or methods (such as KMP): a well-built structure is what allows the problem to be solved efficiently. The efficiency of the k-d tree comes from how well the tree is built, that is, from the rule by which the points in the point set E are organized into a binary tree. The construction process largely determines the search efficiency of the tree, so the search can be optimized if the characteristics of the data are fully considered during construction.
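The invariant described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `KDNode`, `build_kdtree`, and `contains` are hypothetical, and the sketch assumes the simplest construction rule, in which the splitting dimension cycles with the depth and the pivot is the median.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    point: Point                      # the k-dimensional point stored at this node
    dim: int                          # splitting dimension d of this node
    left: Optional["KDNode"] = None   # points with d-coordinate <  point[dim]
    right: Optional["KDNode"] = None  # points with d-coordinate >= point[dim]


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[KDNode]:
    """Build a k-d tree, cycling through the dimensions level by level."""
    if not points:
        return None
    dim = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2
    # Shift the pivot left so that every point equal to it in `dim` goes right.
    while mid > 0 and ordered[mid - 1][dim] == ordered[mid][dim]:
        mid -= 1
    return KDNode(
        point=ordered[mid],
        dim=dim,
        left=build_kdtree(ordered[:mid], depth + 1),
        right=build_kdtree(ordered[mid + 1:], depth + 1),
    )


def contains(node: Optional[KDNode], target: Point) -> bool:
    """Exact-match search: one comparison per level, in that level's dimension."""
    while node is not None:
        if node.point == target:
            return True
        if target[node.dim] < node.point[node.dim]:
            node = node.left
        else:
            node = node.right
    return False
```

For example, `build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])` places `(7, 2)` at the root, because it is the median along dimension 0.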
The k-d tree is suited to searching multi-dimensional discrete data. The current firewall system uses the nf-hipac method, which is suited to matching multi-dimensional interval data. Whereas the k-d tree is a binary tree, nf-hipac uses an M-ary tree in which each layer represents one dimension: the current dimension is divided into M intervals, each interval is a node, each node is divided again in the second dimension, and so on. The k-d tree method can reach a time complexity of O(log n) (O(n) in the worst case). The nf-hipac method performs a binary search in each dimension, so its time complexity is O(m log n), where m is the number of dimensions and n is the number of rules. Three points can be optimized in the process of constructing and deleting the k-d tree:
1. Optimizing the selection of the segmentation dimension.
2. Optimizing the deletion operation.
3. Applying the nf-hipac approach to fixed-length rules exceeding 32 bits.
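The layered, one-binary-search-per-dimension lookup described in the Background can be sketched as follows. This is a toy illustration in Python and not the actual nf-hipac code: the `Layer` class, the `lookup` function, and the rule actions `"ACCEPT"` and `"DROP"` are hypothetical names chosen for the example.

```python
from bisect import bisect_right


class Layer:
    """One tree level: the sorted lower bounds of the intervals in one dimension."""

    def __init__(self, lower_bounds, children):
        assert lower_bounds == sorted(lower_bounds)
        assert len(lower_bounds) == len(children)
        self.lower_bounds = lower_bounds
        self.children = children  # a Layer for the next dimension, or a leaf action


def lookup(root, packet):
    """Descend one layer per dimension, using a binary search at each layer."""
    node = root
    for value in packet:
        if not isinstance(node, Layer):
            break                                  # reached a leaf early
        i = bisect_right(node.lower_bounds, value) - 1
        if i < 0:
            return None                            # value lies below every interval
        node = node.children[i]
    return None if isinstance(node, Layer) else node


# Dimension 1 (assumed: protocol number): only the value 6 is accepted.
protocol_layer = Layer([0, 6, 7], ["DROP", "ACCEPT", "DROP"])
# Dimension 0 (assumed: port): only the interval [1024, 49152) descends further.
root = Layer([0, 1024, 49152], ["DROP", protocol_layer, "DROP"])
```

With m layers and at most n interval bounds per layer, each lookup performs m binary searches, which matches the O(m log n) cost stated above. Here `lookup(root, (8080, 6))` descends through both layers, while `lookup(root, (80, 6))` stops at the first.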
Disclosure of Invention
The invention provides a distributed node optimization method and system, electronic equipment and a storage medium, which can reduce time consumption caused by frequently updating nodes and improve searching efficiency.
The invention provides a distributed node optimization method, which comprises the following steps:
S1, obtaining the distribution of the data points of a distributed node in multiple dimensions based on the variances of the data points in the multiple dimensions;
S2, sorting the multiple dimensions by variance based on the distribution of the data points in the multiple dimensions;
S3, segmenting the data points in the corresponding dimensions sequentially, in order of variance.
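Steps S1 to S3 can be sketched in Python as follows. The sketch rests on assumptions the text does not fix: population variance is used as the measure of spread, the order is from largest to smallest variance, and each segmentation is a median split. The function names are hypothetical.

```python
from statistics import pvariance


def order_dimensions_by_variance(points):
    """S1 and S2: compute the variance per dimension, then sort the dimensions."""
    k = len(points[0])
    variances = [pvariance([p[d] for p in points]) for d in range(k)]   # S1
    order = sorted(range(k), key=lambda d: variances[d], reverse=True)  # S2
    return order, variances


def segment(points, order):
    """S3: split at the median along each dimension in `order`, one after another."""
    if not order or len(points) <= 1:
        return list(points)
    dim, rest = order[0], order[1:]
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2
    return [segment(ordered[:mid], rest), segment(ordered[mid:], rest)]


points = [(1, 10, 5), (2, 50, 5), (3, 90, 6), (4, 30, 5)]
order, variances = order_dimensions_by_variance(points)
tree = segment(points, order)
```

In this example dimension 1 has by far the largest spread, so it is cut first, followed by dimension 0 and then dimension 2.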
Preferably, in the distributed node optimization method, step S1 further includes:
marking the data points to be deleted;
and returning to step S1 when the number of data points marked for deletion reaches a preset threshold.
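The mark-then-rebuild behaviour can be sketched as below. This is a hypothetical Python illustration: the class name `LazyDeleteIndex`, the parameter `rebuild_threshold`, and the use of a sorted list as a stand-in for re-running steps S1 to S3 are all assumptions made for the example.

```python
class LazyDeleteIndex:
    """Marks deletions and rebuilds only once enough marks have accumulated."""

    def __init__(self, points, rebuild_threshold=3):
        self.rebuild_threshold = rebuild_threshold
        self.rebuild_count = 0
        self._rebuild(points)

    def _rebuild(self, points):
        # Stand-in for re-running S1 to S3 on the surviving data points.
        self.points = sorted(points)
        self.deleted = set()
        self.rebuild_count += 1

    def delete(self, point):
        if point in self.points:
            self.deleted.add(point)          # mark only; the structure is untouched
        if len(self.deleted) >= self.rebuild_threshold:
            survivors = [p for p in self.points if p not in self.deleted]
            self._rebuild(survivors)         # threshold reached: return to S1

    def contains(self, point):
        return point in self.points and point not in self.deleted


index = LazyDeleteIndex([(1, 2), (3, 4), (5, 6), (7, 8)], rebuild_threshold=2)
index.delete((1, 2))   # marked; no rebuild yet
```

A marked point is hidden from lookups immediately, but the cost of re-segmentation is paid only once per `rebuild_threshold` deletions.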
Preferably, the distributed node optimization method further includes:
maintaining a weight field in each distributed node to record the number of NULL pointers of the child nodes of the distributed node, and executing reconstruction on the distributed node when the numbers of NULL pointers of its child nodes differ by about 2.
Preferably, the distributed node optimization method further includes:
when a fixed-length rule of the distributed node exceeds 32 bits, splitting the distributed node and converting it into a high dimension and a low dimension under the current dimension.
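The split into a high and a low dimension can be sketched as follows, assuming for illustration a 64-bit unsigned field that is cut into two 32-bit halves. The field width and the function names `split_field` and `join_field` are assumptions; the text only says that the rule exceeds 32 bits.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def split_field(value: int) -> Tuple[int, int]:
    """Split an unsigned 64-bit value into (high 32 bits, low 32 bits)."""
    if not 0 <= value < (1 << 64):
        raise ValueError("expected an unsigned 64-bit value")
    return value >> 32, value & MASK32


def join_field(high: int, low: int) -> int:
    """Inverse of split_field, useful for checking the round trip."""
    return (high << 32) | low


high, low = split_field(0x0000000100000002)   # -> (1, 2)
```

Each half then fits in a 32-bit comparison, so the wide field can be handled as two consecutive dimensions.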
Preferably, in the distributed node optimization method, when the distributed node lies inside an interval, the match within that interval is made directly and the low-dimension matching is skipped.
Preferably, in the distributed node optimization method, when the distributed node lies on an edge of an interval, the low-dimension matching continues to be processed until the distributed node lies inside an interval or the matching is complete.
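The interior and edge cases can be sketched as below. This is one interpretation, not a confirmed implementation: it assumes a 64-bit value and an inclusive range `[lo, hi]`, each split into 32-bit high and low halves, where a high half strictly inside the range is a direct hit and a high half equal to a boundary falls through to the low half. The names `split` and `match_split_range` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def split(value):
    """Return (high 32 bits, low 32 bits) of a 64-bit value."""
    return value >> 32, value & MASK32


def match_split_range(value, lo, hi):
    """Match `value` against the inclusive range [lo, hi] one half at a time.

    Returns (matched, case), where case is "miss", "interior" or "edge".
    """
    vh, vl = split(value)
    lh, ll = split(lo)
    hh, hl = split(hi)
    if vh < lh or vh > hh:
        return False, "miss"          # outside the interval in the high dimension
    if lh < vh < hh:
        return True, "interior"       # direct hit: the low dimension is skipped
    matched = True                    # on an edge: the low dimension decides
    if vh == lh:
        matched = matched and vl >= ll
    if vh == hh:
        matched = matched and vl <= hl
    return matched, "edge"


lo = (5 << 32) | 100
hi = (9 << 32) | 200
```

For example, `match_split_range(7 << 32, lo, hi)` is decided by the high half alone, whereas `match_split_range((5 << 32) | 99, lo, hi)` reaches the low half and fails there.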
The invention provides a distributed node optimization system, which comprises:
a distribution situation analysis module, configured to obtain the distribution of the data points of a distributed node in multiple dimensions based on the variances of the data points in the multiple dimensions;
a sorting module, configured to sort the multiple dimensions by variance based on the distribution of the data points in the multiple dimensions;
and a segmentation module, configured to segment the data points in the corresponding dimensions sequentially, in order of variance.
Preferably, the distributed node optimization system further includes:
the marking module is used for marking the data points to be deleted;
and the updating module is used for returning to the distribution condition analysis module when the number of the data points to be deleted reaches a preset threshold value.
The invention provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the distributed node optimization method when executing the program.
The present invention provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the distributed node optimization method.
With the distributed node optimization method and system, the electronic device, and the storage medium provided by the invention, the distribution of the data points in each dimension is compared before construction starts. The larger the variance of the data points' coordinate values in a dimension, the more dispersed the points are in that dimension; the smaller the variance, the more concentrated they are. Segmentation can therefore start from the dimension with the largest variance, which gives a good segmentation effect and good balance. By optimizing the selection of the segmentation dimension, the invention reduces the time consumed by frequent node updates, improves search efficiency, makes full use of the high-performance matching of the nf-hipac method, and keeps the current firewall system running smoothly.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a distributed node optimization method provided by the present invention;
FIG. 2 is a schematic diagram of a distributed node optimization system provided by the present invention;
FIG. 3 is a schematic diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
FIG. 1 is a schematic diagram of the distributed node optimization method provided by the present invention. As shown in FIG. 1, the method includes:
S1, obtaining the distribution of the data points of a distributed node in multiple dimensions based on the variances of the data points in the multiple dimensions;
S2, sorting the multiple dimensions by variance based on the distribution of the data points in the multiple dimensions;
S3, segmenting the data points in the corresponding dimensions sequentially, in order of variance.
The distributed node optimization method includes optimizing the selection of the segmentation dimension. Before construction starts, the distribution of the data points in each dimension is compared: the larger the variance of the data points' coordinate values in a dimension, the more dispersed the points are in that dimension, and the smaller the variance, the more concentrated they are. Segmenting first along the dimension with the largest variance therefore gives a good segmentation effect and good balance.
Because depth-first matching is not constructed recursively, the variance does not need to be recalculated at every step to choose the cutting dimension. In depth-first matching, once a dimension has been chosen, the cut is made entirely in that dimension: in three dimensions the result is a slice, and in two dimensions a line. Later cuts no longer need to consider that dimension, because the match has already reached its exact interval there. Only a single ordering of the dimensions therefore needs to be found, and the per-step variance-based selection used for a k-d tree is not needed. The dimension in which the average number of rules per interval is smallest can be used as the first dimension, and so on. The reason is that when the rules are unevenly distributed, intersecting with the union of the rule sets of the next dimension is relatively likely to leave only the Default Rule in the corresponding interval, which causes a fast failure and reduces the number of child nodes. For an M-ary tree, pruning the children of nodes close to the root restrains excessive growth of the nodes in the lower layers.
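The heuristic of choosing first the dimension with the fewest rules per interval can be sketched in Python as follows. The sketch is an assumption-laden illustration: rules are modelled as tuples of inclusive integer ranges, the intervals of a dimension are the elementary intervals obtained by cutting at every range boundary, and the function names are hypothetical.

```python
def elementary_intervals(ranges):
    """Cut the axis at every range boundary; return half-open [start, end) pieces."""
    cuts = sorted({lo for lo, _ in ranges} | {hi + 1 for _, hi in ranges})
    return list(zip(cuts, cuts[1:]))


def average_rules_per_interval(ranges):
    """Average number of rules covering each elementary interval of one dimension."""
    pieces = elementary_intervals(ranges)
    counts = [
        sum(1 for lo, hi in ranges if lo <= start and end - 1 <= hi)
        for start, end in pieces
    ]
    return sum(counts) / len(counts)


def dimension_order(rules):
    """Sort the dimensions by ascending average number of rules per interval."""
    k = len(rules[0])
    averages = [average_rules_per_interval([r[d] for r in rules]) for d in range(k)]
    return sorted(range(k), key=lambda d: averages[d]), averages


# Three illustrative rules over two dimensions, each given as inclusive ranges.
rules = [
    ((0, 99), (0, 65535)),
    ((100, 199), (0, 65535)),
    ((200, 299), (80, 80)),
]
order, averages = dimension_order(rules)
```

Here dimension 0 separates the three rules completely (one rule per interval), while every interval of dimension 1 still holds two or three rules, so dimension 0 is placed first.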
Preferably, step S1 further includes:
marking the data points to be deleted;
and returning to step S1 when the number of data points marked for deletion reaches a preset threshold. Deleting nf-hipac tree nodes has a large effect on the segmentation of the relevant dimension, so an optimization similar to that of the k-d tree is adopted: a deletion only sets a mark, and the relevant dimension is re-segmented once the number of deletions reaches a certain threshold.
The method further includes:
maintaining a weight field in each distributed node to record the number of NULL pointers of the child nodes of the distributed node, and executing reconstruction on the distributed node when the numbers of NULL pointers of its child nodes differ by about 2.
The method further includes:
when a fixed-length rule of the distributed node exceeds 32 bits, splitting the distributed node and converting it into a high dimension and a low dimension under the current dimension.
The method includes: when the distributed node lies inside an interval, the match within that interval is a direct hit, and the low-dimension matching is skipped.
When the distributed node lies on an edge of an interval, the low-dimension matching continues to be processed until the distributed node lies inside an interval or the matching is complete.
The k-d tree uses breadth-wise segmentation: it segments along each dimension in turn until no further segmentation is possible. nf-hipac uses depth-wise segmentation: it segments one dimension completely before segmenting the next. Although the methods differ, the k-d tree's optimization of segmentation selection is worth borrowing: by fully considering the characteristics of the data, collecting statistics on the data distribution in each dimension, and choosing an optimal segmentation order, the search can be optimized.
The way the k-d tree deletes nodes is also worth borrowing. A deletion does not actually remove the node; it only marks the node's state, and the removal is carried out once the number of marked nodes reaches a threshold. When the current firewall system updates a node, it first deletes and then adds, so this optimization can greatly reduce the performance loss caused by frequently re-segmenting intervals during node updates.
Through the optimization of segmentation dimension selection, of node deletion, and of fixed-length rules exceeding 32 bits, the invention reduces the time consumed by frequent node updates, improves search efficiency, makes full use of the high-performance matching of the nf-hipac method, and keeps the current firewall system running smoothly.
The distributed node optimization system provided by the invention is described below, and the distributed node optimization system described below and the distributed node optimization method described above can be referred to correspondingly.
FIG. 2 is a schematic diagram of the distributed node optimization system provided by the present invention. As shown in FIG. 2, the system includes:
the distribution situation analysis module 10 is configured to obtain a distribution situation of a data point of a distributed node in multiple dimensions based on variances of the data point in the multiple dimensions;
a sorting module 20, configured to sort the multiple dimensions according to the variance based on the distribution of the data points in the multiple dimensions;
and the segmentation module 30 is used for sequentially segmenting the data points in the corresponding dimension according to the order of the variance.
The distributed node optimization system further includes:
the marking module is used for marking the data points to be deleted;
and the updating module is used for returning to the distribution condition analysis module 10 when the number of the data points to be deleted reaches a preset threshold value.
Fig. 3 illustrates a physical schematic diagram of an electronic device, which may include: processor 310, communication interface (Communications Interface) 320, memory 330 and communication bus 340, wherein processor 310, communication interface 320, memory 330 accomplish communication with each other through communication bus 340. Processor 310 may invoke logic instructions in memory 330 to perform a distributed node optimization method comprising:
S1, obtaining the distribution of the data points of a distributed node in multiple dimensions based on the variances of the data points in the multiple dimensions;
S2, sorting the multiple dimensions by variance based on the distribution of the data points in the multiple dimensions;
S3, segmenting the data points in the corresponding dimensions sequentially, in order of variance.
Further, the logic instructions in the memory 330 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, are capable of performing a distributed node optimization method comprising:
S1, obtaining the distribution of the data points of a distributed node in multiple dimensions based on the variances of the data points in the multiple dimensions;
S2, sorting the multiple dimensions by variance based on the distribution of the data points in the multiple dimensions;
S3, segmenting the data points in the corresponding dimensions sequentially, in order of variance.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs a distributed node optimization method comprising:
S1, obtaining the distribution of the data points of a distributed node in multiple dimensions based on the variances of the data points in the multiple dimensions;
S2, sorting the multiple dimensions by variance based on the distribution of the data points in the multiple dimensions;
S3, segmenting the data points in the corresponding dimensions sequentially, in order of variance.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A method for optimizing a distributed node, comprising:
S1, obtaining the distribution of the data points of a distributed node in multiple dimensions based on the variances of the data points in the multiple dimensions;
S2, sorting the multiple dimensions by variance based on the distribution of the data points in the multiple dimensions;
S3, segmenting the data points in the corresponding dimensions sequentially, in order of variance;
and maintaining a weight field in each distributed node to record the number of NULL pointers of the child nodes of the distributed node, and executing reconstruction on the distributed node when the numbers of NULL pointers of its child nodes differ by about 2.
2. The distributed node optimization method according to claim 1, wherein the step S1 further comprises:
marking the data points to be deleted;
and returning to the step S1 when the number of the data points to be deleted reaches a preset threshold value.
3. The distributed node optimization method of claim 1, further comprising:
when a fixed-length rule of the distributed node exceeds 32 bits, splitting the distributed node and converting it into a high dimension and a low dimension under the current dimension;
wherein an nf-hipac method is adopted to implement an M-ary tree in which each layer represents one dimension, the current dimension is divided into M intervals, and each interval represents a node.
4. A distributed node optimization method according to claim 3, comprising: when the distributed node is in the interval, for matching in the interval, low-dimensional direct matching is skipped.
5. A distributed node optimization method as claimed in claim 3 wherein when the distributed node is at an edge, then processing low-dimensional matches continues until the distributed node is inside an interval or matches are complete.
6. A distributed node optimization system, comprising:
a distribution situation analysis module, configured to obtain the distribution of the data points of a distributed node in multiple dimensions based on the variances of the data points in the multiple dimensions;
a sorting module, configured to sort the multiple dimensions by variance based on the distribution of the data points in the multiple dimensions;
a segmentation module, configured to segment the data points in the corresponding dimensions sequentially, in order of variance;
wherein a weight field is maintained in each distributed node to record the number of NULL pointers of the child nodes of the distributed node, and reconstruction is executed on the distributed node when the numbers of NULL pointers of its child nodes differ by about 2.
7. The distributed node optimization system of claim 6 further comprising:
the marking module is used for marking the data points to be deleted;
and the updating module is used for returning to the distribution condition analysis module when the number of the data points to be deleted reaches a preset threshold value.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the distributed node optimization method according to any of claims 1-5 when executing the program.
9. A non-transitory computer readable storage medium, having stored thereon a computer program, which when executed by a processor, implements the steps of the distributed node optimization method according to any of claims 1-5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011518575.4A CN112769896B (en) 2020-12-21 2020-12-21 Distributed node optimization method and system, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011518575.4A CN112769896B (en) 2020-12-21 2020-12-21 Distributed node optimization method and system, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112769896A CN112769896A (en) 2021-05-07
CN112769896B true CN112769896B (en) 2023-07-18

Family

ID=75695097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011518575.4A Active CN112769896B (en) 2020-12-21 2020-12-21 Distributed node optimization method and system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112769896B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111177190A (en) * 2018-11-13 2020-05-19 杭州海康威视数字技术股份有限公司 Data processing method and device, electronic equipment and readable storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8645380B2 (en) * 2010-11-05 2014-02-04 Microsoft Corporation Optimized KD-tree for scalable search
US9058347B2 (en) * 2012-08-30 2015-06-16 Facebook, Inc. Prospective search of objects using K-D forest

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111177190A (en) * 2018-11-13 2020-05-19 杭州海康威视数字技术股份有限公司 Data processing method and device, electronic equipment and readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Supporting Multi-Dimensional Range Query in HD Tree;Yunfeng Gu等;《2010 14th IEEE/ACM Symposium on Distributed Simulation and Real-Time Applications》;20101231;71-79 *
嵌入式Linux 防火墙的设计与实现;饶明等;《智能计算机与应用》;20111030;第1卷(第3期);35-39 *

Also Published As

Publication number Publication date
CN112769896A (en) 2021-05-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant