CN112769896B - Distributed node optimization method and system, electronic equipment and storage medium

Info

Publication number: CN112769896B
Application number: CN202011518575.4A
Authority: CN
Inventors: 陈桂亮
Original assignee: BEIJING ZHONGKE WANGWEI INFORMATION TECHNOLOGY CO LTD
Current assignee: BEIJING ZHONGKE WANGWEI INFORMATION TECHNOLOGY CO LTD
Priority date: 2020-12-21
Filing date: 2020-12-21
Publication date: 2023-07-18
Anticipated expiration: 2040-12-21
Also published as: CN112769896A

Abstract

The invention provides a distributed node optimization method and system, an electronic device, and a storage medium. The method comprises: S1, obtaining the distribution of the data points of a distributed node in multiple dimensions based on the variances of those data points in the multiple dimensions; S2, sorting the multiple dimensions by variance based on the distribution of the data points in the multiple dimensions; and S3, segmenting the data points in the corresponding dimensions in turn, in the order of the variances. By optimizing the selection of the segmentation dimension, the invention reduces the time spent on frequent node updates, improves search efficiency, makes full use of the high-performance matching of the nf-hipac method, and keeps the current firewall system running smoothly.

Description

Distributed node optimization method and system, electronic equipment and storage medium

Technical Field

The present invention relates to the field of information optimization technologies, and in particular, to a distributed node optimization method and system, an electronic device, and a storage medium.

Background

A k-d tree is a binary tree whose nodes are k-dimensional points. Each node represents a hyperplane that is perpendicular to the coordinate axis of the current splitting dimension and divides the space into two parts along that dimension, one part in the left subtree and the other in the right subtree. If the splitting dimension of the current node is d, the d-coordinates of all points in the left subtree are smaller than the node's value, and the d-coordinates of all points in the right subtree are greater than or equal to it; this definition holds for every child node as well. How the tree is constructed is the most critical point: as with partition trees, segment trees, tries, and even other data structures or methods (such as KMP), a problem can be handled efficiently only when the structure is built well. The efficiency of the k-d tree comes from how well the tree is built, that is, from the rule by which the points of the point set E are organized into a binary tree. The construction process largely determines the search efficiency of the tree, so search can be optimized if the characteristics of the data are fully considered during construction.
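The ordering rule described above can be illustrated with a minimal Python sketch. The names `KDNode`, `insert`, and `contains` are hypothetical and do not come from the patent, and the splitting dimension is assumed to cycle with depth, which is the conventional choice rather than the optimized selection the invention proposes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    point: Point
    dim: int                          # splitting dimension d of this node
    left: Optional["KDNode"] = None   # points whose d-coordinate is smaller
    right: Optional["KDNode"] = None  # points whose d-coordinate is greater or equal


def insert(node: Optional[KDNode], point: Point, depth: int = 0) -> KDNode:
    """Insert a point; the splitting dimension cycles with depth (assumption)."""
    if node is None:
        return KDNode(point, depth % len(point))
    d = node.dim
    if point[d] < node.point[d]:
        node.left = insert(node.left, point, depth + 1)
    else:
        node.right = insert(node.right, point, depth + 1)
    return node


def contains(node: Optional[KDNode], point: Point) -> bool:
    """Follow the same comparison rule downwards until the point is found."""
    while node is not None:
        if node.point == point:
            return True
        d = node.dim
        node = node.left if point[d] < node.point[d] else node.right
    return False
```

Because every lookup follows a single root-to-leaf path, the cost of a search depends directly on how balanced the construction leaves the tree.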

The k-d tree is suited to searching multi-dimensional discrete data. The current firewall system uses the nf-hipac method, which is suited to matching multi-dimensional interval data. Whereas the k-d tree is a binary tree, nf-hipac uses an M-ary tree: each layer represents one dimension, the current dimension is divided into M intervals, each interval corresponds to a node, each node is in turn divided in the next dimension, and so on. A k-d tree search has a time complexity of O(log n) (O(n) in the worst case). The nf-hipac method performs a binary search in each dimension, so its time complexity is O(m log n), where m is the number of dimensions and n is the number of rules. Drawing on the construction and deletion procedures of the k-d tree, three points can be optimized:

1. Optimization of the segmentation dimension selection.

2. Optimization of the deletion operation.

3. Use of the nf-hipac method for fixed-length rules exceeding 32 bits.

Disclosure of Invention

The invention provides a distributed node optimization method and system, electronic equipment and a storage medium, which can reduce time consumption caused by frequently updating nodes and improve searching efficiency.

The invention provides a distributed node optimization method, which comprises the following steps:

S1, obtaining the distribution of the data points of a distributed node in multiple dimensions based on the variances of those data points in the multiple dimensions;

S2, sorting the multiple dimensions by variance based on the distribution of the data points in the multiple dimensions;

S3, segmenting the data points in the corresponding dimensions in turn, in the order of the variances.
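Steps S1 to S3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the population variance is used, dimensions are sorted from largest to smallest variance, and each group is cut into `parts` roughly equal chunks per dimension, a detail the patent does not specify.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dimension_variances(points: Sequence[Point]) -> List[float]:
    """S1: variance of the coordinate values in each dimension."""
    dims = len(points[0])
    return [pvariance([p[d] for p in points]) for d in range(dims)]


def order_dimensions(variances: Sequence[float]) -> List[int]:
    """S2: dimension indices sorted from largest to smallest variance."""
    return sorted(range(len(variances)), key=lambda d: variances[d], reverse=True)


def segment(points: Sequence[Point], order: Sequence[int], parts: int = 2) -> List[List[Point]]:
    """S3: cut the points dimension by dimension, following the given order."""
    groups = [list(points)]
    for d in order:
        next_groups = []
        for g in groups:
            g.sort(key=lambda p: p[d])
            size = max(1, -(-len(g) // parts))  # ceiling division, never zero
            next_groups.extend(g[i:i + size] for i in range(0, len(g), size))
        groups = next_groups
    return groups


points = [(1.0, 10.0), (2.0, 50.0), (3.0, 90.0), (4.0, 130.0)]
order = order_dimensions(dimension_variances(points))  # dimension 1 varies most
groups = segment(points, order)
```

In this example the second coordinate is far more spread out than the first, so it is segmented first.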

Preferably, in the distributed node optimization method, step S1 further includes:

marking the data points to be deleted; and

returning to step S1 when the number of data points marked for deletion reaches a preset threshold.

Preferably, the distributed node optimization method further includes:

maintaining a weight field in each distributed node that records the number of NULL pointers among the child nodes of the distributed node, and rebuilding the distributed node when the NULL-pointer counts of its child nodes differ by about 2.

Preferably, the distributed node optimization method further includes:

when a distributed node holds a fixed-length rule exceeding 32 bits, splitting the distributed node and converting it into a high dimension and a low dimension under the current dimension.

Preferably, in the distributed node optimization method, when the distributed node lies inside the interval, the match within the interval is a direct hit and the low-dimensional matching is skipped.

Preferably, in the distributed node optimization method, when the distributed node lies on an edge of the interval, the low-dimensional matching continues until the distributed node lies inside an interval or the matching is complete.
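One possible reading of the high/low split and of the interior and edge cases is sketched below in Python. It assumes that a field longer than 32 bits is cut into 32-bit words, most significant first, and that a rule is an inclusive range `[lo, hi]` with `lo <= hi`; the names `split_words` and `range_match` are hypothetical and are not taken from the patent or from nf-hipac.

```python
from typing import List

WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1


def split_words(value: int, total_bits: int) -> List[int]:
    """Split a fixed-length value into 32-bit words, high dimension first.

    Assumes total_bits is a multiple of 32.
    """
    n = total_bits // WORD_BITS
    return [(value >> (WORD_BITS * (n - 1 - i))) & MASK for i in range(n)]


def range_match(value: int, lo: int, hi: int, total_bits: int = 64) -> bool:
    """Match value against [lo, hi] word by word, from high to low dimension."""
    v, l, h = (split_words(x, total_bits) for x in (value, lo, hi))
    on_lo = on_hi = True  # value still coincides with the lower / upper edge
    for vw, lw, hw in zip(v, l, h):
        if on_lo and vw < lw:
            return False  # below the lower edge
        if on_hi and vw > hw:
            return False  # above the upper edge
        on_lo = on_lo and vw == lw
        on_hi = on_hi and vw == hw
        if not on_lo and not on_hi:
            return True   # strictly inside: direct hit, lower words are skipped
    return True           # followed an edge to the last word and stayed in range


lo, hi = (1 << 32) | 5, (3 << 32) | 7
inside = range_match((2 << 32) | 0, lo, hi)    # decided by the high word alone
on_edge = range_match((1 << 32) | 4, lo, hi)   # high word on the edge, low word decides
```

The early return is what makes the interior case cheap: only values that sit exactly on a boundary of the high dimension pay for the comparison in the low dimension.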

The invention provides a distributed node optimization system, which comprises:

the distribution situation analysis module is used for obtaining the distribution situation of the data points in a plurality of dimensions based on the variances of the data points of the distributed nodes in the plurality of dimensions;

the sorting module is used for sorting the multiple dimensions according to the variance based on the distribution condition of the data points in the multiple dimensions;

and the segmentation module is used for sequentially segmenting the data points in the corresponding dimension according to the order of the variances.

Preferably, the distributed node optimization system further includes:

the marking module is used for marking the data points to be deleted;

and the updating module is used for returning to the distribution condition analysis module when the number of the data points to be deleted reaches a preset threshold value.

The invention provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the distributed node optimization method when executing the program.

The present invention provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the distributed node optimization method.

According to the distributed node optimization method and system, the electronic device, and the storage medium provided by the invention, the distribution of the data points in each dimension is compared before construction starts: the larger the variance of the data points' coordinate values in a dimension, the more dispersed the distribution, and the smaller the variance, the more concentrated the distribution. Segmentation can therefore start from a dimension with large variance, which gives a good segmentation effect and good balance. By optimizing the selection of the segmentation dimension, the invention reduces the time spent on frequent node updates, improves search efficiency, makes full use of the high-performance matching of the nf-hipac method, and keeps the current firewall system running smoothly.

Drawings

In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a distributed node optimization method provided by the present invention;

FIG. 2 is a schematic diagram of a distributed node optimization system provided by the present invention;

FIG. 3 is a schematic diagram of an electronic device provided by the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Fig. 1 is a schematic diagram of a distributed node optimization method provided by the present invention, as shown in fig. 1, where the method includes:

The distributed node optimization method includes optimizing the selection of the segmentation dimension. Before construction starts, the distribution of the data points in each dimension is compared: the larger the variance of the data points' coordinate values in a dimension, the more dispersed the distribution, and the smaller the variance, the more concentrated the distribution. Segmentation can start from a dimension with large variance, which gives a good segmentation effect and good balance. Because depth-first matching is not constructed recursively, the variance does not need to be computed at every step to decide the cutting dimension. In depth-first matching, once a dimension has been chosen, the cut is made directly in that dimension (into a slice in the three-dimensional case and into a line in the two-dimensional case), and later cuts no longer need to consider that dimension, because the search has already arrived exactly at its interval in that dimension. Therefore only a single dimension order needs to be found, and the variance does not have to be recomputed at each split as in a k-d tree. The dimension in which the average number of Rules per interval is smallest can be used as the first dimension, and so on. The reason is that when the Rules are unevenly distributed and the Rule sets are intersected with those of the next dimension, there is a relatively high probability that, after the intersection is computed for an interval, only the Default Rule remains and matching fails fast. This reduces the number of child nodes; for an M-ary tree, pruning the children of nodes close to the root restrains the excessive growth of nodes in the lower layers.
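The ordering criterion in the preceding paragraph can be sketched in Python as follows. This is one interpretation, not the patent's definitive procedure: each Rule is assumed to be an inclusive integer range per dimension, each dimension is cut into elementary intervals at the range boundaries, and the dimensions are ordered by the average number of Rules covering an interval, smallest first. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Range = Tuple[int, int]   # inclusive (lo, hi) in one dimension
Rule = Sequence[Range]    # one range per dimension


def avg_rules_per_interval(ranges: Sequence[Range]) -> float:
    """Average number of rules covering each elementary interval of a dimension."""
    # Boundaries of the half-open elementary intervals [a, b).
    cuts = sorted({lo for lo, _ in ranges} | {hi + 1 for _, hi in ranges})
    counts = [
        sum(1 for lo, hi in ranges if lo <= a and b - 1 <= hi)
        for a, b in zip(cuts, cuts[1:])
    ]
    return sum(counts) / len(counts)


def dimension_order(rules: Sequence[Rule]) -> List[int]:
    """Dimensions sorted so that the sparsest intervals come first."""
    dims = len(rules[0])
    avg = [avg_rules_per_interval([r[d] for r in rules]) for d in range(dims)]
    return sorted(range(dims), key=lambda d: avg[d])


rules = [((0, 9), (0, 99)), ((10, 19), (0, 99)), ((20, 29), (0, 49))]
order = dimension_order(rules)  # dimension 0 has one rule per interval
```

Placing the dimension with the fewest Rules per interval at the root keeps the Rule sets passed down to the next layer small, which is the pruning effect the paragraph describes.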

Preferably, the step S1 further includes:

marking the data points to be deleted;

and returning to step S1 when the number of data points marked for deletion reaches a preset threshold. Deleting nf-hipac tree nodes strongly affects the segmentation of the relevant dimension, so an optimization similar to that of the k-d tree is adopted: a deletion only sets a mark, and the relevant dimension is segmented again once the number of deletions reaches a certain threshold.
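The mark-then-rebuild behaviour can be sketched in Python as follows. The class `LazyRuleSet` and its members are hypothetical, rules are represented as plain integers, and re-sorting the surviving rules stands in for re-segmenting the relevant dimension, which the patent does not detail.

```python
from typing import Iterable, List, Set


class LazyRuleSet:
    """Rules are only marked on deletion; the structure is rebuilt at a threshold."""

    def __init__(self, rules: Iterable[int], threshold: int = 3) -> None:
        self.rules: List[int] = list(rules)
        self.deleted: Set[int] = set()
        self.threshold = threshold
        self.rebuilds = 0
        self._rebuild()

    def _rebuild(self) -> None:
        # Stand-in for re-segmentation: drop marked rules and re-sort the rest.
        self.rules = sorted(r for r in self.rules if r not in self.deleted)
        self.deleted.clear()
        self.rebuilds += 1

    def delete(self, rule: int) -> None:
        """Mark a rule as deleted; rebuild only when enough marks accumulate."""
        if rule in self.rules and rule not in self.deleted:
            self.deleted.add(rule)
            if len(self.deleted) >= self.threshold:
                self._rebuild()

    def contains(self, rule: int) -> bool:
        """A marked rule no longer matches, even before it is physically removed."""
        return rule in self.rules and rule not in self.deleted


rule_set = LazyRuleSet([5, 1, 9, 3], threshold=2)
rule_set.delete(1)   # marked only, no rebuild yet
rule_set.delete(9)   # threshold reached, structure is rebuilt once
```

With a threshold of 2, the two deletions trigger a single rebuild instead of two, which is where the saving on frequent updates comes from.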

The method further includes:

when the distributed node lies inside the interval, the match within the interval is a direct hit, and the low-dimensional matching is skipped; and

when the distributed node lies on an edge of the interval, the low-dimensional matching continues until the distributed node lies inside an interval or the matching is complete.

The k-d tree uses breadth-wise segmentation: it segments along each dimension in turn until no further segmentation is possible. nf-hipac uses depth-wise segmentation: it segments one dimension directly and only then segments the next dimension. Although the two methods differ, the k-d tree's optimization of segmentation selection is worth borrowing: by fully considering the characteristics of the data, counting the data distribution in each dimension, and choosing an optimal segmentation order, search can be optimized.
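The depth-wise structure can be pictured with the following Python sketch of an M-ary lookup, in which each layer handles one dimension and a binary search selects the interval. It is an illustration only and does not reproduce nf-hipac's actual data layout. The names `Level` and `lookup` are hypothetical, and every layer is assumed to start at 0 so that any non-negative value falls into some interval.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class Level:
    starts: List[int]                    # sorted lower bounds of the M intervals
    children: List[Union["Level", str]]  # next layer, or a final action, per interval


def lookup(root: Level, packet: Sequence[int]) -> str:
    """Walk one layer per dimension, using binary search inside each layer."""
    node: Union[Level, str] = root
    for value in packet:
        i = bisect_right(node.starts, value) - 1  # interval containing value
        node = node.children[i]
        if isinstance(node, str):
            return node  # reached an action; remaining dimensions are not needed
    raise ValueError("packet has fewer dimensions than the tree has layers")


# Two dimensions: only values 100..199 in the first dimension need a second layer.
inner = Level([0, 80, 81], ["drop", "accept", "drop"])
root = Level([0, 100, 200], ["drop", inner, "drop"])
action = lookup(root, [150, 80])
```

Each layer costs one binary search, which corresponds to the O(m log n) bound given in the background section.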

The way the k-d tree deletes nodes is also worth borrowing: a deletion does not actually remove the node but only marks its current state, and the real removal is performed once the number of deleted nodes reaches a threshold. Because the current firewall system updates a node by first deleting it and then adding it again, this optimization greatly reduces the performance loss caused by frequently re-segmenting intervals during node updates.

According to the invention, through the optimization of the segmentation dimension selection, the node deletion and the fixed length rule optimization exceeding 32 bits, the time consumption caused by frequently updating the nodes is reduced, the searching efficiency is improved, the high-performance matching of the nf-hipac method is fully utilized, and the running smoothness of the current firewall system can be ensured.

The distributed node optimization system provided by the invention is described below, and the distributed node optimization system described below and the distributed node optimization method described above can be referred to correspondingly.

Fig. 2 is a schematic diagram of a distributed node optimization system provided in the present invention, as shown in fig. 2, the system includes:

the distribution situation analysis module 10 is configured to obtain a distribution situation of a data point of a distributed node in multiple dimensions based on variances of the data point in the multiple dimensions;

a sorting module 20, configured to sort the multiple dimensions according to the variance based on the distribution of the data points in the multiple dimensions;

and the segmentation module 30 is used for sequentially segmenting the data points in the corresponding dimension according to the order of the variance.

The distributed node optimization system further includes:

the marking module is used for marking the data points to be deleted;

and the updating module is used for returning to the distribution condition analysis module 10 when the number of the data points to be deleted reaches a preset threshold value.

Fig. 3 illustrates a physical schematic diagram of an electronic device, which may include: processor 310, communication interface (Communications Interface) 320, memory 330 and communication bus 340, wherein processor 310, communication interface 320, memory 330 accomplish communication with each other through communication bus 340. Processor 310 may invoke logic instructions in memory 330 to perform a distributed node optimization method comprising:

Further, the logic instructions in the memory 330 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, are capable of performing a distributed node optimization method comprising:

In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which when executed by a processor is implemented to perform a distributed node optimization method comprising:

The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for optimizing a distributed node, comprising:

S3, sequentially segmenting the data points in corresponding dimensions according to the order of the variances;

2. The distributed node optimization method according to claim 1, wherein the step S1 further comprises:

marking the data points to be deleted;

3. The distributed node optimization method of claim 1, further comprising:

when the distributed node exceeds a fixed length rule of 32 bits, splitting the distributed node, and converting the distributed node into a high dimension and a low dimension under the current dimension;

the method is characterized in that the nf-hipac method is used to implement an M-ary tree, in which each layer represents one dimension, the current dimension is divided into M intervals, and each interval represents a node.

4. A distributed node optimization method according to claim 3, comprising: when the distributed node is in the interval, for matching in the interval, low-dimensional direct matching is skipped.

5. A distributed node optimization method as claimed in claim 3 wherein when the distributed node is at an edge, then processing low-dimensional matches continues until the distributed node is inside an interval or matches are complete.

6. A distributed node optimization system, comprising:

the segmentation module is used for sequentially segmenting the data points in the corresponding dimension according to the order of the variance;

7. The distributed node optimization system of claim 6 further comprising:

the marking module is used for marking the data points to be deleted;

8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the distributed node optimization method according to any of claims 1-5 when executing the program.

9. A non-transitory computer readable storage medium, having stored thereon a computer program, which when executed by a processor, implements the steps of the distributed node optimization method according to any of claims 1-5.