CN112769896A - Distributed node optimization method and system, electronic device and storage medium - Google Patents

Info

Publication number
CN112769896A
CN112769896A CN202011518575.4A CN202011518575A CN112769896A CN 112769896 A CN112769896 A CN 112769896A CN 202011518575 A CN202011518575 A CN 202011518575A CN 112769896 A CN112769896 A CN 112769896A
Authority
CN
China
Prior art keywords
data points
distributed node
distributed
optimization method
variance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011518575.4A
Other languages
Chinese (zh)
Other versions
CN112769896B (en)
Inventor
陈桂亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING ZHONGKE WANGWEI INFORMATION TECHNOLOGY CO LTD
Original Assignee
BEIJING ZHONGKE WANGWEI INFORMATION TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING ZHONGKE WANGWEI INFORMATION TECHNOLOGY CO LTD
Priority to CN202011518575.4A
Publication of CN112769896A
Application granted
Publication of CN112769896B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a distributed node optimization method and system, an electronic device and a storage medium. The distributed node optimization method comprises the following steps: S1, obtaining the distribution of the data points of the distributed nodes across multiple dimensions, based on the variance of the data points in each dimension; S2, sorting the multiple dimensions by variance, based on the distribution of the data points across the dimensions; and S3, segmenting the data points in each corresponding dimension in turn, following the variance order. By optimizing the selection of the segmentation dimension, the invention reduces the time consumed by frequent node updates, improves search efficiency, makes full use of the high-performance matching of the nf-hipac method, and helps keep the current firewall system running smoothly.

Description

Distributed node optimization method and system, electronic device and storage medium
Technical Field
The present invention relates to the field of information optimization technologies, and in particular, to a distributed node optimization method and system, an electronic device, and a storage medium.
Background
A k-d tree is a binary tree whose nodes are k-dimensional numerical points. Each node of the binary tree represents a hyperplane that is perpendicular to the coordinate axis of the current division dimension and divides the space into two parts along that dimension, one part in the left subtree and the other in the right subtree. That is, if the division dimension of the current node is d, the d-th coordinates of all points in the left subtree are smaller than the value at the current node, and the d-th coordinates of all points in the right subtree are greater than or equal to it. This definition also holds for every child node. How the tree is constructed is the most critical point: whether the structure is a partition tree, a segment tree, a dictionary tree (trie), or another data structure or method (such as KMP), problems can be handled efficiently mainly because the preprocessing is done well. The k-d tree is efficient because of the way it is built, that is, the rule by which the points of a point set E are organized into a binary tree. The construction process largely determines the search efficiency of the tree, so search can be optimized if the characteristics of the data are fully considered during construction.
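The left/right rule stated above can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `Node`, `build` and `contains` are hypothetical, and the sketch assumes the conventional construction that cycles through the dimensions by depth and splits at the median.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    point: Point                      # the k-dimensional point stored at this node
    dim: int                          # division dimension d of this node
    left: Optional["Node"] = None     # points with coordinate d <  point[d]
    right: Optional["Node"] = None    # points with coordinate d >= point[d]

def build(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Build a k-d tree, cycling through the dimensions by depth (an assumption)."""
    if not points:
        return None
    dim = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2
    # Shift the split left so that equal coordinates all fall in the right subtree.
    while mid > 0 and ordered[mid - 1][dim] == ordered[mid][dim]:
        mid -= 1
    return Node(ordered[mid], dim,
                build(ordered[:mid], depth + 1),
                build(ordered[mid + 1:], depth + 1))

def contains(node: Optional[Node], target: Point) -> bool:
    """Exact-match search: go left when strictly smaller in d, otherwise right."""
    while node is not None:
        if node.point == target:
            return True
        node = node.left if target[node.dim] < node.point[node.dim] else node.right
    return False

root = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(contains(root, (8, 1)), contains(root, (1, 1)))
```

Because each comparison discards one subtree, a balanced tree built this way is searched in a number of steps proportional to its depth.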
The k-d tree is suitable for searching multidimensional discrete data. The current firewall system uses the nf-hipac method, which is suitable for matching data against multidimensional intervals. Whereas the k-d tree is a binary tree, nf-hipac uses an M-ary tree in which each layer represents one dimension: the current dimension is divided into M intervals, each interval corresponds to one node, each node is then divided along the second dimension, and so on. The k-d tree method achieves a time complexity of O(log n) (O(n) in the worst case). The nf-hipac method uses binary search in each dimension, so its time complexity is O(m log n), where m is the number of dimensions and n is the number of rules. Drawing on the construction and deletion processes of the k-d tree, three points can be optimized:
1. Optimizing the selection of the segmentation dimension.
2. Optimizing the deletion operation.
3. Applying the nf-hipac method to fixed-length rules exceeding 32 bits.
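The layer-per-dimension lookup attributed to nf-hipac in the background above can be sketched as follows. This is a simplified model for illustration only, not the actual nf-hipac code: the class `Layer`, the function `match` and the rule names are hypothetical, and each layer is assumed to store the sorted inclusive upper bounds of its intervals.

```python
from bisect import bisect_left

class Layer:
    """One dimension of the M-ary tree (hypothetical model).

    upper_bounds: sorted inclusive upper ends of the M intervals; the last
                  bound must cover the largest possible field value.
    children:     one entry per interval, either a Layer for the next
                  dimension or a rule name (str) when matching ends here.
    """
    def __init__(self, upper_bounds, children):
        self.upper_bounds = upper_bounds
        self.children = children

def match(layer, packet):
    """Walk one layer per field; each step is one binary search.

    With m fields and at most n interval bounds per layer this costs
    O(m log n), in line with the complexity stated in the background.
    """
    for value in packet:
        idx = bisect_left(layer.upper_bounds, value)   # interval containing value
        child = layer.children[idx]
        if isinstance(child, str):
            return child
        layer = child
    raise ValueError("packet has fewer fields than the tree has layers")

# Dimension 0: source address as an integer; dimension 1: destination port.
ports = Layer([79, 80, 65535], ["DROP", "ACCEPT", "DROP"])
root = Layer([999, 1999, 2**32 - 1], ["DEFAULT", ports, "DEFAULT"])
print(match(root, (1500, 80)))
```

In this model a leaf reached in an early layer ends the match without visiting the remaining dimensions, which is why the order of the dimensions matters.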
Disclosure of Invention
The invention provides a distributed node optimization method and system, an electronic device and a storage medium, which can reduce the time consumed by frequent node updates and improve search efficiency.
The invention provides a distributed node optimization method, which comprises the following steps:
S1, obtaining the distribution of the data points of the distributed nodes across multiple dimensions, based on the variance of the data points in each dimension;
S2, sorting the multiple dimensions by variance, based on the distribution of the data points across the dimensions;
S3, segmenting the data points in each corresponding dimension in turn, following the variance order.
Preferably, the distributed node optimization method further includes, before step S1:
marking data points to be deleted;
when the number of data points marked for deletion reaches a preset threshold, returning to S1.
Preferably, the distributed node optimization method further includes:
maintaining a weight field at each distributed node to record the number of NULL pointers of the child nodes of that distributed node, and, when the numbers of NULL pointers of the child nodes differ by 2, rebuilding the distributed node.
Preferably, the distributed node optimization method further includes:
when a distributed node exceeds the 32-bit fixed-length rule, splitting the distributed node into a high dimension and a low dimension under the current dimension.
Preferably, in the distributed node optimization method, when the distributed node is located inside an interval, the match is hit directly and low-dimensional matching is skipped.
Preferably, in the distributed node optimization method, when the distributed node is at an edge of an interval, low-dimensional matching continues to be processed until the distributed node is located inside an interval or the matching is completed.
The invention provides a distributed node optimization system, comprising:
a distribution analysis module, configured to obtain the distribution of the data points of the distributed nodes across multiple dimensions, based on the variance of the data points in each dimension;
a sorting module, configured to sort the multiple dimensions by variance, based on the distribution of the data points across the dimensions;
a segmentation module, configured to segment the data points in each corresponding dimension in turn, following the variance order.
Preferably, the distributed node optimization system further includes:
a marking module, configured to mark the data points to be deleted;
an updating module, configured to return to the distribution analysis module when the number of data points marked for deletion reaches a preset threshold.
The invention provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of the distributed node optimization method when executing the program.
The present invention provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the distributed node optimization method described.
In the distributed node optimization method and system, electronic device and storage medium provided by the invention, the distribution of the data points in each dimension is compared before construction: the larger the variance of the data points' coordinate values in a dimension, the more dispersed the data points are in that dimension, and the smaller the variance, the more concentrated they are. Starting the segmentation from the dimension with the largest variance yields a good segmentation result and good balance. By optimizing the selection of the segmentation dimension, the invention reduces the time consumed by frequent node updates, improves search efficiency, makes full use of the high-performance matching of the nf-hipac method, and helps keep the current firewall system running smoothly.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of a distributed node optimization method provided by the present invention;
FIG. 2 is a schematic diagram of a distributed node optimization system provided by the present invention;
FIG. 3 is a schematic diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a schematic diagram of the distributed node optimization method provided by the present invention. As shown in FIG. 1, the method includes:
S1, obtaining the distribution of the data points of the distributed nodes across multiple dimensions, based on the variance of the data points in each dimension;
S2, sorting the multiple dimensions by variance, based on the distribution of the data points across the dimensions;
S3, segmenting the data points in each corresponding dimension in turn, following the variance order.
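Steps S1 to S3 can be sketched in Python as follows. The sketch is illustrative only: the function names, the use of the population variance, and the choice of splitting each dimension into equally sized groups are assumptions that the patent does not specify.

```python
from statistics import pvariance

def order_dimensions_by_variance(points):
    """S1 + S2: compute the variance per dimension, then sort the dimensions
    from largest to smallest variance."""
    k = len(points[0])
    variances = [pvariance([p[d] for p in points]) for d in range(k)]
    order = sorted(range(k), key=lambda d: variances[d], reverse=True)
    return order, variances

def segment(points, order, parts=2):
    """S3: finish the segmentation of one dimension before moving to the next.

    Returns a nested dict {"dim": d, "groups": [...]}; a leaf is a plain list
    of points. Equal-size groups are an assumption made for illustration.
    """
    if not order or len(points) <= 1:
        return list(points)
    dim, rest = order[0], order[1:]
    ordered = sorted(points, key=lambda p: p[dim])
    size = -(-len(ordered) // parts)          # ceiling division
    groups = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    return {"dim": dim, "groups": [segment(g, rest, parts) for g in groups]}

points = [(1, 10), (2, 50), (3, 90), (4, 20)]
order, variances = order_dimensions_by_variance(points)
tree = segment(points, order)
print(order, variances)
```

In this example the second coordinate is far more spread out than the first, so it is segmented first, and the dimension order is computed once rather than at every level of the tree.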
The distributed node optimization method includes optimizing the selection of the segmentation dimension. Before construction, the distribution of the data points in each dimension is compared: the larger the variance of the coordinate values in a dimension, the more dispersed the data points are in that dimension, and the smaller the variance, the more concentrated they are. Starting the segmentation from the dimension with the largest variance yields a good segmentation result and good balance. Because depth-first matching is not constructed recursively, the variance does not need to be recalculated at every step to determine the segmentation dimension. For depth-first matching, once a dimension is chosen, segmentation is completed directly in that dimension: in the three-dimensional case it is completed within one slice, and in the two-dimensional case within one line. Later segmentations do not need to consider that dimension again, because the exact interval in that dimension has already been reached. Therefore only a single ordering of the dimensions needs to be found, and a per-step variance computation as in the k-d tree is not needed. The dimension in which the rules (Rule) are distributed most unevenly across the intervals can be taken as the first dimension, and so on. The reason is that when an unevenly distributed rule set is intersected with the rule set of the next dimension, the resulting interval is relatively likely to contain only the Default Rule, which leads to a fast failure and reduces the number of child nodes.
Preferably, before S1, the method further includes:
marking data points to be deleted;
when the number of data points marked for deletion reaches a preset threshold, returning to S1.
Deleting nf-hipac tree nodes strongly affects the segmentation of the related dimensions, so an optimization similar to that of the k-d tree is adopted: a node is only marked when it is deleted, and the related dimensions are re-segmented once the number of deletions reaches a certain threshold.
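The mark-then-rebuild behaviour described above can be sketched as follows. The class `LazyDeleteIndex` and its `build` callback are hypothetical; `build` stands in for whatever routine performs steps S1 to S3, and the threshold value is left to the caller because the patent does not give one.

```python
class LazyDeleteIndex:
    """Marks deletions and rebuilds only when enough of them have accumulated."""

    def __init__(self, points, threshold, build):
        self.live = set(points)        # points currently held by the structure
        self.marked = set()            # points marked as deleted, not yet removed
        self.threshold = threshold     # preset number of marks that triggers a rebuild
        self.build = build             # hypothetical callback: list of points -> structure
        self.structure = build(sorted(self.live))
        self.rebuilds = 0

    def delete(self, point):
        """Only mark the point; the structure itself is left untouched."""
        if point in self.live and point not in self.marked:
            self.marked.add(point)
            if len(self.marked) >= self.threshold:
                self._rebuild()

    def contains(self, point):
        """Marked points are treated as absent even before the rebuild."""
        return point in self.live and point not in self.marked

    def _rebuild(self):
        """Physically drop the marked points and re-run the construction."""
        self.live -= self.marked
        self.marked.clear()
        self.structure = self.build(sorted(self.live))
        self.rebuilds += 1

index = LazyDeleteIndex([(1, 2), (3, 4), (5, 6)], threshold=2, build=list)
index.delete((1, 2))      # marked only, no rebuild yet
index.delete((3, 4))      # threshold reached, structure rebuilt once
print(index.rebuilds, index.structure)
```

The cost of re-segmentation is thus paid once per batch of deletions instead of once per deletion.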
The method further includes:
maintaining a weight field at each distributed node to record the number of NULL pointers of the child nodes of that distributed node, and, when the numbers of NULL pointers of the child nodes differ by 2, rebuilding the distributed node.
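The patent does not define precisely how the NULL pointers are counted, so the following Python sketch rests on one possible reading, stated here as an assumption: the weight of a node is the number of NULL pointers in its subtree, an absent child counts as one NULL pointer, and a node is rebuilt as soon as the weights of its two children differ by 2 or more. A plain binary search tree over distinct scalar keys is used for brevity, and all names are hypothetical.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.weight = 2          # a fresh leaf has two NULL child pointers

def nulls(node):
    """NULL pointers in a subtree; an absent child is itself one NULL pointer."""
    return 1 if node is None else node.weight

def needs_rebuild(node):
    """Assumed trigger: the children's NULL-pointer counts differ by 2 or more."""
    return abs(nulls(node.left) - nulls(node.right)) >= 2

def in_order(node):
    return [] if node is None else in_order(node.left) + [node.key] + in_order(node.right)

def build_balanced(keys):
    """Rebuild a subtree from sorted keys by always picking the median."""
    if not keys:
        return None
    mid = len(keys) // 2
    root = Node(keys[mid])
    root.left = build_balanced(keys[:mid])
    root.right = build_balanced(keys[mid + 1:])
    root.weight = nulls(root.left) + nulls(root.right)
    return root

def insert(node, key):
    """Insert a key, refresh the weight field, and rebuild when unbalanced."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    node.weight = nulls(node.left) + nulls(node.right)
    return build_balanced(in_order(node)) if needs_rebuild(node) else node

root = None
for key in (1, 2, 3):
    root = insert(root, key)
print(root.key, root.weight)
```

Under this reading a subtree with n nodes always has n + 1 NULL pointers, so comparing the two weights amounts to comparing the sizes of the two subtrees.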
The method further includes:
when a distributed node exceeds the 32-bit fixed-length rule, splitting the distributed node into a high dimension and a low dimension under the current dimension.
When the distributed node is located inside an interval, the match is hit directly and low-dimensional matching is skipped.
When the distributed node is at an edge of an interval, low-dimensional matching continues to be processed until the distributed node is located inside an interval or the matching is completed.
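The high/low split can be illustrated with a range match on a field wider than 32 bits. The sketch below is an interpretation, not the patented implementation: the names `chunks` and `match_range` are hypothetical, a rule is assumed to be an inclusive range with `lo <= hi`, and the field is cut into 32-bit pieces ordered from the highest dimension to the lowest.

```python
MASK32 = 0xFFFFFFFF

def chunks(value, n):
    """Split a value of at most 32*n bits into n 32-bit pieces, highest first."""
    return [(value >> (32 * i)) & MASK32 for i in reversed(range(n))]

def match_range(value, lo, hi, n=2):
    """Return True if lo <= value <= hi, comparing one 32-bit dimension at a time."""
    on_lo_edge, on_hi_edge = True, True
    for v, l, h in zip(chunks(value, n), chunks(lo, n), chunks(hi, n)):
        if on_lo_edge and v < l:
            return False                     # below the lower bound
        if on_hi_edge and v > h:
            return False                     # above the upper bound
        on_lo_edge = on_lo_edge and v == l   # still exactly on the lower edge?
        on_hi_edge = on_hi_edge and v == h   # still exactly on the upper edge?
        if not on_lo_edge and not on_hi_edge:
            return True                      # strictly inside: skip the lower dimensions
    return True                              # matching completed while on an edge

lo = (5 << 32) | 100
hi = (9 << 32) | 50
print(match_range((7 << 32) | 0, lo, hi))    # decided by the high dimension alone
print(match_range((5 << 32) | 99, lo, hi))   # on the lower edge, low dimension decides
```

With `n=4` the same loop covers 128-bit fields; each further dimension is examined only while the value is still sitting on an edge of the range.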
The k-d tree uses breadth-wise segmentation: it segments along each dimension in turn until no further segmentation is possible. nf-hipac uses depth-wise segmentation: it completes the segmentation of one dimension before moving on to the next. Although the two approaches differ, the way the k-d tree optimizes its choice of segmentation dimension is worth borrowing: by fully considering the characteristics of the data, collecting statistics on the data distribution in each dimension, and selecting the best segmentation order, search can be optimized.
The way the k-d tree deletes nodes is also worth borrowing: a deletion is not actually executed when it is requested; instead, the current node is marked as deleted, and the deletions are actually carried out once the number of deleted nodes reaches a threshold.
Through optimized selection of the segmentation dimension, optimized node deletion, and the handling of fixed-length rules exceeding 32 bits, the method reduces the time consumed by frequent node updates, improves search efficiency, makes full use of the high-performance matching of the nf-hipac method, and helps the current firewall system run more smoothly.
The distributed node optimization system provided by the present invention is described below, and the distributed node optimization system described below and the distributed node optimization method described above may be referred to correspondingly.
FIG. 2 is a schematic diagram of the distributed node optimization system provided by the present invention. As shown in FIG. 2, the system includes:
a distribution analysis module 10, configured to obtain the distribution of the data points of the distributed nodes across multiple dimensions, based on the variance of the data points in each dimension;
a sorting module 20, configured to sort the multiple dimensions by variance, based on the distribution of the data points across the dimensions;
a segmentation module 30, configured to segment the data points in each corresponding dimension in turn, following the variance order.
The distributed node optimization system further includes:
a marking module, configured to mark the data points to be deleted;
an updating module, configured to return to the distribution analysis module 10 when the number of data points marked for deletion reaches a preset threshold.
FIG. 3 illustrates the physical structure of an electronic device, which may include a processor 310, a communication interface 320, a memory 330 and a communication bus 340, wherein the processor 310, the communication interface 320 and the memory 330 communicate with one another via the communication bus 340. The processor 310 may invoke logic instructions in the memory 330 to perform a distributed node optimization method comprising:
S1, obtaining the distribution of the data points of the distributed nodes across multiple dimensions, based on the variance of the data points in each dimension;
S2, sorting the multiple dimensions by variance, based on the distribution of the data points across the dimensions;
S3, segmenting the data points in each corresponding dimension in turn, following the variance order.
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and, when sold or used as independent products, stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform a distributed node optimization method, the method comprising:
S1, obtaining the distribution of the data points of the distributed nodes across multiple dimensions, based on the variance of the data points in each dimension;
S2, sorting the multiple dimensions by variance, based on the distribution of the data points across the dimensions;
S3, segmenting the data points in each corresponding dimension in turn, following the variance order.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program that, when executed by a processor, performs a distributed node optimization method, the method comprising:
S1, obtaining the distribution of the data points of the distributed nodes across multiple dimensions, based on the variance of the data points in each dimension;
S2, sorting the multiple dimensions by variance, based on the distribution of the data points across the dimensions;
S3, segmenting the data points in each corresponding dimension in turn, following the variance order.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A distributed node optimization method, characterized by comprising the following steps:
S1, obtaining the distribution of the data points of the distributed nodes across multiple dimensions, based on the variance of the data points in each dimension;
S2, sorting the multiple dimensions by variance, based on the distribution of the data points across the dimensions;
S3, segmenting the data points in each corresponding dimension in turn, following the variance order.
2. The distributed node optimization method of claim 1, wherein S1 is preceded by:
marking data points to be deleted;
when the number of data points marked for deletion reaches a preset threshold, returning to S1.
3. The distributed node optimization method of claim 2, further comprising:
maintaining a weight field at each distributed node to record the number of NULL pointers of the child nodes of that distributed node, and, when the numbers of NULL pointers of the child nodes differ by 2, rebuilding the distributed node.
4. The distributed node optimization method of claim 3, further comprising:
when a distributed node exceeds the 32-bit fixed-length rule, splitting the distributed node into a high dimension and a low dimension under the current dimension.
5. The distributed node optimization method of claim 4, wherein, when the distributed node is located inside an interval, the match is hit directly and low-dimensional matching is skipped.
6. The distributed node optimization method of claim 4, wherein, when the distributed node is at an edge of an interval, low-dimensional matching continues to be processed until the distributed node is located inside an interval or the matching is completed.
7. A distributed node optimization system, comprising:
a distribution analysis module, configured to obtain the distribution of the data points of the distributed nodes across multiple dimensions, based on the variance of the data points in each dimension;
a sorting module, configured to sort the multiple dimensions by variance, based on the distribution of the data points across the dimensions;
a segmentation module, configured to segment the data points in each corresponding dimension in turn, following the variance order.
8. The distributed node optimization system of claim 7, further comprising:
a marking module, configured to mark the data points to be deleted;
an updating module, configured to return to the distribution analysis module when the number of data points marked for deletion reaches a preset threshold.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the distributed node optimization method according to any one of claims 1 to 6 when executing the program.
10. A non-transitory computer readable storage medium, having stored thereon a computer program, characterized in that the computer program, when being executed by a processor, is adapted to carry out the steps of the distributed node optimization method according to any one of claims 1 to 6.
CN202011518575.4A 2020-12-21 2020-12-21 Distributed node optimization method and system, electronic equipment and storage medium Active CN112769896B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011518575.4A CN112769896B (en) 2020-12-21 2020-12-21 Distributed node optimization method and system, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011518575.4A CN112769896B (en) 2020-12-21 2020-12-21 Distributed node optimization method and system, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112769896A (en) 2021-05-07
CN112769896B (en) 2023-07-18

Family

ID=75695097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011518575.4A Active CN112769896B (en) 2020-12-21 2020-12-21 Distributed node optimization method and system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112769896B (en)

Also Published As

Publication number Publication date
CN112769896B (en) 2023-07-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant