CN117609894B

CN117609894B - Partition strategy-based high-performance message classification method, equipment and medium

Info

Publication number: CN117609894B
Application number: CN202410094291.9A
Authority: CN
Inventors: 钟金诚; 陈曙晖; 虞万荣; 王飞; 魏子令
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2024-01-23
Filing date: 2024-01-23
Publication date: 2024-04-09
Anticipated expiration: 2044-01-23
Also published as: CN117609894A

Abstract

The application relates to a high-performance message classification method, device, and medium based on a partition strategy. The method comprises: defining a rule set according to a plurality of metadata fields and constructing a decision tree that comprises partition nodes and dividing nodes, which are distributed in alternating layers, with partition nodes on odd layers and dividing nodes on even layers; and classifying messages according to the partition nodes, dividing nodes, and leaf nodes. At a partition node, the network message searches the child nodes of the partition node in sequence. At a dividing node, the network message generates a positioning value into the child node list according to the mask of the dividing node, and only the one child node selected by the positioning value is searched for rule matching. At a leaf node, the contained rules are organized in a linked list in priority order, and the linked list is searched in order until a rule is matched. The method can improve message classification performance.

Description

Partition strategy-based high-performance message classification method, equipment and medium

Technical Field

The present disclosure relates to the field of message (packet) classification technologies, and in particular to a high-performance message classification method, device, and medium based on a partition strategy.

Background

Message classification is a fundamental problem of computer networks, and algorithms for solving the message classification problem are widely used in various network devices and functions such as routers, switches, firewalls, and network intrusion detection systems.

The message classification problem involves a rule set in which each rule consists of three parts: a priority, a matching field, and the action taken after a successful match. The matching field is defined over message header metadata (e.g., IP address, port number) and determines which messages the rule matches. Solving the message classification problem means searching the rule set for the rules that match a network message and returning the matching rule with the highest priority.
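The problem statement above can be sketched as a linear scan, which is the baseline that tree-based methods try to beat. This is only an illustration: the value/mask encoding of the matching field, the 4-bit header, the rule contents, and the convention that a smaller number means a higher priority are all assumptions, not details given in the source.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int  # assumption: smaller number = higher priority
    value: int     # bits the header must carry wherever mask is 1
    mask: int      # 1 = bit must match, 0 = wildcard
    action: str


def matches(rule: Rule, header: int) -> bool:
    """A rule matches when the header agrees with it on every masked bit."""
    return (header & rule.mask) == (rule.value & rule.mask)


def classify_linear(rules: Sequence[Rule], header: int) -> Optional[Rule]:
    """Return the highest-priority matching rule, or None if nothing matches."""
    best = None
    for rule in rules:
        if matches(rule, header) and (best is None or rule.priority < best.priority):
            best = rule
    return best


# Hypothetical 4-bit header: two X bits followed by two Y bits.
RULES = [
    Rule("R1", 1, 0b0100, 0b1100, "drop"),
    Rule("R3", 3, 0b0010, 0b0011, "forward"),
    Rule("R5", 5, 0b0000, 0b0000, "default"),
]
```

Every lookup here touches every rule, so the cost grows linearly with the rule set; the decision tree described later aims to visit only a small part of it.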

Current approaches to the message classification problem include decision tree methods, tuple space methods, and hybrid methods that combine decision trees with tuple spaces. Decision tree methods have difficulty avoiding rule replication (one rule being distributed to multiple child nodes) when dividing child nodes, so they struggle to support dynamic rule set updates and can suffer memory explosion on large rule sets. Tuple space methods produce many tuples from multi-field combinations and have low classification performance. Hybrid methods trade off between decision trees and tuple spaces and achieve a better overall balance, but their classification performance is still low.

Disclosure of Invention

In view of the above technical problems, it is necessary to provide a high-performance message classification method, device, and medium based on a partition strategy that can improve message classification performance.

A method for classifying high-performance messages based on partition strategy, the method comprising:

acquiring a network message header; the network message header comprises a plurality of metadata fields;

defining a rule set according to the plurality of metadata fields and constructing a decision tree; the decision tree comprises partition nodes and dividing nodes;

the partition nodes and dividing nodes are distributed in alternating layers in the decision tree, wherein odd layers hold partition nodes and even layers hold dividing nodes, and any partition node or dividing node that contains fewer rules than a preset threshold is set as a leaf node;

classifying messages according to the partition nodes, dividing nodes, and leaf nodes: at a partition node, the network message searches the child nodes of the partition node in sequence, and during the search subsequent child nodes are pruned according to the highest priority of the rules already matched in preceding leaf nodes; at a dividing node, the network message generates a positioning value into the child node list according to the mask of the dividing node, and only the one child node selected by the positioning value is searched for rule matching; at a leaf node, the contained rules are organized in a linked list in priority order, and the linked list is searched in order until a rule is matched, at which point matching ends and the message classification is complete;

when the rule set is updated dynamically, the rule to be updated enters the decision tree from the root node to reconstruct the decision tree.

In one embodiment, defining a rule set and constructing a decision tree from a plurality of metadata fields includes:

the root node of the decision tree is initially defined as a partition node; within a partition node, child nodes are constructed from the rule set using a heuristic method; each traversal of the rule set under the heuristic generates one new partition, and each new partition is used to construct a child dividing node, until the number of rules remaining in the rule set is less than or equal to the leaf node threshold, at which point the remaining rules are used to construct a child leaf node.

In one embodiment, constructing child nodes from rule sets using heuristics in partition nodes includes:

first, a minimum threshold B on the number of valid mask bits is set; a mask M is initialized to all 1s, the rule set is traversed in priority order, and the mask of each rule R in the rule set is ANDed with M in turn; when the number of valid bits in the resulting mask is greater than or equal to B, rule R is removed from the rule set and placed in the new partition, and M is updated to the result; otherwise, rule R is kept in the rule set and assigned to another partition in a later traversal; each new partition is used to construct a child dividing node, until the number of rules remaining in the rule set is less than or equal to the leaf node threshold, at which point the remaining rules are used to construct a child leaf node.

In one embodiment, when dynamic rule updating is performed, a rule to be updated enters a decision tree from a root node to perform decision tree reconstruction, including:

when the rule is in the partition node of the decision tree, traversing all child nodes of the partition node in sequence until the update is completed;

when the rule is in the dividing node of the decision tree, generating a child node list positioning value according to the mask of the dividing node, positioning one child node according to the positioning value and attempting to finish updating in the child node;

when the rule reaches a leaf node of the decision tree: for insertion, the rule is inserted at the proper position of the rule linked list; for deletion, each rule in the list is compared in turn and the rule that equals the rule to be deleted is removed; when the number of rules in the leaf node becomes greater than a predefined threshold or equal to 0, the affected part of the decision tree is reconstructed.
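The leaf-node update described above can be sketched as an ordered singly linked list. This is a minimal illustration rather than the patented implementation: the class and field names are hypothetical, a smaller number is assumed to mean a higher priority, and two rules are treated as equal when their name and priority agree.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RuleNode:
    name: str
    priority: int  # assumption: smaller number = higher priority
    next: Optional["RuleNode"] = None


class LeafNode:
    def __init__(self, threshold: int) -> None:
        self.head: Optional[RuleNode] = None
        self.count = 0
        self.threshold = threshold  # predefined maximum number of rules

    def insert(self, node: RuleNode) -> None:
        """Insert the rule at the position that keeps the list in priority order."""
        if self.head is None or node.priority < self.head.priority:
            node.next, self.head = self.head, node
        else:
            cur = self.head
            while cur.next is not None and cur.next.priority <= node.priority:
                cur = cur.next
            node.next, cur.next = cur.next, node
        self.count += 1

    def delete(self, name: str, priority: int) -> bool:
        """Compare each rule in turn and unlink the first one that is equal."""
        prev, cur = None, self.head
        while cur is not None:
            if cur.name == name and cur.priority == priority:
                if prev is None:
                    self.head = cur.next
                else:
                    prev.next = cur.next
                self.count -= 1
                return True
            prev, cur = cur, cur.next
        return False

    def needs_rebuild(self) -> bool:
        """True when the leaf is over its threshold or has become empty."""
        return self.count > self.threshold or self.count == 0

    def names(self) -> List[str]:
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.name)
            cur = cur.next
        return out
```

A caller would check `needs_rebuild()` after each update and, when it returns true, rebuild only the subtree around that leaf.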

In one embodiment, after a rule deletion, when the number of rules contained in a node equals 0, the node is marked using a pointer low-order-bit marking strategy; this strategy uses the idle low-order bits of the pointers in the parent node's child pointer list to mark whether each node contains rules, and only when the occupied memory exceeds a certain threshold is the decision tree traversed to reclaim the nodes marked as containing no rules.
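The marking idea can be illustrated as follows. In a language such as C the flag would live in the low bit of a real, aligned pointer; here plain integers stand in for addresses, and the 8-byte alignment, the function names, and the memory figures are all assumptions made for the sketch.

```python
from typing import List

EMPTY_FLAG = 0x1  # bit 0 of an 8-byte-aligned address is always 0, so it is free


def mark_empty(ptr: int) -> int:
    """Flag the child as containing no rules without touching the child itself."""
    return ptr | EMPTY_FLAG


def is_empty(ptr: int) -> bool:
    return bool(ptr & EMPTY_FLAG)


def address(ptr: int) -> int:
    """Strip the flag to recover the real address before dereferencing."""
    return ptr & ~EMPTY_FLAG


def reclaim(children: List[int], memory_used: int, limit: int) -> List[int]:
    """Drop flagged children only once memory use exceeds the limit."""
    if memory_used <= limit:
        return children
    return [p for p in children if not is_empty(p)]


# Simulated child pointer list of one parent node.
children = [0x1000, 0x1008, 0x1010]
children[1] = mark_empty(children[1])  # the second child lost its last rule
```

Deferring reclamation this way means a delete followed shortly by an insert into the same node needs no free/allocate pair.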

A computer device comprising a memory storing a computer program and a processor which, when executing the computer program, performs the steps of the above method.

A computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above method.

In the above high-performance message classification method, device, and medium based on a partition strategy, a rule set is first defined according to a plurality of metadata fields and a decision tree is constructed. The decision tree comprises partition nodes and dividing nodes distributed in alternating layers, with partition nodes on odd layers and dividing nodes on even layers, and any partition node or dividing node that contains fewer rules than a preset threshold is set as a leaf node. Messages are classified according to the partition nodes, dividing nodes, and leaf nodes. At a partition node, the network message searches the child nodes in sequence, and subsequent child nodes are pruned according to the highest priority of the rules already matched in preceding leaf nodes. At a dividing node, the network message generates a positioning value into the child node list according to the mask of the dividing node, and only the one child node selected by the positioning value is searched for rule matching. At a leaf node, the contained rules are organized in a linked list in priority order, and the linked list is searched in order until a rule is matched, which completes the classification. When the rule set is updated dynamically, the rule to be updated enters the decision tree from the root node to reconstruct the decision tree. The partition strategy avoids the rule replication problem of existing decision tree algorithms, so that fast message classification and dynamic rule updates are both possible; the method has a small memory footprint, high classification performance, and support for dynamic rule updates.

Drawings

FIG. 1 is a flow chart of a method for classifying high-performance messages based on partition strategy in one embodiment;

FIG. 2 is a diagram of an example message classification rule set in one embodiment;

FIG. 3 is a schematic diagram of a decision tree structure constructed on an example rule set in one embodiment;

FIG. 4 is a diagram illustrating a search process for message classification in another embodiment;

FIG. 5 is a flow diagram of dynamic rule insertion in one embodiment;

FIG. 6 is a flow logic diagram of rule partitioning in one embodiment;

FIG. 7 is an internal structural diagram of a computer device in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

In one embodiment, as shown in fig. 1, a method for classifying high-performance messages based on partition strategies is provided, which includes the following steps:

Step 102: obtaining a network message header; the network message header comprises a plurality of metadata fields; defining a rule set according to the plurality of metadata fields and constructing a decision tree; the decision tree comprises partition nodes and dividing nodes.

Step 104: the partition nodes and dividing nodes are distributed in alternating layers in the decision tree, wherein odd layers hold partition nodes and even layers hold dividing nodes, and any partition node or dividing node that contains fewer rules than a preset threshold is set as a leaf node. Setting leaf nodes is necessary for the subsequent message classification function to work correctly. Leaf nodes are the termination nodes of search paths: every message classification search path ends at a leaf node.

By designing partition nodes and dividing nodes in the decision tree, the rule replication problem of the decision tree method is avoided, and fast message classification and dynamic rule updates can be performed.

Step 106: classifying messages according to the partition nodes, dividing nodes, and leaf nodes. At a partition node, the network message searches the child nodes of the partition node in sequence, and during the search subsequent child nodes are pruned according to the highest priority of the rules already matched in preceding leaf nodes. At a dividing node, the network message generates a positioning value into the child node list according to the mask of the dividing node, and only the one child node selected by the positioning value is searched for rule matching. At a leaf node, the contained rules are organized in a linked list in priority order, and the linked list is searched in order until a rule is matched, at which point matching ends and the message classification is complete.

In this step, the message classification procedure is illustrated with an example. Assume a message p whose values in the X domain and the Y domain are (11, 10); its classification is shown in fig. 4. The message p first enters the root node A. Because A is a partition node, p is searched in the child nodes B, C, and D of A in sequence. Node B is searched first. B is a dividing node whose common determined-bit mask is (11, 00); an AND operation between the mask and the value of p gives the search key (11). This key matches neither the value (01) of child node E nor the value (10) of child node F, so p matches no rule in node B. The search continues with node C, a dividing node whose common determined-bit mask is (00, 11); an AND operation between the mask and the value of p gives the search key (10), which equals the value (10) of node G, so p matches rule R3 contained in node G. Because the priority of the matched rule R3 is higher than the highest priority of the rules contained in node D, node D is pruned without being searched, and the best matching rule of message p is finally determined to be R3.
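The lookup walked through above can be sketched in a few dozen lines. This is an illustration under stated assumptions, not the patented implementation: the class names are hypothetical, a Python list stands in for the leaf linked list, a smaller number means a higher priority, headers are packed as two X bits followed by two Y bits, and rule R5 is assumed to be a full wildcard because its fields are not given in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Rule:
    name: str
    priority: int  # assumption: smaller number = higher priority
    value: int     # expected bits, already ANDed with mask
    mask: int      # 1 = determined bit, 0 = wildcard


@dataclass
class Leaf:
    rules: List[Rule]  # sorted by priority; stands in for the linked list

    @property
    def top(self) -> int:
        return self.rules[0].priority


@dataclass
class Dividing:
    mask: int
    children: Dict[int, "Node"]  # keyed by (header & mask)

    @property
    def top(self) -> int:
        return min(c.top for c in self.children.values())


@dataclass
class Partition:
    children: List["Node"]

    @property
    def top(self) -> int:
        return min(c.top for c in self.children)


Node = Union[Leaf, Dividing, Partition]


def search(node: Node, header: int, best: Optional[Rule] = None) -> Optional[Rule]:
    if isinstance(node, Leaf):
        for rule in node.rules:  # first hit is the best rule in this leaf
            if (header & rule.mask) == rule.value:
                return rule if best is None or rule.priority < best.priority else best
        return best
    if isinstance(node, Dividing):
        child = node.children.get(header & node.mask)  # exactly one candidate
        return best if child is None else search(child, header, best)
    for child in node.children:  # partition node: try children in order
        if best is not None and best.priority <= child.top:
            continue  # prune: this child cannot hold a better rule
        best = search(child, header, best)
    return best


R1 = Rule("R1", 1, 0b0100, 0b1100)
R2 = Rule("R2", 2, 0b1000, 0b1100)
R3 = Rule("R3", 3, 0b0010, 0b0011)
R4 = Rule("R4", 4, 0b0000, 0b0011)
R5 = Rule("R5", 5, 0b0000, 0b0000)  # hypothetical wildcard rule

B = Dividing(0b1100, {0b0100: Leaf([R1]), 0b1000: Leaf([R2])})  # children E, F
C = Dividing(0b0011, {0b0010: Leaf([R3]), 0b0000: Leaf([R4])})  # children G, H
D = Leaf([R5])
A = Partition([B, C, D])
```

For p = (11, 10), i.e. `0b1110`, node B yields no child for key `0b1100`, node C leads to leaf G and rule R3, and node D is skipped because R3 already outranks everything D holds. The `top` property recomputes the subtree maximum on every call for brevity; caching it per node would be the obvious refinement.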

Step 108: when the rule set is updated dynamically, the rule to be updated enters the decision tree from the root node to reconstruct the decision tree.

By dynamically updating the rule set on an ongoing basis, the method can adapt to varied message data, and the reconstructed decision tree can match a message to its corresponding rule as early as possible, which improves classification speed.

In the above high-performance message classification method based on a partition strategy, the decision tree built in steps 102 to 108 alternates partition nodes on odd layers with dividing nodes on even layers and terminates every search path at a leaf node. A partition node is searched child by child with priority-based pruning, a dividing node is searched through the single child selected by the mask-derived positioning value, and a leaf node is searched through its priority-ordered linked list until a rule is matched. When the rule set is updated dynamically, the rule to be updated enters the decision tree from the root node to reconstruct the decision tree. The partition strategy avoids the rule replication problem of existing decision tree algorithms, so that fast message classification and dynamic rule updates are both possible; the method has a small memory footprint, high classification performance, and support for dynamic rule updates.

In a specific embodiment, taking the two-dimensional rule set given in fig. 2 as an example, assume that the leaf node threshold is 1 (i.e., a node containing 1 rule or fewer is a leaf node). As shown in fig. 3, the rule set is first placed in the decision tree root node A, which is set as a partition node by default, and the rule set partition operation is performed: the rule set is divided into three subsets {R1, R2}, {R3, R4}, and {R5}, which are placed in three child nodes B, C, and D, respectively;

the child nodes B, C, and D of the partition node are dividing nodes. Node D contains only one rule and is therefore a leaf node, so no further child nodes are generated for it. Nodes B and C are subdivided according to their common determined-bit masks. For example, node B contains rules R1 (01) and R2 (10), whose common determined bits are the two bits of the X domain, so node B is divided into child nodes E and F according to the mask (11, 00); similarly, node C contains rules R3 (10) and R4 (00) and is divided into child nodes G and H according to the common determined-bit mask (00, 11);

the child nodes E, F, G, and H generated by nodes B and C are partition nodes; because each contains only one rule, which does not exceed the leaf node threshold, each is also a leaf node, so no further child nodes are constructed and the construction of the whole decision tree is complete;

specifically, the rule partition flow within one partition node is shown in fig. 6. Taking the root node in fig. 3 as an example, assume that the minimum threshold on the number of valid mask bits is set to B = 2. The mask is initialized to M = (11, 11) and the rule set is traversed in priority order. The mask (11, 00) of rule R1 is ANDed with M to obtain M' = (11, 00); M' has 2 valid bits, which is greater than or equal to the threshold, so rule R1 is placed in the new partition and M is updated to the value of M'. Rule R2 is tried next: the AND of its mask (11, 00) with M is still (11, 00), so the number of valid bits of M is unchanged and rule R2 is also placed in the new partition. The AND of the mask (00, 11) of rule R3 with M is (00, 00); the number of valid bits of the common mask drops to 0, which is smaller than B, so rule R3 is kept in the rule set and not placed in the new partition. Rules R4 and R5 behave like rule R3: ANDing their masks with M leaves fewer than B valid bits, so they are kept in the rule set. One traversal of the rule set thus generates the first partition, which contains rules {R1, R2}. Because the number of rules remaining in the rule set is still larger than the leaf node threshold, a second traversal is performed in the same way and generates the second partition, which contains rules {R3, R4}. After the second traversal only rule R5 remains in the rule set; this is less than or equal to the leaf node threshold, so the third partition contains only rule R5.
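The traversal just described can be reproduced with a short sketch. The function names are hypothetical, masks are packed as two X bits followed by two Y bits, and the mask of R5 is assumed to be all wildcards since the text does not state it. The early exit when a pass selects no rule is an added safeguard against looping forever and is not part of the described flow.

```python
from typing import List, Tuple

MaskedRule = Tuple[str, int]  # (name, mask); list order is priority order


def popcount(x: int) -> int:
    return bin(x).count("1")


def split_once(rules: List[MaskedRule], min_bits: int, width: int):
    """One traversal: collect rules whose masks keep >= min_bits common bits."""
    shared = (1 << width) - 1  # M starts as all 1s
    taken, kept = [], []
    for name, mask in rules:
        candidate = shared & mask  # M' = M AND mask(R)
        if popcount(candidate) >= min_bits:
            taken.append((name, mask))
            shared = candidate     # update M
        else:
            kept.append((name, mask))
    return taken, kept, shared


def partition(rules: List[MaskedRule], min_bits: int,
              leaf_threshold: int, width: int) -> List[List[str]]:
    parts: List[List[str]] = []
    remaining = list(rules)
    while len(remaining) > leaf_threshold:
        taken, kept, _ = split_once(remaining, min_bits, width)
        if not taken:  # safeguard (assumption): no rule qualifies, stop splitting
            break
        parts.append([name for name, _ in taken])
        remaining = kept
    if remaining:      # leftovers form the final child
        parts.append([name for name, _ in remaining])
    return parts


EXAMPLE = [
    ("R1", 0b1100), ("R2", 0b1100),
    ("R3", 0b0011), ("R4", 0b0011),
    ("R5", 0b0000),  # hypothetical: mask not given in the text
]
```

With B = 2 and a leaf threshold of 1, `partition(EXAMPLE, 2, 1, 4)` yields the three subsets of the worked example. Because each rule lands in exactly one subset, no rule is replicated across children.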

In a specific embodiment, taking the insertion of a rule R6 (11) as an example, the insertion process is shown in fig. 5. Rule insertion is similar to message classification. Rule R6 first enters the root node A; because A is a partition node, the child nodes of A are tried in sequence to find one in which the insertion can be completed. Rule R6 enters node B, a dividing node whose common determined-bit mask is (11, 00). This mask is compatible with the value of R6, and an AND operation between the mask and the value of R6 gives key = (11). Because this key differs from the values of nodes E and F, node B generates a new child node I whose value is (11) and inserts rule R6 into the rule linked list of node I, which completes the insertion.
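The insertion path can be sketched as below. The names are hypothetical, a priority-sorted Python list stands in for each child's rule linked list, and "compatible" is interpreted here as the rule determining every bit that the dividing node splits on, which is an assumption about the source's meaning. The priority given to R6 is also made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Rule:
    name: str
    priority: int  # assumption: smaller number = higher priority
    value: int
    mask: int


@dataclass
class DividingNode:
    mask: int
    children: Dict[int, List[Rule]] = field(default_factory=dict)

    def try_insert(self, rule: Rule) -> bool:
        # Assumed compatibility test: the rule fixes every bit of the node mask.
        if (rule.mask & self.mask) != self.mask:
            return False
        key = rule.value & self.mask              # positioning value
        bucket = self.children.setdefault(key, [])  # new child if key is unseen
        bucket.append(rule)
        bucket.sort(key=lambda r: r.priority)
        return True


def insert_at_partition(children: List[DividingNode], rule: Rule) -> int:
    """Try the children in order; return the index that accepted the rule, or -1."""
    for index, child in enumerate(children):
        if child.try_insert(rule):
            return index
    return -1


B = DividingNode(0b1100, {0b0100: [Rule("R1", 1, 0b0100, 0b1100)],
                          0b1000: [Rule("R2", 2, 0b1000, 0b1100)]})
C = DividingNode(0b0011, {0b0010: [Rule("R3", 3, 0b0010, 0b0011)],
                          0b0000: [Rule("R4", 4, 0b0000, 0b0011)]})
R6 = Rule("R6", 6, 0b1100, 0b1100)  # X = 11, Y wildcard; priority is hypothetical
```

Calling `insert_at_partition([B, C], R6)` places R6 under a new key `0b1100` in node B, which corresponds to the new child node I in the example. What to do when no child accepts the rule (the `-1` case) is not covered by this passage and is left to the caller.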

In a specific embodiment, the marking strategy avoids frequent memory allocation and reclamation during rule updates; only when excessive memory use seriously affects performance is the decision tree traversed to reclaim the nodes marked as containing no rules.

It should be understood that, although the steps in the flowchart of fig. 1 are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in fig. 1 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.

In one embodiment, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a high performance message classification method based on partition policies. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.

It will be appreciated by those skilled in the art that the structure shown in fig. 7 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.

Those skilled in the art will appreciate that all or part of the above methods may be implemented by a computer program stored on a non-transitory computer readable storage medium, which, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.

The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.

The above examples merely represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the invention. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application is to be determined by the claims appended hereto.

Claims

1. The method for classifying the high-performance messages based on the partition strategy is characterized by comprising the following steps:

defining a rule set according to a plurality of metadata fields and constructing a decision tree; the decision tree comprises partition nodes and dividing nodes;

the partition nodes and dividing nodes are distributed in alternating layers in the decision tree, wherein odd layers hold partition nodes and even layers hold dividing nodes, and any partition node or dividing node that contains fewer rules than a preset threshold is set as a leaf node;

classifying messages according to the partition nodes, dividing nodes, and leaf nodes: at a partition node, the network message searches the child nodes of the partition node in sequence, and during the search subsequent child nodes are pruned according to the highest priority of the rules already matched in preceding leaf nodes; at a dividing node, the network message generates a positioning value into the child node list according to the mask of the dividing node, and only the one child node selected by the positioning value is searched for rule matching; at a leaf node, the contained rules are organized in a linked list in priority order, and the linked list is searched in order until a rule is matched, at which point matching ends and the message classification is complete;

when the rule set is updated dynamically, the rule to be updated enters a decision tree from a root node to reconstruct the decision tree;

defining a rule set and constructing a decision tree from the plurality of metadata fields comprises: the root node of the decision tree is initially defined as a partition node; within a partition node, child nodes are constructed from the rule set using a heuristic method; each traversal of the rule set under the heuristic generates one new partition, and each new partition is used to construct a child dividing node, until the number of rules remaining in the rule set is less than or equal to the leaf node threshold, at which point the remaining rules are used to construct a child leaf node;

when the dynamic rule is updated, the rule to be updated enters a decision tree from a root node to reconstruct the decision tree, and the method comprises the following steps:

when the rule is in a partition node of the decision tree, traversing all the child nodes of the partition node in sequence until the update is completed;

when the rule is in the dividing node of the decision tree, generating a positioning value of a child node list according to the mask of the dividing node, positioning one child node according to the positioning value and attempting to finish updating in the child node;

2. The method of claim 1, wherein constructing child nodes from rule sets using heuristics in the partition nodes comprises:

3. The method according to claim 1, wherein the method further comprises:

after a rule deletion, when the number of rules contained in a node equals 0, marking the node using a pointer low-order-bit marking strategy; the pointer low-order-bit marking strategy uses the idle low-order bits of the pointers in the parent node's child pointer list to mark whether each node contains rules, and only when the occupied memory exceeds a certain threshold is the decision tree traversed to reclaim the nodes marked as containing no rules.

4. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 3 when the computer program is executed.

5. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 3.