WO2019042305A1 - Establishment of a network packet classification decision tree - Google Patents


Info

Publication number
WO2019042305A1
Authority
WO
WIPO (PCT)
Prior art keywords
decision tree
classification rule
type
classification
leaf node
Application number
PCT/CN2018/102845
Inventor
Xu Dawei (徐达维)
Ren Kai (任凯)
Ge Changzhong (葛长忠)
Original Assignee
New H3C Technologies Co., Ltd. (新华三技术有限公司)
Application filed by New H3C Technologies Co., Ltd.
Priority to US16/643,484 (US11184279B2)
Priority to JP2020512410A (JP6997297B2)
Priority to EP18851181.0A (EP3661153B1)
Publication of WO2019042305A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/028 Capturing of monitoring data by filtering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441 Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/02 Topology update or discovery
    • H04L45/08 Learning-based routing, e.g. using neural networks or artificial intelligence
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/48 Routing tree calculation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22 Parsing or analysis of headers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/06 Generation of reports
    • H04L43/062 Generation of reports related to network traffic

Definitions

  • Network packet classification means searching the pre-configured classification rules according to the values of the fields in a network packet's header, obtaining the highest-priority matching classification rule, and performing the action configured for that rule.
  • Many network device functions, such as access control, flow control, load balancing, and intrusion detection, require network packet classification.
  • Decision-tree-based multi-field partitioning algorithms are a typical network packet classification method.
  • The basic idea is to recursively divide the entire multidimensional space into subspaces.
  • Recursion ends when every rule contained in the current subspace covers that subspace in every dimension. This process yields a decision tree.
  • Each network packet to be classified is looked up in the decision tree to obtain the classification rule that matches it.
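As a rough illustration of the stopping condition above, the following Python sketch assumes that each rule and each subspace is stored as one inclusive (low, high) integer range per dimension. The helper names `covers` and `is_leaf` are hypothetical and are not taken from the patent.

```python
from typing import Sequence, Tuple

Range = Tuple[int, int]  # inclusive (low, high) interval for one dimension


def covers(rule: Sequence[Range], space: Sequence[Range]) -> bool:
    """True if the rule spans the whole subspace in every dimension."""
    return all(r_lo <= s_lo and s_hi <= r_hi
               for (r_lo, r_hi), (s_lo, s_hi) in zip(rule, space))


def is_leaf(rules: Sequence[Sequence[Range]], space: Sequence[Range]) -> bool:
    """Stop recursing once every rule left in the subspace fills it."""
    return all(covers(rule, space) for rule in rules)


space = [(0, 15), (0, 255)]    # a 2-D subspace: [0, 15] x [0, 255]
wide = [(0, 63), (0, 65535)]   # fills the subspace in both dimensions
narrow = [(0, 7), (0, 255)]    # covers only half of the first dimension
print(is_leaf([wide], space))          # True
print(is_leaf([wide, narrow], space))  # False
```

While `narrow` remains in the subspace, the recursion would keep splitting the first dimension until every remaining rule fills its piece.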
  • FIG. 1 is a schematic flowchart of a method for establishing a network packet classification decision tree according to an embodiment of the present invention.
  • FIG. 2A is a schematic structural diagram of a network packet classification according to an embodiment of the present invention.
  • FIG. 2B is a schematic diagram of the principle of an iterator unit of a TreeBuilder according to an embodiment of the present invention.
  • FIGS. 3A-3E are schematic diagrams of decision trees provided by an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a search path of a network packet classification according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of hardware of a network packet classification decision tree establishing apparatus according to an embodiment of the present invention.
  • FIG. 6 is a functional structural diagram of a network packet classification decision tree establishment control logic according to an embodiment of the present invention.
  • The classification rule set includes multiple classification rules.
  • A classification rule includes dimension information and a priority.
  • Each item of dimension information is represented by a value range.
  • The priority determines which rule is returned as the matching result when multiple rules satisfy the matching condition.
  • In this example, the matching result returned is the classification rule with the highest priority.
  • The format of the classification rule set can be as shown in Table 1.
  • The classification rule set in Table 1 contains nine classification rules, r0-r8.
  • sip: source IP address
  • dip: destination IP address
  • sport: source port number
  • dport: destination port number
  • prot: protocol type
  • vlan: Virtual Local Area Network
  • sysport: system port
  • The dimension information prot may include, but is not limited to, TCP (Transmission Control Protocol) and UDP (User Datagram Protocol).
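A minimal Python sketch of how a classification rule (one value range per dimension plus a priority) and the highest-priority matching result might be represented. It is a linear reference lookup rather than the decision tree itself. It assumes that a larger priority number means a higher priority and that IP addresses and protocol types are encoded as integers (6 for TCP, 17 for UDP); the `Rule`, `matches`, and `classify` names and both rules are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Range = Tuple[int, int]  # inclusive (low, high) value range


@dataclass
class Rule:
    name: str
    priority: int             # assumption: a larger number means a higher priority
    ranges: Dict[str, Range]  # dimensions left out are treated as "*"


def matches(rule: Rule, packet: Dict[str, int]) -> bool:
    """True if every listed dimension of the packet lies in the rule's range."""
    return all(lo <= packet[dim] <= hi for dim, (lo, hi) in rule.ranges.items())


def classify(rules: List[Rule], packet: Dict[str, int]) -> Optional[Rule]:
    """Linear reference lookup: the highest-priority matching rule, or None."""
    hits = [rule for rule in rules if matches(rule, packet)]
    return max(hits, key=lambda rule: rule.priority, default=None)


rules = [
    Rule("web", priority=2, ranges={"dport": (80, 80), "prot": (6, 6)}),
    Rule("any_tcp", priority=1, ranges={"prot": (6, 6)}),
]
# Field values borrowed from network packet 2 in this document's example.
packet = {"sip": 0x01010101, "dip": 0x0A01010A, "sport": 12000,
          "dport": 80, "prot": 6, "vlan": 30, "sysport": 0x100}
print(classify(rules, packet).name)  # web
```

Both rules match the packet, so the priority decides the result, exactly as the priority definition above describes.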
  • A classification rule template is a collection of template items.
  • A classification rule template can include one or more template items.
  • A template item includes one or more items of dimension information.
  • The format of a classification rule template containing three template items can be as shown in Table 2.
  • A template dimension is a dimension included in a template item.
  • The template items shown in Table 2 cover five dimensions of the classification rules: sip, dip, sport, dport, and prot.
  • A non-template dimension is any dimension of a classification rule other than the template dimensions.
  • The classification rules shown in Table 1 include two non-template dimensions: vlan and sysport.
  • Template instantiation means applying a classification rule template to one or more values of the non-template dimensions to obtain one or more classification rules.
  • For example, instantiating the classification rule template shown in Table 2 yields the classification rules r0, r1, and r2 of the classification rule set shown in Table 1.
  • The values of the template dimensions sip, dip, sport, dport, and prot of these three rules equal the values of the corresponding template dimensions in the template items of Table 2, and the value of the non-template dimension sysport is 0x200 for all three.
  • {r0, r1, r2}, {r3, r4, r5}, and {r6, r7, r8} may be referred to as classification rule subsets of the classification rule set shown in Table 1.
  • The hole-filling rule (fb_rule) is a classification rule in which every dimension has the value "*".
  • The hole-filling rule has the lowest priority; in other words, every classification rule in the classification rule set has a higher priority than the hole-filling rule.
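Template instantiation and the hole-filling rule could be sketched as follows. The sketch assumes that template items and non-template values are dictionaries of inclusive ranges, that "*" is encoded as a full 32-bit range, and that list order stands in for priority (earlier means higher). The template values are hypothetical rather than those of Table 2, only dport and prot are shown for brevity, and `instantiate` and `fb_rule` are illustrative names.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]
WILDCARD: Range = (0, 2**32 - 1)  # assumption: "*" encoded as a full 32-bit range


def instantiate(template: List[Dict[str, Range]],
                non_template: Dict[str, Range]) -> List[Dict[str, Range]]:
    """Apply each template item to one set of non-template dimension values."""
    return [{**item, **non_template} for item in template]


def fb_rule(dims: List[str]) -> Dict[str, Range]:
    """Hole-filling rule: every dimension is "*"."""
    return {d: WILDCARD for d in dims}


template = [                                      # hypothetical template items
    {"dport": (80, 80), "prot": (6, 6)},          # me0
    {"dport": (443, 443), "prot": (6, 6)},        # me1
    {"dport": (12000, 12000), "prot": (17, 17)},  # me2
]
subset = instantiate(template, {"sysport": (0x200, 0x200)})
# List order stands in for priority (earlier = higher); fb_rule goes last.
rule_set = subset + [fb_rule(["dport", "prot", "sysport"])]
print(len(rule_set))  # 4
```

Calling `instantiate` again with a different sysport value would produce another subset with the same template-dimension values, which is what makes such subsets candidates for sharing.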
  • SPSR: Same Pattern Sub Ruleset
  • Classification rule subsets obtained by applying the same classification rule template to different non-template dimension values are SPSRs of one another.
  • Applying a classification rule template to different non-template dimension values may also be described as instantiating the template for those non-template dimension values.
  • For example, the classification rule subsets {r0, r1, r2}, {r3, r4, r5}, and {r6, r7, r8} are SPSRs of one another.
  • The hole-filling rule is an SPSR of itself.
  • An SPSR group contains multiple classification rule subsets that are SPSRs of one another.
  • The SPSR group corresponding to the hole-filling rule contains only the hole-filling rule.
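The grouping of classification rule subsets into SPSR groups, together with the (group identifier, index) pair recorded for each subset, might look like the following sketch. It assumes each subset is tagged with the template it was instantiated from, and it numbers groups `spsrg0`, `spsrg1`, ... in order of first appearance to mirror the identifiers used in this document's example; `build_spsr_groups` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def build_spsr_groups(subsets: Dict[str, str]
                      ) -> Tuple[Dict[str, List[str]], Dict[str, Tuple[str, int]]]:
    """subsets maps a rule-subset name to the template it was instantiated from.

    Returns group id -> member names (in index order) and
    subset name -> (group id, index within that group).
    """
    group_of_template: Dict[str, str] = {}
    groups: Dict[str, List[str]] = {}
    location: Dict[str, Tuple[str, int]] = {}
    for name, template_id in subsets.items():
        if template_id not in group_of_template:
            group_of_template[template_id] = f"spsrg{len(group_of_template)}"
        gid = group_of_template[template_id]
        members = groups.setdefault(gid, [])
        location[name] = (gid, len(members))  # index = position in the group
        members.append(name)
    return groups, location


subsets = {
    "fb_rule": "fill",       # the hole-filling rule forms its own group
    "{r0,r1,r2}": "table2",
    "{r3,r4,r5}": "table2",
    "{r6,r7,r8}": "table2",
}
groups, location = build_spsr_groups(subsets)
print(location["{r6,r7,r8}"])  # ('spsrg1', 2)
```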
  • FIG. 1 is a schematic flowchart of a method for establishing a network packet classification decision tree according to an embodiment of the present invention.
  • The method for establishing a network packet classification decision tree can be applied to a network device.
  • The network device may be, for example, a switch or a router.
  • The method for establishing a network packet classification decision tree may include the following steps:
  • Step 101: Build a tree from the classification rule set according to its non-template dimensions to generate a first-type decision tree.
  • The first-type decision tree is the decision tree corresponding to the non-template dimensions of the classification rule set.
  • Here, "the classification rule set" does not refer to one fixed rule set; it may be any classification rule set used for network packet classification. This is not repeated below.
  • That is, a tree is built from the classification rule set according to its non-template dimensions, producing the decision tree that corresponds to those dimensions. This tree is referred to as the first-type decision tree, or the non-template dimension decision tree.
  • For ease of understanding, the first-type decision tree is called the non-template dimension decision tree in the rest of this description.
  • The network device may build the tree from the classification rule set by spatial partitioning.
  • When the network device builds the tree according to the non-template dimensions, if no classification rule matches the selected dimension and value, the network device may set the content of the leaf node that matches no classification rule to the hole-filling rule. When a subspace obtained by spatial partitioning contains only the hole-filling rule and no other classification rule, the end condition is satisfied.
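For Step 101, the sketch below shows one simplified way to decide what a leaf of the non-template dimension decision tree contains: the rules whose non-template ranges intersect the leaf's subspace, or the hole-filling rule when no rule does. It assumes inclusive integer ranges and hypothetical vlan and sysport values for r0 and r6, and it omits the spatial partitioning that produces the subspaces; `leaf_contents` is an illustrative name.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]
FB_RULE = "fb_rule"


def overlaps(a: Range, b: Range) -> bool:
    """True if two inclusive ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


def leaf_contents(rules: Dict[str, Dict[str, Range]],
                  cell: Dict[str, Range]) -> List[str]:
    """Rules whose ranges intersect the cell in every non-template dimension;
    the hole-filling rule alone if none do (which also ends the recursion)."""
    hits = [name for name, ranges in rules.items()
            if all(overlaps(ranges[d], cell[d]) for d in cell)]
    return hits or [FB_RULE]


rules = {  # hypothetical non-template values
    "r0": {"vlan": (10, 10), "sysport": (0x200, 0x200)},
    "r6": {"vlan": (30, 30), "sysport": (0x100, 0x100)},
}
print(leaf_contents(rules, {"vlan": (0, 15), "sysport": (0x200, 0x2FF)}))  # ['r0']
print(leaf_contents(rules, {"vlan": (16, 29), "sysport": (0, 0xFFF)}))     # ['fb_rule']
```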
  • Step 102: Build a tree for each leaf node of the first-type decision tree according to the template dimensions of the classification rule set to generate second-type decision trees.
  • A second-type decision tree is the decision tree, over the template dimensions, that corresponds to a leaf node of the first-type decision tree.
  • Because the non-template dimension decision tree partitions the classification rules only along the non-template dimensions, the spatial partitioning of each of its leaf nodes is incomplete and must continue along the template dimensions.
  • The leaf nodes of the non-template dimension decision tree are therefore further partitioned according to the template dimensions of the classification rule set, yielding, for each leaf node of the non-template dimension decision tree, a decision tree over the template dimensions (referred to herein as a second-type decision tree, or a template dimension decision tree).
  • In the rest of this description, the second-type decision tree is called the template dimension decision tree.
  • Because the template dimension decision tree is obtained by spatially partitioning a leaf node of the non-template dimension decision tree, it may be regarded as a subtree of the non-template dimension decision tree.
  • How the network device builds a tree for a leaf node of the non-template dimension decision tree according to the template dimensions is described below with an example and is not detailed here.
  • Every dimension of the hole-filling rule has the value "*", that is, it may take any value within the range of each dimension. Therefore, partitioning the hole-filling rule along the template dimensions still yields the hole-filling rule, and a leaf node that contains only the hole-filling rule still contains only the hole-filling rule after being partitioned along the template dimensions.
  • Step 103: Associate multiple leaf nodes of the first-type decision tree that are SPSRs of one another with the same second-type decision tree.
  • Classification rule subsets that are SPSRs of one another have the same template dimensions. Consequently, when decision trees are built for them along the template dimensions with the same algorithm, the resulting template dimension decision trees have the same tree structure and differ only in their leaf nodes. The template dimension decision trees of such subsets can therefore be reused as a single template dimension decision tree.
  • By reusing an already-built template dimension decision tree, the leaf nodes of the non-template dimension decision tree that are SPSRs of one another are associated with the same template dimension decision tree. This reduces the number of template dimension decision trees and therefore the number of nodes in the decision tree corresponding to the classification rule set.
  • Leaf nodes are SPSRs of one another when the classification rule subsets corresponding to them are SPSRs of one another.
  • The classification rule subset corresponding to a leaf node is the collection of classification rules contained in that leaf node.
  • The decision tree corresponding to the classification rule set consists of the non-template dimension decision tree and the template dimension decision trees.
  • Associating the leaf nodes of the first-type decision tree that are SPSRs of one another with the same second-type decision tree may include: placing the classification rule subsets that are SPSRs of one another into the same SPSR group; for each of those leaf nodes, establishing a mapping between the leaf node and both the identifier of the SPSR group to which its classification rule subset belongs and the index of that subset within the group; and merging the second-type decision trees corresponding to those leaf nodes into a single second-type decision tree.
  • Each leaf node of the merged second-type decision tree contains the classification rules corresponding to every index in the SPSR group.
  • Classification rule subsets obtained by instantiating the same classification rule template for different non-template dimension values are SPSRs of one another, and classification rule subsets that are SPSRs of one another form an SPSR group.
  • For example, the classification rule subset {r6, r7, r8} is likewise obtained by instantiating the classification rule template shown in Table 2.
  • The classification rule subsets {r0, r1, r2}, {r3, r4, r5}, and {r6, r7, r8} are therefore SPSRs of one another and form one SPSR group.
  • Each classification rule subset in an SPSR group may be assigned an index within that group.
  • A classification rule subset can then be identified by the identifier of its SPSR group (for example, the group's name) together with its index in the group.
  • For example, suppose {r0, r1, r2}, {r3, r4, r5}, and {r6, r7, r8} belong to the SPSR group spsrg1 (the group's identifier is spsrg1) and their indexes in spsrg1 are 0, 1, and 2, respectively. Then {r0, r1, r2} maps to index 0 in spsrg1, {r3, r4, r5} maps to index 1, and {r6, r7, r8} maps to index 2.
  • Mappings can then be established for the leaf nodes of the non-template dimension decision tree that are SPSRs of one another.
  • Taking the leaf node corresponding to {r0, r1, r2} as an example, a mapping is established between that leaf node and both the identifier spsrg1 of the SPSR group to which {r0, r1, r2} belongs and the index 0 of {r0, r1, r2} in that group; the identifier spsrg1 and the index 0 are stored in the leaf node.
  • The network device may merge the template dimension decision trees corresponding to the classification rule subsets of one SPSR group into a single template dimension decision tree, each leaf node of which may contain the classification rules corresponding to every index in that SPSR group.
  • This works because the template dimension decision trees of classification rule subsets that are SPSRs of one another have the same structure and differ only in their leaf nodes.
  • If a leaf node of the non-template dimension decision tree contains several classification rule subsets that are SPSRs of one another, each leaf node of its template dimension decision tree will contain several classification rules obtained by instantiating the same template item for different non-template dimension values, and only the highest-priority one of those rules can take effect. Therefore, for such a leaf node, the network device may delete the lower-priority classification rule subsets that it contains.
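Because the template dimension decision trees of one SPSR group share a single structure, merging them amounts to walking the trees in lockstep and collecting the leaf rules by SPSR index. The sketch assumes a hypothetical tuple encoding, `("split", dim, value, left, right)` for internal nodes and `("leaf", payload)` for leaves; the split on dport at 1024 is an invented example, not a tree taken from the patent's figures.

```python
from typing import List

# Hypothetical encoding: ("split", dim, value, left, right) or ("leaf", payload).


def merge_trees(trees: List[tuple]) -> tuple:
    """Merge same-shaped template trees; each merged leaf lists rules by SPSR index."""
    head = trees[0]
    if head[0] == "leaf":
        if any(t[0] != "leaf" for t in trees):
            raise ValueError("trees do not share one structure")
        return ("leaf", [t[1] for t in trees])
    _, dim, value, _, _ = head
    if any(t[0] != "split" or t[1:3] != (dim, value) for t in trees):
        raise ValueError("trees do not share one structure")
    return ("split", dim, value,
            merge_trees([t[3] for t in trees]),
            merge_trees([t[4] for t in trees]))


# One tree per subset of an SPSR group, listed in index order 0, 1, 2.
t0 = ("split", "dport", 1024, ("leaf", "r0"), ("leaf", "r2"))
t1 = ("split", "dport", 1024, ("leaf", "r3"), ("leaf", "r5"))
t2 = ("split", "dport", 1024, ("leaf", "r6"), ("leaf", "r8"))
print(merge_trees([t0, t1, t2]))
# ('split', 'dport', 1024, ('leaf', ['r0', 'r3', 'r6']), ('leaf', ['r2', 'r5', 'r8']))
```

Raising an error on a shape mismatch is a design choice of this sketch: it guards the assumption that only SPSR trees are merged.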
  • When classifying, the network device may first search the non-template dimension decision tree corresponding to the classification rule set with the network packet to be classified, determine the leaf node of the non-template dimension decision tree that corresponds to the packet (referred to herein as the first target leaf node), and obtain the identifier of the SPSR group stored in the first target leaf node (referred to herein as the target SPSR group) together with the index, within the target SPSR group, of the classification rule subset (referred to herein as the target classification rule subset); this index is referred to herein as the target index.
  • The network device may then search, with the network packet to be classified, the template dimension decision tree corresponding to the target SPSR group (referred to herein as the target template dimension decision tree) to determine the leaf node of that tree which corresponds to the packet (referred to herein as the second target leaf node).
  • Finally, the network device may determine, from the classification rule corresponding to the target index in the second target leaf node, the classification rule that matches the network packet to be classified.
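The two-stage lookup can be sketched in Python as follows. The sketch assumes tuple-encoded trees (internal nodes `("split", dim, value, left, right)`, where packets with values below `value` go left), non-template leaves that store a (group identifier, index) pair, and merged template leaves that store a list of rules ordered by SPSR index. The split points are hypothetical, chosen only so that the two example network packets in this document reach the leaves reported for them in FIG. 4.

```python
from typing import Dict

# Hypothetical encoding: ("split", dim, value, left, right) or ("leaf", payload);
# a packet goes left when packet[dim] < value.


def walk(tree: tuple, packet: Dict[str, int]):
    """Follow split nodes down to a leaf and return the leaf's payload."""
    while tree[0] == "split":
        _, dim, value, left, right = tree
        tree = left if packet[dim] < value else right
    return tree[1]


def classify(packet: Dict[str, int], non_template_tree: tuple,
             template_trees: Dict[str, tuple]) -> str:
    group_id, index = walk(non_template_tree, packet)        # first target leaf node
    rules_by_index = walk(template_trees[group_id], packet)  # second target leaf node
    return rules_by_index[index]                             # rule at the target index


non_template_tree = ("split", "vlan", 20,
                     ("leaf", ("spsrg1", 0)),
                     ("leaf", ("spsrg1", 2)))
template_trees = {
    "spsrg1": ("split", "dport", 1024,
               ("leaf", ["r0", "r3", "r6"]),
               ("leaf", ["r2", "r5", "r8"])),
}
packet1 = {"vlan": 10, "sysport": 0x200, "dport": 12000, "prot": 17}
packet2 = {"vlan": 30, "sysport": 0x100, "dport": 80, "prot": 6}
print(classify(packet1, non_template_tree, template_trees))  # r2
print(classify(packet2, non_template_tree, template_trees))  # r6
```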
  • FIG. 2A is a schematic structural diagram of network packet classification according to an embodiment of the present invention.
  • On the CPU (Central Processing Unit) side, a decision tree creation unit (Tree Builder) compiles the input classification rule set into a decision tree and sends it to an FPGA (Field Programmable Gate Array) 220.
  • The FPGA 220 searches the decision tree corresponding to the classification rule set through its lookup engine 221 to determine the matching classification rule.
  • The decision tree creation unit 211 may build the decision tree iteratively; its core is a heuristic selector 211-1 based on dimensions and values (Dim&Value, D&V for short).
  • The heuristic selector takes a rule set (RS) as input and outputs two rule subsets and a decision tree node (Decision Tree Node, DT Node for short).
  • Based on the input rule set, the heuristic selector selects a dimension and a value from the input dimension list (Dim List) and then divides the input rule set into two subsets according to the selected dimension and value.
  • If a rule's value range in the selected dimension is less than the selected value, the rule is placed in the left subset; if it is greater than or equal to the selected value, the rule is placed in the right subset; if it spans the selected value, the rule is split in two, with the part below the value placed in the left subset and the part greater than or equal to the value placed in the right subset. Taking the selected dimension and value as a decision tree node completes one spatial partition of the rule set. The two resulting rule subsets are then processed iteratively in the same way until the classification rule subsets can no longer be divided.
  • The iteration unit of the decision tree creation unit 211 can be as shown in FIG. 2B.
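The selector's partitioning step can be sketched as follows. This excerpt does not describe how the heuristic picks the dimension and value, so the sketch takes them as arguments; rules are assumed to be (name, ranges) pairs with inclusive integer ranges, and `split` is an illustrative name. The usage also shows that cutting the hole-filling rule yields two pieces that each still cover their whole half of the space.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]
Rule = Tuple[str, Dict[str, Range]]  # (name, per-dimension inclusive ranges)


def split(rules: List[Rule], dim: str, value: int) -> Tuple[List[Rule], List[Rule]]:
    """Partition rules at `value` in `dim`: the left side holds values < value,
    the right side values >= value; a rule spanning `value` is cut in two."""
    left: List[Rule] = []
    right: List[Rule] = []
    for name, ranges in rules:
        lo, hi = ranges[dim]
        if hi < value:
            left.append((name, ranges))
        elif lo >= value:
            right.append((name, ranges))
        else:  # lo < value <= hi: the rule covers the selected value
            left.append((name, {**ranges, dim: (lo, value - 1)}))
            right.append((name, {**ranges, dim: (value, hi)}))
    return left, right


rules = [("r0", {"dport": (80, 80)}),
         ("fb_rule", {"dport": (0, 65535)})]
left, right = split(rules, "dport", 1024)
print(left)   # [('r0', {'dport': (80, 80)}), ('fb_rule', {'dport': (0, 1023)})]
print(right)  # [('fb_rule', {'dport': (1024, 65535)})]
```

A full builder would call `split` recursively on `left` and `right` with a chosen (dimension, value) until a stopping test holds.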
  • The network packet classification decision tree establishment solution provided by an embodiment of the present invention may be implemented as follows:
  • The non-template dimensions of the classification rule set are vlan and sysport, and the template dimensions are sip, dip, sport, dport, and prot.
  • A tree is built from the classification rule set shown in Table 1 according to the non-template dimensions vlan and sysport, yielding the non-template dimension decision tree of the rule set.
  • The structure of the non-template dimension decision tree may be as shown in FIG. 3A.
  • In FIG. 3A, some leaf nodes of the non-template dimension decision tree contain only the hole-filling rule, while the leaf nodes drawn as rectangles contain multiple classification rules.
  • Trees are then built for the leaf nodes of the non-template dimension decision tree in FIG. 3A according to the template dimensions sip, dip, sport, dport, and prot, yielding a template dimension decision tree for each leaf node.
  • For a leaf node that contains only the hole-filling rule, the resulting template dimension decision tree has a single node, which contains only the hole-filling rule.
  • For one of the leaf nodes that contain multiple classification rules, the template dimension decision tree built along the template dimensions may be as shown in FIG. 3B;
  • for another such leaf node, it may be as shown in FIG. 3C.
  • The classification rule subsets {r0, r1, r2} and {r3, r4, r5} are SPSRs of one another, so their template dimension decision trees have the same tree structure and differ only in their leaf nodes.
  • The leaf nodes at the same position in the template dimension decision trees shown in FIG. 3B and FIG. 3C hold classification rules obtained by instantiating the same template item for different non-template dimension values.
  • Likewise, for the leaf node containing {r0, r1, r2} and {r6, r7, r8}, the corresponding template dimension decision tree has the same structure as those shown in FIG. 3B and FIG. 3C, and its leaf nodes at the same position are also instantiated from the same template item.
  • In the leaf node containing the classification rules r0, r1, r2, r6, r7, and r8, the subsets {r0, r1, r2} and {r6, r7, r8} are SPSRs of one another. Therefore, each leaf node of the template dimension decision tree built for it that does not contain the hole-filling rule contains two classification rules, both instantiated from the same template item for different non-template dimension values. Specifically, r0 and r6 are obtained by instantiating template item me0 shown in Table 2 for different non-template dimension values.
  • r1 and r7 are obtained by instantiating template item me1 shown in Table 2 for different non-template dimension values.
  • r2 and r8 are obtained by instantiating template item me2 shown in Table 2 for different non-template dimension values.
  • Because the matching result returned is the higher-priority classification rule, a leaf node that contains several classification rule subsets that are SPSRs of one another only needs to keep the highest-priority subset among them; the lower-priority subsets can be filtered out. Here, only {r0, r1, r2} is kept and {r6, r7, r8} is deleted, so the template dimension decision tree corresponding to this leaf node is also as shown in FIG. 3B.
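Filtering out lower-priority SPSR subsets in a leaf can be sketched as keeping, per SPSR group, only the subset with the highest priority. The sketch assumes each subset carries a single comparable priority value (larger meaning higher); `prune_leaf` is an illustrative name and the priority numbers in the usage are hypothetical.

```python
from typing import Dict, List, Tuple

Subset = Tuple[str, str, int]  # (subset name, SPSR group id, priority)


def prune_leaf(subsets: List[Subset]) -> List[Subset]:
    """Keep only the highest-priority subset of each SPSR group in one leaf."""
    best: Dict[str, Subset] = {}
    for subset in subsets:
        _, group_id, priority = subset
        if group_id not in best or priority > best[group_id][2]:
            best[group_id] = subset
    return list(best.values())


leaf = [("{r0,r1,r2}", "spsrg1", 3), ("{r6,r7,r8}", "spsrg1", 1)]
print(prune_leaf(leaf))  # [('{r0,r1,r2}', 'spsrg1', 3)]
```

Subsets from different SPSR groups are left alone, since only same-template instantiations can shadow one another in this way.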
  • The identifier of the SPSR group corresponding to the hole-filling rule is spsrg0,
  • and the index of the hole-filling rule in spsrg0 is 0.
  • The identifier of the SPSR group to which {r6, r7, r8} belongs is spsrg1.
  • The indexes of the classification rule subsets {r0, r1, r2}, {r3, r4, r5}, and {r6, r7, r8} in the SPSR group spsrg1 are 0, 1, and 2, respectively.
  • The template dimension decision trees corresponding to leaf nodes that are SPSRs of one another are merged into a single template dimension decision tree.
  • Each leaf node of the merged template dimension decision tree contains the classification rules corresponding to every index in the SPSR group.
  • Because the SPSR group corresponding to the hole-filling rule contains only the hole-filling rule, its template dimension decision tree has a single leaf node, which can record the hole-filling rule directly.
  • The structure of the decision tree finally generated for the classification rule set shown in Table 1 may be as shown in FIG. 3E; it comprises the non-template dimension decision tree 310 and the template dimension decision tree 320.
  • Example network packets to be classified:

        Packet             sip      dip        sport  dport  prot  vlan  sysport
        Network packet 1   1.1.1.1  10.1.1.10  12000  12000  UDP   10    0x200
        Network packet 2   1.1.1.1  10.1.1.10  12000  80     TCP   30    0x100
  • The search path of network packet 1 may be as shown by the solid double-headed arrows in FIG. 4. The leaf node it hits in the template dimension decision tree 320 contains three classification rules, r2, r5, and r8, and the index stored in the leaf node of the non-template dimension decision tree 310 that it passes through is 0 (shown as spsr: 0), so network packet 1 finally hits classification rule r2.
  • The search path of network packet 2 may be as shown by the dashed double-headed arrows in FIG. 4. The leaf node it hits in the template dimension decision tree 320 contains three classification rules, r0, r3, and r6, and the index stored in the leaf node of the non-template dimension decision tree 310 that it passes through is 2 (shown as spsr: 2), so network packet 2 finally hits classification rule r6.
  • In summary, a first-type decision tree is generated by building a tree from the classification rule set according to its non-template dimensions, second-type decision trees are generated by building a tree for each leaf node of the first-type decision tree according to the template dimensions of the classification rule set, and leaf nodes of the first-type decision tree that are SPSRs of one another are associated with the same second-type decision tree. This effectively reduces repeated tree building along the template dimensions and the size of the decision tree, which increases the size of the classification rule set that can be supported and thereby improves both the storage space utilization and the network packet classification capability of the network device.
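To make the size reduction concrete, the sketch below counts template dimension decision tree nodes with and without sharing one tree per SPSR group. The tuple tree encoding, both trees, and the five-leaf layout are hypothetical; for the unshared case each leaf's own tree is counted using its group's tree, since the trees of one SPSR group share a single structure.

```python
from typing import Dict, List


def count_nodes(tree: tuple) -> int:
    """Count nodes in a ("split", dim, value, left, right) / ("leaf", x) tree."""
    if tree[0] == "leaf":
        return 1
    return 1 + count_nodes(tree[3]) + count_nodes(tree[4])


def template_node_total(leaf_groups: List[str], trees: Dict[str, tuple],
                        shared: bool) -> int:
    """Template-dimension nodes needed for the listed first-type leaves.

    leaf_groups[i] is the SPSR group of leaf i; with sharing, one tree per group.
    """
    groups = set(leaf_groups) if shared else leaf_groups
    return sum(count_nodes(trees[g]) for g in groups)


trees = {
    "spsrg0": ("leaf", ["fb_rule"]),
    "spsrg1": ("split", "dport", 1024, ("leaf", ["r0", "r3", "r6"]),
               ("leaf", ["r2", "r5", "r8"])),
}
leaves = ["spsrg0", "spsrg1", "spsrg0", "spsrg1", "spsrg1"]  # hypothetical layout
print(template_node_total(leaves, trees, shared=False))  # 11
print(template_node_total(leaves, trees, shared=True))   # 4
```

Part of the node saving is traded for larger merged leaves, which hold one rule per SPSR index instead of one rule each.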
  • FIG. 5 is a schematic structural diagram of hardware of a network packet classification decision tree establishing apparatus according to an embodiment of the present invention.
  • The network packet classification decision tree establishing apparatus may include a processor 501 and a machine-readable storage medium 502 storing machine-executable instructions.
  • The processor 501 and the machine-readable storage medium 502 may communicate via a system bus 503. By reading and executing the machine-executable instructions in the machine-readable storage medium 502 that correspond to the network packet classification decision tree establishment control logic, the processor 501 can perform the network packet classification decision tree establishment method described above.
  • The machine-readable storage medium 502 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data.
  • For example, the machine-readable storage medium may be a volatile memory, a non-volatile memory, or a similar storage medium.
  • Specifically, the machine-readable storage medium may be a RAM (Random Access Memory), a flash memory, a storage drive (such as a hard disk drive), a solid state drive, any type of storage disk (such as an optical disc), or a combination thereof.
  • The network packet classification decision tree establishing apparatus may further include an FPGA (not shown in the figure). In that case, after the processor 501 finishes building the decision tree, it may send the decision tree to the FPGA, which performs the decision tree lookup.
  • the network packet classification decision tree establishment control logic may include a first tree building unit 601, a second tree building unit 602, and a multiplexing unit 603. among them:
  • The first tree building unit 601 is configured to build a tree over the classification rule set according to the non-template dimensions of the classification rule set, generating a first-type decision tree.
  • The first-type decision tree is the decision tree of the classification rule set over the non-template dimensions.
  • The second tree building unit 602 is configured to build a tree for each leaf node of the first-type decision tree according to the template dimensions of the classification rule set, generating second-type decision trees.
  • A second-type decision tree is the decision tree of a leaf node of the first-type decision tree over the template dimensions.
  • The multiplexing unit 603 is configured to associate multiple same-pattern leaf nodes of the first-type decision tree with the same second-type decision tree.
  • Here, same-pattern leaf nodes are leaf nodes whose sets of classification rules are same-pattern subsets of one another.
  • The first tree building unit 601 is further configured to handle leaf nodes that match no classification rule. When the non-template dimensions and their corresponding values match no classification rule in the classification rule set, it sets the classification rule of such a leaf node to the fill-hole rule.
  • The priority of the fill-hole rule is lower than the priority of every classification rule in the classification rule set.
  • The second tree building unit 602 is further configured to handle leaf nodes of the first-type decision tree that contain multiple same-pattern classification rule subsets. For each such leaf node, it deletes the lower-priority classification rule subsets.
  • The multiplexing unit 603 is further configured to perform the following steps:
    • divide multiple same-pattern classification rule subsets into the same same-pattern subset (SPSR) group;
    • for each same-pattern leaf node of the first-type decision tree, establish a mapping between the leaf node and two values: the identifier of the SPSR group to which its classification rule subset belongs, and the index of that subset within the group;
    • merge the second-type decision trees of these leaf nodes into a single second-type decision tree.
  • The leaf nodes of the merged second-type decision tree include the classification rules corresponding to the indexes in the SPSR group.
  • The packet classification decision tree establishment control logic may further include a lookup unit 604. When a packet to be classified is received, the lookup unit 604 is configured to:
    • search the first-type decision tree of the classification rule set according to the packet, to determine the first target leaf node corresponding to the packet;
    • determine the target second-type decision tree according to the identifier of the SPSR group stored in the first target leaf node;
    • search that tree according to the packet, to determine the second target leaf node corresponding to the packet;
    • using the SPSR-group index stored in the first target leaf node, look up the corresponding classification rule in the second target leaf node, and determine that rule as the classification rule matched by the packet.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

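The present invention provides a method and apparatus for building a packet classification decision tree. In one example of the method, the following steps are performed:
  • A tree is built over a classification rule set according to its non-template dimensions, generating a first-type decision tree.
  • A tree is built for each leaf node of the first-type decision tree according to the template dimensions of the classification rule set, generating second-type decision trees.
  • Multiple leaf nodes of the first-type decision tree that are same-pattern subsets of one another are associated with the same second-type decision tree.
Multiple leaf nodes are same-pattern subsets of one another when the sets of classification rules included in them are same-pattern subsets of one another.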

Description

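Building a Packet Classification Decision Tree
CROSS-REFERENCE TO RELATED APPLICATIONS
This patent application claims priority to Chinese Patent Application No. 201710771899.0, filed on August 31, 2017 and entitled "Method and Apparatus for Building a Packet Classification Decision Tree". The entire contents of that application are incorporated herein by reference.
BACKGROUND
Packet classification proceeds as follows. Preconfigured classification rules are looked up according to the values of the various fields in a packet header. The highest-priority matching classification rule is obtained, and the action configured for that rule is performed. Many functions provided by network devices rely on packet classification, such as access control, traffic control, load balancing, and intrusion detection.
Decision-tree-based multi-field partitioning algorithms are a typical class of packet classification methods. The basic idea is to recursively divide the entire multi-dimensional space into subspaces. The recursion ends when all the rules contained in the current subspace fill that subspace in every dimension. This process yields a decision tree. Each packet to be classified is looked up in this decision tree to obtain the classification rule the packet matches.
In practice, the more nodes a decision tree has, the more storage space it occupies. However, the storage space of a network device is limited, so too many decision tree nodes limit the size of the classification rule set that the packet classification algorithm can support. Reducing the number of decision tree nodes without hurting lookup efficiency has therefore become a pressing technical problem.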
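BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic flowchart of a method for building a packet classification decision tree according to an embodiment of the present invention.
FIG. 2A is a schematic diagram of the principle and structure of packet classification according to an embodiment of the present invention.
FIG. 2B is a schematic diagram of the principle of the iteration unit of a TreeBuilder according to an embodiment of the present invention.
FIGS. 3A to 3E are schematic diagrams of decision trees according to embodiments of the present invention.
FIG. 4 is a schematic diagram of packet classification lookup paths according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of the hardware structure of an apparatus for building a packet classification decision tree according to an embodiment of the present invention.
FIG. 6 is a functional structure diagram of control logic for building a packet classification decision tree according to an embodiment of the present invention.
DETAILED DESCRIPTION
To help those skilled in the art better understand the technical solutions in the embodiments of the present invention, some concepts involved in those solutions are first explained briefly.
A classification rule set includes multiple classification rules. Each classification rule includes dimension information and a priority.
Each piece of dimension information is expressed as a value range. When a classification rule is matched, a matching result is returned. When multiple rules satisfy the matching condition, the priority decides which rule's matching result is returned. In the embodiments of the present invention, the matching result of the highest-priority classification rule is returned. A classification rule set can be formatted as shown in Table 1, which contains nine classification rules, r0 to r8.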
Table 1
Rule  Priority  sip  dip        sport    dport    prot  vlan     sysport
r0    5         *    *          *        80       tcp   *        0x200
r1    10        *    *          >=12000  *        tcp   *        0x200
r2    15        *    10.1.1.x   >=10000  >=10000  udp   *        0x200
r3    6         *    *          *        80       tcp   [10,20)  0x300
r4    11        *    *          >=12000  *        tcp   [10,20)  0x300
r5    16        *    10.1.1.x   >=10000  >=10000  udp   [10,20)  0x300
r6    7         *    *          *        80       tcp   30       *
r7    12        *    *          >=12000  *        tcp   30       *
r8    17        *    10.1.1.x   >=10000  >=10000  udp   30       *
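The dimension information of the classification rules in Table 1 consists of:
  • sip: source IP address;
  • dip: destination IP address;
  • sport: source port number;
  • dport: destination port number;
  • prot: protocol type;
  • vlan: Virtual Local Area Network;
  • sysport: system port.
The dimension prot can include, but is not limited to, tcp (Transmission Control Protocol) or udp (User Datagram Protocol). When a dimension has the value "*", it can take any value within that dimension's range. For example, the range of the dimension dport is 0 to 2^16-1 (65535).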
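To make the range semantics concrete, here is a minimal Python sketch. It turns one Table 1 cell into a half-open integer interval [lo, hi), the form a space-partitioning tree builder typically works with. Several details are assumptions, not taken from the text:
  • the field widths, in particular the 16-bit sysport;
  • the protocol numbers 6 and 17 for tcp and udp;
  • reading "10.1.1.x" as the prefix 10.1.1.0/24.
The function name field_range is also hypothetical.

```python
import ipaddress

# Assumed field domains as half-open intervals [lo, hi); the sysport width is a guess.
DOMAINS = {
    "sip": (0, 2**32), "dip": (0, 2**32),
    "sport": (0, 2**16), "dport": (0, 2**16),
    "prot": (0, 256), "vlan": (0, 4096), "sysport": (0, 2**16),
}
PROTOCOLS = {"tcp": 6, "udp": 17}  # IANA protocol numbers


def field_range(dim, spec):
    """Translate one Table 1 cell (e.g. "*", ">=12000", "[10,20)") into [lo, hi)."""
    lo, hi = DOMAINS[dim]
    if spec == "*":                        # wildcard: the whole domain
        return (lo, hi)
    if dim in ("sip", "dip"):              # "10.1.1.x" is read as 10.1.1.0/24
        net = ipaddress.ip_network(spec.replace(".x", ".0/24"))
        return (int(net.network_address), int(net.broadcast_address) + 1)
    if spec.startswith(">="):              # open-ended lower bound
        return (int(spec[2:]), hi)
    if spec.startswith("["):               # explicit half-open range "[a,b)"
        a, b = spec.strip("[)").split(",")
        return (int(a), int(b))
    value = PROTOCOLS[spec] if dim == "prot" else int(spec, 0)  # "80", "0x200"
    return (value, value + 1)              # a single value
```

For instance, field_range("vlan", "[10,20)") returns (10, 20) and field_range("dport", "*") returns (0, 65536).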
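A classification rule template is a set of template items. A classification rule template can include one or more template items.
A template item includes one or more pieces of dimension information. A classification rule template with three template items can be formatted as shown in Table 2: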
Table 2
Index  Priority  sip  dip       sport    dport    prot
me0    5         *    *         *        80       tcp
me1    10        *    *         >=12000  *        tcp
me2    15        *    10.1.1.x  >=10000  >=10000  udp
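Template dimensions are the dimensions included in the template items. Taking Tables 1 and 2 as an example, the template items in Table 2 include five template dimensions of the classification rules: sip, dip, sport, dport, and prot.
Non-template dimensions are the dimensions of a classification rule other than the template dimensions. Again taking Tables 1 and 2 as an example, the classification rules in Table 1 include two non-template dimensions: vlan and sysport.
Template instantiation means applying a classification rule template to one or more non-template dimensions to obtain one or more classification rules.
For example, applying the template in Table 2 to the non-template dimension sysport=0x200 yields rules r0, r1, and r2 of the rule set in Table 1. For these three rules, the template dimensions sip, dip, sport, dport, and prot take the same values as the corresponding template dimensions in Table 2. The non-template dimension sysport is 0x200 for all three.
Template instantiation can specify multiple non-template dimensions. For example, applying the template in Table 2 to the non-template dimensions vlan=[10,20) and sysport=0x300 yields rules r3, r4, and r5 of the rule set in Table 1.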
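A minimal sketch of template instantiation in Python follows, with rules kept as tuples of cell strings in Table 1 column order. The priority_offset parameter is an assumption. The text does not say how instantiated rules get their priorities, so the offsets 0, 1, and 2 below are chosen only to reproduce the priority column of Table 1. The name instantiate is hypothetical.

```python
# Table 2 template items: (index, priority, sip, dip, sport, dport, prot).
TEMPLATE = [
    ("me0", 5,  "*", "*",        "*",       "80",      "tcp"),
    ("me1", 10, "*", "*",        ">=12000", "*",       "tcp"),
    ("me2", 15, "*", "10.1.1.x", ">=10000", ">=10000", "udp"),
]
NON_TEMPLATE_DIMS = ("vlan", "sysport")


def instantiate(template, first_id, priority_offset, **non_template):
    """Apply every template item to one assignment of the non-template dimensions.

    Non-template dimensions that are not given stay "*". Rules are returned as
    (name, priority, sip, dip, sport, dport, prot, vlan, sysport) tuples.
    """
    extra = tuple(non_template.get(d, "*") for d in NON_TEMPLATE_DIMS)
    return [
        (f"r{first_id + i}", priority + priority_offset, *cells, *extra)
        for i, (_index, priority, *cells) in enumerate(template)
    ]


# Rebuild Table 1 from Table 2 and three non-template assignments.
RULES = (instantiate(TEMPLATE, 0, 0, sysport="0x200")
         + instantiate(TEMPLATE, 3, 1, vlan="[10,20)", sysport="0x300")
         + instantiate(TEMPLATE, 6, 2, vlan="30"))
```

RULES[3] comes out as ("r3", 6, "*", "*", "*", "80", "tcp", "[10,20)", "0x300"), which is row r3 of Table 1.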
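{r0, r1, r2}, {r3, r4, r5}, and {r6, r7, r8} can be called classification rule subsets of the rule set in Table 1.
A fill-hole rule (fb_rule) is a classification rule whose value is "*" in every dimension. The fill-hole rule has the lowest priority: every classification rule in the classification rule set has a higher priority than the fill-hole rule.
Same-pattern subsets (Same Pattern Sub Ruleset, SPSR) arise when one classification rule template is applied to multiple different non-template dimension values. The resulting classification rule subsets are same-pattern subsets of one another. Applying one template to multiple different non-template dimension values can be called instantiation for different non-template dimensions.
For example, in the rule set of Table 1, the subsets {r0, r1, r2}, {r3, r4, r5}, and {r6, r7, r8} are same-pattern subsets of one another. The fill-hole rule is a same-pattern subset of itself.
A same-pattern subset group (SPSR_Group) includes multiple classification rule subsets that are same-pattern subsets of one another.
The SPSR group of the fill-hole rule contains only one classification rule: the fill-hole rule itself.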
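The same-pattern relation can be checked mechanically. This Python sketch assumes each subset lists its rules in template-item order. It compares only the template-dimension cells (sip, dip, sport, dport, prot) and ignores names, priorities, and non-template values. Comparing cell values is a stand-in: the text defines the relation by which template a subset was instantiated from, not by cell values. The names same_pattern, spsr_groups, and FB_SUBSET are hypothetical.

```python
# Template-dimension cells of the Table 2 items: (sip, dip, sport, dport, prot).
CELLS = [("*", "*", "*", "80", "tcp"),
         ("*", "*", ">=12000", "*", "tcp"),
         ("*", "10.1.1.x", ">=10000", ">=10000", "udp")]

# Subsets as tuples of (name, priority, *template_cells), following Table 1.
A = tuple((f"r{i}", p, *c) for i, p, c in zip(range(0, 3), (5, 10, 15), CELLS))
B = tuple((f"r{i}", p, *c) for i, p, c in zip(range(3, 6), (6, 11, 16), CELLS))
C = tuple((f"r{i}", p, *c) for i, p, c in zip(range(6, 9), (7, 12, 17), CELLS))
# The fill-hole rule: "*" everywhere and the lowest possible priority.
FB_SUBSET = (("fb_rule", float("inf"), "*", "*", "*", "*", "*"),)


def same_pattern(a, b):
    """True if a and b agree on every template-dimension cell, rule by rule."""
    return len(a) == len(b) and all(x[2:] == y[2:] for x, y in zip(a, b))


def spsr_groups(subsets):
    """Partition subsets into SPSR groups (lists of mutually same-pattern subsets)."""
    groups = []
    for s in subsets:
        for g in groups:
            if same_pattern(g[0], s):
                g.append(s)
                break
        else:                       # no existing group matched: start a new one
            groups.append([s])
    return groups
```

With these inputs, spsr_groups([FB_SUBSET, A, B, C]) returns two groups: the fill-hole subset on its own, and A, B, and C together.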
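To make the above objectives, features, and advantages of the embodiments clearer, the technical solutions in the embodiments of the present invention are described in more detail below with reference to the accompanying drawings.
FIG. 1 is a schematic flowchart of a method for building a packet classification decision tree according to an embodiment of the present invention. The method can be applied to a network device, which can include, but is not limited to, a switch or a router. As shown in FIG. 1, the method can include the following steps:
Step 101: Build a tree over the classification rule set according to its non-template dimensions to generate a first-type decision tree. The first-type decision tree is the decision tree of the classification rule set over the non-template dimensions.
In the embodiments of the present invention, "classification rule set" does not refer to one fixed rule set. It can be any classification rule set used for packet classification; this is not repeated below.
In the embodiments of the present invention, the packet classification decision tree for a classification rule set is created as follows. First, a tree is built over the rule set according to its non-template dimensions. This generates the decision tree of the rule set over the non-template dimensions, referred to herein as the first-type decision tree or the non-template dimension decision tree.
For ease of understanding, the first-type decision tree is referred to below as the non-template dimension decision tree.
In the embodiments of the present invention, the network device can build the tree over the classification rule set by space partitioning.
How the network device builds the tree according to the non-template dimensions is described with examples below and is not detailed here.
Note that while building the tree according to the non-template dimensions, the selected dimension and value may match no classification rule in the set. In that case, the network device can set the classification rule of the resulting leaf node to the fill-hole rule. The termination condition is met when a subspace produced by space partitioning contains only the fill-hole rule and no other classification rule.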
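The following Python sketch shows step 101 as plain space partitioning over the non-template dimensions, including the fill-hole leaves. The split heuristic is a simple stand-in, not the heuristic of the embodiment: it cuts at the smallest rule boundary that lies strictly inside the current sub-space. Also assumed: the vlan and sysport domains, and the names Node, build, and lookup.

```python
from dataclasses import dataclass, field
from typing import Optional

FB_RULE = "fb_rule"


@dataclass
class Node:
    dim: Optional[str] = None          # internal node: dimension to test
    value: int = 0                     # internal node: field < value goes left
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    rules: list = field(default_factory=list)  # leaf node: rule names


def clip(ranges, space):
    """Intersect a rule's ranges with a sub-space; None if they do not overlap."""
    out = {}
    for d, (lo, hi) in space.items():
        a, b = max(lo, ranges[d][0]), min(hi, ranges[d][1])
        if a >= b:
            return None
        out[d] = (a, b)
    return out


def build(rules, space):
    """rules: {name: {dim: (lo, hi)}}; space: {dim: (lo, hi)} of the current node."""
    live = {n: c for n, r in rules.items() if (c := clip(r, space)) is not None}
    if not live:                                   # no rule reaches this sub-space
        return Node(rules=[FB_RULE])
    if all(c == space for c in live.values()):     # every rule fills the sub-space
        return Node(rules=sorted(live))
    for d, (lo, hi) in space.items():              # stand-in split heuristic
        cuts = sorted({x for c in live.values() for x in c[d] if lo < x < hi})
        if cuts:
            break
    v = cuts[0]
    return Node(d, v, build(live, {**space, d: (lo, v)}),
                build(live, {**space, d: (v, hi)}))


def lookup(node, packet):
    while node.dim is not None:
        node = node.left if packet[node.dim] < node.value else node.right
    return node.rules


# Non-template ranges of Table 1 (assumed domains: 12-bit vlan, 16-bit sysport).
VLAN, SYSPORT = (0, 4096), (0, 1 << 16)
NON_TEMPLATE = {
    **{r: {"vlan": VLAN, "sysport": (0x200, 0x201)} for r in ("r0", "r1", "r2")},
    **{r: {"vlan": (10, 20), "sysport": (0x300, 0x301)} for r in ("r3", "r4", "r5")},
    **{r: {"vlan": (30, 31), "sysport": SYSPORT} for r in ("r6", "r7", "r8")},
}
FIRST_TREE = build(NON_TEMPLATE, {"vlan": VLAN, "sysport": SYSPORT})
```

In the resulting tree, vlan=30 with sysport=0x200 reaches a leaf holding both {r0, r1, r2} and {r6, r7, r8}, the same overlap the FIG. 3A example discusses. A point outside every rule, such as vlan=5 with sysport=5, reaches a fill-hole leaf.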
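Step 102: Build a tree for each leaf node of the first-type decision tree according to the template dimensions of the classification rule set to generate second-type decision trees. A second-type decision tree is the decision tree of a leaf node of the first-type decision tree over the template dimensions.
In the embodiments of the present invention, the non-template dimension decision tree partitions the rules' space only along the non-template dimensions. The space of each of its leaf nodes is therefore not yet fully partitioned and must be partitioned further along the template dimensions.
Accordingly, after building the non-template dimension decision tree for the classification rule set, the network device partitions the space of each of its leaf nodes according to the template dimensions of the rule set. This yields, for each leaf node, a decision tree over the template dimensions, referred to herein as a second-type decision tree or a template dimension decision tree.
For ease of understanding, the second-type decision tree is referred to below as the template dimension decision tree.
A template dimension decision tree is obtained by partitioning the space of a leaf node of the non-template dimension decision tree. It can therefore be called a subtree of the non-template dimension decision tree.
How the network device builds trees for the leaf nodes of the non-template dimension decision tree according to the template dimensions is described with examples below and is not detailed here.
Note that the fill-hole rule has the value "*" in every dimension, that is, any value within each dimension's range. Partitioning it along the template dimensions therefore still yields the fill-hole rule. Accordingly, suppose a leaf node of the non-template dimension decision tree holds the fill-hole rule. The template dimension decision tree obtained by partitioning that leaf along the template dimensions still consists of only the fill-hole rule.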
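Step 102 can be sketched by running the same kind of space partitioning over the template dimensions of each leaf, with the fill-hole shortcut described above. The sketch makes these assumptions:
  • A smaller priority value wins. This is inferred from the FIG. 3A example, where {r0, r1, r2} (priorities 5, 10, 15) is kept over {r6, r7, r8} (7, 12, 17).
  • Keeping only the winning rule in each leaf is a choice of this sketch.
  • sip is omitted because it is "*" in every Table 2 item.
  • The split heuristic is a stand-in.
All names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Optional

FB_RULE = "fb_rule"


@dataclass
class Node:
    dim: Optional[str] = None
    value: int = 0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    rules: list = field(default_factory=list)


def clip(ranges, space):
    out = {}
    for d, (lo, hi) in space.items():
        a, b = max(lo, ranges[d][0]), min(hi, ranges[d][1])
        if a >= b:
            return None
        out[d] = (a, b)
    return out


def build(rules, space):
    """rules: {name: (priority, {dim: (lo, hi)})}; a smaller priority value wins."""
    live = {n: (p, c) for n, (p, r) in rules.items()
            if (c := clip(r, space)) is not None}
    if not live:
        return Node(rules=[FB_RULE])
    if all(c == space for _, c in live.values()):
        # Every live rule covers this sub-space, so only the winner can ever match.
        return Node(rules=[min(live, key=lambda n: live[n][0])])
    for d, (lo, hi) in space.items():
        cuts = sorted({x for _, c in live.values() for x in c[d] if lo < x < hi})
        if cuts:
            break
    v = cuts[0]
    return Node(d, v, build(live, {**space, d: (lo, v)}),
                build(live, {**space, d: (v, hi)}))


def build_second_type(leaf_rules, table, space):
    """Step 102 for one first-type leaf; a fill-hole leaf stays a single node."""
    if leaf_rules == [FB_RULE]:
        return Node(rules=[FB_RULE])
    return build({n: table[n] for n in leaf_rules}, space)


def shape(node):
    """Tree structure without the rules stored in the leaves."""
    if node.dim is None:
        return "leaf"
    return (node.dim, node.value, shape(node.left), shape(node.right))


def lookup(node, packet):
    while node.dim is not None:
        node = node.left if packet[node.dim] < node.value else node.right
    return node.rules


FULL = {"dip": (0, 2**32), "sport": (0, 2**16), "dport": (0, 2**16), "prot": (0, 256)}
NET = ipaddress.ip_network("10.1.1.0/24")
DIP = (int(NET.network_address), int(NET.broadcast_address) + 1)
ITEMS = [  # me0, me1, me2 with tcp = 6 and udp = 17
    (5, {**FULL, "dport": (80, 81), "prot": (6, 7)}),
    (10, {**FULL, "sport": (12000, 2**16), "prot": (6, 7)}),
    (15, {**FULL, "dip": DIP, "sport": (10000, 2**16),
          "dport": (10000, 2**16), "prot": (17, 18)}),
]
# r0..r8 as in Table 1: subset k uses the template priorities shifted by k.
TABLE = {f"r{3 * k + i}": (p + k, rng)
         for k in range(3) for i, (p, rng) in enumerate(ITEMS)}
TREE_A = build_second_type(["r0", "r1", "r2"], TABLE, FULL)
TREE_B = build_second_type(["r3", "r4", "r5"], TABLE, FULL)
```

The two trees differ only in their leaves, so shape(TREE_A) == shape(TREE_B). This structural identity is what step 103 exploits.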
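Step 103: Associate multiple leaf nodes of the first-type decision tree that are same-pattern subsets of one another with the same second-type decision tree.
In the embodiments of the present invention, classification rule subsets that are same-pattern subsets of one another have the same template dimension values. Suppose the same algorithm builds decision trees over these subsets according to the template dimensions. The resulting template dimension decision trees then have the same structure and differ only in their leaf nodes. The template dimension decision trees of same-pattern subsets can therefore be reused as a single template dimension decision tree.
Accordingly, after building the template dimension decision tree for each leaf node of the non-template dimension decision tree, the network device can reuse these trees. It associates the same-pattern leaf nodes of the non-template dimension decision tree with one shared template dimension decision tree. This reduces the number of template dimension decision trees and, in turn, the number of nodes in the decision tree of the classification rule set. The terms used here are defined as follows:
  • Multiple leaf nodes are same-pattern subsets of one another when their corresponding classification rule subsets are same-pattern subsets of one another.
  • The classification rule subset of a leaf node is the set of classification rules included in that leaf node.
  • The decision tree of the classification rule set consists of the non-template dimension decision tree and the template dimension decision trees.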
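Step 103 amounts to building one template dimension tree per pattern and pointing every leaf with that pattern at it. The Python sketch below keys a dictionary on the template-dimension cells of each leaf's rules. This is a proxy for "same pattern" and an assumption, since the text defines the relation by template provenance. counting_builder is a hypothetical stand-in for a real tree builder, so the example only records how many trees get built.

```python
CELLS = [("*", "*", "*", "80", "tcp"),
         ("*", "*", ">=12000", "*", "tcp"),
         ("*", "10.1.1.x", ">=10000", ">=10000", "udp")]


def subset(first, offset):
    """Rules r{first}..r{first+2} of Table 1 as (name, priority, *template_cells)."""
    return tuple((f"r{first + i}", 5 * (i + 1) + offset, *c)
                 for i, c in enumerate(CELLS))


FB = (("fb_rule", float("inf"), "*", "*", "*", "*", "*"),)


def pattern_key(rules):
    """Template-dimension cells, rule by rule: equal for same-pattern subsets."""
    return tuple(rule[2:] for rule in rules)


def associate(leaves, build_template_tree):
    """Point every first-type leaf at a second-type tree, building one per pattern."""
    trees, assoc = {}, {}
    for leaf_id, rules in leaves.items():
        key = pattern_key(rules)
        if key not in trees:                 # first leaf seen with this pattern
            trees[key] = build_template_tree(rules)
        assoc[leaf_id] = trees[key]          # later leaves reuse the same tree
    return assoc, trees


built = []


def counting_builder(rules):
    """Hypothetical stand-in for the real template dimension tree builder."""
    built.append(rules[0][0])
    return {"built_from": rules[0][0]}


LEAVES = {"leaf_a": subset(0, 0), "leaf_b": subset(3, 1),
          "leaf_c": subset(6, 2), "leaf_fb1": FB, "leaf_fb2": FB}
ASSOC, TREES = associate(LEAVES, counting_builder)
```

Five leaves end up sharing two trees: one for the Table 2 pattern and one for the fill-hole rule.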
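In an optional implementation, associating the same-pattern leaf nodes of the first-type decision tree with the same second-type decision tree can include the following:
  • dividing the different classification rule subsets that are same-pattern subsets of one another into the same SPSR group;
  • for each same-pattern leaf node of the first-type decision tree, establishing a mapping between the leaf node and two values: the identifier of the SPSR group to which its classification rule subset belongs, and the index of that subset within the group;
  • merging the second-type decision trees of these leaf nodes into one second-type decision tree.
The leaf nodes of the merged second-type decision tree then include the classification rule subsets corresponding to the indexes in the SPSR group. Here, the classification rule subset of a leaf node is the set of classification rules included in that leaf node.
In this implementation, the classification rule subsets obtained by instantiating the same template for different non-template dimensions are same-pattern subsets of one another, and together they form an SPSR group.
Taking the rule set of Table 1 as an example, the template in Table 2 yields three subsets:
  • {r0, r1, r2}, by applying it to the non-template dimension sysport=0x200;
  • {r3, r4, r5}, by applying it to vlan=[10,20) and sysport=0x300;
  • {r6, r7, r8}, by applying it to vlan=30.
These three subsets are therefore same-pattern subsets of one another and form one SPSR group.
In this implementation, each classification rule subset in an SPSR group can be assigned an index within that group. The subset can then be identified by the identifier of the SPSR group (for example, the group's name) together with its index within the group.
For example, suppose {r0, r1, r2}, {r3, r4, r5}, and {r6, r7, r8} belong to the SPSR group identified as spsrg1, with indexes 0, 1, and 2 respectively. Then {r0, r1, r2} maps to index 0 of spsrg1, {r3, r4, r5} to index 1, and {r6, r7, r8} to index 2.
Once the network device has determined each subset's SPSR group and index, it can establish a mapping for each same-pattern leaf node of the non-template dimension decision tree. The mapping links the leaf node to the identifier of the SPSR group to which its classification rule subset belongs, and to that subset's index within the group. For example, both values can be stored in the leaf node itself.
Continuing the example, take the leaf node whose classification rule subset is {r0, r1, r2}. A mapping can be established between this leaf node, the identifier spsrg1 of the group to which {r0, r1, r2} belongs, and the index 0 of {r0, r1, r2} within that group. The identifier spsrg1 and the index 0 are then stored in the leaf node.
The network device can merge the template dimension decision trees of the subsets in one SPSR group into a single template dimension decision tree. Each leaf node of the merged tree can include, for every index in that SPSR group, the classification rule of the corresponding subset.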
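The bookkeeping above has two parts: assigning (group identifier, index) pairs, and merging same-shaped trees leaf by leaf. Below is a minimal Python sketch of both. A tree is represented as nested tuples, either ("leaf", payload) or (dim, value, left, right). The group naming spsrg0, spsrg1, and so on follows the example in the text; the tiny one-split trees are hypothetical toys, far smaller than FIG. 3B, and serve only to show the merge.

```python
def assign_groups(subsets, key):
    """Give each subset an (SPSR group id, index); subsets with equal keys share a group."""
    group_ids, sizes, placement = {}, {}, []
    for s in subsets:
        gid = group_ids.setdefault(key(s), f"spsrg{len(group_ids)}")
        placement.append((gid, sizes.get(gid, 0)))
        sizes[gid] = sizes.get(gid, 0) + 1
    return placement


def merge(trees):
    """Merge same-shaped trees; each merged leaf lists the input leaves by index."""
    if all(t[0] == "leaf" for t in trees):
        return ("leaf", [t[1] for t in trees])
    head = trees[0]
    if any(t[:2] != head[:2] for t in trees):
        raise ValueError("same-pattern trees are expected to have the same shape")
    return (head[0], head[1],
            merge([t[2] for t in trees]), merge([t[3] for t in trees]))


# Subsets as (rule, template item) pairs; the key is the sequence of template items.
FB = (("fb_rule", None),)
A = (("r0", "me0"), ("r1", "me1"), ("r2", "me2"))
B = (("r3", "me0"), ("r4", "me1"), ("r5", "me2"))
C = (("r6", "me0"), ("r7", "me1"), ("r8", "me2"))
PLACEMENT = assign_groups([FB, A, B, C], key=lambda s: tuple(t for _, t in s))


def toy_tree(r_tcp, r_udp):
    """Hypothetical one-split template tree: prot < 17 vs prot >= 17."""
    return ("prot", 17, ("leaf", r_tcp), ("leaf", r_udp))


MERGED = merge([toy_tree("r0", "r2"), toy_tree("r3", "r5"), toy_tree("r6", "r8")])
```

PLACEMENT gives the fill-hole subset ("spsrg0", 0) and the Table 1 subsets ("spsrg1", 0), ("spsrg1", 1), and ("spsrg1", 2). The right-hand leaf of MERGED lists r2, r5, and r8, echoing the merged leaf in FIG. 4.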
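In the embodiments of the present invention, the template dimension decision trees of same-pattern subsets have the same structure and differ only in their leaf nodes. Now suppose a leaf node of the non-template dimension decision tree contains several same-pattern classification rule subsets. Each leaf node of its template dimension decision tree then holds several classification rules instantiated from the same template item for different non-template dimensions. Only the highest-priority one of those rules ever takes effect. For such a leaf node, the network device can therefore delete the lower-priority classification rule subsets it contains.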
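The pruning argument can be made concrete as follows. Inside such a leaf, rule i of every same-pattern subset matches exactly the same packets. So a subset whose rules win at every position shadows the others completely. A minimal Python sketch follows, assuming a smaller priority value means a higher priority, as the FIG. 3A example suggests. When no subset wins at every position, it conservatively keeps all of them; that fallback is an assumption, because the text does not cover that case. prune_leaf is a hypothetical name.

```python
def prune_leaf(subsets):
    """Keep only the same-pattern subset that wins rule by rule inside one leaf.

    subsets: same-pattern subsets found in one first-type leaf, each a list of
    (name, priority) in template-item order; a smaller priority value wins.
    """
    for s in subsets:
        if all(all(p <= q for (_, p), (_, q) in zip(s, t)) for t in subsets):
            return [s]             # s wins at every position: the others never match
    return subsets                 # no subset wins everywhere: keep them all


LEAF = [[("r0", 5), ("r1", 10), ("r2", 15)], [("r6", 7), ("r7", 12), ("r8", 17)]]
KEPT = prune_leaf(LEAF)            # only [("r0", 5), ("r1", 10), ("r2", 15)] survives
```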
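In the embodiments of the present invention, the network device first builds the decision tree of the classification rule set, which consists of the non-template dimension decision tree and the template dimension decision trees. When it then receives a packet to be classified, it proceeds as follows. First, it searches the non-template dimension decision tree of the rule set according to the packet, to find the leaf node corresponding to the packet (referred to herein as the first target leaf node). From this leaf node it reads two values: the identifier of the stored SPSR group (the target SPSR group), and the index (the target index) of the classification rule subset (the target classification rule subset) within that group.
Next, the network device can search, according to the packet, the template dimension decision tree of the target SPSR group (the target template dimension decision tree). This finds the leaf node of that tree corresponding to the packet, referred to herein as the second target leaf node.
Finally, the network device can determine the classification rule in the second target leaf node that corresponds to the target index as the classification rule matching the packet.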
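The three lookup steps above can be sketched in Python with hand-built trees in nested-tuple form. The first-type tree is built by hand from Table 1's non-template ranges. Where {r0, r1, r2} and {r6, r7, r8} overlap (vlan=30, sysport=0x200), {r6, r7, r8} has been pruned as described above. The spsrg1 tree is a hand-simplified decision procedure with the same behaviour as the merged tree, not a reproduction of FIG. 3E. The helpers eq, ge, descend, and classify are hypothetical names.

```python
def eq(dim, v, hit, miss):
    """Subtree testing field == v, using two half-open splits."""
    return (dim, v, miss, (dim, v + 1, hit, miss))


def ge(dim, v, hit, miss):
    """Subtree testing field >= v."""
    return (dim, v, miss, hit)


def descend(tree, packet):
    """Walk (dim, value, left, right) nodes down to a ("leaf", payload) node."""
    while tree[0] != "leaf":
        dim, value, left, right = tree
        tree = left if packet[dim] < value else right
    return tree[1]


# First-type tree: leaves store (SPSR group id, index), in the style of FIG. 3D.
FB = ("leaf", ("spsrg0", 0))
ONLY_A = eq("sysport", 0x200, ("leaf", ("spsrg1", 0)), FB)
FIRST = ("vlan", 10, ONLY_A,
         ("vlan", 20,
          ("sysport", 0x300, ONLY_A, eq("sysport", 0x300, ("leaf", ("spsrg1", 1)), FB)),
          ("vlan", 30, ONLY_A,
           ("vlan", 31,
            eq("sysport", 0x200, ("leaf", ("spsrg1", 0)), ("leaf", ("spsrg1", 2))),
            ONLY_A))))

# Merged second-type trees: each leaf lists one rule per index of its group.
ME0, ME1, ME2 = (("leaf", ["r0", "r3", "r6"]), ("leaf", ["r1", "r4", "r7"]),
                 ("leaf", ["r2", "r5", "r8"]))
MISS = ("leaf", ["fb_rule"] * 3)
DIP_LO, DIP_HI = 0x0A010100, 0x0A010200            # 10.1.1.0/24
TCP = eq("dport", 80, ME0, ge("sport", 12000, ME1, MISS))  # me0 outranks me1
UDP = ge("dip", DIP_LO, ge("dip", DIP_HI, MISS,
                           ge("sport", 10000, ge("dport", 10000, ME2, MISS), MISS)),
         MISS)
SECOND = {"spsrg0": ("leaf", ["fb_rule"]),
          "spsrg1": eq("prot", 6, TCP, eq("prot", 17, UDP, MISS))}


def classify(packet):
    gid, index = descend(FIRST, packet)      # first target leaf
    rules = descend(SECOND[gid], packet)     # second target leaf in the group's tree
    return rules[index]                      # rule selected by the target index


PACKET_1 = {"dip": 0x0A01010A, "sport": 12000, "dport": 12000, "prot": 17,
            "vlan": 10, "sysport": 0x200}
PACKET_2 = {"dip": 0x0A01010A, "sport": 12000, "dport": 80, "prot": 6,
            "vlan": 30, "sysport": 0x100}
```

classify(PACKET_1) returns "r2" and classify(PACKET_2) returns "r6", matching the walkthrough of FIG. 4 later in the description.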
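To help those skilled in the art better understand the technical solutions provided by the embodiments of the present invention, they are described below with specific examples.
FIG. 2A is a schematic diagram of the principle and structure of packet classification according to an embodiment of the present invention. As shown in FIG. 2A, the CPU (Central Processing Unit) 210 uses the decision tree builder (Tree Builder) 211 to compile the input classification rule set into a decision tree. It then delivers the decision tree to the FPGA (Field Programmable Gate Array) 220. When a packet is received, the FPGA 220 uses the lookup engine (Lookup Engine) 221 to search the decision tree of the classification rule set and determine the matching classification rule.
The decision tree builder 211 can build the decision tree iteratively. Its core is a heuristic selector 211-1 based on dimension and value (Dim&Value, D&V). The selector takes a rule set (Rule Set, RS) as input and outputs two rule subsets and a decision tree node (Decision Tree Node, DT Node). Using a heuristic algorithm, it chooses a dimension and value from the input dimension list (Dim List) based on the input rule set. It then divides the input rule set into two subsets according to the chosen dimension and value:
  • If a rule's range in that dimension lies entirely below the chosen value, the rule goes into the left subset.
  • If the range lies entirely at or above the chosen value, the rule goes into the right subset.
  • If the range covers the chosen value, the rule is split in two. The part below the value goes into the left subset, and the part at or above it goes into the right subset.
The chosen dimension and value thus form a decision tree node and complete one space partition of the rule set. The two resulting rule subsets are then processed iteratively in the same way until no classification rule subset can be divided further. The principle of the iteration unit in the decision tree builder 211 can be as shown in FIG. 2B.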
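The three-way division performed by the selector can be sketched in Python with half-open ranges [lo, hi). With such ranges, "covers the chosen value" means lo < value < hi. How the selector picks the dimension and value is not specified in the text and is not shown here. The function name split is hypothetical.

```python
def split(rules, dim, value):
    """One D&V split; rules is a list of (name, {dim: (lo, hi)}) with half-open ranges."""
    left, right = [], []
    for name, ranges in rules:
        lo, hi = ranges[dim]
        if hi <= value:                     # range lies entirely below the value
            left.append((name, ranges))
        elif lo >= value:                   # range lies entirely at or above it
            right.append((name, ranges))
        else:                               # range covers the value: cut it in two
            left.append((name, {**ranges, dim: (lo, value)}))
            right.append((name, {**ranges, dim: (value, hi)}))
    return left, right


RULES = [("r0", {"vlan": (0, 4096)}), ("r3", {"vlan": (10, 20)}), ("r6", {"vlan": (30, 31)})]
LEFT, RIGHT = split(RULES, "vlan", 20)
# LEFT:  r0 with vlan [0, 20) and r3 with vlan [10, 20)
# RIGHT: r0 with vlan [20, 4096) and r6 with vlan [30, 31)
```

Here r0 covers the chosen value 20, so it is cut in two and appears in both subsets.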
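Based on FIG. 2A and FIG. 2B, the packet classification decision tree building solution provided by the embodiments of the present invention can be implemented as follows.
Take the template in Table 2 and the rule set in Table 1 as an example. The non-template dimensions of the rule set are vlan and sysport. The template dimensions are sip, dip, sport, dport, and prot.
A tree is built over the rule set in Table 1 according to the non-template dimensions vlan and sysport. This yields the non-template dimension decision tree of the rule set, whose structure can be as shown in FIG. 3A.
Among the leaf nodes of the tree in FIG. 3A, each non-rectangular leaf node contains only the fill-hole rule. Each rectangular leaf node contains multiple classification rules.
Trees are built for the leaf nodes of the tree in FIG. 3A according to the template dimensions sip, dip, sport, dport, and prot. This yields the template dimension decision tree of each leaf node.
For a leaf node containing only the fill-hole rule, the template dimension decision tree has a single node, and that node contains only the fill-hole rule.
For the leaf node containing rules r0, r1, and r2, the template dimension decision tree can be as shown in FIG. 3B.
For the leaf node containing rules r3, r4, and r5, the template dimension decision tree can be as shown in FIG. 3C.
FIG. 3B and FIG. 3C illustrate the following. The subset {r0, r1, r2} is instantiated from the template in Table 2 for sysport=0x200. The subset {r3, r4, r5} is instantiated from the same template for sysport=0x300 and vlan=[10,20). The two subsets are therefore same-pattern subsets of one another, so their template dimension decision trees have the same structure and differ only in their leaf nodes. Leaf nodes at the same position in FIG. 3B and FIG. 3C hold rules instantiated from the same template item for different non-template dimensions.
Similarly, {r6, r7, r8} is obtained by applying the same template to vlan=30. The template dimension decision tree of the leaf node containing r6, r7, and r8 therefore has the same structure as those in FIG. 3B and FIG. 3C. Its leaf nodes at the same positions are likewise instantiated from the same template items.
Note the leaf node in FIG. 3A that contains rules r0, r1, r2, r6, r7, and r8. Because {r0, r1, r2} and {r6, r7, r8} are same-pattern subsets, consider the template dimension decision tree built for this leaf. Every one of its leaf nodes that is not a fill-hole leaf contains two rules, instantiated from the same template item for different non-template dimensions. Specifically, r0 and r6 come from template item me0 in Table 2, r1 and r7 from me1, and r2 and r8 from me2. When a packet matches a leaf node with multiple classification rules, the result returned is the higher-priority rule. For a leaf node holding multiple same-pattern subsets, only the highest-priority subset needs to be kept, and the lower-priority subsets can be filtered out. Here, {r0, r1, r2} is kept and {r6, r7, r8} is deleted. The template dimension decision tree of this leaf node is then also as shown in FIG. 3B.
In addition, assume the following group identifiers and indexes:
  • The fill-hole rule belongs to the SPSR group spsrg0, where its index is 0.
  • The subsets {r0, r1, r2}, {r3, r4, r5}, and {r6, r7, r8} belong to the SPSR group spsrg1, with indexes 0, 1, and 2 respectively.
For each same-pattern leaf node of the tree in FIG. 3A, a mapping is established between the leaf node and two values: the identifier of the SPSR group to which its classification rule subset belongs, and the subset's index within that group. The processed non-template dimension decision tree can be as shown in FIG. 3D.
The template dimension decision trees of the same-pattern leaf nodes are merged into a single template dimension decision tree. Its leaf nodes include the classification rules corresponding to the indexes in the SPSR group.
The SPSR group of the fill-hole rule contains only the fill-hole rule itself. Its template dimension decision tree therefore has a single leaf node, which can record the fill-hole rule directly.
The final decision tree of the rule set in Table 1 can be as shown in FIG. 3E. It consists of the non-template dimension decision tree 310 and the template dimension decision tree 320.
The packet classification process is briefly described below with an example.
In this embodiment, assume that the field values of packet 1 and packet 2 are as shown in Table 3: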
Table 3
Packet    sip      dip        sport  dport  prot  vlan  sysport
Packet 1  1.1.1.1  10.1.1.10  12000  12000  udp   10    0x200
Packet 2  1.1.1.1  10.1.1.10  12000  80     tcp   30    0x100
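The lookup path of packet 1 can be as shown by the solid double-headed arrows in FIG. 4. The leaf node it hits in the template dimension decision tree 320 contains three rules: r2, r5, and r8. The leaf node it passes through in the non-template dimension decision tree 310 stores the index 0 (shown as spsr:0). Packet 1 therefore finally hits rule r2.
The lookup path of packet 2 can be as shown by the dashed double-headed arrows in FIG. 4. The leaf node it hits in the template dimension decision tree 320 contains three rules: r0, r3, and r6. The leaf node it passes through in the non-template dimension decision tree 310 stores the index 2 (shown as spsr:2). Packet 2 therefore finally hits rule r6.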
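A brute-force linear scan gives an independent check of these two results. The Python sketch below evaluates every Table 1 rule against the Table 3 packets. It assumes a smaller priority value means a higher priority, consistent with {r0, r1, r2} being kept over {r6, r7, r8} in the FIG. 3A example. classify_linear and cell_matches are hypothetical names.

```python
DIMS = ("sip", "dip", "sport", "dport", "prot", "vlan", "sysport")
RULES = {  # Table 1: name -> (priority, cells in DIMS order)
    "r0": (5,  ("*", "*", "*", "80", "tcp", "*", "0x200")),
    "r1": (10, ("*", "*", ">=12000", "*", "tcp", "*", "0x200")),
    "r2": (15, ("*", "10.1.1.x", ">=10000", ">=10000", "udp", "*", "0x200")),
    "r3": (6,  ("*", "*", "*", "80", "tcp", "[10,20)", "0x300")),
    "r4": (11, ("*", "*", ">=12000", "*", "tcp", "[10,20)", "0x300")),
    "r5": (16, ("*", "10.1.1.x", ">=10000", ">=10000", "udp", "[10,20)", "0x300")),
    "r6": (7,  ("*", "*", "*", "80", "tcp", "30", "*")),
    "r7": (12, ("*", "*", ">=12000", "*", "tcp", "30", "*")),
    "r8": (17, ("*", "10.1.1.x", ">=10000", ">=10000", "udp", "30", "*")),
}


def cell_matches(spec, value):
    """Does one packet field value satisfy one Table 1 cell?"""
    if spec == "*":
        return True
    if spec.startswith(">="):
        return value >= int(spec[2:])
    if spec.startswith("["):                      # half-open range "[a,b)"
        lo, hi = (int(x) for x in spec.strip("[)").split(","))
        return lo <= value < hi
    if spec.endswith(".x"):                       # "10.1.1.x": any last octet
        return value.rsplit(".", 1)[0] == spec[:-2]
    if isinstance(value, int):
        return value == int(spec, 0)              # "80", "30", "0x200"
    return value == spec                          # "tcp", "udp"


def classify_linear(packet):
    """Reference classifier: scan all rules and return the best match, else fb_rule."""
    hits = [(prio, name) for name, (prio, cells) in RULES.items()
            if all(cell_matches(c, packet[d]) for d, c in zip(DIMS, cells))]
    return min(hits)[1] if hits else "fb_rule"


PACKET_1 = dict(zip(DIMS, ("1.1.1.1", "10.1.1.10", 12000, 12000, "udp", 10, 0x200)))
PACKET_2 = dict(zip(DIMS, ("1.1.1.1", "10.1.1.10", 12000, 80, "tcp", 30, 0x100)))
```

The scan returns r2 for packet 1 and r6 for packet 2, agreeing with the tree lookups. For packet 2, both r6 (priority 7) and r7 (priority 12) match, and r6 wins.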
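In summary, the technical solutions provided by the embodiments of the present invention work in three steps:
  • A first-type decision tree is generated by building a tree over the classification rule set according to its non-template dimensions.
  • Second-type decision trees are generated by building trees for the leaf nodes of the first-type decision tree according to the template dimensions.
  • Multiple same-pattern leaf nodes of the first-type decision tree are associated with the same second-type decision tree.
This effectively reduces repeated tree building over the template dimensions and shrinks the decision tree. As a result, larger classification rule sets can be supported, and the network device makes better use of its storage space and gains packet classification capability.
The method provided by the present invention has been described above. The apparatus provided by the present invention is described below.
FIG. 5 is a schematic diagram of the hardware structure of an apparatus for building a packet classification decision tree according to an embodiment of the present invention. The apparatus can include a processor 501 and a machine-readable storage medium 502 storing machine-executable instructions. The processor 501 and the machine-readable storage medium 502 can communicate via a system bus 503. The processor 501 can perform the method for building a packet classification decision tree described above by reading and executing the machine-executable instructions in the machine-readable storage medium 502 that correspond to the control logic for building the packet classification decision tree.
The machine-readable storage medium 502 mentioned herein can be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, it can be a volatile memory, a non-volatile memory, or a similar storage medium. Specifically, it can be any of the following, or a combination thereof:
  • a RAM (Random Access Memory);
  • a flash memory;
  • a storage drive (such as a hard disk drive);
  • a solid-state drive;
  • any type of storage disc (such as an optical disc).
In an optional implementation, the apparatus can further include an FPGA (not shown in the figure). Accordingly, after the processor 501 finishes building the decision tree, it can deliver the built decision tree to the FPGA, which performs the decision tree lookup.
As shown in FIG. 6, the control logic for building the packet classification decision tree can be divided functionally into a first tree building unit 601, a second tree building unit 602, and a multiplexing unit 603, where:
The first tree building unit 601 is configured to build a tree over the classification rule set according to its non-template dimensions to generate a first-type decision tree. The first-type decision tree is the decision tree of the classification rule set over the non-template dimensions.
The second tree building unit 602 is configured to build a tree for each leaf node of the first-type decision tree according to the template dimensions of the classification rule set to generate second-type decision trees. A second-type decision tree is the decision tree of a leaf node of the first-type decision tree over the template dimensions.
The multiplexing unit 603 is configured to associate multiple same-pattern leaf nodes of the first-type decision tree with the same second-type decision tree. The sets of classification rules included in such leaf nodes are same-pattern subsets of one another.
The first tree building unit 601 is further configured to handle leaf nodes that match no classification rule. When the non-template dimensions and their corresponding values match no classification rule in the classification rule set, it sets the classification rule of such a leaf node to the fill-hole rule. The priority of the fill-hole rule is lower than that of every classification rule in the rule set.
The second tree building unit 602 is further configured to handle leaf nodes of the first-type decision tree that contain multiple same-pattern classification rule subsets. For each such leaf node, it deletes the lower-priority classification rule subsets.
The multiplexing unit 603 is further configured to perform the following steps:
  • divide classification rule subsets that are same-pattern subsets of one another into the same SPSR group;
  • for each same-pattern leaf node of the first-type decision tree, establish a mapping between the leaf node and two values: the identifier of the SPSR group to which its classification rule subset belongs, and the subset's index within that group;
  • merge the second-type decision trees of these leaf nodes into a single second-type decision tree, whose leaf nodes include the classification rules corresponding to the indexes in the SPSR group.
The control logic can further include a lookup unit 604. When a packet to be classified is received, the lookup unit 604 is configured to:
  • search the first-type decision tree of the classification rule set according to the packet, to determine the first target leaf node corresponding to the packet;
  • determine the target second-type decision tree according to the identifier of the SPSR group stored in the first target leaf node;
  • search that tree according to the packet, to determine the second target leaf node corresponding to the packet;
  • using the SPSR-group index stored in the first target leaf node, look up the corresponding classification rule in the second target leaf node, and determine that rule as the classification rule matched by the packet.
Note that prefixes such as "first" and "second" are used herein only to distinguish one entity or operation from another. They do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and their variants are intended to cover non-exclusive inclusion. A process, method, article, or device that includes a series of elements therefore includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude additional identical elements in the process, method, article, or device that includes it.
The above descriptions are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within its protection scope.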

Claims (16)

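  1. A method for building a packet classification decision tree, comprising:
    building a tree over a classification rule set according to non-template dimensions of the classification rule set to generate a first-type decision tree;
    building a tree for each leaf node of the first-type decision tree according to template dimensions of the classification rule set to generate second-type decision trees; and
    associating multiple leaf nodes of the first-type decision tree that are same-pattern subsets of one another with a same second-type decision tree, wherein the multiple leaf nodes being same-pattern subsets of one another means that the sets of classification rules respectively included in the multiple leaf nodes are same-pattern subsets of one another.
  2. The method according to claim 1, wherein building the tree over the classification rule set according to the non-template dimensions comprises:
    when no classification rule in the classification rule set is matched according to the non-template dimensions and corresponding values, setting the classification rule included in the leaf node that matches no classification rule to a fill-hole rule;
    wherein a priority of the fill-hole rule is lower than a priority of each classification rule in the classification rule set.
  3. The method according to claim 1, further comprising:
    when the first-type decision tree has a leaf node that includes multiple classification rule subsets that are same-pattern subsets of one another, deleting a lower-priority classification rule subset included in the leaf node.
  4. The method according to claim 1, wherein associating the multiple leaf nodes of the first-type decision tree that are same-pattern subsets of one another with the same second-type decision tree comprises:
    dividing the multiple classification rule subsets, included in the multiple leaf nodes, that are same-pattern subsets of one another into one same-pattern subset group;
    for each of the multiple leaf nodes, establishing a mapping between the leaf node and both an identifier of the same-pattern subset group and an index, in the same-pattern subset group, of the classification rule subset included in the leaf node; and
    merging the second-type decision trees respectively corresponding to the multiple leaf nodes into one second-type decision tree, wherein leaf nodes of the merged second-type decision tree include the classification rules in the same-pattern subset group.
  5. The method according to claim 4, further comprising:
    determining, by searching the first-type decision tree, a first target leaf node corresponding to a packet to be classified in the first-type decision tree;
    determining a target second-type decision tree according to the identifier of the same-pattern subset group stored in the first target leaf node;
    determining, by searching the target second-type decision tree, a second target leaf node corresponding to the packet to be classified in the target second-type decision tree;
    looking up a corresponding classification rule in the second target leaf node according to the index in the same-pattern subset group stored in the first target leaf node; and
    determining the found classification rule as the classification rule matched by the packet to be classified.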
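  6. An apparatus for building a packet classification decision tree, comprising:
    a first tree building unit, configured to build a tree over a classification rule set according to non-template dimensions of the classification rule set to generate a first-type decision tree;
    a second tree building unit, configured to build a tree for each leaf node of the first-type decision tree according to template dimensions of the classification rule set to generate second-type decision trees; and
    a multiplexing unit, configured to associate multiple leaf nodes of the first-type decision tree that are same-pattern subsets of one another with a same second-type decision tree, wherein the multiple leaf nodes being same-pattern subsets of one another means that the sets of classification rules respectively included in the multiple leaf nodes are same-pattern subsets of one another.
  7. The apparatus according to claim 6, wherein the first tree building unit is further configured to:
    when no classification rule in the classification rule set is matched according to the non-template dimensions and corresponding values, set the classification rule included in the leaf node that matches no classification rule to a fill-hole rule;
    wherein a priority of the fill-hole rule is lower than a priority of each classification rule in the classification rule set.
  8. The apparatus according to claim 6, wherein the second tree building unit is further configured to:
    when the first-type decision tree has a leaf node that includes multiple classification rule subsets that are same-pattern subsets of one another, delete a lower-priority classification rule subset included in the leaf node.
  9. The apparatus according to claim 6, wherein the multiplexing unit is further configured to:
    divide the multiple classification rule subsets, included in the multiple leaf nodes, that are same-pattern subsets of one another into one same-pattern subset group;
    for each of the multiple leaf nodes, establish a mapping between the leaf node and both an identifier of the same-pattern subset group and an index, in the same-pattern subset group, of the classification rule subset included in the leaf node; and
    merge the second-type decision trees respectively corresponding to the multiple leaf nodes into one second-type decision tree, wherein leaf nodes of the merged second-type decision tree include the classification rules in the same-pattern subset group.
  10. The apparatus according to claim 9, further comprising a lookup unit configured to:
    determine, by searching the first-type decision tree, a first target leaf node corresponding to a packet to be classified in the first-type decision tree;
    determine a target second-type decision tree according to the identifier of the same-pattern subset group stored in the first target leaf node;
    determine, by searching the target second-type decision tree, a second target leaf node corresponding to the packet to be classified in the target second-type decision tree;
    look up a corresponding classification rule in the second target leaf node according to the index in the same-pattern subset group stored in the first target leaf node; and
    determine the found classification rule as the classification rule matched by the packet to be classified.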
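  11. An apparatus for building a packet classification decision tree, comprising:
    a processor; and
    a machine-readable storage medium storing machine-executable instructions executable by the processor, wherein the processor is caused by the machine-executable instructions to:
    build a tree over a classification rule set according to non-template dimensions of the classification rule set to generate a first-type decision tree;
    build a tree for each leaf node of the first-type decision tree according to template dimensions of the classification rule set to generate second-type decision trees; and
    associate multiple leaf nodes of the first-type decision tree that are same-pattern subsets of one another with a same second-type decision tree, wherein the multiple leaf nodes being same-pattern subsets of one another means that the sets of classification rules respectively included in the multiple leaf nodes are same-pattern subsets of one another.
  12. The apparatus according to claim 11, wherein the processor is further caused by the machine-executable instructions to:
    when no classification rule in the classification rule set is matched according to the non-template dimensions and corresponding values, set the classification rule included in the leaf node that matches no classification rule to a fill-hole rule;
    wherein a priority of the fill-hole rule is lower than a priority of each classification rule in the classification rule set.
  13. The apparatus according to claim 11, wherein the processor is further caused by the machine-executable instructions to:
    when the first-type decision tree has a leaf node that includes multiple classification rule subsets that are same-pattern subsets of one another, delete a lower-priority classification rule subset included in the leaf node.
  14. The apparatus according to claim 11, wherein the processor is further caused by the machine-executable instructions to:
    divide the multiple classification rule subsets, included in the multiple leaf nodes, that are same-pattern subsets of one another into one same-pattern subset group;
    for each of the multiple leaf nodes, establish a mapping between the leaf node and both an identifier of the same-pattern subset group and an index, in the same-pattern subset group, of the classification rule subset included in the leaf node; and
    merge the second-type decision trees respectively corresponding to the multiple leaf nodes into one second-type decision tree, wherein leaf nodes of the merged second-type decision tree include the classification rules in the same-pattern subset group.
  15. The apparatus according to claim 14, wherein the processor is further caused by the machine-executable instructions to:
    determine, by searching the first-type decision tree, a first target leaf node corresponding to a packet to be classified in the first-type decision tree;
    determine a target second-type decision tree according to the identifier of the same-pattern subset group stored in the first target leaf node;
    determine, by searching the target second-type decision tree, a second target leaf node corresponding to the packet to be classified in the target second-type decision tree;
    look up a corresponding classification rule in the second target leaf node according to the index in the same-pattern subset group stored in the first target leaf node; and
    determine the found classification rule as the classification rule matched by the packet to be classified.
  16. The apparatus according to claim 15, wherein
    the apparatus further comprises a field programmable gate array; and
    the processor is further caused by the machine-executable instructions to deliver the first-type decision tree and the second-type decision tree to the field programmable gate array, so that the field programmable gate array performs decision tree lookup.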
PCT/CN2018/102845 2017-08-31 2018-08-29 Building a packet classification decision tree WO2019042305A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US16/643,484 US11184279B2 (en) 2017-08-31 2018-08-29 Building decision tree for packet classification
JP2020512410A JP6997297B2 (ja) 2017-08-31 2018-08-29 パケット分類決定木の確立
EP18851181.0A EP3661153B1 (en) 2017-08-31 2018-08-29 Building decision tree for packet classification

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710771899.0 2017-08-31
CN201710771899.0A CN108632235B (zh) 2017-08-31 2017-08-31 Method and apparatus for building a packet classification decision tree

Publications (1)

Publication Number Publication Date
WO2019042305A1 true WO2019042305A1 (zh) 2019-03-07

Family

ID=63705816

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/102845 WO2019042305A1 (zh) 2017-08-31 2018-08-29 Building a packet classification decision tree

Country Status (5)

Country Link
US (1) US11184279B2 (zh)
EP (1) EP3661153B1 (zh)
JP (1) JP6997297B2 (zh)
CN (1) CN108632235B (zh)
WO (1) WO2019042305A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109474536A (zh) * 2018-10-18 2019-03-15 北京小米移动软件有限公司 Packet control method and apparatus
US10832419B2 (en) * 2018-11-29 2020-11-10 International Business Machines Corporation Cognitive search analytics for multi-dimensional objects
US11900273B2 (en) * 2019-09-30 2024-02-13 Juniper Networks, Inc. Determining dependent causes of a computer system event
CN111242164A (zh) * 2019-12-27 2020-06-05 天津幸福生命科技有限公司 Method, apparatus, and device for determining a decision result
US11734155B2 (en) * 2021-07-22 2023-08-22 Disney Enterprises, Inc. Fully traceable and intermediately deterministic rule configuration and assessment framework
CN113810311A (zh) * 2021-09-14 2021-12-17 北京左江科技股份有限公司 Packet classification method based on multiple decision trees
CN113762424B (zh) * 2021-11-09 2022-02-01 鹏城实验室 Network packet classification method and related apparatus

Citations (3)

Publication number Priority date Publication date Assignee Title
US20070056038A1 (en) * 2005-09-06 2007-03-08 Lok Technology, Inc. Fusion instrusion protection system
CN101478551A (zh) * 2009-01-19 2009-07-08 清华大学 Multi-field packet classification method based on a multi-core processor
CN104579941A (zh) * 2015-01-05 2015-04-29 北京邮电大学 Packet classification method in an OpenFlow switch

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
CN102255788B (zh) 2010-05-19 2014-08-20 北京启明星辰信息技术股份有限公司 Packet classification decision construction system and method, and packet classification system and method
US9065860B2 (en) * 2011-08-02 2015-06-23 Cavium, Inc. Method and apparatus for multiple access of plural memory banks
CN102281196B (zh) * 2011-08-11 2017-10-10 中兴通讯股份有限公司 Decision tree generation method and device, and decision-tree-based packet classification method and device
CN103049444B (zh) * 2011-10-12 2016-09-28 阿里巴巴集团控股有限公司 Storage method and system for a data information classification structure
US9595003B1 (en) * 2013-03-15 2017-03-14 Cavium, Inc. Compiler with mask nodes
US10051093B2 (en) * 2015-03-09 2018-08-14 Fortinet, Inc. Hardware accelerator for packet classification
CN106209614B (zh) * 2015-04-30 2019-09-17 新华三技术有限公司 Packet classification method and apparatus
CN105354588A (zh) * 2015-09-28 2016-02-24 北京邮电大学 Method for constructing a decision tree
US10171377B2 (en) * 2017-04-18 2019-01-01 International Business Machines Corporation Orchestrating computing resources between different computing environments

Non-Patent Citations (1)

Title
See also references of EP3661153A4 *

Also Published As

Publication number Publication date
EP3661153B1 (en) 2023-06-07
CN108632235A (zh) 2018-10-09
US20200195552A1 (en) 2020-06-18
CN108632235B (zh) 2020-07-07
JP2020532241A (ja) 2020-11-05
US11184279B2 (en) 2021-11-23
EP3661153A1 (en) 2020-06-03
EP3661153A4 (en) 2020-06-03
JP6997297B2 (ja) 2022-01-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18851181

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020512410

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018851181

Country of ref document: EP

Effective date: 20200228