WO2013149555A1 - Method and device for generating a decision tree - Google Patents

Method and device for generating a decision tree

Publication number
WO2013149555A1
Authority
WO
WIPO (PCT)
Prior art keywords
rule
encoding
undirected graph
weighted undirected
segment
Prior art date
Application number
PCT/CN2013/073036
Other languages
English (en)
Chinese (zh)
Inventor
胡晶
龚钧
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP13772797.0A (patent EP2819355B1)
Publication of WO2013149555A1
Priority to US14/497,720 (patent US10026039B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441 Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]

Definitions

  • The present invention relates to computer technology, and in particular, to a method and an apparatus for generating a decision tree.
  • BACKGROUND OF THE INVENTION
  • Flow classification generally refers to defining rules according to certain characteristics of packets and using these rules to identify the packets that have those characteristics, thereby classifying the packets. Multiple packets that match a particular rule form a flow.
  • With a flow classification mechanism, different flows can be given different quality of service.
  • Compared with flow classification methods based on special hardware such as Ternary Content Addressable Memory (TCAM), decision tree-based flow classification methods have a big advantage in rule search speed and in cost.
  • The principle of decision tree-based flow classification is to divide a rule set into multiple small rule sets by building a decision tree, and to search these small rule sets for the rules that match a packet.
  • A decision tree includes a root node, a plurality of intermediate nodes, and a plurality of leaf nodes.
  • A rule search using the decision tree can be implemented as follows. First, the header of the packet is parsed to obtain a keyword for the search. At each intermediate node of the decision tree, a branch is selected according to one or several bits of the keyword, and the tree is traversed in this way until a leaf node is reached. Each leaf node contains a sub-rule set, and the packet is matched against the sub-rule set of the leaf node that was reached. If several rules in that sub-rule set match the packet, the rule with the highest priority among the matching rules is selected as the rule for classifying the packet. The flow classifier then performs, on the packet, the action corresponding to that rule.
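The search procedure described above can be sketched roughly as follows. This is only an illustration, not the patent's implementation: the `Rule`, `Leaf`, and `Node` classes, the `classify` function, the one-bit-per-node branching, and the convention that a larger `priority` number wins are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Rule:
    pattern: str   # one character per key bit: '0', '1' or '*' (wildcard)
    priority: int  # assumption: a larger number means a higher priority
    action: str


@dataclass
class Leaf:
    rules: List[Rule]  # the sub-rule set held by this leaf


@dataclass
class Node:
    bit: int                   # index of the key bit examined at this node
    zero: Union["Node", Leaf]  # branch taken when that bit is '0'
    one: Union["Node", Leaf]   # branch taken when that bit is '1'


def matches(rule: Rule, key: str) -> bool:
    """A rule matches when every position is a wildcard or equals the key bit."""
    return all(p in ("*", k) for p, k in zip(rule.pattern, key))


def classify(tree: Union[Node, Leaf], key: str) -> Optional[Rule]:
    """Walk from the root to a leaf, then pick the best matching rule there."""
    node = tree
    while isinstance(node, Node):
        node = node.one if key[node.bit] == "1" else node.zero
    hits = [r for r in node.rules if matches(r, key)]
    return max(hits, key=lambda r: r.priority) if hits else None


# Hypothetical three-bit rules and a hand-built one-level tree.
r1 = Rule("0*1", priority=1, action="drop")
r2 = Rule("0**", priority=2, action="forward")
r3 = Rule("1*0", priority=3, action="mirror")
tree = Node(bit=0, zero=Leaf([r1, r2]), one=Leaf([r3]))

best = classify(tree, "011")
print(best.action if best else "no match")
```

Here the key `011` reaches the leaf holding `r1` and `r2`; both match, so the higher-priority `r2` decides the action.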
  • The decision tree-based flow classification method can be HiCuts, HyperCuts, or Modular.
  • The HiCuts and HyperCuts methods study the flow classification problem from a geometric perspective. If the rules in the flow classifier consist of k fields, which correspond to a k-dimensional space, then each rule corresponds to a "hyperrectangle" region of the k-dimensional space, and each packet corresponds to a point in that space. Finding the rules that match a packet is equivalent to determining which hyperrectangles the point corresponding to the packet falls into. In the HiCuts and HyperCuts methods, each field in a rule is treated as a range, and the fields, with their different ranges, are taken together and cut, so as to divide the rule set into small sub-rule sets.
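As a rough illustration of this geometric view, the sketch below treats each rule as a list of inclusive `(low, high)` ranges and each packet as a point. The function names `contains` and `cut`, the sample rules, and the equal-width cut are assumptions for the example; the real HiCuts and HyperCuts heuristics for choosing dimensions and cut counts are not reproduced here.

```python
def contains(box, point):
    """True if the point lies inside the hyperrectangle (inclusive bounds)."""
    return all(lo <= x <= hi for (lo, hi), x in zip(box, point))


def cut(rules, dim, lo, hi, parts):
    """Cut [lo, hi] along one dimension into equal-width intervals.

    Assumes (hi - lo + 1) is divisible by parts. A rule is placed in every
    interval that its range overlaps, so it may appear in several children.
    """
    width = (hi - lo + 1) // parts
    children = [[] for _ in range(parts)]
    for name, box in rules.items():
        r_lo, r_hi = box[dim]
        for i in range(parts):
            c_lo = lo + i * width
            c_hi = c_lo + width - 1
            if r_lo <= c_hi and r_hi >= c_lo:
                children[i].append(name)
    return children


# Hypothetical two-field rules over the space [0, 7] x [0, 7].
rules = {"R1": [(0, 7), (4, 5)], "R2": [(2, 3), (0, 7)]}

packet = (3, 4)
matching = [name for name, box in rules.items() if contains(box, packet)]
print(matching)

print(cut(rules, dim=0, lo=0, hi=7, parts=2))
```

In this example `R1` spans the whole first dimension, so cutting that dimension in two places `R1` in both children, while `R2` lands in only one.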
  • A decision tree can be built by such cutting.
  • The intermediate nodes of the decision tree store information about the cutting method, such as which dimension or dimensions are selected for cutting and the number of cuts per dimension; the leaf nodes store the sub-rule sets.
  • A rule set includes multiple rules of equal length.
  • Each rule includes multiple bits. Each bit is "0", "1", or a wildcard. The wildcard can be represented by "*".
  • The rule set is segmented according to a specific algorithm. After the reference bit on which the rule set is to be segmented has been selected, the rules in the rule set whose reference bit is "0" are put into one sub-rule set, and the rules whose reference bit is "1" are put into another sub-rule set.
  • The rules whose reference bit is a wildcard are placed in both of these sub-rule sets.
  • In the present application, placing a rule whose reference bit is a wildcard into both sub-rule sets is called rule replication.
  • In this way, a rule set is divided into two sub-rule sets. The above operations may be repeated on the generated sub-rule sets until the number of rules in each sub-rule set is less than a predetermined threshold. A binary decision tree can be built in this way.
  • Each intermediate node of the decision tree holds the identifier of the reference bit used to segment the rules and pointers to its two child nodes; each leaf node holds a sub-rule set.
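The bit-by-bit segmentation above, including rule replication, might look like the following minimal sketch. The dictionary-based tree layout, the names `split`, `build`, and `stored_rules`, and the heuristic of picking the unused bit with the fewest wildcards are assumptions for illustration; the text only says the reference bit is chosen "according to a specific algorithm".

```python
def split(rules, bit):
    """Split on one reference bit; wildcard rules are copied to both sides."""
    zero = [r for r in rules if r[bit] in "0*"]
    one = [r for r in rules if r[bit] in "1*"]
    return zero, one


def build(rules, threshold, bits=None):
    """Build a binary tree until each leaf holds fewer than threshold rules."""
    if bits is None:
        bits = list(range(len(rules[0])))
    if len(rules) < threshold or not bits:
        return {"leaf": rules}
    # Assumed heuristic: use the remaining bit with the fewest wildcards.
    bit = min(bits, key=lambda b: sum(r[b] == "*" for r in rules))
    rest = [b for b in bits if b != bit]
    zero, one = split(rules, bit)
    return {
        "bit": bit,
        "zero": build(zero, threshold, rest),
        "one": build(one, threshold, rest),
    }


def stored_rules(tree):
    """Count rule entries over all leaves, replicated copies included."""
    if "leaf" in tree:
        return len(tree["leaf"])
    return stored_rules(tree["zero"]) + stored_rules(tree["one"])


# Hypothetical rule set of four three-bit rules.
rule_set = ["00*", "01*", "1*0", "*11"]
tree = build(rule_set, threshold=2)
print(len(rule_set), "rules stored as", stored_rules(tree), "leaf entries")
```

Comparing the number of leaf entries with the size of the original rule set gives a simple measure of how much replication a given choice of reference bits causes.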
  • With the above decision tree-based flow classification method, rule replication is highly likely to occur while the decision tree is being generated.
  • Rule replication means that more storage space is required.
  • The present invention provides a method and an apparatus for generating a decision tree, which help reduce the probability of rule replication.
  • An embodiment of the present invention provides a method for generating a decision tree, including: generating an encoding rule set according to a rule set, where the rule set includes a plurality of rules, each rule is a character string containing 0, 1, or a wildcard, and no two of the plurality of rules are equal; the encoding rule set includes a plurality of encoding rules, and no two of the plurality of encoding rules are equal; each of the plurality of encoding rules corresponds to at least one of the plurality of rules, and each of the plurality of rules corresponds to one of the plurality of encoding rules; the encoding rule corresponding to the first rule is obtained by encoding the first rule according to a first function, where the first rule is any one of the plurality of rules, and the first function is used to replace the plurality of segments in the first rule with a plurality of code segments, thereby obtaining a first encoding rule; the first rule is composed of the plurality of segments, and each segment includes at least one character;
  • the first weighted undirected graph includes a plurality of vertices, the plurality of vertices correspond to the plurality of encoding rules, and each of the plurality of vertices corresponds to one sub-rule set; the sub-rule set corresponding to the first vertex includes all the rules, among the plurality of rules, that correspond to the first encoding rule; the first vertex is any one vertex in the first weighted undirected graph;
  • the first encoding rule is the encoding rule, among the plurality of encoding rules, that corresponds to the first vertex;
  • the weight of the first edge is the value of the second function obtained by using the second encoding rule and the third encoding rule as variables; the two vertices of the first edge correspond to the second encoding rule and the third encoding rule respectively;
  • the first edge is any one edge in the first weighted undirected graph;
  • the second function is configured to perform a bitwise operation on the second encoding rule and the third encoding rule and to count the number of 1s in the result of the bitwise operation; that number is the value of the second function;
  • the first threshold is an integer greater than or equal to 0 and less than or equal to X-1, where X is the number of bits in the first encoding rule; the following operations are performed cyclically until the weight of the edge with the largest weight in the newly generated weighted undirected graph is less than or equal to the first threshold:
  • the encoding rule corresponding to the new vertex is the value of the third function obtained by using, as variables, the fourth encoding rule and the fifth encoding rule, which correspond respectively to the two vertices of the edge with the largest weight in the most recently generated weighted undirected graph; the third function is configured to perform a bitwise AND operation on the fourth encoding rule and the fifth encoding rule, and the result of the bitwise AND operation is the value of the third function;
  • the sub-rule set corresponding to the new vertex includes all the sub-rule sets corresponding to the two
  • An embodiment of the present invention provides a method for generating a decision tree, which includes:
  • the rule set includes a plurality of rules, each rule is a character string including 0, 1, or a wildcard, and no two of the plurality of rules are equal; the encoding rule set includes a plurality of encoding rules, and no two of the plurality of encoding rules are equal; each of the plurality of encoding rules corresponds to at least one of the plurality of rules, and each of the plurality of rules corresponds to one of the plurality of encoding rules; the encoding rule corresponding to the first rule is obtained by encoding the first rule according to a first function, where the first rule is any one of the plurality of rules, and the first function is used to replace the plurality of segments in the first rule with a plurality of code segments, thereby obtaining a first encoding rule; the first rule is composed of the plurality of segments, and each segment includes at least one character; the first encoding rule is composed of the plurality of code segments, and each code segment is one bit; the plurality of segments correspond to the plurality of code segments, the position of the
  • the first weighted undirected graph includes a plurality of vertices, the plurality of vertices correspond to the plurality of encoding rules, and each of the plurality of vertices corresponds to one sub-rule set; the sub-rule set corresponding to the first vertex includes all the rules, among the plurality of rules, that correspond to the first encoding rule; the first vertex is any one vertex in the first weighted undirected graph;
  • the first encoding rule is the encoding rule, among the plurality of encoding rules, that corresponds to the first vertex;
  • the weight of the first edge is the value of the second function obtained by using the second encoding rule and the third encoding rule as variables; the two vertices of the first edge correspond to the second encoding rule and the third encoding rule respectively;
  • the first edge is any one edge in the first weighted undirected graph;
  • the second function is configured to perform a bitwise operation on the second encoding rule and the third encoding rule and to count the number of 1s in the result of the bitwise operation; that number is the value of the second function;
  • the first threshold is an integer greater than or equal to 1 and less than or equal to X, where X is the number of bits in the first encoding rule; the following operations are performed cyclically until the weight of the edge with the smallest weight in the newly generated weighted undirected graph is greater than or equal to the first threshold:
  • the encoding rule corresponding to the new vertex is the value of the third function obtained by using, as variables, the fourth encoding rule and the fifth encoding rule, which correspond respectively to the two vertices of the edge with the smallest weight in the most recently generated weighted undirected graph; the third function is configured to perform a bitwise AND operation on the fourth encoding rule and the fifth encoding rule, and the result of the bitwise AND operation is the value of the third function;
  • the sub-rule set corresponding to the new vertex includes all the sub-rule sets corresponding
  • An embodiment of the present invention provides a method for generating a decision tree, including: generating an encoding rule set according to a rule set, where the rule set includes a plurality of rules, each rule is a character string containing 0, 1, or a wildcard, and no two of the plurality of rules are equal; the encoding rule set includes a plurality of encoding rules, and no two of the plurality of encoding rules are equal;
  • each of the plurality of encoding rules corresponds to at least one of the plurality of rules, and each of the plurality of rules corresponds to one of the plurality of encoding rules; the encoding rule corresponding to the first rule is obtained by encoding the first rule according to a first function;
  • the first rule is any one of the plurality of rules, and the first function is used to replace the plurality of segments in the first rule with a plurality of code segments, thereby obtaining a first encoding rule;
  • the first rule is composed of the plurality of segments, and each segment includes at least one character; the first encoding rule is composed of the plurality of code segments, and each code segment is one bit; the plurality of segments correspond to the plurality of code segments, and the position of the first segment in the first rule is consistent with the position of the first code segment in the first encoding rule; the first segment is any one of the plurality of segments; the first code segment is the one of the plurality of code segments that corresponds to the first segment; the first rule is the variable of the first function, and the first encoding rule is the value of the first function; the first function is further configured to calculate the first code segment according to the first segment; in a scenario where the first segment includes at least two characters, if the number of wildcards in the first segment is greater than or equal to N, the first code segment is 1, and if the number of wildcards in the first segment is less than N, the first code segment is 0, where N is an integer greater than or equal to 1 and less than or equal to
  • the first weighted undirected graph includes a plurality of vertices, the plurality of vertices correspond to the plurality of encoding rules, and each of the plurality of vertices corresponds to one sub-rule set; the sub-rule set corresponding to the first vertex includes all the rules, among the plurality of rules, that correspond to the first encoding rule; the first vertex is any one vertex in the first weighted undirected graph;
  • the first encoding rule is the encoding rule, among the plurality of encoding rules, that corresponds to the first vertex;
  • the weight of the first edge is the value of the second function obtained by using the second encoding rule and the third encoding rule as variables; the two vertices of the first edge correspond to the second encoding rule and the third encoding rule respectively;
  • the first edge is any one edge in the first weighted undirected graph;
  • the second function is configured to perform a bitwise operation on the second encoding rule and the third encoding rule and to count the number of 1s in the result of the bitwise operation; that number is the value of the second function;
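The second function can be sketched as a bitwise operation followed by a count of 1 bits. The text as extracted here does not say which bitwise operation is meant, so the sketch takes it as a parameter and uses XOR as a default purely as an assumption; `edge_weight` and `all_edge_weights` are hypothetical names.

```python
import itertools
import operator


def edge_weight(code_a, code_b, op=operator.xor):
    """Apply a bitwise operation to two encoding rules and count the 1 bits.

    The operation is an assumption; pass operator.and_ or operator.or_
    to try the alternatives.
    """
    return bin(op(int(code_a, 2), int(code_b, 2))).count("1")


def all_edge_weights(codes, op=operator.xor):
    """Weight of the edge between every pair of vertices (encoding rules)."""
    return {
        (a, b): edge_weight(a, b, op)
        for a, b in itertools.combinations(codes, 2)
    }


# Hypothetical four-bit encoding rules.
codes = ["0110", "0011", "1111"]
for edge, weight in all_edge_weights(codes).items():
    print(edge, weight)
```

Because an edge connects every pair of vertices, a set of m encoding rules yields m(m-1)/2 weights.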
  • the first threshold is an integer greater than or equal to 1 and less than or equal to X, where X is the number of bits in the first encoding rule;
  • the following operations are performed cyclically until the weight of the edge with the smallest weight in the newly generated weighted undirected graph is greater than or equal to the first threshold: generating a new vertex according to the edge with the smallest weight in the most recently generated weighted undirected graph, and generating a new weighted undirected graph according to the new vertex;
  • the new weighted undirected graph includes all vertices of the most recently generated weighted undirected graph except the two vertices of the edge with the smallest weight in that graph;
  • the encoding rule corresponding to the new vertex is the value of the third function obtained by using, as variables, the fourth encoding rule and the fifth encoding rule, which correspond respectively to the two vertices of the edge with the smallest weight in the most recently generated weighted undirected graph; the third function is configured to perform a bitwise AND operation on the fourth encoding rule and the fifth
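The merging loop for the smallest-weight variant might be sketched as below. Several points are assumptions rather than statements from the text: the edge weight uses XOR (the bitwise operation of the second function is not specified here), vertices are stored as a dictionary from integer encoding rule to sub-rule set, a new vertex whose code equals an existing vertex's code is folded into that vertex, and `merge_until` is a hypothetical name. The bitwise AND for the new vertex's encoding rule and the union of the two sub-rule sets follow the description.

```python
import itertools
import operator


def weight(a, b, op):
    """Second function: count of 1 bits in the result of a bitwise operation."""
    return bin(op(a, b)).count("1")


def merge_until(groups, threshold, op=operator.xor):
    """Merge the two ends of the lightest edge until its weight >= threshold.

    groups maps an encoding rule (as an int) to its sub-rule set (a list).
    """
    graph = {code: list(rules) for code, rules in groups.items()}
    while len(graph) > 1:
        a, b = min(
            itertools.combinations(graph, 2),
            key=lambda edge: weight(edge[0], edge[1], op),
        )
        if weight(a, b, op) >= threshold:
            break
        merged = graph.pop(a) + graph.pop(b)  # union of the two sub-rule sets
        new_code = a & b                      # third function: bitwise AND
        # Assumption: fold into an existing vertex that has the same code.
        graph[new_code] = graph.get(new_code, []) + merged
    return graph


# Hypothetical three-bit encoding rules with placeholder rule names.
groups = {0b110: ["r1"], 0b111: ["r2"], 0b001: ["r3"]}
result = merge_until(groups, threshold=2)
for code, rules in result.items():
    print(format(code, "03b"), rules)
```

Each vertex left in the final graph carries one sub-rule set, and a separate decision tree can then be generated for each of those sets.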
  • An embodiment of the present invention provides an apparatus for generating a decision tree, including: an encoding processing unit, configured to generate an encoding rule set according to a rule set, where the rule set includes a plurality of rules, each rule is a character string including 0, 1, or a wildcard, and no two of the plurality of rules are equal; the encoding rule set includes a plurality of encoding rules, and no two of the plurality of encoding rules are equal;
  • each of the plurality of encoding rules corresponds to at least one of the plurality of rules, and each of the plurality of rules corresponds to one of the plurality of encoding rules; the encoding rule corresponding to the first rule is obtained by encoding the first rule according to a first function, where the first rule is any one of the plurality of rules, and the first function is used to replace the plurality of segments in the first rule with a plurality of code segments, thereby obtaining a first encoding rule; the first rule is composed of the plurality of segments, and each segment includes at least one character
  • a first weighted undirected graph generating unit configured to generate a first weighted undirected graph according to the encoding rule set generated by the encoding processing unit;
  • the first weighted undirected graph includes a plurality of vertices, the plurality of vertices correspond to the plurality of encoding rules, and each of the plurality of vertices corresponds to one sub-rule set; the sub-rule set corresponding to the first vertex includes all the rules, among the plurality of rules, that correspond to the first encoding rule;
  • the first vertex is any one vertex in the first weighted undirected graph;
  • the first encoding rule is an encoding rule corresponding to the first vertex of the plurality of encoding rules;
  • an edge weight calculation unit, configured to calculate the weight of each edge in the first weighted undirected graph generated by the first weighted undirected graph generating unit;
  • an edge connecting any two vertices in the first weighted undirected graph is an edge of the first weighted undirected graph;
  • the weight of the first edge is the value of the second function obtained by using the second encoding rule and the third encoding rule as variables;
  • the two vertices of the first edge correspond to the second encoding rule and the third encoding rule respectively;
  • the first edge is any one edge in the first weighted undirected graph;
  • the second function is configured to perform a bitwise operation on the second encoding rule and the third encoding rule and to count the number of 1s in the result of the bitwise operation; that number is the value of the second function;
  • a comparison unit, configured to: according to the calculation result of the edge weight calculation unit, if the weight of the edge with the largest weight in the first weighted undirected graph is greater than the first threshold, where the first threshold is an integer greater than or equal to 0 and less than or equal to X-1 and X is the number of bits in the first encoding rule, perform the following operations cyclically until the weight of the edge with the largest weight in the newly generated weighted undirected graph is less than or equal to the first threshold, and then send a trigger signal to the decision tree generating unit: generating a new vertex according to the edge with the largest weight in the most recently generated weighted undirected graph, and generating a new weighted undirected graph according to the new vertex;
  • the new weighted undirected graph includes all vertices of the most recently generated weighted undirected graph except the two vertices of the edge with the largest weight in that graph; the encoding rule corresponding
  • the decision tree generating unit is configured to separately generate a decision tree for the sub-rule set corresponding to each vertex in the newly generated weighted undirected graph according to the trigger signal sent by the comparing unit.
  • An embodiment of the present invention provides an apparatus for generating a decision tree, including: an encoding processing unit, configured to generate an encoding rule set according to a rule set, where the rule set includes a plurality of rules, each rule is a character string including 0, 1, or a wildcard, and no two of the plurality of rules are equal; the encoding rule set includes a plurality of encoding rules, and no two of the plurality of encoding rules are equal;
  • each of the plurality of encoding rules corresponds to at least one of the plurality of rules, and each of the plurality of rules corresponds to one of the plurality of encoding rules; the encoding rule corresponding to the first rule is obtained by encoding the first rule according to a first function, where the first rule is any one of the plurality of rules, and the first function is used to replace the plurality of segments in the first rule with a plurality of code segments, thereby obtaining a first encoding rule; the first rule is composed of the plurality of segments, and each segment includes at least one character
  • a first weighted undirected graph generating unit configured to generate a first weighted undirected graph according to the encoding rule set generated by the encoding processing unit;
  • the first weighted undirected graph includes a plurality of vertices, the plurality of vertices correspond to the plurality of encoding rules, and each of the plurality of vertices corresponds to one sub-rule set; the sub-rule set corresponding to the first vertex includes all the rules, among the plurality of rules, that correspond to the first encoding rule;
  • the first vertex is any one vertex in the first weighted undirected graph;
  • the first encoding rule is an encoding rule corresponding to the first vertex of the plurality of encoding rules;
  • an edge weight calculation unit, configured to calculate the weight of each edge in the first weighted undirected graph generated by the first weighted undirected graph generating unit;
  • an edge connecting any two vertices in the first weighted undirected graph is an edge of the first weighted undirected graph;
  • the weight of the first edge is the value of the second function obtained by using the second encoding rule and the third encoding rule as variables;
  • the two vertices of the first edge correspond to the second encoding rule and the third encoding rule respectively;
  • the first edge is any one edge in the first weighted undirected graph;
  • the second function is configured to perform a bitwise operation on the second encoding rule and the third encoding rule and to count the number of 1s in the result of the bitwise operation; that number is the value of the second function;
  • a comparison unit, configured to: according to the calculation result of the edge weight calculation unit, if the weight of the edge with the smallest weight in the first weighted undirected graph is less than the first threshold, where the first threshold is an integer greater than or equal to 1 and less than or equal to X and X is the number of bits in the first encoding rule, perform the following operations cyclically until the weight of the edge with the smallest weight in the newly generated weighted undirected graph is greater than or equal to the first threshold, and then send a trigger signal to the decision tree generating unit:
  • the encoding rule corresponding to the new vertex is the value of the third function obtained by using, as variables, the fourth encoding rule and the fifth encoding rule, which correspond respectively to the two vertices of the edge with the smallest weight in the most recently generated weighted undirected graph; the third function is configured to perform a bitwise AND operation on the fourth encoding rule and the fifth encoding rule, and the result of the bitwise AND operation is the value of the third function;
  • the sub-rule set corresponding to the new vertex includes all the sub-rule sets corresponding
  • An embodiment of the present invention provides an apparatus for generating a decision tree, including: an encoding processing unit, configured to generate an encoding rule set according to a rule set, where the rule set includes a plurality of rules, each rule is a character string including 0, 1, or a wildcard, and no two of the plurality of rules are equal; the encoding rule set includes a plurality of encoding rules, and no two of the plurality of encoding rules are equal; each of the plurality of encoding rules corresponds to at least one of the plurality of rules, and each of the plurality of rules corresponds to one of the plurality of encoding rules; the encoding rule corresponding to the first rule is obtained by encoding the first rule according to a first function, where the first rule is any one of the plurality of rules, and the first function is used to replace the plurality of segments in the first rule with a plurality of code segments, thereby obtaining a first encoding rule; the first rule is composed of the plurality of segments, and each segment includes at least one character;
  • a first weighted undirected graph generating unit configured to generate a first weighted undirected graph according to the encoding rule set generated by the encoding processing unit;
  • the first weighted undirected graph includes a plurality of vertices, the plurality of vertices correspond to the plurality of encoding rules, and each of the plurality of vertices corresponds to one sub-rule set; the sub-rule set corresponding to the first vertex includes all the rules, among the plurality of rules, that correspond to the first encoding rule;
  • the first vertex is any one vertex in the first weighted undirected graph;
  • the first encoding rule is an encoding rule corresponding to the first vertex of the plurality of encoding rules;
  • an edge weight calculation unit, configured to calculate the weight of each edge in the first weighted undirected graph generated by the first weighted undirected graph generating unit;
  • an edge connecting any two vertices in the first weighted undirected graph is an edge of the first weighted undirected graph;
  • the weight of the first edge is the value of the second function obtained by using the second encoding rule and the third encoding rule as variables;
  • the two vertices of the first edge correspond to the second encoding rule and the third encoding rule respectively;
  • the first edge is any one edge in the first weighted undirected graph;
  • the second function is configured to perform a bitwise operation on the second encoding rule and the third encoding rule and to count the number of 1s in the result of the bitwise operation; that number is the value of the second function;
  • a comparison unit, configured to: according to the calculation result of the edge weight calculation unit, if the weight of the edge with the smallest weight in the first weighted undirected graph is less than the first threshold, where the first threshold is an integer greater than or equal to 1 and less than or equal to X and X is the number of bits in the first encoding rule, perform the following operations cyclically until the weight of the edge with the smallest weight in the newly generated weighted undirected graph is greater than or equal to the first threshold, and then send a trigger signal to the decision tree generating unit:
  • the encoding rule corresponding to the new vertex is the value of the third function obtained by using, as variables, the fourth encoding rule and the fifth encoding rule, which correspond respectively to the two vertices of the edge with the smallest weight in the most recently generated weighted undirected graph; the third function is configured to perform a bitwise AND operation on the fourth encoding rule and the fifth encoding rule, and the result of the bitwise AND operation is the value of the third function;
  • the sub-rule set corresponding to the new vertex includes all the sub-rule sets corresponding
  • Each rule in the rule set is encoded to obtain an encoding rule set.
  • Each vertex in the newly generated weighted undirected graph corresponds to a sub-rule set.
  • The rule set is thus divided into a plurality of sub-rule sets.
  • FIG. 2B is a flowchart of a method for generating a decision tree according to an embodiment of the present invention
  • FIG. 2 is a flowchart of another method for generating a decision tree according to an embodiment of the present invention
  • FIG. 2 is still another decision tree according to an embodiment of the present invention
  • FIG. 3a is a schematic diagram of a rule set and a corresponding encoding rule set according to an embodiment of the present invention
  • FIG. 3b is a schematic diagram of a first weighted undirected graph generated based on the encoding rule set shown in FIG. 3a according to an embodiment of the present invention
  • FIG. 4a is a schematic diagram of a set of coding rules corresponding to the weighted undirected graph shown in FIG. 4b according to an embodiment of the present invention
  • FIG. 4b is a schematic diagram of a weighted undirected graph generated based on the first weighted undirected graph shown in FIG. 3b according to an embodiment of the present invention
  • FIG. 5a is a schematic diagram of a set of coding rules corresponding to the weighted undirected graph shown in FIG. 5b according to an embodiment of the present invention
  • FIG. 5b is a schematic diagram of a weighted undirected graph generated based on the weighted undirected graph shown in FIG. 4b according to an embodiment of the present invention
  • FIG. 6a is a schematic diagram of a set of coding rules corresponding to the weighted undirected graph shown in FIG. 6b according to an embodiment of the present invention
  • FIG. 6b is a schematic diagram of a weighted undirected graph generated based on the weighted undirected graph shown in FIG. 5b according to an embodiment of the present invention
  • FIG. 7a is a schematic diagram of a coding rule set corresponding to the weighted undirected graph shown in FIG. 7b according to an embodiment of the present invention
  • FIG. 7b is a schematic diagram of a weighted undirected graph generated based on the weighted undirected graph shown in FIG. 6b according to an embodiment of the present invention.
  • FIG. 7c is a schematic diagram of a corresponding set of sub-rules shown in the weighted undirected graph shown in FIG. 7b according to an embodiment of the present invention.
  • FIG. 8a is a schematic diagram of a decision tree generated based on a sub-rule set shown in FIG. 7c according to an embodiment of the present invention.
  • FIG. 8b is a schematic diagram of another decision tree generated based on another sub-rule set shown in FIG. 7c according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of a decision tree generated based on the rule set shown in FIG. 3a according to an embodiment of the present invention.
  • FIG. 10 is a flowchart of still another method for generating a decision tree according to an embodiment of the present invention
  • FIG. 11a is a schematic diagram of mapping multiple rules in the rule set shown in FIG. 11b to a two-dimensional geometric space in decimal units according to an embodiment of the present invention
  • FIG. 11b is a schematic diagram of another rule set and the corresponding encoding rule set provided by an embodiment of the present invention.
  • FIG. 11c is a first weighted undirected graph generated based on the encoding rule set shown in FIG. 11b according to an embodiment of the present invention
  • FIG. 12a is a set of coding rules corresponding to the weighted undirected graph shown in FIG. 12b according to an embodiment of the present invention
  • FIG. 12b is a weighted undirected graph generated based on the first weighted undirected graph shown in FIG. 11c according to an embodiment of the present invention.
  • FIG. 13a is a set of coding rules corresponding to the weighted undirected graph shown in FIG. 13b according to an embodiment of the present invention
  • FIG. 13b is a weighted undirected graph generated based on the weighted undirected graph shown in FIG. 12b according to an embodiment of the present invention
  • FIG. 14a is a set of coding rules corresponding to the weighted undirected graph shown in FIG. 14b according to an embodiment of the present invention
  • FIG. 14b is a weighted undirected graph generated based on the weighted undirected graph shown in FIG. 13b according to an embodiment of the present invention
  • FIG. 14c is a schematic diagram of a set of sub-rules corresponding to the weighted undirected graph shown in FIG. 14b according to an embodiment of the present invention
  • FIG. 15a is a schematic diagram of mapping a sub-rule set shown in FIG. 14c to a two-dimensional geometric space in decimal units according to an embodiment of the present invention
  • Figure 15b is a schematic diagram of a decision tree generated according to the sub-rule set shown in Figure 15a according to an embodiment of the present invention
  • FIG. 16a is a schematic diagram of mapping another sub-rule set shown in FIG. 14c to a two-dimensional geometric space in decimal units according to an embodiment of the present invention
  • Figure 16b is a schematic diagram of a decision tree generated according to the sub-rule set shown in Figure 16a according to an embodiment of the present invention
  • FIG. 17a is a schematic diagram of mapping another sub-rule set shown in FIG. 14c to a two-dimensional geometric space in decimal units according to an embodiment of the present invention
  • FIG. 17b is a schematic diagram of a decision tree generated according to the sub-rule set shown in FIG. 17a according to an embodiment of the present invention.
  • FIG. 18a is a schematic diagram of mapping multiple rules in the rule set shown in FIG. 11b to a two-dimensional geometric space in decimal units according to an embodiment of the present invention
  • FIG. 18b is a schematic diagram of a decision tree generated based on the rule set shown in FIG. 11b according to an embodiment of the present invention
  • FIG. 19 is a schematic structural diagram of a device for generating a decision tree according to an embodiment of the present invention
  • FIG. 20 is a schematic structural diagram of another apparatus for generating a decision tree according to an embodiment of the present invention
  • FIG. 22 is a schematic structural diagram of another apparatus for generating a decision tree according to an embodiment of the present invention
  • the execution body of the method provided by the embodiment of the present invention may be a flow classifier, or a device having a flow classifier, such as a switch, a router, a base station, a load balancer, or a firewall.
  • the flow classifier may be a network processor (NP).
  • the device provided by the embodiment of the present invention may be a flow classifier, or a device with a flow classifier, such as a switch, a router, a base station, a load balancer, or a firewall.
  • the flow classifier can be a network processor.
  • FIG. 1a is a flowchart of a method for generating a decision tree according to an embodiment of the present invention. As shown in FIG. 1a, the method includes:
  • 11a: Generate an encoding rule set based on the rule set.
  • the rule set includes multiple rules, and each rule is a character string including 0, 1, or a wildcard, and any two of the multiple rules are not equal to each other.
  • the encoding rule set includes a plurality of encoding rules, and any two of the plurality of encoding rules are not equal to each other.
  • Each of the plurality of encoding rules corresponds to at least one of the plurality of rules, and each of the plurality of rules corresponds to one of the plurality of encoding rules.
  • the encoding rule corresponding to the first rule is obtained by encoding the first rule according to the first function, where the first rule is any one of the multiple rules, and the first function is used to replace the plurality of segments in the first rule with a plurality of encoding segments, thereby obtaining the first encoding rule.
  • the first rule consists of the plurality of segments, each segment comprising at least one character.
  • the first encoding rule is composed of the plurality of encoding segments, each encoding segment is one bit, the plurality of segments correspond to the plurality of encoding segments, and the location of the first segment in the first rule coincides with the location of the first encoding segment in the first encoding rule.
  • the first segment is any one of the plurality of segments.
  • the first coding segment is one of the plurality of coding segments corresponding to the first segment.
  • the first rule is a variable of the first function.
  • the first encoding rule is a value of the first function.
  • the first function is further configured to calculate the first encoding segment according to the first segment.
  • when the first segment includes at least two characters, the first encoding segment is 1 if the number of wildcards in the first segment is greater than or equal to N, and the first encoding segment is 0 if the number of wildcards in the first segment is less than N, where N is an integer greater than or equal to 1 and less than or equal to M, and M is the number of characters in the first segment.
  • alternatively, when the first segment includes at least two characters, the first encoding segment is 0 if the number of wildcards in the first segment is greater than or equal to N, and the first encoding segment is 1 if the number of wildcards in the first segment is less than N, where N is an integer greater than or equal to 1 and less than or equal to M, and M is the number of characters in the first segment.
  • the first encoded segment is 1 or 0.
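  • The first function described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes that all segments have the same length `seg_len`, that the wildcard is written `*`, and that a segment with at least `n` wildcards maps to `wildcard_bit` while any other segment maps to the opposite bit. The names `encode_rule`, `seg_len`, `n`, and `wildcard_bit` are hypothetical.

```python
WILDCARD = "*"  # assumed wildcard symbol


def encode_rule(rule: str, seg_len: int = 1, n: int = 1, wildcard_bit: int = 0) -> str:
    """Encode a rule made of '0', '1' and '*' into a bit string.

    The rule is cut into segments of seg_len characters and each segment
    becomes one bit, so segment positions and bit positions coincide.
    A segment holding at least n wildcards is encoded as wildcard_bit;
    every other segment is encoded as the opposite bit.
    """
    if seg_len < 1 or len(rule) % seg_len != 0:
        raise ValueError("rule length must be a positive multiple of seg_len")
    if not 1 <= n <= seg_len:
        raise ValueError("n must satisfy 1 <= n <= seg_len")
    if wildcard_bit not in (0, 1):
        raise ValueError("wildcard_bit must be 0 or 1")

    bits = []
    for start in range(0, len(rule), seg_len):
        segment = rule[start:start + seg_len]
        has_enough_wildcards = segment.count(WILDCARD) >= n
        bits.append(str(wildcard_bit if has_enough_wildcards else 1 - wildcard_bit))
    return "".join(bits)
```

  • With `seg_len=1` the sketch reduces to the one-character case, in which each character is encoded directly as 1 or 0.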
  • the first weighted undirected graph includes a plurality of vertices, and the plurality of vertices correspond to the plurality of coding rules, and each of the plurality of vertices corresponds to one sub-rule set.
  • the set of sub-rules corresponding to the first vertex includes all the rules corresponding to the first encoding rule among the plurality of rules.
  • the first vertex is any one of the vertices in the first weighted undirected graph.
  • the first encoding rule is the encoding rule corresponding to the first vertex among the plurality of encoding rules.
  • a line connecting any two vertices in the first weighted undirected graph is an edge of the first weighted undirected graph.
  • the weight of the first edge is the value of the second function obtained by using the second encoding rule and the third encoding rule as variables.
  • the two vertices of the first edge respectively correspond to the second encoding rule and the third encoding rule.
  • the first edge is any one of the edges in the first weighted undirected graph.
  • the second function is configured to perform a bitwise operation on the second encoding rule and the third encoding rule and to count the number of 1s in the result; the number of 1s in the result of the bitwise operation is the value of the second function.
  • the bitwise operation can be an AND operation, an OR operation, or an exclusive OR (XOR) operation.
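  • The second function can be sketched as a population count over the result of the chosen bitwise operation. This is an illustrative sketch only; the function name `edge_weight` and the string representation of encoding rules are assumptions, and the sample bit strings in the usage note are made up rather than taken from the figures.

```python
def edge_weight(code_a: str, code_b: str, op: str = "and") -> int:
    """Return the number of 1s in the bitwise op of two equal-length bit strings."""
    if len(code_a) != len(code_b):
        raise ValueError("encoding rules must have the same number of bits")
    a, b = int(code_a, 2), int(code_b, 2)
    if op == "and":
        result = a & b
    elif op == "or":
        result = a | b
    elif op == "xor":
        result = a ^ b
    else:
        raise ValueError("op must be 'and', 'or' or 'xor'")
    # The weight of the edge is the count of 1 bits in the result.
    return bin(result).count("1")
```

  • For the hypothetical encoding rules `11001101` and `11110000`, the AND result is `11000000`, so `edge_weight("11001101", "11110000")` returns 2.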
  • the first threshold is an integer greater than or equal to 0 and less than or equal to X-1, and X is the number of bits in the first encoding rule.
  • the first operation includes: generating a new vertex according to the edge with the largest weight in the last generated weighted undirected graph among the generated weighted undirected graphs, and generating a new weighted undirected graph according to the new vertex.
  • the new weighted undirected graph includes the new vertex and all vertices of the last generated weighted undirected graph except the two vertices of the edge with the largest weight.
  • the encoding rule corresponding to the new vertex is the value of the third function obtained by using, as variables, the fourth encoding rule and the fifth encoding rule, which respectively correspond to the two vertices of the edge with the largest weight in the last generated weighted undirected graph.
  • the third function is configured to perform a bitwise AND operation on the fourth encoding rule and the fifth encoding rule, and the result of the bitwise AND operation is the value of the third function.
  • the sub-rule set corresponding to the new vertex includes all the rules in the sub-rule sets corresponding to the two vertices of the edge with the largest weight in the last generated weighted undirected graph.
  • the weight of the second edge in the new weighted undirected graph is the value of the fourth function obtained by using the sixth encoding rule and the seventh encoding rule as variables.
  • the two vertices of the second edge respectively correspond to the sixth encoding rule and the seventh encoding rule.
  • the second edge is any one of the edges in the new weighted undirected graph.
  • the fourth function is configured to perform a bitwise operation on the sixth encoding rule and the seventh encoding rule and to count the number of 1s in the result; the number of 1s in the result of the bitwise operation is the value of the fourth function.
  • the bitwise operation is an AND operation, an OR operation, an exclusive OR operation, or the like.
  • when the generated weighted undirected graphs include only the first weighted undirected graph, the first weighted undirected graph is the last generated weighted undirected graph.
  • when the last generated weighted undirected graph has multiple edges with the largest weight, one of those edges may be randomly selected to generate a new vertex.
  • alternatively, the total number of rules in the two sub-rule sets corresponding to the two vertices of each of those edges may be calculated, and the new vertex may be generated according to the edge, among the multiple edges with the largest weight, whose two sub-rule sets contain the largest total number of rules.
  • when the last generated weighted undirected graph has multiple edges with the largest weight, new vertices (at least two) may be generated separately for those edges (at least two) that have no common vertex, and a new weighted undirected graph may be generated according to the new vertices (at least two).
  • a common vertex is a vertex shared by two or more of the edges with the largest weight. It should be noted that, when the last generated weighted undirected graph has multiple edges with the largest weight, new vertices cannot be generated separately for two edges that have a common vertex.
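  • The handling of several edges that share the largest weight can be sketched as below. The greedy, first-come order of selection is an assumption made for illustration; the text only requires that edges merged in the same round have no common vertex. The function name `pick_disjoint_max_edges` and the vertex labels in the usage note are hypothetical.

```python
def pick_disjoint_max_edges(weights):
    """Select edges of largest weight such that no two selected edges share a vertex.

    weights maps an edge, given as a (vertex, vertex) tuple, to its weight.
    Edges are scanned in dictionary order; an edge is skipped when one of
    its vertices is already used by a previously selected edge.
    """
    if not weights:
        return []
    top = max(weights.values())
    chosen, used = [], set()
    for (u, v), w in weights.items():
        if w == top and u not in used and v not in used:
            chosen.append((u, v))
            used.update((u, v))
    return chosen
```

  • For example, with hypothetical weights `{("v1", "v3"): 4, ("v2", "v4"): 4, ("v1", "v2"): 4, ("v5", "v6"): 2}`, the edges `("v1", "v3")` and `("v2", "v4")` can each produce a new vertex in the same round, while `("v1", "v2")` is skipped because it shares a vertex with both.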
  • each rule in the rule set is encoded to obtain a coding rule set.
  • the two vertices corresponding to the edge with the largest weight in the weighted undirected graph are merged until the weight of the edge with the largest weight in the newly generated weighted undirected graph satisfies certain conditions.
  • Each vertex in the newly generated weighted undirected graph corresponds to a sub-rule set.
  • the rule set is divided into a plurality of sub-rule sets.
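  • The overall division of a rule set into sub-rule sets can be sketched as follows for the largest-weight variant with a bitwise AND weight. Several points are assumptions: each segment is one character, the wildcard is `*`, ties are resolved by taking the first largest-weight pair found, and the loop stops when the largest weight is less than or equal to the threshold (the text above only says the weight must satisfy certain conditions). The names `encode`, `and_weight`, and `partition_rules` are hypothetical.

```python
from itertools import combinations


def encode(rule: str, wildcard: str = "*") -> int:
    """Encode non-wildcards as 1 and wildcards as 0; return the bits as an int."""
    return int("".join("0" if c == wildcard else "1" for c in rule), 2)


def and_weight(code_a: int, code_b: int) -> int:
    """Number of 1s in the bitwise AND of two encoding rules."""
    return bin(code_a & code_b).count("1")


def partition_rules(rules, threshold):
    """Merge the two vertices of the largest-weight edge until that weight <= threshold."""
    # One vertex per distinct encoding rule; rules sharing an encoding share a vertex.
    groups = {}
    for rule in rules:
        groups.setdefault(encode(rule), []).append(rule)
    vertices = list(groups.items())  # each item is (encoding rule, sub-rule set)

    while len(vertices) > 1:
        i, j = max(
            combinations(range(len(vertices)), 2),
            key=lambda p: and_weight(vertices[p[0]][0], vertices[p[1]][0]),
        )
        if and_weight(vertices[i][0], vertices[j][0]) <= threshold:
            break
        # New vertex: AND of the two encoding rules, union of the two sub-rule sets.
        merged = (vertices[i][0] & vertices[j][0], vertices[i][1] + vertices[j][1])
        vertices = [v for k, v in enumerate(vertices) if k not in (i, j)] + [merged]

    return [subset for _, subset in vertices]
```

  • Each returned sub-rule set would then be passed separately to a decision-tree generation algorithm; that step is outside this sketch.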
  • running an algorithm for generating a decision tree, such as Modular, on each of the plurality of sub-rule sets obtained by the method provided by the embodiment of the present invention yields multiple decision trees in which the probability of rule replication is lower than in the single decision tree obtained by running the same algorithm on the rule set. Therefore, the technical solution provided by the embodiment of the present invention reduces the probability of rule replication.
  • it can be determined from the above that each sub-rule set contains fewer rules than the rule set. Therefore, when the same decision-tree generation algorithm is run, the height of each of the multiple decision trees generated for the plurality of sub-rule sets is less than or equal to the height of the single decision tree generated for the rule set.
  • the speed of performing rule matching in parallel on the multiple decision trees is therefore higher than or equal to the speed of performing rule matching on the single decision tree.
  • the bitwise operation is an AND operation, an OR operation, or an exclusive OR operation.
  • if the character in the first segment is 1, the first encoding segment is 1; if the character in the first segment is 0, the first encoding segment is 1; if the character in the first segment is a wildcard, the first encoding segment is 0.
  • generating a decision tree for the sub-rule set corresponding to each vertex in the newly generated weighted undirected graph includes: generating, by using the Modular algorithm, a decision tree separately for the sub-rule set corresponding to each vertex in the newly generated weighted undirected graph.
  • the following analyzes the technical solution provided by this embodiment by taking as an example the case in which the bitwise operation is an AND operation and the first segment includes one character. That is, if the character in the first segment is 1, the first encoding segment is 1; if the character in the first segment is 0, the first encoding segment is 1; if the character in the first segment is a wildcard, the first encoding segment is 0. The first segment can thus be 1, 0, or a wildcard.
  • for convenience of description, the two cases of the first segment being 1 and the first segment being 0 are collectively referred to below as the first segment being a non-wildcard.
  • the first function is used to encode the first segment as the first encoding segment.
  • for each rule in the rule set, the non-wildcards in the rule are encoded as 1 and the wildcards in the rule are encoded as 0.
  • in the bit string obtained by performing a bitwise AND operation on two encoding rules, if the bit at a certain position is 1, both characters at the corresponding position in the two rules are non-wildcards; with this position as the reference position, rule replication does not occur. If the bit at a certain position is 0, at least one of the two characters at the corresponding position in the two rules is a wildcard; when this position is used as the reference position, rule replication is inevitable.
  • the weight of an edge of the weighted undirected graph is the number of 1 bits in the bit string obtained by performing a bitwise AND operation on the two encoding rules. Therefore, the larger the weight, the more positions in the two encoding rules can be used as the reference position without causing rule replication. Selecting one of these positions as the reference position avoids rule replication.
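  • The reasoning above can be checked on a pair of rules with a short sketch. The two sample rules in the usage note are invented for illustration and are not the rules of any figure; the function name `and_analysis` and the `*` wildcard symbol are assumptions.

```python
def and_analysis(rule_a: str, rule_b: str, wildcard: str = "*"):
    """Return (weight, safe_positions) for two equal-length rules under AND encoding.

    Non-wildcards are encoded as 1 and wildcards as 0. A position is safe to
    use as a reference position when the AND of the two bits is 1, that is,
    when neither rule has a wildcard there.
    """
    if len(rule_a) != len(rule_b):
        raise ValueError("rules must have the same length")
    bits_a = [0 if c == wildcard else 1 for c in rule_a]
    bits_b = [0 if c == wildcard else 1 for c in rule_b]
    anded = [x & y for x, y in zip(bits_a, bits_b)]
    safe_positions = [i for i, bit in enumerate(anded) if bit == 1]
    return sum(anded), safe_positions
```

  • For the made-up rules `10**01*1` and `1*0*0011`, the AND result is `10001101`: the weight is 4 and positions 0, 4, 5, and 7 (counted from the left, starting at 0) are the ones where neither rule has a wildcard.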
  • FIG. 1b is a flowchart of another method for generating a decision tree according to an embodiment of the present invention.
  • the method shown in FIG. 1b includes:
  • 11b: Generate an encoding rule set based on the rule set.
  • the first operation is performed cyclically until the weight of the edge with the smallest weight in the newly generated weighted undirected graph is greater than or equal to the first threshold.
  • 15b: Generate a decision tree separately for the sub-rule set corresponding to each vertex in the newly generated weighted undirected graph.
  • the rule set includes multiple rules, each rule is a character string including 0, 1, or a wildcard, and any two of the multiple rules are not equal to each other.
  • the encoding rule set includes a plurality of encoding rules, and any two of the plurality of encoding rules are not equal to each other.
  • Each of the plurality of encoding rules corresponds to at least one of the plurality of rules, and each of the plurality of rules corresponds to one of the plurality of encoding rules.
  • the encoding rule corresponding to the first rule is obtained by encoding the first rule according to the first function, where the first rule is any one of the multiple rules, and the first function is used to replace the plurality of segments in the first rule with a plurality of encoding segments, thereby obtaining the first encoding rule.
  • the first rule consists of the plurality of segments, each segment comprising at least one character.
  • the first encoding rule is composed of the plurality of encoding segments, each encoding segment is one bit, the plurality of segments correspond to the plurality of encoding segments, and the location of the first segment in the first rule coincides with the location of the first encoding segment in the first encoding rule.
  • the first segment is any one of the plurality of segments.
  • the first coding segment is one of the plurality of coding segments corresponding to the first segment.
  • the first rule is a variable of the first function.
  • the first encoding rule is a value of the first function.
  • the first function is further configured to calculate the first encoded segment according to the first segment.
  • when the first segment includes at least two characters, the first encoding segment is 1 if the number of wildcards in the first segment is greater than or equal to N, and the first encoding segment is 0 if the number of wildcards in the first segment is less than N, where N is an integer greater than or equal to 1 and less than or equal to M, and M is the number of characters in the first segment.
  • alternatively, when the first segment includes at least two characters, the first encoding segment is 0 if the number of wildcards in the first segment is greater than or equal to N, and the first encoding segment is 1 if the number of wildcards in the first segment is less than N, where N is an integer greater than or equal to 1 and less than or equal to M, and M is the number of characters in the first segment.
  • the first encoded segment is 1 or 0.
  • the first weighted undirected graph includes a plurality of vertices, and the plurality of vertices correspond to the plurality of coding rules, and each of the plurality of vertices corresponds to one sub-rule set.
  • the set of sub-rules corresponding to the first vertex includes all the rules corresponding to the first encoding rule among the plurality of rules.
  • the first vertex is any one of the vertices in the first weighted undirected graph.
  • the first encoding rule is the encoding rule corresponding to the first vertex among the plurality of encoding rules.
  • a line connecting any two vertices in the first weighted undirected graph is an edge of the first weighted undirected graph.
  • the weight of the first edge is the value of the second function obtained by using the second encoding rule and the third encoding rule as variables.
  • the two vertices of the first edge respectively correspond to the second encoding rule and the third encoding rule.
  • the first edge is any one of the edges in the first weighted undirected graph.
  • the second function is configured to perform a bitwise operation on the second encoding rule and the third encoding rule and to count the number of 1s in the result; the number of 1s in the result of the bitwise operation is the value of the second function.
  • the bitwise operation can be an AND operation, an OR operation, or an exclusive OR (XOR) operation.
  • the first threshold is an integer greater than or equal to 0 and less than or equal to X-1, and X is the number of bits in the first encoding rule.
  • the first operation may include: generating a new vertex according to the edge with the smallest weight in the last generated weighted undirected graph among the generated weighted undirected graphs, and generating a new weighted undirected graph according to the new vertex.
  • the new weighted undirected graph includes the new vertex and all vertices of the last generated weighted undirected graph except the two vertices of the edge with the smallest weight.
  • the encoding rule corresponding to the new vertex is the value of the third function obtained by using, as variables, the fourth encoding rule and the fifth encoding rule, which respectively correspond to the two vertices of the edge with the smallest weight in the last generated weighted undirected graph.
  • the third function is configured to perform a bitwise AND operation on the fourth encoding rule and the fifth encoding rule, and the result of the bitwise AND operation is the value of the third function.
  • the sub-rule set corresponding to the new vertex includes all the rules in the sub-rule sets corresponding to the two vertices of the edge with the smallest weight in the last generated weighted undirected graph.
  • the weight of the second edge in the new weighted undirected graph is the value of the fourth function obtained by using the sixth encoding rule and the seventh encoding rule as variables.
  • the two vertices of the second edge respectively correspond to the sixth encoding rule and the seventh encoding rule.
  • the second edge is any one of the edges in the new weighted undirected graph.
  • the fourth function is configured to perform a bitwise operation on the sixth encoding rule and the seventh encoding rule and to count the number of 1s in the result; the number of 1s in the result of the bitwise operation is the value of the fourth function.
  • when the generated weighted undirected graphs include only the first weighted undirected graph, the first weighted undirected graph is the last generated weighted undirected graph.
  • when the last generated weighted undirected graph has multiple edges with the smallest weight, one of those edges may be randomly selected to generate a new vertex.
  • alternatively, the total number of rules in the two sub-rule sets corresponding to the two vertices of each of those edges may be calculated, and the new vertex may be generated according to the edge, among the multiple edges with the smallest weight, whose two sub-rule sets contain the largest total number of rules.
  • when the last generated weighted undirected graph has multiple edges with the smallest weight, new vertices (at least two) may be generated separately for those edges (at least two) that have no common vertex, and a new weighted undirected graph may be generated according to the new vertices (at least two).
  • a common vertex is a vertex shared by two or more of the edges with the smallest weight. It should be noted that, when the last generated weighted undirected graph has multiple edges with the smallest weight, new vertices cannot be generated separately for two edges that have a common vertex.
  • each rule in the rule set is encoded to obtain a coding rule set.
  • the two vertices corresponding to the edge with the smallest weight in the weighted undirected graph are merged until the weight of the edge with the smallest weight in the newly generated weighted undirected graph satisfies certain conditions.
  • Each vertex in the newly generated weighted undirected graph corresponds to a sub-rule set.
  • the rule set is divided into a plurality of sub-rule sets.
  • running an algorithm for generating a decision tree, such as Modular, on each of the plurality of sub-rule sets obtained by the method provided by the embodiment of the present invention yields multiple decision trees in which the probability of rule replication is lower than in the single decision tree obtained by running the same algorithm on the rule set. Therefore, the technical solution provided by the embodiment of the present invention reduces the probability of rule replication.
  • it can be determined from the above that the number of rules included in each sub-rule set is smaller than the number of rules included in the rule set. Therefore, when the same decision-tree generation algorithm is run, the height of each of the multiple decision trees generated for the plurality of sub-rule sets is less than or equal to the height of the single decision tree generated for the rule set.
  • the lower the height of a decision tree, the faster rule matching is performed. Therefore, for the multiple decision trees obtained by the method provided by the embodiment, the speed of performing rule matching in parallel on the multiple decision trees is higher than or equal to the speed of performing rule matching on the single decision tree.
  • the bitwise operation is an AND operation, an OR operation, or an exclusive OR operation.
  • if the character in the first segment is 1, the first encoding segment is 0; if the character in the first segment is 0, the first encoding segment is 0; if the character in the first segment is a wildcard, the first encoding segment is 1.
  • generating a decision tree for the sub-rule set corresponding to each vertex in the newly generated weighted undirected graph includes: generating, by using the Modular algorithm, a decision tree separately for the sub-rule set corresponding to each vertex in the newly generated weighted undirected graph.
  • the following analyzes the technical solution provided by this embodiment by taking as an example the case in which the bitwise operation is an OR operation and the first segment includes one character. That is, if the character in the first segment is 1, the first encoding segment is 0; if the character in the first segment is 0, the first encoding segment is 0; if the character in the first segment is a wildcard, the first encoding segment is 1. The first segment can thus be 1, 0, or a wildcard. For convenience of description, the two cases of the first segment being 1 and the first segment being 0 are collectively referred to below as the first segment being a non-wildcard.
  • the first function is for encoding the first segment as the first encoded segment.
  • for each rule in the rule set, the non-wildcards in the rule are encoded as 0 and the wildcards in the rule are encoded as 1.
  • in the bit string obtained by performing a bitwise OR operation on two encoding rules, if the bit at a certain position is 1, at least one of the two characters at the corresponding position in the two rules is a wildcard; when this position is used as the reference position, rule replication occurs. If the bit at a certain position is 0, both characters at the corresponding position in the two rules are non-wildcards; with this position as the reference position, rule replication does not occur.
  • the weight of an edge of the weighted undirected graph is the number of 1 bits in the bit string obtained by performing a bitwise OR operation on the two encoding rules. Therefore, the smaller the weight, the more positions in the two encoding rules can be used as the reference position without causing rule replication. Selecting one of these positions as the reference position avoids rule replication.
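  • The OR-based interpretation can be illustrated with a short sketch. The sample rules in the usage note are invented for illustration and do not come from any figure; the function name `or_analysis` and the `*` wildcard symbol are assumptions.

```python
def or_analysis(rule_a: str, rule_b: str, wildcard: str = "*"):
    """Return (weight, safe_positions) for two equal-length rules under OR encoding.

    Wildcards are encoded as 1 and non-wildcards as 0. The weight is the
    number of 1s in the bitwise OR; a position is safe to use as a reference
    position when the OR bit is 0, that is, when neither rule has a wildcard there.
    """
    if len(rule_a) != len(rule_b):
        raise ValueError("rules must have the same length")
    bits_a = [1 if c == wildcard else 0 for c in rule_a]
    bits_b = [1 if c == wildcard else 0 for c in rule_b]
    ored = [x | y for x, y in zip(bits_a, bits_b)]
    safe_positions = [i for i, bit in enumerate(ored) if bit == 0]
    return sum(ored), safe_positions
```

  • For the made-up rules `10**01*1` and `1*0*0011`, the OR result is `01110010`: the weight is 4, and positions 0, 4, 5, and 7 (counted from the left, starting at 0) carry a 0 bit, so neither rule has a wildcard there.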
  • for the scenario in which each rule in the rule set contains a plurality of segments and each segment includes one character, the technical solution includes:
  • Figure 3a is a schematic diagram of a set of rules and a corresponding set of encoding rules.
  • the rules a10, a20, a30, a40, a50, and a60 correspond to the encoding rules a1, a2, a3, a4, a5, and a6, respectively.
  • Figure 3a uses the encoding method in which "*" in a rule is encoded as 0 and any character other than "*" is encoded as 1.
  • the "*" is a wildcard.
  • the meaning of "*" is that the bit corresponding to "*" in the rule can be 0 or 1.
  • Figure 3b is a first weighted undirected graph generated based on the set of encoding rules shown in Figure 3a.
  • the encoding rules a1, a2, a3, a4, a5, and a6 correspond to the vertices a1, a2, a3, a4, a5, and a6 in the first weighted undirected graph, respectively.
  • the first weighted undirected graph is a weighted undirected graph generated for the first time.
  • each vertex corresponds to an encoding rule; the weight of any edge can be obtained by performing a bitwise AND operation on the encoding rules corresponding to the two vertices that the edge connects.
  • the result of the bitwise AND operation is a bit string.
  • the weight of the edge is the number of 1s in the result of the bitwise AND operation. That is, a bitwise AND operation is performed on the encoding rules corresponding to the two vertices connected by the edge to obtain a first operation result, and the number of 1s in the first operation result is counted.
  • the vertices a1 and a3 of the first weighted undirected graph correspond to the encoding rules a1 and a3, respectively.
  • a bitwise AND operation is performed on the encoding rules a1 and a3, and the number of 1s in the obtained result is 4. Therefore, the weight of the edge connecting the vertices a1 and a3 in the first weighted undirected graph is 4.
  • the edge with the largest weight in the first weighted undirected graph is the edge connecting the vertices a1 and a3.
  • the first threshold is an integer between 0 and 7 in decimal.
  • for each rule in the rule set, the non-wildcards in the rule are encoded as 1.
  • the wildcards in the rule are encoded as 0.
  • When a bitwise AND operation is performed on two encoding rules, a 1 at a given position in the resulting bit string means that the characters at that position in both rules are non-wildcards. If this position is used as the reference position, rule copying does not occur. A 0 at a given position in the bit string means that at least one of the two characters at that position in the two rules is a wildcard. If this position is used as the reference position, rule copying is inevitable.
  • the weight of an edge of the weighted undirected graph is the number of 1s in the bit string obtained by performing a bitwise AND operation on the two encoding rules. Therefore, the larger the weight, the more positions in the two encoding rules can be used as the reference position without causing rule copying. Selecting one of these positions as the reference position avoids rule copying.
  • the edge connecting the vertices a1 and a3 has the largest weight.
  • the vertices a1 and a3 correspond to the encoding rules a1 and a3, respectively. Therefore, the encoding rules a1 and a3 have the largest number of positions, namely four, that can be used as the reference position without causing rule copying. By selecting any of these four positions as the reference position, rule copying can be avoided.
  • the vertices a1 and a3 correspond to two sub-rule sets, respectively; the sub-rule set corresponding to the vertex a1 contains the rule a10.
  • the sub-rule set corresponding to the vertex a3 contains the rule a30.
  • the sub-rule set corresponding to the vertex a1&a3 contains the rules a10 and a30.
  • a total of four positions can be used as the reference position without causing rule copying. Therefore, when an algorithm for generating a decision tree, such as Modular, is run on the sub-rule set corresponding to the vertex a1&a3, the probability of rule copying is low.
  • the weight of the edge connecting the vertices a3 and a4 is also 4. Therefore, the edge connecting the vertices a3 and a4 is also an edge with the largest weight in the first weighted undirected graph.
  • Either the edge connecting the vertices a3 and a4 or the edge connecting the vertices a3 and a1 may be randomly selected as the edge for generating a new vertex. For example, in the embodiment of the present invention, the edge connecting the vertices a3 and a1 is determined to be the edge for generating a new vertex.
  • a new vertex is generated, for example, as follows: the edge with the largest weight is determined in the last generated weighted undirected graph among the weighted undirected graphs that have been generated.
  • a bitwise AND operation is performed on the encoding rules corresponding to the two vertices of the edge with the largest weight.
  • the result of the bitwise AND operation is the encoding rule corresponding to the new vertex.
  • a new weighted undirected graph is generated from the new vertex and the vertices other than the two vertices of the edge with the largest weight in the last generated weighted undirected graph.
  • the weighted undirected graph shown in Fig. 3b is the last generated weighted undirected graph among the weighted undirected graphs that have been generated.
  • the weighted undirected graph shown in Fig. 4b is the newly generated weighted undirected graph. The weights of the edges in the weighted undirected graph shown in Fig. 4b are calculated in the same way as above, and the calculation is not described again here.
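  • one merge step of this kind can be sketched as follows, under stated assumptions: the helper names and_bits and merge_heaviest_edge, the dictionary representation of the graph, and the sample data are all hypothetical, and ties are broken by taking the first heaviest edge rather than a random one.

```python
from itertools import combinations

def and_bits(a, b):
    """Bitwise AND of two equal-length bit strings."""
    return "".join("1" if x == "1" and y == "1" else "0" for x, y in zip(a, b))

def merge_heaviest_edge(codes, members):
    """Merge the two vertices joined by the heaviest edge into one new vertex.

    codes maps a vertex name to its encoding rule; members maps a vertex name
    to the set of rule names in its sub-rule set. New dictionaries are returned.
    """
    def weight(edge):
        u, v = edge
        return and_bits(codes[u], codes[v]).count("1")

    # max() keeps the first heaviest edge; the text allows a random choice among ties.
    u, v = max(combinations(sorted(codes), 2), key=weight)
    merged = u + "&" + v
    new_codes = {k: c for k, c in codes.items() if k not in (u, v)}
    new_members = {k: m for k, m in members.items() if k not in (u, v)}
    new_codes[merged] = and_bits(codes[u], codes[v])   # encoding rule of the new vertex
    new_members[merged] = members[u] | members[v]      # union of the two sub-rule sets
    return new_codes, new_members

codes = {"p": "1111000", "q": "0001111", "r": "1111100"}
members = {"p": {"p0"}, "q": {"q0"}, "r": {"r0"}}
new_codes, new_members = merge_heaviest_edge(codes, members)
print(new_codes)  # {'q': '0001111', 'p&r': '1111000'}
```

  • the edge weights of the new graph are then recomputed from new_codes in the same way as for the previous graph.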
  • the first threshold is preset to be equal to 1.
  • the weighted undirected graph shown in Fig. 4b is now the last generated weighted undirected graph among the weighted undirected graphs that have been generated.
  • the edge with the largest weight in the weighted undirected graph shown in Fig. 4b is the edge connecting the vertices a2 and a5.
  • the weight of the edge connecting the vertices a2 and a5 is 3. That is to say, the weight of the edge with the largest weight is greater than the first threshold.
  • a bitwise AND operation is performed on the encoding rules corresponding to the vertices a2 and a5, and the result of the bitwise AND operation is the encoding rule corresponding to the new vertex.
  • FIG. 5b is the newly generated weighted undirected graph.
  • the plurality of vertices included in the weighted undirected graph shown in Fig. 5b correspond to a plurality of encoding rules included in the encoding rule set shown in Fig. 5a.
  • the weights of the edges in the weighted undirected graph shown in Figure 5b are calculated according to the encoding rules shown in Figure 5a.
  • Fig. 5b is the last generated weighted undirected graph among the weighted undirected graphs that have been generated.
  • the weight of the edge with the largest weight in the weighted undirected graph shown in Figure 5b is 2. That is to say, the weight of the edge with the largest weight is greater than the first threshold.
  • a bitwise AND operation is performed on the encoding rules corresponding to the vertices a1&a3 and a4, and the result of the bitwise AND operation is the encoding rule corresponding to the new vertex. Substituting the vertices a1&a3 and a4 in Fig.
  • FIG. 6b is the newly generated weighted undirected graph.
  • the three vertices included in the weighted undirected graph shown in Fig. 6b correspond to the three encoding rules included in the encoding rule set shown in Fig. 6a.
  • the weights of the edges in the weighted undirected graph shown in Fig. 6b are calculated according to the encoding rules shown in Fig. 6a.
  • the weight of the edge with the largest weight in the weighted undirected graph shown in Fig. 6b is 1, that is, less than or equal to the first threshold. Therefore, the segmentation of the rule set is complete at this point.
  • the sub-rule sets corresponding to the vertices in the weighted undirected graph shown in FIG. 6b are the sub-rule sets into which the rule set shown in FIG. 3a is segmented.
  • the weighted undirected graph shown in Fig. 7b has two vertices: the vertex a1&a3&a4&a6 and the vertex a2&a5.
  • the vertices a1&a3&a4&a6 and a2&a5 correspond to two sub-rule sets, respectively. That is to say, the rule set shown in Fig. 3a is divided into two sub-rule sets, as shown in Figure 7c.
  • the sub-rule set corresponding to the vertex a1&a3&a4&a6 includes the rules a10, a30, a40, and a60, that is, {a10, a30, a40, a60}; the sub-rule set corresponding to the vertex a2&a5 includes the rules a20 and a50, that is, {a20, a50}.
  • decision trees are generated separately, by the Modular algorithm, for the two sub-rule sets shown in Figure 7c.
  • the embodiment divides the rule set shown in FIG. 3a into two sub-rule sets. For details, refer to FIG. 7c.
  • a decision tree is generated for each of the two sub-rule sets shown in Figure 7c by the Modular algorithm. See the decision tree shown in Figure 8a and the decision tree shown in Figure 8b for details.
  • the decision tree shown in Figure 9 is a decision tree generated by the Modular algorithm for the set of rules shown in Figure 3a. The position indicated by the broken line in Fig. 9 is the reference position.
  • a rule whose character at the reference position is 0 is placed on the lower-level node on the left side; a rule whose character at the reference position is 1 is placed on the lower-level node on the right side; a rule whose character at the reference position is * is placed on both the lower-level node on the left side and the lower-level node on the right side.
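  • the placement of rules on the lower-level nodes, and the rule copying it can cause, can be sketched as follows. The function name split_on_position and the three rules are assumptions made for illustration; this is not the Modular algorithm itself, only the placement step described above.

```python
def split_on_position(rules, pos):
    """Split rules into (left, right) children by the character at index pos.

    '0' goes to the left child, '1' goes to the right child, and '*' is
    copied to both children, which is the rule copying discussed in the text.
    """
    left, right = [], []
    for rule in rules:
        ch = rule[pos]
        if ch in ("0", "*"):
            left.append(rule)
        if ch in ("1", "*"):
            right.append(rule)
    return left, right

rules = ["0*1", "1*0", "*11"]  # hypothetical rules
left, right = split_on_position(rules, 0)
print(left, right)                           # ['0*1', '*11'] ['1*0', '*11']
print(len(left) + len(right) - len(rules))   # 1 copied rule
```

  • choosing index 2 instead, where none of these rules has a wildcard, gives children with no copied rule; this is why positions that are non-wildcard in all rules of a sub-rule set are preferred as reference positions.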
  • the decision tree shown in Fig. 9 is tall and has a high probability of rule copying.
  • the rules a10, a20, a30, and a40 all involve rule copying.
  • the height of the decision tree is low, and the probability of occurrence of rule copying is low.
  • the probability of rule replication in the decision trees generated by the method provided by the embodiment is low, which reduces the storage capacity required to store the nodes of the decision trees.
  • the method provided in this embodiment divides the rule set into multiple sub-rule sets, and runs an algorithm for generating a decision tree to generate a decision tree for each sub-rule set.
  • a rule set contains multiple rules.
  • the rule contained in the rule set can be the destination internet protocol address.
  • the destination internet protocol address can be 32 bits or 128 bits.
  • the network receives an internet protocol packet.
  • the flow classifier can parse the received internet protocol packet to obtain the destination internet protocol address of the internet protocol packet.
  • the flow classifier can perform rule matching in parallel in multiple decision trees according to the destination internet protocol address. If a rule matching the destination internet protocol address of the internet protocol packet is found in a plurality of decision trees, an action corresponding to the rule is performed.
  • the method for performing the matching search for each decision tree is the same as the prior art, and details are not described herein again.
  • according to the priority of each matching rule, the rule with the highest priority is determined to be the final matching rule for the internet protocol packet, and the action corresponding to the final matching rule is performed on the internet protocol packet. Actions corresponding to rules include, for example: pass, drop, perform traffic restriction, and perform bandwidth guarantee.
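  • the selection of the final matching rule can be sketched as follows. This is a sketch under several assumptions: each decision tree lookup is replaced by a linear scan of its sub-rule set, the sub-rule sets are queried one after another rather than in parallel, a larger priority number is taken to mean a higher priority, and the names Rule, matches, lookup, and classify as well as the sample rules are hypothetical.

```python
from typing import NamedTuple, Optional

class Rule(NamedTuple):
    pattern: str    # string over '0', '1', and '*'
    priority: int   # assumption: larger number means higher priority
    action: str

def matches(pattern, key):
    """True if every non-wildcard character of pattern equals the key bit."""
    return len(pattern) == len(key) and all(p in ("*", k) for p, k in zip(pattern, key))

def lookup(sub_rule_set, key) -> Optional[Rule]:
    """Stand-in for one decision tree lookup: best match in one sub-rule set."""
    hits = [r for r in sub_rule_set if matches(r.pattern, key)]
    return max(hits, key=lambda r: r.priority, default=None)

def classify(sub_rule_sets, key) -> Optional[Rule]:
    """Query every sub-rule set and keep the highest-priority match overall."""
    hits = [r for r in (lookup(s, key) for s in sub_rule_sets) if r is not None]
    return max(hits, key=lambda r: r.priority, default=None)

sub_rule_sets = [
    [Rule("10**", 1, "pass"), Rule("1***", 3, "drop")],
    [Rule("**01", 2, "perform traffic restriction")],
]
print(classify(sub_rule_sets, "1001").action)  # drop
print(classify(sub_rule_sets, "0000"))         # None
```

  • in a real flow classifier the key would be the destination internet protocol address parsed from the packet, and each lookup would traverse the decision tree built for one sub-rule set.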
  • the method provided in this embodiment may perform the segmentation process only on a newly added rule. For example, the newly added rule is encoded with the same encoding method to obtain a new encoding rule, and a weighted undirected graph is generated based on the new encoding rule and the encoding rule set corresponding to the original rule set. Based on the weighted undirected graph, it is determined whether the newly added rule belongs to one of the sub-rule sets corresponding to the original rule set or to a new sub-rule set.
  • in the former case, the sub-rule set to which the newly added rule belongs can be updated and its decision tree regenerated; in the latter case, a decision tree can be generated for the new sub-rule set. It can be seen that, with the method provided in this embodiment, when a new rule is added to the rule set, decision trees do not need to be regenerated for all the sub-rule sets into which the original rule set was divided. It is only necessary to regenerate the decision tree for the sub-rule set to which the new rule belongs, or to generate a decision tree for the new sub-rule set corresponding to the new rule.
  • FIG. 10 is a flowchart of a method for generating a decision tree according to an embodiment of the present invention. As shown in FIG. 10, the method includes:
  • the first threshold is an integer greater than or equal to 1 and less than or equal to X, where X is the number of bits in the first encoding rule. The first operation is performed cyclically until the weight of the edge with the smallest weight in the newly generated weighted undirected graph is greater than or equal to the first threshold.
  • the rule set includes multiple rules, and each rule is a character string including 0, 1, or a wildcard, and any two of the multiple rules are not equal to each other.
  • the encoding rule set includes a plurality of encoding rules, and any two of the plurality of encoding rules are not equal to each other.
  • Each of the plurality of encoding rules corresponds to at least one of the plurality of rules, and each of the plurality of rules corresponds to one of the plurality of encoding rules.
  • the encoding rule corresponding to the first rule is obtained by encoding the first rule according to the first function, where the first rule is any one of the multiple rules, and the first function is used to replace the plurality of segments in the first rule with a plurality of encoding segments, thereby obtaining a first encoding rule.
  • the first rule consists of the plurality of segments, each segment containing at least one character.
  • the first encoding rule is composed of the plurality of encoding segments, each encoding segment is one bit, the plurality of segments correspond to the plurality of encoding segments, and the position of the first segment in the first rule is the same as the position of the first encoding segment in the first encoding rule.
  • the first segment is any one of the plurality of segments.
  • the first coding segment is one of the plurality of coding segments corresponding to the first segment.
  • the first rule is a variable of the first function.
  • the first encoding rule is a value of the first function.
  • the first function is further used to calculate the first encoding segment from the first segment.
  • the first segment includes at least two characters; if the number of wildcards in the first segment is greater than or equal to N, the first encoding segment is 1; if the number of wildcards in the first segment is less than N, the first encoding segment is 0, where N is an integer greater than or equal to 1 and less than or equal to M, and M is the number of characters in the first segment.
  • the first segment includes at least two characters; if the number of wildcards in the first segment is greater than or equal to N, the first encoding segment is 0; if the number of wildcards in the first segment is less than N, the first encoding segment is 1, where N is an integer greater than or equal to 1 and less than or equal to M, and M is the number of characters in the first segment.
  • the first encoded segment is 1 or 0.
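  • segment-wise encoding with multi-character segments can be sketched as follows. The sketch assumes the variant in which a segment is encoded as 1 when it contains at least N wildcards and as 0 otherwise, and it assumes that all segments have the same length; the function name encode_segments, its parameters, and the sample rule are hypothetical.

```python
def encode_segments(rule, seg_len, n):
    """Encode a rule one segment at a time, producing one bit per segment.

    A segment becomes '1' if it holds at least n wildcards, otherwise '0'.
    seg_len plays the role of M and n plays the role of N in the text.
    """
    if len(rule) % seg_len != 0:
        raise ValueError("rule length must be a multiple of the segment length")
    if not 1 <= n <= seg_len:
        raise ValueError("n must satisfy 1 <= n <= seg_len")
    segments = [rule[i:i + seg_len] for i in range(0, len(rule), seg_len)]
    return "".join("1" if seg.count("*") >= n else "0" for seg in segments)

# Hypothetical 8-character rule split into two 4-character segments.
print(encode_segments("10**1***", seg_len=4, n=2))  # 11
print(encode_segments("10**1***", seg_len=4, n=3))  # 01
```

  • swapping the two output bits gives the opposite variant; with seg_len equal to 1 and n equal to 1, the sketch reduces to an encoding in which a wildcard becomes 1 and a non-wildcard becomes 0.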
  • the first weighted undirected graph includes a plurality of vertices, and the plurality of vertices correspond to the plurality of coding rules, and each of the plurality of vertices corresponds to one sub-rule set.
  • the sub-rule set corresponding to the first vertex includes all rules, among the plurality of rules, that correspond to the first encoding rule.
  • the first vertex is any vertex in the first weighted undirected graph.
  • the first encoding rule is an encoding rule corresponding to the first vertex among the plurality of encoding rules.
  • an edge connecting any two vertices in the first weighted undirected graph is an edge of the first weighted undirected graph.
  • the weight of the first edge is the value of the second function obtained with the second encoding rule and the third encoding rule as variables.
  • the two vertices of the first edge correspond to the second encoding rule and the third encoding rule, respectively.
  • the first edge is any edge in the first weighted undirected graph.
  • the second function is configured to perform a bitwise operation on the second encoding rule and the third encoding rule and to count the number of 1s in the result of the bitwise operation; the number of 1s in the result of the bitwise operation is the value of the second function.
  • the bitwise operation can be an AND operation, an OR operation, or an exclusive OR operation.
  • the first operation includes: generating a new vertex according to the edge with the smallest weight in the last generated weighted undirected graph among the weighted undirected graphs that have been generated, and generating a new weighted undirected graph according to the new vertex.
  • the new weighted undirected graph includes the new vertex and all vertices of the last generated weighted undirected graph other than the two vertices of the edge with the smallest weight.
  • the encoding rule corresponding to the new vertex is the value of the third function obtained with a fourth encoding rule and a fifth encoding rule as variables, where the fourth encoding rule and the fifth encoding rule correspond to the two vertices of the edge with the smallest weight in the last generated weighted undirected graph.
  • the third function is configured to perform a bitwise AND operation on the fourth encoding rule and the fifth encoding rule, and the result of the bitwise AND operation is the value of the third function.
  • the sub-rule set corresponding to the new vertex includes all rules in the sub-rule sets corresponding to the two vertices of the edge with the smallest weight in the last generated weighted undirected graph.
  • the weight of the second edge in the new weighted undirected graph is the value of the fourth function obtained with the sixth encoding rule and the seventh encoding rule as variables.
  • the two vertices of the second edge correspond to the sixth encoding rule and the seventh encoding rule, respectively.
  • the second edge is any edge in the new weighted undirected graph.
  • the fourth function is configured to perform a bitwise operation on the sixth encoding rule and the seventh encoding rule and to count the number of 1s in the result of the bitwise operation; the number of 1s in the result of the bitwise operation is the value of the fourth function.
  • the bitwise operation is an AND operation, an OR operation, an exclusive OR operation, or the like.
  • when the weighted undirected graphs that have been generated include only the first weighted undirected graph, the first weighted undirected graph is the last generated weighted undirected graph among them.
  • there may be one or more edges with the smallest weight in the last generated weighted undirected graph.
  • one edge may be randomly selected to generate a new vertex.
  • for each of the edges with the smallest weight, the total number of rules included in the two sub-rule sets corresponding to its two vertices may be calculated, and the new vertex is generated from the edge, among the multiple edges with the smallest weight, whose two sub-rule sets include the largest total number of rules.
  • new vertices (at least two) may be generated separately, and a new weighted undirected graph is generated based on the new vertices (at least two).
  • a common vertex is a vertex that is shared by two or more of the edges with the smallest weight. It should be noted that when there are multiple edges with the smallest weight in the last generated weighted undirected graph, new vertices cannot be generated separately from two edges that have a common vertex.
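  • two of the ways of handling several edges with the same smallest weight can be sketched as follows. The function names pick_by_rule_count and pick_disjoint and the sample data are assumptions made for this sketch, and the greedy, in-order selection used in pick_disjoint is only one possible way to avoid edges with a common vertex; the text does not prescribe it.

```python
def pick_by_rule_count(tied_edges, members):
    """Among tied edges, pick the one whose two sub-rule sets hold the most rules."""
    return max(tied_edges, key=lambda e: len(members[e[0]]) + len(members[e[1]]))

def pick_disjoint(tied_edges):
    """Pick tied edges in order, skipping any edge that shares a vertex
    with an edge already picked, so each picked edge can yield a new vertex."""
    used, picked = set(), []
    for u, v in tied_edges:
        if u not in used and v not in used:
            picked.append((u, v))
            used.update((u, v))
    return picked

# Hypothetical tied edges and sub-rule sets.
tied = [("A", "B"), ("B", "C"), ("D", "E")]
members = {"A": {"A0"}, "B": {"B0", "B1"}, "C": {"C0", "C1"}, "D": {"D0"}, "E": {"E0"}}
print(pick_by_rule_count(tied, members))  # ('B', 'C')
print(pick_disjoint(tied))                # [('A', 'B'), ('D', 'E')]
```

  • in the second call the edge connecting B and C is skipped because vertex B is a common vertex with the edge connecting A and B.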
  • each rule in the rule set is encoded to obtain a coding rule set.
  • the two vertices corresponding to the edge with the smallest weight in the weighted undirected graph are merged until the weight of the edge with the smallest weight in the newly generated weighted undirected graph satisfies certain conditions.
  • Each vertex in the newly generated weighted undirected graph corresponds to a sub-rule set.
  • the rule set is divided into a plurality of sub-rule sets.
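  • the overall segmentation loop can be sketched as follows, assuming the variant in which the edge weight is the number of 1s in the bitwise XOR of two encoding rules, a wildcard is encoded as 0, and merging stops once the smallest weight is greater than or equal to the first threshold. The function names, the use of one vertex per rule, the first-found tie-breaking, and the sample rules are assumptions made for this sketch.

```python
from itertools import combinations

def encode(rule):
    """Wildcard -> '0', non-wildcard -> '1'."""
    return "".join("0" if ch == "*" else "1" for ch in rule)

def xor_weight(a, b):
    """Number of 1s in the bitwise XOR of two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

def and_bits(a, b):
    return "".join("1" if x == y == "1" else "0" for x, y in zip(a, b))

def split_rule_set(rules, threshold):
    """Divide a rule set (name -> rule string) into sub-rule sets."""
    # Each vertex is a pair (encoding rule, set of rule names).
    vertices = [(encode(rule), {name}) for name, rule in rules.items()]
    while len(vertices) > 1:
        i, j = min(combinations(range(len(vertices)), 2),
                   key=lambda e: xor_weight(vertices[e[0]][0], vertices[e[1]][0]))
        if xor_weight(vertices[i][0], vertices[j][0]) >= threshold:
            break  # smallest weight has reached the first threshold
        merged = (and_bits(vertices[i][0], vertices[j][0]),
                  vertices[i][1] | vertices[j][1])
        vertices = [v for k, v in enumerate(vertices) if k not in (i, j)] + [merged]
    return [names for _, names in vertices]

rules = {"r1": "10**", "r2": "01**", "r3": "**10", "r4": "**0*"}  # hypothetical
print(split_rule_set(rules, threshold=2))
```

  • with these hypothetical rules the result contains two sub-rule sets, one holding r1 and r2 and one holding r3 and r4; a decision tree would then be generated for each sub-rule set.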
  • when an algorithm for generating a decision tree, such as HyperCuts, is run on each of the sub-rule sets, the probability of rule replication in the resulting decision trees is lower than the probability of rule replication in the decision tree obtained by running the same algorithm on the rule set. Therefore, the technical solution provided by the embodiment of the present invention reduces the probability of rule replication.
  • when an algorithm for generating a decision tree is run on each of the sub-rule sets obtained by the method provided by the embodiment of the present invention, the probability of rule replication in the resulting decision trees is lower than the probability of rule replication in the decision tree obtained by running the same algorithm on the rule set. As described above, the number of rules included in each sub-rule set is smaller than the number of rules included in the rule set. Therefore, when an algorithm for generating a decision tree is run on each of the sub-rule sets, the height of each resulting decision tree is less than or equal to the height of the decision tree generated by running the same algorithm on the rule set.
  • the lower the height of a decision tree, the faster rule matching is performed. Therefore, for the multiple decision trees obtained by the method provided by the embodiment, the speed of performing rule matching in parallel on the multiple decision trees is higher than or equal to the speed of performing rule matching on a single decision tree.
  • the bitwise operation is an AND operation, an OR operation, or an exclusive OR operation.
  • if the character in the first segment is 1, the first encoding segment is 1; if the character in the first segment is 0, the first encoding segment is 1; if the character in the first segment is a wildcard, the first encoding segment is 0.
  • if the character in the first segment is 1, the first encoding segment is 0; if the character in the first segment is 0, the first encoding segment is 0; if the character in the first segment is a wildcard, the first encoding segment is 1.
  • the generating a decision tree for each of the sub-rule sets corresponding to each vertex in the newly generated weighted undirected graph includes:
  • a decision tree is generated for each of the sub-rule sets corresponding to each vertex in the newly generated weighted undirected graph by the HyperCuts algorithm.
  • the following takes as an example the case in which the bitwise operation is an exclusive OR operation, the first segment includes one character, and the non-wildcards and wildcards in the rule are encoded as 1 and 0, respectively, to specifically analyze the technical solution provided by the embodiment. That is, if the character in the first segment is 1, the first encoding segment is 1; if the character in the first segment is 0, the first encoding segment is 1; if the character in the first segment is a wildcard, the first encoding segment is 0. In other words, the first segment can be 1, 0, or a wildcard. For convenience of explanation, the two cases in which the first segment is 1 and the first segment is 0 are collectively referred to below as the first segment being a non-wildcard.
  • a decision tree is generated according to the HyperCuts algorithm for a plurality of sub-rule sets obtained through rule segmentation.
  • Rule segmentation involves rules, encoding rules, and bitwise XOR operations.
  • the following describes the relationship between rules, coding rules, and bitwise XOR operations:
  • the two rules are separately encoded to obtain two encoding rules.
  • two characters at the same position in the two rules are both encoded as 1 if they are both non-wildcards. That is to say, after the two rules are encoded, the two bits at the corresponding position in the two encoding rules are both 1. Therefore, a bitwise exclusive OR operation on these two bits yields 0.
  • the two rules are separately encoded to obtain two encoding rules.
  • two characters at the same position in the two rules are encoded as 1 and 0, respectively, if one is a non-wildcard and the other is a wildcard. That is to say, after the two rules are encoded, the two bits at the corresponding position in the two encoding rules are 1 and 0, respectively. Therefore, a bitwise XOR operation on these two bits yields 1.
  • the two rules are separately encoded to obtain two encoding rules.
  • two characters at the same position in the two rules are both encoded as 0 if they are both wildcards. That is to say, after the two rules are encoded, the two bits at the corresponding position in the two encoding rules are both 0. Therefore, a bitwise XOR operation on these two bits yields 0.
  • the first case is: of the two characters at the same position in the two rules, one is a wildcard and the other is a non-wildcard. If a position at which the first case occurs is selected as the reference position, rule copying occurs.
  • the weight of an edge of the weighted undirected graph is the number of 1s in the bit string obtained by performing a bitwise XOR operation on the two encoding rules. That is, the weight reflects the number of times the first case occurs. Obviously, the fewer times the first case occurs, the more positions can be used as the reference position without causing rule copying. Selecting such a reference position for rule segmentation avoids rule copying.
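  • the relationship between the XOR weight and the first case can be checked with a short sketch. The function names encode, xor_weight, and first_case_count and the two sample rules are assumptions; the parameter wildcard_bit is introduced here only to show that swapping the two code values leaves the XOR weight unchanged.

```python
def encode(rule, wildcard_bit="0"):
    """Encode wildcards as wildcard_bit and non-wildcards as the other bit."""
    other = "1" if wildcard_bit == "0" else "0"
    return "".join(wildcard_bit if ch == "*" else other for ch in rule)

def xor_weight(code_a, code_b):
    """Number of 1s in the bitwise XOR of two equal-length encoding rules."""
    return sum(x != y for x, y in zip(code_a, code_b))

def first_case_count(rule_a, rule_b):
    """Positions where exactly one of the two rules has a wildcard."""
    return sum((x == "*") != (y == "*") for x, y in zip(rule_a, rule_b))

rule_a, rule_b = "1*0*", "**01"  # hypothetical rules
print(xor_weight(encode(rule_a), encode(rule_b)))            # 2
print(first_case_count(rule_a, rule_b))                      # 2
print(xor_weight(encode(rule_a, "1"), encode(rule_b, "1")))  # 2
```

  • the three values agree because a position contributes a 1 to the XOR result exactly when one rule has a wildcard there and the other does not, whichever of the two code values is assigned to the wildcard.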
  • the following takes as an example the case in which the bitwise operation is an exclusive OR operation, the first segment contains one character, and the non-wildcards and wildcards in the rule are encoded as 0 and 1, respectively, to specifically analyze the technical solution provided by the embodiment. That is, if the character in the first segment is 1, the first encoding segment is 0; if the character in the first segment is 0, the first encoding segment is 0; if the character in the first segment is a wildcard, the first encoding segment is 1. In other words, the first segment can be 1, 0, or a wildcard. For convenience of explanation, the two cases in which the first segment is 1 and the first segment is 0 are collectively referred to below as the first segment being a non-wildcard.
  • a decision tree is generated according to the HyperCuts algorithm for a plurality of sub-rule sets obtained through rule segmentation.
  • Rule segmentation involves rules, encoding rules, and bitwise XOR operations. The following describes the relationship between rules, encoding rules, and bitwise XOR operations:
  • the two rules are separately encoded to obtain two encoding rules.
  • two characters at the same position in the two rules are both encoded as 0 if they are both non-wildcards. That is to say, after the two rules are encoded, the two bits at the corresponding position in the two encoding rules are both 0. Therefore, a bitwise XOR operation on these two bits yields 0.
  • the two rules are separately encoded to obtain two encoding rules.
  • two characters at the same position in the two rules are encoded as 0 and 1, respectively, if one is a non-wildcard and the other is a wildcard. That is to say, after the two rules are encoded, the two bits at the corresponding position in the two encoding rules are 0 and 1, respectively. Therefore, a bitwise XOR operation on these two bits yields 1.
  • the two rules are separately encoded to obtain two encoding rules.
  • two characters at the same position in the two rules are both encoded as 1 if they are both wildcards. That is to say, after the two rules are encoded, the two bits at the corresponding position in the two encoding rules are both 1. Therefore, a bitwise XOR operation on these two bits yields 0.
  • the number of 1s in the result is the number of times the second case occurs in the two rules.
  • the second case is: of the two characters at the same position in the two rules, one is a wildcard and the other is a non-wildcard. If a position at which the second case occurs is selected as the reference position, rule copying occurs.
  • the weight of an edge of the weighted undirected graph is the number of 1s in the bit string obtained by performing a bitwise XOR operation on the two encoding rules. That is, the weight reflects the number of times the second case occurs. Obviously, the fewer times the second case occurs, the more positions can be used as the reference position without causing rule copying. Selecting such a reference position for rule segmentation avoids rule copying.
  • when an algorithm for generating a decision tree is run on each of the sub-rule sets obtained by the method provided by the embodiment of the present invention, the probability of rule replication in the resulting decision trees is lower than the probability of rule replication in the decision tree obtained by running the same algorithm on the rule set. As described above, the number of rules included in each sub-rule set is smaller than the number of rules included in the rule set. Therefore, when an algorithm for generating a decision tree is run on each of the sub-rule sets, the height of each resulting decision tree is less than or equal to the height of the decision tree generated by running the same algorithm on the rule set.
  • the speed of performing rule matching in parallel on the multiple decision trees is higher than or equal to the speed of performing rule matching on a single decision tree.
  • HyperCuts is an algorithm for generating decision trees. HyperCuts builds a decision tree by dividing different segments of the rule into geometric spaces of different dimensions.
  • Figure 11a is a schematic diagram of the mapping of the six rules in the rule set shown in Figure 11b to a two-dimensional geometric space in decimal units. Specifically, the rule set shown in Fig. 11b contains six rules, and the upper 4 bits and lower 4 bits of each rule are mapped to the X axis and the Y axis, respectively, of the two-dimensional geometric space shown in Fig. 11a.
  • Figure 11b is a schematic diagram of a set of rules and a corresponding set of encoding rules.
  • the rule set includes rules A0, B0, C0, D0, E0, and F0. Each rule is 8 bits in length.
  • the rules A0, B0, C0, D0, E0, and F0 in the rule set correspond to the encoding rules A, B, C, D, E, and F in the encoding rule set, respectively.
  • the encoding method is specifically as follows: a non-wildcard is encoded as 1 and a wildcard is encoded as 0.
  • FIG. 11c is a first weighted undirected graph according to an embodiment of the present invention.
  • the vertices in the first weighted undirected graph shown in Fig. 11c are A, B, C, D, E, and F, respectively.
  • Vertices A, B, C, D, E, and F each correspond to one sub-rule set.
  • the sub-rule set corresponding to vertex A contains rule A0.
  • the set of sub-rules corresponding to vertex B contains rule B0.
  • the set of sub-rules corresponding to vertex C contains rule C0.
  • the set of sub-rules corresponding to vertex D contains rule D0.
  • the set of sub-rules corresponding to vertex E contains rule E0.
  • the sub-rule set corresponding to the vertex F contains the rule F0.
  • the first weighted undirected graph contains six vertices: A, B, C, D, E, and F. The vertices A, B, C, D, E, and F in Fig. 11c correspond to the encoding rules A, B, C, D, E, and F in Fig. 11b, respectively.
  • the first threshold may be any integer from 1 to 8. This embodiment sets the first threshold to 2.
  • the result of performing a bitwise XOR operation is a bit string, and the number of 1s in the bit string is the weight of the edge to which any two vertices are connected.
  • the corresponding coding rules for vertices A and C are 11101110 and 11101110, respectively.
  • Perform a bitwise XOR operation on the corresponding encoding rules of the vertices A and C The result is 00000000.
  • the number of 1s in the bit string is 0. That is to say, the edges of the vertices A and C are connected to each other with a weight of zero.
  • the edges of the vertices A and B are connected to each other with a weight of zero.
  • the edges of the vertex B and C connections have a weight of 0.
  • the weights of other edges are shown in Figure 11c, and are not described here. It can be determined from Fig. 11c that the edge of the smallest weight has a weight of zero. If the weight of the edge with the smallest weight is less than the first threshold, then the edge used to generate the new vertex needs to be determined.
  • any one of the edge connecting vertices A and C, the edge connecting vertices A and B, or the edge connecting vertices B and C can be randomly chosen as the edge for generating a new vertex.
  • in this example, the edge connecting vertices A and C is selected as the edge for generating a new vertex.
  • the weighted undirected graph shown in FIG. 12b is generated according to the new vertex A&C.
  • the weighted undirected graph shown in FIG. 12b is the last generated weighted undirected graph in the weighted undirected graph that has been generated.
  • the coding rule corresponding to the vertex A&C is the result of performing a bitwise AND operation on the coding rules corresponding to the two vertices of the edge used to generate the new vertex A&C, that is, the coding rules corresponding to vertex A and vertex C, respectively.
  • the result of the bitwise AND operation is 11101110.
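  • As a minimal sketch of this merge step (the function name `merge_coding_rules` is an assumption for illustration), the coding rule of a new vertex can be computed as the bitwise AND of the two coding rules, keeping the result as a fixed-width bit string:

```python
def merge_coding_rules(rule_a: str, rule_b: str) -> str:
    """Bitwise AND of two equal-length bit strings, preserving leading zeros."""
    if len(rule_a) != len(rule_b):
        raise ValueError("coding rules must have the same length")
    return "".join(
        "1" if x == "1" and y == "1" else "0" for x, y in zip(rule_a, rule_b)
    )


# Coding rules of vertices A and C as given in the text.
merged_ac = merge_coding_rules("11101110", "11101110")
```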
  • the vertices other than vertex A&C in the weighted undirected graph shown in Fig. 12b are vertices of the weighted undirected graph shown in Fig. 11c, specifically vertices B, D, E, and F.
  • for the coding rules corresponding to the five vertices in the weighted undirected graph shown in Fig. 12b, see Fig. 12a.
  • the bitwise XOR operation is performed according to the coding rules corresponding to any two vertices in the weighted undirected graph shown in Fig. 12b.
  • the result of the bitwise XOR operation is a bit string, and the number of 1s in the bit string is the weight of the edge connecting the two vertices.
  • the edge of vertex A&C connected to vertex B has a weight of 0. See Figure 12b for the weights of the other edges in the weighted undirected graph shown in Figure 12b.
  • the edge with the smallest weight in the weighted undirected graph shown in Fig. 12b is the edge where the vertex A&C is connected to the vertex B.
  • the weight of the edge with the smallest weight is less than the first threshold. The edge connecting vertex A&C to vertex B is therefore determined to be the edge used to generate the new vertex.
  • the weighted undirected graph shown in Fig. 13b is generated on the basis of the vertices A&C&B and the weighted undirected graph shown in Fig. 12b. At this time, the weighted undirected graph shown in Fig. 13b is the last generated weighted undirected graph in the weighted undirected graph that has been generated.
  • the four vertices included in the weighted undirected graph shown in Fig. 13b are vertices A&C&B, D, E, and F, respectively.
  • the four encoding rules in the set of encoding rules shown in Figure 13a correspond to the four vertices in the weighted undirected graph shown in Figure 13b.
  • a bitwise AND operation is performed on the coding rules corresponding to vertices A&C and B of the weighted undirected graph shown in Fig. 12b, and the result of the bitwise AND operation is the coding rule corresponding to the vertex A&C&B.
  • a bitwise XOR operation is performed on the coding rules corresponding to any two vertices in the weighted undirected graph shown in Fig. 13b.
  • the result of the bitwise XOR operation is a bit string, and the number of 1s in the bit string is the weight of the edge connecting the two vertices.
  • the edge connecting vertex E to vertex F has a weight of 1. See Figure 13b for the weights of the other edges in the weighted undirected graph shown in Figure 13b.
  • the edge with the smallest weight in the weighted undirected graph shown in Fig. 13b is the edge where the vertex E is connected to the vertex F.
  • the weight of the edge with the smallest weight is less than the first threshold.
  • the edge connecting the vertex E to the vertex F is determined to be the edge for generating a new vertex.
  • a new vertex E&F is generated.
  • the weighted undirected graph shown in Fig. 14b is generated on the basis of the vertex E&F and the weighted undirected graph shown in Fig. 13b.
  • the three vertices included in the weighted undirected graph shown in Fig. 14b are vertices A&C&B, E&F, and D, respectively.
  • the three encoding rules in the set of encoding rules shown in Figure 14a correspond to the three vertices in the weighted undirected graph shown in Figure 14b.
  • a bitwise AND operation is performed on the two coding rules corresponding to the vertices E and F of the weighted undirected graph shown in Fig. 13b, and the result of the bitwise AND operation is the coding rule corresponding to the vertex E&F.
  • the bitwise XOR operation is performed according to the coding rules corresponding to any two vertices in the weighted undirected graph shown in Fig. 14b.
  • the result of the bitwise XOR operation is a bit string, and the number of 1s in the bit string is the weight of the edge connecting the two vertices.
  • the edge of vertex A&C&B connected to vertex D has a weight of 3. See Figure 14b for the weights of the other edges in the weighted undirected graph shown in Figure 14b.
  • the weighted undirected graph shown in Fig. 14b is the newly generated weighted undirected graph.
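  • The merging procedure walked through above can be sketched as a loop, under stated assumptions. The coding rule 11101110 for vertices A and C comes from the text, and the same value is used for B because its edges to A and C have weight 0. The coding rules for D, E, and F are hypothetical values chosen only so that the edge weights quoted in the text (1 for E and F, 3 for A&C&B and D) are reproduced. The function names are illustrative, and ties are broken by taking the first pair found rather than a random one, so the merged vertex is named C&A&B here rather than A&C&B.

```python
from itertools import combinations


def xor_weight(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def bit_and(a: str, b: str) -> str:
    """Bitwise AND of two equal-length bit strings."""
    return "".join("1" if x == y == "1" else "0" for x, y in zip(a, b))


def partition(vertices, threshold):
    """Merge the minimum-weight edge until no edge weight is below threshold.

    vertices maps a vertex name to (coding rule, list of rule names).
    """
    graph = dict(vertices)
    while len(graph) > 1:
        u, v = min(
            combinations(graph, 2),
            key=lambda e: xor_weight(graph[e[0]][0], graph[e[1]][0]),
        )
        if xor_weight(graph[u][0], graph[v][0]) >= threshold:
            break
        code_u, rules_u = graph.pop(u)
        code_v, rules_v = graph.pop(v)
        graph[f"{u}&{v}"] = (bit_and(code_u, code_v), rules_u + rules_v)
    return graph


initial = {
    "A": ("11101110", ["A0"]),
    "B": ("11101110", ["B0"]),  # inferred from the weight-0 edges to A and C
    "C": ("11101110", ["C0"]),
    "D": ("11100000", ["D0"]),  # hypothetical coding rule
    "E": ("00011111", ["E0"]),  # hypothetical coding rule
    "F": ("00011110", ["F0"]),  # hypothetical coding rule
}
result = partition(initial, threshold=2)
sub_rule_sets = sorted(sorted(rules) for _, rules in result.values())
```

  • With these inputs the loop stops with three vertices whose sub-rule sets are {A0, B0, C0}, {D0}, and {E0, F0}, matching the grouping shown in Fig. 14b.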
  • a decision tree is generated for each of the sub-rule sets corresponding to each vertex in the newly generated weighted undirected graph.
  • the newly generated weighted undirected graph consists of three vertices, vertex A&C&B, vertex E&F, and vertex D.
  • the sub-rule set corresponding to the vertex A&C&B includes rules A0, B0, and C0.
  • Figure 15a is a schematic diagram of the mapping of rules A0, B0, and C0 to a two-dimensional geometric space measured in decimal.
  • Figure 15b is a schematic diagram of a decision tree generated for the set of sub-rules corresponding to the vertex A&C&B according to the HyperCuts algorithm. It can be seen that rule replication does not occur for rules A0, B0, and C0.
  • the sub-rule set corresponding to the vertex E&F includes rules E0 and F0.
  • Figure 16a is a schematic diagram of rules E0 and F0 mapped to a two-dimensional geometric space measured in decimal.
  • Figure 16b is a schematic diagram of a decision tree generated for the set of sub-rules corresponding to vertex E&F according to the HyperCuts algorithm. It can be seen that rule replication does not occur for rules E0 and F0.
  • the sub-rule set corresponding to the vertex D includes the rule D0.
  • Figure 17a is a schematic diagram of rule D0 mapped to a two-dimensional geometric space measured in decimal.
  • Figure 17b is a schematic diagram of a decision tree generated for the set of sub-rules corresponding to vertex D according to the HyperCuts algorithm. It can be seen that rule replication does not occur for rule D0.
  • Figure 18a is a diagram showing the mapping of the multiple rules in the rule set shown in Figure 11b to a two-dimensional geometric space measured in decimal.
  • Figure 18b is a diagram of a decision tree generated for the rule set shown in Figure 11b according to the HyperCuts algorithm.
  • FIG. 19 is a schematic structural diagram of a device for generating a decision tree according to an embodiment of the present invention.
  • the device can be implemented by the method provided in the first embodiment.
  • the apparatus includes: an encoding processing unit 191, configured to generate an encoding rule set according to a rule set; the rule set includes a plurality of rules, and each rule is a character string including 0, 1, or a wildcard.
  • the encoding rule set includes a plurality of encoding rules, and any two of the plurality of encoding rules are not equal to each other; each of the plurality of encoding rules corresponds to at least one of the plurality of rules, and each of the plurality of rules corresponds to one of the plurality of encoding rules;
  • the encoding rule corresponding to the first rule is obtained by encoding the first rule according to the first function; the first rule is any one of the plurality of rules, and the first function is used to replace a plurality of segments in the first rule with a plurality of encoded segments, thereby obtaining a first encoding rule;
  • the first rule consists of the plurality of segments, each segment comprising at least one character;
  • the first encoding rule consists of the plurality of encoded segments, each encoded segment is one bit, and the plurality of segments correspond to the plurality of encoded segments; the position of the first segment in the first rule is consistent with the position of the first encoded segment in the first encoding rule;
  • a first weighted undirected graph generating unit 192 configured to generate a first weighted undirected graph according to the encoding rule set generated by the encoding processing unit 191;
  • the first weighted undirected graph includes a plurality of vertices corresponding to the plurality of encoding rules, and each of the plurality of vertices corresponds to a sub-rule set; the sub-rule set corresponding to the first vertex includes all rules, among the plurality of rules, that correspond to the first encoding rule; the first vertex is any one vertex of the first weighted undirected graph;
  • the first encoding rule is the encoding rule, among the plurality of encoding rules, that corresponds to the first vertex;
  • the edge weight calculation unit 193 is configured to calculate, according to the first weighted undirected graph generated by the first weighted undirected graph generating unit 192, the weight of each edge of the first weighted undirected graph; an edge connecting any two vertices in the first weighted undirected graph is an edge of the first weighted undirected graph; the weight of the first edge is the value of the second function obtained by using a second encoding rule and a third encoding rule as variables; the two vertices of the first edge correspond to the second encoding rule and the third encoding rule, respectively; the first edge is any one edge of the first weighted undirected graph; the second function is configured to perform a bitwise operation on the second encoding rule and the third encoding rule and to count the number of 1s in the result of the bitwise operation, and the number of 1s in the result of the bitwise operation is the value of the second function; the comparison unit 194 is configured to perform a comparison according to the calculation result of the edge weight calculation
  • the weighted undirected graph includes all of the vertices except the two vertices of the edge having the largest weight in the last generated weighted undirected graph in the generated weighted undirected graph;
  • the encoding rule corresponding to the new vertex is the value of the third function obtained by using, as variables, the fourth encoding rule and the fifth encoding rule respectively corresponding to the two vertices of the edge with the largest weight in the last generated weighted undirected graph among the generated weighted undirected graphs; the third function is configured to perform a bitwise AND operation on the fourth encoding rule and the fifth encoding rule, and the result of the bitwise AND operation is the value of the third function; the sub-rule set corresponding to the new vertex includes all the rules in the sub-rule sets respectively corresponding to the two vertices of the edge with the largest weight in the last generated weighted undirected graph among the generated weighted undirected graphs; the weight of the second edge in the new weighted undirected graph is the value of the fourth function obtained by using the sixth encoding rule and the seventh encoding rule as variables; the two vertices of the second edge correspond to the sixth encoding rule and the seventh encoding rule, respectively; the second edge is any one edge of the new weighted undirected graph; the fourth function is configured to perform a bitwise operation on the sixth encoding rule and the seventh encoding rule, and calculate
  • the decision tree generating unit 195 is configured to generate a decision tree for each sub-rule set corresponding to each vertex in the newly generated weighted undirected graph according to the trigger signal sent by the comparing unit 194.
  • when the generated weighted undirected graphs include only the first weighted undirected graph, the first weighted undirected graph is the last generated weighted undirected graph among the generated weighted undirected graphs.
  • the last generated weighted undirected graph among the generated weighted undirected graphs may contain only one edge with the largest weight, or may contain more than one.
  • one edge may be randomly selected to generate a new vertex.
  • the sum of the number of rules included in the two sub-rule sets corresponding to the two vertices of each edge may be calculated separately, and the new vertex is generated from the edge, among the multiple edges with the largest weight, whose two vertices correspond to sub-rule sets containing the largest total number of rules.
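  • One way to sketch this tie-breaking rule is shown below. The function name `pick_edge`, the data layout, and the example weights and sub-rule sets are all hypothetical and serve only to illustrate the selection.

```python
def pick_edge(edges, weights, sub_rule_sets):
    """Among the edges with the largest weight, pick the one whose two
    vertices together hold the most rules.

    edges: list of (u, v) pairs
    weights: dict mapping each edge to its weight
    sub_rule_sets: dict mapping each vertex to its list of rules
    """
    best = max(weights[e] for e in edges)
    candidates = [e for e in edges if weights[e] == best]
    return max(
        candidates,
        key=lambda e: len(sub_rule_sets[e[0]]) + len(sub_rule_sets[e[1]]),
    )


# Hypothetical graph: two edges tie at the largest weight 5.
edges = [("A", "B"), ("C", "D"), ("A", "D")]
weights = {("A", "B"): 5, ("C", "D"): 5, ("A", "D"): 3}
sub_rule_sets = {"A": ["r1"], "B": ["r2"], "C": ["r3", "r4"], "D": ["r5"]}
chosen = pick_edge(edges, weights, sub_rule_sets)
```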
  • new vertices (at least two) may be generated separately, and a new weighted undirected graph may be generated based on the new vertices (at least two).
  • a common vertex is a vertex that is shared by two or more of the edges with the largest weight. It should be noted that when there are multiple edges with the largest weight in the last generated weighted undirected graph among the generated weighted undirected graphs, new vertices cannot be generated separately for two edges that share a common vertex.
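  • The common-vertex constraint can be illustrated with a simple greedy selection, which is only one possible strategy and is not stated in the embodiment. The function name `disjoint_edges` and the example edges are assumptions.

```python
def disjoint_edges(candidates):
    """Keep candidate edges in order, skipping any edge that shares a
    vertex with an edge already kept."""
    used = set()
    chosen = []
    for u, v in candidates:
        if u not in used and v not in used:
            chosen.append((u, v))
            used.update((u, v))
    return chosen


# Hypothetical edges that all have the largest weight; B is a common vertex.
tied = [("A", "B"), ("B", "C"), ("D", "E")]
mergeable = disjoint_edges(tied)
```

  • Each edge kept by this selection can then produce its own new vertex in the same round, because no two kept edges share a vertex.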
  • each rule in the rule set is encoded to obtain a coding rule set.
  • the two vertices corresponding to the edge with the largest weight in the weighted undirected graph are merged until the weight of the edge with the largest weight in the newly generated weighted undirected graph satisfies certain conditions.
  • Each vertex in the newly generated weighted undirected graph corresponds to a sub-rule set.
  • the rule set is divided into a plurality of sub-rule sets.
  • when an algorithm for generating a decision tree, such as Modular, is run on each of the plurality of sub-rule sets, the probability of rule replication in the resulting plurality of decision trees is lower than the probability of rule replication in the decision tree obtained by running the same algorithm on the rule set. Therefore, the technical solution provided by the embodiment of the present invention reduces the probability of rule replication.
  • likewise, when an algorithm for generating a decision tree is run separately on the plurality of sub-rule sets obtained by using the apparatus provided by the embodiment of the present invention, the probability of rule replication in the resulting plurality of decision trees is lower than the probability of rule replication in the decision tree obtained by running the same algorithm on the rule set. It can be determined from the above that each sub-rule set contains fewer rules than the rule set. Therefore, when a given algorithm generates a decision tree for each of the plurality of sub-rule sets, the height of each resulting decision tree is less than or equal to the height of the decision tree generated by the same algorithm for the rule set.
  • the speed of performing rule matching in parallel on the multiple decision trees is higher than or equal to the speed of performing rule matching on a single decision tree.
  • if the character in the first segment is 1, the first encoded segment is 1; if the character in the first segment is 0, the first encoded segment is 1; if the character in the first segment is a wildcard, the first encoded segment is 0.
  • the bitwise operation can be an AND operation, an OR operation, or an exclusive OR (XOR) operation.
  • the decision tree generating unit is configured to separately generate a decision tree for the sub-rule set corresponding to each vertex in the newly generated weighted undirected graph by a Modular algorithm.
  • the technical solution provided by this embodiment is analyzed below, taking as an example the case in which the bitwise operation is an AND operation and the first segment contains one character. That is, if the character in the first segment is 1, the first encoded segment is 1; if the character in the first segment is 0, the first encoded segment is 1; if the character in the first segment is a wildcard, the first encoded segment is 0. In other words, the first segment can be 1, 0, or a wildcard.
  • for convenience of explanation, the two cases of the first segment being 1 and the first segment being 0 are collectively referred to below as the first segment being a non-wildcard.
  • the first function is for encoding the first segment as the first encoded segment.
  • for each rule in the rule set, the non-wildcards in the rule are encoded as 1 and the wildcards in the rule are encoded as 0.
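  • A minimal sketch of this first function for one-character segments follows. It assumes that the wildcard is written as "*"; the function name `encode_rule` and the example rule are hypothetical.

```python
WILDCARD = "*"  # assumed wildcard symbol


def encode_rule(rule: str) -> str:
    """Encode each one-character segment: non-wildcard -> 1, wildcard -> 0."""
    for ch in rule:
        if ch not in ("0", "1", WILDCARD):
            raise ValueError(f"unexpected character in rule: {ch!r}")
    return "".join("0" if ch == WILDCARD else "1" for ch in rule)


# Hypothetical 8-character rule; each position maps to one encoded bit.
encoded = encode_rule("10*1**01")
```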
  • in the bit string obtained by performing a bitwise AND operation on two encoding rules, if the bit at a certain position is 1, both bits at the corresponding position in the two rules are non-wildcards. When this position is used as the reference position, rule replication does not occur. If the bit at a certain position is 0, at least one of the two bits at the corresponding position in the two rules is a wildcard. When this position is used as the reference position, rule replication is inevitable.
  • the weight of an edge of the weighted undirected graph is the number of 1s in the bit string obtained by performing a bitwise AND operation on the two encoding rules. Therefore, the larger the weight, the more positions in the two encoding rules can be used as the reference position without causing rule replication. Selecting such a position as the reference position avoids rule replication.
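  • The relationship between the AND result and the usable reference positions can be sketched as follows. The function names and the two example encoding rules are hypothetical; positions are counted from 0 at the left.

```python
def and_weight(code_a: str, code_b: str) -> int:
    """Number of 1s in the bitwise AND of two equal-length encoding rules."""
    return sum(x == "1" and y == "1" for x, y in zip(code_a, code_b))


def safe_positions(code_a: str, code_b: str) -> list[int]:
    """Positions where both rules hold a non-wildcard, i.e. the AND bit is 1."""
    return [
        i
        for i, (x, y) in enumerate(zip(code_a, code_b))
        if x == "1" and y == "1"
    ]


# Hypothetical encoding rules (1 = non-wildcard, 0 = wildcard).
code_a, code_b = "11010011", "11110000"
weight = and_weight(code_a, code_b)
positions = safe_positions(code_a, code_b)
```

  • The weight always equals the number of positions returned, which is why a larger weight leaves more candidate reference positions.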
  • FIG. 20 is a schematic structural diagram of another apparatus for generating a decision tree according to an embodiment of the present invention.
  • the device can be implemented by the method provided in the first embodiment.
  • the apparatus includes: an encoding processing unit 201, configured to generate an encoding rule set according to a rule set; the rule set includes multiple rules, and each rule is a character string including 0, 1, or a wildcard.
  • the encoding rule set includes a plurality of encoding rules, and any two of the plurality of encoding rules are not equal to each other; each of the plurality of encoding rules corresponds to at least one of the plurality of rules, and each of the plurality of rules corresponds to one of the plurality of encoding rules;
  • the encoding rule corresponding to the first rule is obtained by encoding the first rule according to the first function; the first rule is any one of the plurality of rules, and the first function is used to replace a plurality of segments in the first rule with a plurality of encoded segments, thereby obtaining a first encoding rule;
  • the first rule consists of the plurality of segments, each segment comprising at least one character;
  • the first encoding rule consists of the plurality of encoded segments, each encoded segment is one bit, and the plurality of segments correspond to the plurality of encoded segments; the position of the first segment in the first rule is consistent with the position of the first encoded segment in the first encoding rule;
  • the first segment is a bit
  • a first weighted undirected graph generating unit 202 configured to generate a first weighted undirected graph according to the encoding rule set generated by the encoding processing unit 201;
  • the first weighted undirected graph includes a plurality of vertices corresponding to the plurality of encoding rules, and each of the plurality of vertices corresponds to a sub-rule set; the sub-rule set corresponding to the first vertex includes all rules, among the plurality of rules, that correspond to the first encoding rule; the first vertex is any one vertex of the first weighted undirected graph;
  • the first encoding rule is the encoding rule, among the plurality of encoding rules, that corresponds to the first vertex;
  • the edge weight calculation unit 203 is configured to calculate, according to the first weighted undirected graph generated by the first weighted undirected graph generating unit 202, the weight of each edge of the first weighted undirected graph; an edge connecting any two vertices in the first weighted undirected graph is an edge of the first weighted undirected graph; the weight of the first edge is the value of the second function obtained by using a second encoding rule and a third encoding rule as variables; the two vertices of the first edge correspond to the second encoding rule and the third encoding rule, respectively; the first edge is any one edge of the first weighted undirected graph; the second function is configured to perform a bitwise operation on the second encoding rule and the third encoding rule and to count the number of 1s in the result of the bitwise operation, and the number of 1s in the result of the bitwise operation is the value of the second function; the comparing unit 204 is configured to perform a comparison according to the calculation result of the edge weight
  • the encoding rule corresponding to the new vertex is the value of the third function obtained by using, as variables, the fourth encoding rule and the fifth encoding rule respectively corresponding to the two vertices of the edge with the smallest weight in the last generated weighted undirected graph among the generated weighted undirected graphs; the third function is configured to perform a bitwise AND operation on the fourth encoding rule and the fifth encoding rule, and the result of the bitwise AND operation is the value of the third function;
  • the sub-rule set corresponding to the new vertex includes all the sub-rule sets corresponding
  • the decision tree generating unit 205 is configured to generate a decision tree for each sub-rule set corresponding to each vertex in the newly generated weighted undirected graph according to the trigger signal sent by the comparing unit 204.
  • when the generated weighted undirected graphs include only the first weighted undirected graph, the first weighted undirected graph is the last generated weighted undirected graph among the generated weighted undirected graphs.
  • one edge may be randomly selected to generate a new vertex.
  • the sum of the number of rules included in the two sub-rule sets corresponding to the two vertices of each edge may be calculated separately, and the new vertex is generated from the edge, among the multiple edges with the smallest weight, whose two vertices correspond to sub-rule sets containing the largest total number of rules.
  • new vertices (at least two) may be generated separately, and a new weighted undirected graph may be generated based on the new vertices (at least two).
  • a common vertex is a vertex that is shared by two or more of the edges with the smallest weight. It should be noted that when there are multiple edges with the smallest weight in the last generated weighted undirected graph among the generated weighted undirected graphs, new vertices cannot be generated separately for two edges that share a common vertex.
  • each rule in the rule set is encoded to obtain a coding rule set.
  • the two vertices corresponding to the edge with the smallest weight in the weighted undirected graph are merged until the weight of the edge with the smallest weight in the newly generated weighted undirected graph satisfies certain conditions.
  • Each vertex in the newly generated weighted undirected graph corresponds to a sub-rule set.
  • the rule set is divided into a plurality of sub-rule sets.
  • when an algorithm for generating a decision tree, such as Modular, is run on each of the plurality of sub-rule sets, the probability of rule replication in the resulting plurality of decision trees is lower than the probability of rule replication in the decision tree obtained by running the same algorithm on the rule set. Therefore, the technical solution provided by the embodiment of the present invention reduces the probability of rule replication.
  • likewise, when an algorithm for generating a decision tree is run separately on the plurality of sub-rule sets obtained by using the apparatus provided by the embodiment of the present invention, the probability of rule replication in the resulting plurality of decision trees is lower than the probability of rule replication in the decision tree obtained by running the same algorithm on the rule set. It can be determined from the above that each sub-rule set contains fewer rules than the rule set. Therefore, when a given algorithm generates a decision tree for each of the plurality of sub-rule sets, the height of each resulting decision tree is less than or equal to the height of the decision tree generated by the same algorithm for the rule set.
  • the speed of performing rule matching in parallel on the multiple decision trees is higher than or equal to the speed of performing rule matching on a single decision tree.
  • the bitwise operation is an AND operation, an OR operation, or an exclusive OR (XOR) operation.
  • if the character in the first segment is 1, the first encoded segment is 0; if the character in the first segment is 0, the first encoded segment is 0; if the character in the first segment is a wildcard, the first encoded segment is 1.
  • the decision tree generating unit is configured to separately generate a decision tree for the sub-rule set corresponding to each vertex in the newly generated weighted undirected graph by a Modular algorithm.
  • the technical solution provided by this embodiment is analyzed below, taking as an example the case in which the bitwise operation is an OR operation and the first segment contains one character. That is, if the character in the first segment is 1, the first encoded segment is 0; if the character in the first segment is 0, the first encoded segment is 0; if the character in the first segment is a wildcard, the first encoded segment is 1. In other words, the first segment can be 1, 0, or a wildcard. For convenience of explanation, the two cases of the first segment being 1 and the first segment being 0 are collectively referred to below as the first segment being a non-wildcard.
  • the first function is for encoding the first segment as the first encoded segment.
  • for each rule in the rule set, the non-wildcards in the rule are encoded as 0 and the wildcards in the rule are encoded as 1.
  • in the bit string obtained by performing a bitwise OR operation on two encoding rules, if the bit at a certain position is 1, at least one of the two bits at the corresponding position in the two rules is a wildcard. When this position is used as the reference position, rule replication occurs. If the bit at a certain position is 0, both bits at the corresponding position in the two rules are non-wildcards. When this position is used as the reference position, rule replication does not occur.
  • the weight of an edge of the weighted undirected graph is the number of 1s in the bit string obtained by performing a bitwise OR operation on the two encoding rules. Therefore, the smaller the weight, the more positions in the two encoding rules can be used as the reference position without causing rule replication. Selecting such a position as the reference position avoids rule replication.
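  • The OR-based variant can be sketched in the same way. The sketch assumes that the wildcard is written as "*"; the function names and the two example rules are hypothetical.

```python
def encode_rule_or(rule: str) -> str:
    """Encode each one-character segment: non-wildcard -> 0, wildcard -> 1."""
    return "".join("1" if ch == "*" else "0" for ch in rule)


def or_weight(code_a: str, code_b: str) -> int:
    """Number of 1s in the bitwise OR of two equal-length encoding rules."""
    return sum(x == "1" or y == "1" for x, y in zip(code_a, code_b))


# Hypothetical rules of 8 one-character segments each.
code_a = encode_rule_or("10*1**01")
code_b = encode_rule_or("1011****")
weight = or_weight(code_a, code_b)

# Positions whose OR bit is 0 hold non-wildcards in both rules.
safe_count = len(code_a) - weight
```

  • Under this encoding a smaller weight leaves more positions where both rules hold non-wildcards, which mirrors the larger-is-better behavior of the AND-based encoding.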
  • in this scenario, each rule in the rule set contains a plurality of segments, and each segment includes one character.
  • the device shown in Figure 22 includes:
  • the encoding processing unit 221 is configured to generate an encoding rule set according to the rule set; the rule set includes multiple rules, each rule is a character string including 0, 1, or a wildcard, and any two of the multiple rules are not equal to each other; the encoding rule set includes a plurality of encoding rules, and any two of the plurality of encoding rules are not equal to each other; each of the plurality of encoding rules corresponds to at least one of the plurality of rules, and each of the plurality of rules corresponds to one of the plurality of encoding rules; the encoding rule corresponding to the first rule is obtained by encoding the first rule according to the first function; the first rule is any one of the plurality of rules; the first function is used to replace a plurality of segments in the first rule with a plurality of encoded segments, thereby obtaining a first encoding rule; the first rule is composed of the plurality of segments, each segment including at least one character; the first encoding rule is composed of the plurality of encoded segments, each en
  • Figure 3a is a schematic diagram of a set of rules and a corresponding set of encoding rules.
  • the rules a10, a20, a30, a40, a50, and a60 correspond to the coding rules a1, a2, a3, a4, a5, and a6, respectively.
  • Figure 3a uses the encoding method in which "*" in a rule is encoded as 0 and any non-"*" character in a rule is encoded as 1.
  • the "*" is a wildcard.
  • the meaning of "*" in this embodiment is that the value of the bit in the rule can be 0 or 1.
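  • To make the meaning of the wildcard concrete, the following sketch lists every bit string that a rule containing "*" can match. The function name `expand` and the example rule are hypothetical.

```python
from itertools import product


def expand(rule: str) -> list[str]:
    """List every concrete bit string matched by a rule with '*' wildcards."""
    choices = [("0", "1") if ch == "*" else (ch,) for ch in rule]
    return ["".join(bits) for bits in product(*choices)]


# Hypothetical 4-character rule with two wildcards.
matches = expand("1*0*")
```

  • A rule with k wildcards matches 2^k bit strings, which is why a cut on a wildcard position can place the rule on both sides of the cut.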
  • a first weighted undirected graph generating unit 222 configured to generate a first weighted undirected graph according to the encoding rule set generated by the encoding processing unit 221;
  • the first weighted undirected graph includes a plurality of vertices corresponding to the plurality of encoding rules, and each of the plurality of vertices corresponds to a sub-rule set; the sub-rule set corresponding to the first vertex includes all rules, among the plurality of rules, that correspond to the first encoding rule; the first vertex is any one vertex of the first weighted undirected graph; the first encoding rule is the encoding rule, among the plurality of encoding rules, that corresponds to the first vertex.
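  • The correspondence between rules, encoding rules, and vertices can be sketched by grouping rules that share an encoding rule. The function name `build_vertices`, the rule names, and the rule contents are hypothetical, and the encoding assumes "*" as the wildcard with non-wildcards encoded as 1.

```python
def build_vertices(rules: dict[str, str]) -> dict[str, list[str]]:
    """Map each distinct encoding rule to the names of the rules it encodes.

    Each key plays the role of one vertex; its value is the sub-rule set.
    """
    vertices: dict[str, list[str]] = {}
    for name, rule in rules.items():
        code = "".join("0" if ch == "*" else "1" for ch in rule)
        vertices.setdefault(code, []).append(name)
    return vertices


# Hypothetical rule set: r1 and r2 share the same encoding rule.
vertices = build_vertices({"r1": "10**", "r2": "01**", "r3": "1***"})
```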
  • the edge weight calculation unit 223 is configured to calculate, according to the first weighted undirected graph generated by the first weighted undirected graph generating unit 222, the weight of each edge of the first weighted undirected graph; an edge connecting any two vertices in the first weighted undirected graph is an edge of the first weighted undirected graph; the weight of the first edge is the value of the second function obtained by using a second encoding rule and a third encoding rule as variables; the two vertices of the first edge correspond to the second encoding rule and the third encoding rule, respectively; the first edge is any one edge of the first weighted undirected graph;
  • the second function is configured to perform a bitwise operation on the second encoding rule and the third encoding rule and to count the number of 1s in the result of the bitwise operation; the number of 1s in the result of the bitwise operation is the value of the second function.
  • Figure 3b is a first weighted undirected graph generated based on the set of encoding rules shown in Figure 3a.
  • the coding rules a1, a2, a3, a4, a5, and a6 correspond to the vertices a1, a2, a3, a4, a5, and a6 in the first weighted undirected graph, respectively.
  • the first weighted undirected graph is a weighted undirected graph generated for the first time.
  • each vertex corresponds to an encoding rule; the weight of any edge can be obtained by performing a bitwise AND operation on the encoding rules corresponding to the two vertices connected by that edge.
  • the result of the bitwise AND operation is a bit string.
  • the weight of the edge is the number of 1s in the result of the bitwise AND operation. That is, a bitwise AND operation is performed on the encoding rules corresponding to the two vertices connected by the edge to obtain a first operation result, and the number of 1s in the first operation result is counted.
  • the vertices a1 and a3 of the first weighted undirected graph correspond to the encoding rules a1 and a3, respectively.
  • a bitwise AND operation is performed on the encoding rules a1 and a3, and the number of 1s in the obtained result is 4. Therefore, the weight of the edge connecting the vertices a1 and a3 in the first weighted undirected graph is 4.
  • the edge with the largest weight in the first weighted undirected graph is the edge connecting the vertices a1 and a3.
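  • The weight calculation described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `and_weight` is hypothetical, and the two 8-bit encoding rules are made-up values, because the contents of Figure 3a are not reproduced in this text.

```python
def and_weight(enc_a: str, enc_b: str) -> int:
    """Return the number of 1s in the bitwise AND of two encoding rules."""
    if len(enc_a) != len(enc_b):
        raise ValueError("encoding rules must have the same number of bits")
    return bin(int(enc_a, 2) & int(enc_b, 2)).count("1")


# Hypothetical 8-bit encoding rules (not the values from Figure 3a).
print(and_weight("11110000", "11111100"))  # prints 4
```

  • In this sketch the encoding rules are handled as bit strings and converted to integers only for the AND operation, which keeps leading zeros from being lost.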
  • a first comparison unit 224, configured to determine, according to the calculation result of the edge weight calculation unit 223, whether the weight of the edge with the largest weight in the first weighted undirected graph is greater than the first threshold, where the first threshold is an integer greater than or equal to 0 and less than or equal to X-1, and X is the number of bits in the first encoding rule; if so, a trigger signal is sent to the new vertex generating unit 225; otherwise, a trigger signal is sent to the decision tree generating unit 228.
  • the first threshold is an integer between 0 and 7 in decimal.
  • for each rule in the rule set, the non-wildcards in the rule are encoded as 1.
  • the wildcard in the rule is encoded as 0.
  • in the bit string obtained by performing the bitwise AND operation on the two encoding rules, if the bit at a certain position is 1, both bits at the corresponding position in the two rules are non-wildcards. When this position is used as the reference position, rule copying does not occur. If the bit at a certain position in the bit string is 0, at least one of the two bits at the corresponding position in the two rules is a wildcard. When this position is used as the reference position, rule copying is inevitable.
  • the weight of an edge of the weighted undirected graph is the number of 1s in the bit string obtained by performing the bitwise AND operation on the two encoding rules. Therefore, the larger the weight, the more positions there are in the two encoding rules at which rule copying does not occur and that can be used as the reference position. Selecting such a position as the reference position avoids rule copying.
  • the weight of the edge connecting the vertices a1 and a3 is the largest.
  • the vertices a1 and a3 correspond to the encoding rules a1 and a3, respectively.
  • the encoding rules a1 and a3 have the largest number of positions at which rule copying does not occur and that can be used as the reference position, namely four. By selecting any of these four positions as the reference position, rule copying can be avoided.
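  • The relationship between the encoding and the copy-free reference positions can be illustrated with a short sketch. It assumes that the wildcard is written as `*`, that positions are counted from the left starting at 0, and that the two 8-character rules are hypothetical; the helper names `encode` and `copy_free_positions` are likewise introduced only for this illustration.

```python
WILDCARD = "*"  # assumed wildcard symbol


def encode(rule: str) -> str:
    """Encode each non-wildcard character as 1 and each wildcard as 0."""
    return "".join("0" if ch == WILDCARD else "1" for ch in rule)


def copy_free_positions(rule_a: str, rule_b: str) -> list:
    """Positions where both rules hold a non-wildcard (AND result bit is 1)."""
    pairs = zip(encode(rule_a), encode(rule_b))
    return [i for i, (x, y) in enumerate(pairs) if x == "1" and y == "1"]


# Hypothetical 8-character rules.
rule_a, rule_b = "10**01*1", "0*1*10*0"
print(encode(rule_a), encode(rule_b))       # prints 11001101 10101101
print(copy_free_positions(rule_a, rule_b))  # prints [0, 4, 5, 7]
```

  • The length of the returned list equals the weight of the edge between the two corresponding vertices, here 4.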
  • the vertices a1 and a3 correspond to two sub-rule sets, respectively, and the sub-rule set corresponding to the vertex a1 contains the rule a10.
  • the sub-rule set corresponding to the vertex a3 contains the rule a30.
  • the sub-rule set corresponding to the vertex a1&a3 contains the rules a10 and a30.
  • a total of four positions can be used as the reference position without causing rule copying. Therefore, when an algorithm for generating a decision tree, such as Modular, is run on the sub-rule set corresponding to the vertex a1&a3 to generate a decision tree, the probability of rule replication is low.
  • the weight of the edge connecting the vertices a3 and a4 is also 4. Therefore, the edge connecting the vertices a3 and a4 is also an edge with the largest weight in the first weighted undirected graph.
  • one of the edge connecting the vertices a3 and a4 and the edge connecting the vertices a3 and a1 may be randomly selected as the edge for generating a new vertex. For example, in the embodiment of the present invention, the edge connecting the vertices a3 and a1 is determined as the edge for generating a new vertex.
  • a new vertex generating unit 225, configured to generate a new vertex according to the edge with the largest weight in the most recently generated weighted undirected graph among the generated weighted undirected graphs, and to generate a new weighted undirected graph according to the new vertex;
  • the new weighted undirected graph includes the new vertex and all vertices of the most recently generated weighted undirected graph other than the two vertices of the edge with the largest weight;
  • the encoding rule corresponding to the new vertex is the value of a third function obtained by using, as variables, a fourth encoding rule and a fifth encoding rule that correspond respectively to the two vertices of the edge with the largest weight in the most recently generated weighted undirected graph;
  • the third function is configured to perform a bitwise AND operation on the fourth encoding rule and the fifth encoding rule; the result of the bitwise AND operation is the value of the third function;
  • the fourth function is configured to perform a bitwise operation on the sixth encoding rule and the seventh encoding rule and to count the number of 1s in the result of the bitwise operation.
  • the number of 1s in the result of the bitwise operation is the value of the fourth function.
  • the weighted undirected graph shown in Fig. 3b is the most recently generated weighted undirected graph among the generated weighted undirected graphs. A bitwise AND operation is performed on the encoding rules corresponding to the vertices a1 and a3 in the weighted undirected graph shown in Fig. 3b to obtain the encoding rule corresponding to the vertex a1&a3 in the weighted undirected graph shown in Fig. 4b, thereby obtaining the updated set of encoding rules shown in Fig. 4a.
  • the weighted undirected graph shown in Fig. 4b is the newly generated weighted undirected graph. Calculate the weights of the edges in the weighted undirected graph as shown in Figure 4b. The calculation method is the same as above, and will not be described here.
  • the second comparison unit 226 is configured to determine whether the weight of the edge with the largest weight in the newly generated weighted undirected graph is greater than the first threshold; if so, a trigger signal is sent to the new vertex generating unit 225; otherwise, a trigger signal is sent to the decision tree generating unit 228.
  • the first threshold is preset to be equal to 1.
  • the weighted undirected graph shown in Fig. 4b is the last generated weighted undirected graph in the generated weighted undirected graph.
  • the edge with the largest weight in the weighted undirected graph shown in Fig. 4b is the edge connecting the vertices a2 and a5.
  • the weight of the edge connecting the vertices a2 and a5 is 3. That is to say, the weight of the edge with the largest weight is greater than the first threshold.
  • a bitwise AND operation is performed on the encoding rules corresponding to the vertices a2 and a5, and the result of the bitwise AND operation is the encoding rule corresponding to the new vertex.
  • FIG. 5b is the newly generated weighted undirected graph.
  • the plurality of vertices included in the weighted undirected graph shown in Fig. 5b correspond to a plurality of encoding rules included in the encoding rule set shown in Fig. 5a.
  • the weights of the edges in the weighted undirected graph shown in Figure 5b are calculated according to the encoding rules shown in Figure 5a.
  • Fig. 5b is the last generated weighted undirected graph in the weighted undirected graph that has been generated.
  • the weight of the edge with the largest weight in the weighted undirected graph shown in Figure 5b is 2. That is to say, the weight of the edge with the largest weight is greater than the first threshold.
  • a bitwise AND operation is performed on the encoding rules corresponding to the vertices a1&a2 and a4, and the result of the bitwise AND operation is the encoding rule corresponding to the new vertex.
  • by replacing the vertex a1&a2 and the vertex a4 in Figure 5b with the new vertex, the weighted undirected graph shown in Fig. 6b is obtained.
  • FIG. 6b is the newly generated weighted undirected graph.
  • the three vertices included in the weighted undirected graph shown in Fig. 6b correspond to the three encoding rules included in the encoding rule set shown in Fig. 6a.
  • the weights of the edges in the weighted undirected graph shown in Fig. 6b are calculated according to the encoding rules shown in Fig. 6a.
  • the edge with the largest weight in the weighted undirected graph shown in Fig. 6b has a weight of 1, that is, less than or equal to the first threshold. Therefore, the segmentation of the rule set has been completed at this point.
  • the sub-rule sets corresponding to the vertices in the weighted undirected graph shown in Fig. 6b are the sub-rule sets obtained by segmenting the rule set shown in Fig. 3a.
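  • The segmentation loop walked through above (find the edge with the largest weight, merge its two vertices by a bitwise AND, and stop once the largest weight is no longer greater than the first threshold) might be sketched as follows. The function `split_by_max_and`, the rule names, and the encoding rules are all hypothetical; ties are resolved by taking the first edge found, whereas the embodiment allows a random choice.

```python
from itertools import combinations


def popcount(value: int) -> int:
    """Number of 1 bits in a non-negative integer."""
    return bin(value).count("1")


def split_by_max_and(encodings: dict, threshold: int) -> list:
    """Merge the two vertices of the heaviest edge until max weight <= threshold.

    encodings maps a rule name to its encoding rule (a bit string).
    Returns the sub-rule sets as lists of rule names.
    """
    # One vertex per distinct encoding rule; rules with equal encodings share it.
    grouped = {}
    for name, enc in encodings.items():
        grouped.setdefault(int(enc, 2), []).append(name)
    vertices = list(grouped.items())  # (encoding, rule names)

    while len(vertices) > 1:
        # Edge weight: number of 1s in the bitwise AND of the two encodings.
        i, j = max(combinations(range(len(vertices)), 2),
                   key=lambda p: popcount(vertices[p[0]][0] & vertices[p[1]][0]))
        if popcount(vertices[i][0] & vertices[j][0]) <= threshold:
            break
        # New vertex: AND of the two encodings, union of the two sub-rule sets.
        merged = (vertices[i][0] & vertices[j][0], vertices[i][1] + vertices[j][1])
        vertices = [v for k, v in enumerate(vertices) if k not in (i, j)]
        vertices.append(merged)
    return [sorted(names) for _, names in vertices]


# Hypothetical encoding rule set.
rules = {"r1": "11110000", "r2": "11100000", "r3": "00001111", "r4": "00000111"}
print(split_by_max_and(rules, threshold=1))  # prints [['r1', 'r2'], ['r3', 'r4']]
```

  • The sketch recomputes all edge weights in every round for clarity; it makes no claim about how the units of the apparatus organize this computation.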
  • the decision tree generating unit 228 is configured to separately generate a decision tree for the sub-rule set corresponding to each vertex in the newly generated weighted undirected graph.
  • the weighted undirected graph shown in Fig. 7b has two vertices, namely the vertex a1&a3&a4&a6 and the vertex a2&a5.
  • the vertices a1&a3&a4&a6 and a2&a5 correspond to two sub-rule sets, respectively. That is to say, the rule set shown in Fig. 3a is divided into two sub-rule sets, as shown in Figure 7c.
  • the sub-rule set corresponding to the vertex a1&a3&a4&a6 includes the rules a10, a30, a40, and a60, that is, {a10, a30, a40, a60};
  • the sub-rule set corresponding to the vertex a2&a5 includes the rules a20 and a50, that is, {a20, a50}.
  • decision trees are generated separately, according to the Modular algorithm, for the two sub-rule sets shown in Figure 7c.
  • the present embodiment divides the rule set shown in FIG. 3a into two sub-rule sets. For details, refer to FIG. 7c.
  • a decision tree is generated for each of the two sub-rule sets shown in Figure 7c by the Modular algorithm. See the decision tree shown in Figure 8a and the decision tree shown in Figure 8b for details.
  • the decision tree shown in Figure 9 is a decision tree generated by the Modular algorithm for the set of rules shown in Figure 3a.
  • the position indicated by the broken line in Fig. 9 is the reference position.
  • a rule whose bit at the reference position is 0 is placed on the lower-level node on the left side; a rule whose bit at the reference position is 1 is placed on the lower-level node on the right side; a rule whose bit at the reference position is * is placed on both the lower-level node on the left side and the lower-level node on the right side.
  • the decision tree shown in FIG. 9 is tall, and the probability of rule copying is high.
  • the rules a10, a20, a30, and a40 all involve rule replication.
  • the height of the decision tree is low, and the probability of occurrence of rule copying is low.
  • the apparatus provided in this embodiment divides the rule set into multiple sub-rule sets, and runs an algorithm for generating a decision tree to generate a decision tree for each sub-rule set.
  • a rule set contains multiple rules.
  • the rule contained in the rule set can be the destination internet protocol address.
  • the destination internet protocol address can be 32 bits or 128 bits.
  • the network device receives the internet protocol packet.
  • the flow classifier can parse the received internet protocol packet to obtain the destination internet protocol address of the internet protocol packet.
  • the flow classifier can perform rule matching in parallel in multiple decision trees according to the destination internet protocol address. If a rule matching the destination internet protocol address of the internet protocol packet is found in a plurality of decision trees, an action corresponding to the rule is performed.
  • the device for performing a matching search for each decision tree is the same as the prior art, and details are not described herein again.
  • according to the priority of each rule, the rule with the highest priority is determined as the final matching rule for the Internet Protocol packet, and the action corresponding to the final matching rule is performed on the Internet Protocol packet, such as pass, drop, perform traffic restriction, or perform bandwidth guarantee.
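  • The final selection among the rules matched in the individual decision trees can be sketched as follows. This is a simplified illustration under several assumptions: each decision tree lookup is replaced by a linear scan of its sub-rule set, the parallel execution is not modeled, a larger number means a higher priority, and the rule tuples `(pattern, priority, action)` as well as the names `matches` and `classify` are hypothetical.

```python
def matches(pattern: str, key: str) -> bool:
    """A pattern matches when every character is '*' or equals the key bit."""
    return len(pattern) == len(key) and all(p in ("*", k) for p, k in zip(pattern, key))


def classify(sub_rule_sets: list, key: str):
    """Return the action of the highest-priority matching rule, or None."""
    hits = [rule for rules in sub_rule_sets for rule in rules if matches(rule[0], key)]
    if not hits:
        return None
    # Assumption: a larger priority value wins.
    return max(hits, key=lambda rule: rule[1])[2]


# Hypothetical sub-rule sets of (pattern, priority, action) tuples.
sub_rule_sets = [[("10**", 1, "pass")], [("1***", 5, "drop")]]
print(classify(sub_rule_sets, "1011"))  # prints drop
print(classify(sub_rule_sets, "0000"))  # prints None
```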
  • the apparatus provided in this embodiment may perform the splitting process only on a newly added rule. For example, the newly added rule is encoded by using the same encoding method to obtain a new encoding rule corresponding to the newly added rule, and a weighted undirected graph is generated based on the new encoding rule and the original encoding rule set corresponding to the original rule set. Based on the weighted undirected graph, it is determined whether the newly added rule belongs to one of the plurality of sub-rule sets corresponding to the original rule set or belongs to a new sub-rule set.
  • if it is the former, the sub-rule set to which the newly added rule belongs can be updated and its decision tree regenerated; if it is the latter, a decision tree can be generated based on the new sub-rule set. It can be seen that, in the apparatus provided in this embodiment, when a new rule is added to the rule set, it is not necessary to regenerate the decision trees for all the sub-rule sets into which the original rule set was divided; it is only necessary to regenerate the decision tree for the sub-rule set to which the new rule belongs, or to generate a decision tree for the new sub-rule set corresponding to the new rule.
  • FIG. 21 is a schematic structural diagram of a device for generating a decision tree according to an embodiment of the present invention.
  • the device can be implemented by the method provided in the second embodiment.
  • the apparatus includes: an encoding processing unit 211, configured to generate an encoding rule set according to a rule set; the rule set includes multiple rules, and each rule is a character string including 0, 1, or a wildcard.
  • the encoding rule set includes a plurality of encoding rules, and no two of the plurality of encoding rules are equal to each other; each of the plurality of encoding rules corresponds to at least one of the plurality of rules, and each of the plurality of rules corresponds to one of the plurality of encoding rules; the encoding rule corresponding to a first rule is obtained by encoding the first rule according to a first function; the first rule is any one of the plurality of rules; the first function is used to replace the plurality of segments in the first rule with a plurality of encoding segments, thereby obtaining the first encoding rule; the first rule is composed of the plurality of segments, and each segment includes at least one character; the first encoding rule is composed of the plurality of encoding segments, and each encoding segment is one bit; the plurality of segments correspond to the plurality of encoding segments, and the position of a first segment in the first rule is consistent with the position of a first encoding segment in the first encoding rule; the first segment is any one of the plurality of segments;
  • a first weighted undirected graph generating unit 212, configured to generate a first weighted undirected graph according to the encoding rule set generated by the encoding processing unit 211;
  • the first weighted undirected graph includes a plurality of vertices that correspond to the plurality of encoding rules, and each of the plurality of vertices corresponds to a sub-rule set;
  • the sub-rule set corresponding to a first vertex includes all rules, among the plurality of rules, that correspond to the first encoding rule;
  • the first vertex is any one of the vertices of the first weighted undirected graph;
  • the first encoding rule is the encoding rule, among the plurality of encoding rules, that corresponds to the first vertex;
  • the edge weight calculation unit 213 is configured to calculate, according to the first weighted undirected graph generated by the first weighted undirected graph generating unit 212, the weight of each edge of the first weighted undirected graph; an edge connecting any two vertices in the first weighted undirected graph is an edge of the first weighted undirected graph; the weight of a first edge is the value of a second function obtained by using a second encoding rule and a third encoding rule as variables; the two vertices of the first edge correspond to the second encoding rule and the third encoding rule, respectively; the first edge is any one of the edges of the first weighted undirected graph; the second function is configured to perform a bitwise operation on the second encoding rule and the third encoding rule and to count the number of 1s in the result of the bitwise operation; the number of 1s in the result of the bitwise operation is the value of the second function;
  • the comparison unit 214 is configured to perform a comparison according to the calculation result of the edge weight calculation unit 213, to perform the following operations in a loop until the weight of the edge with the smallest weight in the newly generated weighted undirected graph is greater than or equal to the first threshold, and then to send a trigger signal to the decision tree generating unit 215: generate a new vertex according to the edge with the smallest weight in the most recently generated weighted undirected graph among the generated weighted undirected graphs, and generate a new weighted undirected graph according to the new vertex;
  • the new weighted undirected graph includes the new vertex and all vertices of the most recently generated weighted undirected graph other than the two vertices of the edge with the smallest weight;
  • the encoding rule corresponding to the new vertex is the value of a third function obtained by using, as variables, a fourth encoding rule and a fifth encoding rule that correspond respectively to the two vertices of the edge with the smallest weight in the most recently generated weighted undirected graph;
  • the decision tree generating unit 215 is configured to generate a decision tree for each sub-rule set corresponding to each vertex in the newly generated weighted undirected graph according to the trigger signal sent by the comparing unit 214.
  • the new weighted undirected graph includes the new vertex and all vertices of the most recently generated weighted undirected graph other than the two vertices of the edge with the smallest weight.
  • the encoding rule corresponding to the new vertex is the value of the third function obtained by using, as variables, the fourth encoding rule and the fifth encoding rule that correspond respectively to the two vertices of the edge with the smallest weight in the most recently generated weighted undirected graph.
  • the third function is configured to perform a bitwise AND operation on the fourth encoding rule and the fifth encoding rule, and the result of the bitwise AND operation is a value of the third function.
  • the sub-rule set corresponding to the new vertex includes all rules in the sub-rule set corresponding to the two vertices of the edge with the smallest weight in the last generated weighted undirected graph in the generated weighted undirected graph .
  • the weight of a second edge in the new weighted undirected graph is the value of the fourth function obtained by using the sixth encoding rule and the seventh encoding rule as variables.
  • the two vertices of the second edge correspond to the sixth encoding rule and the seventh encoding rule, respectively.
  • the second edge is any one of the edges of the new weighted undirected graph.
  • the fourth function is configured to perform a bitwise operation on the sixth encoding rule and the seventh encoding rule and to count the number of 1s in the result of the bitwise operation; the number of 1s in the result of the bitwise operation is the value of the fourth function.
  • the bitwise operation is an AND operation, an OR operation, an exclusive OR (XOR) operation, or the like.
  • when the generated weighted undirected graphs include only the first weighted undirected graph, the first weighted undirected graph is the most recently generated weighted undirected graph.
  • there may be one or more edges with the smallest weight in the most recently generated weighted undirected graph.
  • when there are multiple such edges, one edge may be randomly selected to generate a new vertex.
  • alternatively, for each such edge, the sum of the numbers of rules contained in the two sub-rule sets corresponding to its two vertices may be calculated, and the new vertex is generated from the edge, among the multiple edges with the smallest weight, for which this sum is the largest.
  • alternatively, new vertices (at least two) may be generated separately from the multiple edges, and a new weighted undirected graph is generated based on the new vertices (at least two).
  • a common vertex is a vertex that is shared by two or more of the edges with the smallest weight. It should be noted that, when there are multiple edges with the smallest weight in the most recently generated weighted undirected graph, new vertices cannot be generated separately for two edges that share a common vertex.
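  • Two of the tie-handling options described above can be sketched as follows: choosing the edge whose two sub-rule sets contain the most rules in total, and choosing several edges that share no common vertex so that several new vertices can be generated in one round. The function names, the greedy selection order, and the example vertices are assumptions made only for this illustration.

```python
def pick_edge_with_most_rules(min_edges: list, sub_rule_sets: dict) -> tuple:
    """Among the minimum-weight edges, pick the one whose two sub-rule sets
    contain the largest total number of rules."""
    return max(min_edges,
               key=lambda e: len(sub_rule_sets[e[0]]) + len(sub_rule_sets[e[1]]))


def pick_disjoint_edges(min_edges: list) -> list:
    """Greedily keep minimum-weight edges that share no common vertex."""
    used, chosen = set(), []
    for u, v in min_edges:
        if u not in used and v not in used:
            chosen.append((u, v))
            used.update((u, v))
    return chosen


# Hypothetical minimum-weight edges and sub-rule sets.
min_edges = [("A", "B"), ("A", "C"), ("D", "E")]
sub_rule_sets = {"A": ["A0"], "B": ["B0"], "C": ["C0", "C1"], "D": ["D0"], "E": ["E0"]}
print(pick_edge_with_most_rules(min_edges, sub_rule_sets))  # prints ('A', 'C')
print(pick_disjoint_edges(min_edges))                       # prints [('A', 'B'), ('D', 'E')]
```

  • In the second function the edges (A, B) and (A, C) share the common vertex A, so only one of them is kept, in line with the restriction stated above.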
  • each rule in the rule set is encoded to obtain a coding rule set.
  • the two vertices corresponding to the edge with the smallest weight in the weighted undirected graph are merged until the weight of the edge with the smallest weight in the newly generated weighted undirected graph satisfies certain conditions.
  • each vertex of the newly generated weighted undirected graph corresponds to a sub-rule set.
  • the rule set is divided into a plurality of sub-rule sets.
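  • The overall procedure of this embodiment can be sketched as follows, assuming that the bitwise operation used for the edge weight is XOR, that the new encoding rule is the bitwise AND of the two merged encoding rules (the third function), and that merging stops once the smallest weight is greater than or equal to the first threshold. The function name `split_by_min_xor`, the rule names, and the encoding rules are hypothetical, and ties are resolved by taking the first edge found.

```python
from itertools import combinations


def popcount(value: int) -> int:
    """Number of 1 bits in a non-negative integer."""
    return bin(value).count("1")


def split_by_min_xor(encodings: dict, threshold: int) -> list:
    """Merge the two vertices of the lightest edge until min weight >= threshold."""
    # One vertex per distinct encoding rule.
    grouped = {}
    for name, enc in encodings.items():
        grouped.setdefault(int(enc, 2), []).append(name)
    vertices = list(grouped.items())  # (encoding, rule names)

    while len(vertices) > 1:
        # Edge weight: number of 1s in the bitwise XOR of the two encodings.
        i, j = min(combinations(range(len(vertices)), 2),
                   key=lambda p: popcount(vertices[p[0]][0] ^ vertices[p[1]][0]))
        if popcount(vertices[i][0] ^ vertices[j][0]) >= threshold:
            break
        # New vertex: bitwise AND of the encodings, union of the sub-rule sets.
        merged = (vertices[i][0] & vertices[j][0], vertices[i][1] + vertices[j][1])
        vertices = [v for k, v in enumerate(vertices) if k not in (i, j)]
        vertices.append(merged)
    return [sorted(names) for _, names in vertices]


# Hypothetical encoding rule set.
rules = {"r1": "11110000", "r2": "11110001", "r3": "00001111", "r4": "00011111"}
print(split_by_min_xor(rules, threshold=2))  # prints [['r1', 'r2'], ['r3', 'r4']]
```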
  • when an algorithm for generating a decision tree, such as HyperCuts, is run on each of the plurality of sub-rule sets, the probability of rule replication in the resulting decision trees is lower than the probability of rule replication in the decision tree obtained by running the same algorithm on the rule set. Therefore, the technical solution provided by the embodiment of the present invention reduces the probability of rule replication.
  • when an algorithm for generating a decision tree is run separately on each of the plurality of sub-rule sets obtained by using the apparatus provided by the embodiment of the present invention, the probability of rule replication in the resulting decision trees is lower than the probability of rule replication in the decision tree obtained by running the same algorithm on the rule set. As can be determined from the above, each sub-rule set contains fewer rules than the rule set. Therefore, when a given decision tree generation algorithm is run to generate a decision tree for each of the plurality of sub-rule sets, the height of each resulting decision tree is less than or equal to the height of the decision tree generated by running the same algorithm on the rule set.
  • the speed of performing rule matching in parallel for multiple decision trees is higher than or equal to the speed of performing rule matching separately for a decision tree.
  • the bitwise operation can be an AND operation, an OR operation, or an exclusive OR (XOR) operation.
  • if the character in the first segment is 1, the first encoding segment is 1; if the character in the first segment is 0, the first encoding segment is 1; if the character in the first segment is a wildcard, the first encoding segment is 0.
  • the decision tree generating unit 215 is configured to separately generate a decision tree for the sub-rule set corresponding to each vertex in the newly generated weighted undirected graph by using a HyperCuts algorithm.
  • the following takes the case in which the bitwise operation is an exclusive OR (XOR) operation, the first segment includes one character, and the non-wildcards and wildcards in a rule are encoded as 1 and 0, respectively, as an example to analyze the technical solution provided by this embodiment. That is, if the character in the first segment is 1, the first encoding segment is 1; if the character in the first segment is 0, the first encoding segment is 1; if the character in the first segment is a wildcard, the first encoding segment is 0. That is, the first segment can be 1, 0, or a wildcard.
  • for convenience of explanation, the two cases of the first segment being 1 and the first segment being 0 are collectively referred to below as the first segment being a non-wildcard.
  • a decision tree is generated according to the HyperCuts algorithm for a plurality of sub-rule sets obtained through rule segmentation.
  • Rule segmentation involves rules, encoding rules, and bitwise XOR operations. The following describes the relationship among the rules, the encoding rules, and the bitwise XOR operation involved in rule segmentation:
  • the two rules are separately encoded to obtain two encoding rules. If the two bits at the same position in the two rules are both non-wildcards, both are encoded as 1. That is to say, after the two rules are encoded, the two bits at the corresponding position in the two encoding rules are both 1. Therefore, a bitwise XOR operation on these two bits gives 0.
  • if, of the two bits at the same position in the two rules, one is a non-wildcard and the other is a wildcard, they are encoded as 1 and 0, respectively. That is to say, after the two rules are encoded, the two bits at the corresponding position in the two encoding rules are 1 and 0, respectively. Therefore, a bitwise XOR operation on these two bits gives 1.
  • if the two bits at the same position in the two rules are both wildcards, both are encoded as 0. That is to say, after the two rules are encoded, the two bits at the corresponding position in the two encoding rules are both 0. Therefore, a bitwise XOR operation on these two bits gives 0.
  • the first case is: of the two bits at the same position in the two rules, one is a wildcard and the other is a non-wildcard. If a bit position at which the first case occurs is selected as the reference position, rule copying occurs.
  • the weight of an edge of the weighted undirected graph is the number of 1s in the bit string obtained by performing a bitwise XOR operation on the two encoding rules. That is, the weight reflects the number of times the first case occurs. The fewer times the first case occurs, the more positions there are at which rule copying does not occur and that can be used as the reference position. Selecting such a position as the reference position for rule segmentation avoids rule copying.
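  • The statement that the XOR weight counts the occurrences of the first case can be checked with a small sketch. It assumes that the wildcard is written as `*` and that positions are counted from the left starting at 0; the two rules and the helper names are hypothetical.

```python
def encode(rule: str, wildcard: str = "*") -> str:
    """Encode each non-wildcard character as 1 and each wildcard as 0."""
    return "".join("0" if ch == wildcard else "1" for ch in rule)


def xor_weight(rule_a: str, rule_b: str) -> int:
    """Number of 1s in the bitwise XOR of the two encoding rules."""
    return bin(int(encode(rule_a), 2) ^ int(encode(rule_b), 2)).count("1")


def mixed_positions(rule_a: str, rule_b: str) -> list:
    """Positions where exactly one of the two rules holds a wildcard."""
    return [i for i, (x, y) in enumerate(zip(rule_a, rule_b))
            if (x == "*") != (y == "*")]


# Hypothetical 8-character rules.
rule_a, rule_b = "10**01*1", "0*1*10*0"
print(mixed_positions(rule_a, rule_b))  # prints [1, 2]
print(xor_weight(rule_a, rule_b))       # prints 2
```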
  • the following takes the case in which the bitwise operation is an exclusive OR (XOR) operation, the first segment includes one character, and the non-wildcards and wildcards in a rule are encoded as 0 and 1, respectively, as an example to analyze the technical solution provided by this embodiment. That is, if the character in the first segment is 1, the first encoding segment is 0; if the character in the first segment is 0, the first encoding segment is 0; if the character in the first segment is a wildcard, the first encoding segment is 1. That is, the first segment can be 1, 0, or a wildcard. For convenience of explanation, the two cases of the first segment being 1 and the first segment being 0 are collectively referred to below as the first segment being a non-wildcard.
  • a decision tree is generated according to the HyperCuts algorithm for a plurality of sub-rule sets obtained through rule segmentation.
  • Rule segmentation involves rules, encoding rules, and bitwise XOR operations. The following describes the relationship among the rules, the encoding rules, and the bitwise XOR operation involved in rule segmentation:
  • the two rules are separately encoded to obtain two encoding rules. If the two bits at the same position in the two rules are both non-wildcards, both are encoded as 0. That is to say, after the two rules are encoded, the two bits at the corresponding position in the two encoding rules are both 0. Therefore, a bitwise XOR operation on these two bits gives 0.
  • if, of the two bits at the same position in the two rules, one is a non-wildcard and the other is a wildcard, they are encoded as 0 and 1, respectively. That is to say, after the two rules are encoded, the two bits at the corresponding position in the two encoding rules are 0 and 1, respectively. Therefore, a bitwise XOR operation on these two bits gives 1.
  • if the two bits at the same position in the two rules are both wildcards, both are encoded as 1. That is to say, after the two rules are encoded, the two bits at the corresponding position in the two encoding rules are both 1. Therefore, a bitwise XOR operation on these two bits gives 0.
  • the number of 1s in the result of the bitwise XOR operation is the number of times the second case occurs in the two rules.
  • the second case is: of the two bits at the same position in the two rules, one is a wildcard and the other is a non-wildcard. If a bit position at which the second case occurs is selected as the reference position, rule copying occurs.
  • the weight of an edge of the weighted undirected graph is the number of 1s in the bit string obtained by performing a bitwise XOR operation on the two encoding rules. That is, the weight reflects the number of times the second case occurs. The fewer times the second case occurs, the more positions there are at which rule copying does not occur and that can be used as the reference position. Selecting such a position as the reference position for rule segmentation avoids rule copying.
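  • Because XOR gives the same result when both operands are complemented, the two encoding conventions (non-wildcard as 1 and wildcard as 0, or the reverse) lead to the same edge weight. The following sketch checks this on two hypothetical rules; the `*` wildcard symbol and the function names are assumptions.

```python
def encode(rule: str, non_wildcard_bit: str, wildcard_bit: str) -> str:
    """Encode a rule with a configurable bit for non-wildcards and wildcards."""
    return "".join(wildcard_bit if ch == "*" else non_wildcard_bit for ch in rule)


def xor_weight(enc_a: str, enc_b: str) -> int:
    """Number of 1s in the bitwise XOR of two encoding rules."""
    return bin(int(enc_a, 2) ^ int(enc_b, 2)).count("1")


# Hypothetical 8-character rules.
rule_a, rule_b = "10**01*1", "0*1*10*0"

# Convention 1: non-wildcard -> 1, wildcard -> 0.
w_10 = xor_weight(encode(rule_a, "1", "0"), encode(rule_b, "1", "0"))
# Convention 2: non-wildcard -> 0, wildcard -> 1.
w_01 = xor_weight(encode(rule_a, "0", "1"), encode(rule_b, "0", "1"))

print(w_10, w_01)  # prints 2 2
```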
  • when an algorithm for generating a decision tree is run separately on each of the plurality of sub-rule sets obtained by using the apparatus provided by the embodiment of the present invention, the probability of rule replication in the resulting decision trees is lower than the probability of rule replication in the decision tree obtained by running the same algorithm on the rule set. As can be determined from the above, the number of rules included in each sub-rule set is smaller than the number of rules included in the rule set. Therefore, when a given decision tree generation algorithm is run to generate a decision tree for each of the plurality of sub-rule sets, the height of each resulting decision tree is less than or equal to the height of the decision tree generated by running the same algorithm on the rule set.
  • the speed of performing rule matching in parallel for multiple decision trees is higher than or equal to the speed of performing rule matching separately for one decision tree.
  • Figure 11b is a schematic diagram of a set of rules and a corresponding set of encoding rules.
  • the rule set includes rules A0, B0, C0, D0, E0, and F0. Each rule is 8 bits in length.
  • the rules A0, B0, C0, D0, E0, and F0 in the rule set correspond to A, B, C, D, E, and F in the set of encoding rules, respectively.
  • the coding method is specifically as follows: the non-wildcard code is 1 and the wildcard code is 0.
  • FIG. 11a is a schematic diagram of the mapping of the six rules in the rule set shown in Figure 11b to a two-dimensional geometric space in decimal units.
  • the rule set shown in Fig. 11b contains six rules, and the upper 4 bits and lower 4 bits of each rule are respectively mapped to the X axis and the Y axis of the two-dimensional geometric space shown in Fig. 11a.
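  • The mapping of an 8-bit rule to the two axes can be illustrated with a short sketch that expands the upper 4 bits and the lower 4 bits into the decimal values they cover. The rule used here is hypothetical (the rules of Fig. 11b are not reproduced in this text), the wildcard is assumed to be written as `*`, and the function names are introduced only for this illustration.

```python
from itertools import product


def field_values(field: str) -> list:
    """Decimal values covered by a bit field that may contain '*' wildcards."""
    wild = [i for i, ch in enumerate(field) if ch == "*"]
    values = []
    for bits in product("01", repeat=len(wild)):
        chars = list(field)
        for pos, bit in zip(wild, bits):
            chars[pos] = bit
        values.append(int("".join(chars), 2))
    return sorted(values)


def rule_to_axes(rule: str) -> tuple:
    """Map the upper 4 bits to X values and the lower 4 bits to Y values."""
    return field_values(rule[:4]), field_values(rule[4:])


# Hypothetical 8-bit rule.
x_values, y_values = rule_to_axes("10**01*1")
print(x_values)  # prints [8, 9, 10, 11]
print(y_values)  # prints [5, 7]
```

  • When the wildcards are not all at the end of a field, as in the lower 4 bits here, the covered values do not form one contiguous range.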
  • FIG. 11c is a first weighted undirected graph according to an embodiment of the present invention.
  • the vertices in the first weighted undirected graph shown in Fig. 11c are A, B, C, D, E, and F, respectively.
  • Vertices A, B, C, D, E, and F correspond to a set of sub-rules, respectively.
  • the sub-rule set corresponding to vertex A contains rule A0.
  • the set of sub-rules corresponding to vertex B contains rule B0.
  • the set of sub-rules corresponding to vertex C contains rule C0.
  • the set of sub-rules corresponding to vertex D contains rule D0.
  • the set of sub-rules corresponding to vertex E contains rule E0.
  • the sub-rule set corresponding to the vertex F contains the rule F0.
  • The first weighted undirected graph thus contains six vertices: A, B, C, D, E, and F.
  • The vertices A, B, C, D, E, and F in Fig. 11c correspond to the encoding rules A, B, C, D, E, and F in Fig. 11b, respectively.
  • the first threshold may be any one of 1 to 8. This embodiment determines the first threshold to be 2.
  • A bitwise XOR operation is performed on the encoding rules corresponding to any two vertices. The result is a bit string, and the number of 1s in the bit string is the weight of the edge connecting the two vertices.
  • The encoding rules corresponding to vertices A and C are both 11101110.
  • A bitwise XOR operation is performed on the encoding rules corresponding to vertices A and C, and the result is 00000000.
  • The number of 1s in this bit string is 0. That is, the edge connecting vertices A and C has a weight of 0.
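  • The weight calculation can be sketched as follows. The function name `edge_weight` is an assumption; the value 11101110 for vertices A and C is the one given in the text.

```python
def edge_weight(code_a: str, code_b: str) -> int:
    """Return the number of 1s in the bitwise XOR of two equal-length encoding rules."""
    if len(code_a) != len(code_b):
        raise ValueError("encoding rules must have the same length")
    xor = int(code_a, 2) ^ int(code_b, 2)
    return bin(xor).count("1")


# Vertices A and C share the encoding rule 11101110, so their edge weighs 0.
print(edge_weight("11101110", "11101110"))  # 0
```

  • The weight is the Hamming distance between the two encoding rules, that is, the number of bit positions at which exactly one of the two is a wildcard.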
  • Similarly, the edge connecting vertices A and B has a weight of 0.
  • The edge connecting vertices B and C has a weight of 0.
  • The weights of the other edges are shown in Figure 11c and are not repeated here. According to Fig. 11c, the edge with the smallest weight has a weight of 0. Because this weight is less than the first threshold, an edge for generating a new vertex needs to be determined. Any one of the edge connecting vertices A and C, the edge connecting vertices A and B, and the edge connecting vertices B and C may be selected at random as the edge used to generate the new vertex.
  • In this embodiment, the edge connecting vertices A and C is determined as the edge for generating a new vertex.
  • the weighted undirected graph shown in Fig. 12b is generated according to the new vertex A&C.
  • At this time, the weighted undirected graph shown in Fig. 12b is the most recently generated of the weighted undirected graphs that have been generated.
  • The encoding rule corresponding to vertex A&C is the result of performing a bitwise AND operation on the encoding rules corresponding to the two vertices of the edge used to generate the new vertex A&C, that is, vertex A and vertex C.
  • The result of this bitwise AND operation is 11101110.
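  • The merge step can be sketched as follows. The function name `merge_codes` is an assumption; the input values are the encoding rules of vertices A and C given in the text.

```python
def merge_codes(code_a: str, code_b: str) -> str:
    """Return the bitwise AND of two equal-length encoding rules as a bit string."""
    if len(code_a) != len(code_b):
        raise ValueError("encoding rules must have the same length")
    merged = int(code_a, 2) & int(code_b, 2)
    # Pad with leading zeros so the result keeps the original width.
    return format(merged, f"0{len(code_a)}b")


# Encoding rule of the new vertex A&C.
print(merge_codes("11101110", "11101110"))  # 11101110
```

  • A bit of the merged encoding rule is 1 only if it is 1 in both inputs, so the merged vertex keeps a 1 only at positions where no rule in its sub-rule set has a wildcard.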
  • The vertices other than vertex A&C in the weighted undirected graph shown in Fig. 12b are vertices carried over from the weighted undirected graph shown in Fig. 11c, namely vertices B, D, E, and F.
  • For the encoding rules corresponding to the five vertices in the weighted undirected graph shown in Fig. 12b, see Fig. 12a.
  • A bitwise XOR operation is performed on the encoding rules corresponding to any two vertices in the weighted undirected graph shown in Fig. 12b.
  • The result of each bitwise XOR operation is a bit string, and the number of 1s in the bit string is the weight of the edge connecting the two vertices.
  • The edge connecting vertex A&C and vertex B has a weight of 0. See Figure 12b for the weights of the other edges.
  • The edge with the smallest weight in the weighted undirected graph shown in Fig. 12b is the edge connecting vertex A&C and vertex B.
  • The weight of this edge is less than the first threshold, so the edge connecting vertex A&C and vertex B is determined as the edge used to generate the new vertex.
  • The weighted undirected graph shown in Fig. 13b is generated on the basis of the vertex A&C&B and the weighted undirected graph shown in Fig. 12b. At this time, the weighted undirected graph shown in FIG. 13b is the most recently generated of the weighted undirected graphs that have been generated.
  • The four vertices included in the weighted undirected graph shown in Fig. 13b are vertices A&C&B, D, E, and F.
  • the four encoding rules in the set of encoding rules shown in Figure 13a correspond to the four vertices in the weighted undirected graph shown in Figure 13b.
  • A bitwise AND operation is performed on the encoding rules corresponding to vertices A&C and B of the weighted undirected graph shown in Fig. 12b, and the result is the encoding rule corresponding to vertex A&C&B.
  • A bitwise XOR operation is performed on the encoding rules corresponding to any two vertices in the weighted undirected graph shown in Fig. 13b.
  • The result of each bitwise XOR operation is a bit string, and the number of 1s in the bit string is the weight of the edge connecting the two vertices.
  • The edge connecting vertex E and vertex F has a weight of 1. See Figure 13b for the weights of the other edges.
  • The edge with the smallest weight in the weighted undirected graph shown in Fig. 13b is the edge connecting vertex E and vertex F.
  • The weight of this edge is less than the first threshold.
  • The edge connecting vertex E and vertex F is therefore determined as the edge used to generate the new vertex.
  • a new vertex E&F is generated.
  • the weighted undirected graph shown in Fig. 14b is generated on the basis of the vertex E&F and the weighted undirected graph shown in Fig. 13b.
  • the three vertices included in the weighted undirected graph shown in Fig. 14b are vertices A&C&B, E&F, and D, respectively.
  • the three encoding rules in the set of encoding rules shown in Figure 14a correspond to the three vertices in the weighted undirected graph shown in Figure 14b.
  • A bitwise AND operation is performed on the two encoding rules corresponding to vertices E and F of the weighted undirected graph shown in Fig. 13b, and the result is the encoding rule corresponding to vertex E&F.
  • A bitwise XOR operation is performed on the encoding rules corresponding to any two vertices in the weighted undirected graph shown in Fig. 14b.
  • The result of each bitwise XOR operation is a bit string, and the number of 1s in the bit string is the weight of the edge connecting the two vertices.
  • The edge connecting vertex A&C&B and vertex D has a weight of 3. See Figure 14b for the weights of the other edges.
  • No further vertex is generated, so the weighted undirected graph shown in Fig. 14b is the most recently generated weighted undirected graph.
  • A decision tree is generated for the sub-rule set corresponding to each vertex in the most recently generated weighted undirected graph.
  • The most recently generated weighted undirected graph consists of three vertices: vertex A&C&B, vertex E&F, and vertex D.
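  • The whole merging procedure walked through above can be sketched as follows. This is a minimal illustration under stated assumptions: the encoding rules for A, B, and C follow the text (all 11101110), whereas the encoding rules for D, E, and F are hypothetical values chosen only so that the weights stated in the text hold (1 for the edge E to F, 3 for the edge A&C&B to D); the actual values are those of Figure 11b. The function names are also assumptions, and ties are broken by taking the first smallest edge found rather than at random.

```python
from itertools import combinations


def weight(a: int, b: int) -> int:
    """Number of 1s in the bitwise XOR of two encoding rules held as integers."""
    return bin(a ^ b).count("1")


def partition(codes: dict[str, str], threshold: int) -> list[list[str]]:
    """Merge vertices until the smallest edge weight is no longer below the threshold."""
    # Each vertex holds (encoding rule as an integer, names of the rules in its sub-rule set).
    vertices = {name: (int(code, 2), [name]) for name, code in codes.items()}
    while len(vertices) > 1:
        # Pick the edge with the smallest weight; the first one found wins a tie.
        u, v = min(combinations(vertices, 2),
                   key=lambda e: weight(vertices[e[0]][0], vertices[e[1]][0]))
        if weight(vertices[u][0], vertices[v][0]) >= threshold:
            break
        code_u, rules_u = vertices.pop(u)
        code_v, rules_v = vertices.pop(v)
        # The new vertex takes the bitwise AND of both codes and the union of both rule sets.
        vertices[f"{u}&{v}"] = (code_u & code_v, rules_u + rules_v)
    return sorted(sorted(rules) for _, rules in vertices.values())


codes = {
    "A": "11101110", "C": "11101110", "B": "11101110",  # values implied by the text
    "D": "11100000", "E": "00001111", "F": "00001110",  # hypothetical values
}
print(partition(codes, threshold=2))  # [['A', 'B', 'C'], ['D'], ['E', 'F']]
```

  • With the first threshold set to 2, the sketch ends with the same three sub-rule sets as the walkthrough: rules A0, B0, and C0 together, rules E0 and F0 together, and rule D0 alone.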
  • the sub-rule set corresponding to the vertex A&C&B includes rules A0, B0, and C0.
  • Figure 15a is a schematic diagram of the mapping of rules A0, B0, and C0 to a two-dimensional geometric space measured in decimal.
  • Figure 15b is a schematic diagram of a decision tree generated for the sub-rule set corresponding to vertex A&C&B according to the HyperCuts algorithm. It can be seen that rule replication does not occur for rules A0, B0, and C0.
  • the sub-rule set corresponding to the vertex E&F contains rules E0 and F0.
  • Figure 16a is a schematic diagram of rules E0 and F0 mapped to a two-dimensional geometric space measured in decimal.
  • Figure 16b is a schematic diagram of a decision tree generated for a set of sub-rules corresponding to a vertex E&F according to the HyperCuts algorithm. It can be seen that rule replication does not occur in rules E0 and F0.
  • the sub-rule set corresponding to the vertex D includes the rule D0.
  • Figure 17a is a schematic diagram of rule D0 mapped to a two-dimensional geometric space measured in decimal.
  • Figure 17b is a schematic diagram of a decision tree generated for the sub-rule set corresponding to vertex D according to the HyperCuts algorithm. It can be seen that rule replication does not occur for rule D0.
  • Figure 18a is a schematic diagram of the rules in the rule set shown in Figure 11b mapped to a two-dimensional geometric space measured in decimal.
  • Figure 18b is a schematic diagram of a decision tree generated for the rule set shown in Figure 11b according to the HyperCuts algorithm.
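  • The figures above contrast trees built for the sub-rule sets with a tree built for the whole rule set. How rule replication arises when a space is cut can be illustrated with the following simplified sketch. It is not the HyperCuts algorithm: it assumes a single dimension, one equal-width cut, a space size divisible by the number of cells, and hypothetical rule ranges; the function name `copies_after_cut` is likewise an assumption.

```python
def copies_after_cut(ranges: list[tuple[int, int]], space_size: int, num_cells: int) -> int:
    """Count stored rule copies after cutting [0, space_size) into equal-width cells.

    Each rule is an inclusive (low, high) range and is copied into every cell it overlaps.
    """
    cell_width = space_size // num_cells
    return sum(high // cell_width - low // cell_width + 1 for low, high in ranges)


# Hypothetical ranges: one wide rule (many wildcard bits) and two narrow rules.
wide, narrow = [(0, 15)], [(2, 3), (10, 11)]

print(copies_after_cut(wide + narrow, 16, 4))  # 6 copies for 3 rules
print(copies_after_cut(narrow, 16, 4))         # 2 copies for 2 rules
```

  • In this sketch the wide rule is copied into all four cells when it shares a tree with the narrow rules, whereas the narrow rules alone are stored once each. Grouping rules whose wildcards occupy similar bit positions keeps such wide rules out of trees that need many cuts.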
  • the functional units in the various embodiments of the present invention may be integrated into one unit, or each functional unit may exist physically alone, or two or more functional units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.
  • The modules of the apparatus in an embodiment may be distributed in the apparatus as described in the embodiment, or may be located, with corresponding changes, in one or more apparatuses different from that of the embodiment.
  • the modules of the above embodiments may be combined into one module, or may be further split into a plurality of sub-modules.
  • The steps of the above method embodiments may be implemented by hardware under the control of program instructions.
  • The aforementioned program may be stored in a computer-readable storage medium.
  • When the program is executed, the steps of the foregoing method embodiments are performed. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.

Abstract

Embodiments of the present invention relate to a method and an apparatus for generating a decision tree. The method comprises the following steps: generating a set of encoding rules according to a rule set; generating a first weighted undirected graph; calculating the weight of each edge in the first weighted undirected graph; if the weight of the edge having the largest weight in the first weighted undirected graph is greater than a first threshold, performing a first operation cyclically until the weight of the edge having the largest weight in a newly generated weighted undirected graph is less than or equal to the first threshold; and generating a decision tree for the sub-rule set corresponding to each vertex in the newly generated weighted undirected graph. In addition, embodiments of the present invention further relate to another method and apparatus for generating a decision tree. By means of the technical solutions described in the embodiments of the present invention, the probability of rule replication occurring can be reduced.
PCT/CN2013/073036 2012-04-01 2013-03-22 Method and apparatus for generating decision tree WO2013149555A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP13772797.0A EP2819355B1 (fr) 2012-04-01 2013-03-22 Method and apparatus for generating decision tree
US14/497,720 US10026039B2 (en) 2012-04-01 2014-09-26 Method and apparatus for generating decision tree

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210095978.1A CN102664787B (zh) 2012-04-01 2012-04-01 Method and apparatus for generating decision tree
CN201210095978.1 2012-04-01

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/497,720 Continuation US10026039B2 (en) 2012-04-01 2014-09-26 Method and apparatus for generating decision tree

Publications (1)

Publication Number Publication Date
WO2013149555A1 (fr) 2013-10-10

Family

ID=46774205

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/073036 WO2013149555A1 (fr) 2012-04-01 2013-03-22 Method and apparatus for generating decision tree

Country Status (4)

Country Link
US (1) US10026039B2 (fr)
EP (1) EP2819355B1 (fr)
CN (1) CN102664787B (fr)
WO (1) WO2013149555A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784510A (zh) * 2021-03-10 2021-05-11 国微集团(深圳)有限公司 Conditional loop circuit partitioning method based on balanced weights and minimum edge cut

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102664787B (zh) 2012-04-01 2014-10-08 华为技术有限公司 Method and apparatus for generating decision tree
CN104737151A (zh) * 2013-08-28 2015-06-24 华为技术有限公司 Method and device for processing a complex graph
CN104573194A (zh) * 2014-12-20 2015-04-29 西安工业大学 Method for identifying subassemblies in assembly sequence planning
WO2017107215A1 (fr) * 2015-12-25 2017-06-29 华为技术有限公司 Method, device, and controller for deploying service functions in a data center
CN106230725B (zh) * 2016-07-14 2019-09-06 杭州迪普科技股份有限公司 Method and apparatus for classifying a packet rule set
CN107318205A (zh) * 2017-08-07 2017-11-03 广汉阿拉丁科技有限公司 Short-range lamp control method based on ZigBee technology
US11244502B2 (en) * 2017-11-29 2022-02-08 Adobe Inc. Generating 3D structures using genetic programming to satisfy functional and geometric constraints
CN108804593B (zh) * 2018-05-28 2019-06-18 西安理工大学 Subgraph query method for undirected weighted graphs based on graph spectrum and number of reachable paths
US10673765B2 (en) * 2018-09-11 2020-06-02 Cisco Technology, Inc. Packet flow classification in spine-leaf networks using machine learning based overlay distributed decision trees
EP3716353A1 (fr) 2019-03-25 2020-09-30 Hilti Aktiengesellschaft Battery fall protection
CN110414567B (zh) * 2019-07-01 2020-08-04 阿里巴巴集团控股有限公司 Data processing method, apparatus, and electronic device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070179966A1 (en) * 2006-02-01 2007-08-02 Oracle International Corporation System and method for building decision trees in a database
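CN102054002A (zh) * 2009-10-28 2011-05-11 中国移动通信集团公司 Method and apparatus for generating a decision tree in a data mining system
CN102281196A (zh) * 2011-08-11 2011-12-14 中兴通讯股份有限公司 Decision tree generation method and device, and decision-tree-based packet classification method and device
CN102664787A (zh) * 2012-04-01 2012-09-12 华为技术有限公司 Method and apparatus for generating decision tree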

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7685087B2 (en) * 2005-12-09 2010-03-23 Electronics And Telecommunications Research Institute Method for making decision tree using context inference engine in ubiquitous environment
WO2007113700A2 (fr) * 2006-03-30 2007-10-11 Koninklijke Philips Electronics N.V. Establishing rules for a decision tree
US7937336B1 (en) * 2007-06-29 2011-05-03 Amazon Technologies, Inc. Predicting geographic location associated with network address
SE532426C2 (sv) * 2008-05-26 2010-01-19 Oricane Ab Method for data packet classification in a data communication network
EP2582096B1 (fr) 2010-06-28 2016-03-30 Huawei Technologies Co., Ltd. Method and device for packet classification
US8868446B2 (en) * 2011-03-08 2014-10-21 Affinnova, Inc. System and method for concept development

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2819355A4 *

Also Published As

Publication number Publication date
EP2819355B1 (fr) 2016-01-20
US20150019471A1 (en) 2015-01-15
CN102664787A (zh) 2012-09-12
CN102664787B (zh) 2014-10-08
EP2819355A1 (fr) 2014-12-31
US10026039B2 (en) 2018-07-17
EP2819355A4 (fr) 2014-12-31

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 13772797; Country of ref document: EP; Kind code of ref document: A1)
WWE WIPO information: entry into national phase (Ref document number: 2013772797; Country of ref document: EP)
NENP Non-entry into the national phase (Ref country code: DE)