WO2011085577A1 - Method and apparatus for classifying messages - Google Patents

Method and apparatus for classifying messages

Info

Publication number
WO2011085577A1
Authority
WO
WIPO (PCT)
Prior art keywords
segmentation
rule
code
matching level
rules
Prior art date
Application number
PCT/CN2010/074575
Other languages
English (en)
French (fr)
Inventor
张文勇
龚钧
刘淑英
陈洪飞
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2010/074575 priority Critical patent/WO2011085577A1/zh
Priority to CN201080002602.0A priority patent/CN102308533B/zh
Priority to EP10842861.6A priority patent/EP2582096B1/en
Publication of WO2011085577A1 publication Critical patent/WO2011085577A1/zh
Priority to US13/724,797 priority patent/US8732110B2/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441 Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]

Definitions

  • The present invention relates to the field of communications technologies, and in particular, to a method and apparatus for classifying messages.
  • Background
  • Traffic classification, that is, classifying received packets, is one of the key functions of a router; it provides the technical foundation for complex value-added services such as network security, QoS (Quality of Service), load balancing, and traffic accounting.
  • The basic idea of decision-tree-based flow classification is to recursively split the rule set according to some segmentation strategy until the number of rules in each sub-rule set is less than a preset bucket size; this process builds a decision tree. The intermediate nodes of the decision tree store the method used to split the rule set, and the leaf nodes store the sub-rule sets, that is, all the rules that may match.
  • Decision tree-based algorithms include HiCuts (one-dimensional segmentation), HyperCuts (multidimensional segmentation), and Modular (selection segmentation).
  • The prior art improves decision-tree-based flow classification as follows: first, the original rule set is divided into several sub-rule sets that do not overlap each other, and then a decision tree is built for each resulting sub-rule set.
  • The division of the original rule set into several sub-rule sets can be implemented as follows: 1) classify the rules according to prefix; for example, when classifying standard IPv4 five-tuple rules, the rules can be classified according to the prefix of the source IP address and/or the destination IP address; 2) classify the rules according to range; for example, when classifying standard IPv4 five-tuple rules, the rules can be classified according to the range of the source port and/or the destination port.
  • The subclasses obtained in 1) and 2) above are the required sub-rule sets.
  • An IPv4 five-tuple rule may need to be divided over five fields; in this case, the subclasses obtained by the different classification methods can be combined by the cross-product method to obtain multiple sub-rule sets that do not overlap each other.
  • For example, the original rule set is divided according to an address field and a port field.
  • The original rule set can be divided into s1 and s2 subclasses by the methods described in 1) and 2) above, respectively, and then the cross-product method is used.
  • The original rule set can thus be divided into s1 * s2 sub-rule sets.
  • With this scheme, the original rule set can be divided into "completely" non-overlapping sub-rule sets, and rule copying is reduced to a certain extent; however, in using the improved flow classification algorithm, the inventors found that at least the following problems exist in the prior art:
  • Whether a rule is copied depends on whether the rule has a wildcard '*' in the bits used for segmentation at splitting time, not on whether the fields of the rules overlap. Therefore, the above scheme is applicable only to flow classification algorithms that split on the fields themselves.
  • Embodiments of the present invention provide a method and apparatus for classifying messages, which are used to reduce rule copying in the classification process and improve classification efficiency.
  • a method for classifying messages including:
  • a device for classifying messages including:
  • a receiving unit, configured to receive a message;
  • a searching unit, configured to search, in at least one decision tree that has been created, for a rule that matches the message, where the decision tree is created after the original rule set is divided based on the segmentation code; and a classifying unit, configured to classify the message according to the found rule.
  • In the method and apparatus for classifying a message according to the embodiments of the present invention, the decision tree used for rule search is created after the original rule set is divided based on the segmentation code. Dividing the rule set by segmentation code not only reduces rule replication but also greatly reduces the depth of the decision tree, its memory footprint, and its construction time. Therefore, rule search with the solution provided in the embodiments of the present invention can greatly increase the speed of searching and classification while keeping the search bandwidth unchanged.
  • the method and apparatus provided in the embodiments of the present invention can reduce rule copying in the classification process and improve classification efficiency.
  • Figure 1 is a schematic diagram of a rule set segmentation
  • FIG. 2 is a flowchart of a method for classifying a message according to Embodiment 1 of the present invention
  • FIG. 3 is a flowchart of a method for classifying a message according to Embodiment 2 of the present invention.
  • FIG. 5 is a schematic structural diagram of an apparatus for classifying a message according to Embodiment 4 of the present invention
  • FIG. 6 is a schematic structural diagram of an apparatus for classifying a message according to Embodiment 5 of the present invention
  • Schematic diagram of the division unit in the middle
  • A wildcard ('*') in a rule refers to a binary bit that may take either value. The number and location of the wildcards '*' determine whether rule replication is likely during construction of the decision tree. The effect of the number of wildcards '*' on rule copying was described in the previous section; the effect of their location is explained here.
  • In the two example rule sets (Tables 1 and 2), the number of '*' is exactly the same; only the positions where they appear in the rules differ. For the four rules in Table 1, the '*' positions are exactly the same, so when dividing the rule set in Table 1, selecting the first two bits of the first dimension (Dim1) is enough to divide the rule set into 4 sub-rule sets.
  • Each sub-rule set contains one rule, and no rule is copied, as shown in Figure 1(a).
  • For the rules in Table 2, the positions of '*' alternate; in this case, whichever bit is selected for segmentation, rules are copied.
  • The rule set in Table 2 can be divided into four sub-rule sets, each containing 2 rules, as shown in Figure 1(b).
  • As can be seen from the above, although the number of '*' in the two examples shown in Tables 1 and 2 is identical, the degree of rule replication is quite different.
  • For a rule set, if selecting certain bits for segmentation does not cause replication, the rules in the rule set are said to be matched; if selecting any bit for segmentation causes replication, the rules in the rule set are said to be unmatched. In the example corresponding to Figure 1, the four rules in Table 1 are matched; the four rules in Table 2 are not. Moreover, for matched rules, the more bits that can be selected without causing replication, the higher the degree of matching, and the less likely replication is during creation of the decision tree.
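  • The matched/unmatched distinction can be illustrated with a minimal sketch. It assumes a simplified criterion (a bit position is safe to split on when no rule has '*' there), and the two four-rule sets below are hypothetical stand-ins, not the contents of Tables 1 and 2.

```python
def safe_split_bits(rules):
    """Return the bit positions at which no rule has a wildcard '*'.

    Splitting on such a position never forces a rule into both children,
    so it causes no rule copying (simplified criterion, an assumption).
    """
    width = len(rules[0])
    return [i for i in range(width) if all(rule[i] != '*' for rule in rules)]


def is_matched(rules):
    """A rule set is treated as matched if at least one safe bit exists."""
    return bool(safe_split_bits(rules))


# Hypothetical rule sets: same number of '*' per rule, different positions.
aligned = ["00**", "01**", "10**", "11**"]       # wildcards in the same places
alternating = ["0*1*", "*0*1", "1*0*", "*1*0"]   # wildcards alternate

print(safe_split_bits(aligned))      # [0, 1]
print(is_matched(alternating))       # False
```

  • In the aligned set the first two bits separate all four rules without copying, while in the alternating set every position has a wildcard in some rule.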
  • Each rule is a three-valued string of '0', '1', and '*'. If the bit string corresponding to each rule is divided into N (N ≥ 2) segments and the number of '*' in each segment is counted, the number of '*' in a segment largely determines whether the rule is easily copied when the bits in that segment are used for splitting. When the number of '*' in a segment of a rule exceeds a certain threshold t (for example, when a segment contains 16 characters and the number of '*' exceeds 8), the bits in the segment are considered "bad" for the rule, that is, when the bits in the segment are used for segmentation, the rule is easily copied or has a stronger tendency to be copied; otherwise, the bits in the segment are considered "good".
  • Each rule is a three-valued bit string consisting of '0', '1', and '*', that is, each rule contains multiple characters '0', '1', and '*'; with at least two characters as a segment, the bit string corresponding to each rule is divided into N segments.
  • When the number of '*' in a segment exceeds the threshold t, the segment is called "bad" and is encoded as 0; otherwise, the segment is called "good" and is encoded as 1.
  • In this way, each rule corresponds to an N-bit binary code, called a segmentation code.
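  • The encoding just described can be sketched as follows. This is a minimal illustration that assumes equal-length segments; the function name and the sample rule are hypothetical.

```python
def segmentation_code(rule, seg_len, t):
    """Encode a ternary rule string ('0', '1', '*') as a segmentation code.

    Each segment of seg_len characters contributes one bit:
    '0' ("bad") if it contains more than t wildcards, else '1' ("good").
    Equal-length segments are an assumption of this sketch.
    """
    if len(rule) % seg_len != 0:
        raise ValueError("rule length must be a multiple of seg_len")
    bits = []
    for start in range(0, len(rule), seg_len):
        segment = rule[start:start + seg_len]
        bits.append('0' if segment.count('*') > t else '1')
    return ''.join(bits)


# Hypothetical 32-character rule: 20 fixed bits followed by 12 wildcards.
rule = "1010101010101010" + "1010" + "*" * 12
print(segmentation_code(rule, seg_len=16, t=8))  # 10
```

  • The first 16-character segment has no wildcard and is "good"; the second has 12 wildcards, which exceeds the threshold 8, and is "bad".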
  • Embodiment 1:
  • The method for classifying a message includes:
  • The router can receive multiple messages from the network, and then the traffic classifier checks multiple fields in each message to find the rules that match it.
  • The decision tree is a decision tree that is created after the original rule set is divided based on the segmentation code.
  • Because a decision tree created after the original rule set is divided based on the segmentation code takes both the number and the location of the wildcard '*' into account, it involves less rule copying than a decision tree created by existing methods, and its depth is smaller than that of existing decision trees. Therefore, using the decision tree of this embodiment shortens the time needed to find the rule matching a message and improves classification efficiency.
  • In step 203, all messages matching the same rule are considered to belong to one class, and the processing applied to messages of different classes may differ.
  • The processing may be discarding, accepting, counting, and the like.
  • Each of the steps described above may be executed by a router or by a traffic classification engine integrated inside the router.
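  • The receive, search, and classify flow can be sketched with a toy bit-selection decision tree. Everything here is an assumption for illustration: the dictionary-based node layout, the 4-bit headers, the rule patterns and actions, and the choice to return the first matching rule rather than resolve priorities.

```python
def matches(pattern, header):
    """A ternary pattern matches a header if every non-'*' bit is equal."""
    return all(p in ('*', h) for p, h in zip(pattern, header))


def build(rules, bits):
    """Build a tree by splitting on the given bit positions, one per level.

    A rule with '*' at the split position is copied into both children,
    which is the rule copying that a good bit choice avoids.
    """
    if not bits or len(rules) <= 1:
        return {"leaf": rules}
    pos, rest = bits[0], bits[1:]
    children = {}
    for value in "01":
        subset = [r for r in rules if r[0][pos] in (value, '*')]
        children[value] = build(subset, rest)
    return {"bit": pos, "children": children}


def classify(trees, header):
    """Search each tree in turn and return the action of the first match."""
    for tree in trees:
        node = tree
        while "leaf" not in node:
            node = node["children"][header[node["bit"]]]
        for pattern, action in node["leaf"]:
            if matches(pattern, header):
                return action
    return None


# Two hypothetical sub-rule sets, each with its own decision tree.
tree_a = build([("00**", "accept"), ("01**", "drop"),
                ("10**", "count"), ("11**", "accept")], bits=[0, 1])
tree_b = build([("**11", "drop")], bits=[])

print(classify([tree_a, tree_b], "0110"))  # drop
```

  • Because the rules in the first set all have fixed values in bits 0 and 1, splitting on those bits leaves exactly one rule per leaf and copies nothing.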
  • In the method for classifying a message provided in this embodiment, the decision tree used for rule search is created after the original rule set is divided based on the segmentation code. Dividing the rule set by segmentation code not only reduces rule replication but also greatly reduces the depth of the decision tree, its memory footprint, and its construction time; therefore, rule search with the solution provided in the embodiment of the present invention can greatly increase the speed of searching and classification while keeping the search bandwidth unchanged.
  • Embodiment 2:
  • The segmentation code is an N-bit binary code determined according to the number and position of the wildcard '*' in the rule.
  • Each rule includes a plurality of characters '0', '1', and '*'; with at least two characters as one segment, the bit string corresponding to each rule is divided into N segments. When the number of '*' in a segment exceeds a certain threshold t, the segment is said to be "bad" and is encoded as 0; otherwise, the segment is said to be "good" and is encoded as 1.
  • In this way, each rule corresponds to an N-bit binary code called a segmentation code.
  • Each field can be represented by a prefix or by a range.
  • The source IP address and the destination IP address are naturally represented by prefixes; the protocol number can be regarded as a prefix with a mask length of 0 (the field is '*') or 8; the port numbers are represented by ranges.
  • For a field represented by a prefix, the mask length can be used to determine whether the corresponding character segment is "good" or "bad". For example, an IP address is a 32-bit binary code; with 16 bits as a segment, the source IP can be divided into 2 segments, corresponding to 2 binary bits of the segmentation code; when the number of '*' in a segment exceeds 8, the segment is considered "bad".
  • When the mask length of the source IP of a rule satisfies maskLen < 8, the two binary bits in the segmentation code corresponding to the source IP are 00; when 8 ≤ maskLen < 24, the encoding is 10; when maskLen ≥ 24, the encoding is 11.
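  • The mask-length encoding follows directly from counting wildcards per segment, as this sketch shows. The function name and default parameters are assumptions; the rule applied is the one stated above, namely that a 16-bit segment with more than 8 wildcards is "bad". Under that strict rule a mask length of exactly 24 leaves 8 wildcards in the second segment, so it evaluates to 11.

```python
def prefix_segment_code(mask_len, width=32, seg_len=16, t=8):
    """Derive the segmentation-code bits of a prefix field from its mask length.

    The first mask_len bits are fixed and the rest are wildcards, so the
    wildcard count of each segment can be computed without the address.
    """
    code = ''
    for start in range(0, width, seg_len):
        fixed = min(max(mask_len - start, 0), seg_len)  # fixed bits in segment
        wildcards = seg_len - fixed
        code += '0' if wildcards > t else '1'
    return code


for mask_len in (0, 7, 8, 23, 24, 32):
    print(mask_len, prefix_segment_code(mask_len))
# 0 00
# 7 00
# 8 10
# 23 10
# 24 11
# 32 11
```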
  • The upper and lower limits of a range are generally 16-bit binary numbers.
  • For example, every 16 bits of the source IP address and the destination IP address can be taken as one segment, so that the source IP address and the destination IP address are each divided into two segments, with a threshold of 8 for the number of '*'; the source port and the destination port are each treated as one segment,
  • with a range length threshold of 256;
  • the protocol number is represented by an 8-bit binary number and treated as one segment, and the threshold for the number of '*' is a positive integer less than 8 (the field is either a specific protocol number or '*').
  • In this way, the rule is divided into 7 segments, and the segmentation code is represented by a 7-bit binary number.
  • Two segmentation codes are combined by a bitwise AND; the number of 1s in the binary representation of the result is defined as the matching level of the two segmentation codes.
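  • As a small worked example, the matching level can be computed from integer-encoded segmentation codes. The function name and the two 7-bit codes are hypothetical.

```python
def matching_level(code_a, code_b):
    """Number of 1 bits in the bitwise AND of two segmentation codes."""
    return bin(code_a & code_b).count('1')


# Two hypothetical 7-bit segmentation codes.
a = 0b1011010
b = 0b0011100
print(format(a & b, '07b'))   # 0011000
print(matching_level(a, b))   # 2
```

  • A level of 2 means the two codes share two "good" segments, so bits from those segments can split rules of both codes with little copying.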
  • When using segmentation codes to partition a rule set, it is necessary to first select a segmentation code, then calculate the matching level between every other segmentation code and the selected one, and classify the segmentation codes according to the matching level.
  • the selected segmentation code is referred to as a seed segmentation code.
  • the matching level of the other segmentation code and the seed segmentation code can be calculated.
  • the priority order can be set for each segmentation code. The way is as follows:
  • The result of successively bitwise ANDing a plurality of segmentation codes is referred to as the common matching segmentation code of those segmentation codes.
  • The common matching segmentation code of the segmentation codes of a rule set S reflects the degree of matching between the rules in S.
  • The common matching segmentation code also indicates which bits can be selected for segmentation, when creating a decision tree for a sub-rule set, without easily causing rule replication. If a bit in the common matching segmentation code is 1, the segment corresponding to that bit contains k characters, and the threshold for determining whether a segment is "good" or "bad" is t, then splitting the rule set on the first (k - t) bits of that segment is unlikely to cause rule copying. In the Modular algorithm, the first (k - t) bits of the segments whose bit in the common matching segmentation code is 1 can be used to create a more efficient jump table, and these bits can also be preferred when selecting bits. In the HiCuts and HyperCuts algorithms, the dimensions (corresponding to certain segments) whose bits are unlikely to cause rule copying can be split preferentially according to the common matching segmentation code, so that segmentation is more efficient.
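  • The common matching segmentation code and the bits it recommends can be sketched as below. The sketch assumes integer-encoded codes, equal-length segments of k characters, and bit positions counted from 0 at the leftmost character of the rule; the function names and sample values are hypothetical.

```python
from functools import reduce


def common_matching_code(codes, n_bits):
    """Successive bitwise AND of all segmentation codes in a rule set."""
    return reduce(lambda acc, code: acc & code, codes, (1 << n_bits) - 1)


def preferred_split_bits(common, n_bits, k, t):
    """Rule bit positions that are unlikely to cause copying.

    For every segment whose bit in the common code is 1, the first
    (k - t) characters of that segment are returned.
    """
    positions = []
    for seg in range(n_bits):
        if (common >> (n_bits - 1 - seg)) & 1:   # leftmost code bit = segment 0
            start = seg * k
            positions.extend(range(start, start + (k - t)))
    return positions


codes = [0b11, 0b01, 0b11]                    # hypothetical 2-bit codes
common = common_matching_code(codes, n_bits=2)
print(format(common, '02b'))                  # 01
print(preferred_split_bits(common, n_bits=2, k=8, t=4))  # [8, 9, 10, 11]
```

  • With a common code of 01, 8-character segments, and a threshold of 4, the first 4 bits of the second segment are the preferred splitting bits.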
  • The method for classifying a message specifically includes the following steps:
  • 301. Divide the original rule set into at least two sub-rule sets according to the segmentation code.
  • The original rule set needs to be divided into multiple sub-rule sets; a decision tree is then constructed separately for each of the sub-rule sets.
  • the process of dividing the original rule set into at least two sub-rule sets according to the segmentation code, as shown in FIG. 4, may be implemented by the following steps:
  • the segment code corresponding to the rule is an N-bit binary code.
  • the segmentation code may be sorted according to the order of the number of rules corresponding to the segmentation code.
  • the specific ordering manner may be determined according to the requirements in the actual execution process.
  • The segmentation code with the largest number of corresponding rules, among the segmentation codes sorted by number of rules, is selected as the seed segmentation code and is bitwise ANDed with each of the other segmentation codes;
  • In step S13, the segmentation code having the largest number of corresponding rules is selected as the seed segmentation code in order to separate most of the rules as quickly as possible.
  • The seed segmentation code not only has a large number of rules corresponding to itself but generally also has more segmentation codes matching it.
  • The number of 1s in the result is the matching level between the two segmentation codes.
  • all the obtained segmentation codes are sorted by the matching level between the other segmentation code and the seed segmentation code.
  • the matching level between the segmentation codes reflects the tendency of copying at the time of rule segmentation, that is, when the matching level is higher, the tendency of copying at the time of rule segmentation is smaller, and vice versa.
  • Performing successive bitwise AND operations on the segmentation codes whose matching level is not 0 means: in order, the bitwise AND result of all preceding segmentation codes whose matching level is not 0 is bitwise ANDed with the next segmentation code, until the bitwise AND of all such segmentation codes is obtained.
  • The final result of the successive bitwise AND over the remaining segmentation codes is the common matching segmentation code of all segmentation codes in the class whose matching level with the seed segmentation code is greater than 0.
  • the calculated common matching segment code of the segmentation code whose matching level is greater than 0 should also be saved.
  • In step S14, the lowest matching level can be calculated by the following formula:
  • The lowest matching level of a rule set reflects the number of bits in the rule set that can be used for splitting without easily resulting in copying.
  • When the number of rules is small, the lowest matching level can be set lower; when the number of rules is large and many bits are needed to separate them, the lowest matching level needs to be set higher. For example, suppose the bit corresponding to a segment is '0' if the number of '*' in the segment exceeds 8.
  • If the lowest matching level between rules is 1, the rules are copied little when they are split using no more than 8 bits, but more copying may occur when more bits are needed for segmentation. This situation can be solved by raising the lowest matching level of the rule set.
  • the rule corresponding to the segmentation code whose matching level is greater than 0 is classified into the first sub-rule set.
  • the common matching segment code and the lowest matching level of all the segment codes whose matching levels are greater than 0 obtained in step S14 may be saved as the attributes of the first sub-rule set.
  • When the number of rules corresponding to the segmentation codes whose matching level is 0 is less than or equal to the first threshold, those rules are classified into the second sub-rule set, and the division of the rule set ends; when the number of rules corresponding to the segmentation codes whose matching level is 0 is greater than the first threshold, the segmentation code with the largest number of corresponding rules is reselected from the segmentation codes whose matching level is 0, and the process returns to step S13 to continue dividing the segmentation codes whose matching level is 0.
  • The first threshold may be, but is not limited to, a multiple of the number of rules that each leaf node can hold, for example, n*bucketSize, n>2.
  • A second threshold may also be set for the number of rules corresponding to the segmentation codes whose matching level is 0; the second threshold is smaller than the first threshold and may be, but is not limited to, the number of rules that each leaf node can hold.
  • When the number of rules corresponding to the segmentation codes whose matching level is 0 is greater than the second threshold and no greater than the first threshold, those rules can be classified into the second sub-rule set, so that different decision trees are constructed for different sub-rule sets and the depth of each decision tree is reduced;
  • When the number of rules corresponding to the segmentation codes whose matching level is 0 is less than or equal to the second threshold, few such rules remain, so they are classified into the first sub-rule set; this partitioning scheme is applicable where the original rule set itself is relatively small, in which case the decision tree constructed from the first sub-rule set is not very deep and has little effect on the lookup rate during classification.
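  • The two-threshold decision for the rules whose matching level is 0 can be summarized in a short sketch. The function name, the returned labels, and the threshold values 4 and 2 are illustrative assumptions.

```python
def handle_zero_level(num_rules, first_threshold, second_threshold):
    """Decide what to do with the rules whose matching level is 0.

    Assumes second_threshold < first_threshold.
    """
    if num_rules <= second_threshold:
        return "merge_into_first_sub_rule_set"
    if num_rules <= first_threshold:
        return "new_sub_rule_set"
    return "reselect_seed_and_repartition"


# Illustrative thresholds: first = 4, second = 2.
for n in (1, 3, 9):
    print(n, handle_zero_level(n, first_threshold=4, second_threshold=2))
# 1 merge_into_first_sub_rule_set
# 3 new_sub_rule_set
# 9 reselect_seed_and_repartition
```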
  • With the above scheme, a rule set with good matching need not be divided, or is divided into fewer sub-rule sets, whereas a rule set with poor matching can be divided into more sub-rule sets. In this way, on-demand partitioning of the rule set can be achieved.
  • the number of sub-rule sets to be generated may be set in advance; for example, the original rule set may be set to be divided into two sub-rule sets.
  • Number of sub-rules that have been generated number of sub-rules set in advance - 1
  • the original rule set can basically be divided into at least two sub-rule sets.
  • Step S13 and step S14 may also be combined, that is, the number of 1s in the bitwise AND result obtained in step S13 is compared with the lowest matching level; specifically, when the number of 1s in the bitwise AND result in step S13 is less than the lowest matching level, the segmentation code currently being ANDed with the seed segmentation code is classified into the category with a matching level of 0. This is equivalent to dividing the rule set using only the matching level between each other segmentation code and the seed segmentation code as the criterion.
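  • One partitioning pass (seed selection, sorting by matching level, successive bitwise AND against the lowest matching level) can be sketched as follows. This is a simplified reading of the steps, not the patented procedure itself: codes are integers, ties for the seed are broken by first occurrence, and a code that would push the running AND below the lowest matching level is moved to the level-0 class without changing the running AND. All names and sample values are hypothetical.

```python
from collections import Counter


def popcount(x):
    return bin(x).count('1')


def partition_once(codes, min_level):
    """Split segmentation codes into a matched class and a level-0 class.

    codes: one integer segmentation code per rule.
    Returns (matched_codes, common_matching_code, level0_codes).
    """
    if min_level < 1:
        raise ValueError("min_level must be at least 1")
    counts = Counter(codes)
    seed = max(counts, key=lambda c: counts[c])       # code with most rules
    others = sorted((c for c in counts if c != seed),
                    key=lambda c: popcount(c & seed), reverse=True)
    common, matched, level0 = seed, [seed], []
    for code in others:
        candidate = common & code                     # successive bitwise AND
        if popcount(candidate) >= min_level:
            common = candidate
            matched.append(code)
        else:
            level0.append(code)                       # treated as level 0
    return matched, common, level0


codes = [0b11, 0b11, 0b11, 0b01, 0b01, 0b10, 0b00]    # hypothetical rule codes
print(partition_once(codes, min_level=1))             # ([3, 1], 1, [2, 0])
```

  • Here the seed is 11; code 01 joins it and narrows the common matching segmentation code to 01, while 10 and 00 would reduce it to 00 and are set aside for a separate sub-rule set.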
  • 302. Construct a decision tree for each of the at least two sub-rule sets obtained in step 301, and save the decision trees.
  • Steps 301 and 302 need not be performed every time classification is performed; as long as the decision trees created after the original rule set is divided based on the segmentation code are already stored in the router, these two steps can be skipped in subsequent classification.
  • The router can receive multiple messages from the network, and then the traffic classifier checks multiple fields in each message to find rules that match it.
  • In step 305, all messages matching the same rule belong to one class, and messages of different classes are processed differently.
  • the processing may be discarding, accepting, counting, and the like.
  • In the method for classifying a message provided in this embodiment, the segmentation codes corresponding to the different rules are determined, a seed segmentation code is selected from them, and the segmentation codes are then classified according to the matching level between each other segmentation code and the seed segmentation code and according to the preset lowest matching level. This divides the original rule set into at least two sub-rule sets, for which at least two decision trees are constructed; the constructed decision trees are then used to classify received messages.
  • Using the method provided in this embodiment, especially dividing the original rule set by segmentation code, not only reduces rule copying but also allows bits to be selected accurately when splitting the rule set, so that the depth of the decision tree, the memory footprint, and the construction time are greatly reduced and rule set division is faster; when classifying messages, it also reduces rule search time and improves classification efficiency.
  • Embodiment 3:
  • the end condition of the original rule set division is set as follows:
  • the minimum matching level can be set to 1.
  • The rule set is divided as follows: S21. Calculate the segmentation code of each rule shown in Table 3; the result is shown in the last column of Table 3.
  • Since the segmentation code with the largest number of corresponding rules is '11', the segmentation code '11' is used as the seed segmentation code, and the matching level of every other segmentation code with '11' is calculated. All segmentation codes are classified according to the matching level between each other segmentation code and the seed segmentation code '11' (as shown in Table 5).
  • The number of rules with a matching level of 0 is 3, which satisfies condition iii), that is, the number of rules with a matching level of 0 is greater than 2 and less than or equal to 4, so the rules with a matching level of 0 can directly be treated as a new sub-rule set;
  • therefore, the rules with a matching level of 0 are assigned to a new sub-rule set.
  • At this point the original rule set has been divided into two sub-rule sets; and, according to the common matching segmentation code (01), when the first 4 bits of the second dimension are used to split the first sub-rule set, no rule is copied.
  • The solution provided by the embodiment of the present invention divides the rule set according to the segmentation code before creating decision trees, and takes into account the influence of both the number and the location of the wildcard '*' in each rule on rule division. As a result, appropriate bits can be selected to split the rules when a decision tree is constructed, which effectively reduces rule replication, shortens tree construction time, and improves memory utilization.
  • Embodiment 4:
  • the embodiment of the present invention provides a device for classifying a message. As shown in FIG. 5, the device includes:
  • the receiving unit 51 is configured to receive a packet, where the packet may be a plurality of packets from the network, and the searching unit 52 is configured to search, in the at least one decision tree that has been created, a rule that matches the packet.
  • the decision tree is a decision tree that is created by dividing the original rule set based on the segmentation code;
  • the classifying unit 53 is configured to perform classifying processing on the message according to the found rule; the processing referred to herein may be an operation to be performed for different types of messages, such as discarding, accepting, counting, and the like.
  • the device for classifying the message in this embodiment may be a router or a traffic classification engine integrated in the router.
  • In the apparatus for classifying a message provided in this embodiment, the decision tree used for rule search is created after the original rule set is divided based on the segmentation code. Dividing the rule set by segmentation code not only reduces rule replication but also greatly reduces the depth of the decision tree, its memory footprint, and its construction time; therefore, rule search with the solution provided in the embodiment of the present invention can greatly increase the speed of searching and classification while keeping the search bandwidth constant.
  • Embodiment 5:
  • The apparatus for classifying a message includes a receiving unit 61, a searching unit 62, and a classifying unit 63, as well as a dividing unit 64 and a building unit 65. The dividing unit 64 is configured to divide the original rule set into at least two sub-rule sets according to the segmentation code, where the segmentation code is a binary code representing a rule, determined by dividing the rule into N (N ≥ 2) segments and counting the wildcards '*' in each segment. For the specific determination manner, refer to the description in Embodiment 2, which is not repeated here.
  • the building unit 65 is configured to create a decision tree for each of the at least two sub-rule sets divided by the dividing unit 64, so that the searching unit 62 performs a rule search when performing message classification.
  • the receiving unit 61 is configured to receive a packet; the packet may be multiple packets received by the router from the network;
  • the searching unit 62 is configured to search, in the at least one decision tree that has been created, a rule that matches the message, where the decision tree is a decision tree that is created by dividing the original rule set based on the segment code;
  • the classifying unit 63 is configured to perform classifying processing on the message according to the found rule; the processing referred to herein may be an operation to be performed for different types of messages, such as discarding, accepting, counting, and the like.
  • the dividing unit 64 can be implemented as follows: Specifically, the dividing unit 64 includes: a segmentation module 641, a sorting module 642, a classification module 643, a categorization module 644, a first diversity module 645, a second diversity module 646, a third diversity module 647, a calculation module 648, and a save module 649;
  • a segmentation module 641 configured to segment each rule in the original rule set by using at least two characters as a segment, and calculate a segmentation code corresponding to each rule;
  • the sorting module 642 is configured to count the number of rules corresponding to the same segmentation code, and sort the segmentation codes according to the number of rules;
  • the classification module 643 is configured to select the segmentation code with the largest number of corresponding rules, perform a bitwise AND operation between it and each of the other segmentation codes sorted by number of rules, use the number of 1s in the bitwise AND result as the matching level, and sort the segmentation codes in order of matching level from highest to lowest;
  • the categorization module 644 is configured to perform successive bitwise AND operations on the segmentation codes that are sorted by matching level and whose matching level is not 0, and, when the number of 1s in the result of the successive bitwise AND is less than the lowest matching level, classify the last segmentation code that participated in the successive bitwise AND into the category with a matching level of 0;
  • the first diversity module 645 is configured to classify the rule corresponding to the segmentation code whose matching level is greater than 0 into the first sub-rule set;
  • the second diversity module 646 is configured to classify the rules corresponding to the segmentation codes with matching level 0 into the second sub-rule set when the number of those rules is less than or equal to a first threshold. If the number of rules corresponding to the segmentation codes with matching level 0 is greater than the first threshold, the dividing unit 64, in particular its classification module 643, categorization module 644, first diversity module 645 and second diversity module 646, re-selects the segmentation code with the largest number of corresponding rules from the segmentation codes with matching level 0, so as to continue dividing the segmentation codes with matching level 0.
  • if the division into sub-rule sets needs to be constrained more finely, a second threshold that is smaller than the first threshold may be introduced; in that case, the second diversity module 646 is specifically configured to classify the rules corresponding to the segmentation codes with matching level 0 into the second sub-rule set when the number of those rules is less than or equal to the first threshold and greater than the second threshold.
  • in that case, the dividing unit 64 further includes a third diversity module 647, configured to classify the rules corresponding to the segmentation codes with matching level 0 into the first sub-rule set when the number of those rules is less than or equal to the second threshold.
  • the lowest matching level used by the categorization module 644 when categorizing the segmentation codes may be calculated by the calculation module 648; specifically, the calculation module 648 is configured to calculate the lowest matching level according to a formula (given as an image in the original publication) whose terms are defined as follows:
  • the symbol ⌈ ⌉ denotes rounding up;
  • k is the number of characters corresponding to each bit of the segmentation code;
  • t is the maximum number of wildcards that a character segment may contain and still be usable for rule cutting after segmentation;
  • numRules is the number of rules in the original rule set before division;
  • bucketSize is the maximum number of rules saved in a leaf node of the decision tree;
  • f is the average utilization efficiency of each leaf node in the decision tree.
  • the dividing unit 64 further includes a saving module 649, configured to calculate and save the result of the successive bitwise AND operations performed in order on the segmentation codes whose matching level is greater than 0 after categorization by the categorization module 644, that is, the common matching segmentation code of those segmentation codes.
  • the apparatus for classifying a message provided in this embodiment determines the segmentation codes corresponding to the different rules, selects a seed segmentation code from them, and then classifies the segmentation codes according to the matching level between each other segmentation code and the seed segmentation code and according to the preset lowest matching level, so as to divide the original rule set, obtain at least two sub-rule sets and construct at least two decision trees; received messages can then be classified according to the constructed decision trees.
  • by using the apparatus provided in this embodiment, in particular by dividing the original rule set using segmentation codes, rule replication is reduced and bits are selected accurately when the rule set is cut, so that the depth, memory usage and construction time of the decision tree are greatly reduced and the rule set is divided faster; in addition, the rule search time during message classification is reduced and the classification efficiency is improved.
  • the method and apparatus for classifying messages provided in the embodiments of the present invention may also support incremental update. Incremental update here means that, after the original rule set has been divided into multiple sub-rule sets, when rules need to be added or deleted, the segmentation-code-based method does not need to re-divide the existing sub-rule sets; a newly added rule only needs to be assigned to a suitable sub-rule set, and an old rule only needs to be deleted from the sub-rule set that contains it.
  • when a rule is added, the segmentation code of the rule is calculated first, and the matching level between that segmentation code and the common matching segmentation code of each sub-rule set is calculated in the order in which the sub-rule sets were generated; when the matching level between the segmentation code of the new rule and the common matching segmentation code of a sub-rule set is greater than or equal to the lowest matching level of that sub-rule set, the new rule is added to that sub-rule set.
  • when a rule is deleted, the segmentation code of the rule is calculated first, the sub-rule set to which the rule belongs is determined from the common matching segmentation code and the lowest matching level of each sub-rule set, and the rule is deleted from that sub-rule set. If, after the deletion, the number of rules in a sub-rule set falls below a certain threshold, that sub-rule set is merged with other sub-rule sets; the threshold can be set according to actual needs.
  • dividing the rule set by the segmentation code method allows the sub-rule sets to be updated flexibly, where an update includes adding a new rule to a sub-rule set or deleting an old rule from it; compared with the prior art, the segmentation code method greatly reduces the update time and the memory occupied by update activity.
  • when there are many rules and they are difficult to cut, a combination of software and hardware can be adopted: the rules that are easy to cut are handled in software, and the rules that are not easy to cut are processed in TCAM (Ternary Content Addressable Memory). Because TCAM has low integration density and storage efficiency and high power consumption, only rules that are few in number and do not match the other rules should be placed in TCAM, so as to reduce TCAM usage.
  • dividing the original rule set based on segmentation codes separates out most of the rules as early as possible; and, since the segmentation code method ensures that the separated rules match one another, they are easily cut in software, while the few remaining rules that are not easily cut can be placed in the TCAM. In this way, the performance of the algorithm is improved and TCAM space is saved.
  • the present invention can be implemented by software plus the necessary general-purpose hardware, and of course also by hardware alone, although in many cases the former is the better implementation.
  • the part of the technical solution of the present invention that is essential, or that contributes to the prior art, may be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk, hard disk or optical disk of a computer, and including instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)

Description

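Method and apparatus for classifying packets
Technical field
The present invention relates to the field of communications technologies, and in particular to a method and apparatus for classifying packets.
Background
Flow classification, that is, classifying received packets, is one of the key functions of a router; it provides the technical basis for complex value-added router services such as network security, QoS (Quality of Service), load balancing, and traffic accounting.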
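The basic idea of decision-tree-based flow classification is to recursively split the rule set using some cutting strategy until the number of rules in every sub-rule set is smaller than a preset bucket size. The cutting builds a decision tree: the internal nodes store the method used to cut the rule set, and the leaf nodes store the sub-rule sets; in other words, the leaf nodes hold all possible matching rules.
When a received packet is classified, the relevant fields are first extracted from the packet header to form a key; the key is then used to traverse the decision tree that has been built and is compared against the rules in the leaf node that is reached, which finally yields the highest-priority rule that matches the packet. Decision-tree-based algorithms include HiCuts (cutting in one dimension), HyperCuts (cutting in multiple dimensions), and Modular (cutting on selected bits).
However, in all of these decision-tree-based flow classification methods, the presence of the wildcard '*' in rules makes rule replication hard to avoid, which in turn increases memory usage and lowers cutting efficiency.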
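To address these problems, the prior art improves decision-tree-based flow classification as follows: the original rule set is first divided into several mutually non-overlapping sub-rule sets, and a decision tree is then built for each resulting sub-rule set.
The division of the original rule set into sub-rule sets can be carried out as follows:
1) Classify the rules by prefix; for example, when standard IPv4 5-tuple rules are classified, the rules can be classified by the prefix of the source IP address and/or the destination IP address.
2) Classify the rules by range; for example, when standard IPv4 5-tuple rules are classified, the rules can be classified by the range of the source port and/or the destination port.
If the original rule set is divided on a single field only, the subclasses obtained in 1) and 2) are the required sub-rule sets. If the original rule set has multiple fields (the IPv4 5-tuple rules above may need to be divided on five fields), the subclasses obtained by the different classification methods can be combined by the cross-product method to obtain multiple mutually non-overlapping sub-rule sets. Suppose the original rule set is to be divided on one address field and one port field: the methods in 1) and 2) first divide it into s1 and s2 subclasses respectively, and the cross-product method then divides it into s1 * s2 sub-rule sets.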
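The improved decision-tree-based flow classification algorithm can divide the original rule set into "completely" non-overlapping sub-rule sets and reduces rule replication to some extent. However, in using the improved algorithm to classify packets, the inventors found that the prior art still has at least the following problem:
Whether a rule is replicated depends on whether the rule has the wildcard '*' in the bits used for cutting, not on whether the fields of the rules overlap. The above solution is therefore applicable only to flow classification algorithms that cut strictly by field.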
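Summary of the invention
Embodiments of the present invention provide a method and apparatus for classifying packets, to reduce rule replication during classification and improve classification efficiency.
To achieve this objective, the embodiments of the present invention adopt the following technical solutions:
A method for classifying packets, including:
receiving a packet;
searching at least one created decision tree for a rule that matches the packet, where the decision tree is a decision tree created after the original rule set is divided based on segmentation codes; and
classifying the packet according to the rule that is found.
An apparatus for classifying packets, including:
a receiving unit, configured to receive a packet;
a searching unit, configured to search at least one created decision tree for a rule that matches the packet, where the decision tree is a decision tree created after the original rule set is divided based on segmentation codes; and
a classifying unit, configured to classify the packet according to the rule that is found.
In the method and apparatus provided in the embodiments of the present invention, the decision tree used for rule lookup is created after the original rule set is divided based on segmentation codes. Dividing the rule set with segmentation codes not only reduces rule replication but also greatly reduces the depth, memory usage and build time of the decision tree; rule lookup using the solution provided in the embodiments can therefore greatly speed up lookup, classification and similar processing while keeping the lookup bandwidth unchanged. Compared with the prior art, the method and apparatus provided in the embodiments reduce rule replication during classification and improve classification efficiency.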
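Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of rule set cutting;
Fig. 2 is a flowchart of the method for classifying packets in Embodiment 1 of the present invention;
Fig. 3 is a flowchart of the method for classifying packets in Embodiment 2 of the present invention;
Fig. 4 is a flowchart of dividing a rule set in Embodiment 2 of the present invention;
Fig. 5 is a schematic structural diagram of the apparatus for classifying packets in Embodiment 4 of the present invention;
Fig. 6 is a schematic structural diagram of the apparatus for classifying packets in Embodiment 5 of the present invention;
Fig. 7 is a schematic structural diagram of the dividing unit in Embodiment 5 of the present invention.
Detailed description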
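In decision-tree-based flow classification, the number and position of wildcards in the rules (a wildcard being a binary bit whose value is '*') determine how easily rules are replicated while the decision tree is built. How the number of wildcards leads to rule replication has been described above; the effect of the position of '*' on rule replication is explained here. In the two examples shown in Table 1 and Table 2, the number of '*' is exactly the same; only their positions in the rules differ. For the four rules in Table 1, the '*' appear in exactly the same positions, so cutting the rule set in Table 1 only requires selecting the first two bits of the first dimension (Dim1) to cut the rule set into four sub-rule sets; each sub-rule set contains one rule and no rule is replicated, as shown in Fig. 1(a). For the rules in Table 2, the positions of '*' are interleaved; in this case, cutting on any bit causes rule replication. For example, selecting the first bit of the first dimension (Dim1) and the first bit of the second dimension (Dim2) cuts the rule set in Table 2 into four sub-rule sets, each of which contains two rules, as shown in Fig. 1(b). Thus, although the number of '*' is identical in the two examples of Table 1 and Table 2, the degree of rule replication differs greatly.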
Figure imgf000006_0001
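For a rule set, if cutting on some selection of bits causes no replication, the rules in the rule set are said to match; if cutting on any bit causes replication, the rules in the rule set are said not to match. In the example of Fig. 1, the four rules in Table 1 match and the four rules in Table 2 do not. Moreover, for matching rules, the more bits that can be selected for cutting without causing replication, the higher the degree of mutual matching and the less likely replication is while the decision tree is created.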
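The effect of wildcard position on replication can be illustrated with a minimal sketch. The two rule sets below are hypothetical (Tables 1 and 2 are not reproduced in this text); they only share the property that every rule has the same number of '*'. The helper names `cut` and `replication` are likewise assumptions made for illustration.

```python
from itertools import product

def cut(rules, bits):
    """Cut ternary rules on the given bit positions.

    A rule whose character at a chosen position is '*' matches both 0 and 1
    there, so it is copied into every child bucket it can reach.
    """
    buckets = {}
    for key in product("01", repeat=len(bits)):
        buckets["".join(key)] = [
            r for r in rules
            if all(r[b] in ("*", v) for b, v in zip(bits, key))
        ]
    return buckets

def replication(rules, bits):
    """Number of extra rule copies created by cutting on `bits`."""
    return sum(len(b) for b in cut(rules, bits).values()) - len(rules)

# Hypothetical rule sets; every rule contains exactly two '*'.
aligned = ["00**", "01**", "10**", "11**"]  # wildcards in the same positions
crossed = ["00**", "**00", "1**1", "*11*"]  # wildcards in interleaved positions

# Cutting the aligned set on its first two bits copies nothing.
no_copies = replication(aligned, [0, 1])
# In the crossed set every bit position has a '*' in some rule,
# so cutting on any single bit copies at least one rule.
copies_per_bit = [replication(crossed, [b]) for b in range(4)]
```

In the terminology of the text, the rules of `aligned` match, while the rules of `crossed` do not.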
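To better describe the matching between rules, the idea of rule segmentation is introduced below. First, each rule is regarded as a ternary bit string made up of '0', '1' and '*'. If the bit string of each rule is divided into N (N ≥ 2) segments and the number of '*' in each segment is counted, the number of '*' in a segment largely determines whether the rule is easily replicated when the bits of that segment are used for cutting. When the number of '*' in a segment of a rule exceeds a threshold t (for example, a segment contains 16 characters and the number of '*' exceeds 8), the bits in that segment are considered "bad" for the rule: when those bits are used for cutting, the rule is easily replicated, or has a strong tendency to be replicated. Otherwise the bits in the segment are considered "good" for the rule. Counting which segments of each rule are "good" and which are "bad" gives a rough picture of the number and position of '*' in each rule, and this serves as the basis for dividing the rule set.
Specifically, each rule is a ternary bit string made up of '0', '1' and '*', that is, each rule contains multiple characters '0', '1' and '*'. With at least two characters per segment, the bit string of each rule is divided into N segments; when the number of '*' in a segment exceeds a threshold t, the segment is called "bad" and is encoded as 0; otherwise the segment is called "good" and is encoded as 1. Each rule thus corresponds to an N-bit binary code, called its segmentation code.
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.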
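Embodiment 1:
As shown in Fig. 2, the method for classifying packets provided in this embodiment of the present invention includes:
201. Receive a packet.
A router may receive multiple packets from the network; a flow classifier then examines multiple fields in each packet in order to find a rule that matches the packet.
202. Search at least one created decision tree for a rule that matches the packet, where the decision tree is a decision tree created after the original rule set is divided based on segmentation codes.
A decision tree created after the original rule set is divided based on segmentation codes takes both the number and the position of the wildcard '*' into account when it is created. Compared with a decision tree created by existing methods, it suffers less rule replication and is shallower; using it therefore shortens the time needed to find the rule that matches the packet and improves classification efficiency.
203. Classify the packet according to the rule that is found.
In step 203, all packets that match the same rule can be regarded as belonging to one class, and packets of different classes may be processed differently; for example, the processing may be dropping, accepting, or counting.
In this embodiment, the steps above may be performed by a router or by a flow classification engine integrated in the router.
In the method for classifying packets provided in this embodiment of the present invention, the decision tree used for rule lookup is created after the original rule set is divided based on segmentation codes. Dividing the rule set with segmentation codes not only reduces rule replication but also greatly reduces the depth, memory usage and build time of the decision tree; rule lookup using the solution provided in this embodiment can therefore greatly speed up lookup, classification and similar processing while keeping the lookup bandwidth unchanged.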
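Embodiment 2:
The method for classifying packets provided in the embodiments of the present invention is described in detail below using a specific embodiment. Before the implementation of the method is detailed, the following concepts are introduced:
1) As described above, a segmentation code is an N-bit binary code determined by the number and position of '*' in a rule. Specifically, each rule contains multiple characters '0', '1' and '*'; with at least two characters per segment, the bit string of each rule is divided into N segments; when the number of '*' in a segment exceeds a threshold t, the segment is called "bad" and is encoded as 0; otherwise the segment is called "good" and is encoded as 1. Each rule thus corresponds to an N-bit binary code, called its segmentation code.
In real rules, such as standard IPv4 5-tuple rules, every field can be expressed as a prefix or a range. The source and destination IP addresses are naturally expressed as prefixes; the protocol number can be treated as a prefix with a mask length of 0 (the field is '*') or 8; port numbers are expressed as ranges.
For a prefix, the mask length is enough to judge whether the corresponding character segment is "good" or "bad". For example, for an IP address of 32 binary bits with 16 bits per segment, the source IP is divided into 2 segments and contributes 2 binary bits to the segmentation code; a segment is considered "bad" when it contains more than 8 '*'. Thus, when the mask length of the source IP of a rule satisfies maskLen < 8, the two bits of the segmentation code that correspond to the source IP are 00; when 8 ≤ maskLen < 24, they are 10; when maskLen ≥ 24, they are 11.
For a range, the length of the range can be used to judge whether a segment is "good" or "bad". The upper and lower bounds of a range (such as a port number) are generally 16-bit binary numbers. A range field can be treated as one segment: if the length of a range is greater than a threshold L, the segment is called "bad" and is encoded as 0; otherwise it is called "good" and is encoded as 1. L preferably corresponds to the threshold t on the number of '*' in a prefix, with t = log2(L). This helps when the lowest matching level of a sub-rule set is calculated later.
The segments need not all have the same length or the same threshold. For standard IPv4 5-tuple rules, for example, every 16 bits of the source IP and of the destination IP can form one segment, so that the source IP and the destination IP are each divided into two segments with a threshold of 8 on the number of '*'; the source port and the destination port are expressed as ranges and each form one segment with a range-length threshold of 256; the protocol number is expressed in 8 binary bits and forms one segment, with the threshold on the number of '*' being a positive integer smaller than 8 (corresponding to the two cases of a specific protocol number and '*'). The rule is thus divided into 7 segments and its segmentation code is a 7-bit binary number.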
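The segmentation code of concept 1) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the order of the seven segments, and the definition of range length as `hi - lo + 1` are assumptions; the thresholds (16-bit segments with t = 8, range-length threshold 256) are the ones given in the text.

```python
def segmentation_code(rule, seg_len, t):
    """Segmentation code of a ternary rule, as a string of '0'/'1'.

    rule    -- string over '0', '1', '*'
    seg_len -- characters per segment (equal-length segments assumed here)
    t       -- a segment with more than t wildcards is "bad" (coded 0)
    """
    if len(rule) % seg_len:
        raise ValueError("rule length must be a multiple of seg_len")
    segments = [rule[i:i + seg_len] for i in range(0, len(rule), seg_len)]
    return "".join("0" if s.count("*") > t else "1" for s in segments)

def prefix_bits(mask_len, width=32, seg_len=16, t=8):
    """Code bits of a prefix field: mask_len fixed bits, then wildcards.

    The fixed bits are written as '0'; only the wildcard positions matter.
    """
    return segmentation_code("0" * mask_len + "*" * (width - mask_len), seg_len, t)

def range_bit(lo, hi, max_len=256):
    """Code bit of a range field; range length taken as hi - lo + 1 (assumption)."""
    return "0" if hi - lo + 1 > max_len else "1"

def five_tuple_code(src_len, dst_len, sport, dport, proto_is_wildcard):
    """7-bit code; the segment order src, dst, sport, dport, proto is an assumption."""
    return (prefix_bits(src_len) + prefix_bits(dst_len)
            + range_bit(*sport) + range_bit(*dport)
            + ("0" if proto_is_wildcard else "1"))

# Mask length -> two code bits of a 32-bit prefix.
mapping = {m: prefix_bits(m) for m in (0, 7, 8, 23, 24, 32)}
# mapping == {0: '00', 7: '00', 8: '10', 23: '10', 24: '11', 32: '11'}

code = five_tuple_code(24, 16, (0, 65535), (80, 80), False)  # '1110011'
```

The mapping reproduces the prefix case described above: a mask shorter than 8 bits gives 00, a mask of 8 to 23 bits gives 10, and a mask of 24 bits or more gives 11.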
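2) If the result of a bitwise AND operation on two segmentation codes is 0, the two segmentation codes do not match; otherwise they match. Segmentation codes that match each other are called matching segmentation codes.
After a bitwise AND operation is performed on two matching segmentation codes, the number of 1s in the binary representation of the result is defined as the matching level of the two segmentation codes.
For example, take any two segmentation codes A and B: if (A & B) == 0, A and B do not match; otherwise A and B match, and the matching level is the number of 1s in the binary representation of (A & B). Segmentation codes that do not match are also said to have a matching level of 0. If segmentation codes A and B match, A is called a matching segmentation code of B, and B is likewise a matching segmentation code of A. The higher the matching level between segmentation codes, the easier it is to cut the rules associated with them when they are placed together; that is, the smaller the tendency toward replication when these rules are cut.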
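The matching level defined above is a population count of a bitwise AND. A minimal sketch, with segmentation codes held as Python integers (the function name is an assumption):

```python
def matching_level(a, b):
    """Number of 1 bits in a & b; 0 means the two codes do not match."""
    return bin(a & b).count("1")

level_seed = matching_level(0b11, 0b01)      # codes match, level 1
level_none = matching_level(0b01, 0b10)      # (A & B) == 0, codes do not match
level_7bit = matching_level(0b1101101, 0b1011001)  # AND is 0b1001001, level 3
```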
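3) When segmentation codes are used to divide a rule set, one segmentation code is selected first; the matching level between every segmentation code and the selected one is then calculated, and the segmentation codes are classified by matching level. The selected segmentation code is called the seed segmentation code.
After the seed segmentation code is selected, the matching level between each other segmentation code and the seed segmentation code can be calculated. Once the matching levels are known, a priority order can be assigned to the segmentation codes as follows:
(1) the higher the matching level, the higher the priority;
(2) for equal matching levels, the more rules a segmentation code corresponds to, the higher its priority.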
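4) The result of successively ANDing multiple segmentation codes in order is called the common matching segmentation code of those segmentation codes.
For example, given segmentation codes A, B, C and D, their common matching segmentation code is calculated by first ANDing A and B, then ANDing the result with C, and so on, which yields the common matching segmentation code of A, B, C and D.
Suppose the rules corresponding to N segmentation codes form a sub-rule set S; the common matching segmentation code of these N segmentation codes then reflects the degree of matching among the rules in S. The more 1s in the common matching segmentation code, the higher the degree of matching among the rules in S; when S is cut, more bits can be selected without easily causing rule replication.
The common matching segmentation code also shows which bits can be selected for cutting without easily causing rule replication when a decision tree is created for the sub-rule set. If a bit of the segmentation code is 1, the segment corresponding to that bit contains k characters, and the threshold on the number of '*' used to judge whether the segment is "good" or "bad" is t, then cutting the rule set on the first (k − t) bits of that segment is unlikely to cause rule replication. In the Modular algorithm, the first (k − t) bits of the segments that correspond to the 1 bits of the common matching segmentation code can be used to build a more effective jump table; these bits, which are unlikely to cause replication, can also be preferred during bit selection. In the HiCuts and HyperCuts algorithms, the common matching segmentation code can be used to preferentially cut on bits in the dimensions (corresponding to certain segments) that are unlikely to cause rule replication, which makes cutting more efficient.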
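Concept 4) can be sketched in a few lines. The helper names and the convention that segment 0 corresponds to the most significant bit of the code are assumptions made for illustration; equal-length segments are also assumed. The sample values (codes 11 and 01, k = 8, t = 4) are the ones that appear in Embodiment 3.

```python
from functools import reduce

def common_matching_code(codes):
    """Successive bitwise AND of a non-empty sequence of segmentation codes."""
    return reduce(lambda x, y: x & y, codes)

def safe_cut_bits(common, n_segments, k, t):
    """Bit positions (0 = first character of the rule) unlikely to cause copies.

    For every segment whose bit in `common` is 1, the first k - t characters
    of that segment are returned. Segment 0 is taken to be the most
    significant bit of the code (an assumption of this sketch).
    """
    bits = []
    for seg in range(n_segments):
        if (common >> (n_segments - 1 - seg)) & 1:
            bits.extend(range(seg * k, seg * k + (k - t)))
    return bits

common = common_matching_code([0b11, 0b01])    # 0b01
bits = safe_cut_bits(common, n_segments=2, k=8, t=4)  # [8, 9, 10, 11]
```

With two 8-character segments and t = 4, a common matching segmentation code of 01 points at the first 4 bits of the second segment, which agrees with the conclusion drawn at the end of Embodiment 3.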
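Based on the concepts above, the method for classifying packets provided in this embodiment is described in detail below.
In this embodiment, the method for classifying packets, as shown in Fig. 3, includes the following steps:
301. Divide the original rule set into at least two sub-rule sets according to segmentation codes.
To reduce replication while also reducing the depth of the decision tree and shortening the build time, the original rule set needs to be divided into multiple sub-rule sets; a decision tree is then built for each of the sub-rule sets.
Specifically, the process of dividing the original rule set into at least two sub-rule sets according to segmentation codes, as shown in Fig. 4, can be implemented by the following steps: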
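S11. Segment each rule in the original rule set with at least two characters per segment, and calculate the segmentation code corresponding to each rule.
If a rule in the original rule set is divided into N (N ≥ 2) segments, the segmentation code corresponding to that rule is an N-bit binary code.
S12. After the segmentation code of each rule is obtained, count the number of rules corresponding to each distinct segmentation code, and sort the segmentation codes in descending order of that number.
In this step the segmentation codes may of course also be sorted in ascending order of the number of corresponding rules; the specific ordering can be decided by the needs of the actual implementation.
S13. From the segmentation codes sorted by number of rules, select the segmentation code with the largest number of corresponding rules as the seed segmentation code, and perform a pairwise bitwise AND operation between it and each of the other segmentation codes sorted by number of rules; count the 1s in each bitwise AND result, and sort the segmentation codes into classes according to that count.
In step S13, the segmentation code with the largest number of corresponding rules is selected as the seed segmentation code so that most of the rules are separated out as early as possible. Such a seed segmentation code not only corresponds to many rules itself but, in general, also has many segmentation codes that match it.
As described above, the number of 1s in the binary result of a bitwise AND of two segmentation codes is the matching level between the two segmentation codes. In step S13, therefore, all segmentation codes obtained are sorted by their matching level with the seed segmentation code. The matching level between segmentation codes reflects the tendency toward replication during rule cutting: the higher the matching level, the smaller that tendency, and vice versa.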
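S14. Perform successive bitwise AND operations, in order, on the segmentation codes that have been sorted by matching level and whose matching level is not 0; when the number of 1s in the result of the successive bitwise AND is less than the lowest matching level, classify the last segmentation code that took part in the successive bitwise AND into the category with matching level 0.
Here, performing successive bitwise AND operations in order on the segmentation codes whose matching level is not 0 means ANDing, in order, the AND result of all preceding segmentation codes with the next segmentation code, until the AND of all the segmentation codes is obtained.
During these successive bitwise AND operations, every segmentation code that would make the number of 1s in the AND result smaller than the lowest matching level is classified into the category whose matching level with the seed segmentation code is 0; the final result of the successive bitwise AND over the remaining segmentation codes is therefore the common matching segmentation code of all segmentation codes in the category whose matching level with the seed segmentation code is greater than 0.
In addition, the calculated common matching segmentation code of the segmentation codes with matching level greater than 0 should be saved.
In step S14, the lowest matching level can be calculated by the following formula:
Figure imgf000012_0001
Here the symbol "⌈ ⌉" denotes rounding up; k is the number of characters corresponding to each binary bit of the segmentation code; t is the maximum number of wildcards that a character segment may contain and still be usable for rule cutting after segmentation, that is, the threshold on the number of '*' in a segment; numRules is the number of rules in the original rule set before division; bucketSize is the maximum number of rules saved in a leaf node of the decision tree; f is the average utilization efficiency of each leaf node in the decision tree, a value that is hard to determine precisely before the decision tree is created and can therefore be set from experience.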
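The formula itself is available only as an image (imgf000012_0001) and is not reproduced in this text. The sketch below is therefore an assumption inferred from the variable definitions, not the patent's formula: roughly log2(numRules / (bucketSize · f)) bits are needed to spread the rules over the leaf nodes, and each "good" segment contributes k − t bits that can be cut without replication. The lower bound of 1 is an additional guard that does not come from the source.

```python
import math

def lowest_matching_level(num_rules, bucket_size, f, k, t):
    """Hypothetical reconstruction of the lowest matching level.

    ASSUMPTION: the form ceil(log2(numRules / (bucketSize * f)) / (k - t))
    is inferred from the surrounding definitions; the source formula is an
    image and could differ.
    """
    bits_needed = math.log2(num_rules / (bucket_size * f))
    # Guard (not from the source): never return less than 1.
    return max(1, math.ceil(bits_needed / (k - t)))

small = lowest_matching_level(num_rules=10, bucket_size=4, f=0.5, k=8, t=4)
large = lowest_matching_level(num_rules=10000, bucket_size=16, f=0.5, k=16, t=8)
```

Under this reading a small rule set yields a low level and a large rule set a higher one, which is the behaviour the following paragraph describes.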
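The lowest matching level of a rule set reflects how many bits in the rule set can be used for cutting without easily causing replication. When there are few rules, only a few bits are needed to separate them, and the lowest matching level can be set lower; when there are many rules, many bits are needed to separate them, and the lowest matching level needs to be set higher. For example, suppose a segment is coded '0' when it contains more than 8 '*'. If the lowest matching level among the rules of a rule set is 1, little replication occurs when the rules are cut with no more than 8 bits, but considerable replication may still occur when more bits are needed for cutting. This can be resolved by raising the lowest matching level of the rule set.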
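S15. Classify the rules corresponding to the segmentation codes with matching level greater than 0 into a first sub-rule set.
At the same time, the common matching segmentation code of all segmentation codes with matching level greater than 0 obtained in step S14, together with the lowest matching level, may be saved as attributes of the first sub-rule set.
S16. When the number of rules corresponding to the segmentation codes with matching level 0 is less than or equal to a first threshold, classify those rules into a second sub-rule set and stop dividing the rule set; when that number is greater than the first threshold, re-select, from the segmentation codes with matching level 0, the segmentation code with the largest number of corresponding rules and return to step S13 to continue dividing all segmentation codes with matching level 0.
The first threshold may be, but is not limited to, a multiple of the number of rules that each leaf node can hold, for example n*bucketSize with n > 2; when the number of rules corresponding to the segmentation codes with matching level 0 is less than or equal to n*bucketSize, the remaining rules can be considered few and the rule set need not be divided further.
If two thresholds are involved in deciding whether rule division can end, the sub-rule set into which the remaining segmentation codes with matching level 0 should be placed can be determined as follows.
When the number of rules corresponding to the segmentation codes with matching level 0 is less than or equal to the first threshold and greater than a second threshold (the second threshold is smaller than the first threshold and may be, but is not limited to, the number of rules that each leaf node can hold), the rules corresponding to those segmentation codes can be classified into the second sub-rule set, so that different decision trees are built for different sub-rule sets and the depth of each decision tree is reduced.
When the number of rules corresponding to the segmentation codes with matching level 0 is less than or equal to the second threshold, the number of remaining rules is very small, so they can be classified into the first sub-rule set. This division is suitable when the original rule set itself is fairly small: the decision tree built from the first sub-rule set will not be very deep and has little effect on the lookup rate during classification.
With this method, rule sets whose rules match well need not be divided or are divided into only a few sub-rule sets, while rule sets whose rules match poorly can be divided into more sub-rule sets. Rule sets are thus divided on demand.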
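In addition, the number of sub-rule sets to be produced can be set in advance; for example, the original rule set can be set to be divided into 2 sub-rule sets. In that case, in step S16, even when the number of rules corresponding to the segmentation codes with matching level 0 is greater than the first threshold, as long as
number of sub-rule sets already produced = preset number of sub-rule sets − 1
the division of the rule set can end, and the rules with matching level 0 directly form a new sub-rule set.
After steps S11 to S16 are completed, the original rule set has essentially been divided into at least two sub-rule sets.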
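Steps S12 to S16 can be put together in a short sketch. It is an illustration under stated assumptions rather than the patent's implementation: the function names, the data layout, and the guard for the case in which no code survives categorization are not from the source, and the variant with a second threshold is omitted. The rule counts in the example are hypothetical; they were chosen only to be consistent with the codes 11, 01 and 10 discussed in Embodiment 3.

```python
from collections import Counter

def popcount(x):
    return bin(x).count("1")

def partition(rule_codes, min_level, first_threshold, max_sets=None):
    """Divide rules into sub-rule sets by segmentation code (S12 to S16).

    rule_codes -- list of (rule_id, segmentation code as int)
    Returns a list of dicts {"rules": [...], "common": int or None}.
    """
    remaining = list(rule_codes)
    result = []
    while remaining:
        counts = Counter(code for _, code in remaining)        # S12
        seed = max(counts, key=counts.get)                     # S13: seed code
        level = {c: popcount(c & seed) for c in counts}
        # Higher matching level first; ties broken by larger rule count.
        ordered = sorted((c for c in counts if level[c] > 0),
                         key=lambda c: (-level[c], -counts[c]))
        common, kept = None, set()
        for c in ordered:                                      # S14
            trial = c if common is None else common & c
            if popcount(trial) < min_level:
                continue      # this code joins the matching-level-0 category
            common = trial
            kept.add(c)
        first = [(r, c) for r, c in remaining if c in kept]    # S15
        rest = [(r, c) for r, c in remaining if c not in kept]
        if not first:   # guard (not in the source): nothing could be grouped
            result.append({"rules": [r for r, _ in rest], "common": None})
            break
        result.append({"rules": [r for r, _ in first], "common": common})
        stop = len(rest) <= first_threshold or (               # S16
            max_sets is not None and len(result) == max_sets - 1)
        if rest and stop:
            result.append({"rules": [r for r, _ in rest], "common": None})
            break
        remaining = rest
    return result

# Hypothetical rule counts: four rules with code 11, three with 01,
# two with 10 and one with 00 (ten rules in total).
codes = [0b11] * 4 + [0b01] * 3 + [0b10] * 2 + [0b00]
rules = [(f"r{i}", c) for i, c in enumerate(codes)]
sets = partition(rules, min_level=1, first_threshold=4, max_sets=2)
```

With these inputs the seed is 11, code 01 is kept (11 & 01 = 01), and code 10 is moved to the matching-level-0 category because 01 & 10 contains no 1. The first sub-rule set holds 7 rules with common matching segmentation code 01, and the remaining 3 rules form the second sub-rule set.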
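In the above process of dividing the original rule set into at least two sub-rule sets according to segmentation codes, steps S13 and S14 may also be merged: the number of 1s in the bitwise AND result obtained in step S13 is compared with the lowest matching level, and when that number is smaller than the lowest matching level, the segmentation code currently being ANDed with the seed segmentation code is classified into the category with matching level 0. This amounts to dividing the rule set using only the matching level between each other segmentation code and the seed segmentation code as the criterion.
302. Build a decision tree for each of the at least two sub-rule sets obtained in step 301, and save the decision trees.
Steps 301 and 302 need not be performed every time the router classifies packets; as long as the decision trees created after the original rule set is divided based on segmentation codes are already stored in the router, the two steps can be skipped in subsequent classification.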
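303. Receive a packet.
A router may receive multiple packets from the network; a flow classifier then examines multiple fields in each packet in order to find a rule that matches the packet.
304. Search at least one created decision tree for a rule that matches the packet, where the decision tree is the decision tree created in step 302 after the original rule set is divided based on segmentation codes.
305. Classify the packet according to the rule that is found.
In step 305, all packets that match the same rule can be regarded as belonging to one class, and packets of different classes are processed differently; for example, the processing may be dropping, accepting, or counting.
The method for classifying packets provided in this embodiment of the present invention determines the segmentation codes corresponding to the different rules, selects a seed segmentation code from them, and then classifies the segmentation codes according to the matching level between each other segmentation code and the seed segmentation code and according to the preset lowest matching level, so as to divide the original rule set, obtain at least two sub-rule sets and build at least two decision trees; received packets can then be classified according to the decision trees that have been built. With the method provided in this embodiment, in particular the division of the original rule set using segmentation codes, rule replication is reduced and bits are selected accurately when the rule set is cut, so that the depth, memory usage and build time of the decision tree are greatly reduced and the rule set is divided faster; in addition, the rule lookup time during packet classification is reduced and classification efficiency is improved.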
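Embodiment 3:
To make the division of the original rule set described in Embodiment 2 easier to understand, this embodiment gives a concrete example.
As shown in Table 3, the original rule set contains 10 two-dimensional rules, each dimension being 8 bits wide. The rules are expressed as ternary bit strings made up of '0', '1' and '*'. Every 8 bits form one segment; when the number of '*' in a segment is greater than 4, the segment is encoded as '0', otherwise it is encoded as '1'.
Table 3
Figure imgf000016_0001
In this embodiment, the end conditions for dividing the original rule set are set as follows:
i) the original rule set is divided into at most 2 sub-rule sets;
ii) when the number of rules with matching level 0 in the sub-rule set is less than or equal to 2, no further division is needed;
iii) when the number of rules with matching level 0 is greater than 2 and less than or equal to 4, the rules with matching level 0 in the sub-rule set can directly form a new sub-rule set.
Because the number of rules shown in Table 3 is small, the lowest matching level can be set to 1.
With the end conditions set, the rule set is divided as follows:
S21. Calculate the segmentation code of each rule shown in Table 3; the results are shown in the last column of Table 3.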
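S22. Sort the segmentation codes in descending order of the number of rules (as shown in Table 4).
Table 4
Figure imgf000017_0001
S23. Because the segmentation code with the most rules is '11', use segmentation code '11' as the seed segmentation code, find all segmentation codes that match segmentation code '11', and classify the segmentation codes according to the matching level between each other segmentation code and the seed segmentation code '11' (as shown in Table 5).
Table 5
Figure imgf000017_0002
S24. Perform successive bitwise AND operations, in order, on the segmentation codes in Table 5 whose matching level is not 0; when the number of 1s in the result of the successive bitwise AND is less than the lowest matching level, move the last segmentation code that took part in the successive bitwise AND to the category with matching level 0. As shown in Table 6, segmentation codes 01 and 10 do not match, and segmentation code 01 corresponds to more rules, so segmentation code 01 is ANDed with segmentation code 11 first, giving 01; the result 01 is then ANDed with segmentation code 10, and the number of 1s in the result of this second AND is smaller than the lowest matching level of 1; segmentation code 10 is therefore moved to the category with matching level 0.
Table 6
Figure imgf000017_0003
S25. Classify the rules corresponding to the segmentation codes with matching level greater than 0 (11 and 01) into one sub-rule set, and record their common matching segmentation code, 01.
S26. As shown in Table 6, the number of rules with matching level 0 is 3, which satisfies condition iii): when the number of rules with matching level 0 is greater than 2 and less than or equal to 4, the rules with matching level 0 can directly form a new sub-rule set. The rules with matching level 0 are therefore classified into a new sub-rule set.
Dividing the original rule set by the steps above splits it into two sub-rule sets; and from the common matching segmentation code (01) it follows that cutting the first sub-rule set on the first 4 bits of the second dimension causes no rule replication.
The solution provided in this embodiment of the present invention divides the rule set according to segmentation codes and then creates decision trees. It takes into account the effect of both the number and the position of the wildcard '*' in each rule on rule division, so that suitable bits can be selected for cutting the rules when the decision tree is built; this effectively reduces rule replication, shortens the build time, and improves memory utilization.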
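Embodiment 4:
Corresponding to the method for classifying packets in Embodiment 1, this embodiment of the present invention provides an apparatus for classifying packets. As shown in Fig. 5, the apparatus includes:
a receiving unit 51, configured to receive packets, which may be multiple packets from the network;
a searching unit 52, configured to search at least one created decision tree for a rule that matches the packet, where the decision tree is a decision tree created after the original rule set is divided based on segmentation codes;
a classifying unit 53, configured to classify the packet according to the rule that is found; the processing referred to here may be an operation to be performed on packets of different classes, such as dropping, accepting, or counting.
The apparatus for classifying packets in this embodiment may be a router or a flow classification engine integrated in a router.
In the apparatus for classifying packets provided in this embodiment of the present invention, the decision tree used for rule lookup is created after the original rule set is divided based on segmentation codes. Dividing the rule set with segmentation codes not only reduces rule replication but also greatly reduces the depth, memory usage and build time of the decision tree; rule lookup using the solution provided in this embodiment can therefore greatly speed up lookup, classification and similar processing while keeping the lookup bandwidth unchanged.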
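Embodiment 5:
The apparatus for classifying packets provided in the embodiments of the present invention is described in detail below using a specific embodiment.
In this embodiment, the apparatus for classifying packets, as shown in Fig. 6, includes a receiving unit 61, a searching unit 62 and a classifying unit 63, as well as a dividing unit 64 and a tree-building unit 65, where:
the dividing unit 64 is configured to divide the original rule set into at least two sub-rule sets according to segmentation codes; a segmentation code is an N-bit binary code that represents a rule and is determined, after the rule is divided into N (N ≥ 2) segments, by the number of wildcards '*' in each segment; for the specific way of determining it, see the description in Embodiment 2, which is not repeated here;
the tree-building unit 65 is configured to create a decision tree for each of the at least two sub-rule sets obtained by the dividing unit 64, so that the searching unit 62 can perform rule lookup during packet classification.
In addition, the receiving unit 61 is configured to receive packets, which may be multiple packets from the network received by a router;
the searching unit 62 is configured to search at least one created decision tree for a rule that matches the packet, where the decision tree is a decision tree created after the original rule set is divided based on segmentation codes;
the classifying unit 63 is configured to classify the packet according to the rule that is found; the processing referred to here may be an operation to be performed on packets of different classes, such as dropping, accepting, or counting.
In this embodiment, as shown in Fig. 7, the dividing unit 64 can be implemented as follows:
Specifically, the dividing unit 64 includes a segmentation module 641, a sorting module 642, a classification module 643, a categorization module 644, a first diversity module 645, a second diversity module 646, a third diversity module 647, a calculation module 648 and a saving module 649, where:
the segmentation module 641 is configured to segment each rule in the original rule set with at least two characters per segment and to calculate the segmentation code corresponding to each rule;
the sorting module 642 is configured to count the number of rules corresponding to each distinct segmentation code and to sort the segmentation codes in descending order of that number;
the classification module 643 is configured to select the segmentation code with the largest number of corresponding rules, perform a pairwise bitwise AND operation between it and each of the other segmentation codes sorted by number of rules, take the number of 1s in each bitwise AND result as the matching level, and sort the segmentation codes in descending order of matching level;
the categorization module 644 is configured to perform successive bitwise AND operations, in order, on the segmentation codes that have been sorted by matching level and whose matching level is not 0, and, when the number of 1s in the result of the successive bitwise AND is less than the lowest matching level, to classify the last segmentation code that took part in the successive bitwise AND operation into the category with matching level 0;
the first diversity module 645 is configured to classify the rules corresponding to the segmentation codes with matching level greater than 0 into the first sub-rule set;
the second diversity module 646 is configured to classify the rules corresponding to the segmentation codes with matching level 0 into the second sub-rule set when the number of those rules is less than or equal to the first threshold; if the number of rules corresponding to the segmentation codes with matching level 0 is greater than the first threshold, the dividing unit 64, in particular its classification module 643, categorization module 644, first diversity module 645 and second diversity module 646, re-selects the segmentation code with the largest number of corresponding rules from the segmentation codes with matching level 0, so as to continue dividing the segmentation codes with matching level 0.
If the division into sub-rule sets needs to be constrained more finely, a second threshold smaller than the first threshold can be introduced; in that case, the second diversity module 646 is specifically configured to classify the rules corresponding to the segmentation codes with matching level 0 into the second sub-rule set when the number of those rules is less than or equal to the first threshold and greater than the second threshold;
in that case, the dividing unit 64 further includes a third diversity module 647, configured to classify the rules corresponding to the segmentation codes with matching level 0 into the first sub-rule set when the number of those rules is less than or equal to the second threshold.
Further, the lowest matching level used by the categorization module 644 when categorizing the segmentation codes can be calculated by the calculation module 648; specifically, the calculation module 648 is configured to calculate the lowest matching level according to the following formula:
Figure imgf000021_0001
Here the left-hand side is the lowest matching level; the symbol "⌈ ⌉" denotes rounding up; k is the number of characters corresponding to each binary bit of the segmentation code; t is the maximum number of wildcards that a character segment may contain and still be usable for rule cutting after segmentation; numRules is the number of rules in the original rule set before division; bucketSize is the maximum number of rules saved in a leaf node of the decision tree; f is the average utilization efficiency of each leaf node in the decision tree.
In addition, the dividing unit 64 further includes a saving module 649, configured to calculate and save the result of the successive bitwise AND operations performed in order on the segmentation codes whose matching level is greater than 0 after categorization by the categorization module 644, that is, the common matching segmentation code of those segmentation codes.
For the process of classifying packets with the apparatus in this embodiment, see the description in Embodiment 2, which is not repeated here.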
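The apparatus for classifying packets provided in this embodiment of the present invention determines the segmentation codes corresponding to the different rules, selects a seed segmentation code from them, and then classifies the segmentation codes according to the matching level between each other segmentation code and the seed segmentation code and according to the preset lowest matching level, so as to divide the original rule set, obtain at least two sub-rule sets and build at least two decision trees; received packets can then be classified according to the decision trees that have been built. With the apparatus provided in this embodiment, in particular the division of the original rule set using segmentation codes, rule replication is reduced and bits are selected accurately when the rule set is cut, so that the depth, memory usage and build time of the decision tree are greatly reduced and the rule set is divided faster; in addition, the rule lookup time during packet classification is reduced and classification efficiency is improved.
The method and apparatus for classifying packets provided in the embodiments of the present invention can also support incremental update. Incremental update here means that, after the original rule set has been divided into multiple sub-rule sets, when rules need to be added or deleted, the segmentation-code-based method does not need to re-divide the existing sub-rule sets; a newly added rule only needs to be assigned to a suitable sub-rule set, and an old rule only needs to be deleted from the sub-rule set that contains it.
When a rule is added, the segmentation code of the rule is calculated first, and the matching level between that segmentation code and the common matching segmentation code of each sub-rule set is calculated in the order in which the sub-rule sets were generated. When the matching level between the segmentation code of the new rule and the common matching segmentation code of a sub-rule set is greater than or equal to the lowest matching level of that sub-rule set, the new rule is added to that sub-rule set.
When a rule is deleted, the segmentation code of the rule is calculated first, the sub-rule set to which the rule belongs is determined from the common matching segmentation code and the lowest matching level of each sub-rule set, and the rule is deleted from that sub-rule set. If, after rules are deleted from a sub-rule set, the number of rules it contains falls below a certain threshold, the sub-rule set is merged with other sub-rule sets; the threshold can be set according to actual needs.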
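The incremental update described above can be sketched as follows. The data layout, the function names, and the fallback to the last (leftover) sub-rule set when no sub-rule set qualifies are assumptions made for illustration; the text does not say how such a rule is handled, whether the stored common matching segmentation code is refreshed after an insertion, or which sub-rule sets are merged.

```python
def popcount(x):
    return bin(x).count("1")

def find_set(sub_sets, code):
    """First sub-rule set, in generation order, that accepts `code`.

    sub_sets -- list of dicts with keys "common" (int or None),
                "min_level" (int) and "rules" (set of rule ids)
    """
    for s in sub_sets:
        if s["common"] is not None and popcount(code & s["common"]) >= s["min_level"]:
            return s
    return sub_sets[-1]  # ASSUMPTION: otherwise use the leftover set

def add_rule(sub_sets, rule_id, code):
    # The stored common code is left unchanged here (not specified in the text).
    find_set(sub_sets, code)["rules"].add(rule_id)

def delete_rule(sub_sets, rule_id, code, merge_threshold):
    """Delete a rule; return True if its set has become small enough to merge."""
    s = find_set(sub_sets, code)
    s["rules"].discard(rule_id)
    return len(s["rules"]) < merge_threshold

sub_sets = [
    {"common": 0b01, "min_level": 1, "rules": {"r0", "r1", "r2"}},
    {"common": None, "min_level": 0, "rules": {"r7", "r8"}},
]
add_rule(sub_sets, "r10", 0b11)   # 11 & 01 has one 1, so r10 joins the first set
add_rule(sub_sets, "r11", 0b10)   # 10 & 01 == 0, so r11 joins the leftover set
needs_merge = delete_rule(sub_sets, "r7", 0b10, merge_threshold=2)
```

Neither operation touches the rules already assigned to the other sub-rule sets, which is what makes the update incremental.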
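Dividing the rule set by the segmentation code method allows the sub-rule sets to be updated flexibly, where an update includes adding a new rule to a sub-rule set or deleting an old rule from it. Compared with the prior art, the segmentation code method greatly reduces the update time and the memory occupied by update activity.
When there are many rules and they are difficult to cut, a combined software and hardware solution can be adopted: the rules that are easy to cut are handled in software, and the rules that are not easy to cut are processed in TCAM (Ternary Content Addressable Memory). However, because TCAM has low integration density and storage efficiency and high power consumption, only rules that are few in number and do not match the other rules should be placed in TCAM, so as to reduce TCAM usage.
In the solution provided in the embodiments of the present invention, dividing the original rule set based on segmentation codes separates out most of the rules as early as possible; and, since the segmentation code method ensures that the separated rules match one another, they are easily cut in software, while the few remaining rules that are not easily cut can be placed in TCAM. In this way the performance of the algorithm is improved and TCAM space is saved.
From the description of the implementations above, a person skilled in the art can clearly understand that the present invention can be implemented by software plus the necessary general-purpose hardware, and of course also by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the part of the technical solution of the present invention that is essential, or that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, hard disk or optical disk of a computer, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The foregoing describes only specific implementations of the present invention, but the protection scope of the present invention is not limited to them. Any variation or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall therefore be subject to the protection scope of the claims.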

Claims

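1. A method for classifying packets, characterized by including:
receiving a packet;
searching at least one created decision tree for a rule that matches the packet, where the decision tree is a decision tree created after the original rule set is divided based on segmentation codes; and
classifying the packet according to the rule that is found.
2. The method for classifying packets according to claim 1, characterized in that, before the searching of at least one created decision tree for a rule that matches the packet, the method further includes:
dividing the original rule set into at least two sub-rule sets according to segmentation codes; and
creating a decision tree for each of the sub-rule sets.
3. The method for classifying packets according to claim 2, characterized in that the dividing of the original rule set into at least two sub-rule sets according to segmentation codes includes:
segmenting each rule in the original rule set with at least two characters per segment, and calculating the segmentation code corresponding to each rule;
counting the number of rules corresponding to each distinct segmentation code, and sorting the segmentation codes in descending order of that number;
selecting the segmentation code with the largest number of corresponding rules, performing, in sequence, a pairwise bitwise AND operation between it and each of the other segmentation codes sorted by number of rules, and sorting the segmentation codes into classes using the number of 1s in each bitwise AND result as the matching level;
performing successive bitwise AND operations, in order, on the segmentation codes that have been sorted by matching level and whose matching level is not 0, and, when the number of 1s in the result of the successive bitwise AND is less than the lowest matching level, classifying the last segmentation code that took part in the successive bitwise AND operation into the category with matching level 0;
classifying the rules corresponding to the segmentation codes with matching level greater than 0 into a first sub-rule set; and
when the number of rules corresponding to the segmentation codes with matching level 0 is less than or equal to a first threshold, classifying those rules into a second sub-rule set; when that number is greater than the first threshold, re-selecting, from the segmentation codes with matching level 0, the segmentation code with the largest number of corresponding rules, and repeating the above steps to continue dividing the segmentation codes with matching level 0.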
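4. The method for classifying packets according to claim 3, characterized in that, after the successive bitwise AND operations are performed in order on the segmentation codes that have been sorted by matching level and whose matching level is not 0, the method further includes:
calculating and saving the result of the successive bitwise AND operations performed in order on the segmentation codes with matching level greater than 0.
5. The method for classifying packets according to claim 3, characterized in that
the classifying of the rules corresponding to the segmentation codes with matching level 0 into the second sub-rule set when the number of those rules is less than or equal to the first threshold is: classifying the rules corresponding to the segmentation codes with matching level 0 into the second sub-rule set when the number of those rules is less than or equal to the first threshold and greater than a second threshold, where the first threshold is greater than the second threshold; and
the dividing of the original rule set into at least two sub-rule sets according to segmentation codes further includes:
when the number of rules corresponding to the segmentation codes with matching level 0 is less than or equal to the second threshold, classifying those rules into the first sub-rule set.
6. The method for classifying packets according to claim 3, characterized in that the lowest matching level can be calculated by the following formula:
Figure imgf000025_0001
where the left-hand side is the lowest matching level; k is the number of characters corresponding to each binary bit of the segmentation code; t is the maximum number of wildcards that a character segment may contain and still be usable for rule cutting after segmentation; numRules is the number of rules in the original rule set before division; bucketSize is the maximum number of rules saved in a leaf node of the decision tree; f is the average utilization efficiency of each leaf node in the decision tree.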
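7. An apparatus for classifying packets, characterized by including:
a receiving unit, configured to receive a packet;
a searching unit, configured to search at least one created decision tree for a rule that matches the packet, where the decision tree is a decision tree created after the original rule set is divided based on segmentation codes; and
a classifying unit, configured to classify the packet according to the rule that is found.
8. The apparatus for classifying packets according to claim 7, characterized by further including:
a dividing unit, configured to divide the original rule set into at least two sub-rule sets according to segmentation codes; and
a tree-building unit, configured to create a decision tree for each of the sub-rule sets, so that the searching unit can perform rule lookup.
9. The apparatus for classifying packets according to claim 8, characterized in that the dividing unit includes:
a segmentation module, configured to segment each rule in the original rule set with at least two characters per segment and to calculate the segmentation code corresponding to each rule;
a sorting module, configured to count the number of rules corresponding to each distinct segmentation code and to sort the segmentation codes in descending order of that number;
a classification module, configured to select the segmentation code with the largest number of corresponding rules, perform, in sequence, a pairwise bitwise AND operation between it and each of the other segmentation codes sorted by number of rules, and sort the segmentation codes into classes using the number of 1s in each bitwise AND result as the matching level;
a categorization module, configured to perform successive bitwise AND operations, in order, on the segmentation codes that have been sorted by matching level and whose matching level is not 0, and, when the number of 1s in the result of the successive bitwise AND is less than the lowest matching level, to classify the last segmentation code that took part in the successive bitwise AND operation into the category with matching level 0;
a first diversity module, configured to classify the rules corresponding to the segmentation codes with matching level greater than 0 into a first sub-rule set; and
a second diversity module, configured to classify the rules corresponding to the segmentation codes with matching level 0 into a second sub-rule set when the number of those rules is less than or equal to a first threshold.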
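10. The apparatus for classifying packets according to claim 9, characterized in that
the dividing unit is further configured to, when the number of rules corresponding to the segmentation codes with matching level 0 is greater than the first threshold, re-select, from the segmentation codes with matching level 0, the segmentation code with the largest number of corresponding rules, so as to continue dividing the segmentation codes with matching level 0.
11. The apparatus for classifying packets according to claim 9, characterized by further including:
a saving module, configured to calculate and save the result of the successive bitwise AND operations performed in order on the segmentation codes whose matching level is greater than 0 after categorization by the categorization module.
12. The apparatus for classifying packets according to claim 9, characterized in that
the second diversity module is specifically configured to classify the rules corresponding to the segmentation codes with matching level 0 into the second sub-rule set when the number of those rules is less than or equal to the first threshold and greater than a second threshold, where the first threshold is greater than the second threshold; and
the dividing unit further includes:
a third diversity module, configured to classify the rules corresponding to the segmentation codes with matching level 0 into the first sub-rule set when the number of those rules is less than or equal to the second threshold.
13. The apparatus for classifying packets according to claim 9, characterized by further including:
a calculation module, configured to calculate the lowest matching level according to the following formula:
Figure imgf000027_0001
where the left-hand side is the lowest matching level; k is the number of characters corresponding to each binary bit of the segmentation code; t is the maximum number of wildcards that a character segment may contain and still be usable for rule cutting after segmentation; numRules is the number of rules in the original rule set before division; bucketSize is the maximum number of rules saved in a leaf node of the decision tree; f is the average utilization efficiency of each leaf node in the decision tree.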
PCT/CN2010/074575 2010-06-28 2010-06-28 Method and apparatus for classifying packets WO2011085577A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
PCT/CN2010/074575 WO2011085577A1 (zh) 2010-06-28 2010-06-28 Method and apparatus for classifying packets
CN201080002602.0A CN102308533B (zh) 2010-06-28 2010-06-28 Method and apparatus for classifying packets
EP10842861.6A EP2582096B1 (en) 2010-06-28 2010-06-28 Classification method and device for packets
US13/724,797 US8732110B2 (en) 2010-06-28 2012-12-21 Method and device for classifying a packet

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2010/074575 WO2011085577A1 (zh) 2010-06-28 2010-06-28 Method and apparatus for classifying packets

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/724,797 Continuation US8732110B2 (en) 2010-06-28 2012-12-21 Method and device for classifying a packet

Publications (1)

Publication Number Publication Date
WO2011085577A1 true WO2011085577A1 (zh) 2011-07-21

Family

ID=44303819

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2010/074575 WO2011085577A1 (zh) 2010-06-28 2010-06-28 Method and apparatus for classifying packets

Country Status (4)

Country Link
US (1) US8732110B2 (zh)
EP (1) EP2582096B1 (zh)
CN (1) CN102308533B (zh)
WO (1) WO2011085577A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105072122A (zh) * 2015-08-19 2015-11-18 山东超越数控电子有限公司 Fast matching and classification method for data packets
US10026039B2 (en) 2012-04-01 2018-07-17 Huawei Technologies Co., Ltd Method and apparatus for generating decision tree

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103516757B (zh) * 2012-06-28 2016-12-21 华为技术有限公司 Content processing method, apparatus and system
GB2521406A (en) 2013-12-18 2015-06-24 Ibm Transforming rules into generalised rules in a rule management system
CN103780435B (zh) * 2014-02-18 2017-09-26 迈普通信技术股份有限公司 Method and system for classifying data flows using port number masks
US9967309B2 (en) * 2014-10-06 2018-05-08 Microsoft Technology Licensing, Llc Dynamic loading of routes in a single-page application
CN106209614B (zh) * 2015-04-30 2019-09-17 新华三技术有限公司 Network packet classification method and apparatus
CN106096022B (zh) * 2016-06-22 2020-02-11 杭州迪普科技股份有限公司 Method and apparatus for partitioning multi-field network packet classification rules
CN108572921B (zh) * 2017-05-15 2021-03-12 北京金山云网络技术有限公司 Rule set update method and apparatus, and rule matching method and apparatus
CN109218211B (zh) * 2017-07-06 2022-04-19 创新先进技术有限公司 Method, apparatus and device for adjusting thresholds in data flow control policies
GB2580285B (en) * 2018-08-13 2021-01-06 Metaswitch Networks Ltd Packet processing graphs
US11552887B2 (en) * 2019-08-07 2023-01-10 Arista Networks, Inc. System and method of processing packet classification with range sets
CN110705227A (zh) * 2019-08-15 2020-01-17 广州文冲船厂有限责任公司 Method for converting a profile cutting list
CN112367262B (zh) * 2020-08-20 2022-07-05 国家计算机网络与信息安全管理中心 Method and apparatus for matching five-tuple rules
CN114840133A (zh) * 2021-01-15 2022-08-02 华为技术有限公司 Method for processing network configuration rules and related device
CN113347173B (zh) * 2021-05-31 2022-04-22 新华三信息安全技术有限公司 Packet filtering method, apparatus and electronic device
CN117609894B (zh) * 2024-01-23 2024-04-09 中国人民解放军国防科技大学 High-performance packet classification method, device and medium based on a partitioning strategy

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
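CN1992673A (zh) * 2005-12-31 2007-07-04 华为技术有限公司 Method for fast packet flow identification in high-speed routers and firewalls
CN1992674A (zh) * 2005-12-31 2007-07-04 华为技术有限公司 Multi-dimensional packet classification method based on multi-bit partitioning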
WO2009145712A1 (en) * 2008-05-26 2009-12-03 Oricane Ab Method for data packet classification in a data communications network

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7039641B2 (en) 2000-02-24 2006-05-02 Lucent Technologies Inc. Modular packet classification
CN101345707B (zh) * 2008-08-06 2010-12-08 北京邮电大学 Method and device for IPv6 packet classification
CN101478551B (zh) 2009-01-19 2011-12-28 清华大学 Multi-field network packet classification method based on a multi-core processor

Also Published As

Publication number Publication date
EP2582096A4 (en) 2013-07-03
CN102308533B (zh) 2013-10-09
US8732110B2 (en) 2014-05-20
EP2582096B1 (en) 2016-03-30
CN102308533A (zh) 2012-01-04
EP2582096A1 (en) 2013-04-17
US20130166491A1 (en) 2013-06-27

Similar Documents

Publication Publication Date Title
WO2011085577A1 (zh) Method and apparatus for classifying packets
US10460250B2 (en) Scope in decision trees
US9521082B2 (en) Methods and devices for creating, compressing and searching binary tree
US9208438B2 (en) Duplication in decision trees
US8937954B2 (en) Decision tree level merging
US6917946B2 (en) Method and system for partitioning filter rules for multi-search enforcement
US9595003B1 (en) Compiler with mask nodes
WO2004015937A2 (en) Logarithmic time range-based multifield-correlation packet classification
JP3881663B2 (ja) Packet classification apparatus and method using field-level trees
CN101345707A (zh) Method and device for IPv6 packet classification
WO2013149555A1 (zh) Method and apparatus for generating a decision tree
MacDavid et al. Concise encoding of flow attributes in SDN switches
WO2013078644A1 (zh) Route prefix storage method and apparatus, and route address lookup method and apparatus
Yang et al. Fast OpenFlow table lookup with fast update
US6970971B1 (en) Method and apparatus for mapping prefixes and values of a hierarchical space to other representations
CN106789668B (zh) Method and apparatus for processing packets
US11968286B2 (en) Packet filtering using binary search trees
Sahni et al. Data structures for one-dimensional packet classification using most-specific-rule matching
Macián et al. An evaluation of the key design criteria to achieve high update rates in packet classifiers
Chang Efficient multidimensional packet classification with fast updates
CN115865843A (zh) Rule storage method, packet processing method, apparatus, electronic device and medium
KR100662254B1 (ko) Packet classification apparatus in a routing system and rule construction method therefor
CN107948091B (zh) Method and apparatus for network packet classification
CN104348729B (zh) Internet flow classification method combining software and hardware
CN115834340A (zh) Rule storage method, apparatus, electronic device and storage medium

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 201080002602.0

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10842861

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2010842861

Country of ref document: EP