CN112688881B - Network data packet classification method based on size domain rule division

Info

Publication number: CN112688881B
Application number: CN202011440073.4A
Authority: CN
Inventors: 宋磊; 李传宏; 吴京洪; 姜艳
Original assignee: Beijing Scv Technology Co ltd; Institute of Acoustics CAS
Current assignee: Beijing Scv Technology Co ltd; Institute of Acoustics CAS
Priority date: 2020-12-11
Filing date: 2020-12-11
Publication date: 2022-11-01
Anticipated expiration: 2040-12-11
Also published as: CN112688881A

Abstract

The invention discloses a network data packet classification method based on size domain rule division, comprising the following steps: step 1) determining, according to the number of rules in the rule set to be processed, whether to perform size-domain-based rule division on the rule set; if division is needed, proceeding to step 2), otherwise proceeding to step 3); step 2) for a rule set that needs to be divided, dividing it into four rule subsets according to the value characteristics of each byte of the address fields; step 3) preprocessing the four rule subsets, or the undivided rule set, with a dimension-decomposition-based method, and accelerating the preprocessing with rule traversal based on boundary values and an identifier; step 4) using the four preprocessed rule subsets, or the preprocessed rule set, as rule classifiers; and step 5) splitting the quintuple of the data packet to be classified according to the dimension-decomposition-based method, feeding it into the rule classifier, and obtaining the classification result.

Description

Network data packet classification method based on size domain rule division

Technical Field

The invention relates to the technical field of network data packet classification, in particular to a network data packet classification method based on size domain rule division.

Background

Packet classification matches a received packet against a given set of rules and executes the action associated with the matching rule. In recent years, with the rapid development of network technologies, more and more network services, such as policy-based routing, network billing, firewalls, and quality-of-service guarantees, rely on packet classification. With the explosive growth of network traffic, packet classification performance has gradually become a bottleneck for services built on packet forwarding, and it has become an active research topic at home and abroad.

According to the relevant literature, current packet classification algorithms fall into four categories: 1) exhaustive search; 2) decision trees; 3) dimension decomposition (dimension reduction); 4) tuple space.

Exhaustive-search-based packet classification algorithms compare the packet to be classified with every rule in the classification rule set in sequence to obtain the final matching result. This approach is usually implemented in hardware, or accelerated by hardware, and can achieve line-rate packet classification; however, its poor scalability, high price, and high energy consumption greatly limit its applicability.

Decision-tree-based packet classification algorithms construct one or more decision trees that cover part or all of the rules in the rule set according to the characteristics of the rule set. Classifying a packet means traversing the tree to obtain the final matching rule or a rule subset; if the result is a rule subset, the best matching rule is found by simple linear matching. This approach often depends on the characteristics of the rule set, so its classification performance differs greatly between rule sets, and it suffers from memory explosion as the rule set grows.

Tuple-space-based packet classification algorithms divide the classification rules into tuples according to the feature quantity of each rule in every dimension, and each tuple generates one hash table. The feature quantity is the number of specified bits in the rule, where a specified bit is a bit that is not a wildcard; the tuples over all dimensions together form the tuple space. This approach supports fast updates at the cost of some classification performance, but the strict definition of tuples leads to too many hash tables, and hash collisions can only be resolved by linearly searching all colliding entries to find the final matching rule, so there is room for optimization in both time and space complexity.

The main idea of dimension-decomposition-based packet classification algorithms is divide and conquer: the multi-dimensional packet classification problem is decomposed into several one-dimensional matching problems, and the matching results of the individual dimensions are combined to obtain the final matching rule or matching rule subset. This approach does not depend on the characteristics of the rule set, so it is better suited to the requirements of different services; it can also exploit features of modern hardware, such as parallelism, to speed up preprocessing or classification. For these two reasons, the industry considers dimension-decomposition-based packet classification to have better development prospects. However, as the rule set grows, both the memory required to store the rule set and the time required to preprocess it increase dramatically.

Disclosure of Invention

The invention aims to overcome the shortcomings of existing dimension-decomposition-based packet classification algorithms and provides a network data packet classification method based on size domain rule division. A rule set that does not need to be divided is preprocessed directly with a dimension-decomposition-based method, and the preprocessing is accelerated by rule traversal based on boundary values and an identifier. A rule set that needs to be divided is first divided into four rule subsets according to the value characteristics of each byte of the address fields, and each resulting subset is then preprocessed with the dimension-decomposition-based method. This processing reduces both the preprocessing time of the rule set and the memory consumed to store it.

In order to achieve the above object, the present invention provides a method for classifying network data packets based on size domain rule division, wherein the method comprises:

step 1) determining, according to the number of rules in the rule set to be processed, whether to perform size-domain-based rule division on the rule set; if division is needed, proceeding to step 2), otherwise proceeding to step 3);

step 2) for a rule set that needs to be divided, dividing the rule set into four rule subsets according to the value characteristics of each byte of the address fields;

step 3) preprocessing the four rule subsets, or the undivided rule set, with a dimension-decomposition-based method, and accelerating the preprocessing with rule traversal based on boundary values and an identifier;

step 4) using the four preprocessed rule subsets, or the preprocessed rule set, as rule classifiers;

step 5) splitting the quintuple of the data packet to be classified according to the dimension-decomposition-based method, feeding it into the rule classifier, and obtaining the classification result.

As an improvement of the above method, determining, according to the number of rules in the rule set to be processed, whether to perform size-domain-based rule division on the rule set specifically comprises:

counting the number of rules in the rule set to be processed, and determining a threshold according to the memory consumption model of the dimension decomposition method; when the number of rules is greater than the threshold, the rule set is divided; otherwise, no division is performed.

As a modification of the above method, the step 2) specifically includes:

decomposing the source IP address field into $N_1$ sub-segments of 1 byte each, denoted $C_1, C_2, \dots, C_{N_1}$;

given a threshold vector $T = (T_1, T_2, \dots, T_{N_1})$;

assuming the length $|C_i|$ of each sub-segment is $l_i$, for the sub-segments of the source IP address:

if $l_i \le T_i$ for all $i \in [1, N_1]$, the source address field is a small field; otherwise the source address field is a large field, i.e. $l_i > T_i$ for some $i \in [1, N_1]$;

decomposing the destination IP address field into $N_2$ sub-segments of 1 byte each, denoted $C_1, C_2, \dots, C_{N_2}$;

given a threshold vector $T = (T_1, T_2, \dots, T_{N_2})$;

assuming the length $|C_i|$ of each sub-segment is $l_i$, for the sub-segments of the destination IP address:

if $l_i \le T_i$ for all $i \in [1, N_2]$, the destination address field is a small field; otherwise the destination address field is a large field, i.e. $l_i > T_i$ for some $i \in [1, N_2]$;

for each rule in the rule set, according to the size domain classification of its source IP address field and destination IP address field, the rule set is divided into the following four rule subsets: (Src_small, Dst_small), (Src_small, Dst_big), (Src_big, Dst_small), (Src_big, Dst_big); wherein (Src_small, Dst_small) denotes the set of rules whose source IP address field and destination IP address field are both small fields; (Src_small, Dst_big) denotes the set of rules whose source IP address field is a small field and whose destination IP address field is a large field; (Src_big, Dst_small) denotes the set of rules whose source IP address field is a large field and whose destination IP address field is a small field; (Src_big, Dst_big) denotes the set of rules whose source IP address field and destination IP address field are both large fields.
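The size domain division can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in the text: the length l_i of a sub-segment is taken to be the number of values covered by the rule's range in that byte, each rule is a hypothetical dictionary holding per-byte (low, high) ranges, and the names `is_small_field` and `partition_rules` are illustrative only.

```python
from typing import Dict, List, Tuple

ByteRange = Tuple[int, int]  # inclusive (low, high) bounds of one byte


def is_small_field(sub_segments: List[ByteRange], thresholds: List[int]) -> bool:
    """A field is small when every sub-segment length l_i is at most T_i.

    Assumption: l_i is the number of values the byte range covers.
    """
    return all(hi - lo + 1 <= t for (lo, hi), t in zip(sub_segments, thresholds))


def partition_rules(rules: List[dict],
                    src_thresholds: List[int],
                    dst_thresholds: List[int]) -> Dict[Tuple[str, str], List[int]]:
    """Split rule IDs into the four (source, destination) size-domain subsets."""
    subsets: Dict[Tuple[str, str], List[int]] = {
        ("small", "small"): [], ("small", "big"): [],
        ("big", "small"): [], ("big", "big"): [],
    }
    for rule in rules:
        src = "small" if is_small_field(rule["src"], src_thresholds) else "big"
        dst = "small" if is_small_field(rule["dst"], dst_thresholds) else "big"
        subsets[(src, dst)].append(rule["id"])
    return subsets
```

With hypothetical thresholds of 16 per byte, a rule with an exact source address and a wildcard destination address lands in ("small", "big"), because every source byte covers one value while every destination byte covers 256 values.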

As an improvement of the above method, the step 3) specifically includes:

converting the rule set or the rule subset to be processed into a range representation form formed by combining a maximum value and a minimum value;

decomposing each rule in the rule set or rule subset into sub-blocks of 1 byte each, each sub-block being one dimension in which the rule has an upper limit and a lower limit, and allocating to each dimension an index table of size $2^8$ (each dimension covers the range 0-255, i.e. the search space of each dimension has size $2^8$);

comparing each value in the index table of each dimension with all rules in that dimension to check whether the value satisfies each rule, and allocating a bit vector BV that records the rules satisfied by the value; a BV is a bit string whose length equals the number of rules in the rule set, each bit corresponds to the ID of one rule, and the bit is set to 1 if the value satisfies that rule and to 0 otherwise;

allocating an equivalence class table for each dimension, which stores the unique BVs corresponding to the values in the index table together with their BV identifiers; wherein, when the BV is generated for each value in the index table of a dimension, the value is compared only with the start value of each rule in that dimension, denoted Start, and with the smallest integer greater than the rule's end value, denoted Min_end, and an identifier flag indicates whether the BV needs to be recorded; if the current value equals Start, the bit corresponding to the rule is set to 1 and flag is set to 1, indicating that the current BV is new; if the current value equals Min_end, the bit corresponding to the rule is set to 0 and flag is left unchanged;

and after all rules have been traversed for the current value, if flag is 1, adding the corresponding BV to the equivalence class table and resetting flag to 0.
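The boundary-value traversal for a single 1-byte dimension can be sketched in Python as follows. This is an illustration under assumptions, not the patented implementation: BVs are stored as Python integers with bit r standing for rule r, and `build_dimension_tables` is a hypothetical name. The text does not spell out how the index entry is resolved when only Min_end boundaries are crossed at a value, so the sketch falls back to a dictionary lookup in that case; that fallback is an assumption.

```python
from typing import Dict, List, Tuple


def build_dimension_tables(ranges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build the index table and equivalence class table of one dimension.

    ranges[r] is the inclusive (Start, End) of rule r in this dimension.
    """
    index_table: List[int] = []   # value (0-255) -> BV identifier
    eq_classes: List[int] = []    # BV identifier -> BV, bit r = rule r
    seen: Dict[int, int] = {}     # BV -> identifier (fallback lookup, an assumption)
    bv = 0                        # running BV, carried from value to value
    for value in range(256):
        flag = 0
        for r, (start, end) in enumerate(ranges):
            if value == start:        # rule r begins here: bit r was never set before
                bv |= 1 << r
                flag = 1
            if value == end + 1:      # Min_end: rule r no longer applies
                bv &= ~(1 << r)
        if flag == 1:
            # A rule started at this value, so the BV contains a bit that no
            # earlier BV had; it is recorded without any uniqueness comparison.
            seen[bv] = len(eq_classes)
            eq_classes.append(bv)
            flag = 0
        elif bv not in seen:
            # Only end boundaries (or nothing) changed: look the BV up instead.
            seen[bv] = len(eq_classes)
            eq_classes.append(bv)
        index_table.append(seen[bv])
    return index_table, eq_classes
```

Each value is compared against two boundaries per rule rather than tested for membership in every range, and a BV produced at a Start boundary skips the uniqueness comparison, which is where the speed-up described above comes from.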

As a modification of the above method, the step 5) specifically includes:

extracting quintuple information of a data packet to be classified, wherein the quintuple information comprises a source IP address, a destination IP address, a source port, a destination port and a protocol number;

splitting the quintuple by taking bytes as units;

using each byte as an index into the index table of its dimension to obtain the index into the corresponding equivalence class table, and from it the bit vector BV of that dimension;

performing a bitwise AND over the BVs obtained in all dimensions to obtain the final BV, in which each bit set to 1 indicates a rule that the packet satisfies; if more than one bit is set to 1 in the final BV, multiple rules are satisfied, and since rule sets are usually arranged in order of priority, the first bit set to 1 is output as the final result.
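The lookup step can be sketched in Python as below, assuming the per-dimension index tables and equivalence class tables already exist as plain lists, that BVs are integers with bit r standing for rule r, and that a lower rule index means a higher priority. The function name `classify` and this data layout are illustrative assumptions.

```python
from functools import reduce
from operator import and_
from typing import List, Optional


def classify(key_bytes: List[int],
             index_tables: List[List[int]],
             eq_tables: List[List[int]]) -> Optional[int]:
    """Return the index of the highest-priority matching rule, or None."""
    # One index-table lookup and one equivalence-class lookup per dimension.
    bvs = [eq_tables[d][index_tables[d][b]] for d, b in enumerate(key_bytes)]
    final_bv = reduce(and_, bvs)      # bitwise AND across all dimensions
    if final_bv == 0:
        return None                   # no rule matches in every dimension
    # Isolate the lowest set bit: the first rule in priority order.
    return (final_bv & -final_bv).bit_length() - 1
```

The cost per packet is one pair of table lookups per byte plus the AND reduction, independent of how the BVs were generated during preprocessing.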

Compared with the existing data packet classification method based on dimension decomposition, the invention has the beneficial effects that:

1. the method first determines, according to the number of rules in the rule set, whether the rule set needs to be divided; for a rule set that needs to be divided, size-domain-based rule division reduces the number of rules processed at a time; each rule subset is then preprocessed with a dimension-decomposition-based method, and rule traversal based on boundary values and an identifier accelerates the preprocessing; together, these measures reduce the preprocessing time and the memory consumed to store the rule set;

2. through adaptive size-domain-based rule division and boundary-value-based rule traversal, the invention reduces the preprocessing time of the rule set and the memory consumed to store it while achieving high-speed packet classification.

Drawings

FIG. 1 is a flow chart of a method for classifying network packets based on size domain rule partitioning in accordance with the present invention;

FIG. 2 is a rule classifier constructed by the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

As shown in fig. 1, the present invention provides a method for classifying network packets based on size domain rule division, which comprises:

step 1) judging whether rule division based on a size domain is executed on a rule set according to the rule quantity of the rule set to be processed; if the division is needed, the step 2) is carried out, otherwise, the step 3) is carried out;

counting the number of rules to be processed and, according to the memory consumption model of the dimension decomposition method, dividing the rule set when the number of rules is greater than a preset threshold (the recommended value is 1000); otherwise, no division is needed; the memory consumption model is:

the first half of the model is the memory consumption of the index tables, and the second half is the memory consumption of the equivalence class tables. When the number of rules is small, the first half accounts for most of the memory consumption, and dividing a small rule set quadruples the index tables, so memory consumption increases rather than decreases; as the rule set grows, the equivalence class tables account for most of the memory consumption, and dividing the rule set reduces the memory consumed to store the rules;

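Step 2) for a rule set that needs to be divided, dividing the rule set into four rule subsets according to the value characteristics of each byte of the address fields;

decomposing the source IP address field into $N_1$ sub-segments of 1 byte each, denoted $C_1, C_2, \dots, C_{N_1}$;

given a threshold vector $T = (T_1, T_2, \dots, T_{N_1})$;

assuming the length $|C_i|$ of each sub-segment is $l_i$, for the sub-segments of the source IP address:

if $l_i \le T_i$ for all $i \in [1, N_1]$, the source address field is a small field; otherwise the source address field is a large field, i.e. $l_i > T_i$ for some $i \in [1, N_1]$;

decomposing the destination IP address field into $N_2$ sub-segments of 1 byte each, denoted $C_1, C_2, \dots, C_{N_2}$;

given a threshold vector $T = (T_1, T_2, \dots, T_{N_2})$;

assuming the length $|C_i|$ of each sub-segment is $l_i$, for the sub-segments of the destination IP address:

if $l_i \le T_i$ for all $i \in [1, N_2]$, the destination address field is a small field; otherwise the destination address field is a large field, i.e. $l_i > T_i$ for some $i \in [1, N_2]$;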

for each rule in the rule set, according to the size domain classification of its source IP address field and destination IP address field, the rule set is divided into the following four rule subsets: (Src_small, Dst_small), (Src_small, Dst_big), (Src_big, Dst_small), (Src_big, Dst_big). Here (Src_small, Dst_small) denotes the set of rules whose source IP address field and destination IP address field are both small fields; (Src_small, Dst_big) denotes the set of rules whose source IP address field is a small field and whose destination IP address field is a large field; (Src_big, Dst_small) denotes the set of rules whose source IP address field is a large field and whose destination IP address field is a small field; (Src_big, Dst_big) denotes the set of rules whose source IP address field and destination IP address field are both large fields.

Step 3) preprocessing the divided rule subsets, or the undivided rule set, with a dimension-decomposition-based method, and accelerating the preprocessing with rule traversal based on boundary values and an identifier;

the step is described in detail below, taking a rule set as an example:

converting the rule set to be processed into a range representation composed of a maximum value and a minimum value;

decomposing each rule into sub-blocks of 1 byte each, each sub-block being one dimension in which the rule's value has an upper limit and a lower limit, and allocating to each dimension an index table of size $2^8$ (each dimension covers the range 0-255, i.e. the search space of each dimension has size $2^8$); the index table, also called a lookup table, enumerates all values of the dimension;

comparing each value in the index table of each dimension with all rules in that dimension to check whether the value satisfies each rule, and allocating a bit vector (BV) that records the rules satisfied by the value. A BV is a bit string whose length equals the number of rules in the rule set; each bit corresponds to the ID of one rule, and the bit is set to 1 if the value satisfies that rule and to 0 otherwise;

allocating an equivalence class table for each dimension, which stores the unique BVs corresponding to the values in the index table together with their BV identifiers;

when the BV is generated for each value in the index table of a dimension, the value is compared only with the start value of each rule in that dimension, denoted Start, and with the smallest integer greater than the rule's end value, denoted Min_end, and an identifier flag indicates whether the BV needs to be recorded; if the current value equals Start, the bit corresponding to the rule is set to 1 and flag is set to 1, indicating that the current BV is new; if the current value equals Min_end, the bit corresponding to the rule is set to 0 and flag is left unchanged. After all rules have been traversed for the current value, if flag is 1, the corresponding BV is added to the equivalence class table and flag is reset to 0. This process avoids the large number of comparison operations otherwise needed to determine whether a BV is unique, and thereby speeds up preprocessing.

The pre-processing of a subset of rules is the same as described above.

Step 4) using the four preprocessed rule subsets, or the preprocessed rule set, as rule classifiers;

Step 5) splitting the quintuple of the data packet to be classified according to the dimension-decomposition-based method, feeding it into the rule classifier, and obtaining the classification result; this specifically comprises:

splitting the quintuple byte by byte;

using each byte as an index into the index table of its dimension to obtain the index into the corresponding equivalence class table, and from it the BV of that dimension;

performing a bitwise AND over the BVs obtained in all dimensions to obtain the final BV. In the final BV, each bit set to 1 indicates a rule that the packet satisfies; if more than one bit is set to 1, multiple rules are satisfied, and since rule sets are usually arranged in order of priority, the first bit set to 1 is output as the final result.

The above method is described below with reference to an example.

Consider the rule set with 3 rules shown in Table 1, and suppose the threshold for performing rule set division is set to 10. Since the number of rules is below the threshold, the rule set does not need to be divided, and it is preprocessed directly with the dimension-decomposition-based method.

Table 1: conventional five-tuple packet classifier comprising 3 rules

Each rule in the rule set is first converted to a range representation composed of a maximum and a minimum, as shown in Table 2:

Table 2: rule set expressed as ranges
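The conversion from a prefix to a (minimum, maximum) range can be sketched with Python's standard `ipaddress` module. The prefixes used here are hypothetical examples rather than the contents of Table 1, and `prefix_to_range` is an illustrative name. Port ranges are already in range form, and an exact value such as protocol number 6 becomes the range (6, 6).

```python
import ipaddress
from typing import Tuple


def prefix_to_range(prefix: str) -> Tuple[int, int]:
    """Convert an IPv4 prefix such as '192.168.0.0/16' to (minimum, maximum)."""
    net = ipaddress.ip_network(prefix, strict=False)
    # The lowest and highest addresses of the prefix bound the range.
    return int(net.network_address), int(net.broadcast_address)


low, high = prefix_to_range("192.168.0.0/16")
print(ipaddress.IPv4Address(low), ipaddress.IPv4Address(high))
```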

Then, each rule is decomposed into subblocks of 1 byte size, each subblock is a dimension, and the value of each rule in the dimension has an upper limit value and a lower limit value, as shown in table 3:

Table 3: rule set after dimension decomposition
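A minimal Python sketch of the byte-wise decomposition follows. It assumes the range comes from a prefix, in which case taking the corresponding bytes of the minimum and the maximum gives exact per-byte limits; arbitrary ranges (for example a port range that is not prefix-aligned) would need additional handling that the text does not detail. The name `split_range_into_bytes` is illustrative.

```python
from typing import List, Tuple


def split_range_into_bytes(low: int, high: int, width_bytes: int) -> List[Tuple[int, int]]:
    """Return the (lower limit, upper limit) of each 1-byte dimension.

    Assumption: [low, high] is prefix-aligned, so the per-byte limits are exact.
    """
    dims = []
    for i in range(width_bytes):
        shift = 8 * (width_bytes - 1 - i)   # most significant byte first
        dims.append(((low >> shift) & 0xFF, (high >> shift) & 0xFF))
    return dims


# Hypothetical example: 192.168.0.0/16 expressed as a range.
print(split_range_into_bytes(3232235520, 3232301055, 4))
```

For the range of 192.168.0.0/16 this gives fixed values in the first two dimensions and the full 0-255 range in the last two.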

Each dimension is allocated an index table of size $2^8$ (each dimension covers the range 0-255, i.e. the search space of each dimension has size $2^8$); the index table, also called a lookup table, enumerates all values of the dimension;

each value in the index table of each dimension is compared with all rules in that dimension to check whether the value satisfies each rule, and a bit vector (BV) is allocated to record the rules satisfied by the value. A BV is a bit string whose length equals the number of rules in the rule set; each bit corresponds to the ID of one rule, and the bit is set to 1 if the rule is satisfied and to 0 otherwise; an equivalence class table is allocated for each dimension, which stores the unique BVs corresponding to the values in the index table together with their BV identifiers.
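As a baseline, the value-by-value construction just described can be sketched in Python for one dimension. Assumptions: BVs are integers with bit r standing for rule r, the BV identifier is the position of the BV in the equivalence class table, and `build_tables_naive` is an illustrative name; the per-byte ranges in the usage line are hypothetical, not taken from Table 3.

```python
from typing import Dict, List, Tuple


def build_tables_naive(ranges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Compare every value 0-255 with every rule range of one dimension."""
    index_table: List[int] = []   # value -> BV identifier
    eq_classes: List[int] = []    # BV identifier -> unique BV
    ids: Dict[int, int] = {}      # BV -> identifier, used to keep BVs unique
    for value in range(256):
        bv = 0
        for r, (lo, hi) in enumerate(ranges):
            if lo <= value <= hi:         # value satisfies rule r
                bv |= 1 << r
        if bv not in ids:                 # uniqueness check for every value
            ids[bv] = len(eq_classes)
            eq_classes.append(bv)
        index_table.append(ids[bv])
    return index_table, eq_classes


index_table, eq_classes = build_tables_naive([(192, 192), (0, 255), (192, 192)])
print(len(eq_classes), index_table[192])
```

Values that satisfy the same set of rules share one equivalence class entry, so the number of stored BVs is the number of distinct rule combinations in the dimension rather than 256.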

The rule classifier generated by the preprocessing is shown in FIG. 2.

The process of data packet classification:

Assume the quintuple of a packet is as follows: 192.168.8.63 (source IP address), 123.125.50.134 (destination IP address), 55951 (source port), 25 (destination port), 6 (protocol number). The quintuple is split byte by byte; each byte is used as an index into the index table of its dimension to obtain the index into the corresponding equivalence class table, and from it the BV of that dimension; the BVs obtained in all dimensions are then combined by bitwise AND to obtain the final BV. The final BV is 001, indicating that the current packet satisfies the first rule in the classifier.
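The byte-wise split of this quintuple (five-tuple) can be sketched in Python as follows. The function name `five_tuple_to_bytes` and the choice of network byte order for the ports are assumptions made for illustration.

```python
import ipaddress
from typing import List


def five_tuple_to_bytes(src_ip: str, dst_ip: str,
                        src_port: int, dst_port: int, protocol: int) -> List[int]:
    """Split a quintuple into one byte per dimension (4 + 4 + 2 + 2 + 1 = 13)."""
    key = (ipaddress.IPv4Address(src_ip).packed
           + ipaddress.IPv4Address(dst_ip).packed
           + src_port.to_bytes(2, "big")     # assumed: most significant byte first
           + dst_port.to_bytes(2, "big")
           + protocol.to_bytes(1, "big"))
    return list(key)


print(five_tuple_to_bytes("192.168.8.63", "123.125.50.134", 55951, 25, 6))
# [192, 168, 8, 63, 123, 125, 50, 134, 218, 143, 0, 25, 6]
```

Each of the 13 bytes then serves as the index into the index table of its own dimension.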

Now suppose the threshold for performing rule set division is set to 2. Since the number of rules exceeds the threshold, size-domain-based rule division must be performed on the rule set. Assuming the threshold vector is T = (24, 24, 24, 24, 24), then according to the definition of the size domain the first and second rules in Table 1 both fall into the subset (Src_big, Dst_big), and the third rule falls into the subset (Src_big, Dst_small). The divided rule subsets are then preprocessed with the dimension-decomposition-based method.

Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit it. Although the present invention has been described in detail with reference to the embodiments, those skilled in the art will understand that various changes may be made and equivalents may be substituted without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A method for classifying network packets based on size domain rule partitioning, the method comprising:

the step 3) specifically comprises the following steps:

allocating an equivalence class table for each dimension, which stores the unique BVs corresponding to the values in the index table together with their BV identifiers; wherein, when the bit vector BV is generated for each value in the index table of a dimension, the value is compared only with the start value of each rule in that dimension, denoted Start, and with the smallest integer greater than the rule's end value, denoted Min_end, and an identifier flag indicates whether the BV needs to be recorded; if the current value equals Start, the bit corresponding to the rule is set to 1 and flag is set to 1, indicating that the current BV is new; if the current value equals Min_end, the bit corresponding to the rule is set to 0 and flag is left unchanged;

after all rules are traversed for the current value, if the value of the flag is 1, adding the corresponding BV into the equivalence class table, and setting the flag to be 0;

2. The method for classifying network packets based on size domain rule division according to claim 1, wherein determining, according to the number of rules of the rule set to be processed, whether to perform size-domain-based rule division on the rule set specifically comprises:

counting the number of rules of a rule set to be processed, determining a threshold value according to a memory consumption model based on a dimension decomposition method, and when the number of the rules is larger than the threshold value, dividing the rule set; otherwise, no partitioning need be performed.

3. The method for classifying network packets based on size domain rule division according to claim 1, wherein the step 2) specifically comprises:

decomposing the source IP address field into $N_1$ sub-segments of 1 byte each, denoted $C_1, C_2, \dots, C_{N_1}$;

given a threshold vector $T = (T_1, T_2, \dots, T_{N_1})$;

assuming the length $|C_i|$ of each sub-segment is $l_i$, for the sub-segments of the source IP address:

if $l_i \le T_i$ for all $i \in [1, N_1]$, the source address field is a small field; otherwise the source address field is a large field, i.e. $l_i > T_i$ for some $i \in [1, N_1]$;

decomposing the destination IP address field into $N_2$ sub-segments of 1 byte each, denoted $C_1, C_2, \dots, C_{N_2}$;

given a threshold vector $T = (T_1, T_2, \dots, T_{N_2})$;

assuming the length $|C_i|$ of each sub-segment is $l_i$, for the sub-segments of the destination IP address:

if $l_i \le T_i$ for all $i \in [1, N_2]$, the destination address field is a small field; otherwise the destination address field is a large field, i.e. $l_i > T_i$ for some $i \in [1, N_2]$;

for each rule in the rule set, according to the size domain classification of its source IP address field and destination IP address field, the rule set is divided into the following four rule subsets: (Src_small, Dst_small), (Src_small, Dst_big), (Src_big, Dst_small), (Src_big, Dst_big); wherein (Src_small, Dst_small) denotes the set of rules whose source IP address field and destination IP address field are both small fields; (Src_small, Dst_big) denotes the set of rules whose source IP address field is a small field and whose destination IP address field is a large field; (Src_big, Dst_small) denotes the set of rules whose source IP address field is a large field and whose destination IP address field is a small field; (Src_big, Dst_big) denotes the set of rules whose source IP address field and destination IP address field are both large fields.

4. The method for classifying network packets based on size domain rule division according to claim 2, wherein the step 5) specifically comprises:

splitting the quintuple by taking bytes as units;

searching an index table of a corresponding dimension by taking each byte as an index on the current dimension, and acquiring an index of a corresponding equivalence class table until a bit vector BV corresponding to the dimension is obtained;

performing bitwise AND operation on BVs obtained in all dimensions to obtain a final BV; wherein the bit set to 1 indicates the rule that this packet will eventually satisfy; if there is more than one bit set to 1 in the final BV, indicating that multiple rules are satisfied, the first bit set to 1 is selected as the final result output, considering that the rule sets are usually arranged in order of priority.