CN112688881A

CN112688881A - Network data packet classification method based on size domain rule division

Info

Publication number: CN112688881A
Application number: CN202011440073.4A
Authority: CN
Inventors: 宋磊; 李传宏; 吴京洪; 姜艳
Original assignee: Beijing Scv Technology Co ltd; Institute of Acoustics CAS
Current assignee: Beijing Scv Technology Co ltd; Institute of Acoustics CAS
Priority date: 2020-12-11
Filing date: 2020-12-11
Publication date: 2021-04-20
Anticipated expiration: 2040-12-11
Also published as: CN112688881B

Abstract

The invention discloses a network data packet classification method based on size domain rule division, comprising the following steps: step 1) determining, according to the number of rules in the rule set to be processed, whether to perform size-domain-based rule division on the rule set; if division is needed, proceeding to step 2), otherwise proceeding to step 3); step 2) dividing a rule set that needs to be divided into four rule subsets according to the value characteristics of each byte of the address fields; step 3) preprocessing the four rule subsets, or the undivided rule set, with a dimension-decomposition-based method, and accelerating the preprocessing with rule traversal based on boundary values and an identifier; step 4) using the four preprocessed rule subsets, or the preprocessed rule set, as the rule classifier; and step 5) splitting the five-tuple of the data packet to be classified according to the dimension-decomposition-based method, inputting it into the rule classifier, and obtaining the classification result.

Description

Network data packet classification method based on size domain rule division

Technical Field

The invention relates to the technical field of network data packet classification, in particular to a network data packet classification method based on size domain rule division.

Background

Packet classification matches a received packet against a given rule set and executes the action associated with the matched rule. In recent years, with the rapid development of network technologies, more and more network services, such as policy-based routing, network billing, firewalls, and quality-of-service guarantees, have come to rely on packet classification. Faced with the explosive growth of network traffic, packet classification performance has gradually become the bottleneck of packet-based forwarding services, and it has become an active research topic at home and abroad.

According to the relevant literature, current packet classification algorithms fall into the following categories: 1) exhaustive search; 2) decision tree; 3) dimension decomposition (dimension reduction); 4) tuple space.

Packet classification algorithms based on exhaustive search compare the packet to be classified against every rule in the classification rule set in sequence to obtain the final match. This approach is usually implemented in hardware, or uses hardware to accelerate classification, and can therefore classify packets at line rate; however, its poor scalability, high cost, and high energy consumption greatly limit its applicability.

Packet classification algorithms based on decision trees construct one or more decision trees that cover some or all of the rules in the rule set, according to the characteristics of the rule set. Classification traverses the tree to obtain the final matching rule or rule subset; if the result is a rule subset, the best-matching rule is found by a simple linear search. This approach often depends on the characteristics of the rule set, so its classification performance varies greatly across rule sets, and it suffers from memory explosion as the rule set grows.

Packet classification algorithms based on tuple space divide the classification rules into tuples according to the feature quantity of each rule in each dimension, and generate one hash table per tuple. The feature quantity is the number of specified bits in the rule, where a specified bit is a non-wildcard bit; the tuples over all dimensions together form the tuple space. This approach supports fast updates at the cost of some classification performance. However, because tuples are defined too strictly, the final number of hash tables is too large, and hash collisions can only be resolved by linearly searching all colliding entries for the final matching rule, so the approach leaves room for improvement in both time and space complexity.

The main idea of packet classification algorithms based on dimension decomposition is divide and conquer: the multi-dimensional packet classification problem is decomposed into several one-dimensional matching problems, and the per-dimension matching results are combined to obtain the final matching rule or rule subset. This approach does not depend on the characteristics of the rule set, so it is better suited to the requirements of different services; it can also exploit features of modern hardware, such as parallelism, to speed up preprocessing or classification. For these two reasons, the industry considers dimension-decomposition-based packet classification to have good prospects. However, as the rule set grows, both the memory needed to store the rule set and the time needed to preprocess it increase dramatically.

Disclosure of Invention

The invention aims to overcome the defects of existing dimension-decomposition-based packet classification algorithms by providing a network data packet classification method based on size domain rule division. A rule set that does not need to be divided is preprocessed directly with a dimension-decomposition-based method, and the preprocessing is accelerated by rule traversal based on boundary values and an identifier. A rule set that needs to be divided is split into four rule subsets according to the value characteristics of each byte in the address fields, and each subset is preprocessed with the dimension-decomposition-based method. This processing reduces both the preprocessing time of the rule set and the memory consumed to store it.

In order to achieve the above object, the present invention provides a method for classifying network data packets based on size domain rule division, wherein the method comprises:

step 1) determining, according to the number of rules in the rule set to be processed, whether to perform size-domain-based rule division on the rule set; if division is needed, proceeding to step 2), otherwise proceeding to step 3);

step 2) for a rule set that needs to be divided, dividing it into four rule subsets according to the value characteristics of each byte of the address fields;

step 3) preprocessing the four rule subsets, or the undivided rule set, with a dimension-decomposition-based method, and accelerating the preprocessing with rule traversal based on boundary values and an identifier;

step 4) using the four preprocessed rule subsets, or the preprocessed rule set, as the rule classifier;

step 5) splitting the five-tuple of the data packet to be classified according to the dimension-decomposition-based method, inputting it into the rule classifier, and obtaining the classification result.

As an improvement of the above method, determining, according to the number of rules of the rule set to be processed, whether to perform size-domain-based rule division on the rule set specifically comprises:

counting the number of rules in the rule set to be processed, and determining a threshold according to a memory consumption model of the dimension decomposition method; when the number of rules is greater than the threshold, the rule set is divided; otherwise, no division is performed.

As an improvement of the above method, the step 2) specifically includes:

decomposing the source IP address domain into N₁ sub-segments of 1 byte each, denoted C_i (i = 1, …, N₁);

given a threshold vector T = (T_1, …, T_N₁), and letting l_i = |C_i| be the length of sub-segment C_i of the source IP address:

if l_i ≤ T_i, the source address domain is a small domain; otherwise, i.e. l_i > T_i, the source address domain is a large domain;

decomposing the destination IP address domain into N₂ sub-segments of 1 byte each, denoted C_i (i = 1, …, N₂);

given a threshold vector T = (T_1, …, T_N₂), and letting l_i = |C_i| be the length of sub-segment C_i of the destination IP address:

if l_i ≤ T_i, the destination address domain is a small domain; otherwise, i.e. l_i > T_i, the destination address domain is a large domain;

each rule in the rule set is assigned, according to the size-domain classification of its source IP address domain and destination IP address domain, to one of the following four rule subsets: (Src_small, Dst_small), (Src_small, Dst_big), (Src_big, Dst_small), (Src_big, Dst_big); wherein (Src_small, Dst_small) denotes the rule subset in which both the source IP address domain and the destination IP address domain are small domains; (Src_small, Dst_big) denotes the rule subset in which the source IP address domain is a small domain and the destination IP address domain is a large domain; (Src_big, Dst_small) denotes the rule subset in which the source IP address domain is a large domain and the destination IP address domain is a small domain; and (Src_big, Dst_big) denotes the rule subset in which both the source IP address domain and the destination IP address domain are large domains.
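The division in step 2) can be illustrated with a minimal Python sketch. It assumes that the length l_i of a sub-segment is the number of byte values a rule covers in that byte, and that an address domain is small only when l_i ≤ T_i holds for every byte. The names `is_small` and `partition`, the rule layout, and the threshold values are hypothetical and are not taken from the invention.

```python
from typing import Dict, List, Tuple

ByteRange = Tuple[int, int]  # inclusive (low, high) range of one address byte


def is_small(byte_ranges: List[ByteRange], thresholds: List[int]) -> bool:
    """Assumption: a domain is small when every sub-segment length l_i <= T_i."""
    return all(hi - lo + 1 <= t for (lo, hi), t in zip(byte_ranges, thresholds))


def partition(rules: List[dict], t_src: List[int],
              t_dst: List[int]) -> Dict[str, List[dict]]:
    """Assign each rule to one of the four (Src, Dst) size-domain subsets."""
    subsets: Dict[str, List[dict]] = {
        "Src_small,Dst_small": [], "Src_small,Dst_big": [],
        "Src_big,Dst_small": [], "Src_big,Dst_big": [],
    }
    for rule in rules:
        src = "small" if is_small(rule["src"], t_src) else "big"
        dst = "small" if is_small(rule["dst"], t_dst) else "big"
        subsets[f"Src_{src},Dst_{dst}"].append(rule)
    return subsets


# Hypothetical rules: per-byte inclusive ranges for source and destination.
rules = [
    {"id": 0, "src": [(192, 192), (168, 168), (8, 8), (0, 255)],
     "dst": [(10, 10), (0, 0), (0, 0), (1, 1)]},
    {"id": 1, "src": [(0, 255)] * 4, "dst": [(0, 255)] * 4},
]
T = [24, 24, 24, 24]  # illustrative threshold vector, one entry per byte
result = partition(rules, T, T)
```

Under these assumptions, rule 0 has a wildcard last source byte (length 256 > 24) and a fully specified destination, so it lands in (Src_big, Dst_small), while the all-wildcard rule 1 lands in (Src_big, Dst_big).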

As an improvement of the above method, the step 3) specifically includes:

converting the rule set or rule subset to be processed into a range representation, in which each field is expressed by a minimum value and a maximum value;

decomposing each rule in the rule set or rule subset into sub-blocks of 1 byte each, where each sub-block is a dimension and the value of each rule in a dimension has an upper limit and a lower limit; allocating to each dimension an index table of size 2⁸, each dimension covering the range 0-255, i.e., the search space of each dimension is 2⁸;

comparing each value in each dimension's index table with all rules in that dimension to check whether the rule is satisfied, and allocating a bit vector BV that records the rules satisfied by each value; a BV is a bit string whose length is the number of rules in the rule set, each bit corresponds to the ID of one rule, and the bit is set to 1 if the rule is satisfied and to 0 otherwise;

allocating an equivalence class table for each dimension, which stores the unique BVs corresponding to the values in the index table together with their BV identifiers; wherein, when generating the bit vector BV for each value in a dimension's index table, the value is compared only with the start value of each rule in that dimension, denoted Start, and with the smallest integer greater than the rule's end value, denoted Min_end, and an identifier flag indicates whether the BV needs to be recorded; if the current value equals Start, the bit corresponding to the rule is set to 1 and flag is set to 1, indicating that the current BV is new; if the current value equals Min_end, the bit corresponding to the rule is set to 0 and flag is left unchanged;

after all rules have been traversed for the current value, if flag is 1, the corresponding BV is added to the equivalence class table and flag is reset to 0.

As an improvement of the above method, the step 5) specifically includes:

extracting quintuple information of a data packet to be classified, wherein the quintuple information comprises a source IP address, a destination IP address, a source port, a destination port and a protocol number;

splitting the quintuple by taking bytes as units;

searching the index table of the corresponding dimension using each byte as the index in that dimension, obtaining the index into the corresponding equivalence class table, and thereby obtaining the bit vector BV for that dimension;

performing a bitwise AND operation on the BVs obtained in all dimensions to obtain the final BV, in which each bit set to 1 represents a rule that the packet satisfies; if more than one bit is set to 1 in the final BV, multiple rules are satisfied, and, since rule sets are usually arranged in priority order, the first bit set to 1 is output as the final result.

Compared with existing packet classification methods based on dimension decomposition, the invention has the following beneficial effects:

1. the method first determines, according to the number of rules in the rule set, whether the rule set needs to be divided; for a rule set that needs to be divided, size-domain-based rule division reduces the number of rules processed at a time; each rule subset is preprocessed with a dimension-decomposition-based method, and rule traversal based on boundary values and an identifier accelerates the preprocessing; together these reduce the preprocessing time and the memory consumed to store the rule set;

2. through adaptive size-domain-based rule division and boundary-value-based rule traversal, the invention reduces the preprocessing time of the rule set and the memory consumed to store it while still achieving high-speed packet classification.

Drawings

FIG. 1 is a flow chart of a method for classifying network packets based on size domain rule partitioning in accordance with the present invention;

FIG. 2 is a rule classifier constructed by the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

As shown in FIG. 1, the present invention provides a method for classifying network data packets based on size domain rule division, which comprises:

step 1) counting the number of rules in the rule set to be processed; according to a memory consumption model of the dimension decomposition method, dividing the rule set when the number of rules is greater than a preset threshold (the recommended value is 1000), and otherwise leaving it undivided;

in the memory consumption model, the first half is the memory consumption of the index tables and the second half is the memory consumption of the equivalence class tables. When the number of rules is small, the first half accounts for most of the memory consumption, and dividing a small rule set quadruples the index tables, so the memory consumption increases rather than decreases; as the rule set grows, the equivalence class tables account for most of the memory consumption, and division reduces the memory needed to store the rules;

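step 2) dividing a rule set that needs to be divided into four rule subsets according to the value characteristics of each byte of the address fields;

decomposing the source IP address domain into N₁ sub-segments of 1 byte each, denoted C_i (i = 1, …, N₁);

given a threshold vector T = (T_1, …, T_N₁), and letting l_i = |C_i| be the length of sub-segment C_i of the source IP address: if l_i ≤ T_i, the source address domain is a small domain; otherwise, i.e. l_i > T_i, the source address domain is a large domain;

decomposing the destination IP address domain into N₂ sub-segments of 1 byte each, denoted C_i (i = 1, …, N₂);

given a threshold vector T = (T_1, …, T_N₂), and letting l_i = |C_i| be the length of sub-segment C_i of the destination IP address: if l_i ≤ T_i, the destination address domain is a small domain; otherwise, i.e. l_i > T_i, the destination address domain is a large domain;

each rule in the rule set is assigned, according to the size-domain classification of its source IP address domain and destination IP address domain, to one of the following four rule subsets: (Src_small, Dst_small), (Src_small, Dst_big), (Src_big, Dst_small), (Src_big, Dst_big). Here (Src_small, Dst_small) denotes the rule subset in which both the source IP address domain and the destination IP address domain are small domains; (Src_small, Dst_big) denotes the rule subset in which the source IP address domain is a small domain and the destination IP address domain is a large domain; (Src_big, Dst_small) denotes the rule subset in which the source IP address domain is a large domain and the destination IP address domain is a small domain; and (Src_big, Dst_big) denotes the rule subset in which both the source IP address domain and the destination IP address domain are large domains.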

step 3) preprocessing the divided rule subsets, or the undivided rule set, with a dimension-decomposition-based method, and accelerating the preprocessing with rule traversal based on boundary values and an identifier;

the step is described in detail below, taking a rule set as an example:

converting the rule set to be processed into a range representation, in which each field is expressed by a minimum value and a maximum value;

decomposing each rule into sub-blocks of 1 byte each, where each sub-block is a dimension and the value of each rule in a dimension has an upper limit and a lower limit; allocating to each dimension an index table of size 2⁸ (each dimension covers the range 0-255, i.e., the search space of each dimension is 2⁸), also called a lookup table, which enumerates all values of the dimension;

comparing each value in each dimension's index table with all rules in that dimension to check whether the rule is satisfied, and allocating a bit vector (BV) that records the rules satisfied by each value. A BV is a bit string whose length is the number of rules in the rule set; each bit corresponds to the ID of one rule and is set to 1 if the rule is satisfied and to 0 otherwise;

allocating an equivalence class table for each dimension, which stores the unique BVs corresponding to the values in the index table together with their BV identifiers;

when generating the bit vector (BV) for each value in a dimension's index table, comparing the value only with the start value of each rule in that dimension, denoted Start, and with the smallest integer greater than the rule's end value, denoted Min_end, and using an identifier flag to indicate whether the BV needs to be recorded; if the current value equals Start, the bit corresponding to the rule is set to 1 and flag is set to 1, indicating that the current BV is new; if the current value equals Min_end, the bit corresponding to the rule is set to 0 and flag is left unchanged. After all rules have been traversed for the current value, if flag is 1, the corresponding BV is added to the equivalence class table and flag is reset to 0. This process avoids a large number of comparison operations that would otherwise be needed to determine whether a BV is unique, and thereby speeds up preprocessing.

The pre-processing of a subset of rules is the same as described above.
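The boundary-value traversal for a single dimension can be sketched in Python as follows. This is an illustration under stated assumptions rather than the invention's implementation: bit r of a BV (counting from the least significant bit) stands for rule r, and the function name `build_dimension` is hypothetical. The description does not say how a BV is recorded when only rule end points are crossed (flag stays 0), so the sketch falls back to a dictionary lookup in that case; that fallback is an assumption.

```python
from typing import Dict, List, Tuple


def build_dimension(ranges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build the index table and equivalence class table of one byte dimension.

    ranges[r] is the inclusive (Start, end) byte range of rule r in this dimension.
    Returns (index_table, eq_table): index_table[v] is the position in eq_table
    of the bit vector for byte value v.
    """
    bv = 0                        # running bit vector, updated at boundaries only
    eq_table: List[int] = []      # unique bit vectors (equivalence classes)
    seen: Dict[int, int] = {}     # BV -> identifier; fallback lookup (assumption)
    index_table = [0] * 256
    for v in range(256):
        flag = 0
        for r, (start, end) in enumerate(ranges):
            if v == start:        # rule r starts matching: set its bit
                bv |= 1 << r
                flag = 1          # a newly set bit always yields an unseen BV
            if v == end + 1:      # v == Min_end: rule r stops matching
                bv &= ~(1 << r)   # flag is left unchanged
        if flag == 1:
            # New by construction, so no uniqueness comparison is needed.
            seen[bv] = len(eq_table)
            eq_table.append(bv)
        elif bv not in seen:
            # Only end points were crossed (or v == 0 with no rule starting).
            seen[bv] = len(eq_table)
            eq_table.append(bv)
        index_table[v] = seen[bv]
    return index_table, eq_table


# Rule 0 covers every byte value; rule 1 covers 10..20.
index_a, eq_a = build_dimension([(0, 255), (10, 20)])
# Overlapping rules whose end points create BVs not produced by any Start.
index_b, eq_b = build_dimension([(0, 10), (5, 20)])
```

The flag shortcut is sound because each rule occupies one contiguous range per dimension: a bit that is set at Start cannot have been set in any earlier BV, so the resulting BV cannot duplicate an existing equivalence class.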

Step 4), taking the four preprocessed rule sets or rule subsets as rule classifiers;

step 5) splitting the five-tuple of the data packet to be classified according to the dimension-decomposition-based method, inputting it into the rule classifier, and obtaining the classification result; this specifically comprises:

splitting the five-tuple in units of bytes;

searching the index table of the corresponding dimension using each byte as the index in that dimension, obtaining the index into the corresponding equivalence class table, and thereby obtaining the BV for that dimension;

performing a bitwise AND operation on the BVs obtained in all dimensions to obtain the final BV. In the final BV, each bit set to 1 indicates a rule that the packet satisfies; if more than one bit is set to 1, multiple rules are satisfied, and, since rule sets are usually arranged in priority order, the first bit set to 1 is output as the final result.
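The lookup in step 5) can be sketched in Python as below. The sketch assumes that each dimension is represented by a pair (index table, equivalence class table) and that bit r of a BV, counted from the least significant bit, stands for rule r, so the lowest set bit is the highest-priority match. The function name `classify` and the hand-built tables are hypothetical.

```python
from typing import List, Optional, Tuple

Dimension = Tuple[List[int], List[int]]  # (index table, equivalence class table)


def classify(packet_bytes: List[int], dims: List[Dimension]) -> Optional[int]:
    """Return the ID of the first matching rule, or None if no rule matches."""
    if not dims:
        return None
    final_bv = -1  # all bits set; Python integers have arbitrary precision
    for byte, (index_table, eq_table) in zip(packet_bytes, dims):
        # Byte value -> equivalence class index -> bit vector of this dimension.
        final_bv &= eq_table[index_table[byte]]
    if final_bv == 0:
        return None
    # Isolate the lowest set bit and convert it to a rule ID.
    return (final_bv & -final_bv).bit_length() - 1


# Two hypothetical dimensions over two rules.
idx0 = [0] * 256
for v in range(10, 21):
    idx0[v] = 1
eq0 = [0b01, 0b11]   # dimension 0: rule 0 matches all bytes, rule 1 matches 10..20
idx1 = [0] * 256
idx1[5] = 1
eq1 = [0b10, 0b11]   # dimension 1: rule 1 matches all bytes, rule 0 matches only 5
dims = [(idx0, eq0), (idx1, eq1)]

both_match = classify([15, 5], dims)      # final BV 0b11: rule 0 wins by priority
second_only = classify([15, 6], dims)     # final BV 0b10: rule 1
no_match = classify([30, 6], dims)        # final BV 0: no rule
```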

The above method is described below with reference to an example.

Consider the rule set containing 3 rules shown in Table 1, and assume that the threshold for performing rule set division is set to 10. Since the number of rules is less than this threshold, the rule set does not need to be divided, and it is preprocessed directly with the dimension-decomposition-based method.

Table 1: conventional five-tuple packet classifier comprising 3 rules

Each rule in the rule set is first converted to a range representation composed of a maximum and a minimum, as shown in Table 2:

Table 2: rule set expressed as ranges

Then, each rule is decomposed into sub-blocks of 1 byte size, each sub-block is a dimension, and the value of each rule in the dimension has an upper limit value and a lower limit value, as shown in table 3:

Table 3: rule set after dimension decomposition
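The two conversions above (prefix to range, then range to per-byte dimensions) can be sketched in Python for an address field. The sketch uses the standard `ipaddress` module and handles only prefix-style rules, for which taking the bytes of the minimum and maximum addresses gives an exact per-byte decomposition; arbitrary ranges would need additional handling. The function names and the example prefixes are hypothetical.

```python
import ipaddress
from typing import List, Tuple


def prefix_to_range(prefix: str) -> Tuple[int, int]:
    """Convert an IPv4 prefix into an inclusive (minimum, maximum) integer range."""
    net = ipaddress.ip_network(prefix, strict=False)
    return int(net.network_address), int(net.broadcast_address)


def prefix_to_byte_ranges(prefix: str) -> List[Tuple[int, int]]:
    """Split a prefix into four 1-byte dimensions, each with (lower, upper) limits."""
    net = ipaddress.ip_network(prefix, strict=False)
    low = net.network_address.packed      # 4 bytes of the minimum address
    high = net.broadcast_address.packed   # 4 bytes of the maximum address
    return list(zip(low, high))


full_range = prefix_to_range("192.168.8.0/24")
dims_24 = prefix_to_byte_ranges("192.168.8.0/24")
dims_12 = prefix_to_byte_ranges("10.0.0.0/12")
```

For example, a /24 prefix fixes the first three bytes and leaves the last byte spanning 0-255, while a /12 prefix leaves the second byte spanning a 16-value range.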

Each dimension is allocated an index table of size 2⁸ (each dimension covers the range 0-255, i.e., the search space of each dimension is 2⁸), also called a lookup table, which enumerates all values of the dimension;

each value in each dimension's index table is compared with all rules in that dimension to check whether the rule is satisfied, and a bit vector (BV) is allocated to record the rules satisfied by each value. A BV is a bit string whose length is the number of rules in the rule set; each bit corresponds to the ID of one rule and is set to 1 if the rule is satisfied and to 0 otherwise. An equivalence class table is allocated for each dimension, which stores the unique BVs corresponding to the values in the index table together with their BV identifiers.

After preprocessing, the generated rule classifier is as shown in FIG. 2.

The process of data packet classification:

Assume that the five-tuple information of the packet is as follows: 192.168.8.63 (source IP address), 123.125.50.134 (destination IP address), 55951 (source port), 25 (destination port), 6 (protocol number). The five-tuple is split in units of bytes; each byte is used as the index into the index table of its dimension to obtain the index into the corresponding equivalence class table and thereby the BV for that dimension; a bitwise AND operation over the BVs of all dimensions then yields the final BV. The final BV is 001, indicating that the current packet satisfies the first rule in the classifier.
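Splitting this example five-tuple into bytes can be sketched in Python as follows. The sketch assumes IPv4 addresses, 16-bit ports in network byte order, and an 8-bit protocol number, which gives 4 + 4 + 2 + 2 + 1 = 13 one-byte dimensions; the function name `split_five_tuple` is hypothetical.

```python
import ipaddress
import struct
from typing import List


def split_five_tuple(src_ip: str, dst_ip: str, src_port: int,
                     dst_port: int, protocol: int) -> List[int]:
    """Split a five-tuple into 13 byte values, one per dimension."""
    raw = (
        ipaddress.IPv4Address(src_ip).packed
        + ipaddress.IPv4Address(dst_ip).packed
        # "!HHB": two big-endian 16-bit ports followed by one protocol byte.
        + struct.pack("!HHB", src_port, dst_port, protocol)
    )
    return list(raw)


packet_bytes = split_five_tuple("192.168.8.63", "123.125.50.134", 55951, 25, 6)
```

Each of the resulting byte values is then used as the index into the index table of its own dimension.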

Now assume instead that the threshold for performing rule set division is set to 2. Since the number of rules is greater than this threshold, size-domain-based rule division must be performed on the rule set. Assume that the threshold vector is T = (24, 24, 24, 24, 24); according to the definition of the size domain, the first and second rules in Table 1 fall into the subset (Src_big, Dst_big), and the third rule falls into the subset (Src_big, Dst_small). Each resulting rule subset is then preprocessed with the dimension-decomposition-based method.

Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and are not limiting. Although the present invention has been described in detail with reference to the embodiments, those skilled in the art will understand that various changes may be made and equivalents may be substituted without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A method for classifying network packets based on size domain rule partitioning, the method comprising:

step 5) splitting the five-tuple of the data packet to be classified according to a dimension-decomposition-based method, inputting it into the rule classifier, and obtaining a classification result.

2. The method for classifying network packets based on size domain rule division according to claim 1, wherein determining, according to the number of rules of the rule set to be processed, whether to perform size-domain-based rule division on the rule set specifically comprises:

3. The method for classifying network packets based on size domain rule division according to claim 1, wherein the step 2) specifically comprises:

decomposing the source IP address domain into N₁ sub-segments of 1 byte each, denoted C_i (i = 1, …, N₁);

given a threshold vector T = (T_1, …, T_N₁), and letting l_i = |C_i| be the length of sub-segment C_i of the source IP address:

if l_i ≤ T_i, the source address domain is a small domain; otherwise, i.e. l_i > T_i, the source address domain is a large domain;

decomposing the destination IP address domain into N₂ sub-segments of 1 byte each, denoted C_i (i = 1, …, N₂);

given a threshold vector T = (T_1, …, T_N₂), and letting l_i = |C_i| be the length of sub-segment C_i of the destination IP address:

if l_i ≤ T_i, the destination address domain is a small domain; otherwise, i.e. l_i > T_i, the destination address domain is a large domain;

each rule in the rule set is assigned, according to the size-domain classification of its source IP address domain and destination IP address domain, to one of the following four rule subsets: (Src_small, Dst_small), (Src_small, Dst_big), (Src_big, Dst_small), (Src_big, Dst_big); wherein (Src_small, Dst_small) denotes the rule subset in which both the source IP address domain and the destination IP address domain are small domains; (Src_small, Dst_big) denotes the rule subset in which the source IP address domain is a small domain and the destination IP address domain is a large domain; (Src_big, Dst_small) denotes the rule subset in which the source IP address domain is a large domain and the destination IP address domain is a small domain; and (Src_big, Dst_big) denotes the rule subset in which both the source IP address domain and the destination IP address domain are large domains.

4. The method for classifying network packets based on size domain rule division according to claim 2, wherein the step 3) specifically comprises:

decomposing each rule in the rule set or rule subset into sub-blocks of 1 byte each, where each sub-block is a dimension and the value of each rule in a dimension has an upper limit and a lower limit; allocating to each dimension an index table of size 2⁸, each dimension covering the range 0-255, i.e., the search space of each dimension is 2⁸;

allocating an equivalence class table for each dimension, which stores the unique BVs corresponding to the values in the index table together with their BV identifiers; wherein, when generating the bit vector BV for each value in a dimension's index table, the value is compared only with the start value of each rule in that dimension, denoted Start, and with the smallest integer greater than the rule's end value, denoted Min_end, and an identifier flag indicates whether the BV needs to be recorded; if the current value equals Start, the bit corresponding to the rule is set to 1 and flag is set to 1, indicating that the current BV is new; if the current value equals Min_end, the bit corresponding to the rule is set to 0 and flag is left unchanged;

after all rules have been traversed for the current value, if flag is 1, the corresponding BV is added to the equivalence class table and flag is reset to 0.

5. The method for classifying network packets according to claim 4, wherein the step 5) specifically comprises:

splitting the quintuple by taking bytes as units;

performing bitwise AND operation on BVs obtained in all dimensions to obtain a final BV; wherein the bit set to 1 indicates the rule that this packet will eventually satisfy; if there is more than one bit set to 1 in the final BV, indicating that multiple rules are satisfied, the first bit set to 1 is selected as the final result output, considering that the rule sets are usually arranged in order of priority.