WO2016201930A1 - Flow classification method and device, and storage medium - Google Patents

Flow classification method and device, and storage medium

Info

Publication number
WO2016201930A1
WO2016201930A1 PCT/CN2015/097442 CN2015097442W WO2016201930A1 WO 2016201930 A1 WO2016201930 A1 WO 2016201930A1 CN 2015097442 W CN2015097442 W CN 2015097442W WO 2016201930 A1 WO2016201930 A1 WO 2016201930A1
Authority
WO
WIPO (PCT)
Prior art keywords
rule
rules
feature quantity
subset
hash
Prior art date
Application number
PCT/CN2015/097442
Other languages
English (en)
French (fr)
Inventor
程晨
Original Assignee
深圳市中兴微电子技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市中兴微电子技术有限公司 filed Critical 深圳市中兴微电子技术有限公司
Priority to US15/568,857 priority Critical patent/US20180107759A1/en
Priority to EP15895495.8A priority patent/EP3276501B1/en
Publication of WO2016201930A1 publication Critical patent/WO2016201930A1/zh


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9014Indexing; Data structures therefor; Storage structures hash tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/13File access structures, e.g. distributed indices
    • G06F16/137Hash-based
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2255Hash tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24532Query optimisation of parallel queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/285Clustering or classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43Querying
    • G06F16/435Filtering based on additional data, e.g. user or group profiles
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/24Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]

Definitions

  • The invention relates to network switching technologies, and in particular to a flow classification method and device based on parallel hash lookup, and a storage medium.
  • the interface rate of core routers used for backbone network interconnection has reached 400 Gbps, which requires the core router's Access Control List (ACL) lookup module to achieve hundreds of millions or even billions of lookups per second while supporting large-capacity rule sets.
  • ACL search involves multiple matching methods in multiple domains, including exact matching, prefix matching, mask matching, and range matching.
  • the data structure is complex.
  • these structures are difficult to implement on hardware and therefore cannot meet the high-speed lookup requirements.
  • hardware search methods based on Ternary Content Addressable Memory (TCAM) are popular.
  • TCAM Ternary Content Addressable Memory
  • TCAM is a widely used technology in the current search module.
  • the search process is simple. However, to meet the 400Gbps search rate requirement, multiple TCAM scheduling searches are required. Power consumption and heat generation will become serious problems.
  • with the development of Internet Protocol version 6 (IPv6), the width of rules has increased greatly compared with IPv4, so the cost and power consumption of TCAM-based lookup rise further.
  • a commonly used IPv6 key is 640 bits long, and a traditional TCAM must issue two consecutive lookup instructions to complete a single 640-bit key lookup, so lookup performance also declines.
  • algorithmic approaches are dominated by search-tree structures; a typical representative is the multi-level search tree (Trie tree) partitioned by bits.
  • there are also the HiCut and HyperCut algorithms, which partition according to the characteristics of the rule set, and a class of RFC-like methods that partition by the fields of the rule entries.
  • Their common feature is to group rule sets into different types to achieve fast search.
  • these algorithms have a common limitation, that is, some rules need to be copied to meet the needs of algorithm completeness.
  • the rule set size is small (generally below 1k), there are not many rules that need to be copied, and the impact on the storage space is not large; but when the rule set size continues to expand, the impact of replication cannot be ignored.
  • the space required to store the set of hash keys corresponding to the rules of each rule subset grows geometrically.
  • in practical applications, especially in core routers, rule sets contain tens of thousands or even hundreds of thousands of rules, so the memory overhead of the above algorithms is huge and they are difficult to implement in practice.
  • an embodiment of the present invention provides a flow classification method and device, and a storage medium.
  • the rules in the rule set are composed of one or more feature quantities, and each feature quantity is represented by a corresponding feature quantity value;
  • the plurality of rules in the rule set are classified according to the selected feature quantity, and more than one rule subset is obtained, including:
  • the first feature quantity or the first group of feature quantities supports a hash search mode.
  • the plurality of rules in the rule set are classified according to the selected feature quantity, and more than one rule subset is obtained, and the method further includes:
  • all rules with different feature quantity values of one or more feature quantities are selected and combined into rule subsets until the rule classification in the rule set is completed.
  • the method further includes:
  • the current feature quantity is replaced, and all rules with different feature quantity values are selected and combined into a rule subset.
  • the method further includes:
  • the classification is terminated, and rules other than the subset of rules are combined into a subset of rules.
  • the hashing of each of the classified rule subsets includes:
  • Hash entries are used to store rules for each type of rule subset.
  • the classification unit is configured to classify the plurality of rules in the rule set according to the selected feature quantity to obtain one or more rule subsets;
  • a storage unit configured to hash the classified rule subsets
  • the lookup unit is configured to perform a hash lookup in each parallel hash storage unit according to the hash key value of the rule when performing rule lookup.
  • the rules in the rule set are composed of one or more feature quantities.
  • Each feature quantity is represented by a corresponding feature quantity value;
  • the classifying unit is further configured to select, for the first feature quantity or the first group of feature quantity in the rule, all the rules with different feature quantity values, and combine into a first type rule subset;
  • the first feature quantity or the first group of feature quantity supports a hash search mode.
  • the classification unit is further configured to select, among the rules of the rule set outside the first rule subset, all rules whose values of the second feature quantity or second group of feature quantities differ, and combine them into a second rule subset; and, among the rules outside all rule subsets, to select all rules whose values of one or more feature quantities differ and combine them into further rule subsets, until the rule classification in the rule set is completed.
  • the classification unit is further configured to, when the number of rules in a classified rule subset is less than or equal to the first threshold, replace the current feature quantity, select all rules with differing feature-quantity values, and combine them into a rule subset.
  • the classification unit is further configured to terminate the classification when the number of rules outside all rule subsets in the rule set is less than or equal to the second threshold, and to combine the rules outside all rule subsets into one rule subset.
  • the storage unit is further configured to set the hash key corresponding to each rule in each rule subset as follows: the feature quantities used for the classification are set to their corresponding values, and the unused feature quantities are set to 0; hash entries are used to store the rules of each rule subset.
  • the storage unit is composed of a certain number of parallel hash storage units to form a storage array, and the problem of excessive hash conflict and classification rules is solved by providing high parallelism.
  • a storage medium having stored therein a computer program configured to perform the stream classification method.
  • a series of classification principles is formulated according to the rule set of the actual application; all rules are classified uniformly, and a large-scale rule set is divided into many small-scale rule subsets.
  • the composition of these subsets of rules is determined by the characteristics of the macro rules, and does not depend entirely on the pre-judgment and pre-definition of the rule types. Therefore, the classification method is flexible.
  • the categorized rule subsets are hashed, and the rule search is performed according to the hash key value of the rule in each parallel hash unit.
  • the technical solution of the embodiments of the present invention does not depend on a TCAM implementation; it meets high-performance table-lookup requirements while effectively reducing memory requirements, is suitable for hardware implementation, and satisfies the needs of large-capacity, high-speed packet classification lookup.
  • FIG. 1 is a schematic flowchart diagram of a flow classification method according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of classifying by a single feature quantity according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of classifying by using a plurality of feature quantities according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of classifying a plurality of feature quantities including a mask feature quantity according to an embodiment of the present invention
  • FIG. 5 is a schematic block diagram of a hash algorithm unit according to an embodiment of the present invention.
  • FIG. 6 is a block diagram showing a flow classification search by a parallel hash group according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a stream classification apparatus according to an embodiment of the present invention.
  • FIG. 1 is a schematic flowchart of a flow classification method according to an embodiment of the present invention. As shown in FIG. 1 , the method includes the following steps:
  • Step 101 classify multiple rules in the rule set according to the selected feature quantity to obtain one or more rule subsets.
  • the current feature quantity is replaced, and all rules with different feature quantity values are selected and combined into a rule subset.
  • the classification is terminated, and the rules except the rule subsets of the rule set are combined into one type of rule subset.
  • a series of classification principles are formulated according to a rule set in an actual application, and all rules are uniformly classified, and a large-scale rule set is divided into a plurality of small-scale rule subsets.
  • the composition of these subsets of rules is determined by the characteristics of the macro rules, and does not depend entirely on the pre-judgment and pre-definition of the rule types, but is flexible.
  • Each rule in the actual application is composed of the feature quantities of multiple data streams. In different applications, it has the same or similar composition principle.
  • the general rules consist of a quintuple (ie source IP, destination IP, source port number, destination port number, protocol number) and other feature quantities.
  • a quintuple ie source IP, destination IP, source port number, destination port number, protocol number
  • each rule must have different feature quantity values that are different from other rules, and the rules are classified according to the feature quantity values of each rule.
  • Step 1 First, examine a certain feature quantity, and take a rule with a unique feature quantity value from the rule set to form a rule subset; in particular, when a certain feature quantity value exists in multiple rules, Take a rule and leave other rules.
  • the feature quantity may be skipped and other feature quantities may be selected for classification.
  • Step 2 If rules in the rule set remain unclassified after step 1, the remaining rules are examined according to the other feature quantities in turn, and the rules with unique feature-quantity values are taken out to form further rule subsets; if one of these examinations classifies all rules of the rule set, the classification stops.
  • Step 3 If rules in the rule set remain unclassified after step 2, steps 1 and 2 are repeated and the classification is examined again to obtain further rule subsets; if one of these examinations classifies all rules of the rule set, the classification stops.
  • Step 4 If rules in the rule set remain unclassified after step 3, unique feature-quantity values of combinations of any two or more groups of feature quantities (excluding range matching) are examined and the corresponding rules are taken out; if one of these examinations classifies all rules of the rule set, the classification stops.
  • when multiple feature quantities are combined, those whose matching type is an exact match or a prefix match are preferred for the combination; in the embodiment of the present invention, a reasonable mask is designed for any mask-matched feature quantity, and the classification is then performed.
  • designing a reasonable mask means that the bits of the classified feature quantity that can be "don't care" are masked, and are not considered as factors in the hash calculation; that is, the feature quantity bits that are not used for classification are set to zero.
  • the reasonable mask uses Xzzzz to mark each feature quantity in the rule, where 'X' indicates that each bit of the corresponding feature quantity is '1', ' z' indicates that each bit of the corresponding feature quantity is '0'.
  • Step 5 After the above steps have been iterated a finite number of times, all the rules are classified. In some cases the number of rules classified in the last few iterations is likely to be too small, which increases storage overhead, so when the number of remaining rules falls below a certain level the classification can be terminated and the remaining rules combined into one rule subset.
  • the remaining rules then need to have their range-matching feature-quantity values expanded into prefixes, turning one range-matching rule into several prefix-matching rules.
  • a large-scale rule set can be classified into a finite number of rule subsets, and each rule subset contains a certain number of rules.
  • Step 102 Hash the classified subset of rules.
  • the hash key corresponding to each rule in each rule subset is set as follows: the feature quantities used for the classification are set to their corresponding values, and the feature quantities not used for the classification are set to 0;
  • hash entries store the rules of each rule subset.
  • the hash key value corresponding to each rule subset is: only the combination of the feature quantity or the feature quantity used when classifying the original rule entries is retained, and the remaining unused feature quantities are replaced by '0'.
  • Each hash entry holds a rule and its corresponding result and priority information.
  • a subset of rules can occupy multiple hash storage units, and a hash storage unit can only be assigned to a subset of rules with a corresponding mask configuration.
  • Step 103 When performing rule lookup, hash search is performed in each parallel hash storage unit according to the hash key value of the rule.
  • the hash key of the rule is looked up in every parallel hash unit; in each hash storage unit, the initial key is first converted, using the mask configured for that unit, into the key appropriate to that unit, and the unit's hash computation is then performed; once the address is obtained, the data at that address is read and compared against the rule for a match. If the match succeeds, the rule entry is hit; the parallel hash group arbitrates once among all matching rules and, according to the flow-classification principle, selects the highest-priority target rule as the returned result.
  • CACHE cache
  • the large-scale rule set is classified into a plurality of small-scale rule subsets, and the rule subsets can be hash-searched, so that the parallel hash group is used to implement storage and searching of the large-scale rule set.
  • FIG. 2 is a schematic diagram of classification using a single feature quantity according to an embodiment of the present invention. It is assumed that rule set G contains seven rules {R1, R2, R3, R4, R5, R6, R7} composed of five independent feature quantities, all of which are assumed to be exact-match fields; the feature quantities are represented by letters, as shown by 201 in FIG. 2.
  • following classification step 1 of the embodiment of the present invention, one feature quantity of each rule is examined, here the first. It can be found that rules whose first feature quantity is 'A' are {R1, R2, R3, R4}, those with 'Q' are {R5, R6} and the one with 'P' is {R7}; R1 is taken from the group with value 'A', R5 from the group with value 'Q', and R7, the rule with value 'P', is also taken out; {R1, R5, R7} form the first rule subset 202, and the remaining rule set 203 contains the four rules {R2, R3, R4, R6}.
  • another feature quantity is then used to examine the remaining rules, here the second. Rules with values 'H' and 'B' number two each, so R2 with value 'H' and R3 with value 'B' are taken out to form the second rule subset 204; the remaining rules are classified by the third feature quantity, and R4 and R6 are found to have the distinct values 'C' and 'N', so the remaining rule set can be separated directly by the third feature quantity, forming the third rule subset 205.
  • the three rule subsets obtained by the above classification are respectively stored in a hash table.
  • the first rule subset is marked with a mask Xzzzz, where 'X' indicates that each bit of the corresponding feature quantity is '1', and 'z' indicates that each bit of the corresponding feature quantity is '0'. Therefore, when the hash table is inserted, the hash keys of R1, R5, and R7 are A0000, Q0000, and P0000, respectively. They can be hashed into the hash table.
  • Each rule calculates a different storage address based on the respective hash key value, and then stores each rule in the corresponding address.
  • if a hash collision occurs while hashing a rule subset, the colliding key is stored in a different hash table; for example, if both A0000 and Q0000 map to location A of one hash table, there is a collision, and Q0000 or A0000 is moved to another hash table to avoid it.
  • Parallel hashing is used to eliminate the effect of hash collisions on hash padding. Since a sufficient number of small hash tables are used as storage and lookup means in implementation, the number of hash parallelisms can be considered sufficient to eliminate the existence of hash collisions.
  • the second rule subset is marked with the mask zXzzz, and the hash keys used by R2 and R3 for the hash computation are 0H000 and 0B000 respectively; they are inserted into an unused hash table;
  • the third rule subset is marked with the mask 00X00, and the hash keys of R4 and R6 are 00C00 and 00N00 respectively; they are inserted into an unused hash table.
  • for example, when the first feature quantity is 'A' the rules are {R1, R2, R3, R4}, when it is 'Q' they are {R5, R6}, and when it is 'P' it is {R7}; R2 is taken from the group with value 'A', R6 from the group with value 'Q', and R7 with value 'P' is taken out, so {R2, R6, R7} form the first rule subset.
  • the remaining rule set contains the four rules {R1, R3, R4, R5}.
  • the second feature quantity is 'B' for {R1, R3, R4, R5}, so it cannot separate them, and the third feature quantity is selected.
  • the third feature quantity is 'C' for {R1, R3, R4, R5}, so it cannot separate them either, and the fourth feature quantity is selected.
  • the fourth feature quantity is 'D' for {R1, R3} and 'K' for {R4, R5}; R1 is taken from the group with value 'D' and R4 from the group with value 'K', so {R1, R4} form the second rule subset; the remaining rule set contains {R3, R5}, which automatically form a further rule subset.
  • one of the feature quantities is sequentially examined in the order of the feature quantity to divide the rule subset, that is, the first, second, and third feature quantities are sequentially examined.
  • one of the feature quantities may be selected in other orders to divide the rule subset.
  • FIG. 3 is a schematic diagram of classification by using multiple feature quantities according to an embodiment of the present invention.
  • FIG. 3 follows the rule of FIG. 2 as an example 301, with the difference that classification is performed by using a combination of first and second feature quantities.
  • there are three rules with the combined feature-quantity value 'AB', namely {R1, R3, R4};
  • the other rules each have their own unique feature-quantity values 'AH', 'QB', 'QH', 'PB'; therefore {R1, R2, R5, R6, R7} are taken out to form the first rule subset 302; the remaining {R3, R4} can then be distinguished by the fourth feature quantity and thus form the second rule subset.
  • the first rule subset is inserted into the hash table, and its mask is configured as XXzzz, and their hash keys are 'AB000', 'AH000', 'QB000', ' QH000', 'PB000'; similarly, R3 and R4 are stored in the hash table according to the mask of the fourth feature quantity.
  • the rule subsets may also be classified by other combinations of feature quantities, for example by using the combination of the second, third and fourth feature quantities to obtain the first rule subset {R1, R2, R4, R6}.
  • the remaining rule set is {R3, R5, R7}, which is classified by the first feature quantity, and {R3, R5, R7} constitutes the second rule subset.
  • in the above examples the feature quantities are all assumed to be exact matches.
  • prefix and mask fields can be classified with similar means, but the mask has to be chosen within the bit range of each feature quantity; this choice can be made with a heuristic algorithm.
  • the first feature quantity and the third feature quantity in the rule set 401 are a prefix field and a mask field respectively; for simplicity they are represented by 8-bit values, where * means the corresponding bit of the rule may be 0 or 1.
  • observing the prefixes of the first feature quantity and grouping by the shortest prefix length (the first 3 bits), the rules with '100' are {R1, R2, R3}, the feature-quantity value of R4 is '011' and the feature-quantity value of R5 is '101'; at the same time, because the feature quantity used for discrimination is only 3 bits and may not be enough to distinguish a larger number of rules, it is better to superimpose another feature quantity for classification, so the classification in the figure also considers the value of the mask field of the third feature quantity.
  • examining the values of the third, mask field further, the mask positions of R1 and R2 are the same and those of R3, R4 and R5 are the same; therefore, to make maximum use of the known data, {R3, R4, R5} are taken out to form the first rule subset 402 and {R1, R2} are treated as the remaining rule set. It is easy to see that the remaining rule set can be classified by the combination of the second and third feature quantities, forming the second rule subset 403.
  • the first rule subset is classified according to the combination of the first feature quantity and the third feature quantity. Therefore, the mask corresponding to the first rule subset is configured to be (11100000) z (11101000) zz.
  • the second rule subset is classified according to a combination of the second feature amount and the third feature amount, so the mask of the second rule subset is configured as zX(11011100) zz.
  • FIG. 5 is a block diagram of a hash algorithm unit according to an embodiment of the present invention.
  • FIG. 6 is a block diagram of a parallel hash group for performing stream classification search according to an embodiment of the present invention.
  • when a rule lookup key (Key) 502 is input to the hash computation unit, the key is first ANDed with the feature mask (Mask config) 503 stored in the unit, converting the lookup key into the hash key (Hash Key) 504 of this hash table;
  • the hash key 504 is then used for the hash computation to obtain the address in the corresponding hash table; the obtained address is used to read the hash table (Hash Table) 505 and obtain the stored rule and result; the key 502 and the data read out are input to the comparison module (Key Compare) 506, and if a previous-stage result (Previous Hash Result) 507 exists, the priorities of the previous-stage result and of this stage's result are compared at the same time, the higher-priority result being taken as the output of this stage; finally the comparison result is output by 506.
  • the hash lookup group consists of the hash lookup units described in FIG. 6 arranged as an M*N lookup array.
  • the overall capacity of the M*N array is sufficient for the storage space required by a large-capacity rule set, and the M*N degrees of parallelism are sufficient for the hash-collision requirements of the rule subsets.
  • M depends on the design scale of the MUX
  • N depends on the delay requirement
  • the number of M*N is determined by the total required capacity and the capacity of each hash unit.
  • for example, if the lookup key is 11101 and the feature mask is 01100, ANDing 11101 with 01100 gives the hash key 01100; hashing the key 01100 yields the corresponding hash-table address, address A, and the rule R1 stored at address A is found in the hash table. If a higher-priority result exists that indicates another rule, say R2, as the output, the final result is rule R2.
  • each row of hash cells is serially pipelined.
  • Each hash unit needs to compare the output of the previous stage with the search result of the current level in addition to the hash key value comparison in this unit. For comparison, the result with high priority is taken as the output result of this level.
  • Each column of hash cells needs to perform parallel lookup at the same time.
  • the hash result of each column is aggregated to the result aggregation module (MUX) 603 after the result of the last column hash table is calculated; the MUX compares all the results with priority, and outputs the priority. The highest result is the final search result.
  • MUX result aggregation module
  • CACHE completeness guarantee module
  • the purpose of CACHE is that large-scale rule sets inevitably have entries with particularly complex forms of individual entries, such as the number of significant digits of the entry being particularly small, the location of the mask being different from most mask positions, etc. These entries may not be suitable for the above-mentioned Store in hash mode, which can be stored with a small portion of CACHE.
  • the CACHE logic is implemented by comparing all the entries one by one, and can be implemented as a serial pipeline of the hash lookup group, and each level is compared in parallel with a few parallel degrees.
  • the key values are also input into the CACHE module for comparison.
  • the final result is also aggregated into the MUX module, and the results of the parallel hash search group search are aggregated, and the final result is obtained by the MUX module.
  • FIG. 7 is a schematic structural diagram of a flow classification device according to an embodiment of the present invention.
  • the device includes:
  • the classifying unit 71 is configured to classify the plurality of rules in the rule set according to the selected feature quantity to obtain one or more rule subsets;
  • the storage unit 72 is configured to perform hash storage on each of the classified rule subsets
  • the searching unit 73 is configured to perform a hash search in each of the parallel hash storage units 72 according to the hash key value of the rule when the rule search is performed.
  • the rules in the rule set are composed of one or more feature quantities, and each feature quantity is represented by a corresponding feature quantity value;
  • the classifying unit 71 is further configured to select, for the first feature quantity or the first group of feature quantity in the rule, all the rules with different feature quantity values, and combine into a first type rule subset;
  • the first feature quantity or the first group of feature quantities supports a hash search mode.
  • the classifying unit 71 is further configured to select, among the rules of the rule set outside the first rule subset, all rules whose values of the second feature quantity or second group of feature quantities differ, and combine them into a second rule subset; and, among the rules outside all rule subsets, to select all rules whose values of one or more feature quantities differ and combine them into further rule subsets, until the rule classification of the rule set is completed.
  • the classification unit 71 is further configured to, when the number of rules in a classified rule subset is less than or equal to the first threshold, replace the current feature quantity, select all rules with differing feature-quantity values, and combine them into a rule subset.
  • the classification unit 71 is further configured to terminate the classification when the number of rules outside all rule subsets in the rule set is less than or equal to the second threshold, and to combine the rules outside all rule subsets into one rule subset.
  • the storage unit is further configured to set the hash key corresponding to each rule in each rule subset as follows: the feature quantities used for the classification are set to their corresponding values, and the unused feature quantities are set to 0; hash entries are used to store the rules of each rule subset.
  • the implementation functions of the units in the stream classification apparatus shown in FIG. 7 can be understood by referring to the related description of the foregoing stream classification method.
  • the functions of the units in the stream classifying device shown in FIG. 7 can be realized by a program running on a processor, or can be realized by a specific logic circuit.
  • the embodiment of the invention further describes a storage medium in which a computer program is stored, the computer program being configured to execute the stream classification method of the foregoing embodiments.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the units is only a logical functional division.
  • in actual implementation there may be other division manners, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the coupling, or direct coupling, or communication connection of the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or other forms. of.
  • the units described above as separate components may or may not be physically separated.
  • the components displayed as the unit may be, or may not be, physical units, that is, may be located in one place, or may be distributed to multiple network units; some or all of the units may be selected according to actual needs to implement the solution of the embodiment. purpose.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated into one unit;
  • the unit can be implemented in the form of hardware or in the form of hardware plus software functional units.
  • the foregoing program may be stored in a computer readable storage medium; when executed, the program performs the steps of the foregoing method embodiments.
  • the foregoing storage medium includes various media that can store program code, such as a removable storage device, a read only memory (ROM), a magnetic disk, or an optical disk.
  • ROM read only memory
  • the above-described integrated unit of the present invention may be stored in a computer readable storage medium if it is implemented in the form of a software function module and sold or used as a standalone product.
  • the technical solution of the embodiments of the present invention may be embodied in the form of a software product in essence or in the form of a software product stored in a storage medium, including a plurality of instructions.
  • a computer device (which may be a personal computer, server, or network device, etc.) is caused to perform all or part of the methods described in various embodiments of the present invention.
  • the foregoing storage medium includes: a removable storage device, a read only memory (ROM), a magnetic disk, or an optical disk, and the like, which can store program codes.
  • the invention formulates a series of classification principles according to the rule set in practical application, classifies all the rules uniformly, and divides the large-scale rule set into a plurality of small-scale rule subsets.
  • the composition of these subsets of rules is determined by the characteristics of the macro rules, and does not depend entirely on the pre-judgment and pre-definition of the rule types. Therefore, the classification method is flexible.
  • the categorized rule subsets are hashed, and the rule search is performed according to the hash key value of the rule in each parallel hash unit.
  • the technical solution of the embodiments of the present invention does not depend on a TCAM implementation; it meets high-performance table-lookup requirements while effectively reducing memory requirements, is suitable for hardware implementation, and satisfies the needs of large-capacity, high-speed packet classification lookup.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A flow classification method and device, and a storage medium. The method includes: classifying multiple rules in a rule set according to selected feature quantities to obtain one or more rule subsets (101); storing each classified rule subset in hash form (102); and, when a rule lookup is performed, performing a hash lookup in each parallel hash storage unit according to the hash key of the rule (103).

Description

Flow classification method and device, and storage medium

Technical Field
The present invention relates to network switching technologies, and in particular to a flow classification method and device based on parallel hash lookup, and a storage medium.
背景技术
随着因特网(Internet)的迅猛发展,用于主干网络互联的核心路由器的接口速率达到400Gbps,该速率要求核心路由器的访问控制列表(ACL,Access Control List)查找模块在支持大容量规则集的情况下查找速率达到每秒几亿乃至几十亿次。
ACL查找涉及多域的多种匹配方式,包括精确匹配、前缀匹配、掩码匹配和范围匹配,数据结构复杂,目前已经有较多成熟的数据结构及算法,包括树形查找、启发式算法等,但这些结构难以在硬件上实现,因此无法满足高速查找需求。目前以三态内容寻址存储器(TCAM,Ternary Content Addressable Memory)为基础的硬件查找方法较为流行。
TCAM是目前查找模块中利用较为广泛的技术,查找过程简单,但要满足400Gbps的查找速率要求需要多片TCAM调度查找,功耗与发热量将成为严重问题,另外由于互联网协议6(IPv6,Internet Protocol Version 6)的发展,规则的宽度相比IPv4大大增加使得基于TCAM的查找成本与功耗进一步提升,常用的IPv6的键值长度达到640bit,对于传统的TCAM需要发起2次连续查找指令才能进行一次640bit键值的查找,查找性能也随之下降。
目前以查找树型结构为主,典型代表有按比特区分的多层查找树(Trie树),有按规则集特点区分的Hicut算法、Hypercut算法等,也有按规则条 目中的域区分的一系列以编号排定的文件(RFC,Request For Comments)类等,他们共同的特点是将规则集按不同类型进行分组,以达到快速查找的目的。但是这些算法有个共同的限制,就是需要对一部分规则进行复制以满足算法完备性需要。当规则集规模较小时(一般在1k以下),这些需要复制的规则并不多,对存储空间的影响并不大;但是当规则集规模继续扩大时,复制所带来的影响就不可忽略,存放每类规则子集中各条规则对应的哈希键值集所需空间成几何级数式增长。在实际应用中,特别是核心路由器的应用中,规则集的规模是数万乃至数十万的规模,而上述算法对内存的需求开销是巨大的,在实际应用中较难实现。
发明内容
为解决上述技术问题,本发明实施例提供了一种流分类方法及装置、存储介质。
本发明实施例提供的流分类方法包括:
对规则集中的多条规则按照选取的特征量进行分类,得到一个以上规则子集;
将分类后的各个规则子集进行哈希存储;
当进行规则查找时,依据所述规则的哈希键值在各个并行哈希存储单元中进行哈希查找。
本发明实施例中,所述规则集中的规则由一个或一个以上特征量组成,每个特征量由对应的特征量数值表示;
所述对规则集中的多条规则按照选取的特征量进行分类,得到一个以上规则子集,包括:
针对规则中的第一特征量或第一组特征量,选取特征量数值不同的所有规则,组合成第一类规则子集;
其中,所述第一特征量或第一组特征量支持哈希查找方式。
本发明实施例中,所述对规则集中的多条规则按照选取的特征量进行分类,得到一个以上规则子集,还包括:
针对规则集中除第一类规则子集以外的规则,选取第二特征量或第二组特征量的特征量数值不同的所有规则,组合成第二类规则子集;
针对规则集中除所有规则子集以外的规则,选取一个或多个特征量的特征量数值不同的所有规则,组合成规则子集,直至规则集中的规则分类完成为止。
本发明实施例中,所述方法还包括:
当分类出的规则子集中的规则数目小于或等于第一阈值时,更换当前的特征量,选取特征量数值不同的所有规则,组合成规则子集。
本发明实施例中,所述方法还包括:
当规则集中除所有规则子集以外的规则的数量小于或等于第二阈值时,终止分类,将规则集中除所有规则子集以外的规则组合成一类规则子集。
本发明实施例中,所述将分类后的各个规则子集进行哈希存储,包括:
设置每类规则子集中各条规则对应的哈希键值为:将分类所用的特征量设置为对应的特征量数值;将分类未使用的特征量设置为0;
利用哈希条目存放每类规则子集中各条规则。
本发明实施例提供的流分类装置包括:
分类单元,配置为对规则集中的多条规则按照选取的特征量进行分类,得到一个以上规则子集;
存储单元,配置为将分类后的各个规则子集进行哈希存储;
查找单元,配置为当进行规则查找时,依据所述规则的哈希键值在各个并行哈希存储单元中进行哈希查找。
本发明实施例中,所述规则集中的规则由一个或一个以上特征量组成, 每个特征量由对应的特征量数值表示;
所述分类单元,还配置为针对规则中的第一特征量或第一组特征量,选取特征量数值不同的所有规则,组合成第一类规则子集;
其中,所述第一特征量或第一组特征量支持哈希查找方式。
本发明实施例中,所述分类单元,还配置为针对规则集中除第一类规则子集以外的规则,选取第二特征量或第二组特征量的特征量数值不同的所有规则,组合成第二类规则子集;针对规则集中除所有规则子集以外的规则,选取一个或多个特征量的特征量数值不同的所有规则,组合成规则子集,直至规则集中的规则分类完成为止。
本发明实施例中,所述分类单元,还配置为当分类出的规则子集中的规则数目小于或等于第一阈值时,更换当前的特征量,选取特征量数值不同的所有规则,组合成规则子集。
本发明实施例中,所述分类单元,还配置为当规则集中除所有规则子集以外的规则的数量小于或等于第二阈值时,终止分类,将规则集中除所有规则子集以外的规则组合成一类规则子集。
本发明实施例中,所述存储单元,还配置为设置每类规则子集中各条规则对应的哈希键值为:将分类所用的特征量设置为对应的特征量数值;将分类未使用的特征量设置为0;利用哈希条目存放每类规则子集中各条规则。
本发明实施例中,所述的存储单元,由一定数量的并行哈希存储单元组成存储阵列,通过提供高并行度的方式来解决哈希冲突和分类规则过多的问题。
一种存储介质,所述存储介质中存储有计算机程序,所述计算机程序配置为执行所述的流分类方法。
本发明实施例的技术方案中,根据实际应用中的规则集制定一系列分 类原则,将所有规则进行统一的分类,将大规模的规则集分成很多个小规模的规则子集。这些规则子集的组成由宏观的规则的特点决定,并不完全依赖于对规则类型的预判和预定义,因此,分类方式是灵活多变的。将分类后的各规则子集进行哈希存储,规则查找时,依据规则的哈希键值在各个并行哈希单元中进行哈希查找;本发明实施例的技术方案不依赖TCAM实现,在保证高性能查表需求的同时,有效降低内存需求,适合硬件实现,满足大容量、高速的包分类查找需求。
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a flow classification method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of classification using a single feature quantity according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of classification using multiple feature quantities according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of classification using multiple feature quantities including a mask feature quantity according to an embodiment of the present invention;
FIG. 5 is a schematic block diagram of a hash algorithm unit according to an embodiment of the present invention;
FIG. 6 is a schematic block diagram of flow classification lookup implemented by a parallel hash group according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a flow classification device according to an embodiment of the present invention.
Detailed Description
To understand the features and technical content of the embodiments of the present invention in more detail, the implementation of the embodiments is described below with reference to the accompanying drawings, which are provided for reference and illustration only and are not intended to limit the embodiments of the present invention.
FIG. 1 is a schematic flowchart of a flow classification method according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
Step 101: classify multiple rules in a rule set according to selected feature quantities to obtain one or more rule subsets.
In this embodiment, first, for a first feature quantity or a first group of feature quantities of the rules, all rules with mutually different feature-quantity values are selected and combined into a first rule subset, where the first feature quantity or first group of feature quantities supports hash lookup.
Next, among the rules of the rule set outside the first rule subset, all rules whose values of a second feature quantity or second group of feature quantities differ are selected and combined into a second rule subset.
Then, among the rules outside all existing rule subsets, all rules whose values of one or more feature quantities differ are selected and combined into further rule subsets, until the classification of the rule set is complete.
In this embodiment, when the number of rules in a classified rule subset is less than or equal to a first threshold, the current feature quantity is replaced, and all rules with differing values of the new feature quantity are selected and combined into a rule subset.
In this embodiment, when the number of rules outside all rule subsets is less than or equal to a second threshold, classification is terminated and the rules outside all rule subsets are combined into one rule subset.
Specifically, a series of classification principles is formulated according to the rule set of the actual application; all rules are classified uniformly, and a large-scale rule set is divided into many small-scale rule subsets. The composition of these rule subsets is determined by the macroscopic characteristics of the rules and does not depend entirely on pre-judging or pre-defining rule types, so the classification is flexible.
Each rule in a practical application is composed of feature quantities of the data flow, and different applications follow the same or similar composition principles. A typical rule consists of a five-tuple (source IP, destination IP, source port, destination port, protocol number) and some other feature quantities. In a large rule set, every rule has at least one feature-quantity value that distinguishes it from the others, and the rules are classified according to these values.
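As a concrete illustration of this rule model, the sketch below represents a rule as a tuple of named feature quantities; the field names, the result field and the sample values are hypothetical and only mirror the five-tuple description above.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    """One classification rule: a five-tuple plus an action/result field.
    All field names and the sample values below are illustrative assumptions."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int
    result: str = "permit"

# A hypothetical rule whose feature-quantity values distinguish it from others.
r1 = Rule("192.0.2.1", "198.51.100.7", 1024, 80, 6)
feature_values = r1[:5]   # the feature quantities examined during classification
```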
Step 1: examine one feature quantity and take, from the rule set, the rules whose value of that feature quantity is unique, forming a rule subset; in particular, when a value occurs in several rules, take any one of them and leave the others.
In this embodiment, if a feature quantity is not suitable for hash lookup, for example one using range matching or mask matching, it may be skipped and another feature quantity selected for classification.
Step 2: if rules remain unclassified after step 1, examine the remaining rules by the other feature quantities in turn and take out the rules with unique values to form further rule subsets; if one of these examinations classifies all remaining rules, stop.
Step 3: if rules remain unclassified after step 2, repeat steps 1 and 2 and examine again to obtain further rule subsets; if one of these examinations classifies all remaining rules, stop.
In particular, if the rule subset produced by some step is too small, that classification should be abandoned and another feature quantity chosen for examination.
Step 4: if rules remain unclassified after step 3, examine unique values of combinations of any two or more groups of feature quantities (excluding range matching) and take out the corresponding rules; if one of these examinations classifies all remaining rules, stop.
In particular, when several feature quantities are combined, those whose matching type is exact match or prefix match are preferred. In this embodiment, a reasonable mask is designed for any mask-matched feature quantity before classification. Designing a reasonable mask means masking off the "don't care" bits of the classifying feature quantities so that they are not considered in the hash computation; that is, the feature-quantity bits not used for classification are set to 0. For example, when the feature quantity used for classification is the first feature quantity of the rule, the mask Xzzzz is used to mark the feature quantities of the rule, where 'X' means every bit of the corresponding feature quantity is '1' and 'z' means every bit is '0'.
Step 5: after the above steps have been iterated a finite number of times, all rules are classified. In some cases the number of rules classified in the last few iterations may be too small, which increases storage overhead; therefore, when the number of remaining rules falls below a certain level, classification can be terminated and the remaining rules combined into one rule subset.
In this embodiment, if most of the remaining rules have unique values only in a range-matched feature quantity, the range values must be expanded into prefixes, turning one range-matching rule into several prefix-matching rules.
The principles of range expansion are as follows:
First, try to expand one range field into prefixes and classify the rules according to steps 1 to 4. If the number of rules separated out is less than or equal to the first threshold, expand the other range fields into prefixes one by one, and use as the expansion field the range field that separates out the most rules and exceeds the first threshold.
If no classification exceeding the first threshold can be obtained, consider expanding two range fields into prefixes and classify according to steps 1 to 4.
Prefix expansion of more than two range fields at the same time should be avoided, because it would generate too many expanded entries and consume a great deal of storage.
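Range-to-prefix expansion itself is a standard technique; the minimal sketch below splits an inclusive integer range into the smallest set of prefix blocks that exactly covers it. The field width and the example ranges are illustrative assumptions, not values taken from the patent.

```python
def range_to_prefixes(lo: int, hi: int, width: int):
    """Split the inclusive range [lo, hi] of a width-bit field into prefixes.

    Each prefix is returned as (start_value, prefix_len), i.e. the block of
    2**(width - prefix_len) consecutive values beginning at start_value.
    """
    prefixes = []
    while lo <= hi:
        # Largest aligned block starting at lo, limited by the alignment of lo
        # and by the number of values remaining in the range.
        size = (lo & -lo) if lo else (1 << width)
        while size > hi - lo + 1:
            size //= 2
        prefix_len = width - size.bit_length() + 1
        prefixes.append((lo, prefix_len))
        lo += size
    return prefixes

# Example: a hypothetical source-port range 1024-1535 on a 16-bit field
# collapses to one /7 prefix block, while 5-9 on a 4-bit field needs three.
print(range_to_prefixes(1024, 1535, 16))   # [(1024, 7)]
print(range_to_prefixes(5, 9, 4))          # [(5, 4), (6, 3), (8, 3)]
```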
After these five steps, a large-scale rule set can be classified into a finite number of rule subsets, each containing a certain number of rules.
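A minimal software sketch of the greedy classification in steps 1 to 3 is given below, for exact-match fields only. The helper name, the small-subset threshold and the letter-valued example rules are assumptions for illustration (they only loosely follow the FIG. 2 pattern), not the patent's data.

```python
def classify(rules, feature_order, min_subset=2):
    """Steps 1-3: for each examined feature, take one rule per unique value
    of that feature to form a rule subset; leave duplicates for later rounds."""
    remaining = list(rules)
    subsets = []
    for f in feature_order:
        if not remaining:
            break
        seen, subset, rest = set(), [], []
        for rule in remaining:
            if rule[f] not in seen:      # first rule seen with this value: take it
                seen.add(rule[f])
                subset.append(rule)
            else:                        # value already used: leave the rule behind
                rest.append(rule)
        if len(subset) < min_subset:     # subset too small: abandon this round
            continue
        subsets.append((f, subset))
        remaining = rest
    return subsets, remaining            # leftovers form a final subset (or CACHE)

# Hypothetical letter-valued rules, loosely following the FIG. 2 pattern.
G = [("A", "H", "M"), ("A", "B", "C"), ("A", "B", "N"),
     ("Q", "B", "C"), ("Q", "H", "C"), ("P", "B", "C")]
subsets, leftover = classify(G, feature_order=[0, 1, 2])
```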
Step 102: store each classified rule subset in hash form.
In this embodiment, the hash key corresponding to each rule in each rule subset is set as follows: the feature quantities used for the classification are set to their corresponding values, and the feature quantities not used for the classification are set to 0; hash entries are used to store the rules of each rule subset.
Specifically, the hash key corresponding to each rule subset keeps, from the original rule entry, only the feature quantity or combination of feature quantities used to classify it, with the remaining unused feature quantities replaced by '0'; each hash entry stores one rule together with its result and priority information.
Because the number of rule subsets produced by classification is not fixed, the hardware design uses a sufficiently large number of hash storage units, forming a parallel hash group, to store all rule subsets; a rule subset may occupy several hash storage units, while a hash storage unit can be allocated to only one rule subset and carries the corresponding mask configuration.
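The hash-key construction just described can be sketched as follows: keep the features used for classification, zero the rest, and place the rule in a hash storage unit configured with that subset's mask. The table size, the hash function and the letter-valued rules are illustrative assumptions.

```python
def hash_key(rule, used_features):
    """Keep only the classifying feature quantities; replace the rest by '0'."""
    return tuple(v if i in used_features else "0" for i, v in enumerate(rule))

def fill_storage_unit(subset, used_features, buckets=8):
    """One hash storage unit: dedicated to a single rule subset and carrying
    that subset's mask (represented here as the set of used feature indices)."""
    table = [None] * buckets
    for rule in subset:
        key = hash_key(rule, used_features)
        addr = hash(key) % buckets          # stand-in for the hardware hash
        table[addr] = (key, rule)           # collision handling is sketched later
    return {"mask": used_features, "table": table}

# A FIG. 2-style first subset classified by the first feature only (mask Xzzzz);
# the unknown remaining fields are placeholders, and the keys take the
# A0000 / Q0000 / P0000 form.
subset1 = [("A", "x", "x", "x", "x"), ("Q", "x", "x", "x", "x"), ("P", "x", "x", "x", "x")]
print([hash_key(r, {0}) for r in subset1])
```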
Step 103: when a rule lookup is performed, a hash lookup is performed in each parallel hash storage unit according to the hash key of the rule.
Specifically, during a lookup the hash key of the rule is looked up in every parallel hash unit. In each hash storage unit, the initial key is first converted, using the mask configured for that unit, into the key appropriate to the unit; the unit's hash computation is then performed, and once the address is obtained, the data stored at that address is read and compared against the rule for a match. If the match succeeds, the rule entry is hit. The parallel hash group then arbitrates among all matching rules and, following the flow-classification principle, selects the target rule with the highest priority as the returned result.
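A functional sketch of this lookup follows: the search key is masked per unit, hashed, compared, and the highest-priority hit among all units is returned. The entry layout and the convention that a lower number means a higher priority are assumptions.

```python
def unit_lookup(search_key, unit):
    """One hash storage unit: apply the unit's mask, hash, read and compare.
    unit = {"mask": set of used feature indices, "table": list of entries},
    where each entry is (hash_key, rule, priority) or None."""
    masked = tuple(v if i in unit["mask"] else "0" for i, v in enumerate(search_key))
    entry = unit["table"][hash(masked) % len(unit["table"])]
    return entry if entry is not None and entry[0] == masked else None

def parallel_lookup(search_key, units):
    """Look up in every parallel unit, then arbitrate: the highest-priority
    matching rule (smallest priority number here) is the returned result."""
    hits = [h for h in (unit_lookup(search_key, u) for u in units) if h is not None]
    return min(hits, key=lambda h: h[2]) if hits else None
```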
In particular, to guarantee completeness of the lookup and to widen the applicability of the hardware design, a small cache (CACHE) is also needed in hardware to handle the very few rules left over from step 5 described above. Experience shows that, considering overall system complexity and utilization, the best system proportion is reached when the residual rules account for 1%-5% of the total number of rules.
In this embodiment, a large-scale rule set is classified into many small-scale rule subsets that can be looked up by hashing, so a parallel hash group is used to store and look up the large-scale rule set. This supports the storage and lookup of large rule sets, meets high-performance table-lookup requirements while effectively reducing memory requirements, is suitable for hardware implementation, and satisfies the needs of large-capacity, high-speed packet classification.
FIG. 2 is a schematic diagram of classification using a single feature quantity according to an embodiment of the present invention. Suppose rule set G contains seven rules {R1, R2, R3, R4, R5, R6, R7} composed of five independent feature quantities, all of which are assumed to be exact-match fields; the feature quantities are represented by letters, as shown at 201 in FIG. 2.
Following classification step 1 of this embodiment, one feature quantity of each rule is examined, here the first. Rules whose first feature quantity is 'A' are {R1, R2, R3, R4}, those with 'Q' are {R5, R6}, and the one with 'P' is {R7}. R1 is taken from the group with value 'A', R5 from the group with value 'Q', and R7, the rule with value 'P', is taken out as well; {R1, R5, R7} form the first rule subset 202, and the remaining rule set 203 contains the four rules {R2, R3, R4, R6}.
Following classification step 2, another feature quantity is used to examine the remaining rules, here the second. Rules with values 'H' and 'B' number two each, so R2 (value 'H') and R3 (value 'B') are taken out to form the second rule subset 204. The remaining rules are then classified by the third feature quantity; R4 and R6 have the distinct values 'C' and 'N', so the remaining rule set can be separated directly by the third feature quantity, forming the third rule subset 205.
Following the hash storage method of this embodiment, the three rule subsets obtained above are each stored in a hash table. The first rule subset is marked with the mask Xzzzz, where 'X' means every bit of the corresponding feature quantity is '1' and 'z' means every bit is '0'. When inserted into the hash table, the hash keys of R1, R5 and R7 are therefore A0000, Q0000 and P0000; they can be hashed into the table, each rule computing a different storage address from its hash key and being stored at that address.
In particular, if a hash collision occurs while hashing a rule subset, the colliding key is stored in a different hash table. For example, if the keys A0000 and Q0000 both map to location A of one hash table, there is a collision, and Q0000 or A0000 is moved to another hash table to avoid it. Parallel hashing is used to eliminate the effect of hash collisions on table filling: because a sufficient number of small hash tables are used as the storage and lookup means, the degree of hash parallelism can be considered sufficient to eliminate collisions.
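The collision handling described here, moving a colliding key to another table, can be sketched as insertion into a group of parallel tables with fallback to the next one. The table count, table size and hash function are illustrative assumptions.

```python
class ParallelHashGroup:
    """A group of small parallel hash tables; a key that collides in one
    table is simply placed in the next table with a free slot."""

    def __init__(self, num_tables=4, buckets=8):
        self.tables = [[None] * buckets for _ in range(num_tables)]

    def insert(self, key, rule):
        for table in self.tables:
            addr = hash(key) % len(table)
            if table[addr] is None or table[addr][0] == key:
                table[addr] = (key, rule)    # free slot (or same key): store here
                return True
        return False                          # every table collides: overflow entry

group = ParallelHashGroup()
group.insert(("A", "0", "0", "0", "0"), "R1")   # hash keys in the A0000 style
group.insert(("Q", "0", "0", "0", "0"), "R5")   # if it collides with A0000 in one
                                                # table, it lands in the next one
```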
Similarly, the second rule subset is marked with the mask zXzzz, and the hash keys used by R2 and R3 for the hash computation are 0H000 and 0B000; they are inserted into an unused hash table. The third rule subset is marked with the mask 00X00; the hash keys of R4 and R6 are 00C00 and 00N00, and they are inserted into an unused hash table.
Of course, choosing different rules among several rules sharing the same feature-quantity value produces different rule subsets. For example, with {R1, R2, R3, R4} having first feature quantity 'A', {R5, R6} having 'Q' and {R7} having 'P', R2 can be taken from the 'A' group and R6 from the 'Q' group, and together with R7 they form the first rule subset {R2, R6, R7}; the remaining rule set then contains the four rules {R1, R3, R4, R5}. For these remaining rules, the second feature quantity is 'B' for all of {R1, R3, R4, R5} and cannot separate them, so the third feature quantity is examined; it is 'C' for all of {R1, R3, R4, R5} and cannot separate them either, so the fourth feature quantity is examined. The fourth feature quantity is 'D' for {R1, R3} and 'K' for {R4, R5}; R1 is taken from the 'D' group and R4 from the 'K' group, so {R1, R4} form the second rule subset, and the remaining {R3, R5} automatically form a further rule subset.
In the scheme above, the feature quantities are examined one at a time in order, i.e. the first, second and third feature quantities in turn, to divide the rule subsets. Depending on the actual rules, one of the feature quantities may of course also be chosen in another order.
FIG. 3 is a schematic diagram of classification using multiple feature quantities according to an embodiment of the present invention. FIG. 3 reuses the rules of FIG. 2 as example 301; the difference is that classification uses the combination of the first and second feature quantities. There are three rules with the combined value 'AB', namely {R1, R3, R4}, while the other rules each have their own unique values 'AH', 'QB', 'QH' and 'PB'; therefore {R1, R2, R5, R6, R7} are taken out to form the first rule subset 302, and the remaining {R3, R4} can be distinguished by the fourth feature quantity, forming the second rule subset.
Following the hash storage scheme, the first rule subset is inserted into a hash table with its mask configured as XXzzz, and the hash keys are 'AB000', 'AH000', 'QB000', 'QH000' and 'PB000'; similarly, R3 and R4 are stored in a hash table according to the mask of the fourth feature quantity.
Depending on the actual rules, other combinations of feature quantities can of course also be used to classify the rule subsets; for example, using the combination of the second, third and fourth feature quantities yields the first rule subset {R1, R2, R4, R6}, and the remaining rule set {R3, R5, R7} is classified by the first feature quantity, so {R3, R5, R7} form the second rule subset.
In the examples above, all feature quantities are assumed to be exact matches. Prefix and mask fields can be classified with similar means, except that the mask has to be chosen within the bit range of each feature quantity; this choice can be made with a heuristic algorithm.
FIG. 4 is a schematic diagram of classification using multiple feature quantities, including a mask feature quantity, according to an embodiment of the present invention. As shown in FIG. 4, the first and third feature quantities of rule set 401 are a prefix field and a mask field respectively; for simplicity they are written as 8-bit values, where * means the corresponding bit of the rule may be 0 or 1. Observing the prefixes of the first feature quantity and grouping by the shortest prefix length (the first 3 bits), the rules with '100' are {R1, R2, R3}, R4 has the value '011' and R5 has the value '101'. At the same time, because only 3 bits are available for discrimination, they may not be enough to distinguish a larger number of rules, and it is better to superimpose another feature quantity; therefore the classification in the figure also takes the value of the mask field of the third feature quantity into account.
Examining the values of the third, mask field further, the mask positions of R1 and R2 are the same, and the mask positions of R3, R4 and R5 are the same; so, to make maximum use of the known data, {R3, R4, R5} are taken out to form the first rule subset 402 and {R1, R2} are treated as the remaining rule set. It is easy to see that the remaining rule set can be classified by the combination of the second and third feature quantities, forming the second rule subset 403.
The first rule subset is classified by the combination of the first and third feature quantities, so the mask corresponding to the first rule subset is configured as (11100000)z(11101000)zz. The second rule subset is classified by the combination of the second and third feature quantities, so its mask is configured as zX(11011100)zz.
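For fields like these, the subset mask is naturally a per-feature bit mask rather than the all-or-nothing X/z marking; the sketch below applies such a bit-level mask configuration to build a hash key. The field widths and the sample values are assumptions for illustration.

```python
def apply_mask_config(field_values, mask_config):
    """Build a hash key by AND-ing each field with its configured bit mask.

    field_values: per-feature integers (here, five hypothetical 8-bit fields).
    mask_config:  per-feature masks, e.g. (0b11100000, 0, 0b11101000, 0, 0)
                  for the (11100000)z(11101000)zz configuration above.
    """
    return tuple(v & m for v, m in zip(field_values, mask_config))

# Hypothetical rule whose first field starts with the prefix 100 and whose
# third field has its don't-care bits at the positions masked out below.
rule_fields  = (0b10010110, 0x2A, 0b11001011, 0x07, 0x11)
mask_config1 = (0b11100000, 0x00, 0b11101000, 0x00, 0x00)
print([format(x, "08b") for x in apply_mask_config(rule_fields, mask_config1)])
# the first field keeps only its top 3 bits; unused fields become 00000000
```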
FIG. 5 is a schematic block diagram of a hash algorithm unit according to an embodiment of the present invention, and FIG. 6 is a schematic block diagram of flow classification lookup implemented by a parallel hash group according to an embodiment of the present invention. Referring to FIG. 5 and FIG. 6, when a rule lookup key (Key) 502 is input to the hash computation unit, the key is first ANDed with the feature mask (Mask config) 503 stored in the unit, converting the lookup key into the hash key (Hash Key) 504 of this hash table; the hash key 504 is then used for the hash computation to obtain the address in the corresponding hash table; the obtained address is used to read the hash table (Hash Table) 505 and obtain the stored rule and result; the key 502 and the data read out are input to the comparison module (Key Compare) 506, and if a previous-stage result (Previous Hash Result) 507 exists, the priorities of the previous-stage result and of this stage's result are compared at the same time, the higher-priority result being taken as the output of this stage; finally the comparison result is output by 506. The hash lookup group consists of the hash lookup units described in FIG. 6 arranged as an M*N lookup array; the overall capacity of the M*N array is sufficient for the storage space required by a large-capacity rule set, and the M*N degrees of parallelism are sufficient for the hash-collision requirements of the rule subsets. In general, M depends on the design scale of the MUX, N depends on the latency requirement, and the number M*N is determined by the total required capacity and the capacity of each hash unit.
For example, if the lookup key is 11101 and the feature mask is 01100, ANDing 11101 with 01100 gives the hash key 01100; hashing the key 01100 yields the corresponding hash-table address, address A, and the rule R1 stored at address A is found in the hash table. If a higher-priority result exists that indicates another rule, say R2, as the output, the final result is rule R2.
In particular, in a hardware implementation each row of hash units performs the lookup as a serial pipeline: besides comparing the hash key within its own unit, each hash unit compares the output of the previous stage with its own lookup result and passes on the higher-priority result as the output of this stage. The hash units in each column perform their lookups in parallel at the same time, and after the last column of hash tables has produced its results, the results of all columns are gathered by the result aggregation module (MUX) 603; the MUX compares the priorities of all results and outputs the highest-priority one as the final lookup result.
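A behavioral sketch of this M*N arrangement follows: each row is walked as a serial pipeline carrying the best result so far (the "previous hash result"), and the row outputs are then reduced as the MUX does. The unit layout, entry format and the lower-number-wins priority convention are assumptions, not the hardware design itself.

```python
def better(a, b):
    """Keep the higher-priority of two optional results (lower number wins)."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a[2] <= b[2] else b

def array_lookup(search_key, array):
    """array[m][n] is one hash unit {"mask": feature indices, "table": entries},
    with entries of the form (hash_key, rule, priority) or None."""
    def unit_lookup(unit):
        masked = tuple(v if i in unit["mask"] else "0"
                       for i, v in enumerate(search_key))
        entry = unit["table"][hash(masked) % len(unit["table"])]
        return entry if entry is not None and entry[0] == masked else None

    row_outputs = []
    for row in array:                 # each row: serial pipeline of units
        carried = None                # the previous-stage result
        for unit in row:
            carried = better(carried, unit_lookup(unit))
        row_outputs.append(carried)

    final = None                      # MUX: aggregate all row outputs
    for result in row_outputs:
        final = better(final, result)
    return final
```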
In particular, to guarantee completeness of the algorithm, a small independent CACHE can be added as the completeness-guarantee module (CACHE) 602 of the parallel hash lookup group. The purpose of the CACHE is that a large-scale rule set inevitably contains a few entries of particularly complex form, for example entries with very few significant bits or with mask positions different from most others; such entries may not be suitable for the hash storage described above and can instead be held in a small CACHE.
Here, the CACHE logic is implemented by comparing against all of its entries one by one; it can be implemented like the serial pipeline of the hash lookup group, each stage performing a parallel comparison with a small degree of parallelism.
During the parallel hash computation the key is also input to the CACHE module for comparison; its result is likewise gathered into the MUX module, aggregated with the results of the parallel hash lookup group, and the final result is obtained by the MUX comparison.
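A minimal sketch of such a completeness CACHE is a short list of (mask, value, rule, priority) entries compared one by one; merging with the hash group's result then follows the same priority rule. Keys are treated as plain integers here for brevity, and the entry format and merge helper are assumptions.

```python
class CompletenessCache:
    """Small linear-scan store for the few rules that do not suit hash storage."""

    def __init__(self):
        self.entries = []                       # (mask, value, rule, priority)

    def add(self, mask, value, rule, priority):
        self.entries.append((mask, value, rule, priority))

    def lookup(self, key):
        best = None
        for mask, value, rule, priority in self.entries:   # compare one by one
            if key & mask == value and (best is None or priority < best[1]):
                best = (rule, priority)
        return best

def final_result(hash_group_hit, cache_hit):
    """MUX-style merge: the highest-priority (lowest-numbered) result wins."""
    candidates = [h for h in (hash_group_hit, cache_hit) if h is not None]
    return min(candidates, key=lambda h: h[1]) if candidates else None
```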
图7为本发明实施例的流分类装置的结构组成示意图,如图7所示时,所述装置包括:
分类单元71,配置为对规则集中的多条规则按照选取的特征量进行分类,得到一个以上规则子集;
存储单元72,配置为将分类后的各个规则子集进行哈希存储;
查找单元73,配置为当进行规则查找时,依据所述规则的哈希键值在各个并行哈希存储单元72中进行哈希查找。
本发明实施例中,所述规则集中的规则由一个或一个以上特征量组成,每个特征量由对应的特征量数值表示;
所述分类单元71,还配置为针对规则中的第一特征量或第一组特征量,选取特征量数值不同的所有规则,组合成第一类规则子集;
其中,所述第一特征量或第一组特征量支持哈希查找方式。
本发明实施例中,所述分类单元71,还配置为针对规则集中除第一类规则子集以外的规则,选取第二特征量或第二组特征量的特征量数值不同的所有规则,组合成第二类规则子集;针对规则集中除所有规则子集以外的规则,选取一个或多个特征量的特征量数值不同的所有规则,组合成规 则子集,直至规则集中的规则分类完成为止。
本发明实施例中,所述分类单元71,还配置为当分类出的规则子集中的规则数目小于或等于第一阈值时,更换当前的特征量,选取特征量数值不同的所有规则,组合成规则子集。
本发明实施例中,所述分类单元71,还配置为当规则集中除所有规则子集以外的规则的数量小于或等于第二阈值时,终止分类,将规则集中除所有规则子集以外的规则组合成一类规则子集。
本发明实施例中,所述存储单元72,还配置为设置每类规则子集中各条规则对应的哈希键值为:将分类所用的特征量设置为对应的特征量数值;将分类未使用的特征量设置为0;利用哈希条目存放每类规则子集中各条规则。
本领域技术人员应当理解,图7所示的流分类装置中的各单元的实现功能可参照前述流分类方法的相关描述而理解。图7所示的流分类装置中的各单元的功能可通过运行于处理器上的程序而实现,也可通过具体的逻辑电路而实现。
本发明实施例还记载了一种存储介质,所述存储介质中存储有计算机程序,所述计算机程序配置为执行前述各实施例的流分类方法。
在本申请所提供的几个实施例中,应该理解到,所揭露的设备和方法,可以通过其它的方式实现。以上所描述的设备实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,如:多个单元或组件可以结合,或可以集成到另一个系统,或一些特征可以忽略,或不执行。另外,所显示或讨论的各组成部分相互之间的耦合、或直接耦合、或通信连接可以是通过一些接口,设备或单元的间接耦合或通信连接,可以是电性的、机械的或其它形式的。
上述作为分离部件说明的单元可以是、或也可以不是物理上分开的, 作为单元显示的部件可以是、或也可以不是物理单元,即可以位于一个地方,也可以分布到多个网络单元上;可以根据实际的需要选择其中的部分或全部单元来实现本实施例方案的目的。
另外,在本发明各实施例中的各功能单元可以全部集成在一个处理单元中,也可以是各单元分别单独作为一个单元,也可以两个或两个以上单元集成在一个单元中;上述集成的单元既可以采用硬件的形式实现,也可以采用硬件加软件功能单元的形式实现。
本领域普通技术人员可以理解:实现上述方法实施例的全部或部分步骤可以通过程序指令相关的硬件来完成,前述的程序可以存储于一计算机可读取存储介质中,该程序在执行时,执行包括上述方法实施例的步骤;而前述的存储介质包括:移动存储设备、只读存储器(ROM,Read Only Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
或者,本发明上述集成的单元如果以软件功能模块的形式实现并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明实施例的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机、服务器、或者网络设备等)执行本发明各个实施例所述方法的全部或部分。而前述的存储介质包括:移动存储设备、只读存储器(ROM,Read Only Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本发明的具体实施方式,但本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应以所述权利要求的保护范围为准。
Industrial Applicability
The invention formulates a series of classification principles according to the rule set of the actual application, classifies all rules uniformly, and divides a large-scale rule set into many small-scale rule subsets. The composition of these rule subsets is determined by the macroscopic characteristics of the rules and does not depend entirely on pre-judging or pre-defining rule types, so the classification is flexible. The classified rule subsets are stored in hash form, and during rule lookup a hash lookup is performed in each parallel hash unit according to the hash key of the rule. The technical solution of the embodiments of the present invention does not depend on a TCAM implementation; it meets high-performance table-lookup requirements while effectively reducing memory requirements, is suitable for hardware implementation, and satisfies the needs of large-capacity, high-speed packet classification lookup.

Claims (13)

  1. 一种流分类方法,所述方法包括:
    对规则集中的多条规则按照选取的特征量进行分类,得到一个以上规则子集;
    将分类后的各个规则子集进行哈希存储;
    当进行规则查找时,依据所述规则的哈希键值在各个并行哈希存储单元中进行哈希查找。
  2. 根据权利要求1所述的流分类方法,其中,所述规则集中的规则由一个或一个以上特征量组成,每个特征量由对应的特征量数值表示;
    所述对规则集中的多条规则按选取的特征量进行分类,得到一个以上规则子集,包括:
    针对规则中的第一特征量或第一组特征量,选取特征量数值不同的所有规则,组合成第一类规则子集;
    其中,所述第一特征量或第一组特征量支持哈希查找方式。
  3. 根据权利要求2所述的流分类方法,其中,所述对规则集中的多条规则按选取的特征量进行分类,得到一个以上规则子集,还包括:
    针对规则集中除第一类规则子集以外的规则,选取第二特征量或第二组特征量的特征量数值不同的所有规则,组合成第二类规则子集;
    针对规则集中除所有规则子集以外的规则,选取一个或多个特征量的特征量数值不同的所有规则,组合成规则子集,直至规则集中的规则分类完成为止。
  4. 根据权利要求2或3所述的流分类方法,其中,所述方法还包括:
    当分类出的规则子集中的规则数目小于或等于第一阈值时,更换当前的特征量,选取特征量数值不同的所有规则,组合成规则子集。
  5. 根据权利要求2或3所述的流分类方法,其中,所述方法还包括:
    当规则集中除所有规则子集以外的规则的数量小于或等于第二阈值时,终止分类,将规则集中除所有规则子集以外的规则组合成一类规则子集。
  6. 根据权利要求1至3任一项所述的流分类方法,其中,所述将分类后的各个规则子集进行哈希存储,包括:
    设置每类规则子集中各条规则对应的哈希键值为:将分类所用的特征量设置为对应的特征量数值;将分类未使用的特征量设置为0;
    利用哈希条目存放每类规则子集中各条规则。
  7. 一种流分类装置,所述装置包括:
    分类单元,配置为对规则集中的多条规则按照选取的特征量进行分类,得到一个以上规则子集;
    存储单元,配置为将分类后的各个规则子集进行哈希存储;
    查找单元,配置为当进行规则查找时,依据所述规则的哈希键值在各个并行哈希存储单元中进行哈希查找。
  8. 根据权利要求7所述的流分类装置,其中,所述规则集中的规则由一个或一个以上特征量组成,每个特征量由对应的特征量数值表示;
    所述分类单元,还配置为针对规则中的第一特征量或第一组特征量,选取特征量数值不同的所有规则,组合成第一类规则子集;
    其中,所述第一特征量或第一组特征量支持哈希查找方式。
  9. 根据权利要求8所述的流分类装置,其中,所述分类单元,还配置为针对规则集中除第一类规则子集以外的规则,选取第二特征量或第二组特征量的特征量数值不同的所有规则,组合成第二类规则子集;针对规则集中除所有规则子集以外的规则,选取一个或多个特征量的特征量数值不同的所有规则,组合成规则子集,直至规则集中的规则分类完成为止。
  10. 根据权利要求8或9所述的流分类装置,其中,所述分类单元, 还配置为当分类出的规则子集中的规则数目小于或等于第一阈值时,更换当前的特征量,选取特征量数值不同的所有规则,组合成规则子集。
  11. 根据权利要求8或9所述的流分类装置,其中,所述分类单元,还配置为当规则集中除所有规则子集以外的规则的数量小于或等于第二阈值时,终止分类,将规则集中除所有规则子集以外的规则组合成一类规则子集。
  12. 根据权利要求7至9任一项所述的流分类装置,其中,所述存储单元,还配置为设置每类规则子集中各条规则对应的哈希键值为:将分类所用的特征量设置为对应的特征量数值;将分类未使用的特征量设置为0;利用哈希条目存放每类规则子集中各条规则。
  13. 一种存储介质,所述存储介质中存储有计算机程序,所述计算机程序配置为执行权利要求1至6任一项所述的流分类方法。
PCT/CN2015/097442 2015-06-18 2015-12-15 流分类方法及装置、存储介质 WO2016201930A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US15/568,857 US20180107759A1 (en) 2015-06-18 2015-12-15 Flow classification method and device and storage medium
EP15895495.8A EP3276501B1 (en) 2015-06-18 2015-12-15 Traffic classification method and device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510341289.8A CN106326234A (zh) 2015-06-18 2015-06-18 流分类方法及装置
CN201510341289.8 2015-06-18

Publications (1)

Publication Number Publication Date
WO2016201930A1 true WO2016201930A1 (zh) 2016-12-22

Family

ID=57544846

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/097442 WO2016201930A1 (zh) 2015-06-18 2015-12-15 流分类方法及装置、存储介质

Country Status (4)

Country Link
US (1) US20180107759A1 (zh)
EP (1) EP3276501B1 (zh)
CN (1) CN106326234A (zh)
WO (1) WO2016201930A1 (zh)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106533947B (zh) * 2015-09-11 2019-10-08 新华三技术有限公司 报文处理方法及装置
US10778612B2 (en) * 2016-05-26 2020-09-15 Arista Networks, Inc. Variable TCAM actions
US11308057B2 (en) 2016-12-12 2022-04-19 Advanced Micro Devices, Inc. System and method for multiplexer tree indexing
CN108881036B (zh) * 2018-07-03 2020-06-16 电信科学技术第五研究所有限公司 一种基于查表运算的网络通信快速匹配方法及设备
CN109347747B (zh) * 2018-11-13 2021-12-17 锐捷网络股份有限公司 一种数据处理方法及装置
US10721186B1 (en) * 2019-03-30 2020-07-21 Fortinet, Inc. Packet processing with per-CPU (central processing unit) flow tables in a network device
CN111817978B (zh) * 2019-04-12 2022-10-04 华为技术有限公司 一种流分类方法及装置
CN110442586A (zh) * 2019-07-03 2019-11-12 北京左江科技股份有限公司 一种基于分类优先级的五元组查询方法
CN112866139A (zh) * 2019-11-27 2021-05-28 深圳市中兴微电子技术有限公司 一种多规则流分类的实现方法、设备和存储介质
CN117828487A (zh) * 2024-02-23 2024-04-05 深圳赋乐科技集团有限公司 一种流分类规则匹配结果的判定方法、系统、设备及介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1477494A (zh) * 2002-08-20 2004-02-25 深圳市中兴通讯股份有限公司上海第二 一种数据包递归流分类方法
US20080243748A1 (en) * 2005-01-31 2008-10-02 International Business Machines Corporation Rule set partitioning based packet classification method for Internet
CN101753445A (zh) * 2009-12-23 2010-06-23 重庆邮电大学 基于关键字分解Hash算法的快速流分类方法
CN104462144A (zh) * 2013-09-24 2015-03-25 中兴通讯股份有限公司 一种包分类规则的查找方法及装置
CN104468381A (zh) * 2014-12-01 2015-03-25 国家计算机网络与信息安全管理中心 一种多域流规则匹配的实现方法

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7480302B2 (en) * 2004-05-11 2009-01-20 Samsung Electronics Co., Ltd. Packet classification method through hierarchical rulebase partitioning
CN101714948B (zh) * 2009-10-27 2016-03-30 清华大学 一种多域的网包的分类方法和装置
KR101153940B1 (ko) * 2010-11-09 2012-06-08 아주대학교산학협력단 패킷 분류 장치 및 그 방법
CN102354308B (zh) * 2011-06-30 2012-11-21 中国科学技术大学 一种包分类规则集快速压缩方法
US9502111B2 (en) * 2013-11-05 2016-11-22 Cisco Technology, Inc. Weighted equal cost multipath routing

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1477494A (zh) * 2002-08-20 2004-02-25 深圳市中兴通讯股份有限公司上海第二 一种数据包递归流分类方法
US20080243748A1 (en) * 2005-01-31 2008-10-02 International Business Machines Corporation Rule set partitioning based packet classification method for Internet
CN101753445A (zh) * 2009-12-23 2010-06-23 重庆邮电大学 基于关键字分解Hash算法的快速流分类方法
CN104462144A (zh) * 2013-09-24 2015-03-25 中兴通讯股份有限公司 一种包分类规则的查找方法及装置
CN104468381A (zh) * 2014-12-01 2015-03-25 国家计算机网络与信息安全管理中心 一种多域流规则匹配的实现方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3276501A4 *

Also Published As

Publication number Publication date
US20180107759A1 (en) 2018-04-19
EP3276501B1 (en) 2020-03-18
CN106326234A (zh) 2017-01-11
EP3276501A1 (en) 2018-01-31
EP3276501A4 (en) 2018-03-07

Similar Documents

Publication Publication Date Title
WO2016201930A1 (zh) 流分类方法及装置、存储介质
US11102120B2 (en) Storing keys with variable sizes in a multi-bank database
Jiang et al. Large-scale wire-speed packet classification on FPGAs
US9269411B2 (en) Organizing data in a hybrid memory for search operations
US9569561B2 (en) Label masked addressable memory
CN104866502B (zh) 数据匹配的方法及装置
CN108875064B (zh) 基于FPGA的OpenFlow多维数据匹配查找方法
Priya et al. Hierarchical packet classification using a Bloom filter and rule-priority tries
Li et al. Tuple space assisted packet classification with high performance on both search and update
US20180375773A1 (en) Technologies for efficient network flow classification with vector bloom filters
US9672239B1 (en) Efficient content addressable memory (CAM) architecture
WO2020207248A1 (zh) 一种流分类方法及装置
Pao et al. A multi-pipeline architecture for high-speed packet classification
TWI667900B (zh) 混合萬用字元匹配表
US9485179B2 (en) Apparatus and method for scalable and flexible table search in a network switch
US11652744B1 (en) Multi-stage prefix matching enhancements
CN106487769B (zh) 一种访问控制列表acl的实现方法及装置
CN101222434B (zh) 存储策略控制列表、策略搜索方法和三态寻址存储器
Kao et al. Dynamically updatable ternary segmented aging bloom filter for openflow-compliant low-power packet processing
Xin et al. FPGA-based updatable packet classification using TSS-combined bit-selecting tree
Zhou et al. Large-scale packet classification on FPGA
Li et al. Scalable packet classification using bit vector aggregating and folding
Mahini et al. MLET: a power efficient approach for TCAM based, IP lookup engines in Internet routers
Pao et al. Bit-shuffled trie: IP lookup with multi-level index tables
Sun et al. Content-based route lookup using CAMs

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15895495

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 15568857

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE