WO2021088234A1 - Data packet classification method and system based on a convolutional neural network - Google Patents

Data packet classification method and system based on a convolutional neural network

Info

Publication number
WO2021088234A1
Authority
WO
WIPO (PCT)
Prior art keywords
rule set
data packet
neural network
convolutional neural
image
Prior art date
Application number
PCT/CN2019/128935
Other languages
English (en)
French (fr)
Inventor
谢高岗
张昕怡
张鹏豪
Original Assignee
中国科学院计算技术研究所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院计算技术研究所
Priority to US17/761,220 (US20220374733A1)
Publication of WO2021088234A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9014 Indexing; Data structures therefor; Storage structures hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G06N5/025 Extracting rules from data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/096 Transfer learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models

Definitions

  • The invention relates to the technical field of data packet lookup and classification in computer networks, and in particular to a data packet classification method based on a convolutional neural network.
  • Data packet lookup and classification processes data packets according to a set of predefined or dynamically generated rules. It is a basic, key function in switches, routers, firewalls, load balancers, cloud platform software switches and other network devices. Scenarios such as software-defined networking, network function virtualization, and cloud computing require frequent rule updates. Storing the rules in a data structure such as a decision tree achieves high-speed matching and forwarding of data packets, but rule updates are slow, and while the rules are being updated the matching and forwarding speed drops sharply. Hash-based data packet classification methods support fast rule updates, but their matching and forwarding speed is slow. Software-defined networking, network function virtualization, cloud computing and similar scenarios therefore urgently need a data packet classification method that supports both high-speed packet matching and forwarding and fast online rule updates.
  • Existing data packet classification technologies fall into three main categories: hardware-based technologies, technologies based on dimensionality reduction, and technologies based on space division.
  • T-CAM reduces the search time by implementing parallel search.
  • T-CAM has shortcomings such as limited storage space, large power consumption, and slow rule update speed.
  • Packet classification can also run on other hardware platforms, such as GPUs and FPGAs. However, these platforms require specific chips, hardware instructions, and programming languages, which makes such designs inconvenient to implement and apply.
  • In space-division methods, an incoming data packet is not matched against the entire rule set. Instead, classification proceeds in two steps: first, the subspace of the rule set to be searched is determined; second, the data packet is matched against the rules in that subspace.
  • This method is further divided into two categories: decision tree methods and hash-based methods.
  • Typical decision tree methods include HiCuts and HyperCuts.
  • The key idea of decision tree methods is to recursively divide the search space into multiple subspaces until the number of rules in each region is below a certain threshold.
  • The efficiency of the decision tree ensures high-speed classification of data packets, but the tree-based data structure is slow to update.
  • In addition, some rules may need to be copied to multiple subspaces, resulting in increased memory overhead.
  • EffiCuts and SmartSplit have proposed rule space partitioning strategies that reduce rule duplication, but such methods still cannot support fast rule updates.
  • Hash-based methods support rapid rule updates, but their packet lookup and matching speed is slow.
  • In the Tuple Space Search (TSS) method, all hash tables must be searched during packet classification to find the matching rules. The classification speed therefore decreases as the number of hash tables increases.
  • TSS does, however, support rapid updates.
  • Existing methods such as Pruned Tuple Space Search, TupleMerge, and PartitionSort sacrifice update performance to improve the search speed of TSS, yet they still cannot support efficient packet classification and high-speed online rule updates at the same time.
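  • To make the cost structure of TSS concrete, the following is a minimal sketch, not the implementation of any cited system. It assumes two-field rules (source and destination IPv4 prefixes), 32-bit integer addresses, and the convention that a larger number means a higher priority; the names `TupleSpace`, `mask` and `Rule` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule layout: (src_prefix, src_len, dst_prefix, dst_len, priority).
# Assumption: a larger priority value wins.
Rule = Tuple[int, int, int, int, int]


def mask(addr: int, length: int) -> int:
    """Keep the top `length` bits of a 32-bit address."""
    if length == 0:
        return 0
    return addr & (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF


class TupleSpace:
    def __init__(self) -> None:
        # One hash table per (src_len, dst_len) tuple.
        self.tables: Dict[Tuple[int, int], Dict[Tuple[int, int], List[Rule]]] = {}

    def insert(self, rule: Rule) -> None:
        # Updates are cheap: one table lookup plus one bucket append.
        src, src_len, dst, dst_len, _ = rule
        table = self.tables.setdefault((src_len, dst_len), {})
        table.setdefault((mask(src, src_len), mask(dst, dst_len)), []).append(rule)

    def lookup(self, src: int, dst: int) -> Optional[Rule]:
        best: Optional[Rule] = None
        # Every table is probed, so lookup cost grows with the number of tables.
        for (src_len, dst_len), table in self.tables.items():
            key = (mask(src, src_len), mask(dst, dst_len))
            for rule in table.get(key, []):
                if best is None or rule[4] > best[4]:
                    best = rule
        return best
```

  • Inserting a rule touches a single table, while a lookup visits all of them, which is the trade-off the merging schemes described later try to improve.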
  • The existing data packet classification methods cannot balance lookup speed, update speed and accuracy, and cannot achieve efficient data packet classification together with high-speed online rule updates.
  • The present invention aims to provide a data packet classification method with fast lookup speed, fast rule updates, and high lookup accuracy. To this end, a data packet classification method based on a convolutional neural network is proposed.
  • The present invention proposes to classify rules by the distribution ranges of their address prefix combinations to form multiple merging schemes, to select the rule merging scheme with a performance model, to convert the rule set into an image, and to use a convolutional neural network to quickly partition the rule set and construct hash tables, thereby achieving both efficient classification and lookup of data packets and high-speed online rule updates.
  • The present invention provides a data packet classification method based on a convolutional neural network, characterized in that the method includes the following steps:
  • Step (1): For each rule set in the training rule set, the rules are merged according to different prefix ranges of the source and destination addresses of the rules in the rule set to form a variety of merging schemes, and the optimal merging scheme of each rule set in the training rule set is determined based on performance evaluation;
  • Step (2): The prefix combination distribution of each rule set in the training rule set, and of the target rule set, is converted into an image, with the parameters of the image characterizing the parameters of the corresponding prefix combination distribution, and a convolutional neural network model is trained using the images of the training rule set and the corresponding optimal merging schemes as features;
  • Step (3): The target image converted from the target rule set is input into the convolutional neural network model, the merging scheme of the target rule set is determined based on the degree of matching between the target image and the images in the convolutional neural network model, and the corresponding hash table is built for data packet classification.
  • Preferably, the pixel coordinates of the converted image represent the prefix length combination, or prefix length range combination, of the source and destination addresses of the rules in the corresponding rule set, and the pixel value represents the number of rules in that rule set with that prefix length or length range combination.
  • Preferably, the training of the convolutional neural network model in step (2) includes classifying the prefix combination distributions of the rule sets based on the similarity between the images, and determining a corresponding merging scheme for the prefix combination distributions of each category.
  • Preferably, the method includes calculating the difference information between the pixels of the image corresponding to each prefix combination distribution as the fingerprint of that image, calculating the difference between the fingerprint of each image and the fingerprint of a reference image, and determining the category of the rule set corresponding to the image by comparing that difference with a predetermined threshold.
  • Preferably, the method further includes performing rule updates on the target rule set.
  • Preferably, the rule update includes determining the corresponding hash table based on the prefix combination length of the updated rule, updating the rule in the corresponding hash bucket, and updating the value of the pixel corresponding to the updated rule in the image of the prefix combination distribution of the target rule set.
  • Preferably, the method further includes monitoring the Hamming distance between the prefix combination distribution of the target rule set before and after an update, and determining whether to reconstruct the hash table based on that Hamming distance.
  • Preferably, the method further includes setting a priority for each hash table, equal to the highest priority among the rules it contains, sorting all the hash tables by priority, and, when matching a data packet, stopping the search once the priority of the hit rule is not less than the priority of the next hash table.
  • Preferably, the performance evaluation uses a formula whose terms are the average hash time, the average verification time, the number of hash tables m, the number of rules n_i in the i-th hash table, the size s_i of the i-th hash table, and the time for priority comparison.
  • the present invention provides a data packet classification system based on a convolutional neural network, characterized in that the system includes an offline system and an online system,
  • the offline system includes a calculation module and a convolutional neural network offline training module
  • the online system includes a data packet classification and forwarding module and a convolutional neural network online module
  • the calculation module is used for merging the rules of each rule set in the training rule set according to different prefix range combinations of the source and target addresses of the rules, evaluating the performance of the different merging schemes to determine the optimal merging scheme of each rule set, and converting the prefix combination distribution of each rule set of the training rule set into an image, with the parameters of the image characterizing the parameters of the corresponding prefix combination distribution;
  • the convolutional neural network offline training module trains the convolutional neural network model with the images of the training rule set and the corresponding optimal merging schemes as features;
  • the convolutional neural network online module is used to convert the prefix combination distribution of the target rule set into an image, with the parameters of the image characterizing the parameters of the corresponding prefix combination distribution, and to use the trained convolutional neural network model to determine the merging scheme of the target rule set;
  • the data packet classification and forwarding module is used to construct the corresponding hash table based on the merging scheme, so as to perform data packet classification based on the hash table.
  • the convolutional neural network offline training module classifies the prefix combination distribution of the rule set based on the similarity between the images, and determines a corresponding merging scheme for the prefix combination distribution of each category.
  • it further includes a monitoring module, which reads the prefix combination distribution of the target rule set and determines its category, and determines whether to perform hash table reconstruction based on the change of the category.
  • the data packet classification and forwarding module determines the corresponding hash table based on the prefix combination of the updated rule, updates the updated rule in the corresponding hash bucket, and
  • the online module of the convolutional neural network updates the value of the pixel corresponding to the updated rule in the image of the prefix combination distribution corresponding to the target rule set.
  • the present invention provides a computer-readable storage medium on which a computer program is stored, wherein the program is executed by a processor to implement the above method.
  • a computer device includes a memory and a processor, and a computer program that can run on the processor is stored on the memory, and is characterized in that the above method is implemented when the processor executes the program.
  • the method of the present invention can significantly improve the data packet search performance, improve the data packet search speed, and improve the update speed of the rules.
  • the system of the present invention can ensure that the online system realizes the efficient search of data packets and the rapid update of the rule set through the mutual cooperation of the online system and the offline system.
  • the update of the rule set can be monitored to always reflect the latest state of the network.
  • The data packet lookup performance of the method of the present invention is 4.1 times that of the PartitionSort (PS) method, 8.3 times that of the Tuple Space Search (TSS) method, 3.5 times that of the Pruned Tuple Space Search (PR-TSS) method, and 4.3 times that of the TupleMerge (TM) method.
  • The update speed of the method of the present invention is 9.6 times that of the PS method, 1.8 times that of the TSS method, 2.3 times that of the PR-TSS method, and 5.2 times that of the TM method.
  • the method of the present invention further reduces the rule storage cost by merging the hash table.
  • the memory cost of the present invention is 36% of the PS method, 70% of the TSS method, and the same as the PR-TSS method. 63%.
  • Fig. 1 is a schematic flowchart of a method for classifying data packets based on a convolutional neural network in an embodiment of the present invention.
  • Fig. 2 is a schematic structural diagram of a data packet classification system based on a convolutional neural network in an embodiment of the present invention.
  • Figure 3 is a schematic diagram of the standardization and transformation process of the rule set using the inventive method.
  • FIG. 4 is a schematic diagram of the search performance comparison between the method of the present invention and the existing method under the condition that the rule set is not updated in the actual test.
  • Fig. 5 is a schematic diagram of comparison of rule update time between the method of the present invention and the existing method in actual testing.
  • Fig. 6 is a schematic diagram showing the comparison of the memory overhead of the rule set between the method of the present invention and the existing method in actual testing.
  • FIG. 7 is a schematic diagram of comparison of the data packet search rate between the method of the present invention and the existing method under different rule set update rates in actual tests.
  • Figure 8 is a schematic diagram of the performance comparison of the system with and without the monitoring module in the actual test.
  • Fig. 9 is a schematic diagram showing the performance comparison between native Open vSwitch (OVS) and the method of the present invention after introducing OVS.
  • Figure 10 is a schematic diagram of image fingerprint acquisition and similarity comparison using the method of the present invention.
  • FIG. 11 is a schematic diagram of the process of performing hash table reconstruction judgment in the invention system.
  • Figure 12 is a box plot of Hamming distance calculation between rule sets.
  • Data packet classification technology is one of the most critical operations in network equipment, and its data packet classification speed and rule update speed play a vital role in the overall performance of the system.
  • The existing technology cannot support fast data packet classification and efficient online rule updates at the same time.
  • The present invention proposes a data packet classification system and classification method based on a convolutional neural network (CNN) to support fast online updates and efficient data packet lookup.
  • Step (1) merging each rule set in the training rule set, constructing a performance model to estimate the performance of various merging schemes, and determining the optimal merging scheme of each rule set in the training rule set.
  • Step (1.1) first performs hash merging for data packet classification to form different merging schemes.
  • Through in-depth research on data packet classification, the inventors found that classification speed can be improved by reducing the number of hash tables corresponding to the rule set, but that the matching overhead caused by many rules falling into the same bucket of a hash table must be avoided at the same time. The hash merging method proposed by the present invention therefore reduces the number of hash tables while keeping the rule conflict rate low, thereby improving the lookup speed for data packets.
  • Based on the distribution of rule prefix combinations, the present invention classifies rules by "prefix range combination", that is, by the combination of the prefix ranges of the source and target addresses of the rules, and divides the rule set into multiple disjoint subsets to reduce the number of hash tables.
  • The rule set in Table 1 can be merged into four hash tables, as shown in Table 2.
  • The prefix range combinations are ([0, 3), [0, 4)), ([0, 3), [4, 6)), ([3, 6), [0, 4)) and ([3, 6), [4, 6)).
  • Rule 0 in Table 1 is mapped to ([3, 6), [4, 6)) in Table 2, indicating that the prefix lengths of its source and destination addresses fall within the ranges [3, 6) and [4, 6), respectively.
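  • The mapping from a rule's prefix lengths to a prefix range combination can be sketched as follows. This is an illustrative sketch only: the boundary lists reproduce the four range combinations quoted above, while the function names and the sample prefix lengths (4 and 5) are assumptions, since Table 1 is not reproduced here.

```python
from bisect import bisect_right
from typing import List, Tuple

Range = Tuple[int, int]  # half-open interval [lo, hi)

# Boundaries taken from the example: source ranges [0, 3), [3, 6);
# destination ranges [0, 4), [4, 6).
SRC_BOUNDS = [0, 3, 6]
DST_BOUNDS = [0, 4, 6]


def find_range(length: int, boundaries: List[int]) -> Range:
    """Return the half-open range [lo, hi) from `boundaries` containing `length`."""
    idx = bisect_right(boundaries, length) - 1
    if idx < 0 or idx >= len(boundaries) - 1:
        raise ValueError(f"prefix length {length} is outside {boundaries}")
    return boundaries[idx], boundaries[idx + 1]


def range_combination(src_len: int, dst_len: int,
                      src_bounds: List[int] = SRC_BOUNDS,
                      dst_bounds: List[int] = DST_BOUNDS) -> Tuple[Range, Range]:
    """Map a rule's (source, destination) prefix lengths to its range combination."""
    return find_range(src_len, src_bounds), find_range(dst_len, dst_bounds)


# A hypothetical rule with source prefix length 4 and destination prefix length 5
# lands in ([3, 6), [4, 6)), i.e. the hash table for that range combination.
print(range_combination(4, 5))
```

  • Each distinct return value identifies one merged hash table, so two boundaries per dimension yield the four tables of the example.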
  • Step (1.2) uses the performance model to perform performance evaluation in the packet classification process.
  • a performance model is proposed to estimate the performance of data packet classification by analyzing the time overhead required for each step in the hash-based data packet classification method.
  • The hash-based data packet classification method usually includes three steps: 1) search all hash tables in the target system to determine whether any rules in the hash tables match the data packet; 2) verify whether those rules really match the data packet, to avoid false positives caused by hash conflicts or rule overlap; 3) if multiple rules match, select the rule with the highest priority as the final classification result of the data packet. The hash-based packet classification time therefore consists of these three parts. The hash time is related to the number of hash tables. The time for matching verification depends on the number of hash collisions or the number of overlapping rules. Given a hash function, the probability of matching entries in a hash table is proportional to the utilization of that hash table. This ratio can be defined as the number of entries in the hash table divided by the solution space size of the hash table, as follows: $r_i = e_i / s_i$
  • $r_i$ represents the hit ratio of the i-th hash table
  • $e_i$ represents the number of entries in the i-th hash table
  • $s_i$ represents the size of the i-th hash table
  • $m$ is the number of hash tables
  • $H_i$ is the hash computation time of the i-th hash table
  • $C_i$ is the verification time after a hit in the i-th hash table
  • the remaining term is the time for rule priority comparison.
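  • The quantities defined above can be combined into a rough cost estimate. The sketch below is an assumption, not the formula of the invention, which is not reproduced in this text: it simply adds, for every hash table, the hash time and the utilization-weighted verification time, plus one priority-comparison term. `TableStats`, `estimated_lookup_time` and `priority_time` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableStats:
    entries: int        # e_i: number of entries in the i-th hash table
    size: int           # s_i: size (solution space) of the i-th hash table
    hash_time: float    # H_i: hash computation time of the i-th hash table
    verify_time: float  # C_i: verification time after a hit in the i-th table


def utilization(t: TableStats) -> float:
    """r_i = e_i / s_i, the ratio defined in the text."""
    return t.entries / t.size


def estimated_lookup_time(tables: List[TableStats], priority_time: float) -> float:
    """ASSUMED additive model: sum_i (H_i + r_i * C_i) + priority comparison time."""
    per_table = sum(t.hash_time + utilization(t) * t.verify_time for t in tables)
    return per_table + priority_time


# Merging tables removes H_i terms but tends to raise r_i in the tables that
# remain, which is the trade-off a merging scheme has to balance.
many = [TableStats(10, 100, 1.0, 2.0) for _ in range(4)]
merged = [TableStats(40, 100, 1.0, 2.0)]
print(estimated_lookup_time(many, 0.5), estimated_lookup_time(merged, 0.5))
```

  • Under this assumed model, candidate merging schemes could be ranked by their estimated time; the time units and constants are placeholders that would have to be measured on the target system.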
  • the performance model guarantees the rationality and efficiency of each type of rule set merging scheme in the method of the present invention.
  • Step (1.3) uses the performance model to determine the hash table merging scheme based on the average lookup performance of different hash table merging schemes, and build the hash table.
  • the optimal merging scheme is determined for each rule set in the training rule set, and tags are added to the corresponding rule set to form a tagged training rule set.
  • Step (2): The prefix combination distribution of each rule set in the training rule set is converted into a standardized image (the target rule set is later converted in the same way), and the images are fed into the convolutional neural network for model training.
  • The training set is divided into two parts: features and labels.
  • The feature is the prefix combination distribution of the rule set, represented as the image converted from that distribution, and the label is the corresponding optimal merging scheme.
  • Step (2.1) converts the prefix distribution combination of each rule set of the training rule set into a standardized image.
  • In the hash merging method, all merging options need to be traversed, and the hash table is constructed according to the scheme with the best lookup performance.
  • An innovative point of the CNN-based data packet classification method proposed in the present invention is to turn the process of traversing all merging schemes to find the optimal one into an image recognition problem. It is therefore necessary to first establish a model that converts the prefix combination distributions of different rule sets into images.
  • In different rule sets, the combinations of source IP address and destination IP address prefix lengths differ.
  • The present invention converts the IP address prefix combination distribution of each rule set into a two-dimensional image, using one dimension (such as the x coordinate) to represent the source IP address prefix length and the other dimension (such as the y coordinate) to represent the target IP address prefix length.
  • For a fine-grained image, each pixel represents a "prefix length combination", and each pixel value in the two-dimensional image is set to the number of rules in the rule set with the corresponding "prefix length combination".
  • If the image needs to be coarse-grained, the rule prefix can be divided into ranges in each dimension according to a certain step, so that each pixel represents a "prefix range combination"; each pixel value is then set to the number of rules in the rule set with the corresponding "prefix range combination".
  • Whether the image is fine-grained or coarse-grained, the range of its pixel values depends on the number of bits used per pixel.
  • With n bits per pixel, the range of pixel values in the image is $[0, 2^n - 1]$.
  • The number of rules for each combination, however, has no such limit. The number of rules for each prefix combination in the rule set therefore needs to be standardized according to the range of image pixel values, so that it falls into the same range as the pixel values. This process is shown in Figure 3.
  • The rule prefix length in Figure 3 can be any value within the IPv4 address prefix range. Those skilled in the art may also use other image parameters to characterize the parameters of the rule set during image conversion.
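  • The conversion of a prefix combination distribution into a standardized fine-grained image can be sketched as follows. The text only states that rule counts are standardized into the pixel range, so the linear scaling by the largest count used here is an assumption, as are the function name `prefix_image` and the default of 8 bits per pixel.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def prefix_image(prefix_pairs: Iterable[Tuple[int, int]],
                 bits: int = 8, max_len: int = 32) -> List[List[int]]:
    """Build a (max_len + 1) x (max_len + 1) image in which image[y][x] encodes
    the number of rules with source prefix length x and destination prefix length y."""
    counts = Counter(prefix_pairs)
    size = max_len + 1                 # IPv4 prefix lengths run from 0 to 32
    top = (1 << bits) - 1              # largest pixel value, 2^n - 1
    peak = max(counts.values(), default=0)
    image = [[0] * size for _ in range(size)]
    for (src_len, dst_len), count in counts.items():
        # ASSUMPTION: linear scaling by the largest count, rounded down,
        # so every value stays inside [0, 2^n - 1].
        image[dst_len][src_len] = count * top // peak
    return image


# Hypothetical rule set: four /24 -> /32 rules, two /16 -> /16 rules, one default rule.
pairs = [(24, 32)] * 4 + [(16, 16)] * 2 + [(0, 0)]
img = prefix_image(pairs)
print(img[32][24], img[16][16], img[0][0])
```

  • A coarse-grained image could be obtained the same way by first mapping each prefix length to its range index before counting.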
  • the following steps (2.2) and (2.3) are required to determine how many categories the images in the training set share, and to ensure that the images in the same category are similar.
  • Step (2.2) Extract the characterization information of the image.
  • The present invention uses the following steps to obtain the fingerprint of each image and measure the similarity of image distributions: 1) The prefix combination distribution image of the rule set is scaled down by nearest neighbor interpolation. The scale factor depends on the accuracy chosen by the user: the larger the scale factor, the fewer pixels the reduced image has and the less information it contains.
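  • A fingerprint of this kind can be sketched as follows. Nearest neighbor downscaling is taken from the text; the way the bits are derived is an assumption modelled on the common difference-hash idea (one bit per pair of horizontally adjacent pixels), and the names `downscale_nearest` and `fingerprint` are hypothetical.

```python
from typing import List

Image = List[List[int]]


def downscale_nearest(image: Image, out_h: int, out_w: int) -> Image:
    """Nearest neighbor downscaling: each output pixel copies one source pixel."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def fingerprint(image: Image, side: int = 8) -> List[int]:
    """ASSUMED difference-hash style fingerprint with side * side bits:
    a bit is 1 when a pixel is larger than its right-hand neighbor."""
    small = downscale_nearest(image, side, side + 1)
    return [
        1 if row[x] > row[x + 1] else 0
        for row in small
        for x in range(side)
    ]


# A 33 x 33 test image whose values increase from left to right gives all-zero bits.
ramp = [[x for x in range(33)] for _ in range(33)]
print(sum(fingerprint(ramp)), len(fingerprint(ramp)))
```

  • A smaller `side` corresponds to a larger scale factor: the fingerprint becomes shorter and coarser, matching the accuracy trade-off described above.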
  • Step (2.3) classifies the image based on the characterization information of the image.
  • The present invention uses the Hamming distance, that is, the number of positions at which two strings of equal length differ, to count the differing bits between two fingerprints and thereby measure the similarity between different distributions.
  • If the Hamming distance is less than the threshold K, the two images are considered to be of the same type; otherwise, they are of different types.
  • The choice of the threshold K depends on how the rules change in the actual scenario and on the needs of users. A smaller K means a more precise measurement of image similarity, a system more sensitive to image changes, and higher system complexity. Conversely, a larger K means a rougher measurement of image similarity, lower sensitivity to image changes, and lower system complexity.
  • For example, suppose the Hamming distance between two images is 25. If K is defined as 20, then since 25 is greater than 20, the two images belong to different types. If K is defined as 30, then since 25 is less than 30, the two images belong to the same type.
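  • The threshold test can be written in a few lines. The sketch below assumes fingerprints are equal-length sequences of 0/1 values; the function names are hypothetical, and the two calls reproduce the example with a Hamming distance of 25 and thresholds of 20 and 30.

```python
from typing import Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length fingerprints differ."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have equal length")
    return sum(x != y for x, y in zip(a, b))


def same_category(a: Sequence[int], b: Sequence[int], k: int) -> bool:
    """Two images are of the same type when their distance is below threshold K."""
    return hamming_distance(a, b) < k


# Two 64-bit fingerprints that differ in exactly 25 positions.
fp_a = [0] * 64
fp_b = [1] * 25 + [0] * 39
print(same_category(fp_a, fp_b, 20), same_category(fp_a, fp_b, 30))
```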
  • the threshold K For the selection of the threshold K, the following example can be used to illustrate. Suppose there are six types of rules in the actual scenario, namely ACL1, ACL2, FW1, FW2, IPC1, and IPC2. Each type of rule includes 200 different rule sets. In each type of rule, the Hamming distance between the images corresponding to the two rule sets is calculated respectively, and the six box plots from left to right in Figure 12 are obtained.
  • Step (3): The prefix combination distribution of the current rule set is converted into an image and fed into the trained convolutional neural network model; the category of the current rule set is determined based on image similarity, the hash table merging scheme is determined according to that category, and the hash table is built.
  • The rule set to be processed is converted into an image in the same way as the rule sets in the training set. Within one system the granularity is fixed, and once the granularity is determined, a rule set corresponds to exactly one image.
  • the rule set After the rule set is converted into an image by using the method of the present invention, its category can be quickly recognized through the CNN network. Different from the label of the image in the traditional image classification, in the present invention, the optimal hash table merging scheme under the current distribution is selected as the label. Therefore, when the rule set is converted into the corresponding image and input into the CNN network, the corresponding hash table merging scheme can be quickly obtained, and the hash table can be quickly constructed.
  • Step (4) performs data matching based on the priority of the hash table.
  • the matching can be performed normally, or the following method of the present invention can be used for matching.
  • The present invention proposes two optimization schemes. The first sets a priority for each hash table, equal to the highest priority among the rules it contains, and then sorts the hash tables in descending order of priority. With the hash tables sorted, the search stops as soon as the priority of the hit rule is not less than the priority of the next hash table, which speeds up the lookup.
  • The second optimization scheme sorts the rules within each bucket of a hash table in descending order of priority. The first rule that matches is then the highest-priority match in the bucket, so the comparison within the bucket ends immediately, which also speeds up the lookup.
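  • Both optimizations can be sketched together as follows. This is an illustrative sketch under simplifying assumptions: a larger number means a higher priority, each table is reduced to a single bucket, and a rule is a hypothetical `(priority, match_fn)` pair rather than a real prefix match.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical rule: (priority, predicate over a packet). Larger priority wins.
Rule = Tuple[int, Callable[[dict], bool]]


class PriorityTable:
    def __init__(self, rules: List[Rule]) -> None:
        # Second optimization: rules in a bucket are kept in descending priority.
        # A single bucket stands in for a full hash table in this sketch.
        self.bucket = sorted(rules, key=lambda r: r[0], reverse=True)
        # Table priority is the highest priority among the rules it contains.
        self.priority = self.bucket[0][0] if self.bucket else -1

    def first_match(self, packet: dict) -> Optional[Rule]:
        for rule in self.bucket:
            if rule[1](packet):
                return rule  # the first hit is the best match in this bucket
        return None


def classify(tables: List[PriorityTable], packet: dict) -> Tuple[Optional[Rule], int]:
    """Return (best matching rule, number of tables actually probed)."""
    best, probed = None, 0
    # First optimization: visit tables in descending order of table priority.
    # Sorted here for clarity; a real system would keep the list sorted.
    for table in sorted(tables, key=lambda t: t.priority, reverse=True):
        if best is not None and best[0] >= table.priority:
            break  # no remaining table can contain a higher-priority rule
        probed += 1
        hit = table.first_match(packet)
        if hit is not None and (best is None or hit[0] > best[0]):
            best = hit
    return best, probed


t1 = PriorityTable([(10, lambda p: p["dst"] == 1), (7, lambda p: True)])
t2 = PriorityTable([(8, lambda p: True)])
t3 = PriorityTable([(3, lambda p: True)])
print(classify([t3, t1, t2], {"dst": 1})[1], classify([t3, t1, t2], {"dst": 2})[1])
```

  • In the example, a packet that hits the priority-10 rule is classified after probing one table, while a packet that only hits the priority-7 rule forces a second probe before the search can stop.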
  • Step (5) For the rules to be updated, the following methods are used to update the rules.
  • the update of the rule includes the insertion and deletion of the rule.
  • When a rule is inserted, the rules in the bucket are reordered in descending order of priority.
  • At the same time, the prefix combination distribution image corresponding to the rule set is updated, and the pixel value corresponding to the rule is increased.
  • When updates cause the category of the prefix combination distribution of the current hash table to differ from its category before the update, that is, when the category changes, the hash table is reconstructed.
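  • The bookkeeping for insertions and deletions can be sketched as follows. This is a simplified illustration: one bucket per prefix length combination stands in for the hash table and its buckets, the raw rule counts stand in for the (not yet standardized) pixel values, and `RuleStore` and its methods are hypothetical names.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (source prefix length, destination prefix length)


class RuleStore:
    def __init__(self) -> None:
        # Each bucket holds (-priority, rule_id) pairs; ascending order of
        # -priority is descending order of priority.
        self.buckets: Dict[Key, List[Tuple[int, str]]] = defaultdict(list)
        # Raw rule count per prefix combination, i.e. the unscaled pixel value.
        self.counts: Dict[Key, int] = defaultdict(int)

    def insert(self, key: Key, priority: int, rule_id: str) -> None:
        # insort keeps the bucket ordered, so no full re-sort is needed.
        bisect.insort(self.buckets[key], (-priority, rule_id))
        self.counts[key] += 1          # the matching pixel value increases

    def delete(self, key: Key, priority: int, rule_id: str) -> bool:
        bucket = self.buckets.get(key, [])
        try:
            bucket.remove((-priority, rule_id))
        except ValueError:
            return False               # rule not present, nothing changes
        self.counts[key] -= 1          # the matching pixel value decreases
        return True


store = RuleStore()
store.insert((24, 16), 5, "a")
store.insert((24, 16), 9, "b")
store.insert((24, 16), 7, "c")
print([rule_id for _, rule_id in store.buckets[(24, 16)]], store.counts[(24, 16)])
```

  • Because only one bucket and one pixel are touched per update, the cost of an update does not depend on the size of the whole rule set in this sketch.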
  • Step (6) Determine whether to reconstruct the hash table based on the distribution change of the prefix combination of the current rule set.
  • The distribution $D_C$ is sent to the CNN model to find the corresponding label, i.e., a new hash table merging scheme.
  • The CNN model also outputs the rule set prefix combination distribution R corresponding to the label. If the Hamming distance between R and $D_C$ is still greater than the threshold, $D_C$ is treated as a new type of distribution and passed to the offline system, and the performance model is used to compare the current merging scheme with TSS and select the better-performing one as its merging scheme. Otherwise, the distribution has become another known distribution, and the hash table only needs to be reconstructed according to the label.
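  • The reconstruction decision can be summarized in a short sketch. It assumes that distributions are compared through equal-length 0/1 fingerprints, that a distance below the threshold means "same type", and that `predict` is a hypothetical stand-in for the trained CNN model returning a label together with the fingerprint of the reference distribution R for that label.

```python
from enum import Enum
from typing import Callable, List, Sequence, Tuple


class Action(Enum):
    KEEP = "keep the current hash tables"
    REBUILD = "rebuild the hash tables according to the predicted label"
    NEW_DISTRIBUTION = "pass the distribution to the offline system"


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(x != y for x, y in zip(a, b))


def decide(fp_built: Sequence[int], fp_current: Sequence[int],
           predict: Callable[[Sequence[int]], Tuple[str, List[int]]],
           k: int) -> Action:
    """Decide what to do after the rule set behind fp_current has been updated."""
    # The distribution has not drifted away from the one the tables were built for.
    if hamming(fp_built, fp_current) < k:
        return Action.KEEP
    # Ask the (hypothetical) model for a label and its reference distribution R.
    _label, fp_reference = predict(fp_current)
    if hamming(fp_reference, fp_current) < k:
        return Action.REBUILD          # the distribution became another known type
    return Action.NEW_DISTRIBUTION     # unknown type: hand over to the offline system


# Stub model that always answers with the same label and reference fingerprint.
stub = lambda fp: ("scheme-A", [1] * 8 + [0] * 8)
built = [0] * 16
print(decide(built, [1] * 2 + [0] * 14, stub, 4).name)
print(decide(built, [1] * 8 + [0] * 8, stub, 4).name)
print(decide(built, [1] * 16, stub, 4).name)
```

  • The comparison against TSS for a new distribution type is left out of the sketch, since it depends on the performance model and on measured timings.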
  • the method of the present invention can be implemented by the classification system shown in Fig. 2 or by other systems.
  • the classification system of the present invention adopts an online update and efficient data packet search classification architecture.
  • the classification system includes two parts: an online system and an offline system.
  • the offline system includes the calculation module and the CNN offline module.
  • each rule set in the training set and/or test set (collectively, the training rule set) is merged in different ways, and the calculation module uses the performance model to evaluate the performance of the different merging schemes.
  • the optimal hash table merging scheme corresponding to each rule set in the training set and/or test set is obtained.
  • the CNN offline module uses each hash table merging scheme and the distribution characteristics of the rules in the corresponding rule set as the training set, trains the CNN model, and sends the trained model to the CNN online module.
  • the merging scheme is formed as follows: based on the prefix combination distribution of the source and destination addresses of the rules, the rules in the rule set of the target system that have a predetermined prefix value or prefix range are combined, dividing the rule set into multiple disjoint subsets. Each way of combining forms one merging scheme, and the performance model is used to determine the optimal hash table merging scheme based on the average search performance of the different schemes.
  • the online system contains three modules: CNN online module, data classification and forwarding module, and monitoring module.
  • the CNN online module uses the trained CNN model to identify the rule distribution in the rule set to be classified and determine its hash table merging scheme. Specifically, the prefix combination distribution of the source address and the destination address of the rules in the current rule set is converted into a corresponding image, the category of the current rule set is determined based on the image recognition, and the hash table merging scheme is determined.
  • the data classification and forwarding module is used to match and forward data packets.
  • the classification system of the present invention needs to update two parts when updating rules: the corresponding hash table in the data packet classification and forwarding module, and the rule set prefix combination distribution.
  • the monitoring module monitors changes in the rule set prefix combination distribution. When a change causes the category to which the prefix combination distribution belongs to change, the hash tables are rebuilt.
  • in the initialization phase, the CNN offline module in the offline system trains the CNN model using the images mapped from the rule set prefix combination distributions, with the optimal hash table merging scheme under each distribution, obtained by the calculation module, as the label.
  • after the model is trained, it is sent to the online system to serve as the model in the CNN online module. The online and offline systems then work together in the following way.
  • after the classification system receives a rule set to be processed, the online system first converts the prefix combination distribution of the original rule set into an image through the image model and passes it to the CNN online module to obtain the hash table merging scheme. Based on this merging scheme, the data packet classification and forwarding module builds the hash tables.
  • the rule search and matching are directly performed in the module, and after the corresponding rule is matched, the corresponding action is performed on the data packet according to the rule.
  • the corresponding hash table and rule set prefix combination distribution in the packet classification and forwarding module will be updated at the same time.
  • the monitoring module reads the rule set prefix combination distribution at fixed time intervals, and when it finds that the rule distribution has changed to another type, it updates the hash table merging scheme and reconstructs the hash tables. If the distribution is a new type, the monitoring module also passes it to the offline calculation module to obtain its label. The new rule prefix combination distribution image and its label are then passed to the CNN offline module in the offline system for model training. Finally, the newly trained CNN model replaces the old online CNN model.
  • the architecture proposed by the present invention ensures that the online system achieves efficient data packet search and fast rule set updates while monitoring rule set updates, so that it always reflects the latest state of the network. It achieves high-speed data packet matching and forwarding together with fast online rule updates.
  • the applicant used a test data set composed of 6 rule sets (acl1, acl2, fw1, fw2, ipc1, ipc2) generated by ClassBench and 2 real rule sets (cloud1, cloud2).
  • the data packet classification method of the present invention and various classification methods in the prior art were tested on this data set. The tests show that the method of the present invention significantly improves data packet search performance and increases both the data packet search speed and the rule update speed.
  • the data packet search performance of the method of the present invention is significantly better than that of the other algorithms.
  • the search performance of the CRP method of the present invention is 4.1 times that of PartitionSort (PS), 8.3 times that of Tuple Space Search (TSS),
  • 3.5 times that of Pruned Tuple Space Search (PR_TSS), and 4.3 times that of TupleMerge (TM).
  • the rule update speed of the present invention is 9.6 times that of PS, 1.8 times that of TSS, 2.3 times that of PR-TSS, and 5.2 times that of TM.
  • the present invention further reduces the rule storage overhead by merging the hash tables, and the comparison with other algorithms is shown in FIG. 6.
  • the memory overhead of the present invention is 36% of PS, 70% of TSS, and 63% of PR-TSS.
  • the present invention reconstructs the hash table to ensure that the system always maintains a higher data packet search rate.
  • the system performance is shown in FIG. 8.
  • the applicant integrated the method of the present invention into Open vSwitch, replacing the MegaflowCache structure (which uses the Tuple Space Search method); the throughput of the modified Open vSwitch is 10 times that of the original Open vSwitch, as shown in FIG. 9.
  • the present invention proposes a monitoring module to monitor the distribution change of the rule set prefix combination caused by the rule update, and update the hash table merging scheme in time.
  • the monitoring module first measures the Hamming distance between the current rule prefix combination distribution D_C and the prefix combination distribution D_P in effect when the hash tables were last constructed. If the Hamming distance is greater than the threshold K, the distributions D_C and D_P do not belong to the same type of distribution, so D_C is sent to the CNN model to find the corresponding label, i.e., a new hash table merging scheme.
  • in addition to the new label, the CNN model also outputs the rule set prefix combination distribution R corresponding to the label.
  • the module can quickly classify the new rule distribution, find the current optimal hash table merging scheme, and rebuild the hash tables so that system performance is always maintained at a high level.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

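A data packet classification method and system based on a convolutional neural network. The method comprises: merging each rule set in a training rule set to form multiple merging schemes, and determining the optimal merging scheme of each rule set in the training rule set based on performance evaluation; converting the prefix combination distributions of each rule set in the training rule set and of a target rule set into images, and training a convolutional neural network model with the images and the corresponding optimal merging schemes as features; and classifying the target rule set based on image similarity and building the corresponding hash tables for data packet classification. The method can improve data packet lookup performance, increase the data packet lookup speed, and increase the rule update speed. Through cooperation between an online system and an offline system, the system ensures that the online system achieves efficient data packet lookup and fast rule set updates, and it can monitor rule set updates so that it always reflects the latest state of the network.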

Description

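Data packet classification method and system based on a convolutional neural network
Technical Field
The present invention relates to the technical field of data packet lookup and classification in computer networks, and in particular to a data packet classification method based on a convolutional neural network.
Background
Packet lookup and classification processes data packets by category using a predefined or dynamically generated rule set, and is a fundamental key function in switches, routers, firewalls, load balancers, software switches on cloud platforms, and other network devices. Scenarios such as software-defined networking, network function virtualization, and cloud computing require frequent rule updates. Storing rules in data structures such as decision trees enables high-speed packet matching and forwarding, but rule updates are slow, and the packet matching and forwarding speed drops sharply while rules are being updated. Hash-based packet classification methods support fast rule updates, but their matching and forwarding speed is slow. Devices for software-defined networking, network function virtualization, and cloud computing urgently need a packet classification method that supports both high-speed packet matching and forwarding and fast online rule updates.
Existing packet classification techniques fall into three main categories: hardware-based, dimensionality-reduction-based, and space-partitioning-based.
Among hardware-based techniques, T-CAM reduces lookup time through parallel search. However, T-CAM suffers from limited storage space, high power consumption, and slow rule updates. Besides T-CAM, packet classification can also run on other hardware platforms such as GPUs and FPGAs. Running on these platforms, however, requires designs built around specific chips, hardware instructions, and programming languages, which makes implementation and deployment inconvenient.
Among dimensionality-reduction-based techniques, Cross-producting and RFC first split the multi-dimensional rules into several single-dimensional rules that are matched separately, and finally merge the results of all single-dimensional matches. The drawback is that the merging process becomes very complex when the rule set is large. In addition, when one rule is updated, the rule table of every dimension must be updated, so rule updates are slow.
In space-partitioning-based techniques, an incoming packet is not matched against the entire rule set. Instead, classification takes two steps: the first determines the rule set subspace to search, and the second matches the packet against the rules in that subspace. These methods are further divided into two classes: decision tree methods and hash-based methods.
The key idea of decision tree methods (such as HiCuts and HyperCuts) is to recursively partition the search space into multiple subspaces until the number of rules in each region falls below a threshold. The efficiency of decision trees ensures high-speed packet classification, but tree-based data structures are slow to update. In addition, some rules may need to be replicated into multiple subspaces, which increases memory overhead. EffiCuts and SmartSplit propose rule space partitioning strategies that reduce rule replication, but these methods still cannot support fast rule updates.
Hash-based methods (such as Tuple Space Search) allow fast rule updates, but their packet lookup and matching is slow. In Tuple Space Search (TSS), classifying a packet requires searching all hash tables to find the matching rules, so classification speed decreases as the number of hash tables grows. When a rule is updated, only the corresponding hash table needs to be located and the rule inserted or deleted, so TSS supports fast updates. Existing methods such as Pruned Tuple Space Search, TupleMerge, and PartitionSort improve on the lookup speed of TSS at the expense of update performance, but they still cannot support efficient packet classification and high-speed online rule updates at the same time.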
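Summary of the Invention
In short, none of the existing packet classification methods balances search speed, update speed, and accuracy, so none achieves both efficient packet classification and high-speed online rule updates.
In view of the difficulty of the above problem and the shortcomings of current techniques, the present invention aims to provide a packet classification method with fast lookup, fast rule updates, and high lookup accuracy, and accordingly proposes a packet classification method based on a convolutional neural network. The invention classifies rules by combinations of rule address prefix ranges to form multiple merging schemes, selects among the merging schemes using a performance model, converts the rule model into an image model, and uses a convolutional neural network to quickly partition a rule set and build hash tables, thereby achieving efficient packet classification and lookup together with high-speed online rule updates.
To achieve the above object, in one aspect, the present invention provides a packet classification method based on a convolutional neural network, characterized in that the method comprises the following steps:
Step (1): for each rule set in a training rule set, merge the rules according to different prefix range combinations of the source and destination addresses of the rules in the rule set, forming multiple merging schemes, and determine the optimal merging scheme of each rule set in the training rule set based on performance evaluation;
Step (2): convert the prefix combination distributions of each rule set in the training rule set and of the target rule set into images, using parameters of the image to represent parameters of the corresponding prefix combination distribution, and train a convolutional neural network model with the images of the training rule set and the corresponding optimal merging schemes as features;
Step (3): input the target image converted from the target rule set into the convolutional neural network model, determine the merging scheme of the target rule set based on the degree of match between the target image and the images in the convolutional neural network model, and build the corresponding hash tables for packet classification.
In a preferred implementation, the pixel coordinates of the converted image represent a combination of source and destination address prefix lengths, or length ranges, of the rules in the corresponding rule set, and the pixel value represents the number of rules in that rule set corresponding to that combination.
In another preferred implementation, training the convolutional neural network model in step (2) includes classifying the prefix combination distributions of the rule sets based on similarity between images, and determining a corresponding merging scheme for each category of prefix combination distribution.
In another preferred implementation, the method includes computing difference information between the pixels of the image corresponding to each prefix combination distribution as the fingerprint of that image, computing the difference between each image's fingerprint and the fingerprint of a reference image, and determining the category of the rule set corresponding to each image by comparing that difference with a predetermined threshold.
In another preferred implementation, the method further includes updating rules of the target rule set, where a rule update includes determining the corresponding hash table based on the prefix combination length of the updated rule, updating the rule in the corresponding hash bucket, and updating the value of the pixel corresponding to the updated rule in the image of the prefix combination distribution of the target rule set.
In another preferred implementation, the method further includes monitoring the Hamming distance between the prefix combination distributions of the target rule set before and after an update, and determining based on that Hamming distance whether to reconstruct the hash tables.
In another preferred implementation, the method further includes setting a priority for each hash table, equal to the highest priority of the rules the table contains, and sorting all hash tables; during packet matching, the search stops when the priority of the matched rule is not less than the priority of the next hash table.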
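In another preferred implementation, the performance evaluation uses the formula
Figure PCTCN2019128935-appb-000001
where
Figure PCTCN2019128935-appb-000002
denotes the average hashing time,
Figure PCTCN2019128935-appb-000003
denotes the average verification time, m is the number of hash tables, n_i is the number of rules in the i-th hash table, s_i is the size of the i-th hash table, and
Figure PCTCN2019128935-appb-000004
is the priority comparison time.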
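In another aspect, the present invention provides a packet classification system based on a convolutional neural network, characterized in that the system comprises an offline system and an online system,
the offline system comprises a calculation module and a convolutional neural network offline training module, and the online system comprises a packet classification and forwarding module and a convolutional neural network online module,
the calculation module is configured to merge the rules of each rule set in the training rule set according to different prefix range combinations of the rules' source and destination addresses, evaluate the performance of the different merging schemes to determine the optimal merging scheme of each rule set, and convert each prefix combination distribution of each rule set in the training rule set into an image, using parameters of the image to represent parameters of the corresponding prefix combination distribution;
the convolutional neural network offline training module trains the convolutional neural network model on the training rule set, with the images of the training rule set and the corresponding optimal merging schemes as features;
the convolutional neural network online module is configured to convert the prefix combination distribution of the target rule set into an image, using parameters of the image to represent parameters of the corresponding prefix combination distribution, and to determine the merging scheme of the target rule set using the trained convolutional neural network model;
the packet classification and forwarding module is configured to build the corresponding hash tables based on the merging scheme and to classify packets based on those hash tables.
In another preferred implementation, the convolutional neural network offline training module classifies the prefix combination distributions of the rule sets based on similarity between images and determines a corresponding merging scheme for each category of prefix combination distribution.
In another preferred implementation, the system further includes a monitoring module that reads the prefix combination distribution of the target rule set, determines its category, and decides based on changes in the category whether to reconstruct the hash tables.
In another preferred implementation, when a rule is updated, the packet classification and forwarding module determines the corresponding hash table based on the prefix combination of the updated rule and updates the rule in the corresponding hash bucket, and the convolutional neural network online module updates the value of the pixel corresponding to the updated rule in the image of the prefix combination distribution of the target rule set.
In another aspect, the present invention provides a computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the above method.
A computer device comprises a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor implements the above method when executing the program.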
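Technical Effects
The method of the present invention can significantly improve packet lookup performance, increasing both the packet lookup speed and the rule update speed.
Through cooperation between the online system and the offline system, the system of the present invention ensures that the online system achieves efficient packet lookup and fast rule set updates and, preferably, monitors rule set updates so that it always reflects the latest state of the network.
As verified in the preferred embodiment, the packet lookup performance of the method of the present invention (CRP) is 4.1 times that of PartitionSort (PS), 8.3 times that of Tuple Space Search (TSS), 3.5 times that of Pruned Tuple Space Search (PR_TSS), and 4.3 times that of TupleMerge (TM).
The update speed of the method is 9.6 times that of PS, 1.8 times that of TSS, 2.3 times that of PR-TSS, and 5.2 times that of TM.
By merging hash tables, the method also reduces rule storage overhead: its memory overhead is 36% of that of PS, 70% of that of TSS, and 63% of that of PR-TSS.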
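Brief Description of the Drawings
The following drawings only illustrate and explain the present invention schematically and do not limit its scope:
FIG. 1 is a schematic flowchart of the packet classification method based on a convolutional neural network in an embodiment of the present invention.
FIG. 2 is a schematic structural diagram of the packet classification system based on a convolutional neural network in an embodiment of the present invention.
FIG. 3 is a schematic diagram of the normalization and conversion of a rule set using the method of the invention.
FIG. 4 compares the lookup performance of the method of the invention with existing methods in actual tests when the rule set is not updated.
FIG. 5 compares the rule update time of the method of the invention with existing methods in actual tests.
FIG. 6 compares the rule set memory overhead of the method of the invention with existing methods in actual tests.
FIG. 7 compares the packet lookup rate of the method of the invention with existing methods in actual tests under different rule set update rates.
FIG. 8 compares system performance with and without the monitoring module in actual tests.
FIG. 9 compares the performance of native Open vSwitch (OVS) with that of OVS after the method of the invention is introduced.
FIG. 10 is a schematic diagram of image fingerprint extraction and similarity comparison using the method of the invention.
FIG. 11 is a schematic flowchart of the hash table rebuild decision in the system of the invention.
FIG. 12 is a box plot of the Hamming distances computed between rule sets.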
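Detailed Description
To make the objects, technical solutions, design methods, and advantages of the present invention clearer, the invention is described in further detail below through specific embodiments with reference to the drawings. It should be understood that the specific embodiments described here only explain the invention and do not limit it.
Packet classification is one of the most critical operations in network devices, and its classification speed and rule update speed are crucial to overall system performance. Existing techniques, however, cannot support fast packet classification and efficient online rule updates at the same time. To solve this problem, the present invention proposes a packet classification system and method based on a convolutional neural network (CNN) that support fast online updates and efficient packet lookup.
The process of building hash tables and classifying packets according to the invention is described in detail below with reference to FIG. 1.
Step (1): merge each rule set in the training rule set, build a performance model to estimate the performance of the various merging schemes, and determine the optimal merging scheme of each rule set in the training rule set.
Step (1.1): first perform hash merging for packet classification, forming different merging schemes.
Through in-depth study of packet classification, the inventors found that classification speed can be increased by reducing the number of hash tables corresponding to a rule set, but that at the same time the matching overhead caused by rules falling into the same hash table bucket must be avoided. The hash merging method proposed by the invention therefore reduces the number of hash tables while keeping the rule collision rate low, which increases the packet lookup speed.
Based on the rule prefix combination distribution, the invention classifies rules by "prefix range combination", that is, by the combination of the prefix ranges of a rule's source and destination addresses, dividing the rule set into multiple disjoint subsets to reduce the number of hash tables.
For example, the rule set in Table 1 can be merged into four hash tables, shown in Table 2, whose prefix range combinations are ([0,3),[0,4)), ([0,3),[4,6)), ([3,6),[0,4)), and ([3,6),[4,6)). Rule 0 in Table 1 maps to ([3,6),[4,6)) in Table 2, meaning that its source and destination address prefix lengths fall within [3,6) and [4,6), respectively.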
Rule No.  Source address  Destination address  Priority  Action
0         100*            11010                2         Fwd 0
1         101*            1001*                2         Fwd 1
2         11111           10000                3         Drop
3         111*            1000*                2         Fwd 4
4         0100*           0110*                2         Fwd 0
5         001*            01001                3         Fwd 2
6         00*             01001                2         Drop
7         01110           *                    4         Drop
8         110*            1*                   1         Fwd 1
9         *               *                    0         Fwd 3
Table 1: Original rule set
Merged hash table No.  Prefix range combination  Rule Nos.
0                      ([3,6),[4,6))             0,1,2,3,4,5
1                      ([3,6),[0,4))             7,8
2                      ([0,3),[4,6))             6
3                      ([0,3),[0,4))             9
Table 2: Merged rule set
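The mapping from Table 1 to Table 2 can be reproduced with a short sketch. It assumes that a rule's prefix length is the number of fixed bits before the wildcard and that the range boundaries are passed in explicitly; the names `prefix_len`, `find_range`, and `merge` are illustrative and do not come from the source.

```python
from collections import defaultdict

# Rules from Table 1 as (rule id, source prefix, destination prefix).
RULES = [
    (0, "100*", "11010"), (1, "101*", "1001*"), (2, "11111", "10000"),
    (3, "111*", "1000*"), (4, "0100*", "0110*"), (5, "001*", "01001"),
    (6, "00*", "01001"), (7, "01110", "*"), (8, "110*", "1*"), (9, "*", "*"),
]

def prefix_len(pattern):
    """Number of fixed bits before the wildcard '*'."""
    return len(pattern.rstrip("*"))

def find_range(length, bounds):
    """Return the half-open range [lo, hi) between consecutive bounds that holds length."""
    for lo, hi in zip(bounds, bounds[1:]):
        if lo <= length < hi:
            return (lo, hi)
    raise ValueError(f"prefix length {length} outside {bounds}")

def merge(rules, src_bounds, dst_bounds):
    """Group rule ids by (source range, destination range): one group per hash table."""
    tables = defaultdict(list)
    for rule_id, src, dst in rules:
        key = (find_range(prefix_len(src), src_bounds),
               find_range(prefix_len(dst), dst_bounds))
        tables[key].append(rule_id)
    return dict(tables)

# Range boundaries of the merging scheme in Table 2 (toy 5-bit addresses).
TABLES = merge(RULES, src_bounds=[0, 3, 6], dst_bounds=[0, 4, 6])
for key, ids in sorted(TABLES.items(), reverse=True):
    print(key, ids)
```

Changing `src_bounds` or `dst_bounds` yields a different merging scheme over the same rules, which is the space of candidates the performance model later ranks.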
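Step (1.2): use the performance model to evaluate performance during packet classification.
According to an embodiment of the invention, a performance model is proposed that estimates packet classification performance by analyzing the time cost of each step in hash-based packet classification.
Hash-based packet classification generally comprises three steps: 1) search all hash tables in the target system to determine whether any rule in a hash table matches the packet; 2) verify that the rule truly matches the data in the packet (the corresponding header fields), to avoid false positives caused by hash collisions or rule overlap; 3) if multiple rules match, select the rule with the highest priority as the final classification result. The hash-based classification time therefore consists of these three parts. The hashing time depends on the number of hash tables. The verification time depends on the number of hash collisions or overlapping rules. Given a hash function, the probability of matching an entry in a hash table is proportional to the utilization of that hash table, which can be defined as the number of entries in the hash table divided by the size of its solution space:
\[ r_i = \frac{e_i}{s_i} \tag{1} \]
where r_i is the hit rate of the i-th hash table, e_i is its number of entries, and s_i is its size. After a rule in a hash table is hit, further verification is required to avoid false positives due to hash collisions or rule overlap. After a matching entry is found in a hash table, the verification time is the time spent weighted by the average overlap rate, which is defined as the number of rules in the hash table divided by the number of entries:
\[ o_i = \frac{n_i}{e_i} \tag{2} \]
where o_i is the average overlap rate of the i-th hash table and n_i is the number of rules in the i-th hash table. The time T to classify one packet is therefore: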
Figure PCTCN2019128935-appb-000007
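where m is the number of hash tables, h_i is the hash computation time of the i-th hash table, c_i is the verification time after the i-th hash table is hit, and
Figure PCTCN2019128935-appb-000008
is the time for rule priority comparison. Substituting formulas (1) and (2) into formula (3) gives:
Figure PCTCN2019128935-appb-000009
which further simplifies to:
Figure PCTCN2019128935-appb-000010
where
Figure PCTCN2019128935-appb-000011
denotes the average hashing time and
Figure PCTCN2019128935-appb-000012
denotes the average verification time.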
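The formula images for equations (3) to (5) did not survive extraction. One reading that is consistent with the definitions in the text (hit rate r_i as entries over size, overlap rate o_i as rules over entries) is sketched below. The symbols T_p for the priority comparison time and \bar{h}, \bar{c} for the average hashing and verification times are assumptions, and the last step treats every c_i as approximately equal to \bar{c}.

```latex
\begin{align}
T &= \sum_{i=1}^{m} \left( h_i + r_i\, o_i\, c_i \right) + T_p \tag{3}\\
  &= \sum_{i=1}^{m} \left( h_i + \frac{e_i}{s_i}\cdot\frac{n_i}{e_i}\, c_i \right) + T_p
   = \sum_{i=1}^{m} \left( h_i + \frac{n_i}{s_i}\, c_i \right) + T_p \tag{4}\\
  &\approx m\,\bar{h} + \bar{c} \sum_{i=1}^{m} \frac{n_i}{s_i} + T_p \tag{5}
\end{align}
```

Under this reading, fewer hash tables (smaller m) lower the hashing term, while denser tables (larger n_i / s_i) raise the verification term, which is the trade-off the merging schemes have to balance.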
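This performance model shows which factors affect packet classification performance and how to increase the matching speed of hash-based packet classification. It ensures that the merging scheme for each category of rule set in the method is reasonable and efficient.
Step (1.3): using the performance model, determine the hash table merging scheme based on the average lookup performance of the different merging schemes, and build the hash tables.
When hash tables are merged by reducing prefix lengths, a large number of rules may fall into the same bucket and cause hash collisions, and verifying those collisions adds matching overhead. The inventors therefore use the above performance model to compare the average lookup performance of different merging schemes and find the optimal one, which minimizes the collision probability.
In this way, the optimal merging scheme is determined for each rule set in the training rule set and attached to it as a label, forming a labeled training rule set.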
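The selection of an optimal scheme can be sketched as a cost comparison. The sketch assumes the simplified model takes the form m * h_bar + c_bar * sum(n_i / s_i) + t_priority, which is a reading of the definitions in the text rather than a formula recovered from the source; the candidate schemes and their table sizes are hypothetical numbers chosen only to show the trade-off.

```python
def classification_time(tables, h_bar=1.0, c_bar=1.0, t_priority=0.0):
    """Estimated per-packet time: m * h_bar + c_bar * sum(n_i / s_i) + t_priority.

    tables is a list of (n_i, s_i): rules stored in, and size of, each hash table.
    """
    m = len(tables)
    return m * h_bar + c_bar * sum(n / s for n, s in tables) + t_priority

def best_scheme(schemes, **costs):
    """Return the name of the scheme with the lowest estimated time."""
    return min(schemes, key=lambda name: classification_time(schemes[name], **costs))

# Hypothetical candidates for one rule set of 600 rules. Merging tables shortens
# the hashed prefixes, so the merged tables have a smaller solution space s_i.
SCHEMES = {
    "no_merge": [(100, 1024)] * 6,    # many tables, few collisions
    "merge_pairs": [(200, 512)] * 3,  # fewer tables, moderate collisions
    "merge_all": [(600, 64)],         # one table, heavy collisions
}
BEST = best_scheme(SCHEMES, h_bar=1.0, c_bar=1.0)
print(BEST, classification_time(SCHEMES[BEST]))
```

With these numbers the middle scheme wins: merging everything saves hashing time but pays more in verification, which is the effect the paragraph above describes.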
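Step (2): convert the prefix combination distribution of each rule set in the training rule set into a normalized image (the target rule set is later converted in the same way) and feed the images into the convolutional neural network model for training. The training set has two parts: features and labels. In the method of the invention, the feature is the prefix combination distribution of a rule set, represented as the image converted from that distribution, and the label is the corresponding optimal merging scheme.
Step (2.1): convert the prefix combination distribution of each rule set in the training rule set into a normalized image.
In hash merging, all merging options must be enumerated and the hash tables built according to the scheme with the best lookup performance. One innovation of the proposed CNN-based packet classification method is that it turns this search over all merging schemes for an optimum into an image recognition problem, which first requires a model that converts the prefix combination distributions of different rule sets into images.
The combinations of source and destination IP address prefix lengths differ across five-tuple rule sets. To capture the characteristics of a rule set, the invention converts its IP address prefix combination distribution into a two-dimensional image, with one dimension (for example, the x coordinate) representing the source IP address prefix length and the other (for example, the y coordinate) representing the destination IP address prefix length. For a fine-grained image, each pixel represents one "prefix length combination", and its value is set to the number of rules in the rule set with that combination. For a coarse-grained image, the prefix lengths in each dimension are divided into ranges with a given step size, each pixel represents one "prefix range combination", and its value is set to the number of rules in the rule set with that combination. In either case, the range of pixel values depends on the number of bits per pixel: with n bits per pixel, pixel values lie in [0, 2^n - 1]. The number of rules per combination in a rule set, by contrast, is unbounded, so the rule counts must be normalized to fall within the pixel value range. This process is shown in FIG. 3, where rule prefix lengths may take any value within the IPv4 address prefix range. In practice, those skilled in the art may also use other image parameters to represent the parameters of the rule set when converting it into an image.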
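A minimal sketch of this conversion follows. The source only requires that counts be normalized into the pixel range; the linear scaling by the largest count used here, and the names `distribution_image`, `step`, and `bits`, are assumptions made for illustration.

```python
def distribution_image(rules, max_len=32, step=1, bits=8):
    """Build a 2D image of rule counts per (source, destination) prefix-length cell.

    rules: iterable of (src_prefix_len, dst_prefix_len) pairs.
    step=1 gives the fine-grained image; step>1 buckets lengths into ranges.
    Counts are scaled linearly so the fullest cell maps to 2**bits - 1
    (an assumed normalization, not one specified by the source).
    """
    size = max_len // step + 1
    counts = [[0] * size for _ in range(size)]
    for src_len, dst_len in rules:
        counts[dst_len // step][src_len // step] += 1  # row = y (dst), column = x (src)
    peak = max(max(row) for row in counts)
    if peak == 0:
        return counts
    top = 2 ** bits - 1
    return [[round(c * top / peak) for c in row] for row in counts]

# Prefix-length pairs of the ten rules in Table 1 (toy 5-bit addresses).
PAIRS = [(3, 5), (3, 4), (5, 5), (3, 4), (4, 4),
         (3, 5), (2, 5), (5, 0), (3, 1), (0, 0)]
IMG = distribution_image(PAIRS, max_len=5, step=1, bits=8)
for row in IMG:
    print(row)
```

For IPv4 rules the defaults give a 33 x 33 fine-grained image; a larger `step` gives the coarse-grained variant with one pixel per prefix range combination.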
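Both convolutional neural network model training (in the offline system described later) and rule set update monitoring during operation (in the online system described later) classify rule sets by recognizing images. Steps (2.2) and (2.3) below describe the image processing, which can be used for offline training, online classification, and online monitoring of rule set updates.
Specifically, during convolutional neural network model training (in the offline system described later), steps (2.2) and (2.3) below are needed to determine how many categories the training images fall into and to ensure that images within one category are similar.
During operation, classifying packets (in the online system described later) requires converting the rule set into an image and feeding it to the neural network to find the corresponding merging scheme. This process does not need steps (2.2) and (2.3): the neural network determines the category directly, whereas steps (2.2) and (2.3) compare two images. Rule monitoring, which must detect changes in the image to decide whether to reconstruct the hash tables, does execute steps (2.2) and (2.3).
Step (2.2): extract the characterization information of the image.
Providing a dedicated hash table construction scheme for every rule set is not feasible, because the number of distinct rule sets in practice is huge. Since the performance of a hash table construction scheme depends mainly on the rule set distribution, the inventors propose classifying rule set prefix combination distributions and using the same scheme for distributions of the same category. The invention therefore obtains a fingerprint of each image to measure the similarity of image distributions, using the following steps: 1) scale down the rule set prefix combination distribution image by nearest-neighbor interpolation (the ratio depends on the precision chosen by the user: the larger the ratio, the fewer pixels the scaled image has and the less information it retains; the smaller the ratio, the more pixels and the more information); 2) compare each pair of adjacent pixels, converting each row of pixels into differences; 3) encode the differences in order: within a row, if the pixel value P[x] in column x is less than the pixel value P[x+1] in column x+1, that is, P[x] < P[x+1], the difference is set to "1", otherwise to "0". The resulting string is the fingerprint of the image (also called its characterization information).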
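The three fingerprint steps can be sketched as follows. The default output size of 8 rows by 9 columns (giving a 64-bit string) is an assumption borrowed from common difference-hash practice, not a value stated in the source, and the function names are illustrative.

```python
def downscale(image, out_h, out_w):
    """Step 1: nearest-neighbor downscale of a 2D list to out_h x out_w."""
    in_h, in_w = len(image), len(image[0])
    return [[image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]

def fingerprint(image, out_h=8, out_w=9):
    """Steps 2 and 3: emit '1' where P[x] < P[x+1] along each row, else '0'."""
    small = downscale(image, out_h, out_w)
    bits = []
    for row in small:
        for left, right in zip(row, row[1:]):
            bits.append("1" if left < right else "0")
    return "".join(bits)

# Tiny worked example, kept at its original size so every comparison is visible.
IMAGE = [[0, 10, 5, 5],
         [7, 3, 9, 9]]
FP = fingerprint(IMAGE, out_h=2, out_w=4)
print(FP)  # row 0 gives "100", row 1 gives "010"
```

Each row of width w contributes w - 1 bits, so two images can only be compared when they are reduced to the same output size.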
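Step (2.3): classify images based on their characterization information.
Preferably, the invention uses the Hamming distance, i.e., the number of positions at which two equal-length strings differ, to count the differing bits between two fingerprints and thus measure the similarity between distributions. If the Hamming distance is less than the threshold K, the two images are considered to be of the same type; otherwise, they are of different types. The choice of K depends on how the rules change in the actual scenario and on the user's needs. A smaller K means a more precise similarity measure, higher sensitivity to image changes, and higher system complexity. Conversely, a larger K means a coarser similarity measure, lower sensitivity to image changes, and lower system complexity.
As shown in FIG. 10, the Hamming distance between the fingerprints of the two images is 25. If K is defined as 20, the two images are of different types because 25 is greater than 20. If K is defined as 30, they are of the same type because 25 is less than 30. The choice of K can be illustrated with the following example. Suppose a scenario has six classes of rules, ACL1, ACL2, FW1, FW2, IPC1, and IPC2, each comprising 200 different rule sets. Within each class, the Hamming distance between the images of every pair of rule sets is computed, giving the six box plots from left to right in FIG. 12. For each rule set, the Hamming distances to the rule sets in the other classes are also computed, giving the last box plot in FIG. 12. FIG. 12 shows that the maximum Hamming distance in the first six box plots is below 15, while the minimum Hamming distance in the last box plot is above 10. To keep the Hamming distance between rule sets of the same category as small as possible and that between rule sets of different categories as large as possible, 15 can be chosen as the threshold K for distinguishing categories of rule sets.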
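The threshold test can be sketched in a few lines. The two fingerprints below are made up so that they differ in exactly 25 positions, mirroring the distance quoted for FIG. 10; they are not the fingerprints from the figure.

```python
def hamming(fp_a, fp_b):
    """Number of positions at which two equal-length fingerprints differ."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have equal length")
    return sum(a != b for a, b in zip(fp_a, fp_b))

def same_type(fp_a, fp_b, k):
    """Two distributions are of the same type when their distance is below K."""
    return hamming(fp_a, fp_b) < k

# Hypothetical 64-bit fingerprints that differ in exactly 25 positions.
FP_A = "0" * 64
FP_B = "1" * 25 + "0" * 39

print(hamming(FP_A, FP_B))        # 25
print(same_type(FP_A, FP_B, 20))  # False: 25 is not below 20
print(same_type(FP_A, FP_B, 30))  # True: 25 is below 30
```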
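Step (3): convert the prefix combination distribution of the current rule set into an image, feed it into the trained convolutional neural network model, determine the category of the current rule set based on image similarity, determine the hash table merging scheme from that category, and build the hash tables. In operation, the rule set to be processed is converted into an image in the same way as the training set, i.e., the granularity is fixed within one system. Once the granularity is fixed, a rule set corresponds to exactly one image.
Once a rule set has been converted into an image by the method of the invention, its category can be identified quickly by the CNN. Unlike image labels in conventional image classification, the invention uses the optimal hash table merging scheme under the current distribution as the label. When a rule set is converted into its image and fed into the CNN, the corresponding hash table merging scheme is therefore obtained quickly and the hash tables can be built promptly.
Step (4): match packets based on hash table priority.
After the hash tables are built, packets can be matched in the ordinary way or using the following method of the invention.
Specifically, merging hash tables as described above increases lookup speed, but all current hash tables must still be searched in turn. To speed up packet matching and lookup, the invention proposes two optimizations. The first sets a priority for each hash table, equal to the highest priority of the rules the table contains, and sorts the hash tables in descending order of priority. With the tables sorted, the search stops as soon as the priority of the matched rule is not less than the priority of the next hash table, which speeds up the search. The second sorts the rules within each bucket of a hash table in descending order of priority, so the first rule matched is the highest-priority match in that bucket and the comparison within the bucket ends there, which also speeds up the search.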
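Both optimizations can be sketched on the rules of Table 1. The sketch assumes that a larger number means a higher priority, that each merged table hashes on the shortest prefix lengths of its range (3 and 4 bits for the range combination ([3,6),[4,6)), for example), and that rules and packets are bit strings; the data layout and function names are illustrative.

```python
def matches(rule, packet):
    """Verify a rule (priority, src, dst, action) against a (src, dst) packet."""
    _, src, dst, _ = rule
    return (packet[0].startswith(src.rstrip("*")) and
            packet[1].startswith(dst.rstrip("*")))

def build_table(src_len, dst_len, rules):
    """Hash rules on their first src_len / dst_len bits; sort each bucket by priority."""
    buckets = {}
    for rule in rules:
        buckets.setdefault((rule[1][:src_len], rule[2][:dst_len]), []).append(rule)
    for bucket in buckets.values():
        bucket.sort(key=lambda r: r[0], reverse=True)      # optimization 2
    return {"src_len": src_len, "dst_len": dst_len, "buckets": buckets,
            "priority": max(r[0] for r in rules)}

def classify(tables, packet):
    """tables must already be sorted by descending table priority (optimization 1)."""
    best = None
    for table in tables:
        if best is not None and best[0] >= table["priority"]:
            break                                          # no later table can win
        key = (packet[0][:table["src_len"]], packet[1][:table["dst_len"]])
        for rule in table["buckets"].get(key, []):
            if matches(rule, packet):
                if best is None or rule[0] > best[0]:
                    best = rule
                break          # first match in a sorted bucket is the bucket's best
    return best

# The four merged tables of Table 2, with rules as (priority, src, dst, action).
TABLES = sorted([
    build_table(3, 4, [(2, "100*", "11010", "Fwd 0"), (2, "101*", "1001*", "Fwd 1"),
                       (3, "11111", "10000", "Drop"), (2, "111*", "1000*", "Fwd 4"),
                       (2, "0100*", "0110*", "Fwd 0"), (3, "001*", "01001", "Fwd 2")]),
    build_table(3, 0, [(4, "01110", "*", "Drop"), (1, "110*", "1*", "Fwd 1")]),
    build_table(0, 4, [(2, "00*", "01001", "Drop")]),
    build_table(0, 0, [(0, "*", "*", "Fwd 3")]),
], key=lambda t: t["priority"], reverse=True)

print(classify(TABLES, ("00101", "01001")))  # rule 5 wins over rules 6 and 9
```

In the example packet, the search stops after the second table: the match found there has priority 3, which is not less than the priority 2 of the next table.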
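Step (5): rules to be updated are updated as follows.
Once the hash table merging scheme has been determined, rule updates consist of rule insertions and deletions. To insert a new rule, the invention first finds the corresponding hash table from the rule's prefix combination length (each hash value in a hash table corresponds to a hash bucket, which holds the rules with that hash value), inserts the rule into the corresponding bucket of that table, and re-sorts the rules in the bucket in descending order of priority. At the same time, the prefix combination distribution image of the rule set is updated by increasing the value of the pixel corresponding to the rule. To delete a rule, the corresponding hash table is likewise located first and the rule removed, and the image is updated by decreasing the value of the pixel corresponding to the rule. If the hash table becomes empty after the deletion, the hash table is deleted as well.
When hash table updates cause the category of the current prefix combination distribution to differ from the category at the previous update, i.e., when the category changes, the hash tables are reconstructed.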
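Insertion and deletion can be sketched with a small container that keeps the hash tables and the distribution counts in step. The class name `RuleStore`, the `table_key` callback that maps a rule's prefix lengths to the hashed lengths of its merged table, and the choice to store raw counts (normalizing to pixel values separately) are all assumptions for illustration.

```python
def prefix_len(pattern):
    """Number of fixed bits before the wildcard '*'."""
    return len(pattern.rstrip("*"))

class RuleStore:
    """Hash tables plus the raw prefix-combination counts behind the image."""

    def __init__(self, table_key, max_len=32):
        self.table_key = table_key   # (src_len, dst_len) -> hashed (src_len, dst_len)
        self.tables = {}             # hashed lengths -> {bucket key -> sorted rules}
        self.counts = [[0] * (max_len + 1) for _ in range(max_len + 1)]

    def _locate(self, rule):
        _, src, dst, _ = rule
        s, d = prefix_len(src), prefix_len(dst)
        ts, td = self.table_key(s, d)
        return (ts, td), (src[:ts], dst[:td]), (s, d)

    def insert(self, rule):
        table_id, bucket_key, (s, d) = self._locate(rule)
        bucket = self.tables.setdefault(table_id, {}).setdefault(bucket_key, [])
        bucket.append(rule)
        bucket.sort(key=lambda r: r[0], reverse=True)  # keep priority-descending order
        self.counts[d][s] += 1                         # raise the rule's pixel

    def delete(self, rule):
        table_id, bucket_key, (s, d) = self._locate(rule)
        table = self.tables[table_id]
        table[bucket_key].remove(rule)
        if not table[bucket_key]:
            del table[bucket_key]
        if not table:
            del self.tables[table_id]                  # drop a table that became empty
        self.counts[d][s] -= 1                         # lower the rule's pixel

# Merging scheme of Table 2: source ranges [0,3),[3,6); destination ranges [0,4),[4,6).
STORE = RuleStore(lambda s, d: (0 if s < 3 else 3, 0 if d < 4 else 4), max_len=5)
RULE_2 = (3, "11111", "10000", "Drop")
RULE_3 = (2, "111*", "1000*", "Fwd 4")
STORE.insert(RULE_3)
STORE.insert(RULE_2)
print(STORE.tables[(3, 4)][("111", "1000")])  # RULE_2 first: higher priority
```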
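Step (6): determine whether to reconstruct the hash tables based on changes in the prefix combination distribution of the current rule set.
Specifically, first measure the Hamming distance between the current rule prefix combination distribution D_C and the prefix combination distribution D_P in effect when the hash tables were last built. If this Hamming distance is greater than the threshold K, D_C and D_P are not of the same distribution type, so D_C is sent to the CNN model to find the corresponding label, i.e., the new hash table merging scheme. Besides the new label, the CNN model also outputs the rule set prefix combination distribution R corresponding to that label. If the Hamming distance between D_C and R is still greater than the threshold, D_C is treated as a new distribution type and passed to the offline system, and the performance model is used to compare the current merging scheme with TSS, the better-performing one being selected as the merging scheme. Otherwise, the distribution has changed into another known distribution, and the hash tables only need to be reconstructed according to the label.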
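The decision flow can be sketched as a single function. Distributions are represented here by their fingerprints, since that is what the Hamming distance is defined on, and `cnn_predict` is a hypothetical stand-in for the trained CNN that returns a label together with the reference fingerprint R for that label; the nearest-neighbor stub below only imitates that interface.

```python
def hamming(a, b):
    """Number of differing positions between two equal-length fingerprints."""
    return sum(x != y for x, y in zip(a, b))

def rebuild_decision(d_c, d_p, k, cnn_predict):
    """Decide what to do after an update changed the distribution from d_p to d_c.

    Returns ("keep", None), ("rebuild", label) or ("new_type", label).
    """
    if hamming(d_c, d_p) <= k:
        return ("keep", None)          # still the same distribution type
    label, reference = cnn_predict(d_c)
    if hamming(d_c, reference) > k:
        # Unknown distribution: hand d_c to the offline system, which uses the
        # performance model to choose between the candidate scheme and TSS.
        return ("new_type", label)
    return ("rebuild", label)          # another known type: rebuild from its label

# Hypothetical known distribution types and a stub in place of the CNN.
KNOWN = {"A": "0000000000", "B": "1111100000"}

def nearest_known(fp):
    label = min(KNOWN, key=lambda name: hamming(fp, KNOWN[name]))
    return label, KNOWN[label]

D_P = KNOWN["A"]
print(rebuild_decision("0100000000", D_P, 2, nearest_known))  # small drift
print(rebuild_decision("1111000000", D_P, 2, nearest_known))  # moved to type B
print(rebuild_decision("0000011111", D_P, 2, nearest_known))  # matches nothing known
```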
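The method of the invention can be implemented by the classification system shown in FIG. 2 or by other systems.
In summary, the classification system of the invention adopts an architecture for online updates and efficient packet lookup and classification. As shown in FIG. 2, the classification system comprises an online system and an offline system.
The offline system comprises a calculation module and a CNN offline module. In the offline system, each rule set in the training set and/or test set (collectively, the training rule set) is merged in different ways, the calculation module uses the performance model to evaluate the different merging schemes and obtains the optimal hash table merging scheme for each rule set, and the CNN offline module trains the CNN model using each hash table merging scheme and the distribution characteristics of the rules in the corresponding rule set as the training set and sends the trained model to the CNN online module.
Preferably, merging schemes are formed as follows: based on the prefix combination distribution of the rules' source and destination addresses, rules in the target system's rule set with a predetermined prefix value or prefix range are combined, dividing the rule set into multiple disjoint subsets; each way of combining forms one merging scheme, and the performance model is used to determine the optimal hash table merging scheme based on the average lookup performance of the different schemes.
The online system contains three modules: the CNN online module, the packet classification and forwarding module, and the monitoring module.
The CNN online module uses the trained CNN model to recognize the rule distribution of the rule set to be classified and determine its hash table merging scheme. Specifically, the prefix combination distribution of the source and destination addresses of the rules in the current rule set is converted into the corresponding image, the category of the current rule set is determined by image recognition, and the hash table merging scheme is determined.
The packet classification and forwarding module matches and forwards packets. When rules are updated, the classification system must update two things: the corresponding hash table in the packet classification and forwarding module, and the rule set prefix combination distribution. The monitoring module monitors changes in the rule set prefix combination distribution and rebuilds the hash tables when a change causes the category of the distribution to change. Through cooperation between the online and offline systems, the classification system ensures efficient packet lookup and fast rule set updates in the online system while monitoring rule set updates, so that it always reflects the latest state of the network.
The working process of the classification system is briefly described below.
In the initialization phase, the CNN offline module in the offline system trains the CNN model using the images mapped from rule set prefix combination distributions, with the optimal hash table merging scheme under each distribution, obtained by the calculation module, as the label. Once trained, the model is sent to the online system to serve as the model in the CNN online module. The online and offline systems then cooperate as follows.
When the classification system receives a rule set to be processed, the online system first converts the prefix combination distribution of the original rule set into an image through the image model and passes it to the CNN online module to obtain the hash table merging scheme. Based on this merging scheme, the packet classification and forwarding module builds the hash tables.
Once the packet classification and forwarding module has built the hash tables, rule lookup and matching for packets received by the system are performed directly in that module, and after a rule is matched, the corresponding action is applied to the packet according to the rule.
When the rule set is updated, the corresponding hash table in the packet classification and forwarding module and the rule set prefix combination distribution are updated together. The monitoring module reads the rule set prefix combination distribution at fixed time intervals and, when it finds that the rule distribution has changed to another type, updates the hash table merging scheme and rebuilds the hash tables. If the distribution is a new type, the monitoring module also passes it to the offline calculation module to obtain its label. The new rule prefix combination distribution image and its label are then passed to the CNN offline module in the offline system for model training. Finally, the newly trained CNN model replaces the old online CNN model.
Through cooperation between the online and offline systems, the proposed architecture ensures efficient packet lookup and fast rule set updates in the online system while monitoring rule set updates, so that it always reflects the latest state of the network. It achieves high-speed packet matching and forwarding together with fast online rule updates.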
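Classification Test Example
To verify the performance of the method, the applicant tested the packet classification method of the invention and various prior-art classification methods on a test data set composed of 6 rule sets generated by ClassBench (acl1, acl2, fw1, fw2, ipc1, ipc2) and 2 real rule sets (cloud1, cloud2). The tests show that the method significantly improves packet lookup performance, increasing both the packet lookup speed and the rule update speed.
The packet lookup performance of the method of the invention (CRP) is clearly better than that of the other algorithms. As shown in FIG. 4, it is 4.1 times that of PartitionSort (PS), 8.3 times that of Tuple Space Search (TSS), 3.5 times that of Pruned Tuple Space Search (PR_TSS), and 4.3 times that of TupleMerge (TM).
Update speeds are compared in FIG. 5: the rule update speed of the invention is 9.6 times that of PS, 1.8 times that of TSS, 2.3 times that of PR-TSS, and 5.2 times that of TM.
By merging hash tables, the invention also reduces rule storage overhead; FIG. 6 shows the comparison with the other algorithms. Its memory overhead is 36% of that of PS, 70% of that of TSS, and 63% of that of PR-TSS.
In addition, when rules are being updated, the lookup rate of every method declines as the update rate increases, but the performance of the proposed CNN-based packet classification method remains higher than that of the other methods, as shown in FIG. 7.
Likewise, when rule set updates change the prefix distribution, the invention reconstructs the hash tables so that the system always maintains a high packet lookup rate; system performance is shown in FIG. 8.
Furthermore, the applicant integrated the method into Open vSwitch, replacing the MegaflowCache structure (which uses the Tuple Space Search method); the throughput of the modified Open vSwitch is 10 times that of the original Open vSwitch, as shown in FIG. 9.
The invention uses the monitoring module to monitor changes in the rule set prefix combination distribution caused by rule updates and to update the hash table merging scheme in time. The monitoring module first measures the Hamming distance between the current rule prefix combination distribution D_C and the prefix combination distribution D_P in effect when the hash tables were last built. If this Hamming distance is greater than the threshold K, D_C and D_P are not of the same distribution type, so D_C is sent to the CNN model to find the corresponding label, i.e., the new hash table merging scheme. Besides the new label, the CNN model also outputs the rule set prefix combination distribution R corresponding to that label. If the Hamming distance between D_C and R is still greater than the threshold, D_C is treated as a new distribution type and passed to the offline system, and the performance model is used to compare the current merging scheme with TSS, the better-performing one being selected as the merging scheme. Otherwise, the distribution has changed into another known distribution, and the hash tables only need to be reconstructed according to the label. This process is shown in FIG. 11.
With this approach, when updates change the prefix combination distribution of a rule set so that the original hash table merging scheme no longer suits the new rule set, the module quickly classifies the new rule distribution, finds the current optimal hash table merging scheme, and rebuilds the hash tables, keeping system performance at a high level.
The embodiments of the present invention have been described above. The description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical applications, or their technical improvements over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.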

Claims (14)

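  1. A packet classification method based on a convolutional neural network, characterized in that the method comprises the following steps:
    Step (1): for each rule set in a training rule set, merging the rules according to different prefix range combinations of the source and destination addresses of the rules in the rule set, forming multiple merging schemes, and determining the optimal merging scheme of each rule set in the training rule set based on performance evaluation;
    Step (2): converting the prefix combination distributions of each rule set in the training rule set and of a target rule set into images, using parameters of the image to represent parameters of the corresponding prefix combination distribution, and training a convolutional neural network model with the images of the training rule set and the corresponding optimal merging schemes as features;
    Step (3): inputting the target image converted from the target rule set into the convolutional neural network model, determining the merging scheme of the target rule set based on the degree of match between the target image and the images in the convolutional neural network model, and building the corresponding hash tables for packet classification.
  2. The packet classification method based on a convolutional neural network according to claim 1, characterized in that the pixel coordinates of the converted image represent a combination of source and destination address prefix lengths, or length ranges, of the rules in the corresponding rule set, and the pixel value represents the number of rules in that rule set corresponding to that combination.
  3. The packet classification method based on a convolutional neural network according to claim 1 or 2, characterized in that training the convolutional neural network model in step (2) includes classifying the prefix combination distributions of the rule sets based on similarity between images, and determining a corresponding merging scheme for each category of prefix combination distribution.
  4. The packet classification method based on a convolutional neural network according to claim 3, characterized by including computing difference information between the pixels of the image corresponding to each prefix combination distribution as the fingerprint of that image, computing the difference between each image's fingerprint and the fingerprint of a reference image, and determining the category of the rule set corresponding to each image by comparing that difference with a predetermined threshold.
  5. The packet classification method based on a convolutional neural network according to claim 1, characterized by further including updating rules of the target rule set, wherein a rule update includes determining the corresponding hash table based on the prefix combination length of the updated rule, updating the rule in the corresponding hash bucket, and updating the value of the pixel corresponding to the updated rule in the image of the prefix combination distribution of the target rule set.
  6. The packet classification method based on a convolutional neural network according to claim 1, characterized by further including monitoring the Hamming distance between the prefix combination distributions of the target rule set before and after an update, and determining based on that Hamming distance whether to reconstruct the hash tables.
  7. The packet classification method based on a convolutional neural network according to claim 1, characterized in that the method further includes setting a priority for each hash table, equal to the highest priority of the rules the table contains, and sorting all hash tables, wherein during packet matching the search stops when the priority of the matched rule is not less than the priority of the next hash table.
  8. The packet classification method based on a convolutional neural network according to claim 1, characterized in that the performance evaluation uses the formula
    Figure PCTCN2019128935-appb-100001
    where
    Figure PCTCN2019128935-appb-100002
    denotes the average hashing time,
    Figure PCTCN2019128935-appb-100003
    denotes the average verification time, m is the number of hash tables, n_i is the number of rules in the i-th hash table, s_i is the size of the i-th hash table, and
    Figure PCTCN2019128935-appb-100004
    is the priority comparison time.
  9. A packet classification system based on a convolutional neural network, characterized in that the system comprises an offline system and an online system,
    the offline system comprises a calculation module and a convolutional neural network offline training module, and the online system comprises a packet classification and forwarding module and a convolutional neural network online module,
    the calculation module is configured to merge the rules of each rule set in the training rule set according to different prefix range combinations of the rules' source and destination addresses, evaluate the performance of the different merging schemes to determine the optimal merging scheme of each rule set, and convert each prefix combination distribution of each rule set in the training rule set into an image, using parameters of the image to represent parameters of the corresponding prefix combination distribution;
    the convolutional neural network offline training module trains the convolutional neural network model on the training rule set, with the images of the training rule set and the corresponding optimal merging schemes as features;
    the convolutional neural network online module is configured to convert the prefix combination distribution of the target rule set into an image, using parameters of the image to represent parameters of the corresponding prefix combination distribution, and to determine the merging scheme of the target rule set using the trained convolutional neural network model;
    the packet classification and forwarding module is configured to build the corresponding hash tables based on the merging scheme and to classify packets based on those hash tables.
  10. The packet classification system based on a convolutional neural network according to claim 9, characterized in that the convolutional neural network offline training module classifies the prefix combination distributions of the rule sets based on similarity between images and determines a corresponding merging scheme for each category of prefix combination distribution.
  11. The packet classification system based on a convolutional neural network according to claim 10, characterized by further including a monitoring module that reads the prefix combination distribution of the target rule set, determines its category, and decides based on changes in the category whether to reconstruct the hash tables.
  12. The packet classification system based on a convolutional neural network according to claim 9, characterized in that, when a rule is updated, the packet classification and forwarding module determines the corresponding hash table based on the prefix combination of the updated rule and updates the rule in the corresponding hash bucket, and the convolutional neural network online module updates the value of the pixel corresponding to the updated rule in the image of the prefix combination distribution of the target rule set.
  13. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 8.
  14. A computer device comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor implements the method according to any one of claims 1 to 8 when executing the program.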
PCT/CN2019/128935 2019-11-07 2019-12-27 一种基于卷积神经网络的数据包分类方法及系统 WO2021088234A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/761,220 US20220374733A1 (en) 2019-11-07 2019-12-27 Data packet classification method and system based on convolutional neural network

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911081715.3A CN111026917B (zh) 2019-11-07 2019-11-07 一种基于卷积神经网络的数据包分类方法及系统
CN201911081715.3 2019-11-07

Publications (1)

Publication Number Publication Date
WO2021088234A1 true WO2021088234A1 (zh) 2021-05-14

Family

ID=70201161

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/128935 WO2021088234A1 (zh) 2019-11-07 2019-12-27 一种基于卷积神经网络的数据包分类方法及系统

Country Status (3)

Country Link
US (1) US20220374733A1 (zh)
CN (1) CN111026917B (zh)
WO (1) WO2021088234A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112887300B (zh) * 2021-01-22 2022-02-01 北京交通大学 一种数据包分类方法
CN112948646B (zh) * 2021-04-01 2022-12-13 支付宝(杭州)信息技术有限公司 数据识别方法和装置
US11757642B1 (en) * 2022-07-18 2023-09-12 Spideroak, Inc. Systems and methods for decentralized synchronization and braided conflict resolution

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106534224A (zh) * 2017-01-23 2017-03-22 余洋 智能网络攻击检测方法及装置
WO2017200524A1 (en) * 2016-05-16 2017-11-23 United Technologies Corporation Deep convolutional neural networks for crack detection from image data
CN109361617A (zh) * 2018-09-26 2019-02-19 中国科学院计算机网络信息中心 一种基于网络包载荷的卷积神经网络流量分类方法及系统
CN109754021A (zh) * 2019-01-11 2019-05-14 湖南大学 基于范围元组搜索的在线包分类方法
CN109886114A (zh) * 2019-01-18 2019-06-14 杭州电子科技大学 一种基于聚合变换特征提取策略的舰船目标检测方法

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106777644B (zh) * 2016-12-08 2020-04-10 华能国际电力股份有限公司 电厂标识系统编码的自动生成方法及装置
KR101953672B1 (ko) * 2017-04-18 2019-03-04 한국기술교육대학교 산학협력단 Cnn을 활용한 패킷 페이로드 기반의 네트워크 트래픽 분류시스템
CN110297888B (zh) * 2019-06-27 2022-05-03 四川长虹电器股份有限公司 一种基于前缀树与循环神经网络的领域分类方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017200524A1 (en) * 2016-05-16 2017-11-23 United Technologies Corporation Deep convolutional neural networks for crack detection from image data
CN106534224A (zh) * 2017-01-23 2017-03-22 余洋 智能网络攻击检测方法及装置
CN109361617A (zh) * 2018-09-26 2019-02-19 中国科学院计算机网络信息中心 一种基于网络包载荷的卷积神经网络流量分类方法及系统
CN109754021A (zh) * 2019-01-11 2019-05-14 湖南大学 基于范围元组搜索的在线包分类方法
CN109886114A (zh) * 2019-01-18 2019-06-14 杭州电子科技大学 一种基于聚合变换特征提取策略的舰船目标检测方法

Also Published As

Publication number Publication date
US20220374733A1 (en) 2022-11-24
CN111026917B (zh) 2021-07-20
CN111026917A (zh) 2020-04-17

Similar Documents

Publication Publication Date Title
WO2021189729A1 (zh) 复杂关系网络的信息分析方法、装置、设备及存储介质
WO2021088234A1 (zh) 一种基于卷积神经网络的数据包分类方法及系统
CA3088899C (en) Systems and methods for preparing data for use by machine learning algorithms
He et al. Hashing as tie-aware learning to rank
US20070005556A1 (en) Probabilistic techniques for detecting duplicate tuples
WO2020147317A1 (zh) 一种网络异常行为确定方法、装置、设备及可读存储介质
Dwyer et al. Decision tree instability and active learning
CN106980656B (zh) 一种基于二值码字典树的搜索方法
CN113326377B (zh) 一种基于企业关联关系的人名消歧方法及系统
CN112311780A (zh) 一种基于多维度攻击路径与攻击图的生成方法
CN104239553A (zh) 一种基于Map-Reduce框架的实体识别方法
CN111259933B (zh) 基于分布式并行决策树的高维特征数据分类方法及系统
CN113452802A (zh) 设备型号的识别方法、装置及系统
CN105574541A (zh) 一种基于紧密度排序的网络社区发现方法
Chu et al. Co-training based on semi-supervised ensemble classification approach for multi-label data stream
US11288266B2 (en) Candidate projection enumeration based query response generation
CN114462623A (zh) 基于边缘计算的数据分析方法、系统及平台
CN108764307A (zh) 自然最近邻优化的密度峰值聚类方法
US8370363B2 (en) Hybrid neighborhood graph search for scalable visual indexing
CN115114484A (zh) 异常事件检测方法、装置、计算机设备和存储介质
Zhang et al. Fast online packet classification with convolutional neural network
Romero et al. Bolt: Fast inference for random forests
CN111738290A (zh) 图像检测方法、模型构建和训练方法、装置、设备和介质
CN107423319B (zh) 一种垃圾网页检测方法
Huang et al. Apriori-BM algorithm for mining association rules based on bit set matrix

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19951465

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19951465

Country of ref document: EP

Kind code of ref document: A1