CN113297430B - Sketch-based high-performance arbitrary partial key measurement method and system - Google Patents


Publication number
CN113297430B
CN113297430B CN202110588731.2A CN202110588731A CN113297430B CN 113297430 B CN113297430 B CN 113297430B CN 202110588731 A CN202110588731 A CN 202110588731A CN 113297430 B CN113297430 B CN 113297430B
Authority
CN
China
Prior art keywords
key
bucket
full
full key
array
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110588731.2A
Other languages
Chinese (zh)
Other versions
CN113297430A (en)
Inventor
杨仝
张寅达
王睿鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN202110588731.2A
Publication of CN113297430A
Application granted
Publication of CN113297430B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9014 Indexing; Data structures therefor; Storage structures hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level

Abstract

The invention relates to a Sketch-based high-performance method and system for measuring arbitrary partial keys. The method comprises the following steps: extracting a full key and its size from each data packet, and hashing the full key to one bucket in each array of the sketch; updating the mapped buckets using the full key and determining the estimated size of the full key based on a random variance minimization technique; constructing, from the sketch in the data plane, a lookup table containing all recorded full keys and their estimated sizes; and, when partial keys are queried, aggregating in the control plane the set of full keys corresponding to each partial key to obtain the estimated size of that partial key. The invention achieves high accuracy on arbitrary partial key measurement tasks and runs at high speed in a small memory footprint, and the number of partial keys measured has no significant effect on system performance. By increasing hardware parallelism and eliminating cyclic dependencies, the invention can be implemented on both software and hardware platforms with excellent performance.

Description

Sketch-based high-performance arbitrary partial key measurement method and system
Technical Field
The invention relates to arbitrary partial key measurement in the field of network measurement, and in particular to a method and a system that achieve high-precision measurement of arbitrary partial keys on software and hardware platforms by using a probabilistic data structure named CocoSketch (the "Ruyi" sketch).
Background
Network monitoring and measurement have become the basis of many network management tasks, such as traffic engineering, load balancing, traffic scheduling, and anomaly detection. These tasks typically require timely and accurate estimates of network traffic metrics. Sketch-based algorithms can estimate these metrics with high accuracy in large networks while using few resources. In general, different network measurement tasks need different statistics, defined on different flow keys, from the same network. For example, host-level traffic engineering uses SrcIP as the flow key to track large flows, while traffic scheduling uses the quintuple (5-tuple) as the flow key. Moreover, to locate abnormal flows in security detection and diagnosis tasks, all possible flow keys must be tracked in detail, including the quintuple, SrcIP, DstIP, and any prefixes thereof. However, existing sketch-based designs typically focus on estimating statistics defined on a single flow key, and maintaining one sketch per flow key is often infeasible because of resource limits. A sketch algorithm that supports multi-key measurement is therefore urgently needed to answer queries on arbitrary partial keys.
Several attempts have been made to solve this problem. R-HHH (Randomized Hierarchical Heavy Hitters; see Ran Ben Basat, Gil Einziger, Roy Friedman, Marcelo Caggiani Luizelli, and Erez Waisbard, "Constant Time Updates in Hierarchical Heavy Hitters," in SIGCOMM 2017, ACM, 2017) is mainly used to find sets of flows that share an IP prefix, which is a special case of arbitrary partial key queries. The method maintains one sketch for each flow key (IP prefix) and, on insertion, uses sampling to select one sketch at random for updating, thereby reducing the number of sketch updates per packet. However, this method supports only IP prefixes as partial keys, and its memory consumption is too high for hardware platforms. USS (Unbiased SpaceSaving; see Daniel Ting, "Data Sketches for Disaggregated Subset Sum and Frequent Item Estimation," in SIGMOD 2018, ACM, 2018) addresses arbitrary partial key queries through subset sum estimation theory, which observes that any particular partial key can be represented by a set of particular full keys. USS applies a variance minimization technique to SpaceSaving to solve the subset sum estimation problem, which enables queries on arbitrary partial keys. However, because each USS update depends on the information of all recorded flows, USS cannot achieve high resource efficiency and can run only on software platforms.
Disclosure of Invention
To overcome the low precision, low processing speed, excessive resource usage, and poor platform compatibility of existing arbitrary partial key query algorithms, the invention provides a CocoSketch-based high-performance arbitrary partial key measurement system. The system performs arbitrary partial key measurement with high accuracy and at high processing speed on resource-limited software and hardware platforms.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A Sketch-based high-performance arbitrary partial key measurement method comprises the following steps:
extracting a full key and its size from each data packet, and hashing the full key to one bucket in each array of the sketch in the data plane;
updating the mapped buckets using the full key, and determining the estimated size of the full key based on a random variance minimization technique;
constructing a lookup table containing all full keys and their estimated sizes based on the sketch in the data plane;
and, when partial keys are queried, aggregating in the control plane the set of full keys corresponding to each partial key to obtain the estimated size of that partial key.
Further, the sketch consists of $d$ arrays, each comprising $l$ <key, value> pairs. Each pair is called a bucket and records a full key $k_F$ and its estimated value. Let $B_i[j].K$ and $B_i[j].V$ denote the key and the estimated value of the $j$-th bucket in the $i$-th array, where $1 \le i \le d$ and $1 \le j \le l$. The $d$ arrays are associated with $d$ independent hash functions $h_1(\cdot), \ldots, h_d(\cdot)$, respectively.
Further, in the software version of the algorithm, updating the mapped buckets using the full key and determining the estimated size of the full key based on the random variance minimization technique comprises:
when a packet is inserted, representing each incoming packet as a pair $(e, w)$, where $e$ is the full key and $w$ is its size; for packet $(e, w)$, the hash indices $h_1(e), \ldots, h_d(e)$ identify $d$ candidate buckets in the $d$ arrays, and the bucket to update is selected by random variance minimization, in two cases:
(1) if the full key $e$ is recorded in any of the $d$ buckets, the value of that bucket is increased by $w$;
(2) otherwise, the bucket with the smallest stored value is updated in a randomized manner.
Further, updating the bucket with the smallest value in a randomized manner comprises:
assuming the bucket with the smallest value is in the $k$-th array, first increasing $B_k[h_k(e)].V$ by $w$, and then, with probability
$$\frac{w}{B_k[h_k(e)].V}$$
(where $B_k[h_k(e)].V$ is the value after the increase, consistent with the 4/16 replacement probability in the example of Fig. 5), replacing $B_k[h_k(e)].K$ with $e$;
if several buckets share the same minimum value, one of them is selected at random for updating.
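The two insertion cases above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `CocoSketchSW`, the salted use of Python's built-in `hash` as a stand-in for the $d$ independent hash functions, and the replacement probability $w / V$ (with $V$ taken after the increase, as the 4/16 example for Fig. 5 suggests) are all choices made here for illustration.

```python
import random


class CocoSketchSW:
    """Illustrative software-version sketch: d arrays of l (key, value) buckets."""

    def __init__(self, d, l, seed=0):
        self.d, self.l = d, l
        self.rng = random.Random(seed)
        # Assumption: one random salt per array stands in for d independent hashes.
        self.salts = [self.rng.getrandbits(32) for _ in range(d)]
        self.keys = [[None] * l for _ in range(d)]  # B_i[j].K
        self.vals = [[0] * l for _ in range(d)]     # B_i[j].V

    def _index(self, i, e):
        return hash((self.salts[i], e)) % self.l

    def insert(self, e, w):
        """Insert packet (e, w); assumes w > 0."""
        idx = [self._index(i, e) for i in range(self.d)]
        # Case 1: e is already recorded in one of its d mapped buckets.
        for i, j in enumerate(idx):
            if self.keys[i][j] == e:
                self.vals[i][j] += w
                return
        # Case 2: choose the mapped bucket with the smallest value,
        # breaking ties uniformly at random.
        smallest = min(self.vals[i][j] for i, j in enumerate(idx))
        k = self.rng.choice(
            [i for i, j in enumerate(idx) if self.vals[i][j] == smallest]
        )
        j = idx[k]
        self.vals[k][j] += w
        # Replace the stored key with probability w / (updated value).
        if self.rng.random() < w / self.vals[k][j]:
            self.keys[k][j] = e
```

Because every insertion adds $w$ to exactly one bucket, the sum of all bucket values always equals the total inserted size, which is a convenient sanity check when experimenting with this sketch.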
Further, the estimated value of a full key is queried as follows:
for a full key $e$ to be queried, the $d$ corresponding buckets are located in the $d$ arrays using the hash indices $h_1(e), \ldots, h_d(e)$;
if the full key stored in the mapped bucket of the $k$-th array satisfies $B_k[h_k(e)].K = e$, then $B_k[h_k(e)].V$ is returned as the estimate of the full key $e$; otherwise, 0 is returned.
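The query rule can be illustrated with a short Python function. The sketch state, the flow names, and the length-based toy hash functions below are hypothetical placeholders chosen only so the example is easy to trace by hand; they are not part of the patent.

```python
def query_full_key(keys, vals, hashes, e):
    """Return the estimate of full key e, or 0 if no mapped bucket records it."""
    for key_row, val_row, h in zip(keys, vals, hashes):
        j = h(e)
        if key_row[j] == e:
            return val_row[j]
    return 0


# Hypothetical state with d = 2 arrays of l = 4 buckets.
# The two "hash functions" are toy placeholders based on string length.
hashes = [lambda e: len(e) % 4, lambda e: (len(e) + 1) % 4]
keys = [[None, "flowA", None, None], [None, None, None, "flowBB"]]
vals = [[0, 16, 0, 0], [0, 0, 0, 7]]

print(query_full_key(keys, vals, hashes, "flowA"))   # recorded in array 1
print(query_full_key(keys, vals, hashes, "flowBB"))  # recorded in array 2
print(query_full_key(keys, vals, hashes, "flowC"))   # not recorded anywhere
```

A key that was evicted from, or never admitted to, its mapped buckets is reported as 0, so the lookup never attributes another flow's counter to the queried key.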
Further, in the hardware version of the algorithm, updating the mapped buckets using the full key and determining the estimated size of the full key based on the random variance minimization technique comprises:
splitting each bucket array into a separate estimated-value array and a full-key array, wherein the updates of the arrays run in parallel or in a pipeline, so as to increase hardware parallelism and improve throughput;
for each packet $(e, w)$, each array is updated independently in the following two steps:
(1) increasing $B_i[h_i(e)].V$ of the mapped bucket by $w$;
(2) with probability
$$\frac{w}{B_i[h_i(e)].V}$$
(where $B_i[h_i(e)].V$ is the value after step (1), consistent with the 2/15 replacement probability in the example of Fig. 6), replacing $B_i[h_i(e)].K$ with $e$.
Further, if the same full key appears in multiple arrays, the average estimated size in the different arrays is taken as its final estimated size.
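The hardware-style update and the averaging rule can be simulated in software as follows. This is a hedged sketch rather than FPGA or P4 code: the class name `CocoSketchHW`, the salted built-in `hash`, and the explicit split into a value stage and a key stage are illustrative assumptions, and the replacement probability is again taken as $w$ divided by the updated value.

```python
import random
from collections import defaultdict


class CocoSketchHW:
    """Illustrative hardware-style sketch with separate value and key arrays."""

    def __init__(self, d, l, seed=0):
        self.d, self.l = d, l
        self.rng = random.Random(seed)
        # Assumption: salted built-in hash stands in for d independent hashes.
        self.salts = [self.rng.getrandbits(32) for _ in range(d)]
        self.val_arrays = [[0] * l for _ in range(d)]
        self.key_arrays = [[None] * l for _ in range(d)]

    def _index(self, i, e):
        return hash((self.salts[i], e)) % self.l

    def _stage_value(self, i, j, w):
        # Stage 1: touches only the value array, exactly once.
        self.val_arrays[i][j] += w
        return self.val_arrays[i][j]

    def _stage_key(self, i, j, e, w, new_value):
        # Stage 2: touches only the key array, exactly once.
        if self.rng.random() < w / new_value:
            self.key_arrays[i][j] = e

    def insert(self, e, w):
        """Insert packet (e, w); every array is updated independently."""
        for i in range(self.d):
            j = self._index(i, e)
            new_value = self._stage_value(i, j, w)
            self._stage_key(i, j, e, w, new_value)

    def full_key_table(self):
        """Average the estimates of a key that appears in several arrays."""
        per_key = defaultdict(list)
        for keys, vals in zip(self.key_arrays, self.val_arrays):
            for key, val in zip(keys, vals):
                if key is not None:
                    per_key[key].append(val)
        return {key: sum(v) / len(v) for key, v in per_key.items()}
```

No array reads the state of another array during insertion, which is the property that lets the per-array updates run in parallel or as pipeline stages.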
A Sketch-based high-performance arbitrary partial key measurement system using the above method comprises:
a data plane component, configured to extract a full key and its size from each data packet and hash the full key to one bucket in each array of the sketch, and to update the mapped buckets using the full key and determine the estimated size of the full key based on the random variance minimization technique;
a control plane component, configured to construct a lookup table containing all full keys and their estimated sizes based on the sketch in the data plane and, when partial keys are queried, to aggregate the set of full keys corresponding to each partial key to obtain the estimated size of that partial key.
The beneficial effects of the invention are as follows. By using the random variance minimization technique, the invention achieves high accuracy on arbitrary partial key measurement tasks and runs at high speed in a small memory footprint, and the number of partial keys measured has no significant effect on system performance. By increasing hardware parallelism and eliminating cyclic dependencies, the invention can be implemented on both software platforms (such as Open vSwitch and CPUs) and hardware platforms (such as FPGAs and programmable ASICs) with excellent performance.
Drawings
FIG. 1 is the overall system architecture of the present invention.
Fig. 2 is a graphical representation of a random variance minimization technique.
FIG. 3 is an illustration of a technique to increase hardware parallelism.
Fig. 4 is a diagram of a technique for eliminating loop dependence.
Fig. 5 is an example of data plane software version algorithm insertion.
Fig. 6 is a data plane hardware version algorithm insertion example.
FIG. 7 is an example of an arbitrary partial key query in the control plane.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following examples in the accompanying drawings. It should be understood that the specific examples described herein are intended to be illustrative only and are not intended to be limiting.
The main techniques of the invention are random variance minimization, increased hardware parallelism, and elimination of cyclic dependencies.
Technique one: random variance minimization
Previous theoretical results indicate that minimizing the variance of the full-key estimates helps achieve high-precision arbitrary partial key queries. USS performs a variance-minimizing update on each packet insertion and thereby achieves high accuracy on arbitrary partial key queries. However, this variance minimization requires the information of all recorded flows, so the update operation is computationally expensive. The present invention therefore designs a random variance minimization technique: for each packet insertion, the variance is minimized based only on the information of the flows in the buckets to which the packet is hashed. Fig. 2 is a graphical representation of the random variance minimization technique.
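A one-bucket expectation calculation helps show why a randomized key replacement is compatible with accurate estimates. The derivation below is a sketch that the text does not spell out. It assumes the replacement probability is the packet size divided by the updated bucket value (matching the 4/16 and 2/15 examples given for Figs. 5 and 6), and it treats a key's estimate as the bucket value if the key is recorded there and 0 otherwise. The symbols $e'$, $V$, $V'$, and $p$ are introduced here for illustration only.

$$
\begin{aligned}
&\text{Bucket holds } (e', V);\ \text{packet } (e, w) \text{ arrives with } e \neq e';\qquad V' = V + w,\quad p = \frac{w}{V'}.\\
&\mathbb{E}[\text{estimate of } e] = p \cdot V' + (1 - p) \cdot 0 = w,\\
&\mathbb{E}[\text{estimate of } e'] = (1 - p) \cdot V' + p \cdot 0 = V' - w = V.
\end{aligned}
$$

Under these assumptions, the arriving key gains exactly its packet size in expectation and the resident key keeps its previous estimate in expectation, so summing full-key estimates over the set of full keys that share a partial key remains meaningful.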
Technique two: increasing hardware parallelism
Only with technique one, the mapped buckets in all arrays need to be accessed before updating, and then one of the buckets is updated based on the random variance minimization technique, which means that the insertion process of the algorithm needs to access the same memory twice. But hardware platforms typically do not allow access to the same memory at different stages of the pipeline, as this may result in read and write hazards while the pipeline is running. As a result, all operations must be done in serial fashion in the same stage, but this will significantly reduce throughput. Thus, the present invention improves throughput by increasing hardware parallelism. The present invention decouples the update operations in each array. In particular, the present invention independently updates one bucket per array based on a random variance minimization technique. The updates to each array may then be run in parallel or in a pipelined manner, thereby improving throughput. FIG. 3 is an illustration of a technique to increase hardware parallelism.
Technique three: eliminating cyclic dependencies
In the random variance minimization operation, a cyclic dependency exists between the update of the flow key and the update of its estimated value. As described above, the same memory cannot be accessed at different stages of the pipeline. Consequently, two consecutive conditional decisions would have to be made, and the same memory accessed twice, within a single stage. However, to guarantee line rate, the Tofino switch does not allow such operations. The invention observes that, given technique two, this cyclic dependency can be eliminated. Under technique two, random variance minimization handles only one mapped bucket per array, so the updates of the estimated value and of the flow key can be performed independently, placed in different stages, and run as a pipeline. By combining techniques two and three, the system achieves higher throughput on hardware platforms and becomes feasible on P4-based hardware platforms. Fig. 4 is a diagram of the technique for eliminating cyclic dependencies.
Based on the above three techniques, the system overview and the specific algorithm design of the invention are as follows.
Fig. 1 shows the overall system architecture of the invention. As shown in Fig. 1, the system consists of two parts (or modules), a data plane and a control plane.
The data plane processes each packet as follows: (1) first, it extracts the full key $k_F$ and its size from the packet and hashes the full key to one bucket in each array of the sketch; (2) second, it updates the mapped buckets using the full key $k_F$, with the estimated size determined by the random variance minimization technique.
The control plane provides a query front end for arbitrary partial keys: (1) first, it constructs a lookup table containing all full keys and their estimated sizes based on the sketch in the data plane; (2) second, it aggregates the set of full keys corresponding to each specific partial key to obtain the estimated size of that partial key.
Here, a "full key" is a complete flow key, such as the quintuple, SrcIP, or DstIP; a "partial key" is a subset of a full key. For example, SrcIP and DstIP are both partial keys of the quintuple, and any prefix of an IP address is a partial key of that IP address.
For the CocoSketch algorithm in the data plane, the invention provides a software version and a hardware version. "Technique two: increasing hardware parallelism" and "technique three: eliminating cyclic dependencies" are applied to the hardware version. The two versions are independent of each other: the software version runs on software platforms, and the hardware version runs on hardware platforms.
The data structure of the software version, the sketch, consists of $d$ arrays, each comprising $l$ <key, value> pairs. Each pair (also called a bucket) records one full key $k_F$ and its estimate. Let $B_i[j].K$ and $B_i[j].V$ denote the key and the estimated value of the $j$-th bucket in the $i$-th array ($1 \le i \le d$, $1 \le j \le l$). The $d$ arrays are associated with $d$ independent hash functions $h_1(\cdot), \ldots, h_d(\cdot)$, respectively. On insertion, each incoming packet is represented as a pair $(e, w)$, where $e$ is the full key and $w$ is its size. For packet $(e, w)$, the hash indices $h_1(e), \ldots, h_d(e)$ identify $d$ candidate buckets in the $d$ arrays, and the bucket to update is selected by random variance minimization. Intuitively, there are two cases: (1) if the full key $e$ is recorded in any of the $d$ buckets, the value of that bucket is increased by $w$; (2) otherwise, the bucket with the smallest stored value is updated in a randomized manner. Suppose the bucket with the smallest value is in the $k$-th array. To update it, $B_k[h_k(e)].V$ is first increased by $w$. Then, with probability
$$\frac{w}{B_k[h_k(e)].V}$$
(computed with the updated value, consistent with the 4/16 probability in the example of Fig. 5), $B_k[h_k(e)].K$ is replaced with $e$. If several buckets share the same minimum value, one of them is selected at random for updating. On a query for a full key $e$, the $d$ corresponding buckets are located in the $d$ arrays using the hash indices $h_1(e), \ldots, h_d(e)$. If the full key stored in the mapped bucket of the $k$-th array satisfies $B_k[h_k(e)].K = e$, then $B_k[h_k(e)].V$ is returned as the estimate of the full key $e$; otherwise, 0 is returned.
The main difference between the hardware version and the software version is that, in hardware, the insertion steps for the arrays should be independent of one another. The reason is that network hardware (e.g., FPGAs) is typically built from massively parallel logic, so the algorithm should be designed for parallelism to make better use of resources. Specifically, the invention splits each bucket array of the software version into a separate estimated-value array and a full-key array. Here, the invention mainly considers the case $d = 1$. When $d = 1$, the value of the mapped bucket is increased by $w$ regardless of whether the recorded key is $e$. For each packet $(e, w)$, each array is updated independently in two steps: (1) $B_i[h_i(e)].V$ of the mapped bucket is increased by $w$; (2) with probability
$$\frac{w}{B_i[h_i(e)].V}$$
(computed with the updated value, consistent with the 2/15 probability in the example of Fig. 6), $B_i[h_i(e)].K$ is replaced with $e$. In this way, the insertion logic of each array can be distributed independently across the hardware to increase parallelism.
In the control plane, the present invention provides a front end to query the size of any partial key. First, a full key look-up table is constructed based on the sketch in the data plane, and each full key and its estimation value are recorded in the table. When the size of one partial key needs to be inquired, the full key set corresponding to the partial key is aggregated, so that the estimated size of the required partial key is obtained. In a hardware version, the same full key may appear in multiple arrays, with the average estimated size in the different arrays as its final estimated size.
Fig. 5 gives a specific insertion example for the software version of the data plane algorithm. As shown in Fig. 5, let $d = 2$. To insert a packet with full key $e_3$ and size 4, the software version first maps it to one bucket in each array. Because $e_3$ is not recorded in either mapped bucket, the algorithm looks for the mapped bucket with the smallest estimate and finds that it is the second mapped bucket. The estimate is therefore first updated to 16, and then, with probability 4/16, $e_3$ replaces the key $e_2$ stored in that bucket. For a packet with full key $e_5$ and size 1, the software version again maps it to one bucket in each array. Of these two buckets, the first is found to record $e_5$. Its value is therefore increased by 1 (from 15 to 16).
Fig. 6 gives a specific insertion example for the hardware version of the data plane algorithm. As shown in Fig. 6, let $d = 1$. To insert a packet with full key $e_4$ and size 4, the hardware version first adds 4 directly to the count of the hash-mapped bucket in the estimated-value array; then, at the corresponding position in the full-key array, the full key is left unreplaced, an outcome that occurs with probability (13-4)/13. For a packet with full key $e_1$ and size 2, the hardware version first adds 2 directly to the count of the hash-mapped bucket in the estimated-value array, and then, with probability 2/15, replaces the full key at the corresponding position in the full-key array with $e_1$; because the full key already stored at that position is $e_1$, replacing and not replacing have the same effect.
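The arithmetic in the Fig. 5 and Fig. 6 examples can be checked with exact fractions. In the sketch below, the helper name `update_bucket_value` is hypothetical, and the pre-update bucket values (12, 9, and 13) are inferred from the post-update values and packet sizes stated in the text rather than read from the figures.

```python
from fractions import Fraction


def update_bucket_value(old_value, w):
    """Return (new_value, replace_probability) for one mapped bucket."""
    new_value = old_value + w
    return new_value, Fraction(w, new_value)


# Fig. 5, packet (e3, 4): value becomes 16, replacement probability 4/16.
fig5_value, fig5_p = update_bucket_value(12, 4)

# Fig. 6, packet (e4, 4): value becomes 13; the key is kept with (13-4)/13.
fig6a_value, fig6a_p = update_bucket_value(9, 4)
fig6a_keep = 1 - fig6a_p

# Fig. 6, packet (e1, 2): value becomes 15, replacement probability 2/15.
fig6b_value, fig6b_p = update_bucket_value(13, 2)

print(fig5_value, fig5_p)
print(fig6a_value, fig6a_keep)
print(fig6b_value, fig6b_p)
```

In every case the replacement probability and the keep probability sum to 1, and larger packets relative to the bucket value are more likely to take over the bucket's key.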
Fig. 7 gives an example of a partial key query in the control plane. As shown in Fig. 7, assume the full key is SrcIP and the partial key to be queried is the 8-bit prefix of SrcIP. First, the measurement results for the full keys are obtained. The results are then aggregated by the first byte of each full key. Two full keys have the prefix 19, so their sizes are added: the estimated size of prefix 19 is 1041 (520 + 521). Only one full key has the prefix 56, so the estimated size of prefix 56 equals the size of that full key (856).
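The aggregation in this example can be reproduced with a few lines of Python. The sizes 520, 521, and 856 come from the text; the concrete IP addresses are hypothetical, since the text gives only their first byte, and the function names `aggregate` and `first_byte` are illustrative.

```python
from collections import defaultdict
import ipaddress


def aggregate(full_key_table, partial_key_of):
    """Sum full-key estimates over all full keys sharing a partial key."""
    totals = defaultdict(int)
    for full_key, size in full_key_table.items():
        totals[partial_key_of(full_key)] += size
    return dict(totals)


def first_byte(src_ip):
    """8-bit prefix of an IPv4 address, as an integer."""
    return int(ipaddress.IPv4Address(src_ip)) >> 24


# Lookup table built from the sketch; the addresses are hypothetical.
table = {"19.0.0.1": 520, "19.0.0.2": 521, "56.0.0.1": 856}

print(aggregate(table, first_byte))  # {19: 1041, 56: 856}
```

Because the partial key is just a function applied at query time, the same lookup table can answer a different partial-key query (for example, a 16-bit prefix) without touching the data plane.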
The CocoSketch-based high-performance arbitrary partial key measurement method can be used to measure arbitrary partial keys in a network. Before measurement, the operator only needs to set the range of possible full keys (e.g., the quintuple), without specifying a particular flow key to measure. During measurement, a particular flow key (such as SrcIP or DstIP) is designated as a partial key in the control plane according to the requirements of each measurement task, and the corresponding measurement results are obtained through aggregation. For example, in security detection and diagnosis tasks, it is usually impossible to determine in advance which flow keys will need to be measured for security purposes. With conventional measurement methods, all possible flow keys, including the quintuple, SrcIP, DstIP, and any prefixes thereof, must be tracked in detail to locate abnormal flows, which consumes large amounts of resources and is infeasible in most cases. With the measurement method of the invention, it suffices to set the full key to the quintuple in advance; measurements for all keys within the quintuple can then be obtained in the control plane, so the network measurement task is completed efficiently.
Experimental data:
Heavy Hitters detection
Recall: Regardless of the number of partial keys tracked, the recall of CocoSketch is typically above 95%. When 6 flow keys are measured, the recall of CocoSketch is 28.4% and 51.6% higher than that of the existing Elastic Sketch and SpaceSaving, respectively.
Precision: The precision of CocoSketch is typically above 90%, and is 57% higher than that of USS.
Average relative error: The average relative error of CocoSketch is usually less than 0.1, about 3.1 times smaller than that of USS. When 6 partial keys are measured, the average relative error of CocoSketch is about 9.3 times smaller than that of the other algorithms.
Heavy Changes detection
Recall: Regardless of the number of flow keys, the recall of CocoSketch is typically above 95%. When 6 flow keys are measured, the recall of CocoSketch is 78%, 69%, 27%, and 87% higher than that of the existing C-Heap, CM-Heap, Elastic Sketch, and UnivMon, respectively.
Precision: The precision of CocoSketch is typically above 90%. When 6 flow keys are measured, the precision of CocoSketch is 69%, 51%, 5%, and 81% higher than that of the existing C-Heap, CM-Heap, Elastic Sketch, and UnivMon, respectively.
Detection of Hierarchical Heavy Hitters (HHH)
One-dimensional HHH: With 1 MB of memory, the F1 score of CocoSketch is above 99.5%. The average relative error of CocoSketch is about 1282 times smaller than that of R-HHH.
Two-dimensional HHH: With 5 MB of memory, the F1 score of CocoSketch is above 99.5%. The average relative error of CocoSketch is approximately 32539 times smaller than that of R-HHH.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device (computer, server, smartphone, etc.) comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, and the computer program comprising instructions for performing the steps of the method of the invention.
Based on the same inventive concept, another embodiment of the present invention provides a computer-readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) storing a computer program, which when executed by a computer, performs the steps of the inventive method.
The particular embodiments of the present invention disclosed above are illustrative only and are not intended to be limiting, since various alternatives, modifications, and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The invention should not be limited to the disclosure of the embodiments in the present specification, but the scope of the invention is defined by the appended claims.

Claims (7)

1. A high-performance arbitrary partial key measurement method based on Sketch is characterized by comprising the following steps:
extracting a full key and the size thereof from each data packet, and mapping the hash thereof to a bucket of each array in the sketch of the data plane;
updating each mapped bucket using a full key and determining an estimated size of the full key based on a random variance minimization technique;
constructing a lookup table containing all full keys and the estimated sizes thereof based on the sketch of the data plane;
when a partial key is inquired, aggregating a full key set corresponding to each partial key in a control plane to obtain the estimated size of the partial key;
the sketch consists of $d$ arrays, each comprising $l$ <key, value> pairs, each pair being called a bucket and recording a full key $k_F$ and its estimated value; let $B_i[j].K$ and $B_i[j].V$ denote the key and the estimated value of the $j$-th bucket in the $i$-th array, where $1 \le i \le d$ and $1 \le j \le l$; the $d$ arrays are associated with $d$ independent hash functions $h_1(\cdot), \ldots, h_d(\cdot)$, respectively;
in the software version algorithm, the updating each mapped bucket using a full key and determining an estimated size of the full key based on a random variance minimization technique includes:
when inserting a packet, representing each incoming packet as a pair $(e, w)$, where $e$ is the full key and $w$ is its size; for packet $(e, w)$, the hash indices $h_1(e), \ldots, h_d(e)$ identify $d$ candidate buckets in the $d$ arrays, and the bucket to be updated is selected based on the random variance minimization, in two cases:
(1) if a full key e is recorded in any of the d buckets, increasing the size of the bucket by w;
(2) otherwise, updating the bucket with the minimum storage value in a random mode;
for a hardware version algorithm, the updating each mapped bucket using a full key and determining an estimated size of the full key based on a random variance minimization technique, comprising:
splitting each storage bucket array into an independent estimation value array and a full key array, wherein updating of each array is operated in parallel or in a pipeline mode so as to increase hardware parallelism and improve throughput;
for each packet (e, w), each array will be updated independently by the following two steps:
(1) increasing $B_i[h_i(e)].V$ of the mapped bucket by $w$;
(2) with probability
$$\frac{w}{B_i[h_i(e)].V}$$
(where $B_i[h_i(e)].V$ is the value after step (1)), replacing $B_i[h_i(e)].K$ with $e$.
2. The method of claim 1, wherein updating the bucket with the smallest value in a randomized manner comprises:
assuming the bucket with the smallest value is in the $k$-th array, first increasing $B_k[h_k(e)].V$ by $w$, and then, with probability
$$\frac{w}{B_k[h_k(e)].V}$$
(where $B_k[h_k(e)].V$ is the value after the increase), replacing $B_k[h_k(e)].K$ with $e$;
if several buckets share the same minimum value, one of them is selected at random for updating.
3. The method of claim 2, wherein the estimate of the full key is queried in the following manner:
for a full key $e$ to be queried, the $d$ corresponding buckets are located in the $d$ arrays using the hash indices $h_1(e), \ldots, h_d(e)$;
if the full key stored in the mapped bucket of the $k$-th array satisfies $B_k[h_k(e)].K = e$, then $B_k[h_k(e)].V$ is returned as the estimate of the full key $e$; otherwise, 0 is returned.
4. The method of claim 1, wherein if the same full key appears in multiple arrays, the average estimated size in the different arrays is taken as its final estimated size.
5. A sketch-based high-performance arbitrary partial key measurement system using the method of any one of claims 1 to 4, comprising:
a data plane component configured to extract a full key and its size from each data packet, hash the full key to one bucket in each array of the sketch, update each mapped bucket using the full key, and determine an estimated size of the full key based on the random variance minimization technique;
a control plane component configured to construct, from the sketch of the data plane, a lookup table containing all full keys and their estimated sizes and, when partial keys are queried, to aggregate the set of full keys corresponding to each partial key to obtain the estimated size of that partial key.
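The control-plane aggregation of claim 5 can be sketched in Python as below. The function names, the `[key, value]` bucket layout, and the `project` callback that maps a full key to a partial key are hypothetical choices made for this example; the averaging of duplicate keys follows claim 4.

```python
from collections import defaultdict


def build_lookup_table(sketch):
    """Collect every recorded full key with its estimated size."""
    per_key = defaultdict(list)
    for array in sketch:
        for key, value in array:
            if key is not None:
                per_key[key].append(value)
    # A key recorded in several arrays gets the average of its values.
    return {key: sum(vals) / len(vals) for key, vals in per_key.items()}


def query_partial_keys(table, project):
    """Sum the estimates of all full keys that share the same partial key."""
    totals = defaultdict(float)
    for full_key, size in table.items():
        totals[project(full_key)] += size
    return dict(totals)
```

For instance, if full keys are modeled as `(source, destination)` tuples, passing `project=lambda k: k[0]` yields per-source estimates, and a different projection can be applied to the same table afterwards without touching the data plane.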
6. An electronic apparatus, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the method of any of claims 1 to 4.
7. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a computer, implements the method of any one of claims 1 to 4.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110588731.2A CN113297430B (en) 2021-05-28 2021-05-28 Sketch-based high-performance arbitrary partial key measurement method and system

Publications (2)

Publication Number Publication Date
CN113297430A CN113297430A (en) 2021-08-24
CN113297430B true CN113297430B (en) 2022-08-05

Family

ID=77325774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110588731.2A Active CN113297430B (en) 2021-05-28 2021-05-28 Sketch-based high-performance arbitrary partial key measurement method and system

Country Status (1)

Country Link
CN (1) CN113297430B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115484157B (en) * 2022-09-14 2023-06-02 浙江大学 General configuration method of sketch based on programmable switch

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103647670A (en) * 2013-12-20 2014-03-19 北京理工大学 Sketch based data center network flow analysis method
CN107798042A (en) * 2016-08-29 2018-03-13 北京大学 A kind of data processing method and Frequency estimation method based on two-layer configuration outside piece inner sheet
CN108304404A (en) * 2017-01-12 2018-07-20 北京大学 A kind of data frequency method of estimation based on improved Sketch structures
CN110868332A (en) * 2019-10-25 2020-03-06 电子科技大学 SDN-based network-level flow measurement method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10515064B2 (en) * 2016-07-11 2019-12-24 Microsoft Technology Licensing, Llc Key-value storage system including a resource-efficient index
CN110071934B (en) * 2019-04-30 2021-03-26 中国人民解放军国防科技大学 Local sensitivity counting abstract method and system for network anomaly detection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
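"Data Sketches for Disaggregated Subset Sum and Frequent Item Estimation"; Ting, D. et al.; SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data; 20181231; full text *
"Network-wide persistent flow detection method based on sketch data structure" (《基于概要数据结构的全网络持续流检测方法》); Zhou Aiping (周爱平) et al.; Journal of Computer Applications (《计算机应用》); 20190810; full text *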

Also Published As

Publication number Publication date
CN113297430A (en) 2021-08-24

Similar Documents

Publication Publication Date Title
US8484439B1 (en) Scalable hash tables
US8438574B1 (en) Generating monotone hash preferences
US6526055B1 (en) Method and apparatus for longest prefix address lookup
CN106462620B (en) Distance query over a macro network
JP6148732B2 (en) Data indexing method and apparatus
US6098157A (en) Method for storing and updating information describing data traffic on a network
CN110309336A (en) Image search method, device, system, server and storage medium
CN108197296A (en) Date storage method based on Elasticsearch indexes
CN108460030B (en) Set element judgment method based on improved bloom filter
US8064359B2 (en) System and method for spatially consistent sampling of flow records at constrained, content-dependent rates
CN113297430B (en) Sketch-based high-performance arbitrary partial key measurement method and system
CN116095029A (en) Network data stream measuring method, system, terminal and storage medium
CN108923962B (en) Local network topology measurement task selection method based on semi-supervised clustering
CN105701128A (en) Query statement optimization method and apparatus
CN108241639B (en) A kind of data duplicate removal method
Qian et al. A fast and anti-matchability matching algorithm for content-based publish/subscribe systems
WO2014035934A2 (en) Compressed set representation for sets as measures in olap cubes
Li et al. Scalable packet classification using bit vector aggregating and folding
CN110879819A (en) Method, device, server and storage medium for quickly and accurately identifying routing information
Zhao et al. Hermes: An optimization of hyperloglog counting in real-time data processing
US8595239B1 (en) Minimally disruptive hash table
CN114884893A (en) Forwarding and control definable cooperative traffic scheduling method and system
CN108153883B (en) Search method and apparatus, computer device, program product, and storage medium
Xie et al. Towards Capacity-Adjustable and Scalable Quotient Filter Design for Packet Classification in Software-Defined Networks
CN113328947B (en) Variable-length route searching method and device based on application of controllable prefix extension bloom filter

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant