WO2023173343A1 - Device and method for multiflow quantiles extraction and reconstruction - Google Patents

Device and method for multiflow quantiles extraction and reconstruction

Info

Publication number
WO2023173343A1
WO2023173343A1 PCT/CN2022/081296 CN2022081296W WO2023173343A1 WO 2023173343 A1 WO2023173343 A1 WO 2023173343A1 CN 2022081296 W CN2022081296 W CN 2022081296W WO 2023173343 A1 WO2023173343 A1 WO 2023173343A1
Authority
WO
WIPO (PCT)
Prior art keywords
flow
per
bin
data structure
identifier
Prior art date
Application number
PCT/CN2022/081296
Other languages
French (fr)
Inventor
Massimo Gallo
Gwendal Simon
Lihua MIAO
Zixue Bi
Hao Chen
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to PCT/CN2022/081296 priority Critical patent/WO2023173343A1/en
Publication of WO2023173343A1 publication Critical patent/WO2023173343A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/18Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/17Function evaluation by approximation methods, e.g. inter- or extrapolation, smoothing, least mean square method
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/02Capturing of monitoring data
    • H04L43/026Capturing of monitoring data using flow identification
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/04Processing captured monitoring data, e.g. for logfile generation
    • H04L43/045Processing captured monitoring data, e.g. for logfile generation for graphical visualisation of monitoring data
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/06Generation of reports
    • H04L43/065Generation of reports related to network devices
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/06Generation of reports
    • H04L43/067Generation of reports using time frame reporting

Definitions

  • the present disclosure relates to communication networks, and particularly to network telemetry in the communication networks.
  • the disclosure proposes a network device for extracting quantiles of a set of one or more flows, and a control plane entity for recovering quantiles of the set of one or more flows, and corresponding methods.
  • FIG. 1 (a) shows per-flow latency measurements over time belonging to three different flows.
  • a network device receives packets that belong to multiple flows and that are associated with a measurement, for example, latency.
  • An operator managing the network is typically interested in keeping track of per-flow distributions, as shown in FIG. 1 (b) , which can be derived from q-quantiles or ranks.
  • the typical approach is to maintain aggregated information or only keep track of a subset of flows.
  • the input of the problem is a new measurement value v belonging to an ordered set V of all the measurements taken so far.
  • the goal is to output the rank r (v) of this value in V, where r (v) is defined as the number of values in V that are smaller than v.
  • the problem, which is more directly connected with the probability density (or cumulative distribution) function, is: given a quantile q as input, the goal is to extract x_q, the q-quantile of the multi-set of values (i.e., measurements) , where the q-quantile with q ∈ [0, 1] is the value x_q for which a fraction q of the elements belonging to V are smaller than x_q.
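The two definitions above can be illustrated with a minimal exact (non-streaming) sketch. The function names `rank` and `quantile` are hypothetical, and the index convention used for the q-quantile is one of several common choices, picked here as an assumption.

```python
from bisect import bisect_left


def rank(values, v):
    """Return r(v): the number of values strictly smaller than v."""
    return bisect_left(sorted(values), v)


def quantile(values, q):
    """Return x_q: the element with roughly a fraction q of the values below it.

    Assumed convention: index floor(q * n), clamped to the last element.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no measurements")
    idx = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[idx]


latencies = [5, 1, 3, 9]
print(rank(latencies, 5))          # number of values smaller than 5
print(quantile([1, 2, 3, 4], 0.5))
```

This exact version keeps every measurement in memory, which is precisely what the approaches described in the rest of the text try to avoid.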
  • this disclosure has the objective to enable extracting and recovering quantiles of multiple flows under resource constraints. Another objective is to enable an approximate per-flow histograms reconstruction.
  • a first aspect of the disclosure provides a network device for extracting quantiles of a set of one or more flows, wherein the network device is configured to, for each packet belonging to a flow of the set of flows: extract a flow identifier and a measurement value of the flow, wherein a set of measurement values for a single flow is stored in a plurality of bins; derive an index of a bin corresponding to the extracted measurement value; generate a per-flow bin identifier based on the flow identifier and the index of the bin; update a per-flow bin entry that corresponds to the per-flow bin identifier and that is stored in a first data structure by increasing a value of the per-flow bin entry by 1, wherein the per-flow bin entry indicates a number of appearances of the per-flow bin identifier, wherein the first data structure is adapted to store a set of per-flow bin entries corresponding to a set of per-flow bin identifiers of each flow of the set of the flows; and store the per-flow bin identifie
  • This disclosure proposes a solution that maintains data structures occupying fixed memory for approximate per-flow histograms reconstruction.
  • a combination of flow identifiers and indices of bins are used when storing per-flow histograms’ counters in the data structures, such as probabilistic data structures.
  • any histogram-based technique can be used to keep track of individual flow’s quantile with a maximum number of bins S.
  • in this disclosure, instead of maintaining a single histogram per flow, only the relevant bins are stored in the data structures.
  • the network device is further configured to obtain the set of measurement values of the flow; and divide the set of measurement values into the plurality of bins using a per-flow histogram algorithm.
  • the measurements are divided into bins whose size is dimensioned, for instance, according to the desired accuracy.
  • the network device is further configured to maintain a flow table; and store the flow identifier of the flow in the flow table.
  • the network device is further configured to periodically report at least one of the first data structure, the second data structure, and the flow table, to a control plane entity.
  • the first data structure and the second data structure may be periodically transferred together with the flow table to the control plane (or collection point) .
  • the first data structure is a hash table or a sketch
  • the second data structure is a filter, a dictionary, or a hash table that indicates whether the per-flow bin entry stored in the first data structure is greater than 0.
  • At least one of the first data structure and the second data structure is a probabilistic data structure.
  • the first data structure or the second data structure may also be realized using a more precise data structure such as a finite size hash table, e.g., a cuckoo hash table.
  • the first data structure comprises a count-min sketch
  • the second data structure comprises a Bloom filter
  • the network device is further configured to generate the per-flow bin identifier for the bin by performing a hash function on the flow identifier and the index of the bin.
  • a second aspect of the disclosure provides a control plane entity for recovering quantiles of a set of one or more flows, wherein the control plane entity is configured to, for each flow of the set of flows: obtain a per-flow bin identifier of each bin of a plurality of bins, wherein the plurality of bins is adapted to store a set of measurement values of the flow; determine, for each bin, whether a per-flow bin entry that corresponds to the per-flow bin identifier, has a value greater than 0, by using a second data structure and the per-flow bin identifier, wherein the first data structure is adapted to store a set of per-flow bin entries corresponding to a set of per-flow bin identifiers of each flow of the set of the flows; and retrieve the value of the per-flow bin entry from the first data structure, if the value is greater than 0.
  • control plane entity for quantile reconstruction with limited memory.
  • the control plane entity checks if the bin of the flow corresponding to the obtained per-flow bin identifier has a value greater than 0 by interrogating the second data structure. If the result of the second data structure interrogation is negative, the value of the per-flow bin entry is considered to be 0. If the result of the second data structure interrogation is positive, the value of the per-flow bin entry is retrieved from the first data structure.
  • control plane entity is further configured to compute the quantiles according to a per-flow histogram algorithm.
  • control plane entity is further configured to obtain the per-flow bin identifier of each bin using a flow table.
  • control plane entity is further configured to periodically obtain at least one of the first data structure, the second data structure, and the flow table, from a network entity.
  • a third aspect of the disclosure provides a method for a network device for extracting quantiles of a set of one or more flows, wherein the method comprises, for each packet belonging to a flow of the set of flows: extracting a flow identifier and a measurement value of the flow, wherein a set of measurement values for a single flow is stored in a plurality of bins; deriving an index of a bin corresponding to the extracted measurement value; generating a per-flow bin identifier for the bin based on the flow identifier and the index of the bin; updating a per-flow bin entry that corresponds to the per-flow bin identifier and that is stored in a first data structure by increasing a value of the per-flow bin entry by 1, wherein the per-flow bin entry indicates a number of appearances of the per-flow bin identifier, wherein the first data structure is adapted to store a set of per-flow bin entries corresponding to a set of per-flow bin identifiers of each flow of the set of the flows; and
  • Implementation forms of the method of the third aspect may correspond to the implementation forms of the network device of the first aspect described above.
  • the method of the third aspect and its implementation forms achieve the same advantages and effects as described above for the network device of the first aspect and its implementation forms.
  • a fourth aspect of the disclosure provides a method for a control plane entity for recovering quantiles of one or more flows, wherein the method comprises, for each flow of the set of flows: obtaining a per-flow bin identifier of each bin of a plurality of bins, wherein the plurality of bins is adapted to store a set of measurement values of the flow; determining, for each bin, whether a per-flow bin entry that corresponds to the per-flow bin identifier, has a value greater than 0, by using a second data structure and the per-flow bin identifier, wherein the first data structure is adapted to store a set of per-flow bin entries corresponding to a set of per-flow bin identifiers of each flow of the set of the flows; and retrieving the value of the per-flow bin entry from the first data structure, if the value is greater than 0.
  • Implementation forms of the method of the fourth aspect may correspond to the implementation forms of the control plane entity of the second aspect described above.
  • the method of the fourth aspect and its implementation forms achieve the same advantages and effects as described above for the control plane entity of the second aspect and its implementation forms.
  • a fifth aspect of the disclosure provides a computer program product comprising a program code for carrying out, when implemented on a processor, the method according to the third aspect and any implementation forms of the third aspect, or the fourth aspect and any implementation forms of the fourth aspect.
  • a sixth aspect of the disclosure provides a computer-readable medium comprising instructions which, when executed by a computer, cause the computer to carry out, the method according to the third aspect and any implementation forms of the third aspect, or the fourth aspect and any implementation forms of the fourth aspect.
  • FIG. 1 (a) shows per-flow latency measurements over time of three different flows
  • FIG. 2 (a) shows an example of a conventional compactor
  • FIG. 3 shows an example of a single flow histogram
  • FIG. 4 shows a network device according to an embodiment of this disclosure
  • FIG. 5 shows a network device according to an embodiment of this disclosure
  • FIG. 6 shows a control plane entity according to an embodiment of this disclosure
  • FIG. 7 shows a control plane entity according to an embodiment of this disclosure
  • FIG. 8 shows a system according to an embodiment of this disclosure
  • FIG. 9 shows data structures update in a network device according to an embodiment of this disclosure.
  • FIG. 10 shows a method according to an embodiment of this disclosure.
  • FIG. 11 shows a method according to an embodiment of this disclosure.
  • an embodiment or example may refer to other embodiments or examples.
  • any description including but not limited to terminology, element, process, explanation, and/or technical advantage mentioned in one embodiment or example is applicative to the other embodiments or examples.
  • the first approach consists in using a special data structure called a compactor.
  • a compactor is a simple array of values v ∈ V of size k, where all values are associated with a common weight w.
  • a compact operation on such data structure consists in first sorting the elements and then discarding either the even or the odd ones in the sequence so that the array remains with k/2 elements occupied. The weight of the elements that remain in the array is finally doubled and becomes 2w.
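A compact operation as described above might look like the following minimal sketch. The function name `compact` and the `offset` parameter are hypothetical. Choosing at random between the even and odd positions is an assumption; the text only says that one of the two halves is discarded.

```python
import random


def compact(values, weight, offset=None):
    """Sort the array, keep every other element, and double the common weight.

    offset=0 keeps the elements at even positions, offset=1 those at odd
    positions. If no offset is given, one is picked at random (assumption).
    """
    if offset is None:
        offset = random.randrange(2)
    ordered = sorted(values)
    return ordered[offset::2], 2 * weight


survivors, new_weight = compact([4, 1, 3, 2, 6, 5, 8, 7], weight=1, offset=0)
print(survivors, new_weight)
```

After the call, an array of k = 8 elements with weight w = 1 holds k/2 = 4 elements with weight 2w = 2.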
  • FIG. 2 (a) shows a simple example of the first approach.
  • a compactor-based algorithm uses compactors to be able to appropriately approximate ranks r (v) .
  • the required memory may be as given by Equation 1:
  • the weights associated with the different compactors are exponentially decreasing and bigger or equal to 2.
  • An example of such data structure is shown in FIG. 2 (b) .
  • the second approach, which is histogram-based, consists in dividing the measurement space into bins whose size is dimensioned according to the desired accuracy and the maximum measurement value, i.e., max (v) , when known a priori.
  • DDSketch proposes to associate any value with a bin using a logarithmic function such as Equation 2, which takes a value v and returns a unique positive bin index.
  • This algorithm consists in maintaining per-bin counters indicating the number of values that fall in a given bin.
  • bin counters are iteratively summed up, for instance starting from the smallest bin, until the bin i for which the sum is greater than where
  • the estimated q-quantile is a value in the last processed bin i defined by Equation 2 with a relative error defined in Equation 3.
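The histogram-based procedure above can be sketched as follows. Equations 2 to 4 are not reproduced in this text, so the formulas in the sketch are an assumption, taken from the publicly described DDSketch design: γ = (1 + α) / (1 − α), bin index ⌈log_γ v⌉, summation until the running count exceeds q(n − 1), and bin estimate 2γ^i / (γ + 1). The function names are hypothetical.

```python
import math
from collections import Counter


def bin_index(v, alpha):
    """Logarithmic bin index for a positive value v (assumed DDSketch-style mapping)."""
    gamma = (1 + alpha) / (1 - alpha)
    return math.ceil(math.log(v) / math.log(gamma))


def build_histogram(values, alpha):
    """Per-bin counters: number of values that fall in each bin."""
    return Counter(bin_index(v, alpha) for v in values)


def estimate_quantile(counts, q, alpha):
    """Sum bin counters from the smallest bin until the sum exceeds q * (n - 1)."""
    gamma = (1 + alpha) / (1 - alpha)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("empty histogram")
    running = 0
    for i in sorted(counts):
        running += counts[i]
        if running > q * (n - 1):
            # Representative value of bin i; relative error is bounded by alpha.
            return 2 * gamma ** i / (gamma + 1)
    raise AssertionError("unreachable for q in [0, 1]")


counts = build_histogram(range(1, 101), alpha=0.01)
print(estimate_quantile(counts, 0.5, alpha=0.01))
```

With α = 0.01, the estimate of the median of 1..100 is expected to fall within about 1% of the value 50.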
  • With simple math, if one is interested in measurements with upper bounds known a priori, Equations 2, 3, and 4 can be reduced to Equations 5, 6, and 7 as follows:
  • FIG. 3 represents an example of a single flow histogram where bins of increasing sizes are used to store information related to the distribution of measurement values of Flow1.
  • This disclosure thus exploits the histogram-based and approximate counting solutions to overcome the limitations of the above-mentioned conventional solutions.
  • FIG. 4 shows a network device 400 for extracting quantiles of a set of one or more flows, according to an embodiment of the disclosure.
  • the network device 400 may comprise processing circuitry (not shown) configured to perform, conduct or initiate the various operations of the network device 400 described herein.
  • the processing circuitry may comprise hardware and software.
  • the hardware may comprise analog circuitry or digital circuitry, or both analog and digital circuitry.
  • the digital circuitry may comprise components such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or multi-purpose processors.
  • the network device 400 may further comprise memory circuitry, which stores one or more instruction(s) that can be executed by the processor or by the processing circuitry, in particular under control of the software.
  • the memory circuitry may comprise a non-transitory storage medium storing executable software code which, when executed by the processor or the processing circuitry, causes the various operations of the network device 400 to be performed.
  • the processing circuitry comprises one or more processors and a non-transitory memory connected to the one or more processors.
  • the non-transitory memory may carry executable program code which, when executed by the one or more processors, causes the network device 400 to perform, conduct or initiate the operations or methods described herein.
  • the network device 400 is configured to extract a flow identifier 401 and a measurement value 402 of the flow. Notably, a set of measurement values for a single flow is stored in a plurality of bins (e.g., as shown in FIG. 3) .
  • the network device 400 is further configured to derive an index 403 of a bin corresponding to the extracted measurement value 402. Then, the network device 400 is configured to generate a per-flow bin identifier 404 based on the flow identifier 401 and the index 403 of the bin.
  • the network device 400 is configured to update a per-flow bin entry 405 that corresponds to the per-flow bin identifier 404 and that is stored in a first data structure 410 by increasing the value of the per-flow bin entry 405 by 1.
  • the per-flow bin entry 405 indicates a number of appearances of the per-flow bin identifier 404.
  • the first data structure 410 is adapted to store a set of per-flow bin entries corresponding to a set of per-flow bin identifiers of each flow of the set of the flows.
  • the network device 400 is further configured to store the per-flow bin identifier 404 in a second data structure 420.
  • the second data structure 420 is adapted to store the set of per-flow bin identifiers of each flow of the set of the flows for which the corresponding per-flow bin entries stored in the first data structure 410 are greater than 0.
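The per-packet update path described above can be sketched with exact stand-ins for the two data structures (a hash-table counter and a set), which the text permits as an alternative to probabilistic structures. The class name `DataPlaneState`, the fixed 10 ms linear bin width, and the use of a (flow, bin) tuple as per-flow bin identifier are assumptions made for illustration only.

```python
from collections import Counter

BIN_WIDTH_MS = 10  # hypothetical bin width; the text leaves the binning algorithm open


def bin_index(latency_ms):
    """Derive the index of the bin for a measurement value (assumed linear bins)."""
    return int(latency_ms // BIN_WIDTH_MS)


class DataPlaneState:
    def __init__(self):
        self.counters = Counter()  # stand-in for the first data structure 410
        self.nonzero = set()       # stand-in for the second data structure 420
        self.flow_table = set()    # known flow identifiers

    def process_packet(self, flow_id, latency_ms):
        idx = bin_index(latency_ms)    # derive the bin index 403
        key = (flow_id, idx)           # per-flow bin identifier 404 (tuple stand-in)
        self.counters[key] += 1        # increase the per-flow bin entry 405 by 1
        self.nonzero.add(key)          # record that this entry is now greater than 0
        self.flow_table.add(flow_id)


state = DataPlaneState()
for flow, latency in [("flow1", 3.0), ("flow1", 7.5), ("flow1", 42.0), ("flow2", 15.0)]:
    state.process_packet(flow, latency)
print(dict(state.counters))
```

Only the bins that actually receive measurements consume an entry, which is the memory saving the text attributes to storing "relevant bins" rather than one full histogram per flow.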
  • the present disclosure introduces a network device 400 that is enabled to extract quantiles of multiple flows with limited memory.
  • this disclosure allows the network device 400 to only store the relevant bins in the system, instead of maintaining a single histogram per flow.
  • the network device 400 may be configured to obtain the set of measurement values of the flow; and divide the set of measurement values into the plurality of bins using a per-flow histogram algorithm.
  • This disclosure describes a system that maintains data structures occupying fixed memory for approximate per-flow histograms reconstruction.
  • any histogram-based technique can be used to keep track of individual flow’s quantile with a maximum number of bins S. Instead of maintaining a single histogram per flow, only the relevant bins are stored in the system.
  • FIG. 5 shows a detailed network device 400 according to an embodiment of the disclosure.
  • the network device 400 shown in FIG. 5 is based on the network device 400 shown in FIG. 4.
  • the network device 400 extracts the flow identifier (i.e., the 5-tuple of the flow) , and saves existing flows in a flow table.
  • a flow is identified by its 5-tuple, which contains the complete source and destination addresses, source and destination ports, and the transport layer protocol identifier.
  • the network device 400 may be further configured to maintain a flow table; and store the flow identifier 401 of the flow in the flow table.
  • the network device 400 may further derive the index 403 of the bin according to a per-flow histogram-based sketch for quantiles extraction.
  • the network device 400 may use a first probabilistic counting data structure C, i.e., the first data structure 410 as shown in FIG. 4, for counting the number of values in each bin.
  • a combination of the bin index 403 and the flow identifier 401 is used to obtain a per-flow bin identifier 404.
  • the network device 400 may further use a second probabilistic data structure B, i.e., the second data structure 420 as shown in FIG. 4, to keep track of the per-flow bins contained in the counting data structure, i.e., to store the set of per-flow bin identifiers for which the corresponding per-flow bin counter has a value greater than 0.
  • the first data structure 410 may be a hash table or a sketch.
  • the first data structure 410 may comprise a count-min sketch.
  • the second data structure 420 may be a filter, a dictionary, or a hash table that indicates whether the per-flow bin entry 405 stored in the first data structure 410 is greater than 0.
  • the second data structure 420 may comprise a Bloom filter.
  • the per-flow bin identifier 404 for the bin may be generated by performing a hash function on the flow identifier 401 and the index 403 of the bin.
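One way to realize such a hash-based identifier is sketched below. The byte layout, the choice of BLAKE2b, the 64-bit output, and the function name `per_flow_bin_id` are all assumptions; the text only requires that a hash function be applied to the flow identifier and the bin index.

```python
import hashlib
import ipaddress
import struct


def per_flow_bin_id(five_tuple, bin_index):
    """Hash a 5-tuple and a non-negative bin index into a 64-bit identifier."""
    src, dst, sport, dport, proto = five_tuple
    data = (
        ipaddress.ip_address(src).packed
        + ipaddress.ip_address(dst).packed
        # network byte order: two 16-bit ports, 8-bit protocol, 32-bit bin index
        + struct.pack("!HHBI", sport, dport, proto, bin_index)
    )
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big")


flow = ("10.0.0.1", "10.0.0.2", 12345, 443, 6)  # illustrative TCP flow
print(hex(per_flow_bin_id(flow, 3)))
```

The same function would have to be used in the network device and in the control plane entity so that both sides derive identical identifiers from the flow table.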
  • the first data structure 410 and the second data structure 420 are periodically reported to a telemetry collection point (either local or remote control plane) .
  • the network device may be configured to periodically report at least one of the first data structure 410, the second data structure 420, and the flow table, to a control plane entity 600.
  • FIG. 6 shows a control plane entity 600 for recovering quantiles of a set of one or more flows, according to an embodiment of this disclosure.
  • the control plane entity 600 may comprise processing circuitry (not shown) configured to perform, conduct or initiate the various operations of the control plane entity 600 described herein.
  • the processing circuitry may comprise hardware and software.
  • the hardware may comprise analog circuitry or digital circuitry, or both analog and digital circuitry.
  • the digital circuitry may comprise components such as ASICs, FPGAs, DSPs, or multi-purpose processors.
  • the control plane entity 600 may further comprise memory circuitry, which stores one or more instruction(s) that can be executed by the processor or by the processing circuitry, in particular under control of the software.
  • the memory circuitry may comprise a non-transitory storage medium storing executable software code which, when executed by the processor or the processing circuitry, causes the various operations of the control plane entity 600 to be performed.
  • the processing circuitry comprises one or more processors and a non-transitory memory connected to the one or more processors.
  • the non-transitory memory may carry executable program code which, when executed by the one or more processors, causes the control plane entity 600 to perform, conduct or initiate the operations or methods described herein.
  • the control plane entity 600 is configured to, for each flow of the set of flows, obtain a per-flow bin identifier 404 of each bin of a plurality of bins. Notably, the plurality of bins is adapted to store a set of measurement values of the flow. The control plane entity 600 is further configured to determine, for each bin, whether a per-flow bin entry 405 that corresponds to the per-flow bin identifier 404 , has a value greater than 0, by using a second data structure 420 and the per-flow bin identifier 404. In particular, the first data structure 410 is adapted to store a set of per-flow bin entries corresponding to a set of per-flow bin identifiers of each flow of the set of the flows. Further, the control plane entity 600 is configured to retrieve the value of the per-flow bin entry 405 from the first data structure 410, if the value is greater than 0.
  • This disclosure further introduces a control plane entity 600 capable of reconstructing quantiles of multiple flows with limited memory.
  • control plane entity 600 may be configured to compute the quantiles according to a per-flow histogram algorithm.
  • control plane entity 600 may be further configured to obtain the per-flow bin identifier 404 of each bin using a flow table.
  • control plane entity 600 may be further configured to periodically obtain at least one of the first data structure 410, the second data structure 420, and the flow table, from a network device 400.
  • the network device 400 is the network device as shown in FIG. 4.
  • FIG. 7 shows a detailed control plane entity 600 according to an embodiment of the disclosure.
  • the control plane entity 600 shown in FIG. 7 is based on the control plane entity 600 shown in FIG. 6.
  • the control plane entity 600 may be a telemetry collection point in the network.
  • the network device 400 combines the bin index i and the flow identifier to obtain a per-flow bin identifier 404.
  • the control plane entity 600 first obtains the per-flow bin identifier 404 and checks whether the bin of the corresponding flow has a value greater than 0 by interrogating the probabilistic data structure B, i.e., the second data structure 420.
  • If the result of the interrogation of the probabilistic data structure B is negative, the counter value corresponding to the per-flow bin identifier 404 is considered equal to 0. If the result of the interrogation is positive, the counter value corresponding to the per-flow bin identifier 404 is retrieved from the probabilistic counting data structure C, i.e., the first data structure 410.
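The recovery logic can be sketched as follows, again with an exact set and dictionary standing in for the data structures B and C. The function name `recover_histogram`, the tuple identifiers, and the example contents are hypothetical.

```python
def recover_histogram(flow_id, num_bins, nonzero, counters):
    """Rebuild one flow's histogram from the reported data structures.

    A bin whose identifier is absent from `nonzero` (stand-in for B) is taken
    to be 0; otherwise its value is read from `counters` (stand-in for C).
    """
    histogram = []
    for idx in range(num_bins):
        key = (flow_id, idx)  # per-flow bin identifier (tuple stand-in)
        histogram.append(counters.get(key, 0) if key in nonzero else 0)
    return histogram


# Illustrative report received from a network device.
nonzero = {("flow1", 0), ("flow1", 3), ("flow2", 1)}
counters = {("flow1", 0): 4, ("flow1", 3): 1, ("flow2", 1): 7}
flow_table = ["flow1", "flow2"]

histograms = {f: recover_histogram(f, 8, nonzero, counters) for f in flow_table}
print(histograms)
```

With a probabilistic B such as a Bloom filter, a false positive would cause a lookup in C for a bin that was never updated, so the recovered histogram is approximate.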
  • the combinations of flow identifiers and indices of bins may be obtained by k pairwise independent hash functions.
  • the probabilistic data structure B, i.e., the second data structure 420, may correspond to a Bloom filter dimensioned for a low false-positive rate.
  • Said Bloom filter may be represented as:
  • k is the number of hash functions
  • m is the size of the Bloom filter in bits
  • n is the number of elements present in the filter, i.e., (flow, bin index) pairs with at least one element.
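For reference, the standard textbook approximation of the false-positive probability p of a Bloom filter, written with the symbols k, m, and n defined above, is given below. It assumes hash functions that behave as independent and uniform, and it is offered as a well-known result rather than as the exact expression used in the source.

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad
k_{\mathrm{opt}} = \frac{m}{n}\,\ln 2
```

Under this approximation, a low false-positive rate is obtained by choosing m large relative to the expected number n of non-empty per-flow bins and setting k close to k_opt.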
  • the probabilistic data structure C, i.e., the first data structure 410, may correspond to a Count-min sketch (or a more precise conventional sketch) with error guarantee:
  • c_i and ĉ_i are the exact and approximate counts of element i, respectively
  • k is the number of hash functions
  • d is the size of the sketch, i.e., the total number of counters (d/k per each hash function)
  • n is the number of elements present in the sketch.
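A Count-min sketch with the parameters named above (k hash functions, d counters in total, d/k per hash function) can be sketched as follows. The class name and the salted BLAKE2b hashing are assumptions and do not strictly provide pairwise independence. The well-known guarantee for this structure is that the estimate never falls below the exact count and exceeds it by at most (e·k/d)·n with probability at least 1 − e^(−k), where n is read here as the total count inserted into the sketch; the source's own expression may be stated differently.

```python
import hashlib


class CountMinSketch:
    def __init__(self, k, d):
        if d % k != 0:
            raise ValueError("d must be a multiple of k")
        self.k = k
        self.width = d // k  # d/k counters per hash function
        self.rows = [[0] * self.width for _ in range(k)]

    def _column(self, row, key):
        # One salted hash per row (assumption, stands in for independent hashes).
        h = hashlib.blake2b(repr(key).encode(), digest_size=8,
                            salt=row.to_bytes(8, "big"))
        return int.from_bytes(h.digest(), "big") % self.width

    def add(self, key, count=1):
        for row in range(self.k):
            self.rows[row][self._column(row, key)] += count

    def estimate(self, key):
        # The minimum over the rows is the least collision-inflated counter.
        return min(self.rows[row][self._column(row, key)] for row in range(self.k))


cms = CountMinSketch(k=4, d=4 * 256)
for _ in range(5):
    cms.add(("flow1", 3))
cms.add(("flow2", 1), count=2)
print(cms.estimate(("flow1", 3)))
```

Because collisions can only add to a counter, the estimate is an upper bound on the true per-flow bin count, which is why a separate structure is useful for telling empty bins apart from collision noise.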
  • FIG. 8 shows a system according to an embodiment of this disclosure.
  • the system comprises a network device 400 as shown in FIG. 4 or FIG. 5, and a control plane entity 600 as shown in FIG. 6 or FIG. 7.
  • FIG. 8 presents a high-level view of the construction phase in the data plane (in the network device 400) and the recovery phases in the control plane or telemetry collection point (in the control plane entity 600) .
  • the control plane entity 600 can retrieve the approximate version of the flow histograms (e.g., as shown in FIG. 7) .
  • FIG. 9 presents a concrete example of data structures update according to an embodiment of this disclosure.
  • the first data structure 410 or the second data structure 420 may also be realized using a more precise data structure such as a finite size hash table, e.g., a cuckoo hash table.
  • this disclosure proposes a new multi-flow quantiles extraction mechanism to be implemented in constrained devices, i.e., a network device, or a control plane device, using limited memory for quantiles extraction and reconstruction.
  • a combination of a flow identifier and a bin index is used for storing per-flow histograms’ counters in a data structure, particularly a probabilistic data structure.
  • the constrained devices are allowed to keep track of per-flow bin counters with a positive value in another probabilistic data structure.
  • This disclosure also proposes to periodically transfer the probabilistic data structures together with the flow table to the control plane (or collection point) .
  • the control plane device is thus able to use the two probabilistic data structures and the flow table to retrieve per-flow histograms for q-quantile computation.
  • FIG. 10 shows a method 1000 according to an embodiment of the disclosure, particularly for extracting quantiles of a set of one or more flows.
  • the method 1000 is performed by the network device 400 shown in FIG. 4.
  • the method 1000 comprises a step 1001 of extracting a flow identifier 401 and a measurement value 402 of the flow.
  • a set of measurement values for a single flow is stored in a plurality of bins.
  • the method 1000 further comprises a step 1002 of deriving an index 403 of a bin corresponding to the extracted measurement value 402.
  • the method 1000 further comprises a step 1003 of generating a per-flow bin identifier 404 for the bin based on the flow identifier 401 and the index 403 of the bin.
  • the method 1000 comprises a step 1004 of updating a per-flow bin entry 405 that corresponds to the per-flow bin identifier 404 and that is stored in a first data structure 410 by increasing a value of the per-flow bin entry 405 by 1.
  • the per-flow bin entry 405 indicates a number of appearances of the per-flow bin identifier 404.
  • the first data structure 410 is adapted to store a set of per-flow bin entries corresponding to a set of per-flow bin identifiers of each flow of the set of the flows.
  • the method 1000 further comprises a step 1005 of storing the per-flow bin identifier 404 in a second data structure 420.
  • the second data structure 420 is adapted to store the set of per-flow bin identifiers of each flow of the set of the flows for which the corresponding per-flow bin entries stored in the first data structure 410 are greater than 0.
  • the method 1000 may comprise a step of periodically reporting at least one of the first data structure 410, the second data structure 420, and the flow table, to a control plane entity 600.
  • the control plane entity 600 is the control plane entity shown in FIG. 6.
  • FIG. 11 shows a method 1100 according to an embodiment of the disclosure, particularly for recovering quantiles of a set of one or more flows.
  • the method 1100 is performed by a control plane entity 600 shown in FIG. 6.
  • the method 1100 comprises a step 1101 of obtaining a per-flow bin identifier 404 of each bin of a plurality of bins.
  • the plurality of bins is adapted to store a set of measurement values of the flow.
  • the method 1100 comprises a step 1102 of determining, for each bin, whether a per-flow bin entry 405 that corresponds to the per-flow bin identifier 404, has a value greater than 0, by using a second data structure 420 and the per-flow bin identifier 404.
  • the first data structure 410 is adapted to store a set of per-flow bin entries corresponding to a set of per-flow bin identifiers of each flow of the set of the flows.
  • the method 1100 comprises a step 1103 of retrieving the value of the per-flow bin entry 405 from the first data structure 410, if the value is greater than 0.
  • the method 1100 may comprise a step of periodically obtaining at least one of the first data structure 410, the second data structure 420, and the flow table, from a network device 400.
  • the network device 400 may be the network device shown in FIG. 4.
  • any method according to embodiments of the disclosure may be implemented in a computer program, having code means, which when run by processing means causes the processing means to execute the steps of the method.
  • the computer program is included in a computer-readable medium of a computer program product.
  • the computer-readable medium may comprise essentially any memory, such as a ROM (Read-Only Memory) , a PROM (Programmable Read-Only Memory) , an EPROM (Erasable PROM) , a Flash memory, an EEPROM (Electrically Erasable PROM) , or a hard disk drive.
  • embodiments of the network device 400, or the control plane entity 600, comprise the necessary communication capabilities in the form of, e.g., functions, means, units, elements, etc., for performing the solution.
  • means, units, elements and functions are: processors, memory, buffers, control logic, encoders, decoders, rate matchers, de-rate matchers, mapping units, multipliers, decision units, selecting units, switches, interleavers, de-interleavers, modulators, demodulators, inputs, outputs, antennas, amplifiers, receiver units, transmitter units, DSPs, trellis-coded modulation (TCM) encoder, TCM decoder, power supply units, power feeders, communication interfaces, communication protocols, etc. which are suitably arranged together for performing the solution.
  • the processor (s) of the network device 400, or the control plane entity 600 may comprise, e.g., one or more instances of a Central Processing Unit (CPU) , a processing unit, a processing circuit, a processor, an Application Specific Integrated Circuit (ASIC) , a microprocessor, or other processing logic that may interpret and execute instructions.
  • the expression “processor” may thus represent a processing circuitry comprising a plurality of processing circuits, such as, e.g., any, some or all of the ones mentioned above.
  • the processing circuitry may further perform data processing functions for inputting, outputting, and processing of data comprising data buffering and device control functions, such as call processing control, user interface control, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Engineering & Computer Science (AREA)
  • Algebra (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Operations Research (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A network device (400) for extracting quantiles of a set of one or more flows. The network device (400) is configured to, for each packet belonging to a flow of the set of flows: extract a flow identifier (401) and a measurement value (402) of the flow; derive an index (403) of a bin corresponding to the extracted measurement value (402); generate a per-flow bin identifier (404) based on the flow identifier (401) and the index (403) of the bin; update a per-flow bin entry (405) that corresponds to the per-flow bin identifier (404) and that is stored in a first data structure (410) by increasing a value of the per-flow bin entry (405) by 1; and store the per-flow bin identifier (404) in a second data structure (420). A control plane entity (600) for reconstructing quantiles of a set of one or more flows.

Description

DEVICE AND METHOD FOR MULTIFLOW QUANTILES EXTRACTION AND RECONSTRUCTION
TECHNICAL FIELD
The present disclosure relates to communication networks, and particularly to network telemetry in the communication networks. In order to provide a solution to the problem of multi-flow quantile extraction and reconstruction with limited memory overhead, the disclosure proposes a network device for extracting quantiles of a set of one or more flows, and a control plane entity for recovering quantiles of the set of one or more flows, and corresponding methods.
BACKGROUND
In a nutshell, existing solutions do not take multiple flows into account but focus on optimizing the single-flow use case. In particular, there exist two alternative approaches to keep track of the distribution of a given per-flow measurement: a compactor-based approach, which provides the rank of a given measurement value, and a histogram-based approach, which allows q-quantiles to be retrieved directly. These two solutions present advantages and disadvantages from the point of view of error guarantee, memory occupancy, and processing overhead (or feasibility in the case of highly constrained devices, e.g., high-speed switches or routers).
Generally speaking, extracting complex per-flow statistics from telemetry is hard in modern systems. Despite the enhanced programmability of network devices, the number of operations and the amount of memory available in the data plane are still limited. Furthermore, sampling and packet mirroring to external analyzers limit the accuracy of the derived statistics.
To better describe the related multiflow problem, an example is shown in FIG. 1. FIG. 1 (a) shows per-flow latency measurements over time belonging to three different flows. A network device receives packets that belong to multiple flows and that are associated with a measurement, for example, latency. An operator managing the network is typically interested in keeping track of per-flow distributions, as shown in FIG. 1 (b), which can be derived from q-quantiles or ranks. In order to reduce the processing and memory overhead imposed on network devices, the typical approach is to maintain aggregated information or only keep track of a subset of flows.
To keep track of the distribution, i.e., the q-quantile for any q in [0, 1], of per-flow measurements such as packet size, latency, etc., concepts related to single-flow q-quantiles need to be defined. The problem can be defined in two equivalent ways. In the first formulation, the input is a new measurement value v belonging to an ordered set V of all the measurements taken so far. The goal is to output the rank r(v) of this value in V, where r(v) is defined as the number of values in V that are smaller than v. In the second formulation, which is more directly connected with the probability density (or cumulative distribution) function, the input is a quantile q and the goal is to extract x_q, the q-quantile of the multi-set of values (i.e., measurements). For q ∈ [0, 1], the q-quantile is the value x_q for which a fraction q of the elements belonging to V are smaller than x_q.
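The two formulations can be illustrated with a minimal exact (non-streaming) sketch. The function names `rank` and `quantile` are illustrative only, and the index convention used for the q-quantile (position ⌊q·(n−1)⌋ in the sorted values) is an assumption; the disclosure does not fix a tie-breaking or rounding rule.

```python
def rank(values, v):
    """Rank r(v): the number of stored values strictly smaller than v."""
    return sum(1 for x in values if x < v)


def quantile(values, q):
    """Exact q-quantile x_q for q in [0, 1].

    Assumption: x_q is the element at position floor(q * (n - 1)) of the
    sorted values, which is one common convention among several.
    """
    if not values:
        raise ValueError("no measurements")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    ordered = sorted(values)
    return ordered[int(q * (len(ordered) - 1))]


latencies = [5, 1, 4, 2, 3]
print(rank(latencies, 4))        # three values (1, 2, 3) are smaller than 4
print(quantile(latencies, 0.5))  # median of the five values
```

Both functions keep every value in memory, which is exactly what the memory-constrained setting described next cannot afford.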
In both cases, when considering the implementation of such algorithms in high-speed network devices such as routers and switches, the constraint is twofold. First, the memory resource to store the values is limited. Second, the processing capabilities of the machine are limited, both when inserting a new value and when extracting the q-quantile or rank r (v) .
Therefore, a solution for extracting multi-flow quantiles under resource constraints for any size of streams is desired.
SUMMARY
In view of the above, this disclosure has the objective to enable extracting and recovering quantiles of multiple flows under resource constraints. Another objective is to enable approximate reconstruction of per-flow histograms.
These and other objectives are achieved by the solution of the present disclosure as provided in the enclosed independent claims. Advantageous implementations are further defined in the dependent claims.
A first aspect of the disclosure provides a network device for extracting quantiles of a set of one or more flows, wherein the network device is configured to, for each packet belonging to a flow of the set of flows: extract a flow identifier and a measurement value of the flow, wherein a set of measurement values for a single flow is stored in a plurality of bins; derive an index of a bin corresponding to the extracted measurement value; generate a per-flow bin identifier based on the flow identifier and the index of the bin; update a per-flow bin entry that corresponds to the per-flow bin identifier and that is stored in a first data structure by increasing a value of the per-flow bin entry by 1, wherein the per-flow bin entry indicates a number of appearances of the per-flow bin identifier, wherein the first data structure is adapted to store a set of per-flow bin entries corresponding to a set of per-flow bin identifiers of each flow of the set of the flows; and store the per-flow bin identifier in a second data structure, wherein the second data structure is adapted to store the set of per-flow bin identifiers of each flow of the set of the flows for which the corresponding per-flow bin entries stored in the first data structure are greater than 0.
This disclosure proposes a solution that maintains data structures occupying fixed memory for approximate reconstruction of per-flow histograms. In particular, a combination of flow identifiers and indices of bins is used when storing the counters of the per-flow histograms in the data structures, such as probabilistic data structures. Possibly, any histogram-based technique can be used to keep track of the quantiles of an individual flow with a maximum number of bins S. Based on this disclosure, instead of maintaining a single histogram per flow, only the relevant bins are stored in the data structures.
In an implementation form of the first aspect, the network device is further configured to obtain the set of measurement values of the flow; and divide the set of measurement values into the plurality of bins using a per-flow histogram algorithm.
Normally, in the histogram-based approach for tracking the distribution of given per-flow measurements, the measurements are divided into bins whose size is dimensioned, for instance, according to the desired accuracy.
In an implementation form of the first aspect, the network device is further configured to maintain a flow table; and store the flow identifier of the flow in the flow table.
In an implementation form of the first aspect, the network device is further configured to periodically report at least one of the first data structure, the second data structure, and the flow table, to a control plane entity.
Optionally, the first data structure and the second data structure may be periodically transferred together with the flow table to the control plane (or collection point) .
In an implementation form of the first aspect, the first data structure is a hash table or a sketch; and the second data structure is a filter, a dictionary, or a hash table that indicates whether the per-flow bin entry stored in the first data structure is greater than 0.
In an implementation form of the first aspect, at least one of the first data structure and the second data structure is a probabilistic data structure.
Possibly, instead of using probabilistic data structures, the first data structure or the second data structure may also be realized using a more precise data structure such as a finite size hash table, e.g., a cuckoo hash table.
In an implementation form of the first aspect, the first data structure comprises a count-min sketch, and/or the second data structure comprises a Bloom filter.
In an implementation form of the first aspect, the network device is further configured to generate the per-flow bin identifier for the bin by performing a hash function on the flow identifier and the index of the bin.
A second aspect of the disclosure provides a control plane entity for recovering quantiles of a set of one or more flows, wherein the control plane entity is configured to, for each flow of the set of flows: obtain a per-flow bin identifier of each bin of a plurality of bins, wherein the plurality of bins is adapted to store a set of measurement values of the flow; determine, for each bin, whether a per-flow bin entry that corresponds to the per-flow bin identifier, has a value greater than 0, by using a second data structure and the per-flow bin identifier, wherein the first data structure is adapted to store a set of per-flow bin entries corresponding to a set of per-flow bin identifiers of each flow of the set of the flows; and retrieve the value of the per-flow bin entry from the first data structure, if the value is greater than 0.
Another aspect of this disclosure proposes a control plane entity for quantile reconstruction with limited memory. In particular, the control plane entity checks if the bin of the flow corresponding to the obtained per-flow bin identifier has a value greater than 0 by interrogating the second data structure. If the result of the second data structure interrogation is negative, the value of the per-flow bin entry is considered to be 0. If the result of the second data structure interrogation is positive, the value of the per-flow bin entry is retrieved from the first data structure.
In an implementation form of the second aspect, the control plane entity is further configured to compute the quantiles according to a per-flow histogram algorithm.
In an implementation form of the second aspect, the control plane entity is further configured to obtain the per-flow bin identifier of each bin using a flow table.
In an implementation form of the second aspect, the control plane entity is further configured to periodically obtain at least one of the first data structure, the second data structure, and the flow table, from a network entity.
A third aspect of the disclosure provides a method for a network device for extracting quantiles of a set of one or more flows, wherein the method comprises, for each packet belonging to a flow of the set of flows: extracting a flow identifier and a measurement value of the flow, wherein a set of measurement values for a single flow is stored in a plurality of bins; deriving an index of a bin corresponding to the extracted measurement value; generating a per-flow bin identifier for the bin based on the flow identifier and the index of the bin; updating a per-flow bin entry that corresponds to the per-flow bin identifier and that is stored in a first data structure by increasing a value of the per-flow bin entry by 1, wherein the per-flow bin entry indicates a number of appearances of the per-flow bin identifier, wherein the first data structure is adapted to store a set of per-flow bin entries corresponding to a set of per-flow bin identifiers of each flow of the set of the flows; and storing the per-flow bin identifier in a second data structure, wherein the second data structure is adapted to store the set of per-flow bin identifiers of each flow of the set of the flows for which the corresponding per-flow bin entries stored in the first data structure are greater than 0.
Implementation forms of the method of the third aspect may correspond to the implementation forms of the network device of the first aspect described above. The method of the third aspect and its implementation forms achieve the same advantages and effects as described above for the network device of the first aspect and its implementation forms.
A fourth aspect of the disclosure provides a method for a control plane entity for recovering quantiles of one or more flows, wherein the method comprises, for each flow of the set of flows: obtaining a per-flow bin identifier of each bin of a plurality of bins, wherein the plurality of bins is adapted to store a set of measurement values of the flow; determining, for each bin, whether a per-flow bin entry that corresponds to the per-flow bin identifier, has a value greater than 0, by using a second data structure and the per-flow bin identifier, wherein the first data structure is adapted to store a set of per-flow bin entries corresponding to a set of per-flow bin identifiers of each flow of the set of the flows; and retrieving the value of the per-flow bin entry from the first data structure, if the value is greater than 0.
Implementation forms of the method of the fourth aspect may correspond to the implementation forms of the control plane entity of the second aspect described above. The method of the fourth aspect and its implementation forms achieve the same advantages and effects as described above for the control plane entity of the second aspect and its implementation forms.
A fifth aspect of the disclosure provides a computer program product comprising a program code for carrying out, when implemented on a processor, the method according to the third aspect and any implementation forms of the third aspect, or the fourth aspect and any implementation forms of the fourth aspect.
A sixth aspect of the disclosure provides a computer-readable medium comprising instructions which, when executed by a computer, cause the computer to carry out, the method according to the third aspect and any implementation forms of the third aspect, or the fourth aspect and any implementation forms of the fourth aspect.
It has to be noted that all devices, elements, units and means described in the present application could be implemented in software or hardware elements or any kind of combination thereof. All steps which are performed by the various entities described in the present application as well as the functionalities described to be performed by the various entities are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities. Even if, in the following description of specific embodiments, a specific functionality or step to be performed by external entities is not reflected in the description of a specific detailed element of that entity that performs that specific step or functionality, it should be clear for a skilled person that these methods and functionalities can be implemented in respective software or hardware elements, or any kind of combination thereof.
BRIEF DESCRIPTION OF DRAWINGS
The above-described aspects and implementation forms of the present disclosure will be explained in the following description of specific embodiments in relation to the enclosed drawings, in which:
FIG. 1 (a) shows per-flow latency measurements over time of three different flows;
(b) shows per-flow distributions of three different flows;
FIG. 2 (a) shows an example of a conventional compactor;
(b) shows a data structure of a conventional compactor;
FIG. 3 shows an example of a single flow histogram;
FIG. 4 shows a network device according to an embodiment of this disclosure;
FIG. 5 shows a network device according to an embodiment of this disclosure;
FIG. 6 shows a control plane entity according to an embodiment of this disclosure;
FIG. 7 shows a control plane entity according to an embodiment of this disclosure;
FIG. 8 shows a system according to an embodiment of this disclosure;
FIG. 9 shows data structures update in a network device according to an embodiment of this disclosure;
FIG. 10 shows a method according to an embodiment of this disclosure; and
FIG. 11 shows a method according to an embodiment of this disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
Illustrative embodiments of a first entity, a second entity, and corresponding methods for multi-flow quantile reconstruction are described in the following with reference to the figures. Although this description provides a detailed example of possible implementations, it should be noted that the details are intended to be exemplary and in no way limit the scope of the application.
Moreover, an embodiment or example may refer to other embodiments or examples. For example, any description including but not limited to terminology, element, process, explanation, and/or technical advantage mentioned in one embodiment or example is applicable to the other embodiments or examples.
For ease of understanding this application, two previously mentioned conventional approaches are first explained here.
The first approach consists in using a special data structure called a compactor. A compactor is a simple array of values v∈V of size k, where all values are associated with a common weight w. A compact operation on such data structure consists in first sorting the elements and then discarding either the even or the odd ones in the sequence so that the array remains with k/2 elements occupied. The weight of the elements that remain in the array is finally doubled and becomes 2w. FIG. 2 (a) shows a simple example of the first approach.
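The compact operation described above can be sketched as follows. The function name `compact` and the `keep_odd` flag are illustrative; how the choice between even and odd positions is made (for example, at random) is not stated here and is left to the caller in this sketch.

```python
def compact(items, weight, keep_odd):
    """One compact operation on a compactor holding `items` of common `weight`.

    The items are sorted, every other item is discarded (the even or the odd
    positions, selected by `keep_odd`), and the weight of the survivors doubles.
    """
    ordered = sorted(items)
    offset = 1 if keep_odd else 0
    survivors = ordered[offset::2]
    return survivors, 2 * weight


# A full compactor of size k = 4 with weight w = 1.
survivors, new_weight = compact([7, 3, 9, 1], 1, keep_odd=False)
print(survivors, new_weight)  # half of the slots remain occupied, weight is 2
```

Note that the operation needs a sort over the whole array, which is the kind of primitive the text below identifies as hard to realize in a programmable data plane.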
A compactor-based algorithm uses (Figure PCTCN2022081296-appb-000001) compactors to be able to appropriately approximate ranks r(v). In a conventional solution, to get ε-approximated ranks (Figure PCTCN2022081296-appb-000002) of the real ranks r(v) such that (Figure PCTCN2022081296-appb-000003) with probability 1-δ, the required memory may be as given by Equation 1:
Figure PCTCN2022081296-appb-000004
In this example, the weights associated with the different compactors are exponentially decreasing and greater than or equal to 2. An example of such a data structure is shown in FIG. 2 (b).
It can be understood that, if the objective of the algorithm is to maintain per-flow rank estimation (can be considered as equivalent to the q-quantile problem to some extent) , N of these data structures (one per flow) are required. This approach is hardly acceptable for memory-constrained devices such as switches or routers for which the number of flows is not known a priori. Furthermore, this family of algorithms also requires ordering as well as several memory operations to implement the simple building block, i.e., the compactor. Such operations are typically not available in modern programmable networking devices and hence such an approach is not fit for our use case.
The second approach, which is histogram-based, consists in dividing the measurement space into bins whose size is dimensioned according to the desired accuracy and the maximum measurement value, i.e., max(v), when known a priori. One conventional approach, DDSketch, associates each value with a bin using a logarithmic function such as Equation 2, which takes a value v and returns a unique positive bin index.
Figure PCTCN2022081296-appb-000005
This algorithm consists in maintaining per-bin counters indicating the number of values that fall in a given bin. To get the q-quantile of a flow, bin counters are iteratively summed up, for instance starting from the smallest bin, until reaching the bin i for which the sum is greater than (Figure PCTCN2022081296-appb-000006), where |V| is the cardinality of the set V. The estimated q-quantile is a value (Figure PCTCN2022081296-appb-000007) in the last processed bin i defined by Equation 2, with a relative error defined in Equation 3.
Figure PCTCN2022081296-appb-000008
It can be proved that:
Figure PCTCN2022081296-appb-000009
With simple math, if the measurements of interest have upper bounds known a priori, i.e., (Figure PCTCN2022081296-appb-000010), Equations 2, 3, and 4 can be reduced to Equations 5, 6, and 7 as follows:
Figure PCTCN2022081296-appb-000011
Figure PCTCN2022081296-appb-000012
Figure PCTCN2022081296-appb-000013
where S is the number of bins required.
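Because the equations above survive only as image placeholders, the following sketch does not reproduce them. It instead uses the logarithmic mapping from the published DDSketch algorithm, with γ = (1+α)/(1−α) and bin index ⌈log_γ v⌉, as an assumed stand-in for Equation 2. The stopping threshold q·(|V|−1) and the representative value 2γ^i/(γ+1) are likewise taken from DDSketch, not from this text, and all function names are illustrative.

```python
import math
from collections import Counter

ALPHA = 0.01                         # target relative accuracy (assumed value)
GAMMA = (1 + ALPHA) / (1 - ALPHA)    # DDSketch bin growth factor


def bin_index(v, gamma=GAMMA):
    """Logarithmic bin index ceil(log_gamma(v)); positive for v > 1."""
    return math.ceil(math.log(v) / math.log(gamma))


def bin_value(i, gamma=GAMMA):
    """Representative value of bin i, covering (gamma**(i-1), gamma**i]."""
    return 2 * gamma ** i / (gamma + 1)


def insert(hist, v):
    """Increment the counter of the bin that v falls into."""
    hist[bin_index(v)] += 1


def estimate_quantile(hist, q):
    """Sum bin counters from the smallest bin until the running sum exceeds
    q * (|V| - 1), then return the representative value of that bin."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    running = 0
    for i in sorted(hist):
        running += hist[i]
        if running > q * (total - 1):
            return bin_value(i)


hist = Counter()
for latency in range(1, 1001):
    insert(hist, float(latency))
print(estimate_quantile(hist, 0.5))  # close to 500, within relative error ALPHA
```

With this mapping, any value in bin i differs from the representative value by at most a factor α, which is why a per-bin counter is enough to answer quantile queries with bounded relative error.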
FIG. 3 represents an example of a single flow histogram where bins of increasing sizes are used to store information related to the distribution of measurement values of Flow1.
Although this approach requires limited and simple operations that could be easily implemented on programmable network devices, extending it to multiple flows requires reserving N sketch data structures, where N is the number of flows in the system. As with the compactor-based approach, this is not suitable for memory-constrained network devices.
This disclosure thus exploits the histogram-based and approximate counting solutions to overcome the limitations of the above-mentioned conventional solutions.
FIG. 4 shows a network device 400 for extracting quantiles of a set of one or more flows, according to an embodiment of the disclosure. The network device 400 may comprise processing circuitry (not shown) configured to perform, conduct or initiate the various operations of the network device 400 described herein. The processing circuitry may comprise hardware and software. The hardware may comprise analog circuitry or digital circuitry, or both analog and digital circuitry. The digital circuitry may comprise components such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or multi-purpose processors. The network device 400 may further comprise memory circuitry, which stores one or more instruction(s) that can be executed by the processor or by the processing circuitry, in particular under control of the software. For instance, the memory circuitry may comprise a non-transitory storage medium storing executable software code which, when executed by the processor or the processing circuitry, causes the various operations of the network device 400 to be performed. In one embodiment, the processing circuitry comprises one or more processors and a non-transitory memory connected to the one or more processors. The non-transitory memory may carry executable program code which, when executed by the one or more processors, causes the network device 400 to perform, conduct or initiate the operations or methods described herein.
The network device 400 is configured to extract a flow identifier 401 and a measurement value 402 of the flow. Notably, a set of measurement values for a single flow is stored in a plurality of bins (e.g., as shown in FIG. 3) . The network device 400 is further configured to derive an index 403 of a bin corresponding to the extracted measurement value 402. Then, the network device 400 is configured to generate a per-flow bin identifier 404 based on the flow identifier 401 and the index 403 of the bin.
Further, the network device 400 is configured to update a per-flow bin entry 405 that corresponds to the per-flow bin identifier 404 and that is stored in a first data structure 410 by increasing the value of the per-flow bin entry 405 by 1. In particular, the per-flow bin entry 405 indicates a number of appearances of the per-flow bin identifier 404. The first data structure 410 is adapted to store a set of per-flow bin entries corresponding to a set of per-flow bin identifiers of each flow of the set of the flows.
The network device 400 is further configured to store the per-flow bin identifier 404 in a second data structure 420. In particular, the second data structure 420 is adapted to store the set of per-flow bin identifiers of each flow of the set of the flows for which the corresponding per-flow bin entries stored in the first data structure 410 are greater than 0.
The present disclosure introduces a network device 400 that is enabled to extract quantiles of multiple flows with limited memory. In particular, this disclosure allows the network device 400 to only store the relevant bins in the system, instead of maintaining a single histogram per flow.
According to an embodiment of this disclosure, the network device 400 may be configured to obtain the set of measurement values of the flow; and divide the set of measurement values into the plurality of bins using a per-flow histogram algorithm.
This disclosure describes a system that maintains data structures occupying fixed memory for approximate per-flow histograms reconstruction. In particular, any histogram-based technique can be used to keep track of individual flow’s quantile with a maximum number of bins S. Instead of maintaining a single histogram per flow, only the relevant bins are stored in the system.
FIG. 5 shows a detailed network device 400 according to an embodiment of the disclosure. The network device 400 shown in FIG. 5 is based on the network device 400 shown in FIG. 4.
In particular, when a new per-flow measurement arrives, the network device 400 extracts the flow identifier (e.g., the 5-tuple of the flow), and saves existing flows in a flow table. Typically, a flow is identified by its 5-tuple, which contains the complete source and destination addresses, the source and destination ports, and the transport-layer protocol identifier.
According to an embodiment of this disclosure, the network device 400 may be further configured to maintain a flow table; and store the flow identifier 401 of the flow in the flow table.
The network device 400 may further derive the index 403 of the bin according to a per-flow histogram-based sketch for quantiles extraction.
The network device 400 may use a first probabilistic counting data structure C, i.e., the first data structure 410 as shown in FIG. 4, for counting the number of values in each bin. In particular, a combination of the bin index 403 and the flow identifier 401 is used to obtain a per-flow bin identifier 404.
The network device 400 may further use a second probabilistic data structure B, i.e., the second data structure 420 as shown in FIG. 4, to keep track of the per-flow bins contained in the counting data structure, i.e., to store the set of per-flow bin identifiers for which the corresponding per-flow bin counter has a value greater than 0.
Optionally, the first data structure 410 may be a hash table or a sketch. In a particular embodiment, the first data structure 410 may comprise a count-min sketch.
Optionally, the second data structure 420 may be a filter, a dictionary, or a hash table that indicates whether the per-flow bin entry 405 stored in the first data structure 410 is greater than 0. In a particular embodiment, the second data structure 420 may comprise a Bloom filter.
According to an embodiment of this disclosure, the per-flow bin identifier 404 for the bin may be generated by performing a hash function on the flow identifier 401 and the index 403 of the bin.
Optionally, the first data structure 410 and the second data structure 420 are periodically reported to a telemetry collection point (either local or remote control plane) .
According to an embodiment of this disclosure, the network device may be configured to periodically report at least one of the first data structure 410, the second data structure 420, and the flow table, to a control plane entity 600.
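The per-packet update described for FIG. 5 can be sketched as follows, assuming a count-min sketch for the counting structure C and a Bloom filter for the structure B, as named in the particular embodiments. All class names, sizes, the BLAKE2b-based hashing, and the `bin_index_fn` parameter are illustrative assumptions; a real data plane would use hardware hash units and fixed-width registers rather than Python objects.

```python
import hashlib


def _hash(data: bytes, seed: int, modulus: int) -> int:
    """Seeded hash into [0, modulus), built on BLAKE2b (an assumed choice)."""
    digest = hashlib.blake2b(
        data, digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % modulus


class CountMinSketch:
    """Fixed-memory approximate counter (stand-in for data structure C)."""

    def __init__(self, depth=4, width=1024):
        self.depth, self.width = depth, width
        self.rows = [[0] * width for _ in range(depth)]

    def add(self, key: bytes):
        for r in range(self.depth):
            self.rows[r][_hash(key, r, self.width)] += 1

    def estimate(self, key: bytes) -> int:
        # Never underestimates; hash collisions can only inflate the count.
        return min(self.rows[r][_hash(key, r, self.width)]
                   for r in range(self.depth))


class BloomFilter:
    """Fixed-memory membership filter (stand-in for data structure B)."""

    def __init__(self, num_bits=8192, num_hashes=3):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def add(self, key: bytes):
        for s in range(self.num_hashes):
            self.bits[_hash(key, 100 + s, self.num_bits)] = 1

    def __contains__(self, key: bytes) -> bool:
        return all(self.bits[_hash(key, 100 + s, self.num_bits)]
                   for s in range(self.num_hashes))


def per_flow_bin_id(flow_id: tuple, index: int) -> bytes:
    """Hash the flow identifier together with the bin index."""
    return hashlib.blake2b(repr((flow_id, index)).encode(),
                           digest_size=8).digest()


def on_packet(flow_id, value, bin_index_fn, flow_table, C, B):
    """Per-packet update: record the flow, then update C and B."""
    flow_table.add(flow_id)
    key = per_flow_bin_id(flow_id, bin_index_fn(value))
    C.add(key)   # increase the per-flow bin entry by 1
    B.add(key)   # remember that this per-flow bin is non-empty
```

A caller would create one `CountMinSketch`, one `BloomFilter`, and a `set` as the flow table, call `on_packet` for every measurement, and periodically export all three. Memory stays fixed regardless of how many flows appear, at the cost of overestimated counters and occasional false positives in B.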
FIG. 6 shows a control plane entity 600 for recovering quantiles of a set of one or more flows, according to an embodiment of this disclosure.
The control plane entity 600 may comprise processing circuitry (not shown) configured to perform, conduct or initiate the various operations of the control plane entity 600 described herein. The processing circuitry may comprise hardware and software. The hardware may comprise analog circuitry or digital circuitry, or both analog and digital circuitry. The digital circuitry may comprise components such as ASICs, FPGAs, DSPs, or multi-purpose processors. The control plane entity 600 may further comprise memory circuitry, which stores one or more instruction (s) that can be executed by the processor or by the processing circuitry, in particular under control of the software. For instance, the memory circuitry may comprise a non-transitory storage medium storing executable software code which, when executed by the processor or the processing circuitry, causes the various operations of the control plane entity 600 to be performed. In one embodiment, the processing circuitry comprises one or more processors and a non-transitory memory connected to the one or more processors. The non-transitory memory may carry executable program code which, when executed by the one or more processors, causes the control plane entity 600 to perform, conduct or initiate the operations or methods described herein.
The control plane entity 600 is configured to, for each flow of the set of flows, obtain a per-flow bin identifier 404 of each bin of a plurality of bins. Notably, the plurality of bins is adapted to store a set of measurement values of the flow. The control plane entity 600 is further configured to determine, for each bin, whether a per-flow bin entry 405 that corresponds to the per-flow bin identifier 404 has a value greater than 0, by using a second data structure 420 and the per-flow bin identifier 404. In particular, the first data structure 410 is adapted to store a set of per-flow bin entries corresponding to a set of per-flow bin identifiers of each flow of the set of the flows. Further, the control plane entity 600 is configured to retrieve the value of the per-flow bin entry 405 from the first data structure 410, if the value is greater than 0.
This disclosure further introduces a control plane entity 600 capable of reconstructing quantiles of multiple flows with limited memory.
Optionally, the control plane entity 600 may be configured to compute the quantiles according to a per-flow histogram algorithm.
According to an embodiment of this disclosure, the control plane entity 600 may be further configured to obtain the per-flow bin identifier 404 of each bin using a flow table.
According to an embodiment of this disclosure, the control plane entity 600 may be further configured to periodically obtain at least one of the first data structure 410, the second data structure 420, and the flow table, from a network device 400. Possibly, the network device 400 is the network device as shown in FIG. 4.
FIG. 7 shows a detailed control plane entity 600 according to an embodiment of the disclosure. The control plane entity 600 shown in FIG. 7 is based on the control plane entity 600 shown in FIG. 6. Notably, the control plane entity 600 may be a telemetry collection point in the network.
As previously discussed, for each bin i (i ∈ [0 to S]), the network device 400 combines the bin index i and the flow identifier to obtain a per-flow bin identifier 404. When the quantiles of a single flow need to be reconstructed, for each bin i, the control plane entity 600 first obtains the per-flow bin identifier 404 and then checks whether the bin of the corresponding flow has a value greater than 0 by interrogating the probabilistic data structure B, i.e., the second data structure 420.
If the result of interrogating the probabilistic data structure B is negative, the counter value corresponding to the per-flow bin identifier 404 is considered equal to 0. If the result is positive, the counter value corresponding to the per-flow bin identifier 404 is retrieved from the probabilistic counting data structure C, i.e., the first data structure 410.
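The recovery logic described above can be sketched as follows, assuming a Bloom filter for structure B and a count-min sketch for structure C. The class names, the salted BLAKE2b hashing, the key encoding in `bin_key`, the 0-based bin range, and the chosen sizes are all illustrative assumptions rather than details taken from the disclosure.

```python
import hashlib


def positions(key: bytes, k: int, size: int) -> list[int]:
    """Return k indices in [0, size), one per salted BLAKE2b hash."""
    return [
        int.from_bytes(
            hashlib.blake2b(key, digest_size=8, salt=bytes([j])).digest(), "big"
        ) % size
        for j in range(k)
    ]


class BloomFilter:  # stands in for structure B (second data structure 420)
    def __init__(self, m: int, k: int) -> None:
        self.m, self.k, self.bits = m, k, bytearray(m)

    def add(self, key: bytes) -> None:
        for p in positions(key, self.k, self.m):
            self.bits[p] = 1

    def __contains__(self, key: bytes) -> bool:
        return all(self.bits[p] for p in positions(key, self.k, self.m))


class CountMinSketch:  # stands in for structure C (first data structure 410)
    def __init__(self, width: int, k: int) -> None:
        self.width, self.k = width, k
        self.rows = [[0] * width for _ in range(k)]

    def add(self, key: bytes) -> None:
        for row, p in zip(self.rows, positions(key, self.k, self.width)):
            row[p] += 1

    def estimate(self, key: bytes) -> int:
        hits = zip(self.rows, positions(key, self.k, self.width))
        return min(row[p] for row, p in hits)


def bin_key(flow_id: bytes, i: int) -> bytes:
    """Hypothetical encoding of the (flow identifier, bin index) pair."""
    return flow_id + b"|" + str(i).encode()


def recover_histogram(flow_id: bytes, num_bins: int,
                      bloom: BloomFilter, cms: CountMinSketch) -> list[int]:
    hist = []
    for i in range(num_bins):
        key = bin_key(flow_id, i)
        # Negative Bloom answer: treat the bin as 0. Positive: read the sketch.
        hist.append(cms.estimate(key) if key in bloom else 0)
    return hist


# Data-plane side: populate both structures from a known histogram.
bloom, cms = BloomFilter(m=1024, k=3), CountMinSketch(width=256, k=3)
true_hist = [0, 2, 0, 1]
for i, count in enumerate(true_hist):
    for _ in range(count):
        bloom.add(bin_key(b"flowA", i))
        cms.add(bin_key(b"flowA", i))

# Control-plane side: recover the approximate histogram of the flow.
recovered = recover_histogram(b"flowA", len(true_hist), bloom, cms)
```

Because a Bloom filter has no false negatives and a count-min sketch never underestimates, each recovered bin is at least as large as the true count. Hash collisions can only inflate a bin.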
Once all bins i ∈ [1 to S] are recovered, the approximated histogram is complete, and the quantiles can be computed following the quantile extraction method of the histogram algorithm, e.g., DDSketch.
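For illustration, a quantile can be read from a recovered histogram roughly as follows. This sketch assumes DDSketch-style logarithmic bins with relative accuracy `alpha`, where bin i covers values in (γ^(i-1), γ^i] with γ = (1 + α)/(1 − α). The function names and the value of `alpha` are hypothetical and not taken from the disclosure.

```python
import math


def bin_index(value: float, alpha: float = 0.01) -> int:
    """Logarithmic bin index of a positive value (DDSketch-style mapping)."""
    gamma = (1 + alpha) / (1 - alpha)
    return math.ceil(math.log(value) / math.log(gamma))


def quantile_from_histogram(hist: dict[int, int], q: float,
                            alpha: float = 0.01) -> float:
    """Estimate the q-quantile from a mapping of bin index to count."""
    gamma = (1 + alpha) / (1 - alpha)
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    rank = q * (total - 1)
    cumulative = 0
    for i in sorted(hist):
        cumulative += hist[i]
        if cumulative > rank:
            # Representative value of bin i, within alpha of any value in it.
            return 2 * gamma**i / (gamma + 1)
    raise AssertionError("unreachable for 0 <= q <= 1")


# Build a histogram from sample latencies (hypothetical values, in ms).
values = [10.0, 20.0, 30.0, 40.0, 50.0]
hist: dict[int, int] = {}
for v in values:
    hist[bin_index(v)] = hist.get(bin_index(v), 0) + 1

median = quantile_from_histogram(hist, 0.5)
maximum = quantile_from_histogram(hist, 1.0)
```

With exact bin counts, the estimate stays within a relative error of `alpha` of the true quantile. Counts inflated by sketch collisions may shift the selected bin.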
According to an embodiment of this disclosure, the combination of the flow identifier and the bin index may be obtained using k pairwise independent hash functions. Optionally, the probabilistic data structure B, i.e., the second data structure 420, may correspond to a Bloom filter dimensioned for a low false-positive rate. The false-positive rate of said Bloom filter may be represented as:
Figure PCTCN2022081296-appb-000014
where k is the number of hash functions, m is the size of the Bloom filter in bits, and n is the number of elements present in the filter, i.e., the number of (flow, bin index) pairs with at least one element.
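For reference, the commonly used approximation of the Bloom filter false-positive probability, written with the symbols k, m and n defined above, is given below. This is the standard textbook expression and is offered as an assumption. It may differ in form from the equation in the original figure.

```latex
p_{\mathrm{fp}} \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad
k_{\mathrm{opt}} = \frac{m}{n}\ln 2
\;\Rightarrow\;
p_{\mathrm{fp}} \approx \left(\tfrac{1}{2}\right)^{k_{\mathrm{opt}}} \approx 0.6185^{\,m/n}
```

Under this approximation, a filter with about 10 bits per stored (flow, bin index) pair and 7 hash functions has a false-positive probability a little under 1%.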
Optionally, the probabilistic data structures C, i.e., the first data structure 410, may correspond to a Count-min sketch (or a more precise conventional sketch) with error guarantee:
Figure PCTCN2022081296-appb-000015
where $c_i$ and $\hat{c}_i$ are the exact and approximate counts of element i, respectively, k is the number of hash functions, d is the size of the sketch, i.e., the total number of counters (d/k for each hash function), and n is the number of elements present in the sketch.
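For reference, the standard count-min sketch guarantee can be written with these symbols as below. Here $\hat{c}_i$ denotes the approximate count, each of the k hash functions owns d/k counters, and n is taken to be the total count inserted into the sketch. That reading of n is an assumption, and the expression may differ in form from the equation in the original figure.

```latex
c_i \;\le\; \hat{c}_i \;\le\; c_i + \frac{e\,k}{d}\,n
\quad \text{with probability at least } 1 - e^{-k}
```

The bound follows from Markov's inequality applied to each row: the expected overcount in a row of width d/k is at most nk/d, so it exceeds e·nk/d with probability at most 1/e, and all k independent rows do so simultaneously with probability at most e^{-k}.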
FIG. 8 shows a system according to an embodiment of this disclosure. The system comprises a network device 400 as shown in FIG. 4 or FIG. 5, and a control plane entity 600 as shown in FIG. 6 or FIG. 7. In particular, FIG. 8 presents a high-level view of the construction phase in the data plane (in the network device 400) and the recovery phases in the control plane or telemetry collection point (in the control plane entity 600).
Notably, by using results from the first data structure 410 and the second data structure 420, the control plane (i.e., the control plane entity 600) can retrieve an approximate version of the flow histograms (e.g., as shown in FIG. 7).
FIG. 9 presents a concrete example of a data structure update according to an embodiment of this disclosure. In this example, there are 3 hash functions (k=3), and the corresponding values in the count-min sketch (i.e., the first data structure 410) are each incremented by 1. Accordingly, the corresponding bits in the Bloom filter (i.e., the second data structure 420) are set to one.
It may be noted that, instead of using probabilistic data structures, the first data structure 410 or the second data structure 420 may also be realized using a more precise data structure such as a finite-size hash table, e.g., a cuckoo hash table.
Based on the above-discussed embodiments, this disclosure proposes a new multi-flow quantiles extraction mechanism to be implemented in constrained devices, i.e., a network device, or a control plane device, using limited memory for quantiles extraction and reconstruction. In this disclosure, a combination of a flow identifier and a bin index is used for storing per-flow histograms’ counters in a data structure, particularly a probabilistic data structure. The constrained devices are allowed to keep track of per-flow bin counters with a positive value in another probabilistic data structure. This disclosure also proposes to periodically transfer the probabilistic data structures together with the flow table to the control plane (or collection point). The control plane device is thus able to use the two probabilistic data structures and the flow table to retrieve per-flow histograms for q-quantile computation.
FIG. 10 shows a method 1000 according to an embodiment of the disclosure, particularly for extracting quantiles of a set of one or more flows. In a particular embodiment, the method 1000 is performed by the network device 400 shown in FIG. 4. The method 1000 comprises a step 1001 of extracting a flow identifier 401 and a measurement value 402 of the flow. A set of measurement values for a single flow is stored in a plurality of bins. The method 1000 further comprises a step 1002 of deriving an index 403 of a bin corresponding to the extracted measurement value 402. The method 1000 further comprises a step 1003 of generating a per-flow bin identifier 404 for the bin based on the flow identifier 401 and the index 403 of the bin. Then, the method 1000 comprises a step 1004 of updating a per-flow bin entry 405 that corresponds to the per-flow bin identifier 404 and that is stored in a first data structure 410 by increasing a value of the per-flow bin entry 405 by 1. In particular, the per-flow bin entry 405 indicates a number of appearances of the per-flow bin identifier 404. The first data structure 410 is adapted to store a set of per-flow bin entries corresponding to a set of per-flow bin identifiers of each flow of the set of the flows. The method 1000 further comprises a step 1005 of storing the per-flow bin identifier 404 in a second data structure 420. The second data structure 420 is adapted to store the set of per-flow bin identifiers of each flow of the set of the flows for which the corresponding per-flow bin entries stored in the first data structure 410 are greater than 0.
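Steps 1001 to 1005 can be sketched per packet as follows. To keep the example short, it uses exact structures (a dictionary and a set) in place of the sketches, which the description allows as an alternative to probabilistic data structures. The logarithmic bin mapping, the constant `ALPHA`, the tuple used as per-flow bin identifier, and all names are illustrative assumptions.

```python
import math
from collections import defaultdict

ALPHA = 0.01                       # hypothetical relative-accuracy target
GAMMA = (1 + ALPHA) / (1 - ALPHA)  # base of the logarithmic bin mapping

# Exact stand-ins for the first (410) and second (420) data structures.
counters: dict[tuple[str, int], int] = defaultdict(int)
nonzero: set[tuple[str, int]] = set()
flow_table: set[str] = set()


def bin_index(value: float) -> int:
    """Step 1002: derive the bin index of a positive measurement value."""
    return math.ceil(math.log(value) / math.log(GAMMA))


def process_packet(flow_id: str, value: float) -> None:
    # Step 1001: flow_id and value are assumed to be already extracted
    # from the packet; the flow is also recorded in the flow table.
    flow_table.add(flow_id)
    index = bin_index(value)        # step 1002
    bin_id = (flow_id, index)       # step 1003: per-flow bin identifier
    counters[bin_id] += 1           # step 1004: increase the entry by 1
    nonzero.add(bin_id)             # step 1005: remember the non-zero bin


# Hypothetical packets: (flow identifier, measured latency in ms).
for fid, latency in [("flowA", 10.0), ("flowA", 10.0),
                     ("flowA", 50.0), ("flowB", 10.0)]:
    process_packet(fid, latency)
```

Replacing `counters` with a count-min sketch and `nonzero` with a Bloom filter gives the memory-bounded variant, at the cost of possible overcounts and false positives.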
Optionally, the method 1000 may comprise a step of periodically reporting at least one of the first data structure 410, the second data structure 420, and the flow table, to a control plane entity 600. Possibly, the control plane entity 600 is the control plane entity shown in FIG. 6.
FIG. 11 shows a method 1100 according to an embodiment of the disclosure, particularly for recovering quantiles of a set of one or more flows. In a particular embodiment, the method 1100 is performed by a control plane entity 600 shown in FIG. 6. The method 1100 comprises a step 1101 of obtaining a per-flow bin identifier 404 of each bin of a plurality of bins. The plurality of bins is adapted to store a set of measurement values of the flow. The method 1100 comprises a step 1102 of determining, for each bin, whether a per-flow bin entry 405 that corresponds to the per-flow bin identifier 404 has a value greater than 0, by using a second data structure 420 and the per-flow bin identifier 404. The first data structure 410 is adapted to store a set of per-flow bin entries corresponding to a set of per-flow bin identifiers of each flow of the set of the flows. The method 1100 comprises a step 1103 of retrieving the value of the per-flow bin entry 405 from the first data structure 410, if the value is greater than 0.
Optionally, the method 1100 may comprise a step of periodically obtaining at least one of the first data structure 410, the second data structure 420, and the flow table, from a network device 400. Possibly, the network device 400 may be the network device shown in FIG. 4.
The present disclosure has been described in conjunction with various embodiments as examples as well as implementations. However, other variations can be understood and effected by those persons skilled in the art and practicing the claimed embodiments of the disclosure, from a study of the drawings, this disclosure, and the independent claims. In the claims as well as in the description the word “comprising” does not exclude other elements or steps and the indefinite article “a” or “an” does not exclude a plurality. A single element or other unit may fulfil the functions of several entities or items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used in an advantageous implementation.
Furthermore, any method according to embodiments of the disclosure may be implemented in a computer program, having code means, which when run by processing means causes the processing means to execute the steps of the method. The computer program is included in a computer-readable medium of a computer program product. The computer-readable medium may comprise essentially any memory, such as a ROM (Read-Only Memory), a PROM (Programmable Read-Only Memory), an EPROM (Erasable PROM), a Flash memory, an EEPROM (Electrically Erasable PROM), or a hard disk drive.
Moreover, it is realized by the skilled person that embodiments of the network device 400, or the control plane entity 600, comprise the necessary communication capabilities in the form of, e.g., functions, means, units, elements, etc., for performing the solution. Examples of other such means, units, elements and functions are: processors, memory, buffers, control logic, encoders, decoders, rate matchers, de-rate matchers, mapping units, multipliers, decision units, selecting units, switches, interleavers, de-interleavers, modulators, demodulators, inputs, outputs, antennas, amplifiers, receiver units, transmitter units, DSPs, trellis-coded modulation (TCM) encoder, TCM decoder, power supply units, power feeders, communication interfaces, communication protocols, etc., which are suitably arranged together for performing the solution.
Especially, the processor(s) of the network device 400, or the control plane entity 600, may comprise, e.g., one or more instances of a Central Processing Unit (CPU), a processing unit, a processing circuit, a processor, an Application Specific Integrated Circuit (ASIC), a microprocessor, or other processing logic that may interpret and execute instructions. The expression “processor” may thus represent a processing circuitry comprising a plurality of processing circuits, such as, e.g., any, some or all of the ones mentioned above. The processing circuitry may further perform data processing functions for inputting, outputting, and processing of data comprising data buffering and device control functions, such as call processing control, user interface control, or the like.

Claims (16)

  1. A network device (400) for extracting quantiles of a set of one or more flows, the network device (400) being configured to, for each packet belonging to a flow of the set of flows:
    extract a flow identifier (401) and a measurement value (402) of the flow, wherein a set of measurement values for a single flow is stored in a plurality of bins;
    derive an index (403) of a bin corresponding to the extracted measurement value (402);
    generate a per-flow bin identifier (404) based on the flow identifier (401) and the index (403) of the bin;
    update a per-flow bin entry (405) that corresponds to the per-flow bin identifier (404) and that is stored in a first data structure (410) by increasing a value of the per-flow bin entry (405) by 1, wherein the per-flow bin entry (405) indicates a number of appearances of the per-flow bin identifier (404), wherein the first data structure (410) is adapted to store a set of per-flow bin entries corresponding to a set of per-flow bin identifiers of each flow of the set of the flows; and
    store the per-flow bin identifier (404) in a second data structure (420), wherein the second data structure (420) is adapted to store the set of per-flow bin identifiers of each flow of the set of the flows for which the corresponding per-flow bin entries stored in the first data structure (410) are greater than 0.
  2. The network device (400) according to claim 1, configured to:
    obtain the set of measurement values of the flow; and
    divide the set of measurement values into the plurality of bins using a per-flow histogram algorithm.
  3. The network device (400) according to claim 1 or 2, configured to:
    maintain a flow table; and
    store the flow identifier (401) of the flow in the flow table.
  4. The network device (400) according to one of the claims 1 to 3, configured to:
    periodically report at least one of the first data structure (410), the second data structure (420), and the flow table, to a control plane entity (600).
  5. The network device (400) according to one of the claims 1 to 4, wherein the first data structure (410) is a hash table or a sketch; and the second data structure (420) is a filter, a dictionary, or a hash table that indicates whether the per-flow bin entry (405) stored in the first data structure (410) is greater than 0.
  6. The network device (400) according to one of the claims 1 to 5, wherein at least one of the first data structure (410) and the second data structure (420) are probabilistic data structures.
  7. The network device (400) according to claim 6, wherein the first data structure (410) comprises a count-min sketch, and/or the second data structure (420) comprises a bloom filter.
  8. The network device (400) according to one of the claims 1 to 7, configured to:
    generate the per-flow bin identifier (404) for the bin by performing a hash function on the flow identifier (401) and the index (403) of the bin.
  9. A control plane entity (600) for recovering quantiles of a set of one or more flows, the control plane entity (600) being configured to, for each flow of the set of flows:
    obtain a per-flow bin identifier (404) of each bin of a plurality of bins, wherein the plurality of bins is adapted to store a set of measurement values of the flow;
    determine, for each bin, whether a per-flow bin entry (405) that corresponds to the per-flow bin identifier (404) has a value greater than 0, by using a second data structure (420) and the per-flow bin identifier (404), wherein the first data structure (410) is adapted to store a set of per-flow bin entries corresponding to a set of per-flow bin identifiers of each flow of the set of the flows; and
    retrieve the value of the per-flow bin entry (405) from the first data structure (410), if the value is greater than 0.
  10. The control plane entity (600) according to claim 9, configured to:
    compute the quantiles according to a per-flow histogram algorithm.
  11. The control plane entity (600) according to claim 9 or 10, configured to:
    obtain the per-flow bin identifier (404) of each bin using a flow table.
  12. The control plane entity (600) according to one of the claims 9 to 11, configured to:
    periodically obtain at least one of the first data structure (410), the second data structure (420), and the flow table, from a network device (400).
  13. Method (1000) for a network device (400) for extracting quantiles of a set of one or more flows, the method comprising, for each packet belonging to a flow of the set of flows:
    extracting (1001) a flow identifier (401) and a measurement value (402) of the flow, wherein a set of measurement values for a single flow is stored in a plurality of bins;
    deriving (1002) an index (403) of a bin corresponding to the extracted measurement value (402);
    generating (1003) a per-flow bin identifier (404) for the bin based on the flow identifier (401) and the index (403) of the bin;
    updating (1004) a per-flow bin entry (405) that corresponds to the per-flow bin identifier (404) and that is stored in a first data structure (410) by increasing a value of the per-flow bin entry (405) by 1, wherein the per-flow bin entry (405) indicates a number of appearances of the per-flow bin identifier (404), wherein the first data structure (410) is adapted to store a set of per-flow bin entries corresponding to a set of per-flow bin identifiers of each flow of the set of the flows; and
    storing (1005) the per-flow bin identifier (404) in a second data structure (420), wherein the second data structure (420) is adapted to store the set of per-flow bin identifiers of each flow of the set of the flows for which the corresponding per-flow bin entries stored in the first data structure (410) are greater than 0.
  14. Method (1100) for a control plane entity (600) for recovering quantiles of a set of one or more flows, the method comprising, for each flow of the set of flows:
    obtaining (1101) a per-flow bin identifier (404) of each bin of a plurality of bins, wherein the plurality of bins is adapted to store a set of measurement values of the flow;
    determining (1102), for each bin, whether a per-flow bin entry (405) that corresponds to the per-flow bin identifier (404) has a value greater than 0, by using a second data structure (420) and the per-flow bin identifier (404), wherein the first data structure (410) is adapted to store a set of per-flow bin entries corresponding to a set of per-flow bin identifiers of each flow of the set of the flows; and
    retrieving (1103) the value of the per-flow bin entry (405) from the first data structure (410), if the value is greater than 0.
  15. A computer program product comprising a program code for carrying out, when implemented on a processor, the method (1000, 1100) according to claim 13 or 14.
  16. A computer-readable medium comprising instructions which, when executed by a computer, cause the computer to carry out the method (1000, 1100) of claim 13 or 14.
PCT/CN2022/081296 2022-03-17 2022-03-17 Device and method for multiflow quantiles extraction and reconstruction WO2023173343A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/081296 WO2023173343A1 (en) 2022-03-17 2022-03-17 Device and method for multiflow quantiles extraction and reconstruction

Publications (1)

Publication Number Publication Date
WO2023173343A1 true WO2023173343A1 (en) 2023-09-21

Family

ID=88022090

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/081296 WO2023173343A1 (en) 2022-03-17 2022-03-17 Device and method for multiflow quantiles extraction and reconstruction

Country Status (1)

Country Link
WO (1) WO2023173343A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060224609A1 (en) * 2004-12-02 2006-10-05 Graham Cormode Method and apparatus for finding biased quantiles in data streams
US20120278477A1 (en) * 2009-04-08 2012-11-01 The University Of North Carolina At Chapel Hill Methods, systems, and computer program products for network server performance anomaly detection
CN103647665A (en) * 2013-12-13 2014-03-19 北京启明星辰信息技术股份有限公司 Network flow curve analysis method and apparatus
CN109088903A (en) * 2018-11-07 2018-12-25 湖南大学 A kind of exception flow of network detection method based on streaming

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YANG, BEI ET AL.: "Research on an Algorithm for Approximate Quantile Computation over Data Streams", JOURNAL OF COMPUTER RESEARCH AND DEVELOPMENT, vol. 45, no. 2, 31 December 2008 (2008-12-31), pages 287 - 292, XP009548699, ISSN: 1000-1239 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22931380

Country of ref document: EP

Kind code of ref document: A1