CN113839835A

CN113839835A - Top-k flow accurate monitoring framework based on small flow filtering

Info

Publication number: CN113839835A
Application number: CN202111133411.4A
Authority: CN
Inventors: 罗可; 周国徽; 熊兵
Original assignee: Changsha University of Science and Technology
Current assignee: Changsha University of Science and Technology
Priority date: 2021-09-27
Filing date: 2021-09-27
Publication date: 2021-12-24
Anticipated expiration: 2041-09-27
Also published as: CN113839835B

Abstract

The invention discloses a framework for accurate Top-k flow monitoring based on small flow filtering, comprising a small flow filter and a large flow monitoring table. The small flow filter filters small flows out of network traffic, which reduces the storage overhead of recording small flow information exactly, improves the accuracy of flow size estimation, and solves the filtering failure problem of traditional filters. The large flow monitoring table combines a single-hash multi-mapping algorithm with a probabilistic replacement strategy. The single-hash multi-mapping algorithm first computes a fingerprint value of the flow from the flow identifier, then repeatedly selects a subset of bits from the fingerprint value and rearranges them to generate multiple hash values; this reduces the overhead of hash computation and gives Top-k flows enough candidate positions for storage. The probabilistic replacement strategy finds the minimum flow in the mapped buckets and generates a replacement probability from the packet number of that flow to decide whether to evict it, so as to provide a storage position for a relatively larger flow. The invention filters small flows at low storage cost and then monitors and counts large flows accurately; it has high space utilization and achieves a high Top-k flow identification rate with little space overhead.

Description

Top-k flow accurate monitoring framework based on small flow filtering

Technical Field

The invention relates to the field of network measurement, in particular to a high-precision Top-k flow identification framework and a technical scheme for filtering small flows in data flows.

Background

Network measurement tasks include Top-k flow identification, flow size estimation, flow count statistics and the like. They provide key information for analyzing the characteristics of network traffic and are the basis of network management and monitoring. Top-k flow identification is generally defined as finding the k largest flows in the network traffic, where the size of a flow is defined as the number of packets of that flow. Generally, in order to identify Top-k flows, a counter is assigned to each arriving flow to track its size, but with millions of network flows it is difficult to maintain one counter per flow. Meanwhile, in order to identify Top-k flows correctly, the error of the flow size estimate must be kept within a very small range. Therefore, finding a Top-k flow identification method with high precision and low overhead, while preserving processing speed, is an important challenge for current Top-k flow identification.

Currently, mainstream Top-k flow identification methods fall into three categories. The first category is sketch-based and comes in two structures: a sketch combined with a min-heap, and a sketch combined with a hash table. The first structure counts the sizes of all flows with a two-dimensional counter array and identifies the Top-k flows by tracking them in a min-heap. The second structure stores small flows in the sketch and monitors large flows in the hash table to reduce storage overhead, and uses a replacement algorithm to evict small flows from the hash table so that large flows are stored accurately. The second category is counter-based: it estimates flow size accurately by assigning counters to large flows in a buffer. The third category is based on filtering: a filter removes the small flows from the network traffic and the large flows are then extracted, which avoids the influence of small flows on the accurate counting of large flows and improves the accuracy of flow size estimation. However, these methods face the following problems:

1. High precision and low memory overhead cannot be achieved at the same time

With the growing number of network devices on the internet, the number of network flows has already reached the million level, and flow sizes follow a heavy-tailed distribution: a small number of large flows account for most of the packets in the network traffic, while a large number of small flows account for only a small share of the packets. Under these conditions, a sketch-based method must use enough counters to reduce hash collisions, and each counter must have enough bits to avoid overflow, so its memory overhead cannot be reduced. A counter-based method also needs to allocate enough counters to track a large number of flows, and it may misjudge a small flow as a large flow, which degrades Top-k flow identification precision.

2. Over-filtering and filtering failure in conventional small flow filters

Conventional filters use a two-dimensional array of counters to record the number of packets that have arrived for each flow; when all the counters a flow maps to have reached a threshold T, the flow is allowed to pass through the filter. However, after a period of time most counters in the filter will have reached the threshold T, so that all flows pass directly through the filter and the filter can no longer filter out small flows (the filtering failure problem). Existing filters therefore reset their counters at fixed intervals, which prevents the counters from staying at the threshold permanently. However, after the counters are reset, a large flow must raise its counter values in the filter again before it can pass, so the filter consumes packets of the large flow unnecessarily and the packet number of the large flow is underestimated (the over-filtering problem).

Reference document: CN111262756A discloses a method and a structure for accurately measuring elephant flows in high-speed networks. It first uses a sketch-based filter to filter small flows out of the data stream, and then uses an extractor based on cuckoo hashing to extract the large flows, which reduces the storage cost spent on small flows and tracks large flows accurately to improve the large flow identification rate. The scheme of the reference document does not solve the filtering failure problem, so its small flow filter can no longer filter small flows after working for a period of time, and the extraction method based on cuckoo hashing has low large flow identification accuracy.

Disclosure of Invention

The technical problem to be solved by the invention is to provide, based on sketch technology, a small flow filter data structure that uses paired counters. The counters in the filter are updated periodically, with a separate update strategy for each of the two counters, so that the arrivals of each flow in each period are recorded accurately and large flows can be distinguished from small flows in the network traffic. Meanwhile, a large flow monitoring table is designed around a single-hash multi-mapping algorithm, which ensures that Top-k flows have enough candidate storage positions; a probabilistic replacement strategy evicts the minimum flow so that large flows are stored accurately and the identification precision of Top-k flows is improved.

In order to solve the technical problems, the invention adopts the following technical scheme:

the invention provides a Top-k flow accurate monitoring framework based on small flow filtration, which comprises the following components:

the small flow filter is used for distinguishing large flows from small flows in network traffic, so that the large flows can be extracted and their packet numbers tracked and counted accurately; the small flow filter uses two small counters, matched in pairs, to record different packet number information of a flow, which keeps the memory overhead low, and the counters in the small flow filter are updated periodically; the two small counters respectively record the average number of packets of the flow arriving per period and the number of packets of the flow arriving in the current period;

the large flow monitoring table is used for accurately monitoring the large flow in the network flow and accurately counting the packet quantity of the large flow; the large flow monitoring table is a hash table composed of hash buckets, each hash bucket can store a plurality of flows, a single-hash multi-mapping algorithm is adopted to map each flow into a plurality of candidate hash buckets so as to ensure that Top-k flows have enough positions to be stored, and a probability replacement strategy is adopted so as to accurately monitor the large flows; the single-hash multi-mapping algorithm is used for generating a plurality of hash values through one-time hash calculation so as to map to a plurality of hash buckets; the probability replacement strategy is to replace the minimum stream in all candidate positions by adopting a certain probability when all the candidate positions have no vacancy;

the small flow filter consists of d arrays, each array consists of w buckets, and each bucket comprises two counters, namely a new counter and an old counter; the new counter records the number of packets arriving in the current period; the old counter records the average number of packets arriving in the stream over the past period;

the large flow monitoring table is composed of r hash buckets, each hash bucket comprises c slots, and each slot stores a fingerprint value FP of one flow and a packet number counter, namely, each slot stores one flow.

The invention also provides a method based on the above framework, which comprises:

When a data packet arrives, the small flow filter first maps the flow, according to its flow identifier, to one bucket in each of the d arrays through d pairwise independent hash functions. It obtains the minimum new counter value and the minimum old counter value among the d buckets, takes the minimum new counter value as the number of packets of the flow that have arrived in the current period, and takes the minimum old counter value as the average number of packets of the flow per past period. When the minimum new counter value of a flow reaches a threshold T, the flow is considered a newly arrived large flow and is allowed to pass through the filter and enter the large flow monitoring table. When the minimum old counter value of a flow reaches the threshold T, the flow is considered a continuously arriving large flow and is likewise allowed to pass through the filter and enter the large flow monitoring table.

The two counters only need to indicate whether the threshold T has been reached, and T is usually very small, so each counter can be just a few bits wide, which keeps the filter small and its overhead low.
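The memory cost implied by this design can be estimated with a few lines of arithmetic. The sketch below is illustrative only: the function names and the example values of d, w and T are assumptions, and only the two-counters-per-bucket layout and the 4-bit counter width (given in the detailed description) come from the text.

```python
def counter_bits(threshold: int) -> int:
    """Smallest number of bits able to hold every value from 0 to threshold."""
    return max(1, threshold.bit_length())


def filter_bytes(d: int, w: int, bits_per_counter: int) -> int:
    """Size of d arrays of w buckets, each bucket holding a new and an old counter."""
    total_bits = d * w * 2 * bits_per_counter
    return (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical configuration: 3 arrays of 65536 buckets with 4-bit counters.
print(counter_bits(15))           # a threshold of up to 15 fits in 4 bits
print(filter_bytes(3, 65536, 4))  # 196608 bytes, i.e. 192 KiB
```

With 4-bit counters each bucket occupies exactly one byte, so the filter size in bytes equals the number of buckets.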

When a data packet of a flow arrives at the large flow monitoring table, the table first computes a fingerprint value FP from the flow identifier through a hash function, and then selects a fixed number of bits from the fingerprint value FP and arranges them, repeating this several times with randomly chosen bit positions, to generate a plurality of sub-hash values that map the flow to a plurality of hash buckets. The table then checks all mapped buckets: if the flow is already stored, the counter corresponding to the flow is increased by 1; if the flow is not stored but there is a vacancy, the flow is inserted into a vacant slot; if the flow is not stored and there is no vacancy, the minimum flow in the mapped buckets is found, and its packet number is used to generate a replacement probability that decides whether the minimum flow is replaced by the newly arrived flow.

Further, the Top-k flow identification architecture comprises the following operations:

1. Insertion and reporting in the small flow filter

The small flow filter maps each arriving packet to one bucket in each counter array, reports whether the packet can pass through the filter according to the minimum new counter value and the minimum old counter value, and decides whether to update the new counters in the mapped buckets.

2. Periodic updating of counters in the small flow filter

When the small flow filter has measured a certain number of data packets, all counters in the filter are updated. The new counter is reset to 0, and the old counter adopts a halving update strategy, i.e., it is set to the average of the new counter's value for the period just ended and its own value from the previous period.

3. Insertion into the large flow monitoring table

When a data packet is passed into the large flow monitoring table, the fingerprint value of the flow is first computed from the flow identifier of the flow to which the packet belongs. The monitoring table is then queried with the fingerprint value, and different update steps are carried out according to the query result and whether the hash buckets have a vacancy.

4. Replacement in the large flow monitoring table

When a packet of a flow arrives, all hash buckets mapped by the flow are full, and the flow is not recorded in those buckets, a replacement probability $1/(C_{\min}+1)$ is first generated from the packet number $C_{\min}$ of the minimum flow in the buckets and then compared with a real number randomly generated between 0 and 1. If the replacement probability is greater than the real number, the minimum flow is replaced; otherwise, the data packet of the flow is discarded.

5. Top-k flow reporting for large flow monitoring tables

The large flow monitoring table first sorts all flows in descending order of packet number, extracts the first k flows, adds the threshold T of the small flow filter to the packet number of each to obtain its final packet number, and then reports these k flows to the server as the Top-k flows.

The invention has the beneficial effects that:

1. the invention uses two small counters to build the small flow filter, and updates the counter in the small flow filter in a periodic way. One of the counters records the number of packets that the stream arrives during the current period to identify newly arriving big streams, and the other counter records the average number of packets that the stream arrives during the past period to identify continuously arriving big streams. The two small counters are combined to identify the big flow in the network flow, the storage space waste caused by storing the small flow is reduced, the problem that the small flow filtration of the traditional filter is invalid is solved, and the identification precision of the Top-k flow is improved.

2. The invention designs a low-overhead and high-precision Top-k flow identification method by combining a single-hash multi-mapping algorithm. The method comprises the steps of firstly calculating a fingerprint value according to a flow identifier, then selecting bits from the fingerprint value to be recombined into a hash value to be mapped into a hash table, and reducing the overhead of hash calculation. Meanwhile, each flow can have a plurality of candidate hash buckets, so that the Top-k flow can be ensured to have enough storage positions for selection, the problem that the size of the Top-k flow cannot be monitored due to the fact that the Top-k flow does not have position storage is avoided, and the Top-k flow identification precision is improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a Top-k flow accurate monitoring architecture based on small flow filtering in the method of the present invention.

FIG. 2 is a data structure diagram of a small flow filter in the method of the present invention.

Fig. 3 is a data structure diagram of a large flow monitoring table in the method of the present invention.

Fig. 4 is a flow chart of the insertion and release of packets by the small flow filter in the method of the present invention.

FIG. 5 is a flow chart of periodic updating of a counter in a small flow filter in the method of the present invention.

Fig. 6 is a flow chart of the insertion of a large flow monitoring table packet in the method of the present invention.

Fig. 7 is a flow chart of the flow replacement in the large flow monitoring table in the method of the present invention.

Fig. 8 is a flow chart of the large flow monitoring table reporting Top-k flows upwards in the method of the present invention.

Detailed Description

In order to better illustrate the content of the invention, the invention is further verified by the following specific examples. It should be noted that the examples are given for the purpose of describing the invention more directly and are only a part of the present invention, which should not be construed as limiting the invention in any way.

As shown in fig. 1, an embodiment of the present invention provides a Top-k flow accurate monitoring architecture based on small flow filtering, including:

as shown in fig. 2, the small flow filter is composed of d arrays, each array is composed of w buckets, and each bucket includes a pair of counters, i.e. a new counter and an old counter; both small counters are 4 bits in size.

as shown in fig. 3, the large flow monitoring table is composed of r hash buckets, each of which contains c slots, each of which stores a fingerprint value FP of one flow and a packet counter, i.e., each slot stores one flow. When a data packet $P_{fid}$ arrives, the flow fingerprint FP is computed by a hash function $H(\cdot)$, and the sub-hash function $subH_i(\cdot)$ maps the flow to its i-th candidate bucket. The sub-hash function $subH_i(\cdot)$ works in two steps: (1) select n bits at fixed positions of the fingerprint value FP, always using the bits at the same positions thereafter; (2) arrange the selected n bit values to generate a new hash value.
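The two-step sub-hash described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the 32-bit fingerprint width, the use of BLAKE2b as the hash function H, the seeded random choice of bit positions, the final modulo by r, and all function names are hypothetical.

```python
import hashlib
import random

FP_BITS = 32  # assumed fingerprint width; the source does not give one


def fingerprint(flow_id: bytes) -> int:
    """H(.): a single hash computation producing the flow fingerprint FP."""
    digest = hashlib.blake2b(flow_id, digest_size=FP_BITS // 8).digest()
    return int.from_bytes(digest, "big")


def make_bit_positions(num_maps: int, n: int, seed: int = 0) -> list:
    """Choose n bit positions for each sub-hash once; they stay fixed afterwards."""
    rng = random.Random(seed)
    return [rng.sample(range(FP_BITS), n) for _ in range(num_maps)]


def sub_hash(fp: int, positions: list) -> int:
    """subH_i(.): pick the bits of FP at the given positions and concatenate them."""
    value = 0
    for pos in positions:
        value = (value << 1) | ((fp >> pos) & 1)
    return value


def candidate_buckets(flow_id: bytes, bit_positions: list, r: int) -> list:
    """Map one flow to several candidate buckets using only one real hash call."""
    fp = fingerprint(flow_id)
    return [sub_hash(fp, positions) % r for positions in bit_positions]


positions = make_bit_positions(num_maps=3, n=10, seed=1)
print(candidate_buckets(b"10.0.0.1:80->10.0.0.2:443", positions, r=1024))
```

Because the bit positions are fixed after initialization, the same flow always maps to the same candidate buckets, while only one full hash is computed per packet.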

The embodiment also provides a method based on the above architecture, which includes the following steps:

When a data packet arrives, the small flow filter first maps the flow, according to its flow identifier, to one bucket in each of the d arrays through d pairwise independent hash functions. It obtains the minimum new counter value and the minimum old counter value among the d buckets, takes the minimum new counter value as the number of packets of the flow that have arrived in the current period, and takes the minimum old counter value as the average number of packets of the flow per past period. When the minimum new counter value of a flow reaches a threshold T, the flow is considered a newly arrived large flow and is allowed to pass through the filter. When the minimum old counter value of a flow reaches the threshold T, the flow is considered a continuously arriving large flow and is likewise allowed to pass through the filter.

When a data packet of a flow arrives at the large flow monitoring table, the table first computes a fingerprint value FP from the flow identifier through a hash function, and then selects a fixed number of bits from the fingerprint value FP and arranges them, repeating this several times with randomly chosen bit positions, to generate a plurality of sub-hash values that map the flow to a plurality of hash buckets. The table then checks all mapped buckets: if the flow is already stored, the counter corresponding to the flow is increased by 1; if the flow is not stored but there is a vacancy, the flow is inserted into a vacant slot; if the flow is not stored and there is no vacancy, the minimum flow in the mapped buckets is found, and its packet number is used to generate a replacement probability that decides whether the minimum flow is replaced by the newly arrived flow.

1. Insertion and release of packets by the small flow filter

As shown in fig. 4, the small flow filter maps each arriving packet to one bucket in each counter array, reports whether the packet can pass through the filter based on the minimum new counter value and the minimum old counter value, and decides whether to update the new counters in the mapped buckets.

First, the header of the data packet is parsed and the flow identifier is extracted. The d hash functions then map the flow to one bucket in each of the d arrays of the small flow filter, the minimum new counter value and the minimum old counter value are obtained, and these minimum values are compared with the threshold T. When the minimum new value is greater than the threshold, the packet is allowed to pass through the filter into the large flow monitoring table. Otherwise, the new values in the d mapped buckets are updated, and the minimum old value is checked against the threshold. If the minimum old value is greater than the threshold, the packet is allowed to pass through the filter into the large flow monitoring table.
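The insertion and release procedure can be sketched as below. Several details are assumptions rather than statements of the source: the class and method names, salted BLAKE2b as the d hash functions, incrementing every mapped new counter by 1 with saturation at the 4-bit maximum, and the use of `>=` for the threshold test (the text says both "reaches" and "greater than").

```python
import hashlib


class SmallFlowFilter:
    """d arrays of w buckets; each bucket holds a new and an old counter."""

    def __init__(self, d=3, w=1024, threshold=8, counter_max=15):
        self.d, self.w = d, w
        self.threshold = threshold
        self.counter_max = counter_max  # 4-bit counters saturate at 15
        self.new = [[0] * w for _ in range(d)]
        self.old = [[0] * w for _ in range(d)]

    def _index(self, row: int, flow_id: bytes) -> int:
        # One salted hash per array stands in for d independent hash functions.
        digest = hashlib.blake2b(flow_id, digest_size=8, salt=bytes([row])).digest()
        return int.from_bytes(digest, "big") % self.w

    def offer(self, flow_id: bytes) -> bool:
        """Return True if the packet passes on to the large flow monitoring table."""
        idx = [self._index(i, flow_id) for i in range(self.d)]
        min_new = min(self.new[i][j] for i, j in enumerate(idx))
        min_old = min(self.old[i][j] for i, j in enumerate(idx))
        if min_new >= self.threshold:
            return True  # newly arrived large flow
        # Otherwise the packet is absorbed: update the new counters.
        for i, j in enumerate(idx):
            self.new[i][j] = min(self.new[i][j] + 1, self.counter_max)
        return min_old >= self.threshold  # continuously arriving large flow


f = SmallFlowFilter(d=3, w=64, threshold=3)
print([f.offer(b"flow-a") for _ in range(5)])  # [False, False, False, True, True]
```

In this sketch the first T packets of a new flow are absorbed by the filter, which is consistent with the later step that adds T back when Top-k flows are reported.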

2. Periodic updating of counters in the small flow filter

As shown in fig. 5, when the small flow filter measures a certain number of packets, all the counters in itself will be updated. The filter will update the counters in the buckets starting with the first bucket of each array until the last bucket is updated.

The new counter is reset to 0, and the old counter adopts a halving update strategy, i.e., it is set to the average of the new counter's value for the period just ended and its own value from the previous period.
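A minimal sketch of this update is given below. It assumes the counters are held in two d-by-w lists, that the average is rounded down by integer division (the source does not specify rounding), and that the old counter is computed before the new counter is cleared; the function name is hypothetical.

```python
def periodic_update(new, old):
    """Apply the end-of-period update to every bucket of every array in place."""
    for i in range(len(new)):
        for j in range(len(new[i])):
            # Halving update: average of this period's count and the old value.
            old[i][j] = (new[i][j] + old[i][j]) // 2
            # The new counter starts the next period from zero.
            new[i][j] = 0


new = [[4, 0], [7, 1]]
old = [[2, 6], [0, 1]]
periodic_update(new, old)
print(old)  # [[3, 3], [3, 1]]
print(new)  # [[0, 0], [0, 0]]
```

Because the old value is halved in each period rather than cleared, a flow that keeps arriving retains a high old counter across resets, while a flow that stops arriving decays toward zero.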

3. Insertion into the large flow monitoring table

As shown in fig. 6, when a packet is passed into the large flow monitoring table, the fingerprint value of the flow is first computed from the flow identifier of the flow to which the packet belongs. The monitoring table is then queried with the fingerprint value, and different update steps are performed according to the query result and whether the hash buckets have a vacancy.

First, a fingerprint value FP is generated from the flow identifier fid; the positions of the k candidate hash buckets are then obtained through the single-hash multi-mapping algorithm, and the flows in those buckets are queried in sequence. If the flow is found, its counter is increased by 1 and the operation ends. If the flow is not found and an empty slot exists, the flow is inserted into the first empty slot encountered and the operation ends. Otherwise, a new entry to be inserted is created for the flow and the flow replacement operation is performed.
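The lookup-then-insert logic can be sketched as follows. The class name, the string return values, the representation of a slot as `[fingerprint, count]`, and the initial count of 1 are assumptions for illustration; the candidate bucket indices are taken as an input and are assumed to come from the single-hash multi-mapping step.

```python
class LargeFlowTable:
    """r hash buckets of c slots; a slot is None or [fingerprint, count]."""

    def __init__(self, r: int, c: int):
        self.buckets = [[None] * c for _ in range(r)]

    def insert(self, fp: int, bucket_ids: list) -> str:
        first_empty = None
        for b in bucket_ids:
            for s, slot in enumerate(self.buckets[b]):
                if slot is None:
                    if first_empty is None:
                        first_empty = (b, s)  # remember the first vacancy
                elif slot[0] == fp:
                    slot[1] += 1  # flow already stored: count the packet
                    return "updated"
        if first_empty is not None:
            b, s = first_empty
            self.buckets[b][s] = [fp, 1]  # assumed initial count of 1
            return "inserted"
        return "needs_replacement"  # all candidate slots hold other flows


table = LargeFlowTable(r=4, c=1)
print(table.insert(0xAB, [0, 2]))  # inserted
print(table.insert(0xAB, [0, 2]))  # updated
print(table.insert(0xCD, [0, 2]))  # inserted
print(table.insert(0xEF, [0, 2]))  # needs_replacement
```

Scanning every candidate bucket before using a vacancy avoids storing the same flow twice when an empty slot precedes the slot that already holds it.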

4. Replacement in the large flow monitoring table

As shown in fig. 7, when a packet of a flow arrives, all hash buckets mapped by the flow are full, and the flow is not recorded in those buckets, a replacement probability $1/(C_{\min}+1)$ is first generated from the packet number $C_{\min}$ of the minimum flow in the buckets and then compared with a real number randomly generated between 0 and 1. If the replacement probability is greater than the real number, the minimum flow is replaced; otherwise, the data packet of the flow is discarded.
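A minimal sketch of the probabilistic replacement follows. It assumes the candidate buckets are all full, that a slot is a `[fingerprint, count]` list, and that the admitted flow starts with a count of 1; the source does not state the starting count, and the function name and the injectable `rng` parameter are hypothetical.

```python
import random


def try_replace(buckets, bucket_ids, fp, rng=random):
    """Evict the minimum flow with probability 1 / (C_min + 1); return True if evicted."""
    # Locate the slot with the smallest packet count among all candidate buckets.
    candidates = [(b, s) for b in bucket_ids for s in range(len(buckets[b]))]
    b_min, s_min = min(candidates, key=lambda pos: buckets[pos[0]][pos[1]][1])
    c_min = buckets[b_min][s_min][1]
    # Replace only if the probability exceeds a uniform draw from [0, 1).
    if 1.0 / (c_min + 1) > rng.random():
        buckets[b_min][s_min] = [fp, 1]  # assumed starting count
        return True
    return False  # the arriving packet is discarded


buckets = [[[0x01, 4]], [[0x02, 9]]]
replaced = try_replace(buckets, [0, 1], 0x07)
print(replaced, buckets)  # replaced with probability 1/5 in this configuration
```

The larger the minimum flow already is, the less likely a newcomer displaces it, so established large flows are protected while small resident flows are evicted readily.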

5. Top-k flow reporting for large flow monitoring tables

As shown in fig. 8, the large flow monitoring table first sorts all flows in descending order of packet number, extracts the first k flows, adds the threshold T of the small flow filter to the packet number of each to obtain its final packet number, and then reports these k flows to the server as the Top-k flows.
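The reporting step can be sketched as below, assuming each slot is either `None` or a `(fingerprint, count)` pair and using a heap-based selection in place of a full sort; the function name is hypothetical. Adding T presumably compensates for the packets absorbed by the small flow filter before the flow was admitted.

```python
import heapq


def report_top_k(buckets, k, threshold):
    """Return the k largest stored flows with the filter threshold added back."""
    stored = [slot for bucket in buckets for slot in bucket if slot is not None]
    top = heapq.nlargest(k, stored, key=lambda slot: slot[1])
    return [(fp, count + threshold) for fp, count in top]


buckets = [[(0x01, 50), None], [(0x02, 70), (0x03, 10)]]
print(report_top_k(buckets, k=2, threshold=8))  # [(2, 78), (1, 58)]
```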

The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or scope of the invention as defined in the appended claims.

Claims

1. A Top-k flow accurate monitoring architecture based on small flow filtering is characterized by comprising the following components:

the small flow filter is used for distinguishing large flows and small flows in network flow, and the large flows in the network flow are conveniently extracted to accurately track and count the number of packets; the small flow filter adopts two small counters which are matched in pairs to record different packet number information of the flow, so that the low memory space overhead is realized, and the counters in the small flow filter are updated periodically; the two small counters are respectively used for recording the average arriving packet number of the stream in each period and the arriving packet number of the stream in the current period;

the small flow filter consists of d arrays, each array consists of w buckets, and each bucket comprises a pair of counters, namely a new counter and an old counter; the new counter records the number of packets arriving in the current period; the old counter records the average number of packets arriving in the stream over the past period;

2. A method based on the architecture of claim 1, comprising the steps of:

when a data packet arrives, the small flow filter first maps the flow, according to its flow identifier, to one bucket in each of the d arrays through d pairwise independent hash functions, obtains the minimum new counter value and the minimum old counter value among the d buckets, takes the minimum new counter value as the number of packets of the flow that have arrived in the current period, and takes the minimum old counter value as the average number of packets of the flow per past period; when the minimum new counter value of the flow reaches a threshold T, the flow is considered a newly arrived large flow and is allowed to pass through the filter and enter the large flow monitoring table; when the minimum old counter value of the flow reaches the threshold T, the flow is considered a continuously arriving large flow and is allowed to pass through the filter and enter the large flow monitoring table;

when a data packet of a flow arrives at the large flow monitoring table, the table first computes a fingerprint value FP from the flow identifier through a hash function, then selects a fixed number of bits from the fingerprint value FP and arranges them, repeating this several times with randomly chosen bit positions, to generate a plurality of sub-hash values that map the flow to a plurality of hash buckets; the large flow monitoring table then checks all mapped buckets, and if the flow is stored, the counter corresponding to the flow is increased by 1; if the flow is not stored but there is a vacancy, the flow is inserted into a vacant slot; if the flow is not stored and there is no vacancy, the minimum flow in the mapped buckets is found, and its packet number is used to generate a replacement probability that decides whether the minimum flow is replaced by the newly arrived flow.

3. The method of claim 2, wherein the Top-k flow identification architecture comprises the operations of:

a. insertion and reporting in the small flow filter;

the small flow filter maps each arriving data packet to one bucket in each counter array, reports whether the data packet can pass through the filter according to the minimum new counter value and the minimum old counter value, and determines whether to update the new counters in the mapped buckets;

b. periodic updating of counters in the small flow filter;

when the small flow filter has measured a certain number of data packets, all counters in the small flow filter are updated; the new counter is reset to 0, and the old counter adopts a halving update strategy, i.e., it is set to the average of the new counter's value for the period just ended and its own value from the previous period;

c. insertion into the large flow monitoring table;

when a data packet is passed into the large flow monitoring table, the fingerprint value of the flow is first computed from the flow identifier of the flow to which the data packet belongs, the monitoring table is then queried with the fingerprint value, and different update steps are carried out according to the query result and whether the hash buckets have a vacancy;

d. replacement in the large flow monitoring table;

when a packet of a flow arrives, all hash buckets mapped by the flow are full, and the flow is not recorded in those buckets, a replacement probability $1/(C_{\min}+1)$ is first generated from the packet number $C_{\min}$ of the minimum flow in the buckets and then compared with a real number randomly generated between 0 and 1; if the replacement probability is greater than the real number, the minimum flow is replaced; otherwise, the data packet of the flow is discarded;

e. top-k flow reports of the large flow monitoring table;

4. The method of claim 3, wherein the invention uses two small counters to construct the small flow filter, and updates the counters in the small flow filter with a period; one of the counters records the number of packets arriving in the current period of the stream to identify the newly arriving big stream, and the other counter records the average number of packets arriving in the past period of the stream to identify the continuously arriving big stream; the two small counters are combined to identify the big flow in the network flow, the storage space waste caused by storing the small flow is reduced, the problem that the small flow filtration of the traditional filter is invalid is solved, and the identification precision of the Top-k flow is improved.

5. The method of claim 3, wherein the invention designs a Top-k flow identification method with low cost and high precision by combining a single hash multi-mapping algorithm; the method comprises the steps of firstly calculating a fingerprint value according to a flow identifier, then selecting bits from the fingerprint value to be recombined into a hash value to be mapped into a hash table, and reducing the overhead of hash calculation.

6. The method of claim 3, wherein each flow has multiple candidate hash buckets, so that it can be ensured that Top-k flows have enough storage locations for selection, thereby avoiding the problem that Top-k cannot monitor the flow size due to no location storage, and improving Top-k flow identification accuracy; meanwhile, when the minimum flow is replaced, since the minimum flow can be selected from a plurality of candidate positions, the minimum flow eviction can be accurately selected.