CN113965492A - Data flow statistical method and device - Google Patents

Data flow statistical method and device

Info

Publication number
CN113965492A
Authority
CN
China
Prior art keywords
data
data stream
information
flow
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010636270.7A
Other languages
Chinese (zh)
Inventor
閤先军
李伟超
金波
汪漪
刘斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Southwest University of Science and Technology
Original Assignee
Huawei Technologies Co Ltd
Southwest University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd, Southwest University of Science and Technology
Priority to CN202010636270.7A
Publication of CN113965492A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application provides a data flow statistical method and device. The method comprises: when a first data message is detected and a first data structure has no space for recording information of the first data stream to which the first data message belongs, determining whether, among the data streams recorded in the first data structure that have the same hash value as the first data stream, there is a second data stream whose information has not been updated for more than a preset time; and if so, replacing the information of the second data stream with the information of the first data stream. This prevents the second data stream from occupying resources of the first data structure for a long time. Because data streams are counted in combination with the time dimension, the accuracy of data stream statistics is improved.

Description

Data flow statistical method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data flow statistical method and apparatus.
Background
Nowadays, the Internet has become an indispensable part of daily life, and Internet traffic has been growing rapidly: as of 2018, worldwide Internet traffic had reached 1.6 ZB per year. With this rapid growth, it is becoming increasingly difficult to manage a network efficiently. Network measurement, as an important way to monitor, understand, and grasp network behavior, has therefore attracted extensive attention from researchers and has become a research hotspot in recent years.
Among the technologies involved in network measurement, elephant flow detection is a key technology with wide applications in congestion control, network capacity planning, network anomaly detection, troubleshooting, traffic engineering, and other areas. An elephant flow generally means a flow whose size exceeds a given threshold, or a flow whose share of total network traffic reaches a certain value within the measurement interval. For example, a large-scale network anomaly event such as a DDoS attack can be regarded as an elephant flow, and an effective elephant flow detection method helps discover network anomalies in time.
At present, in order to balance the problems of hash collision and memory overhead in the elephant flow detection process, the academic world proposes a data flow statistical method combining a hash table and a sketch, wherein the hash table and the sketch can store frequency information of data flows, the information of the data flows with higher frequency is stored in the hash table, the information of the data flows with lower frequency is stored in the sketch, and the elephant flow detection is realized based on the statistical result obtained by the method.
The above data flow statistical method can record the occurrence frequency of data flows over a period of time, but it ignores the time dynamic characteristics of the data flows themselves. For example, the hash table may contain data flows whose information has not been updated for a long time. For another example, data flow transmission varies with time, so a data flow may be an elephant flow early on and a non-elephant flow later. The current statistical method cannot screen out these data flows, so the accuracy of subsequent elephant flow detection is low.
Disclosure of Invention
The application provides a data flow statistical method and a data flow statistical device, which are used for improving the accuracy of data flow statistics.
In a first aspect, the present application provides a data flow statistical method, which may be implemented by a network device (e.g., a router, a switch, a server, or a host), or by a component of the network device, such as a processing device, a circuit, or a chip in a terminal device. The method comprises: when a first data message is detected and the first data structure has no space for recording information of the first data stream to which the first data message belongs, determining whether, among the data streams recorded in the first data structure that have the same hash value as the first data stream, there is a second data stream whose information has not been updated for more than a preset time; and if so, replacing the information of the second data stream with the information of the first data stream.
With this design, data flows can be screened along the time dimension during data flow statistics. Among the data streams in the first data structure that have the same hash value as the first data stream, a second data stream whose information has not been updated for more than the preset time is screened out, and its information is replaced with the information of the first data stream. This prevents the second data stream from occupying resources of the first data structure for a long time. Because this statistical approach takes the time dynamic characteristics of data flows into account and counts data flows in combination with the time dimension, the accuracy of data flow statistics is improved.
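The replacement rule of the first aspect can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the implementation of the application: the `Entry` fields, the function name `record_packet`, the list-based bucket, and the returned status strings are all hypothetical, and the sequence value is assumed to be an integer so that the update interval is a simple difference.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    key: str    # identifier of the data stream (assumed to be a string)
    count: int  # number of information updates
    seq: int    # sequence value of the most recent update


def record_packet(bucket, capacity, key, seq, preset_time):
    """Record one data message in `bucket`, a list of entries sharing a hash value.

    Returns a status string (hypothetical names chosen for this sketch).
    """
    for entry in bucket:
        if entry.key == key:            # stream already recorded: refresh it
            entry.count += 1
            entry.seq = seq
            return "updated"
    if len(bucket) < capacity:          # free space is still available
        bucket.append(Entry(key, 1, seq))
        return "inserted"
    for i, entry in enumerate(bucket):  # no space: look for a stale stream
        if seq - entry.seq > preset_time:
            bucket[i] = Entry(key, 1, seq)
            return "replaced"
    return "no_stale_entry"
```

In this sketch a full bucket only gives up a slot when some recorded stream has been silent for longer than `preset_time`; otherwise the bucket is left untouched.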
In a possible implementation method, before replacing the information of the second data stream with the information of the first data stream, the method further includes: information of the second data stream is recorded to a second data structure.
Through the design, on one hand, in order to avoid missing detection or false detection, the information of the second data stream screened from the first data structure can be recorded into the second data structure, so that the statistical integrity of the data stream is ensured, and the memory consumption of the first data structure and the hash collision problem of a plurality of data streams with the same hash value are balanced. On the other hand, the second data stream in the first data structure is transferred to the second data structure for storage, so that the data stream with shorter updating time interval is stored in the first data structure, and the data stream with longer updating time interval is stored in the second data structure, thereby facilitating the subsequent data stream query operation and improving the query efficiency.
In one possible implementation, if there is no second data stream, the information of the first data stream is recorded to the second data structure.
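The two outcomes for a full position (move the stale second data stream to the second data structure and replace it, or record the first data stream in the second data structure) can be sketched as below. This is only an illustration: a `collections.Counter` stands in for the second data structure, the dictionary layout of an entry and the name `handle_full_bucket` are hypothetical, and starting the new entry with a count of 1 is an assumption.

```python
from collections import Counter


def handle_full_bucket(bucket, second, key, seq, preset_time):
    """Handle a data message whose stream has no free space in `bucket`.

    `bucket` is a list of dicts with "key", "count" and "seq";
    `second` is a Counter standing in for the second data structure.
    Returns True if a stale stream was replaced, False otherwise.
    """
    for i, entry in enumerate(bucket):
        if seq - entry["seq"] > preset_time:
            # Keep the statistics of the evicted stream before replacing it.
            second[entry["key"]] += entry["count"]
            bucket[i] = {"key": key, "count": 1, "seq": seq}  # count of 1 is an assumption
            return True
    # No stale stream: record the new stream in the second data structure.
    second[key] += 1
    return False
```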
In a possible implementation method, determining whether there is a second data stream whose information has not been updated for more than the preset time among the data streams recorded in the first data structure that have the same hash value as the first data stream further includes: determining, among those data streams, the data stream with the smallest number of information updates; judging whether that number of updates is smaller than a first preset value; if it is smaller than the first preset value, judging whether the update time interval of the data stream exceeds the preset time; and if so, determining that the data stream is the second data stream.
Through the design, the second data stream in the first data structure is screened out by combining two dimensions of time and updating times, so that the data stream which is not updated for a long time, such as the data stream with the updating time interval exceeding the preset time, or the data stream with the updating frequency lower, such as the data stream with the updating times lower than the first preset value, is prevented from continuously occupying the resources of the first data structure for a long time, the screening precision is higher, and the statistical precision of the data stream is higher.
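The two-dimensional screening described above (fewest updates first, then the update interval) might look like the following sketch. The function name `find_second_stream`, the dictionary layout of an entry, and the integer sequence values are assumptions made for illustration.

```python
def find_second_stream(bucket, current_seq, first_preset_value, preset_time):
    """Return the index of the stream to replace in `bucket`, or None.

    `bucket` is a list of dicts with "key", "count" and "seq" for the
    streams that share a hash value with the newly detected stream.
    """
    if not bucket:
        return None
    # Step 1: pick the stream with the smallest number of updates.
    idx = min(range(len(bucket)), key=lambda i: bucket[i]["count"])
    candidate = bucket[idx]
    # Step 2: it must have fewer updates than the first preset value.
    if candidate["count"] >= first_preset_value:
        return None
    # Step 3: its update interval must exceed the preset time.
    if current_seq - candidate["seq"] <= preset_time:
        return None
    return idx
```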
In a possible implementation method, determining whether there is a second data stream whose information has not been updated for more than the preset time among the data streams recorded in the first data structure that have the same hash value as the first data stream further includes: determining, among those data streams, the data stream with the smallest number of information updates; judging whether that number of updates is smaller than a first preset value; if it is smaller than the first preset value, determining whether the number of updates of the first data stream recorded in the second data structure is greater than a second preset value; if it is greater than the second preset value, judging whether the update time interval of the data stream exceeds the preset time; and if so, determining that the data stream is the second data stream.
Through the design, when judging whether the data stream in the first data structure needs to be replaced, the data stream at the conflict position (that is, the storage space of the data stream recorded in the first data structure and having the same hash value as the first data stream) in the first data structure and the first data stream can be combined for judgment, if the first data stream is a large stream, for example, the update frequency of the first data stream recorded in the second data structure is greater than the second preset value, whether the data stream at the conflict position needs to be replaced is continuously judged, and if the first data stream is a small stream, the data stream at the conflict position does not need to be replaced, so that frequent insertion operation and deletion operation on the first data structure and the second data structure are avoided.
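A sketch of this variant adds one check between the count test and the time test: the newly detected stream must itself look like a large stream in the second data structure. The name `find_replaceable_slot` and the parameter `first_stream_recorded_count` (the update count of the first data stream as read from the second data structure) are hypothetical; how that count is obtained is left to the caller.

```python
def find_replaceable_slot(bucket, current_seq, first_stream_recorded_count,
                          first_preset_value, second_preset_value, preset_time):
    """Return the index of the stream to replace in `bucket`, or None."""
    if not bucket:
        return None
    idx = min(range(len(bucket)), key=lambda i: bucket[i]["count"])
    candidate = bucket[idx]
    # The least-updated stream at the conflict position must be small enough.
    if candidate["count"] >= first_preset_value:
        return None
    # The new stream must already be large in the second data structure;
    # a small new stream never triggers a replacement.
    if first_stream_recorded_count <= second_preset_value:
        return None
    # The candidate must also have been silent for longer than the preset time.
    if current_seq - candidate["seq"] <= preset_time:
        return None
    return idx
```

Rejecting small new streams early is what avoids the frequent insert and delete operations on both structures.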
In one possible implementation, the information of a data stream stored in the first data structure includes but is not limited to: a key value, a number of updates, and a sequence value. The key value is an identifier of the data stream. The number of updates is the number of occurrences of data messages belonging to the data stream. The sequence value is the number of the most recently updated data message of the data stream: it may be a serial number assigned to data messages according to their transmission order, or the timestamp at which the information of the most recently updated data message was entered into the first data structure.
Through the design, the sequence value can represent the latest updating time of the data stream, so that the updating time interval of the data stream in the first data structure can be detected according to the sequence value of a new first data message detected in real time, the second data stream with longer updating time interval can be screened out, the second data stream is prevented from continuously occupying the resources of the first data structure for a long time, and the accuracy of data stream statistics is improved.
In a possible implementation method, the second data structure is a two-dimensional array: vertically, the array corresponds to M hash functions, and horizontally, each hash function has N hash values. Recording information of the first data stream to the second data structure includes: inputting the key value of the first data stream into each of the M hash functions to calculate the corresponding hash values, and adding 1 to the count recorded at each of those hash values.
Through the design, different data streams may have the same hash value, so that hash operations are performed on key values of the same data stream respectively through a plurality of hash functions, for example, M hash functions, to obtain corresponding hash values, and then 1 is added to the number of times recorded by each hash value, when the number of update times of the data stream is queried, the minimum value of count values corresponding to the M hash values can be returned, so as to reduce statistical errors of hash collisions on the number of update times of the data stream as much as possible, and improve the accuracy of data stream statistics.
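The M-by-N array with a minimum-value query can be sketched as below. The class name `Sketch`, the use of SHA-256 salted with the row index as the M hash functions, and the `amount` parameter are assumptions chosen so that the example is deterministic and self-contained; the application does not specify the hash functions.

```python
import hashlib


class Sketch:
    """Two-dimensional counter array: M rows (hash functions) by N columns."""

    def __init__(self, m, n):
        self.m = m
        self.n = n
        self.counts = [[0] * n for _ in range(m)]

    def _column(self, row, key):
        # One hash function per row, derived by salting with the row index.
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.n

    def add(self, key, amount=1):
        # Add `amount` to the counter selected by each of the M hash functions.
        for row in range(self.m):
            self.counts[row][self._column(row, key)] += amount

    def query(self, key):
        # The minimum over the M counters limits the error from collisions.
        return min(self.counts[row][self._column(row, key)]
                   for row in range(self.m))
```

Because counters only ever increase, `query` can overestimate a stream's update count but never underestimate it. The `amount` parameter lets the same method add a whole update count in one call rather than 1 per data message.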
In one possible implementation method, recording information of the second data stream to the second data structure includes: acquiring the number of updates of the second data stream recorded in the first data structure; inputting the key value of the second data stream into each of the M hash functions to calculate the corresponding hash values, and adding the number of updates of the second data stream to the count recorded at each of those hash values.
In a second aspect, the present application further provides an apparatus, where the apparatus includes multiple functional units, and the functional units may perform functions performed by the steps in the method of the first aspect. These functional units may be implemented by hardware or software. In one possible design, the apparatus includes a detection unit and a processing unit. For the beneficial effects achieved by the apparatus, please refer to the description of the first aspect, which is not described herein again.
In a third aspect, an embodiment of the present application further provides an apparatus, which includes a processor and a memory, where the memory stores program instructions, and the processor executes the program instructions in the memory to implement the method provided in the first aspect. For the beneficial effects achieved by the apparatus, please refer to the description of the first aspect, which is not described herein again.
In a fourth aspect, the present application further provides a computer-readable storage medium having stored therein instructions, which, when executed on a computer, cause the computer to perform the method provided by the first aspect.
In a fifth aspect, the present application further provides a computer chip, where the chip is connected to a memory, and the chip is used to read and execute a software program stored in the memory, and execute the method provided in the first aspect.
Drawings
Fig. 1 is a schematic architecture diagram of a possible network system according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a data flow statistical method according to an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of the data structure of a hash table;
Fig. 4 is a schematic diagram of a sketch array structure;
Fig. 5 is a schematic diagram of a sketch;
Fig. 6 is a schematic diagram of another sketch;
Fig. 7 is a schematic flowchart of an elephant flow query method according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a data flow statistics apparatus according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of a data flow statistics apparatus according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
A system architecture and some basic concepts to which embodiments of the invention may be applied will first be described for a person skilled in the art to understand.
Please refer to fig. 1, which is a schematic diagram of a network system to which the embodiments of the present application can be applied. As shown in fig. 1, the network system includes a Data Network (DN), a server, a gateway, a switch, and a host device.
The DN is a network that provides data transmission, such as the Internet. The Internet further includes application servers; an application server may be a server of a service provider, a storage server, a computing server, or the like.
It should be understood that different application servers may have different network addresses, and that different servers may be connected to the same gateway. A gateway is a device that connects two different networks, and may be, for example, a network device such as a router or a switch. Similarly, on the user side, multiple devices in different networks (network segments) may be connected to the gateway. For example, the Internet Protocol (IP) address of switch 1 is 192.168.0.253, and its gateway address is 192.168.0.253; the IP address of switch 2 is 192.168.1.253, and its gateway address is 192.168.1.254. Switch 1 and switch 2 may both be connected to the gateway, and one or more host devices may be connected under each of them. In the network system, host device 101 or host device 102 can access application server 1 and/or application server 2 via switch 1 and the gateway; host device 103 or host device 104 can access application server 1 and/or application server 2 via switch 2 and the gateway. Taking host device 101 and application server 1 as an example, the transmission path of a data packet sent by host device 101 to application server 1 is: host device 101 → switch 1 → gateway → application server 1. Application server 1 may also send a feedback data packet to host device 101 along the path: application server 1 → gateway → switch 1 → host device 101.
It should be understood that, through the switch 1 and the switch 2, the host devices under the switch 1 and the switch 2 can also achieve mutual access, and the application does not limit the data transmission process under the network system.
It should be understood that the network system shown in fig. 1 is only an example, and more or fewer network devices, for example, a storage server and a computing server, may also be included in the network system, and are not shown in fig. 1. The number of network devices such as host devices, switches, gateways, application servers, and the like included in the network system is not limited in the present application.
The data flow statistical method provided in the embodiment of the present application may be deployed in the network system shown in fig. 1, and specifically, may be deployed in any network device in the network system, for example: application server, gateway, switch, host device.
In addition, the network device included in the network system shown in fig. 1 may be an independent device, or may be a module (or a device) that supports implementation of a corresponding function, and the module may be an entity module or a virtual module, for example, in a virtual network (e.g., VPC), a gateway is a virtual gateway, and a switch may be a virtual switch, which is not limited in this embodiment of the present application. For example, the data flow statistics function may be deployed in a virtual machine. For another example, when the data flow statistics function is deployed in an entity module, the data flow statistics function may be deployed on a processor or a physical network card, for example, a host device sends and/or receives a data packet through the physical network card. The embodiment of the present application is not particularly limited to this.
For example, when the function is deployed in a gateway, data flow statistics may be performed with a server as the object, counting all data flows received and/or transmitted by that server, or with the gateway as the object, counting all data flows received and/or transmitted by the gateway. For another example, when the function is deployed in a switch, statistics may be performed with the switch as the object, counting all data flows received and/or sent by the switch, or with any host device in the local area network formed by the switch as the object, counting all data flows received and/or sent by that host device. This is not limited in this embodiment of the present application.
The data flow is explained in detail as follows. It should be noted that these explanations are for the convenience of those skilled in the art, and do not limit the scope of protection claimed in the present application.
A data flow is a kind of network traffic data that includes one or more data packets. The header of a data packet contains a five-tuple, which generally refers to the source IP address, source port, destination IP address, destination port, and transport layer protocol (such as the Transmission Control Protocol (TCP)). The five-tuple distinguishes different data flows. For example, 192.168.1.1 10000 TCP 121.14.88.76 80 constitutes a five-tuple, which represents that a communication apparatus with source IP address 192.168.1.1 connects, through port 10000 and using TCP, to a communication apparatus with destination IP address 121.14.88.76 and port 80.
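As a small illustration of how a five-tuple can serve as a flow identifier, the sketch below models it as a named tuple and derives a string key from it. The type name `FiveTuple`, the field names, and the key format produced by `flow_key` are assumptions made for this example only.

```python
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    protocol: str
    dst_ip: str
    dst_port: int


def flow_key(t: FiveTuple) -> str:
    """Build a string key; packets with the same five-tuple get the same key."""
    return f"{t.src_ip}:{t.src_port}>{t.dst_ip}:{t.dst_port}/{t.protocol}"


# The five-tuple from the example in the text.
example = FiveTuple("192.168.1.1", 10000, "TCP", "121.14.88.76", 80)
```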
Data packets belonging to the same data flow have the same key value. The key value may be part or all of the five-tuple, and may further include information other than the five-tuple, such as a media access control (MAC) address, which is not limited in this embodiment.
During transmission, a data flow may be transmitted at the granularity of data packets; for example, the data packets of one data flow may be transmitted at different times. In practice, for fairness, the data packets of different data flows may be transmitted alternately. For example, in the same network system, if user A requests the cooling service and user B requests the WeChat service, the data packets of the two services' data flows may be transmitted alternately.
At present, among the technologies involved in network measurement, elephant flow detection is a key technology that is widely applied in congestion control, network capacity planning, network anomaly detection, troubleshooting, traffic engineering, and the like. An elephant flow generally refers to a data flow whose occurrence frequency, that is, the number of occurrences of data packets belonging to the same data flow, reaches a specific threshold. For example, a large-scale network anomaly event such as a DDoS attack can be regarded as an elephant flow, and an effective elephant flow detection method helps discover network anomalies in time.
In the elephant flow detection method provided by the related technology, a hash table and a sketch are combined: data flows are counted based on the hash table and the sketch, and elephant flow detection is performed on the counting result. The core idea of this conventional method is that, once the hash table is full, data flows are separated by occurrence frequency: flows with a relatively high occurrence frequency, which may be elephant flows, are recorded separately from flows with a relatively low occurrence frequency, that is, small flows. Specifically, the hash table stores the key value and the count value (that is, the occurrence frequency) of a data flow, and the position of the data flow in the hash table is determined by the hash value of its key value. The sketch stores the occurrence frequency of data flows that have the same hash value. Because different data flows may have the same hash value, which is called a hash collision, the occurrence frequency recorded in the sketch may also be referred to as the collision number.
During data flow statistics, when a new data message is detected, a hash operation is performed on its key value, and the position in the hash table is determined from the resulting hash value. If that position conflicts, that is, it already stores information of another data flow, the collision number recorded in the sketch for that hash value is looked up. If the ratio of the collision number to the count value recorded at the conflicting position in the hash table reaches a set threshold, the data flow at the conflicting position is considered a small flow and the data flow to which the new data message belongs is considered a large flow. The information of the small flow at the conflicting position is then replaced with the information of the large flow, and the replaced information of the small flow is recorded in the sketch.
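The count-ratio rule of the related art can be illustrated with the simplified sketch below. Several details are assumptions, because the text does not fix them: the hash table is modeled as a dictionary with one flow per position, the collision number is incremented once per conflicting data message, it is reset to 0 after a replacement, and the evicted entry is returned to the caller instead of being written into a real sketch. The name `related_art_update` is hypothetical.

```python
def related_art_update(table, collisions, position, key, threshold):
    """Process one data message hashed to `position`.

    `table` maps position -> {"key", "count"}; `collisions` maps
    position -> collision number. Returns (status, evicted_entry_or_None).
    """
    slot = table.get(position)
    if slot is None:
        table[position] = {"key": key, "count": 1}
        return "inserted", None
    if slot["key"] == key:
        slot["count"] += 1
        return "updated", None
    # Conflict: another flow already occupies this position.
    collisions[position] = collisions.get(position, 0) + 1
    if collisions[position] / slot["count"] >= threshold:
        table[position] = {"key": key, "count": 1}
        collisions[position] = 0        # reset is an assumption of this sketch
        return "replaced", slot         # caller would record this in the sketch
    return "collided", None
```

Note that the decision uses only counts: a flow that accumulated a count of 10 keeps its position until enough conflicting messages arrive, no matter how long ago it was last updated.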
In the data flow statistical method provided by the related art, whether the data flow at the conflicting position in the hash table needs to be replaced is decided only from the count value of the data flow, which determines whether it is an elephant flow or a small flow. This single criterion ignores the time dynamic characteristics of data flows. For example, data flow transmission changes dynamically over time, so a flow may be an elephant flow in an early period and a non-elephant flow later; the current detection method cannot screen out such flows, and elephant flow detection based on the related-art statistical method therefore has low accuracy. Moreover, such flows occupy hash table resources for a long time. When new data flows arrive, they may collide with these flows, which affects detection of the new data flows and adds considerable computation overhead.
In view of this, an embodiment of the present application provides a data flow statistical method, in which, in combination with a time dynamic characteristic of a data flow, whether the data flow in a first data structure needs to be replaced is determined through a time dimension, so that accuracy of data flow statistics is improved.
Referring to fig. 2, a flow chart corresponding to the data flow statistical method provided by the present application is schematically illustrated. The method comprises the following steps:
Step 201: Collect network traffic data.
Real-time or offline network traffic data is collected; that is, each detected first data message is captured.
Step 202: Determine a key value and a sequence value of the first data message.
As described above, the key value of the first data message may be the five-tuple of the data stream. Any detected first data message is parsed to obtain its header information, which includes the five-tuple of the data stream, and the five-tuple may be used as the key value of the first data message.
The sequence value is described in detail below.
The sequence value characterizes the latest update time of the data stream.
For example, the sequence value may be a number assigned according to a transmission order of the data message, for example, a counting start point is set, the sequence value of the first detected data message is the counting start point, and the sequence value of the next detected data message is sequentially increased by 1, for example, the counting start point is 1, then the sequence values of the sequentially detected data messages are sequentially 1, 2, 3 …, n, and n is a positive integer. For example, assume that the sequentially detected data packets are packet a, packet B, packet C, packet D, …, where packet a and packet C belong to flow 1, packet B belongs to flow 2, and packet D belongs to flow 3. If the sequence value of the message A is 1, the sequence value of the message B is 2, the sequence value of the message C is 3, the sequence value of the message D is 4, and so on.
For another example, when the information of the data flow to which the data packet belongs is recorded in the first data structure, the sequence value of the data packet is assigned as a current timestamp of the data packet entering the hash table, and the current timestamp represents a processing time of the data flow.
Of course, the assignment manner of the sequence value is only an example, and this is not specifically limited in this embodiment of the application.
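The two assignment options can be sketched as follows, using the messages A to D and flows 1 to 3 from the example above. The function names and the choice of `time.time()` as the timestamp source are assumptions made for illustration.

```python
import itertools
import time


def assign_counter_sequence(packets, start=1):
    """Number data messages in arrival order, starting from `start`.

    `packets` is a list of (message_name, flow_name) pairs. Returns the
    sequence value of every message and the latest sequence value per flow.
    """
    counter = itertools.count(start)
    per_packet = {}
    last_update = {}
    for name, flow in packets:
        seq = next(counter)
        per_packet[name] = seq
        last_update[flow] = seq   # the flow's sequence value is its latest message
    return per_packet, last_update


def timestamp_sequence():
    """Alternative: use the current timestamp as the sequence value."""
    return time.time()


packets = [("A", "flow1"), ("B", "flow2"), ("C", "flow1"), ("D", "flow3")]
per_packet, last_update = assign_counter_sequence(packets)
```

With the counter option, flow 1 ends up with sequence value 3 because message C is its most recent data message.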
Step 203: Perform a hash operation on the key value of the first data message, and determine, according to the resulting hash value, the position in the first data structure of the data stream to which the first data message belongs.
For convenience of description, the following takes a hash table as an example of the first data structure and a sketch as an example of the second data structure. The hash table is explained first.
A hash table is a data structure that is accessed directly by key value. That is, a record is accessed by mapping its key value to a location in the table, which speeds up lookup. The mapping function is called a hash function, and the array that stores the records is called a hash table.
Referring to fig. 3, which shows the data structure of a hash table provided in an embodiment of the present application, the hash table is composed of B buckets, and each bucket includes D slots, where B and D are positive integers.
It should be understood that the number of slots in the hash table affects measurement accuracy. Generally, the more slots there are, the more effectively hash collisions caused by data flows with the same hash value are reduced and the more accurate the detection result is, but the more memory is consumed. Data flow statistics usually runs on network devices such as switches and routers, where memory is an extremely precious resource; improving accuracy by adding memory would cause memory shortage and affect the original functions of the network device. An appropriate value is therefore needed to balance memory consumption and accuracy. Preferably, the embodiment of the present application sets the number of slots D to 4.
In this embodiment of the application, when the first data structure is a hash table, the space for recording information of a data stream in the first data structure is a slot in the hash table. Each slot may be used to store the information of one data stream, that is, one bucket can record the information of D data streams. Specifically, the information of a data stream includes but is not limited to: a key value, a count value, and a sequence value. The sequence value is the sequence value of the most recently detected data packet of the data stream. Each time a data packet of the data stream is detected, the count value of the data stream is incremented by 1. The key value of the data stream has been described above and is not described herein again.
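As an illustration of this layout, the following sketch builds a table of B buckets with D slots each, where every slot is either free or holds a (key value, count value, sequence value) record. The names `Slot` and `make_table` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Slot:
    key: bytes   # key value identifying the data stream
    count: int   # number of data packets detected for the stream
    seq: int     # sequence value of the most recently detected packet

def make_table(num_buckets: int, slots_per_bucket: int = 4) -> List[List[Optional[Slot]]]:
    """Create B buckets of D slots; None marks a free slot."""
    # A fresh inner list per bucket, so buckets do not share storage.
    return [[None] * slots_per_bucket for _ in range(num_buckets)]

table = make_table(num_buckets=8)        # B = 8, D = 4
table[0][0] = Slot(key=b"f", count=1, seq=100)
```

The default of 4 slots per bucket follows the preferred value of D given above; the value of B is arbitrary here.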
In the embodiment of the present application, the number of information updates may be a count value, and the following description will take the number of information updates as the count value as an example.
Specifically, step 203 may be to determine the position of the first data packet in the hash table, and attempt to update the information of the data flow to which the first data packet belongs to the hash table. The following describes, by taking a first data packet as an example, a process of updating information of a data flow to which the first data packet belongs to a hash table:
Given a first data message with key value f, the first data message is hashed to bucket H = h(f) % B, where h(f) denotes the hash value obtained by applying the hash function h(·) to f, % denotes the remainder operation, and B denotes the number of buckets of the hash table.
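A minimal sketch of the bucket computation H = h(f) % B is given below. The embodiment does not name a specific hash function; CRC-32 from the Python standard library is used here purely as a stand-in, and the key format is an assumption.

```python
import zlib

def bucket_index(key: bytes, num_buckets: int) -> int:
    """Return H = h(f) % B, with CRC-32 standing in for h()."""
    return zlib.crc32(key) % num_buckets

# Hypothetical key value built from a 5-tuple-like string.
f = b"10.0.0.1:1234->10.0.0.2:80/tcp"
H = bucket_index(f, num_buckets=8)
```

All data messages of the same data stream carry the same key value and therefore always map to the same bucket.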
Step 204: Determine whether the key value f of the first data message matches the key value recorded in any slot of bucket H; if so, perform step 205; otherwise, perform step 206.
Step 205: Update the information of the data stream stored in the matching slot.
If a slot matching the key value f exists in bucket H, that is, the hash table already records information of the data stream with key value f, the information of the data stream can be updated directly. Assuming that the matching slot is slot1 of bucket H, the update process includes: adding 1 to the count value in slot1, and updating the sequence value to the sequence value of the first data message.
Step 206: Determine whether bucket H contains a free slot; if so, perform step 207; otherwise, perform step 208.
Step 207: Select one of the free slots, and record the information of the data stream to which the first data message belongs into the selected slot.
A free slot is an empty slot, that is, a slot that has not yet been used. If bucket H records no information of the data stream with key value f but contains a free slot, the information of the data stream can be stored in the free slot, and no data stream needs to be removed from bucket H. Illustratively, a free slot may be selected randomly, or the first free slot after the used slots in bucket H may be selected in order. For example, assuming that only slot4 in bucket H is free, the information of the data stream to which the first data packet belongs may be recorded in slot4. Assuming further that the sequence value of the first data packet is 100, (f, 1, 100) is inserted into slot4, that is, the key value in slot4 is set to f, the count value to 1, and the current sequence value to 100.
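Steps 204 to 207 can be sketched as a single function over one bucket. This is an illustrative sketch, not the embodiment's implementation: the list-based slot representation `[key, count, seq]`, the function name `update_or_insert`, and the choice of the first free slot in order are assumptions.

```python
def update_or_insert(bucket, key, seq):
    """Apply steps 204-207 to one bucket.

    bucket: list of slots, each None (free) or [key, count, seq].
    Returns "updated", "inserted", or "full" (step 208 would follow).
    """
    # Steps 204-205: a matching slot exists, so update it in place.
    for slot in bucket:
        if slot is not None and slot[0] == key:
            slot[1] += 1      # count value + 1
            slot[2] = seq     # newest sequence value
            return "updated"
    # Steps 206-207: take the first free slot in order.
    for i, slot in enumerate(bucket):
        if slot is None:
            bucket[i] = [key, 1, seq]
            return "inserted"
    return "full"

# Only the fourth slot is free, as in the example above.
bucket = [["a", 5, 90], ["b", 2, 95], ["c", 7, 99], None]
first = update_or_insert(bucket, "f", 100)    # records (f, 1, 100)
second = update_or_insert(bucket, "f", 101)   # count 2, sequence value 101
third = update_or_insert(bucket, "g", 102)    # no match, no free slot
```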
Step 208: it is determined whether there is a slot in the bucket H that satisfies the replacement condition. If so, step 209 is performed, otherwise step 210 is performed.
If bucket H contains neither a slot whose key value matches the key value f nor a free slot, bucket H is full. In the embodiment of the present application, the streamlets in bucket H may then be screened by using the replacement condition: if a data stream satisfies the replacement condition, the data stream is determined to be a streamlet and is replaced by the data stream to which the first data packet belongs.
For convenience of description, a data flow to which the first data packet belongs is denoted as a first data flow, and a data flow that satisfies the replacement condition in the bucket H is denoted as a second data flow.
The embodiment of the present application provides three ways of determining whether there is a slot satisfying the replacement condition, which will be described below:
Manner 1:
Determine the data flow with the minimum count value among the slots of bucket H, that is, the data flow with the fewest information updates among the data flows in the hash table that have the same hash value as the first data flow. Determine whether the time for which this data flow has not been updated exceeds a preset time. If so, the data flow is determined to be the second data flow, and the slot in which it is located is the slot satisfying the replacement condition; otherwise, it is determined that no slot in bucket H satisfies the replacement condition. To determine whether the time for which the data flow has not been updated exceeds the preset time, the sequence value of the first data flow is compared with the sequence value of the data flow: if the difference between the two is greater than a preset value, the time for which the data flow has not been updated is considered to exceed the preset time.
For example, if the sequence value of the data flow with the smallest count value in bucket H is 80 and the sequence value of the first data packet is 100, the update time interval of the data flow is 20. As another example, if the sequence value of the data flow with the smallest count value in bucket H is 6 h 10 min 30 s and the sequence value of the first data packet is 6 h 30 min 40 s, the update time interval of the data flow is 20 min 10 s.
It should be noted that the sequence value is only an example, for example, when the sequence value is a time stamp, the time stamp may be accurate to millisecond, microsecond, and the like, and the sequence value is not specifically limited in the embodiment of the present application.
Manner 2:
Determine the data flow with the minimum count value among the slots of bucket H. Determine whether the count value of this data flow is smaller than a first preset value. If it is smaller than the first preset value, determine whether the update time interval of the data flow exceeds the preset time. If so, the data flow is determined to be the second data flow, and the slot in which it is located is the slot satisfying the replacement condition; otherwise, it is determined that no slot in bucket H satisfies the replacement condition.
Manner 3:
Determine the data flow with the minimum count value among the slots of bucket H. Determine whether the count value of this data flow is smaller than a first preset value. If the count value is smaller than the first preset value, determine the query value of the key value of the first data stream in the second data structure, and determine whether the query value is greater than a second preset value. If so, further determine whether the update time interval of the data flow exceeds the preset time. If so, the data flow is determined to be the second data flow, and the slot in which it is located is the slot satisfying the replacement condition; otherwise, it is determined that no slot in bucket H satisfies the replacement condition.
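The three manners of checking the replacement condition can be sketched with one function whose optional parameters switch the additional checks on. This is a hedged sketch: the function name, the parameter names `first_preset`, `second_preset`, `sketch_estimate`, and `max_idle`, and the slot representation `[key, count, seq]` are assumptions for illustration. The sketch query value is passed in as a plain number rather than computed.

```python
def find_replaceable_slot(bucket, new_seq, max_idle,
                          first_preset=None,
                          sketch_estimate=None, second_preset=None):
    """Return the index of the slot satisfying the replacement condition, or None.

    bucket: full bucket, a list of [key, count, seq] records.
    Manner 1: only max_idle is given.
    Manner 2: first_preset is also given.
    Manner 3: first_preset, sketch_estimate and second_preset are given.
    """
    # Candidate: the data flow with the minimum count value in the bucket.
    idx = min(range(len(bucket)), key=lambda i: bucket[i][1])
    _, count, last_seq = bucket[idx]
    # Manners 2 and 3: the candidate's count must be below the first preset value.
    if first_preset is not None and count >= first_preset:
        return None
    # Manner 3: the new flow's query value in the sketch must exceed the second preset value.
    if second_preset is not None and sketch_estimate <= second_preset:
        return None
    # All manners: the candidate must not have been updated for longer than max_idle.
    return idx if new_seq - last_seq > max_idle else None

# The minimum-count flow "b" was last updated at sequence value 80.
bucket = [["a", 40, 99], ["b", 3, 80], ["c", 12, 97], ["d", 9, 98]]
manner1 = find_replaceable_slot(bucket, new_seq=100, max_idle=10)
manner2 = find_replaceable_slot(bucket, 100, 10, first_preset=5)
manner3 = find_replaceable_slot(bucket, 100, 10, first_preset=5,
                                sketch_estimate=6, second_preset=4)
```

With a new sequence value of 100, the interval for flow "b" is 20, which matches the first numeric example given for manner 1.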
There are various second data structures applicable to the embodiments of the present application, and the embodiments of the present application will be described below with the second data structure as a sketch example.
A sketch is a hash-based data structure that, by means of a hash function, stores data with the same hash value at the same position in an array. Hashing converts an input of arbitrary length (also called a pre-image) into an output of fixed length by means of a hash function; the output is the hash value. This transformation is a compression mapping, that is, the space of hash values is usually much smaller than the space of inputs. In addition, different inputs may hash to the same output, and the input value cannot be uniquely determined from the hash value.
Illustratively, the sketch array structure is generally a two-dimensional array, each position in the two-dimensional array is a counter, the count value of each position is initialized to 0, for a new element, the element is hashed to the corresponding position of the row based on the hash function of each row, the count value of the position is increased by 1, and the count value of the position is the occurrence frequency of the element.
Various sketch array structures can be applied in the embodiment of the application; the following takes the Count-Min Sketch as an example. Referring to fig. 4, which is a schematic diagram of the array structure of the Count-Min Sketch, the sketch is a two-dimensional array of M rows and N columns: vertically it corresponds to M hash functions, and horizontally each row contains the N hash values of its hash function. That is, each row of the sketch corresponds to one hash function, each row contains the N hash values of the corresponding hash function, and the position of each hash value is a counter.
In the embodiment of the present application, the applications of sketch include, but are not limited to: recording the data message to be updated to the sketch through an inserting operation, giving a key value of the data stream, and determining a query value of the key value of the data stream in the sketch through a query operation. The following are described separately:
1) insert operation of sketch
For the data packet to be updated, M mutually independent hash functions in the sketch map key values of the data stream to corresponding positions of each row of the two-dimensional array respectively, and then the count value of the corresponding position is increased by 1.
For example, assume that the M mutually independent hash functions in fig. 4 are h1(), h2(), …, hm(), and that the key value of the data packet to be updated is f. The key value f is hashed to a position in each row of the sketch through the hash function corresponding to that row. Specifically, the position in the first row is N1 = h1(f) % N, the position in the second row is N2 = h2(f) % N, the position in the third row is N3 = h3(f) % N, and so on, and the position in the M-th row is Nm = hm(f) % N. Here h(f) denotes the hash value obtained by applying the hash function h(·) to f, % denotes the remainder operation, and N is the number of hash values contained in each row.
Assuming that N1 is 3, N2 is 2, N3 is 1, … and Nm is 4, please refer to fig. 4, the count value at the 3 rd column of the first row is increased by 1, the count value at the 2 nd column of the second row is increased by 1, the count value at the 1 st column of the third row is increased by 1, and similarly, the count value at the 4 th column of the mth row is increased by 1.
Continuing, given a data message with key value p, updating the data message into the sketch includes: hashing the key value p to the corresponding position of each row through h1(), h2(), …, hm(), respectively. Specifically, the corresponding position in the first row is N1' = h1(p) % N; in the second row, N2' = h2(p) % N; in the third row, N3' = h3(p) % N; …; and in the M-th row, Nm' = hm(p) % N.
Assuming that N1' is 2, N2' is 2, N3' is 3, …, and Nm' is 5, refer to fig. 5, which is a schematic diagram of updating the data packet with key value p into the sketch. In the sketch shown in fig. 5, the count value at the 2nd column of the first row is increased by 1, the count value at the 2nd column of the second row is increased by 1, the count value at the 3rd column of the third row is increased by 1, and similarly, the count value at the 5th column of the M-th row is increased by 1.
Each time a data message needs to be recorded in the sketch, it is updated into the sketch in the above manner. It should be noted that the above insertion applies to a data flow that is not being removed from the hash table. When the information of a data flow removed from the hash table is recorded in the sketch, the position in each row is still determined in the above manner; the difference is that the count value of the removed data flow recorded in the hash table, rather than 1, is added to the count value at each position, as described in detail below.
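The insertion operation can be sketched as follows for a sketch of M rows and N columns. The embodiment does not specify the hash functions; here the M row functions are derived from BLAKE2b with a per-row salt, which is an assumption made only to obtain several distinct functions from the Python standard library. The names `row_position` and `sketch_insert` are likewise hypothetical.

```python
import hashlib

def row_position(key: bytes, row: int, num_cols: int) -> int:
    """Position h_row(key) % N, using a per-row salted BLAKE2b as h_row."""
    digest = hashlib.blake2b(key, digest_size=8,
                             salt=row.to_bytes(2, "big")).digest()
    return int.from_bytes(digest, "big") % num_cols

def sketch_insert(sketch, key: bytes, amount: int = 1) -> None:
    """Add `amount` to the counter selected in every row.

    amount is 1 for an ordinary data message; for a flow removed from
    the hash table it would be that flow's recorded count value.
    """
    for row, counters in enumerate(sketch):
        counters[row_position(key, row, len(counters))] += amount

M, N = 4, 16
sketch = [[0] * N for _ in range(M)]   # all counters start at 0
sketch_insert(sketch, b"f")
sketch_insert(sketch, b"p")
```

After the two insertions every row has received exactly two increments, either in two different counters or, on a collision, in the same counter.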
2) sketch query operation
Because different key values may be mapped to the same position (a hash collision), the count value of a data stream is recorded separately through M mutually independent hash functions. When the count value of a data stream in the sketch is queried, the minimum of the count values at the positions to which the key value of the data stream is mapped in each row is returned as the count value of the data stream. This reduces the counting error caused by hash collisions and improves the accuracy of the measurement result.
For example, taking the sketch shown in fig. 6 as an example, assume that the 4 mutually independent hash functions in fig. 6 are h1(), h2(), h3(), and h4().
Assuming that the key value of the first data stream is t, the key value t is hashed to the corresponding position of each row through h1(), h2(), h3(), and h4(). Specifically, the corresponding position in the first row is N1" = h1(t) % 4; the corresponding position in the second row is N2" = h2(t) % 4; the corresponding position in the third row is N3" = h3(t) % 4; and the corresponding position in the fourth row is N4" = h4(t) % 4.
Assuming that N1" is 3, N2" is 2, N3" is 1, and N4" is 5, refer to fig. 6: the query value of the first row, at its 3rd column, is 20; the query value of the second row, at its 2nd column, is 18; the query value of the third row, at its 1st column, is 10; and the query value of the fourth row, at its 5th column, is 15. The minimum of the 4 count values is returned as the count value of the data stream, that is, the count value of the data stream is 10.
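The query operation can be sketched as taking the minimum over the counters selected in each row. As in any such illustration, the salted BLAKE2b row functions and the names `row_position` and `sketch_query` are assumptions; the counter values 20, 18, 10, and 15 are those of the example above, placed at whatever positions the stand-in hash functions select.

```python
import hashlib

def row_position(key: bytes, row: int, num_cols: int) -> int:
    """Position h_row(key) % N, using a per-row salted BLAKE2b as h_row."""
    digest = hashlib.blake2b(key, digest_size=8,
                             salt=row.to_bytes(2, "big")).digest()
    return int.from_bytes(digest, "big") % num_cols

def sketch_query(sketch, key: bytes) -> int:
    """Return the minimum counter among the positions the key maps to."""
    return min(counters[row_position(key, row, len(counters))]
               for row, counters in enumerate(sketch))

sketch = [[0] * 8 for _ in range(4)]
t = b"t"
# Place the per-row query values of the example at the key's positions.
for row, value in enumerate([20, 18, 10, 15]):
    sketch[row][row_position(t, row, 8)] = value

estimate = sketch_query(sketch, t)   # minimum of 20, 18, 10, 15
```

Because collisions can only inflate a counter, the minimum is the least inflated of the per-row values.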
It should be noted that, the information of the data flow stored in each slot in the hash table is only an example, for example, a slot may further store a data flow discovery time (i.e., a sequence value of a first data packet), a flag indicating whether the data flow is removed from the hash table, and correspondingly, the replacement condition may also be determined for new information stored in the slot, which is not specifically limited in this embodiment of the application.
Step 209: If a slot satisfying the replacement condition exists, replace the information of the second data flow in the hash table with the information of the first data flow.
Specifically, the information of the second data stream is removed from the slot in bucket H in which it is located, and the information of the first data stream is recorded into the vacated slot. The manner of recording the information of the first data stream into a slot of the hash table has been described above and is not repeated herein.
Optionally, the information of the replaced second data stream is recorded into the sketch, that is, an insertion operation is performed on the sketch based on the information of the second data stream removed from the hash table. Taking the sketch shown in fig. 4 as an example, the insertion is as follows: first, the key value and the count value of the second data stream recorded in the hash table (bucket H) are acquired; then the key value of the second data stream is mapped to the corresponding position of each row of the two-dimensional array through the M mutually independent hash functions of the sketch, and the count value of the second data stream is added to the count value at each corresponding position. Taking one row of the sketch as an example, assuming that the count value of the second data stream recorded in bucket H is 25 and the key value of the second data stream is mapped to the third column of the first row of the sketch, the count value at the third column of the first row is increased by 25.
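The replacement of step 209 together with the optional recording of the removed flow can be sketched as follows. The helper names, the salted BLAKE2b row functions, and the slot representation `[key, count, seq]` are assumptions; the new flow is recorded with count value 1, following the free-slot insertion described earlier.

```python
import hashlib

def row_position(key: bytes, row: int, num_cols: int) -> int:
    """Position h_row(key) % N, using a per-row salted BLAKE2b as h_row."""
    digest = hashlib.blake2b(key, digest_size=8,
                             salt=row.to_bytes(2, "big")).digest()
    return int.from_bytes(digest, "big") % num_cols

def replace_flow(bucket, idx, new_key: bytes, new_seq: int, sketch) -> None:
    """Move the flow in bucket[idx] to the sketch and record the new flow."""
    old_key, old_count, _ = bucket[idx]
    # Add the removed flow's whole count value in every row (not just 1).
    for row, counters in enumerate(sketch):
        counters[row_position(old_key, row, len(counters))] += old_count
    # Record the first data stream in the vacated slot.
    bucket[idx] = [new_key, 1, new_seq]

sketch = [[0] * 8 for _ in range(3)]
bucket = [[b"a", 40, 99], [b"second", 25, 80]]
replace_flow(bucket, 1, b"first", 100, sketch)
```

Each row of the sketch now carries the count value 25 of the removed flow at the position selected for its key value.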
In addition, if the first data stream is already recorded in the sketch, the first data stream needs to be removed from the sketch when it is recorded in the hash table, that is, a deletion operation is performed on the sketch. The deletion operation is as follows: first, the count value of the first data stream recorded in the sketch is determined, namely the minimum of the count values at the positions to which the key value of the first data stream is mapped in each row; then this count value is subtracted from the count value at the corresponding position of each row.
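A possible sketch of the deletion operation is shown below: the minimum over the key's positions is taken as the flow's count value and subtracted from each of those positions. The helper names and the salted BLAKE2b row functions are assumptions for illustration.

```python
import hashlib

def row_position(key: bytes, row: int, num_cols: int) -> int:
    """Position h_row(key) % N, using a per-row salted BLAKE2b as h_row."""
    digest = hashlib.blake2b(key, digest_size=8,
                             salt=row.to_bytes(2, "big")).digest()
    return int.from_bytes(digest, "big") % num_cols

def sketch_add(sketch, key: bytes, amount: int) -> None:
    for row, counters in enumerate(sketch):
        counters[row_position(key, row, len(counters))] += amount

def sketch_delete(sketch, key: bytes) -> int:
    """Subtract the key's queried count value from each of its positions."""
    positions = [row_position(key, row, len(counters))
                 for row, counters in enumerate(sketch)]
    # Count value of the flow in the sketch: minimum over its positions.
    estimate = min(counters[pos] for counters, pos in zip(sketch, positions))
    for counters, pos in zip(sketch, positions):
        counters[pos] -= estimate
    return estimate

sketch = [[0] * 8 for _ in range(3)]
sketch_add(sketch, b"first", 7)
sketch_add(sketch, b"other", 3)
removed = sketch_delete(sketch, b"first")
```

Since the subtracted value is the minimum over the selected counters, no counter becomes negative.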
Step 210: If no slot satisfies the replacement condition, record the information of the first data stream into the sketch.
That is, taking fig. 4 as an example, specifically, the key values of the first data stream are respectively mapped to corresponding positions of each row of the two-dimensional array according to M mutually independent hash functions in the sketch, and then, the count value of the corresponding position of each row is incremented by 1. For the specific execution steps, please refer to the above related descriptions, which are not described herein again.
Based on the above method flow, statistics are collected for every detected data message, and elephant flow queries can subsequently be performed based on the statistical result.
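The flow of steps 203 to 210 can be tied together in one compact sketch. Several simplifications are assumed and should be read as such: the replacement condition follows manner 1 only, CRC-32 and salted BLAKE2b stand in for the unspecified hash functions, the removal of a re-admitted flow from the sketch is omitted, and all names (`process_packet`, `sketch_add`, `max_idle`) are hypothetical.

```python
import hashlib
import zlib

def row_position(key: bytes, row: int, num_cols: int) -> int:
    digest = hashlib.blake2b(key, digest_size=8,
                             salt=row.to_bytes(2, "big")).digest()
    return int.from_bytes(digest, "big") % num_cols

def sketch_add(sketch, key: bytes, amount: int) -> None:
    for row, counters in enumerate(sketch):
        counters[row_position(key, row, len(counters))] += amount

def process_packet(table, sketch, key: bytes, seq: int, max_idle: int) -> str:
    bucket = table[zlib.crc32(key) % len(table)]            # step 203
    for slot in bucket:                                      # steps 204-205
        if slot is not None and slot[0] == key:
            slot[1] += 1
            slot[2] = seq
            return "updated"
    for i, slot in enumerate(bucket):                        # steps 206-207
        if slot is None:
            bucket[i] = [key, 1, seq]
            return "inserted"
    # Step 208 (manner 1): minimum-count flow, idle for longer than max_idle.
    idx = min(range(len(bucket)), key=lambda i: bucket[i][1])
    if seq - bucket[idx][2] > max_idle:
        sketch_add(sketch, bucket[idx][0], bucket[idx][1])   # step 209
        bucket[idx] = [key, 1, seq]
        return "replaced"
    sketch_add(sketch, key, 1)                               # step 210
    return "sketched"

table = [[None, None]]                 # B = 1, D = 2 keeps the walk-through small
sketch = [[0] * 8 for _ in range(3)]
arrivals = [b"a", b"b", b"a", b"c", b"a", b"a", b"c"]
results = [process_packet(table, sketch, key, seq, max_idle=3)
           for seq, key in enumerate(arrivals, start=1)]
```

In this walk-through, flow c first goes to the sketch because flow b has been idle for only 2 sequence values; when c arrives again at sequence value 7, b has been idle for 5 and is replaced.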
Several query modes are exemplified below. Example 1: each data stream stored in the hash table is queried based on a preset period to determine elephant streams. Example 2: given a data stream to be queried, it is queried whether the data stream is an elephant stream.
For example, when performing an elephant flow query, a threshold may be given, which is recorded as a third preset value, and when the count value of the data flow recorded in the hash table is greater than the third preset value, the data flow is determined to be an elephant flow.
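The periodic query of example 1 can be sketched as a scan over all slots of the hash table. The function name `find_elephants`, the slot representation `[key, count, seq]`, and the threshold value are assumptions for this illustration.

```python
def find_elephants(table, third_preset: int) -> dict:
    """Return {key: count} for every recorded flow whose count value
    is greater than the third preset value."""
    return {slot[0]: slot[1]
            for bucket in table
            for slot in bucket
            if slot is not None and slot[1] > third_preset}

# Two buckets of two slots each; None marks a free slot.
table = [[["a", 120, 99], None],
         [["b", 8, 97], ["c", 300, 100]]]
elephants = find_elephants(table, third_preset=100)
```

A scheduler would call such a function once per preset period; how the period is driven is outside the scope of this sketch.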
Next, taking example 2 as an example, a method for querying an elephant flow is described in detail, please refer to fig. 7, which is a schematic flow diagram corresponding to the method for querying an elephant flow.
Step 701: For a given data flow (the flow to be queried), perform a hash operation on the key value of the given data flow based on a preset hash function, and determine the corresponding bucket in the hash table according to the operation result.
Step 702: Determine whether the key value of the given data flow matches the key value recorded in any slot of the determined bucket; if so, perform step 703; otherwise, perform step 704.
Step 703: Determine whether the first count value of the given data flow stored in the matching slot is greater than a third preset value; if so, perform step 707; otherwise, perform step 706.
Step 704: Determine whether the bucket is full; if so, perform step 705; otherwise, perform step 706.
It can be understood that if there is no slot in the bucket that matches the key value of the given data flow, it indicates that the given data flow does not exist in the hash table, and in one case, the given data flow is stored in a sketch; another possible scenario is that neither the hash table nor the sketch records the data stream. Thus, if the data flow is not recorded in a bucket and the bucket is full, the given data flow may be stored in a sketch, and if the data flow is not recorded in a bucket and the bucket is not full, it indicates that the data flow is not recorded in both the hash table and the sketch, i.e., a new data flow, and the data flow may be considered as a streamlet.
Step 705: Query a second count value of the given data flow in the sketch, and determine whether the second count value is greater than the third preset value; if so, perform step 707; otherwise, perform step 706.
Step 706: query result 1 is output indicating that the given data stream is a streamlet.
Step 707: query result 2 is output indicating that the given data stream is an elephant stream.
It should be noted that the third preset value may be the same as or different from the second preset value, which is not limited in this embodiment of the application.
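Steps 701 to 707 can be sketched as one query function that returns True for an elephant flow (query result 2) and False for a streamlet (query result 1). CRC-32 and salted BLAKE2b again stand in for the unspecified hash functions, and all names are assumptions for illustration.

```python
import hashlib
import zlib

def row_position(key: bytes, row: int, num_cols: int) -> int:
    digest = hashlib.blake2b(key, digest_size=8,
                             salt=row.to_bytes(2, "big")).digest()
    return int.from_bytes(digest, "big") % num_cols

def sketch_query(sketch, key: bytes) -> int:
    return min(counters[row_position(key, row, len(counters))]
               for row, counters in enumerate(sketch))

def is_elephant(table, sketch, key: bytes, third_preset: int) -> bool:
    bucket = table[zlib.crc32(key) % len(table)]          # step 701
    for slot in bucket:                                    # step 702
        if slot is not None and slot[0] == key:
            return slot[1] > third_preset                  # step 703
    if any(slot is None for slot in bucket):               # step 704: not full
        return False                                       # step 706
    return sketch_query(sketch, key) > third_preset        # step 705

# One full bucket (B = 1), plus a sketch that holds 200 packets of flow c.
table = [[[b"a", 150, 9], [b"b", 20, 8]]]
sketch = [[0] * 8 for _ in range(3)]
for row, counters in enumerate(sketch):
    counters[row_position(b"c", row, 8)] += 200

answers = {key: is_elephant(table, sketch, key, third_preset=100)
           for key in (b"a", b"b", b"c")}
```

With a bucket that still has a free slot, a flow absent from the bucket is reported as a streamlet without consulting the sketch, as described for step 704.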
Based on the same inventive concept as the method embodiment, an embodiment of the present application further provides an apparatus for performing the method performed in the method embodiment, and related features may be referred to the method embodiment, which is not described herein again, and as shown in fig. 8, the apparatus includes a detecting unit 801 and a processing unit 802.
A detecting unit 801, configured to detect a first data packet;
a processing unit 802, configured to: when the detecting unit detects the first data packet and no space in the first data structure records information of a first data stream to which the first data packet belongs, determine whether the data streams that are recorded in the first data structure and have the same hash value as the first data stream include a second data stream whose information has not been updated for more than a preset time; and if so, replace the information of the second data stream with the information of the first data stream.
In a possible implementation manner, before replacing the information of the second data stream with the information of the first data stream, the processing unit 802 is further specifically configured to: information of the second data stream is recorded to a second data structure.
In a possible implementation manner, if there is no second data stream, the processing unit 802 is further specifically configured to: information of the first data stream is recorded to the second data structure.
In one possible implementation, the processing unit 802 is further configured to: determine the data flow with the fewest information updates among the data flows that are recorded in the first data structure and have the same hash value as the first data flow; determine whether the number of updates is smaller than a first preset value; if it is smaller than the first preset value, determine whether the update time interval of the data flow exceeds the preset time; and if so, determine that the data flow is the second data flow.
In one possible implementation, the processing unit 802 is further configured to: determine the data flow with the fewest information updates among the data flows that are recorded in the first data structure and have the same hash value as the first data flow; determine whether the number of updates is smaller than a first preset value; if it is smaller than the first preset value, determine whether the number of updates of the first data flow recorded in the second data structure is greater than a second preset value; if it is greater than the second preset value, determine whether the update time interval of the data flow exceeds the preset time; and if so, determine that the data flow is the second data flow.
In one possible embodiment, the first data structure is used to store information of the data stream, including but not limited to: key value, updating times and sequence value; the key value is an identifier of the data stream; updating times, namely the occurrence times of the data messages belonging to the data flow; a sequence value, which is the number of the latest updated data message of the data stream; the sequence value can be a serial number assigned to the data message according to the transmission sequence; or the sequence value is the current timestamp of the input of the information of the latest updated data packet to the first data structure.
In one possible implementation, the second data structure is a two-dimensional array, the two-dimensional array includes M hash functions in the vertical direction and N hash values of each hash function in the horizontal direction;
when recording the information of the first data stream to the second data structure, the processing unit 802 is specifically configured to: input the key value of the first data stream into each of the M hash functions to calculate the corresponding hash values, and add 1 to the number of times recorded at each hash value.
In a possible implementation manner, when recording the information of the second data stream to the second data structure, the processing unit 802 is specifically configured to: acquire the number of updates of the second data stream recorded in the first data structure; input the key value of the second data stream into each of the M hash functions to calculate the corresponding hash values; and add the number of updates of the second data stream to the number of times recorded at each hash value.
Similar to the above concept, as shown in fig. 9, the present application provides a device 900. The device 900 can be applied to any network device in the scenario shown in fig. 1, and performs the steps performed by the execution body in the method shown in fig. 2.
Device 900 may include a processor 901 and memory 902. Further, the apparatus may also include a communication interface 904, which may be a transceiver, or a network card. Further, the device 900 may also include a bus system 903.
The processor 901, the memory 902 and the communication interface 904 may be connected via the bus system 903. The memory 902 may store instructions, and the processor 901 may be configured to execute the instructions stored in the memory 902 to control the communication interface 904 to receive or transmit signals, so as to complete the steps performed by the execution body in the method shown in fig. 2.
The memory 902 may be integrated in the processor 901, or may be a physical entity different from the processor 901.
As an implementation manner, the function of the communication interface 904 may be realized by a transceiver circuit or a dedicated chip for transceiving. Processor 901 may be considered to be implemented by a dedicated processing chip, processing circuitry, a processor, or a general purpose chip.
As another implementation manner, a manner of using a computer may be considered to implement the first computing node or the function of the first computing node provided in the embodiment of the present application. I.e., program code that implements the functions of the processor 901 and the communication interface 904, is stored in the memory 902, and a general-purpose processor can implement the functions of the processor 901 and the communication interface 904 by executing the code in the memory.
For the concepts, explanations, and detailed descriptions related to the technical solutions provided in the present application and other steps related to the apparatus 900, reference may be made to the descriptions of the foregoing methods or other embodiments for these matters, which are not described herein again.
In an example of the present application, the device 900 may be configured to perform the steps performed by the execution body in the flow shown in fig. 2. For example, the communication interface 904 may detect a first data packet; the processor 901 may, in response to the first data packet detected by the communication interface 904 and when no space in the first data structure records information of a first data stream to which the first data packet belongs, determine whether the data streams that are recorded in the first data structure and have the same hash value as the first data stream include a second data stream whose information has not been updated for more than a preset time; and if so, replace the information of the second data stream with the information of the first data stream.
For the description of the processor 901 and the communication interface 904, reference may be made to the description of the flow illustrated in fig. 2, which is not described herein again.
Based on the above embodiments, the present application further provides a computer storage medium in which a software program is stored. When read and executed by one or more processors, the software program can implement the method provided by any one or more of the above embodiments. The computer storage medium may include any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disk.
Based on the above embodiments, the present application further provides a computer program product, where the computer program product includes computer instructions, and when the computer instructions are executed by a computer, the computer is caused to execute the method provided by any one or more of the above embodiments.
Based on the above embodiments, the present application further provides a chip, where the chip includes a processor, and is configured to implement the functions related to any one or more of the above embodiments, such as obtaining or processing information or messages related to the above methods. Optionally, the chip further comprises a memory for storing program instructions and data for execution by the processor. The chip may also contain chips and other discrete devices.
It should be understood that in the embodiments of the present application, the processor may be a Central Processing Unit (CPU), and the processor may also be other general purpose processors, Digital Signal Processors (DSPs), application-specific integrated circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, transistor logic devices, discrete hardware components, etc., or any combination thereof designed to implement or operate the described functions. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a digital signal processor and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a digital signal processor core, or any other similar configuration.
The memory may include both read-only memory and random access memory, and provides instructions and data to the processor. A portion of the memory may also include non-volatile random access memory.
The bus system may include a power bus, a control bus, a status signal bus, and the like, in addition to the data bus. For clarity of illustration, however, the various buses are all labeled as the bus system in the figures. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps of the method in combination with hardware of the processor. To avoid repetition, this is not described in detail here.
Optionally, the computer-executable instructions in the embodiments of the present application may also be referred to as application program codes, which are not specifically limited in the embodiments of the present application.
Those of ordinary skill in the art will understand that ordinal terms such as "first" and "second" mentioned in this application are used only for convenience of description; they are not intended to limit the scope of the embodiments of this application, nor to indicate a sequence. "And/or" describes an association between associated objects and covers three cases; for example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one" means one or more. "At least two" means two or more. "At least one of", "any of", or similar expressions refer to any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple. "Plurality" means two or more, and other quantifiers are to be understood similarly. Furthermore, elements that appear in the singular forms "a", "an", and "the" do not mean "one or only one" unless the context clearly dictates otherwise, but rather "one or more than one". For example, "a device" means one or more such devices.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, or Digital Subscriber Line (DSL)) or a wireless manner (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The steps of a method or algorithm described in the embodiments herein may be embodied directly in hardware, in a software unit executed by a processor, or in a combination of the two. The software unit may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. For example, a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although the present application has been described in conjunction with specific features and embodiments thereof, it will be evident that various modifications and combinations can be made thereto without departing from the spirit and scope of the application. Accordingly, the specification and figures are merely exemplary of the present application as defined in the appended claims and are intended to cover any and all modifications, variations, combinations, or equivalents within the scope of the present application. It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to include such modifications and variations.

Claims (14)

1. A method of data flow statistics, the method comprising:
when a first data message is detected and no space in a first data structure is available to record information of a first data stream to which the first data message belongs, determining whether, among the data streams that are recorded in the first data structure and have the same hash value as the first data stream, there exists a second data stream whose information has not been updated for more than a preset time;
and if so, replacing the information of the second data stream with the information of the first data stream.
2. The method of claim 1, wherein prior to said replacing information of the second data stream with information of the first data stream, further comprising:
recording information of the second data stream to a second data structure.
3. The method of claim 1 or 2, further comprising: if no such second data stream exists, recording information of the first data stream to a second data structure.
4. The method of any of claims 1-3, wherein the determining whether there is a second data stream whose information has not been updated for more than a preset time among the data streams recorded in the first data structure that have the same hash value as the first data stream further comprises:
determining, among the data streams that are recorded in the first data structure and have the same hash value as the first data stream, the data stream whose information has been updated the fewest times;
judging whether that number of updates is smaller than a first preset value;
if so, judging whether the time since the information of that data stream was last updated exceeds the preset time, and if so, determining that the data stream is the second data stream.
5. The method of any of claims 1-3, wherein the determining whether there is a second data stream whose information has not been updated for more than a preset time among the data streams recorded in the first data structure that have the same hash value as the first data stream further comprises:
determining, among the data streams that are recorded in the first data structure and have the same hash value as the first data stream, the data stream whose information has been updated the fewest times;
judging whether that number of updates is smaller than a first preset value;
if so, determining whether the number of updates of the first data stream recorded in a second data structure is greater than a second preset value;
if so, judging whether the time since the information of the data stream with the fewest updates was last updated exceeds the preset time, and if so, determining that the data stream with the fewest updates is the second data stream.
6. The method of claim 3, wherein the second data structure is a two-dimensional array of bits, the two-dimensional array of bits comprising M hash functions in a vertical direction and N hash values for each hash function in a horizontal direction, the recording information of the first data stream to the second data structure comprising:
inputting a key value of the first data stream into each of the M hash functions to calculate the corresponding hash values, and adding 1 to the count recorded for each of those hash values.
7. An apparatus, comprising a detection unit and a processing unit, wherein:
the detection unit is configured to detect a first data message;
when the detection unit detects the first data message and no space in a first data structure is available to record information of a first data stream to which the first data message belongs, the processing unit is configured to determine whether, among the data streams that are recorded in the first data structure and have the same hash value as the first data stream, there exists a second data stream whose information has not been updated for more than a preset time; and if so, replace the information of the second data stream with the information of the first data stream.
8. The device of claim 7, wherein prior to said replacing the information of the second data stream with the information of the first data stream, the processing unit is further specifically configured to: record information of the second data stream to a second data structure.
9. The device according to claim 7 or 8, wherein, if there is no second data stream, the processing unit is further specifically configured to: record information of the first data stream to a second data structure.
10. The device of any of claims 7-9, wherein the processing unit is further configured to:
determine, among the data streams that are recorded in the first data structure and have the same hash value as the first data stream, the data stream whose information has been updated the fewest times; judge whether that number of updates is smaller than a first preset value; if so, judge whether the time since the information of that data stream was last updated exceeds the preset time, and if so, determine that the data stream is the second data stream.
11. The device of any of claims 7-9, wherein the processing unit is further configured to:
determine, among the data streams that are recorded in the first data structure and have the same hash value as the first data stream, the data stream whose information has been updated the fewest times; judge whether that number of updates is smaller than a first preset value; if so, determine whether the number of updates of the first data stream recorded in a second data structure is greater than a second preset value; if so, judge whether the time since the information of the data stream with the fewest updates was last updated exceeds the preset time, and if so, determine that the data stream with the fewest updates is the second data stream.
12. The apparatus of claim 9, wherein the second data structure is a two-dimensional array of bits, the two-dimensional array of bits comprising M hash functions in a vertical direction and N hash values for each hash function in a horizontal direction;
when the information of the first data stream is recorded to the second data structure, the processing unit is specifically configured to:
input a key value of the first data stream into each of the M hash functions to calculate the corresponding hash values, and add 1 to the count recorded for each of those hash values.
13. An apparatus comprising one or more processors and one or more memories;
the one or more memories are coupled to the one or more processors and are configured to store computer program code comprising computer instructions which, when executed by the one or more processors, cause the apparatus to perform the method of any of claims 1-6.
14. A computer-readable storage medium, comprising a computer program which, when run on a flow orchestration device, causes the flow orchestration device to perform the method according to any one of claims 1-6.
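
By way of non-limiting illustration only, the flow of claims 1 to 3 and 6 may be sketched in Python as follows. This sketch is not part of the claims and is not the applicant's implementation. The class names `FlowEntry`, `CounterArray`, and `FlowTable`, the use of CRC32 as the hash, the salted per-row hash family, the choice of the first stale entry as the stream to be replaced, the recording of the replaced stream's update count as its "information", and the minimum-based `estimate` read-out are all assumptions made for the example. The additional threshold checks of claims 4 and 5 are not modeled.

```python
import zlib
from dataclasses import dataclass


@dataclass
class FlowEntry:
    key: str            # flow identifier, e.g. a 5-tuple rendered as text
    count: int          # number of times this entry has been updated
    last_update: float  # time of the most recent update


class CounterArray:
    """Second data structure: M hash functions (rows), N counters per row."""

    def __init__(self, m: int, n: int):
        self.m, self.n = m, n
        self.rows = [[0] * n for _ in range(m)]

    def _index(self, row: int, key: str) -> int:
        # Hypothetical hash family: CRC32 of the key salted with the row number.
        return zlib.crc32(f"{row}:{key}".encode()) % self.n

    def add(self, key: str, amount: int = 1) -> None:
        # Claim 6: increment the counter selected by each of the M hash functions.
        for row in range(self.m):
            self.rows[row][self._index(row, key)] += amount

    def estimate(self, key: str) -> int:
        # Assumed read-out: minimum over the M rows (never underestimates).
        return min(self.rows[row][self._index(row, key)] for row in range(self.m))


class FlowTable:
    """First data structure: hash buckets holding a fixed number of entries."""

    def __init__(self, buckets: int, slots: int, timeout: float, second: CounterArray):
        self.buckets = [[] for _ in range(buckets)]
        self.slots = slots
        self.timeout = timeout
        self.second = second

    def on_packet(self, key: str, now: float) -> str:
        bucket = self.buckets[zlib.crc32(key.encode()) % len(self.buckets)]
        for entry in bucket:
            if entry.key == key:          # flow already recorded: update it
                entry.count += 1
                entry.last_update = now
                return "updated"
        if len(bucket) < self.slots:      # free space: record the new flow
            bucket.append(FlowEntry(key, 1, now))
            return "inserted"
        # Claim 1: no space left, so look in the same bucket (same hash value)
        # for a flow not updated for more than the preset time.
        for i, entry in enumerate(bucket):
            if now - entry.last_update > self.timeout:
                # Claim 2: record the replaced flow in the second structure
                # (assumption: its update count is the information recorded).
                self.second.add(entry.key, entry.count)
                bucket[i] = FlowEntry(key, 1, now)
                return "replaced"
        # Claim 3: no stale flow exists, so record the new flow in the
        # second structure instead.
        self.second.add(key)
        return "overflow"
```

In this sketch, a table created with `FlowTable(buckets=1, slots=2, timeout=10.0, second=CounterArray(m=3, n=16))` places every flow in the same bucket, so a third distinct flow is counted only in the counter array until one of the two recorded flows has gone without an update for more than 10 time units, at which point the stale flow is replaced.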
CN202010636270.7A 2020-07-03 2020-07-03 Data flow statistical method and device Pending CN113965492A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010636270.7A CN113965492A (en) 2020-07-03 2020-07-03 Data flow statistical method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010636270.7A CN113965492A (en) 2020-07-03 2020-07-03 Data flow statistical method and device

Publications (1)

Publication Number Publication Date
CN113965492A true CN113965492A (en) 2022-01-21

Family

ID=79459154

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010636270.7A Pending CN113965492A (en) 2020-07-03 2020-07-03 Data flow statistical method and device

Country Status (1)

Country Link
CN (1) CN113965492A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115378850A (en) * 2022-08-31 2022-11-22 济南大学 Sketch-based encryption flow online analysis method and system
CN115378850B (en) * 2022-08-31 2023-10-31 济南大学 Encryption traffic online analysis method and system based on Sketch

Similar Documents

Publication Publication Date Title
JP7039685B2 (en) Traffic measurement methods, devices, and systems
JP4341413B2 (en) PACKET TRANSFER APPARATUS HAVING STATISTICS COLLECTION APPARATUS AND STATISTICS COLLECTION METHOD
US7787442B2 (en) Communication statistic information collection apparatus
US8005012B1 (en) Traffic analysis of data flows
EP2337266A2 (en) Detecting and classifying anomalies in communication networks
KR100997182B1 (en) Flow information restricting apparatus and method
EP3905622A1 (en) Botnet detection method and system, and storage medium
US11637787B2 (en) Preventing duplication of packets in a network
US11706114B2 (en) Network flow measurement method, network measurement device, and control plane device
US11316804B2 (en) Forwarding entry update method and apparatus in a memory
US7602789B2 (en) Low overhead method to detect new connection rate for network traffic
EP4075749A1 (en) Detection method and detection device for heavy flow data stream
EP2530873B1 (en) Method and apparatus for streaming netflow data analysis
CN112119613A (en) Forwarding element data plane with flow size detector
CN112468365A (en) Data quality detection method, system and medium for network mirror flow
CN114205253A (en) Active large flow accurate detection framework and method based on small flow filtering
CN112688837A (en) Network measurement method and device based on time sliding window
CN113965492A (en) Data flow statistical method and device
CN112261019B (en) Distributed denial of service attack detection method, device and storage medium
Turkovic et al. Detecting heavy hitters in the data-plane
CN111200542B (en) Network flow management method and system based on deterministic replacement strategy
CN114710444B (en) Data center flow statistics method and system based on tower type abstract and evictable flow table
CN115580543A (en) Network system activity evaluation method based on Hash counting
CN109361658A (en) Abnormal flow information storage means, device and electronic equipment based on industry control industry
CN113472670B (en) Method for computer network, network device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination