WO2021190111A1 - Detection method and detection device for large-volume data streams (大流量数据流的检测方法以及检测装置) - Google Patents


Info

Publication number
WO2021190111A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
data stream
dimensional
bucket
flow
Application number
PCT/CN2021/072863
Other languages
English (en)
French (fr)
Inventor
张喜
潘璐伽
唐璐
李柏晴
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.
Priority to EP21775024.9A (EP4075749A4)
Publication of WO2021190111A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/026 Capturing of monitoring data using flow identification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/21 Flow control; Congestion control using leaky-bucket
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/215 Flow control; Congestion control using token-bucket
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/12 Network monitoring probes

Definitions

  • This application relates to the field of information technology and, more specifically, to a detection method and a detection device for large-volume data streams.
  • At present, large-volume data flows are mostly detected by deploying multiple two-dimensional tables on one data collection device in the network: the tables store information about all data flows obtained by that device, which then performs abnormal data flow detection.
  • However, a data flow in the network may be distributed across multiple data collection devices, and detecting abnormal data flows on a single data collection device alone may miss some large-volume data flows. For example, the portion of a data flow seen by any single data collection device may be small, while the portions seen by many data collection devices together form a network-wide large-volume data flow. How to detect network-wide large-volume data flows has therefore become a problem that urgently needs to be solved.
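The gap described above can be made concrete with a small worked example. The per-device byte counts and the threshold below are hypothetical numbers chosen for illustration, and exact counts stand in for the two-dimensional structures the method actually uses: a flow split across three devices stays under the threshold at every device but exceeds it once the counts are combined.

```python
# Hypothetical byte counts of one flow as seen by three data collection devices,
# and a hypothetical heavy-hitter threshold.
THRESHOLD = 1000
per_device = {"device_a": 400, "device_b": 350, "device_c": 300}

# Single-point detection: each device compares only its own share to the threshold.
local_alerts = [dev for dev, size in per_device.items() if size > THRESHOLD]

# Network-wide detection: the control device compares the combined size.
network_total = sum(per_device.values())
network_alert = network_total > THRESHOLD

print(local_alerts, network_total, network_alert)  # [] 1050 True
```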
  • This application provides a detection method and a detection device for large-volume data flows that can detect large-volume data flows at the network-wide level, thereby improving the accuracy of large-volume data flow detection.
  • A method for detecting a large-volume data stream is provided, including: a control device obtains the two-dimensional data structures in a plurality of data collection devices, where each two-dimensional data structure is used to store information about the data streams in the network acquired by the corresponding data collection device; the control device merges the two-dimensional data structures in the plurality of data collection devices to obtain a merged two-dimensional data structure; and the control device detects a large-volume data flow according to the merged two-dimensional data structure, where the large-volume data flow is a network-wide large-volume data flow in the network.
  • Any one of the two-dimensional data structures in the multiple data collection devices can be used to store information about the network data streams acquired by one of the multiple data collection devices; for example, the data stream information may include the key value of each data stream obtained from the network, the size of the data stream, and so on.
  • The two-dimensional data structure in any one of the multiple data collection devices stores information about the network data flows acquired by that data collection device, so it can be used to detect large-volume data flows at the single-device level, that is, large-volume data flows passing through a single data collection device. The merged two-dimensional data structure obtained by the merging process can be used to detect network-wide large-volume data flows in the network: a data flow that no single data collection device detects as a large-volume data flow may be detected as one once its portions in many data collection devices are combined, and such a flow is a network-wide large-volume data flow.
  • A high-traffic data flow is the general term for heavy hitters and heavy changers.
  • A heavy hitter is a network data flow whose size, measured in units such as packets, bytes, or connections, exceeds expectations, that is, exceeds a certain threshold;
  • a heavy changer is a network data flow whose size, measured in units such as packets, bytes, or connections, changes drastically within a short period of time.
  • Each data collection device, that is, each single-point device, can record information about the data streams it acquires; however, a network-wide data stream, that is, a data stream that flows through multiple single-point devices, cannot be detected this way. In the embodiments of this application, the merged data structure obtained by merging the data structures in multiple data collection devices can be used to detect network-wide data streams, thereby avoiding missed detection of network-wide large-volume data streams.
  • The control device can detect high-traffic data flows in the network based on a given key value and the merged two-dimensional data structure, that is, estimate the total traffic size, or the amount of change in size, of the data stream with the given key value.
  • The control device may also poll the data streams in turn according to the key values recorded in the buckets of the merged two-dimensional data structure, and query the total traffic size or the amount of change of each data stream.
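Both query modes can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the merged structure is a d × w grid of (V, K, C) tuples; each row maps a key to a bucket with a row-seeded hash that all devices and the control device share (the text does not name a hash, so BLAKE2b here is our choice); and, as in MV-Sketch, a key's estimate is the minimum of its per-row estimates (V + C)/2 or (V − C)/2. `bucket_index`, `query`, and `poll` are hypothetical names.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

Bucket = Tuple[float, Optional[str], float]  # (V, K, C)


def bucket_index(key: str, row: int, width: int) -> int:
    """Hypothetical row-seeded hash; devices and controller must share it."""
    digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % width


def query(sketch: List[List[Bucket]], key: str) -> float:
    """Estimate one flow's size: a per-row estimate, then the minimum over rows."""
    width = len(sketch[0])
    estimates = []
    for row, buckets in enumerate(sketch):
        v, k, c = buckets[bucket_index(key, row, width)]
        estimates.append((v + c) / 2 if k == key else (v - c) / 2)
    return min(estimates)


def poll(sketch: List[List[Bucket]]) -> Dict[str, float]:
    """Collect every main-flow key recorded in the buckets and query each one."""
    keys = {k for row in sketch for (_, k, _) in row if k is not None}
    return {k: query(sketch, k) for k in keys}


# Hypothetical merged structure, d = 2 rows and w = 1 bucket per row, holding
# flow f1 (1200 bytes) and flow f2 (300 bytes) in the same bucket of each row.
merged = [[(1500, "f1", 900)], [(1500, "f1", 900)]]
print(query(merged, "f1"), query(merged, "f2"))  # 1200.0 300.0
print(poll(merged))  # {'f1': 1200.0}
```

Polling only reaches keys that some bucket records as its main flow, so f2 is not listed here even though a direct query by its key still returns an estimate.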
  • In summary, the control device acquires the two-dimensional data structures in multiple data collection devices, where each two-dimensional data structure stores information about the network data flows acquired by the corresponding data collection device; the control device merges these two-dimensional data structures to obtain the merged two-dimensional data structure and detects large-volume data flows, that is, network-wide large-volume data flows, according to the merged structure. The detection method provided by the embodiments of this application therefore avoids missed detection of network-wide large-volume data flows, can detect large-volume data flows at the network-wide level, and improves the accuracy of large-volume data flow detection.
  • The two-dimensional data structure is a data structure composed of multiple buckets, and the control device merging the two-dimensional data structures in the multiple data collection devices to obtain the merged two-dimensional data structure includes:
  • the control device merges the buckets at the same position in the two-dimensional data structures in the multiple data collection devices to obtain the merged two-dimensional data structure.
  • The two-dimensional data structure may include d rows, each containing w buckets. The multiple two-dimensional data structures may have the same structure, that is, every one of them has the same number of rows and the same number of buckets per row.
  • Merging multiple two-dimensional data structures can mean merging the buckets at the same position in the multiple data structures, where the same position can refer to the bucket in the i-th row and j-th column of each two-dimensional data structure.
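Because every structure has the same d × w shape, merging reduces to visiting each (i, j) position and combining the N buckets found there. A minimal Python sketch, assuming each structure is a list of d rows of w buckets; `merge_sketches` is a hypothetical name, and the per-bucket rule is passed in as a function because that rule is not a plain sum of feature values.

```python
from typing import Callable, List, Sequence, TypeVar

B = TypeVar("B")  # bucket type stored by the data collection devices
M = TypeVar("M")  # bucket type produced by the per-bucket merge rule


def merge_sketches(sketches: Sequence[List[List[B]]],
                   merge_bucket: Callable[[List[B]], M]) -> List[List[M]]:
    """Merge N same-shaped d x w structures position by position."""
    if not sketches:
        raise ValueError("need at least one structure to merge")
    d, w = len(sketches[0]), len(sketches[0][0])
    for s in sketches:
        # All structures must have the same d rows and the same w buckets per row.
        if len(s) != d or any(len(row) != w for row in s):
            raise ValueError("all structures must have the same d x w shape")
    # Bucket (i, j) of the result depends only on bucket (i, j) of each input.
    return [[merge_bucket([s[i][j] for s in sketches]) for j in range(w)]
            for i in range(d)]


# Two hypothetical 1 x 2 structures with (V, K, C) buckets.
s1 = [[(5, "a", 3), (2, "b", 2)]]
s2 = [[(4, "c", 4), (6, "b", 2)]]
grouped = merge_sketches([s1, s2], merge_bucket=lambda group: group)
print(grouped[0][1])  # [(2, 'b', 2), (6, 'b', 2)]
```

Passing `lambda group: group` as the rule just returns the aligned groups, which makes the position-wise pairing visible; a real rule would turn each group into one merged bucket.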
  • Any bucket in the two-dimensional data structures in the multiple data collection devices includes the sum of the data traffic in the current bucket, the key value of the main data stream in the current bucket, and the counter value of the main data stream;
  • any bucket in the merged two-dimensional data structure includes the updated sum of data traffic, the updated key value of the main data stream, and the updated counter value of the main data stream;
  • the buckets include a bucket at a first position, and the control device merging the buckets at the same position in the two-dimensional data structures in the multiple data collection devices to obtain the merged two-dimensional data structure includes:
  • obtaining the updated data in the bucket at the first position.
  • Merging the two-dimensional data structures in multiple data collection devices is not a simple superposition of the feature values in the buckets of the structures: to update the data flow information stored in a given bucket, the data flows stored in the buckets at the same position in the other two-dimensional data structures must be compared, and a reasonable estimate is then made to determine the main data flow in each bucket after merging.
  • This way of merging the two-dimensional data structures in multiple data collection devices keeps memory requirements low and saves resources.
  • The two-dimensional data structures in the multiple data collection devices are N two-dimensional data structures, and the key values of the main data streams in the buckets at the first position of the N two-dimensional data structures are X key values, where X is a positive integer less than or equal to N;
  • obtaining the updated key value of the main data stream in the bucket at the first position by comparing the total traffic sizes of the main data streams in the buckets at the first position of the two-dimensional data structures in the multiple data collection devices includes:
  • determining, from the buckets at the first position in the N two-dimensional data structures, an estimate of the total traffic size of the data stream corresponding to each of the X key values;
  • where, for a first key value, which is any one of the X key values, the estimated traffic size of that data stream in the bucket at the first position of the i-th of the N two-dimensional data structures is obtained according to the following formula:

$$S_i(x)=\begin{cases}\dfrac{V_i+C_i}{2}, & \text{if } K_i = x\\[4pt]\dfrac{V_i-C_i}{2}, & \text{if } K_i \neq x\end{cases}$$

  • $S_i(x)$ denotes the estimated traffic size of the data stream corresponding to the first key value; $x$ is the first key value; $V_i$ is the sum of the traffic of all data flows in the bucket at the first position of the i-th two-dimensional data structure; $C_i$ is the counter value of the main data flow in that bucket; and $K_i$ is the key value of the main data flow in that bucket.
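The per-bucket merge can be sketched in Python. This is a minimal sketch under stated assumptions, not the patent's definitive procedure: each input bucket contributes $S_i(x)$ as defined above, and these are summed over the N buckets to give each candidate key's estimated total; the candidate with the largest total becomes the new main key (our reading of "comparing the total traffic sizes"); the merged V is the sum of the $V_i$; and the merged counter is set to 2S − V so that (V + C)/2 returns the chosen key's merged estimate. The last two rules are our assumptions and are not stated in the text. `Bucket`, `row_estimate`, and `merge_bucket` are hypothetical names.

```python
from typing import Hashable, List, NamedTuple, Optional


class Bucket(NamedTuple):
    v: float                 # V: total traffic of all flows mapped to this bucket
    k: Optional[Hashable]    # K: key of the main (majority) flow, None if empty
    c: float                 # C: counter of the main flow


def row_estimate(b: Bucket, x: Hashable) -> float:
    """S_i(x): estimated size of flow x taken from a single bucket."""
    return (b.v + b.c) / 2 if b.k == x else (b.v - b.c) / 2


def merge_bucket(group: List[Bucket]) -> Bucket:
    """Merge the N buckets found at the same (i, j) position of N structures."""
    v_total = sum(b.v for b in group)                 # assumption: merged V = sum of V_i
    # The X <= N distinct main-flow keys, kept in first-seen order for determinism.
    candidates = list(dict.fromkeys(b.k for b in group if b.k is not None))
    if not candidates:
        return Bucket(v_total, None, 0)
    # Estimated total of each candidate across all N buckets.
    totals = {x: sum(row_estimate(b, x) for b in group) for x in candidates}
    best = max(totals, key=totals.get)                # assumption: keep the largest total
    # Assumption: choose C so that (V + C) / 2 equals best's merged estimate.
    return Bucket(v_total, best, 2 * totals[best] - v_total)


group = [Bucket(10, "a", 6), Bucket(8, "b", 2), Bucket(12, "a", 4)]
print(merge_bucket(group))  # Bucket(v=30, k='a', c=8.0)
```

In the example, key a scores 8 + 3 + 8 = 19 and key b scores 2 + 5 + 4 = 11, so a is kept, and the merged counter 2 × 19 − 30 = 8 reproduces a's estimate of 19.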
  • The control device acquiring the two-dimensional data structures in the multiple data collection devices includes:
  • the control device acquires the two-dimensional data structures in the multiple data collection devices at the end of each time period;
  • and the control device detecting the high-traffic data stream according to the merged two-dimensional data structure includes:
  • if the control device detects, according to the merged two-dimensional data structure, that the change in a first data stream between any two time periods is greater than a first threshold, determining that the first data stream is a high-traffic data stream.
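The heavy-changer check can be sketched as a comparison of two per-period estimate tables. A minimal sketch, assuming the control device has already polled the merged structure collected at the end of each of two periods into a key-to-estimate dictionary, that a flow missing from one period counts as 0 there, and that "change value" means the absolute difference (the text does not say). `heavy_changers` and the sample numbers are hypothetical.

```python
from typing import Dict, Set


def heavy_changers(prev: Dict[str, float], curr: Dict[str, float],
                   threshold: float) -> Set[str]:
    """Flows whose estimated size changed by more than `threshold` between periods."""
    keys = prev.keys() | curr.keys()   # a flow absent from one period counts as 0 there
    return {x for x in keys if abs(curr.get(x, 0.0) - prev.get(x, 0.0)) > threshold}


# Hypothetical estimates polled from the merged structures of periods t-1 and t.
prev_period = {"f1": 100.0, "f2": 500.0}
curr_period = {"f1": 900.0, "f2": 520.0, "f3": 50.0}
print(heavy_changers(prev_period, curr_period, threshold=300.0))  # {'f1'}
```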
  • The control device can periodically acquire the two-dimensional data structures in multiple data collection devices; that is, each data collection device can periodically send the data structure recording its data flow information to the control device, and the period can be a preset time interval.
  • Alternatively, the control device detecting the high-traffic data flow according to the merged two-dimensional data structure includes:
  • if the control device detects, according to the merged two-dimensional data structure, that the total traffic size of the first data stream is greater than a second threshold, determining that the first data stream is a high-traffic data stream.
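The heavy-hitter check compares each flow's estimated network-wide total against the second threshold. A minimal sketch, assuming the estimates have already been obtained by polling the merged structure for one period; `heavy_hitters` and the sample values are hypothetical.

```python
from typing import Dict, Set


def heavy_hitters(estimates: Dict[str, float], threshold: float) -> Set[str]:
    """Flows whose estimated network-wide total exceeds `threshold` (the second threshold)."""
    return {x for x, size in estimates.items() if size > threshold}


# Hypothetical estimates polled from the merged structure of one period.
print(heavy_hitters({"f1": 1200.0, "f2": 300.0}, threshold=1000.0))  # {'f1'}
```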
  • The two-dimensional data structure includes the majority-vote data structure MV-Sketch.
  • For example, the two-dimensional data structure can be an MV-Sketch, in which each bucket can include three feature values: the total size of the data streams stored in the current bucket, $V_{i,j}$; the key value of the majority data stream in the current bucket, $K_{i,j}$, which identifies the majority data stream, that is, the data stream whose size exceeds 50% of the total traffic mapped to the current bucket; and the counter value of the majority data stream in the current bucket, $C_{i,j}$.
  • A device for detecting a large-volume data stream is provided, including: an acquiring unit configured to acquire the two-dimensional data structures in a plurality of data collection devices, where each two-dimensional data structure is used to store information about the data streams in the network acquired by the corresponding data collection device; and a processing unit configured to merge the two-dimensional data structures in the multiple data collection devices to obtain a merged two-dimensional data structure, and to detect a large-volume data flow according to the merged two-dimensional data structure, where the large-volume data flow is a network-wide large-volume data flow in the network.
  • Any one of the two-dimensional data structures in the multiple data collection devices can be used to store information about the network data streams acquired by one of the multiple data collection devices; for example, the data stream information may include the key value of each data stream obtained from the network, the size of the data stream, and so on.
  • The two-dimensional data structure in any one of the multiple data collection devices stores information about the network data flows acquired by that data collection device, so it can be used to detect large-volume data flows at the single-device level, that is, large-volume data flows passing through a single data collection device. The merged two-dimensional data structure obtained by the merging process can be used to detect network-wide large-volume data flows in the network: a data flow that no single data collection device detects as a large-volume data flow may be detected as one once its portions in many data collection devices are combined, and such a flow is a network-wide large-volume data flow.
  • A high-traffic data flow is the general term for heavy hitters and heavy changers.
  • A heavy hitter is a network data flow whose size, measured in units such as packets, bytes, or connections, exceeds expectations, that is, exceeds a certain threshold;
  • a heavy changer is a network data flow whose size, measured in units such as packets, bytes, or connections, changes drastically within a short period of time.
  • Each data collection device, that is, each single-point device, can record information about the data streams it acquires; however, a network-wide data stream, that is, a data stream that flows through multiple single-point devices, cannot be detected this way. In the embodiments of this application, the merged data structure obtained by merging the data structures in multiple data collection devices can be used to detect network-wide data streams, thereby avoiding missed detection of network-wide large-volume data streams.
  • The detection device may detect a high-traffic data stream based on a given key value and the merged two-dimensional data structure, that is, estimate the total traffic size, or the amount of change, of the data stream with the given key value.
  • The detection device may also poll the data streams in turn according to the key values recorded in the buckets of the merged two-dimensional data structure, and query the total traffic size or the amount of change of each data stream.
  • The detection device acquires the two-dimensional data structures in multiple data collection devices, where each two-dimensional data structure stores information about the network data flows acquired by the corresponding data collection device;
  • the detection device merges the two-dimensional data structures in the multiple data collection devices to obtain the merged two-dimensional data structure;
  • and the detection device detects large-volume data flows according to the merged two-dimensional data structure.
  • A large-volume data flow here is a network-wide large-volume data flow in the network; the detection method provided by the embodiments of this application therefore avoids missed detection of network-wide large-volume data flows, can detect large-volume data flows at the network-wide level, and improves the accuracy of large-volume data flow detection.
  • The two-dimensional data structure is a data structure composed of multiple buckets, and
  • the processing unit is specifically configured to:
  • merge the buckets at the same position in the two-dimensional data structures in the multiple data collection devices to obtain the merged two-dimensional data structure.
  • The two-dimensional data structure may include d rows, each containing w buckets. The multiple two-dimensional data structures may have the same structure, that is, every one of them has the same number of rows and the same number of buckets per row.
  • Merging multiple two-dimensional data structures can mean merging the buckets at the same position in the multiple data structures, where the same position can refer to the bucket in the i-th row and j-th column of each two-dimensional data structure.
  • Any bucket in the two-dimensional data structures in the multiple data collection devices includes the sum of the data traffic in the current bucket, the key value of the main data stream in the current bucket, and the counter value of the main data stream;
  • any bucket in the merged two-dimensional data structure includes the updated sum of data traffic, the updated key value of the main data stream, and the updated counter value of the main data stream;
  • the buckets include a bucket at a first position, and the processing unit is specifically configured to:
  • obtain the updated key value of the main data stream in the bucket at the first position.
  • Merging the two-dimensional data structures in multiple data collection devices is not a simple superposition of the feature values in the buckets of the structures: to update the data flow information stored in a given bucket, the data flows stored in the buckets at the same position in the other two-dimensional data structures must be compared, and a reasonable estimate is then made to determine the main data flow in each bucket after merging.
  • This way of merging multiple two-dimensional data structures keeps the memory requirements of the detection device low and saves resources.
  • The two-dimensional data structures in the multiple data collection devices are N two-dimensional data structures, and the key values of the main data streams in the buckets at the first position of the N two-dimensional data structures are X key values, where X is a positive integer less than or equal to N;
  • the processing unit is specifically configured to:
  • determine, from the buckets at the first position in the N two-dimensional data structures, an estimate of the traffic size of the data stream corresponding to each of the X key values;
  • where, for a first key value, which is any one of the X key values, the estimated traffic size of that data stream in the bucket at the first position of the i-th of the N two-dimensional data structures is obtained according to the following formula:

$$S_i(x)=\begin{cases}\dfrac{V_i+C_i}{2}, & \text{if } K_i = x\\[4pt]\dfrac{V_i-C_i}{2}, & \text{if } K_i \neq x\end{cases}$$

  • $S_i(x)$ denotes the estimated traffic size of the data stream corresponding to the first key value; $x$ is the first key value; $V_i$ is the sum of the traffic of all data flows in the bucket at the first position of the i-th two-dimensional data structure; $C_i$ is the counter value of the main data flow in that bucket; and $K_i$ is the key value of the main data flow in that bucket.
  • The acquiring unit is specifically configured to acquire the two-dimensional data structures in the multiple data collection devices at the end of each time period;
  • the processing unit is specifically configured to:
  • if it detects, according to the merged two-dimensional data structure, that the change in a first data stream between any two time periods is greater than a first threshold, determine that the first data stream is a high-traffic data stream.
  • The detection device can periodically acquire the two-dimensional data structures in multiple data collection devices; that is, the multiple data collection devices can periodically send the data structures recording their data flow information to the detection device, and the period can be a preset time interval.
  • The processing unit is specifically configured to: if it detects, according to the merged two-dimensional data structure, that the total traffic size of the first data stream is greater than a second threshold, determine that the first data stream is a large-volume data flow.
  • The two-dimensional data structure includes the majority-vote data structure MV-Sketch.
  • For example, the two-dimensional data structure can be an MV-Sketch, in which each bucket can include three feature values: the total size of the data streams stored in the current bucket, $V_{i,j}$; the key value of the majority data stream in the current bucket, $K_{i,j}$, which identifies the majority data stream, that is, the data stream whose size exceeds 50% of the total traffic mapped to the current bucket; and the counter value of the majority data stream in the current bucket, $C_{i,j}$.
  • A detection device for a large-volume data stream is also provided.
  • The detection device includes: a memory configured to store a program; and a processor configured to execute the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to perform the detection method in the first aspect or any one of the implementations of the first aspect.
  • A computer storage medium is also provided; it stores program code, and the program code includes instructions for performing the steps of the detection method in the first aspect or any one of the implementations of the first aspect.
  • The storage medium may specifically be a non-volatile storage medium.
  • According to a fifth aspect, a chip is provided, which includes a processor and a data interface.
  • The processor reads, through the data interface, instructions stored in a memory, and performs the detection method in the first aspect or any one of the implementations of the first aspect.
  • The chip may further include a memory storing instructions, and the processor is configured to execute the instructions stored in the memory;
  • when the instructions are executed, the processor is configured to perform the detection method in the first aspect or any one of the implementations of the first aspect.
  • The chip may specifically be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
  • FIG. 1 is a schematic diagram of a two-dimensional data structure according to an embodiment of this application.
  • FIG. 2 is a schematic diagram of an application scenario according to an embodiment of this application.
  • FIG. 3 is a schematic diagram of a system architecture according to an embodiment of this application.
  • FIG. 4 is a schematic flowchart of a method for detecting a large-volume data stream according to an embodiment of this application.
  • FIG. 5 is a schematic flowchart of a method for detecting a large-volume data stream according to an embodiment of this application.
  • FIG. 6 is a schematic diagram of merging multiple two-dimensional data structures according to an embodiment of this application.
  • FIG. 7 is a schematic diagram of detection results on a public network traffic data set according to an embodiment of this application.
  • FIG. 8 is a schematic diagram of detection results on a public network traffic data set according to an embodiment of this application.
  • FIG. 9 is a schematic block diagram of a detection device according to an embodiment of this application.
  • FIG. 10 is a schematic block diagram of a detection device according to an embodiment of this application.
  • A high-traffic data flow is the general term for heavy hitters and heavy changers.
  • A heavy hitter is a network data flow whose number of packets, bytes, connections, or similar units exceeds a threshold;
  • a heavy changer is a network data flow whose number of packets, bytes, connections, or similar units changes very drastically within a short period of time.
  • A sketch usually refers to a two-dimensional table data structure consisting of several rows, each of which consists of several buckets.
  • For example, each bucket can include three elements: the total traffic in the current bucket, the key value of the majority flow in the current bucket, and the counter of the majority flow in the current bucket, where the majority flow is a data flow whose size exceeds 50% of the total traffic mapped to the current bucket.
  • FIG. 1 is a schematic diagram of MV-Sketch. As shown in FIG. 1, MV-Sketch is a d × w two-dimensional data structure: it includes d rows, each row includes w buckets, and each bucket includes three elements.
  • Take the bucket B(i,j) in the i-th row and j-th column as an example:
  • B(i,j) includes $V_{i,j}$, $K_{i,j}$, and $C_{i,j}$;
  • $V_{i,j}$ is the total traffic of all data streams in the current bucket;
  • $K_{i,j}$ is the key value of the majority data stream in the current bucket, that is, the identifier of the majority data stream, where the majority data stream is a data stream whose size exceeds 50% of the total traffic mapped to the current bucket;
  • the counter $C_{i,j}$ is used to count the majority data flow in the bucket.
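The text lists the three bucket fields but not how a data collection device fills them. The sketch below assumes the weighted majority-vote (Boyer-Moore style) update that the "majority vote" name suggests: V always grows, C goes up for the current candidate K and down for any other flow, and a flow that drives C below zero replaces K. The update rule, the BLAKE2b row hash, and the class and method names are our assumptions for illustration, not the patent's specification.

```python
import hashlib
from typing import List


class MVSketch:
    """d x w grid of buckets; each bucket is [V, K, C] as in FIG. 1."""

    def __init__(self, depth: int, width: int) -> None:
        self.depth, self.width = depth, width
        self.buckets: List[List[list]] = [
            [[0, None, 0] for _ in range(width)] for _ in range(depth)]

    def _index(self, key: str, row: int) -> int:
        # Hypothetical hash choice: BLAKE2b seeded with the row number.
        digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def update(self, key: str, size: int) -> None:
        """Record `size` units of flow `key` in one bucket per row."""
        for i in range(self.depth):
            b = self.buckets[i][self._index(key, i)]
            b[0] += size              # V: every flow adds to the bucket total
            if b[1] == key:
                b[2] += size          # a vote for the current majority candidate
            else:
                b[2] -= size          # a vote against it
                if b[2] < 0:          # candidate lost its lead: the new flow takes over
                    b[1], b[2] = key, -b[2]

    def query(self, key: str) -> float:
        """Per-row estimate (V + C)/2 or (V - C)/2, then the minimum over rows."""
        est = []
        for i in range(self.depth):
            v, k, c = self.buckets[i][self._index(key, i)]
            est.append((v + c) / 2 if k == key else (v - c) / 2)
        return min(est)


sk = MVSketch(depth=2, width=16)
for _ in range(5):
    sk.update("10.0.0.1->10.0.0.2", 100)
sk.update("10.0.0.3->10.0.0.4", 50)
print(sk.query("10.0.0.1->10.0.0.2"), sk.query("10.0.0.3->10.0.0.4"))  # 500.0 50.0
```

With only two flows the estimates are exact whether or not the flows share a bucket: if they collide, the bucket holds V = 550 and C = 450 with the first flow as K, which gives 500 and 50.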
  • Alternatively, each bucket can include four elements: the total traffic in the current bucket, the maximum error of the current bucket's traffic estimate, an auxiliary queue (which records the traffic estimates of some objects in the bucket), and the length of the auxiliary queue.
  • Data flows in the network are growing explosively, which leads to frequent anomalies in the network. In detecting abnormal network traffic, two types of abnormal traffic deserve particular attention: data streams of huge size, also called heavy hitters, and data streams whose size changes greatly within a certain period of time, also called heavy changers. The two are collectively referred to as heavy flows.
  • Most current large-volume data flow detection algorithms deploy multiple two-dimensional tables at one device node in the network, use the tables to store information about all data flows, and then perform abnormal data flow detection.
  • However, data flows in the network may be distributed across multiple devices, and detecting abnormal data flows on a single device can only find large-volume data flows on that device; a data flow that is small on each single device but large when combined across many devices may be missed.
  • To address this, this application proposes a method for detecting large-volume data flows in which a control device acquires the two-dimensional data structures in multiple data collection devices, where each two-dimensional data structure stores information about the network data flows acquired by the corresponding data collection device.
  • The control device merges the two-dimensional data structures of the multiple data collection devices to obtain a merged two-dimensional data structure,
  • and detects large-volume data flows based on the merged two-dimensional data structure.
  • A large-volume data flow here is a network-wide large-volume data flow in the network; the detection method provided by the embodiments of this application thus avoids missed detection of network-wide large-volume data flows.
  • Fig. 2 is a schematic diagram of an application scenario according to an embodiment of the present application.
  • The system 100 may include a control device 101 and multiple data collection devices 102 (for example, single-node devices through which data streams pass). At the end of a preset time period, the control device 101 may be used to obtain the two-dimensional data structures of the data collection devices 102 in the system.
  • The data collection device 102 can be any device that acquires data streams, such as a gateway or another device that collects network data streams. It can be any known computing device, such as a server or a desktop computer.
  • the data collection device 102 may include a memory and a processor.
  • the memory can be used to store program codes, such as an operating system and other application programs.
  • the processor can be used to call the program code stored in the memory to implement the corresponding function of the node.
  • the processor and memory included in the node can be implemented by a chip, and there is no specific limitation here.
  • the control device 101 may periodically obtain the two-dimensional data structure used to record the data flow in the data collection device 102, so as to determine the flow size of the data flow flowing through each data collection device 102 in the network.
  • Fig. 3 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • The system architecture for the data stream detection method provided in the embodiments of the present application may include a data stream collection module 210, a data stream processing module 220, and a data stream anomaly detection module 230. The data stream processing module 220 may further include a local update module 221, a global merging module 222, and a data stream estimation module 223.
  • The data stream collection module 210 is used to collect traffic from the gateway or other equipment that collects network data streams. It extracts the five-tuple of each data stream (source address, source port, destination address, destination port, protocol) and the size of the data stream, where the five-tuple uniquely identifies the data stream.
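The collection step can be sketched as follows, assuming packets arrive as (five-tuple, size in bytes) pairs. The `FiveTuple` field names, the input format, and `aggregate_flow_sizes` are illustrative assumptions rather than anything the patent specifies.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class FiveTuple(NamedTuple):
    """Flow key that uniquely identifies a data stream (field names are illustrative)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


def aggregate_flow_sizes(packets: Iterable[Tuple[FiveTuple, int]]) -> Dict[FiveTuple, int]:
    """Sum packet sizes per five-tuple; input is (flow key, size in bytes) pairs."""
    sizes: Dict[FiveTuple, int] = defaultdict(int)
    for key, size in packets:
        sizes[key] += size
    return dict(sizes)


f1 = FiveTuple("10.0.0.1", 5000, "10.0.0.2", 80, "TCP")
f2 = FiveTuple("10.0.0.3", 6000, "10.0.0.2", 443, "TCP")
sizes = aggregate_flow_sizes([(f1, 1500), (f2, 400), (f1, 500)])
# sizes == {f1: 2000, f2: 400}
```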
  • The data stream processing module 220 is used to deploy a two-dimensional data structure at each gateway (for example, each data collection device). It maps the collected data streams into that structure and stores them there, for example by using a majority voting algorithm. The two-dimensional data structures of the multiple data collection devices are then merged, and the total size of the data stream to be queried is estimated from the merged table. This estimate is used later to detect whether that data stream is abnormal across the entire network.
  • the data flow processing module 220 may further include a local update module 221, a global merge module 222, and a data flow estimation module 223.
  • The local update module 221 operates on the two-dimensional data structure deployed at each data collection device. It maps each data stream into the structure through a hash function, which establishes a hash index for the data stream. Each grid cell (bucket) in the two-dimensional data structure can include the identification (key) value of a data stream, the sum of the sizes of all data streams mapped to that bucket, and a counter.
  • the global merging module 222 is used for merging the two-dimensional data structures at all data collection devices, so as to facilitate statistics on the data flow of the entire network.
  • The contents of the buckets at the same position may differ across the two-dimensional data structures of different data collection devices. For this reason, buckets at the same position are not simply added together during merging. When a bucket's key value, its sum of the sizes of all data streams mapped to it, and its counter are updated, the corresponding buckets in the other devices' two-dimensional data structures must be compared and a reasonable estimate made.
  • The data stream size estimation module 223 is used to estimate the sizes of data streams at the network-wide level.
  • The size of the two-dimensional data structure is generally set in advance and limited, so multiple data streams may be mapped to the same bucket. The total size of a given data stream across the entire network is estimated by comparing the key values in all relevant buckets, and this estimate is then used for the subsequent anomaly detection.
  • the data stream abnormality detection module 230 is configured to perform abnormality detection based on the estimated value of the data stream at the end of each time period to determine whether a certain data stream is an abnormal value.
  • the time period may be a preset time interval.
  • Fig. 4 is a schematic flowchart of a method for detecting a large-traffic data stream according to an embodiment of the present application.
  • the data collection device can be any one of the multiple data collection devices in the system, for example, it can be any one of the data collection devices in FIG. 2.
  • the system may include multiple data collection devices and a control device.
  • The control device processes the two-dimensional data structures provided by the multiple data collection devices, into which the data streams have been mapped, to obtain network-wide detection results for the data streams.
  • Step 310 The control device may acquire two-dimensional data structures in multiple data collection devices.
  • the two-dimensional data structure is used to store the information of the data flow in the network acquired by the corresponding data collection device.
  • Any one of the two-dimensional data structures of the multiple data collection devices may be used to store information about the network data streams acquired by the corresponding data collection device. For example, the data stream information can include the key value of each acquired data stream, its size, and other information.
  • the aforementioned network may refer to any detected object, and the network may be a network composed of one or more devices.
  • The control device may periodically acquire the two-dimensional data structures of the multiple data collection devices.
  • For example, at the end of each time period, the multiple data collection devices may send their two-dimensional data structures to the control device, where the time period may be a preset time interval.
  • Step 320 The control device merges the two-dimensional data structures in the multiple data acquisition devices to obtain a merged two-dimensional data structure.
  • the two-dimensional table structure after the above-mentioned merging process can be used to store the information of the data stream in the network.
  • The two-dimensional data structure is composed of multiple buckets. The step in which the control device merges the two-dimensional data structures of the multiple data collection devices to obtain the merged two-dimensional data structure may include the following: the control device merges the buckets at the same position in the two-dimensional data structures of the multiple data collection devices to obtain the merged two-dimensional data structure.
  • the two-dimensional data structure may be as shown in FIG. 1.
  • The two-dimensional data structure shown in FIG. 1 includes d rows, and each row includes w buckets. The multiple two-dimensional data structures above may share the same structure, meaning that each of them has the same number of rows and each row has the same number of buckets.
  • Combining multiple two-dimensional data structures may refer to combining multiple buckets in the same position in the multiple data structures.
  • The two-dimensional data structure can be an MV-Sketch, in which each bucket includes three feature values. V_{i,j} is the total size of the data streams stored in the current bucket. K_{i,j} is the key value of the majority data stream in the current bucket and identifies that data stream. C_{i,j} is the counter value of the majority data stream in the current bucket.
  • The majority data stream is the data stream whose size exceeds 50% of the total traffic of all data streams mapped to the current bucket.
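The bucket layout described above can be sketched as follows, assuming one Python object per bucket. The salted SHA-256 used as the per-row hash is a hypothetical stand-in for d independent hash functions, and the names `Bucket`, `make_table`, and `bucket_index` are illustrative.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bucket:
    V: int = 0                # total traffic of all data streams mapped here
    K: Optional[str] = None   # key of the majority data stream (None when empty)
    C: int = 0                # counter of the majority data stream


def make_table(d: int, w: int) -> List[List[Bucket]]:
    """Create an empty two-dimensional table with d rows of w buckets."""
    return [[Bucket() for _ in range(w)] for _ in range(d)]


def bucket_index(key: str, row: int, w: int) -> int:
    """Column of `key` in `row`; salting SHA-256 with the row number stands in
    for d independent hash functions (a hypothetical choice)."""
    digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % w


table = make_table(d=3, w=4)
key = "10.0.0.1:5000->10.0.0.2:80/TCP"
cols = [bucket_index(key, i, 4) for i in range(3)]  # one bucket per row
```

All tables that will later be merged must use the same d, w, and hash functions, so that position (i, j) refers to the same bucket on every device.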
  • After merging, each bucket likewise includes three feature values: the updated sum of data traffic, the key value of the updated main data stream, and the counter value of the updated main data stream.
  • Take the bucket at a first position as an example. Merging the buckets at the first position in the two-dimensional data structures of the multiple data collection devices may include three steps. First, the sums of data traffic in those buckets are added up to obtain the updated sum of data traffic in the bucket at the first position. Second, the traffic sizes of the main data streams in those buckets are compared to obtain the key value of the updated main data stream in that bucket. Third, the updated counter value of the main data stream in the bucket at the first position is obtained from the key value and the traffic size of the updated main data stream.
  • The merging algorithm above does not simply add up the feature values in the buckets of the two-dimensional data structures of the multiple data collection devices. When the data stream information stored in a bucket is updated, the data streams stored in the buckets at the same position in the other two-dimensional data structures are compared, and a reasonable estimate is made to determine the main data stream of each merged bucket. This approach to merging the two-dimensional data structures of multiple data collection devices, as provided in the embodiments of this application, can reduce the control device's memory requirement and save resources.
  • Suppose the multiple data collection devices hold N two-dimensional data structures, and the key values of the main data streams in the buckets at the first position of these N structures comprise X distinct key values, where X is a positive integer less than or equal to N. The step of comparing the traffic sizes of these main data streams to obtain the key value of the updated main data stream then includes two parts. First, for each of the X key values, the estimated traffic size of the corresponding data stream across the buckets at the first position of the N two-dimensional data structures is determined. Second, among the data streams corresponding to the X key values, the one with the largest estimated traffic is determined to be the updated main data stream.
  • Let the first key value be any one of the X key values. A formula gives the estimated traffic size of the data stream with the first key value in the bucket at the first position of the i-th of the N two-dimensional data structures. In this formula, S_i(x) denotes the estimated traffic size of the data stream corresponding to the first key value, and x is the first key value. V_i is the sum of the traffic of all data streams in the bucket at the first position of the i-th two-dimensional data structure, and C_i is the counter value of the main data stream in that bucket.
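The formula itself does not appear here. Assuming it is the standard MV-Sketch per-bucket estimator (an assumption based on the V, K, C bucket layout above, where K_i is the key of the main data stream in the bucket at the first position of the i-th table), it would read:

$$
S_i(x) =
\begin{cases}
\dfrac{V_i + C_i}{2}, & \text{if } K_i = x,\\[6pt]
\dfrac{V_i - C_i}{2}, & \text{if } K_i \neq x.
\end{cases}
$$

V_i is the total traffic in the bucket, and C_i is roughly the main stream's traffic minus everyone else's. Averaging the two therefore recovers the main stream's share. For any other key, the remaining (V_i − C_i)/2 is the most traffic that key could have.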
  • the bucket at the first position in the two-dimensional data structure may refer to the bucket B(i, j) in the i-th row and j-th column in the two-dimensional data structure.
  • After merging, the estimated size of the majority data stream mapped to bucket B(i,j) is computed table by table from two quantities in each of the q two-dimensional tables. The first is V^{(M)}_{i,j}, the sum of all data streams in bucket B(i,j) of the M-th table. The second is C^{(M)}_{i,j}, the counter of the majority data stream in that bucket. If the key (K value) in bucket B(i,j) of another table differs from the key in the current bucket, a different per-table estimate is used for that table.
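Under the same assumption that each table contributes the MV-Sketch per-bucket estimate, summing over the q tables gives the merged estimate for a candidate key x at bucket B(i,j). The merged traffic sum and key are shown alongside it:

$$
\hat S_{i,j}(x) = \sum_{M:\,K^{(M)}_{i,j} = x} \frac{V^{(M)}_{i,j} + C^{(M)}_{i,j}}{2}
+ \sum_{M:\,K^{(M)}_{i,j} \neq x} \frac{V^{(M)}_{i,j} - C^{(M)}_{i,j}}{2},
\qquad
V_{i,j} = \sum_{M=1}^{q} V^{(M)}_{i,j},
\qquad
K_{i,j} = \arg\max_{x} \hat S_{i,j}(x).
$$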
  • Step 330 The control device detects the high-traffic data flow according to the two-dimensional data structure after the merging process.
  • the large-volume data flow detected through the two-dimensional data structure after the merging process may refer to the large-volume data flow of the entire network in the network.
  • A network-wide large-volume data stream may not appear as a large-volume data stream on any single data collection device. It is detected as one once its traffic across many data collection devices is combined.
  • the large-volume data flow is a large-volume data flow at the entire network level.
  • a query can be performed in the merged two-dimensional table according to the key value of a given data stream, so as to estimate the total traffic size or the magnitude of the change amount of the data stream of the given key value.
  • the data streams can be polled in turn according to the key values in each bucket in the merged two-dimensional table to query the total traffic size of each data stream or the size of the change of each data stream.
  • Optionally, the control device acquires the two-dimensional data structures of the multiple data collection devices at the end of each time period. In this case, detecting the large-volume data stream according to the merged two-dimensional data structure includes:
  • If, according to the merged two-dimensional data structure, the control device detects that the change in a first data stream between any two time periods is greater than a first threshold, it determines that the first data stream is a large-volume data stream. That is, the first data stream is a network-wide large-volume data stream in the network.
  • Optionally, the control device's detection of the large-volume data stream according to the merged two-dimensional data structure includes:
  • If, according to the merged two-dimensional data structure, the control device detects that the total traffic size of the first data stream is greater than a second threshold, it determines that the first data stream is a large-volume data stream, that is, a network-wide large-volume data stream.
  • the above-mentioned two-dimensional data structure may refer to MV-Sketch, or, LD-Sketch, or other Sketch structures, which is not limited in this application.
  • In the method for detecting large-volume data streams provided by the embodiments of this application, a control device acquires the two-dimensional data structures of multiple data collection devices, each storing information about the network data streams acquired by the corresponding device. The control device merges these structures into a merged two-dimensional data structure. From the merged structure, it then detects large-volume data streams, meaning network-wide large-volume data streams. The method thus avoids missing network-wide large-volume data streams and improves the accuracy of large-volume data stream detection.
  • FIG. 5 is a schematic flowchart of a method for detecting a data stream according to an embodiment of the present application.
  • the detection method shown in FIG. 5 includes steps 401 to 407, and steps 401 to 407 are described in detail below.
  • Step 401 Start.
  • Step 402 The data collection device obtains the data stream in the network.
  • the foregoing network may refer to any detected object, and the network may be a network composed of one or more devices.
  • the data collection device may refer to any data collection device in the network shown in FIG. 2, and the data collection device is used to obtain a data stream in the network.
  • Step 403 The data collection device records the acquired information of the data stream into the data structure through a local update algorithm.
  • the foregoing data table structure may refer to MV-Sketch, or LD-Sketch or other two-dimensional data structures.
  • For example, each row of the two-dimensional data structure includes w buckets. When the data collection device obtains a data stream (or data packet) in the network, it can use r independent hash functions to map the data stream to one bucket in each of rows 1 to r, where the column j in row i is determined by the hash value h_i(x). After the data stream is mapped into the buckets of the two-dimensional data structure through the hash functions, the majority data stream of each of those buckets is updated according to the majority voting algorithm.
  • A majority data stream is a data stream that accounts for more than 50% of the total traffic in the current bucket. For example, suppose there are three candidate data streams A, B, and C, and votes are cast in the order AAACCBBCCCBCC. After the first three votes are recorded, data stream A leads by three votes. The next three votes (CCB) then cancel out the three votes cast for data stream A. Once all votes have been recorded, data stream C is the majority data stream.
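The voting example above matches the Boyer-Moore majority vote algorithm, which keeps one candidate and one counter. A minimal sketch follows, with `majority_vote` as an illustrative name:

```python
def majority_vote(votes):
    """Boyer-Moore majority vote: returns (candidate, counter).

    The candidate is guaranteed to be the majority item only if some item
    actually holds more than half of the votes.
    """
    candidate, count = None, 0
    for v in votes:
        if count == 0:          # no current leader: adopt this vote's item
            candidate, count = v, 1
        elif v == candidate:    # vote for the leader
            count += 1
        else:                   # vote against the leader cancels one of its votes
            count -= 1
    return candidate, count


result = majority_vote("AAACCBBCCCBCC")  # ('C', 3)
```

C receives 7 of the 13 votes, so it is a true majority. The final counter of 3 is C's lead after all cancellations, not its vote total.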
  • The two-dimensional data structure on the data collection device then needs to be updated. The update procedure is invoked each time a data stream item (object X, value V_X) is received. The following describes how the two-dimensional data structure is updated with (X, V_X) as input.
  • In each row, the hash function for that row maps X to one bucket, and the information in that bucket is updated. This means updating its three elements V_{i,j}, K_{i,j}, and C_{i,j} (see Figure 1).
  • V_{i,j} represents the total traffic of all data streams in the current bucket.
  • K_{i,j} represents the key value of the majority data stream in the current bucket, that is, its identifier. The majority data stream is the data stream that accounts for more than 50% of the total traffic of all data streams mapped to the current bucket.
  • The counter C_{i,j} records the traffic of the majority data stream in the bucket. The three elements of the updated bucket B(i,j) are denoted V1_{i,j}, K1_{i,j}, and C1_{i,j}. The local update algorithm then proceeds as follows:
  • Step 2 If data stream X was already the majority data stream in bucket B(i,j) before the update (that is, K_{i,j} = X), go to step 3. Otherwise, that is, if X is not the majority data stream of bucket B(i,j) or was not previously present in it, go to step 4.
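The local update can be sketched as follows, assuming the published MV-Sketch update rule. Under that rule, the value is added to V. If the key matches K, the value is also added to C. Otherwise it is subtracted from C, and if C becomes negative, K is replaced by the new key and C is negated. The patent's individual steps may differ in detail. The class name, the d-row parameter (the r hash functions above), and salted SHA-256 as the per-row hash are all hypothetical choices.

```python
import hashlib


class MVSketch:
    """Minimal MV-Sketch-style table: d rows x w buckets, each bucket (V, K, C)."""

    def __init__(self, d, w):
        self.d, self.w = d, w
        self.V = [[0] * w for _ in range(d)]     # total traffic per bucket
        self.K = [[None] * w for _ in range(d)]  # majority candidate key per bucket
        self.C = [[0] * w for _ in range(d)]     # majority counter per bucket

    def _col(self, row, key):
        # Hypothetical per-row hash: SHA-256 salted with the row number.
        h = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(h[:8], "big") % self.w

    def update(self, key, value):
        """Record `value` units of traffic for flow `key` in one bucket per row."""
        for i in range(self.d):
            j = self._col(i, key)
            self.V[i][j] += value                # V: sum of all traffic
            if self.K[i][j] == key:
                self.C[i][j] += value            # vote for the current candidate
            else:
                self.C[i][j] -= value            # vote against the candidate
                if self.C[i][j] < 0:             # candidate overturned
                    self.K[i][j] = key
                    self.C[i][j] = -self.C[i][j]


sk = MVSketch(d=2, w=1)   # w=1 forces every flow into the same bucket
for k, v in [("A", 3), ("C", 2), ("B", 2), ("C", 3), ("B", 1), ("C", 2)]:
    sk.update(k, v)
# sk.V[0][0] == 13, sk.K[0][0] == "C", sk.C[0][0] == 3
```

The weighted updates above reproduce the AAACCBBCCCBCC voting example: A carries 3 units, C carries 7, and B carries 3. The bucket therefore ends with C as the majority stream and a counter of 3.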
  • the data structure used for mapping and storing data stream information on the data acquisition device can be updated.
  • For example, if the data acquisition device records the acquired data stream information in an MV-Sketch, then after it acquires a new data stream it can update its MV-Sketch using the local update algorithm above.
  • Alternatively, the data collection device may record the acquired data stream information in an LD-Sketch, which it can likewise update with the local update algorithm after acquiring a new data stream.
  • Step 404 The control device obtains the data structure on each data collection device and performs a data structure merging process to obtain a merged data structure.
  • The data structure on each data collection device records the data streams collected by a single-point device. However, it cannot detect network-wide data streams, that is, data streams that pass through multiple single-point devices.
  • Merging the data structures of multiple data collection devices produces a merged data structure that can be used to detect network-wide data streams. This avoids missed detection of network-wide large-volume data streams.
  • The control device can periodically obtain the data structures of the data collection devices. In other words, each data collection device can periodically send the data structure that records its data stream information to the control device, where the period can be a preset time interval.
  • The following describes in detail how the control device merges the data structures acquired from multiple data collection devices. The control device combines the data structures acquired on each single-point device into one data structure that records network-wide data streams, namely the data streams passing through each single-point device.
  • the control device may merge the acquired data structures in multiple data collection devices through a global merge algorithm.
  • The merging process refers to merging buckets at the same position in multiple data structures, that is, updating the data stream information recorded in the buckets at the same position. For example, for MV-Sketch, the values V_{i,j}, K_{i,j}, and C_{i,j} in each bucket must be updated after the merge.
  • For example, suppose the control device acquires q data structures. It may merge the buckets at the same positions in the q data structures, updating the information they hold, to obtain the merged data structure. Taking MV-Sketch as an example, the process of merging q MV-Sketch two-dimensional tables with the global merging algorithm is as follows:
  • Step 1 Update the sum of data traffic in the bucket, that is, the total traffic of all data streams mapped to the current bucket across the q two-dimensional tables.
  • Step 2 Update the key value of the majority data stream in the bucket, compare the key value in the current bucket with the key value in the bucket at the same location in other two-dimensional tables, and update the key value in the current bucket.
  • After merging, the estimated size of the majority data stream mapped to bucket B(i,j) is computed table by table from two quantities in each of the q two-dimensional tables. The first is V^{(M)}_{i,j}, the sum of all data streams in bucket B(i,j) of the M-th table. The second is C^{(M)}_{i,j}, the counter of the majority data stream in that bucket. If the key (K value) in bucket B(i,j) of another table differs from the key in the current bucket, a different per-table estimate is used for that table.
  • Step 3 Compare the estimated sizes of the candidate majority data streams from bucket B(i,j) of the q two-dimensional tables. The value K_{i,j} of the merged bucket B(i,j) is set to the key value of the data stream with the largest estimate.
  • Step 4 Update the K value of the bucket B(i,j) in the two-dimensional table after the merge processing.
  • In this way, the control device can merge the q two-dimensional tables obtained from multiple data collection devices into one two-dimensional table. This table records network-wide data streams, that is, the data streams passing through each single-point device.
  • For example, when three two-dimensional tables are merged, the buckets (1,1) of the three tables hold at most three distinct key values, say X, Y, and Z. The three keys may all differ, or some or all of them may be the same. Consider key value X. If the key in bucket (1,1) of the first table is X, the estimate for a bucket whose key matches is applied to that bucket's V and C to give the size of the data stream with key X in the first table. If the key is not X, the estimate for a non-matching key is used instead. The same check is then applied to bucket (1,1) of the second table. If its key is also X, the first and second tables are merged at bucket (1,1) for key X.
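The global merge can be sketched as follows, assuming the MV-Sketch per-bucket estimator: (V + C) / 2 when the bucket's key matches, (V − C) / 2 otherwise. The source states that the merged counter is derived from the key and size of the main data stream but does not give the exact rule here. This sketch assumes C = max(0, 2·S(K) − V), so that the merged bucket reports S(K) for its own key. All names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bucket:
    V: int             # sum of the traffic of all streams mapped to the bucket
    K: Optional[str]   # key of the main (majority) data stream, None if empty
    C: int             # counter of the main data stream


def doubled_estimate(b: Bucket, x: str) -> int:
    """2 * S(x) for one bucket under the assumed MV-Sketch estimator (kept integral)."""
    return b.V + b.C if b.K == x else b.V - b.C


def merge_buckets(buckets: List[Bucket]) -> Bucket:
    """Merge the buckets found at the same (i, j) position in q tables."""
    V = sum(b.V for b in buckets)                       # Step 1: add the traffic sums
    candidates = sorted({b.K for b in buckets if b.K is not None})
    if not candidates:
        return Bucket(V, None, 0)
    # Steps 2-3: network-wide estimate for every candidate key; keep the largest.
    doubled = {x: sum(doubled_estimate(b, x) for b in buckets) for x in candidates}
    K = max(candidates, key=lambda x: doubled[x])
    # Counter (assumed rule): choose C so that (V + C) / 2 equals the merged
    # estimate of K, clamped at 0 to keep the counter non-negative.
    C = max(0, doubled[K] - V)
    return Bucket(V, K, C)


def merge_tables(tables: List[List[List[Bucket]]]) -> List[List[Bucket]]:
    """Merge q tables of identical shape (d rows x w buckets) position by position."""
    d, w = len(tables[0]), len(tables[0][0])
    return [[merge_buckets([t[i][j] for t in tables]) for j in range(w)]
            for i in range(d)]


merged = merge_buckets([Bucket(10, "X", 6), Bucket(8, "X", 2), Bucket(6, "Y", 4)])
# 2*S(X) = 16 + 10 + 2 = 28 beats 2*S(Y) = 4 + 6 + 10 = 20, so
# merged == Bucket(V=24, K="X", C=4)
```

This mirrors the three-table example above. For bucket (1,1), each candidate key is scored across all three tables, and the winner becomes the merged bucket's main data stream.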
  • A two-dimensional table can be deployed in a data collection device, and the data collection device can record data stream information in it through the local update algorithm in step 403.
  • Each data acquisition device can record data stream information in a two-dimensional data structure. The control device can acquire the two-dimensional data structures of multiple data acquisition devices and merge them with the global merging algorithm above. The result is a merged two-dimensional data structure that records network-wide data streams.
  • each of the above-mentioned data collection devices may be multiple data collection devices in the same network, or may also be multiple data collection devices in different networks.
  • Alternatively, multiple two-dimensional tables can be deployed in one data collection device, which distributes the acquired data stream information evenly across them. The tables deployed on that data collection device are then merged into one two-dimensional table with the global merging algorithm above. The control device obtains the merged two-dimensional table from each data collection device and merges those tables in turn. This finally yields a two-dimensional table that records network-wide data streams, that is, the data streams passing through each single-point device.
  • the above-mentioned two-dimensional table may refer to MV-Sketch, or, LD-Sketch, or other Sketch structures, which is not limited in this application.
  • Step 405 Estimating the size of the data stream.
  • the size of a certain data stream can be estimated according to the two-dimensional table after the merging process obtained in step 404 above.
  • a query can be performed in the merged two-dimensional table according to the key value of a given data stream, so as to estimate the total traffic size or the magnitude of the change amount of the data stream of the given key value.
  • the data streams can be polled in turn according to the key values in each bucket in the merged two-dimensional table to query the total traffic size of each data stream or the size of the change of each data stream.
  • For example, consider estimating the total traffic of a data stream X. Data stream X is mapped to one bucket in each of rows 1 to d of the merged two-dimensional data structure.
  • the following estimation algorithm can be used to estimate the size of the data stream:
  • Step 1 Suppose the traffic size of data stream X is queried. The estimate for data stream X in the current bucket B(i,j) depends on whether the key value in that bucket of the merged two-dimensional table equals the key value of data stream X. It is computed from two values in the merged two-dimensional table: V_{i,j}, the total traffic size of all data streams in the current bucket B(i,j), and C_{i,j}, the counter value of the majority data stream in that bucket.
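The query can be sketched as follows, assuming the MV-Sketch per-bucket estimator. Following the MV-Sketch query, the sketch also takes the minimum over the d rows, since each row's estimate is an upper bound. The source does not spell out how the row estimates are combined, so this is an assumption. `query` and `bucket_estimate` are illustrative names.

```python
from typing import Optional, Sequence, Tuple

BucketT = Tuple[int, Optional[str], int]   # (V, K, C) of one bucket


def bucket_estimate(bucket: BucketT, x: str) -> float:
    """Assumed MV-Sketch per-bucket estimate of flow x's traffic."""
    V, K, C = bucket
    return (V + C) / 2 if K == x else (V - C) / 2


def query(buckets_of_x: Sequence[BucketT], x: str) -> float:
    """Estimate flow x from the d buckets it hashes to (one per row),
    keeping the smallest (tightest) row estimate."""
    return min(bucket_estimate(b, x) for b in buckets_of_x)


rows = [(24, "X", 4), (30, "Y", 10), (20, "X", 10)]
size_x = query(rows, "X")   # row estimates 14.0, 10.0, 15.0 -> 10.0
```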
  • Step 406 Abnormal data flow detection.
  • Abnormal data stream detection may determine whether data stream X from step 405 is a large-volume object (heavy hitter) or a large-change object (heavy changer), that is, whether data stream X is a large-volume data stream.
  • Given a threshold, S can represent the total volume of all data streams in one time period.
  • D can represent the difference in the total volume of all data streams between two time periods, that is, the change value.
  • The following process can be used to determine whether data stream X is a large-volume object or a large-change object:
  • D(x) represents the difference of S(x) of the data stream X in two time periods, that is, the amount of change of the data stream X in two time periods.
  • When data stream X satisfies condition 1 or condition 2 above, it can be determined to be a network-wide large-volume data stream, and data stream X can then be monitored.
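The two checks can be sketched as follows. The size and change thresholds are taken as absolute values. Setting them relative to S or D, for example as a fraction of the total, would be an alternative and is an assumption. D(x) is taken as the absolute difference between two periods. The function and parameter names are hypothetical.

```python
from typing import Dict


def detect(x: str, s_now: Dict[str, float], s_prev: Dict[str, float],
           size_threshold: float, change_threshold: float) -> Dict[str, bool]:
    """Classify flow x from its estimated sizes S(x) in two consecutive periods.

    s_now / s_prev map flow keys to estimates taken from the merged table in
    the current and previous time periods (hypothetical inputs).
    """
    size = s_now.get(x, 0.0)
    change = abs(size - s_prev.get(x, 0.0))              # D(x)
    return {
        "heavy_hitter": size > size_threshold,           # large-volume object
        "heavy_changer": change > change_threshold,      # large-change object
    }


flags = detect("X", {"X": 500.0}, {"X": 100.0}, size_threshold=1000, change_threshold=300)
# flags == {"heavy_hitter": False, "heavy_changer": True}
```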
  • Step 407 End.
  • FIG. 7 is a schematic diagram of detection results for large-volume objects (heavy hitters) on a public network traffic data set, provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of detection results for large-change objects (heavy changers) on the same public network traffic data set, provided by an embodiment of the present application.
  • The data set covers 5 minutes of collection, and each 1-minute interval is one time period. Each period contains about 29M data packets and 1M data streams, and the memory size is varied from 64 KB to 4 MB.
  • The test metrics are as follows. Precision is the proportion of true large-volume data streams among the data streams reported in the detection results. Recall is the proportion of true large-volume data streams that appear in the detection results. The F1 score combines precision and recall into one overall measure. Relative error is the error ratio of the data stream size estimates during the test period.
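The four metrics can be computed as in the sketch below. It assumes relative error is averaged over the true large-volume flows as |estimate − actual| / actual, which is one common convention; the source does not specify the averaging. `evaluate` and its inputs are illustrative.

```python
from typing import Dict, Iterable, Tuple


def evaluate(reported: Iterable[str], truth: Iterable[str],
             est: Dict[str, float], actual: Dict[str, float]) -> Tuple[float, float, float, float]:
    """Precision, recall, F1 and mean relative error for one time period.

    `reported` are flows flagged by the detector, `truth` the true large-volume
    flows (each with actual size > 0); relative error is averaged over `truth`.
    """
    reported, truth = set(reported), set(truth)
    tp = len(reported & truth)                           # correctly reported flows
    precision = tp / len(reported) if reported else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    errors = [abs(est.get(x, 0.0) - actual[x]) / actual[x] for x in truth]
    rel_err = sum(errors) / len(errors) if errors else 0.0
    return precision, recall, f1, rel_err


p, r, f1, are = evaluate({"a", "b", "c"}, {"a", "b", "d"},
                         {"a": 110, "b": 90}, {"a": 100, "b": 100, "d": 50})
# p, r and f1 are each 2/3; are is approximately 0.4 ((0.1 + 0.1 + 1.0) / 3)
```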
  • The data structures compared in the detection results are the sub-linear-space data structure Count-Min (CM), the majority vote data structure (MV), the local distributed data structure (LD), the Deltoid data structure (Del), and the fast data structure (FAST).
  • CM: sub-linear-space data structure (Count-Min)
  • MV: majority vote data structure
  • LD: local distributed data structure
  • Del: Deltoid data structure
  • FAST: fast data structure
  • The method for detecting large-volume data streams provided by the embodiments of the present application has been described in detail above with reference to FIG. 1 to FIG. 8.
  • The apparatus embodiments of the present application are described in detail below with reference to FIG. 9 and FIG. 10. It should be understood that the detection apparatus in these embodiments can execute the various large-volume data stream detection methods of the foregoing embodiments. For the specific working processes of the products below, refer to the corresponding processes in the foregoing method embodiments.
  • FIG. 9 is a schematic block diagram of an apparatus 500 for detecting a large-volume data stream provided by an embodiment of the present application. It should be understood that the detection device 500 can execute each step in the detection method of FIG. 4 or FIG.
  • The detection apparatus 500 includes an acquisition unit 510 and a processing unit 520.
  • The acquisition unit 510 is configured to acquire two-dimensional data structures from a plurality of data collection devices. Each two-dimensional data structure stores information about the flows in the network captured by the corresponding data collection device.
  • The processing unit 520 is configured to merge the two-dimensional data structures from the plurality of data collection devices into a merged two-dimensional data structure. It then detects heavy flows based on the merged structure, where the heavy flows are network-wide heavy flows in the network.
  • The two-dimensional data structure is a data structure composed of multiple buckets.
  • The processing unit 520 is specifically configured to:
  • merge the buckets at the same position in the two-dimensional data structures from the plurality of data collection devices to obtain the merged two-dimensional data structure.
  • Any bucket in the two-dimensional data structures of the plurality of data collection devices includes three values: the total traffic in the current bucket, the key of the majority flow in the current bucket, and the counter value of the majority flow;
  • any bucket in the merged two-dimensional data structure includes the updated total traffic, the updated key of the majority flow, and the updated counter value of the majority flow;
  • the buckets include a bucket at a first position, and the processing unit 520 is specifically configured to:
  • obtain the updated total traffic in the bucket at the first position by summing the total traffic in that bucket across the two-dimensional data structures;
  • obtain the updated key of the majority flow in that bucket by comparing the traffic sizes of the majority flows stored there;
  • obtain the updated counter value of the majority flow from the updated key and the traffic sizes of the majority flows.
  • The two-dimensional data structures in the plurality of data collection devices are N two-dimensional data structures. The keys of the majority flows in the bucket at the first position across the N structures form X keys, where X is a positive integer less than or equal to N. The processing unit 520 is specifically configured to:
  • determine, from the majority flows of the N two-dimensional data structures in the bucket at the first position, an estimate of the traffic size of the flow corresponding to each of the X keys;
  • determine the flow with the largest estimate as the updated majority flow.
  • The processing unit 520 is specifically configured to:
  • obtain the estimated traffic size in the bucket at the first position of the i-th of the N two-dimensional data structures according to the following formulas:
    - if the majority-flow key of that bucket is the first key, $S_i(x) = (V_i + C_i)/2$;
    - otherwise, $S_i(x) = (V_i - C_i)/2$.
  • Here $S_i(x)$ denotes the estimated traffic size of the flow corresponding to the first key, and x denotes the first key. $V_i$ denotes the total traffic of all flows in the bucket at the first position of the i-th two-dimensional data structure. $C_i$ denotes the counter value of the majority flow of the i-th two-dimensional data structure in that bucket.
  • The acquisition unit 510 is specifically configured to acquire the two-dimensional data structures from the plurality of data collection devices at the end of each time period;
  • the processing unit 520 is specifically configured to:
  • determine that a first flow is a heavy flow if, based on the merged two-dimensional data structure, the change of the first flow between any two time periods is greater than a first threshold.
  • In another implementation, the processing unit 520 is specifically configured to:
  • determine that the first flow is a heavy flow if, based on the merged two-dimensional data structure, the total traffic size of the first flow is greater than a second threshold.
  • The two-dimensional data structure includes the majority vote data structure MV-Sketch.
  • The detection apparatus 500 here is embodied in the form of functional units.
  • The term “unit” herein may be implemented in software and/or hardware, which is not specifically limited.
  • For example, a “unit” may be a software program, a hardware circuit, or a combination of the two that implements the foregoing functions.
  • The hardware circuit may include any of the following, and/or other suitable components that support the described functions:
    - an application-specific integrated circuit (ASIC);
    - an electronic circuit;
    - a processor (for example, a shared processor, a dedicated processor, or a group processor) and a memory configured to execute one or more software or firmware programs;
    - a combinational logic circuit.
  • The units of the examples described in the embodiments of this application can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
  • FIG. 10 is a schematic diagram of the hardware structure of an apparatus for detecting heavy flows according to an embodiment of this application.
  • The detection apparatus 600 shown in FIG. 10 includes a memory 601, a processor 602, a communication interface 603, and a bus 604. The memory 601, the processor 602, and the communication interface 603 are communicatively connected to one another through the bus 604.
  • The memory 601 may be a read-only memory (ROM), a static storage device, or a random access memory (RAM).
  • The memory 601 may store a program.
  • The processor 602 and the communication interface 603 are configured to perform each step of the heavy-flow detection method in the embodiments of this application, for example, each step of the method shown in FIG. 4 or FIG. 5.
  • The processor 602 may be a general-purpose CPU, a microprocessor, an ASIC, a GPU, or one or more integrated circuits configured to execute related programs. It thereby implements the functions of the units in the detection apparatus shown in FIG. 9, or performs the heavy-flow detection method in the method embodiments of this application.
  • The processor 602 may alternatively be an integrated circuit chip with signal processing capability.
  • Each step of the heavy-flow detection method in the embodiments of this application may be completed by an integrated logic circuit of hardware in the processor 602 or by instructions in the form of software.
  • The processor 602 may alternatively be a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • It can implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application.
  • The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • The steps of the methods disclosed in the embodiments of this application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor.
  • The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • The storage medium is located in the memory 601. The processor 602 reads information from the memory 601 and, in combination with its hardware, completes the functions of the units included in the detection apparatus of the embodiments of this application, or performs the heavy-flow detection method of the method embodiments of this application.
  • The processor 602 may correspond to the processing unit 520 in the detection apparatus shown in FIG. 9.
  • The communication interface 603 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the detection apparatus 600 and other devices or a communication network.
  • The communication interface 603 may correspond to the acquisition unit 510 in the detection apparatus shown in FIG. 9.
  • The bus 604 may include a path for transferring information between the components of the detection apparatus 600 (for example, the memory 601, the processor 602, and the communication interface 603).
  • Although the detection apparatus 600 is shown with only a memory, a processor, and a communication interface, a person skilled in the art should understand the following about specific implementations:
    - the detection apparatus 600 may also include other components necessary for normal operation;
    - depending on specific needs, it may also include hardware components that implement other additional functions;
    - it may include only the components necessary to implement the embodiments of this application, and need not include all the components shown in FIG. 10.
  • An embodiment of this application further provides a system, which includes the foregoing detection apparatus and a plurality of data collection devices. The detection apparatus can perform the heavy-flow detection method in the foregoing method embodiments.
  • An embodiment of this application further provides a chip, which includes a transceiver unit and a processing unit.
  • The transceiver unit may be an input/output circuit or a communication interface;
  • the processing unit may be a processor, a microprocessor, or an integrated circuit integrated on the chip;
  • the chip can perform the heavy-flow detection method in the foregoing method embodiments.
  • An embodiment of this application further provides a computer-readable storage medium storing instructions.
  • When the instructions are executed, the heavy-flow detection method in the foregoing method embodiments is performed.
  • An embodiment of this application further provides a computer program product containing instructions. When the instructions are executed, the heavy-flow detection method in the foregoing method embodiments is performed.
  • The processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • The memory may include a read-only memory and a random access memory, and provides instructions and data to the processor.
  • A part of the processor may further include a non-volatile random access memory.
  • The processor may further store device type information.
  • The sequence numbers of the foregoing processes do not imply an execution order. The execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
  • The disclosed system, apparatus, and method may be implemented in other ways.
  • The apparatus embodiments described above are merely illustrative. For example, the division into units is only a logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • The displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
  • The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units. They may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • The functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
  • If the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium.
  • The technical solutions of this application may be embodied in the form of a software product, whether as a whole, as the part contributing to the prior art, or as a part of the technical solutions. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application.
  • The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
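The evaluation metrics listed earlier (precision, recall, F1 score, and relative error) can be made concrete with a short Python sketch. The function names are hypothetical, and averaging the relative error over the true heavy flows is an assumption for illustration. Neither is taken from the experiments described in this application.

```python
from __future__ import annotations


def precision_recall_f1(reported: set, actual: set) -> tuple[float, float, float]:
    """Precision: share of reported flows that are true heavy flows.
    Recall: share of true heavy flows that were reported.
    F1: harmonic mean of precision and recall."""
    true_pos = len(reported & actual)
    precision = true_pos / len(reported) if reported else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def mean_relative_error(estimates: dict, truth: dict) -> float:
    """Assumed definition: average |estimate - true| / true over the true heavy flows."""
    if not truth:
        return 0.0
    errors = [abs(estimates.get(k, 0) - v) / v for k, v in truth.items()]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Made-up flow identifiers, purely for illustration.
    print(precision_recall_f1({"f1", "f2", "f3", "f4"}, {"f1", "f2", "f5"}))
    print(mean_relative_error({"f1": 110, "f2": 90}, {"f1": 100, "f2": 100}))
```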

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

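This application discloses a method and an apparatus for detecting heavy flows. The detection method includes three steps. First, a control device acquires two-dimensional data structures from a plurality of data collection devices; each structure stores information about the flows in the network captured by the corresponding data collection device. Second, the control device merges these two-dimensional data structures to obtain a merged two-dimensional data structure. Third, the control device detects heavy flows based on the merged structure, where the heavy flows are network-wide heavy flows in the network. The technical solution of this application can detect network-wide heavy flows, thereby improving the accuracy of heavy-flow detection.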

Description

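Method and apparatus for detecting heavy flows
This application claims priority to Chinese Patent Application No. 202010225423.9, filed on March 26, 2020 and entitled “Method and apparatus for detecting heavy flows”, which is incorporated herein by reference in its entirety.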
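Technical Field
This application relates to the field of information technology, and more specifically, to a method and an apparatus for detecting heavy flows.
Background
With the rapid development of mobile networks, network traffic has grown explosively, leading to frequent anomalies in the network and, in turn, higher network maintenance costs. Detecting abnormal traffic in the network quickly and without omissions has therefore become especially important.
In anomalous-traffic detection, two types of abnormal traffic deserve particular attention. The first is flows with very large traffic volume, also called heavy hitters. The second is flows whose traffic volume changes greatly within a certain period, also called heavy changers. The two are collectively referred to as heavy flows.
Currently, most heavy-flow detection deploys multiple two-dimensional tables on one data collection device in the network. These tables store information about all flows captured by that device, and abnormal flows are then detected for that single device. However, flows in a network may be distributed across multiple data collection devices, so detecting abnormal flows on a single device may miss some heavy flows. For example, a flow observed at a single device may be small, yet its traffic combined across many devices may make it a network-wide heavy flow. How to detect network-wide heavy flows has therefore become an urgent problem.
Summary
This application provides a method and an apparatus for detecting heavy flows, which can detect network-wide heavy flows and thereby improve the accuracy of heavy-flow detection.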
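According to a first aspect, a method for detecting heavy flows is provided, including the following steps. A control device acquires two-dimensional data structures from a plurality of data collection devices, where each two-dimensional data structure stores information about the flows in a network captured by the corresponding data collection device. The control device merges these two-dimensional data structures to obtain a merged two-dimensional data structure. The control device then detects heavy flows based on the merged structure, where the heavy flows are network-wide heavy flows in the network.
Any one of these two-dimensional data structures may store information about the network flows captured by one of the data collection devices. For example, the flow information may include the keys and sizes of the captured flows.
It should be noted that the two-dimensional data structure in a given data collection device stores information about the flows captured by that device. It can therefore be used to detect device-level heavy flows, that is, heavy flows passing through a single data collection device. The merged two-dimensional data structure, by contrast, can be used to detect network-wide heavy flows. A flow may not be detected as a heavy flow at any single data collection device, yet be detected as one once its traffic across many devices is combined. Such a flow is a network-wide heavy flow.
Heavy flow is a collective term for heavy hitters and heavy changers. A heavy hitter is a network flow whose value, measured in packets, bytes, connections, or similar units, exceeds expectations, that is, exceeds a certain threshold. A heavy changer is a network flow whose value, measured in the same kinds of units, changes very sharply within a short period.
It should be understood that the two-dimensional data structure on each data collection device records information about the flows captured by that single-point device. Network-wide flows, that is, flows passing through multiple single-point devices, cannot be detected this way. In the embodiments of this application, merging the data structures from multiple data collection devices produces a data structure that can be used to detect network-wide flows, thereby avoiding missed detection of network-wide heavy flows.
Optionally, the control device may detect heavy flows in the network based on the key of a given flow and the merged two-dimensional data structure, so as to estimate the total traffic size or the change of the flow with that key.
Optionally, the control device may poll the flows one by one according to the keys in the buckets of the merged two-dimensional data structure, querying the total traffic size or the change of each flow.
In the technical solution of this application, the control device acquires the two-dimensional data structures from multiple data collection devices. Each structure stores information about the flows in the network captured by the corresponding device. The control device merges these structures into a merged two-dimensional data structure and, based on it, detects heavy flows, namely network-wide heavy flows in the network. The heavy-flow detection method provided in the embodiments of this application thus avoids missing network-wide heavy flows and can detect them, thereby improving the accuracy of heavy-flow detection.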
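In a possible implementation, the two-dimensional data structure is composed of multiple buckets. In this case, the control device merging the two-dimensional data structures from the plurality of data collection devices into a merged two-dimensional data structure includes:
the control device merges the buckets at the same position in the two-dimensional data structures from the plurality of data collection devices to obtain the merged two-dimensional data structure.
It should be noted that the two-dimensional data structure may include d rows, each containing w buckets. The multiple two-dimensional data structures may share the same structure, that is, the same number of rows and the same number of buckets per row. Merging them may mean merging the buckets at the same position across the structures, where the same position may refer to the bucket in row i and column j of each structure.
In a possible implementation, any bucket in the two-dimensional data structures of the plurality of data collection devices includes three values: the total traffic in the current bucket, the key of the majority flow in the current bucket, and the counter value of the majority flow. Any bucket in the merged two-dimensional data structure includes the updated total traffic, the updated key of the majority flow, and the updated counter value of the majority flow. The buckets include a bucket at a first position.
The control device merging the buckets at the same position in these two-dimensional data structures to obtain the merged two-dimensional data structure includes:
summing the total traffic in the bucket at the first position across the two-dimensional data structures of the plurality of data collection devices to obtain the updated total traffic in that bucket;
comparing the traffic sizes of the majority flows in the bucket at the first position across these structures to obtain the updated key of the majority flow in that bucket;
obtaining the updated counter value of the majority flow in the bucket at the first position from the updated key of the majority flow and the traffic sizes of the majority flows.
In the technical solution of this application, merging the two-dimensional data structures from multiple data collection devices does not simply add up the feature values in their buckets. When the flow information stored in a bucket is updated, the flows stored in the same-position buckets of the other two-dimensional data structures are compared. A reasonable estimate is then made to determine the majority flow of each merged bucket. This merging method reduces memory requirements and saves resources.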
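In a possible implementation, the two-dimensional data structures in the plurality of data collection devices are N two-dimensional data structures. The keys of the majority flows in the bucket at the first position across the N structures form X keys, where X is a positive integer less than or equal to N.
Comparing the total traffic sizes of the majority flows in the bucket at the first position across these structures to obtain the updated key of the majority flow in that bucket includes:
determining, from the majority flows of the N two-dimensional data structures in the bucket at the first position, an estimate of the total traffic size of the flow corresponding to each of the X keys;
determining, among the X keys, the flow with the largest estimated total traffic size as the updated majority flow.
In a possible implementation, a first key is any one of the X keys. The estimated traffic size for the bucket at the first position of the i-th of the N two-dimensional data structures is obtained according to the following formulas:
if the key of the majority flow in the bucket at the first position of the i-th two-dimensional data structure is the first key, the estimated traffic size of the flow corresponding to the first key is $S_i(x) = (V_i + C_i)/2$;
if that key is not the first key, the estimated traffic size of the flow corresponding to the first key is $S_i(x) = (V_i - C_i)/2$.
Here $S_i(x)$ denotes the estimated traffic size of the flow corresponding to the first key, and x denotes the first key. $V_i$ denotes the total traffic of all flows in the bucket at the first position of the i-th two-dimensional data structure. $C_i$ denotes the counter value of the majority flow in that bucket.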
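The following is a minimal Python sketch of these formulas. It represents each bucket as a hypothetical `(V, K, C)` record, computes the per-table estimate $S_i(x)$, and selects the largest of the X candidate keys. Summing the per-table estimates for a key follows the accumulation used in the global merge described later in this document. The names are illustrative only.

```python
from __future__ import annotations

from typing import Hashable, NamedTuple, Optional


class Bucket(NamedTuple):
    V: int                   # total traffic of all flows in the bucket
    K: Optional[Hashable]    # key of the majority flow (None if the bucket is empty)
    C: int                   # counter value of the majority flow


def estimate(bucket: Bucket, x: Hashable) -> float:
    """S_i(x): (V + C) / 2 if x is the bucket's majority key, else (V - C) / 2."""
    if bucket.K == x:
        return (bucket.V + bucket.C) / 2
    return (bucket.V - bucket.C) / 2


def per_key_estimates(buckets: list[Bucket]) -> dict:
    """Sum S_i(x) over the N same-position buckets for each of the X candidate keys."""
    candidates = {b.K for b in buckets if b.K is not None}
    return {x: sum(estimate(b, x) for b in buckets) for x in candidates}


def updated_majority(buckets: list[Bucket]) -> Optional[Hashable]:
    """The candidate key with the largest estimate becomes the updated majority flow."""
    est = per_key_estimates(buckets)
    return max(est, key=est.get) if est else None
```

For example, with made-up buckets `Bucket(10, "X", 6)`, `Bucket(8, "Y", 2)`, and `Bucket(6, "X", 4)`, key X is estimated at 8 + 3 + 5 = 16 and key Y at 2 + 5 + 1 = 8, so X is selected.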
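In a possible implementation, the control device acquiring two-dimensional data structures from a plurality of data collection devices includes:
the control device acquires the two-dimensional data structures from the plurality of data collection devices at the end of each time period.
The control device detecting the heavy flows based on the merged two-dimensional data structure includes:
if the control device detects, based on the merged structure, that the change of a first flow between any two time periods is greater than a first threshold, determining that the first flow is a heavy flow.
Optionally, the control device may acquire the two-dimensional data structures from the plurality of data collection devices periodically. That is, the data collection devices may periodically send the data structures that record flow information to the control device, and the period may be a preset time interval.
In a possible implementation, the control device detecting the heavy flows based on the merged two-dimensional data structure includes:
if the control device detects, based on the merged structure, that the total traffic size of a first flow is greater than a second threshold, determining that the first flow is a heavy flow.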
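Below is a minimal Python sketch of the two decision rules. It assumes per-flow estimates have already been read from the merged structure for two periods. Using the absolute change, and treating a flow absent from a period as size 0, are assumptions of this sketch, and the thresholds are placeholders.

```python
from __future__ import annotations


def heavy_hitters(estimates: dict, second_threshold: float) -> set:
    """Flows whose estimated total traffic exceeds the second threshold."""
    return {k for k, s in estimates.items() if s > second_threshold}


def heavy_changers(prev: dict, curr: dict, first_threshold: float) -> set:
    """Flows whose estimated traffic changes by more than the first threshold
    between two periods (absolute change; missing flows count as 0)."""
    keys = prev.keys() | curr.keys()
    return {k for k in keys if abs(curr.get(k, 0) - prev.get(k, 0)) > first_threshold}
```

For example, with made-up estimates `prev = {"a": 100, "b": 5}` and `curr = {"a": 110, "b": 80, "c": 60}` and a first threshold of 50, flows b (change 75) and c (change 60) are reported as heavy changers. Flow a changes by only 10 and is not reported.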
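In a possible implementation, the two-dimensional data structure includes the majority vote data structure MV-Sketch.
Optionally, the two-dimensional data structure may be an MV-Sketch. In that case, each bucket of the MV-Sketch may include three feature values:
the total size of the flows stored in the current bucket, $V_{i,j}$;
the key of the majority flow in the current bucket, $K_{i,j}$, which identifies the majority flow, that is, a flow whose size exceeds 50% of the total traffic mapped to the current bucket;
the counter value of the majority flow in the current bucket, $C_{i,j}$.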
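According to a second aspect, an apparatus for detecting heavy flows is provided. It includes an acquisition unit, configured to acquire two-dimensional data structures from a plurality of data collection devices, where each two-dimensional data structure stores information about the flows in a network captured by the corresponding data collection device. It also includes a processing unit, configured to merge the two-dimensional data structures from the plurality of data collection devices into a merged two-dimensional data structure and to detect heavy flows based on the merged structure. The heavy flows are network-wide heavy flows in the network.
Any one of these two-dimensional data structures may store information about the network flows captured by one of the data collection devices. For example, the flow information may include the keys and sizes of the captured flows.
It should be noted that the two-dimensional data structure in a given data collection device stores information about the flows captured by that device. It can therefore be used to detect device-level heavy flows, that is, heavy flows passing through a single data collection device. The merged two-dimensional data structure, by contrast, can be used to detect network-wide heavy flows. A flow may not be detected as a heavy flow at any single data collection device, yet be detected as one once its traffic across many devices is combined. Such a flow is a network-wide heavy flow.
Heavy flow is a collective term for heavy hitters and heavy changers. A heavy hitter is a network flow whose value, measured in packets, bytes, connections, or similar units, exceeds expectations, that is, exceeds a certain threshold. A heavy changer is a network flow whose value, measured in the same kinds of units, changes very sharply within a short period.
It should be understood that the two-dimensional data structure on each data collection device records information about the flows captured by that single-point device. Network-wide flows, that is, flows passing through multiple single-point devices, cannot be detected this way. In the embodiments of this application, merging the data structures from multiple data collection devices produces a data structure that can be used to detect network-wide flows, thereby avoiding missed detection of network-wide heavy flows.
Optionally, the detection apparatus may detect heavy flows based on the key of a given flow and the merged two-dimensional data structure, so as to estimate the total traffic size or the change of the flow with that key.
Optionally, the detection apparatus may poll the flows one by one according to the keys in the buckets of the merged two-dimensional data structure, querying the total traffic size or the change of each flow.
In the technical solution of this application, the detection apparatus acquires the two-dimensional data structures from multiple data collection devices. Each structure stores information about the flows in the network captured by the corresponding device. The detection apparatus merges these structures into a merged two-dimensional data structure and, based on it, detects heavy flows, which may be network-wide heavy flows in the network. The heavy-flow detection method provided in the embodiments of this application thus avoids missing network-wide heavy flows and can detect them, thereby improving the accuracy of heavy-flow detection.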
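In a possible implementation, the two-dimensional data structure is composed of multiple buckets, and the processing unit is specifically configured to:
merge the buckets at the same position in the two-dimensional data structures from the plurality of data collection devices to obtain the merged two-dimensional data structure.
It should be noted that the two-dimensional data structure may include d rows, each containing w buckets. The multiple two-dimensional data structures may share the same structure, that is, the same number of rows and the same number of buckets per row. Merging them may mean merging the buckets at the same position across the structures, where the same position may refer to the bucket in row i and column j of each structure.
In a possible implementation, any bucket in the two-dimensional data structures of the plurality of data collection devices includes three values: the total traffic in the current bucket, the key of the majority flow in the current bucket, and the counter value of the majority flow. Any bucket in the merged two-dimensional data structure includes the updated total traffic, the updated key of the majority flow, and the updated counter value of the majority flow. The buckets include a bucket at a first position, and the processing unit is specifically configured to:
obtain the updated total traffic in the bucket at the first position by summing the total traffic in that bucket across the two-dimensional data structures of the plurality of data collection devices;
obtain the updated key of the majority flow in the bucket at the first position by comparing the traffic sizes of the majority flows in that bucket across the multiple two-dimensional data structures;
obtain the updated counter value of the majority flow in the bucket at the first position from the updated key of the majority flow and the traffic sizes of the majority flows.
In the technical solution of this application, merging the two-dimensional data structures from multiple data collection devices does not simply add up the feature values in their buckets. When the flow information stored in a bucket is updated, the flows stored in the same-position buckets of the other two-dimensional data structures are compared. A reasonable estimate is then made to determine the majority flow of each merged bucket. This merging method reduces the memory requirements of the detection apparatus and saves resources.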
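In a possible implementation, the two-dimensional data structures in the plurality of data collection devices are N two-dimensional data structures. The keys of the majority flows in the bucket at the first position across the N structures form X keys, where X is a positive integer less than or equal to N.
The processing unit is specifically configured to:
determine, from the majority flows of the N two-dimensional data structures in the bucket at the first position, an estimate of the traffic size of the flow corresponding to each of the X keys;
determine, among the X keys, the flow with the largest estimated traffic size as the updated majority flow.
In a possible implementation, a first key is any one of the X keys. The estimated traffic size for the bucket at the first position of the i-th of the N two-dimensional data structures is obtained according to the following formulas:
if the key of the majority flow in the bucket at the first position of the i-th two-dimensional data structure is the first key, the estimated traffic size of the flow corresponding to the first key is $S_i(x) = (V_i + C_i)/2$;
if that key is not the first key, the estimated traffic size of the flow corresponding to the first key is $S_i(x) = (V_i - C_i)/2$.
Here $S_i(x)$ denotes the estimated traffic size of the flow corresponding to the first key, and x denotes the first key. $V_i$ denotes the total traffic of all flows in the bucket at the first position of the i-th two-dimensional data structure. $C_i$ denotes the counter value of the majority flow in that bucket.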
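In a possible implementation, the acquisition unit is specifically configured to acquire the two-dimensional data structures from the plurality of data collection devices at the end of each time period.
The processing unit is specifically configured to:
determine that a first flow is a heavy flow if it is detected, based on the merged two-dimensional data structure, that the change of the first flow between any two time periods is greater than a first threshold.
Optionally, the detection apparatus may acquire the two-dimensional data structures from the plurality of data collection devices periodically. That is, the data collection devices may periodically send the data structures that record flow information to the control device, and the period may be a preset time interval.
In a possible implementation, the processing unit is specifically configured to determine that a first flow is a heavy flow if it is detected, based on the merged two-dimensional data structure, that the total traffic size of the first flow is greater than a second threshold.
In a possible implementation, the two-dimensional data structure includes the majority vote data structure MV-Sketch.
Optionally, the two-dimensional data structure may be an MV-Sketch. In that case, each bucket of the MV-Sketch may include three feature values:
the total size of the flows stored in the current bucket, $V_{i,j}$;
the key of the majority flow in the current bucket, $K_{i,j}$, which identifies the majority flow, that is, a flow whose size exceeds 50% of the total traffic mapped to the current bucket;
the counter value of the majority flow in the current bucket, $C_{i,j}$.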
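According to a third aspect, an apparatus for detecting heavy flows is provided. The detection apparatus includes a memory, configured to store a program, and a processor, configured to execute the program stored in the memory. When the program stored in the memory is executed, the processor is configured to perform the detection method in the first aspect or any implementation of the first aspect.
It should be understood that the extensions, limitations, explanations, and descriptions of related content in the first aspect also apply to the same content in the third aspect.
According to a fourth aspect, a computer storage medium is provided. It stores program code, and the program code includes instructions for performing the steps of the detection method in the first aspect or any implementation of the first aspect.
The storage medium may specifically be a non-volatile storage medium.
According to a fifth aspect, a chip is provided. The chip includes a processor and a data interface, and the processor reads, through the data interface, instructions stored in a memory to perform the detection method in the first aspect or any implementation of the first aspect.
Optionally, in an implementation, the chip may further include a memory storing instructions. The processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor performs the detection method in the first aspect or any implementation of the first aspect.
The chip may specifically be a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).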
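Brief Description of Drawings
FIG. 1 is a schematic diagram of a two-dimensional data structure according to an embodiment of this application;
FIG. 2 is a schematic diagram of an application scenario according to an embodiment of this application;
FIG. 3 is a schematic diagram of a system architecture according to an embodiment of this application;
FIG. 4 is a schematic flowchart of a method for detecting heavy flows according to an embodiment of this application;
FIG. 5 is a schematic flowchart of a method for detecting heavy flows according to an embodiment of this application;
FIG. 6 is a schematic diagram of merging multiple two-dimensional data structures according to an embodiment of this application;
FIG. 7 is a schematic diagram of detection results on a public network traffic data set according to an embodiment of this application;
FIG. 8 is a schematic diagram of detection results on a public network traffic data set according to an embodiment of this application;
FIG. 9 is a schematic block diagram of a detection apparatus according to an embodiment of this application;
FIG. 10 is a schematic block diagram of a detection apparatus according to an embodiment of this application.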
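Description of Embodiments
The following describes the technical solutions in the embodiments of this application with reference to the accompanying drawings. The described embodiments are merely some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative efforts shall fall within the protection scope of this application.
It should be understood that in the embodiments of this application, “first”, “second”, “third”, and so on are used only to refer to different objects and do not impose any other limitation on those objects.
To help in understanding the flow measurement method in the embodiments of this application, some related basic concepts are briefly described first.
1. Heavy flow
Heavy flow is a collective term for heavy hitters and heavy changers. A heavy hitter is a network flow whose value, measured in packets, bytes, connections, or similar units, exceeds expectations, that is, exceeds a certain threshold. A heavy changer is a network flow whose value, measured in the same kinds of units, changes very sharply within a short period.
2. Data structure (sketch)
A sketch usually refers to a two-dimensional table data structure consisting of several rows, each made up of several buckets.
For example, in MV-Sketch each bucket may include three elements: the total traffic in the current bucket, the key of the majority flow in the current bucket, and the total traffic counter of that majority flow. The majority flow is a flow whose size exceeds 50% of the total traffic mapped to the current bucket.
For example, FIG. 1 is a schematic diagram of an MV-Sketch. It is a d×w two-dimensional data structure with d rows, each containing w buckets, and each bucket contains three elements.
Take the j-th bucket in the i-th row, B(i,j), as an example. B(i,j) contains three elements:
$V_{i,j}$, the total traffic of all flows in the current bucket;
$K_{i,j}$, the key of the majority flow in the current bucket, that is, its identifier, where the majority flow is a flow whose size exceeds 50% of the total traffic mapped to the current bucket;
$C_{i,j}$, a counter that counts the majority flow in the bucket.
As another example, in LD-Sketch each bucket may include four elements: the total traffic in the current bucket, the maximum error of the bucket's traffic estimate, an associated queue (recording the traffic estimates of some of the objects in the bucket), and the length of the associated queue.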
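Currently, with the rapid development of communication networks, network flows have grown explosively, leading to frequent network anomalies. In anomalous-traffic detection, two types of abnormal traffic deserve particular attention. The first is flows with very large traffic volume, also called heavy hitters. The second is flows whose traffic volume changes greatly within a certain period, also called heavy changers. Together they are called heavy flows.
Most current heavy-flow detection algorithms deploy multiple two-dimensional tables at one device node in the network, store information about all flows in these tables, and then detect abnormal flows. However, flows in a network may be distributed across multiple devices, so detection on a single device may find only the heavy flows on that device. A flow that is small on each individual device may form a heavy flow when the devices are considered together, and such a flow may be missed.
In view of this, this application proposes a heavy-flow detection method. A control device acquires the two-dimensional data structures from a plurality of data collection devices, each storing information about the flows in the network captured by the corresponding device. The control device merges these structures into a merged two-dimensional data structure and, based on it, detects heavy flows, namely network-wide heavy flows in the network. The heavy-flow detection method provided in the embodiments of this application therefore avoids missing network-wide heavy flows and can detect them, thereby improving the accuracy of heavy-flow detection.
The embodiments of this application are described in detail below with reference to specific examples. These examples are only intended to help a person skilled in the art better understand the embodiments of this application, not to limit their scope.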
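FIG. 2 is a schematic diagram of an application scenario according to an embodiment of this application.
As shown in FIG. 2, a system 100 may include a control device 101 and a plurality of data collection devices 102 (for example, single-node devices through which flows pass). At the end of a preset time period, the control device 101 may acquire the two-dimensional data structures from the data collection devices 102 in the system and perform flow detection, thereby detecting network-wide heavy flows.
A data collection device 102 may be any device that captures flows, for example, a gateway or another device that collects network flows. It may include any computing device known in the art, such as a server or a desktop computer.
A data collection device 102 may include a memory and a processor. The memory may store program code, for example, an operating system and other applications. The processor may invoke the program code stored in the memory to implement the corresponding functions of the node. The processor and the memory included in the node may be implemented by a chip, which is not specifically limited here.
The control device 101 may periodically acquire, from the data collection devices 102, the two-dimensional data structures used to record flows. In this way it determines the traffic sizes of the flows passing through each data collection device 102 in the network.
FIG. 3 is a schematic diagram of a system architecture according to an embodiment of this application.
For example, the system architecture of the flow detection method provided in the embodiments of this application may include a flow collection module 210, a flow processing module 220, and a flow anomaly detection module 230. The flow processing module 220 may in turn include a local update module 221, a global merge module 222, and a flow size estimation module 223.
The flow collection module 210 collects traffic at a gateway or another device that collects network flows. It extracts each flow's five-tuple (source address, source port, destination address, destination port, protocol) together with the flow size; the five-tuple uniquely identifies the flow.
The flow processing module 220 deploys a two-dimensional data structure at each gateway (for example, a data collection device) and maps and stores the collected flows into it, for example using a majority vote algorithm. The two-dimensional data structures of the multiple data collection devices are then merged to obtain the aggregated traffic of the flows to be queried in the estimation table. This aggregate is used later to detect whether a given flow is anomalous across the whole network.
For example, the flow processing module 220 may include a local update module 221, a global merge module 222, and a flow size estimation module 223.
The local update module 221 works on the two-dimensional data structure deployed at each data collection device. It maps flows into that structure through hash functions, building a hash index over the flows. Each cell (bucket) of the two-dimensional data structure may contain a flow identifier, the sum of the sizes of all flows mapped to that cell, and a counter.
The global merge module 222 merges the two-dimensional data structures from all data collection devices so that network-wide flow statistics can be obtained.
It should be noted that the same flow passing through different data collection devices may map to different positions in the two-dimensional data structures at those devices. For this reason, merging the structures from different devices is not a simple addition of the flows in same-position cells. Updating a cell's key, the sum of the sizes of all flows mapped to it, and its counter requires comparing the corresponding cells in the structures at the other data collection devices and then making a reasonable estimate.
The flow size estimation module 223 detects network-wide flows. Because the two-dimensional data structure is of limited, pre-estimated size, multiple flows may map to the same cell. By comparing the keys in all cells, the module estimates the network-wide total size of a given flow for the subsequent anomaly detection.
The flow anomaly detection module 230 performs anomaly detection at the end of each time period based on the flow estimates and determines whether a given flow is anomalous. The time period may be a preset time interval.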
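The five-tuple key from the flow collection module and the per-row hashing done by the local update module can be sketched as follows. The helper names, and the choice of salted BLAKE2b as the row hash, are hypothetical. This sketch also assumes every data collection device uses the same width and hash salts, so that a given (row, column) index refers to the same bucket position on every device and buckets can later be merged position by position.

```python
from __future__ import annotations

import hashlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    """Flow key: uniquely identifies a flow, as described for module 210."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


def bucket_index(key: FiveTuple, row: int, width: int) -> int:
    """Column j = h_row(key) mod width, using one salted hash per row (hypothetical)."""
    salt = row.to_bytes(16, "big")  # blake2b accepts salts of up to 16 bytes
    digest = hashlib.blake2b(repr(key).encode(), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big") % width


def bucket_positions(key: FiveTuple, depth: int, width: int) -> list[tuple[int, int]]:
    """One (row, column) position per row of a depth x width sketch."""
    return [(i, bucket_index(key, i, width)) for i in range(depth)]
```

Because the hashes are deterministic, two devices that build their sketches with the same `depth`, `width`, and salts map a given five-tuple to the same buckets.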
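The following describes, with reference to FIG. 4, the flow detection method of the embodiments of this application in a system that includes a control device and data collection devices.
FIG. 4 is a schematic flowchart of a method for detecting heavy flows according to an embodiment of this application. The data collection device may be any one of the multiple data collection devices in the system, for example, any data collection device in FIG. 2.
In the embodiments of this application, the system may include multiple data collection devices and one control device. The control device processes the two-dimensional data structures, provided by the data collection devices, into which flows have been mapped, and thereby obtains network-wide flow detection results.
Step 310: The control device may acquire the two-dimensional data structures from the multiple data collection devices.
The two-dimensional data structure stores information about the flows in the network captured by the corresponding data collection device.
It should be understood that any one of these two-dimensional data structures may store information about the network flows captured by one of the data collection devices. For example, the flow information may include the keys and sizes of the captured flows.
It should also be understood that the network may be any object under detection, and may be a network composed of one or more devices.
Optionally, in a possible implementation, the control device may periodically acquire the two-dimensional data structures from the multiple data collection devices.
For example, at the end of each time period, each data collection device may send its two-dimensional data structure to the control device, where the time period may be a preset time interval.
Step 320: The control device merges the two-dimensional data structures from the multiple data collection devices to obtain a merged two-dimensional data structure.
The merged two-dimensional table structure may be used to store information about the flows in the network.
Optionally, in a possible implementation, the two-dimensional data structure is composed of multiple buckets. In this case, merging the structures may include merging the buckets at the same position in the two-dimensional data structures from the multiple data collection devices to obtain the merged structure.
For example, the two-dimensional data structure may be as shown in FIG. 1, with d rows of w buckets each. The multiple two-dimensional data structures may share the same structure, that is, the same number of rows and the same number of buckets per row. Merging them may mean merging the buckets at the same position across the structures.
For example, as shown in FIG. 1, the two-dimensional data structure may be an MV-Sketch. Each bucket of the MV-Sketch may then include three feature values:
the total size of the flows stored in the current bucket, $V_{i,j}$;
the key of the majority flow in the current bucket, $K_{i,j}$, which identifies the majority flow, that is, a flow whose size exceeds 50% of the total traffic mapped to the current bucket;
the counter value of the majority flow in the current bucket, $C_{i,j}$.
Merging multiple two-dimensional data structures requires updating the feature values in each bucket. Any bucket in the merged structure therefore contains three feature values: the updated total traffic, the updated key of the majority flow, and the updated counter value of the majority flow.
Optionally, in a possible implementation, merging the buckets at a first position in the two-dimensional data structures of the multiple data collection devices may include:
summing the total traffic in the bucket at the first position across these structures to obtain the updated total traffic in that bucket;
comparing the traffic sizes of the majority flows in the bucket at the first position across these structures to obtain the updated key of the majority flow in that bucket;
obtaining the updated counter value of the majority flow in that bucket from the updated key of the majority flow and the traffic sizes of the majority flows.
It should be understood that, in the embodiments of this application, the merge algorithm does not simply add up the feature values in the buckets of the two-dimensional data structures of the multiple data collection devices. When the flow information stored in a bucket is updated, the flows stored in the same-position buckets of the other two-dimensional data structures are compared. A reasonable estimate is then made to determine the majority flow of each merged bucket. This merging method reduces the memory requirements of the control device and saves resources.
Optionally, in a possible implementation, the two-dimensional data structures in the multiple data collection devices are N two-dimensional data structures. The keys of the majority flows in the bucket at the first position across the N structures form X keys, where X is a positive integer less than or equal to N. Obtaining the updated key of the majority flow in the bucket at the first position by comparing the traffic sizes of the majority flows in that bucket includes:
determining, from the majority flows of the N structures in the bucket at the first position, an estimate of the traffic size of the flow corresponding to each of the X keys;
determining, among the X keys, the flow with the largest estimated traffic size as the updated majority flow.
For example, a first key is any one of the X keys. The estimated traffic size for the bucket at the first position of the i-th of the N two-dimensional data structures may be obtained according to the following formulas:
if the key of the majority flow in the bucket at the first position of the i-th two-dimensional data structure is the first key, the estimated traffic size of the flow corresponding to the first key is $S_i(x) = (V_i + C_i)/2$;
if that key is not the first key, the estimated traffic size of the flow corresponding to the first key is $S_i(x) = (V_i - C_i)/2$.
Here $S_i(x)$ denotes the estimated traffic size of the flow corresponding to the first key, and x denotes the first key. $V_i$ denotes the total traffic of all flows in the bucket at the first position of the i-th two-dimensional data structure. $C_i$ denotes the counter value of the majority flow in that bucket.
It should be noted that the bucket at the first position may be the bucket B(i,j) in row i and column j of the two-dimensional data structure.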
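Suppose the K value in the same-position bucket of the M-th table matches the K value of the current bucket. Then, after merging, the estimated total size of the majority flow mapped to the current bucket position receives the term
$$\frac{V^{M}_{i,j} + C^{M}_{i,j}}{2}$$
In this term, $V^{M}_{i,j}$ denotes the sum of all flows in bucket B(i,j) of the M-th of the q two-dimensional tables, and $C^{M}_{i,j}$ denotes the majority-flow counter of bucket B(i,j) in the M-th of the q tables.
If the K value in the same-position bucket of the M-th table does not match the K value of the current bucket, the term added to the estimated size of the majority flow mapped to the current bucket position after merging is
$$\frac{V^{M}_{i,j} - C^{M}_{i,j}}{2}$$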
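Step 330: The control device detects heavy flows based on the merged two-dimensional data structure.
The heavy flows detected using the merged two-dimensional data structure may be network-wide heavy flows in the network. For example, a network-wide heavy flow may be a flow that is not detected as a heavy flow by any single data collection device but is detected as one once its traffic across many devices is combined.
In a possible implementation, the merged two-dimensional table may be queried with the key of a given flow to estimate the total traffic size or the change of that flow.
In a possible implementation, the flows may be polled one by one according to the keys in the buckets of the merged two-dimensional table, querying the total traffic size or the change of each flow.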
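Both query modes can be sketched in Python over a merged sketch whose buckets are hypothetical `(V, K, C)` tuples. Taking the minimum of the per-row estimates is an assumption borrowed from Count-Min-style sketches; the text only states that the merged table is queried by key or polled bucket by bucket. The salted BLAKE2b row hash is likewise a hypothetical choice that must match the one used when the sketches were built.

```python
from __future__ import annotations

import hashlib
from typing import Hashable


def col(key: Hashable, row: int, width: int) -> int:
    """Hypothetical per-row hash (salted blake2b), shared by all devices."""
    h = hashlib.blake2b(repr(key).encode(), digest_size=8, salt=row.to_bytes(16, "big"))
    return int.from_bytes(h.digest(), "big") % width


def query(sketch: list[list[tuple]], key: Hashable) -> float:
    """Point query: per-row estimate (V+C)/2 or (V-C)/2, then the minimum over rows."""
    width = len(sketch[0])
    per_row = []
    for i, row in enumerate(sketch):
        V, K, C = row[col(key, i, width)]
        per_row.append((V + C) / 2 if K == key else (V - C) / 2)
    return min(per_row)


def poll(sketch: list[list[tuple]]) -> dict:
    """Polling: query every majority key stored anywhere in the merged sketch."""
    keys = {K for row in sketch for (_, K, _) in row if K is not None}
    return {k: query(sketch, k) for k in keys}
```

The polled estimates can then be compared with the second threshold, or compared across two periods against the first threshold.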
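Optionally, in a possible implementation, the control device acquires the two-dimensional data structures from the multiple data collection devices at the end of each time period. The control device detecting heavy flows based on the merged two-dimensional data structure then includes:
if the control device detects, based on the merged structure, that the change of a first flow between any two time periods is greater than a first threshold, determining that the first flow is a heavy flow, that is, a network-wide heavy flow in the network.
Optionally, in a possible implementation, the control device detecting heavy flows based on the merged two-dimensional data structure includes:
if the control device detects, based on the merged structure, that the total traffic size of a first flow is greater than a second threshold, determining that the first flow is a heavy flow, that is, a network-wide heavy flow in the network.
For example, the two-dimensional data structure may be an MV-Sketch, an LD-Sketch, or another sketch structure, which is not limited in this application.
In the heavy-flow detection method provided by this application, the control device acquires the two-dimensional data structures from multiple data collection devices. Each structure may store information about the flows in the network captured by the corresponding device. The control device merges these structures into a merged two-dimensional data structure and, based on it, detects heavy flows, namely network-wide heavy flows in the network. The heavy-flow detection method provided in the embodiments of this application thus avoids missing network-wide heavy flows and can detect them, thereby improving the accuracy of heavy-flow detection.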
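FIG. 5 is a schematic flowchart of a flow detection method according to an embodiment of this application. The detection method shown in FIG. 5 includes steps 401 to 407, which are described in detail below.
Step 401: Start.
Step 402: The data collection device captures flows in the network.
It should be understood that the network may be any object under detection, and may be a network composed of one or more devices.
The data collection device may be any data collection device in the network shown in FIG. 2, and is used to capture flows in the network.
Step 403: The data collection device records information about the captured flows in its data structure using a local update algorithm.
For example, the data table structure may be an MV-Sketch, an LD-Sketch, or another two-dimensional data structure.
Take an MV-Sketch as the data table structure, and assume it consists of r rows with w buckets each. When the data collection device captures a flow (or packet) in the network, it may use r independent hash functions to map the flow into rows 1 to r. The bucket index j in row i is determined by the hash value $h_i(x)$. After the flow is hashed into one bucket in each row of the two-dimensional data structure, the majority flow of that bucket is updated according to the majority vote algorithm.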
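For example, the majority flow is a flow accounting for more than 50% of the total traffic in the current bucket. Suppose there are three candidate flows A, B, and C, and votes are cast in the order AAACCBBCCCBCC. After the 3rd vote is recorded, flow A leads with 3 votes. While the next three votes are processed, the three votes for A are cancelled by the three other votes (CCB). Finally, after all votes are recorded, flow C becomes the majority flow.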
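The voting example above is an instance of the Boyer-Moore majority vote algorithm, and a minimal Python version reproduces its trace. The returned candidate is guaranteed to be the majority only if a majority actually exists; in this sequence C holds 7 of the 13 votes. The function name is illustrative.

```python
def majority_vote(votes):
    """Boyer-Moore majority vote: returns (candidate, counter) after all votes."""
    candidate, count = None, 0
    for v in votes:
        if count == 0:
            candidate, count = v, 1   # no current lead: v becomes the candidate
        elif v == candidate:
            count += 1                # a vote for the candidate strengthens its lead
        else:
            count -= 1                # a vote for another flow cancels one vote
    return candidate, count


if __name__ == "__main__":
    print(majority_vote("AAA"))            # A leads with 3 votes
    print(majority_vote("AAACCB"))         # A's 3 votes cancelled by C, C, B
    print(majority_vote("AAACCBBCCCBCC"))  # C ends as the majority candidate
```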
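Further, after a data collection device captures new flows, its existing two-dimensional data structure needs to be updated. The update procedure is invoked for every arriving flow (object X, value $V_X$) and takes (X, $V_X$) as input. The update of the two-dimensional data structure is described below.
For each row of the two-dimensional data structure, the row's hash function maps X to one bucket in that row. The information in that bucket is then updated, that is, its three elements $V_{i,j}$, $K_{i,j}$, and $C_{i,j}$, as shown in FIG. 1.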
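Take the j-th bucket in the i-th row, B(i,j), as an example. Before the update, B(i,j) contains three elements:
$V_{i,j}$, the total traffic of all flows in the bucket;
$K_{i,j}$, the key of the majority flow in the bucket, that is, its identifier, where the majority flow is a flow whose size exceeds 50% of the total traffic mapped to the bucket;
$C_{i,j}$, a counter representing the total traffic of the majority flow in the bucket.
Denote the three elements of B(i,j) after the update by $V'_{i,j}$, $K'_{i,j}$, and $C'_{i,j}$. The local update algorithm proceeds as follows:
Step 1: The total traffic of all flows in B(i,j) after the update equals the total before the update plus $V_X$, that is, $V'_{i,j} = V_{i,j} + V_X$.
Step 2: If flow X is the majority flow of B(i,j) and X was already in the bucket before the update, go to step 3. If X is not the majority flow of B(i,j), or X was not previously in the bucket, go to step 4.
Step 3: Update the majority flow's total traffic: $C'_{i,j} = C_{i,j} + V_X$; return.
Step 4: Update the majority flow's total traffic: $C'_{i,j} = C_{i,j} - V_X$. If $C'_{i,j} < 0$ after step 4, go to step 5; otherwise, return.
Step 5: Update the key and total traffic of the majority flow in the bucket: $K'_{i,j} = X$ and $C'_{i,j} = -C'_{i,j}$; return.
With the local update algorithm above, whenever a data collection device captures a new flow, it can update the data structure on that device that maps and stores flow information.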
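Steps 1 to 5 translate directly into code. The following is a sketch under stated assumptions, not a reference implementation: the `MVSketch` class, the salted BLAKE2b row hash, and integer flow sizes are illustrative choices. An empty bucket is modeled with key `None`, so the first flow to arrive takes it over via steps 4 and 5.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from typing import Hashable, Optional


@dataclass
class Bucket:
    V: int = 0                      # total traffic of all flows mapped here
    K: Optional[Hashable] = None    # key of the current majority candidate
    C: int = 0                      # majority counter


def update_bucket(b: Bucket, x: Hashable, vx: int) -> None:
    """Local update of one bucket with flow x of size vx."""
    b.V += vx                       # step 1: V' = V + V_X
    if b.K == x:                    # step 2 -> step 3: x is already the majority
        b.C += vx
    else:                           # step 2 -> step 4: vote against the majority
        b.C -= vx
        if b.C < 0:                 # step 5: x takes over the bucket
            b.K, b.C = x, -b.C


class MVSketch:
    """A d x w sketch; each flow updates one bucket per row (hypothetical hash)."""

    def __init__(self, depth: int, width: int):
        self.depth, self.width = depth, width
        self.rows = [[Bucket() for _ in range(width)] for _ in range(depth)]

    def _col(self, x: Hashable, i: int) -> int:
        h = hashlib.blake2b(repr(x).encode(), digest_size=8, salt=i.to_bytes(16, "big"))
        return int.from_bytes(h.digest(), "big") % self.width

    def update(self, x: Hashable, vx: int) -> None:
        for i in range(self.depth):
            update_bucket(self.rows[i][self._col(x, i)], x, vx)
```

Feeding the voting sequence AAACCBBCCCBCC with unit sizes into a single bucket ends with V = 13, K = C, and counter C = 3, which agrees with the majority vote example.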
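In one example, a data collection device may record information about captured flows in an MV-Sketch. After capturing a new flow, the device may update its MV-Sketch using the local update algorithm above.
In another example, a data collection device may record information about captured flows in an LD-Sketch. After capturing a new flow, the device may update its LD-Sketch using a local update algorithm.
Step 404: The control device acquires the data structures on the data collection devices and merges them to obtain a merged data structure.
It should be understood that the data structure on each data collection device records information about the flows captured by that single-point device. Network-wide flows, that is, flows passing through multiple single-point devices, cannot be detected this way. In the embodiments of this application, merging the data structures from multiple data collection devices produces a data structure that can be used to detect network-wide flows, thereby avoiding missed detection of network-wide heavy flows.
For example, the control device may periodically acquire the data structures from the data collection devices. That is, the data collection devices may periodically send the data structures that record flow information to the control device, and the period may be a preset time interval.
The following describes in detail how the control device merges the data structures acquired from multiple data collection devices. The control device combines the data structures from the individual single-point devices into one data structure that records network-wide flows, that is, the flows passing through the single-point devices.
For example, in the embodiments of this application, the control device may merge the acquired data structures using a global merge algorithm. Merging means combining the buckets at the same position in the multiple data structures, that is, updating the flow information recorded in those buckets. For an MV-Sketch, for example, the $V_{i,j}$, $K_{i,j}$, and $C_{i,j}$ of each merged bucket need to be updated.
For example, suppose the control device acquires q data structures. It may merge the buckets at the same position across the q structures to update the information they hold and obtain the merged data structure. The global merge algorithm for merging q MV-Sketch two-dimensional tables proceeds as follows: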
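Step 1: Update the total traffic in the bucket, that is, sum all flow traffic mapped to this bucket across the q two-dimensional tables.
For example, the V values of the buckets B(i,j) at the same position in all q tables are added to obtain the V value of bucket B(i,j) in the merged table:
$$V_{i,j} = \sum_{M=1}^{q} V^{M}_{i,j}$$
Here $V^{M}_{i,j}$ denotes the sum of all flows in bucket B(i,j) of the M-th of the q two-dimensional tables, and M is an integer.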
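Step 2: Update the key of the majority flow in the bucket: compare the key in the current bucket with the keys in the buckets at the same position in the other two-dimensional tables, and update the key of the current bucket.
For example, for a candidate key K: if the key in bucket B(i,j) of the M-th table equals K, that table contributes the following amount to the estimated size of the majority flow K mapped to the merged bucket:
$$\frac{V_{i,j}^{M}+C_{i,j}^{M}}{2}$$
where $V_{i,j}^{M}$ denotes the total of all flows in bucket B(i,j) of the M-th of the q two-dimensional tables, and $C_{i,j}^{M}$ denotes the counter of the majority flow in bucket B(i,j) of the M-th table. If the key in bucket B(i,j) of the M-th table differs from K, that table contributes:
$$\frac{V_{i,j}^{M}-C_{i,j}^{M}}{2}$$
The estimate est(K) is the sum of these contributions over the q tables.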
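Step 3: Compare the estimates of the possible majority flows in bucket B(i,j) across the q two-dimensional tables; the key $K_{i,j}$ of the current bucket B(i,j) takes the key of the flow with the largest estimate.
Step 4: Update the key K of bucket B(i,j) in the merged two-dimensional table accordingly.
Step 5: Update $C_{i,j}$ of bucket B(i,j) in the merged table, i.e. the counter of the majority flow in bucket B(i,j), where $C_{i,j}=\max\{2\,\mathrm{est}(K_{i,j})-V_{i,j},\,0\}$.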
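The global merge steps can be sketched in Python as follows, assuming each bucket is a `(V, K, C)` triple and that all q tables share the same dimensions and hash functions, which bucket-wise merging requires. `merge_buckets` and `merge_tables` are hypothetical names. To stay in integers, the sketch works with twice the estimate, $2\,\mathrm{est}(K)$, and it breaks ties between equal estimates by key order, a choice the text does not specify.

```python
from typing import List, NamedTuple, Optional


class Bucket(NamedTuple):
    V: int            # total traffic of all flows mapped to the bucket
    K: Optional[str]  # key of the candidate majority flow (None if empty)
    C: int            # counter of the candidate majority flow


def merge_buckets(buckets: List[Bucket]) -> Bucket:
    """Merge the q buckets found at the same position B(i, j) in q sketches."""
    total = sum(b.V for b in buckets)  # step 1: V = sum of V^M
    candidates = sorted({b.K for b in buckets if b.K is not None})
    if not candidates:
        return Bucket(total, None, 0)

    def twice_est(key: str) -> int:
        # 2 * est(key): each table adds V^M + C^M if its key matches, else V^M - C^M.
        return sum(b.V + b.C if b.K == key else b.V - b.C for b in buckets)

    best = max(candidates, key=twice_est)  # steps 2-4: key with the largest estimate
    return Bucket(total, best, max(twice_est(best) - total, 0))  # step 5


def merge_tables(tables: List[List[List[Bucket]]]) -> List[List[Bucket]]:
    """Merge q tables of identical shape (r rows x w buckets) bucket by bucket."""
    rows, width = len(tables[0]), len(tables[0][0])
    return [[merge_buckets([t[i][j] for t in tables]) for j in range(width)]
            for i in range(rows)]


merged = merge_buckets([Bucket(10, "X", 6), Bucket(8, "Y", 2), Bucket(12, "X", 4)])
print(merged)  # Bucket(V=30, K='X', C=8)
```

In the usage line, the three hypothetical buckets give $2\,\mathrm{est}(X)=38$ and $2\,\mathrm{est}(Y)=22$, so the merged bucket keeps X with counter $38-30=8$.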
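With this global merge algorithm, the control device can merge the q two-dimensional tables obtained from multiple data collection devices into one two-dimensional table that records network-wide flows, i.e. the flows passing through the individual devices.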
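The following example illustrates how q two-dimensional tables are merged into one table with the global merge algorithm above. Suppose bucket (1,1) of three two-dimensional tables is being merged. The keys in bucket (1,1) of the three tables take at most three distinct values X, Y, and Z: the three keys may all differ, or some or all of them may be the same. For example, if the key in bucket (1,1) of the first table is X, the first table's estimate of the size of the flow with key X is
$$e_1(X)=\frac{V_{1,1}^{1}+C_{1,1}^{1}}{2}$$
If the key in bucket (1,1) of the first table is not X, the first table's estimate of the size of the flow with key X is
$$e_1(X)=\frac{V_{1,1}^{1}-C_{1,1}^{1}}{2}$$
Next, check whether the key in bucket (1,1) of the second table is X. If it is, merging bucket (1,1) of the first and second tables gives the following estimate of the size of the flow with key X:
$$e_2(X)=e_1(X)+\frac{V_{1,1}^{2}+C_{1,1}^{2}}{2}$$
If the key in bucket (1,1) of the second table is not X, merging bucket (1,1) of the first and second tables gives the following estimate of the total size of the flow with key X:
$$e_2(X)=e_1(X)+\frac{V_{1,1}^{2}-C_{1,1}^{2}}{2}$$
Then check whether the key in bucket (1,1) of the third table is X. If it is, merging bucket (1,1) of the first, second, and third tables gives the following estimate of the total size of the flow with key X:
$$e_3(X)=e_2(X)+\frac{V_{1,1}^{3}+C_{1,1}^{3}}{2}$$
If the key in bucket (1,1) of the third table is not X, merging bucket (1,1) of the first, second, and third tables gives the following estimate of the size of the flow with key X:
$$e_3(X)=e_2(X)+\frac{V_{1,1}^{3}-C_{1,1}^{3}}{2}$$
Similarly, the estimates of the total size of the flows with keys Y and Z are computed. The key of bucket (1,1) in the merged table is the key of the flow with the largest estimate, and the counter of the merged bucket (1,1) is $C_{1,1}=\max\{2e_3(K)-V_{1,1},\,0\}$, where $e_3(K)$ is the largest of $e_3(X)$, $e_3(Y)$, and $e_3(Z)$, and $V_{1,1}$ is the total of all flows in bucket (1,1) summed over the three tables. That is, the merged bucket (1,1) holds
$$V_{1,1}=V_{1,1}^{1}+V_{1,1}^{2}+V_{1,1}^{3}$$
i.e. the sum of all flows in bucket (1,1) across the three two-dimensional tables.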
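As a hypothetical numeric illustration (values invented for this example, not taken from the source), suppose bucket (1,1) holds $(V, K, C) = (10, X, 6)$ in the first table, $(8, Y, 2)$ in the second, and $(12, X, 4)$ in the third, so only X and Y are candidates. Accumulating the per-table estimates as described gives:

```latex
\begin{aligned}
e_1(X) &= \tfrac{10+6}{2} = 8,        &\quad e_1(Y) &= \tfrac{10-6}{2} = 2,\\
e_2(X) &= 8 + \tfrac{8-2}{2} = 11,    &\quad e_2(Y) &= 2 + \tfrac{8+2}{2} = 7,\\
e_3(X) &= 11 + \tfrac{12+4}{2} = 19,  &\quad e_3(Y) &= 7 + \tfrac{12-4}{2} = 11,\\
V_{1,1} &= 10 + 8 + 12 = 30,          &\quad C_{1,1} &= \max\{2 \cdot 19 - 30,\ 0\} = 8.
\end{aligned}
```

Since $e_3(X) = 19 > e_3(Y) = 11$, the merged bucket (1,1) holds key X, counter 8, and total 30.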
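In one possible implementation, one two-dimensional table is deployed in each data collection device, and the device records flow information in that table using the local update algorithm of step 403.
For example, as shown in FIG. 6, each data collection device records the information of the flows it obtains in a two-dimensional data structure; the control device obtains the two-dimensional data structures of multiple data collection devices and merges them with the global merge algorithm above into one two-dimensional data structure that records network-wide flows. The data collection devices may belong to the same network or to different networks.
In another possible implementation, multiple two-dimensional tables may be deployed in one data collection device. The device records the information of the flows it obtains evenly across these tables and can then merge its tables into one with the global merge algorithm above; the control device obtains the merged table of each data collection device and merges these again, finally obtaining a two-dimensional table that records network-wide flows, i.e. the flows passing through the individual devices.
For example, the two-dimensional table may be an MV-Sketch, an LD-Sketch, or another sketch structure; this application imposes no limitation on this.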
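Step 405: Estimate flow sizes.
For example, the size of a given flow can be estimated from the merged two-dimensional table obtained in step 404.
In one possible implementation, the merged table is queried with the key of a given flow to estimate the total traffic, or the change in traffic, of the flow with that key.
In another possible implementation, the flows identified by the keys in the buckets of the merged table are polled one by one to query the total traffic or the change in traffic of each flow.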
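For example, the merged two-dimensional data structure can be used to estimate the total traffic of a flow. Flow X is mapped to one bucket in each of rows 1 to d of the merged structure, and its size can be estimated with the following estimation algorithm:
Step 1: To query the traffic of flow X: if the key in the current bucket B(i,j) of the merged table equals the key of flow X, the estimate of the total traffic of X in that bucket is $S_i(x)=(V_{i,j}+C_{i,j})/2$, where $V_{i,j}$ is the total traffic of all flows in bucket B(i,j) of the merged table and $C_{i,j}$ is the counter of the majority flow in that bucket. If the key in the current bucket differs from the key of flow X, the estimate is $S_i(x)=(V_{i,j}-C_{i,j})/2$.
Step 2: The estimate of the total traffic of flow X is $S(x)=\min\{S_i(x) : 1\le i\le d\}$.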
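The two-step query can be sketched in Python as below, assuming the caller has already located, for each of the d rows, the bucket that flow X hashes to. `Bucket`, `row_estimate`, and `query` are hypothetical names, and the bucket values are made up.

```python
from typing import List, NamedTuple, Optional


class Bucket(NamedTuple):
    V: int            # total traffic of all flows in the bucket
    K: Optional[str]  # key of the majority flow
    C: int            # counter of the majority flow


def row_estimate(b: Bucket, x: str) -> float:
    """Step 1: S_i(x) = (V + C) / 2 if the bucket's key is x, else (V - C) / 2."""
    return (b.V + b.C) / 2 if b.K == x else (b.V - b.C) / 2


def query(buckets_of_x: List[Bucket], x: str) -> float:
    """Step 2: S(x) is the minimum of S_i(x) over the d rows."""
    return min(row_estimate(b, x) for b in buckets_of_x)


# Hypothetical buckets that flow "X" hashes to in a merged table with d = 3 rows.
rows = [Bucket(30, "X", 8), Bucket(22, "X", 14), Bucket(50, "Y", 10)]
print(query(rows, "X"))  # 18.0
```

The per-row estimates are 19.0, 18.0, and 20.0; taking the minimum keeps the row least inflated by colliding flows.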
Step 406: Detect abnormal flows.
Abnormal-flow detection determines whether flow X from step 405 is a heavy hitter or a heavy changer, i.e. whether flow X is a heavy flow.
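For illustration, given a threshold $\phi$ ($0<\phi<1$), let S denote the total volume of all flows in one time period, and let D denote the difference in the total volume of all flows between two time periods, i.e. the change. Whether flow X is a heavy hitter or a heavy changer can be determined as follows:
1. If $S(x) > \phi S$, flow X is a heavy hitter;
2. If $D(x) > \phi D$, flow X is a heavy changer, where D(x) denotes the difference of S(x) for flow X between two time periods, i.e. the change of flow X over the two time periods.
If flow X satisfies condition 1 or 2, it can be determined to be a network-wide heavy flow and can then be monitored further.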
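A sketch of the two checks, assuming the thresholds are expressed as one fraction `phi` of the period total S and of the total change D, and taking D(x) as the absolute difference of the two per-period estimates; both are assumptions, and the function names and numbers are hypothetical.

```python
def is_heavy_hitter(s_x: float, s_total: float, phi: float) -> bool:
    """Condition 1: the estimated size S(x) exceeds the fraction phi of the period total S."""
    return s_x > phi * s_total


def is_heavy_changer(s_x_prev: float, s_x_curr: float, d_total: float, phi: float) -> bool:
    """Condition 2: the change D(x) between two periods exceeds the fraction phi of D."""
    d_x = abs(s_x_curr - s_x_prev)  # assumed: D(x) as an absolute difference
    return d_x > phi * d_total


# Hypothetical numbers: S = 1000, D = 500, phi = 0.01.
print(is_heavy_hitter(18.0, 1000.0, 0.01))       # True
print(is_heavy_changer(18.0, 3.0, 500.0, 0.01))  # True
```

Here the thresholds work out to 10 and 5, so a flow of size 18 is a heavy hitter and a change of 15 marks it as a heavy changer.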
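Step 407: End.
It should be understood that the examples above are intended to help a person skilled in the art understand the embodiments of this application, not to limit the embodiments to the specific values or scenarios illustrated. A person skilled in the art can obviously make various equivalent modifications or changes based on these examples, and such modifications or changes also fall within the scope of the embodiments of this application.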
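FIG. 7 is a schematic diagram of detection results for heavy hitters on a public network traffic dataset according to an embodiment of this application; FIG. 8 is a schematic diagram of detection results for heavy changers on a public network traffic dataset according to an embodiment of this application.
In the detection results shown in FIG. 7 and FIG. 8, the dataset spans 5 minutes, with each 1-minute interval forming one time period; each period contains about 29M packets and 1M flows, and the memory size ranges from 64 KB to 4 MB. The evaluation metrics are: precision, the fraction of the flows reported as heavy that are actually heavy; recall, the fraction of all actual heavy flows that are reported; F1 score, which combines precision and recall into an overall measure; and relative error, the relative error of the flow-size estimates within a test period. The data structures compared are the sublinear-space Count-Min sketch (CM), majority vote (MV), local-distributed (LD), Deltoid (Del), and FAST.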
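For reference, the four metrics can be computed as in the Python sketch below, using the usual set-based definitions of precision and recall. Relative error is taken here as the mean of |estimate - true| / true over the true flows, which is an assumption since the text does not pin down the averaging. Function names and data are hypothetical.

```python
from typing import Dict, Set, Tuple


def precision_recall_f1(reported: Set[str], actual: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of the reported heavy flows against the true heavy flows."""
    true_pos = len(reported & actual)
    precision = true_pos / len(reported) if reported else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def mean_relative_error(estimates: Dict[str, float], truth: Dict[str, float]) -> float:
    """Mean of |estimate - true| / true over the flows in `truth` (assumed averaging)."""
    errors = [abs(estimates[k] - v) / v for k, v in truth.items()]
    return sum(errors) / len(errors) if errors else 0.0


p, r, f1 = precision_recall_f1({"a", "b", "c", "d"}, {"a", "b", "c"})
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.75 1.0 0.857
err = mean_relative_error({"a": 110.0, "b": 95.0}, {"a": 100.0, "b": 100.0})
print(round(err, 3))  # 0.075
```

A recall of 1, as reported for the method, means every true heavy flow appears in the reported set; precision then measures how many false positives accompany them.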
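As the detection results in FIG. 7 and FIG. 8 show, the detection method provided in the embodiments of this application performs better than the other methods: its recall is 1 in all cases, and when the memory exceeds 128 KB its precision is above 95% and its relative error is below 0.01.
The foregoing describes in detail, with reference to FIG. 1 to FIG. 8, the heavy-flow detection method provided in the embodiments of this application; the apparatus embodiments of this application are described in detail below with reference to FIG. 9 and FIG. 10. It should be understood that the detection apparatus in the embodiments of this application can perform the various heavy-flow detection methods of the foregoing embodiments; for the specific working processes of the following products, refer to the corresponding processes in the method embodiments.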
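FIG. 9 is a schematic block diagram of a heavy-flow detection apparatus 500 according to an embodiment of this application. It should be understood that the detection apparatus 500 can perform the steps of the detection method of FIG. 4 or FIG. 5; to avoid repetition, they are not described in detail here. The detection apparatus 500 includes an obtaining unit 510 and a processing unit 520.
The obtaining unit 510 is configured to obtain the two-dimensional data structures in multiple data collection devices, where each two-dimensional data structure stores information about the network data flows obtained by the corresponding data collection device. The processing unit 520 is configured to merge the two-dimensional data structures of the multiple data collection devices to obtain a merged two-dimensional data structure, and to detect heavy flows based on the merged two-dimensional data structure, where a heavy flow is a network-wide heavy flow in the network.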
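Optionally, in an embodiment, the two-dimensional data structure consists of multiple buckets, and the processing unit 520 is specifically configured to:
merge the buckets at the same position in the two-dimensional data structures of the multiple data collection devices to obtain the merged two-dimensional data structure.
Optionally, in an embodiment, any bucket in the two-dimensional data structures of the multiple data collection devices includes the total traffic in the bucket, the key of the majority flow in the bucket, and the counter value of the majority flow; any bucket in the merged two-dimensional data structure includes the updated total traffic, the key of the updated majority flow, and the counter value of the updated majority flow; the buckets include a bucket at a first position; and the processing unit 520 is specifically configured to:
add up the total traffic in the bucket at the first position across the two-dimensional data structures of the multiple data collection devices to obtain the updated total traffic of the bucket at the first position;
compare the sizes of the majority flows in the bucket at the first position across the two-dimensional data structures of the multiple data collection devices to obtain the key of the updated majority flow in the bucket at the first position; and
obtain the counter value of the updated majority flow in the bucket at the first position based on the key of the updated majority flow and the size of the majority flow.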
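Optionally, in an embodiment, the two-dimensional data structures of the multiple data collection devices are N two-dimensional data structures, the keys of the majority flows in the bucket at the first position across the N two-dimensional data structures are X keys, and X is a positive integer less than or equal to N; the processing unit 520 is specifically configured to:
determine, for each of the X keys, an estimate of the size of the flow corresponding to that key, taken as the majority flow in the bucket at the first position of the N two-dimensional data structures; and
determine, among the flows corresponding to the X keys, the flow with the largest size as the updated majority flow.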
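Optionally, in an embodiment, the processing unit 520 is specifically configured such that:
the estimate of the size of the majority flow in the bucket at the first position of the i-th of the N two-dimensional data structures is obtained according to the following formulas:
if the key of the majority flow in the bucket at the first position of the i-th two-dimensional data structure is a first key, the estimate of the size of the flow corresponding to the first key is $S_i(x)=(V_i+C_i)/2$;
if the key of the majority flow in the bucket at the first position of the i-th two-dimensional data structure is not the first key, the estimate of the size of the flow corresponding to the first key is $S_i(x)=(V_i-C_i)/2$;
where $S_i(x)$ denotes the estimate of the size of the flow corresponding to the first key, x denotes the first key, $V_i$ denotes the total traffic of all flows in the bucket at the first position of the i-th two-dimensional data structure, and $C_i$ denotes the counter value of the majority flow in the bucket at the first position of the i-th two-dimensional data structure.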
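Optionally, in an embodiment, the obtaining unit 510 is specifically configured to obtain the two-dimensional data structures of the multiple data collection devices at the end of each time period;
the processing unit 520 is specifically configured to:
if it is detected, based on the merged two-dimensional data structure, that the change of a first data flow between any two time periods is greater than a first threshold, determine that the first data flow is the heavy flow.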
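Optionally, in an embodiment, the processing unit 520 is specifically configured to:
if it is detected, based on the merged two-dimensional data structure, that the total traffic of a first data flow is greater than a second threshold, determine that the first data flow is the heavy flow.
Optionally, in an embodiment, the two-dimensional data structure includes the majority vote data structure MV-Sketch.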
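It should be understood that the detection apparatus 500 here is embodied in the form of functional units. The term "unit" here may be implemented in software and/or hardware, which is not specifically limited.
For example, a "unit" may be a software program, a hardware circuit, or a combination of both that implements the foregoing functions. The hardware circuit may include an application-specific integrated circuit (ASIC), an electronic circuit, a processor (for example, a shared, dedicated, or group processor) and memory for executing one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that support the described functions.
Therefore, the units of the examples described in the embodiments of this application can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.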
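FIG. 10 is a schematic diagram of the hardware structure of a heavy-flow detection apparatus according to an embodiment of this application.
The detection apparatus 600 shown in FIG. 10 includes a memory 601, a processor 602, a communication interface 603, and a bus 604. The memory 601, the processor 602, and the communication interface 603 are communicatively connected to each other through the bus 604.
The memory 601 may be a read-only memory (ROM), a static storage device, or a random access memory (RAM). The memory 601 may store a program; when the program stored in the memory 601 is executed by the processor 602, the processor 602 and the communication interface 603 are configured to perform the steps of the heavy-flow detection method of the embodiments of this application, for example, the steps of the heavy-flow detection method shown in FIG. 4 or FIG. 5.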
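The processor 602 may be a general-purpose CPU, a microprocessor, an ASIC, a GPU, or one or more integrated circuits, and is configured to execute related programs to implement the functions to be performed by the units in the detection apparatus shown in FIG. 9, or to perform the heavy-flow detection method of the method embodiments of this application.
The processor 602 may also be an integrated circuit chip with signal processing capability. During implementation, the steps of the heavy-flow detection method of the embodiments of this application may be completed by integrated logic circuits of hardware in the processor 602 or by instructions in the form of software.
The processor 602 may also be a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 601; the processor 602 reads information from the memory 601 and, together with its hardware, performs the functions to be performed by the units included in the detection apparatus of the embodiments of this application, or performs the heavy-flow detection method of the method embodiments of this application.
For example, the processor 602 may correspond to the processing unit 520 in the detection apparatus shown in FIG. 9.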
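The communication interface 603 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the detection apparatus 600 and other devices or communication networks.
For example, the communication interface 603 may correspond to the obtaining unit 510 in the detection apparatus shown in FIG. 9, and the two-dimensional data structures of multiple data collection devices can be obtained through the communication interface 603.
The bus 604 may include a path for transferring information between the components of the detection apparatus 600 (for example, the memory 601, the processor 602, and the communication interface 603).
It should be noted that although the detection apparatus 600 above shows only a memory, a processor, and a communication interface, a person skilled in the art should understand that in a specific implementation the detection apparatus 600 may further include other components necessary for normal operation. In addition, depending on specific needs, a person skilled in the art should understand that the detection apparatus 600 may further include hardware components implementing other additional functions. Furthermore, a person skilled in the art should understand that the detection apparatus 600 may alternatively include only the components necessary to implement the embodiments of this application, and need not include all the components shown in FIG. 10.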
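An embodiment of this application further provides a system that includes the foregoing detection apparatus and multiple data collection devices; the detection apparatus can perform the heavy-flow detection method of the foregoing method embodiments.
An embodiment of this application further provides a chip that includes a transceiver unit and a processing unit. The transceiver unit may be an input/output circuit or a communication interface; the processing unit is a processor, microprocessor, or integrated circuit integrated on the chip; and the chip can perform the heavy-flow detection method of the foregoing method embodiments.
An embodiment of this application further provides a computer-readable storage medium storing instructions that, when executed, perform the heavy-flow detection method of the foregoing method embodiments.
An embodiment of this application further provides a computer program product containing instructions that, when executed, perform the heavy-flow detection method of the foregoing method embodiments.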
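It should be understood that in the embodiments of this application, the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
It should further be understood that in the embodiments of this application, the memory may include a read-only memory and a random access memory and provide instructions and data to the processor. Part of the processor may further include a non-volatile random access memory. For example, the processor may further store information about the device type.
It should be understood that the term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate that only A exists, both A and B exist, or only B exists. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects.
It should be understood that in the embodiments of this application, the sequence numbers of the foregoing processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic and should not constitute any limitation on the implementation of the embodiments of this application.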
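A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the system, apparatus, and units described above, refer to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a division by logical function, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.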
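In addition, the functional modules in the embodiments of this application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.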

Claims (18)

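  1. A heavy-flow detection method, comprising:
    obtaining, by a control device, two-dimensional data structures in multiple data collection devices, wherein the two-dimensional data structures are used to store information about data flows in a network obtained by the corresponding data collection devices;
    merging, by the control device, the two-dimensional data structures of the multiple data collection devices to obtain a merged two-dimensional data structure; and
    detecting, by the control device, a heavy flow based on the merged two-dimensional data structure, wherein the heavy flow is a network-wide heavy flow in the network.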
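  2. The detection method according to claim 1, wherein the two-dimensional data structure is a data structure consisting of multiple buckets, and the merging, by the control device, the two-dimensional data structures of the multiple data collection devices to obtain a merged two-dimensional data structure comprises:
    merging, by the control device, buckets at the same position in the two-dimensional data structures of the multiple data collection devices to obtain the merged two-dimensional data structure.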
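  3. The detection method according to claim 2, wherein any bucket in the two-dimensional data structures of the multiple data collection devices comprises a total traffic in the current bucket, a key of a majority flow in the current bucket, and a counter value of the majority flow; any bucket in the merged two-dimensional data structure comprises an updated total traffic, a key of an updated majority flow, and a counter value of the updated majority flow; and the any bucket comprises a bucket at a first position;
    the merging, by the control device, buckets at the same position in the two-dimensional data structures of the multiple data collection devices to obtain the merged two-dimensional data structure comprises:
    adding up the total traffic in the bucket at the first position across the two-dimensional data structures of the multiple data collection devices to obtain the updated total traffic in the bucket at the first position;
    comparing the sizes of the majority flows in the bucket at the first position across the two-dimensional data structures of the multiple data collection devices to obtain the key of the updated majority flow in the bucket at the first position; and
    obtaining the counter value of the updated majority flow in the bucket at the first position based on the key of the updated majority flow and the size of the majority flow.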
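  4. The detection method according to claim 3, wherein the two-dimensional data structures of the multiple data collection devices are N two-dimensional data structures, the keys of the majority flows in the bucket at the first position in the N two-dimensional data structures are X keys, and X is a positive integer less than or equal to N;
    the comparing the sizes of the majority flows in the bucket at the first position across the two-dimensional data structures of the multiple data collection devices to obtain the key of the updated majority flow in the bucket at the first position comprises:
    determining, for each of the X keys, an estimate of the size of the flow corresponding to that key, taken as the majority flow in the bucket at the first position of the N two-dimensional data structures; and
    determining, among the flows corresponding to the X keys, the flow with the largest size as the updated majority flow.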
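  5. The detection method according to claim 4, wherein a first key is any one of the X keys, and the estimate of the size of the majority flow in the bucket at the first position of the i-th of the N two-dimensional data structures is obtained according to the following formulas:
    if the key of the majority flow in the bucket at the first position of the i-th two-dimensional data structure is the first key, the estimate of the size of the flow corresponding to the first key is $S_i(x)=(V_i+C_i)/2$;
    if the key of the majority flow in the bucket at the first position of the i-th two-dimensional data structure is not the first key, the estimate of the size of the flow corresponding to the first key is $S_i(x)=(V_i-C_i)/2$;
    wherein x denotes the first key, $V_i$ denotes the total traffic of all flows in the bucket at the first position of the i-th two-dimensional data structure, and $C_i$ denotes the counter value of the majority flow in the bucket at the first position of the i-th two-dimensional data structure.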
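  6. The detection method according to any one of claims 1 to 5, wherein the obtaining, by a control device, two-dimensional data structures in multiple data collection devices comprises:
    obtaining, by the control device, the two-dimensional data structures of the multiple data collection devices at the end of each time period; and
    the detecting, by the control device, the heavy flow based on the merged two-dimensional data structure comprises:
    if the control device detects, based on the merged two-dimensional data structure, that a change of a first data flow between any two time periods is greater than a first threshold, determining that the first data flow is a heavy flow.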
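  7. The detection method according to any one of claims 1 to 5, wherein the detecting, by the control device, the heavy flow based on the merged two-dimensional data structure comprises:
    if the control device detects, based on the merged two-dimensional data structure, that a total traffic of a first data flow is greater than a second threshold, determining that the first data flow is the heavy flow.
  8. The detection method according to any one of claims 1 to 7, wherein the two-dimensional data structure comprises a majority vote data structure MV-Sketch.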
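  9. A heavy-flow detection apparatus, comprising:
    an obtaining unit, configured to obtain two-dimensional data structures in multiple data collection devices, wherein the two-dimensional data structures are used to store information about data flows in a network obtained by the corresponding data collection devices; and
    a processing unit, configured to merge the two-dimensional data structures of the multiple data collection devices to obtain a merged two-dimensional data structure, and to detect a heavy flow based on the merged two-dimensional data structure, wherein the heavy flow is a network-wide heavy flow in the network.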
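  10. The detection apparatus according to claim 9, wherein the two-dimensional data structure is a data structure consisting of multiple buckets, and the processing unit is specifically configured to:
    merge buckets at the same position in the two-dimensional data structures of the multiple data collection devices to obtain the merged two-dimensional data structure.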
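  11. The detection apparatus according to claim 10, wherein any bucket in the two-dimensional data structures of the multiple data collection devices comprises a total traffic in the current bucket, a key of a majority flow in the current bucket, and a counter value of the majority flow; any bucket in the merged two-dimensional data structure comprises an updated total traffic, a key of an updated majority flow, and a counter value of the updated majority flow; and the any bucket comprises a bucket at a first position;
    the processing unit is specifically configured to:
    add up the total traffic in the bucket at the first position across the two-dimensional data structures of the multiple data collection devices to obtain the updated total traffic in the bucket at the first position;
    compare the sizes of the majority flows in the bucket at the first position across the two-dimensional data structures of the multiple data collection devices to obtain the key of the updated majority flow in the bucket at the first position; and
    obtain the counter value of the updated majority flow in the bucket at the first position based on the key of the updated majority flow and the size of the majority flow.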
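  12. The detection apparatus according to claim 11, wherein the two-dimensional data structures of the multiple data collection devices are N two-dimensional data structures, the keys of the majority flows in the bucket at the first position in the N two-dimensional data structures are X keys, and X is a positive integer less than or equal to N;
    the processing unit is specifically configured to:
    determine, for each of the X keys, an estimate of the size of the flow corresponding to that key, taken as the majority flow in the bucket at the first position of the N two-dimensional data structures; and
    determine, among the flows corresponding to the X keys, the flow with the largest size as the updated majority flow.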
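  13. The detection apparatus according to claim 12, wherein a first key is any one of the X keys, and the estimate of the size of the majority flow in the bucket at the first position of the i-th of the N two-dimensional data structures is obtained according to the following formulas:
    if the key of the majority flow in the bucket at the first position of the i-th two-dimensional data structure is the first key, the estimate of the size of the flow corresponding to the first key is $S_i(x)=(V_i+C_i)/2$;
    if the key of the majority flow in the bucket at the first position of the i-th two-dimensional data structure is not the first key, the estimate of the size of the flow corresponding to the first key is $S_i(x)=(V_i-C_i)/2$;
    wherein x denotes the first key, $V_i$ denotes the total traffic of all flows in the bucket at the first position of the i-th two-dimensional data structure, and $C_i$ denotes the counter value of the majority flow in the bucket at the first position of the i-th two-dimensional data structure.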
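  14. The detection apparatus according to any one of claims 9 to 13, wherein the obtaining unit is specifically configured to:
    obtain the two-dimensional data structures of the multiple data collection devices at the end of each time period; and
    the processing unit is specifically configured to:
    if it is detected, based on the merged two-dimensional data structure, that a change of a first data flow between any two time periods is greater than a first threshold, determine that the first data flow is the heavy flow.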
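  15. The detection apparatus according to any one of claims 9 to 13, wherein the processing unit is specifically configured to:
    if it is detected, based on the merged two-dimensional data structure, that a total traffic of a first data flow is greater than a second threshold, determine that the first data flow is the heavy flow.
  16. The detection apparatus according to any one of claims 9 to 15, wherein the two-dimensional data structure comprises a majority vote data structure MV-Sketch.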
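  17. A heavy-flow detection apparatus, comprising:
    a memory, configured to store a program; and
    a processor, configured to execute the program stored in the memory, wherein when the program stored in the memory is executed, the processor is configured to perform the detection method according to any one of claims 1 to 8.
  18. A computer storage medium, wherein the computer storage medium stores program code, and the program code comprises instructions for performing the steps of the detection method according to any one of claims 1 to 8.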
PCT/CN2021/072863 2020-03-26 2021-01-20 Detection method and detection device for heavy data flows WO2021190111A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP21775024.9A EP4075749A4 (en) 2020-03-26 2021-01-20 DETECTION METHOD AND DETECTION DEVICE FOR A HEAVY FLOW DATA STREAM

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010225423.9 2020-03-26
CN202010225423.9A CN113452657B (zh) 2020-03-26 2020-03-26 Detection method and detection device for heavy data flows

Publications (1)

Publication Number Publication Date
WO2021190111A1 true WO2021190111A1 (zh) 2021-09-30

Family

ID=77807297

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/072863 WO2021190111A1 (zh) 2020-03-26 2021-01-20 Detection method and detection device for heavy data flows

Country Status (3)

Country Link
EP (1) EP4075749A4 (zh)
CN (1) CN113452657B (zh)
WO (1) WO2021190111A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
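CN114389964A (zh) * 2021-12-29 2022-04-22 鹏城实验室 Traffic monitoring method, apparatus, terminal, and storage medium
CN117792961A (zh) * 2024-02-27 2024-03-29 苏州大学 Multi-target network flow cardinality fusion measurement method and system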

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN113890840A (zh) * 2021-09-29 2022-01-04 深信服科技股份有限公司 Traffic anomaly detection method and apparatus, electronic device, and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
US20160261507A1 (en) * 2015-03-05 2016-09-08 Electronics And Telecommunications Research Institute Method and apparatus for controlling and managing flow
CN106452941A (zh) * 2016-08-24 2017-02-22 重庆大学 Network anomaly detection method and apparatus
CN107566206A (zh) * 2017-08-04 2018-01-09 华为技术有限公司 Traffic measurement method, device, and system
US20180367431A1 (en) * 2017-06-14 2018-12-20 Chung Yuan Christian University Heavy network flow detection method and software-defined networking switch

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
US7779143B2 (en) * 2007-06-28 2010-08-17 Alcatel-Lucent Usa Inc. Scalable methods for detecting significant traffic patterns in a data network
JP5901246B2 (ja) * 2010-12-13 2016-04-06 キヤノン株式会社 Imaging device
CN102750564B (zh) * 2012-05-14 2016-03-30 王安然 Dynamic two-dimensional code and decoding method therefor
US9923794B2 (en) * 2014-04-28 2018-03-20 Huawei Technologies Co., Ltd. Method, apparatus, and system for identifying abnormal IP data stream
WO2018201084A1 (en) * 2017-04-28 2018-11-01 Opanga Networks, Inc. System and method for tracking domain names for the purposes of network management
US10601849B2 (en) * 2017-08-24 2020-03-24 Level 3 Communications, Llc Low-complexity detection of potential network anomalies using intermediate-stage processing
CN112544059B (zh) * 2018-07-27 2024-05-31 诺基亚通信公司 Method, device, and system for network traffic analysis
CN110011876B (zh) * 2019-04-19 2022-05-03 福州大学 Sketch-based network measurement method using reinforcement learning

Non-Patent Citations (1)

Title
See also references of EP4075749A4

Cited By (4)

Publication number Priority date Publication date Assignee Title
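CN114389964A (zh) * 2021-12-29 2022-04-22 鹏城实验室 Traffic monitoring method, apparatus, terminal, and storage medium
CN114389964B (zh) * 2021-12-29 2023-08-22 鹏城实验室 Traffic monitoring method, apparatus, terminal, and storage medium
CN117792961A (zh) * 2024-02-27 2024-03-29 苏州大学 Multi-target network flow cardinality fusion measurement method and system
CN117792961B (zh) * 2024-02-27 2024-05-31 苏州大学 Multi-target network flow cardinality fusion measurement method and system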

Also Published As

Publication number Publication date
EP4075749A1 (en) 2022-10-19
CN113452657A (zh) 2021-09-28
EP4075749A4 (en) 2023-06-14
CN113452657B (zh) 2023-03-28


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21775024

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021775024

Country of ref document: EP

Effective date: 20220713

NENP Non-entry into the national phase

Ref country code: DE