CN110071934B - Local sensitivity counting abstract method and system for network anomaly detection - Google Patents

Publication number
CN110071934B
Authority
CN
China
Prior art keywords
data message
key
data
count
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910361101.4A
Other languages
Chinese (zh)
Other versions
CN110071934A (en
Inventor
符永铨
李东升
黄春
沈思淇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201910361101.4A
Publication of CN110071934A
Application granted
Publication of CN110071934B
Current legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425: Traffic logging, e.g. anomaly detection
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441: Countermeasures against malicious traffic
    • H04L63/1458: Denial of Service

Landscapes

  • Engineering & Computer Science
  • Computer Security & Cryptography
  • General Engineering & Computer Science
  • Data Mining & Analysis
  • Signal Processing
  • Computer Networks & Wireless Communication
  • Physics & Mathematics
  • Computing Systems
  • Computer Hardware Design
  • Theoretical Computer Science
  • Artificial Intelligence
  • Evolutionary Computation
  • General Physics & Mathematics
  • Evolutionary Biology
  • Computer Vision & Pattern Recognition
  • Bioinformatics & Computational Biology
  • Bioinformatics & Cheminformatics
  • Life Sciences & Earth Sciences
  • Probability & Statistics with Applications
  • Data Exchanges In Wide-Area Networks
  • Computer And Data Communications

Abstract

The invention discloses a local sensitivity counting abstract method for network anomaly detection, comprising the following steps. Step 1: acquire an offline network flow data set and perform offline training on it to obtain a local sensitivity abstract data structure. Step 2: assign a data structure to each host that needs protection. Step 3: acquire a data message from a network flow, extract its destination address, and match the destination address against the protection host list; if the match succeeds, go to step 4. Step 4: extract the source address of the data message and insert it online into the data structure corresponding to the destination address. Step 5: every t seconds, count the number of inserted source addresses; if the number is larger than a threshold, trigger a network anomaly alarm and go to step 6. Step 6: query the data structure to obtain and output the approximate counts of all inserted source addresses. Because the offline training process maps data messages of similar size to the same count array, the variance of the bucket-mean count is significantly reduced, and the approximation error is reduced by more than 100 times for the same storage space.

Description

Local sensitivity counting abstract method and system for network anomaly detection
Technical Field
The invention belongs to the field of network communication, and particularly relates to a local sensitivity counting abstract method and system for network anomaly detection.
Background
Network anomaly detection can detect network attacks and their victims in time, so that information security protection can begin as early as possible and the impact of network damage is minimized. Distributed denial of service (DDOS) attacks seriously harm networks and users. In a DDOS attack, a large number of distributed hosts launch DOS attacks against a designated host, so that within a certain period the number of connections to the victim host exceeds what it can sustain and its service becomes unavailable. Detecting DDOS attacks is therefore an important network anomaly detection task. At present, DDOS attack detection aggregates and counts the number of connections to a specified host with a hash table at a gateway or firewall, which consumes substantial computing resources. Abstract-based (sketch-based) DDOS attack detection approximately counts the connections of the specified host in a fixed-size array, which reduces the consumption of computing resources but counts inaccurately and is therefore prone to false positives and false negatives. Improving the counting accuracy of the abstract is thus of great value for improving DDOS detection capability.
The data flow approximate calculation method based on the abstract (sketch) approximately counts the size of data messages; it is an important component of network monitoring applications and is widely used in network anomaly detection. A common count sketch consists of a set of arrays, each containing the same number of "buckets" (a "bucket" is a logical concept that refers to one location of an array); each "bucket" records the value corresponding to the keys inserted at that location. To insert a key-value pair, a bucket is first selected uniformly at random from each array by a hash function, the value is then weighted by +1 or -1 chosen by a hash function, and finally the weighted value is added to the selected bucket. To query the value corresponding to a key, the bucket position in each array is first computed with the same hash functions, the value of each bucket is then read, and finally the weighted median of all bucket values is returned as the value corresponding to the key. The count sketch has constant-time maintenance and query overhead.
The count sketch has a query error: if several keys are inserted into the same "bucket", the "bucket" records the weighted algebraic sum of the values of those keys rather than the original value of any one key. This query error seriously reduces the usefulness of the count sketch, and applying biased query results may lead to wrong decisions.
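The conventional count sketch described above can be illustrated with a minimal Python sketch. The class name `CountSketch`, the parameters `depth` and `width`, and the use of a salted BLAKE2b digest as the hash function are assumptions made for illustration; the source does not specify them.

```python
import hashlib
from statistics import median


class CountSketch:
    """Conventional count sketch: `depth` arrays of `width` buckets each."""

    def __init__(self, depth=5, width=256):
        self.depth = depth
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _locate(self, key, row):
        # One salted digest per array yields the bucket position and the +1/-1 weight.
        digest = hashlib.blake2b(key.encode(), salt=row.to_bytes(8, "little")).digest()
        position = int.from_bytes(digest[:8], "little") % self.width
        sign = 1 if digest[8] & 1 else -1
        return position, sign

    def insert(self, key, val=1):
        # Add the signed value to one bucket in every array.
        for row in range(self.depth):
            position, sign = self._locate(key, row)
            self.rows[row][position] += sign * val

    def query(self, key):
        # Undo the sign in every array and return the median of the estimates.
        estimates = []
        for row in range(self.depth):
            position, sign = self._locate(key, row)
            estimates.append(sign * self.rows[row][position])
        return median(estimates)


sketch = CountSketch()
sketch.insert("10.0.0.7", 3)
sketch.insert("10.0.0.7", 2)
print(sketch.query("10.0.0.7"))  # a lone key has no collisions, so this prints 5
```

Once several keys share a bucket, each per-array estimate also contains the signed values of the colliding keys, which is the query error discussed above.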
Disclosure of Invention
The invention aims to provide a local sensitivity counting abstract method and system for network anomaly detection that has a small query error and improves DDOS detection capability.
To solve this problem, the invention adopts the following technical scheme: a local sensitivity abstract data structure and a data flow approximate calculation method are designed. The data flow approximate calculation method comprises three stages, namely offline training, online insertion, and query. In the offline stage, a data flow clustering model is trained with the K-means clustering method on an offline data set. In the online insertion stage, each data message is dynamically mapped to the local sensitivity abstract data structure according to the K-means clustering model. In the query stage, a bucket in the count array is selected by the key of the data message to be queried, and the count mean of that bucket is used as the approximate value of the key to be queried. Because the local sensitivity abstract data structure maps data messages of similar size to the same count array, the count mean approximates each key closely and the query error is reduced.
A local sensitivity counting and abstracting method for network anomaly detection comprises the following steps:
step 1: acquiring an offline network flow data set D, and performing offline training and initialization of the local sensitivity summary data structure LSS(X) with the offline network flow data set D;
step 2: assigning a local sensitivity abstract data structure LSS(X) to each host X ∈ S that needs protection, where X denotes a protection host and S denotes the protection host list;
step 3: receiving the network flow in a passive monitoring mode, acquiring a data message p from the network flow, extracting the destination address p.dst of the data message, and matching p.dst against the protection host list of step 2; if the match succeeds, going to step 4, otherwise discarding the data message and receiving a new data message;
step 4: extracting the source address p.src of the data message p, taking p.src as the key, selecting either 1 or the bit length of the data message as the value val, and inserting the source address p.src online into the local sensitivity abstract data structure LSS(p.dst) of the protection host corresponding to the destination address p.dst;
step 5: every t seconds, calculating, for each local sensitivity abstract data structure LSS(p.dst), the algebraic sum of the Count fields of all buckets in all of its arrays as the number of inserted source addresses; if this number is larger than a preset DDOS detection threshold T, triggering a network anomaly alarm (DDOS event) and going to step 6; otherwise storing LSS(p.dst) in a file system, returning to step 2, and entering a new DDOS detection period;
step 6: querying LSS(p.dst) with each inserted source address as the key, obtaining and outputting the approximate counts of all inserted source addresses, and then returning to step 2 to enter a new DDOS detection period.
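The control flow of steps 3 to 6 over one detection period can be sketched as follows. This is only an outline under stated assumptions: an exact per-host dictionary stands in for LSS(X), so the counts are exact rather than approximate, and the function name `detect_period` and the `(src, dst)` tuple format are hypothetical.

```python
from collections import defaultdict


def detect_period(packets, protected_hosts, threshold):
    """Process the packets of one detection period; return per-host alarm reports.

    packets: iterable of (src, dst) address pairs.
    """
    # Stand-in for LSS(X): one exact {src: count} table per protected host.
    tables = {host: defaultdict(int) for host in protected_hosts}

    for src, dst in packets:
        if dst not in tables:      # step 3: discard packets for unprotected hosts
            continue
        tables[dst][src] += 1      # step 4: key = p.src, val = 1

    alarms = {}
    for host, counts in tables.items():
        inserted_sources = len(counts)       # step 5: number of inserted source addresses
        if inserted_sources > threshold:
            alarms[host] = dict(counts)      # step 6: report the count of every source
    return alarms


packets = [("a", "h1"), ("b", "h1"), ("c", "h1"), ("a", "h2"), ("x", "h9")]
print(detect_period(packets, {"h1", "h2"}, threshold=2))
# prints {'h1': {'a': 1, 'b': 1, 'c': 1}}
```

Keeping one table per protected host mirrors step 2, where every host X in S receives its own data structure.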
In order to further optimize the scheme, the invention also makes the following improvements:
further, the method for obtaining the data flow clustering model of the local sensitivity abstract data structure by performing offline training according to the offline network flow data in the step 1 is as follows:
step 1.1: acquiring a certain number of data message sets D from a network flow, wherein a key of each data message is a unique identifier of the data message, and a value val is the size of the data message;
step 1.2: clustering the collected data message set D by data size, taking as the optimal division C the one that minimizes the sum of squared errors
$E = \sum_{i=1}^{k} \sum_{x \in C_i} (x - \mu_i)^2$,
which gives the cluster division
$C = \{C_1, C_2, \ldots, C_k\}$;
each cluster group $C_i$ after clustering corresponds one-to-one to a count array in the data structure, and $|C_i|$ denotes the number of data messages in the i-th cluster group;
step 1.3: constructing the cluster center set of the network flow data set D,
$\mu = \{\mu_1, \mu_2, \ldots, \mu_k\}$, with $\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$,
and using the cluster center set $\mu$ as the data stream clustering model, where $\mu_i$ denotes the cluster center of the i-th cluster group after cluster division.
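Steps 1.2 and 1.3 can be illustrated with a minimal one-dimensional K-means sketch in Python. The function names, the deterministic initialisation from evenly spaced order statistics, and the iteration limit are assumptions; the source only states that K-means clustering on data size is used. Ties are resolved here by taking the first closest center.

```python
def assign(data, centers):
    """Put every size into the group of its closest center (first center wins ties)."""
    groups = [[] for _ in centers]
    for x in data:
        j = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
        groups[j].append(x)
    return groups


def kmeans_1d(sizes, k, iters=50):
    """Cluster message sizes into k groups; return (centers, groups)."""
    data = sorted(sizes)
    # Assumed initialisation: k evenly spaced order statistics of the data.
    centers = [float(data[(2 * i + 1) * len(data) // (2 * k)]) for i in range(k)]
    for _ in range(iters):
        groups = assign(data, centers)
        # Each center becomes the mean of its group; empty groups keep their center.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers, assign(data, centers)


def sse(groups, centers):
    """Sum of squared errors of a division."""
    return sum((x - c) ** 2 for g, c in zip(groups, centers) for x in g)


centers, groups = kmeans_1d([100, 1, 102, 2, 3, 101], k=2)
print(centers)               # [2.0, 101.0]
print(groups)                # [[1, 2, 3], [100, 101, 102]]
print(sse(groups, centers))  # 4.0
```

The returned `centers` list plays the role of the cluster center set, and each group would correspond to one count array.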
Further, the local sensitivity summary data structure in step 2 includes k count arrays
$I_1, I_2, \ldots, I_k$.
The i-th count array $I_i$ comprises $m_i$ "buckets". Each "bucket" consists of a value field Sum and a counter Count, where the value field Sum records the sum of the sizes of the data messages inserted into the "bucket" and the counter Count records the number of data messages inserted into the "bucket".
Further, the online insertion method in step 4 is as follows:
step 4.1: tracking a dynamic numerical value of the data message by using a cache hash table HTable, caching historical size historySize and age of the data message by using the cache hash table HTable, wherein the historical size historySize of the data message corresponds to the sum of the sizes of the data messages with the same inserted identifier, and the age of the data message corresponds to the time when the data message is inserted for the first time;
step 4.2: calculating historical clustering indexes and current clustering indexes of the data messages;
step 4.2.1: if the current data message is in the cache hash table HTable, calculating the historical clustering index $j_h$ of the current data message from its cached history size $x_h$, and jumping to step 4.2.3;
step 4.2.2: if the current data message is not in the cache hash table HTable, setting $x_h = 0$ and jumping directly to step 4.2.3;
step 4.2.3: taking the sum $x_h + x_c$ of the history size $x_h$ and the current size $x_c$ of the current data message as its updated size, and calculating the current clustering index $j_{h+c}$ of the current data message;
step 4.3: inserting the data message into the data structure LSS according to the obtained current clustering index $j_{h+c}$;
step 4.4: updating the cache hash table HTable by setting the historical size historySize of the current data message in HTable to $x_h + x_c$;
step 4.5: saving the identifier key of the current data message and the count array index $j_{h+c}$ to the location mapping hash table FTable.
Further, the method for calculating the historical cluster index and the current cluster index in step 4.2 is as follows:
mapping the size x of the data message to the cluster center in the cluster center set $\mu$ at minimum distance, so that the formula
$j = \arg\min_{1 \le i \le k} |x - \mu_i|$
holds, which yields the index j of the data message; the index j is taken as the clustering index of the current data message, and if x is equally distant from several division mean values, one of those cluster centers $\mu_i$ is selected at random with equal probability and its index is returned.
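The clustering index calculation, including the equal-probability tie-breaking, can be sketched in a few lines of Python. The function name `cluster_index` and the optional `rng` parameter are illustrative assumptions, and indices are 0-based here whereas the text counts arrays from 1.

```python
import random


def cluster_index(x, centers, rng=random):
    """Return the index of the cluster center closest to size x.

    If several centers are equally close, one of them is chosen
    uniformly at random, as the text prescribes.
    """
    distances = [abs(x - mu) for mu in centers]
    best = min(distances)
    candidates = [i for i, d in enumerate(distances) if d == best]
    return rng.choice(candidates)


centers = [2.0, 101.0]
print(cluster_index(90, centers))  # 1: 90 is closer to 101.0 than to 2.0
print(cluster_index(3, centers))   # 0
```

The same function would serve for both the historical index (called with the cached history size) and the current index (called with the updated size).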
Further, the method for inserting the data packet into the data structure LSS in step 4.3 is as follows: updating a value range sum and a counter count of a counting array in a data structure by using the following formula;
the value range sum of the count array in the data structure is updated as:
$I_j[h_j(key)].sum = I_j[h_j(key)].sum + val$
the counter count of the count array in the data structure is updated as:
if novel(key) = TRUE, $I_j[h_j(key)].count = I_j[h_j(key)].count + 1$;
otherwise, returning directly; (1)
the above formula, for a given key-value pair (key, val) and count array index j, selects the j-th count array $I_j$, where the new-key boolean variable novel(key) indicates whether the key appears for the first time, $h_j(key)$ denotes the position in the count array $I_j$ to which key is mapped, and val denotes the value inserted at position $I_j[h_j(key)]$ of the count array;
when the data packet is inserted into the data structure LSS, the updating of the value range sum and the counter count of the count array in the data structure is performed in three cases:
1) if the historical clustering index $j_h$ of the data message does not exist, taking the identifier of the data message as the key, the current value $x_c$ of the data message as the value val, and the current clustering index $j_{h+c}$ as the index of the count array, setting the new-key boolean variable novel(key) to TRUE, and executing formula (1);
2) if the historical clustering index $j_h$ of the data message exists and $j_h = j_{h+c}$, directly executing the insertion operation of the data message: taking the identifier of the data message as the key, the current value $x_c$ of the data message as the value val, and the current clustering index $j_{h+c}$ as the index of the count array, setting the new-key boolean variable novel(key) to FALSE, and executing formula (1) to update the counting result of the data message;
3) if the historical clustering index $j_h$ of the data message exists and $j_h \neq j_{h+c}$, first deleting the record of the data message from the historical count array, and then inserting the record of the data message into the new count array;
3.1) taking the identifier of the data message as the key, the cached history size $x_h$ of the data message as val, and the historical clustering index $j_h$ as the index j of the count array, deleting the current data message from the historical count array:
$I_j[h_j(key)].sum = I_j[h_j(key)].sum - val$
$I_j[h_j(key)].count = I_j[h_j(key)].count - 1$
3.2) taking the identifier of the data message as the key, the sum $x_h + x_c$ of the history size $x_h$ and the current size $x_c$ of the data message as val, and the current clustering index $j_{h+c}$ as the index of the new count array, setting the new-key boolean variable novel(key) to TRUE, executing formula (1), and inserting the current data message into the new count array.
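The three insertion cases above, together with the HTable and FTable updates of steps 4.4 and 4.5, can be sketched in Python as follows. This is a simplified illustration under several assumptions: the class and method names are hypothetical, every count array has the same number of buckets, a salted BLAKE2b digest stands in for the hash functions, ties between centers go to the first center, and HTable is an unbounded dictionary with no eviction.

```python
import hashlib


class LSS:
    """Simplified local sensitivity summary: one count array per cluster center."""

    def __init__(self, centers, buckets_per_array=64):
        self.centers = list(centers)
        # arrays[j][b] is a bucket stored as [sum, count].
        self.arrays = [[[0, 0] for _ in range(buckets_per_array)] for _ in self.centers]
        self.htable = {}  # key -> cached history size x_h (unbounded here)
        self.ftable = {}  # key -> index of the count array holding the key

    def _index(self, x):
        # Closest cluster center; the first center wins ties in this sketch.
        return min(range(len(self.centers)), key=lambda i: abs(x - self.centers[i]))

    def _bucket(self, j, key):
        digest = hashlib.blake2b(key.encode(), salt=j.to_bytes(8, "little")).digest()
        return self.arrays[j][int.from_bytes(digest[:8], "little") % len(self.arrays[j])]

    def _formula1(self, j, key, val, novel):
        bucket = self._bucket(j, key)
        bucket[0] += val
        if novel:
            bucket[1] += 1

    def insert(self, key, x_c):
        x_h = self.htable.get(key)           # None means no historical index exists
        total = (x_h or 0) + x_c
        j_new = self._index(total)
        if x_h is None:                      # case 1: first appearance of the key
            self._formula1(j_new, key, x_c, novel=True)
        elif self._index(x_h) == j_new:      # case 2: key stays in the same array
            self._formula1(j_new, key, x_c, novel=False)
        else:                                # case 3: key moves to another array
            old = self._bucket(self._index(x_h), key)
            old[0] -= x_h                    # 3.1: delete from the historical array
            old[1] -= 1
            self._formula1(j_new, key, total, novel=True)  # 3.2: insert the full size
        self.htable[key] = total             # step 4.4
        self.ftable[key] = j_new             # step 4.5

    def totals(self, j):
        """Return (sum of Sum fields, sum of Count fields) of count array j."""
        return sum(b[0] for b in self.arrays[j]), sum(b[1] for b in self.arrays[j])


lss = LSS(centers=[2.0, 100.0])
lss.insert("a", 1)    # case 1: lands in array 0
lss.insert("a", 1)    # case 2: size grows to 2, still array 0
lss.insert("a", 98)   # case 3: size grows to 100, moves to array 1
print(lss.totals(0), lss.totals(1), lss.ftable["a"])  # (0, 0) (100, 1) 1
```

Moving the whole accumulated size in case 3 keeps each key recorded in exactly one count array, which is what allows FTable to store a single array index per key.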
Further, the query method in step 6 is as follows:
querying the location mapping hash table FTable to obtain the count array index j of the data message key to be queried, and, using the key to be queried and the count array index j where it resides, taking the bucket mean
$\frac{I_j[h_j(key)].sum}{I_j[h_j(key)].count}$
as the approximate value of the key to be queried.
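A small worked example of the bucket-mean query is given below. The function name `query`, the dictionary layout of a bucket, and the `position` callback standing in for the hash function are assumptions made for illustration.

```python
def query(key, ftable, arrays, position):
    """Return the bucket mean Sum/Count for key, or None if key is not in FTable."""
    j = ftable.get(key)
    if j is None:
        return None
    bucket = arrays[j][position(j, key)]
    return bucket["sum"] / bucket["count"] if bucket["count"] else 0.0


# Toy setup: one bucket per count array, so every key maps to position 0.
position = lambda j, key: 0

# Keys "a" (true size 98) and "b" (true size 102) share one bucket of array 0.
arrays = [[{"sum": 200, "count": 2}], [{"sum": 0, "count": 0}]]
ftable = {"a": 0, "b": 0}

print(query("a", ftable, arrays, position))  # 100.0, an error of 2 for either key
```

Had the same bucket held two keys of sizes 2 and 198 instead, the mean would still be 100.0 but the error would be 98 for either key, which illustrates why grouping keys of similar size in one count array lowers the query error.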
A local sensitivity count summarization system for network anomaly detection comprises a processor and a memory connected with the processor, wherein the memory stores a program of a local sensitivity count summarization method for network anomaly detection, and the program of the local sensitivity count summarization method for network anomaly detection realizes the steps of the method when being executed by the processor.
Compared with the prior art, the invention has the beneficial effects that:
the invention discloses a local sensitivity counting abstract method for network anomaly detection. By assigning a local sensitivity abstract data structure to each host that needs protection, it isolates the counting processes of different protection hosts; compared with existing count abstracts, which use one data structure to serve all protection hosts, this prevents the results of different protection hosts from interfering with one another and degrading prediction accuracy. Offline training of the local sensitivity abstract data structure LSS(X) yields a data flow clustering model; because the data message set is clustered during offline training, data messages of similar size are mapped to the same count array, which significantly reduces the variance of the bucket-mean count. Whereas the prior art returns the weighted median of all "bucket" values as the value corresponding to a key, the invention clusters the data messages, obtains the cluster center of each cluster, and uses the bucket mean as the approximate value of the key to be queried. This gives an unbiased estimate of the inserted count, that is, the expected value of the approximation equals the true value, so that data message sizes are computed accurately; the query error is thereby effectively reduced and the counting approximation of the abstract is improved, while the operation complexity remains constant. Whether a network anomaly alarm (DDOS event) is triggered is decided by checking whether the number of source addresses of the data messages within a period t exceeds a threshold, and fine-grained network anomaly information is obtained by enumerating the distribution of source address counts and the sizes of the data messages sent from each source address.
Drawings
FIG. 1 is a flow chart of the system of the present invention.
Detailed Description
Fig. 1 shows a specific embodiment of a local sensitivity count summarization method for network anomaly detection according to the present invention, which comprises the following steps:
step 1: acquiring an offline network flow data set D from a network, and performing offline training and initialization of the local sensitivity summary data structure LSS(X) with the offline network flow data set D. Offline training on network flow offline data yields a data message flow clustering model that guides the online insertion process of each local sensitivity abstract data structure. In this embodiment, offline training on network flow data is repeated periodically so that the data message clustering model is updated in time; this also avoids the cluster center drift that real-time updating would cause.
Step 1.1: acquiring an offline network flow data set D from a network, wherein a key of each data message in the network flow data is a unique identifier of the data message, and a value val is the size of the data message;
The network flow data set D is obtained at the location of the data flow to be monitored or at an associated location, such as a gateway. The number of data packets in D is generally determined from the distribution function of the data flow in its environment: if the total number of data packets is N, offline training generally uses between 1% N and 10% N of them.
Step 1.2: clustering the collected data message set D by data size, taking as the optimal division C the one that minimizes the sum of squared errors
$E = \sum_{i=1}^{k} \sum_{x \in C_i} (x - \mu_i)^2$,
which gives the cluster division
$C = \{C_1, C_2, \ldots, C_k\}$;
each cluster group $C_i$ after clustering corresponds one-to-one to a count array in the data structure, and $|C_i|$ denotes the number of data messages in the i-th cluster group;
Step 1.3: constructing the cluster center set of the network flow data set D,
$\mu = \{\mu_1, \mu_2, \ldots, \mu_k\}$, with $\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$,
and using the cluster center set $\mu$ as the data stream clustering model, where $\mu_i$ denotes the cluster center of the i-th cluster group after cluster division.
In this embodiment, offline training obtains a certain number of data messages, forming the network flow data set D, from the location of the data messages to be monitored or an associated location such as a gateway, and performs cluster analysis on them. The offline training of step 1 divides the data messages into k groups so that data messages of similar size are assigned to the same count array. When the network flow is passively monitored, each data message is inserted online into a count array of the data structure according to its size: the cluster center closest to the data message is selected from the cluster centers of the clustering model, the corresponding count array is selected by the index of that cluster center, the data message is inserted into that count array, and the value field and the count field are updated. Compared with the prior art, data messages of similar size are thus inserted more accurately.
Step 2: assigning a local sensitivity abstract data structure LSS(X) to each host X ∈ S to be protected (S denotes the protection host set). The local sensitivity abstract data structure comprises k count arrays
$I_1, I_2, \ldots, I_k$.
The i-th count array $I_i$ comprises $m_i$ "buckets". Each "bucket" consists of a value field Sum and a counter Count: the value field Sum records the sum of the sizes of the data messages inserted into the "bucket", and the counter Count records the number of data messages inserted into the "bucket". The local sensitivity abstract data structure LSS supports inserting, querying, and deleting data messages in key-value pair format. In addition, k hash functions (one hash function $h_j$ for each count array $I_j$) are selected in advance for the key-value pair insertion, query, and deletion processes. Assigning a local sensitivity abstract data structure to each host that needs protection isolates the counting processes of different protection hosts; compared with existing count abstracts, which use one data structure to serve all protection hosts, this prevents the results of different protection hosts from interfering with one another and degrading prediction accuracy.
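The layout described in step 2 can be sketched with a small Python data model. The names `Bucket`, `make_lss`, and `inserted_keys`, the example bucket counts, and the host addresses are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    sum: int = 0    # value field Sum: total size of the messages in this bucket
    count: int = 0  # counter Count: number of messages in this bucket


def make_lss(bucket_counts):
    """Build k count arrays; array i holds bucket_counts[i] independent buckets."""
    return [[Bucket() for _ in range(m)] for m in bucket_counts]


def inserted_keys(lss):
    """Algebraic sum of the Count fields over all buckets of all arrays."""
    return sum(bucket.count for array in lss for bucket in array)


# One separate structure per protected host, so counts never mix across hosts.
protected_hosts = ["192.0.2.10", "192.0.2.11"]
tables = {host: make_lss([4, 8, 2]) for host in protected_hosts}

tables["192.0.2.10"][1][3].sum += 1500
tables["192.0.2.10"][1][3].count += 1
print(inserted_keys(tables["192.0.2.10"]), inserted_keys(tables["192.0.2.11"]))  # 1 0
```

The helper `inserted_keys` corresponds to the periodic check in step 5, which compares this sum against the DDOS detection threshold.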
Step 3: receiving the network flow in a passive monitoring mode; when a data message p is received, extracting its destination address p.dst and matching p.dst against the protection host list of step 2; if the match succeeds, going to step 4, otherwise discarding the data message and receiving a new data message;
Step 4: extracting the source address p.src of the data message p, taking p.src as the key, selecting 1 as the value val, and inserting the source address p.src online into the local sensitivity abstract data structure LSS(p.dst) of the protection host corresponding to the destination address p.dst;
step 4.1: tracking a dynamic numerical value of the data message by using a cache hash table HTable, caching historical size historySize and age of the data message by using the cache hash table HTable, wherein the historical size historySize of the data message corresponds to the sum of the sizes of the data messages with the same inserted identifier, and the age of the data message corresponds to the time when the data message is inserted for the first time;
in this embodiment, the cache hash table HTable is used to track the dynamic value of the data packet. Aiming at the characteristic that the data message usually appears for many times, a hash table HTable with fixed capacity is adopted to cache the historical size (marked as historySize) and the age (marked as age) of the data message, so that the dynamic numerical value of the data message is accurately tracked. The history size historySize of the data packet corresponds to the sum of the counts of the inserted data packets of the same identifier. The age of the data message corresponds to the time of the first insertion of the data message. The capacity of the hash table HTable is a preset parameter of the system, and the larger the capacity is, the more data packets can be tracked.
Step 4.2: calculating historical clustering indexes and current clustering indexes of the data messages;
step 4.2.1: if the current data message is in the cache hash table HTable, calculating the historical clustering index $j_h$ of the current data message from its cached history size $x_h$, and jumping to step 4.2.3;
step 4.2.2: if the current data message is not in the cache hash table HTable, setting $x_h = 0$ and jumping directly to step 4.2.3;
step 4.2.3: taking the sum $x_h + x_c$ of the history size $x_h$ and the current size $x_c$ of the current data message as its updated size, and calculating the current clustering index $j_{h+c}$ of the current data message.
In this embodiment, the historical clustering index and the current clustering index in step 4.2 are calculated as follows: the size x of the data packet is mapped to the cluster center at minimum distance, that is, the formula
$j = \arg\min_{1 \le i \le k} |x - \mu_i|$
holds, which yields the index j of the data message; the index j is taken as the clustering index of the current data message. If x is equally distant from several division mean values, the index of one of those cluster centers is selected at random with equal probability and returned. When calculating the historical clustering index, $x = x_h$ in the formula, which gives the historical clustering index $j_h$; when calculating the current clustering index, $x = x_h + x_c$, which gives the current clustering index $j_{h+c}$.
Step 4.3: according to the obtained current clustering index jh+cInserting the data message into a data structure LSS;
the method for inserting the data message into the data structure LSS comprises the following steps: updating a value range sum and a counter count of a counting array in a data structure by using the following formula;
the value range sum of the count array in the data structure is updated as:
$I_j[h_j(key)].sum = I_j[h_j(key)].sum + val$
the counter count of the count array in the data structure is updated as:
if novel(key) = TRUE, $I_j[h_j(key)].count = I_j[h_j(key)].count + 1$;
otherwise, returning directly; (1)
the above formula, for a given key-value pair (key, val) and count array index j, selects the j-th count array $I_j$, where the new-key boolean variable novel(key) indicates whether the key appears for the first time, $h_j(key)$ denotes the position in the count array $I_j$ to which key is mapped, and val denotes the value inserted at position $I_j[h_j(key)]$ of the count array;
When the data message is inserted into the data structure LSS, the value range sum and the counter count of the count array are updated according to one of three cases:
1) If the historical clustering index j_h of the data message does not exist, take the identifier of the data message as the key and the current value x_c of the data message as the value val, use the current clustering index j_{h+c} as the index of the count array, set the new-key Boolean variable novel(key) to TRUE, and execute formula (1).
2) If the historical clustering index j_h of the data message exists and j_h = j_{h+c}, perform the insertion directly: take the identifier of the data message as the key and the current value x_c of the data message as the value val, use the current clustering index j_{h+c} as the index of the count array, set the new-key Boolean variable novel(key) to FALSE, and execute formula (1) to update the counting result of the data message.
3) If the historical clustering index j_h of the data message exists and j_h ≠ j_{h+c}, first delete the record of the data message from its historical count array, and then insert the record into the new count array:
3.1) Take the identifier of the data message as the key and the cached history size x_h of the data message as val, use the historical clustering index j_h as the count array index j, and delete the current data message from the historical count array:
I_j[h_j(key)].sum = I_j[h_j(key)].sum - val
I_j[h_j(key)].count = I_j[h_j(key)].count - 1
3.2) Take the identifier of the data message as the key and the sum x_h + x_c of the history size x_h and the current size x_c of the data message as val, use the current clustering index j_{h+c} as the index of the new count array, set the new-key Boolean variable novel(key) to TRUE, and execute formula (1) to insert the current data message into the new count array.
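The three insertion cases of Step 4.3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class and function names (`Bucket`, `LSS`, `insert_message`), the use of a CRC32 checksum salted with the array index as the hash h_j, and the use of `None` to represent a missing historical index j_h are all hypothetical choices made for the example.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Bucket:
    sum: float = 0.0   # value range: total size inserted into this bucket
    count: int = 0     # counter: number of distinct keys inserted


class LSS:
    """k count arrays, each holding m buckets with sum and count fields."""

    def __init__(self, k, m):
        self.arrays = [[Bucket() for _ in range(m)] for _ in range(k)]

    def _bucket(self, j, key):
        # h_j(key): CRC32 salted with j is an assumption of this sketch.
        array = self.arrays[j]
        return array[zlib.crc32(f"{j}:{key}".encode()) % len(array)]

    def update(self, j, key, val, novel):
        # Formula (1): always add val to sum; increment count only for a new key.
        bucket = self._bucket(j, key)
        bucket.sum += val
        if novel:
            bucket.count += 1

    def remove(self, j, key, val):
        # Case 3.1: undo a record that was inserted earlier.
        bucket = self._bucket(j, key)
        bucket.sum -= val
        bucket.count -= 1


def insert_message(lss, key, x_h, x_c, j_h, j_hc):
    """Apply the three cases; j_h is None when the key has no history."""
    if j_h is None:                       # case 1: first appearance
        lss.update(j_hc, key, x_c, novel=True)
    elif j_h == j_hc:                     # case 2: same count array
        lss.update(j_hc, key, x_c, novel=False)
    else:                                 # case 3: move to a new count array
        lss.remove(j_h, key, x_h)
        lss.update(j_hc, key, x_h + x_c, novel=True)
```

For example, inserting a key of size 5 with no history, then 3 more into the same array, and finally 100 more after its cluster index changes leaves the old bucket empty and the new bucket holding a sum of 108 with a count of 1.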
Step 4.4: Update the cache hash table HTable by setting the history size historySize of the current data message in HTable to x_h + x_c. If the number of data messages cached in HTable exceeds the capacity of the table, repeatedly delete the oldest data message from HTable until the number of cached messages no longer exceeds the maximum capacity of the hash table.
Step 4.5: Save the identifier key of the current data message and the count array index j_{h+c} to the location mapping hash table FTable. To support approximate queries on the data stream, the count array index corresponding to each key to be queried must be recorded; the location mapping hash table FTable is therefore constructed to record the identifier and the count array index of each data stream item. If FTable has no record for the identifier of the current data message, (key, j), with j = j_{h+c}, is recorded directly in FTable; if a record for the identifier key already exists in FTable, it is replaced by (key, j).
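The bookkeeping of Steps 4.4 and 4.5 can be illustrated with the following Python sketch. It is a hedged example, not the patent's code: the `HTable` class, its `update` and `history_size` methods, the `record_location` helper, and the choice of an `OrderedDict` (whose insertion order coincides with age, because age is the time of first insertion) are all assumptions of this illustration.

```python
from collections import OrderedDict


class HTable:
    """Cache of per-key history size and age with a bounded capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # key -> [history_size, age]

    def history_size(self, key):
        # Returns the cached x_h, or None when the key is not cached.
        entry = self.entries.get(key)
        return entry[0] if entry is not None else None

    def update(self, key, x_c, now):
        # Step 4.4: historySize becomes x_h + x_c; age is set on first insert only.
        if key in self.entries:
            self.entries[key][0] += x_c
        else:
            self.entries[key] = [x_c, now]
        # Evict the oldest entries until the table fits its capacity again.
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)


def record_location(ftable, key, j):
    # Step 4.5: insert (key, j), replacing any existing record for key.
    ftable[key] = j
```

Updating an existing key leaves its position in the `OrderedDict` unchanged, so eviction always removes the entry that was first inserted earliest.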
Step 5: Every T seconds, calculate the algebraic sum of the counter Count fields of all buckets in all arrays of each local sensitivity summary data structure LSS(p.dst) as the number of inserted source addresses. If the number of inserted source addresses is greater than the preset DDoS detection threshold T, trigger a network anomaly alarm (DDoS event) and go to Step 6; otherwise, store the local sensitivity summary data structure LSS(p.dst) in the file system, return to Step 2, and enter a new DDoS detection period.
Step 6: Query the local sensitivity summary data structure LSS(p.dst) with each inserted source address as the key, obtain and output the approximate counts of all inserted source addresses, then return to Step 2 and enter a new DDoS detection period.
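The threshold check of Step 5 amounts to summing one field over all buckets. The Python sketch below shows this under simplifying assumptions: each count array is modeled as a list of dictionaries with `sum` and `count` entries, and the function names `inserted_source_count` and `detection_step` as well as the return values `"alarm"` and `"store"` are hypothetical labels for the two branches of the step.

```python
def inserted_source_count(arrays):
    """Sum the count field over every bucket of every count array."""
    return sum(bucket["count"] for array in arrays for bucket in array)


def detection_step(arrays, threshold):
    """Return 'alarm' when the total exceeds the threshold, otherwise 'store'."""
    total = inserted_source_count(arrays)
    return "alarm" if total > threshold else "store"
```

With buckets holding counts 3, 0, and 2, the total is 5, so a threshold of 4 raises the alarm while a threshold of 5 does not, because the comparison is strict.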
The approximate count of an inserted source address is obtained by querying the local sensitivity summary data structure LSS(p.dst) as follows:
Query the location mapping hash table FTable to obtain the count array index j of the key to be queried, and use the key and its count array index j to take the average value of the bucket
I_j[h_j(key)].sum / I_j[h_j(key)].count
as the approximate count size of the key to be queried.
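A query can be sketched in Python as below, assuming that the bucket average is the bucket's sum divided by its count. The helper names `bucket_index` and `query`, the CRC32-based hash, the dictionary representation of buckets, and the choice to return `None` for an unknown key or an empty bucket are assumptions of this example and are not specified by the source.

```python
import zlib


def bucket_index(j, key, m):
    # h_j(key): CRC32 salted with the array index j (an assumption of this sketch).
    return zlib.crc32(f"{j}:{key}".encode()) % m


def query(arrays, ftable, key):
    """Return the bucket average for key, or None if it cannot be estimated."""
    j = ftable.get(key)            # count array index recorded in FTable
    if j is None:
        return None
    array = arrays[j]
    bucket = array[bucket_index(j, key, len(array))]
    if bucket["count"] == 0:
        return None
    return bucket["sum"] / bucket["count"]
```

If two keys with values 4 and 6 share one bucket, the bucket stores a sum of 10 and a count of 2, and both keys are estimated as 5.0; the estimate is closer to the true values when the keys sharing a bucket have similar sizes.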
In the offline training process, the local sensitivity summary data structure clusters the network flow data set by K-means clustering and maps data messages of similar size to the same count array, which can significantly reduce the variance of the array counts. By combining the two optimization means, the DDoS detection process can accurately obtain the number of connections of a protected host over a period of time; compared with existing summary methods, the method reduces the approximation error by more than a factor of 100 for the same storage space.
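The offline clustering and the mapping of a size to its cluster index can be sketched as follows. This is an illustrative one-dimensional K-means (Lloyd's iteration), not the patent's training procedure: the function names `kmeans_1d` and `cluster_index`, the deterministic initialization at evenly spaced positions of the sorted data, and the fixed iteration count are assumptions of this example. The equal-probability tie-break follows the rule stated in claim 3.

```python
import random


def kmeans_1d(sizes, k, iterations=20):
    """Cluster scalar message sizes into k groups and return the k centers."""
    data = sorted(sizes)
    # Deterministic initialization (an assumption of this sketch).
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in data:
            j = min(range(k), key=lambda i: abs(x - centers[i]))
            groups[j].append(x)
        # Recompute each center as the mean of its group; keep empty groups fixed.
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers


def cluster_index(x, centers, rng=random):
    """Index of the nearest center; ties are broken with equal probability."""
    best = min(abs(x - c) for c in centers)
    ties = [i for i, c in enumerate(centers) if abs(x - c) == best]
    return rng.choice(ties)
```

For the sizes 1, 2, 3, 100, 101, 102 with k = 2, the centers converge to 2.0 and 101.0, so small and large messages are counted in separate arrays and the bucket averages within each array stay close to the true sizes.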
A local sensitivity count summarization system for network anomaly detection comprises a processor and a memory connected to the processor, wherein the memory stores a program of the local sensitivity count summarization method for network anomaly detection, and the program implements the steps of the above method when executed by the processor.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above-mentioned embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may be made by those skilled in the art without departing from the principle of the invention.

Claims (6)

1. A local sensitivity counting and abstracting method for network anomaly detection is characterized in that: the method comprises the following steps:
step 1: acquiring an offline network flow data set D, performing offline training and initialization on a local sensitivity summary data structure LSS (X) through the offline network flow data set D, and specifically:
step 1.1: acquiring an offline network flow data set D from a network, wherein a key of each data message in the network flow data is a unique identifier of the data message, and a value val is the size of the data message;
step 1.2: clustering and partitioning the network flow data set D based on the size of the data messages, and taking as the optimal partition C the partition that minimizes the sum of squared errors
Figure FDA0002909961040000011
the resulting cluster partition being
Figure FDA0002909961040000012
wherein the k subgroups C_i of the cluster partition correspond one-to-one to the k count arrays in the data structure, and |C_i| denotes the number of data messages in the i-th cluster group;
step 1.3: constructing the cluster center set of the network flow data set D
Figure FDA0002909961040000013
and using the cluster center set μ as the data stream clustering model, wherein μ_i denotes the cluster center of the i-th cluster group after cluster partitioning;
step 2: assigning a local sensitivity summary data structure LSS(X) to each host X ∈ S that needs protection, wherein X denotes a protected host and S denotes the protection host list;
step 3: receiving network flow in a passive monitoring mode, acquiring a data message p from the network flow, extracting the destination address p.dst of the data message, and matching the destination address p.dst against the protection host list in step 1; if the matching succeeds, going to step 4, otherwise directly discarding the data message and receiving a new data message;
step 4: extracting the source address p.src of the data message p, taking p.src as the key and 1 as the value val, and inserting the source address p.src of the data message online into the local sensitivity summary data structure LSS(p.dst) of the protected host corresponding to the destination address p.dst;
the online insertion method in the step 4 comprises the following steps:
step 4.1: tracking a dynamic numerical value of the data message by using a cache hash table HTable, caching historical size historySize and age of the data message by using the cache hash table HTable, wherein the historical size historySize of the data message corresponds to the sum of the sizes of the data messages with the same inserted identifier, and the age of the data message corresponds to the time when the data message is inserted for the first time;
step 4.2: calculating historical clustering indexes and current clustering indexes of the data messages;
step 4.2.1: if the current data message is in the cache hash table HTable, calculating the historical clustering index j_h of the current data message according to its cached history size x_h, and jumping to step 4.2.3;
step 4.2.2: if the current data message is not in the cache hash table HTable, setting x_h = 0 and jumping directly to step 4.2.3;
step 4.2.3: taking the sum x_h + x_c of the history size x_h and the current size x_c of the current data message as the updated size of the current data message, and calculating the current clustering index j_{h+c} of the current data message;
step 4.3: inserting the data message into the data structure LSS according to the obtained current clustering index j_{h+c};
step 4.4: updating the cache hash table HTable by updating the history size historySize of the current data message in the cache hash table HTable to x_h + x_c;
step 4.5: saving the identifier key of the current data message and the count array index j_{h+c} to the location mapping hash table FTable;
step 5: every T seconds, calculating the algebraic sum of the counter Count fields of all buckets in all arrays of each local sensitivity summary data structure LSS(p.dst) as the number of inserted source addresses; if the number of inserted source addresses is greater than a preset DDoS detection threshold T, triggering a network anomaly alarm (DDoS event) and going to step 6, otherwise storing the local sensitivity summary data structure LSS(p.dst) in a file system, returning to step 2, and entering a new DDoS detection period;
step 6: querying the local sensitivity summary data structure LSS(p.dst) with each inserted source address as the key, obtaining and outputting the approximate count sizes of all inserted source addresses, then returning to step 2 and entering a new DDoS detection period.
2. The local sensitivity count summarization method for network anomaly detection according to claim 1, wherein: the local sensitivity summary data structure in step 2 comprises k count arrays
Figure FDA0002909961040000021
the i-th count array I_i comprises m_i "buckets", each "bucket" consisting of a value range Sum and a counter Count, wherein the value range Sum records the sum of the sizes of the data stream items inserted into the "bucket" and the counter Count records the number of data stream items inserted into the "bucket".
3. The local sensitivity count summarization method for network anomaly detection according to claim 2, wherein: the size x of the data message is mapped to the cluster center in the cluster center set μ that is at the minimum distance from x, so that the formula
Figure FDA0002909961040000031
holds, giving the index j of the data message, which is used as the cluster index of the current data message; if several cluster centers are at the same minimum distance, one of these cluster centers μ_i is selected at random with equal probability and its index is returned.
4. The local sensitivity count summarization method for network anomaly detection according to claim 3, wherein: the method for inserting the data message into the data structure LSS in step 4.3 is as follows: updating the value range sum and the counter count of a count array in the data structure using the following formula;
the value range sum of the count array in the data structure is updated as:
I_j[h_j(key)].sum = I_j[h_j(key)].sum + val
the counter count of the count array in the data structure is updated as:
if novel(key) = TRUE, I_j[h_j(key)].count = I_j[h_j(key)].count + 1;
otherwise, returning directly; (1)
in the above formula, given a key-value pair (key, val) and a count array index j, the j-th count array I_j is selected, wherein the new-key Boolean variable novel(key) indicates whether the key appears for the first time, h_j(key) denotes the position to which key is mapped in the count array I_j, and val denotes the value inserted at position I_j[h_j(key)] of the count array;
when the data message is inserted into the data structure LSS, the value range sum and the counter count of the count array in the data structure are updated according to one of three cases:
1) if the historical clustering index j_h of the data message does not exist, taking the identifier of the data message as the key and the current value x_c of the data message as the value val, using the current clustering index j_{h+c} as the index of the count array, setting the new-key Boolean variable novel(key) to TRUE, and executing formula (1);
2) if the historical clustering index j_h of the data message exists and j_h = j_{h+c}, directly performing the insertion of the data message: taking the identifier of the data message as the key and the current value x_c of the data message as the value val, using the current clustering index j_{h+c} as the index of the count array, setting the new-key Boolean variable novel(key) to FALSE, and executing formula (1) to update the counting result of the data message;
3) if the historical clustering index j_h of the data message exists and j_h ≠ j_{h+c}, first deleting the record of the data message from its historical count array, and then inserting the record of the data message into the new count array;
3.1) taking the identifier of the data message as the key and the cached history size x_h of the data message as val, using the historical clustering index j_h as the count array index j, and deleting the current data message from the historical count array:
I_j[h_j(key)].sum = I_j[h_j(key)].sum - val
I_j[h_j(key)].count = I_j[h_j(key)].count - 1
3.2) taking the identifier of the data message as the key and the sum x_h + x_c of the history size x_h and the current size x_c of the data message as val, using the current clustering index j_{h+c} as the index of the new count array, setting the new-key Boolean variable novel(key) to TRUE, and executing formula (1) to insert the current data message into the new count array.
5. The local sensitivity count summarization method for network anomaly detection according to claim 4, wherein: the query method in step 6 is as follows:
querying the location mapping hash table FTable to obtain the count array index j of the data message key to be queried, and using the data message key to be queried and its count array index j to take the average value of the bucket
I_j[h_j(key)].sum / I_j[h_j(key)].count
as the approximate count size of the data message key to be queried.
6. A local sensitivity count summarization system for network anomaly detection, characterized by: comprising a processor and a memory connected to said processor, said memory storing a program of a local sensitivity count summarization method for network anomaly detection, said program implementing the steps of the method of any one of claims 1 to 5 when executed by said processor.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910361101.4A CN110071934B (en) 2019-04-30 2019-04-30 Local sensitivity counting abstract method and system for network anomaly detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910361101.4A CN110071934B (en) 2019-04-30 2019-04-30 Local sensitivity counting abstract method and system for network anomaly detection

Publications (2)

Publication Number Publication Date
CN110071934A CN110071934A (en) 2019-07-30
CN110071934B true CN110071934B (en) 2021-03-26

Family

ID=67369802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910361101.4A Active CN110071934B (en) 2019-04-30 2019-04-30 Local sensitivity counting abstract method and system for network anomaly detection

Country Status (1)

Country Link
CN (1) CN110071934B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460230A (en) * 2020-03-25 2020-07-28 中国人民解放军国防科技大学 Self-repairing counting type summarization method
CN112261000B (en) * 2020-09-25 2022-01-25 湖南大学 LDoS attack detection method based on PSO-K algorithm
CN113297430B (en) * 2021-05-28 2022-08-05 北京大学 Sketch-based high-performance arbitrary partial key measurement method and system
CN115563570B (en) * 2022-12-05 2023-04-14 上海飞旗网络技术股份有限公司 Resource abnormity detection method, device and equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101547129A (en) * 2009-05-05 2009-09-30 中国科学院计算技术研究所 Method and system for detecting distributed denial of service attack
CN102821081A (en) * 2011-06-10 2012-12-12 中国电信股份有限公司 Method and system for monitoring DDOS (distributed denial of service) attacks in small flow
CN102891829A (en) * 2011-07-18 2013-01-23 航天信息股份有限公司 Method and system for detecting and defending distributed denial of service attack
CN106254321A (en) * 2016-07-26 2016-12-21 中国人民解放军防空兵学院 A kind of whole network abnormal data stream sorting technique
CN108345574A (en) * 2017-01-23 2018-07-31 无锡市计量测试院 Related dual data stream abnormality detection and modified method
CN109558464A (en) * 2018-11-21 2019-04-02 中国人民解放军国防科技大学 Network performance grading representation method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8370937B2 (en) * 2007-12-03 2013-02-05 Cisco Technology, Inc. Handling of DDoS attacks from NAT or proxy devices

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"基于流记录的DDoS检测和响应";任文韬;《中国优秀硕士学位论文全文数据库-信息科技辑》;20180415;全文 *

Also Published As

Publication number Publication date
CN110071934A (en) 2019-07-30

Similar Documents

Publication Publication Date Title
CN110071934B (en) Local sensitivity counting abstract method and system for network anomaly detection
CN112953933B (en) Abnormal attack behavior detection method, device, equipment and storage medium
US7596810B2 (en) Apparatus and method of detecting network attack situation
Gogoi et al. MLH-IDS: a multi-level hybrid intrusion detection method
CN101202652B (en) Device for classifying and recognizing network application flow quantity and method thereof
US10097464B1 (en) Sampling based on large flow detection for network visibility monitoring
CN109040130B (en) Method for measuring host network behavior pattern based on attribute relation graph
US10536360B1 (en) Counters for large flow detection
CN113114694B (en) DDoS attack detection method oriented to high-speed network packet sampling data acquisition scene
CN102685145A (en) Domain name server (DNS) data packet-based bot-net domain name discovery method
US10009239B2 (en) Method and apparatus of estimating conversation in a distributed netflow environment
CN101841533A (en) Method and device for detecting distributed denial-of-service attack
CN112182567B (en) Multi-step attack tracing method, system, terminal and readable storage medium
WO2020020098A1 (en) Network flow measurement method, network measurement device and control plane device
KR100901696B1 (en) Apparatus of content-based Sampling for Security events and method thereof
CN111835681A (en) Large-scale abnormal flow host detection method and device
CN117176482A (en) Big data network safety protection method and system
CN112738107A (en) Network security evaluation method, device, equipment and storage medium
CN113872962B (en) Low-speed port scanning detection method for high-speed network sampling data acquisition scene
CN110445772B (en) Internet host scanning method and system based on host relationship
TW202008749A (en) Domain name filtering method
CN118174953A (en) Multi-dimensional network anomaly perception traceability system and method based on artificial intelligence
CN117061254B (en) Abnormal flow detection method, device and computer equipment
CN101202744A (en) Devices for self-learned detecting helminth and method thereof
CN111200542B (en) Network flow management method and system based on deterministic replacement strategy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant