CN109714311A - Abnormal behavior detection method based on clustering algorithm - Google Patents

Abnormal behavior detection method based on clustering algorithm

Info

Publication number
CN109714311A
Authority
CN
China
Prior art keywords
device
point
sample
sample point
centroid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811355937.5A
Other languages
Chinese (zh)
Other versions
CN109714311B (en)
Inventor
王小东
韩飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tiandihexing Technology Co Ltd
Original Assignee
Beijing Tiandihexing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tiandihexing Technology Co Ltd
Priority to CN201811355937.5A
Publication of CN109714311A
Application granted
Publication of CN109714311B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses an abnormal behavior detection method based on a clustering algorithm, comprising the following steps: A. based on the collected sampling information on each device's connection establishment frequency, average connection duration, and traffic bandwidth, form comprehensive information samples; classify the sample points of each device's comprehensive information samples to form a behavior model of each device; B. using the behavior model of the device, perform abnormal behavior detection on new sample points. The invention remedies deficiencies of the prior art: it can identify the network behavior of a virus or malicious threat software before its outbreak or during its latent period, and give timely early warning.

Description

Abnormal behavior detection method based on clustering algorithm
Technical field
The present invention relates to the technical field of computer networks, and in particular to an abnormal behavior detection method based on a clustering algorithm.
Background
With the development of information technology, industrial control systems are gradually becoming open, interconnected, and general-purpose. Many industrial control protocols now run over Industrial Ethernet, and attacks on industrial control systems are becoming more common. Compared with a traditional IT network, malicious threat software in an industrial control network usually does not damage the industrial network immediately after it infiltrates successfully. Instead it hides and probes, waits until the opportunity is ripe (for example, until it receives an instruction over an Internet connection), and then suddenly launches a destructive attack.
At present, abnormal traffic detection in networks relies mainly on a combination of whitelists and blacklists. Both techniques perform deep analysis of network protocol traffic and match it against whitelist or blacklist rules, so as to raise alerts for abnormal network protocol traffic. Before the outbreak of a virus or malicious threat software, or during its latent period, a large proportion of viruses and malicious threat software only probe the network. As a result, whitelist and blacklist techniques cannot reliably detect abnormal network behavior during the latent period of a threat, and cannot correctly identify network behavior before the outbreak or during the latent period.
Summary of the invention
The technical problem to be solved by the present invention is to provide an abnormal behavior detection method based on a clustering algorithm that remedies the deficiencies of the prior art, identifies the network behavior of a virus or malicious threat software before its outbreak or during its latent period, and gives timely early warning.
To solve the above technical problem, the present invention adopts the following technical solution.
An abnormal behavior detection method based on a clustering algorithm, comprising the following steps:
A. Based on the collected sampling information on each device's connection establishment frequency, average connection duration, and traffic bandwidth, form comprehensive information samples; classify the sample points of each device's comprehensive information samples to form a behavior model of each device;
B. Using the behavior model of the device, perform abnormal behavior detection on new sample points.
Preferably, in step A, network traffic is collected by probe devices deployed in the network, and during the learning stage the connection establishment frequency of each device in the network is sampled periodically, forming connection establishment frequency samples for each device, as shown in the following table,
[Table: connection establishment frequency samples $F_{m,n}$ for device $m$ and sample point $n$]
Wherein $F_{m,n}$ denotes the $n$-th sample point in the connection establishment frequency samples of device $m$.
Preferably, in step A, network traffic is collected by probe devices deployed in the network, and during the learning stage the connection duration of each device in the network is sampled. After sampling for a period of time, connection duration samples are formed for each device, as shown in the following table,
[Table: average connection duration samples $T_{m,n}$ for device $m$ and sample point $n$]
Wherein $T_{m,n}$ denotes the $n$-th sample point in the average connection duration samples of device $m$.
Preferably, in step A, network traffic is collected by probe devices deployed in the network, and during the learning stage the traffic bandwidth of each device in the network is sampled periodically. After sampling for a period of time, traffic bandwidth samples are formed for each device, as shown in the following table,
[Table: traffic bandwidth samples $B_{m,n}$ for device $m$ and sample point $n$]
Wherein $B_{m,n}$ denotes the $n$-th sample point in the traffic bandwidth samples of device $m$.
Preferably, in step A, the collected sampling information on each device's connection establishment frequency, average connection duration, and traffic bandwidth is combined to form comprehensive information samples, as shown in the following table,
[Table: comprehensive information samples $S_{m,n}$ for device $m$ and sample point $n$]
Wherein $S_{m,n}$ denotes the $n$-th sample point in the comprehensive information samples of device $m$, and $S_{m,n} = \{F_{m,n}, T_{m,n}, B_{m,n}\}$.
Preferably, in step A, classifying the sample points of the device's comprehensive information samples comprises the following steps,
Randomly select K points as the initial centroids. While the cluster assignment of any point changes: for each data point in the data set, compute the distance between the data point and each centroid, and assign the data point to the nearest cluster; then, for each cluster, compute the mean of all points in the cluster and use that mean as the new centroid.
Preferably, the value of K is determined by the silhouette coefficient,
For each sample point $x^{(i)}$, compute the average distance between $x^{(i)}$ and all other sample points in the same cluster, denoted $a^{(i)}$, which quantifies the cohesion within the cluster;
Select a cluster $b$ that does not contain $x^{(i)}$ and compute the average distance between $x^{(i)}$ and all points in $b$; traverse all other clusters and take the smallest such average distance, denoted $b^{(i)}$; the corresponding cluster is the neighbor class of $x^{(i)}$, and $b^{(i)}$ quantifies the separation between clusters;
For sample point $x^{(i)}$, the silhouette coefficient is
$$s^{(i)} = \frac{b^{(i)} - a^{(i)}}{\max\{a^{(i)},\, b^{(i)}\}}$$
Compute the silhouette coefficients of all sample points $x^{(i)}$; their average is the overall silhouette coefficient, which measures how tightly the data are clustered.
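Preferably, in step A, establishing the behavior model of the device comprises the following steps,
For each class, compute the mean, standard deviation, and maximum of the Euclidean distances from the sample points in the class to the centroid of the class;
The mean of the Euclidean distances from the sample points of a class to its centroid is calculated as follows:
$$\bar{d}_j = \frac{1}{m}\sum_{i=1}^{m} \lVert x_i - \mu_j \rVert,$$
The standard deviation of the Euclidean distances from the sample points of a class to its centroid is calculated as follows:
$$\sigma_j = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(\lVert x_i - \mu_j \rVert - \bar{d}_j\right)^2},$$
The maximum of the Euclidean distances from the sample points of a class to its centroid is calculated as follows:
$$d_j^{\max} = \max_{1 \le i \le m} \lVert x_i - \mu_j \rVert,$$
Wherein $x_i$ is the $i$-th sample point, $\mu_j$ is the centroid of class $j$, and $m$ is the number of sample points;
For a specific device, the following information can be obtained:
[Table: for each class of the device, the centroid and the mean, standard deviation, and maximum of the Euclidean distance to the centroid]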
Preferably, in step B, the Euclidean distance from the sample point to the centroid of each class is calculated; the class whose centroid has the smallest Euclidean distance to the sample point is the class to which the sample point belongs. It is then determined whether the Euclidean distance between the sample point and the centroid of its class is greater than the sum of the maximum Euclidean distance from sample points to the class centroid and twice the standard deviation of that Euclidean distance. If it is greater, the sample point is considered an abnormal sample point; otherwise, the sample point is considered a normal sample point.
The beneficial effects of the above technical solution are as follows: the present invention builds a behavior model over three dimensions (connection establishment frequency, connection duration, and traffic bandwidth), identifies the network behavior of a virus or malicious threat software before its outbreak or during its latent period, and gives timely early warning.
Detailed description
A specific embodiment of the invention comprises the following steps:
A. Based on the collected sampling information on each device's connection establishment frequency, average connection duration, and traffic bandwidth, form comprehensive information samples; classify the sample points of each device's comprehensive information samples to form a behavior model of each device;
B. Using the behavior model of the device, perform abnormal behavior detection on new sample points.
In step A, network traffic is collected by probe devices deployed in the network, and during the learning stage the connection establishment frequency of each device in the network is sampled periodically, forming connection establishment frequency samples for each device, as shown in the following table,
[Table: connection establishment frequency samples $F_{m,n}$ for device $m$ and sample point $n$]
Wherein $F_{m,n}$ denotes the $n$-th sample point in the connection establishment frequency samples of device $m$.
In step A, network traffic is collected by probe devices deployed in the network, and during the learning stage the connection duration of each device in the network is sampled. After sampling for a period of time, connection duration samples are formed for each device, as shown in the following table,
[Table: average connection duration samples $T_{m,n}$ for device $m$ and sample point $n$]
Wherein $T_{m,n}$ denotes the $n$-th sample point in the average connection duration samples of device $m$.
In step A, network traffic is collected by probe devices deployed in the network, and during the learning stage the traffic bandwidth of each device in the network is sampled periodically. After sampling for a period of time, traffic bandwidth samples are formed for each device, as shown in the following table,
[Table: traffic bandwidth samples $B_{m,n}$ for device $m$ and sample point $n$]
Wherein $B_{m,n}$ denotes the $n$-th sample point in the traffic bandwidth samples of device $m$.
In step A, the collected sampling information on each device's connection establishment frequency, average connection duration, and traffic bandwidth is combined to form comprehensive information samples, as shown in the following table,
[Table: comprehensive information samples $S_{m,n}$ for device $m$ and sample point $n$]
Wherein $S_{m,n}$ denotes the $n$-th sample point in the comprehensive information samples of device $m$, and $S_{m,n} = \{F_{m,n}, T_{m,n}, B_{m,n}\}$.
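As a minimal sketch of how one comprehensive sample point (F, T, B) per sampling period could be assembled, the following Python assumes that the probe yields flow records of the form (device, start_time, duration, byte_count) and that a fixed sampling period is used. The record layout, the function name `build_samples`, and the units are assumptions made for illustration and are not specified in the patent.

```python
from collections import defaultdict

def build_samples(flows, period):
    """Aggregate flow records into per-device sample points (F, T, B).

    flows:  iterable of (device, start_time, duration, n_bytes) tuples
            (hypothetical record layout, assumed for illustration).
    period: length of one sampling period in seconds.
    Returns {device: [(F, T, B), ...]} ordered by sampling period.
    Periods in which a device opened no connection produce no sample point.
    """
    # buckets[device][period index] -> list of (duration, n_bytes)
    buckets = defaultdict(lambda: defaultdict(list))
    for device, start, duration, n_bytes in flows:
        buckets[device][int(start // period)].append((duration, n_bytes))

    samples = {}
    for device, windows in buckets.items():
        points = []
        for idx in sorted(windows):
            conns = windows[idx]
            freq = len(conns) / period                        # F: connections per second
            avg_dur = sum(d for d, _ in conns) / len(conns)   # T: average connection duration
            bandwidth = sum(b for _, b in conns) / period     # B: bytes per second
            points.append((freq, avg_dur, bandwidth))
        samples[device] = points
    return samples
```

Each tuple in the returned lists corresponds to one comprehensive sample point of a device and can be passed directly to the classification step.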
In step A, classifying the sample points of the device's comprehensive information samples comprises the following steps,
Randomly select K points as the initial centroids. While the cluster assignment of any point changes: for each data point in the data set, compute the distance between the data point and each centroid, and assign the data point to the nearest cluster; then, for each cluster, compute the mean of all points in the cluster and use that mean as the new centroid.
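The procedure described here matches the standard k-means iteration. The following is a minimal pure-Python sketch of it, assuming that sample points are equal-length numeric tuples. The `seed` and `max_iter` parameters and the handling of empty clusters are illustrative additions that the patent does not mention.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means on a list of equal-length numeric tuples.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # K randomly chosen points as initial centroids
    labels = [None] * len(points)
    for _ in range(max_iter):
        changed = False
        # Assignment step: move each point to the cluster of its nearest centroid.
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            if nearest != labels[i]:
                labels[i] = nearest
                changed = True
        if not changed:                    # no assignment changed: converged
            break
        # Update step: each centroid becomes the mean of the points in its cluster.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                    # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels
```

Because the three dimensions have different units, a practical implementation would probably scale them before clustering; that step is not described in the patent and is omitted here.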
The value of K is determined by the silhouette coefficient,
For each sample point $x^{(i)}$, compute the average distance between $x^{(i)}$ and all other sample points in the same cluster, denoted $a^{(i)}$, which quantifies the cohesion within the cluster;
Select a cluster $b$ that does not contain $x^{(i)}$ and compute the average distance between $x^{(i)}$ and all points in $b$; traverse all other clusters and take the smallest such average distance, denoted $b^{(i)}$; the corresponding cluster is the neighbor class of $x^{(i)}$, and $b^{(i)}$ quantifies the separation between clusters;
For sample point $x^{(i)}$, the silhouette coefficient is
$$s^{(i)} = \frac{b^{(i)} - a^{(i)}}{\max\{a^{(i)},\, b^{(i)}\}}$$
Compute the silhouette coefficients of all sample points $x^{(i)}$; their average is the overall silhouette coefficient, which measures how tightly the data are clustered.
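The overall silhouette coefficient can be sketched in pure Python as follows, assuming the standard definition in which the per-point value is (b - a) / max(a, b). The function name `silhouette`, the value 0 for single-point clusters, and the guard against a zero denominator are conventions assumed here, not details taken from the patent.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient of a clustering with at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)             # assumed convention for single-point clusters
            continue
        # a: mean distance to the other points of the same cluster
        # (the distance from p to itself is 0, so it does not affect the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the points of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

The text only states that K is determined by the silhouette coefficient. The conventional reading, assumed here, is to run the clustering for several candidate values of K and keep the value with the highest overall silhouette coefficient.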
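In step A, establishing the behavior model of the device comprises the following steps,
For each class, compute the mean, standard deviation, and maximum of the Euclidean distances from the sample points in the class to the centroid of the class;
The mean of the Euclidean distances from the sample points of a class to its centroid is calculated as follows:
$$\bar{d}_j = \frac{1}{m}\sum_{i=1}^{m} \lVert x_i - \mu_j \rVert,$$
The standard deviation of the Euclidean distances from the sample points of a class to its centroid is calculated as follows:
$$\sigma_j = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(\lVert x_i - \mu_j \rVert - \bar{d}_j\right)^2},$$
The maximum of the Euclidean distances from the sample points of a class to its centroid is calculated as follows:
$$d_j^{\max} = \max_{1 \le i \le m} \lVert x_i - \mu_j \rVert,$$
Wherein $x_i$ is the $i$-th sample point, $\mu_j$ is the centroid of class $j$, and $m$ is the number of sample points;
For a specific device, the following information can be obtained:
[Table: for each class of the device, the centroid and the mean, standard deviation, and maximum of the Euclidean distance to the centroid]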
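The per-class statistics that make up a device's behavior model could be computed as in the sketch below. It assumes the clustering result is available as a list of labels and a list of centroids. The dictionary layout, the function name `build_behavior_model`, and the choice of the population standard deviation (dividing by m rather than m - 1) are assumptions made for illustration.

```python
import math

def build_behavior_model(points, labels, centroids):
    """Per-class distance statistics: {class: {centroid, mean, std, max}}."""
    model = {}
    for j, mu in enumerate(centroids):
        # Euclidean distances from the sample points of class j to its centroid.
        dists = [math.dist(p, mu) for p, lab in zip(points, labels) if lab == j]
        if not dists:
            continue                       # skip classes that received no sample points
        mean = sum(dists) / len(dists)
        # Population standard deviation (divide by m); assumed, the sample
        # form (divide by m - 1) would be an equally plausible reading.
        std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
        model[j] = {"centroid": mu, "mean": mean, "std": std, "max": max(dists)}
    return model
```

One such model would be built per device from the sample points gathered during the learning stage.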
In step B, the Euclidean distance from the sample point to the centroid of each class is calculated; the class whose centroid has the smallest Euclidean distance to the sample point is the class to which the sample point belongs. It is then determined whether the Euclidean distance between the sample point and the centroid of its class is greater than the sum of the maximum Euclidean distance from sample points to the class centroid and twice the standard deviation of that Euclidean distance. If it is greater, the sample point is considered an abnormal sample point; otherwise, the sample point is considered a normal sample point.
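The detection rule of step B can be sketched as follows. The threshold is read here as the maximum training distance of the class plus twice its standard deviation, which is one interpretation of the wording above. The model layout and the function name `is_abnormal` are hypothetical.

```python
import math

def is_abnormal(point, model):
    """Return (abnormal, class_index) for a new sample point.

    model: {class: {"centroid": tuple, "std": float, "max": float}},
           a hypothetical layout for the per-class statistics of one device.
    """
    # The class of the point is the one whose centroid is nearest.
    j = min(model, key=lambda c: math.dist(point, model[c]["centroid"]))
    distance = math.dist(point, model[j]["centroid"])
    # Threshold: maximum distance seen in training plus twice the standard deviation.
    threshold = model[j]["max"] + 2 * model[j]["std"]
    return distance > threshold, j
```

For example, with a class centered at (0, 0) whose maximum distance is 2.0 and standard deviation is 0.5, a new point at distance 4 from the centroid exceeds the threshold of 3.0 and would be flagged, while a point at distance 1.4 would not.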
The above shows and describes the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the present invention is not limited to the above embodiment; the embodiment and the description merely illustrate the principle of the invention. Various changes and improvements may be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the claimed scope of protection. The scope of protection claimed is defined by the appended claims and their equivalents.

Claims (9)

1. An abnormal behavior detection method based on a clustering algorithm, characterized by comprising the following steps: A. based on the collected sampling information on each device's connection establishment frequency, average connection duration, and traffic bandwidth, forming comprehensive information samples; classifying the sample points of each device's comprehensive information samples to form a behavior model of each device; B. using the behavior model of the device, performing abnormal behavior detection on new sample points.
2. The abnormal behavior detection method based on a clustering algorithm according to claim 1, characterized in that: in step A, network traffic is collected by probe devices deployed in the network, and during the learning stage the connection establishment frequency of each device in the network is sampled periodically, forming connection establishment frequency samples for each device, as shown in the following table,
[Table: connection establishment frequency samples $F_{m,n}$ for device $m$ and sample point $n$]
Wherein $F_{m,n}$ denotes the $n$-th sample point in the connection establishment frequency samples of device $m$.
3. The abnormal behavior detection method based on a clustering algorithm according to claim 2, characterized in that: in step A, network traffic is collected by probe devices deployed in the network, and during the learning stage the connection duration of each device in the network is sampled. After sampling for a period of time, connection duration samples are formed for each device, as shown in the following table,
[Table: average connection duration samples $T_{m,n}$ for device $m$ and sample point $n$]
Wherein $T_{m,n}$ denotes the $n$-th sample point in the average connection duration samples of device $m$.
4. The abnormal behavior detection method based on a clustering algorithm according to claim 3, characterized in that: in step A, network traffic is collected by probe devices deployed in the network, and during the learning stage the traffic bandwidth of each device in the network is sampled periodically. After sampling for a period of time, traffic bandwidth samples are formed for each device, as shown in the following table,
[Table: traffic bandwidth samples $B_{m,n}$ for device $m$ and sample point $n$]
Wherein $B_{m,n}$ denotes the $n$-th sample point in the traffic bandwidth samples of device $m$.
5. The abnormal behavior detection method based on a clustering algorithm according to claim 4, characterized in that: in step A, the collected sampling information on each device's connection establishment frequency, average connection duration, and traffic bandwidth is combined to form comprehensive information samples, as shown in the following table,
[Table: comprehensive information samples $S_{m,n}$ for device $m$ and sample point $n$]
Wherein $S_{m,n}$ denotes the $n$-th sample point in the comprehensive information samples of device $m$, and $S_{m,n} = \{F_{m,n}, T_{m,n}, B_{m,n}\}$.
6. The abnormal behavior detection method based on a clustering algorithm according to claim 1, characterized in that: in step A, classifying the sample points of the device's comprehensive information samples comprises the following steps: randomly select K points as the initial centroids; while the cluster assignment of any point changes, for each data point in the data set, compute the distance between the data point and each centroid and assign the data point to the nearest cluster; for each cluster, compute the mean of all points in the cluster and use that mean as the centroid.
7. The abnormal behavior detection method based on a clustering algorithm according to claim 6, characterized in that: the value of K is determined by the silhouette coefficient; for each sample point $x^{(i)}$, compute the average distance between $x^{(i)}$ and all other sample points in the same cluster, denoted $a^{(i)}$, which quantifies the cohesion within the cluster; select a cluster $b$ that does not contain $x^{(i)}$ and compute the average distance between $x^{(i)}$ and all points in $b$; traverse all other clusters and take the smallest such average distance, denoted $b^{(i)}$; the corresponding cluster is the neighbor class of $x^{(i)}$, and $b^{(i)}$ quantifies the separation between clusters; for sample point $x^{(i)}$, the silhouette coefficient is $s^{(i)} = \frac{b^{(i)} - a^{(i)}}{\max\{a^{(i)},\, b^{(i)}\}}$; compute the silhouette coefficients of all sample points $x^{(i)}$; their average is the overall silhouette coefficient, which measures how tightly the data are clustered.
8. The abnormal behavior detection method based on a clustering algorithm according to claim 7, characterized in that: in step A, establishing the behavior model of the device comprises the following steps: for each class, compute the mean, standard deviation, and maximum of the Euclidean distances from the sample points in the class to the centroid of the class; the mean is calculated as $\bar{d}_j = \frac{1}{m}\sum_{i=1}^{m} \lVert x_i - \mu_j \rVert$; the standard deviation is calculated as $\sigma_j = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(\lVert x_i - \mu_j \rVert - \bar{d}_j\right)^2}$; the maximum is calculated as $d_j^{\max} = \max_{1 \le i \le m} \lVert x_i - \mu_j \rVert$; wherein $x_i$ is the $i$-th sample point, $\mu_j$ is the centroid of class $j$, and $m$ is the number of sample points; for a specific device, the following information can be obtained:
[Table: for each class of the device, the centroid and the mean, standard deviation, and maximum of the Euclidean distance to the centroid]
9. The abnormal behavior detection method based on a clustering algorithm according to claim 8, characterized in that: in step B, the Euclidean distance from the sample point to the centroid of each class is calculated; the class whose centroid has the smallest Euclidean distance to the sample point is the class to which the sample point belongs; it is determined whether the Euclidean distance between the sample point and the centroid of its class is greater than the sum of the maximum Euclidean distance from sample points to the class centroid and twice the standard deviation of that Euclidean distance; if it is greater, the sample point is considered an abnormal sample point; otherwise, the sample point is considered a normal sample point.
CN201811355937.5A 2018-11-15 2018-11-15 Abnormal behavior detection method based on clustering algorithm Active CN109714311B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811355937.5A CN109714311B (en) 2018-11-15 2018-11-15 Abnormal behavior detection method based on clustering algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811355937.5A CN109714311B (en) 2018-11-15 2018-11-15 Abnormal behavior detection method based on clustering algorithm

Publications (2)

Publication Number Publication Date
CN109714311A 2019-05-03
CN109714311B 2021-12-31

Family

ID=66254848

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811355937.5A Active CN109714311B (en) 2018-11-15 2018-11-15 Abnormal behavior detection method based on clustering algorithm

Country Status (1)

Country Link
CN (1) CN109714311B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110162419A (en) * 2019-05-31 2019-08-23 北京奇艺世纪科技有限公司 A kind of information consumption condition detection method and device
CN110417744A (en) * 2019-06-28 2019-11-05 平安科技(深圳)有限公司 The safe determination method and device of network access
CN110445753A (en) * 2019-06-28 2019-11-12 平安科技(深圳)有限公司 The partition method and device of terminal device abnormal access
CN113765914A (en) * 2021-09-03 2021-12-07 杭州安恒信息技术股份有限公司 CC attack protection method, system, computer equipment and readable storage medium
CN113938410A (en) * 2021-10-14 2022-01-14 广东电网有限责任公司 Terminal protocol identification method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120096551A1 (en) * 2010-10-13 2012-04-19 National Taiwan University Of Science And Technology Intrusion detecting system and method for establishing classifying rules thereof
CN103200133A (en) * 2013-03-21 2013-07-10 南京邮电大学 Flow identification method based on network flow gravitation cluster
CN103929738A (en) * 2014-04-21 2014-07-16 东南大学 WSNs united intrusion detection method based on multiple danger agents
CN106714220A (en) * 2017-01-06 2017-05-24 江南大学 WSN (Wireless Sensor Network) anomaly detection method based on MEA-BP neural network

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110162419A (en) * 2019-05-31 2019-08-23 北京奇艺世纪科技有限公司 A kind of information consumption condition detection method and device
CN110417744A (en) * 2019-06-28 2019-11-05 平安科技(深圳)有限公司 The safe determination method and device of network access
CN110445753A (en) * 2019-06-28 2019-11-12 平安科技(深圳)有限公司 The partition method and device of terminal device abnormal access
CN110417744B (en) * 2019-06-28 2021-12-24 平安科技(深圳)有限公司 Security determination method and device for network access
CN113765914A (en) * 2021-09-03 2021-12-07 杭州安恒信息技术股份有限公司 CC attack protection method, system, computer equipment and readable storage medium
CN113938410A (en) * 2021-10-14 2022-01-14 广东电网有限责任公司 Terminal protocol identification method and device
CN113938410B (en) * 2021-10-14 2023-05-23 广东电网有限责任公司 Terminal protocol identification method and device

Also Published As

Publication number Publication date
CN109714311B (en) 2021-12-31

Similar Documents

Publication Publication Date Title
CN109714311A (en) Abnormal behavior detection method based on clustering algorithm
Peng et al. A detection method for anomaly flow in software defined network
CN109729090B (en) Slow denial of service attack detection method based on WEDMS clustering
CN112788066B (en) Abnormal flow detection method and system for Internet of things equipment and storage medium
CN109150859B (en) Botnet detection method based on network traffic flow direction similarity
CN107579846B (en) Cloud computing fault data detection method and system
CN111800430B (en) Attack group identification method, device, equipment and medium
CN110475246B (en) Malicious anchor node detection method based on isolated forest and sequential probability ratio detection
CN109218321A (en) A kind of network inbreak detection method and system
CN105871634A (en) Method and application for detecting cluster anomalies and cluster managing system
CN116304766A (en) Multi-sensor-based quick assessment method for state of switch cabinet
CN109067722A (en) A kind of LDoS detection method based on two steps cluster and detection lug analysis joint algorithm
CN111970229B (en) CAN bus data anomaly detection method aiming at multiple attack modes
CN108683686A (en) A kind of Stochastic subspace name ddos attack detection method
CN110084169A (en) A kind of architecture against regulations object recognition methods based on K-Means cluster and profile topological constraints
CN114422184A (en) Network security attack type and threat level prediction method based on machine learning
CN109450957A (en) A kind of low speed Denial of Service attack detection method based on cloud model
CN110851422A (en) Data anomaly monitoring model construction method based on machine learning
CN111818049B (en) Botnet flow detection method and system based on Markov model
CN107070941A (en) The method and apparatus of abnormal traffic detection
JP2021527873A (en) Protocol-independent anomaly detection
CN103269337B (en) Data processing method and device
CN113794742B (en) High-precision detection method for FDIA of power system
Oh et al. Attack Classification Based on Data Mining Technique and Its Application for Reliable Medical Sensor Communication.
CN112291193B (en) LDoS attack detection method based on NCS-SVM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant