CN109714311A - Abnormal behavior detection method based on clustering algorithm - Google Patents
- Publication number
- CN109714311A (application CN201811355937.5A)
- Authority
- CN
- China
- Prior art keywords
- equipment
- point
- sample
- sampled point
- mass center
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses an abnormal behavior detection method based on a clustering algorithm, comprising the following steps: A. based on the sampling information collected for each device (connection establishment frequency, average connection duration, and traffic bandwidth), form an integrated information sample; classify the sampled points of each device's integrated information sample to form a behavior model for each device; B. using the device behavior models, perform abnormal behavior detection on new sampled points. The invention remedies the deficiencies of the prior art: it identifies the network behavior of a virus or malicious software before an outbreak or during its latency period and issues a timely early warning.
Description
Technical field
The present invention relates to the technical field of computer networks, and in particular to an abnormal behavior detection method based on a clustering algorithm.
Background art
With the development of information technology, industrial control systems are gradually becoming open, interconnected, and general-purpose. Many industrial control protocols now run over Industrial Ethernet, and attacks against industrial control systems are becoming more common. Compared with traditional IT networks, malicious software in an industrial control network usually does not damage the industrial network immediately after a successful intrusion. Instead, it lies dormant and probes the network, waits until conditions are ripe (for example, until an Internet connection delivers an instruction), and then suddenly launches a destructive attack.
At present, abnormal traffic detection in networks relies mainly on a combination of whitelists and blacklists. Both techniques parse network protocol traffic in depth and match it against whitelist or blacklist rules in order to raise alerts on abnormal protocol traffic. Before an outbreak or during the latency period, however, a large proportion of viruses and malicious software only perform probing. As a result, whitelist and blacklist techniques cannot reliably detect the abnormal network behavior of a latent threat, and network behavior before an outbreak or during latency cannot be correctly identified.
Summary of the invention
The technical problem to be solved by the invention is to provide an abnormal behavior detection method based on a clustering algorithm that overcomes the deficiencies of the prior art by identifying the network behavior of a virus or malicious software before an outbreak or during its latency period, so that a timely early warning can be issued.
To solve the above technical problem, the invention adopts the following technical solution.
An abnormal behavior detection method based on a clustering algorithm, comprising the following steps:
A. Based on the sampling information collected for each device (connection establishment frequency, average connection duration, and traffic bandwidth), form an integrated information sample; classify the sampled points of each device's integrated information sample to form a behavior model for each device;
B. Using the device behavior models, perform abnormal behavior detection on new sampled points.
Preferably, in step A, network traffic is collected by probe devices deployed in the network. During the learning phase, the connection establishment frequency of every device in the network is sampled periodically, forming a connection establishment frequency sample for each device, as shown in the following table,
where $F_{m,n}$ denotes the n-th sampled point in the connection establishment frequency sample of device m.
Preferably, in step A, network traffic is collected by probe devices deployed in the network. During the learning phase, the connection duration of every device in the network is sampled. After a period of sampling, a connection duration sample is formed for each device, as shown in the following table,
where $T_{m,n}$ denotes the n-th sampled point in the average connection duration sample of device m.
Preferably, in step A, network traffic is collected by probe devices deployed in the network. During the learning phase, the traffic bandwidth of every device in the network is sampled periodically. After a period of sampling, a traffic bandwidth sample is formed for each device, as shown in the following table,
where $B_{m,n}$ denotes the n-th sampled point in the traffic bandwidth sample of device m.
Preferably, in step A, the connection establishment frequency, average connection duration, and traffic bandwidth sampling information collected for each device is combined into an integrated information sample, as shown in the following table,
where $S_{m,n}$ denotes the n-th sampled point in the integrated information sample of device m, and $S_{m,n} = \{F_{m,n}, T_{m,n}, B_{m,n}\}$.
Preferably, in step A, classifying the sampled points of a device's integrated information sample comprises the following steps:
Randomly select K points as the initial centroids. While the cluster assignment of any point changes, repeat the following: for each data point in the data set, compute the distance between the data point and every centroid, and assign the data point to the nearest cluster; then, for each cluster, compute the mean of all points in the cluster and use that mean as the new centroid.
Preferably, the value of K is determined by the silhouette coefficient:
For each sample point $x^{(i)}$, compute the average distance between $x^{(i)}$ and all other sampled points in the same cluster, denoted $a^{(i)}$, which quantifies the cohesion within the cluster;
Choose a cluster b other than the one containing $x^{(i)}$ and compute the average distance between $x^{(i)}$ and all points in b; traverse all other clusters and take the smallest such average distance, denoted $b^{(i)}$; the corresponding cluster is the neighboring cluster of $x^{(i)}$, and $b^{(i)}$ quantifies the separation between clusters;
For sample point $x^{(i)}$, the silhouette coefficient is $s^{(i)} = \dfrac{b^{(i)} - a^{(i)}}{\max\{a^{(i)},\, b^{(i)}\}}$;
Compute the silhouette coefficient of every sample point $x^{(i)}$; their average is the overall silhouette coefficient, which measures how tight the clustering of the data is.
Preferably, in step A, establishing the behavior model of a device comprises the following steps:
For each class, compute the mean, standard deviation, and maximum of the Euclidean distances from the sampled points in the class to the class centroid;
The mean Euclidean distance from the sampled points of a class to its centroid is computed as follows:
$\bar{d}_j = \dfrac{1}{m}\sum_{i=1}^{m} \lVert x_i - \mu_j \rVert$,
The standard deviation of the Euclidean distance from the sampled points of a class to its centroid is computed as follows:
$\sigma_j = \sqrt{\dfrac{1}{m}\sum_{i=1}^{m}\left(\lVert x_i - \mu_j \rVert - \bar{d}_j\right)^2}$,
The maximum Euclidean distance from the sampled points of a class to its centroid is computed as follows:
$d_j^{\max} = \max_{1 \le i \le m} \lVert x_i - \mu_j \rVert$,
where $x_i$ is the i-th sampled point, $\mu_j$ is the centroid of class j, and m is the number of sampled points;
For a specific device, the following information can be obtained:
Preferably, in step B, compute the Euclidean distance from the new sampled point to the centroid of every class. The class whose centroid has the smallest Euclidean distance to the sampled point is the class to which the sampled point belongs. Then judge whether the Euclidean distance between the sampled point and the centroid of its class is greater than the sum of that class's maximum Euclidean distance and twice its Euclidean distance standard deviation. If it is greater, the sampled point is considered abnormal; otherwise it is considered normal.
The beneficial effects of the above technical solution are as follows: the invention builds a behavior model over three dimensions (connection establishment frequency, connection duration, and traffic bandwidth), identifies the network behavior of a virus or malicious software before an outbreak or during its latency period, and issues a timely early warning.
Specific embodiment
A specific embodiment of the invention comprises the following steps:
A. Based on the sampling information collected for each device (connection establishment frequency, average connection duration, and traffic bandwidth), form an integrated information sample; classify the sampled points of each device's integrated information sample to form a behavior model for each device;
B. Using the device behavior models, perform abnormal behavior detection on new sampled points.
In step A, network traffic is collected by probe devices deployed in the network. During the learning phase, the connection establishment frequency of every device in the network is sampled periodically, forming a connection establishment frequency sample for each device, as shown in the following table,
where $F_{m,n}$ denotes the n-th sampled point in the connection establishment frequency sample of device m.
In step A, network traffic is collected by probe devices deployed in the network. During the learning phase, the connection duration of every device in the network is sampled. After a period of sampling, a connection duration sample is formed for each device, as shown in the following table,
where $T_{m,n}$ denotes the n-th sampled point in the average connection duration sample of device m.
In step A, network traffic is collected by probe devices deployed in the network. During the learning phase, the traffic bandwidth of every device in the network is sampled periodically. After a period of sampling, a traffic bandwidth sample is formed for each device, as shown in the following table,
where $B_{m,n}$ denotes the n-th sampled point in the traffic bandwidth sample of device m.
In step A, the connection establishment frequency, average connection duration, and traffic bandwidth sampling information collected for each device is combined into an integrated information sample, as shown in the following table,
where $S_{m,n}$ denotes the n-th sampled point in the integrated information sample of device m, and $S_{m,n} = \{F_{m,n}, T_{m,n}, B_{m,n}\}$.
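The sampling described above can be illustrated with a minimal Python sketch. It assumes that the probe exports flow records as (device, start time, duration, bytes) tuples and that a fixed sampling period is used. The record layout, the function name `build_integrated_samples`, and the device names are hypothetical and are not taken from the patent. Each sampling window yields one integrated sampled point made of the three quantities F, T, and B.

```python
from collections import defaultdict


def build_integrated_samples(flows, window):
    """Group flow records into fixed windows and compute (F, T, B) per device.

    flows:  iterable of (device, start_time_s, duration_s, n_bytes) tuples
            (hypothetical record layout, assumed for illustration).
    window: sampling period in seconds.
    Returns {device: [(F, T, B), ...]} ordered by window index.
    """
    buckets = defaultdict(list)
    for device, start, duration, n_bytes in flows:
        buckets[(device, int(start // window))].append((duration, n_bytes))

    samples = defaultdict(list)
    for (device, _window_index), records in sorted(buckets.items()):
        freq = len(records) / window                            # F: connections per second
        avg_duration = sum(d for d, _ in records) / len(records)  # T: mean duration
        bandwidth = sum(b for _, b in records) / window         # B: bytes per second
        samples[device].append((freq, avg_duration, bandwidth))
    return dict(samples)


flows = [
    ("plc-1", 0.0, 2.0, 1000),
    ("plc-1", 3.0, 4.0, 3000),
    ("plc-1", 12.0, 1.0, 500),
    ("hmi-1", 1.0, 5.0, 2000),
]
samples = build_integrated_samples(flows, window=10)
```

In this sketch, windows in which a device opened no connection are skipped; a deployment might instead record a zero-valued point for such windows.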
In step A, classifying the sampled points of a device's integrated information sample comprises the following steps:
Randomly select K points as the initial centroids. While the cluster assignment of any point changes, repeat the following: for each data point in the data set, compute the distance between the data point and every centroid, and assign the data point to the nearest cluster; then, for each cluster, compute the mean of all points in the cluster and use that mean as the new centroid.
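The clustering loop described above corresponds to the standard K-means procedure and can be sketched in plain Python as follows. This is an illustrative sketch, not the patented implementation: the function name `kmeans`, the fixed random seed, the iteration cap, and the handling of empty clusters (the old centroid is kept) are assumptions added here.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster points (tuples of floats) into k groups.

    Returns (centroids, labels). The loop stops when no assignment changes,
    or after max_iter rounds (a safety cap assumed for this sketch).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # K random initial centroids
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assign every point to the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:               # no assignment changed: done
            break
        labels = new_labels
        # Recompute each centroid as the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                        # keep old centroid if cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
```

In practice the three features would probably need to be brought to comparable scales before clustering, since Euclidean distance is otherwise dominated by the feature with the largest range; the patent text does not address this.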
The value of K is determined by the silhouette coefficient:
For each sample point $x^{(i)}$, compute the average distance between $x^{(i)}$ and all other sampled points in the same cluster, denoted $a^{(i)}$, which quantifies the cohesion within the cluster;
Choose a cluster b other than the one containing $x^{(i)}$ and compute the average distance between $x^{(i)}$ and all points in b; traverse all other clusters and take the smallest such average distance, denoted $b^{(i)}$; the corresponding cluster is the neighboring cluster of $x^{(i)}$, and $b^{(i)}$ quantifies the separation between clusters;
For sample point $x^{(i)}$, the silhouette coefficient is $s^{(i)} = \dfrac{b^{(i)} - a^{(i)}}{\max\{a^{(i)},\, b^{(i)}\}}$;
Compute the silhouette coefficient of every sample point $x^{(i)}$; their average is the overall silhouette coefficient, which measures how tight the clustering of the data is.
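To make the selection of K concrete, the following sketch computes the overall silhouette coefficient for a given labeling, using the standard definition s = (b - a) / max(a, b), and compares two candidate labelings of the same data. The function name `silhouette_score`, the score of 0 given to single-point clusters, and the hand-written labelings are assumptions for illustration. The sketch requires at least two clusters and points that are not all identical.

```python
import math


def silhouette_score(points, labels):
    """Overall silhouette coefficient: mean of (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)   # common convention for singleton clusters (assumption)
            continue
        # a: mean distance to the other points of the same cluster (dist(p, p) == 0).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the points of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items() if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
score_k2 = silhouette_score(points, [0, 0, 0, 1, 1, 1])   # two natural groups
score_k3 = silhouette_score(points, [0, 0, 2, 1, 1, 1])   # one group split in two
best_k = 2 if score_k2 >= score_k3 else 3
```

For this data the two-cluster labeling scores higher, so K = 2 would be chosen. In a full pipeline the candidate labelings would come from running the clustering step once per candidate K.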
In step A, establishing the behavior model of a device comprises the following steps:
For each class, compute the mean, standard deviation, and maximum of the Euclidean distances from the sampled points in the class to the class centroid;
The mean Euclidean distance from the sampled points of a class to its centroid is computed as follows:
$\bar{d}_j = \dfrac{1}{m}\sum_{i=1}^{m} \lVert x_i - \mu_j \rVert$,
The standard deviation of the Euclidean distance from the sampled points of a class to its centroid is computed as follows:
$\sigma_j = \sqrt{\dfrac{1}{m}\sum_{i=1}^{m}\left(\lVert x_i - \mu_j \rVert - \bar{d}_j\right)^2}$,
The maximum Euclidean distance from the sampled points of a class to its centroid is computed as follows:
$d_j^{\max} = \max_{1 \le i \le m} \lVert x_i - \mu_j \rVert$,
where $x_i$ is the i-th sampled point, $\mu_j$ is the centroid of class j, and m is the number of sampled points;
For a specific device, the following information can be obtained:
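As an illustration of the per-class statistics, the sketch below computes the mean, standard deviation, and maximum of the distances from each class's points to its centroid. The function name `build_behavior_model` and the dictionary layout are hypothetical. The use of the population standard deviation (division by m rather than m - 1) is an assumption, since the source text does not preserve the exact formula.

```python
import math


def build_behavior_model(points, labels, centroids):
    """Return {class_index: {"centroid", "mean", "std", "max"}} for one device."""
    model = {}
    for j, mu in enumerate(centroids):
        dists = [math.dist(p, mu) for p, lab in zip(points, labels) if lab == j]
        if not dists:
            continue                       # skip classes with no sampled points
        mean = sum(dists) / len(dists)
        # Population standard deviation (divide by m): an assumption of this sketch.
        std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
        model[j] = {"centroid": mu, "mean": mean, "std": std, "max": max(dists)}
    return model


points = [(0.0, 0.0), (4.0, 0.0), (2.0, 0.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 0, 1, 1]
centroids = [(2.0, 0.0), (10.0, 1.0)]
model = build_behavior_model(points, labels, centroids)
```

For class 0 the distances are 2, 2, and 0, giving a mean of 4/3 and a maximum of 2; for class 1 both points lie at distance 1, so the standard deviation is 0.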
In step B, compute the Euclidean distance from the new sampled point to the centroid of every class. The class whose centroid has the smallest Euclidean distance to the sampled point is the class to which the sampled point belongs. Then judge whether the Euclidean distance between the sampled point and the centroid of its class is greater than the sum of that class's maximum Euclidean distance and twice its Euclidean distance standard deviation. If it is greater, the sampled point is considered abnormal; otherwise it is considered normal.
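The decision rule of step B can be sketched as follows. The model layout (one dictionary per class with `centroid`, `std`, and `max` entries), the function name `is_anomalous`, and the numeric values are hypothetical and serve only to illustrate the rule; the threshold is read here as the nearest class's maximum distance plus twice its standard deviation.

```python
import math


def is_anomalous(point, model):
    """Flag a new sampled point using the max-distance + 2 * std threshold."""
    # The class with the nearest centroid is the class the point belongs to.
    nearest = min(model.values(), key=lambda c: math.dist(point, c["centroid"]))
    distance = math.dist(point, nearest["centroid"])
    return distance > nearest["max"] + 2 * nearest["std"]


# Hypothetical behavior model for one device with two classes.
model = {
    0: {"centroid": (2.0, 0.0), "std": 0.9, "max": 2.0},    # threshold 3.8
    1: {"centroid": (20.0, 0.0), "std": 0.5, "max": 1.0},   # threshold 2.0
}

inside = is_anomalous((2.5, 0.5), model)    # close to centroid 0
margin = is_anomalous((5.5, 0.0), model)    # distance 3.5, beyond max but within margin
outside = is_anomalous((8.0, 0.0), model)   # distance 6.0 from centroid 0
```

The 2 * std term gives a tolerance band beyond the largest distance seen during learning, so a point slightly outside the learned cluster is not flagged immediately.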
The foregoing shows and describes the basic principles, main features, and advantages of the invention. Those skilled in the art should understand that the invention is not limited to the above embodiments; the embodiments and the description merely illustrate the principles of the invention. Various changes and improvements may be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the claimed scope of protection. The scope of protection claimed is defined by the appended claims and their equivalents.
Claims (9)
1. An abnormal behavior detection method based on a clustering algorithm, characterized by comprising the following steps: A. based on the sampling information collected for each device (connection establishment frequency, average connection duration, and traffic bandwidth), forming an integrated information sample; classifying the sampled points of each device's integrated information sample to form a behavior model for each device; B. using the device behavior models, performing abnormal behavior detection on new sampled points.
2. The abnormal behavior detection method based on a clustering algorithm according to claim 1, characterized in that: in step A, network traffic is collected by probe devices deployed in the network; during the learning phase, the connection establishment frequency of every device in the network is sampled periodically, forming a connection establishment frequency sample for each device, as shown in the following table,
where $F_{m,n}$ denotes the n-th sampled point in the connection establishment frequency sample of device m.
3. The abnormal behavior detection method based on a clustering algorithm according to claim 2, characterized in that: in step A, network traffic is collected by probe devices deployed in the network; during the learning phase, the connection duration of every device in the network is sampled; after a period of sampling, a connection duration sample is formed for each device, as shown in the following table,
where $T_{m,n}$ denotes the n-th sampled point in the average connection duration sample of device m.
4. The abnormal behavior detection method based on a clustering algorithm according to claim 3, characterized in that: in step A, network traffic is collected by probe devices deployed in the network; during the learning phase, the traffic bandwidth of every device in the network is sampled periodically; after a period of sampling, a traffic bandwidth sample is formed for each device, as shown in the following table,
where $B_{m,n}$ denotes the n-th sampled point in the traffic bandwidth sample of device m.
5. The abnormal behavior detection method based on a clustering algorithm according to claim 4, characterized in that: in step A, the connection establishment frequency, average connection duration, and traffic bandwidth sampling information collected for each device is combined into an integrated information sample, as shown in the following table,
where $S_{m,n}$ denotes the n-th sampled point in the integrated information sample of device m, and $S_{m,n} = \{F_{m,n}, T_{m,n}, B_{m,n}\}$.
6. The abnormal behavior detection method based on a clustering algorithm according to claim 1, characterized in that: in step A, classifying the sampled points of a device's integrated information sample comprises the following steps: randomly selecting K points as the initial centroids; while the cluster assignment of any point changes, for each data point in the data set, computing the distance between the data point and every centroid and assigning the data point to the nearest cluster; and, for each cluster, computing the mean of all points in the cluster and using that mean as the new centroid.
7. The abnormal behavior detection method based on a clustering algorithm according to claim 6, characterized in that: the value of K is determined by the silhouette coefficient; for each sample point $x^{(i)}$, the average distance between $x^{(i)}$ and all other sampled points in the same cluster is computed and denoted $a^{(i)}$, which quantifies the cohesion within the cluster; a cluster b other than the one containing $x^{(i)}$ is chosen and the average distance between $x^{(i)}$ and all points in b is computed; all other clusters are traversed and the smallest such average distance is taken, denoted $b^{(i)}$; the corresponding cluster is the neighboring cluster of $x^{(i)}$, and $b^{(i)}$ quantifies the separation between clusters; for sample point $x^{(i)}$, the silhouette coefficient is $s^{(i)} = \dfrac{b^{(i)} - a^{(i)}}{\max\{a^{(i)},\, b^{(i)}\}}$; the silhouette coefficient of every sample point $x^{(i)}$ is computed, and their average is the overall silhouette coefficient, which measures how tight the clustering of the data is.
8. The abnormal behavior detection method based on a clustering algorithm according to claim 7, characterized in that: in step A, establishing the behavior model of a device comprises the following steps: for each class, computing the mean, standard deviation, and maximum of the Euclidean distances from the sampled points in the class to the class centroid; the mean Euclidean distance from the sampled points of a class to its centroid is computed as $\bar{d}_j = \dfrac{1}{m}\sum_{i=1}^{m} \lVert x_i - \mu_j \rVert$; the standard deviation of the Euclidean distance from the sampled points of a class to its centroid is computed as $\sigma_j = \sqrt{\dfrac{1}{m}\sum_{i=1}^{m}\left(\lVert x_i - \mu_j \rVert - \bar{d}_j\right)^2}$; the maximum Euclidean distance from the sampled points of a class to its centroid is computed as $d_j^{\max} = \max_{1 \le i \le m} \lVert x_i - \mu_j \rVert$;
where $x_i$ is the i-th sampled point, $\mu_j$ is the centroid of class j, and m is the number of sampled points; for a specific device, the following information can be obtained:
9. The abnormal behavior detection method based on a clustering algorithm according to claim 8, characterized in that: in step B, the Euclidean distance from the new sampled point to the centroid of every class is computed; the class whose centroid has the smallest Euclidean distance to the sampled point is the class to which the sampled point belongs; it is then judged whether the Euclidean distance between the sampled point and the centroid of its class is greater than the sum of that class's maximum Euclidean distance and twice its Euclidean distance standard deviation; if it is greater, the sampled point is considered abnormal; otherwise it is considered normal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811355937.5A CN109714311B (en) | 2018-11-15 | 2018-11-15 | Abnormal behavior detection method based on clustering algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811355937.5A CN109714311B (en) | 2018-11-15 | 2018-11-15 | Abnormal behavior detection method based on clustering algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109714311A (en) | 2019-05-03 |
CN109714311B (en) | 2021-12-31 |
Family
ID=66254848
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811355937.5A Active CN109714311B (en) | 2018-11-15 | 2018-11-15 | Abnormal behavior detection method based on clustering algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109714311B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120096551A1 (en) * | 2010-10-13 | 2012-04-19 | National Taiwan University Of Science And Technology | Intrusion detecting system and method for establishing classifying rules thereof |
CN103200133A (en) * | 2013-03-21 | 2013-07-10 | 南京邮电大学 | Flow identification method based on network flow gravitation cluster |
CN103929738A (en) * | 2014-04-21 | 2014-07-16 | 东南大学 | WSNs united intrusion detection method based on multiple danger agents |
CN106714220A (en) * | 2017-01-06 | 2017-05-24 | 江南大学 | WSN (Wireless Sensor Network) anomaly detection method based on MEA-BP neural network |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110162419A (en) * | 2019-05-31 | 2019-08-23 | 北京奇艺世纪科技有限公司 | A kind of information consumption condition detection method and device |
CN110417744A (en) * | 2019-06-28 | 2019-11-05 | 平安科技(深圳)有限公司 | The safe determination method and device of network access |
CN110445753A (en) * | 2019-06-28 | 2019-11-12 | 平安科技(深圳)有限公司 | The partition method and device of terminal device abnormal access |
CN110417744B (en) * | 2019-06-28 | 2021-12-24 | 平安科技(深圳)有限公司 | Security determination method and device for network access |
CN113765914A (en) * | 2021-09-03 | 2021-12-07 | 杭州安恒信息技术股份有限公司 | CC attack protection method, system, computer equipment and readable storage medium |
CN113938410A (en) * | 2021-10-14 | 2022-01-14 | 广东电网有限责任公司 | Terminal protocol identification method and device |
CN113938410B (en) * | 2021-10-14 | 2023-05-23 | 广东电网有限责任公司 | Terminal protocol identification method and device |
Also Published As
Publication number | Publication date |
---|---|
CN109714311B (en) | 2021-12-31 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||