CN109450860A - Method for detecting advanced persistent threats based on entropy and support vector machines
- Publication number
- CN109450860A (application CN201811200227.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- entropy
- flow
- support vector
- vector machines
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computer Security & Cryptography (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computer Networks & Wireless Communication (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Signal Processing (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for detecting advanced persistent threats (APTs) based on entropy and support vector machines, and relates to the field of information security technology. After organizing the raw traffic packets, the method uses data statistics to calculate the entropy of traffic information in a local area network and extract new feature vectors, then trains a support vector machine to build a machine learning model that detects advanced persistent threats. The advantage of the invention is that segmenting the continuous traffic data divides the mass of data into traffic segments of a smaller order of magnitude, so that classification training and detection can be carried out quickly. In addition, the task of detecting specific malicious flows is converted into detecting traffic segments that contain malicious flows, which greatly simplifies data processing and reduces the false alarm rate.
Description
Technical field
The present invention relates to the field of information security technology, and in particular to a method for detecting advanced persistent threats.
Background
Over the past decade or more, the global Internet has developed rapidly. Meanwhile, network attacks have emerged one after another, evolving in both number and technique. In 2010, the notorious Stuxnet attack was discovered by security researchers, who disclosed many details of the attack. Since then, advanced persistent threats (APTs) have continually come to public attention. An APT is a long-term network attack that is highly customized for a specific target. Because its attack characteristics are diverse and new attack methods keep appearing, traditional detection based on pattern matching is largely ineffective.
In recent years there has been much work on APT detection, and security vendors and researchers have proposed a variety of solutions. Techniques such as traditional signature matching and blacklists/whitelists cannot effectively discover unknown APT attacks. Xu Wang et al., after studying many APT attack cases, therefore proposed detecting, during the communication stage of an APT attack, whether command-and-control behavior and the traffic it generates are present. Their scheme found that the correlation among the destination addresses accessed by command-and-control behavior differs markedly from that of the address information in normal traffic, and it finds suspicious traffic by calculating the correlation of each accessed address. The drawback of this method is that detection takes too long, so the losses caused by an APT attack cannot be discovered and reduced in time. Sana Siddiqui proposed a machine learning classification method based on fractal dimension, which builds a detection model by learning the features of attack traffic generated by malware used in known APT attack cases, and which achieved better classification performance than the common kNN algorithm. Although this scheme can detect some malicious traffic, its false alarm rate is very high, and processing massive traffic data requires very high computing power, so its detection capability is severely limited. Fàtima Barceló-Rico et al. focused on detecting the behavioral features of HTTP requests. They experimented with a semi-supervised learning model, Genetic Programming (GP), two Decision Tree Classifiers (DTC), and a Support Vector Machine (SVM), and compared the strengths and weaknesses of these machine learning algorithms. They found that SVM and DTC still maintained good performance when the mass of data contained only a small number of known suspicious examples.
Summary of the invention
Object of the invention: in view of the deficiencies of the prior art, the object of the present invention is to provide a method for detecting advanced persistent threats. The method can identify APT attacks quickly and accurately in massive data, and by extracting new features it greatly reduces the amount of data that has to be processed.
Technical solution: to achieve the above object, the present invention proposes a method for detecting advanced persistent threats based on entropy and support vector machines, comprising the following steps:
a) Record traffic data on the central switch of the local area network and collect traffic information, which includes but is not limited to the source and destination addresses, source and destination ports, byte count, and timestamp of each packet;
b) Restore the collected and saved data into data flows by network packet analysis, and save several items of feature information for each data flow;
c) Set a time period t, divide the data obtained in step b) into n samples each covering a time interval of t, and calculate the entropy of each item of feature information for each sample to form a new data feature vector;
d) Using the feature vectors obtained in step c) as input, train a machine learning model that can identify traffic containing anomalies, until the evaluation metrics of the trained model reach a specified threshold;
e) Use the model obtained in step d) to classify the traffic in any time interval t, to determine whether that segment of traffic contains abnormal data flows.
Further, in step b), the feature information includes the number of bytes transmitted, the number of packets transmitted, and the duration of the data flow.
Further, in step b), the collected and saved data include normal traffic samples and abnormal traffic samples, where the normal traffic samples are the traffic data collected in step a) and the abnormal traffic samples come from the public Contagio malware database contributed by Mila Parkour.
Further, in step c), the time period t is 10 seconds on a 100 Mbps network, and the device that processes the data is configured with a 4-core 2.5 GHz Intel CPU, 8 GB of RAM, and a 2.5 TB hard disk.
Further, in step c), the entropy of the data flow feature information used to form the new feature vector is calculated with Shannon's discrete source entropy:
$$H(X) = -\sum_{i=1}^{m} P(x_i) \log_b P(x_i)$$
where H(X) is the discrete source entropy of the data flow feature information X, P(x_i) is the probability that X, taking the value x_i, falls within a given value interval in the sample, b is the base of the logarithm, and m is the number of sampled values of the feature information X in the sample.
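As a small worked illustration of the entropy formula, assume (purely for the example) that b = 2 and that the flow durations in one sample fall into three value intervals with probabilities 1/2, 1/4, and 1/4. These numbers are hypothetical and do not come from the patent's data.

```latex
\begin{aligned}
H(X) &= -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{4}\log_2\tfrac{1}{4} + \tfrac{1}{4}\log_2\tfrac{1}{4}\right) \\
     &= \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{1}{2} = 1.5 \text{ bits}
\end{aligned}
```

If every duration in the sample fell into the same interval, the single probability would be 1 and the entropy would be 0, so a lower entropy corresponds to more uniform flow behavior within the time window.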
Further, in step d), the machine learning model is a support vector machine with decision function f(t) (the equation is not reproduced in this text), where f(t) ∈ {−1, +1}, (·)′ denotes matrix transposition, sign(ζ) denotes the sign of the number ζ, t is an element of the standardized data feature vectors, i.e. a test sample, α is the Lagrange variable, x_s (s = 1, …, |S|) are the support vectors, S is the training sample data set, p is the degree of the polynomial, and k is the classification parameter obtained during training.
The kernel function of the support vector machine is the Gaussian kernel. For a sample data point x_i and an observed data point x_j, with target value y, the Gaussian kernel is:
$$k(x_i, x_j) = e^{-y \lVert x_i - x_j \rVert^2}$$
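The Gaussian kernel can be sketched in a few lines of Python. The `decide` function below uses the textbook kernel SVM decision rule (the sign of a weighted kernel sum plus a bias), which is an assumption and not necessarily the exact formula of the patent. The support vectors, labels, multipliers, and bias are hypothetical values chosen only for illustration.

```python
import math

def gaussian_kernel(x_i, x_j, y):
    """k(x_i, x_j) = exp(-y * ||x_i - x_j||^2), with y as the kernel parameter."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-y * squared_distance)

def decide(t, support_vectors, labels, alphas, bias, y):
    """Textbook kernel SVM decision (assumed form): returns -1 or +1."""
    score = bias + sum(
        alpha * label * gaussian_kernel(x_s, t, y)
        for x_s, label, alpha in zip(support_vectors, labels, alphas)
    )
    return 1 if score >= 0 else -1

# Hypothetical model: one support vector per class in a 2-D feature space.
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
labels = [-1, 1]
alphas = [1.0, 1.0]

near_first = decide([0.1, 0.1], support_vectors, labels, alphas, bias=0.0, y=1.0)
near_second = decide([1.9, 1.9], support_vectors, labels, alphas, bias=0.0, y=1.0)
```

A test point is assigned the label of the support vector it lies closest to here, because the kernel value decays quickly with squared distance.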
Further, in step d), the evaluation metrics include accuracy, precision, recall, and F1 score.
Further, in step d), the specified threshold is set to 90% or higher.
Beneficial effects:
By segmenting the continuous traffic data, the present invention divides the mass of data into traffic segments of a smaller order of magnitude, and by statistically computing and extracting new feature vectors it further reduces the amount of data ultimately used to train the machine learning model, so that classification training and detection can be carried out quickly. In addition, the task of detecting specific malicious flows is converted into detecting traffic segments that contain malicious flows, which greatly simplifies data processing and reduces the false alarm rate.
Brief description of the drawings
Fig. 1 is the overall flow chart of the method of the present invention;
Fig. 2 is an example probability distribution of data flow duration.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the invention, its application, or its uses. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The present invention is a method for detecting advanced persistent threats based on entropy and support vector machines. After the traffic packet data are organized, data statistics are used to calculate the entropy of traffic information in the local area network and extract new feature vectors, and a support vector machine is then trained to build a machine learning model that detects advanced persistent threats. The implementation steps are as follows, and the detailed flow is shown in Fig. 1.
Step S1: Collect traffic data from the central switch, extract the key traffic information, restore the binary data stream, and save the TCP and UDP packets as PCAP (packet capture) files. The traffic information includes but is not limited to the source and destination addresses, source and destination ports, byte count, and timestamp of each packet. The collected, unprocessed raw traffic packet data are shown in Table 1.
Table 1: Raw traffic packet data
Step S2: Organize the traffic packets with the network packet analysis tool Wireshark. First use a filter to remove zero-length packets and retransmitted packets; the filter condition is "!tcp.analysis.retransmission and tcp.len>0". After the preprocessed data are obtained, use the command-line tool tshark to reorder the packets by sequence number and timestamp, assemble them into data flows, and export the result to a txt file. Table 2 shows an example of the exported data.
Table 2: Example of exported data
Address:port A | Address:port B | Packets | Bytes | Duration |
---|---|---|---|---|
172.18.125.127:50504 | 42.202.151.230:80 | 14989 | 11543526 | 28.7467 |
172.18.19.8:7037 | 58.221.74.144:80 | 14821 | 16151885 | 21.8665 |
172.26.28.188:26528 | 182.247.250.19:80 | 12694 | 9610185 | 20.8633 |
172.21.61.237:57685 | 172.106.33.219:10019 | 6030 | 4645450 | 23.8794 |
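The rows of Table 2 can be loaded into typed records before any statistics are computed. The following is a minimal sketch that assumes the exported txt file uses the pipe-separated layout shown in Table 2; the actual tshark export format may differ, and the names `Flow` and `parse_flow_row` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Flow:
    endpoint_a: str    # "address:port" of side A
    endpoint_b: str    # "address:port" of side B
    packets: int       # number of packets in the flow
    byte_count: int    # number of bytes in the flow
    duration: float    # flow duration, as exported

def parse_flow_row(line):
    """Parse one 'A | B | Packets | Bytes | Duration |' row into a Flow."""
    cells = [cell.strip() for cell in line.split("|")]
    cells = [cell for cell in cells if cell]  # drop the empty cell after the trailing '|'
    endpoint_a, endpoint_b, packets, byte_count, duration = cells
    return Flow(endpoint_a, endpoint_b, int(packets), int(byte_count), float(duration))

row = "172.18.125.127:50504 | 42.202.151.230:80 | 14989 | 11543526 | 28.7467 |"
flow = parse_flow_row(row)
```

Unpacking into exactly five names makes a malformed row fail loudly with a `ValueError` instead of producing a partially filled record.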
Step S3: Set a time period t and treat the data flows within each period t as one sample, so that the existing traffic data are divided into n segments, giving n samples. Fig. 2 shows the probability distribution of duration in one of the samples: the horizontal axis is the data flow duration, and the vertical axis is the probability that the duration of a data flow falls within a given interval. Then the Shannon entropy formula
$$H(X) = -\sum_{i=1}^{m} P(x_i) \log_b P(x_i)$$
is used to calculate the entropy of each sample, and the results are saved as records. Here H(X) is the discrete source entropy of the data flow feature information X, P(x_i) is the probability that X, taking the value x_i, falls within a given value interval in the sample, b is the base of the logarithm (usually b = 2), and m is the number of sampled values of the feature information X in the sample. The features to be calculated are the entropy of the byte count, the packet count, and the duration of each sample, and the entropies of these three features form the feature vector that is input to the learner. Table 3 shows, for four of the samples, the computed entropy of each feature and the label used for classification.
Table 3: Entropy of each feature and classification label
Ep_Packets | Ep_Bytes | Ep_Duration | flag |
---|---|---|---|
1.62997396765 | 1.13345078829 | 7.70188552486 | 0 |
1.57602566801 | 1.26045873852 | 7.59871075104 | 0 |
1.14233680811 | 0.910838676236 | 4.64659436002 | 1 |
1.13322375711 | 0.859611891072 | 1.74121380584 | 1 |
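Step S3 can be sketched as follows: flows are grouped into windows of t seconds, and each window yields one three-element entropy vector. This is a sketch under stated assumptions. The patent does not specify the width of the value intervals or how a flow is assigned to a window, so the `bin_widths` argument and the use of a per-flow start time are hypothetical choices, as are the sample flows.

```python
import math
from collections import Counter, defaultdict

def shannon_entropy(values, bin_width, base=2):
    """Shannon entropy of values grouped into intervals of width bin_width."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    total = len(values)
    return sum(-(c / total) * math.log(c / total, base) for c in counts.values())

def window_features(flows, t, bin_widths):
    """Return {window index: [H(packets), H(bytes), H(duration)]}.

    flows: iterable of (start_time, packets, byte_count, duration) tuples.
    bin_widths: interval widths for packets, bytes, and duration (assumed).
    """
    windows = defaultdict(list)
    for start_time, packets, byte_count, duration in flows:
        windows[int(start_time // t)].append((packets, byte_count, duration))
    features = {}
    for index, rows in sorted(windows.items()):
        columns = zip(*rows)  # packets column, bytes column, duration column
        features[index] = [
            shannon_entropy(list(column), width)
            for column, width in zip(columns, bin_widths)
        ]
    return features

# Hypothetical flows: two identical flows in window 0, two different ones in window 1.
flows = [
    (0.0, 10, 1000, 1.0),
    (3.0, 10, 1000, 1.0),
    (12.0, 10, 1000, 1.0),
    (15.0, 20, 5000, 8.0),
]
features = window_features(flows, t=10, bin_widths=(5, 1000, 1.0))
```

With these inputs, window 0 has zero entropy for every feature because its flows are identical, while window 1 has one bit of entropy per feature because its two flows fall into different intervals.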
Step S4: Using the results of step S3 as input, train a support vector machine as the learner to build a classification model, with the Gaussian kernel as the kernel function of the support vector machine. Train the learner iteratively until every metric exceeds 90%. The metrics are accuracy, precision, recall, and F1 score. The support vector machine is as defined above (its equation is not reproduced in this text), and the metrics are defined as follows:
Accuracy:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision:
$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall:
$$\text{Recall} = \frac{TP}{TP + FN}$$
F1 score:
$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively. Accuracy measures the proportion of predictions that are correct; precision measures the proportion of predicted positives that are truly positive; recall measures the proportion of actual positives that are found. The F1 score combines precision and recall, and a higher F1 indicates a more robust model.
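The four metrics and the 90% stopping condition can be computed directly from the confusion-matrix counts. The sketch below is illustrative only: the counts are made up, the function names are hypothetical, and returning 0.0 when a denominator is zero is an assumption that the patent does not address.

```python
def evaluate(tp, tn, fp, fn):
    """Return accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0   # assumed fallback for 0/0
    recall = tp / (tp + fn) if tp + fn else 0.0      # assumed fallback for 0/0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def meets_threshold(metrics, threshold=0.90):
    """True only when every metric exceeds the threshold."""
    return all(value > threshold for value in metrics.values())

good = evaluate(tp=95, tn=93, fp=7, fn=5)     # hypothetical counts
weak = evaluate(tp=80, tn=90, fp=10, fn=20)   # hypothetical counts
stop_training = meets_threshold(good)
keep_training = not meets_threshold(weak)
```

Checking all four metrics together matters for imbalanced traffic, where accuracy alone can be high even if few abnormal windows are found.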
Step S5: After model training is complete, subsequent traffic data that have undergone the same processing can be classified.
The embodiments of the present invention have been described above by way of example, but the scope of the present invention is not limited to these examples; changes and modifications may be made for a given purpose within the scope recited in the claims.
Claims (8)
1. A method for detecting advanced persistent threats based on entropy and support vector machines, characterized in that it comprises the following steps:
a) Record traffic data on the central switch of the local area network and collect traffic information, which includes but is not limited to the source and destination addresses, source and destination ports, byte count, and timestamp of each packet;
b) Restore the collected and saved data into data flows by network packet analysis, and save several items of feature information for each data flow;
c) Set a time period t, divide the data obtained in step b) into n samples each covering a time interval of t, and calculate the entropy of each item of feature information for each sample to form a new data feature vector;
d) Using the feature vectors obtained in step c) as input, train a machine learning model that can identify traffic containing anomalies, until the evaluation metrics of the trained model reach a specified threshold;
e) Use the model obtained in step d) to classify the traffic in any time interval t, to determine whether that segment of traffic contains abnormal data flows.
2. The method for detecting advanced persistent threats based on entropy and support vector machines according to claim 1, characterized in that, in step b), the feature information includes the number of bytes transmitted, the number of packets transmitted, and the duration of the data flow.
3. The method for detecting advanced persistent threats based on entropy and support vector machines according to claim 1, characterized in that, in step b), the collected and saved data include normal traffic samples and abnormal traffic samples, where the normal traffic samples are the traffic data collected in step a) and the abnormal traffic samples come from the public Contagio malware database contributed by Mila Parkour.
4. The method for detecting advanced persistent threats based on entropy and support vector machines according to claim 1, characterized in that, in step c), the time period t is 10 seconds on a 100 Mbps network, and the device that processes the data is configured with a 4-core 2.5 GHz Intel CPU, 8 GB of RAM, and a 2.5 TB hard disk.
5. The method for detecting advanced persistent threats based on entropy and support vector machines according to claim 1, characterized in that, in step c), the entropy of the data flow feature information used to form the new feature vector is calculated with Shannon's discrete source entropy:
$$H(X) = -\sum_{i=1}^{m} P(x_i) \log_b P(x_i)$$
where H(X) is the discrete source entropy of the data flow feature information X, P(x_i) is the probability that X, taking the value x_i, falls within a given value interval in the sample, b is the base of the logarithm, and m is the number of sampled values of the feature information X in the sample.
6. The method for detecting advanced persistent threats based on entropy and support vector machines according to claim 1, characterized in that, in step d), the machine learning model is a support vector machine with decision function f(t) (the equation is not reproduced in this text), where f(t) ∈ {−1, +1}, (·)′ denotes matrix transposition, sign(ζ) denotes the sign of the number ζ, t is an element of the standardized data feature vectors, i.e. a test sample, α is the Lagrange variable, x_s (s = 1, …, |S|) are the support vectors, S is the training sample data set, p is the degree of the polynomial, and k is the classification parameter obtained during training;
the kernel function of the support vector machine is the Gaussian kernel; for a sample data point x_i and an observed data point x_j, with target value y, the Gaussian kernel is:
$$k(x_i, x_j) = e^{-y \lVert x_i - x_j \rVert^2}$$
7. The method for detecting advanced persistent threats based on entropy and support vector machines according to claim 1, characterized in that, in step d), the evaluation metrics include accuracy, precision, recall, and F1 score.
8. The method for detecting advanced persistent threats based on entropy and support vector machines according to claim 7, characterized in that, in step d), the specified threshold is set to 90% or higher.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811200227.5A CN109450860A (en) | 2018-10-16 | 2018-10-16 | Method for detecting advanced persistent threats based on entropy and support vector machines |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811200227.5A CN109450860A (en) | 2018-10-16 | 2018-10-16 | Method for detecting advanced persistent threats based on entropy and support vector machines |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109450860A true CN109450860A (en) | 2019-03-08 |
Family
ID=65545105
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811200227.5A Pending CN109450860A (en) | 2018-10-16 | 2018-10-16 | Method for detecting advanced persistent threats based on entropy and support vector machines |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109450860A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109992969A (en) * | 2019-03-25 | 2019-07-09 | 腾讯科技(深圳)有限公司 | A kind of malicious file detection method, device and detection platform |
CN111224946A (en) * | 2019-11-26 | 2020-06-02 | 杭州安恒信息技术股份有限公司 | TLS encrypted malicious traffic detection method and device based on supervised learning |
CN111404941A (en) * | 2020-03-17 | 2020-07-10 | 广东九联科技股份有限公司 | Network security protection method and network security protection device |
CN112055007A (en) * | 2020-08-28 | 2020-12-08 | 东南大学 | Software and hardware combined threat situation perception method based on programmable nodes |
CN112968872A (en) * | 2021-01-29 | 2021-06-15 | 成都信息工程大学 | Malicious flow detection method, system and terminal based on natural language processing |
CN114090967A (en) * | 2021-10-25 | 2022-02-25 | 广州大学 | APT organization tracing method and system based on PSO-MSVM |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070150954A1 (en) * | 2005-12-27 | 2007-06-28 | Tae-Shik Shon | System and method for detecting network intrusion |
CN101610516A (en) * | 2009-08-04 | 2009-12-23 | 华为技术有限公司 | Intrusion detection method in the self-organizing network and equipment |
CN105930723A (en) * | 2016-04-20 | 2016-09-07 | 福州大学 | Intrusion detection method based on feature selection |
CN107392015A (en) * | 2017-07-06 | 2017-11-24 | 长沙学院 | A kind of intrusion detection method based on semi-supervised learning |
Non-Patent Citations (2)
Title |
---|
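Kang Xiaoli: "Research on Abnormal Traffic Detection Based on Information Entropy and Improved SVM", China Master's Theses Full-text Database, Information Science and Technology * |
Zhu Wenjie et al.: "SVM Intrusion Detection Technology Based on Information Entropy", Computer Engineering & Science * |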
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109992969A (en) * | 2019-03-25 | 2019-07-09 | 腾讯科技(深圳)有限公司 | A kind of malicious file detection method, device and detection platform |
CN109992969B (en) * | 2019-03-25 | 2023-03-21 | 腾讯科技(深圳)有限公司 | Malicious file detection method and device and detection platform |
CN111224946A (en) * | 2019-11-26 | 2020-06-02 | 杭州安恒信息技术股份有限公司 | TLS encrypted malicious traffic detection method and device based on supervised learning |
CN111404941A (en) * | 2020-03-17 | 2020-07-10 | 广东九联科技股份有限公司 | Network security protection method and network security protection device |
CN112055007A (en) * | 2020-08-28 | 2020-12-08 | 东南大学 | Software and hardware combined threat situation perception method based on programmable nodes |
CN112968872A (en) * | 2021-01-29 | 2021-06-15 | 成都信息工程大学 | Malicious flow detection method, system and terminal based on natural language processing |
CN114090967A (en) * | 2021-10-25 | 2022-02-25 | 广州大学 | APT organization tracing method and system based on PSO-MSVM |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109450860A (en) | Method for detecting advanced persistent threats based on entropy and support vector machines | |
Meidan et al. | ProfilIoT: A machine learning approach for IoT device identification based on network traffic analysis | |
CN103581186B (en) | A kind of network security situational awareness method and system | |
CN109657470A (en) | Malicious web pages detection model training method, malicious web pages detection method and system | |
CN111523588B (en) | Method for classifying APT attack malicious software traffic based on improved LSTM | |
CN114338195B (en) | Web flow anomaly detection method and device based on improved isolated forest algorithm | |
CN108629183A (en) | Multi-model malicious code detecting method based on Credibility probability section | |
CN108768883A (en) | A kind of network flow identification method and device | |
JP2008176753A (en) | Data similarity inspection method and device | |
CN106657160A (en) | Reliability-based network malicious behavior detection method for large flow | |
Silveira et al. | Smart detection-IoT: A DDoS sensor system for Internet of Things | |
CN109040113A (en) | Detecting method of distributed denial of service attacking and device based on Multiple Kernel Learning | |
CN107368592A (en) | A kind of text feature model modeling method and device for network security report | |
Muhati et al. | Hidden-Markov-model-enabled prediction and visualization of cyber agility in IoT era | |
CN109450876A (en) | A kind of DDos recognition methods and system based on various dimensions state-transition matrix feature | |
Zulhilmi et al. | A comparison of three machine learning algorithms in the classification of network intrusion | |
CN110472410B (en) | Method and device for identifying data and data processing method | |
CN107832611B (en) | Zombie program detection and classification method combining dynamic and static characteristics | |
CN112291506B (en) | Method and system for tracing security vulnerability of streaming data in video conference scene | |
CN112953948A (en) | Real-time network transverse worm attack flow detection method and device | |
CN111224919B (en) | DDOS (distributed denial of service) identification method and device, electronic equipment and medium | |
CN113542310B (en) | Network scanning detection method and device and computer storage medium | |
CN108768774A (en) | A kind of network safety evaluation method and assessment system of quantification | |
EP3964986B1 (en) | Extraction device, extraction method, and extraction program | |
CN110197066B (en) | Virtual machine monitoring method and system in cloud computing environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190308 |