CN109450860A - A method for detecting advanced persistent threats based on entropy and support vector machines - Google Patents

Publication number
CN109450860A
Authority
CN
China
Prior art keywords
data
entropy
flow
support vector
vector machines
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811200227.5A
Other languages
Chinese (zh)
Inventor
谭佳雨
王箭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN201811200227.5A
Publication of CN109450860A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416: Event detection, e.g. attack signature detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Signal Processing (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for detecting advanced persistent threats (APTs) based on entropy and support vector machines, in the field of information security technology. After the raw traffic packets are organized, the method computes, through data statistics, the entropy of the traffic information in a local area network to extract new feature vectors, then trains a support vector machine to build a machine learning model that detects advanced persistent threats. The advantage of the invention is that segmenting the continuous traffic data divides massive data into traffic segments that are orders of magnitude smaller, so classification training and detection can be carried out quickly. In addition, the task of detecting specific malicious flows is converted into the task of detecting traffic segments that contain malicious traffic, which greatly simplifies data processing and reduces the false alarm rate.

Description

A method for detecting advanced persistent threats based on entropy and support vector machines
Technical field
The invention relates to the field of information security technology, and in particular to a method for detecting advanced persistent threats.
Background
Over the past decade or more, the global Internet has developed rapidly. At the same time, network attacks have emerged one after another, evolving in both number and technique. In 2010, the notorious Stuxnet attack was discovered by security researchers, who disclosed many details of the attack. Since then, advanced persistent threats (APTs) have repeatedly come to public attention. An APT is a long-term network attack that is highly customized for a specific target. Because its attack characteristics are diverse and its attack methods keep changing, traditional detection based on pattern matching is largely ineffective.
In recent years there has been much work on APT detection, and many security vendors and researchers have proposed solutions. Techniques such as traditional signature matching and blacklists/whitelists cannot effectively discover unknown APT attacks. After studying many APT cases, Xu Wang et al. therefore proposed detecting, in the communication stage of an APT attack, whether command-and-control behavior and the traffic it generates are present. Their scheme found that the correlation between the destination addresses accessed by command-and-control behavior differs markedly from that of the address information in normal traffic, and it finds suspicious traffic by computing the correlation of each accessed address. The weakness of this method is that detection takes too long, so APT attacks cannot be discovered in time to reduce the losses they cause. Sana Siddiqui proposed a machine learning classification method based on fractal dimension, which builds a detection model by learning the features of attack traffic generated by malware used in known APT cases, and which achieved better classification performance than the common kNN algorithm. Although this scheme can detect some malicious traffic, its false alarm rate is very high and processing massive traffic data requires considerable computing power, so its detection capability is severely limited. Fàtima Barceló-Rico et al. focused on detecting the behavioral features of HTTP requests, experimenting with the semi-supervised learning models Genetic Programming (GP), Decision Tree Classifier (DTC), and Support Vector Machine (SVM), and comparing the strengths and weaknesses of these machine learning algorithms. They found that SVM and DTC still maintain good performance when the massive data contains only a small number of known suspicious examples.
Summary of the invention
Purpose of the invention: in view of the shortcomings of the prior art, the object of the invention is to provide a method for detecting advanced persistent threats. The method can identify APT attacks quickly and accurately in massive data, and by extracting new features it greatly reduces the amount of data that must be processed.
Technical solution: to achieve the above object, the invention proposes a method for detecting advanced persistent threats based on entropy and support vector machines, comprising the following steps:
a) Record traffic data on the central switch of a local area network and collect traffic information, including but not limited to each packet's source and destination addresses, source and destination ports, byte count, and timestamp;
b) Restore the collected and saved data into data flows through network packet analysis, and save several items of characteristic information for each data flow;
c) Set a time period t, divide the data obtained in step b) into n samples each spanning a time interval t, and compute the entropy of each item of characteristic information in each sample to form a new data feature vector;
d) Using the feature vectors obtained in step c) as input, train a machine learning model that can identify traffic containing anomalies, until the model's evaluation metrics reach a specified threshold;
e) Use the model obtained in step d) to classify the traffic in any time interval t, and thereby judge whether that segment of traffic contains anomalous data flows.
Further, in step b), the characteristic information comprises the number of bytes transmitted, the number of packets transmitted, and the duration of the data flow.
Further, in step b), the collected and saved data comprise normal traffic samples and anomalous traffic samples; the normal traffic samples come from the traffic data collected in step a), and the anomalous traffic samples come from the public Contagio malware database contributed by Mila Parkour.
Further, in step c), the time period t is set to 10 seconds on a 100 Mbps network, and the data processing equipment is configured with a 4-core 2.5 GHz Intel CPU, 8 GB of RAM, and a 2.5 TB hard disk.
Further, in step c), when the entropy of the data flow characteristic information is computed to form the new feature vector, Shannon's discrete-source information entropy is used:
$$H(X) = -\sum_{i=1}^{m} P(x_i) \log_b P(x_i)$$
where H(X) denotes the discrete-source entropy of the data flow characteristic X, P(x_i) denotes the probability that X, taking the value x_i, falls within a given value interval in the sample, b is the base of the logarithm, and m is the number of sampled values of X in the sample.
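As a small worked illustration of the Shannon entropy $H(X) = -\sum_i P(x_i) \log_b P(x_i)$ with b = 2, suppose (purely as an assumed example, not data from the patent) that the flow durations in one sample fall into four value intervals with probabilities 1/2, 1/4, 1/8, and 1/8:

```latex
\begin{aligned}
H(X) &= -\left( \tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{4}\log_2\tfrac{1}{4}
        + \tfrac{1}{8}\log_2\tfrac{1}{8} + \tfrac{1}{8}\log_2\tfrac{1}{8} \right) \\
     &= \tfrac{1}{2}\cdot 1 + \tfrac{1}{4}\cdot 2 + \tfrac{1}{8}\cdot 3 + \tfrac{1}{8}\cdot 3 \\
     &= 1.75 \text{ bits}
\end{aligned}
```

If the four intervals were equally likely, the entropy would reach its maximum of $\log_2 4 = 2$ bits, so a sample whose values concentrate in a few intervals yields a lower entropy than one whose values are spread evenly.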
Further, in step d), the machine learning model is a support vector machine:
where f(t) ∈ {-1, +1}, ()' denotes matrix transposition, sign(ζ) denotes the sign of the number ζ, t is an element of the standardized data feature vectors, i.e. a test sample, α denotes the Lagrange multipliers, x_s (s = 1, ..., |S|) are the support vectors, S denotes the training sample set, p is the degree of the polynomial, and k is the classification parameter obtained during training;
The kernel function of the support vector machine is the Gaussian kernel. For a sample data point x_i and an observed data point x_j, with target value y, the Gaussian kernel is:
$$k(x_i, x_j) = e^{-y \lVert x_i - x_j \rVert^2}$$
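The following is a minimal sketch of how a Gaussian kernel can be used in a kernel SVM decision rule. It assumes the standard textbook form sign(Σ α_s y_s k(x_s, t) + bias), which may differ from the exact expression in the patent; the kernel parameter written y in the text is named `gamma` here, as is common. The function names, the toy support vectors, and the multiplier values are hypothetical and hand-picked, not trained values.

```python
import math


def gaussian_kernel(x_i, x_j, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * ||x_i - x_j||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * squared_distance)


def svm_decide(t, support_vectors, labels, alphas, bias, gamma):
    """Return +1 or -1 for test sample t using a kernel SVM decision rule."""
    score = bias
    for x_s, y_s, alpha_s in zip(support_vectors, labels, alphas):
        score += alpha_s * y_s * gaussian_kernel(x_s, t, gamma)
    return 1 if score >= 0 else -1


# Toy parameters (assumed for illustration): one support vector per class,
# with -1 standing for normal traffic and +1 for anomalous traffic.
support_vectors = [(1.6, 1.1, 7.7), (1.1, 0.9, 1.7)]
labels = [-1, +1]
alphas = [1.0, 1.0]

decision = svm_decide((1.2, 0.9, 2.0), support_vectors, labels, alphas,
                      bias=0.0, gamma=0.5)
print(decision)
```

In this sketch the test sample lies close to the second support vector, so its kernel value dominates the sum and the sample is assigned to the +1 class.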
Further, in step d), the evaluation metrics comprise accuracy, precision, recall, and F1 score.
Further, in step d), the specified threshold is set to 90% or more.
Beneficial effects:
By segmenting the continuous traffic data, the invention divides massive data into traffic segments that are orders of magnitude smaller, and the new feature vectors extracted through statistical computation further reduce the amount of data ultimately used to train the machine learning model, so classification training and detection can be carried out quickly. In addition, detecting specific malicious flows is converted into detecting traffic segments that contain malicious traffic, which greatly simplifies data processing and reduces the false alarm rate.
Brief description of the drawings
Fig. 1 is the overall flow chart of the method of the invention;
Fig. 2 is an example of the probability distribution of data flow duration.
Detailed description of the embodiments:
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the invention, its application, or its uses. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the invention, without creative effort, fall within the scope of protection of the invention.
The invention is a method for detecting advanced persistent threats based on entropy and support vector machines. After the traffic packet data are organized, the entropy of the traffic information in the local area network is computed through data statistics to extract new feature vectors, and a support vector machine is then trained to build a machine learning model that detects advanced persistent threats. The specific implementation steps are as follows, and the detailed flow is shown in Fig. 1.
Step S1: Collect traffic data from the central switch, extract the key traffic information, restore the binary data stream, and save the TCP and UDP packets as PCAP (packet capture) files. The traffic information includes but is not limited to each packet's source and destination addresses, source and destination ports, byte count, and timestamp. The raw, unprocessed traffic packet data collected are shown in Table 1.
Table 1: Raw traffic packet data
Step S2: Organize the traffic packets with the network packet analysis tool Wireshark. First use a filter to remove zero-length packets and retransmitted packets; the filter condition is "!tcp.analysis.retransmission and tcp.len>0". After the preprocessed data are obtained, use the command-line tool tshark to reorder the packets by sequence number and timestamp, assemble them into data flows, and export the result to a txt file. Examples of the exported data are shown in Table 2.
Table 2: Examples of exported data
Address:port A          Address:port B          Packets  Bytes     Duration
172.18.125.127:50504    42.202.151.230:80       14989    11543526  28.7467
172.18.19.8:7037        58.221.74.144:80        14821    16151885  21.8665
172.26.28.188:26528     182.247.250.19:80       12694    9610185   20.8633
172.21.61.237:57685     172.106.33.219:10019    6030     4645450   23.8794
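Rows in the layout of Table 2 could be loaded into typed records roughly as follows. This is only a sketch: it assumes the exported txt file contains whitespace-separated rows with exactly the five columns shown, and the names `Flow` and `parse_flow_line` are hypothetical.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    endpoint_a: str    # "address:port" of side A
    endpoint_b: str    # "address:port" of side B
    packets: int       # number of packets in the flow
    byte_count: int    # number of bytes in the flow
    duration: float    # flow duration as exported


def parse_flow_line(line):
    """Parse one whitespace-separated row with the five columns of Table 2."""
    a, b, packets, byte_count, duration = line.split()
    return Flow(a, b, int(packets), int(byte_count), float(duration))


rows = """\
172.18.125.127:50504 42.202.151.230:80 14989 11543526 28.7467
172.18.19.8:7037 58.221.74.144:80 14821 16151885 21.8665
"""

# Skip blank lines so a trailing newline in the export does not raise an error.
flows = [parse_flow_line(line) for line in rows.splitlines() if line.strip()]
print(flows[0].packets, flows[0].duration)
```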
Step S3: Set a time period t and treat the data flows within each period t as one sample, so that the existing traffic data are divided into n segments, giving n samples. Fig. 2 shows the probability distribution of duration in one of the samples: the horizontal axis is the data flow duration, and the vertical axis is the probability that a data flow's duration falls within a given interval. The entropy of each sample is then computed with the Shannon entropy formula
$$H(X) = -\sum_{i=1}^{m} P(x_i) \log_b P(x_i)$$
where H(X) denotes the discrete-source entropy of the data flow characteristic X, P(x_i) denotes the probability that X, taking the value x_i, falls within a given value interval in the sample, b is the base of the logarithm (usually b = 2), and m is the number of sampled values of X in the sample. The result is saved as a record. The features to be computed are the entropies of the sample's byte count, packet count, and duration, and these three entropies form the feature vector that is fed to the learner. Table 3 shows, for 4 of the samples, the computed entropy of each feature and the label used for classification.
Table 3: Entropy of each feature and the classification label
Ep_Packets       Ep_Bytes          Ep_Duration      flag
1.62997396765    1.13345078829     7.70188552486    0
1.57602566801    1.26045873852     7.59871075104    0
1.14233680811    0.910838676236    4.64659436002    1
1.13322375711    0.859611891072    1.74121380584    1
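The windowing and entropy computation of step S3 can be sketched as below. The sketch assumes that each flow record carries a start time in seconds and that feature values are grouped into fixed-width bins to form the value intervals; the text does not state how the intervals are chosen, so the tuple layout, the function names, and the bin widths are all illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(values, bin_width, base=2):
    """Entropy of the values after grouping them into bins of width bin_width."""
    if not values:
        return 0.0
    counts = Counter(int(v // bin_width) for v in values)
    total = len(values)
    return -sum((c / total) * math.log(c / total, base) for c in counts.values())


def window_features(flows, t, bin_widths):
    """Group flows into windows of length t and compute one entropy per feature.

    flows: iterable of (start_time, packets, byte_count, duration) tuples.
    Returns {window_index: (Ep_Packets, Ep_Bytes, Ep_Duration)}.
    """
    windows = {}
    for start, packets, byte_count, duration in flows:
        windows.setdefault(int(start // t), []).append((packets, byte_count, duration))
    features = {}
    for index, rows in sorted(windows.items()):
        columns = list(zip(*rows))  # one tuple of values per feature
        features[index] = tuple(
            shannon_entropy(column, width) for column, width in zip(columns, bin_widths)
        )
    return features


# Hypothetical flow records: (start_time_s, packets, bytes, duration_s).
example_flows = [
    (0.5, 10, 1000, 1.0),
    (3.0, 10, 5000, 2.5),
    (12.0, 20, 1000, 0.2),
    (15.0, 40, 9000, 0.3),
]
features = window_features(example_flows, t=10, bin_widths=(10, 1000, 1.0))
print(features)
```

Each window yields a three-element vector in the column order of Table 3; attaching the 0/1 label to each vector then gives the training set for the learner.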
Step S4: Using the results of step S3 as input, train a support vector machine as the learner to build the classification model, with the Gaussian kernel as the kernel function of the support vector machine. Train the learner iteratively until every metric exceeds 90%; the metrics are accuracy, precision, recall, and F1 score. The support vector machine is:
Accuracy:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
Recall:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
F1 score:
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively. Accuracy measures the proportion of all predictions that are correct; precision measures the proportion of predicted positives that are truly positive; recall measures the proportion of actual positives that are found. The F1 score combines precision and recall, and a higher F1 indicates a more robust model.
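The four metrics and the stopping criterion can be sketched as follows, using the standard definitions in terms of TP, TN, FP, and FN. The function names and the confusion-matrix counts are hypothetical, and the 90% threshold is applied here as "at least 0.90", which is one reading of the text.

```python
def evaluate(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def meets_threshold(metrics, threshold=0.90):
    """True only if every metric reaches the threshold."""
    return all(value >= threshold for value in metrics.values())


# Hypothetical counts from classifying 100 traffic windows.
metrics = evaluate(tp=46, tn=47, fp=3, fn=4)
print(metrics, meets_threshold(metrics))
```

Requiring all four metrics to pass guards against a model that scores well on accuracy alone, for example when anomalous windows are rare and most of them are missed.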
Step S5: Once model training is complete, subsequent traffic data that have undergone the same processing can be classified.
The embodiments of the invention have been described above by way of example, but the scope of the invention is not limited to these examples and may be changed or modified according to purpose within the scope recited in the claims.

Claims (8)

1. A method for detecting advanced persistent threats based on entropy and support vector machines, characterized by comprising the following steps:
a) Record traffic data on the central switch of a local area network and collect traffic information, including but not limited to each packet's source and destination addresses, source and destination ports, byte count, and timestamp;
b) Restore the collected and saved data into data flows through network packet analysis, and save several items of characteristic information for each data flow;
c) Set a time period t, divide the data obtained in step b) into n samples each spanning a time interval t, and compute the entropy of each item of characteristic information in each sample to form a new data feature vector;
d) Using the feature vectors obtained in step c) as input, train a machine learning model that can identify traffic containing anomalies, until the model's evaluation metrics reach a specified threshold;
e) Use the model obtained in step d) to classify the traffic in any time interval t, and thereby judge whether that segment of traffic contains anomalous data flows.
2. The method for detecting advanced persistent threats based on entropy and support vector machines according to claim 1, characterized in that, in step b), the characteristic information comprises the number of bytes transmitted, the number of packets transmitted, and the duration of the data flow.
3. The method for detecting advanced persistent threats based on entropy and support vector machines according to claim 1, characterized in that, in step b), the collected and saved data comprise normal traffic samples and anomalous traffic samples; the normal traffic samples come from the traffic data collected in step a), and the anomalous traffic samples come from the public Contagio malware database contributed by Mila Parkour.
4. The method for detecting advanced persistent threats based on entropy and support vector machines according to claim 1, characterized in that, in step c), the time period t is set to 10 seconds on a 100 Mbps network, and the data processing equipment is configured with a 4-core 2.5 GHz Intel CPU, 8 GB of RAM, and a 2.5 TB hard disk.
5. The method for detecting advanced persistent threats based on entropy and support vector machines according to claim 1, characterized in that, in step c), when the entropy of the data flow characteristic information is computed to form the new feature vector, Shannon's discrete-source information entropy is used:
$$H(X) = -\sum_{i=1}^{m} P(x_i) \log_b P(x_i)$$
where H(X) denotes the discrete-source entropy of the data flow characteristic X, P(x_i) denotes the probability that X, taking the value x_i, falls within a given value interval in the sample, b is the base of the logarithm, and m is the number of sampled values of X in the sample.
6. The method for detecting advanced persistent threats based on entropy and support vector machines according to claim 1, characterized in that, in step d), the machine learning model is a support vector machine:
where f(t) ∈ {-1, +1}, ()' denotes matrix transposition, sign(ζ) denotes the sign of the number ζ, t is an element of the standardized data feature vectors, i.e. a test sample, α denotes the Lagrange multipliers, x_s (s = 1, ..., |S|) are the support vectors, S denotes the training sample set, p is the degree of the polynomial, and k is the classification parameter obtained during training;
The kernel function of the support vector machine is the Gaussian kernel. For a sample data point x_i and an observed data point x_j, with target value y, the Gaussian kernel is:
$$k(x_i, x_j) = e^{-y \lVert x_i - x_j \rVert^2}$$
7. The method for detecting advanced persistent threats based on entropy and support vector machines according to claim 1, characterized in that, in step d), the evaluation metrics comprise accuracy, precision, recall, and F1 score.
8. The method for detecting advanced persistent threats based on entropy and support vector machines according to claim 7, characterized in that, in step d), the specified threshold is set to 90% or more.
CN201811200227.5A 2018-10-16 2018-10-16 A method for detecting advanced persistent threats based on entropy and support vector machines Pending CN109450860A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811200227.5A CN109450860A (en) 2018-10-16 2018-10-16 A method for detecting advanced persistent threats based on entropy and support vector machines

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811200227.5A CN109450860A (en) 2018-10-16 2018-10-16 A method for detecting advanced persistent threats based on entropy and support vector machines

Publications (1)

Publication Number Publication Date
CN109450860A (en) 2019-03-08

Family

ID=65545105

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811200227.5A Pending CN109450860A (en) 2018-10-16 2018-10-16 A method for detecting advanced persistent threats based on entropy and support vector machines

Country Status (1)

Country Link
CN (1) CN109450860A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992969A (en) * 2019-03-25 2019-07-09 腾讯科技(深圳)有限公司 A kind of malicious file detection method, device and detection platform
CN111224946A (en) * 2019-11-26 2020-06-02 杭州安恒信息技术股份有限公司 TLS encrypted malicious traffic detection method and device based on supervised learning
CN111404941A (en) * 2020-03-17 2020-07-10 广东九联科技股份有限公司 Network security protection method and network security protection device
CN112055007A (en) * 2020-08-28 2020-12-08 东南大学 Software and hardware combined threat situation perception method based on programmable nodes
CN112968872A (en) * 2021-01-29 2021-06-15 成都信息工程大学 Malicious flow detection method, system and terminal based on natural language processing
CN114090967A (en) * 2021-10-25 2022-02-25 广州大学 APT (android package) organization tracing and tracing method and system based on PSO-MSVM (Power System-Mobile virtual machine)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070150954A1 (en) * 2005-12-27 2007-06-28 Tae-Shik Shon System and method for detecting network intrusion
CN101610516A (en) * 2009-08-04 2009-12-23 华为技术有限公司 Intrusion detection method in the self-organizing network and equipment
CN105930723A (en) * 2016-04-20 2016-09-07 福州大学 Intrusion detection method based on feature selection
CN107392015A (en) * 2017-07-06 2017-11-24 长沙学院 A kind of intrusion detection method based on semi-supervised learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
康晓丽: ""基于信息熵与改进SVM的异常流量检测研究"", 《中国优秀硕士学位论文全文数据库信息科技辑》 *
朱文杰等: ""基于信息熵的SVM入侵检测技术"", 《计算机工程与科学》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992969A (en) * 2019-03-25 2019-07-09 腾讯科技(深圳)有限公司 A kind of malicious file detection method, device and detection platform
CN109992969B (en) * 2019-03-25 2023-03-21 腾讯科技(深圳)有限公司 Malicious file detection method and device and detection platform
CN111224946A (en) * 2019-11-26 2020-06-02 杭州安恒信息技术股份有限公司 TLS encrypted malicious traffic detection method and device based on supervised learning
CN111404941A (en) * 2020-03-17 2020-07-10 广东九联科技股份有限公司 Network security protection method and network security protection device
CN112055007A (en) * 2020-08-28 2020-12-08 东南大学 Software and hardware combined threat situation perception method based on programmable nodes
CN112968872A (en) * 2021-01-29 2021-06-15 成都信息工程大学 Malicious flow detection method, system and terminal based on natural language processing
CN114090967A (en) * 2021-10-25 2022-02-25 广州大学 APT (android package) organization tracing and tracing method and system based on PSO-MSVM (Power System-Mobile virtual machine)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190308