CN105871619A - Method for n-gram-based multi-feature flow load type detection - Google Patents
Method for n-gram-based multi-feature flow load type detection
- Publication number
- CN105871619A
- Authority
- CN
- China
- Prior art keywords
- payload
- network
- substring
- len
- packet
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/16—Threshold monitoring
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
- H04L47/2441—Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Environmental & Geological Engineering (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses an n-gram-based multi-feature method for detecting the payload type of network flows. The method comprises: step 1) reading the packets of sample network flows and labeling each packet with the sample network flow it belongs to according to the packet's five-tuple; computing the hash value of the flow's five-tuple as a key; if the key does not exist in the hash table, using the hash value as a new key, allocating a structure as its value, and inserting it into the hash table; if the key exists, obtaining the corresponding structure from the hash table and storing the packet's payload data in it; step 2) splitting the payload data in each structure into n-gram substrings and generating a feature vector for the sample network flow; step 3) training a classification model on the feature vectors; step 4) for a network flow to be classified, generating its feature vector and using the classification model to determine the flow's type. The method greatly improves detection efficiency.
Description
Technical field
The invention belongs to the field of network traffic information security and relates to a method for detecting traffic by its payload type. It can be applied to improving network quality of service, optimizing network bandwidth allocation, strengthening network security management, and similar tasks.
Background technology
With the spread of the Internet and the rapid development of network technology, network traffic is growing explosively. Efficient network bandwidth planning, network intrusion detection and defense, and traffic billing are challenges that network service providers and network administrators currently face. Network traffic classification can sort traffic by network application type or protocol type and therefore provides important technical support for these pressing problems.
Existing network traffic classification techniques mainly assign traffic to specific network applications and network protocols. There are four kinds of methods: port-based traffic classification, payload-based traffic classification, traffic classification based on host behavior patterns, and traffic classification based on machine learning.
(1) Port-based traffic classification
This method can be used for high-speed, real-time traffic classification, but because of random ports and the abuse of port masquerading it is gradually becoming ineffective.
(2) Payload-based traffic classification
This method is the most widely used in engineering practice, mainly thanks to fast and accurate string fingerprint matching algorithms. However, application protocol fingerprints can only be extracted for known, unencrypted network applications, so the method cannot handle encrypted traffic or the traffic of other unknown network applications.
(3) Traffic classification based on host behavior patterns
This method has strong theoretical significance. It does not depend on protocol features and does not need to parse packets; it identifies network traffic from the interaction behavior of hosts. However, it can hardly guarantee real-time operation, and because the model is complex it cannot identify the application type at a fine granularity.
(4) Traffic classification based on machine learning
This method assumes that, for a given network application, behavioral statistical features (flow interval, byte length of a single packet, time interval between adjacent packets of a flow, and so on) are unique, so that the traffic of different network applications can be classified on these features. Its drawbacks are that it is hard to find features that classify the traffic of a network application or protocol accurately, and that classification consumes considerable resources, which makes it difficult to deploy in an online environment.
To address the analysis and identification of unknown application traffic and encrypted traffic, identifying and classifying traffic by payload type has been proposed. There are three main methods:
(1) Classification based on hypothesis testing
This method mainly targets the identification of encrypted traffic. It exploits the randomness of encrypted data, applies a cumulative sum test to network messages one by one, and combines the results weighted by message length. It requires no decryption and no matching of specific content, and thus achieves general identification of encrypted traffic. The number of messages examined can be adjusted dynamically to balance delay against accuracy. The drawbacks are that it cannot be applied to other payload types and that it easily misidentifies compressed network traffic.
(2) Classification based on flow behavior features
Because the messages that a specific encryption protocol exchanges during connection establishment have similar content and a fixed format, they often show specific traffic features, such as message length and message arrival time. Using these features together with machine learning, a specific encryption protocol can be identified. However, encryption usually does not significantly change message length or arrival time, so most service data has the same traffic features whether it is sent in plaintext or encrypted. Some algorithms claim to identify encrypted traffic but in fact identify the service carried over the encrypted transmission. For example, P2P software has the same traffic features whether it transmits data in plaintext or encrypted. Such algorithms can recognize the traffic as P2P either way, but they cannot tell whether that traffic is encrypted.
(3) Classification based on statistical features such as payload entropy
Many researchers have applied entropy features to payload type classification, combined with machine learning methods such as SVM, to classify payloads into types such as text, encrypted, and binary. However, these methods describe the different payload types with a fairly limited set of statistical features and ignore the "long tail effect" of the substring frequency distribution. As a result, the overall average classification accuracy is only about 86%, and the accuracy for particular classes is even below 80%, which hardly meets practical needs.
Summary of the invention
To address the problems of the existing methods above, the invention discloses a method, based on multiple n-gram features, for detecting the payload type of traffic (text, audio, video, picture, executable file, compressed, encrypted, and so on).
First, some definitions used by the invention are given:
(1) Definition 1: the n-gram continuous substring set is the set of substrings obtained by splitting the original string with a sliding window of length n; the original string here is the payload content.
For example, for the original string "abbcccdefg" and n=2, with the sliding window shown in Fig. 1, the 2-gram substring set is:
S_2 = {ab, bb, bc, cc, cc, cd, de, ef, fg}
(2) Definition 2: the high-frequency continuous substring set is obtained by deduplicating the n-gram continuous substring set, counting the frequency of each substring, and keeping the substrings whose frequency is not less than the threshold k.
For example, for n=2 and k=1, the high-frequency continuous substring set, i.e. the 2-gram continuous substrings whose frequency is not less than 1, is:
S'_{2,1} = {ab, bb, bc, cc, cd, de, ef, fg}
(3) Definition 3: the consecutive identical character substring set is the set of continuous substrings, such as "bb" and "ccc", that contain only one kind of character.
The specific steps of the invention are:
(1) Initialize parameters: zero the Payload structure, which holds, for the corresponding network flow, the cached payload data received so far (payload_buff), the length of the payload data received in payload_buff (payload_len), and the number of packets processed so far (pkt_num); zero the payload_ft structure, which stores the payload features extracted for each network flow; set the method's global parameters: max_payload_len, the maximum length of payload data accepted per flow; min_payload_len, the minimum data length required for extracting payload features; head_len, the estimated length of the packet's protocol header; max_packet_num, the maximum number of packets used for collecting payload features; the maximum high-frequency substring frequency threshold K; and the maximum n-gram window length N. train_flag is initially set to true, meaning that the classification model training phase is entered first; once model training is complete it is set to false and the online classification phase begins;
Payload structure is:
Payload_ft structure:
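The structure definitions themselves are not reproduced in this text. As a rough sketch only, the fields named in step (1) could be modeled in Python as below. The class name `PayloadFt` and the layout of `ngram_stats` are assumptions made for illustration; only `payload_buff`, `payload_len`, `pkt_num` and the `sc_*` feature names come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class Payload:
    """Per-flow cache described in step (1)."""
    payload_buff: bytearray = field(default_factory=bytearray)  # cached payload bytes
    payload_len: int = 0                                        # bytes cached so far
    pkt_num: int = 0                                            # packets processed so far


@dataclass
class PayloadFt:
    """Per-flow feature record (hypothetical layout)."""
    sc_num: float = 0.0
    sc_diff_num: float = 0.0
    sc_max_len: float = 0.0
    sc_mean_len: float = 0.0
    # Assumed layout: (n, k) -> {"m": ..., "mf": ..., "mean": ..., "d": ..., "h": ...}
    ngram_stats: dict = field(default_factory=dict)
```

Using `default_factory` gives every flow its own buffer and dictionary, so "setting to zero" simply means constructing a fresh instance.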
(2) Set train_flag to true to enter the model training phase, and input sample network traffic of known payload type;
(3) Read packets and reassemble flows: read a packet from the network traffic, label the network flow it belongs to by its five-tuple (source IP, destination IP, source port, destination port, TCP/UDP), and compute the hash value of the flow's five-tuple as the key Key to look up in the hash table. If the entry does not exist, use the hash value as a new key Key, allocate a new Payload structure for this network flow as the value Value, and insert it into the hash table; if the entry exists in the hash table, go to step (4);
(4) Using the hash value computed from the packet's five-tuple as the key Key, obtain the corresponding Payload structure from the hash table, skip head_len bytes from the start of the packet, save the payload data that follows into Payload, and add 1 to the flow's processed packet count pkt_num. When max_payload_len is reached, go to step (5). If the number of packets processed exceeds max_packet_num and payload_len is not less than min_payload_len, go to step (5). If the current packet is the last packet of the flow and payload_len is not less than min_payload_len, go to step (5); if payload_len is still less than min_payload_len after the network flow has been fully processed, perform no further feature extraction on the flow and remove it from the hash table. Continue with step (3);
(5) If train_flag is true, perform step (6); otherwise perform step (9);
(6) Split the flow's payload data into n-gram substrings: take each value n in [1, N] as the sliding window size and split the original payload data to obtain the n-gram continuous substring set, together with the consecutive identical character substring set, i.e. substrings such as "bb" and "ccc" that contain only one kind of character;
(7) Count the frequency of each item in the n-gram continuous substring set, take each value k in [1, K] as the frequency threshold, and filter the n-gram continuous substring set to obtain the high-frequency continuous substring set;
(8) Extract the following statistical features from the consecutive identical character substring set obtained in step (6) and the high-frequency continuous substring set obtained in step (7), then go to step (10):
(8.1) Statistical features of the high-frequency continuous substring set: the number of distinct elements whose frequency is not less than the threshold k, m_{n,k}; the maximum element frequency mf_{n,k}; the mean mean_{n,k}; the variance d_{n,k}; and the information entropy h_{n,k};
(8.2) Statistical features of the consecutive identical character substring set: the number of consecutive identical character substrings sc_num, the number of kinds of consecutive identical character substrings sc_diff_num, the length of the longest consecutive identical character substring sc_max_len, and the average length of consecutive identical character substrings sc_mean_len;
(9) According to the classification feature set from step (11), split the flow's payload data into n-gram substrings, extract the corresponding features, and construct the feature vector of each flow; go to step (10);
(10) Take the logarithm of every feature except the information entropy, e.g. the variance feature d_{1,1} becomes log(d_{1,1}) after the transform, and express the features from step (8) as the feature vector of the flow;
(11) If train_flag is true and the packets have not all been read, go to step (3). If all packets have been read, label each flow's feature vector with its payload type, then score and rank the features of all sample network flows' feature vectors with both the chi-square test and information gain; working down the rankings in order, successively select the features that rank in the top 10 under the two methods as the classification feature set (10 features in total) for the corresponding network flows, and go to step (12). If train_flag is false, go to step (13);
(12) Use a C4.5 decision tree as the classification model, construct training samples from the classification feature set of step (11), and obtain the C4.5 classification model; convert the classification rules of the C4.5 model into IF-ELSE rules; go to step (14);
(13) Using the IF-ELSE rules from step (12), determine the payload type corresponding to the feature vector, output the payload type of the network flow, and go to step (3);
(14) Set train_flag to false, input the network traffic to be classified, and go to step (3).
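The flow reassembly of steps (3) and (4) can be sketched as follows. This is a simplified illustration, not the patent's code: the parameter values are arbitrary, the function name `add_packet` and its `last` flag are hypothetical, a Python `dict` keyed by the five-tuple stands in for the hash table (Python hashes the tuple internally), and removing a flow from the table once it is handed to feature extraction is an assumption.

```python
from dataclasses import dataclass, field

# Illustrative parameter values only; the description leaves them configurable.
MAX_PAYLOAD_LEN = 16
MIN_PAYLOAD_LEN = 4
MAX_PACKET_NUM = 3
HEAD_LEN = 2


@dataclass
class Payload:
    payload_buff: bytearray = field(default_factory=bytearray)
    payload_len: int = 0
    pkt_num: int = 0


flows = {}  # five-tuple -> Payload


def add_packet(five_tuple, data, last=False):
    """Cache one packet; return the Payload once the flow is ready for
    feature extraction (step 5), otherwise None."""
    flow = flows.setdefault(five_tuple, Payload())      # step (3): create on first packet
    room = MAX_PAYLOAD_LEN - flow.payload_len
    chunk = data[HEAD_LEN:][:room]                      # step (4): skip the estimated header
    flow.payload_buff += chunk
    flow.payload_len += len(chunk)
    flow.pkt_num += 1

    enough = flow.payload_len >= MIN_PAYLOAD_LEN
    if flow.payload_len >= MAX_PAYLOAD_LEN:
        return flows.pop(five_tuple)
    if flow.pkt_num > MAX_PACKET_NUM and enough:
        return flows.pop(five_tuple)
    if last:
        done = flows.pop(five_tuple)                    # short flows are simply dropped
        return done if enough else None
    return None
```

A caller would invoke `add_packet` once per captured packet and pass any returned `Payload` on to the n-gram feature extraction.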
Compared with published methods, the invention has the following advantages:
(1) Only the payload features of the first few packets of a network flow need to be extracted, so classification is fast; only a small amount of each flow's payload content is used, so memory consumption is low;
(2) Multiple payload types can be classified, including text, audio, video, picture, executable file, compressed, and encrypted;
(3) The payload content is split into n-grams, and effective features are extracted from the high-frequency substring sets that remain after threshold filtering, giving higher classification accuracy and recall than existing methods;
(4) Parameters are flexible: the length of payload content used for feature extraction, the packet header length, the minimum payload length, and so on can be set to trade off performance against classification quality;
(5) The feature set used can be adjusted to the given payload types and the available data set to obtain better classification results.
Brief description of the drawings
Fig. 1 shows a sliding window of length 2;
Fig. 2 is a schematic diagram of n-gram-based payload feature extraction;
Fig. 3 is a flow chart of traffic classification based on multiple n-gram features.
Detailed description of the invention
The invention is described in detail below with reference to a specific embodiment. Fig. 2 is a schematic diagram of n-gram feature extraction from payload data and corresponds to steps (6) to (10); Fig. 3 is the flow chart of the payload type classification method based on multiple n-gram features.
(1) Initialize parameters: zero the Payload structure, used to cache the received payload data payload_buff, the received data length payload_len, and the number of packets processed so far pkt_num; zero the payload_ft structure, used to store the extracted payload features; set max_payload_len, the maximum length of payload data accepted per flow; set min_payload_len, the minimum data length required for extracting payload features; set head_len, the estimated length of the packet's protocol header; set max_packet_num, the maximum number of packets used for collecting payload features; set the maximum high-frequency substring frequency threshold K and the maximum n-gram window length N; set train_flag to true, indicating that the classification model needs to be trained;
(2) Input sample network traffic of known payload type and set train_flag to true;
(3) Read packets and reassemble flows: label each network flow by its five-tuple (source IP, destination IP, source port, destination port, TCP/UDP); for each new network flow, insert into the HashMap an entry with the hash value of the five-tuple as the Key and a Payload structure as the Value;
(4) Process the packets one by one: compute the hash value of the packet's five-tuple, obtain the Payload structure from the HashMap, skip head_len bytes, save the remaining payload data into Payload, and add 1 to the flow's processed packet count pkt_num. When max_payload_len is reached, go to step (5). If the number of packets processed exceeds max_packet_num and payload_len is not less than min_payload_len, go to step (5). If the current packet is the last packet of the flow, or the flow reassembly timeout is reached, and payload_len is not less than min_payload_len, go to step (5); if payload_len is less than min_payload_len, perform no further feature extraction on the flow and remove it from the HashMap. Continue with step (3);
(5) If train_flag is true, perform step (6); otherwise perform step (9);
(6) Split the flow's payload data B into n-gram substrings: take each value n in [1, N] as the sliding window size and split the original payload data to obtain the n-gram continuous substring set $S_n = \{s_1, s_2, s_3, \ldots, s_i, \ldots, s_{L-n+1}\}$, where L is the length of the payload data B; at the same time obtain the consecutive identical character substring set, i.e. substrings such as "bb" and "ccc" that contain only one kind of character.
For example:
for the original string "abbcccdefg" and n=2, with the sliding window shown in Fig. 1, the 2-gram substring set is:
$S_2 = \{ab, bb, bc, cc, cc, cd, de, ef, fg\}$;
(7) Count the frequency of each item in the n-gram continuous substring set $S_n$, take each value k in [1, K] as the frequency threshold, and filter the n-gram continuous substring set to obtain the high-frequency continuous substring set $S'_{n,k}$:
$S'_{n,k} = \{s'_{1,k}, s'_{2,k}, s'_{3,k}, \ldots, s'_{i,k}, \ldots, s'_{m,k}\}, \quad k = 1, 2, 3, \ldots, K$
where k is the given frequency threshold and m is the number of distinct elements whose frequency is not less than the threshold k.
Let $|s'_{i,k}|$ denote the frequency of element $s'_{i,k}$ and $|S'_{n,k}|$ the total frequency of all elements in the set $S'_{n,k}$; then
$|S'_{n,k}| = \sum_{i=1}^{m} |s'_{i,k}|$
For example, if the payload content is "abbcccdefg" and n=2, it is split into the 2-gram continuous substring set
$S_2 = \{ab, bb, bc, cc, cc, cd, de, ef, fg\}$;
for k=1, $S'_{2,1} = \{ab, bb, bc, cc, cd, de, ef, fg\}$ and $|S'_{2,1}| = 9$;
for k=2, $S'_{2,2} = \{cc\}$ and $|S'_{2,2}| = 2$;
(8) Extract the following statistical features from the consecutive identical character substring set obtained in step (6) and the high-frequency continuous substring set obtained in step (7), then go to step (10):
(8.1) Extract the statistical features of the high-frequency continuous substring set:
Number of distinct elements whose frequency is not less than the threshold k: $m_{n,k} = m$;
Maximum element frequency: $mf_{n,k} = \max_{i=1,2,\ldots,m} |s'_{i,k}|$, which reflects the peak of the frequency distribution;
Mean: $mean_{n,k} = \frac{1}{m} \sum_{i=1}^{m} |s'_{i,k}| = \frac{|S'_{n,k}|}{m}$, which reflects the average frequency of the elements of the continuous substring set;
Variance: $d_{n,k} = \frac{1}{m} \sum_{i=1}^{m} \left(|s'_{i,k}| - mean_{n,k}\right)^2$, which reflects how far the data deviate from the mean;
Information entropy: $h_{n,k} = -\sum_{i=1}^{m} \frac{|s'_{i,k}|}{|S'_{n,k}|} \log \frac{|s'_{i,k}|}{|S'_{n,k}|}$, which reflects the degree of disorder of the system and depends on both the number of elements and the frequency of each element;
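The statistics of step (8.1) can be computed as in the sketch below. It rests on stated assumptions: the mean, variance and entropy are taken to be the standard population definitions over the retained frequencies, the entropy uses a base-2 logarithm, and the threshold is inclusive (frequency >= k). The function name `high_frequency_stats` and the dictionary keys are hypothetical.

```python
import math
from collections import Counter


def high_frequency_stats(data, n, k):
    """Return m, mf, mean, d (variance) and h (entropy) for the n-grams of
    data whose frequency is at least k."""
    counts = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    freqs = [c for c in counts.values() if c >= k]
    m = len(freqs)
    if m == 0:
        return {"m": 0, "mf": 0, "mean": 0.0, "d": 0.0, "h": 0.0}
    total = sum(freqs)                                   # |S'_{n,k}|
    mean = total / m
    d = sum((f - mean) ** 2 for f in freqs) / m          # population variance (assumed)
    h = -sum((f / total) * math.log2(f / total) for f in freqs)
    return {"m": m, "mf": max(freqs), "mean": mean, "d": d, "h": h}


stats_k1 = high_frequency_stats("abbcccdefg", 2, 1)
stats_k2 = high_frequency_stats("abbcccdefg", 2, 2)
print(stats_k1)
print(stats_k2)
```

For the running example with n=2 and k=1 there are 8 distinct substrings with total frequency 9, so the mean is 9/8 = 1.125; with k=2 only "cc" remains, and both the variance and the entropy are zero.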
(8.2) Extract the statistical features of the consecutive identical character substring set:
Number of consecutive identical character substrings, sc_num: the total number of occurrences of all consecutive identical character substrings;
Number of kinds of consecutive identical character substrings, sc_diff_num: the number of distinct kinds of consecutive identical character substring that occur (two substrings are of different kinds if their character differs or their length differs);
Length of the longest consecutive identical character substring, sc_max_len: the maximum length among the consecutive identical character substrings that occur;
Average length of consecutive identical character substrings, sc_mean_len: the total length of all consecutive identical character substrings that occur, divided by sc_num;
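A possible way to compute the four features of step (8.2) is sketched below. It assumes that a "consecutive identical character substring" is a maximal run of at least two identical symbols, as the examples "bb" and "ccc" suggest; the text does not state the minimum run length explicitly. The function name `run_features` is hypothetical.

```python
from itertools import groupby


def run_features(data):
    """Collect maximal runs of one repeated symbol (length >= 2, assumed)
    and derive sc_num, sc_diff_num, sc_max_len and sc_mean_len."""
    runs = []
    for symbol, group in groupby(data):
        length = sum(1 for _ in group)
        if length >= 2:
            runs.append((symbol, length))
    sc_num = len(runs)
    return {
        "sc_num": sc_num,
        "sc_diff_num": len(set(runs)),  # kinds differ by symbol or by length
        "sc_max_len": max((length for _, length in runs), default=0),
        "sc_mean_len": sum(length for _, length in runs) / sc_num if sc_num else 0.0,
    }


example = run_features("abbcccdefg")
print(example)
```

In "abbcccdefg" the runs are "bb" and "ccc", so there are two runs of two different kinds, the longest has length 3, and the average length is 2.5.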
(9) According to the classification feature set from step (11), split the flow's payload data into n-gram substrings and extract the corresponding features; go to step (10);
(10) Take the logarithm of every feature except the information entropy, e.g. the variance feature $d_{1,1}$ becomes $\log(d_{1,1})$ after the transform, and express the features as the feature vector of each flow:
$(sc\_num, \ldots, sc\_mean\_len, m_{1,1}, \ldots, h_{1,1}, \ldots, m_{n,k}, \ldots, h_{n,k})$
where n = 1, 2, 3, ..., N and k = 1, 2, 3, ..., K;
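The assembly of the feature vector in step (10) might look like the sketch below. The helper names (`safe_log`, `feature_vector`) and the input layout are hypothetical. The text does not say how zero-valued features (for example a variance of 0) are handled, so mapping non-positive values to 0.0 is purely an assumption made here to keep the logarithm defined.

```python
import math

RUN_KEYS = ("sc_num", "sc_diff_num", "sc_max_len", "sc_mean_len")
STAT_KEYS = ("m", "mf", "mean", "d", "h")


def safe_log(x):
    """Natural log for positive values; 0.0 otherwise (assumption, see lead-in)."""
    return math.log(x) if x > 0 else 0.0


def feature_vector(run_feats, stats, N, K):
    """Order: run features first, then (m, mf, mean, d, h) for each (n, k).
    Every feature except the entropy h is log-transformed."""
    vec = [safe_log(run_feats[key]) for key in RUN_KEYS]
    for n in range(1, N + 1):
        for k in range(1, K + 1):
            block = stats[(n, k)]
            for key in STAT_KEYS:
                vec.append(block[key] if key == "h" else safe_log(block[key]))
    return vec
```

The resulting vector has 4 + 5·N·K entries, matching the layout shown in step (10).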
(11) If train_flag is true and the packets have not all been read, go to step (3). If all packets have been read, label each flow's feature vector with its payload type and use the chi-square test and information gain together to select the top 10 features as the classification feature set; then go to step (12). If train_flag is false, go to step (13);
(12) Use a C4.5 decision tree as the classification model, construct training samples from the classification feature set of step (11), and obtain the C4.5 classification model; convert the classification rules of the C4.5 model into IF-ELSE rules; go to step (14);
(13) Using the IF-ELSE rules from step (12), determine the payload type corresponding to the feature vector, output the payload type of the network flow, and go to step (3);
(14) Set train_flag to false, input the network traffic to be classified, and go to step (3).
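To make the information gain ranking of step (11) concrete, the sketch below scores each numeric feature by the best information gain over all binary threshold splits and sorts the features by that score. This is only one plausible reading: the text does not say how continuous features are discretized, the chi-square ranking is omitted, and the function names, feature names and toy data are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_split_gain(values, labels):
    """Largest information gain over all splits of the form value <= t."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best = 0.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        best = max(best, base - remainder)
    return best


def rank_features(rows, labels, names):
    """Feature names sorted by decreasing information gain."""
    gains = {name: best_split_gain([row[j] for row in rows], labels)
             for j, name in enumerate(names)}
    return sorted(names, key=lambda name: gains[name], reverse=True)


# Toy data: two hypothetical features for four labeled flows.
rows = [[7.9, 1.0], [7.8, 3.0], [4.1, 2.0], [4.3, 4.0]]
labels = ["encrypted", "encrypted", "text", "text"]
print(rank_features(rows, labels, ["h_1_1", "sc_num"]))
```

In the toy data the entropy-like feature separates the two classes perfectly (gain of 1 bit), so it ranks ahead of the run-count feature; a top-10 selection would keep the first ten names of such a ranking.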
Claims (10)
1. A traffic payload type detection method based on multiple n-gram features, comprising the steps of:
1) reading the packets of each selected sample network flow of known payload type, and labeling the sample network flow to which each packet belongs by the packet's five-tuple; then computing the hash value of the sample network flow's five-tuple as the key Key and looking up the key Key in a hash table; if the entry does not exist, using the hash value as a new key Key, allocating a Payload structure for the sample network flow as the value Value, and inserting it into the hash table; if the key Key exists, obtaining the corresponding Payload structure from the hash table and saving the packet's payload data in that Payload structure;
2) for each Payload structure obtained in step 1): splitting the payload data in the Payload structure into n-gram substrings to obtain the consecutive identical character substring set and the n-gram continuous substring set of the sample network flow corresponding to the Payload structure; then counting the frequency of each item in the n-gram continuous substring set to obtain a high-frequency continuous substring set; then extracting the statistical features of the sample network flow from the consecutive identical character substring set and the high-frequency continuous substring set, and generating the feature vector of the sample network flow;
3) training a classification model on the feature vectors obtained in step 2);
4) for a network flow to be classified, generating the feature vector of the network flow, and then using the classification model to determine the type of the network flow.
2. the method for claim 1, it is characterised in that this Payload structure includes having received charge number for storage
According to field payload_buff, field payload_len of load data length received and number-of-packet pkt_num.
3. method as claimed in claim 1 or 2, it is characterised in that step 1) in, the load data of this packet is saved in
During this Payload structure, the reduced data bag number pkt_num of this Payload structure is added 1.
4. method as claimed in claim 3, it is characterised in that if load data length payload_len of this Payload structure
Reach the greatest length max_payload_len set, proceed to step 2).
5. method as claimed in claim 3, it is characterised in that set if the number-of-packet that this Payload structure has received exceedes
Fixed maximum bag number max_packet_num, and payload_len is not less than the minimum data length set
Min_payload_len then proceeds to step 2).
6. method as claimed in claim 3, it is characterised in that if the currently processed packet of this Payload structure is corresponding
Last packet of network of samples stream or reach set stream gravity group time-out time, and payload_len not less than set
Minimum data length min_payload_len, then proceed to step 2).
7. method as claimed in claim 3, it is characterised in that if network of samples stream corresponding to this Payload structure has processed
This network of samples stream less than minimum data length min_payload_len set, is then removed Kazakhstan by the payload_len after Biing
Uncommon table.
8. the method for claim 1, it is characterised in that the load data in Payload structure is carried out n-gram substring
The method of segmentation is: takes each different value n between [1, N] and, as sliding window size, splits former load data, obtain n-gram
Substring set and consecutive identical character substring set continuously.
9. the method for claim 1, it is characterised in that the statistical nature extracted from high frequency continuous substring set includes: frequently
Number exceedes the different Element Species number m of threshold value kn,k, element maximum frequency mfn,k, average meann,k, variance dn,k, comentropy
hn,k;The statistical nature extracted from consecutive identical character substring set includes: quantity sc_num of consecutive identical character substring,
The kind number sc_diff_num of consecutive identical character substring, length sc_max_len of maximum consecutive identical character substring, company
Average length sc_mean_len of continuous identical characters substring.
10. The method as claimed in claim 1, characterized in that the classification model is obtained by training
as follows: first, the chi-square test and information gain methods are used to compute scores for the
feature vector of each sample network flow and to rank them, and several features are selected for each
sample network flow as the classification feature set of that flow; then, with a decision tree as the
classification model, training samples are constructed from this classification feature set and the
classification model is obtained by training.
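The feature-selection half of claim 10 can be illustrated with a small pure-Python sketch. Several points are assumptions rather than part of the claim: feature values are taken to be already discretised, the two rankings are combined by summing rank positions, and the names `info_gain`, `chi_square` and `select_features` are hypothetical. Training the decision tree on the selected columns is left out.

```python
import math
from collections import Counter


def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(column, labels):
    """H(Y) - H(Y | X) for one discrete feature column."""
    n = len(labels)
    conditional = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        conditional += len(subset) / n * _entropy(subset)
    return _entropy(labels) - conditional


def chi_square(column, labels):
    """Pearson chi-square statistic of the value/class contingency table."""
    n = len(labels)
    observed = Counter(zip(column, labels))
    x_totals, y_totals = Counter(column), Counter(labels)
    stat = 0.0
    for x in x_totals:
        for y in y_totals:
            expected = x_totals[x] * y_totals[y] / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


def select_features(rows, labels, top):
    """Return the indices of the `top` columns with the best combined rank."""
    columns = list(zip(*rows))
    indices = range(len(columns))

    def ranks(score):
        order = sorted(indices, key=lambda j: -score(columns[j], labels))
        return {j: rank for rank, j in enumerate(order)}

    chi_rank, ig_rank = ranks(chi_square), ranks(info_gain)
    # Assumed combination rule: sum of the two rank positions, ties by index.
    return sorted(indices, key=lambda j: (chi_rank[j] + ig_rank[j], j))[:top]
```

The selected column indices could then be used to build the training samples for a decision tree learner, for example scikit-learn's `DecisionTreeClassifier`.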
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610240406.6A CN105871619B (en) | 2016-04-18 | 2016-04-18 | A kind of flow load type detection method based on n-gram multiple features |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610240406.6A CN105871619B (en) | 2016-04-18 | 2016-04-18 | A kind of flow load type detection method based on n-gram multiple features |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105871619A (en) | 2016-08-17 |
CN105871619B (en) | 2019-03-01 |
Family
ID=56633356
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610240406.6A, Expired - Fee Related, CN105871619B (en) | A kind of flow load type detection method based on n-gram multiple features | 2016-04-18 | 2016-04-18 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105871619B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050060295A1 (en) * | 2003-09-12 | 2005-03-17 | Sensory Networks, Inc. | Statistical classification of high-speed network data through content inspection |
CN101051958A (en) * | 2007-05-11 | 2007-10-10 | 北京工业大学 | Extracting method for behaviour analysis parameter of network behaviour |
CN101282251A (en) * | 2008-05-08 | 2008-10-08 | 中国科学院计算技术研究所 | Method for digging recognition characteristic of application layer protocol |
CN101714952A (en) * | 2009-12-22 | 2010-05-26 | 北京邮电大学 | Method and device for identifying traffic of access network |
CN101741908A (en) * | 2009-12-25 | 2010-06-16 | 青岛朗讯科技通讯设备有限公司 | Identification method for application layer protocol characteristic |
CN102468987A (en) * | 2010-11-08 | 2012-05-23 | 清华大学 | NetFlow characteristic vector extraction method |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107682348A (en) * | 2017-10-19 | 2018-02-09 | 杭州安恒信息技术有限公司 | DGA domain name Quick method and devices based on machine learning |
WO2019149076A1 (en) * | 2018-02-05 | 2019-08-08 | 阿里巴巴集团控股有限公司 | Word vector generation method, apparatus and device |
US10824819B2 (en) | 2018-02-05 | 2020-11-03 | Alibaba Group Holding Limited | Generating word vectors by recurrent neural networks based on n-ary characters |
CN110362343A (en) * | 2019-07-19 | 2019-10-22 | 上海交通大学 | The method of the detection bytecode similarity of N-Gram |
CN110719274B (en) * | 2019-09-29 | 2022-10-04 | 武汉极意网络科技有限公司 | Network security control method, device, equipment and storage medium |
CN110719274A (en) * | 2019-09-29 | 2020-01-21 | 武汉极意网络科技有限公司 | Network security control method, device, equipment and storage medium |
CN111144470A (en) * | 2019-12-20 | 2020-05-12 | 中国科学院信息工程研究所 | Unknown network flow identification method and system based on deep self-encoder |
CN111144470B (en) * | 2019-12-20 | 2022-12-16 | 中国科学院信息工程研究所 | Unknown network flow identification method and system based on deep self-encoder |
CN111563234A (en) * | 2020-04-23 | 2020-08-21 | 华南理工大学 | Feature extraction method of system call data in host anomaly detection |
CN111723846A (en) * | 2020-05-20 | 2020-09-29 | 中国人民解放军战略支援部队信息工程大学 | Method and device for identifying encryption and compressed flow based on randomness characteristics |
CN111723846B (en) * | 2020-05-20 | 2024-01-26 | 中国人民解放军战略支援部队信息工程大学 | Encryption and compression flow identification method and device based on randomness characteristics |
CN112765599A (en) * | 2020-12-28 | 2021-05-07 | 中科曙光(南京)计算技术有限公司 | Intrusion detection method for application program |
CN113965631A (en) * | 2021-10-29 | 2022-01-21 | 复旦大学 | SECS2 data packet identification method for HSMS header information loss |
CN113965631B (en) * | 2021-10-29 | 2023-10-13 | 复旦大学 | SECS2 data packet identification method for HSMS head information loss |
Also Published As
Publication number | Publication date |
---|---|
CN105871619B (en) | 2019-03-01 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN105871619A (en) | Method for n-gram-based multi-feature flow load type detection | |
CN107665191B (en) | Private protocol message format inference method based on extended prefix tree | |
Aceto et al. | PortLoad: taking the best of two worlds in traffic classification | |
CN109951444B (en) | Encrypted anonymous network traffic identification method | |
Pei et al. | A DDoS attack detection method based on machine learning | |
CN108881192B (en) | Encryption type botnet detection system and method based on deep learning | |
CN1881950B (en) | Packet classification acceleration using spectral analysis | |
CN104244035B (en) | Network video stream sorting technique based on multi-level clustering | |
Alshammari et al. | A flow based approach for SSH traffic detection | |
CN102420723A (en) | Anomaly detection method for various kinds of intrusion | |
Park et al. | Toward fine-grained traffic classification | |
CN110611640A (en) | DNS protocol hidden channel detection method based on random forest | |
CN112800424A (en) | Botnet malicious traffic monitoring method based on random forest | |
Peraković et al. | Model for detection and classification of DDoS traffic based on artificial neural network | |
CN110417729A (en) | A kind of service and application class method and system encrypting flow | |
CN108028807A (en) | Method and system for on-line automatic identification Model of network traffic | |
CN108462707A (en) | A kind of mobile application recognition methods based on deep learning sequence analysis | |
CN110519228B (en) | Method and system for identifying malicious cloud robot in black-production scene | |
CN113472751A (en) | Encrypted flow identification method and device based on data packet header | |
Coelho et al. | BACKORDERS: using random forests to detect DDoS attacks in programmable data planes | |
CN110858837B (en) | Network management and control method and device and electronic equipment | |
Özdel et al. | Payload-based network traffic analysis for application classification and intrusion detection | |
CN107832611B (en) | Zombie program detection and classification method combining dynamic and static characteristics | |
CN105429817A (en) | Illegal business identification device and illegal business identification method based on DPI and DFI | |
Wang et al. | Internet traffic classification using machine learning: a token-based approach |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20190301 |