CN105871619A - Method for n-gram-based multi-feature flow load type detection - Google Patents

Publication number
CN105871619A
Authority
CN
China
Prior art keywords
payload
network
substring
len
packet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610240406.6A
Other languages
Chinese (zh)
Other versions
CN105871619B (en)
Inventor
庹宇鹏
张永铮
常鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Priority to CN201610240406.6A
Publication of CN105871619A
Application granted
Publication of CN105871619B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14: Network analysis or design
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00: Arrangements for monitoring or testing data switching networks
    • H04L43/08: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00: Arrangements for monitoring or testing data switching networks
    • H04L43/16: Threshold monitoring
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00: Traffic control in data switching networks
    • H04L47/10: Flow control; Congestion control
    • H04L47/24: Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441: Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method for detecting the payload type of traffic based on multiple n-gram features. The method comprises: step 1) reading a packet of a sample network flow and marking the sample network flow to which the packet belongs by the packet's five-tuple; computing the hash value of the sample network flow's five-tuple as a key; if no such entry exists in the hash table, taking the hash value as a new key, allocating a structure as the value, and inserting it into the hash table; if the key exists, obtaining the corresponding structure from the hash table and storing the payload data of the packet in the structure; step 2) splitting the payload data in each structure into n-gram substrings and generating the feature vector of the sample network flow; step 3) training a classification model from the feature vectors; step 4) for a network flow to be classified, generating the feature vector of the network flow and using the classification model to determine the type of the network flow. The method greatly improves detection efficiency.

Description

A method for detecting the payload type of traffic based on multiple n-gram features
Technical field
The invention belongs to the field of network traffic information security and relates to a method for detecting traffic by its payload type. It can be applied to improving network quality of service, optimizing network bandwidth allocation, and strengthening network security management.
Background
With the spread of the Internet and the rapid development of network technology, network traffic is growing explosively. Efficient bandwidth planning, network intrusion detection and defense, and traffic billing are challenges that network service providers and network administrators currently face. Network traffic classification techniques classify traffic by network application type or protocol type and can provide important technical support for these pressing problems.
Existing network traffic classification techniques mainly assign traffic to a specific network application or network protocol. There are four kinds of methods: port-based traffic classification, payload-based traffic classification, traffic classification based on host behavior patterns, and machine-learning-based traffic classification.
(1) Port-based traffic classification
This method can be used for high-speed real-time traffic classification, but with the widespread use of random ports and port masquerading it has gradually become ineffective.
(2) Payload-based traffic classification
This is the most widely used method in engineering practice, mainly thanks to fast and accurate string-like fingerprint matching algorithms. However, application protocol fingerprints can only be extracted for known, unencrypted network applications, so the method cannot handle encrypted traffic or traffic from unknown network applications.
(3) Traffic classification based on host behavior patterns
This method has strong theoretical significance. It does not depend on protocol features and does not need to parse packets; it identifies network traffic from the interaction behavior of hosts. However, because the model is complex, it is hard to meet real-time requirements, and it cannot identify the application type at a fine granularity.
(4) Machine-learning-based traffic classification
This method assumes that, for a given network application, the statistical behavior features of its flows (flow interval, per-packet byte length, time interval between adjacent packets, and so on) are unique, so traffic from different network applications can be classified on these features. Its drawbacks are that it is hard to find features that classify the traffic of a network application or protocol accurately, and that classification consumes considerable resources, which makes online deployment difficult.
To address the analysis and identification of traffic from unknown applications and of encrypted traffic, identifying and classifying traffic by payload type has been proposed. There are three main methods:
(1) Classification based on hypothesis testing
This method mainly targets the identification of encrypted traffic. It exploits the randomness of encrypted data by applying a cumulative sum test to network packets one by one and combining the results weighted by packet length. It requires no decryption and no matching of specific content, so it identifies encrypted traffic generically. The number of packets examined can be adjusted dynamically to balance delay against accuracy. Its drawbacks are that it cannot be applied to other payload types and that it tends to misidentify compressed traffic.
(2) Classification based on flow behavior features
During connection establishment, a specific encryption protocol exchanges packets whose content is similar and whose format is fixed, so they often have specific traffic features such as packet length and packet arrival time. Using these features, machine learning methods can identify the specific encryption protocol. However, encryption usually does not change packet lengths or arrival times noticeably, so most service data has the same traffic features whether it is sent in plaintext or encrypted. Some algorithms that claim to identify encrypted traffic in fact identify the service carried over the encrypted transport. For example, P2P software shows the same traffic features whether it transmits data in plaintext or encrypted; these algorithms identify the traffic as P2P either way, but cannot tell whether it is encrypted.
(3) Classification based on statistical features such as payload entropy
Many researchers have applied entropy features to payload type classification, combined with machine learning methods such as SVM, to classify payloads as text, encrypted, binary, and so on. However, these methods describe the different payload types with rather limited statistical features and ignore the "long tail effect" of the substring frequency distribution. As a result, the overall average classification accuracy is only about 86%, and the accuracy for some individual classes is below 80%, which does not meet practical needs.
Summary of the invention
To address the problems of the existing methods above, the invention discloses a method for detecting the payload type of traffic (text, audio, video, picture, executable file, compressed, encrypted, and so on) based on multiple n-gram features.
Some definitions used by the invention are given first:
(1) Definition 1: the n-gram continuous substring set is the set of substrings obtained by splitting the original string with a sliding window of length n. Here the original string is the payload content.
For example, for the original string "abbcccdefg" and n=2, the sliding window shown in Fig. 1 yields the 2-gram substring set:
S_2={ab, bb, bc, cc, cc, cd, de, ef, fg}
(2) Definition 2: the high-frequency continuous substring set is obtained by deduplicating the n-gram continuous substring set, counting the frequency of each substring, and keeping the substrings whose frequency is not less than a threshold k.
For example, for n=2 and k=1, the high-frequency continuous substring set, that is, the set of 2-gram continuous substrings whose frequency is not less than 1, is:
S'_{2,1}={ab, bb, bc, cc, cd, de, ef, fg}
(3) Definition 3: the consecutive identical character substring set is the set of continuous substrings that contain only one kind of character, such as "bb" and "ccc".
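The three definitions can be illustrated with a short Python sketch. The function names ngrams, high_frequency_set, and identical_runs are illustrative and do not come from the patent. The sketch keeps substrings whose frequency is at least k, matching the k=1 example, and treats a run as two or more repeated characters, which is an assumption based on the examples "bb" and "ccc".

```python
from collections import Counter
from itertools import groupby


def ngrams(data, n):
    """Slide a window of length n over data and return every substring (Definition 1)."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def high_frequency_set(data, n, k):
    """Map each distinct n-gram with frequency >= k to its frequency (Definition 2)."""
    counts = Counter(ngrams(data, n))
    return {gram: freq for gram, freq in counts.items() if freq >= k}


def identical_runs(data, min_len=2):
    """Return maximal runs of a single repeated character (Definition 3).

    min_len=2 is an assumption: the text only shows runs such as "bb" and "ccc".
    """
    runs = ["".join(group) for _, group in groupby(data)]
    return [run for run in runs if len(run) >= min_len]


s = "abbcccdefg"
print(ngrams(s, 2))                 # the 2-gram continuous substring set
print(high_frequency_set(s, 2, 1))  # high-frequency set for k=1
print(high_frequency_set(s, 2, 2))  # high-frequency set for k=2
print(identical_runs(s))            # consecutive identical character substrings
```

The sketch works on text strings for readability; real payloads are byte sequences, for which the run extraction would need to join bytes rather than characters.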
Concrete steps of the present invention include:
(1) Initialize parameters: zero the Payload structure, which holds, for the corresponding network flow, the buffer of received payload data payload_buff, the length payload_len of the payload data received in payload_buff, and the number of packets processed so far pkt_num. Zero the Payload_ft structure, which stores the payload features extracted for each network flow. Set the global method parameters: max_payload_len, the maximum length of payload data accepted per flow; min_payload_len, the minimum data length required to extract payload features; head_len, the estimated length of the packet protocol header; max_packet_num, the maximum number of packets used to collect payload features; the maximum high-frequency substring frequency threshold K; and the maximum n-gram window length N. train_flag is initially set to true, indicating that the method first enters the training phase of the classification model; after model training completes it is set to false and the online classification phase begins;
Payload structure is:
Payload_ft structure:
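The structure definitions are not reproduced in this text. As a rough sketch, assuming only the fields named in step (1), the two structures might look like the following Python dataclasses. The class name PayloadFt, the field types, and the dictionary layout of the feature store are assumptions, not the patent's definitions.

```python
from dataclasses import dataclass, field


@dataclass
class Payload:
    """Per-flow reassembly state, using the field names from step (1)."""
    payload_buff: bytearray = field(default_factory=bytearray)  # cached payload data
    payload_len: int = 0   # length of the data held in payload_buff
    pkt_num: int = 0       # number of packets processed for this flow


@dataclass
class PayloadFt:
    """Per-flow feature store (the layout is hypothetical)."""
    # features of the consecutive identical character substring set
    sc_num: float = 0.0
    sc_diff_num: float = 0.0
    sc_max_len: float = 0.0
    sc_mean_len: float = 0.0
    # features of each high-frequency set, keyed by (n, k);
    # each value is a tuple (m, mf, mean, d, h)
    ngram_features: dict = field(default_factory=dict)


# Buffer one packet's payload into a fresh flow state.
flow_state = Payload()
flow_state.payload_buff += b"GET / HTTP/1.1"
flow_state.payload_len = len(flow_state.payload_buff)
flow_state.pkt_num += 1
```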
(2) Set train_flag to true to enter the model training phase, and input sample network traffic of known payload type;
(3) Read packets and reassemble flows: read a packet from the network traffic and use its five-tuple (source IP, destination IP, source port, destination port, TCP/UDP) to mark the network flow the packet belongs to. Compute the hash value of the flow's five-tuple and use it as the key Key to look up the hash table. If no such entry exists, take this hash value as a new key Key, allocate a new Payload structure for this network flow as the value Value, and insert it into the hash table; if the entry exists, go to step (4);
(4) Using the hash value computed from the packet's five-tuple as the key Key, obtain the corresponding Payload structure from the hash table. Skip head_len bytes from the start of the packet, save the payload data that follows into the Payload structure, and increment the flow's processed packet count pkt_num by 1. When the saved data reaches max_payload_len, go to step (5). If the number of packets processed exceeds max_packet_num and payload_len is not less than min_payload_len, go to step (5). If the current packet is the last packet of the flow and payload_len is not less than min_payload_len, go to step (5). If payload_len is still less than min_payload_len after the flow has been fully processed, no feature extraction is performed for the flow and the flow is removed from the hash table. Continue with step (3);
(5) If train_flag is true, perform step (6); otherwise perform step (9);
(6) Split the flow's payload data into n-gram substrings: take each value n in [1, N] as the sliding window size and split the original payload data, obtaining the n-gram continuous substring sets and the consecutive identical character substring set, whose members, such as "bb" and "ccc", contain only one kind of character;
(7) Count the frequency of each item in the n-gram continuous substring set, take each value k in [1, K] as the frequency threshold, and filter the n-gram continuous substring set to obtain the high-frequency continuous substring sets;
(8) Extract the following statistical features from the consecutive identical character substring set obtained in step (6) and the high-frequency continuous substring sets obtained in step (7), then go to step (10):
(8.1) Statistical features of the high-frequency continuous substring set: the number m_{n,k} of distinct elements whose frequency is not less than the threshold k, the maximum element frequency mf_{n,k}, the mean mean_{n,k}, the variance d_{n,k}, and the information entropy h_{n,k};
(8.2) Statistical features of the consecutive identical character substring set: the number sc_num of consecutive identical character substrings, the number sc_diff_num of distinct kinds of consecutive identical character substrings, the length sc_max_len of the longest consecutive identical character substring, and the average length sc_mean_len of the consecutive identical character substrings;
(9) According to the classification feature set from step (11), split the flow's payload data into n-gram substrings, extract the corresponding features, and construct the feature vector of each flow; go to step (10);
(10) Take the logarithm of every feature except the information entropy; for example, the variance feature d_{1,1} becomes log(d_{1,1}) after the logarithm. Represent the features from step (8) as the feature vector of the flow;
(11) If train_flag is true and the packets have not all been read, go to step (3). If the packets have all been read, label the feature vector of each flow with its payload type, compute scores for the features of all sample network flows with both the chi-square test and information gain, and rank them. Going down the two rankings in order, select the features ranked in the top 10 by the two methods as the classification feature set (10 features in total) of the corresponding network flows, then go to step (12). If train_flag is false, go to step (13);
(12) Use a C4.5 decision tree as the classification model: construct training samples from the classification feature set of step (11) and obtain the C4.5 classification model, then convert the classification rules of the C4.5 model into IF-ELSE rules; go to step (14);
(13) According to the IF-ELSE rules from step (12), determine the payload type corresponding to the feature vector, output the payload type of the network flow, and go to step (3);
(14) Set train_flag to false, input the network traffic to be classified, and go to step (3).
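Steps (3) and (4) can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: a packet is modeled as a five-tuple plus raw bytes, a Python dict stands in for the hash table, the parameter values are arbitrary placeholders, and the function name feed_packet is hypothetical. Removing a flow from the table once it is handed to feature extraction is also a simplification.

```python
from dataclasses import dataclass, field

# Placeholder parameter values; the patent does not fix them.
MAX_PAYLOAD_LEN = 64
MIN_PAYLOAD_LEN = 8
HEAD_LEN = 4
MAX_PACKET_NUM = 3


@dataclass
class Payload:
    payload_buff: bytearray = field(default_factory=bytearray)
    payload_len: int = 0
    pkt_num: int = 0


flows = {}  # hash(five-tuple) -> Payload; a dict stands in for the hash table


def feed_packet(five_tuple, raw, is_last=False):
    """Add one packet to its flow; return the buffered payload when the flow
    is ready for feature extraction (step 5), otherwise None."""
    key = hash(five_tuple)
    state = flows.setdefault(key, Payload())     # step (3): look up or insert
    room = MAX_PAYLOAD_LEN - state.payload_len
    state.payload_buff += raw[HEAD_LEN:][:room]  # step (4): skip the estimated header
    state.payload_len = len(state.payload_buff)
    state.pkt_num += 1

    enough = state.payload_len >= MIN_PAYLOAD_LEN
    if state.payload_len >= MAX_PAYLOAD_LEN:
        ready = True
    elif state.pkt_num > MAX_PACKET_NUM and enough:
        ready = True
    elif is_last and enough:
        ready = True
    else:
        ready = False

    if ready:
        return bytes(flows.pop(key).payload_buff)
    if is_last:
        del flows[key]                           # too short: drop without features
    return None


flow = ("10.0.0.1", "10.0.0.2", 1234, 80, "TCP")
first = feed_packet(flow, b"HDR_hello")
second = feed_packet(flow, b"HDR_world", is_last=True)
```

Keying the table by the hash value alone, as the text describes, means two flows whose five-tuples collide would share one buffer; a production system would normally also compare the five-tuple itself.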
Compared with published methods, the invention has the following advantages:
(1) It only needs the payload feature information of the first few packets of a network flow, so classification is fast; it only uses a small amount of payload content per flow, so memory usage is low;
(2) It supports classification of multiple payload types, including text, audio, video, picture, executable file, compressed, encrypted, and so on;
(3) It splits the payload content into n-grams and extracts effective features from the high-frequency substring sets obtained by threshold filtering, achieving higher classification accuracy and recall than existing methods;
(4) Its parameters are flexible: the length of payload content used for feature extraction, the packet header length, the minimum payload length, and so on can be set to trade performance against classification quality;
(5) The feature set used can be adjusted to the given payload types and the available dataset to obtain better classification results.
Brief description of the drawings
Fig. 1 shows a sliding window of length 2;
Fig. 2 is a schematic diagram of n-gram-based payload feature extraction;
Fig. 3 is a flow chart of traffic classification based on multiple n-gram features.
Detailed description
The invention is described in detail below with reference to a specific embodiment. Fig. 2 is a schematic diagram of n-gram feature extraction from payload data, corresponding to steps (6) to (10); Fig. 3 is a flow chart of the payload type classification method based on multiple n-gram features.
(1) Initialize parameters: zero the Payload structure, which caches the received payload data payload_buff, the received data length payload_len, and the number of packets processed so far pkt_num. Zero the Payload_ft structure, which stores the extracted payload features. Set max_payload_len, the maximum length of payload data accepted per flow; min_payload_len, the minimum data length required to extract payload features; head_len, the estimated length of the packet protocol header; and max_packet_num, the maximum number of packets used to collect payload features. Set the maximum high-frequency substring frequency threshold K and the maximum n-gram window length N. Set train_flag to true, indicating that the classification model needs to be trained;
(2) Input sample network traffic of known payload type and set train_flag to true;
(3) Read packets and reassemble flows: mark each network flow by its five-tuple (source IP, destination IP, source port, destination port, TCP/UDP), and for each new network flow insert an entry into the HashMap with the hash value of the five-tuple as Key and a Payload structure as Value;
(4) Process the packets one by one: compute the hash value of the packet's five-tuple, obtain the Payload structure from the HashMap, skip head_len bytes, save the remaining payload data into the Payload structure, and increment the flow's processed packet count pkt_num by 1. When the saved data reaches max_payload_len, go to step (5). If the number of packets processed exceeds max_packet_num and payload_len is not less than min_payload_len, go to step (5). If the current packet is the last packet of the flow, or the flow reassembly timeout has been reached, and payload_len is not less than min_payload_len, go to step (5). If payload_len is less than min_payload_len, no feature extraction is performed for the flow and the flow is removed from the HashMap. Continue with step (3);
(5) If train_flag is true, perform step (6); otherwise perform step (9);
(6) Split the flow's payload data B into n-gram substrings: take each value n in [1, N] as the sliding window size and split the original payload data, obtaining the n-gram continuous substring set S_n={s_1, s_2, s_3, ..., s_i, ..., s_{L-n+1}}, where L is the length of the payload data B. At the same time, obtain the consecutive identical character substring set, whose members, such as "bb" and "ccc", contain only one kind of character;
For example:
For the original string "abbcccdefg" and n=2, the sliding window shown in Fig. 1 yields the 2-gram substring set:
S_2={ab, bb, bc, cc, cc, cd, de, ef, fg};
(7) Count the frequency of each item in the n-gram continuous substring set S_n, take each value k in [1, K] as the frequency threshold, and filter the n-gram continuous substring set to obtain the high-frequency continuous substring set S'_{n,k}:
S'_{n,k}={s'_{1,k}, s'_{2,k}, s'_{3,k}, ..., s'_{i,k}, ..., s'_{m,k}}, k=1,2,3,...,K
where k is the given frequency threshold and m is the number of distinct elements whose frequency is not less than the threshold k;
Let |s'_{i,k}| denote the frequency of element s'_{i,k} and |S'_{n,k}| the total frequency of all elements in the set S'_{n,k}; then $|S'_{n,k}| = \sum_{i=1}^{m} |s'_{i,k}|$.
For example, if the payload content is "abbcccdefg" and n=2, it is split into the 2-gram continuous substring set S_2={ab, bb, bc, cc, cc, cd, de, ef, fg};
For k=1, S'_{2,1}={ab, bb, bc, cc, cd, de, ef, fg} and |S'_{2,1}|=9;
For k=2, S'_{2,2}={cc} and |S'_{2,2}|=2;
(8) Extract the following statistical features from the consecutive identical character substring set obtained in step (6) and the high-frequency continuous substring sets obtained in step (7), then go to step (10):
(8.1) Extract the statistical features of the high-frequency continuous substring set:
Number of distinct elements whose frequency is not less than the threshold k: m_{n,k}=m;
Maximum element frequency: mf_{n,k}=max(|s'_{i,k}|), i=1,2,...,m, which reflects the peak of the frequency distribution;
Mean mean_{n,k}: reflects the average frequency of the elements of the continuous substring set S_n;
Variance d_{n,k}: reflects how far the data is dispersed from the mean;
Information entropy h_{n,k}: reflects the degree of disorder of the system, and depends on both the number of elements and the frequency of each element;
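The formulas for the mean, variance, and information entropy are not reproduced in this text. Under the standard definitions, and assuming a population variance and a base-2 logarithm (neither is confirmed by the source), they would read:

```latex
\begin{aligned}
mean_{n,k} &= \frac{1}{m}\sum_{i=1}^{m} |s'_{i,k}| = \frac{|S'_{n,k}|}{m} \\
d_{n,k} &= \frac{1}{m}\sum_{i=1}^{m} \left(|s'_{i,k}| - mean_{n,k}\right)^2 \\
h_{n,k} &= -\sum_{i=1}^{m} \frac{|s'_{i,k}|}{|S'_{n,k}|}\,\log_2 \frac{|s'_{i,k}|}{|S'_{n,k}|}
\end{aligned}
```

For the example payload "abbcccdefg" with n=2 and k=1, there are m=8 distinct substrings with total frequency 9, so under these assumptions the mean is 9/8=1.125, the maximum frequency is 2, and the variance is (7 × 0.125² + 0.875²)/8 = 0.109375.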
(8.2) Extract the statistical features of the consecutive identical character substring set:
Number sc_num of consecutive identical character substrings: the total number of occurrences of all consecutive identical character substrings;
Number sc_diff_num of kinds of consecutive identical character substrings: the number of distinct kinds of consecutive identical character substrings that occur (two substrings are of different kinds if they consist of different characters or have different lengths);
Length sc_max_len of the longest consecutive identical character substring: the maximum length of the consecutive identical character substrings that occur;
Average length sc_mean_len of consecutive identical character substrings: the total length of all consecutive identical character substrings that occur, divided by sc_num;
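The four features of step (8.2) can be computed in a single pass over the payload. The sketch below is illustrative only: the function name run_features is hypothetical, and counting only runs of length 2 or more is an assumption based on the examples "bb" and "ccc".

```python
from itertools import groupby


def run_features(data, min_len=2):
    """Compute sc_num, sc_diff_num, sc_max_len and sc_mean_len for a payload.

    A run is a maximal stretch of one repeated byte; min_len=2 is an assumption.
    """
    runs = []
    for value, group in groupby(data):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((value, length))

    sc_num = len(runs)
    sc_diff_num = len(set(runs))  # kinds differ by character or by length
    sc_max_len = max((length for _, length in runs), default=0)
    total_len = sum(length for _, length in runs)
    sc_mean_len = total_len / sc_num if sc_num else 0.0
    return {"sc_num": sc_num, "sc_diff_num": sc_diff_num,
            "sc_max_len": sc_max_len, "sc_mean_len": sc_mean_len}


# The runs here are "bb", "ccc" and a second "bb".
features = run_features(b"abbcccdefgbb")
```

In this example the two "bb" runs count twice toward sc_num but only once toward sc_diff_num, because they share both character and length.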
(9) According to the classification feature set from step (11), split the flow's payload data into n-gram substrings and extract the corresponding features; go to step (10);
(10) Take the logarithm of every feature except the information entropy; for example, the variance feature d_{1,1} becomes log(d_{1,1}) after the logarithm. Represent the features as the feature vector of each flow:
(sc_num, ..., sc_mean_len, m_{1,1}, ..., h_{1,1}, ..., m_{n,k}, ..., h_{n,k})
where n=1,2,3,...,N and k=1,2,3,...,K;
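The n-gram part of this feature vector can be sketched in Python as follows. The sketch assumes a population variance, a base-2 entropy, and a guard that maps the logarithm of a non-positive value to 0.0; none of these details is specified in the source, and the function names are hypothetical. The run features sc_num to sc_mean_len would be placed in front of the vector in the same way.

```python
import math
from collections import Counter


def high_frequency_stats(data, n, k):
    """Return (m, mf, mean, d, h) for the n-grams of data with frequency >= k.

    Population variance and base-2 entropy are assumptions.
    """
    counts = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    freqs = [f for f in counts.values() if f >= k]
    m = len(freqs)
    if m == 0:
        return (0, 0, 0.0, 0.0, 0.0)
    total = sum(freqs)
    mean = total / m
    variance = sum((f - mean) ** 2 for f in freqs) / m
    entropy = -sum((f / total) * math.log2(f / total) for f in freqs)
    return (m, max(freqs), mean, variance, entropy)


def safe_log(x):
    """log(x), with non-positive inputs mapped to 0.0 (a guard the source does not specify)."""
    return math.log(x) if x > 0 else 0.0


def ngram_vector(data, N, K):
    """Concatenate the features for n = 1..N and k = 1..K, taking the log of all but entropy."""
    vector = []
    for n in range(1, N + 1):
        for k in range(1, K + 1):
            m, mf, mean, d, h = high_frequency_stats(data, n, k)
            vector.extend([safe_log(m), safe_log(mf), safe_log(mean), safe_log(d), h])
    return vector


stats = high_frequency_stats("abbcccdefg", 2, 1)
vec = ngram_vector("abbcccdefg", N=2, K=2)
```

With N window sizes and K thresholds the n-gram part of the vector has 5·N·K entries, which is why the later feature selection step matters for keeping the classifier small.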
(11) If train_flag is true and the packets have not all been read, go to step (3). If the packets have all been read, label the feature vector of each flow with its payload type, and use the chi-square test and information gain together to select the top 10 features as the classification feature set, then go to step (12). If train_flag is false, go to step (13);
(12) Use a C4.5 decision tree as the classification model: construct training samples from the classification feature set of step (11) and obtain the C4.5 classification model, then convert the classification rules of the C4.5 model into IF-ELSE rules; go to step (14);
(13) According to the IF-ELSE rules from step (12), determine the payload type corresponding to the feature vector, output the payload type of the network flow, and go to step (3);
(14) Set train_flag to false, input the network traffic to be classified, and go to step (3).
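The source does not spell out how the chi-square ranking and the information-gain ranking are merged in step (11). One possible reading, sketched below under that assumption, walks down both rankings in parallel and collects distinct features until the limit is reached. The scores are taken as given inputs; the function name select_features, the feature names, and the toy scores are hypothetical.

```python
def select_features(chi2_scores, gain_scores, limit=10):
    """Interleave two score rankings and keep the first `limit` distinct features."""
    by_chi2 = sorted(chi2_scores, key=chi2_scores.get, reverse=True)
    by_gain = sorted(gain_scores, key=gain_scores.get, reverse=True)
    selected = []
    for chi2_name, gain_name in zip(by_chi2, by_gain):
        for name in (chi2_name, gain_name):
            if name not in selected and len(selected) < limit:
                selected.append(name)
    return selected


# Toy scores for illustration only; real scores come from the training flows.
chi2_scores = {"h_1_1": 9.0, "d_1_1": 7.5, "sc_num": 3.0, "m_2_1": 1.0}
gain_scores = {"h_1_1": 0.8, "sc_num": 0.6, "d_1_1": 0.2, "m_2_1": 0.1}
chosen = select_features(chi2_scores, gain_scores, limit=3)
```

The selected feature names would then be used to project each flow's feature vector before training the decision tree of step (12).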

Claims (10)

1. A method for detecting the payload type of traffic based on multiple n-gram features, comprising the steps of:
1) reading the packets of each selected sample network flow of known payload type, and marking the sample network flow each packet belongs to by the packet's five-tuple; then computing the hash value of the sample network flow's five-tuple as a key Key and looking up the hash table with this key Key; if no such entry exists, taking the hash value as a new key Key, allocating a Payload structure for the sample network flow as the value Value, and inserting it into the hash table; if the key Key exists, obtaining the corresponding Payload structure from the hash table and saving the payload data of the packet into the Payload structure;
2) for each Payload structure obtained in step 1): splitting the payload data in the Payload structure into n-gram substrings to obtain the consecutive identical character substring set and the n-gram continuous substring set of the sample network flow corresponding to the Payload structure; then counting the frequency of each item in the n-gram continuous substring set to obtain a high-frequency continuous substring set; then extracting the statistical features of the sample network flow from the consecutive identical character substring set and the high-frequency continuous substring set, and generating the feature vector of the sample network flow;
3) training a classification model from the feature vectors obtained in step 2);
4) for a network flow to be classified, generating the feature vector of the network flow and then using the classification model to determine the type of the network flow.
2. The method of claim 1, characterized in that the Payload structure includes a field payload_buff for storing received payload data, a field payload_len for the length of the payload data received, and a packet count pkt_num.
3. The method of claim 1 or 2, characterized in that, in step 1), when the payload data of the packet is saved into the Payload structure, the processed packet count pkt_num of the Payload structure is incremented by 1.
4. The method of claim 3, characterized in that, if the payload data length payload_len of the Payload structure reaches the set maximum length max_payload_len, the method proceeds to step 2).
5. The method of claim 3, characterized in that, if the number of packets received by the Payload structure exceeds the set maximum packet count max_packet_num and payload_len is not less than the set minimum data length min_payload_len, the method proceeds to step 2).
6. The method of claim 3, characterized in that, if the packet currently processed for the Payload structure is the last packet of the corresponding sample network flow, or the set flow reassembly timeout has been reached, and payload_len is not less than the set minimum data length min_payload_len, the method proceeds to step 2).
7. The method of claim 3, characterized in that, if payload_len is less than the set minimum data length min_payload_len after the sample network flow corresponding to the Payload structure has been fully processed, the sample network flow is removed from the hash table.
8. The method of claim 1, characterized in that the payload data in the Payload structure is split into n-gram substrings as follows: each value n in [1, N] is taken as the sliding window size and the original payload data is split, obtaining the n-gram continuous substring set and the consecutive identical character substring set.
9. The method of claim 1, characterized in that the statistical features extracted from the high-frequency continuous substring set include: the number m_{n,k} of distinct elements whose frequency is not less than the threshold k, the maximum element frequency mf_{n,k}, the mean mean_{n,k}, the variance d_{n,k}, and the information entropy h_{n,k}; and the statistical features extracted from the consecutive identical character substring set include: the number sc_num of consecutive identical character substrings, the number sc_diff_num of kinds of consecutive identical character substrings, the length sc_max_len of the longest consecutive identical character substring, and the average length sc_mean_len of the consecutive identical character substrings.
10. The method of claim 1, characterized in that the classification model is trained as follows: first, scores are computed for the feature vectors of the sample network flows with the chi-square test and information gain and ranked, and several features are selected for each sample network flow as the classification feature set of the corresponding sample network flow; then a decision tree is used as the classification model, and training samples are constructed from the classification feature set to obtain the classification model.
CN201610240406.6A 2016-04-18 2016-04-18 A kind of flow load type detection method based on n-gram multiple features Expired - Fee Related CN105871619B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610240406.6A CN105871619B (en) 2016-04-18 2016-04-18 A kind of flow load type detection method based on n-gram multiple features

Publications (2)

Publication Number Publication Date
CN105871619A true CN105871619A (en) 2016-08-17
CN105871619B CN105871619B (en) 2019-03-01

Family

ID=56633356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610240406.6A Expired - Fee Related CN105871619B (en) 2016-04-18 2016-04-18 A kind of flow load type detection method based on n-gram multiple features

Country Status (1)

Country Link
CN (1) CN105871619B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107682348A (en) * 2017-10-19 2018-02-09 杭州安恒信息技术有限公司 DGA domain name Quick method and devices based on machine learning
WO2019149076A1 (en) * 2018-02-05 2019-08-08 阿里巴巴集团控股有限公司 Word vector generation method, apparatus and device
CN110362343A (en) * 2019-07-19 2019-10-22 上海交通大学 The method of the detection bytecode similarity of N-Gram
CN110719274A (en) * 2019-09-29 2020-01-21 武汉极意网络科技有限公司 Network security control method, device, equipment and storage medium
CN111144470A (en) * 2019-12-20 2020-05-12 中国科学院信息工程研究所 Unknown network flow identification method and system based on deep self-encoder
CN111563234A (en) * 2020-04-23 2020-08-21 华南理工大学 Feature extraction method of system call data in host anomaly detection
CN111723846A (en) * 2020-05-20 2020-09-29 中国人民解放军战略支援部队信息工程大学 Method and device for identifying encryption and compressed flow based on randomness characteristics
CN112765599A (en) * 2020-12-28 2021-05-07 中科曙光(南京)计算技术有限公司 Intrusion detection method for application program
CN113965631A (en) * 2021-10-29 2022-01-21 复旦大学 SECS2 data packet identification method for HSMS header information loss

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050060295A1 (en) * 2003-09-12 2005-03-17 Sensory Networks, Inc. Statistical classification of high-speed network data through content inspection
CN101051958A (en) * 2007-05-11 2007-10-10 北京工业大学 Extracting method for behaviour analysis parameter of network behaviour
CN101282251A (en) * 2008-05-08 2008-10-08 中国科学院计算技术研究所 Method for digging recognition characteristic of application layer protocol
CN101714952A (en) * 2009-12-22 2010-05-26 北京邮电大学 Method and device for identifying traffic of access network
CN101741908A (en) * 2009-12-25 2010-06-16 青岛朗讯科技通讯设备有限公司 Identification method for application layer protocol characteristic
CN102468987A (en) * 2010-11-08 2012-05-23 清华大学 NetFlow characteristic vector extraction method

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107682348A (en) * 2017-10-19 2018-02-09 杭州安恒信息技术有限公司 DGA domain name Quick method and devices based on machine learning
WO2019149076A1 (en) * 2018-02-05 2019-08-08 阿里巴巴集团控股有限公司 Word vector generation method, apparatus and device
US10824819B2 (en) 2018-02-05 2020-11-03 Alibaba Group Holding Limited Generating word vectors by recurrent neural networks based on n-ary characters
CN110362343A (en) * 2019-07-19 2019-10-22 上海交通大学 The method of the detection bytecode similarity of N-Gram
CN110719274B (en) * 2019-09-29 2022-10-04 武汉极意网络科技有限公司 Network security control method, device, equipment and storage medium
CN110719274A (en) * 2019-09-29 2020-01-21 武汉极意网络科技有限公司 Network security control method, device, equipment and storage medium
CN111144470A (en) * 2019-12-20 2020-05-12 中国科学院信息工程研究所 Unknown network flow identification method and system based on deep self-encoder
CN111144470B (en) * 2019-12-20 2022-12-16 中国科学院信息工程研究所 Unknown network flow identification method and system based on deep self-encoder
CN111563234A (en) * 2020-04-23 2020-08-21 华南理工大学 Feature extraction method of system call data in host anomaly detection
CN111723846A (en) * 2020-05-20 2020-09-29 中国人民解放军战略支援部队信息工程大学 Method and device for identifying encryption and compressed flow based on randomness characteristics
CN111723846B (en) * 2020-05-20 2024-01-26 中国人民解放军战略支援部队信息工程大学 Encryption and compression flow identification method and device based on randomness characteristics
CN112765599A (en) * 2020-12-28 2021-05-07 中科曙光(南京)计算技术有限公司 Intrusion detection method for application program
CN113965631A (en) * 2021-10-29 2022-01-21 复旦大学 SECS2 data packet identification method for HSMS header information loss
CN113965631B (en) * 2021-10-29 2023-10-13 复旦大学 SECS2 data packet identification method for HSMS head information loss

Also Published As

Publication number Publication date
CN105871619B (en) 2019-03-01

Similar Documents

Publication Publication Date Title
CN105871619A (en) Method for n-gram-based multi-feature flow load type detection
CN107665191B (en) Private protocol message format inference method based on extended prefix tree
Aceto et al. PortLoad: taking the best of two worlds in traffic classification
CN109951444B (en) Encrypted anonymous network traffic identification method
Pei et al. A DDoS attack detection method based on machine learning
CN108881192B (en) Encryption type botnet detection system and method based on deep learning
CN1881950B (en) Packet classification acceleration using spectral analysis
CN104244035B (en) Network video stream sorting technique based on multi-level clustering
Alshammari et al. A flow based approach for SSH traffic detection
CN102420723A (en) Anomaly detection method for various kinds of intrusion
Park et al. Toward fine-grained traffic classification
CN110611640A (en) DNS protocol hidden channel detection method based on random forest
CN112800424A (en) Botnet malicious traffic monitoring method based on random forest
Peraković et al. Model for detection and classification of DDoS traffic based on artificial neural network
CN110417729A (en) A kind of service and application class method and system encrypting flow
CN108028807A (en) Method and system for on-line automatic identification Model of network traffic
CN108462707A (en) A kind of mobile application recognition methods based on deep learning sequence analysis
CN110519228B (en) Method and system for identifying malicious cloud robot in black-production scene
CN113472751A (en) Encrypted flow identification method and device based on data packet header
Coelho et al. BACKORDERS: using random forests to detect DDoS attacks in programmable data planes
CN110858837B (en) Network management and control method and device and electronic equipment
Özdel et al. Payload-based network traffic analysis for application classification and intrusion detection
CN107832611B (en) Zombie program detection and classification method combining dynamic and static characteristics
CN105429817A (en) Illegal business identification device and illegal business identification method based on DPI and DFI
Wang et al. Internet traffic classification using machine learning: a token-based approach

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190301
