CN104796407A - Method for extracting unknown protocol features - Google Patents


Info

Publication number
CN104796407A
Authority
CN
China
Prior art keywords
protocol
feature
byte
frequent
candidate set
Legal status
Granted
Application number
CN201510127979.3A
Other languages
Chinese (zh)
Other versions
CN104796407B (en)
Inventor
张凤荔 (Zhang Fengli)
周洪川 (Zhou Hongchuan)
张春瑞 (Zhang Chunrui)
王勇 (Wang Yong)
张俊娇 (Zhang Junjiao)
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
2015-03-23
Filing date
2015-03-23
Publication date
2015-07-22
Application filed by University of Electronic Science and Technology of China
Priority to CN201510127979.3A
Publication of CN104796407A
Application granted
Publication of CN104796407B
Status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/18 Protocol analysers
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/02 Protocol performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a method for extracting unknown protocol features. The method comprises: dividing the data frames of each protocol into two parts, splitting each part into bytes, and counting the occurrences and frequency of each byte to obtain frequent bytes; screening the frequent bytes to obtain the frequent bytes of each protocol; concatenating the consecutively occurring frequent bytes of each protocol into long feature strings (frequent strings) and screening these strings to obtain the feature candidate set of each protocol; representing the data frames of each protocol as vectors over the corresponding feature candidate set; performing feature selection on each feature candidate set with the correlation-based feature selection (CFS) algorithm and recording the selected features; and classifying with the KNN algorithm while computing the classification accuracy and recognition rate. The method helps decision makers identify unknown protocols effectively.

Description

Method for extracting unknown protocol features
Technical field
The present invention relates to a method for extracting features of unknown protocols.
Background art
As networks grow increasingly complex, safeguarding information networks has become a core element of national information strategy. In certain network environments, the threat of covert data exfiltration is growing steadily. Such exfiltration typically transmits classified information over wireless links using non-standard, proprietary unknown protocols, whereas existing countermeasures essentially target only known protocols and mostly rely on port mapping or static signature matching, so they cannot monitor or detect this type of covert channel.
To keep networks operating safely and to give early warning of attacks and hazardous behavior, decision makers urgently need to determine accurately the features of the protocols to be identified in today's structurally complex networks. A practical protocol feature extraction method is therefore needed to help decision makers identify unknown protocols efficiently.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing a method for extracting unknown protocol features that helps decision makers identify unknown protocols efficiently.
This object is achieved through the following technical solution: a method for extracting unknown protocol features, comprising the following steps:
S1. Randomly divide the data frames of each protocol in the dataset into two parts, split each part into individual bytes, and count the number of occurrences and the frequency of each byte in each part to obtain the frequent bytes;
S2. Screen the frequent bytes with the Jaccard parameter to select the frequent bytes corresponding to each protocol;
S3. For one protocol, concatenate its consecutively occurring frequent bytes into long feature strings, i.e., frequent strings; select the long feature strings whose byte occurrence count exceeds 50% of the total number of frame bytes, which yields two feature candidate sets for the protocol (one per part); take their intersection as the protocol's feature candidate set; apply this process to the frequent bytes of every protocol to obtain each protocol's feature candidate set;
S4. Using each protocol's feature candidate set, represent every data frame of that protocol as a vector, so that each frame becomes a vector over the feature candidate set;
S5. Apply the correlation-based feature selection (CFS) algorithm to the feature candidate set of each protocol and record the selected features;
S6. Classify with the KNN algorithm and compute the classification accuracy and recognition rate as evaluation metrics for the feature selection result.
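Steps S4 and S6 can be illustrated with a minimal Python sketch. The patent does not say how a frame is turned into a vector or which distance KNN uses, so the binary presence encoding, Euclidean distance, k = 3, and the helper names `vectorize`, `knn_predict`, and `accuracy` are all assumptions made for illustration. Only overall classification accuracy is computed; the patent does not define its recognition rate, so that metric is left out.

```python
import math
from collections import Counter

def vectorize(frame, candidates):
    # S4 (assumed encoding): one binary dimension per candidate string,
    # set to 1 if that string occurs anywhere in the frame.
    return [1 if c in frame else 0 for c in candidates]

def knn_predict(train_x, train_y, x, k=3):
    # S6: plain k-nearest-neighbour majority vote with Euclidean distance.
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train_x, train_y, test_x, test_y, k=3):
    # Fraction of test frames whose predicted protocol matches the true one.
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)
```

For example, with candidate strings `b"\xaa\xbb"` and `b"\x7e"`, a test frame containing `\xaa\xbb` is voted into the protocol whose training frames share that string.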
Step S2 comprises the following sub-steps:
S21. For one protocol, vary its threshold and compute the corresponding Jaccard values;
S22. When the Jaccard value reaches its first peak, record the corresponding threshold for this protocol;
S23. Select the protocol's frequent bytes using this threshold;
S24. Repeat the above operations for each protocol to obtain the frequent bytes of every protocol.
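A minimal Python sketch of steps S1 and S21 to S23 for one protocol follows. It rests on assumptions the patent does not spell out: a "byte" is treated as a byte value (0 to 255), its frequency is its count divided by the total byte count, frequent bytes are those whose frequency reaches the threshold, and the Jaccard value at each threshold compares the frequency vectors of the two random halves. All function names are hypothetical.

```python
import random
from collections import Counter

def split_frames(frames, seed=0):
    # S1: randomly divide one protocol's frames into two halves.
    shuffled = list(frames)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def byte_frequencies(frames):
    # S1 (assumed reading): frequency of each byte value 0-255 =
    # occurrence count / total number of bytes in these frames.
    counts = Counter(b for frame in frames for b in frame)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 256
    return [counts.get(v, 0) / total for v in range(256)]

def extended_jaccard(a, b):
    # J(A, B) as defined in the patent (extended Jaccard / Tanimoto form).
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0

def first_peak_threshold(half_a, half_b, thresholds):
    # S21-S22 (assumed comparison): at each threshold t, zero out bytes whose
    # frequency is below t in each half, compare the halves with J, and stop
    # as soon as J no longer increases; the previous threshold is the first peak.
    fa, fb = byte_frequencies(half_a), byte_frequencies(half_b)
    best_t, best_j = None, -1.0
    for t in sorted(thresholds):
        va = [f if f >= t else 0.0 for f in fa]
        vb = [f if f >= t else 0.0 for f in fb]
        j = extended_jaccard(va, vb)
        if j <= best_j:
            break
        best_t, best_j = t, j
    return best_t

def frequent_bytes(frames, t):
    # S23: byte values whose frequency over all of the protocol's frames is >= t.
    return {v for v, f in enumerate(byte_frequencies(frames)) if f >= t}
```

In use, one would call `first_peak_threshold(*split_frames(frames), thresholds)` over a grid such as `[i / 100 for i in range(1, 51)]` and pass the result to `frequent_bytes`; the function returns `None` for an empty grid, so the caller should check for that.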
Step S3 comprises the following sub-steps:
S31. For each data frame of a protocol, whenever screened frequent bytes occur consecutively, concatenate them and extract the result as a long feature string;
S32. Select the long feature strings whose byte occurrence count exceeds 50% of the total number of frame bytes, which yields two feature candidate sets for the protocol;
S33. Take the intersection of the two feature candidate sets as the protocol's feature candidate set;
S34. Apply the above process to the frequent bytes of each protocol to obtain each protocol's feature candidate set.
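Steps S31 to S33 can be sketched as follows. The 50% rule in S32 admits more than one reading; this sketch assumes it means a string must occur in more than half of the frames of a part (`min_support=0.5`), and it additionally assumes that isolated single bytes are not "long strings" (`min_len=2`). Both parameters and all function names are hypothetical.

```python
from collections import Counter

def frequent_runs(frame, frequent):
    # S31: split a frame into maximal runs of consecutive frequent bytes.
    runs, current = [], bytearray()
    for b in frame:
        if b in frequent:
            current.append(b)
        elif current:
            runs.append(bytes(current))
            current = bytearray()
    if current:
        runs.append(bytes(current))
    return runs

def candidate_set(frames, frequent, min_support=0.5, min_len=2):
    # S32 (assumed reading of the 50% rule): keep strings occurring in more
    # than min_support of the frames; each frame counts a string at most once.
    support = Counter()
    for frame in frames:
        support.update({r for r in frequent_runs(frame, frequent) if len(r) >= min_len})
    return {s for s, n in support.items() if n > min_support * len(frames)}

def protocol_candidates(half_a, half_b, frequent):
    # S33: intersect the candidate sets built from the two random halves.
    return candidate_set(half_a, frequent) & candidate_set(half_b, frequent)
```

Because runs are maximal, two frequent substrings that happen to be adjacent in a frame merge into one longer string, which follows the "occur consecutively" wording of S31.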
The Jaccard parameter is defined as:
$$J(A,B) = \frac{\sum_{i=1}^{n} T_{1i} T_{2i}}{\sum_{i=1}^{n} T_{1i}^{2} + \sum_{i=1}^{n} T_{2i}^{2} - \sum_{i=1}^{n} T_{1i} T_{2i}}$$
where $T_{1i}$ and $T_{2i}$ denote the i-th feature of A and B, respectively.
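A small worked example of the formula, using toy vectors chosen purely for illustration (they are not from the patent): let A = (1, 2, 0) and B = (1, 1, 1).
$$\sum_{i=1}^{3} T_{1i}T_{2i} = 1 + 2 + 0 = 3, \quad \sum_{i=1}^{3} T_{1i}^{2} = 1 + 4 + 0 = 5, \quad \sum_{i=1}^{3} T_{2i}^{2} = 1 + 1 + 1 = 3$$
$$J(A,B) = \frac{3}{5 + 3 - 3} = 0.6$$
For 0/1 vectors the same expression reduces to the ordinary Jaccard index $|A \cap B| / |A \cup B|$, which is why this form is often called the extended Jaccard (Tanimoto) coefficient.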
The correlation-based feature selection (CFS) algorithm uses the following formula:
$$\mathrm{Merit}_S(k) = \frac{k\,\bar{r}_{cf}}{\sqrt{k + k(k-1)\,\bar{r}_{ff}}}$$
where $\mathrm{Merit}_S(k)$ is the evaluation of a feature subset S containing k features; the larger its value, the better the selected subset S;
$\bar{r}_{cf}$ is the mean correlation coefficient between each feature and the class c;
$\bar{r}_{ff}$ is the mean correlation coefficient between pairs of features.
Substituting $\bar{r}_{cf}$ and $\bar{r}_{ff}$ into the formula converts it to:
$$\mathrm{Merit}_S(k) = \frac{r_{cf_1} + r_{cf_2} + \cdots + r_{cf_k}}{\sqrt{k + 2\left(r_{f_1 f_2} + \cdots + r_{f_i f_j} + \cdots + r_{f_k f_1}\right)}}$$
where the sum in the denominator runs over all pairs of distinct features. CFS evaluates how much each feature contributes to classification and thereby derives the final feature subset.
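The merit formula and a subset search can be sketched in Python as below. This is not the patent's implementation: it assumes absolute Pearson correlation as the correlation measure and a 0/1 class label (for example, one protocol versus all others), and because the patent does not name a search strategy it uses greedy forward selection, stopping when no added feature raises the merit. Function names are hypothetical.

```python
import math

def abs_pearson(x, y):
    # |Pearson r|; a constant column is treated as uncorrelated (0.0).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)

def merit(subset, r_cf, r_ff):
    # Merit_S(k) = (r_cf1 + ... + r_cfk) / sqrt(k + 2 * sum of r_fifj over pairs i < j)
    k = len(subset)
    num = sum(r_cf[i] for i in subset)
    pairs = sum(r_ff[i][j] for a, i in enumerate(subset) for j in subset[a + 1:])
    return num / math.sqrt(k + 2 * pairs)

def cfs_forward(X, y):
    # Greedy forward search: add the feature that most raises the merit and
    # stop when no addition improves it. Rows of X are frame vectors.
    cols = [list(col) for col in zip(*X)]
    r_cf = [abs_pearson(c, y) for c in cols]
    r_ff = [[abs_pearson(ci, cj) for cj in cols] for ci in cols]
    selected, best = [], 0.0
    while len(selected) < len(cols):
        scored = [(merit(selected + [j], r_cf, r_ff), j)
                  for j in range(len(cols)) if j not in selected]
        score, j = max(scored, key=lambda s: s[0])
        if score <= best:
            break
        selected.append(j)
        best = score
    return selected, best
```

With a feature that exactly duplicates an already selected one, the merit of the pair is 2 / sqrt(4) = 1.0, no better than the single feature, so the duplicate is rejected; this is the redundancy penalty the denominator imposes.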
The beneficial effects of the invention are: (1) through frequent-byte extraction, concatenation of frequent long strings, conversion of frame data into vectors, feature selection, and classification, features that identify data frames are obtained; feature selection greatly reduces the number of features while the classification accuracy on protocol frames decreases only slightly; (2) the extracted protocol features help decision makers identify unknown protocols efficiently.
Brief description of the drawings
Fig. 1 is a flowchart of the present invention.
Embodiment
The technical solution of the present invention is described in further detail below with reference to the accompanying drawing, but the scope of protection of the present invention is not limited to the following.
As shown in Fig. 1, a method for extracting unknown protocol features comprises the following steps:
S1. Randomly divide the data frames of each protocol in the dataset into two parts, split each part into individual bytes, and count the number of occurrences and the frequency of each byte in each part to obtain the frequent bytes;
S2. Screen the frequent bytes with the Jaccard parameter to select the frequent bytes corresponding to each protocol;
S3. For one protocol, concatenate its consecutively occurring frequent bytes into long feature strings, i.e., frequent strings; select the long feature strings whose byte occurrence count exceeds 50% of the total number of frame bytes, which yields two feature candidate sets for the protocol (one per part); take their intersection as the protocol's feature candidate set; apply this process to the frequent bytes of every protocol to obtain each protocol's feature candidate set;
S4. Using each protocol's feature candidate set, represent every data frame of that protocol as a vector, so that each frame becomes a vector over the feature candidate set;
S5. Apply the correlation-based feature selection (CFS) algorithm to the feature candidate set of each protocol and record the selected features;
S6. Classify with the KNN algorithm and compute the classification accuracy and recognition rate as evaluation metrics for the feature selection result.
Step S2 comprises the following sub-steps:
S21. For one protocol, vary its threshold and compute the corresponding Jaccard values;
S22. When the Jaccard value reaches its first peak, record the corresponding threshold for this protocol;
S23. Select the protocol's frequent bytes using this threshold;
S24. Repeat the above operations for each protocol to obtain the frequent bytes of every protocol.
Step S3 comprises the following sub-steps:
S31. For each data frame of a protocol, whenever screened frequent bytes occur consecutively, concatenate them and extract the result as a long feature string;
S32. Select the long feature strings whose byte occurrence count exceeds 50% of the total number of frame bytes, which yields two feature candidate sets for the protocol;
S33. Take the intersection of the two feature candidate sets as the protocol's feature candidate set;
S34. Apply the above process to the frequent bytes of each protocol to obtain each protocol's feature candidate set.
The Jaccard parameter is defined as:
$$J(A,B) = \frac{\sum_{i=1}^{n} T_{1i} T_{2i}}{\sum_{i=1}^{n} T_{1i}^{2} + \sum_{i=1}^{n} T_{2i}^{2} - \sum_{i=1}^{n} T_{1i} T_{2i}}$$
where $T_{1i}$ and $T_{2i}$ denote the i-th feature of A and B, respectively.
The correlation-based feature selection (CFS) algorithm uses the following formula:
$$\mathrm{Merit}_S(k) = \frac{k\,\bar{r}_{cf}}{\sqrt{k + k(k-1)\,\bar{r}_{ff}}}$$
where $\mathrm{Merit}_S(k)$ is the evaluation of a feature subset S containing k features; the larger its value, the better the selected subset S;
$\bar{r}_{cf}$ is the mean correlation coefficient between each feature and the class c;
$\bar{r}_{ff}$ is the mean correlation coefficient between pairs of features.
Substituting $\bar{r}_{cf}$ and $\bar{r}_{ff}$ into the formula converts it to:
$$\mathrm{Merit}_S(k) = \frac{r_{cf_1} + r_{cf_2} + \cdots + r_{cf_k}}{\sqrt{k + 2\left(r_{f_1 f_2} + \cdots + r_{f_i f_j} + \cdots + r_{f_k f_1}\right)}}$$
where the sum in the denominator runs over all pairs of distinct features. CFS evaluates how much each feature contributes to classification and thereby derives the final feature subset.

Claims (4)

1. A method for extracting unknown protocol features, characterized in that it comprises the following steps:
S1. Randomly divide the data frames of each protocol in the dataset into two parts, split each part into individual bytes, and count the number of occurrences and the frequency of each byte in each part to obtain the frequent bytes;
S2. Screen the frequent bytes with the Jaccard parameter to select the frequent bytes corresponding to each protocol;
S3. For one protocol, concatenate its consecutively occurring frequent bytes into long feature strings, i.e., frequent strings; select the long feature strings whose byte occurrence count exceeds 50% of the total number of frame bytes, which yields two feature candidate sets for the protocol; take their intersection as the protocol's feature candidate set; apply this process to the frequent bytes of every protocol to obtain each protocol's feature candidate set;
S4. Using each protocol's feature candidate set, represent every data frame of that protocol as a vector, so that each frame becomes a vector over the feature candidate set;
S5. Apply the correlation-based feature selection (CFS) algorithm to the feature candidate set of each protocol and record the selected features;
S6. Classify with the KNN algorithm.
2. The method for extracting unknown protocol features according to claim 1, characterized in that step S2 comprises the following sub-steps:
S21. For one protocol, vary its threshold and compute the corresponding Jaccard values;
S22. When the Jaccard value reaches its first peak, record the corresponding threshold for this protocol;
S23. Select the protocol's frequent bytes using this threshold;
S24. Repeat the above operations for each protocol to obtain the frequent bytes of every protocol.
3. The method for extracting unknown protocol features according to claim 1, characterized in that step S3 comprises the following sub-steps:
S31. For each data frame of a protocol, whenever screened frequent bytes occur consecutively, concatenate them and extract the result as a long feature string;
S32. Select the long feature strings whose byte occurrence count exceeds 50% of the total number of frame bytes, which yields two feature candidate sets for the protocol;
S33. Take the intersection of the two feature candidate sets as the protocol's feature candidate set;
S34. Apply the above process to the frequent bytes of each protocol to obtain each protocol's feature candidate set.
4. The method for extracting unknown protocol features according to claim 2, characterized in that the Jaccard parameter is defined as:
$$J(A,B) = \frac{\sum_{i=1}^{n} T_{1i} T_{2i}}{\sum_{i=1}^{n} T_{1i}^{2} + \sum_{i=1}^{n} T_{2i}^{2} - \sum_{i=1}^{n} T_{1i} T_{2i}}$$
where $T_{1i}$ and $T_{2i}$ denote the i-th feature of A and B, respectively.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510127979.3A CN104796407B (en) 2015-03-23 2015-03-23 Method for extracting unknown protocol features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510127979.3A CN104796407B (en) 2015-03-23 2015-03-23 Method for extracting unknown protocol features

Publications (2)

Publication Number Publication Date
CN104796407A 2015-07-22
CN104796407B 2018-03-30

Family

ID=53560919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510127979.3A Active CN104796407B (en) 2015-03-23 2015-03-23 Method for extracting unknown protocol features

Country Status (1)

Country Link
CN (1) CN104796407B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100138443A1 (en) * 2008-11-17 2010-06-03 Ramakrishnan Kadangode K User-Powered Recommendation System
CN103414722A (en) * 2013-08-19 2013-11-27 Center for Space Science and Applied Research, Chinese Academy of Sciences Space link protocol blind identification method and system
CN103955539A (en) * 2014-05-19 2014-07-30 PLA Information Engineering University Method and device for obtaining control field demarcation point in binary protocol data
CN104159232A (en) * 2014-09-01 2014-11-19 University of Electronic Science and Technology of China Method of recognizing protocol format of binary message data

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105827603A (en) * 2016-03-14 2016-08-03 PLA Information Engineering University Inexplicit protocol feature library establishment method and device and inexplicit message classification method and device
CN108632252A (en) * 2018-04-03 2018-10-09 PLA Strategic Support Force Information Engineering University A kind of private network agreement iteration conversed analysis method, apparatus and server
CN108632252B (en) * 2018-04-03 2021-02-02 PLA Strategic Support Force Information Engineering University Private network protocol iteration reverse analysis method, device and server
CN110061976A (en) * 2019-03-29 2019-07-26 China Academy of Space Technology A kind of unknown protocol frame sequence extracting method and system based on data mining
CN110457465A (en) * 2019-06-21 2019-11-15 Wuhan University A kind of classification method for known bits stream protocol
CN111274235A (en) * 2020-01-16 2020-06-12 University of Electronic Science and Technology of China Unknown protocol data cleaning and protocol field feature extraction method
CN111274235B (en) * 2020-01-16 2022-11-04 University of Electronic Science and Technology of China Unknown protocol data cleaning and protocol field feature extraction method

Also Published As

Publication number Publication date
CN104796407B (en) 2018-03-30

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by SIPO to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant