CN104796407A - Method for extracting unknown protocol features - Google Patents
Method for extracting unknown protocol features
- Publication number
- CN104796407A
- Authority
- CN
- China
- Prior art keywords
- protocol
- feature
- byte
- frequent
- candidate set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/18—Protocol analysers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/02—Protocol performance
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The invention discloses a method for extracting unknown protocol features. The method comprises: dividing the data frames of each protocol into two parts, segmenting each part into bytes, and counting the number of occurrences and the frequency of each byte to obtain frequent bytes; screening the frequent bytes to obtain the frequent bytes corresponding to each protocol; splicing the consecutively occurring frequent bytes of each protocol into long feature strings, i.e. frequent strings, and screening the frequent strings to obtain the feature candidate set of each protocol; representing the data frames of each protocol as vectors over the corresponding feature candidate set; performing feature selection on each feature candidate set with the correlation-based feature selection (CFS) algorithm and recording the selected features; and classifying with the KNN algorithm and computing the classification accuracy and recognition rate. The method helps decision makers recognize unknown protocols effectively.
Description
Technical field
The present invention relates to a method for extracting unknown protocol features.
Background art
As networks grow ever more complex, ensuring the security of information networks has become a core element of national information strategy. In certain network environments, the threat of information theft through special means is increasingly severe. Such theft typically sends classified information over wireless communication, and the protocol used is an unconventional, special unknown protocol. Existing countermeasures essentially target only known protocols and mostly rely on methods such as port mapping or static feature matching, so they cannot monitor or detect this type of covert channel.
To keep the network operating safely and to give early warning of attacks and dangerous behaviour, decision makers urgently need to find, accurately, the features of the protocol to be identified in today's structurally complex network environment. A feasible method for extracting protocol features is therefore needed to help decision makers identify unknown protocols efficiently.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art by providing a method for extracting unknown protocol features that helps decision makers identify unknown protocols efficiently.
The object of the invention is achieved through the following technical solution: a method for extracting unknown protocol features, comprising the following steps:
S1. Randomly divide the data frames of each protocol in the data set into two parts, segment each part into bytes, and count the number of occurrences and the frequency of each byte in each part to obtain frequent bytes;
S2. Screen the frequent bytes using the Jaccard parameter to select the frequent bytes corresponding to each protocol;
S3. Splice the consecutively occurring frequent bytes of one protocol to obtain long feature strings, i.e. frequent strings; select the long feature strings whose byte occurrence count is greater than 50% of the total number of bytes in the frame, obtaining two feature candidate sets for this protocol; take their intersection as the feature candidate set of this protocol; apply this process to the frequent bytes of each protocol to obtain the feature candidate set of each protocol;
S4. Using the feature candidate set of each protocol, represent the data frames of that protocol as vectors, so that each frame becomes a vector over the feature candidate set;
S5. Apply the correlation-based feature selection (CFS) algorithm to the feature candidate set of each protocol, and record the selected features;
S6. Classify using the KNN algorithm, and compute the classification accuracy and recognition rate as evaluation indices of the feature selection result.
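Step S1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the fixed random seed, and the reading of "frequency" as a byte's share of all bytes in one part are assumptions made for illustration only.

```python
import random
from collections import Counter

def split_frames(frames, seed=0):
    """Randomly split one protocol's frames into two parts (step S1)."""
    shuffled = list(frames)
    random.Random(seed).shuffle(shuffled)  # seed is a hypothetical parameter
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def byte_frequencies(frames):
    """Map each byte value to its share of all bytes in the given frames."""
    counts = Counter()
    for frame in frames:
        counts.update(frame)  # iterating over bytes yields ints in 0..255
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items()} if total else {}

def frequent_bytes(frames, threshold):
    """Byte values whose relative frequency reaches the threshold."""
    return {b for b, f in byte_frequencies(frames).items() if f >= threshold}

# Toy frames: a fixed three-byte header followed by a varying byte.
frames = [bytes([0xAA, 0x55, 0x01, i]) for i in range(10)]
part_a, part_b = split_frames(frames)
print(sorted(frequent_bytes(part_a, 0.2)))  # [1, 85, 170]
```

In this toy data only the header bytes pass the threshold, whichever way the frames are split.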
Step S2 comprises the following sub-steps:
S21. Compute different Jaccard values by varying the threshold of one protocol;
S22. When the Jaccard value reaches its first peak, record the corresponding threshold of this protocol;
S23. Select the frequent bytes of this protocol according to its recorded threshold;
S24. Repeat the above operations for each protocol to obtain the frequent bytes corresponding to each protocol.
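Sub-steps S21 to S23 can be sketched as a threshold sweep. The sketch below rests on several assumptions not stated in the text: the Jaccard value is computed between the frequent-byte sets of the two parts from step S1, the plain set form |A ∩ B| / |A ∪ B| is used, and the thresholds are tried in ascending order. All names are hypothetical.

```python
from collections import Counter

def byte_frequencies(frames):
    """Map each byte value to its share of all bytes in the given frames."""
    counts = Counter()
    for frame in frames:
        counts.update(frame)
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items()} if total else {}

def jaccard(a, b):
    """Set-based Jaccard coefficient (assumed form, see lead-in)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def select_threshold(part_a, part_b, thresholds):
    """Return (threshold, Jaccard value) at the first peak of the sweep."""
    freq_a, freq_b = byte_frequencies(part_a), byte_frequencies(part_b)
    best_t, best_j = None, -1.0
    for t in thresholds:
        set_a = {b for b, f in freq_a.items() if f >= t}
        set_b = {b for b, f in freq_b.items() if f >= t}
        j = jaccard(set_a, set_b)
        if j > best_j:
            best_t, best_j = t, j   # still climbing
        elif j < best_j:
            break                   # value dropped: the first peak has passed
    return best_t, best_j

part_a = [b"\xaa\x55\x01\x10", b"\xaa\x55\x01\x11"]
part_b = [b"\xaa\x55\x01\x20", b"\xaa\x55\x01\x21"]
print(select_threshold(part_a, part_b, [0.05, 0.1, 0.2, 0.3]))  # (0.2, 1.0)
```

At low thresholds the payload bytes differ between the two parts and pull the Jaccard value down; at 0.2 only the shared header bytes remain, so the value peaks.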
Step S3 comprises the following sub-steps:
S31. For each frame of one protocol, if the screened frequent bytes occur consecutively, splice them together and pick them out as a long feature string;
S32. Select the long feature strings whose byte occurrence count is greater than 50% of the total number of bytes in the frame, obtaining two feature candidate sets for this protocol;
S33. Take the intersection of the two feature candidate sets as the feature candidate set of this protocol;
S34. Apply the above process to the frequent bytes of each protocol to obtain the feature candidate set of each protocol.
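Sub-steps S31 to S33 can be sketched as follows. The 50% criterion in S32 is ambiguous as worded, so this sketch substitutes one possible reading: a string is kept if it occurs in more than half of the frames of a part. That reading, the minimum run length of two bytes, and all names are assumptions.

```python
from collections import Counter

def frequent_runs(frame, frequent):
    """Maximal runs of consecutive frequent bytes in one frame (S31)."""
    runs, current = [], bytearray()
    for b in frame:
        if b in frequent:
            current.append(b)
        else:
            if len(current) > 1:      # assumption: a spliced string has >= 2 bytes
                runs.append(bytes(current))
            current = bytearray()
    if len(current) > 1:
        runs.append(bytes(current))
    return runs

def candidate_set(frames, frequent, min_support=0.5):
    """Strings found in more than min_support of the frames (assumed reading of S32)."""
    if not frames:
        return set()
    counts = Counter()
    for frame in frames:
        counts.update(set(frequent_runs(frame, frequent)))  # count each frame once
    return {s for s, n in counts.items() if n / len(frames) > min_support}

def protocol_candidates(part_a, part_b, frequent):
    """Intersection of the two parts' candidate sets (S33)."""
    return candidate_set(part_a, frequent) & candidate_set(part_b, frequent)

frequent = {0xAA, 0x55, 0x01}
part_a = [b"\xaa\x55\x01\x10", b"\xaa\x55\x01\x11"]
part_b = [b"\xaa\x55\x01\x20", b"\xaa\x55\x01\x21"]
print(protocol_candidates(part_a, part_b, frequent))
```

Taking the intersection keeps only strings that survive in both random halves, which guards against strings that are frequent in one half by chance.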
The Jaccard parameter is defined as:
In the formula, $T1_i$ and $T2_i$ denote the i-th feature in A and B, respectively.
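The correlation-based feature selection (CFS) algorithm uses the following formula:

$$\mathrm{Merit}_S(k) = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}$$

where $\mathrm{Merit}_S(k)$ is the evaluation of a feature subset S containing k features (the larger its value, the better the selected feature subset S); $\overline{r_{cf}}$ is the mean of the correlation coefficients between each feature and the class c; and $\overline{r_{ff}}$ is the mean of the correlation coefficients between pairs of features.
Substituting $\overline{r_{cf}}$ and $\overline{r_{ff}}$ into the formula converts it further to:

$$\mathrm{Merit}_S(k) = \frac{r_{cf_1} + r_{cf_2} + \cdots + r_{cf_k}}{\sqrt{k + 2\,(r_{f_1 f_2} + \cdots + r_{f_i f_j} + \cdots + r_{f_{k-1} f_k})}}$$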
By evaluating the contribution of each feature to each class, the CFS algorithm derives the final feature subset.
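The CFS merit can be illustrated with a short Python sketch. Two choices here are assumptions rather than anything the text specifies: correlations are measured with the absolute Pearson coefficient on numeric class labels, and the subset is grown by greedy forward search. The function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0

def merit(subset, columns, labels):
    """CFS merit k*r_cf / sqrt(k + k(k-1)*r_ff) for a list of feature indices."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[i], labels)) for i in subset) / k
    if k == 1:
        return r_cf
    pairs = [(i, j) for a, i in enumerate(subset) for j in subset[a + 1:]]
    r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward(columns, labels):
    """Greedy forward search: add features while the merit strictly improves."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        idx = max(remaining, key=lambda i: merit(selected + [i], columns, labels))
        score = merit(selected + [idx], columns, labels)
        if score <= best:
            break
        selected.append(idx)
        remaining.remove(idx)
        best = score
    return selected, best

labels = [0, 0, 1, 1]
columns = [
    [0, 0, 1, 1],  # perfectly correlated with the class
    [0, 0, 1, 1],  # redundant copy of the first feature
    [0, 1, 0, 1],  # uncorrelated with the class
]
print(cfs_forward(columns, labels))  # ([0], 1.0)
```

The redundant copy is rejected because it raises the feature-to-feature correlation without raising the merit, which is the behaviour the formula is designed to reward.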
The beneficial effects of the invention are: (1) through the extraction of frequent bytes, the splicing of frequent long strings, the conversion of frame data into vectors, and classification after feature selection, features that identify data frames can be obtained; after feature selection the number of features is greatly reduced while the classification accuracy of protocol frames hardly declines; (2) extracting protocol features helps decision makers identify unknown protocols efficiently.
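The conversion of frame data into vectors (step S4) can be sketched as below. The text does not say whether the vector holds presence flags or counts, so the binary presence encoding and the function name are assumptions.

```python
def frame_to_vector(frame, features):
    """Binary vector: 1 where the feature string occurs in the frame, else 0."""
    return [1 if feature in frame else 0 for feature in features]

# Sorting fixes the order of the vector components.
features = sorted([b"\xaa\x55", b"\x01\x02"])
print(frame_to_vector(b"\xaa\x55\x10", features))  # [0, 1]
```

Every frame of every protocol is encoded against the same ordered feature list, so the resulting vectors are directly comparable.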
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Detailed description of the embodiments
The technical solution of the present invention is described in further detail below with reference to the accompanying drawing, but the scope of protection of the present invention is not limited to the following.
As shown in Fig. 1, a method for extracting unknown protocol features comprises the following steps:
S1. Randomly divide the data frames of each protocol in the data set into two parts, segment each part into bytes, and count the number of occurrences and the frequency of each byte in each part to obtain frequent bytes;
S2. Screen the frequent bytes using the Jaccard parameter to select the frequent bytes corresponding to each protocol;
S3. Splice the consecutively occurring frequent bytes of one protocol to obtain long feature strings, i.e. frequent strings; select the long feature strings whose byte occurrence count is greater than 50% of the total number of bytes in the frame, obtaining two feature candidate sets for this protocol; take their intersection as the feature candidate set of this protocol; apply this process to the frequent bytes of each protocol to obtain the feature candidate set of each protocol;
S4. Using the feature candidate set of each protocol, represent the data frames of that protocol as vectors, so that each frame becomes a vector over the feature candidate set;
S5. Apply the correlation-based feature selection (CFS) algorithm to the feature candidate set of each protocol, and record the selected features;
S6. Classify using the KNN algorithm, and compute the classification accuracy and recognition rate as evaluation indices of the feature selection result.
Step S2 comprises the following sub-steps:
S21. Compute different Jaccard values by varying the threshold of one protocol;
S22. When the Jaccard value reaches its first peak, record the corresponding threshold of this protocol;
S23. Select the frequent bytes of this protocol according to its recorded threshold;
S24. Repeat the above operations for each protocol to obtain the frequent bytes corresponding to each protocol.
Step S3 comprises the following sub-steps:
S31. For each frame of one protocol, if the screened frequent bytes occur consecutively, splice them together and pick them out as a long feature string;
S32. Select the long feature strings whose byte occurrence count is greater than 50% of the total number of bytes in the frame, obtaining two feature candidate sets for this protocol;
S33. Take the intersection of the two feature candidate sets as the feature candidate set of this protocol;
S34. Apply the above process to the frequent bytes of each protocol to obtain the feature candidate set of each protocol.
The Jaccard parameter is defined as:
In the formula, $T1_i$ and $T2_i$ denote the i-th feature in A and B, respectively.
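The correlation-based feature selection (CFS) algorithm uses the following formula:

$$\mathrm{Merit}_S(k) = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}$$

where $\mathrm{Merit}_S(k)$ is the evaluation of a feature subset S containing k features (the larger its value, the better the selected feature subset S); $\overline{r_{cf}}$ is the mean of the correlation coefficients between each feature and the class c; and $\overline{r_{ff}}$ is the mean of the correlation coefficients between pairs of features.
Substituting $\overline{r_{cf}}$ and $\overline{r_{ff}}$ into the formula converts it further to:

$$\mathrm{Merit}_S(k) = \frac{r_{cf_1} + r_{cf_2} + \cdots + r_{cf_k}}{\sqrt{k + 2\,(r_{f_1 f_2} + \cdots + r_{f_i f_j} + \cdots + r_{f_{k-1} f_k})}}$$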
By evaluating the contribution of each feature to each class, the CFS algorithm derives the final feature subset.
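The KNN evaluation in step S6 can be sketched in plain Python. The value k = 3, the squared Euclidean distance, and the function names are assumptions; the sketch reports only the classification accuracy, not the recognition rate.

```python
from collections import Counter

def knn_predict(train_x, train_y, x, k=3):
    """Label of x by majority vote among its k nearest training vectors."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(v, x)), i)  # squared Euclidean distance
        for i, v in enumerate(train_x)
    )
    votes = Counter(train_y[i] for _, i in dists[:k])
    return votes.most_common(1)[0][0]

def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Share of test vectors whose predicted label matches the true label."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_y)

# Toy feature vectors for two protocols, labelled "A" and "B".
train_x = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
train_y = ["A", "A", "A", "B", "B", "B"]
test_x, test_y = [[1, 0], [0, 1]], ["A", "B"]
print(accuracy(train_x, train_y, test_x, test_y))  # 1.0
```

Running the same evaluation on the full candidate set and on the CFS-selected subset would give the comparison the method uses to judge the feature selection result.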
Claims (4)
1. A method for extracting unknown protocol features, characterized in that it comprises the following steps:
S1. Randomly divide the data frames of each protocol in the data set into two parts, segment each part into bytes, and count the number of occurrences and the frequency of each byte in each part to obtain frequent bytes;
S2. Screen the frequent bytes using the Jaccard parameter to select the frequent bytes corresponding to each protocol;
S3. Splice the consecutively occurring frequent bytes of one protocol to obtain long feature strings, i.e. frequent strings; select the long feature strings whose byte occurrence count is greater than 50% of the total number of bytes in the frame, obtaining two feature candidate sets for this protocol; take their intersection as the feature candidate set of this protocol; apply this process to the frequent bytes of each protocol to obtain the feature candidate set of each protocol;
S4. Using the feature candidate set of each protocol, represent the data frames of that protocol as vectors, so that each frame becomes a vector over the feature candidate set;
S5. Apply the correlation-based feature selection (CFS) algorithm to the feature candidate set of each protocol, and record the selected features;
S6. Classify using the KNN algorithm.
2. The method for extracting unknown protocol features according to claim 1, characterized in that step S2 comprises the following sub-steps:
S21. Compute different Jaccard values by varying the threshold of one protocol;
S22. When the Jaccard value reaches its first peak, record the corresponding threshold of this protocol;
S23. Select the frequent bytes of this protocol according to its recorded threshold;
S24. Repeat the above operations for each protocol to obtain the frequent bytes corresponding to each protocol.
3. The method for extracting unknown protocol features according to claim 1, characterized in that step S3 comprises the following sub-steps:
S31. For each frame of one protocol, if the screened frequent bytes occur consecutively, splice them together and pick them out as a long feature string;
S32. Select the long feature strings whose byte occurrence count is greater than 50% of the total number of bytes in the frame, obtaining two feature candidate sets for this protocol;
S33. Take the intersection of the two feature candidate sets as the feature candidate set of this protocol;
S34. Apply the above process to the frequent bytes of each protocol to obtain the feature candidate set of each protocol.
4. The method for extracting unknown protocol features according to claim 2, characterized in that the Jaccard parameter is defined as:
In the formula, $T1_i$ and $T2_i$ denote the i-th feature in A and B, respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510127979.3A CN104796407B (en) | 2015-03-23 | 2015-03-23 | Method for extracting unknown protocol features |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104796407A | 2015-07-22 |
CN104796407B | 2018-03-30 |
Family
ID=53560919
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510127979.3A Active CN104796407B (en) | Method for extracting unknown protocol features | 2015-03-23 | 2015-03-23 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104796407B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100138443A1 (en) * | 2008-11-17 | 2010-06-03 | Ramakrishnan Kadangode K | User-Powered Recommendation System |
CN103414722A (en) * | 2013-08-19 | 2013-11-27 | 中国科学院空间科学与应用研究中心 | Space link protocol blind identification method and system |
CN103955539A (en) * | 2014-05-19 | 2014-07-30 | 中国人民解放军信息工程大学 | Method and device for obtaining control field demarcation point in binary protocol data |
CN104159232A (en) * | 2014-09-01 | 2014-11-19 | 电子科技大学 | Method of recognizing protocol format of binary message data |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105827603A (en) * | 2016-03-14 | 2016-08-03 | 中国人民解放军信息工程大学 | Inexplicit protocol feature library establishment method and device and inexplicit message classification method and device |
CN108632252A (en) * | 2018-04-03 | 2018-10-09 | 中国人民解放军战略支援部队信息工程大学 | A kind of private network agreement iteration conversed analysis method, apparatus and server |
CN108632252B (en) * | 2018-04-03 | 2021-02-02 | 中国人民解放军战略支援部队信息工程大学 | Private network protocol iteration reverse analysis method, device and server |
CN110061976A (en) * | 2019-03-29 | 2019-07-26 | 中国空间技术研究院 | A kind of unknown protocol frame sequence extracting method and system based on data mining |
CN110457465A (en) * | 2019-06-21 | 2019-11-15 | 武汉大学 | A kind of classification method for known bits stream protocol |
CN111274235A (en) * | 2020-01-16 | 2020-06-12 | 电子科技大学 | Unknown protocol data cleaning and protocol field feature extraction method |
CN111274235B (en) * | 2020-01-16 | 2022-11-04 | 电子科技大学 | Unknown protocol data cleaning and protocol field feature extraction method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by SIPO to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||