CN104796407B - Method for extracting features of unknown protocols - Google Patents
- Publication number: CN104796407B
- Application number: CN201510127979.3A
- Authority: CN (China)
- Prior art keywords: feature, protocol, byte, frequent, candidate set
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/18—Protocol analysers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/02—Protocol performance
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The invention discloses a method for extracting features of unknown protocols. The data frames of each protocol are randomly divided into two parts; each part is split byte by byte, and the occurrence count and frequency of every byte are computed to obtain frequent bytes. The frequent bytes are screened to obtain the frequent bytes corresponding to each protocol. Frequent bytes that occur consecutively for a protocol are spliced into long feature strings (frequent strings), which are screened to obtain the feature candidate set of each protocol. The data frames of a protocol are represented as vectors according to the feature candidate set. Features are selected from the candidate set with the correlation-based feature selection (CFS) algorithm, and the selected features are recorded. Classification is performed with the KNN algorithm, and the classification accuracy and recognition rate are computed. The method helps decision makers identify unknown protocols efficiently.
Description
Technical field
The present invention relates to a method for extracting features of unknown protocols.
Background art
As networks develop and grow more complex, safeguarding information networks has become a core element of national information strategy. In certain network environments, the threat of secrets being stolen by special means is increasingly severe. Such theft typically transmits classified information over wireless communication, and the protocol used is an unconventional, special-purpose unknown protocol. Existing countermeasures target only known protocols and mostly rely on port mapping or static feature matching, so they are essentially unable to monitor or detect this kind of covert channel.
To keep networks operating safely and to give early warning of attacks and dangerous behavior, decision makers urgently need to find the features of the protocol to be identified accurately in today's complex network environments. A feasible protocol feature extraction method is therefore needed to help decision makers identify unknown protocols efficiently.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art and to provide a method for extracting features of unknown protocols that helps decision makers identify unknown protocols efficiently.
The object of the invention is achieved through the following technical solution: a method for extracting features of unknown protocols, comprising the following steps:
S1. The data frames of each protocol in the data set are randomly divided into two parts; each part is split byte by byte, and the occurrence count and frequency of every byte are computed separately to obtain frequent bytes;
S2. The frequent bytes are screened using the Jaccard parameter to select the frequent bytes corresponding to each protocol;
S3. Frequent bytes that occur consecutively for a protocol are spliced to obtain long feature strings, i.e. frequent strings; the bytes and long feature strings whose occurrence count exceeds 50% of the total frame byte count are selected, giving two feature candidate sets for the protocol, and their intersection is taken as the feature candidate set of the protocol; this processing is applied to the frequent bytes of every protocol to obtain the feature candidate set of each protocol;
S4. According to the feature candidate set obtained for each protocol, the data frames of the protocol are represented as vectors, so that each frame becomes a vector over the feature candidate set;
S5. Feature selection is performed on the resulting feature candidate set of each protocol using the correlation-based feature selection (CFS) algorithm, and the selected features are recorded;
S6. Classification is performed with the KNN algorithm, and the classification accuracy and recognition rate are computed as evaluation indices for the feature selection result.
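Step S1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the fixed random seed, and the reading of "frequency" as a byte value's share of all bytes in a part are hypothetical, and the source does not say how the initial frequency threshold is chosen (step S2 tunes it).

```python
import random
from collections import Counter

def split_frames(frames, seed=0):
    """Randomly split one protocol's frames into two roughly equal parts."""
    shuffled = list(frames)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def byte_frequencies(frames):
    """Return {byte value: (count, relative frequency)} over all frames."""
    counts = Counter()
    for frame in frames:
        counts.update(frame)  # iterating a bytes object yields ints 0..255
    total = sum(counts.values())
    if total == 0:
        return {}
    return {b: (n, n / total) for b, n in counts.items()}

def frequent_bytes(frames, threshold):
    """Byte values whose relative frequency is at least `threshold` (assumed rule)."""
    stats = byte_frequencies(frames)
    return {b for b, (_, freq) in stats.items() if freq >= threshold}
```

In this sketch each of the two parts is processed separately with `byte_frequencies`, so that later steps can compare the results obtained from the two halves.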
Step S2 comprises the following sub-steps:
S21. Different Jaccard values are computed by varying the threshold of a protocol;
S22. When the Jaccard value reaches a peak for the first time, the corresponding threshold of the protocol is recorded;
S23. The frequent bytes of the protocol are selected according to that threshold;
S24. The above operations are carried out for each protocol to obtain the frequent bytes corresponding to each protocol.
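The threshold sweep in S21 to S23 might look like the sketch below. Several points are assumptions rather than statements from the source: the Jaccard value is taken to be the ordinary set similarity |A ∩ B| / |A ∪ B| between the frequent-byte sets of the two random parts, "first peak" is read as the first local maximum while the threshold increases, and the function names and inputs (`freq1`, `freq2` as byte-to-frequency maps) are hypothetical.

```python
def jaccard(a, b):
    """Set Jaccard similarity |A & B| / |A | B| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def first_peak_threshold(freq1, freq2, thresholds):
    """Sweep thresholds in order and return (threshold, jaccard) at the first peak.

    freq1 and freq2 map byte values to relative frequencies in the two parts.
    Returns None when `thresholds` is empty.
    """
    best = None
    for t in thresholds:
        set1 = {b for b, f in freq1.items() if f >= t}
        set2 = {b for b, f in freq2.items() if f >= t}
        score = jaccard(set1, set2)
        if best is not None and score < best[1]:
            return best  # the score started to fall, so the previous best was the first peak
        if best is None or score > best[1]:
            best = (t, score)
    return best  # the score never fell; report the highest value reached
```

The frequent bytes of the protocol would then be the byte values that pass the returned threshold, and the same sweep would be repeated per protocol as in S24.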
Step S3 comprises the following sub-steps:
S31. For each frame of a protocol, if the screened frequent bytes occur consecutively, they are spliced together and picked out as a long feature string;
S32. The bytes and long feature strings whose occurrence count exceeds 50% of the total frame byte count are selected, giving two feature candidate sets for the protocol;
S33. The intersection of the two feature candidate sets is taken as the feature candidate set of the protocol;
S34. The above processing is applied to the frequent bytes of every protocol to obtain the feature candidate set of each protocol.
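Sub-steps S31 to S33 can be illustrated with the following sketch. It rests on assumptions that the source does not spell out: the two candidate sets are taken to come from the two random parts of step S1, a string is counted once per frame in which it appears, and the 50% rule is applied against the number of frames (the source phrases the denominator as the total frame byte count, which may call for a different count). All function names are hypothetical.

```python
from collections import Counter

def frequent_runs(frame, frequent):
    """Splice maximal runs of consecutive frequent bytes found in one frame."""
    runs, current = [], bytearray()
    for b in frame:
        if b in frequent:
            current.append(b)
        elif current:
            runs.append(bytes(current))
            current = bytearray()
    if current:
        runs.append(bytes(current))
    return runs

def candidate_set(frames, frequent, ratio=0.5):
    """Runs seen in more than `ratio` of the frames (assumed denominator)."""
    seen_in = Counter()
    for frame in frames:
        # a set makes each run count once per frame
        seen_in.update(set(frequent_runs(frame, frequent)))
    return {run for run, n in seen_in.items() if n > ratio * len(frames)}

def protocol_candidates(part1, part2, frequent, ratio=0.5):
    """Intersect the candidate sets obtained from the two parts."""
    return candidate_set(part1, frequent, ratio) & candidate_set(part2, frequent, ratio)
```

Runs of length one are kept, so single frequent bytes and longer spliced strings both end up in the candidate set.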
The Jaccard parameter is defined over two sets A and B; in its defining formula, $T_{1i}$ and $T_{2i}$ denote the i-th feature in A and B, respectively.
The correlation-based feature selection (CFS) algorithm uses the following formula:

$$\mathrm{Merit}_S(k) = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}$$

where $\mathrm{Merit}_S(k)$ is the evaluation of a feature subset $S$ containing $k$ features; the larger its value, the better the selected feature subset $S$;
$\overline{r_{cf}}$ is the average of the correlation coefficients between each feature and the class $c$;
$\overline{r_{ff}}$ is the average of the correlation coefficients between features.
Substituting $\overline{r_{cf}}$ and $\overline{r_{ff}}$ into the formula, it can be further rewritten as:

$$\mathrm{Merit}_S(k) = \frac{\sum_{i=1}^{k} r_{cf_i}}{\sqrt{k + 2\sum_{1 \le i < j \le k} r_{f_i f_j}}}$$

The CFS algorithm evaluates the effect of each feature on classifying each class, and thereby derives the final feature subset.
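The CFS merit can be computed as in the sketch below, which uses the standard form Merit = k·r̄_cf / sqrt(k + k(k−1)·r̄_ff). The source does not state which correlation measure or search strategy is used, so the Pearson coefficient, the use of absolute values, and the greedy forward search are assumptions, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx and vy else 0.0

def cfs_merit(columns, labels):
    """Merit of a subset given its feature columns (at least one) and numeric labels."""
    k = len(columns)
    r_cf = sum(abs(pearson(c, labels)) for c in columns) / k
    pairs = list(combinations(columns, 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)

def forward_select(features, labels):
    """Greedy forward search: keep adding the feature that raises the merit most."""
    selected, best = [], 0.0
    remaining = set(features)
    while remaining:
        scored = [(cfs_merit([features[s] for s in selected + [n]], labels), n)
                  for n in sorted(remaining)]
        score, name = max(scored)
        if score <= best:
            break  # no remaining feature improves the merit
        selected.append(name)
        best = score
        remaining.remove(name)
    return selected, best
```

Here `features` maps a feature name to its column of values across frames, for example the 0/1 presence of a candidate string in each frame, and `labels` holds a numeric class code per frame.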
The beneficial effects of the invention are: (1) through the extraction of frequent bytes, the splicing of frequent long strings, the conversion of frame data into vectors, and classification after feature selection, features that characterize the data frames can be obtained; after feature selection the number of features is greatly reduced, while the classification accuracy of protocol frames does not decline much; (2) by extracting protocol features, the method helps decision makers identify unknown protocols efficiently.
Brief description of the drawings
Fig. 1 is the flow chart of the present invention.
Detailed description
The technical solution of the invention is described in further detail below with reference to the accompanying drawing, but the scope of protection of the invention is not limited to the following.
As shown in Fig. 1, a method for extracting features of unknown protocols comprises the following steps:
S1. The data frames of each protocol in the data set are randomly divided into two parts; each part is split byte by byte, and the occurrence count and frequency of every byte are computed separately to obtain frequent bytes;
S2. The frequent bytes are screened using the Jaccard parameter to select the frequent bytes corresponding to each protocol;
S3. Frequent bytes that occur consecutively for a protocol are spliced to obtain long feature strings, i.e. frequent strings; the bytes and long feature strings whose occurrence count exceeds 50% of the total frame byte count are selected, giving two feature candidate sets for the protocol, and their intersection is taken as the feature candidate set of the protocol; this processing is applied to the frequent bytes of every protocol to obtain the feature candidate set of each protocol;
S4. According to the feature candidate set obtained for each protocol, the data frames of the protocol are represented as vectors, so that each frame becomes a vector over the feature candidate set;
S5. Feature selection is performed on the resulting feature candidate set of each protocol using the correlation-based feature selection (CFS) algorithm, and the selected features are recorded;
S6. Classification is performed with the KNN algorithm, and the classification accuracy and recognition rate are computed as evaluation indices for the feature selection result.
Step S2 comprises the following sub-steps:
S21. Different Jaccard values are computed by varying the threshold of a protocol;
S22. When the Jaccard value reaches a peak for the first time, the corresponding threshold of the protocol is recorded;
S23. The frequent bytes of the protocol are selected according to that threshold;
S24. The above operations are carried out for each protocol to obtain the frequent bytes corresponding to each protocol.
Step S3 comprises the following sub-steps:
S31. For each frame of a protocol, if the screened frequent bytes occur consecutively, they are spliced together and picked out as a long feature string;
S32. The bytes and long feature strings whose occurrence count exceeds 50% of the total frame byte count are selected, giving two feature candidate sets for the protocol;
S33. The intersection of the two feature candidate sets is taken as the feature candidate set of the protocol;
S34. The above processing is applied to the frequent bytes of every protocol to obtain the feature candidate set of each protocol.
The Jaccard parameter is defined over two sets A and B; in its defining formula, $T_{1i}$ and $T_{2i}$ denote the i-th feature in A and B, respectively.
The correlation-based feature selection (CFS) algorithm uses the following formula:

$$\mathrm{Merit}_S(k) = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}$$

where $\mathrm{Merit}_S(k)$ is the evaluation of a feature subset $S$ containing $k$ features; the larger its value, the better the selected feature subset $S$;
$\overline{r_{cf}}$ is the average of the correlation coefficients between each feature and the class $c$;
$\overline{r_{ff}}$ is the average of the correlation coefficients between features.
Substituting $\overline{r_{cf}}$ and $\overline{r_{ff}}$ into the formula, it can be further rewritten as:

$$\mathrm{Merit}_S(k) = \frac{\sum_{i=1}^{k} r_{cf_i}}{\sqrt{k + 2\sum_{1 \le i < j \le k} r_{f_i f_j}}}$$

The CFS algorithm evaluates the effect of each feature on classifying each class, and thereby derives the final feature subset.
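Steps S4 and S6 can be illustrated with the small sketch below. It is only one possible reading: the source says each frame becomes a vector over the feature candidate set but does not give the encoding, so a binary presence vector is assumed; the distance measure, the value of k, and all function names are likewise hypothetical, and only accuracy is computed here.

```python
from collections import Counter

def to_vector(frame, features):
    """Binary vector: 1 where the feature string occurs in the frame, else 0."""
    return [1 if f in frame else 0 for f in features]

def knn_predict(train_vecs, train_labels, vec, k=3):
    """Majority label among the k nearest training vectors (squared Euclidean distance)."""
    order = sorted(
        range(len(train_vecs)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_vecs[i], vec)),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def accuracy(train, test, features, k=3):
    """Fraction of test frames classified correctly; train/test hold (frame, label) pairs."""
    vecs = [to_vector(frame, features) for frame, _ in train]
    labels = [label for _, label in train]
    hits = sum(
        knn_predict(vecs, labels, to_vector(frame, features), k) == label
        for frame, label in test
    )
    return hits / len(test)
```

Running `accuracy` once with the full candidate set and once with the CFS-selected subset gives a simple way to compare feature counts against classification accuracy, in the spirit of the evaluation described in S6.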
Claims (2)
- 1. A method for extracting features of unknown protocols, characterized in that it comprises the following steps:
  S1. The data frames of each protocol in the data set are randomly divided into two parts; each part is split byte by byte, and the occurrence count and frequency of every byte are computed separately to obtain frequent bytes;
  S2. The frequent bytes are screened using the Jaccard parameter to select the frequent bytes corresponding to each protocol;
  S3. Frequent bytes that occur consecutively for a protocol are spliced to obtain long feature strings, i.e. frequent strings; the bytes and long feature strings whose occurrence count exceeds 50% of the total frame byte count are selected, giving two feature candidate sets for the protocol, and their intersection is taken as the feature candidate set of the protocol; this processing is applied to the frequent bytes of every protocol to obtain the feature candidate set of each protocol;
  S4. According to the feature candidate set obtained for each protocol, the data frames of the protocol are represented as vectors, so that each frame becomes a vector over the feature candidate set;
  S5. Feature selection is performed on the resulting feature candidate set of each protocol using the correlation-based feature selection (CFS) algorithm, and the selected features are recorded; the CFS algorithm uses the following formula:

  $$\mathrm{Merit}_S(k) = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}$$

  where $\mathrm{Merit}_S(k)$ is the evaluation of a feature subset $S$ containing $k$ features, and the larger its value, the better the selected feature subset $S$; $\overline{r_{cf}}$ is the average of the correlation coefficients between each feature and the class $c$; $\overline{r_{ff}}$ is the average of the correlation coefficients between features; substituting $\overline{r_{cf}}$ and $\overline{r_{ff}}$ into the formula, it can be further rewritten as:

  $$\mathrm{Merit}_S(k) = \frac{\sum_{i=1}^{k} r_{cf_i}}{\sqrt{k + 2\sum_{1 \le i < j \le k} r_{f_i f_j}}}$$

  the CFS algorithm evaluates the effect of each feature on classifying each class, and thereby derives the final feature subset;
  S6. Classification is performed with the KNN algorithm;
  step S2 comprises the following sub-steps:
  S21. Different Jaccard values are computed by varying the threshold of a protocol;
  S22. When the Jaccard value reaches a peak for the first time, the corresponding threshold of the protocol is recorded;
  S23. The frequent bytes of the protocol are selected according to that threshold;
  S24. The above operations are carried out for each protocol to obtain the frequent bytes corresponding to each protocol;
  step S3 comprises the following sub-steps:
  S31. For each frame of a protocol, if the screened frequent bytes occur consecutively, they are spliced together and picked out as a long feature string;
  S32. The bytes and long feature strings whose occurrence count exceeds 50% of the total frame byte count are selected, giving two feature candidate sets for the protocol;
  S33. The intersection of the two feature candidate sets is taken as the feature candidate set of the protocol;
  S34. The above processing is applied to the frequent bytes of every protocol to obtain the feature candidate set of each protocol.
- 2. The method for extracting features of unknown protocols according to claim 1, characterized in that the Jaccard parameter is defined by a formula in which $T_{1i}$ and $T_{2i}$ denote the i-th feature in A and B, respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510127979.3A | 2015-03-23 | 2015-03-23 | Method for extracting features of unknown protocols |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510127979.3A | 2015-03-23 | 2015-03-23 | Method for extracting features of unknown protocols |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104796407A | 2015-07-22 |
CN104796407B | 2018-03-30 |
Family
ID=53560919
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510127979.3A Active CN104796407B (en) | 2015-03-23 | 2015-03-23 | A kind of extracting method of unknown protocol feature |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104796407B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105827603A (en) * | 2016-03-14 | 2016-08-03 | 中国人民解放军信息工程大学 | Inexplicit protocol feature library establishment method and device and inexplicit message classification method and device |
CN108632252B (en) * | 2018-04-03 | 2021-02-02 | 中国人民解放军战略支援部队信息工程大学 | Private network protocol iteration reverse analysis method, device and server |
CN110061976B (en) * | 2019-03-29 | 2021-06-11 | 中国空间技术研究院 | Unknown protocol frame sequence extraction method and system based on data mining |
CN110457465B (en) * | 2019-06-21 | 2022-04-26 | 武汉大学 | Classification method for unknown bit stream protocol |
CN111274235B (en) * | 2020-01-16 | 2022-11-04 | 电子科技大学 | Unknown protocol data cleaning and protocol field feature extraction method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103414722A (en) * | 2013-08-19 | 2013-11-27 | 中国科学院空间科学与应用研究中心 | Space link protocol blind identification method and system |
CN103955539A (en) * | 2014-05-19 | 2014-07-30 | 中国人民解放军信息工程大学 | Method and device for obtaining control field demarcation point in binary protocol data |
CN104159232A (en) * | 2014-09-01 | 2014-11-19 | 电子科技大学 | Method of recognizing protocol format of binary message data |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8943081B2 (en) * | 2008-11-17 | 2015-01-27 | At&T Intellectual Property I, L.P. | User-powered recommendation system |
Also Published As
Publication number | Publication date |
---|---|
CN104796407A (en) | 2015-07-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by SIPO to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||