CN104796407B - Method for extracting features of unknown protocols - Google Patents

Method for extracting features of unknown protocols

Info

Publication number
CN104796407B
Authority
CN
China
Prior art keywords
feature
protocol
byte
frequent
candidate set
Prior art date
Legal status
Active
Application number
CN201510127979.3A
Other languages
Chinese (zh)
Other versions
CN104796407A (en)
Inventor
张凤荔
周洪川
张春瑞
王勇
张俊娇
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
2015-03-23
Filing date
2015-03-23
Publication date
2018-03-30
Application filed by University of Electronic Science and Technology of China
Priority to CN201510127979.3A
Publication of CN104796407A
Application granted
Publication of CN104796407B
Status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/18 Protocol analysers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/02 Protocol performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a method for extracting features of unknown protocols. The data frames of each protocol are randomly divided into two parts, each part is split byte by byte, and the number of occurrences and the frequency of each byte are counted to obtain frequent bytes. The frequent bytes are screened to obtain the frequent bytes corresponding to each protocol. Consecutively occurring frequent bytes of a protocol are concatenated into long feature strings, i.e. frequent strings, which are screened to obtain the feature candidate set of each protocol. The data frames of the protocol are represented as vectors according to the feature candidate set. Feature selection is performed on the feature candidate set with the correlation-based feature selection (CFS) algorithm, and the selected features are recorded. Classification is performed with the KNN algorithm, and the classification accuracy and recognition rate are computed. The invention thus provides a method for extracting features of unknown protocols that helps decision-makers identify unknown protocols efficiently.

Description

Method for extracting features of unknown protocols
Technical field
The present invention relates to a method for extracting features of unknown protocols.
Background
As networks develop and grow more complex, ensuring the security of information networks has become a core element of national information strategy. In certain network environments, the threat of information theft through special means is increasingly severe. Such theft typically sends classified information over wireless communication using an unconventional, special-purpose unknown protocol. Existing countermeasures target only known protocols, mostly relying on port mapping or static signature matching, and are therefore essentially unable to monitor or detect this type of covert exfiltration channel.
To ensure the safe operation of networks and to give early warning of attacks and dangerous behaviour, decision-makers urgently need to find, accurately, the features of the protocols to be identified in today's complex network environments. A feasible protocol feature extraction method is therefore needed to help decision-makers identify unknown protocols efficiently.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and provide a method for extracting features of unknown protocols that helps decision-makers identify unknown protocols efficiently.
The object of the present invention is achieved through the following technical solution: a method for extracting features of unknown protocols, comprising the following steps:
S1. Randomly divide the data frames of each protocol in the data set into two parts, split each part byte by byte, and count the number of occurrences and the frequency of each byte in each part to obtain the frequent bytes;
S2. Screen the frequent bytes using the Jaccard parameter and select the frequent bytes corresponding to each protocol;
S3. Concatenate the consecutively occurring frequent bytes of a protocol to obtain long feature strings, i.e. frequent strings; select the bytes and long feature strings whose number of occurrences exceeds 50% of the total number of frame bytes, obtaining two feature candidate sets for the protocol, and take their intersection as the feature candidate set of the protocol; apply this processing to the frequent bytes of every protocol to obtain the feature candidate set of each protocol;
S4. Represent the data frames of each protocol as vectors according to the protocol's feature candidate set, so that each frame becomes a vector over the feature candidate set;
S5. Perform feature selection on the feature candidate set of each protocol using the correlation-based feature selection (CFS) algorithm, and record the selected features;
S6. Classify using the KNN algorithm and compute the classification accuracy and recognition rate, which serve as evaluation indices of the feature selection result.
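Step S1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: frames are Python `bytes` objects, a "frequent byte" is taken to be a byte value (not a byte at a fixed offset), and its frequency is its share of all bytes in that half. The function names `split_frames`, `byte_frequencies`, and `frequent_bytes` and the `threshold` parameter are hypothetical.

```python
import random
from collections import Counter


def split_frames(frames, seed=0):
    """Randomly split one protocol's frames into two halves (step S1)."""
    rng = random.Random(seed)
    shuffled = list(frames)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def byte_frequencies(frames):
    """Map each byte value to (occurrence count, share of all bytes in `frames`)."""
    counts = Counter()
    for frame in frames:
        counts.update(frame)  # iterating a bytes object yields ints 0..255
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: (n, n / total) for value, n in counts.items()}


def frequent_bytes(freqs, threshold):
    """Keep byte values whose relative frequency is at least `threshold`."""
    return {value for value, (_, share) in freqs.items() if share >= threshold}
```

The threshold passed to `frequent_bytes` is the quantity that step S2 tunes per protocol; splitting with a seeded `random.Random` keeps the split reproducible.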
Step S2 comprises the following sub-steps:
S21. Compute Jaccard values for a protocol while varying its threshold;
S22. Record the threshold of the protocol at which the Jaccard value reaches its first peak;
S23. Select the frequent bytes of the protocol according to this threshold;
S24. Perform the above operations for each protocol to obtain the frequent bytes corresponding to each protocol.
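Sub-steps S21 and S22 amount to a threshold sweep. The sketch below assumes that A and B are the frequent-byte sets of the two random halves from step S1 at a given threshold, uses the plain set form of the Jaccard index, and treats "the first peak" as the first local maximum of the score sequence. These choices, the input format (a mapping from byte value to relative frequency), and the names `jaccard` and `first_peak_threshold` are assumptions for illustration.

```python
def jaccard(a, b):
    """Set Jaccard index: size of intersection / size of union (1.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def first_peak_threshold(freqs_a, freqs_b, thresholds):
    """Sweep candidate thresholds (S21) and return the first local maximum (S22).

    freqs_a and freqs_b map byte value -> relative frequency in each random half.
    Returns (chosen threshold, list of Jaccard scores, one per threshold).
    """
    if not thresholds:
        raise ValueError("need at least one threshold")
    scores = []
    for t in thresholds:
        set_a = {v for v, share in freqs_a.items() if share >= t}
        set_b = {v for v, share in freqs_b.items() if share >= t}
        scores.append(jaccard(set_a, set_b))
    for i, s in enumerate(scores):
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i + 1 < len(scores) else float("-inf")
        if s > left and s >= right:  # first point that rises and does not fall next
            return thresholds[i], scores
    # Unreachable for a non-empty list: the first global maximum always qualifies.
    return thresholds[-1], scores
```

A high Jaccard value means both random halves agree on which bytes are frequent, so the chosen threshold favours bytes that are stable across the protocol's traffic rather than artefacts of one sample.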
Step S3 comprises the following sub-steps:
S31. For each frame of a protocol, if screened frequent bytes occur consecutively, concatenate them and extract them as a long feature string;
S32. Select the bytes and long feature strings whose number of occurrences exceeds 50% of the total number of frame bytes, obtaining two feature candidate sets for the protocol;
S33. Take the intersection of the two feature candidate sets as the feature candidate set of the protocol;
S34. Apply the above processing to the frequent bytes of each protocol to obtain the feature candidate set of each protocol.
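Sub-steps S31 to S33 can be sketched as below, assuming the two feature candidate sets of S32 come from the two random halves of step S1. The text states the 50% rule relative to the total number of frame bytes; because that denominator is ambiguous, the sketch instead (as an assumption) keeps an item when it occurs in more than half of the frames of a half. Runs shorter than two bytes are not treated as long strings. The names `long_strings`, `candidate_set`, `protocol_candidates`, and `min_support` are hypothetical.

```python
from collections import Counter


def long_strings(frame, frequent):
    """S31: maximal runs of two or more consecutive frequent bytes in one frame."""
    runs, current = [], bytearray()
    for value in frame:
        if value in frequent:
            current.append(value)
            continue
        if len(current) >= 2:
            runs.append(bytes(current))
        current = bytearray()
    if len(current) >= 2:  # a run that reaches the end of the frame
        runs.append(bytes(current))
    return runs


def candidate_set(frames, frequent, min_support=0.5):
    """S32: frequent single bytes and long strings present in more than
    `min_support` of the frames (assumed reading of the 50% rule)."""
    counts = Counter()
    for frame in frames:
        items = {bytes([value]) for value in frame if value in frequent}
        items.update(long_strings(frame, frequent))
        counts.update(items)  # each item counted at most once per frame
    return {item for item, c in counts.items() if c > min_support * len(frames)}


def protocol_candidates(half_a, half_b, frequent):
    """S33: intersect the candidate sets built from the two random halves."""
    return candidate_set(half_a, frequent) & candidate_set(half_b, frequent)
```

Intersecting the two halves keeps only features that survive in both random samples, which filters out strings that are frequent by chance in one half.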
The Jaccard parameter is defined as:
where T1i and T2i denote the i-th feature of A and B, respectively.
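The formula for the Jaccard parameter did not survive extraction. Since T1i and T2i are described as the i-th features of A and B, one common vector form it may correspond to (an assumption, not confirmed by the text) is the extended Jaccard or Tanimoto coefficient, J(A, B) = sum(T1i * T2i) / (sum(T1i^2) + sum(T2i^2) - sum(T1i * T2i)), which equals the set ratio |A ∩ B| / |A ∪ B| when the vectors are 0/1 indicators. A sketch, with the hypothetical name `extended_jaccard`:

```python
def extended_jaccard(t1, t2):
    """Extended Jaccard (Tanimoto) coefficient of two equal-length vectors:

        sum(T1i * T2i) / (sum(T1i**2) + sum(T2i**2) - sum(T1i * T2i))

    For 0/1 indicator vectors this equals |A & B| / |A | B| on the underlying sets.
    """
    if len(t1) != len(t2):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(t1, t2))
    denom = sum(a * a for a in t1) + sum(b * b for b in t2) - dot
    # denom is zero only when both vectors are all zeros; treat them as identical
    return dot / denom if denom else 1.0
```

With indicator vectors [1, 1, 0, 1] and [1, 0, 0, 1] (sets {0, 1, 3} and {0, 3}), both forms give 2/3.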
The correlation-based feature selection (CFS) algorithm uses the following formula:
$$\mathrm{Merit}_S(k) = \frac{k\,\bar{r}_{cf}}{\sqrt{k + k(k-1)\,\bar{r}_{ff}}}$$
where $\mathrm{Merit}_S(k)$ denotes the evaluation of a feature subset $S$ containing $k$ features; the larger its value, the better the selected feature subset $S$;
$\bar{r}_{cf}$ is the mean correlation coefficient between each feature and class $c$;
$\bar{r}_{ff}$ is the mean correlation coefficient between pairs of features.
Substituting $\bar{r}_{cf}$ and $\bar{r}_{ff}$ into the formula gives:
$$\mathrm{Merit}_S(k) = \frac{\sum_{i=1}^{k} r_{cf_i}}{\sqrt{k + 2\sum_{1 \le i < j \le k} r_{f_i f_j}}}$$
The CFS algorithm evaluates the contribution of each feature to classification and thereby derives the final feature subset.
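The merit formula can be checked with a small sketch. It assumes Pearson correlation (taken in absolute value) as the correlation coefficient, a numeric class code per frame (for example 0/1 for two protocols), and a greedy forward search that stops when the merit no longer improves. CFS is commonly paired with a best-first search, so this search is a simplification. `pearson`, `merit`, and `cfs_forward` are hypothetical names; `columns` holds one list of values per candidate feature.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def merit(columns, labels, subset):
    """CFS merit: sum_i r_cf_i / sqrt(k + 2 * sum_{i<j} r_fi_fj)."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[i], labels)) for i in subset)
    r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in combinations(subset, 2))
    return r_cf / math.sqrt(k + 2 * r_ff)


def cfs_forward(columns, labels):
    """Greedy forward search: add the feature that most increases the merit."""
    selected, best = [], 0.0
    remaining = set(range(len(columns)))
    while remaining:
        score, idx = max((merit(columns, labels, selected + [i]), i) for i in remaining)
        if score <= best:  # no candidate improves the current subset
            break
        selected.append(idx)
        remaining.remove(idx)
        best = score
    return selected, best
```

The denominator penalises redundancy: adding a feature that duplicates one already selected raises the numerator and the inter-feature term together, so the merit does not improve and the duplicate is rejected.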
The beneficial effects of the invention are: (1) through frequent-byte extraction, concatenation of frequent long strings, conversion of frame data into vectors, and classification after feature selection, features that identify the data frames can be obtained; feature selection greatly reduces the number of features without a significant drop in the classification accuracy of protocol frames; (2) by extracting protocol features, the invention helps decision-makers identify unknown protocols efficiently.
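Steps S4 and S6 (frame-to-vector conversion and KNN classification) can be sketched as below. The sketch assumes each candidate feature is a `bytes` pattern, a frame's vector marks with 1 or 0 whether the pattern occurs anywhere in the frame, and the KNN distance is Hamming distance (on 0/1 vectors this ranks neighbours the same way as Euclidean distance). Only accuracy is computed, since the text does not define the recognition rate. `to_vector`, `knn_predict`, and `accuracy` are hypothetical names.

```python
from collections import Counter


def to_vector(frame, features):
    """S4: 0/1 vector marking which candidate features occur in the frame."""
    return [1 if feat in frame else 0 for feat in features]  # bytes substring test


def knn_predict(train_x, train_y, x, k=3):
    """S6: majority vote among the k nearest training vectors (Hamming distance)."""
    dists = sorted(
        (sum(a != b for a, b in zip(v, x)), label) for v, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test vectors whose predicted label matches the true label."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)
```

Running `accuracy` once on vectors built from all candidate features and once on vectors built from the CFS-selected subset gives the comparison the beneficial effects describe: fewer features, with classification accuracy checked for any drop.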
Brief description of the drawings
Fig. 1 is a flowchart of the method of the present invention.
Embodiment
The technical solution of the present invention is described in further detail below with reference to the accompanying drawing, but the scope of protection of the invention is not limited to the following description.
As shown in Fig. 1, the method for extracting features of unknown protocols comprises steps S1 to S6, including sub-steps S21 to S24 of step S2 and sub-steps S31 to S34 of step S3, together with the Jaccard parameter and the CFS evaluation formula, exactly as set out above.

Claims (2)

  1. A method for extracting features of unknown protocols, characterized in that it comprises the following steps:
    S1. Randomly divide the data frames of each protocol in the data set into two parts, split each part byte by byte, and count the number of occurrences and the frequency of each byte in each part to obtain the frequent bytes;
    S2. Screen the frequent bytes using the Jaccard parameter and select the frequent bytes corresponding to each protocol;
    S3. Concatenate the consecutively occurring frequent bytes of a protocol to obtain long feature strings, i.e. frequent strings; select the bytes and long feature strings whose number of occurrences exceeds 50% of the total number of frame bytes, obtaining two feature candidate sets for the protocol, and take their intersection as the feature candidate set of the protocol; apply this processing to the frequent bytes of every protocol to obtain the feature candidate set of each protocol;
    S4. Represent the data frames of each protocol as vectors according to the protocol's feature candidate set, so that each frame becomes a vector over the feature candidate set;
    S5. Perform feature selection on the feature candidate set of each protocol using the correlation-based feature selection (CFS) algorithm, and record the selected features; the CFS algorithm uses the following formula:
    $$\mathrm{Merit}_S(k) = \frac{k\,\bar{r}_{cf}}{\sqrt{k + k(k-1)\,\bar{r}_{ff}}}$$
    where $\mathrm{Merit}_S(k)$ denotes the evaluation of a feature subset $S$ containing $k$ features; the larger its value, the better the selected feature subset $S$;
    $\bar{r}_{cf}$ is the mean correlation coefficient between each feature and class $c$;
    $\bar{r}_{ff}$ is the mean correlation coefficient between pairs of features;
    substituting $\bar{r}_{cf}$ and $\bar{r}_{ff}$ into the formula gives:
    $$\mathrm{Merit}_S(k) = \frac{\sum_{i=1}^{k} r_{cf_i}}{\sqrt{k + 2\sum_{1 \le i < j \le k} r_{f_i f_j}}}$$
    the CFS algorithm evaluates the contribution of each feature to classification and thereby derives the final feature subset;
    S6. Classify using the KNN algorithm;
    Step S2 comprises the following sub-steps:
    S21. Compute Jaccard values for a protocol while varying its threshold;
    S22. Record the threshold of the protocol at which the Jaccard value reaches its first peak;
    S23. Select the frequent bytes of the protocol according to this threshold;
    S24. Perform the above operations for each protocol to obtain the frequent bytes corresponding to each protocol;
    Step S3 comprises the following sub-steps:
    S31. For each frame of a protocol, if screened frequent bytes occur consecutively, concatenate them and extract them as a long feature string;
    S32. Select the bytes and long feature strings whose number of occurrences exceeds 50% of the total number of frame bytes, obtaining two feature candidate sets for the protocol;
    S33. Take the intersection of the two feature candidate sets as the feature candidate set of the protocol;
    S34. Apply the above processing to the frequent bytes of each protocol to obtain the feature candidate set of each protocol.
  2. The method for extracting features of unknown protocols according to claim 1, characterized in that the Jaccard parameter is defined as:
    where T1i and T2i denote the i-th feature of A and B, respectively.
CN201510127979.3A 2015-03-23 2015-03-23 Method for extracting features of unknown protocols Active CN104796407B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510127979.3A CN104796407B (en) 2015-03-23 2015-03-23 Method for extracting features of unknown protocols

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510127979.3A CN104796407B (en) 2015-03-23 2015-03-23 Method for extracting features of unknown protocols

Publications (2)

Publication Number Publication Date
CN104796407A CN104796407A (en) 2015-07-22
CN104796407B CN104796407B (en) 2018-03-30

Family

ID=53560919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510127979.3A Active CN104796407B (en) 2015-03-23 2015-03-23 Method for extracting features of unknown protocols

Country Status (1)

Country Link
CN (1) CN104796407B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105827603A (en) * 2016-03-14 2016-08-03 中国人民解放军信息工程大学 Inexplicit protocol feature library establishment method and device and inexplicit message classification method and device
CN108632252B (en) * 2018-04-03 2021-02-02 中国人民解放军战略支援部队信息工程大学 Private network protocol iteration reverse analysis method, device and server
CN110061976B (en) * 2019-03-29 2021-06-11 中国空间技术研究院 Unknown protocol frame sequence extraction method and system based on data mining
CN110457465B (en) * 2019-06-21 2022-04-26 武汉大学 Classification method for unknown bit stream protocol
CN111274235B (en) * 2020-01-16 2022-11-04 电子科技大学 Unknown protocol data cleaning and protocol field feature extraction method

Citations (3)

Publication number Priority date Publication date Assignee Title
CN103414722A (en) * 2013-08-19 2013-11-27 中国科学院空间科学与应用研究中心 Space link protocol blind identification method and system
CN103955539A (en) * 2014-05-19 2014-07-30 中国人民解放军信息工程大学 Method and device for obtaining control field demarcation point in binary protocol data
CN104159232A (en) * 2014-09-01 2014-11-19 电子科技大学 Method of recognizing protocol format of binary message data

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8943081B2 (en) * 2008-11-17 2015-01-27 At&T Intellectual Property I, L.P. User-powered recommendation system

Also Published As

Publication number Publication date
CN104796407A (en) 2015-07-22

Similar Documents

Publication Publication Date Title
CN104796407B (en) Method for extracting features of unknown protocols
CN111817982B (en) Encrypted flow identification method for category imbalance
US9407649B2 (en) Log analysis device and method
CN105208037B (en) A kind of DoS/DDoS attack detectings and filter method based on lightweight intrusion detection
CN105577679A (en) Method for detecting anomaly traffic based on feature selection and density peak clustering
CN113645232B (en) Intelligent flow monitoring method, system and storage medium for industrial Internet
CN110611640A (en) DNS protocol hidden channel detection method based on random forest
CN112788066A (en) Abnormal flow detection method and system for Internet of things equipment and storage medium
CN113420802B (en) Alarm data fusion method based on improved spectral clustering
JP6174520B2 (en) Malignant communication pattern detection device, malignant communication pattern detection method, and malignant communication pattern detection program
CN113821793B (en) Multi-stage attack scene construction method and system based on graph convolution neural network
TWI717831B (en) Attack path detection method, attack path detection system and non-transitory computer-readable medium
CN105183780B (en) Based on the protocol classification method for improving AGNES algorithms
CN112738014A (en) Industrial control flow abnormity detection method and system based on convolution time sequence network
CN110768946A (en) Industrial control network intrusion detection system and method based on bloom filter
CN105871861B (en) A kind of intrusion detection method of self study protocol rule
CN107689899A (en) A kind of unknown protocol recognition methods and system based on bit stream
CN105307185B (en) A kind of gunz cooperation spectrum sensing method based on data purification
CN101335752B (en) Network intrusion detection method based on frequent fragment rule
CN112235242A (en) C & C channel detection method and system
CN105592087A (en) DNP abnormity detection method based on vector machine learning
CN109376531B (en) Web intrusion detection method based on semantic recoding and feature space separation
CN107124410A (en) Network safety situation feature clustering method based on machine deep learning
CN111371727A (en) Detection method for NTP protocol covert communication
CN105516164A (en) P2P botnet detection method based on fractal and self-adaptation fusion

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by SIPO to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant