CN104468262A - Network protocol recognition method and system based on semantic sensitivity - Google Patents

Network protocol recognition method and system based on semantic sensitivity

Info

Publication number
CN104468262A
Authority
CN
China
Prior art keywords
protocol
data message
gram
network
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410652834.0A
Other languages
Chinese (zh)
Other versions
CN104468262B (en)
Inventor
云晓春
张永铮
王一鹏
周宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Priority to CN201410652834.0A
Publication of CN104468262A
Application granted
Publication of CN104468262B
Legal status: Active
Anticipated expiration


Abstract

The invention relates to a network protocol recognition method and system based on semantic sensitivity. In the modeling stage, a set of network data messages of a specific application protocol is taken as input, and a keyword model of the analyzed protocol is built with the Latent Dirichlet Allocation method. In the training stage, classification features of the data messages are extracted according to the protocol keyword model; the resulting keyword feature vectors are taken as input, and a supervised machine learning method is trained on an offline training data set to obtain a classification model of the analyzed protocol. In the classification stage, classification features of the data messages are extracted according to the protocol keyword model, and the protocol classification model is used to determine whether each network data message under test belongs to the target protocol. The method and system fully mine the latent protocol semantic information in network messages and recognize network protocols effectively.

Description

A network protocol identification method and system based on semantic sensitivity
Technical field
The invention belongs to the field of network technology, and specifically relates to a network protocol identification method and system based on semantic sensitivity.
Background
Protocol identification is the process of associating network data flows with the specific application protocols that produced them. It has important applications in many networking and security fields, such as network measurement, intrusion detection and prevention, and botnet behavior detection. Take intrusion detection and prevention systems as an example: a traditional system treats each packet payload as a sequence of bytes and matches that byte sequence against malware signatures, which are usually expressed as a set of regular expressions. Because this coarse-grained signature matching ignores the application protocol structure of the payload, its reliability is severely limited.
Modern intrusion detection and prevention systems are becoming more semantic-sensitive. Specifically, they parse network packets according to the message format of the application protocol and extract the fields of the protocol being analyzed. Several application protocol parsing tools, such as FlowSifter, UltraPAC, binpac, and GAPA, have been proposed in earlier research. All of these tools need the protocol specification of the analyzed protocol in order to generate a parser for it. However, many application protocols on the Internet are proprietary and have no publicly available specification. According to Internet2 NetFlow statistics on backbone traffic, more than 40% of network data flows belong to unidentified application protocols. The network communication protocols used by malware and botnets likewise come with no specification from their designers. To parse the flows of an unknown application protocol, protocol inference must first be performed to obtain the protocol's fingerprint information. Network monitoring tools such as Wireshark, NetDude, SNORT, and BRO also rely on application protocol parsing tools to do their work.
Depending on the object they study, network protocol identification methods fall into three categories: those based on transport-layer ports, those based on data message payloads, and those based on the statistical behavior of network data flows. The invention takes the payload of network data messages as its basic object of study. Existing methods in this area can be roughly divided into two classes: (1) methods based on protocol parsing; (2) methods based on protocol signatures. The invention belongs to the second class. Signature-based methods depend only on analysis of data message payloads and do not rely on the executable code of the application. Earlier work on automatic construction of protocol signatures did not use the latent semantic information present in data messages, that is, the associations between syntactic elements within a message. As a result, that work could not reach the goal of high accuracy with a small number of classification features. In addition, compared with earlier work, the invention makes fewer assumptions about the network protocol being analyzed.
Summary of the invention
The object of the invention is to design and implement a network protocol identification method and system based on semantic sensitivity that fully mines the latent protocol semantic information in network messages during protocol identification, and that is general and robust in practice while maintaining high precision and recall.
The invention is motivated by the steadily growing volume and variety of unknown network traffic. The novel protocol identification method and system are designed to fully automate the identification of a specific application protocol with minimal manual effort.
Specifically, the invention adopts the following technical solution:
A network protocol identification method based on semantic sensitivity, comprising a modeling stage, a training stage, and a classification stage;
In the modeling stage, a set of network data messages of a specific application protocol is taken as input, and the Latent Dirichlet Allocation method is used to build a keyword model of the analyzed protocol;
In the training stage, classification features of the data messages are extracted according to the protocol keyword model obtained in the modeling stage; the resulting keyword feature vectors are taken as input, and a supervised machine learning method is trained on an offline training data set to obtain a classification model of the analyzed protocol;
In the classification stage, classification features of the data messages are extracted according to the protocol keyword model obtained in the modeling stage, and the protocol classification model output by the training stage is used to determine the protocol of each network data message under test, that is, whether it belongs to the target protocol.
Further, the modeling stage comprises the following steps:
1) collecting network data messages of the specific application protocol, so that network data messages are divided into two sets: the data messages that belong to the application protocol to be analyzed, and the data messages that do not;
2) using an n-gram model to convert each network data message into a sequence whose basic units are n-gram elements, where an n-gram is a subsequence of n consecutive elements of a given sequence;
3) building the protocol keyword model of the protocol to be analyzed using the Latent Dirichlet Allocation method.
Further, building the protocol keyword model with the Latent Dirichlet Allocation method comprises the following steps:
1) for all n-grams $w_{(m,i)}$ in the set D of M data messages, assigning a random keyword index $z_{(m,i)}$; here $w_{(m,i)}$ denotes the i-th n-gram in data message m, $z_{(m,i)}$ is the keyword index of that n-gram, and $N_m$ is the number of n-gram elements in data message m;
2) letting $\vec{z}_{\neg(m,i)}$ denote the keyword indices of all n-grams other than $z_{(m,i)}$; with $\vec{z}_{\neg(m,i)}$ held fixed, sampling a new keyword index $z_{(m,i)}$ for n-gram $w_{(m,i)}$ from the posterior distribution
$$p(z_{(m,i)} \mid \vec{z}_{\neg(m,i)}, \vec{W}) \propto \frac{(n_k^{(t)} - 1 + \beta)(n_m^{(k)} - 1 + \alpha)}{\sum_{t=1}^{W} n_k^{(t)} - 1 + W\beta}$$
where α and β are given hyperparameters, $n_k^{(t)}$ is the number of times element t of the n-gram dictionary is assigned to keyword k, $n_m^{(k)}$ is the number of times keyword k occurs in message m, and W is the number of n-gram elements in the n-gram dictionary;
3) updating the stale counts in the posterior distribution according to the value of $z_{(m,i)}$ obtained by Gibbs sampling;
4) repeating the above sampling operation for every tuple (m, i) in the data set; if the Gibbs sampling convergence condition L is reached, the algorithm stops and returns the final keyword indices; otherwise steps 1) to 3) are repeated;
5) building the protocol keyword model from the keyword indices obtained in steps 1) to 4), where K denotes the number of protocol keywords.
A network protocol identification system based on semantic sensitivity that uses the above method comprises:
A modeling unit, which takes a set of network data messages of a specific application protocol as input and uses the Latent Dirichlet Allocation model to build a keyword model of the analyzed protocol;
A training unit, which extracts classification features of the data messages according to the protocol keyword model obtained by the modeling unit, takes the resulting keyword feature vectors as input, and trains a supervised machine learning method on an offline training data set to obtain a classification model of the analyzed protocol;
A classification unit, which extracts classification features of the data messages according to the protocol keyword model obtained by the modeling unit and uses the protocol classification model output by the training unit to determine the protocol of each network data message under test, that is, whether it belongs to the target protocol.
The key technical points of the invention are:
1) The latent semantic information in protocol messages is fully exploited. The invention can distinguish the different meanings that the same n-gram element carries in different messages. Such messages may have different semantics, so the n-gram should be assigned to different protocol keywords. Earlier protocol format inference methods based on network data flows do not handle this problem well, because most of them rely on counting how often strings occur and therefore ignore the context in which each string appears.
2) In addition, the invention can discover associations between different n-grams. In a protocol message, several n-grams may together form one element of the protocol format. For example, in an SMTP message the 3-grams "250" and " OK" jointly characterize a protocol element used to confirm a mail session. Through protocol keyword identification, the invention aggregates related n-grams into a single protocol keyword.
The method of the invention can effectively identify a wide range of network protocols. Compared with published related techniques, it has the following advantages:
1. The method handles application protocol identification over both connection-oriented protocols (e.g., TCP) and connectionless protocols (e.g., UDP);
2. The method is based on statistics of data message payloads and assumes no prior knowledge of the protocol specification. It is therefore applicable to text, binary, and encrypted protocols;
3. As a message-based protocol identification solution, the method does not need to reassemble IP data messages into application-layer messages. It is therefore suitable for both per-packet and per-flow protocol classification;
4. The method works for both long flows (e.g., SMTP) and short flows (e.g., FTP) in real network environments.
Brief description of the drawings
Fig. 1 is a flow chart of the modeling stage of the network protocol identification method based on semantic sensitivity.
Fig. 2 is a flow chart of the training stage of the network protocol identification method based on semantic sensitivity.
Fig. 3 is a flow chart of the classification stage of the network protocol identification method based on semantic sensitivity.
Fig. 4 is a flow chart of protocol keyword model construction based on the Latent Dirichlet Allocation method.
Fig. 5 is an architecture diagram of the network protocol identification system based on semantic sensitivity.
Detailed description
To make the above objects, features, and advantages of the invention clearer, the invention is further described below through specific embodiments and the accompanying drawings.
The network protocol identification method of the invention takes network data flows as input and automatically and accurately identifies the flows of the analyzed protocol within mixed network traffic. The method analyzes the payload of IP data messages; it requires no reverse engineering of program executables and no prior knowledge of the protocol specification. It handles both connection-oriented protocols (e.g., TCP) and connectionless protocols (e.g., UDP). The method consists of three main stages: a modeling stage, a training stage, and a classification stage.
The modeling stage consists of three modules: data acquisition, data message n-gram generation, and keyword model construction (keyword identification). Its flow chart is shown in Fig. 1 and is described as follows:
1. Data acquisition: the data acquisition module collects network data messages of the specific application protocol. Network data messages are thereby divided into two sets: the data messages that belong to the application protocol to be analyzed, and the data messages that do not.
2. Data message n-gram generation: this operation uses an n-gram model to convert each network data message into a sequence whose basic units are n-gram elements. In the invention, an n-gram is a subsequence of n consecutive elements of a given sequence (which must contain at least n elements). Given a set of network data messages, the n-gram model decomposes a message byte sequence of length m, $b_1, b_2, \ldots, b_m$, into the sequence of n-grams (n ≤ m): $b_1 b_2 \ldots b_n$, $b_2 b_3 \ldots b_{n+1}$, ..., $b_{m-n+1} b_{m-n+2} \ldots b_m$. In practice, usually only the W n-gram elements with the highest frequency are selected to form the n-gram dictionary.
3. Keyword identification: the keyword identification module builds the protocol keyword model of the protocol to be analyzed using the Latent Dirichlet Allocation (LDA) method. The output of the modeling stage is the protocol keyword model of the analyzed network protocol.
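The n-gram generation and dictionary selection in step 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the default n = 3, and the choice to drop n-grams outside the dictionary when encoding a message are all assumptions made for the example.

```python
from collections import Counter

def byte_ngrams(payload, n=3):
    """Return the overlapping n-grams b_i..b_{i+n-1} of a payload.

    A payload shorter than n bytes yields an empty list.
    """
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]

def build_dictionary(payloads, n=3, top_w=1000):
    """Map the top_w most frequent n-grams to integer ids 0..top_w-1."""
    counts = Counter()
    for payload in payloads:
        counts.update(byte_ngrams(payload, n))
    # most_common returns n-grams in descending order of frequency.
    return {gram: idx for idx, (gram, _) in enumerate(counts.most_common(top_w))}

def encode(payload, dictionary, n=3):
    """Encode a payload as dictionary ids (assumption: unknown n-grams are dropped)."""
    return [dictionary[g] for g in byte_ngrams(payload, n) if g in dictionary]
```

For example, `byte_ngrams(b"250 OK", 3)` splits the six-byte payload into four overlapping 3-grams, which include both `b"250"` and `b" OK"`.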
The training stage consists of four modules: data acquisition, data message n-gram generation, feature extraction, and classifier learning. Its flow chart is shown in Fig. 2 and is described as follows:
1. Data acquisition: same as step 1 of the modeling stage.
2. Data message n-gram generation: same as step 2 of the modeling stage.
3. Feature extraction: classification features are extracted from each network data message according to the protocol keyword model obtained in step 3 of the modeling stage. This step computes the probability with which each keyword occurs in the data message and uses these probabilities to form the K-dimensional feature vector of the message.
4. Classifier learning: using a supervised learning method, a binary classifier for the analyzed application protocol is built from the classification features obtained in step 3.
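One way to picture the feature extraction in step 3 is the sketch below. The text does not spell out how keyword probabilities are computed for a message, so everything here is an assumption made for illustration: the keyword model is represented as a hypothetical matrix `phi`, where `phi[k][t]` is the weight of dictionary n-gram `t` under keyword `k`; each n-gram contributes its normalized keyword weights; and the contributions are averaged into a K-dimensional vector.

```python
def keyword_features(encoded, phi):
    """Return a K-dimensional keyword probability vector for one message.

    encoded: list of n-gram dictionary ids for the message.
    phi:     hypothetical K x W matrix, phi[k][t] = weight of n-gram t
             under keyword k (an assumed layout, not taken from the source).
    """
    K = len(phi)
    feats = [0.0] * K
    for t in encoded:
        column = [phi[k][t] for k in range(K)]
        total = sum(column)
        if total == 0:
            continue  # n-gram carries no keyword evidence
        for k in range(K):
            feats[k] += column[k] / total
    norm = sum(feats)
    # Normalize so the features sum to 1; an empty message stays all zeros.
    return [f / norm for f in feats] if norm else feats
```

The resulting vectors can then be passed to any supervised learner that accepts fixed-length numeric features.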
The classification stage consists of three modules: data message n-gram generation, feature extraction, and the classifier. Its flow chart is shown in Fig. 3 and is described as follows:
1. Data message n-gram generation: same as step 2 of the training stage.
2. Feature extraction: same as step 3 of the training stage.
3. Classifier: using the classification feature vector obtained in steps 1 and 2 and the classification model obtained in step 4 of the training stage, the protocol class of each unidentified network data message is determined. The output has two classes: network data messages that belong to the target protocol, and network data messages that do not.
The innovation of the overall method lies in the construction of the protocol keyword model, which can be divided into the steps below. Fig. 4 shows the flow chart of protocol keyword identification and keyword model construction based on the Latent Dirichlet Allocation (LDA) method.
The input of the protocol keyword model construction process is a set of messages belonging to a specific application protocol. L is the termination condition of the process. Its output is the protocol keyword model of the analyzed network protocol. The method builds the protocol keyword model with the Latent Dirichlet Allocation (LDA) method, in the following steps:
1. First, for all n-grams $w_{(m,i)}$ in the set D of M data messages, assign a random keyword index $z_{(m,i)}$. Here $w_{(m,i)}$ denotes the i-th n-gram in data message m, $z_{(m,i)}$ is the keyword index of that n-gram, and $N_m$ is the number of n-gram elements in data message m;
2. Next, let $\vec{z}_{\neg(m,i)}$ denote the keyword indices of all n-grams other than $z_{(m,i)}$. With $\vec{z}_{\neg(m,i)}$ held fixed, sample a new keyword index $z_{(m,i)}$ for n-gram $w_{(m,i)}$ from the posterior distribution
$$p(z_{(m,i)} \mid \vec{z}_{\neg(m,i)}, \vec{W}) \propto \frac{(n_k^{(t)} - 1 + \beta)(n_m^{(k)} - 1 + \alpha)}{\sum_{t=1}^{W} n_k^{(t)} - 1 + W\beta}$$
where α and β are given hyperparameters, $n_k^{(t)}$ is the number of times element t of the n-gram dictionary is assigned to keyword k, and $n_m^{(k)}$ is the number of times keyword k occurs in message m. W is the number of n-gram elements in the n-gram dictionary.
3. Update the stale counts in the posterior distribution according to the value of $z_{(m,i)}$ obtained by Gibbs sampling;
4. Repeat the above sampling operation for every tuple (m, i) in the data set. If the Gibbs sampling convergence condition L is reached, the algorithm stops and returns the final keyword indices; otherwise steps 1 to 3 are repeated.
5. Build the protocol keyword model from the keyword indices obtained in steps 1 to 4, where K denotes the number of protocol keywords.
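The sampling loop in steps 1 to 4 corresponds to collapsed Gibbs sampling for LDA, which can be sketched as below. This is an illustrative sketch under stated assumptions, not the patented implementation: the function name, the default hyperparameter values, and the fixed number of sweeps (standing in for the convergence condition L) are assumptions, and the sketch realizes the "−1" terms of the posterior in the standard way, by removing the current n-gram's assignment from the counts before sampling.

```python
import random

def gibbs_keywords(docs, K, W, alpha=0.1, beta=0.01, sweeps=50, seed=0):
    """Collapsed Gibbs sampling over n-gram ids.

    docs: list of messages, each a list of n-gram ids in range(W).
    Returns (z, n_kt, n_mk): keyword index per n-gram, keyword/n-gram
    counts n_k^(t), and message/keyword counts n_m^(k).
    """
    rng = random.Random(seed)
    n_kt = [[0] * W for _ in range(K)]   # n_k^(t)
    n_mk = [[0] * K for _ in docs]       # n_m^(k)
    n_k = [0] * K                        # sum over t of n_k^(t)

    def bump(m, t, k, delta):
        n_kt[k][t] += delta
        n_mk[m][k] += delta
        n_k[k] += delta

    # Step 1: random keyword index for every n-gram.
    z = []
    for m, doc in enumerate(docs):
        z.append([rng.randrange(K) for _ in doc])
        for t, k in zip(doc, z[m]):
            bump(m, t, k, +1)

    # Steps 2-4: resample each (m, i) from the posterior, then update counts.
    for _ in range(sweeps):  # assumption: fixed sweep count stands in for L
        for m, doc in enumerate(docs):
            for i, t in enumerate(doc):
                bump(m, t, z[m][i], -1)  # exclude the current assignment
                weights = [
                    (n_kt[k][t] + beta) * (n_mk[m][k] + alpha) / (n_k[k] + W * beta)
                    for k in range(K)
                ]
                z[m][i] = rng.choices(range(K), weights=weights)[0]
                bump(m, t, z[m][i], +1)
    return z, n_kt, n_mk
```

Keeping the running totals in `n_k` avoids recomputing the denominator sum for every sample, which matters because the inner loop runs once per n-gram per sweep.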
In conjunction with the above method, the invention also discloses a network protocol identification system based on semantic sensitivity. The system consists mainly of three parts, a modeling unit, a training unit, and a classification unit, which correspond to the modeling stage, training stage, and classification stage respectively. The system architecture is shown in Fig. 5.
1. Modeling unit: takes a set of network data messages of a specific application protocol as input and uses the Latent Dirichlet Allocation model to build a keyword model of the analyzed protocol. The output of this unit is the protocol keyword model of the analyzed protocol.
2. Training unit: extracts classification features of the data messages according to the protocol keyword model obtained by the modeling unit. The keyword feature vectors produced by the feature extraction module are taken as input, and a supervised machine learning method is trained on an offline training data set to obtain a classification model of the analyzed protocol.
3. Classification unit: extracts classification features of the data messages according to the protocol keyword model obtained by the modeling unit, and uses the protocol detection model output by the training unit (i.e., the classification model above) to determine the protocol of each network data message under test. The output has two classes: network data messages that belong to the target protocol, and network data messages that do not.
In the validation experiments, the invention was tested on the DNS and FTP protocols with different values of W, the total number of n-gram elements, and its precision, recall, and F-measure were compared across different supervised learning algorithms. For a given application protocol to be analyzed, the invention first defines the following four data sets:
● True Positives (TP): the set of network packets that the system identifies as belonging to the protocol and that do belong to it.
● False Positives (FP): the set of network packets that the system identifies as belonging to the protocol but that do not belong to it.
● False Negatives (FN): the set of network packets that the system identifies as not belonging to the protocol but that do belong to it.
● True Negatives (TN): the set of network packets that the system identifies as not belonging to the protocol and that do not belong to it.
Based on these data sets, the invention evaluates the effectiveness and reliability of the system with three metrics commonly used in machine learning: precision, recall, and F-measure. The three metrics are defined as follows:
$$\text{precision} = \frac{TP}{TP + FP}$$
$$\text{recall} = \frac{TP}{TP + FN}$$
$$\text{F-Measure} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
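The three definitions above translate directly into code. The sketch below is illustrative only: the function name is an assumption, and returning 0.0 when a denominator is zero is a convention chosen for the example rather than something the text specifies.

```python
def precision_recall_f(tp, fp, fn):
    """Compute precision, recall, and F-measure from packet counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

With hypothetical counts TP = 90, FP = 10, and FN = 30, precision is 0.9, recall is 0.75, and the F-measure is 1.35 / 1.65, roughly 0.818.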
Because precision and recall describe two different aspects of system performance, using either one alone as an evaluation metric is limiting. The F-measure is therefore used here to combine the two and select the best configuration. The experimental results of the method on the DNS and FTP protocols are shown in the tables below.
Table 1: DNS protocol experimental results
Table 2: FTP protocol experimental results
Table 1 shows the experimental results for the DNS protocol. Under the different parameter settings, precision for DNS ranges from 94.16% to 99.74% and recall ranges from 98.21% to 99.85%. For DNS, the best result is obtained with the C4.5 decision tree and W = 1000.
Table 2 shows the experimental results for the FTP protocol. Under the different parameter settings, precision for FTP ranges from 97.20% to 99.56% and recall ranges from 87.16% to 97.28%. For FTP, the best result is obtained with the C4.5 decision tree and W = 1500.
The above embodiments are intended only to illustrate the technical solution of the invention, not to limit it. A person of ordinary skill in the art may modify the technical solution or replace it with equivalents without departing from the spirit and scope of the invention; the scope of protection of the invention is defined by the claims.

Claims (8)

1. A network protocol identification method based on semantic sensitivity, characterized in that it comprises a modeling stage, a training stage, and a classification stage;
In the modeling stage, a set of network data messages of a specific application protocol is taken as input, and the Latent Dirichlet Allocation method is used to build a keyword model of the analyzed protocol;
In the training stage, classification features of the data messages are extracted according to the protocol keyword model obtained in the modeling stage; the resulting keyword feature vectors are taken as input, and a supervised machine learning method is trained on an offline training data set to obtain a classification model of the analyzed protocol;
In the classification stage, classification features of the data messages are extracted according to the protocol keyword model obtained in the modeling stage, and the protocol classification model output by the training stage is used to determine the protocol of each network data message under test, that is, whether it belongs to the target protocol.
2. The method of claim 1, characterized in that the modeling stage comprises the following steps:
1) collecting network data messages of the specific application protocol, so that network data messages are divided into two sets: the data messages that belong to the application protocol to be analyzed, and the data messages that do not;
2) using an n-gram model to convert each network data message into a sequence whose basic units are n-gram elements, where an n-gram is a subsequence of n consecutive elements of a given sequence;
3) building the protocol keyword model of the protocol to be analyzed using the Latent Dirichlet Allocation method.
3. The method of claim 2, characterized in that building the protocol keyword model with the Latent Dirichlet Allocation method comprises the following steps:
1) for all n-grams $w_{(m,i)}$ in the set D of M data messages, assigning a random keyword index $z_{(m,i)}$; here $w_{(m,i)}$ denotes the i-th n-gram in data message m, $z_{(m,i)}$ is the keyword index of that n-gram, and $N_m$ is the number of n-gram elements in data message m;
2) letting $\vec{z}_{\neg(m,i)}$ denote the keyword indices of all n-grams other than $z_{(m,i)}$; with $\vec{z}_{\neg(m,i)}$ held fixed, sampling a new keyword index $z_{(m,i)}$ for n-gram $w_{(m,i)}$ from the posterior distribution
$$p(z_{(m,i)} \mid \vec{z}_{\neg(m,i)}, \vec{W}) \propto \frac{(n_k^{(t)} - 1 + \beta)(n_m^{(k)} - 1 + \alpha)}{\sum_{t=1}^{W} n_k^{(t)} - 1 + W\beta}$$
where α and β are given hyperparameters, $n_k^{(t)}$ is the number of times element t of the n-gram dictionary is assigned to keyword k, $n_m^{(k)}$ is the number of times keyword k occurs in message m, and W is the number of n-gram elements in the n-gram dictionary;
3) updating the stale counts in the posterior distribution according to the value of $z_{(m,i)}$ obtained by Gibbs sampling;
4) repeating the above sampling operation for every tuple (m, i) in the data set; if the Gibbs sampling convergence condition L is reached, the algorithm stops and returns the final keyword indices; otherwise steps 1) to 3) are repeated;
5) building the protocol keyword model from the keyword indices obtained in steps 1) to 4), where K denotes the number of protocol keywords.
4. The method of claim 2 or 3, characterized in that, when generating data message n-grams, only the W n-gram elements with the highest frequency are selected to form the n-gram dictionary.
5. The method of claim 2 or 3, characterized in that the training stage comprises the following steps:
1) data acquisition, same as step 1) of the modeling stage;
2) data message n-gram generation, same as step 2) of the modeling stage;
3) extracting classification features from the network data messages according to the protocol keyword model obtained in step 3) of the modeling stage;
4) using a supervised learning method to build a binary classifier for the analyzed application protocol from the extracted classification features.
6. The method of claim 5, characterized in that the classification stage comprises the following steps:
1) data message n-gram generation, same as step 2) of the training stage;
2) feature extraction, same as step 3) of the training stage;
3) determining the protocol class of each unidentified network data message using the classification feature vector obtained in steps 1) and 2) and the classification model obtained in step 4) of the training stage.
7. The method of claim 1, characterized in that the network protocol under test is a connection-oriented protocol and/or a connectionless protocol.
8. A network protocol identification system based on semantic sensitivity using the method of claim 1, characterized in that it comprises:
A modeling unit, which takes a set of network data messages of a specific application protocol as input and uses the Latent Dirichlet Allocation model to build a keyword model of the analyzed protocol;
A training unit, which extracts classification features of the data messages according to the protocol keyword model obtained by the modeling unit, takes the resulting keyword feature vectors as input, and trains a supervised machine learning method on an offline training data set to obtain a classification model of the analyzed protocol;
A classification unit, which extracts classification features of the data messages according to the protocol keyword model obtained by the modeling unit and uses the protocol classification model output by the training unit to determine the protocol of each network data message under test, that is, whether it belongs to the target protocol.
CN201410652834.0A 2014-11-17 2014-11-17 A kind of network protocol identification method and system based on semantic sensitivity Active CN104468262B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410652834.0A CN104468262B (en) 2014-11-17 2014-11-17 A kind of network protocol identification method and system based on semantic sensitivity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410652834.0A CN104468262B (en) 2014-11-17 2014-11-17 A kind of network protocol identification method and system based on semantic sensitivity

Publications (2)

Publication Number Publication Date
CN104468262A true CN104468262A (en) 2015-03-25
CN104468262B CN104468262B (en) 2017-12-15

Family

ID=52913669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410652834.0A Active CN104468262B (en) 2014-11-17 2014-11-17 A kind of network protocol identification method and system based on semantic sensitivity

Country Status (1)

Country Link
CN (1) CN104468262B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105100091A (en) * 2015-07-13 2015-11-25 北京奇虎科技有限公司 Protocol identification method and protocol identification system
CN105390132A (en) * 2015-10-10 2016-03-09 中国科学院信息工程研究所 Language model-based application protocol identification method and system
CN105488539A (en) * 2015-12-16 2016-04-13 百度在线网络技术(北京)有限公司 Generation method and device of classification method, and estimation method and device of system capacity
CN106850338A (en) * 2016-12-30 2017-06-13 西可通信技术设备(河源)有限公司 A kind of R+1 classes application protocol recognition method and device based on semantic analysis
CN106850350A (en) * 2017-02-09 2017-06-13 珠海格力电器股份有限公司 Apparatus testing method and system based on BACnet
CN110225001A (en) * 2019-05-21 2019-09-10 清华大学深圳研究生院 A kind of dynamic self refresh net flow assorted method based on topic model
CN110855608A (en) * 2019-09-29 2020-02-28 上海天旦网络科技发展有限公司 Protocol reverse engineering system based on reinforcement learning and working method thereof
CN113852605A (en) * 2021-08-29 2021-12-28 北京工业大学 Protocol format automatic inference method and system based on relational reasoning
CN115150481A (en) * 2022-09-02 2022-10-04 浙江工企信息技术股份有限公司 Unknown communication protocol equipment-oriented code point address detection method and system
CN115334179A (en) * 2022-07-19 2022-11-11 四川大学 Unknown protocol reverse analysis method based on named entity recognition
CN117176471A (en) * 2023-10-25 2023-12-05 北京派网科技有限公司 Dual high-efficiency detection method, device and storage medium for anomaly of text and digital network protocol

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101587493A (en) * 2009-06-29 2009-11-25 中国科学技术大学 Text classification method
CN103297427A (en) * 2013-05-21 2013-09-11 中国科学院信息工程研究所 Unknown network protocol identification method and system
US20140129510A1 (en) * 2011-07-13 2014-05-08 Huawei Technologies Co., Ltd. Parameter Inference Method, Calculation Apparatus, and System Based on Latent Dirichlet Allocation Model
CN103870751A (en) * 2012-12-18 2014-06-18 中国移动通信集团山东有限公司 Method and system for intrusion detection

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105100091A (en) * 2015-07-13 2015-11-25 北京奇虎科技有限公司 Protocol identification method and protocol identification system
CN105100091B (en) * 2015-07-13 2018-12-14 北京奇安信科技有限公司 A kind of protocol recognition method and system
CN105390132A (en) * 2015-10-10 2016-03-09 中国科学院信息工程研究所 Language model-based application protocol identification method and system
CN105390132B (en) * 2015-10-10 2019-03-22 中国科学院信息工程研究所 A kind of application protocol recognition methods and system based on language model
CN105488539A (en) * 2015-12-16 2016-04-13 百度在线网络技术(北京)有限公司 Generation method and device of classification method, and estimation method and device of system capacity
CN106850338A (en) * 2016-12-30 2017-06-13 西可通信技术设备(河源)有限公司 A kind of R+1 classes application protocol recognition method and device based on semantic analysis
CN106850350A (en) * 2017-02-09 2017-06-13 珠海格力电器股份有限公司 Apparatus testing method and system based on BACnet
CN106850350B (en) * 2017-02-09 2018-10-16 珠海格力电器股份有限公司 Apparatus testing method based on BACnet and system
CN110225001A (en) * 2019-05-21 2019-09-10 清华大学深圳研究生院 A kind of dynamic self refresh net flow assorted method based on topic model
CN110225001B (en) * 2019-05-21 2021-06-04 清华大学深圳研究生院 Dynamic self-updating network traffic classification method based on topic model
CN110855608A (en) * 2019-09-29 2020-02-28 上海天旦网络科技发展有限公司 Protocol reverse engineering system based on reinforcement learning and working method thereof
CN110855608B (en) * 2019-09-29 2022-03-18 上海天旦网络科技发展有限公司 Protocol reverse engineering system based on reinforcement learning and working method thereof
CN113852605A (en) * 2021-08-29 2021-12-28 北京工业大学 Protocol format automatic inference method and system based on relational reasoning
CN113852605B (en) * 2021-08-29 2023-09-22 北京工业大学 Protocol format automatic inference method and system based on relation reasoning
CN115334179A (en) * 2022-07-19 2022-11-11 四川大学 Unknown protocol reverse analysis method based on named entity recognition
CN115334179B (en) * 2022-07-19 2023-09-01 四川大学 Unknown protocol reverse analysis method based on named entity recognition
CN115150481A (en) * 2022-09-02 2022-10-04 浙江工企信息技术股份有限公司 Unknown communication protocol equipment-oriented code point address detection method and system
CN117176471A (en) * 2023-10-25 2023-12-05 北京派网科技有限公司 Dual high-efficiency detection method, device and storage medium for anomaly of text and digital network protocol
CN117176471B (en) * 2023-10-25 2023-12-29 北京派网科技有限公司 Dual high-efficiency detection method, device and storage medium for anomaly of text and digital network protocol

Also Published As

Publication number Publication date
CN104468262B (en) 2017-12-15

Similar Documents

Publication Publication Date Title
CN104468262B (en) A kind of network protocol identification method and system based on semantic sensitivity
Wang et al. A semantics aware approach to automated reverse engineering unknown protocols
CN104270392B (en) A kind of network protocol identification method learnt based on three grader coorinated trainings and system
CN107241352B (en) Network security event classification and prediction method and system
CN103297427B (en) A kind of unknown network protocol recognition method and system
CN112910859B (en) Internet of things equipment monitoring and early warning method based on C5.0 decision tree and time sequence analysis
AlEroud et al. Queryable semantics to detect cyber-attacks: A flow-based detection approach
CN111726351B (en) Bagging-improved GRU parallel network flow abnormity detection method
CN105390132B (en) A kind of application protocol recognition methods and system based on language model
CN112202718A (en) XGboost algorithm-based operating system identification method, storage medium and device
Yao et al. Network anomaly detection using random forests and entropy of traffic features
Yan et al. Principal Component Analysis Based Network Traffic Classification.
Harbola et al. Improved intrusion detection in DDoS applying feature selection using rank & score of attributes in KDD-99 data set
Li et al. Protocol reverse engineering using LDA and association analysis
CN111211948B (en) Shodan flow identification method based on load characteristics and statistical characteristics
CN102098346B (en) Method for identifying flow of P2P (peer-to-peer) stream media in unknown flow
Ma et al. Automatic protocol reverse engineering for industrial control systems with dynamic taint analysis
Tang et al. Malware Traffic Classification Based on Recurrence Quantification Analysis.
CN115622926A (en) Industrial control protocol reverse analysis method based on network traffic
Wang et al. Reverse engineering of industrial control protocol by XGBoost with V-gram
Yu et al. Mining anomaly communication patterns for industrial control systems
Yu et al. Anomaly network detection model based on mobile agent
CN117669594B (en) Big data relation network analysis method and system for abnormal information
Tang et al. Relational reasoning-based approach for network protocol reverse engineering
Sang et al. Fingerprinting protocol at bit-level granularity: A graph-based approach using cell embedding

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant