CN109951464A - Message sequence clustering method for unknown binary private protocol - Google Patents

Message sequence clustering method for unknown binary private protocol

Info

Publication number
CN109951464A
CN109951464A (application CN201910173504.6A; granted as CN109951464B)
Authority
CN
China
Prior art keywords
message sequence
sequence
vector
message
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910173504.6A
Other languages
Chinese (zh)
Other versions
CN109951464B (en)
Inventor
杨超
吴继超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Cyber Peace Technology Co Ltd
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201910173504.6A priority Critical patent/CN109951464B/en
Publication of CN109951464A publication Critical patent/CN109951464A/en
Application granted granted Critical
Publication of CN109951464B publication Critical patent/CN109951464B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a message sequence clustering method for unknown binary private protocols, which mainly solves the problem that the prior art cannot accurately measure the similarity between protocol message sequences during protocol reverse engineering. The implementation is as follows: 1) collect message sequences of the unknown binary private protocol; 2) preprocess the collected message sequences; 3) extract multi-scale N-gram features from the preprocessed message sequences; 4) reduce the dimensionality of the multi-scale N-gram features by variance-based selection; 5) build an embedded representation of the message sequences from the dimension-reduced multi-scale N-gram features; 6) determine the optimal cluster number K from the embedded representation of the message sequences; 7) cluster the message sequences according to the optimal cluster number K. The invention fully mines the latent semantic information of message sequences, can accurately measure the similarity between message sequences, and improves clustering accuracy; it can be used for clustering unknown binary private protocols.

Description

Message sequence clustering method for unknown binary private protocol
Technical Field
The invention belongs to the field of information technology, and in particular relates to a message sequence clustering method that can be used to cluster unknown binary private protocols.
Background
Network protocols are the specifications for communication between entities in a network: they prescribe the data formats and the associated synchronization rules used when communicating entities exchange information. Besides standardized communication protocols, networks also carry many unknown private protocols. Message sequence clustering is the first task in protocol reverse engineering, namely separating the messages of the various types in a private-protocol message trace to the greatest possible extent according to the similarity between message sequences, after which field format inference and state machine inference are performed.
The core problem of private protocol message sequence clustering, i.e. network protocol identification, is how to accurately measure the similarity between message sequences. Current message sequence clustering algorithms for unknown private protocols fall roughly into three categories: edit-distance-based, keyword-based, and probability-model-based sequence clustering. Edit distance measures the similarity between sequences by the minimum number of operations (inserting, deleting, and replacing a character) required to change one string into another. Its idea of finding the longest common subsequence is similar to that of the Needleman-Wunsch algorithm; from the perspective of text matching, it ignores local features between sequences, although those local features, namely the protocol keywords, may be the key to measuring similarity within a protocol cluster. Probability-model-based sequence clustering is difficult to model and is effective only on long sequences. The representative keyword-based algorithm is Apriori, which suffers from a large number of overlapping frequent items, making the feature vectors that represent message sequences extremely high-dimensional. In 2013, Wang Yipeng et al. pioneered the introduction of the N-gram model and the LDA (latent Dirichlet allocation) model from natural language processing into protocol sequence clustering, determining the optimal value of N with Zipf's law and then modeling with LDA.
These methods ignore the varying lengths of protocol message keywords and do not consider the semantic association between neighboring words when building the message embedding representation, so they cannot accurately measure the similarity between message sequences and cluster poorly.
Disclosure of Invention
The invention aims to provide a message sequence clustering method of an unknown binary private protocol aiming at the defects of the prior art, so that the potential semantic information in the message is fully mined in the private protocol feature extraction process, and the clustering accuracy is improved.
The technical scheme of the invention is as follows: model the message sequences with an N-gram language model, extract multi-scale N-gram features without fixing the value of N, and train a word-vector embedding representation of the message sequences with a word2vec model. The implementation includes the following steps:
(1) collecting an unknown binary private protocol message sequence by using a data collection method;
(2) preprocessing an acquired unknown binary private protocol message sequence:
(2a) stripping link layer and transmission layer data of an unknown binary private protocol message sequence by a network packet analysis technology to obtain application layer binary private protocol message sequence data;
(2b) converting the binary message sequence data of the application layer into hexadecimal message sequence data according to a binary conversion rule;
(2c) marking the hexadecimal message sequence data to generate a sample data set;
(3) extracting multi-scale N-gram characteristics of the sample data set:
(3a) determining the minimum value and the maximum value range of the N value;
(3b) taking N value in the range, and cutting the sample data set by using an N-gram model to obtain a word vector of the segmented message sequence of the message as the multi-scale N-gram characteristic of the sample data set;
(4) and (3) reducing the dimension of the multi-scale N-gram feature based on variance selection:
(4a) according to the word vector of the message sequence, carrying out One-Hot coding on the message sequence by utilizing the One-Hot coding to obtain a characteristic vector space model after the message sequence is coded;
(4b) calculating the variance distribution of each eigenvector according to the eigenvector space model;
(4c) reducing the dimension of the extracted multi-scale N-gram characteristics according to the variance distribution of each characteristic vector, namely selecting the characteristic vector with larger variance as a characteristic vector vocabulary of the sample data set;
(5) and embedding and expressing the message sequence according to the feature vector vocabulary:
(5a) screening word vectors of the message sequence by using the characteristic vector vocabulary, and only leaving the word vectors in the characteristic vector vocabulary as word vector characteristics of the message sequence;
(5b) taking the word vector features of the sample training set as input and training a word2vec model, obtaining an embedded vector dictionary wv for the vocabulary, namely the weight matrix of the hidden layer of the shallow neural network;
(5c) for each word w in a message sequence, finding its embedded vector representation wv[w] in the embedded vector dictionary wv, then summing and averaging these vectors to obtain the embedded vector representation E_v of the message sequence;
(5d) normalizing the embedded vector E_v of each message sequence into a unit vector, obtaining the embedded vector matrix X_Ev of the message sequences;
(6) performing a mode-point search on the embedded vector matrix X_Ev of the message sequences with the MeanShift probability density estimation method to obtain the optimal cluster number K of the message sequences;
(7) clustering message sequences:
(7a) taking an embedded vector matrix of the message sequence as input, and dividing the message sequence into K sets by using a K-Means clustering method;
(7b) and respectively storing the message sequence data divided into each set.
Compared with the prior art, the invention has the following advantages:
firstly, the invention carries out multi-scale N-gram feature extraction under the condition of not fixing the N value, and overcomes the problem of embedding representation of unequal lengths of keywords of message sequences.
Secondly, the word table is embedded and expressed by a word2vec model, and semantic association characteristics of front and rear words are combined on the aspect of determining the weight of the keywords, so that the method can fully mine potential semantic information in the message, and further improve the accuracy of clustering.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention.
Detailed Description
The specific steps of the present invention are further described below with reference to fig. 1.
Step 1, collecting an unknown binary private protocol message sequence by using a data collection method.
(1a) Setting the network card of the server capture device to promiscuous mode so that it can monitor the wireless communication data, then starting communication entities A and B to establish a communication connection;
(1b) intercepting the message sequence traffic between communication entities A and B with Wireshark and saving it as a pcap file, obtaining the unknown binary private protocol message sequence, which contains link-layer, transport-layer, and application-layer data.
And 2, preprocessing the acquired unknown binary private protocol message sequence.
(2a) Analyzing an intercepted unknown binary private protocol message sequence according to the structure of a network data packet, namely stripping link layer data and transmission layer data contained in the message sequence to obtain application layer data of the message sequence, wherein the application layer data is in a binary format;
(2b) Converting the application-layer binary message sequence data into hexadecimal message sequence data according to the base conversion rule; for example, binary 1111 corresponds to hexadecimal F.
(2c) And marking the hexadecimal message sequence data to generate a sample data set.
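As an illustrative sketch (not part of the patent text), steps (2a) and (2b) might look as follows in code. It assumes Ethernet II / IPv4 frames carrying TCP or UDP; the function name and the simplifications are our own:

```python
import struct

def extract_app_hex(frame: bytes) -> str:
    """Strip the link-layer (Ethernet II) and transport-layer (TCP/UDP)
    headers from one raw frame and return the application-layer payload
    as a hexadecimal string, as in steps (2a)-(2b)."""
    ethertype = struct.unpack("!H", frame[12:14])[0]
    if ethertype != 0x0800:                 # this sketch handles IPv4 only
        raise ValueError("only IPv4 frames handled in this sketch")
    ip_start = 14                           # Ethernet II header length
    ihl = (frame[ip_start] & 0x0F) * 4      # IPv4 header length in bytes
    proto = frame[ip_start + 9]             # protocol field of IPv4 header
    l4_start = ip_start + ihl
    if proto == 17:                         # UDP: fixed 8-byte header
        payload_start = l4_start + 8
    elif proto == 6:                        # TCP: data-offset nibble * 4
        payload_start = l4_start + (frame[l4_start + 12] >> 4) * 4
    else:
        raise ValueError("only TCP/UDP handled in this sketch")
    return frame[payload_start:].hex()
```

In practice the pcap file captured in step 1 would be iterated with a packet-parsing library; this sketch only shows the header arithmetic for the common case.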
And 3, extracting the multi-scale N-gram characteristics of the sample data set.
The N-gram model is a natural language processing model based on string statistics. It rests on the assumption that the occurrence of the n-th word depends only on the preceding n-1 words and on no other words, so the probability of an entire sequence equals the product of the conditional probabilities of its words. If a sequence T consists of the words ω1, ω2, ..., ωn, the probability of T is:
P(T) = p(ω1) × p(ω2|ω1) × ... × p(ωn|ω1ω2...ωn−1),
where P(T) is the probability of occurrence of the sequence T and p(ωi|ω1...ωi−1) is the conditional probability of the word ωi given the words before it.
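The factorization above can be estimated from token counts. A minimal, unsmoothed maximum-likelihood sketch (our own illustration, not the patent's procedure):

```python
from collections import Counter

def ngram_prob(seq, corpus, n=2):
    """Estimate P(seq) under the N-gram assumption: each word is
    conditioned only on the previous n-1 words. `corpus` is a list of
    token sequences used to collect counts; no smoothing, so an unseen
    history gives probability 0."""
    grams, hists = Counter(), Counter()
    for s in corpus:
        for i in range(len(s)):
            h = tuple(s[max(0, i - n + 1):i])   # (n-1)-word history
            grams[h + (s[i],)] += 1
            hists[h] += 1
    p = 1.0
    for i in range(len(seq)):
        h = tuple(seq[max(0, i - n + 1):i])
        if hists[h] == 0:
            return 0.0                           # unseen history
        p *= grams[h + (seq[i],)] / hists[h]     # p(w | history)
    return p
```

For example, with corpus [["02", "20"], ["02", "0a"]], the word "02" is always first, and "20" follows it half the time, so P(["02", "20"]) = 0.5.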
The choice of the value of N in the N-gram model is very important. A larger N preserves the integrity of the segmented data but reduces its effectiveness, while a too-small N cannot capture complete lexical information during word segmentation. The fixed value of N can therefore be extended to a range in which N takes multiple values.
The specific implementation of extracting the multi-scale N-gram characteristics of the sample data set in the step is as follows:
(3a) determine the minimum and maximum of the N-value range, generally set to 2-5;
(3b) for each value of N in the range, segment the sample data set with the N-gram model to obtain the segmented word vectors of the message sequences; for example, when N is 2, the message sequence "020a" is segmented into the word vector "02 20 0a";
(3c) merge the word vectors obtained under the different values of N to obtain the segmented word vectors of the message sequences, which serve as the multi-scale N-gram features of the sample data set.
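The multi-scale extraction of step 3 can be sketched as follows (function name and defaults are our own; the 2-5 range follows (3a)):

```python
def multiscale_ngrams(hex_seq: str, n_min: int = 2, n_max: int = 5) -> list:
    """Slide windows of every width in [n_min, n_max] over the hex string
    and pool all resulting tokens, so that protocol keywords of unequal
    length can all surface as features (the 'multi-scale' N-gram idea)."""
    tokens = []
    for n in range(n_min, n_max + 1):
        tokens.extend(hex_seq[i:i + n] for i in range(len(hex_seq) - n + 1))
    return tokens
```

For the document's example, multiscale_ngrams("020a", 2, 2) yields ["02", "20", "0a"]; widening the range to n_max = 4 additionally surfaces the 4-character token "020a".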
And 4, reducing the dimension of the multi-scale N-gram characteristics based on the variance selection.
The feature selection method can be divided into three types according to the form of feature selection: (1) a filtering method, which scores each feature according to the divergence or the correlation, sets a scoring threshold value or the number of features to be selected, and selects the features; (2) a packaging method, selecting a plurality of characteristics each time according to the target function, or excluding a plurality of characteristics; (3) in the integration method, certain machine learning algorithms and models are used for training to obtain weight coefficients of all the features, and the features are selected according to the coefficients from large to small.
The invention uses, but is not limited to, the variance selection of the filtering methods to reduce the feature dimension. The specific implementation is as follows:
(4a) encode the message sequence word vectors according to the One-Hot encoding rule. One-Hot encoding uses an N-bit state register to encode N states; each state has its own independent register bit, and only one bit is valid at any time. For example, for a feature vector vocabulary "a0 b0 c0 a000 05 ……", the feature "a0" may be encoded as [1 0 0 0 0 ……] and the feature "b0" as [0 1 0 0 0 ……]; the embedded vector of the message sequence "a000055ee00cc00445a0000505e5e0e0c0c0c04045" is then denoted [1 0 0 1 1 ……];
(4b) combining the embedded vectors after each message sequence is coded to obtain a characteristic vector space model;
(4c) calculate the variance distribution of each feature vector from the feature vector space model:
D(x) = (1/N) Σ (x − u)²,
where D(x) is the variance of each feature vector, x ranges over the values of the feature vector in the sample data set, u is their mean, and N is the total number of samples in the data set;
(4d) and setting a score threshold value of the variance selection, and selecting the feature vector with the variance larger than the threshold value as a feature vector vocabulary of the sample data set.
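For the 0/1 one-hot features of step 4, the variance formula reduces to p·(1−p), where p is the feature's occurrence rate across the sample set. A small sketch under that assumption (function and variable names are our own):

```python
def variance_select(messages, threshold):
    """One-hot encode each message over the token vocabulary and keep the
    features whose variance exceeds `threshold`. For a binary feature with
    occurrence rate p, Var(x) = E[x^2] - E[x]^2 = p*(1-p)."""
    vocab = sorted({tok for msg in messages for tok in msg})
    n = len(messages)
    kept = []
    for tok in vocab:
        p = sum(tok in msg for msg in messages) / n   # occurrence rate
        if p * (1 - p) > threshold:                   # Bernoulli variance
            kept.append(tok)
    return kept
```

A feature present in every message (or in none) has zero variance and is dropped, which matches the intent of step (4d): keep only the discriminative tokens.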
And 5, training the word vector characteristics of the message sequence by using the word2vec model according to the characteristic vector vocabulary to obtain a word embedded vector dictionary wv.
The word2vec model is a shallow neural network model for generating word vectors: by predicting the maximum likelihood probability of the words at positions adjacent to an input word, it maps each word to an embedded vector that represents the semantic relations between word pairs; the vectors form the hidden layer of the shallow neural network.
The specific implementation of this step is as follows:
(5a) screen the word vectors of the message sequences with the feature vector vocabulary, keeping only the word vectors that appear in the vocabulary as the word-vector features of the message sequence. For example, with the feature vector vocabulary "a0 b0 c0 a000 05 ……", the message sequence word vector "a000055ee00cc00445a0000505e5e0e0c0c0c04045" is screened to "a0 c0 a000 05";
(5b) taking the word vector characteristics of the sample training set as input, training by using a word2vec model to obtain a weight matrix of a shallow neural network hidden layer, and taking the weight matrix as an embedded vector dictionary wv of words.
And 6, carrying out embedded representation on the message sequence.
(6a) For each word w in a message sequence, find its embedded vector representation wv[w] in the embedded vector dictionary wv, then sum and average these vectors to obtain the embedded vector representation E_v of the message sequence:
E_v = (1/M) Σ wv[w],
where E_v is the embedded vector representation of the message sequence, wv[w] is the embedded vector corresponding to each word, and M is the number of words contained in the message sequence;
(6b) normalize the embedded vector E_v of each message sequence into a unit vector, then combine the unit vectors to obtain the embedded vector matrix X_Ev of the message sequences.
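Steps (6a)-(6b) amount to averaging word vectors and normalizing. A sketch assuming a trained dictionary wv is already available (in the patent it comes from word2vec; here it is simply passed in):

```python
import math

def message_embedding(words, wv):
    """Average the embedding vectors wv[w] of the words in one message
    (step 6a) and normalize the mean to unit length (step 6b)."""
    dim = len(next(iter(wv.values())))
    hits = [wv[w] for w in words if w in wv]   # skip out-of-vocabulary words
    if not hits:
        return [0.0] * dim
    mean = [0.0] * dim
    for vec in hits:
        for i, v in enumerate(vec):
            mean[i] += v / len(hits)
    norm = math.sqrt(sum(v * v for v in mean)) or 1.0
    return [v / norm for v in mean]            # unit vector E_v
```

Stacking these unit vectors row by row gives the matrix X_Ev used in the later steps.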
Step 7, perform a mode-point search on the embedded vector matrix X_Ev of the message sequences with the MeanShift probability density estimation method to obtain the optimal cluster number K.
(7a) Randomly select a point of the embedded vector matrix X_Ev as the starting point s;
(7b) set a search radius h and compute the offsets needed to move s to each point xi within radius h, summing and averaging them to obtain the mean offset Mh(s) = (1/k) Σ (xi − s), where the sum runs over the k points xi within radius h of s;
(7c) move s along the mean offset Mh(s) to a new point s'; the length of the move is the modulus ‖Mh(s)‖;
(7d) take the new point s' as the new starting point and repeat (7b)-(7c), iterating until the mean offset ‖Mh(s)‖ is smaller than a set threshold or the iteration limit is reached; each point s' so obtained is a cluster center, and the number of cluster centers is the optimal cluster number K.
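Step 7's mode search can be sketched with a flat kernel; the radius h, tolerance, and mode-merging rule below are our own choices, not the patent's:

```python
def mean_shift_modes(points, h=1.0, tol=1e-3, max_iter=100):
    """Flat-kernel mean shift: from each starting point, repeatedly move to
    the mean of the neighbours within radius h until the shift length drops
    below tol. The distinct fixed points are the modes; their count is K."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    modes = []
    for s in points:
        for _ in range(max_iter):
            nbrs = [x for x in points if dist(x, s) <= h]
            new_s = tuple(sum(c) / len(nbrs) for c in zip(*nbrs))
            if dist(new_s, s) < tol:      # mean offset below threshold
                s = new_s
                break
            s = new_s
        if not any(dist(s, m) < h / 2 for m in modes):  # merge close modes
            modes.append(s)
    return modes
```

On two well-separated groups of points the search returns two modes, i.e. the optimal cluster number K = 2.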
And 8, clustering the message sequence by using a K-Means clustering method.
In K-Means clustering, the sample data is divided into K clusters so that each point belongs to the class of its nearest mean, i.e. the nearest cluster center. The specific implementation is as follows:
(8a) Randomly select K points of the embedded vector matrix X_Ev as initial cluster centers {u1, u2, ..., uK};
(8b) for every point xi of the embedded vector matrix X_Ev, compute the Euclidean distance d = ‖xi − uj‖ to each cluster center uj, label xi with the class λi of the nearest center, and update the corresponding cluster;
(8c) update the cluster centers u'j by taking the mean of all points in each cluster as the new center of that cluster;
(8d) with u'j as the new cluster centers, repeat (8b)-(8c) and iterate until the cluster centers no longer change or the iteration limit is reached;
(8e) store the message sequence data of each cluster separately.
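Step 8 can be sketched as a plain K-Means loop (our own minimal implementation; production code would use a library such as scikit-learn):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain K-Means: random initial centers, assign each point to its
    nearest center by Euclidean distance, recompute centers as cluster
    means, and stop when the assignments no longer change."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # (8a) initial cluster centers
    labels = None
    for _ in range(iters):
        new_labels = [                       # (8b) nearest-center labels
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:             # (8d) converged
            break
        labels = new_labels
        for j in range(k):                   # (8c) recompute cluster means
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers
```

The returned labels give the class division of the message sequences; each class can then be stored separately as in (8e).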
The effect of the present invention will be further described with reference to simulation experiments.
1. Simulation experiment conditions are as follows:
in the simulation experiment, the two communication entities are a Dobby pocket drone made by ZeroTech and its ground station; the communication data intercepted between them serves as the unknown binary private protocol message sequence data of the experiment.
The simulation was run on a computer with an Intel(R) Core(TM) i5-8250U CPU, 16 GB of memory, and the Windows 10 operating system, on which the protocol message sequences were clustered and analyzed and the experimental results were evaluated.
2. Simulation experiment content and result analysis:
the simulation experiment of the invention is to collect the protocol message sequence data of both communication entities, and to extract the characteristics and perform cluster analysis on the server. The simulation experiment comprises the following specific steps:
step 1, collecting protocol message sequence data of both communication entities, selecting 2000 protocol message sequences for data annotation;
step 2, preprocessing the protocol message sequence data to generate application layer hexadecimal message sequence data;
step 3, setting the maximum value N_max of the range of N values;
Step 4, segmenting and combining the message sequence according to the set N value threshold range to obtain a word vector of the message sequence;
step 5, performing one-hot encoding on the message sequence word vectors, calculating the variance distribution of the feature vectors, and selecting the features whose variance exceeds the threshold as the feature vector vocabulary; the threshold parameter was set to 0.8 in the simulation experiment;
step 6, screening word vectors of the message sequence according to the vocabulary table, taking the word vectors as input, and training by using a word2vec model to obtain an embedded vector dictionary wv of the vocabulary;
step 7, for each word w in a message sequence, finding its embedded vector representation wv[w] in the embedded vector dictionary wv, then summing and averaging to obtain the embedded vector representation E_v of the message sequence;
step 8, normalizing the embedded vector E_v of each message sequence into a unit vector to obtain the embedded vector matrix X_Ev of the message sequences;
Step 9, determining the optimal clustering number K of the message sequence by using MeanShift modular point search;
and step 10, clustering the message sequences with the K-Means method according to the optimal cluster number K to obtain the class division of the message sequences and storing each class; the results are shown in Table 1.
Table 1: summary of simulation test results
          Number of messages   Ratio     Representation format
Class 1   1484                 74.20%    a018*
Class 2   295                  14.75%    a010*
Class 3   147                  7.35%     a00f*
Class 4   74                   3.70%     a005*
As can be seen from Table 1, the simulation divided the 2000 collected protocol message sequences into 4 classes whose representation formats are "a018*", "a010*", "a00f*", and "a005*" respectively, consistent with the manual labeling.
3. Accuracy analysis of simulation experiment:
to demonstrate the effectiveness of the cluster analysis of the present invention, the clustering accuracy was calculated under different values of the N-range threshold N_max; the results are shown in Table 2.
Table 2: accuracy list of message sequence clustering analysis
N_max      2       3       4       5       6       7       8       9
Accuracy   0.5383  0.7189  0.7247  0.9940  0.5301  0.9658  0.6019  0.6450
As can be seen from Table 2, the accuracy is highest when N_max is 5, reaching 99.40%. This result verifies the effectiveness of the invention and shows that the method can serve as an effective clustering method for unknown binary private protocol message sequences.

Claims (7)

1. A message sequence clustering method of an unknown binary private protocol is characterized by comprising the following steps:
(1) collecting an unknown binary private protocol message sequence by using a data collection method;
(2) preprocessing an acquired unknown binary private protocol message sequence:
(2a) stripping link layer and transmission layer data of an unknown binary private protocol message sequence by a network packet analysis technology to obtain application layer binary private protocol message sequence data;
(2b) converting the binary message sequence data of the application layer into hexadecimal message sequence data according to a binary conversion rule;
(2c) marking the hexadecimal message sequence data to generate a sample data set;
(3) extracting multi-scale N-gram characteristics of the sample data set:
(3a) determining the minimum value and the maximum value range of the N value;
(3b) taking N value in the range, and segmenting the sample data set by using an N-gram model to obtain segmented message sequence word vectors serving as multi-scale N-gram characteristics of the sample data set;
(4) and (3) reducing the dimension of the multi-scale N-gram feature based on variance selection:
(4a) according to the word vector of the message sequence, carrying out One-Hot coding on the message sequence by utilizing the One-Hot coding to obtain a characteristic vector space model after the message sequence is coded;
(4b) calculating the variance distribution of each eigenvector according to the eigenvector space model;
(4c) reducing the dimension of the extracted multi-scale N-gram characteristics according to the variance distribution of each characteristic vector, namely selecting the characteristic vector with larger variance as a characteristic vector vocabulary of the sample data set;
(5) and embedding and expressing the message sequence according to the feature vector vocabulary:
(5a) screening word vectors of the message sequence by using the characteristic vector vocabulary, and only leaving the word vectors in the characteristic vector vocabulary as word vector characteristics of the message sequence;
(5b) taking the word vector features of the sample training set as input and training a word2vec model, obtaining an embedded vector dictionary wv for the vocabulary, namely the weight matrix of the hidden layer of the shallow neural network;
(5c) for each word w in a message sequence, finding its embedded vector representation wv[w] in the embedded vector dictionary wv, then summing and averaging these vectors to obtain the embedded vector representation E_v of the message sequence;
(5d) normalizing the embedded vector E_v of each message sequence into a unit vector, obtaining the embedded vector matrix X_Ev of the message sequences;
(6) performing a mode-point search on the embedded vector matrix X_Ev of the message sequences with the MeanShift probability density estimation method to obtain the optimal cluster number K of the message sequences;
(7) clustering message sequences:
(7a) taking an embedded vector matrix of the message sequence as input, and dividing the message sequence into K sets by using a K-Means clustering method;
(7b) and respectively storing the message sequence data divided into each set.
2. The method according to claim 1, wherein the binary private protocol packet sequence in (1) comprises link layer data, transport layer data, and application layer data.
3. The method of claim 1, wherein the One-Hot encoding in (4a) is performed using an N-bit status register to encode N states, each state having its own independent register bit and only One of which is active at any time.
4. The method of claim 1, wherein the variance distribution of each feature vector in (4b) is calculated by the formula
D(x) = (1/N) Σ (x − u)²,
where D(x) is the variance of each feature vector, x ranges over the values of the feature vector in the sample data set, u is their mean, and N is the total number of samples in the data set.
5. The method of claim 1, wherein the word2vec model in (5b) is a shallow neural network model for generating word vectors: by predicting the maximum likelihood probability of the words at positions adjacent to an input word, it maps each word to an embedded vector that represents the semantic relations between word pairs, the vectors being the hidden layer of the shallow neural network.
6. The method of claim 1, wherein the mode-point search on the embedded vector matrix X_Ev of the message sequences in (6) using the MeanShift probability density estimation method is realized as follows:
(6a) randomly select a point of the embedded vector matrix X_Ev as the starting point s;
(6b) set a search radius h and compute the offsets needed to move s to each point xi within radius h, summing and averaging them to obtain the mean offset Mh(s) = (1/k) Σ (xi − s), where the sum runs over the k points xi within radius h of s;
(6c) move s along the mean offset Mh(s) to a new point s'; the length of the move is the modulus ‖Mh(s)‖;
(6d) take the new point s' as the new starting point and repeat (6b)-(6c), iterating until the mean offset ‖Mh(s)‖ is smaller than a set threshold or the iteration limit is reached; each point s' so obtained is a cluster center, and the number of cluster centers is the optimal cluster number K.
7. The method according to claim 1, wherein (7a) divides the message sequences into K sets with K-Means clustering as follows:
(7a) randomly select K points of the embedded vector matrix X_Ev as initial cluster centers {u1, u2, ..., uK};
(7b) for every point xi of the embedded vector matrix X_Ev, compute the Euclidean distance d = ‖xi − uj‖ to each cluster center uj, label xi with the class λi of the nearest center, and update the corresponding cluster;
(7c) update the cluster centers u'j by taking the mean of all points in each cluster as the new center of that cluster;
(7d) with u'j as the new cluster centers, repeat (7b)-(7c) and iterate until the cluster centers no longer change or the iteration limit is reached.
CN201910173504.6A 2019-03-07 2019-03-07 Message sequence clustering method for unknown binary private protocol Active CN109951464B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910173504.6A CN109951464B (en) 2019-03-07 2019-03-07 Message sequence clustering method for unknown binary private protocol


Publications (2)

Publication Number Publication Date
CN109951464A true CN109951464A (en) 2019-06-28
CN109951464B CN109951464B (en) 2021-05-14

Family

ID=67008531

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910173504.6A Active CN109951464B (en) 2019-03-07 2019-03-07 Message sequence clustering method for unknown binary private protocol

Country Status (1)

Country Link
CN (1) CN109951464B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106095736A (en) * 2016-06-07 2016-11-09 华东师范大学 A kind of method of field neologisms extraction
US20170024455A1 (en) * 2015-07-24 2017-01-26 Facebook, Inc. Expanding mutually exclusive clusters of users of an online system clustered based on a specified dimension
CN107015963A (en) * 2017-03-22 2017-08-04 重庆邮电大学 Natural language semantic parsing system and method based on deep neural network
CN107273352A (en) * 2017-06-07 2017-10-20 北京理工大学 A kind of word insertion learning model and training method based on Zolu functions
US20180089164A1 (en) * 2016-09-28 2018-03-29 Microsoft Technology Licensing, Llc Entity-specific conversational artificial intelligence
CN108280357A (en) * 2018-01-31 2018-07-13 云易天成(北京)安全科技开发有限公司 Data leakage prevention method, system based on semantic feature extraction
CN109165383A (en) * 2018-08-09 2019-01-08 四川政资汇智能科技有限公司 A kind of data convergence, analysis, excavation and sharing method based on cloud platform


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MOHAMAD ABDOLAHI KHARAZMI: "Text coherence new method using word2vec sentence vectors and most likely n-grams", 《2017 3RD IRANIAN CONFERENCE ON INTELLIGENT SYSTEMS AND SIGNAL PROCESSING (ICSPIS)》 *
ZHANG QI: "Research on Text Sentiment Classification Based on Language Models and Machine Learning", China Masters' Theses Full-text Database (Electronic Journal), Information Science and Technology *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110602073A (en) * 2019-09-02 2019-12-20 西安电子科技大学 Unmanned aerial vehicle flight control protocol field division method based on information theory
CN110602073B (en) * 2019-09-02 2021-05-18 西安电子科技大学 Unmanned aerial vehicle flight control protocol field division method based on information theory
CN112367325A (en) * 2020-11-13 2021-02-12 中国人民解放军陆军工程大学 Unknown protocol message clustering method and system based on closed frequent item mining
CN112367325B (en) * 2020-11-13 2023-11-07 中国人民解放军陆军工程大学 Unknown protocol message clustering method and system based on closed frequent item mining
CN112398865A (en) * 2020-11-20 2021-02-23 苏州新网天盾科技有限公司 Application layer information reasoning method under multilayer protocol nesting condition
CN114724069A (en) * 2022-04-09 2022-07-08 北京天防安全科技有限公司 Video equipment model confirming method, device, equipment and medium
CN114722961A (en) * 2022-04-20 2022-07-08 重庆邮电大学 Mixed data frame clustering method of binary protocol under zero knowledge
CN115334179A (en) * 2022-07-19 2022-11-11 四川大学 Unknown protocol reverse analysis method based on named entity recognition
CN115334179B (en) * 2022-07-19 2023-09-01 四川大学 Unknown protocol reverse analysis method based on named entity recognition
CN116016690A (en) * 2022-12-02 2023-04-25 国家工业信息安全发展研究中心 Automatic reverse analysis method and system for industrial private protocol

Also Published As

Publication number Publication date
CN109951464B (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN109951464B (en) Message sequence clustering method for unknown binary private protocol
CN107085581B (en) Short text classification method and device
US10963685B2 (en) Generating variations of a known shred
CN111506599B (en) Industrial control equipment identification method and system based on rule matching and deep learning
CN107862046A (en) A kind of tax commodity code sorting technique and system based on short text similarity
CN111274804A (en) Case information extraction method based on named entity recognition
CN113392191B (en) Text matching method and device based on multi-dimensional semantic joint learning
CN111177367B (en) Case classification method, classification model training method and related products
CN112926045B (en) Group control equipment identification method based on logistic regression model
CN116049412B (en) Text classification method, model training method, device and electronic equipment
CN113590810B (en) Abstract generation model training method, abstract generation device and electronic equipment
CN111159377B (en) Attribute recall model training method, attribute recall model training device, electronic equipment and storage medium
CN111984792A (en) Website classification method and device, computer equipment and storage medium
CN112800249A (en) Fine-grained cross-media retrieval method based on generation of countermeasure network
CN115344693B (en) Clustering method based on fusion of traditional algorithm and neural network algorithm
CN113486173A (en) Text labeling neural network model and labeling method thereof
CN108519978A (en) A kind of Chinese document segmenting method based on Active Learning
CN115953123A (en) Method, device and equipment for generating robot automation flow and storage medium
CN117235137B (en) Professional information query method and device based on vector database
CN112711944B (en) Word segmentation method and system, and word segmentation device generation method and system
CN113222059A (en) Multi-label emotion classification method using cooperative neural network chain
CN116069947A (en) Log data event map construction method, device, equipment and storage medium
CN115130475A (en) Extensible universal end-to-end named entity identification method
CN103744830A (en) Semantic analysis based identification method of identity information in EXCEL document
CN114021658A (en) Training method, application method and system of named entity recognition model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230620

Address after: Floor 9, Building 3, Yougu, No. 12, Mozhou East Road, Moling Street, Jiangning District, Nanjing, Jiangsu 211111

Patentee after: NANJING CYBER PEACE INFORMATION TECHNOLOGY CO.,LTD.

Address before: 710071 No. 2 Taibai South Road, Shaanxi, Xi'an

Patentee before: XIDIAN University