CN101282251A - Method for digging recognition characteristic of application layer protocol - Google Patents

Info

Publication number
CN101282251A
Authority
CN
China
Prior art keywords
frequent
feature
application layer
digging
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2008101060589A
Other languages
Chinese (zh)
Other versions
CN101282251B (en
Inventor
刘兴彬
杨建华
胡玥
谢高岗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN2008101060589A priority Critical patent/CN101282251B/en
Publication of CN101282251A publication Critical patent/CN101282251A/en
Application granted granted Critical
Publication of CN101282251B publication Critical patent/CN101282251B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a method for mining identification features of an application layer protocol. The method comprises the following steps: A, filtering a training packet set for the first time, encoding it, and extracting candidate protocol identification feature data; B, performing a first mining pass on the extracted candidate protocol identification feature data to obtain multi-level frequent item sets; C, performing a first filtering pass on the multi-level frequent item sets, correcting the frequency of the frequent items that remain after the first filtering pass, performing a second mining pass, and filtering a second time to obtain the final protocol identification features; D, if the byte identification rate of all final protocol identification features meets the requirement, or the total packet identification rate meets the requirement, the data of the second and subsequent packets is not mined; otherwise, the second and subsequent packets are mined iteratively until the total identification rate meets the requirement. The invention analyzes and mines a packet set and extracts all identification features of the corresponding application layer protocol, which greatly improves feature extraction efficiency and the overall identification rate.

Description

A method for mining identification features of an application layer protocol
Technical field
The present invention relates to the technical field of computer network traffic monitoring and analysis, and in particular to a method for mining identification features of an application layer protocol.
Background art
Identifying network application layer traffic is essential to network planning, network management, traffic engineering, security detection and the like. Traditional application layer identification methods rely mainly on the protocol ports that the Internet Assigned Numbers Authority (IANA) defines for applications. However, in order to hide traffic or for other reasons, applications now make extensive use of dynamic protocol ports or encrypted packet payloads, which poses a great challenge to the traditional methods.
To address this problem, Deep Packet Inspection (DPI) identification methods have been proposed. Such a method builds a library of application layer protocol identification features by finding characteristic strings in packets, and identifies traffic by feature matching.
The precondition for this method, however, is that the application layer features of a protocol are found correctly; the accuracy of the identification features has a great influence on the identification rate, the accuracy rate and the false identification rate.
At present there are mainly two methods for extracting application layer protocol features:
First, find the definition document of the application layer protocol and obtain the application layer features of the protocol from the document. However, many new application layer protocols, such as PPlive, are proprietary, so no protocol definition document can be obtained. In addition, even if a protocol is open, frequent protocol updates make it very difficult to keep the identification features up to date.
Second, capture the packets of the protocol's communication process with packet capture tools such as wireshark and tcpdump, and find the application layer features of the protocol by manually inspecting and comparing the packets of each flow. This method, however, has low efficiency and low reliability.
As new applications keep appearing, accurate application layer traffic identification requires continually finding and updating the application layer features of every protocol. Because no better feature mining method is available, manual analysis remains the most commonly used method for mining application layer protocol identification features. To improve the accuracy of their identification feature libraries, many vendors of network monitoring and analysis products maintain a large team dedicated to tracking applications regularly, including packet capture and manual protocol analysis.
However, mining identification features by manual analysis is very inefficient and cannot keep up with the continual emergence of new application layer protocols and the frequent upgrades of existing ones. Moreover, the identification features found by manual observation are usually incomplete, so the overall identification rate is low.
Summary of the invention
The object of the present invention is to provide a method for mining identification features of an application layer protocol that analyzes and mines a packet set fully automatically and can extract all identification features of the corresponding application layer protocol, greatly improving feature extraction efficiency and the overall identification rate.
To achieve this object, the method for mining identification features of an application layer protocol provided by the present invention comprises the following steps:
Step A: filter the training packet set for the first time, encode it, and extract candidate protocol identification feature data;
Step B: perform a first mining pass on the extracted candidate protocol identification feature data to obtain multi-level frequent item sets;
Step C: perform a first filtering pass on the multi-level frequent item sets; correct the frequency of the frequent items remaining after the first filtering pass and mine them a second time; then filter them a second time to obtain the final protocol identification features.
The method further comprises the following step:
Step D: if the byte identification rate of all final protocol identification features meets the requirement, or the sum of the packet identification rates meets the requirement, the data of the second and subsequent packets is not mined; otherwise, the second and subsequent packets are mined iteratively until the total identification rate meets the requirement.
Step A comprises the following steps:
A1. capture a training packet set, divide it by flow, and store it in flow structures;
A2. filter out the mixed traffic in the training packet set using a mixed-traffic filtering method;
A3. encode the application layer payload using a position-based method for encoding the bytes of the extracted packets;
A4. extract candidate protocol identification feature data from the encoded data.
In step A2, the mixed-traffic filtering method comprises the following steps:
A21. filter out the flows that conform to the HTTP protocol and the FTP protocol;
A22. filter out the TCP flows that do not have a complete three-way handshake.
In step A21, filtering out the flows that conform to the FTP protocol comprises the following steps:
filter out the packets of flow structures that communicate in PASV mode;
filter out the packets of flow structures that use port 20 or 21.
A flow structure that communicates in FTP PASV mode is determined as follows:
find the flow structures that use port 21, and determine whether any packet belonging to such a structure begins with 227;
if so, further determine whether it is a PASV mode response packet. If it is, the packet contains the IP address and port number on which the server is prepared to accept the PASV mode data connection from the client; the destination IP address of this packet is also the IP address of the client, and FTP data connections use TCP. Record these four values;
after all flow structures that use port 21 have been traversed, the flow information of all FTP PASV mode communications is obtained.
Filtering out the packets of flow structures that communicate in PASV mode comprises the following step:
compare each flow with the recorded FTP PASV mode data connection information; if four elements of the five-tuple in the flow structure are identical to those of a recorded PASV mode data connection, the flow is determined to be an FTP PASV mode flow, and all packets in the flow are discarded.
In step A3, the position-based method for encoding the bytes of the extracted packets comprises the following steps:
The value of a byte, represented in the packet by two hexadecimal digits, is encoded as a single item of 5 characters. From the left, the 1st character is I, standing for Item; the 2nd and 3rd characters give the position of the byte within the first N extracted bytes, in hexadecimal and counting from zero, with the 2nd character being 0 when the position is less than 16; the 4th and 5th characters are the two hexadecimal digits of the original byte.
In step A4, the candidate protocol identification feature data comprises the encoded information of packets having the same offset, together with related auxiliary statistical data;
extracting the candidate protocol identification feature data comprises the following steps:
A41. extract candidate protocol identification feature data from the TCP flows and import it into the transaction database;
A42. extract candidate protocol identification feature data from the UDP flows and import it into the transaction database.
Step B comprises the following steps:
Step B1: first set an initial frequent rate; multiply the initial frequent rate by the total number of transactions in the database and round the result to an integer to obtain the initial frequency. Compute the frequency of every single item in the database and filter out the single items whose frequency is less than the initial frequency. Each remaining single item is called a level-1 frequent item, and the set of all level-1 frequent items is called the level-1 frequent item set;
Step B2: select two level-(K-1) frequent items A and B from the level-(K-1) frequent item set such that the first K-2 single items of A are identical to the first K-2 single items of B and the position of the (K-1)th single item of A differs from the position of the (K-1)th single item of B. If the position of the (K-1)th single item of B is greater than that of A, append the (K-1)th single item of B to the end of A to obtain a candidate level-K frequent item. Compute the frequency of this candidate; if it is greater than or equal to the initial frequency, it is a true level-K frequent item. All level-K frequent items are obtained in this way and form the level-K frequent item set, where K ≥ 2.
A transaction comprises four attribute fields (encoded bytes, packet length, byte percentage and packet percentage) and stores the extracted data.
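Steps B1 and B2 can be illustrated with a short Python sketch. It is not the patented implementation: the function names `position` and `mine_frequent_items`, the representation of a transaction as a list of encoded single items, and the use of `int()` for the rounding step are assumptions made for illustration.

```python
from collections import Counter


def position(single_item):
    """Return the position encoded in characters 2-3 of a single item such as 'I0039'."""
    return int(single_item[1:3], 16)


def mine_frequent_items(transactions, initial_rate):
    """Return a list of dicts; element K-1 maps level-K frequent items to their frequency.

    Each transaction is a list of encoded single items (hypothetical layout).
    Rounding with int() is an assumption; the text only says the product is rounded.
    """
    threshold = int(initial_rate * len(transactions))
    tx_sets = [frozenset(t) for t in transactions]

    # Step B1: level-1 frequent items.
    counts = Counter(s for t in tx_sets for s in t)
    level = {(s,): c for s, c in counts.items() if c >= threshold}
    levels = [level] if level else []

    # Step B2: join two level-(K-1) items that share their first K-2 single items.
    while level:
        items = sorted(level)
        next_level = {}
        for a in items:
            for b in items:
                if a[:-1] == b[:-1] and position(b[-1]) > position(a[-1]):
                    candidate = a + (b[-1],)
                    freq = sum(1 for t in tx_sets if t.issuperset(candidate))
                    if freq >= threshold:
                        next_level[candidate] = freq
        if next_level:
            levels.append(next_level)
        level = next_level
    return levels
```

Because B's last single item is appended only when its position is greater than A's, every generated item keeps its positions in increasing order from left to right.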
Step C comprises the following steps:
C1. using the frequency correction and frequent item filtering methods, perform a first frequent item filtering pass on the multi-level frequent item sets, followed by frequency correction and a second frequent item filtering pass;
C2. convert the frequent items in the corrected and filtered frequent item sets into feature strings, mine the absolute packet length feature and the packet-length-and-content difference relation feature, mark the corresponding transactions, and then filter the candidate protocol identification feature data corresponding to these transactions to obtain the final protocol identification features.
Step C1 comprises the following steps:
Step C11: perform the first frequent item filtering pass on the multi-level frequent item sets;
Step C12: perform frequency correction and the second frequent item filtering pass on the multi-level frequent item sets that remain after the first pass, eliminating inclusion relations between frequent items.
Step C12 comprises the following steps:
Step C121: correct the frequency of level-1 frequent items using the following formula:
$$freq_{new} = \begin{cases} k_1 \times freq_{old}, & pos_0 = 0,\ 0.9 \le k_1 < 1 \\ f(pos_0) \times freq_{old}, & pos_0 \ne 0,\ 0 < f(pos_0) \le k_1 \end{cases}$$
where $freq_{new}$ denotes the new frequency after correction, $freq_{old}$ denotes the frequency of the frequent item before correction, $pos_i$ denotes the position of the i-th single item in a frequent item with i starting from zero, and $k$ denotes the number of single items in the frequent item. Positions of single items are numbered from zero, $k_1$ is a constant, and $f(pos_0)$ is a continuous monotonically decreasing function;
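The level-1 correction can be sketched in Python as follows. The text does not specify f, so the default used here, f(pos) = k1 / (1 + pos), is only an assumed example of a continuous monotonically decreasing function bounded by k1; the function name `correct_level1` and the default value of `k1` are likewise hypothetical.

```python
def correct_level1(freq_old, pos0, k1=0.95, f=None):
    """Correct the frequency of a level-1 frequent item whose single item sits at pos0.

    k1 must satisfy 0.9 <= k1 < 1. The default f is an assumed example only;
    the source only requires f to be continuous, monotonically decreasing,
    and to satisfy 0 < f(pos0) <= k1.
    """
    if not 0.9 <= k1 < 1:
        raise ValueError("k1 must satisfy 0.9 <= k1 < 1")
    if pos0 == 0:
        return k1 * freq_old
    if f is None:
        f = lambda pos: k1 / (1 + pos)  # assumed shape, not taken from the source
    factor = f(pos0)
    if not 0 < factor <= k1:
        raise ValueError("f(pos0) must satisfy 0 < f(pos0) <= k1")
    return factor * freq_old
```

Under either branch the corrected frequency is smaller than the original one, and with a decreasing f a single item located deeper in the payload is penalized more than one near the start.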
Step C122: correct the frequency of level-2 frequent items using the following formula:
Figure A20081010605800111
where $k_2$ is a constant and $f(pos_1 - pos_0)$ is a continuous monotonically decreasing function;
Step C123: first, compute the average distance between the single items of a frequent item, denoted $ave_{dist}$, using the following formula:
$$ave_{dist} = \frac{\sum_{i=1}^{k-1} \left( pos_i - pos_{i-1} \right)}{k-1}$$
The distance is the absolute value of the difference between the positions of two adjacent single items in the item;
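The average distance can be computed directly from the positions of the single items. The following Python sketch assumes the positions are supplied as a list in item order; the function name `average_distance` is hypothetical.

```python
def average_distance(positions):
    """Average gap between adjacent single-item positions of a frequent item.

    positions holds pos_0 .. pos_{k-1}; at least two positions are required,
    because the formula divides by k - 1.
    """
    k = len(positions)
    if k < 2:
        raise ValueError("average distance needs at least two single items")
    gaps = [abs(positions[i] - positions[i - 1]) for i in range(1, k)]
    return sum(gaps) / (k - 1)
```

An item whose single items occupy consecutive positions, such as positions 0, 1, 2 and 3, has an average distance of exactly 1, which is the case that the subsequent correction of the average distance leaves untouched.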
If $ave_{dist} \ne 1$, correct $ave_{dist}$ using the following formula:
Figure A20081010605800113
where $k_3$ and $k_4$ are constants;
Then correct the frequency of level-3 and level-4 frequent items using the following formula:
where $f_1(k)$ is a continuous monotonically increasing function of $k$; $f_2(ave_{dist})$ and $f_3(ave_{dist})$ are continuous monotonically decreasing functions of $ave_{dist}$ that satisfy $f_2(ave_{dist}) > f_3(ave_{dist})$;
Step C124: among frequent items that are in an inclusion relation, filter out the frequent item with the smaller frequency.
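Step C124 might be sketched in Python as below. The sketch treats one frequent item as included in another when its single items form a proper subset of the other's, and keeps both items when their corrected frequencies are equal; both choices, as well as the function name `filter_included_items`, are assumptions, since the text does not define them.

```python
def filter_included_items(freqs):
    """Drop the lower-frequency member of every pair of frequent items in an inclusion relation.

    freqs maps a frequent item (tuple of single items) to its corrected frequency.
    Inclusion is taken to mean "proper subset of single items" (assumption).
    """
    items = list(freqs)
    dropped = set()
    for a in items:
        for b in items:
            if a != b and set(a) < set(b):
                if freqs[a] < freqs[b]:
                    dropped.add(a)
                elif freqs[b] < freqs[a]:
                    dropped.add(b)
    return {item: f for item, f in freqs.items() if item not in dropped}
```

Since the comparison uses the corrected frequencies, the preceding correction steps determine whether the shorter or the longer item of such a pair survives.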
Step C2 comprises the following steps:
Step C21: convert the frequent items in the corrected and filtered frequent item sets into feature strings, retrieve from the transaction database the transactions that match each feature string, mine the absolute packet length feature and the packet-length-and-content difference relation feature, and mark the transactions that satisfy either feature;
Step C22: filter out, from the candidate protocol identification feature data corresponding to these transactions, the repeatedly mined features, weak features and suspected mixed features, to obtain the final protocol identification features.
The beneficial effects of the present invention are:
1. Analyzing the identification features of each application layer protocol with the method of the present invention is highly efficient. The technique makes periodic updating of the identification features of every protocol in the feature library practical, laying the foundation for fully and accurately identifying all traffic;
2. The completeness and reliability of the identification features obtained by the method are greatly improved over current techniques. Because the method mines identification features from a large amount of extracted and analyzed data, it obtains more complete features; and because multiple candidate features undergo multi-stage filtering and verification before the final features are produced, the results are far more reliable than features derived by current techniques from a limited number of packets;
3. The method proposes indices for measuring the correctness of application layer protocol identification features: identification rate, accuracy rate, false positive identification rate and false negative identification rate. The automatically derived features are evaluated from all of these aspects, so that they achieve a high identification rate while maintaining a high accuracy rate and a low false identification rate, and so that the correctness of automatically derived features has a basis for judgment;
4. The method not only converts the current manual analysis of protocol features into an automatic, computer-based mining process, but also improves the accuracy, completeness and reliability of the resulting identification features. Previously, the false identification rate between protocols grew as the number of identified protocols increased; this method reduces the false identification rate between protocols to a very small range, greatly improving reliability.
Description of drawings
Fig. 1 is a flowchart of the method for mining identification features of an application layer protocol according to the present invention;
Fig. 2 is a flowchart of the training data extraction step of the present invention;
Fig. 3 is a flowchart of the primary filtering step for candidate features of the present invention;
Fig. 4 is a flowchart of the secondary mining and filtering step for candidate features of the present invention.
Embodiment
To make the objects, technical solutions and advantages of the present invention clearer, the method for mining identification features of an application layer protocol is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain the present invention and not to limit it.
The present invention proposes a method for mining identification features of an application layer protocol that only needs to capture the packet set of the communication process of the application layer protocol. By analyzing and mining this packet set, all identification features of the protocol can be extracted, and both feature extraction efficiency and the overall identification rate are greatly improved over the prior art.
To better explain the method, the terms used in the present invention and the features that can be mined are first described below:
Flow: identified by a five-tuple comprising the source IP, destination IP, source port, destination port and protocol of the two communicating parties.
Training packet set: the set of captured packets of the communication process of a specific protocol; application layer protocol identification features are mined from this captured set.
Transaction: a record in the transaction database. In the present invention a transaction comprises four attribute fields (encoded bytes, packet length, byte percentage and packet percentage) and stores the extracted data.
Single item: the encoded representation of a byte $B_1$ in the application layer payload. The first character is I, standing for Item; the hexadecimal value given by the second and third characters is called the position of the item; the fourth and fifth characters give the hexadecimal value of byte $B_1$. For example: I0039.
Item: a sequence of one or more single items, separated by a punctuation mark (such as a comma); the positions of the single items in the sequence increase from left to right.
Frequency: the number of times an item occurs in the transaction database.
Frequent item: an item whose frequency is greater than or equal to the initial frequency (a preset positive integer).
Level-K frequent item: a frequent item comprising K single items.
Frequent item set: a set composed of frequent items of all levels.
Feature string: a combination of characters that occurs at fixed positions in the application layer payload and can identify the protocol.
Absolute packet length feature: if the application layer payloads of all packets that satisfy a feature string have the same length, they are said to satisfy the absolute packet length feature. This feature is used in combination with a feature string, so a packet that satisfies the absolute packet length feature necessarily satisfies a feature string as well. It effectively reduces false identification and improves identification accuracy.
Packet-length-and-content difference relation feature: for all packets that satisfy a feature string, if the length of the application layer payload and the value at a fixed position in the payload always differ by a fixed amount, they are said to satisfy the packet-length-and-content difference relation feature. This feature is likewise used in combination with a feature string, so a packet that satisfies it necessarily satisfies a feature string as well. It effectively reduces false identification and improves identification accuracy.
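The two length-based features defined above can be checked with a short Python sketch. It assumes that the payloads passed in are exactly those that already satisfy one feature string and that the "value at a fixed position" is a single byte; the function names `absolute_length` and `length_content_difference` are hypothetical.

```python
def absolute_length(payloads):
    """Return the common payload length if all payloads have it, otherwise None."""
    lengths = {len(p) for p in payloads}
    return lengths.pop() if len(lengths) == 1 else None


def length_content_difference(payloads, offset):
    """Return the constant difference between payload length and the byte at offset.

    Returns None when the difference is not the same for every payload, or when
    some payload is too short to contain the offset. Reading a single byte is an
    assumption; a multi-byte length field would need a different decoder.
    """
    if not payloads or any(len(p) <= offset for p in payloads):
        return None
    diffs = {len(p) - p[offset] for p in payloads}
    return diffs.pop() if len(diffs) == 1 else None
```

For instance, two payloads of different lengths whose second byte always equals the payload length minus 4 fail the absolute packet length check but satisfy the difference relation with a fixed difference of 4.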
The method proposed by the present invention can mine and extract the following three kinds of features: feature strings, the absolute packet length feature, and the packet-length-and-content difference relation feature. The automatic mining method is described below with reference to these targets. As shown in Fig. 1, it comprises the following steps:
Step S100: filter and encode the training packet set using the mixed-traffic filtering method and the application layer payload encoding method, and extract candidate protocol identification feature data;
As shown in Fig. 2, step S100 comprises the following steps:
Step S110: capture a training packet set, divide it by flow, and store it in flow structures;
Extract the source IP, destination IP and protocol number from the IP header of each packet. If the protocol number is 6, the transport layer uses TCP, and the source and destination port numbers are extracted from the TCP header; if the protocol number is 17, the transport layer uses UDP, and the source and destination port numbers are extracted from the UDP header; if the protocol number has any other value, the packet is discarded.
Packets that have the same source IP, destination IP, source port, destination port and protocol are grouped into one class and stored in a flow structure in order of arrival, with the five-tuple of the flow as its label.
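The grouping in step S110 can be sketched in Python as follows. The `Packet` record and the function name `group_into_flows` are hypothetical; the sketch assumes the header fields have already been parsed and follows the text literally by keying on the directional five-tuple.

```python
from collections import namedtuple

# Hypothetical parsed-packet record; field names are illustrative only.
Packet = namedtuple("Packet", "src_ip dst_ip proto src_port dst_port payload")

TCP, UDP = 6, 17  # IP protocol numbers named in the text


def group_into_flows(packets):
    """Group packets by five-tuple, preserving arrival order within each flow."""
    flows = {}
    for pkt in packets:
        if pkt.proto not in (TCP, UDP):
            continue  # any other protocol number: discard the packet
        key = (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.proto)
        flows.setdefault(key, []).append(pkt)
    return flows
```

The dictionary key serves as the flow label, and each value plays the role of the flow structure that stores the packets in order of arrival.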
Step S120: filter out the mixed traffic in the training packet set using the mixed-traffic filtering method;
Step S121: filter out the flows determined to conform to the HTTP protocol or the FTP protocol;
Many current applications embed advertisements in their pages, and these advertisements are transferred over HTTP; some applications also support file download or data transfer over HTTP or FTP. As a result, even when only one application is communicating, traffic of different protocols is produced. If an application uses either of these mechanisms, the HTTP or FTP traffic produced during communication generally accounts for a large share of the total traffic. If it is not filtered out, HTTP or FTP features will be mined along with the features of the target protocol, and HTTP or FTP traffic will then be identified as the target protocol during real-time identification.
Therefore, when capturing the training packet set, only one application is run, so that the captured packets are essentially those of a single application layer protocol, and the flows that conform to the HTTP and FTP protocols are filtered out.
Experiments show that the mixed traffic produced by other application layer protocols currently accounts for a very small share of the total traffic and does not greatly affect the mining result; it can be removed by weak-feature filtering.
Step S121A: first filter out the flows of the HTTP protocol;
Examine each TCP flow structure in turn, and discard the packets contained in the structure if the following two conditions are both satisfied: 1) the flow structure uses port 80; 2) the 4th packet in the structure begins with any one of GET, HEAD, POST, PUT, PATCH, COPY, MOVE, DELETE, LINK, UNLINK, OPTION.
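The two-condition HTTP check can be sketched in Python as follows. The function name `is_http_flow` and its argument layout (the two port numbers plus the packet payloads of the flow structure in arrival order) are assumptions; the method list is copied from the text.

```python
# Method tokens as listed in the text; "OPTION" also matches "OPTIONS" by prefix.
HTTP_METHODS = (b"GET", b"HEAD", b"POST", b"PUT", b"PATCH", b"COPY",
                b"MOVE", b"DELETE", b"LINK", b"UNLINK", b"OPTION")


def is_http_flow(src_port, dst_port, payloads):
    """Return True when a TCP flow satisfies both HTTP conditions of step S121A.

    payloads lists the application layer payload of each packet in arrival
    order, including the three handshake packets (which carry no payload).
    """
    if 80 not in (src_port, dst_port):
        return False
    if len(payloads) < 4:
        return False
    return payloads[3].startswith(HTTP_METHODS)
```

The 4th packet is examined because the first three packets of a TCP flow are the three-way handshake, so the 4th is the first one that carries a request line.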
Step S121B: then filter out the flows of the FTP protocol;
(1) filter out the packets of flow structures that communicate in PASV mode;
(2) filter out the packets of flow structures that use port 20 or 21;
A flow structure that communicates in FTP PASV mode is determined as follows:
First find the flow structures that use port 21, and determine whether any packet belonging to such a structure begins with 227;
If so, further determine whether it is a PASV mode response packet (that is, whether it follows the format of an FTP response packet). If it is, the packet contains the IP address and port number on which the server is prepared to accept the PASV mode data connection from the client; the destination IP address of this packet is also the client's IP address, and FTP data connections use TCP. Four elements of the five-tuple of the FTP PASV mode flow are thus known, and these four values are recorded. After all flow structures that use port 21 have been traversed, the flow information of all FTP PASV mode communications is obtained;
Packets of flow structures that use port 20 are filtered out directly.
Each flow is compared with the recorded FTP PASV mode data connection information; if four elements of the five-tuple in the flow structure are identical to those of a recorded PASV mode data connection, the flow is determined to be an FTP PASV mode flow, and all packets in the flow are discarded.
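The PASV handling can be sketched in Python as follows. The sketch assumes the common reply layout `227 ... (h1,h2,h3,h4,p1,p2)` with parentheses, where the port is p1 × 256 + p2; the function names `parse_pasv_reply` and `is_pasv_data_flow` and the record layout are hypothetical.

```python
import re

# Assumed reply layout: "227 <text> (h1,h2,h3,h4,p1,p2)".
PASV_RE = re.compile(
    rb"^227 [^(]*\((\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3})\)"
)


def parse_pasv_reply(payload, client_ip):
    """Return (server_ip, server_port, client_ip, 6) for a PASV reply, else None.

    client_ip is the destination IP address of the packet carrying the reply.
    """
    match = PASV_RE.match(payload)
    if match is None:
        return None
    nums = [int(x) for x in match.groups()]
    if any(n > 255 for n in nums):
        return None
    server_ip = ".".join(str(n) for n in nums[:4])
    server_port = nums[4] * 256 + nums[5]
    return (server_ip, server_port, client_ip, 6)


def is_pasv_data_flow(five_tuple, records):
    """Check whether four elements of a flow's five-tuple match a recorded PASV connection."""
    src_ip, dst_ip, src_port, dst_port, proto = five_tuple
    return ((dst_ip, dst_port, src_ip, proto) in records
            or (src_ip, src_port, dst_ip, proto) in records)
```

Both directions of the data connection are tested, because the flow structures described above are keyed by a directional five-tuple; the client's ephemeral port is the one element that is not known in advance.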
Step S122: filter out the TCP flows that do not have a complete three-way handshake;
Based on the flag bits of the TCP three-way handshake packets, determine whether each TCP flow has a complete three-way handshake, and filter out the TCP flows that lack the complete three-way handshake packets;
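A handshake check based on the flag bits might look like the following Python sketch. It assumes that the TCP flag bytes of the first packets of a connection, with both directions merged in arrival order, are available as integers; the function name `has_complete_handshake` is hypothetical.

```python
SYN = 0x02  # TCP SYN flag bit
ACK = 0x10  # TCP ACK flag bit


def has_complete_handshake(flags):
    """Return True when the first three packets form SYN, SYN+ACK, ACK.

    flags is the sequence of TCP flag bytes in arrival order (assumed layout).
    Only the SYN and ACK bits are inspected; other bits are ignored.
    """
    if len(flags) < 3:
        return False
    first, second, third = (f & (SYN | ACK) for f in flags[:3])
    return first == SYN and second == (SYN | ACK) and third == ACK
```

A capture that starts in the middle of a connection fails this check, so its packets never reach the encoding and mining steps.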
Step S130: encode the application layer payload using the position-based method for encoding the bytes of the extracted packets;
In a protocol header, identical data at different positions has different meanings, and much of the content analyzed by the present invention involves protocol headers; moreover, the results of frequent item set mining on unencoded data would be meaningless for the problem addressed here. To solve this, the present invention proposes a position-based method for encoding the bytes of the extracted packets and uses it to encode all packet contents. The encoding not only distinguishes identical bytes at different positions, but is also essential to the subsequent filtering of candidate identification features and the automatic generation of identification features.
The encoding method is as follows: the value of a byte, originally represented in the packet by two hexadecimal digits, is encoded as a single item of 5 characters. From the left, the 1st character is I, standing for Item; the 2nd and 3rd characters give the position of the byte within the first N extracted bytes, in hexadecimal and counting from zero, with the 2nd character being 0 when the position is less than 16; the 4th and 5th characters are the two hexadecimal digits of the original byte.
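The encoding can be sketched in a few lines of Python. The function name `encode_payload` and the use of uppercase hexadecimal digits are assumptions; the limit of 256 bytes follows from the two hexadecimal digits reserved for the position.

```python
def encode_payload(payload, n):
    """Encode the first n bytes of an application layer payload as 5-character single items.

    Each single item is 'I' + two hex digits for the position (counted from
    zero) + two hex digits for the byte value. Uppercase hex is an assumption.
    """
    if n > 256:
        raise ValueError("two hexadecimal digits can only address 256 positions")
    return ["I%02X%02X" % (pos, byte) for pos, byte in enumerate(payload[:n])]
```

For example, `encode_payload(b"9GET", 3)` yields `['I0039', 'I0147', 'I0245']`: the byte 0x39 at position 0 becomes I0039, which matches the single item example given in the term definitions.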
Step S140: extract candidate protocol identification feature data from the encoded data.
The candidate protocol identification feature data comprises the encoded information of packets at the same offset, together with related auxiliary statistics.
The filtered streams are divided into TCP streams and UDP streams according to the upper-layer protocol type in the IP datagram header, and candidate protocol identification feature data is extracted from each, as follows:
Step S141: extract candidate protocol identification feature data from the TCP streams and import it into the transaction database;
For the i-th packet of each TCP stream (that is, packets at the same offset), the following are extracted: the first N encoded bytes of the i-th packet, the packet length of the i-th packet, the percentage of the stream's total bytes in the total bytes of all TCP streams with a complete three-way handshake in the training packet set, and the percentage of the stream's total packet count in the total packet count of all such TCP streams in the training packet set.
The data is then imported into different tables of the transaction database according to the value of i, preferably with 4 ≤ i ≤ (n+4). Extraction starts from the 4th packet so that the three-way handshake packets that establish the connection are not extracted. The maximum value of i is (n+4), meaning that n packets are extracted in total; n packets are extracted so that, when the information mined from the 1st packet is insufficient to identify the protocol effectively, mining can continue iteratively on the 2nd, 3rd, and up to the n-th packet;
Step S142: extract candidate protocol identification feature data from the UDP streams and import it into the transaction database.
Extraction from UDP streams is similar to extraction from TCP streams, except that 1 ≤ i ≤ n, because UDP has no connection-establishment process;
Step S200: mine multi-level frequent item sets from the extracted candidate protocol identification feature data;
Feature strings (for example, a fixed version number or state) are among the most common identification features and generally appear in the protocol header. Each record extracted in the previous step is very likely protocol header data. If frequently occurring character combinations can be found in this data, those strings are likely to be features that identify the protocol. The problem of extracting feature strings can therefore be converted into the problem of finding frequent item sets in the encoded data.
The 1-level frequent item set is mined as follows. First set an initial frequent rate; multiply it by the total number of transactions in the transaction database and round the result to obtain the initial frequent degree. Compute the frequent degree of each single item in the database and filter out the single items whose frequent degree is less than the initial frequent degree. Each remaining single item is called a 1-level frequent item, and the set of all 1-level frequent items is called the 1-level frequent item set;
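The 1-level step can be sketched as follows. The function name `level1_frequent_items` is hypothetical, and rounding by truncation (`int`) is an assumption, since the text only says that the product is rounded.

```python
from collections import Counter


def level1_frequent_items(transactions, initial_rate):
    """Return (initial frequent degree, {single item: frequent degree}).

    transactions: list of lists of encoded items such as 'I0038'.
    Truncation is an assumed reading of "round" in the description.
    """
    min_degree = int(initial_rate * len(transactions))
    # Count each item at most once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_degree}
    return min_degree, frequent


if __name__ == "__main__":
    toy = [["I0038", "I0100"], ["I0038", "I0101"], ["I0039", "I0100"]]
    print(level1_frequent_items(toy, 0.67))
```

With an initial frequent rate of 0.02 and 804 transactions, the same arithmetic gives an initial frequent degree of 16.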
A K-level (K ≥ 2) frequent item is mined as follows. Select two (K-1)-level frequent items A and B from the (K-1)-level frequent item set such that the first (K-2) single items of A are identical to the first (K-2) single items of B, and the position of the (K-1)-th single item of A differs from the position of the (K-1)-th single item of B. If the position of the (K-1)-th single item of B is greater than that of A, append the (K-1)-th single item of B to A to obtain a candidate K-level frequent item (consisting of K single items). Compute the frequent degree of this candidate; if it is greater than or equal to the initial frequent degree, it is a true K-level frequent item. All K-level frequent items obtained in this way form the K-level frequent item set.
For example, a 2-level frequent item is mined as follows: choose any two 1-level frequent items from the 1-level frequent item set and combine them in position order into a candidate 2-level frequent item, then scan the transaction database and compute the frequent degree of the candidate. If it is greater than or equal to the initial frequent degree, the candidate is a true 2-level frequent item. All 2-level frequent items obtained in this way form the 2-level frequent item set.
If a 2-level frequent item exists, it must be a combination of two 1-level frequent items. Because of the encoding used in the present invention, the single items of a 2-level frequent item must be arranged from left to right in ascending order of position rather than as an arbitrary combination of 1-level frequent items. This greatly reduces the number of combinations and improves processing efficiency.
A 3-level frequent item is mined as follows. Select two 2-level frequent items A and B from the 2-level frequent item set such that the first single item of A is identical to the first single item of B and the position of the second single item of A differs from the position of the second single item of B. If the position of the second single item of B is greater than that of A, append the second single item of B to A to obtain a candidate 3-level frequent item (consisting of three single items). Compute the frequent degree of this candidate; if it is greater than or equal to the initial frequent degree, it is a true 3-level frequent item. All 3-level frequent items obtained in this way form the 3-level frequent item set.
Frequent items of level 4 and higher are mined in the same way.
The iteration ends when no longer frequent item can be found. The frequent items of all levels together form the frequent item sets whose frequent degree is greater than or equal to the set value. A frequent item that contains relatively more single items is called a higher-level frequent item, and one that contains fewer single items is called a lower-level frequent item.
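The level-by-level procedure above resembles Apriori-style candidate generation with an added position-ordering constraint. The sketch below is one way to express it under stated assumptions: frequent items are represented as tuples of encoded items, and the helper names `position`, `degree`, `next_level`, and `mine_all` are hypothetical.

```python
def position(item: str) -> int:
    """Position encoded in characters 2-3 of an item such as 'I0a5c'."""
    return int(item[1:3], 16)


def degree(candidate, transactions) -> int:
    """Number of transactions (sets of items) containing every item of candidate."""
    needed = set(candidate)
    return sum(1 for t in transactions if needed <= t)


def next_level(prev_level, transactions, min_degree):
    """Build K-level frequent items from (K-1)-level ones."""
    result = {}
    prev = sorted(prev_level)
    for a in prev:
        for b in prev:
            # Same first K-2 items, and B's last item lies at a later position.
            if a[:-1] == b[:-1] and position(b[-1]) > position(a[-1]):
                cand = a + (b[-1],)
                d = degree(cand, transactions)
                if d >= min_degree:
                    result[cand] = d
    return result


def mine_all(transactions, min_degree):
    """Return a list of dicts, one per level, mapping frequent item -> degree."""
    tsets = [set(t) for t in transactions]
    counts = {}
    for t in tsets:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    level = {k: v for k, v in counts.items() if v >= min_degree}
    levels = []
    while level:  # stop when no longer frequent item can be found
        levels.append(level)
        level = next_level(level, tsets, min_degree)
    return levels
```

Because a candidate is only formed when the appended item has a strictly greater position, each frequent item is generated once, in ascending position order.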
Step S300: using the frequent-degree correction and frequent-item filtering methods, apply a first frequent-item filtering to the multi-level frequent item sets, correct the frequent degrees of the multi-level frequent items that remain, apply a second frequent-item filtering, and perform secondary mining to obtain the final protocol identification features;
Step S310: using the frequent-degree correction and frequent-item filtering methods, apply the first frequent-item filtering to the multi-level frequent item sets, followed by the frequent-degree correction and the second frequent-item filtering;
This step comprises a first frequent-item filtering process, and a frequent-degree correction and second frequent-item filtering process.
The first frequent-item filtering process filters out redundant candidate features and all-zero candidate features;
The frequent-degree correction and second frequent-item filtering process corrects the frequent degrees of frequent items of level 4 and below, and then uses the second frequent-item filtering to filter out features of relatively low reliability.
The frequent-degree correction strongly reduces the frequent degree of unreliable features, ensuring that they are filtered out in the second frequent-item filtering and greatly reducing the number of candidate features. This greatly improves identification accuracy and at the same time reduces the probability that features overlap between different protocols.
As shown in Figure 3, step S310 comprises the following steps.
Step S311: apply the first frequent-item filtering to the multi-level frequent item sets;
The multi-level frequent item sets obtained in step S200 are highly redundant, that is, they contain many redundant features, so a first frequent-item filtering is required.
A redundant feature is defined as follows: if the frequent degree of a frequent item equals the frequent degree of a higher-level frequent item that contains it, the frequent item is called a redundant feature.
This is because the existence of a higher-level frequent item implies the existence of every frequent item that is a subset of it, each with a frequent degree not less than that of the higher-level frequent item.
The first frequent-item filtering checks the frequent items in turn, starting from the 1-level frequent items, and filters out all redundant features. Because this is an equivalence-preserving filtering, no feature is lost.
In addition, the first frequent-item filtering also filters out frequent items whose content is all zero.
An all-zero frequent item is one in which the last two characters of every single item are both zero.
First, an all-zero feature is very unlikely to be usable in practice, because it identifies very little information. Second, all-zero features are unreliable in practice: many protocols use zero padding, and large numbers of all-zero frequent items appear when mining features for many protocols. If they are not filtered out, they cause cross-protocol misidentification.
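A minimal sketch of the first frequent-item filtering follows. It assumes that frequent items are tuples of encoded items mapped to their frequent degree; the function name `first_filter` and the toy values are illustrative only.

```python
def first_filter(frequent):
    """Drop all-zero frequent items and redundant features.

    frequent: dict mapping tuple of encoded items -> frequent degree.
    A frequent item is redundant when some frequent item that strictly
    contains it has the same frequent degree.
    """
    kept = {}
    for items, deg in frequent.items():
        # All-zero: the last two characters of every single item are '00'.
        if all(item[3:] == "00" for item in items):
            continue
        own = set(items)
        redundant = any(
            own < set(other) and other_deg == deg
            for other, other_deg in frequent.items()
        )
        if not redundant:
            kept[items] = deg
    return kept


if __name__ == "__main__":
    toy = {
        ("I0032",): 27,
        ("I0032", "I0132"): 27,
        ("I0038",): 723,
        ("I0038", "I0100"): 720,
        ("I0100", "I0200"): 720,
    }
    print(first_filter(toy))
```

In the toy input, `("I0032",)` is dropped because the longer item containing it has the same frequent degree, and `("I0100", "I0200")` is dropped because it is all zero.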
Step S312: correct the frequent degrees of the multi-level frequent item sets that remain after the first frequent-item filtering, and then apply the second frequent-item filtering to eliminate containment relations between frequent items;
After the first frequent-item filtering, frequent items that contain one another may still exist. This must not be allowed in the final result: otherwise the higher-level frequent item is automatically masked, because the lower-level frequent item has a higher frequent degree, even though the higher-level frequent item is less likely to cause misidentification.
Therefore, for frequent items of level 4 and below, the present invention corrects the frequent degree based on the starting position of the frequent item and the average distance between its single items.
Lower-level frequent items are more likely than higher-level ones to cause misidentification, and there are generally more of them. Some lower-level frequent items have a large starting position or large distances between their single items; such frequent items generally do not qualify as protocol identification features, and their presence produces a high misidentification rate. Their frequent degrees can therefore be reduced according to these values so that they are filtered out. Length 4 is used as the boundary for correction because frequent items longer than 4 are generally few in number and relatively low in frequent degree, and the maximum probability that such an item causes misidentification is 1/2^40, so leaving them unfiltered does not cause significant misidentification.
Further, the frequent-degree correction for frequent items of level 4 and below and the second frequent-item filtering comprise the following steps:
Step S3121: correct the frequent degree of 1-level frequent items using formula (1);
$$
freq_{new} =
\begin{cases}
k_1 \times freq_{old}, & pos_0 = 0,\ 0.9 \le k_1 < 1 \\
f(pos_0) \times freq_{old}, & pos_0 \ne 0,\ 0 < f(pos_0) \le k_1
\end{cases}
\qquad (1)
$$
Here $freq_{new}$ is the frequent degree after correction, $freq_{old}$ is the frequent degree of the frequent item before correction, $pos_i$ is the position of the i-th single item in a frequent item (i counts from zero), and k is the number of single items in the frequent item. Single-item positions are numbered from zero. $k_1$ is a constant and $f(pos_0)$ is a continuous, monotonically decreasing function.
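Formula (1) leaves the constant $k_1$ and the function $f$ open. The sketch below fills them in only for illustration: the value `k1=0.95` and the exponential form of `f` are assumptions, chosen so that `f` is continuous, monotonically decreasing, and bounded by `k1` as the formula requires.

```python
import math


def correct_level1(freq_old: float, pos0: int, k1: float = 0.95, decay: float = 0.15) -> float:
    """Apply formula (1) to a 1-level frequent item.

    k1 and decay are illustrative values; the text only requires
    0.9 <= k1 < 1 and a continuous, decreasing f with 0 < f(pos0) <= k1.
    """
    if pos0 == 0:
        return k1 * freq_old
    f = k1 * math.exp(-decay * pos0)  # hypothetical choice of f(pos0)
    return f * freq_old


if __name__ == "__main__":
    for pos in (0, 1, 4, 9):
        print(pos, round(correct_level1(100, pos), 2))
```

Any other decreasing function that respects the bound could be substituted; the effect is that a single item located deeper in the payload loses more of its frequent degree.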
Step S3122: correct the frequent degree of 2-level frequent items using formula (2);
[Formula (2): provided as image A20081010605800201 in the original publication; not reproduced in this text]
Here $k_2$ is a constant and $f(pos_1 - pos_0)$ is a continuous, monotonically decreasing function.
For a frequent item whose starting position is within the first four positions and whose two single items are adjacent, the frequent degree is not reduced but raised, in order to eliminate 1-level frequent items whose frequent degree is slightly higher than its own. Raising this frequent degree does not affect the higher-level frequent items that contain it, because even without the increase its frequent degree is no smaller than that of any longer frequent item that contains it.
Likewise, the frequent degree is increased for the same purpose in the steps below.
Step S3123: correct the frequent degree of 3-level and 4-level frequent items using formula (5);
First, use formula (3) to compute the average distance between the single items of a frequent item, denoted $ave_{dist}$.
$$
ave_{dist} = \frac{\sum_{i=1}^{k-1} \left( pos_i - pos_{i-1} \right)}{k-1} \qquad (3)
$$
Here, distance means the absolute value of the difference between the positions of two adjacent single items in the frequent item.
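Formula (3) translates directly into code. In this sketch the helper names `position` and `average_distance` are hypothetical, and the frequent item is assumed to contain at least two single items.

```python
def position(item: str) -> int:
    """Position encoded in characters 2-3 of an item such as 'I0a5c'."""
    return int(item[1:3], 16)


def average_distance(items) -> float:
    """Average distance between adjacent single items, as in formula (3)."""
    pos = [position(item) for item in items]
    if len(pos) < 2:
        raise ValueError("formula (3) needs at least two single items")
    return sum(abs(b - a) for a, b in zip(pos, pos[1:])) / (len(pos) - 1)


if __name__ == "__main__":
    print(average_distance(["I0038", "I0100", "I0200"]))
    print(average_distance(["I0038", "I0300", "I0500"]))
```

A frequent item made of contiguous bytes has an average distance of exactly 1, which is the case that formula (4) leaves uncorrected.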
If $ave_{dist} \ne 1$, use formula (4) to correct $ave_{dist}$, so that the frequent degree is reduced appropriately and not by too much at once;
[Formula (4): provided as image A20081010605800203 in the original publication; not reproduced in this text]
Here $k_3$ and $k_4$ are constants.
The correction formula for the frequent degree of 3-level and 4-level frequent items is formula (5):
[Formula (5): provided as image A20081010605800204 in the original publication; not reproduced in this text]
Here $f_1(k)$ is a continuous, monotonically increasing function of k; $f_2(ave_{dist})$ and $f_3(ave_{dist})$ are continuous, monotonically decreasing functions of $ave_{dist}$ that satisfy $f_2(ave_{dist}) > f_3(ave_{dist})$.
Step S3124: among frequent items that have a containment relation, filter out the one with the smaller frequent degree.
After the frequent-degree correction, compare the frequent degrees of any two frequent items that have a containment relation and, using the second frequent-item filtering method, filter out the one with the smaller frequent degree.
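The text does not say in which order pairs are compared during the second frequent-item filtering. The sketch below adopts one possible reading as an assumption: frequent items are visited in descending order of corrected frequent degree, and an item is kept only if it has no containment relation with an item already kept. The name `second_filter` is hypothetical.

```python
def second_filter(frequent):
    """Greedy reading of the second frequent-item filtering.

    frequent: dict mapping tuple of encoded items -> corrected frequent degree.
    Items are visited from highest to lowest degree (an assumed order); an item
    that contains, or is contained in, an already kept item is filtered out.
    """
    kept = {}
    for items, deg in sorted(frequent.items(), key=lambda kv: -kv[1]):
        own = set(items)
        if any(own < set(k) or set(k) < own for k in kept):
            continue
        kept[items] = deg
    return kept


if __name__ == "__main__":
    toy = {
        ("I0038",): 686,
        ("I0038", "I0100", "I0200"): 720,
        ("I0038", "I0100", "I0200", "I0b00"): 718,
        ("I040d",): 137,
        ("I0100", "I0200", "I040d"): 228,
    }
    print(second_filter(toy))
```

Under this reading, a short item whose corrected frequent degree has fallen below that of a longer item containing it is removed, so the longer and more specific item survives.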
Step S320: convert the frequent items in the corrected and filtered frequent item sets into feature strings, mine absolute packet-length features and packet-length-to-content difference relation features, mark the corresponding transactions, and then filter the candidate protocol identification feature data corresponding to these transactions to obtain the final protocol identification features.
This step performs secondary mining on the transactions that satisfy a feature string, mining absolute packet-length features and packet-length-to-content difference relation features and marking the transactions that satisfy them. It then filters weak features, duplicate-mined features, and suspected mixed-in features out of the candidate protocol identification feature data corresponding to these transactions, which reduces the misidentification rate and yields the final protocol identification features.
This step increases the accuracy and reliability of the protocol identification features, greatly improving identification accuracy and reducing the misidentification rate. The method proposed here for mining absolute packet-length features and packet-length-to-content difference relation features is what ensures accurate protocol identification; without it, the misidentification rate between protocols would rise sharply. The filtering of weak features, duplicate-mined features, and suspected mixed-in features proposed in this step also reduces the misidentification rate between protocols to some extent.
Step S321: convert the frequent items in the corrected and filtered frequent item sets into feature strings, retrieve from the transaction database the transactions that satisfy each feature string, mine absolute packet-length features and packet-length-to-content difference relation features, and mark the transactions that satisfy these features;
Each frequent item that remains after the first frequent-item filtering, the frequent-degree correction, and the second frequent-item filtering is converted into a feature string. Each transaction in the transaction database is scanned in turn. For a transaction that satisfies the feature string, first record the actual packet length of the corresponding packet. Next, take the actual content of the first N extracted bytes (the last two characters of each single item), starting from byte 0, and combine each pair of adjacent bytes into (N-1) big-endian two-byte unsigned integers and (N-1) little-endian two-byte unsigned integers; two storage structures, each able to hold 2n × (N-1) values, are set up to store the counts. Subtract each of the (N-1) big-endian unsigned integers from the packet length of the transaction; if the result lies in [-n, n-1], increment the count at the corresponding position of the storage structure and record that position. The (N-1) little-endian unsigned integers are processed in the same way in their own structure.
If an absolute packet-length feature exists, the actual packet lengths of the packets corresponding to all, or the great majority, of the transactions that satisfy the feature string are the same value. If a packet-length-to-content difference relation feature exists, for example a difference of 5 between the packet length and the big-endian two-byte unsigned integer starting at position 2, then for all or the great majority of the transactions that satisfy the feature string, the actual packet length differs from the big-endian two-byte unsigned integer starting at position 2 by exactly 5.
After all transactions in the transaction database have been scanned once, it can be determined whether, among all transactions that satisfy the feature string, the great majority (a set percentage) have equal packet lengths. If so, the absolute packet-length feature is satisfied and is appended to the feature string. If not, determine whether, for the great majority of the transactions, the packet length differs by a fixed value from the two-byte value at a fixed starting position. If so, the packet-length-to-content difference relation feature is satisfied. In that case the feature string is corrected first, by removing the single items whose positions coincide with the position of the difference relation, and the difference relation feature is then appended.
The transaction database is scanned again to recompute, for the corrected feature, its frequent degree, the percentage of the total packet count of all streams satisfying the feature in the total packet count of the protocol concerned (TCP or UDP), and the percentage of the total bytes of all streams satisfying the feature in the total bytes of that protocol; the transactions that satisfy the new feature are marked.
Step S322: filter duplicate-mined features, weak features, and suspected mixed-in features out of the candidate protocol identification feature data corresponding to these transactions to obtain the final protocol identification features.
The candidate features that remain after the previous step are called second-level candidate features. For any two second-level candidate features, record the number of times they occur together in the transaction database. If this number is at least $p_1$% of the frequent degree of the candidate feature with the smaller frequent degree, where $p_1$ is a preset value with 80 ≤ $p_1$ ≤ 100, the two candidate features are regarded as duplicates, found from essentially the same transactions. The one with the smaller frequent degree is called a duplicate-mined feature and is filtered out.
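The duplicate check can be sketched with sets of transaction identifiers. The representation (a dict from feature name to the set of transactions it matches), the function name `filter_duplicates`, and the default `p1=80.0` are assumptions made for the example.

```python
def filter_duplicates(matches, p1: float = 80.0):
    """Remove duplicate-mined features.

    matches: dict mapping a feature name -> set of transaction ids that satisfy it
    (the size of the set plays the role of the frequent degree).
    """
    removed = set()
    # Visit features from highest to lowest frequent degree.
    names = sorted(matches, key=lambda name: -len(matches[name]))
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in removed or b in removed:
                continue
            smaller = min(len(matches[a]), len(matches[b]))
            together = len(matches[a] & matches[b])
            if smaller and together >= smaller * p1 / 100:
                removed.add(b)  # b has the smaller (or equal) frequent degree
    return {name: ids for name, ids in matches.items() if name not in removed}


if __name__ == "__main__":
    toy = {
        "A": set(range(1, 11)),
        "B": set(range(1, 9)) | {11},
        "C": {20, 21},
    }
    print(sorted(filter_duplicates(toy)))
```

In the toy data, features A and B share 8 transactions, which is more than 80% of B's frequent degree of 9, so B is treated as a duplicate of A.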
After this filtering, check in turn whether each second-level candidate feature consists of only one 1-level frequent item. If so, it is called a weak feature: because its probability of occurring by chance is 1/256, it can cause a very high misidentification rate and is filtered out. For each remaining second-level candidate feature, determine whether the total packets and total bytes of all streams satisfying it account for less than $p_2$% (where $p_2$ is a preset value) of the total packets and total bytes of the protocol, respectively. If so, the feature is called a suspected mixed-in feature; a likely cause is that traffic of other protocols was mixed into the training packet set, and the feature is filtered out. (Experimental results show that many candidate features covering less than 1% caused substantial misidentification; after they were filtered out, the identification rate was essentially unchanged while the misidentification rate became very low.) The remaining candidate features are the final protocol identification features of the protocol.
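A short sketch of the weak-feature and suspected mixed-in filters follows. The record layout, the function name `filter_weak_and_mixed`, and the flag `has_length_feature` are assumptions; in particular, treating a single item that carries an appended packet-length feature as not weak is only one reading of "consists of only one 1-level frequent item", and requiring both percentages to fall below $p_2$ is likewise an interpretation.

```python
def filter_weak_and_mixed(candidates, p2: float = 1.0):
    """Return the candidates that survive both filters.

    candidates: list of dicts with keys
      'items'              tuple of encoded single items,
      'packet_pct'         share of the protocol's packets covered (percent),
      'byte_pct'           share of the protocol's bytes covered (percent),
      'has_length_feature' optional bool, True if a packet-length feature is attached.
    """
    final = []
    for cand in candidates:
        weak = len(cand["items"]) == 1 and not cand.get("has_length_feature", False)
        mixed = cand["packet_pct"] < p2 and cand["byte_pct"] < p2
        if not weak and not mixed:
            final.append(cand)
    return final


if __name__ == "__main__":
    toy = [
        {"items": ("I0038",), "packet_pct": 60.0, "byte_pct": 55.0},
        {"items": ("I0038", "I0100"), "packet_pct": 60.0, "byte_pct": 55.0},
        {"items": ("I0067", "I0165"), "packet_pct": 0.4, "byte_pct": 0.2},
    ]
    print([c["items"] for c in filter_weak_and_mixed(toy)])
```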
Step S400: if the byte identification rates of all final protocol identification features meet the requirement, or the sum of the packet identification rates meets the requirement, the data of the second and subsequent packets is not mined; otherwise, the second and subsequent packets are mined iteratively until the total identification rate meets the requirement.
After correction, mining, and filtering, if the sum of the byte identification rates or of the packet identification rates of all final protocol identification features reaches $p_3$% or more (where $p_3$ is a preset value, 80 ≤ $p_3$ ≤ 100), the data of the second and subsequent packets is not mined; otherwise the second and subsequent packets are mined iteratively until the total identification rate reaches $p_3$% or more. The mined final protocol identification features are then used to generate the corresponding feature file in the feature library, after which the traffic identification program can be run for real-time online identification.
Generating the feature file in the feature library from the final protocol identification features and using a traffic identification program for real-time online identification are prior art and are therefore not described in detail in the embodiments of the present invention.
The method for mining application-layer protocol identification features of the present invention is explained further below, using as an example the automatic analysis of the application-layer protocol that Xunlei (Thunder) uses over TCP and the extraction of its identification features.
Step S100': extract the data of the Xunlei TCP streams that satisfy the conditions;
Part of the data of the first packet of each of the 804 Xunlei TCP streams that satisfy the conditions is shown in Table 1.
Table 1:
| Transaction | First N encoded bytes | Packet length | Byte percentage | Packet percentage |
|---|---|---|---|---|
| 1 | I0038,I0100,I0200,I0300,I040d,I0500,I0600,I0700,I0884,I09ab,I0a0c,I0b00 | 21 | 0.007039% | 0.00597% |
| 2 | I0038,I0100,I0200,I0300,I0472,I0500,I0600,I0700,I0864,I09e5,I0a5c,I0b00 | 122 | 0.002133% | 0.000246% |
| 3 | I0038,I0100,I0200,I0300,I040d,I0500,I0600,I0700,I0884,I09f4,I0a1e,I0b00 | 21 | 0.011092% | 0.00957% |
| 4 | I0038,I0100,I0200,I0300,I040d,I0500,I0600,I0700,I0884,I0933,I0a00,I0b00 | 21 | 0.014291% | 0.012682% |
| 5 | I0038,I0100,I0200,I0300,I040d,I0500,I0600,I0700,I0884,I095d,I0a13,I0b00 | 21 | 0.011305% | 0.011376% |
| 6 | I0038,I0100,I0200,I0300,I0472,I0500,I0600,I0700,I0864,I099d,I0a5f,I0b00 | 122 | 0.001493% | 0.000159% |
| 7 | I0032,I0132,I0230,I032d,I0453,I0565,I0672,I0776,I082d,I0955,I0a20,I0b46 | 44 | 0.001067% | 0.000103% |
| 8 | I0038,I0100,I0200,I0300,I0472,I0500,I0600,I0700,I0864,I09dc,I0a5c,I0b00 | 122 | 0.002133% | 0.000246% |
| ... | ... | ... | ... | ... |
| 803 | I0038,I0100,I0200,I0300,I0472,I0500,I0600,I0700,I0864,I09e5,I0a5c,I0b00 | 122 | 0.001493% | 0.000269% |
| 804 | I0038,I0100,I0200,I0300,I0472,I0500,I0600,I0700,I0864,I09e5,I0a5c,I0b00 | 122 | 0.001493% | 0.000269% |
Step S200': find all frequent items whose frequent degree exceeds a given value.
The initial frequent rate is set to 0.02, so the initial frequent degree = round(initial frequent rate × number of transactions) = round(0.02 × 804) = 16. All frequent items with a frequent degree greater than 16 are mined; 31680 frequent items are mined in this embodiment, some of which are listed in Table 2.
Table 2
| No. | Frequent degree | Frequent item |
|---|---|---|
| 1 | 27 | I0032 |
| 2 | 723 | I0038 |
| 3 | 248 | I040d |
| 4 | 29 | I0955 |
| 5 | 29 | I0a20 |
| ... | ... | ... |
| 51 | 27 | I0032,I0132 |
| 52 | 27 | I0032,I0230 |
| 53 | 640 | I0000,I0200 |
| ... | ... | ... |
| 397 | 27 | I0032,I0132,I0230 |
| 398 | 27 | I0032,I0132,I032d |
| ... | ... | ... |
| 5423 | 720 | I0038,I0100,I0200,I0500,I0600 |
| 5424 | 720 | I0038,I0100,I0200,I0500,I0700 |
| 5425 | 720 | I0038,I0100,I0200,I0300,I0500,I0600,I0700 |
| 5426 | 228 | I0100,I0200,I0300,I040d,I0500,I0600,I0700 |
| ... | ... | ... |
| 10000 | 718 | I0038,I0100,I0200,I0300,I0500,I0600,I0700,I0b00 |
| ... | ... | ... |
| 31678 | 84 | I0038,I0100,I0200,I0300,I0472,I0500,I0600,I0700,I0864,I09dc,I0a5c,I0b00 |
| 31679 | 63 | I0038,I0100,I0200,I0300,I0472,I0500,I0600,I0700,I0864,I09e5,I0a5c,I0b00 |
| 31680 | 20 | I0067,I0165,I0274,I032f,I040d,I050a,I066c,I0765,I086e,I0967,I0a74,I0b68 |
Step S300': filter the candidate features;
Step S310' (first filtering): apply the first filtering to the frequent items obtained in the previous step:
The frequent items obtained in step S200' are clearly highly redundant and must be filtered. The first filtering removes redundant features and frequent items whose content is all zero (for the filtering method, see the embodiment above).
In this example the first filtering removes 31649 frequent items in total, greatly reducing the number of candidate features. Because the filtering of redundant features is equivalence-preserving, no information is lost. 31 frequent items remain after filtering, some of which are listed in Table 3.
Table 3:
| No. | Frequent degree | Frequent item |
|---|---|---|
| 2 | 723 | I0038 |
| 3 | 248 | I040d |
| 4 | 29 | I0955 |
| 5 | 29 | I0a20 |
| 5425 | 720 | I0038,I0100,I0200,I0300,I0500,I0600,I0700 |
| 5426 | 228 | I0100,I0200,I0300,I040d,I0500,I0600,I0700 |
| 10000 | 718 | I0038,I0100,I0200,I0300,I0500,I0600,I0700,I0b00 |
| ... | ... | ... |
| 31678 | 84 | I0038,I0100,I0200,I0300,I0472,I0500,I0600,I0700,I0864,I09dc,I0a5c,I0b00 |
| 31679 | 63 | I0038,I0100,I0200,I0300,I0472,I0500,I0600,I0700,I0864,I09e5,I0a5c,I0b00 |
| 31680 | 20 | I0067,I0165,I0274,I032f,I040d,I050a,I066c,I0765,I086e,I0967,I0a74,I0b68 |
Step S320' (frequent-degree correction and second frequent-item filtering): for frequent items of level 4 and below, correct the frequent degree based on the starting position and the average distance between single items, and then apply the second filtering.
For the correction method, see the embodiment above. The frequent items after correction are shown in Table 4.
Table 4:
| No. | Frequent degree | Frequent item |
|---|---|---|
| 2 | 686 | I0038 |
| 3 | 137 | I040d |
| 4 | 10 | I0955 |
| 5 | 9 | I0a20 |
| 5425 | 720 | I0038,I0100,I0200,I0300,I0500,I0600,I0700 |
| 5426 | 228 | I0100,I0200,I0300,I040d,I0500,I0600,I0700 |
| 10000 | 718 | I0038,I0100,I0200,I0300,I0500,I0600,I0700,I0b00 |
| ... | ... | ... |
| 31678 | 84 | I0038,I0100,I0200,I0300,I0472,I0500,I0600,I0700,I0864,I09dc,I0a5c,I0b00 |
| 31679 | 63 | I0038,I0100,I0200,I0300,I0472,I0500,I0600,I0700,I0864,I09e5,I0a5c,I0b00 |
| 31680 | 20 | I0067,I0165,I0274,I032f,I040d,I050a,I066c,I0765,I086e,I0967,I0a74,I0b68 |
In Table 4, the frequent degrees of Nos. 2, 3, 4, and 5 have been corrected; the other frequent items contain more than four single items, so their frequent degrees are not corrected.
After the frequent-degree correction, for any two frequent items that have a containment relation, the one with the smaller frequent degree is filtered out. The result after filtering is listed in Table 5.
Table 5:
| No. | Frequent degree | Frequent item |
|---|---|---|
| 5425 | 720 | I0038,I0100,I0200,I0300,I0500,I0600,I0700 |
| 5426 | 228 | I0100,I0200,I0300,I040d,I0500,I0600,I0700 |
| 31680 | 20 | I0067,I0165,I0274,I032f,I040d,I050a,I066c,I0765,I086e,I0967,I0a74,I0b68 |
Step S400': secondary mining and filtering of candidate features;
Step S410': for each frequent item that remains after the second filtering, determine whether it also satisfies an absolute packet-length relation or a packet-length-to-content difference relation. If such a relation exists, the newly found feature is added and the old feature (the frequent item) is corrected.
The frequent item No. 5425 in Table 5 is converted into a feature string with the following meaning: byte 0 (numbering from zero) of the application-layer payload is 0x38 and bytes 1, 2, 3, 5, 6, and 7 are zero. Its frequent degree is denoted Support = 720.
Taking this feature string as an example, the absolute packet-length feature is mined as follows:
For example, a structure can be defined to record the packet-length distribution of the transactions that satisfy the feature string, as shown in Table 6.
Table 6:
| Packet length | 0 | 1 | 2 | 3 | ... | 1497 | 1498 | 1499 | 1500 |
|---|---|---|---|---|---|---|---|---|---|
| Count | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 |
Scan the transaction database and, for each transaction that satisfies the feature string, record its packet length (for example, if the packet length is 21, add 1 to the count for packet length 21). After the scan, obtain the largest count, $Count_{max}$, and the corresponding packet length, $P_{len}$. If ($Count_{max}$ / Support) ≥ $P_5$% (a set percentage, such as 95%), the transactions that satisfy the feature string are considered also to satisfy the feature that the absolute packet length is $P_{len}$; this feature is appended to the feature string to form a combined feature.
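The counting scheme of Table 6 maps naturally onto a counter keyed by packet length. In this sketch the function name `absolute_length_feature` is hypothetical, and Support is taken to be the number of transactions that satisfy the feature string.

```python
from collections import Counter


def absolute_length_feature(lengths, p5: float = 95.0):
    """Return the dominant packet length, or None if no length dominates.

    lengths: packet lengths of all transactions that satisfy one feature string.
    p5: required percentage (95 is the example value given in the text).
    """
    if not lengths:
        return None
    p_len, count_max = Counter(lengths).most_common(1)[0]
    support = len(lengths)
    return p_len if count_max / support * 100 >= p5 else None


if __name__ == "__main__":
    print(absolute_length_feature([21] * 96 + [122] * 4))
    print(absolute_length_feature([21] * 50 + [122] * 50))
```

A `Counter` avoids preallocating one slot per possible packet length, while giving the same maximum count as the fixed array in Table 6.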
Taking the feature string converted from frequent item No. 5425 as an example, the packet-length-to-content difference relation feature is mined as follows:
Let N denote the maximum number of single items that a transaction in the transaction database can contain. From a transaction that contains N single items and satisfies the feature string, (N-1) big-endian unsigned integers and (N-1) little-endian unsigned integers can be extracted and combined (for the combination method, see the embodiment above). They can be represented as follows: Table 7 holds the (N-1) big-endian unsigned integers, where the sequence number also denotes the position. For example, if a transaction contains the three single items I0001, I0102, and I0203, it yields two big-endian two-byte unsigned integers and two little-endian two-byte unsigned integers. The big-endian two-byte unsigned integer combined from I0001 and I0102 is 0x0102 = 258, with sequence number 0 (the smaller of the two positions of the combined single items); the little-endian two-byte unsigned integer combined from I0001 and I0102 is 0x0201 = 513, also with sequence number 0. Table 8 holds the (N-1) little-endian two-byte unsigned integers.
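The pairing of adjacent bytes can be checked with Python's built-in `int.from_bytes`. The function name `adjacent_pairs` is hypothetical; the input is the raw byte content recovered from the last two characters of each single item.

```python
def adjacent_pairs(payload: bytes):
    """Combine each pair of adjacent bytes into two-byte unsigned integers.

    Returns (big, little): two lists of len(payload) - 1 values, where index i
    is built from bytes i and i + 1 in big-endian and little-endian order.
    """
    big = [int.from_bytes(payload[i:i + 2], "big") for i in range(len(payload) - 1)]
    little = [int.from_bytes(payload[i:i + 2], "little") for i in range(len(payload) - 1)]
    return big, little


if __name__ == "__main__":
    # Bytes 0x01, 0x02, 0x03 correspond to the items I0001, I0102, I0203.
    big, little = adjacent_pairs(bytes([0x01, 0x02, 0x03]))
    print(big, little)
```

Read as ordinary two-byte integers, bytes 0x01 and 0x02 give 0x0102 = 258 in big-endian order and 0x0201 = 513 in little-endian order.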
Table 7: The (N-1) big-endian unsigned integers
Sequence number (position)    Value (big-endian)
0                             BigVal_0
1                             BigVal_1
...                           ...
N-2                           BigVal_(N-2)
N-1                           BigVal_(N-1)
Table 8: The (N-1) little-endian unsigned integers
Sequence number (position)    Value (little-endian)
0                             LittVal_0
1                             LittVal_1
...                           ...
N-2                           LittVal_(N-2)
N-1                           LittVal_(N-1)
The packet length corresponding to each transaction in the database is denoted P_Len. Two additional global structures are needed to record the difference relation between the packet length and each of the 2 × (N-1) values above. For example, the following structure can record the difference relation between the P_Len of every transaction and the (N-1) big-endian two-byte unsigned integers, as shown in Table 9.
Table 9: Difference relation table
Sequence number (position)    -n    -(n-1)    ...    0    1    ...    (n-1)
0                             0     0         0      0    0    0      0
1                             0     0         0      0    0    0      0
...                           0     0         0      0    0    0      0
N-2                           0     0         0      0    0    0      0
N-1                           0     0         0      0    0    0      0
In Table 9, n is a positive integer. Let (P_Len - BigVal_i) = MinVal_i. If -n ≤ MinVal_i < n, the count in row i, column MinVal_i is incremented by 1. After one scan of the database, the largest count Count_Max is obtained; it lies in row Pos, column Val of the Table 9 structure. If (Count_Max / Support) ≥ P_6% (a fixed percentage, e.g. 95%), the transactions that satisfy this feature string are considered to also satisfy the feature that the difference between the packet length and the big-endian two-byte unsigned integer formed by bytes Pos and (Pos+1) of the application-layer payload is Val. Next, check whether the feature string contains items whose position equals Pos or (Pos+1); if so, delete the values at these two positions from the feature string. Finally, append the difference feature between packet length and content to the feature string to form a combined feature.
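The Table 9 counting scheme can be sketched as below. This is an illustrative reading under stated assumptions: transactions are given as hypothetical `(packet_length, payload_bytes)` pairs, the window `n = 64` and the 0.95 threshold (standing in for P_6%) are chosen only for the example, and a sparse counter replaces the fixed-size table.

```python
from collections import Counter


def mine_length_difference(transactions, support, n=64, threshold=0.95, byteorder="big"):
    """Find (Pos, Val) such that packet_length - two_byte_integer_at(Pos) == Val.

    transactions: iterable of (packet_length, payload_bytes) pairs (assumed layout).
    support: frequency of the feature string these transactions satisfy.
    byteorder: "big" for the Table 7 values, "little" for the Table 8 values.
    Returns (Pos, Val), or None if no cell reaches the threshold.
    """
    table = Counter()  # sparse form of Table 9: key is (row i, column MinVal_i)
    for p_len, payload in transactions:
        for pos in range(len(payload) - 1):
            value = int.from_bytes(payload[pos:pos + 2], byteorder)
            min_val = p_len - value
            if -n <= min_val < n:
                table[(pos, min_val)] += 1
    if not table or support <= 0:
        return None
    (pos, val), count_max = table.most_common(1)[0]
    return (pos, val) if count_max / support >= threshold else None
```

For payloads of the form `38 00 00 00 LL 00 00 00`, where `LL` is the packet length minus 8, the big-endian scan returns `(3, 8)` and the little-endian scan returns `(4, 8)`, which corresponds to the `*BE*BG:3*MV:8` and `*LE*BG:4*MV:8` parts of feature 5425.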
Scan the transaction database to recompute, for each corrected feature: its frequency; the total number of packets in all flows that satisfy the feature as a percentage of the total number of packets of the protocol (the packet recognition rate of this quasi-feature); and the total number of bytes in all flows that satisfy the feature as a percentage of the total number of bytes of the protocol (the byte recognition rate of this quasi-feature). The transactions that satisfy the feature are marked. The quasi-features after the second mining and correction are shown in Table 10.
Table 10:
Sequence number    Frequency    Packet recognition rate    Byte recognition rate    Quasi-feature
5425               720          99.5%                      99.94%                   I0038,I0100,I0200,I0600,I0700*BE*BG:3*MV:8*LE*BG:4*MV:8
5426               228          18.79%                     20.58%                   I0100,I0200,I0300,I040d,I0500,I0600,I0700*APL:21
31680              20           0.04%                      0%                       I0067,I0165,I0274,I032f,I040d,I050a,I066c,I0765,I086e,I0967,I1074,I1168*APL:75
Step S420': filter out weak features and repeatedly mined features.
Filter out quasi-features that contain only a single level-1 frequent item. Such a quasi-feature occurs by chance with probability 1/256 and would cause a very high false recognition rate.
Filter out repeatedly mined features (for the concept of a repeatedly mined feature, see the embodiments). The quasi-features with sequence numbers 5425 and 5426 obtained in the previous step were mined from essentially the same transactions: whenever the quasi-feature with sequence number 5426 appears, the quasi-feature with sequence number 5425 has also appeared (in 99% of cases). The quasi-feature with sequence number 5426 is therefore regarded as a repeatedly mined feature and is filtered out.
Filter out quasi-features whose packet recognition rate or byte recognition rate is less than p_2%. Such a feature is called a suspected noise feature; it probably arises because a small amount of traffic from other protocols was mixed into the training packet set. Experiments show that many features with a recognition rate below 1% cause serious misrecognition between protocols; filtering them out has essentially no effect on the recognition rate but greatly reduces the false recognition rate. The quasi-features with sequence numbers 3 and 4 are filtered out by this rule.
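The three filters of this step can be sketched together as follows. This is a simplified illustration under assumptions: the `QuasiFeature` record, the use of sets of transaction IDs to measure overlap, the 1% rate threshold (standing in for p_2%) and the 99% overlap threshold are hypothetical choices, not details confirmed by the text.

```python
from dataclasses import dataclass


@dataclass
class QuasiFeature:
    seq: int                  # sequence number
    items: tuple              # single items of the feature string
    packet_rate: float        # packet recognition rate, in percent
    byte_rate: float          # byte recognition rate, in percent
    transactions: frozenset   # hypothetical IDs of the transactions that satisfy it


def is_repeat(candidate, kept, overlap):
    """True if the candidate almost always co-occurs with an already kept feature."""
    if not candidate.transactions:
        return False
    size = len(candidate.transactions)
    return any(len(candidate.transactions & k.transactions) / size >= overlap for k in kept)


def filter_quasi_features(features, min_rate=1.0, overlap=0.99):
    kept = []
    # Visit stronger features first so that weaker duplicates are the ones dropped.
    for feature in sorted(features, key=lambda f: f.packet_rate, reverse=True):
        if len(feature.items) <= 1:
            continue  # weak feature: a single level-1 frequent item
        if feature.packet_rate < min_rate or feature.byte_rate < min_rate:
            continue  # suspected noise feature
        if is_repeat(feature, kept, overlap):
            continue  # repeatedly mined feature
        kept.append(feature)
    return kept
```

Applied to records modelled on Table 10, where the transactions of 5426 are assumed to be a subset of those of 5425, only the feature with sequence number 5425 is kept.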
Only the following feature now remains, as shown in Table 11.
Table 11:
Sequence number    Frequency    Packet percentage    Byte percentage    Application-layer feature
5425               720          99.502%              99.939%            I0038,I0100,I0200,I0600,I0700*BE*BG:3*MV:8*LE*BG:4*MV:8
I0038, I0100, I0200, I0600, I0700 is the feature string: bytes 0 to 2 are 0x38, 0x00, 0x00, and bytes 6 and 7 are 0x00, 0x00. *BE*BG:3*MV:8 expresses a difference relation between packet length and content: *BE means big-endian, *BG:3 means the integer starts at byte 3, and *MV:8 means the difference between the packet length and the big-endian two-byte unsigned integer formed by bytes 3 and 4 is 8. *LE*BG:4*MV:8 expresses a second difference relation between packet length and content: *LE means little-endian, *BG:4 means the integer starts at byte 4, and *MV:8 means the difference between the packet length and the little-endian two-byte unsigned integer formed by bytes 4 and 5 is 8.
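To make the notation concrete, the following sketch parses such a feature string and checks a packet against it. It is an assumption-laden illustration: the function names are hypothetical, the item position is read as two hexadecimal digits (the example only uses positions below 10, where hexadecimal and decimal coincide), and `packet_length` is whatever length the mining step measured, supplied by the caller.

```python
import re

ITEM = re.compile(r"I([0-9a-fA-F]{2})([0-9a-fA-F]{2})")      # I + position + byte value
RELATION = re.compile(r"\*(BE|LE)\*BG:(\d+)\*MV:(-?\d+)")   # byte order, start, difference


def parse_feature(feature):
    """Split a feature string into byte rules and length-difference relations."""
    head = feature.split("*", 1)[0]  # the item list precedes the first '*'
    byte_rules = {int(pos, 16): int(val, 16) for pos, val in ITEM.findall(head)}
    relations = [
        ("big" if order == "BE" else "little", int(start), int(diff))
        for order, start, diff in RELATION.findall(feature)
    ]
    return byte_rules, relations


def matches(feature, payload, packet_length):
    """Check fixed bytes first, then every packet-length difference relation."""
    byte_rules, relations = parse_feature(feature)
    for pos, value in byte_rules.items():
        if pos >= len(payload) or payload[pos] != value:
            return False
    for byteorder, start, diff in relations:
        if start + 2 > len(payload):
            return False
        if packet_length - int.from_bytes(payload[start:start + 2], byteorder) != diff:
            return False
    return True
```

A payload beginning `38 00 00 00 5c 00 00 00` with a packet length of 100 satisfies the Table 11 feature, because both the big-endian integer at byte 3 and the little-endian integer at byte 4 equal 92, which is 100 minus 8.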
Step S500': generate the feature file of this protocol from the mined protocol recognition features. In this example the feature file is represented as an XML file; once the feature file has been generated, the traffic recognition program can be run to perform online real-time recognition.
The following is an example of the feature file generated from the features mined above:
<tcp>
<type value="1">
<content_length>
<offset>4</offset>
<byte_number>10</byte_number>
<differ_value>8</differ_value>
</content_length>
<byte>
<content>0x38</content>
<offset>0</offset>
</byte>
<byte>
<content>0x00</content>
<offset>1</offset>
</byte>
<byte>
<content>0x00</content>
<offset>2</offset>
</byte>
<byte>
<content>0x00</content>
<offset>6</offset>
</byte>
<byte>
<content>0x00</content>
<offset>7</offset>
</byte>
</type>
<type value="2">
<content_length>
<offset>3</offset>
<byte_number>2</byte_number>
<differ_value>8</differ_value>
</content_length>
<byte>
<content>0x38</content>
<offset>0</offset>
</byte>
<byte>
<content>0x00</content>
<offset>1</offset>
</byte>
<byte>
<content>0x00</content>
<offset>2</offset>
</byte>
<byte>
<content>0x00</content>
<offset>6</offset>
</byte>
<byte>
<content>0x00</content>
<offset>7</offset>
</byte>
</type>
</tcp>
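A recognition program would need to load such a feature file before matching traffic. The sketch below shows one way to read the element layout used above with Python's standard `xml.etree.ElementTree`. It is only an illustration: the `load_feature_file` name and the returned dictionary layout are assumptions, the embedded sample is a shortened copy of the file above, and the element values are returned without interpreting them further (for instance, the byte order is not recoverable from these elements alone).

```python
import xml.etree.ElementTree as ET

SAMPLE = """<tcp>
<type value="2">
<content_length>
<offset>3</offset>
<byte_number>2</byte_number>
<differ_value>8</differ_value>
</content_length>
<byte><content>0x38</content><offset>0</offset></byte>
<byte><content>0x00</content><offset>1</offset></byte>
</type>
</tcp>"""


def load_feature_file(xml_text):
    """Return one rule dictionary per <type> element of the feature file."""
    root = ET.fromstring(xml_text)
    rules = []
    for type_el in root.findall("type"):
        length_el = type_el.find("content_length")
        length_rule = None
        if length_el is not None:
            # offset, byte_number and differ_value, kept under their element names
            length_rule = {child.tag: int(child.text) for child in length_el}
        byte_rules = {
            int(b.findtext("offset")): int(b.findtext("content"), 16)
            for b in type_el.findall("byte")
        }
        rules.append({
            "transport": root.tag,         # "tcp" in the example above
            "type": type_el.get("value"),
            "length_rule": length_rule,
            "bytes": byte_rules,
        })
    return rules
```

Keeping the loader free of matching logic means the same structures can feed whatever matching engine the recognizer uses.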
The beneficial effects of the present invention are:
1. Analyzing the recognition features of each application-layer protocol with the method of the invention is highly efficient. The technique makes periodic updating of every protocol's recognition features in the feature library practical, laying the foundation for fully and accurately identifying all traffic;
2. The completeness and reliability of the application-layer protocol recognition features obtained by the method are greatly improved over current techniques. Because the method mines the features from a large amount of extracted and analyzed data, it obtains more complete features. Because multiple candidate features undergo multi-stage filtering and verification before the final features are produced, the reliability is also much higher than that of features that current techniques derive from analyzing only a limited number of packets;
3. The method proposes metrics for measuring the correctness of application-layer protocol recognition features: recognition rate, accuracy, false positive rate and false negative rate. The automatically derived features are evaluated from all of these aspects, so that they reach a high recognition rate while maintaining a high accuracy and a low false recognition rate, which gives a basis for judging the correctness of automatically identified features;
4. The method not only converts the current manual analysis of protocol features into a process in which a computer mines the application-layer protocol recognition features automatically; it also improves the accuracy, completeness and reliability of the resulting features. The previous phenomenon, in which the false recognition rate between protocols rises as the number of identified protocols grows, is substantially changed: the method reduces the false recognition rate between protocols to a very small range and greatly improves reliability.
From the above description of specific embodiments of the invention in conjunction with the accompanying drawings, other aspects and features of the invention will be apparent to those skilled in the art.
The specific embodiments of the invention described and illustrated above are to be regarded as exemplary and do not limit the invention; the invention is to be interpreted according to the appended claims.

Claims (15)

1. A method for mining application-layer protocol recognition features, characterized in that it comprises the following steps:
Step A: perform a first filtering on the training packet set, encode it, and extract the quasi protocol recognition feature data information;
Step B: perform a first mining on the extracted quasi protocol recognition feature data information to obtain multi-level frequent item sets;
Step C: perform a first filtering on the multi-level frequent item sets; after the frequencies of the multi-level frequent item sets remaining after the first filtering have been corrected and a second mining has been performed, perform a second filtering to obtain the final protocol recognition features.
2. The method for mining application-layer protocol recognition features according to claim 1, characterized in that it further comprises the following step:
Step D: if the byte recognition rate of all final protocol recognition features meets the requirement, or the sum of the packet recognition rates meets the requirement, the data of the second and subsequent packets is not mined further; otherwise, the second and subsequent packets are mined in a loop until the total recognition rate meets the requirement.
3. The method for mining application-layer protocol recognition features according to claim 1 or 2, characterized in that step A comprises the following steps:
A1. capture the training packet set, divide it by flow, and store the training packets in flow structures;
A2. filter out the noise traffic mixed into the training packet set using a noise-traffic filtering method;
A3. encode the application-layer payload using a position-based method for encoding the bytes of the extracted packets;
A4. extract the quasi protocol recognition feature data information from the encoded data.
4. The method for mining application-layer protocol recognition features according to claim 3, characterized in that, in step A2, the noise-traffic filtering method comprises the following steps:
A21. filter out the content of flows that satisfy the HTTP protocol or the FTP protocol;
A22. filter out TCP flows that lack a complete three-way handshake.
5. The method for mining application-layer protocol recognition features according to claim 4, characterized in that, in step A21, filtering out the content of flows that satisfy the FTP protocol comprises the following steps:
filter out the packets of flow structures that communicate in PASV mode;
filter out the packets of flow structures that use ports 20 and 21.
6. The method for mining application-layer protocol recognition features according to claim 5, characterized in that determining the flow structures that communicate in the PASV mode of FTP comprises the following steps:
find the flow structures that use port 21 and determine whether any packet belonging to such a structure begins with 227;
if so, further determine whether it is a PASV-mode response packet; if it is, the packet contains the IP address and port number with which the server is prepared to establish the PASV-mode data connection with the client, the destination IP address of the packet is the IP address of the client, and the FTP data connection uses the TCP protocol; record these four pieces of data;
after all flow structures that use port 21 have been traversed, the flow information of all communication in the PASV mode of FTP is obtained.
7. The method for mining application-layer protocol recognition features according to claim 6, characterized in that filtering out the packets of flow structures that communicate in PASV mode comprises the following steps:
compare each flow with the recorded flow information of the FTP PASV-mode data connections; if four items of the five-tuple in the flow structure are respectively identical to the recorded PASV-mode data connection flow information, the flow is deemed to communicate in the PASV mode of FTP, and all packets in the flow are discarded.
8. The method for mining application-layer protocol recognition features according to claim 3, characterized in that, in step A3, the position-based method for encoding the bytes of the extracted packets comprises the following steps:
the value of a byte in the packet, represented by two hexadecimal digits, is encoded as a single item represented by 5 characters: counting from the left, the first character is I, standing for Item; the second and third characters give the position of this byte among the first N extracted bytes, in hexadecimal and counted from zero, the second character being zero if the position is less than 16; the fourth and fifth characters are the two hexadecimal characters of the original byte.
9. The method for mining application-layer protocol recognition features according to claim 3, characterized in that, in step A4, the quasi protocol recognition feature data information comprises the encoded information of packets with the same offset, together with related statistical auxiliary data;
extracting the quasi protocol recognition feature data information comprises the following steps:
A41. extract the quasi protocol recognition feature data information from TCP flows and import it into the transaction database;
A42. extract the quasi protocol recognition feature data information from UDP flows and import it into the transaction database.
10. The method for mining application-layer protocol recognition features according to claim 3, characterized in that step B comprises the following steps:
Step B1: first set an initial frequency rate; multiply the initial frequency rate by the total number of transactions in the database and round the result to obtain the initial frequency; compute the frequency of each single item in the database and filter out the single items whose frequency is less than the initial frequency; each remaining single item is called a level-1 frequent item, and the set of all level-1 frequent items is called the level-1 frequent item set;
Step B2: select two (K-1)-level frequent items A and B from the (K-1)-level frequent item set such that the first K-2 single items of A are identical to the first K-2 single items of B and the (K-1)-th single item of A differs in position from the (K-1)-th single item of B; if the position of the (K-1)-th single item of B is greater than the position of the (K-1)-th single item of A, append the (K-1)-th single item of B to the end of A to obtain a candidate K-level frequent item; compute the frequency of this candidate; if it is greater than or equal to the initial frequency, the candidate is a true K-level frequent item; all K-level frequent items obtained in this way form the K-level frequent item set; where K ≥ 2.
11. The method for mining application-layer protocol recognition features according to claim 10, characterized in that a transaction comprises four attribute fields, namely the encoded bytes, the packet length, the byte percentage and the packet percentage, which store the extracted data.
12. The method for mining application-layer protocol recognition features according to claim 1, characterized in that step C comprises the following steps:
C1. using a frequency correction and frequent item filtering method, perform a first frequent item filtering on the multi-level frequent item sets, followed by a frequency correction and a second frequent item filtering;
C2. convert the frequent items in the corrected and filtered frequent item sets into feature strings, mine the absolute packet length feature and the difference relationship feature between packet length and content, mark the corresponding transactions, and then filter the quasi protocol recognition feature data information corresponding to these transactions to obtain the final protocol recognition features.
13. The method for mining application-layer protocol recognition features according to claim 12, characterized in that step C1 comprises the following steps:
Step C11: perform a first frequent item filtering on the multi-level frequent item sets;
Step C12: perform a frequency correction and a second frequent item filtering on the multi-level frequent item sets that remain after the first frequent item filtering, eliminating inclusion relations between frequent items.
14. The method for mining application-layer protocol recognition features according to claim 13, characterized in that step C12 comprises the following steps:
Step C121: correct the frequency of level-1 frequent items using the following formula;
$$freq_{new} = \begin{cases} k_1 \times freq_{old}, & pos_0 = 0,\ 0.9 \le k_1 < 1 \\ f(pos_0) \times freq_{old}, & pos_0 \ne 0,\ 0 < f(pos_0) \le k_1 \end{cases}$$
where freq_new denotes the new frequency after correction, freq_old denotes the frequency of the frequent item before correction, pos_i denotes the position of the i-th single item in a frequent item, with i starting from zero, and k denotes the number of single items in the frequent item; the positions of single items are numbered from zero; k_1 is a constant and f(pos_0) is a continuous monotonically decreasing function;
Step C122: correct the frequency of level-2 frequent items using the following formula;
Figure A20081010605800052
where k_2 is a constant and f((pos_1 - pos_0)) is a continuous monotonically decreasing function;
Step C123: first, compute the average distance between the single items of a frequent item using the following formula, denoted ave_dist:
$$ave_{dist} = \frac{\sum_{i=1}^{k-1} \left( pos_i - pos_{i-1} \right)}{k-1}$$
The distance is the absolute value of the difference between the positions of two adjacent single items in the frequent item; if ave_dist ≠ 1, ave_dist is corrected using the following formula;
Figure A20081010605800054
where k_3 and k_4 are constants;
then, the frequencies of level-3 and level-4 frequent items are corrected using the following formula;
Figure A20081010605800055
where f_1(k) is a continuous monotonically increasing function of k; f_2(ave_dist) and f_3(ave_dist) are continuous monotonically decreasing functions of ave_dist, and satisfy f_2(ave_dist) > f_3(ave_dist);
Step C124: among frequent items that stand in an inclusion relation, filter out the frequent item with the smaller frequency.
15. The method for mining application-layer protocol recognition features according to claim 12, characterized in that step C2 comprises the following steps:
Step C21: convert the frequent items in the corrected and filtered frequent item sets into feature strings, retrieve from the transaction database the transactions that satisfy each feature string, mine the absolute packet length feature and the difference relationship feature between packet length and content, and mark the transactions that satisfy these two features;
Step C22: filter the repeatedly mined features, weak features and suspected noise features out of the quasi protocol recognition feature data information corresponding to these transactions, obtaining the final protocol recognition features.
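For illustration only, the level-by-level construction of frequent items described in steps B1 and B2 of claim 10 can be sketched as follows. The sketch is not part of the claims and rests on assumptions: a single item is modelled as a hypothetical `(position, value)` tuple, a transaction as a set of such tuples, and a frequent item as a tuple of single items ordered by position.

```python
from collections import Counter


def level_one(transactions, min_freq):
    """Step B1: keep the single items whose frequency reaches the initial frequency."""
    counts = Counter(item for t in transactions for item in t)
    return [(item,) for item in sorted(counts) if counts[item] >= min_freq]


def next_level(prev_level, transactions, min_freq):
    """Step B2: join two (K-1)-level frequent items A and B into K-level candidates."""
    result = []
    for a in prev_level:
        for b in prev_level:
            # A and B share their first K-2 single items, and the last single
            # item of B lies at a greater position than the last one of A.
            if a[:-1] == b[:-1] and b[-1][0] > a[-1][0]:
                candidate = a + (b[-1],)
                frequency = sum(1 for t in transactions if set(candidate) <= t)
                if frequency >= min_freq:
                    result.append(candidate)  # a true K-level frequent item
    return result
```

Starting from `level_one`, repeated calls to `next_level` yield the level-2, level-3 and higher frequent item sets until a call returns an empty list.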
CN2008101060589A 2008-05-08 2008-05-08 Method for digging recognition characteristic of application layer protocol Expired - Fee Related CN101282251B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008101060589A CN101282251B (en) 2008-05-08 2008-05-08 Method for digging recognition characteristic of application layer protocol


Publications (2)

Publication Number Publication Date
CN101282251A true CN101282251A (en) 2008-10-08
CN101282251B CN101282251B (en) 2011-04-13

Family

ID=40014544

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008101060589A Expired - Fee Related CN101282251B (en) 2008-05-08 2008-05-08 Method for digging recognition characteristic of application layer protocol

Country Status (1)

Country Link
CN (1) CN101282251B (en)





Also Published As

Publication number Publication date
CN101282251B (en) 2011-04-13


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110413

Termination date: 20200508