CN100550909C - System, method, and apparatus for realizing service perception - Google Patents

Info

Publication number
CN100550909C
CN100550909C CNB2006100630196A CN200610063019A
Authority
CN
China
Prior art keywords
strategy
recognition
module
model example
flow model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CNB2006100630196A
Other languages
Chinese (zh)
Other versions
CN1997007A (en)
Inventor
刘源
董沛影
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CNB2006100630196A
Publication of CN1997007A
Application granted
Publication of CN100550909C
Active legal status
Anticipated expiration legal status

Abstract

The invention discloses a system, method, and apparatus for realizing service perception. The system comprises: a data extraction strategy library, which provides data extraction strategies; a packet feature extraction module, which extracts flow characteristics; a flow model instance construction module, which builds a set of flow model instances from the flow characteristic parameters output by the packet feature extraction module; a recognition strategy library, which provides recognition strategies to the integrated service sensing module; and an integrated service sensing module, which perceives the flow model instances. The method comprises: extracting flow characteristics; building a set of flow model instances; and perceiving the flow model instances and outputting a recognition result. The apparatus comprises: a quick identification module, which performs quick identification on flow model instances; and a machine learning module, which further processes the flow model instances that the quick identification module cannot identify and outputs a recognition result.

Description

System, method, and apparatus for realizing service perception
Technical field
The present invention relates to the field of network technology, and in particular to a system, method, and apparatus for realizing service perception.
Background
As the Internet has become widespread, user demand for communication services has kept rising. People are no longer content with simple Internet applications such as web browsing; they want faster and richer multimedia applications, such as voice over IP (VoIP), video telephony, video conferencing, video on demand (VOD), and IPTV. At the same time, mainstream fixed-network operators face steadily declining revenue from traditional voice services, so they want to make full use of network resources, find new sources of business growth, increase service revenue, and cut operating costs.
However, the large number of new network applications that have appeared in recent years has had a growing impact on traditional services and networks. Representative examples include: peer-to-peer (P2P) file-sharing software, such as BT download (efficient P2P file sharing based on the BitTorrent protocol) and eDonkey; instant messaging software, such as MSN (Microsoft's instant messaging software) and QQ (instant messaging software developed by Tencent); and VoIP software, such as Skype (free voice communication software). According to surveys, a large share of current network traffic comes from P2P applications, and this traffic cannot be managed well, which puts considerable strain on existing networks. Operators also want to keep introducing new service types to increase operating revenue. How to manage existing network traffic while guaranteeing quality of service for new services has therefore become a key consideration for current and next-generation networks.
Against this background, service perception technology has received increasing attention. Service perception is the technology of distinguishing the different service flows in a network, where a service flow is the data flow that corresponds to a specific network application. Perceiving the service flows in a network is important for network management, charging, security, and quality of service (QoS) assurance: 1) service perception is the basis for enforcing network QoS and security policies; 2) a service management model can offer users differentiated service options based on the results of service perception; 3) service perception is a prerequisite for resource scheduling, because it lets a service management system obtain accurate and timely baselines of network capability and resource demand and thus schedule resources dynamically and flexibly.
Current techniques for service perception mainly include traffic classification based on five-tuple detection, deep packet inspection, and recognition based on flow behavior characteristics. Traffic classification based on five-tuple detection parses each packet to obtain its five-tuple (source IP address, source port, destination IP address, destination port, protocol number) and compares the five-tuple with known application types to decide which application the packet belongs to. For example, a traditional Web service usually uses port 80 (or 8080) as its service port, so comparing port numbers reveals which network application the packets in a service flow belong to. This scheme is simple, easy to implement, and undemanding of device processing power. However, it works only for traditional network applications with fixed port numbers (such as Web and FTP services). If a new service type (such as P2P) uses random or negotiated port numbers, the approach fails.
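The port comparison described above can be sketched in a few lines of Python. This is only an illustration: the `FiveTuple` type, the `WELL_KNOWN_PORTS` table, and the function name are assumptions made for the example and do not come from the patent.

```python
from typing import NamedTuple, Optional


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str  # "TCP" or "UDP"


# Hypothetical table of fixed service ports (illustrative, not from the patent).
WELL_KNOWN_PORTS = {
    ("TCP", 80): "Web",
    ("TCP", 8080): "Web",
    ("TCP", 21): "FTP",
}


def classify_by_port(t: FiveTuple) -> Optional[str]:
    """Return the application type if either port is a known service port."""
    for port in (t.dst_port, t.src_port):
        app = WELL_KNOWN_PORTS.get((t.protocol, port))
        if app is not None:
            return app
    # Random or negotiated ports (typical of P2P) cannot be classified this way.
    return None
```

A flow to port 80 is labelled "Web", while a flow between two arbitrary high ports yields `None`, which is exactly the limitation noted in the text.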
Building on five-tuple detection, deep packet inspection goes further and parses application-layer data, then matches the keywords it finds against the keywords used by known application types to identify the service type. Fig. 2 illustrates the use of deep packet inspection to identify Kazaa (a P2P file-sharing program). The payload of the TCP segment carries the protocol content defined by Kazaa; by analyzing the Kazaa protocol, matching keywords such as "HTTP" and "Kazaa" can be extracted. Deep packet inspection then matches these keywords against the TCP payload of each packet, and if the match result satisfies the matching condition, the packet is considered to belong to Kazaa. Deep packet inspection nevertheless has many shortcomings. First, in theory it must analyze the payload (the useful data carried in a packet) of every packet, so large data volumes demand very high hardware processing power and the policies can become quite complex. Second, inspecting packet contents raises privacy concerns. Third, the method stops working once an application encrypts its payload. Finally, deep packet inspection is inflexible: if the protocol used by an application changes, or a new service type appears, the matching rules have to be changed.
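As a rough illustration of keyword matching on a TCP payload, the sketch below treats a packet as Kazaa traffic when every keyword is present. The text does not state the exact matching condition, so the "all keywords must appear" rule and the function name are assumptions.

```python
# Keywords named in the text for the Kazaa example.
KAZAA_KEYWORDS = (b"HTTP", b"Kazaa")


def payload_matches(payload: bytes, keywords=KAZAA_KEYWORDS) -> bool:
    """Assumed matching condition: every keyword occurs somewhere in the payload."""
    return all(keyword in payload for keyword in keywords)
```

Because the whole payload is scanned for each packet, the cost grows with traffic volume, and an encrypted payload would never match.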
Unlike the two methods above, existing recognition techniques based on flow behavior characteristics gather statistics on the packets in a service flow, analyze their features, and extract a characteristic flow pattern for identification. Take P2P as an example. Because P2P applications generally use a distributed network structure, the network propagation diameter can serve as the criterion. For example, as shown in Fig. 1, A sends a connection request to B, B sends one to C, and C sends one to D; the propagation path is then A->B->C->D and the network diameter is 3. Based on this feature, the data flows transmitted in the network can be monitored and the network propagation diameter estimated; when it exceeds a configured threshold (for example 2), the data flow is considered to belong to a P2P application. This method is insensitive to protocol changes and can identify encrypted flows and new flow types. However, it can identify only the general type of a flow (such as Web or P2P) and has difficulty determining the specific application. For example, it may identify a flow as P2P but be unable to tell which protocol, and therefore which specific service, is in use, such as distinguishing BT download from eDonkey download.
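The propagation-diameter test can be sketched as follows. The text gives only the A, B, C, D example, so this sketch assumes the diameter is the longest shortest path, in hops, through the directed graph of observed connection requests; the function names are illustrative.

```python
from collections import defaultdict, deque


def propagation_diameter(connections):
    """Longest shortest path (in hops) over observed (initiator, target) pairs."""
    graph = defaultdict(set)
    for src, dst in connections:
        graph[src].add(dst)

    diameter = 0
    for start in list(graph):
        # Breadth-first search from each host that initiated a connection.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph.get(node, ()):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        diameter = max(diameter, max(dist.values()))
    return diameter


def looks_like_p2p(connections, threshold=2):
    """Flag traffic as P2P when the diameter exceeds the configured threshold."""
    return propagation_diameter(connections) > threshold
```

For the chain in the text, `propagation_diameter([("A", "B"), ("B", "C"), ("C", "D")])` evaluates to 3, which exceeds the example threshold of 2. A plain client-server pattern, where several clients contact one server, has diameter 1 and is not flagged.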
Summary of the invention
Because the prior art described above cannot perceive and identify services quickly and efficiently or discover new service types, the main purpose of the present invention is to provide a system, method, and apparatus for realizing service perception that can not only identify known service types quickly but also identify new service types flexibly and efficiently.
The system for realizing service perception provided by the invention is implemented as follows:
A system for realizing service perception comprises:
A packet feature extraction module, configured to parse the input packet stream and extract flow characteristics from the packets;
A data extraction strategy library, configured to provide data extraction strategies to the packet feature extraction module;
A flow model instance construction module, configured to build a set of flow model instances from the flow characteristic parameters output by the packet feature extraction module;
A recognition strategy library, configured to provide recognition strategies to the integrated service sensing module and to obtain and store new recognition strategies from the integrated service sensing module;
An integrated service sensing module, configured to analyze the set of flow model instances and determine whether a flow model instance matches an existing recognition strategy; if it matches, to output a recognition result that indicates the service type; if it does not match, to generate a new recognition strategy and output a recognition result.
The system may further comprise:
A policy mapping module, configured to obtain the service type of a data flow from the recognition result output by the integrated service sensing module and to establish a mapping between the service type and the corresponding network application policy.
The data extraction strategies and recognition strategies are:
Data extraction strategies and recognition strategies described in a formal language.
The flow characteristics comprise at least one of:
Source IP address, destination IP address, source port number, destination port number, protocol number, packet size, number of packets, flow duration, or traffic volume.
The integrated service sensing module comprises:
A quick identification module, configured to identify flow model instances with a quick identification method according to the existing recognition strategies and to output a recognition result;
A machine learning module, configured to analyze, using machine learning, the flow model instances that the quick identification module cannot identify and to match the analysis result against the existing recognition strategies; if a match is found, to output a recognition result; otherwise, to generate a new recognition strategy and output a recognition result.
The quick identification method comprises:
Five-tuple detection, physical port detection, or limited packet inspection.
The method for realizing service perception provided by the invention is implemented as follows:
A recognition strategy library is established, and recognition strategies are generated or adjusted dynamically; the method further comprises the steps of:
A) extracting flow characteristics;
B) building a set of flow model instances;
C) perceiving the flow model instances and outputting a recognition result.
Preferably, the method further comprises:
D) establishing a mapping between service types and network application policies.
Step A is:
According to the input data flow, retrieving the corresponding data extraction strategy from the data extraction strategy library and extracting flow characteristics from the input packet stream according to that strategy.
Step B is:
Building a set of flow model instances from the input flow characteristics according to a predefined flow model description.
Step C is:
The integrated service sensing module perceives the input flow model instances according to the existing recognition strategies and outputs a recognition result.
When the integrated service sensing module comprises a quick identification module and a machine learning module, step C comprises:
C1) the quick identification module applies a quick identification method to match the input flow model instances;
C2) for flow model instances that the quick identification module cannot identify, the machine learning module analyzes them using machine learning and matches the analysis result against the existing recognition strategies; if a match is found, it outputs the corresponding recognition result; otherwise, it generates a new recognition strategy and outputs a recognition result.
The matching in step C1 is:
Analyzing the flow model instance and matching it against the existing recognition strategies; if a match is found, outputting the corresponding recognition result; otherwise, passing the instance to the machine learning module.
The apparatus for realizing service perception provided by the invention is implemented as follows:
An apparatus for realizing service perception comprises:
A quick identification module, configured to analyze and identify the input flow model instances with a quick identification method according to the existing recognition strategies and to output a recognition result;
A machine learning module, configured to analyze, using machine learning, the flow model instances that the quick identification module cannot identify and to match the analysis result against the existing recognition strategies; if a match is found, to output the corresponding recognition result; otherwise, to generate a new recognition strategy and output a recognition result.
The quick identification method comprises:
Five-tuple detection, physical port detection, or limited packet inspection.
The beneficial effects of the invention are as follows:
1. Because a data extraction strategy library is provided, different data extraction strategies can be applied to data from different sources, which makes extraction more targeted and flexible.
2. Because data extraction strategies are introduced, the parameters needed for service perception can be extracted from packets according to actual requirements. The parameters are not limited to the traditional five-tuple; they can also include packet size, packet count, and others, and analyzing these parameters in combination yields more information about an application's data flow. This effectively remedies the shortcomings of traditional five-tuple detection, physical port detection, and flow behavior recognition: the invention can identify not only the general type of a flow but also the specific application protocol, and thus the specific service type.
3. Because a set of flow model instances is built, the packet characteristic parameters are preprocessed and a suitable data structure can be created for each service type, which helps identify service types quickly.
4. Because the combined perception method couples quick identification with machine learning, it can identify traditional service types quickly and efficiently and can also learn to identify new service types dynamically, avoiding the inability of simple methods such as five-tuple detection to recognize new service types.
5. Because quick identification runs before machine learning, fewer data flows reach the machine learning stage, which greatly reduces the demand on hardware processing power.
6. Because the payload of every packet does not need to be parsed, legal issues such as user privacy are avoided.
7. Because flexible machine learning techniques are used, the service types of flows with randomly assigned or negotiated port numbers, and of flows with encrypted payloads, can also be identified efficiently.
8. Because new recognition strategies can be learned and generated dynamically during identification, the integrated service sensing module as a whole forms a self-learning system that needs little manual configuration and is highly intelligent and automated.
9. Because matching strategies can be generated or adjusted dynamically according to actual needs such as network conditions, the invention is flexible and adapts well to different network conditions.
10. Because it combines the advantages of existing service perception techniques and optimizes them further, the invention is applicable wherever existing perception techniques are used and is more adaptable and robust.
Description of drawings
Fig. 1 is a schematic diagram of a deep packet inspection method in the prior art;
Fig. 2 is a schematic diagram of a flow behavior characteristic recognition method in the prior art;
Fig. 3 is a schematic diagram of the system of the present invention;
Fig. 4 is a schematic diagram of the data structure of flow model instances in the present invention;
Fig. 5 is a schematic diagram of a Web application model;
Fig. 6 is a schematic diagram of the workflow of the quick identification module and the machine learning module in the system of the present invention;
Fig. 7 is a schematic diagram of the distribution of packet size over time for four classes of application data flows;
Fig. 8 is a schematic diagram of the method of the present invention;
Fig. 9 is a schematic diagram of the apparatus of the present invention.
Embodiment
Referring to Fig. 3, an embodiment of the system of the present invention comprises a data extraction strategy library, a packet feature extraction module, a flow model instance construction module, an integrated service sensing module, a recognition strategy library, and a policy mapping module. The system applies a series of processing steps to the input data flow to perceive the service type that corresponds to it. Perception here means analyzing and identifying a specific data flow in order to determine its service type.
The packet feature extraction module parses the packet stream and extracts the characteristics of the data flow, that is, the flow characteristic parameters, according to a data extraction strategy. Its input is the packet stream from devices in a packet network such as switches, routers, firewalls, and gateways; its output is a set of parameters that reflect the flow's characteristics. The extracted flow characteristics can include: the five-tuple parameters (source and destination IP addresses, source and destination port numbers, protocol); the minimum, maximum, and average packet size, the packet count, packet inter-arrival time, and packet bursts; the duration of the data flow; the amount of data transferred in interactive mode and in bulk transfer mode; and the time the connection spends in idle mode, interactive mode, and bulk transfer mode.
The data extraction strategy library consists of concrete data extraction strategies. A data extraction strategy describes how to extract the required flow characteristic parameters from individual packets, including the number, range, and type of the parameters. In this embodiment, data extraction strategies can be described in Extensible Markup Language (XML), for example in the following form:
<?xml version="1.0"?>
......
<Sample_Strategy>
  <SrcIP>
    <minOccurs>131.107.1.1</minOccurs>
    <maxOccurs>131.107.1.10</maxOccurs>
  </SrcIP>
  <DstIP>157.60.1.5</DstIP>
  <SrcPort>
    <minOccurs>1024</minOccurs>
    <maxOccurs>8000</maxOccurs>
  </SrcPort>
  <DstPort>80</DstPort>
  <Protocol>TCP</Protocol>
  <PacketSize>
    <minOccurs>512</minOccurs>
    <maxOccurs>1024</maxOccurs>
  </PacketSize>
  ......
</Sample_Strategy>
This XML fragment describes a data extraction strategy with the following meaning: the source IP address of the packet lies between 131.107.1.1 and 131.107.1.10, the destination IP address is 157.60.1.5, the source port lies between 1024 and 8000, the destination port is 80, the protocol is TCP, the packet size lies between 512 and 1024 bytes, and so on. Note that in practice the data extraction strategy can also be described in other formal languages, such as ASN.1 (Abstract Syntax Notation One). The data extraction strategy library contains multiple data extraction strategies, so a matching strategy can be provided for each data source; the packet feature extraction module then processes the packets from that source and extracts the required flow characteristic parameters. This improves the flexibility and efficiency of characteristic extraction. In practice, the user can dynamically change the rules for extracting packet characteristic parameters by editing the extraction strategy description.
The flow model instance construction module builds a set of flow model instances from the flow characteristics. A flow model is an abstract description of a service flow that defines the composition and features of the flow. A flow model instance is a data structure that conforms to the flow model description and is built from the flow characteristic parameters output by the packet feature extraction module. In this embodiment, the flow model takes the following form:
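To make the semantics of the sample strategy concrete, the following Python sketch loads a strategy of that shape and checks one packet against it. It is an illustration under stated assumptions: the elided "......" parts of the sample are left out, and the function names and the dictionary layout of a packet are invented for the example rather than taken from the patent.

```python
import ipaddress
import xml.etree.ElementTree as ET

STRATEGY_XML = """<?xml version="1.0"?>
<Sample_Strategy>
  <SrcIP><minOccurs>131.107.1.1</minOccurs><maxOccurs>131.107.1.10</maxOccurs></SrcIP>
  <DstIP>157.60.1.5</DstIP>
  <SrcPort><minOccurs>1024</minOccurs><maxOccurs>8000</maxOccurs></SrcPort>
  <DstPort>80</DstPort>
  <Protocol>TCP</Protocol>
  <PacketSize><minOccurs>512</minOccurs><maxOccurs>1024</maxOccurs></PacketSize>
</Sample_Strategy>"""


def load_strategy(xml_text):
    """Map each field to a (low, high) pair; a single value becomes (v, v)."""
    rules = {}
    for field in ET.fromstring(xml_text):
        low = field.findtext("minOccurs")
        high = field.findtext("maxOccurs")
        if low is None or high is None:
            low = high = field.text
        rules[field.tag] = (low.strip(), high.strip())
    return rules


def _comparable(tag, value):
    """Convert a raw value into something that supports range comparison."""
    if tag in ("SrcIP", "DstIP"):
        return int(ipaddress.ip_address(value))
    if tag == "Protocol":
        return value.upper()
    return int(value)


def packet_matches(packet, rules):
    """Return True when every field of the packet lies inside its range."""
    for tag, (low, high) in rules.items():
        value = _comparable(tag, str(packet[tag]))
        if not (_comparable(tag, low) <= value <= _comparable(tag, high)):
            return False
    return True
```

A packet such as `{"SrcIP": "131.107.1.4", "DstIP": "157.60.1.5", "SrcPort": 2048, "DstPort": 80, "Protocol": "TCP", "PacketSize": 600}` satisfies the strategy, while the same packet with source address 131.107.1.11 does not.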
Flow ::= {
    SrcIP (STRING),            -- source IP address, string
    DstIP (STRING),            -- destination IP address, string
    SrcPort (INTEGER),         -- source port number, integer
    DstPort (INTEGER),         -- destination port number, integer
    Protocol (ENUM),           -- protocol in use, enumeration, e.g. TCP, UDP
    PacketSize (INTEGER),      -- average packet size, integer
    PacketNum (INTEGER),       -- number of packets, integer
    Traffic (INTEGER),         -- traffic volume, integer
    Time (TIME),               -- duration, time type
    PhysicalPort (IDENTIFIER)  -- physical port number, identifier
    -- other parameters that characterize the flow
}
Following the flow model description, the flow model instance construction module builds flow model instances from the flow characteristics. In IPv4, flow instances can be distinguished by IP address and port number; in IPv6, some applications can additionally be distinguished by flow label. Building the set of flow model instances maps a jumble of packets onto logical flows and performs a preliminary classification of the extracted data, so that the subsequent service sensing module can process and identify the flow data. This embodiment stores flow instances in the data structure shown in Fig. 4, which records the set of flow instances in a form similar to a hash table. The entry for each flow contains a unique identifier, or key, that distinguishes the flow from others and allows it to be located quickly. The system builds and maintains this table dynamically by analyzing packet data: when it finds a key that is not in the table, it creates a new entry and fills in the attribute values according to the flow model description and the characteristic parameters extracted from the packet; if no packets of a flow arrive for a period of time, the corresponding entry can be deleted. Note that other embodiments of the system may implement the flow model and flow instance data structures in other ways that provide similar functionality.
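A hash-table-like flow store of the kind described above might look like the following Python sketch. The class names, the choice of the five-tuple as the key, and the 60-second idle timeout are assumptions for illustration; the patent does not fix these details.

```python
import time
from dataclasses import dataclass


@dataclass
class FlowInstance:
    packet_num: int = 0      # number of packets seen
    traffic: int = 0         # total bytes seen
    first_seen: float = 0.0
    last_seen: float = 0.0

    @property
    def avg_packet_size(self) -> float:
        return self.traffic / self.packet_num if self.packet_num else 0.0


class FlowTable:
    """Dictionary keyed by five-tuple, standing in for the table of Fig. 4."""

    def __init__(self, idle_timeout: float = 60.0):
        self.idle_timeout = idle_timeout  # assumed value, in seconds
        self.flows = {}

    def update(self, src_ip, src_port, dst_ip, dst_port, protocol, size, now=None):
        """Create the entry on first sight of a key, then update its attributes."""
        now = time.time() if now is None else now
        key = (src_ip, src_port, dst_ip, dst_port, protocol)
        flow = self.flows.get(key)
        if flow is None:
            flow = self.flows[key] = FlowInstance(first_seen=now)
        flow.packet_num += 1
        flow.traffic += size
        flow.last_seen = now
        return flow

    def expire(self, now=None) -> int:
        """Delete entries that have seen no packets within the idle timeout."""
        now = time.time() if now is None else now
        stale = [k for k, f in self.flows.items()
                 if now - f.last_seen > self.idle_timeout]
        for key in stale:
            del self.flows[key]
        return len(stale)
```

Calling `update` once per parsed packet keeps the per-flow counters current, and calling `expire` periodically removes flows that have gone quiet.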
The recognition strategy library provides the integrated service sensing module with recognition strategies that have already been derived and are described in a formal language. A recognition strategy is a formal description of the features that distinguish one network application model from others. Strategies can come from analyzing and generalizing existing network application models, or they can be new strategies that the integrated service sensing module derives by applying machine learning to new network applications. A recognition strategy also includes a matching precision, which is used to decide whether a service flow matches an existing network application model and which the user can adjust dynamically as needed. Fig. 5 shows a typical Web application model, in which host A is a Web server and hosts B and C are Web clients that communicate with host A through a browser such as Internet Explorer. From this model, several features that distinguish Web applications from other network applications can be obtained: (1) the TCP protocol is used; (2) the source IP address and port of the server's data flow are the Web server's IP address and port 80 (HTTP) or 443 (HTTPS), while the destination IP address and port are the Web client host's IP address and an unpredictable port number (the client port is generally assigned at random when the host creates the TCP connection); (3) a Web server communicates with many clients at the same time, and the clients' port numbers usually differ (because they are assigned at random). For this application model, the traffic pattern of a Web server can be extracted and described in XML:
<?xml version="1.0"?>
......
<WebServer_Flow>
  <SrcPort> <!-- source port number -->
    <Value>80</Value>
    <Value>443</Value>
  </SrcPort>
  <DstDiffPort_Num> <!-- number of destination connections with different port numbers -->
    <Value>5</Value>
  </DstDiffPort_Num>
  <Protocol> <!-- protocol in use -->
    <Value>TCP</Value>
  </Protocol>
  ......
</WebServer_Flow>
In this description, the features of a Web server data flow are: (1) the source port number is either 80 or 443; (2) the source IP address is the same, and the number of destination connections with different ports is greater than or equal to 5; (3) the protocol is TCP. The service sensing module first parses this XML description to load the recognition strategy, then analyzes the flow instance data against the pattern the strategy describes and matches the analysis result with the strategy. If a flow meets the matching precision required by the Web service flow recognition strategy, the flow is considered a Web service flow. The matching precision is user-defined and can be adjusted dynamically by changing the parameter thresholds in the matching model to suit actual network conditions; in the XML description above, for example, the number of destination connections with different IP addresses or ports is one such adjustable threshold. For a new network application, no new matching strategy is needed if it belongs to a known application type. For example, if a new P2P application must be identified but its traffic pattern can be described by the existing P2P matching rules, no new matching strategy has to be defined. If the network application model to be identified is entirely new, a new recognition strategy must be defined and added to the recognition strategy library, or an existing strategy must be modified. Note that this embodiment describes recognition strategies in XML; other embodiments of the system may use other formal languages, such as ASN.1 (Abstract Syntax Notation One), and the Web application model may likewise be expressed in other, similar forms.
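The three Web server features can be checked against a set of flows roughly as follows. The sketch hard-codes the values from the sample strategy as default arguments instead of parsing the XML, and the function name and flow tuple layout are assumptions made for the example.

```python
from collections import defaultdict


def find_web_servers(flows, server_ports=(80, 443), min_distinct_dst_ports=5):
    """Return source IPs whose flows match the Web server pattern.

    flows: iterable of (src_ip, src_port, dst_ip, dst_port, protocol).
    min_distinct_dst_ports is the adjustable threshold (DstDiffPort_Num).
    """
    dst_ports_by_server = defaultdict(set)
    for src_ip, src_port, dst_ip, dst_port, protocol in flows:
        # Features (1) and (3): source port 80 or 443, protocol TCP.
        if protocol == "TCP" and src_port in server_ports:
            dst_ports_by_server[src_ip].add(dst_port)
    # Feature (2): same source IP, enough destinations with different ports.
    return {ip for ip, ports in dst_ports_by_server.items()
            if len(ports) >= min_distinct_dst_ports}
```

Lowering `min_distinct_dst_ports` makes the match more permissive, which mirrors the way the text describes tuning the matching precision through parameter thresholds.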
The integrated service sensing module is the core module of the system. It identifies flow model instances according to the recognition strategies provided by the recognition strategy library and outputs a recognition result. For flow model instances that cannot be identified, it derives new recognition strategies through machine learning and adds them to the recognition strategy library so that new flow models can be identified. The integrated service sensing module can be divided into two modules: a quick identification module and a machine learning module. Fig. 6 shows their workflow. First, the set of flow instances to be identified enters the quick identification module, which identifies the easily recognized flows according to the recognition strategies and outputs a recognition result. Second, the flows that the quick identification module cannot identify enter the machine learning module, which analyzes and generalizes them and compares the result with the existing recognition strategies. If the configured matching precision is met, the flow is considered a match and a recognition result is output; if it is not met, the flow is considered unmatched, a new recognition strategy is generated, and a recognition result is output. The quick identification module and the machine learning module are explained further below:
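The two-stage workflow can be summarized in a short Python sketch. Everything concrete here is an assumption made for illustration: flows are plain dictionaries, quick rules are callables, the similarity measure is a simple fraction of equal fields standing in for a real learning algorithm, and a new strategy is created by storing the unmatched flow under a generated label.

```python
def field_similarity(flow, pattern):
    """Hypothetical similarity: fraction of fields with equal values."""
    keys = set(flow) | set(pattern)
    if not keys:
        return 0.0
    return sum(flow.get(k) == pattern.get(k) for k in keys) / len(keys)


def perceive(flow, quick_rules, strategies, similarity=field_similarity,
             precision=0.8):
    """Return a service label for one flow model instance."""
    # Stage 1: quick identification (for example five-tuple or port checks).
    for rule in quick_rules:
        label = rule(flow)
        if label is not None:
            return label

    # Stage 2: compare the analysis result with the existing strategies.
    best_label, best_score = None, 0.0
    for label, pattern in strategies.items():
        score = similarity(flow, pattern)
        if score > best_score:
            best_label, best_score = label, score
    if best_label is not None and best_score >= precision:
        return best_label

    # No strategy meets the matching precision: record a new strategy.
    new_label = "unknown-%d" % (len(strategies) + 1)
    strategies[new_label] = dict(flow)
    return new_label
```

Because the strategy dictionary is updated in place, a second flow with the same pattern as a previously unmatched one is recognized on the next call without creating another strategy.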
The quick identification module uses comparatively simple and fast methods, such as traditional five-tuple detection, physical port detection, or limited packet inspection (which examines only a small part of the packet payload), to analyze the flow instances generated by the flow model instance construction module and match them against the recognition strategies. Limited packet inspection is a deep packet inspection technique that examines only part of a packet's payload rather than all of it. The type of many current applications can be determined by analyzing their application-layer protocol, and in many cases the application-layer protocol can be determined from a particular portion of the payload bytes (for example, the application-layer protocol header, which precedes the actual user data). In such cases there is no need to analyze the entire payload, which significantly reduces the amount of computation. Some traditional network application flows, such as Web and FTP, are easily distinguished by parameters such as port number, so the data of these flows can be separated from the overall data set in the quick identification module. The purpose of the quick identification module is to identify easily identifiable flows efficiently; it acts as a preprocessing stage that reduces the data subsequently entering the machine learning module. The machine learning module usually uses more complex recognition algorithms such as classification and clustering, which are comparatively less efficient, so reducing the data it must process can greatly reduce the overall recognition time.
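A minimal sketch of the quick identification stage follows, assuming a port lookup followed by an inspection of only the first few payload bytes. The port table, the payload prefixes, the 32-byte inspection limit, and the function name `quick_identify` are illustrative assumptions; a real deployment would presumably load such rules from the recognition strategy library.

```python
# Illustrative rule tables (assumptions, not taken from the patent).
WELL_KNOWN_PORTS = {80: "Web", 443: "Web", 21: "FTP"}
PAYLOAD_PREFIXES = {
    b"GET ": "Web",
    b"POST ": "Web",
    b"\x13BitTorrent protocol": "P2P",  # start of a BitTorrent handshake
}
MAX_INSPECT = 32  # inspect only this many leading payload bytes


def quick_identify(src_port, dst_port, payload):
    """Return a service label, or None if the flow needs further analysis."""
    # Port-based detection: cheapest check, tried first.
    for port in (src_port, dst_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    # Limited packet inspection: look at the head of the payload only.
    head = payload[:MAX_INSPECT]
    for prefix, label in PAYLOAD_PREFIXES.items():
        if head.startswith(prefix):
            return label
    return None
```

A `None` result marks a flow that would be handed on to the machine learning module.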
The machine learning module uses machine learning to analyze and generalize the flows that the quick identification module cannot identify, compares the outcome with the existing recognition strategies, and determines whether they match. If they match, the recognition result is output; if no existing recognition strategy matches, a new recognition strategy is generated and the recognition result is output. The machine learning technique used in this embodiment may be classification learning, association learning, clustering, numeric prediction, or the like, described as follows:
1) Classification learning: the learning scheme is presented with a set of classified samples, from which it learns a way of classifying future samples.
2) Association learning: rather than predicting only one particular class, it seeks associations among the data in the sample set.
3) Clustering: it seeks samples that belong together and groups them accordingly.
4) Numeric prediction: it predicts a numeric quantity rather than a discrete class.
The input to machine learning is a set of data instances, which the machine learning scheme classifies, associates, or clusters. Each input is called an instance, and each instance is a single, independent sample to be learned from. Each instance is characterized by the values of a fixed, predefined set of features, or attributes. The output of data mining is a representation of the structure learned from the data. The output forms include:
Decision table: a fairly simple output form that has the same format as the input.
Decision tree: the learning result is output as a tree. Each internal node of a decision tree tests a particular attribute; each leaf node gives a classification, a set of classifications, or a probability distribution over all possible classifications, which applies to every instance that reaches that leaf.
Classification rules: an alternative to decision trees. The antecedent, or precondition, of a rule is a series of tests, and the consequent gives the class or classes that apply to the instances covered by the rule, or a probability distribution over the classes.
Association rules: these can predict any attribute or combination of attributes, not just the class. Unlike classification rules, association rules are not meant to be used together as a set; different association rules express different regularities in the data set and are generally used to predict different things. The coverage of an association rule, called its support, is the number of instances it predicts correctly; its accuracy, called its confidence, is the number of instances it predicts correctly expressed as a proportion of all instances to which the rule applies.
Rules with exceptions: an extension of classification rules that allows a rule to have exceptions. Expressing exceptions on top of the existing rules lets the rule set be modified incrementally without rebuilding the whole rule set.
Relational rules: the preceding rules assume that each condition in a rule tests an attribute value against a constant; such rules are called propositional. Relational rules can describe relationships between samples and in some cases give a more intuitive and concise description than propositional rules.
Instance-based representation: the instances themselves are stored, and a new, unknown instance is handled by relating it to the existing known instances. This approach works directly on the samples rather than building rules. It is lazy, deferring the real work for as long as possible, whereas other methods are eager, producing a generalization as soon as the data has been seen.
Clusters: when what is learned is a clustering rather than a classifier, the output can be a diagram showing how the instances fall into clusters. Clustering is often followed by a step that derives a decision tree or rule set assigning each instance to the cluster it belongs to.
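Of the output forms listed above, the support and confidence of an association rule lend themselves to a short worked example. The sketch below follows the definitions given in the text; the function name and the representation of a rule as two predicates (`antecedent`, `consequent`) are assumptions made for illustration.

```python
def support_and_confidence(instances, antecedent, consequent):
    """Compute (support, confidence) of the rule 'antecedent => consequent'.

    support    = number of instances the rule predicts correctly
    confidence = support / number of instances the rule applies to
    """
    applies = [x for x in instances if antecedent(x)]
    correct = [x for x in applies if consequent(x)]
    support = len(correct)
    confidence = support / len(applies) if applies else 0.0
    return support, confidence


# Rule under test: "source port 80 implies protocol TCP".
flows = [
    {"port": 80, "proto": "TCP"},
    {"port": 80, "proto": "TCP"},
    {"port": 80, "proto": "UDP"},
    {"port": 53, "proto": "UDP"},
]
support, confidence = support_and_confidence(
    flows,
    antecedent=lambda f: f["port"] == 80,
    consequent=lambda f: f["proto"] == "TCP",
)
```

Here the rule applies to three instances and predicts two of them correctly, so the support is 2 and the confidence is 2/3.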
As a preferred implementation, the machine learning process in this embodiment is as follows. The set of flow instances serves as the sample set input to the machine learning module; each flow instance is regarded as one learning instance, and the parameters that make up a flow, such as source and destination IP addresses, source and destination port numbers, protocol, and packet size, serve as the attribute set for machine learning. The recognition strategies serve as the defined rule set. The machine learning module applies an algorithm to learn from the set of flow instances and outputs a learning result whose form reflects the patterns within the set. The machine learning module then matches this learning result against the existing recognition strategies and determines whether the preset matching precision is met. If it is, the set of flow instances is considered to belong to the service flow type of the matching recognition strategy, and the recognition result is output. If it is not, a new recognition strategy is generated for the corresponding network application pattern and stored in the recognition strategy library, and the corresponding recognition result is output. Note that the matching precision can be set dynamically as required.
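The match-or-generate decision described above can be sketched as follows. The patent does not say how the matching precision is computed, so this sketch assumes it is the fraction of a strategy's conditions that the learning result satisfies; the dictionary layout of a strategy, the names `match_score` and `recognise`, and the `unknown-N` labels are likewise hypothetical.

```python
def match_score(summary, strategy):
    """Fraction of the strategy's conditions that the learned summary satisfies."""
    conditions = strategy["conditions"]
    if not conditions:
        return 0.0
    hits = sum(
        1 for attr, test in conditions.items()
        if attr in summary and test(summary[attr])
    )
    return hits / len(conditions)


def recognise(summary, library, precision=0.8):
    """Return a service type, adding a new strategy to `library` if none matches."""
    best = max(library, key=lambda s: match_score(summary, s), default=None)
    if best is not None and match_score(summary, best) >= precision:
        return best["service"]
    # No strategy reaches the required precision: derive a new one
    # that requires each attribute to equal the value seen in the summary.
    new_strategy = {
        "service": f"unknown-{len(library) + 1}",
        "conditions": {
            attr: (lambda value, ref=ref: value == ref)
            for attr, ref in summary.items()
        },
    }
    library.append(new_strategy)
    return new_strategy["service"]
```

Passing a lower `precision` makes matching more tolerant, which mirrors the dynamically adjustable matching precision mentioned in the text.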
In this embodiment, the machine learning module may apply the expectation-maximization (EM) algorithm to the attribute set of the flow instance set; the goal of learning is to divide the flow instances into different clusters. The flow instances in a cluster can be considered similar to one another in some respect, and these similarities are determined by underlying regularities in how data is distributed within a flow or between flows; for example, the IP address distribution, the port number distribution, and the packet sizes of the service flows of different applications all reflect these regularities. Fig. 7 shows packet size against time for four classes of application data flows, where the horizontal axis is time, the vertical axis is packet size, dark data points are packets from client to server, and light data points are packets from server to client. The distributions of the four classes differ markedly, and these differences can serve as the basis for cluster analysis of the flows of different applications. By comparing the characteristics of a cluster with the flow characteristics of known application services, the type of the cluster can be identified, and with it the type of the flows that make up the cluster.
The EM algorithm is a probability-based clustering method; such methods apply statistical techniques to the cluster analysis of instances. From a probabilistic point of view, the problem clustering must solve is to find the most likely set of clusters given the data. Because no definite conclusion can be derived from a finite number of instances, an instance cannot be assigned to a cluster categorically; it can only be said to belong to each cluster with a certain probability. Statistical clustering is based on the statistical model of finite mixtures. A mixture is a set of k probability distributions representing k clusters; for a particular instance, each distribution gives the probability that the instance would have a certain set of attribute values if it were known to belong to that cluster. Each cluster has a different distribution. Any particular instance belongs to one and only one cluster, but which one is unknown. The clusters are also not equally likely; a probability distribution reflects their relative populations.
The EM algorithm finds maximum likelihood estimates of the parameters of the probability distributions in the mixture model. It starts from an initial guess of the model parameters of each cluster and then iterates two steps to maximize the likelihood. The first step, expectation, calculates the cluster probabilities (the "expected" class values). The second step, maximization, calculates the distribution parameters so as to maximize the likelihood of the distributions given the data. Iteration ends when the increase in the log-likelihood becomes negligible. Although the EM algorithm is guaranteed to converge to a maximum, this may be a local rather than the global maximum, so the procedure must be repeated several times with different initial guesses, and the clustering with the largest overall log-likelihood is selected. Note that in other embodiments of the system of the present invention the machine learning module may use machine learning methods other than the EM algorithm to learn from the flow data.
The recognition result output by the machine learning module may be a mapping from the packets in a flow instance to a service type. For example, the recognition result may be a mark in the flow instance that records the service type of the flow instance. By reading this mark, the policy mapping module can determine the service type of the corresponding packets and process them accordingly.
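The EM iteration described above can be illustrated with a small, self-contained sketch. It assumes the simplest possible setting, a mixture of one-dimensional Gaussians fitted to a single attribute such as packet size; the patent does not specify the distributions or attributes used, and the names `em_gmm_1d` and `best_of_restarts` are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, init_means, tol=1e-6, max_iter=200):
    """Fit a 1-D Gaussian mixture; return (log_likelihood, weights, means, variances)."""
    k = len(init_means)
    means = list(init_means)                 # initial guess of the cluster means
    spread = (max(data) - min(data)) or 1.0
    variances = [(spread / k) ** 2] * k      # broad initial variances
    weights = [1.0 / k] * k                  # clusters start equally likely
    prev_ll = -math.inf
    ll = prev_ll
    for _ in range(max_iter):
        # Expectation: probability of each cluster for each instance.
        resp, ll = [], 0.0
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300     # guard against underflow to zero
            ll += math.log(total)
            resp.append([j / total for j in joint])
        # Stop once the gain in log-likelihood is negligible.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # Maximization: re-estimate weights, means, and variances.
        for c in range(k):
            n_c = sum(r[c] for r in resp) or 1e-300
            weights[c] = n_c / len(data)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / n_c
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / n_c
            variances[c] = max(var, 1e-6)    # keep the variance strictly positive
    return ll, weights, means, variances


def best_of_restarts(data, k, restarts=5, seed=0):
    """Run EM from several random initial guesses; keep the highest log-likelihood."""
    rng = random.Random(seed)
    fits = [em_gmm_1d(data, rng.sample(data, k)) for _ in range(restarts)]
    return max(fits, key=lambda fit: fit[0])
```

`best_of_restarts` reflects the point made in the text that a single run may stop at a local maximum, so several initial guesses are tried and the fit with the largest log-likelihood is kept.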
Policy mapping module: according to the recognition result output by the integrated service awareness module, this module determines the service type of the corresponding packets and establishes a mapping between service types and specific network application policies. The policy mapping module can map a service type to application policies such as QoS, traffic management, and intrusion detection, and the corresponding application policy modules then implement functions such as network traffic management and intrusion detection. For example, the mapping result can be input to the QoS module of a network device, which can then provide differentiated QoS to different classes of flows according to the mapping.
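In its simplest form, such a mapping could be a lookup table from service type to policy settings. The sketch below is purely illustrative: the service types, the policy fields (`qos_class`, `rate_limit_kbps`, `ids_inspect`), and their values are invented for the example and are not specified by the patent.

```python
# Hypothetical mapping from recognised service type to application policies.
POLICY_MAP = {
    "VoIP": {"qos_class": "expedited", "rate_limit_kbps": None, "ids_inspect": False},
    "Web": {"qos_class": "best-effort", "rate_limit_kbps": None, "ids_inspect": True},
    "P2P": {"qos_class": "background", "rate_limit_kbps": 512, "ids_inspect": True},
}

# Fallback for service types that have no explicit entry.
DEFAULT_POLICY = {"qos_class": "best-effort", "rate_limit_kbps": None, "ids_inspect": True}


def map_policy(service_type):
    """Return the policy settings for a recognised service type."""
    return POLICY_MAP.get(service_type, DEFAULT_POLICY)
```

A QoS or intrusion detection module would then read only the fields it is responsible for.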
Referring to Fig. 8, an embodiment of the method of the present invention is as follows:
A recognition strategy library is established in the system. Recognition strategies may come from the analysis and generalization of existing network application patterns, or from new recognition strategies that the integrated service awareness module derives by machine learning on new network applications. The method further comprises the following steps:
A) According to the input packet stream, the packet feature extraction module calls the corresponding data extraction strategy from the data extraction strategy library and extracts flow characteristic data from the input packet stream according to that strategy;
B) The flow model instance construction module builds a set of flow model instances from the input flow characteristic data according to the predefined flow model description;
C) The integrated service awareness module performs awareness processing on the input flow model instances according to the existing recognition strategies and outputs a recognition result.
At this point the purpose of the method of the present invention is achieved. Preferably, the method may further comprise:
D) The policy mapping module establishes a mapping between service types and network application policies according to the input recognition result;
the application policy module then provides the corresponding service for the service flow according to the mapping result.
When the integrated service awareness module comprises a quick identification module and a machine learning module, step C may specifically comprise:
C1) The quick identification module first applies a quick identification method to match the input flow model instances and outputs a recognition result;
C2) For flow model instances that the quick identification module cannot identify, the machine learning module analyzes them by machine learning and then matches the analysis result against the existing recognition strategies; if they match, the corresponding recognition result is output; otherwise, a new recognition strategy is generated and the recognition result is output.
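Steps C1 and C2 amount to a two-stage dispatch, which can be sketched as below. The function `perceive` and its two callable parameters are hypothetical; they stand in for the quick identification module and the machine learning module, and a quick stage that cannot identify an instance is assumed to signal this by returning `None`.

```python
def perceive(instances, quick_identify, learn_and_match):
    """Two-stage recognition: cheap matching first, machine learning as fallback.

    instances        maps a flow key to its flow model instance
    quick_identify   returns a service label, or None if it cannot decide
    learn_and_match  always returns a label (possibly from a new strategy)
    """
    results, leftover = {}, []
    # C1: quick identification handles the easily identifiable flows.
    for key, instance in instances.items():
        label = quick_identify(instance)
        if label is None:
            leftover.append(key)
        else:
            results[key] = label
    # C2: only the remaining flows reach the slower machine learning stage.
    for key in leftover:
        results[key] = learn_and_match(instances[key])
    return results
```

Because only the `leftover` flows reach the second stage, the cost of the slower algorithm grows with the number of hard cases rather than with total traffic.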
The quick identification methods used in step C1 include simple methods such as five-tuple detection, physical port detection, and limited packet inspection. Their algorithms are simple and efficient, so they identify easily identifiable flows efficiently and act as preprocessing, significantly reducing the data flows that subsequently enter the machine learning module. The matching in step C1 consists of analyzing the flow model instances and matching them against the existing recognition strategies according to the preset matching precision; if they match, the corresponding recognition result is output; otherwise, processing passes to the machine learning module.
The machine learning methods used in step C2 include one or more of classification learning, association learning, clustering, numeric prediction, and the like; this method embodiment in particular uses the expectation-maximization algorithm. The forms the learning result can take include decision tables, decision trees, classification rules, association rules, rules with exceptions, relational rules, instance-based representations, clusters, and so on. It should be noted in particular that the matching precision can be set dynamically according to actual needs, which provides greater flexibility and adaptability and can effectively compensate for the shortcomings of methods such as deep inspection.
Referring to Fig. 9, the apparatus for implementing service awareness according to the present invention comprises:
A quick identification module, configured to perform quick identification on the input flow model instances, identify the flow instances corresponding to traditional network application patterns, and output a recognition result, so that the packets of these flow instances are quickly separated from the overall packet set;
A machine learning module, configured to perform machine learning on the flow instances that the quick identification module cannot identify and to match the learning result against the recognition strategies in the recognition strategy library; if they match, the corresponding recognition result is output; otherwise, a new recognition strategy is generated and saved to the recognition strategy library, and the recognition result is output. The quick identification module may use methods such as traditional five-tuple detection, physical port detection, or limited packet inspection. The methods used by the machine learning module include classification learning, association learning, clustering, and numeric prediction. The forms the output learning result can take include decision tables, decision trees, classification rules, association rules, rules with exceptions, relational rules, instance-based representations, and clusters.
The above are only preferred embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (12)

1. A system for implementing service awareness, characterized in that the system comprises:
A packet feature extraction module, which parses the input packet stream and extracts flow characteristic data from the packets;
A data extraction strategy library, which provides the packet feature extraction module with data extraction strategies described in a formal language;
A flow model instance construction module, which builds a set of flow model instances from the flow characteristic data output by the packet feature extraction module;
A recognition strategy library, which provides the integrated service awareness module with recognition strategies described in a formal language, and obtains and stores new recognition strategies from the integrated service awareness module;
An integrated service awareness module, configured to analyze the set of flow model instances and determine whether a flow model instance matches an existing recognition strategy; if it matches, the flow model instance is identified according to the recognition strategy provided by the recognition strategy library, and a recognition result indicating the service type is output; if it does not match, a new recognition strategy is generated by machine learning and added to the recognition strategy library, and a recognition result is output.
2. The system for implementing service awareness according to claim 1, characterized in that it further comprises:
A policy mapping module, configured to obtain the service type of a data flow according to the recognition result output by the integrated service awareness module, and to establish a mapping between the service type and the corresponding network application policy.
3. The system for implementing service awareness according to claim 1, characterized in that the flow characteristic data comprises at least one of:
Source IP address, destination IP address, source port number, destination port number, protocol number, packet size, number of packets, flow duration, or traffic volume.
4. The system for implementing service awareness according to claim 1, characterized in that the integrated service awareness module comprises:
A quick identification module, configured to identify flow model instances with a quick identification method according to the existing recognition strategies and to output a recognition result;
A machine learning module, configured to analyze, by machine learning, the flow model instances that the quick identification module cannot identify and to match the analysis result against the existing recognition strategies; if they match, the recognition result is output; otherwise, a new recognition strategy is generated and the recognition result is output.
5. The system for implementing service awareness according to claim 4, characterized in that the quick identification method comprises:
A five-tuple detection method, a physical port detection method, or a limited packet inspection method.
6. A method for implementing service awareness, characterized in that a recognition strategy library is established and recognition strategies are dynamically generated or adjusted, the method further comprising the steps of:
A) According to the input packet stream, calling the corresponding data extraction strategy from the data extraction strategy library, and extracting flow characteristic data from the input packet stream according to that data extraction strategy;
B) Building a set of flow model instances from the input flow characteristic data according to the defined flow model description;
C) Analyzing the set of flow model instances and determining whether a flow model instance matches an existing recognition strategy; if it matches, identifying the flow model instance according to the recognition strategy provided by the recognition strategy library and outputting a recognition result indicating the service type; if it does not match, generating a new recognition strategy by machine learning, adding the new recognition strategy to the recognition strategy library, and outputting a recognition result.
7. The method for implementing service awareness according to claim 6, characterized in that it further comprises:
D) Establishing a mapping between service types and network application policies.
8. The method for implementing service awareness according to claim 6, characterized in that, in step C, identifying the flow model instance according to the recognition strategy provided by the recognition strategy library and outputting a recognition result indicating the service type is:
The integrated service awareness module performing awareness processing on the input flow model instances according to the existing recognition strategies and outputting a recognition result.
9. The method for implementing service awareness according to claim 8, characterized in that, when the integrated service awareness module comprises a quick identification module and a machine learning module, step C comprises:
C1) The quick identification module applying a quick identification method to match the input flow model instances;
C2) For flow model instances that the quick identification module cannot identify, the machine learning module analyzing them by machine learning and matching the analysis result against the existing recognition strategies; if they match, outputting the corresponding recognition result; otherwise, generating a new recognition strategy and outputting the recognition result.
10. The method for implementing service awareness according to claim 9, characterized in that the matching in step C1 is:
Analyzing the flow model instances and matching them against the existing recognition strategies; if they match, outputting the corresponding recognition result; otherwise, passing them to the machine learning module for processing.
11. An apparatus for implementing service awareness, characterized in that it comprises:
A quick identification module, configured to analyze and identify the input flow model instances with a quick identification method according to the existing recognition strategies, and to output a recognition result;
A machine learning module, configured to analyze, by machine learning, the flow model instances that the quick identification module cannot identify and to match the analysis result against the existing recognition strategies; if they match, the corresponding recognition result is output; otherwise, a new recognition strategy is generated and the recognition result is output.
12. The apparatus for implementing service awareness according to claim 11, characterized in that the quick identification method comprises:
A five-tuple detection method, a physical port detection method, or a limited packet inspection method.
CNB2006100630196A 2006-09-30 2006-09-30 A kind of system, method and apparatus of realizing professional perception Active CN100550909C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006100630196A CN100550909C (en) 2006-09-30 2006-09-30 A kind of system, method and apparatus of realizing professional perception

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2006100630196A CN100550909C (en) 2006-09-30 2006-09-30 A kind of system, method and apparatus of realizing professional perception

Publications (2)

Publication Number Publication Date
CN1997007A CN1997007A (en) 2007-07-11
CN100550909C true CN100550909C (en) 2009-10-14

Family

ID=38251936

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2006100630196A Active CN100550909C (en) 2006-09-30 2006-09-30 A kind of system, method and apparatus of realizing professional perception

Country Status (1)

Country Link
CN (1) CN100550909C (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant