A method for implementing a protocol identification and control system based on a peer-to-peer network
Technical field
The present invention is a protocol identification and control system for peer-to-peer network services. It is mainly used to solve the problems of protocol identification and control in peer-to-peer networks, and belongs to the field of peer-to-peer networking.
Background technology
Peer-to-peer (P2P) applications have developed rapidly in recent years and have quickly taken over several application areas. File-sharing applications such as BitTorrent, Napster, and eMule became mainstream download tools thanks to their free, equal, and open resource-sharing model and their fast downloads. Voice communication software such as Skype, ICQ, and MSN popularized Internet telephony through smooth, clear speech quality and low call costs, with a huge impact on telecommunication services. Streaming media applications such as PPlive, QQLive, and PPstream, with clear and smooth playback and rich program content, freed Internet TV from the bandwidth bottleneck and became a new distribution platform for multimedia resources such as video, television programs, and films.
With the rapid development of P2P applications, their negative effects can no longer be ignored. Viruses and Trojan horses spread faster through P2P file-sharing platforms and cause greater damage. Objectionable content such as pornography and violence is distributed and shared without restriction. Pirated music, films, and television works easily escape legal constraints. Network bandwidth is swallowed by large volumes of P2P data, so the experience of non-P2P users degrades sharply and key enterprise applications cannot obtain guaranteed bandwidth. The asymmetric traffic model of the traditional Internet is broken, the flat-rate charging model based on it is no longer reasonable, and the interests of ISPs are threatened. P2P applications, particularly file sharing and streaming media, not only cause these problems but also evade normal traffic detection mechanisms by using dynamic port selection, disguising themselves as other traffic, encrypting data, and communicating anonymously, which makes the problems even harder to control. According to relevant statistics, P2P now accounts for more than 70% of Internet traffic.
Identifying and controlling P2P traffic has therefore become a popular research problem. The algorithms, mechanisms, and solutions proposed for P2P traffic identification and control in recent years still have many shortcomings. Existing P2P traffic identification usually requires specific knowledge of the P2P protocol in advance, yet new P2P protocols emerge constantly, change their transport-layer ports frequently, and even evade detection through application-layer encryption. Existing P2P traffic control is also comparatively slow to react and depends heavily on manual intervention, so it cannot allocate network bandwidth intelligently, accurately, and reasonably. There is therefore still considerable room for improvement.
In terms of economic benefit, the results of research on protocol identification and control have broad market prospects. P2P applications are increasingly widespread and developing rapidly; while users enjoy the content-sharing services that P2P provides, P2P applications also bring many negative problems. The results of this work provide a sound guarantee for saving network resources, improving bandwidth utilization, maintaining the dynamic balance of the network, and operating the network effectively. In terms of social benefit, protocol identification and control is a pressing problem in current P2P applications, and the protocol identification and control method studied here attempts a breakthrough in solving it. Research on protocol identification and control can promote the orderly development of current P2P networks. For network managers, and operators in particular, it helps coordinate the reasonable use of network resources and ensures that users enjoy stable P2P services with QoS guarantees. Turning these research results into products therefore has broad market prospects. The work also helps keep P2P traffic from excessively occupying network bandwidth and contributes to maintaining a healthy Internet environment and building a harmonious networked society.
Summary of the invention
Technical problem: the object of the invention is to provide a method for implementing a protocol identification and control system based on a peer-to-peer network, which identifies and controls the protocols of the peer-to-peer network. Compared with earlier schemes, this scheme is novel, flexible, extensible, and easy to operate, and has good market prospects.
Technical scheme: the invention is centralized, manageable, and controllable. The main function of the software is to identify and control P2P traffic; the identification system implemented in the kernel module and the PTCM mechanism are its core. Unencrypted traffic is identified by regular expression matching, and traffic that cannot be identified this way is classified with a machine learning algorithm. Matched P2P traffic is given a hash mark at the same time, so that the two approaches, DPI and TLI, are combined. The classified traffic is then passed to the PTCM mechanism, which generates an intelligent control strategy. Finally, the control strategy is sent to the bridge system through a message mechanism, and the bridge system drives its built-in TC (traffic control) facility according to the control strategy.
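The two-stage identification described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function `identify`, the `PATTERNS` table, and the fallback classifier passed in as a callable are hypothetical names, and the only pattern shown is the well-known BitTorrent handshake prefix (byte 0x13 followed by the string "BitTorrent protocol"). The actual system performs this work in a kernel module.

```python
import re

# Pattern table for unencrypted traffic. Only one entry is shown;
# a real feature library would hold one pattern per supported protocol.
PATTERNS = {
    "bittorrent": re.compile(rb"^\x13BitTorrent protocol"),
}

def identify(payload, fallback_classifier):
    """Return (label, stage) for one application-layer payload.

    Stage 1 tries every regular expression (the DPI path).
    Stage 2 hands unmatched payloads to a classifier supplied by the
    caller (standing in for the machine learning path).
    """
    for name, pattern in PATTERNS.items():
        if pattern.search(payload):
            return name, "dpi"
    return fallback_classifier(payload), "classifier"
```

A caller would pass whatever trained model it has as `fallback_classifier`; any function that maps a payload to a label fits the sketch.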
The protocol analysis logic consists mainly of the following parts: a non-encrypted message identification module, an encrypted message identification module, a PTCM mechanism module, and a message control function module.
The implementation method of this protocol analysis system comprises the following steps:
Step 1). Carry out demand analysis: analyze the functions that the protocol identification and control system must provide, and generate a demand analysis document;
Step 2). Design the modules according to the analysis document of step 1, analyze the function of each module in detail, and generate a document describing the logical relations between the modules and their functions;
Step 3). According to the document of step 2, design and implement the non-encrypted message identification module and the encrypted message identification module of the protocol identification and control system. Identification efficiency determines the operating efficiency of the whole protocol analysis system. The message identification module identifies certain unencrypted peer-to-peer network messages by message length and fixed-position signature bytes;
Step 4). According to the document of step 2, design and implement the hashing of the identified non-encrypted and encrypted messages; once labelled, the messages are passed to the PTCM mechanism module;
Step 5). According to the document of step 2, design and implement the PTCM mechanism function of the protocol identification and control system. This module makes predictions from the identified P2P traffic and then generates an intelligent flow control strategy;
Step 6). According to the document of step 2, design and implement the message control function of the protocol analysis system. This module sends the control strategy generated by the non-encrypted and encrypted message identification modules and the PTCM mechanism to the message control system through the message mechanism, so that the message control system allocates network bandwidth reasonably according to the control strategy it receives.
Beneficial effect: the invention proposes a method for implementing a system that identifies and controls peer-to-peer network protocol messages. It has several significant advantages over earlier methods:
High identification efficiency: the invention identifies messages at the network layer, which shortens the identification path. It also identifies only uplink traffic and does not process downlink traffic, which reduces the load on the CPU and improves identification efficiency. Encrypted and non-encrypted messages are processed separately, so that the identification accuracy reaches 100% with no missed or false identifications, whereas the traditional approach of identifying traffic on the router generally achieves a hit rate of only 90% to 95%.
Modular design: the whole protocol identification and control system is divided into functional modules. The non-encrypted message identification function uses deep packet inspection (DPI) to identify protocol messages by their signature bytes. The encrypted message identification function uses a machine learning classification algorithm. The PTCM mechanism uses a time series algorithm from machine learning to predict P2P traffic and thereby generates an intelligent flow control strategy. The message control function allocates network bandwidth reasonably according to the flow control strategy.
Good extensibility: the system modules are independent, their functions are designed in parallel layers, and the communication mechanism between modules is fully hierarchical. New functions can therefore be added easily and existing functions upgraded easily, so the system has good extensibility.
High reliability and stability: unit testing of the protocol identification and control system shows that the protocol analysis system runs well, occupies few system resources, and has good fault tolerance and disaster recovery capability.
Brief description of the drawings
Fig. 1 is the physical networking diagram of the protocol identification and control system;
Fig. 2 is the flow chart of the non-encrypted message identification module;
Fig. 3 is the schematic diagram of the encrypted message identification module;
Fig. 4 is the schematic diagram of the PTCM mechanism module.
Embodiment
Architecture
Non-encrypted message identification function: P2P traffic identification based on application-layer data inspection uses protocol analysis and restoration to extract the P2P application-layer data (the P2P payload) and analyzes the protocol signature values contained in the payload to decide whether the traffic belongs to a P2P application. This class of methods is therefore also called deep packet inspection (DPI). In DPI, features are extracted from each specific P2P protocol and the payload of its corresponding P2P system, and a feature library is built. For the live network flows passing through, a pattern matching algorithm decides whether a flow contains a signature string from the feature library. If a feature matches, the network flow is P2P data. The message identification part of the invention uses DPI scanning: it identifies the hash search messages of the Ares network service, and some related interaction messages, according to fixed-position signature bytes.
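As an illustration of matching by message length and fixed-position signature bytes, the following Python sketch checks a payload against a small feature library. The names `Signature`, `FEATURE_DB`, and `match_payload` are hypothetical, and the byte values in the library are placeholders rather than the real signatures of Ares or any other protocol.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Signature:
    name: str
    length: Optional[int]  # required payload length, or None for any length
    offset: int            # position of the fixed signature bytes
    pattern: bytes         # bytes expected at that position

# Placeholder feature library; the byte values are invented for the demo.
FEATURE_DB = [
    Signature("demo-search", length=8, offset=1, pattern=b"\x5a\x09"),
    Signature("demo-handshake", length=None, offset=0, pattern=b"\x13Demo"),
]

def match_payload(payload: bytes, db=FEATURE_DB) -> Optional[str]:
    """Return the name of the first matching signature, or None."""
    for sig in db:
        # Cheap length check first, then compare the fixed-position bytes.
        if sig.length is not None and len(payload) != sig.length:
            continue
        end = sig.offset + len(sig.pattern)
        if payload[sig.offset:end] == sig.pattern:
            return sig.name
    return None
```

Checking the length before the bytes keeps the common non-matching case cheap, which matters when every packet passing through is inspected.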
Encrypted message identification function: because encrypted messages are complex, P2P traffic identification based on application-layer data inspection no longer works, so a classification algorithm from machine learning is used instead. This module first collects data to obtain training samples, then builds a model using a flow attribute selection algorithm and a machine learning algorithm, and then simplifies the model. The simplified model classifies data messages in real time. Traffic successfully classified as P2P is given a hash mark and sent to the PTCM module; traffic identified as non-P2P is passed over and left untouched.
PTCM mechanism function: the PTCM mechanism mainly generates the P2P flow control strategy. It predicts the identified P2P traffic, counts how often the prediction crosses the upper and lower traffic thresholds, and intelligently adjusts the maximum and minimum bandwidth. The resulting intelligent control strategy is sent to the bridge system through the message mechanism, and the bridge system allocates network bandwidth according to the control strategy.
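The threshold counting and adjustment can be sketched in Python as below, using the quantities named in the method flow of this document (BS, uBS, dMS, and the two counters). Several points are assumptions made only for this sketch: the function name `adjust_thresholds`, a fixed step size standing in for one grade, both counters treated as simple upward tallies, and the choice to keep the old thresholds when an adjustment would violate dMS < uBS < BS. The predictions are taken as given; no time series model is implemented here.

```python
def adjust_thresholds(predictions, bs, ubs, dms, step):
    """Return the adjusted (ubs, dms) after one observation window.

    predictions: forecast traffic values, one per interval
    bs:          maximum bandwidth of the user
    ubs, dms:    current upper and lower thresholds
    step:        size of one adjustment grade (assumed constant here)
    """
    u_count = sum(1 for q in predictions if q > ubs)  # crossings above uBS
    d_count = sum(1 for q in predictions if q < dms)  # crossings below dMS
    shift = (u_count - d_count) * step
    new_ubs = ubs + shift  # more upper crossings raise the upper threshold
    new_dms = dms - shift  # and lower the lower threshold, and vice versa
    # Assumed policy: apply the change only if dMS < uBS < BS still holds.
    if 0 <= new_dms < new_ubs < bs:
        return new_ubs, new_dms
    return ubs, dms
```

For example, with BS = 100, uBS = 80, dMS = 30, and a step of 2, a window in which three predictions exceed uBS and one falls below dMS widens the band by two grades on each side.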
Method flow
This part describes in detail the design and implementation of each part of the summary of the invention:
Implementation of the non-encrypted message identification function: the main data structure of the TCP/IP protocol stack in Linux netfilter, the socket buffer (sk_buff), is used to operate on the data passing through. If a message is fragmented or has no connection number, the function returns without further action.
When a message passes the first hook function, NF_IP_PRE_ROUTING, it is stored temporarily in the in-memory control structure sk_buff. This structure contains a pointer to the network message (e.g. skb->nh). The module first checks whether the message is a TCP message. Then, using the network-layer and transport-layer header sizes provided by the sk_buff structure, it adds the two header lengths to skb->nh so that the pointer refers to the start of the application-layer data (e.g. the Appdata pointer). sk_buff also provides the total packet length; subtracting the network-layer and transport-layer header sizes gives the length of the application-layer data. Once this preparation is complete, the Appdata pointer can be used to compare the data against the Ares messages to be identified, that is, by matching the message length and the fixed-position bytes. Data in sk_buff is stored in network byte order, so the comparison must use __constant_htons() or __constant_htonl() to reconcile network byte order and host byte order.
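The header arithmetic above can be illustrated in user space with a short Python sketch. It is not kernel code and does not use sk_buff; it parses a raw IPv4 packet with the standard `struct` module, where the `!` format prefix reads fields in network byte order. The function names `locate_tcp_payload` and `build_ipv4_tcp` are hypothetical, and the builder exists only to produce a packet for the demonstration.

```python
import struct

IPPROTO_TCP = 6

def locate_tcp_payload(packet: bytes):
    """Return (offset, length) of the application data, or None.

    None is returned for non-IPv4, non-TCP, fragmented, or malformed packets.
    """
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return None
    ip_header_len = (packet[0] & 0x0F) * 4
    total_len, = struct.unpack_from("!H", packet, 2)
    frag_field, = struct.unpack_from("!H", packet, 6)
    if frag_field & 0x3FFF:  # MF flag set or non-zero fragment offset
        return None
    if packet[9] != IPPROTO_TCP:
        return None
    if ip_header_len < 20 or len(packet) < ip_header_len + 20:
        return None
    tcp_header_len = (packet[ip_header_len + 12] >> 4) * 4
    offset = ip_header_len + tcp_header_len
    length = total_len - offset  # total length minus both headers
    if tcp_header_len < 20 or length < 0 or total_len > len(packet):
        return None
    return offset, length

def build_ipv4_tcp(payload: bytes, frag_field: int = 0x4000,
                   protocol: int = IPPROTO_TCP) -> bytes:
    """Build a minimal IPv4/TCP packet (checksums left at zero) for the demo."""
    total_len = 20 + 20 + len(payload)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, total_len, 0, frag_field,
                     64, protocol, 0, bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    tcp = struct.pack("!HHIIBBHHH", 40000, 6881, 0, 0, 5 << 4, 0x18, 65535, 0, 0)
    return ip + tcp + payload
```

With the offset and length in hand, the payload slice `packet[offset:offset + length]` plays the role of the application data pointer in the description.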
The invention needs to identify the protocol messages in the peer-to-peer network and determine whether each one is a P2P message; messages that are not P2P are not processed.
Implementation of the encrypted message identification function: for encrypted P2P messages, this module first collects data with a collector to obtain training samples, then builds a model using a flow attribute selection algorithm and a machine learning algorithm, then simplifies the model, and finally classifies real-time data with it.
In network applications, a flow is generally defined as one or more IP packets transmitted between two computer IP addresses using a specific protocol (one of TCP, UDP, or ICMP) and, where applicable, a specific pair of ports. The five-tuple (source IP address, destination IP address, source port, destination port, protocol type) forms the identifier that distinguishes one flow from another. This information is present in every IP packet.
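A minimal Python sketch of grouping packets into flows by their five-tuple is given below. The record layout (dictionaries with the keys shown) and the names `FiveTuple`, `group_by_flow`, and `SAMPLE` are assumptions for illustration; the addresses in the sample come from documentation address ranges.

```python
from collections import defaultdict
from typing import NamedTuple

class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str

def group_by_flow(packets):
    """Group packet records so that equal five-tuples share one flow."""
    flows = defaultdict(list)
    for pkt in packets:
        key = FiveTuple(pkt["src_ip"], pkt["dst_ip"],
                        pkt["src_port"], pkt["dst_port"], pkt["protocol"])
        flows[key].append(pkt)
    return dict(flows)

# Illustrative records only.
SAMPLE = [
    {"src_ip": "192.0.2.1", "dst_ip": "198.51.100.7", "src_port": 40000,
     "dst_port": 6881, "protocol": "TCP", "length": 60},
    {"src_ip": "192.0.2.1", "dst_ip": "198.51.100.7", "src_port": 40000,
     "dst_port": 6881, "protocol": "TCP", "length": 1500},
    {"src_ip": "192.0.2.1", "dst_ip": "198.51.100.7", "src_port": 40001,
     "dst_port": 53, "protocol": "UDP", "length": 80},
]
```

Here the first two records share a five-tuple and form one flow, while the third differs in source port, destination port, and protocol and forms its own.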
Flow features are generally used to identify and distinguish future unknown network flows. A feature is usually an attribute of the flow obtained by computing over a large number of packets, for example a series of values such as the maximum or minimum packet length, the duration of the flow, and the packet inter-arrival time in one direction.
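The kinds of features named above can be computed as in the following Python sketch. The input layout (a list of timestamp and length pairs for one direction of one flow), the function name `flow_features`, and the particular handful of features are assumptions; the sketch does not reproduce the full attribute set used by the invention.

```python
def flow_features(packets):
    """Compute a few per-flow features.

    packets: list of (timestamp_seconds, length_bytes) tuples for one
             direction of one flow; must not be empty.
    """
    if not packets:
        raise ValueError("a flow needs at least one packet")
    ordered = sorted(packets)  # order by timestamp
    times = [t for t, _ in ordered]
    sizes = [n for _, n in ordered]
    gaps = [b - a for a, b in zip(times, times[1:])]  # inter-arrival times
    return {
        "packet_count": len(sizes),
        "min_length": min(sizes),
        "max_length": max(sizes),
        "duration": times[-1] - times[0],
        "min_interarrival": min(gaps) if gaps else 0.0,
        "max_interarrival": max(gaps) if gaps else 0.0,
    }
```

A flow with a single packet has no inter-arrival gaps, so the sketch reports zero for both gap features in that case.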
The raw flow itself cannot be used directly for traffic identification; what can be used is a set of attributes describing the flow. These attributes include the source and destination port numbers of the flow, the behavioural characteristics of the flow, and so on. They can be used to classify traffic, but using all attributes for learning and classification is inadvisable. Instead, the attributes that help classify traffic accurately must be picked out from the many available, and irrelevant and redundant attributes removed; this process is called attribute selection. A feature selection algorithm can be used to select the attributes that benefit accurate traffic classification.
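To make the idea of removing irrelevant and redundant attributes concrete, the Python sketch below applies a simple generic filter: constant attributes are dropped as uninformative, and an attribute is dropped as redundant when it is almost perfectly correlated with one already kept. This is a swapped-in illustration, not the attribute selection algorithm of the invention; the names `pearson` and `select_attributes` and the 0.95 threshold are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def select_attributes(table, threshold=0.95):
    """Return the names of the attributes worth keeping.

    table: dict mapping attribute name -> list of values (one per flow).
    """
    kept = []
    for name, values in table.items():
        if len(set(values)) <= 1:
            continue  # constant attribute: carries no information
        if any(abs(pearson(values, table[k])) >= threshold for k in kept):
            continue  # nearly duplicates an attribute already kept
        kept.append(name)
    return kept
```

For instance, a byte count and the same count expressed in kilobytes are perfectly correlated, so only the first of the two survives the filter.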
The flow attribute feature computation system designed in this invention, Fullstats, is built on Linux. Through a command-line interface, it conveniently converts raw network packets into an attribute feature value file.
The computation of flow attribute features can be divided into the following four steps:
(1) capture network packets with network sniffing technology;
(2) sort the captured packets into flows (packets with the same five-tuple belong to the same flow), each flow containing a number of packets;
(3) gather statistics on the attribute features of each flow and compute 248 attribute feature values;
(4) convert the resulting attribute feature values into the required file format.
Following these four steps, the computation system Fullstats is built from four software tools under Linux:
Tcpdemux: a flow demultiplexer that splits the data captured by the sniffer, producing a new file for each flow.
Tcptrace: analyzes the features of the captured flows and produces a rich set of statistics, i.e. the feature values.
Tcpdump: an excellent sniffer under Linux, i.e. a network packet capturer, with powerful filtering functions.
Tcpslice: extracts parts of Tcpdump files, or joins such files together.
Step 1 above is completed by Tcpdump, step 2 by Tcpdemux, and step 3 by Tcpdump, Tcptrace, and Tcpslice together.
PTCM mechanism function: the invention generates an intelligent control strategy. The underlying P2P flow control algorithm mainly comprises time series model prediction, comparison of the predicted traffic, and adjustment of the control strategy according to counters.
1) Implement the time series prediction model in C;
2) Set the user's maximum bandwidth BS, the burst-traffic upper limit uBS, the lower limit dMS, and the upper and lower threshold counters uCount and dCount, where $dMS < uBS < BS$;
3) Use the FARIMA model to predict the P2P traffic value $Q_{\Delta t}$ for the next interval $\Delta t$;
4) When the predicted P2P traffic value for the first $\Delta t$ is produced, start the timer;
5) If $Q_{\Delta t}$ is greater than uBS, the count uCount is incremented by one; if $Q_{\Delta t}$ is less than dMS, the count dCount is decremented by one;
6) Within a period of $m\Delta t$: if $uCount > dCount$, uBS is raised by $(uCount - dCount)$ grades $d_n$ and dMS is lowered by $(uCount - dCount)$ grades $d_n$, subject to $dMS < uBS < BS$; if $uCount < dCount$, uBS is lowered by $(dCount - uCount)$ grades $d_n$ and dMS is raised by $(dCount - uCount)$ grades $d_n$, again subject to $dMS < uBS < BS$. Here the grade