Implementation method of a protocol identification and control system for peer-to-peer networks
Technical field
The present invention is a protocol identification and control system for peer-to-peer network services. It mainly addresses the problems of identifying and controlling peer-to-peer network protocols and belongs to the field of peer-to-peer networks.
Background art
In recent years, Peer-to-Peer (P2P) applications have developed rapidly and spread across many application areas at lightning speed. File-sharing applications represented by BitTorrent, Napster and eMule quickly became mainstream download tools thanks to their free, equal and open resource-sharing model and high download speeds. Voice communication software represented by Skype, ICQ and MSN rapidly popularized Internet telephony through clear, smooth speech quality and cheap calls, posing a major challenge to traditional telecommunication services. Streaming media applications represented by PPLive, QQLive and PPStream, relying on clear and smooth playback and rich content, have allowed Web TV to break through the bandwidth bottleneck and have become new distribution platforms for video, TV programs and films. As P2P applications develop rapidly, their negative effects can no longer be ignored. Viruses and Trojan horses spread faster over P2P file-sharing platforms and cause greater damage; unhealthy content such as pornography and violence is distributed and shared without restriction; pirated music, film and TV works easily evade legal constraints; network bandwidth is swallowed by large volumes of P2P data, so the experience of non-P2P users degrades seriously and critical enterprise applications cannot obtain guaranteed bandwidth; and the asymmetric traffic model of the traditional Internet is broken, so monthly flat-rate charging based on that model is no longer reasonable and the interests of ISPs are threatened. P2P applications, especially file sharing and streaming media, not only suffer from many such problems but also use dynamic ports, disguise themselves as other traffic, encrypt their data and communicate anonymously to evade conventional traffic detection mechanisms, which makes these problems even harder to control. According to relevant statistics, P2P traffic has so far accounted for more than 70% of Internet traffic.
Research on the identification and control of P2P traffic has therefore become a popular topic. However, the P2P traffic identification and control algorithms, mechanisms and solutions proposed in recent years all have shortcomings. Existing P2P traffic identification usually requires prior knowledge of the specific details of each P2P protocol, yet new P2P protocols keep emerging, frequently change the transport-layer ports they use, and even hide from detection through application-layer encryption. Existing P2P traffic control lags behind, depends heavily on manual intervention, and cannot allocate network bandwidth intelligently, accurately and rationally, so there is still considerable room for improvement.
In terms of economic benefit, the results of research on protocol identification and control have broad market prospects. P2P applications are becoming ever more widespread and are developing rapidly; while people enjoy the content-sharing services that P2P provides, P2P applications also bring many negative problems. The results of this work will help save network resources, improve bandwidth utilization, maintain the dynamic balance of the network, and provide a sound guarantee for its effective operation. In terms of social benefit, protocol identification and control is a pressing problem in current P2P applications, and the protocol identification and control method studied here attempts to make progress toward solving it. Research on protocol identification and control can steer today's P2P networks toward more orderly development: network administrators, and operators in particular, can coordinate the rational use of network resources while ensuring that users enjoy stable P2P services with QoS guarantees. Turning the results of this work into a product therefore has broad market prospects. The work also helps curb the excessive consumption of network bandwidth by P2P traffic, which helps maintain a healthy Internet environment and build a harmonious networked society.
Summary of the invention
Technical problem: The purpose of the present invention is to provide an implementation method for a protocol identification and control system based on peer-to-peer networks, which identifies and controls peer-to-peer network protocols. Compared with earlier schemes, this scheme is novel, flexible, easily extensible and easy to operate, and has good market prospects.
Technical scheme: The system of the present invention is centralized, manageable and controllable. Its main function is to identify and control P2P traffic; the identification system implemented as a kernel module and the PTCM mechanism form its core. Unencrypted traffic is identified by regular-expression matching, and traffic that cannot be identified this way is classified with a machine-learning algorithm. Matched P2P traffic is also marked with a hash, so that the DPI and TLI approaches are combined. The classified traffic is then fed into the PTCM mechanism, which generates an intelligent control strategy. Finally, the control strategy is delivered to the bridge system through the message mechanism, and the bridge system enforces it with its built-in TC (traffic control) facility.
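The combination of signature matching, machine-learning fallback and hash marking described in the technical scheme can be illustrated with a short user-space sketch. It assumes the hash is taken over the flow's five-tuple and that a flow, once labelled, is looked up by that hash instead of being rescanned; the names `FlowTagTable`, `identify`, `dpi_match` and `ml_classify` are hypothetical, and the truncated SHA-1 key is only one possible hashing scheme, not the one used by the invention.

```python
import hashlib
import re
from typing import Callable, Dict, NamedTuple, Optional


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str  # "tcp", "udp" or "icmp"


def flow_hash(ft: FiveTuple) -> str:
    """Compact hash key for a flow (hypothetical scheme: truncated SHA-1 of the five-tuple)."""
    key = f"{ft.src_ip}|{ft.dst_ip}|{ft.src_port}|{ft.dst_port}|{ft.proto}"
    return hashlib.sha1(key.encode()).hexdigest()[:16]


class FlowTagTable:
    """Remembers flows already labelled so later packets skip inspection."""

    def __init__(self) -> None:
        self._tags: Dict[str, str] = {}

    def tag(self, ft: FiveTuple, label: str) -> None:
        self._tags[flow_hash(ft)] = label

    def lookup(self, ft: FiveTuple) -> Optional[str]:
        return self._tags.get(flow_hash(ft))


def identify(ft: FiveTuple, payload: bytes, table: FlowTagTable,
             dpi_match: Callable[[bytes], Optional[str]],
             ml_classify: Callable[[FiveTuple], Optional[str]]) -> Optional[str]:
    cached = table.lookup(ft)
    if cached is not None:          # flow already hash-marked: no rescanning
        return cached
    label = dpi_match(payload)      # clear-text payload: signature/regex matching
    if label is None:
        label = ml_classify(ft)     # fallback for traffic DPI cannot recognise
    if label is not None:
        table.tag(ft, label)        # hash-mark the flow for later packets
    return label


# Example DPI rule: the BitTorrent handshake begins with 0x13 "BitTorrent protocol".
BT_HANDSHAKE = re.compile(rb"\x13BitTorrent protocol")


def example_dpi(payload: bytes) -> Optional[str]:
    return "bittorrent" if BT_HANDSHAKE.match(payload) else None
```

In this sketch the classifier is passed in as a plain callable, so any trained model can be plugged in; only flows that end up with a label are stored in the table.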
The protocol analysis logic consists mainly of the following components: the non-encrypted packet identification module, the encrypted packet identification module, the PTCM mechanism module, and the packet control module.
The implementation method of this protocol analysis system comprises the following steps:
Step 1). Carry out requirements analysis: analyze the functions that the protocol identification and control system must provide, and produce a requirements analysis document;
Step 2). Design the modules according to the analysis document from Step 1, analyze the function of each module in detail, and produce a document describing the logical relationships between the modules and their functions;
Step 3). According to the document from Step 2, design and implement the non-encrypted packet identification module and the encrypted packet identification module of the protocol identification and control system. Identification efficiency affects the operating efficiency of the whole system; the packet identification module identifies unencrypted packets of peer-to-peer network protocols by their packet length and the signature bytes at fixed positions;
Step 4). According to the document from Step 2, design and implement the function that hashes the non-encrypted and encrypted packets identified by the system, tags them with labels, and passes them to the PTCM mechanism module;
Step 5). According to the document from Step 2, design and implement the PTCM mechanism of the protocol identification and control system; this module predicts the identified P2P traffic and then generates an intelligent traffic control strategy;
Step 6). According to the document from Step 2, design and implement the packet control function of the protocol analysis system; this module sends the control strategy produced by the non-encrypted and encrypted packet identification modules and the PTCM mechanism to the packet control system through the message mechanism, and the packet control system then allocates network bandwidth rationally according to the received strategy.
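The delivery of a control strategy to the bridge system in Step 6 can be sketched as follows. The document does not specify the message format or how TC is configured, so this sketch assumes a JSON message and an HTB class on the bridge, mapping the minimum bandwidth to the HTB `rate` and the maximum bandwidth to `ceil`; `ControlStrategy`, `encode`, `to_tc_command`, the interface name `br0` and class `1:10` are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class ControlStrategy:
    dev: str        # bridge interface carrying the traffic, e.g. "br0" (assumed)
    classid: str    # HTB class assigned to P2P traffic, e.g. "1:10" (assumed)
    min_kbit: int   # guaranteed bandwidth, taken from the lower threshold
    max_kbit: int   # bandwidth ceiling, taken from the upper threshold


def encode(strategy: ControlStrategy) -> bytes:
    """Controller side: serialise the strategy as a message (hypothetical JSON format)."""
    return json.dumps(asdict(strategy)).encode()


def to_tc_command(message: bytes) -> List[str]:
    """Bridge side: decode a message and build an argv that updates the HTB class."""
    s = ControlStrategy(**json.loads(message))
    if not 0 < s.min_kbit <= s.max_kbit:
        raise ValueError("need 0 < min_kbit <= max_kbit")
    return ["tc", "class", "change", "dev", s.dev, "parent", "1:", "classid", s.classid,
            "htb", "rate", f"{s.min_kbit}kbit", "ceil", f"{s.max_kbit}kbit"]
```

The returned list could be passed to `subprocess.run(cmd, check=True)` with root privileges, provided an HTB qdisc with handle `1:` and the named class already exist on the bridge; the sketch only builds the command.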
Beneficial effects: The present invention provides an implementation method for a system that identifies and controls peer-to-peer network protocol packets. Compared with earlier methods, it has several significant advantages:
High identification efficiency: The present invention identifies packets at the network layer, which shortens the identification path. It identifies only uplink traffic and leaves downlink traffic unprocessed, which reduces CPU load and improves identification efficiency. Encrypted and non-encrypted packets are processed separately, so identification accuracy reaches 100% with no missed or false detections, whereas traditional identification performed on routers generally achieves only a 90%~95% hit rate.
Modular design: The protocol identification and control system is divided into functional modules. Non-encrypted packet identification uses deep packet inspection (DPI) and identifies protocol-related packets by their signature bytes. Encrypted packet identification uses a machine-learning classification algorithm. The PTCM mechanism applies a time-series algorithm from machine learning to predict P2P traffic and thereby generate an intelligent traffic control strategy. The packet control function allocates network bandwidth rationally according to that strategy.
Good system extensibility: The system modules are independent of one another, functions are designed in parallel layers, and communication between modules is fully layered. New functions can therefore be added easily and existing functions upgraded without difficulty, giving the system good extensibility.
High reliability and stability: Unit testing of the protocol identification and control system shows that it runs well, occupies few system resources, and has good fault tolerance and disaster-recovery capability.
Description of drawings
Fig. 1 is the physical network diagram of the protocol identification and control system;
Fig. 2 is the flow chart of the non-encrypted packet identification module;
Fig. 3 is a schematic diagram of the encrypted packet identification module;
Fig. 4 is a schematic diagram of the PTCM mechanism module.
Embodiment
Architecture
Non-encrypted packet identification: P2P traffic identification based on application-layer data inspection uses protocol analysis and reassembly to extract the P2P application-layer data (the P2P payload), and decides whether the traffic belongs to a P2P application by analyzing the protocol signature values contained in the payload. Such methods are therefore also called deep packet inspection (DPI). In DPI, a signature database is built by extracting signatures from the payloads of specific P2P protocols and their corresponding P2P systems. A pattern-matching algorithm is applied to the real-time network flows passing through to determine whether they contain a signature string from the database; if a signature matches, the flow is P2P data. The packet identification part of the present invention uses DPI scanning and, based on signature bytes at fixed positions, identifies the hash search packets and related interaction packets of the Ares P2P network service.
Encrypted packet identification: Because encrypted packets are complex, P2P traffic identification based on application-layer data inspection no longer works, so a classification algorithm from machine learning is used instead. This module first collects data to obtain training samples, then builds a model with a traffic attribute selection algorithm and a machine-learning algorithm, simplifies the model, and uses the simplified model to classify data packets in real time. Successfully classified P2P traffic is marked with a hash and sent to the PTCM module; traffic identified as non-P2P is ignored and left untouched.
PTCM mechanism: The PTCM mechanism mainly generates the P2P traffic control strategy. It makes a reasonable prediction of the identified P2P traffic, counts how often the traffic crosses the upper and lower thresholds, and intelligently adjusts the maximum and minimum bandwidth. The resulting intelligent control strategy is finally sent to the bridge system through the message mechanism, and the bridge system allocates network bandwidth intelligently according to the strategy.
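A signature database of the kind described above can be sketched as a list of (offset, bytes, optional length) rules checked against each payload. The entries below are illustrative only: the BitTorrent handshake prefix and the eDonkey header byte 0xE3 are publicly documented markers, not the Ares signatures used by the invention, and `Signature`, `EXAMPLE_DB` and `match_payload` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Signature:
    protocol: str
    offset: int                   # position of the signature inside the payload
    pattern: bytes                # bytes expected at that position
    length: Optional[int] = None  # exact payload length required, if any


# Illustrative entries only; these are not the Ares signatures used by the invention.
EXAMPLE_DB = [
    Signature("bittorrent", 0, b"\x13BitTorrent protocol"),
    Signature("edonkey", 0, b"\xe3"),
]


def match_payload(payload: bytes, db: Sequence[Signature]) -> Optional[str]:
    """Return the protocol of the first signature found in the payload, else None."""
    for sig in db:
        if sig.length is not None and len(payload) != sig.length:
            continue
        if payload[sig.offset:sig.offset + len(sig.pattern)] == sig.pattern:
            return sig.protocol
    return None
```

A one-byte marker such as 0xE3 is weak on its own; combining it with the `length` constraint, as the document does with packet length plus fixed-position bytes, reduces false matches.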
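The prediction step of the PTCM mechanism is specified later in the document as a FARIMA model. As a hedged illustration, the sketch below covers only the special case FARIMA(0, d, 0), forecasting one interval ahead from the truncated AR(infinity) form of $(1-B)^d$; it assumes the fractional differencing parameter `d` is already known (estimating it, and the ARMA parts of a full FARIMA model, are out of scope), and `frac_diff_weights` and `predict_next` are hypothetical names.

```python
from typing import List, Sequence


def frac_diff_weights(d: float, n: int) -> List[float]:
    """Coefficients pi_0..pi_n of (1 - B)^d: pi_0 = 1, pi_k = pi_{k-1} * (k - 1 - d) / k."""
    w = [1.0]
    for k in range(1, n + 1):
        w.append(w[-1] * (k - 1 - d) / k)
    return w


def predict_next(history: Sequence[float], d: float, n_lags: int = 50) -> float:
    """One-step forecast for a FARIMA(0, d, 0) series via its truncated AR(infinity) form.

    (1 - B)^d (X_t - mu) = eps_t  implies  X_t = mu - sum_{k>=1} pi_k (X_{t-k} - mu) + eps_t,
    so the forecast drops eps_t and truncates the sum after n_lags terms.
    """
    if not history:
        raise ValueError("history must not be empty")
    mu = sum(history) / len(history)           # sample mean stands in for the true mean
    lags = min(n_lags, len(history))
    w = frac_diff_weights(d, lags)
    return mu - sum(w[k] * (history[-k] - mu) for k in range(1, lags + 1))
```

For 0 < d < 0.5 all weights after the first are negative, so recent traffic above the mean pulls the forecast upward and the influence of old samples decays slowly, which is the long-memory behaviour FARIMA is chosen for.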
Method flow
This part describes in detail the design and implementation of each part of the summary of the invention:
Implementation of non-encrypted packet identification: Within Linux netfilter, the socket buffer (sk_buff), the key data structure of the TCP/IP protocol stack, is used to operate on the data passing through. If a packet is a fragment or has no associated connection, the function returns without performing any operation.
When a packet passes the first hook function, NF_IP_PRE_ROUTING, it is temporarily stored in the sk_buff control structure in memory. This structure contains pointers to the network headers (e.g. skb->nh). The module first checks whether the packet is a TCP packet. Then, using the network-layer and transport-layer header sizes provided by the sk_buff, it adds both header lengths to skb->nh, so that the pointer points to the start of the application-layer data (referred to here as the Appdata pointer). The sk_buff also records the total packet length; subtracting the network-layer and transport-layer header sizes gives the length of the application-layer data. Once this preparation is done, the Ares packets to be identified can be compared through the Appdata pointer, by matching the packet length and the bytes at fixed positions. Data in the sk_buff are stored in network byte order, so __constant_htons() or __constant_htonl() must be used during comparison to reconcile network byte order with host byte order.
The present invention needs to identify protocol packets in the peer-to-peer network, that is, to determine whether a packet is a P2P packet; packets that are not P2P are left unprocessed.
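The offset and byte-order arithmetic described above is the same whether it runs in the kernel or in user space. The sketch below reproduces it in Python on a raw IPv4/TCP packet: it skips fragments and non-TCP packets, derives the payload offset from the IPv4 IHL and TCP data-offset fields, and compares a 16-bit field read in network byte order (`"!H"` in `struct`), which plays the role of `__constant_htons()` in the kernel code. The helper names and the example length and field value are hypothetical, not the real Ares signature.

```python
import struct
from typing import Optional, Tuple

IPPROTO_TCP = 6


def tcp_payload(packet: bytes) -> Optional[Tuple[int, int]]:
    """Return (offset, length) of the application data in an IPv4/TCP packet, else None."""
    if len(packet) < 20:
        return None
    ihl = (packet[0] & 0x0F) * 4                          # IPv4 header length in bytes
    total_len, = struct.unpack_from("!H", packet, 2)      # "!" = network byte order
    flags_frag, = struct.unpack_from("!H", packet, 6)
    if packet[9] != IPPROTO_TCP or flags_frag & 0x3FFF:   # not TCP, or MF set / nonzero offset
        return None
    if len(packet) < ihl + 20:
        return None
    doff = (packet[ihl + 12] >> 4) * 4                    # TCP header length in bytes
    start = ihl + doff
    if total_len < start or total_len > len(packet):
        return None
    return start, total_len - start


def field_matches(packet: bytes, expected_len: int, sig_offset: int, sig_value: int) -> bool:
    """Compare payload length and a 16-bit field at a fixed payload offset."""
    loc = tcp_payload(packet)
    if loc is None:
        return False
    start, length = loc
    if length != expected_len or sig_offset + 2 > length:
        return False
    value, = struct.unpack_from("!H", packet, start + sig_offset)  # converted to host order
    return value == sig_value
```

Reading the field as big-endian makes the comparison independent of the host CPU's byte order, which is the concern the document addresses with the `htons`/`htonl` macros.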
Implementation of encrypted packet identification: For encrypted P2P packets, this module first collects data with a capture tool to obtain training samples, then builds a model with a traffic attribute selection algorithm and a machine-learning algorithm, simplifies the model, and finally classifies real-time data with the simplified model.
In network applications, a flow is usually defined as one or more IP packets transmitted between the IP addresses of two computers using a particular protocol (one of TCP, UDP or ICMP), sometimes also over a specific pair of ports. This five-tuple (source IP address, destination IP address, source port, destination port, protocol type) identifies a flow, and this information is present in every IP packet.
Flow features are generally used to identify and distinguish unknown network flows. A feature is usually an attribute of a flow computed over a large number of packets, such as the maximum or minimum packet length in one direction, the flow duration, or the inter-arrival times of packets.
Raw flows cannot be used directly for traffic identification; what can be used is a set of attributes describing each flow. These attributes include the source and destination port numbers of the flow, its behavioral characteristics, and so on, and they can be used to separate different traffic classes. Using all attributes for learning and classification is undesirable, however: the attributes that contribute to accurate classification must be picked out from the many available, and irrelevant and redundant attributes removed. This process is called attribute selection, and attribute selection algorithms are used to choose the attributes that help classify traffic accurately.
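The document does not name the machine-learning algorithm used for encrypted traffic. As a stand-in, the sketch below trains a Gaussian naive Bayes classifier, a common choice for flow classification, on numeric per-flow attributes and classifies new flows by the largest log posterior. The class `GaussianNB` here is a minimal hypothetical implementation, not a library API, and the variance floor of 1e-9 is an assumption to avoid division by zero.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class GaussianNB:
    """Minimal Gaussian naive Bayes over numeric flow attributes."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "GaussianNB":
        groups: Dict[str, List[Sequence[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        # label -> (log prior, [(mean, variance) per attribute])
        self.model: Dict[str, Tuple[float, List[Tuple[float, float]]]] = {}
        for label, rows in groups.items():
            cols = []
            for j in range(len(rows[0])):
                vals = [r[j] for r in rows]
                mu = sum(vals) / len(vals)
                var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-9  # avoid zero variance
                cols.append((mu, var))
            self.model[label] = (math.log(len(rows) / len(X)), cols)
        return self

    def predict(self, row: Sequence[float]) -> str:
        def log_posterior(label: str) -> float:
            log_prior, cols = self.model[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
                for x, (mu, var) in zip(row, cols))
        return max(self.model, key=log_posterior)
```

Training corresponds to the "build a model from training samples" step; passing only the attributes kept by attribute selection to `fit` and `predict` corresponds to the simplified model.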
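Grouping packets by five-tuple, as defined above, amounts to using the tuple as a dictionary key. The sketch below assumes packets have already been parsed into a `Packet` record (a hypothetical type) and optionally orders the two endpoints so that both directions of a conversation fall into the same flow; whether flows are treated as unidirectional or bidirectional is a design choice the document does not fix.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Packet(NamedTuple):
    ts: float       # capture time in seconds
    src_ip: str
    dst_ip: str
    src_port: int   # 0 for ICMP
    dst_port: int
    proto: str      # "tcp", "udp" or "icmp"
    length: int     # bytes on the wire


FlowKey = Tuple[str, str, int, int, str]


def flow_key(p: Packet, bidirectional: bool = True) -> FlowKey:
    """Five-tuple key; optionally order the endpoints so both directions share a key."""
    a, b = (p.src_ip, p.src_port), (p.dst_ip, p.dst_port)
    if bidirectional and b < a:
        a, b = b, a
    return (a[0], b[0], a[1], b[1], p.proto)


def group_by_flow(packets: Iterable[Packet], bidirectional: bool = True) -> Dict[FlowKey, List[Packet]]:
    """Assign each packet to the flow identified by its five-tuple, keeping arrival order."""
    flows: Dict[FlowKey, List[Packet]] = defaultdict(list)
    for p in packets:
        flows[flow_key(p, bidirectional)].append(p)
    return dict(flows)
```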
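The example features named above (per-direction maximum and minimum packet length, flow duration, inter-arrival time) can be computed from one flow's packets as follows. This is a small hypothetical subset for illustration; the Fullstats system described in this document computes 248 attribute values per flow.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple


def flow_features(packets: Sequence[Tuple[float, int, bool]]) -> Dict[str, float]:
    """packets: time-ordered (timestamp_s, length_bytes, client_to_server) tuples of one flow."""
    if not packets:
        raise ValueError("empty flow")
    feats: Dict[str, float] = {}
    for name, outbound in (("c2s", True), ("s2c", False)):
        sizes = [length for _, length, out in packets if out == outbound]
        feats[f"{name}_pkts"] = len(sizes)
        feats[f"{name}_max_len"] = max(sizes, default=0)
        feats[f"{name}_min_len"] = min(sizes, default=0)
    times = [ts for ts, _, _ in packets]
    feats["duration"] = times[-1] - times[0]
    gaps = [b - a for a, b in zip(times, times[1:])]
    feats["mean_iat"] = mean(gaps) if gaps else 0.0     # mean inter-arrival time
    return feats
```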
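The document does not say which attribute selection algorithm is used. The sketch below substitutes a simple greedy correlation filter in the same spirit: it ranks attributes by the absolute Pearson correlation with a numeric class label (1 for P2P, 0 otherwise), discards weakly relevant attributes, and skips any attribute highly correlated with one already chosen. The thresholds `min_relevance` and `max_redundancy` are assumed values, and `select_attributes` is a hypothetical name.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0                      # constant column carries no information
    return sxy / math.sqrt(sxx * syy)


def select_attributes(columns: Sequence[Sequence[float]], labels: Sequence[int],
                      min_relevance: float = 0.3, max_redundancy: float = 0.9) -> List[int]:
    """Greedy filter: keep attributes correlated with the label but not with each other."""
    relevance = [abs(pearson(col, labels)) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: relevance[j], reverse=True)
    chosen: List[int] = []
    for j in order:
        if relevance[j] < min_relevance:
            break                       # remaining attributes are even less relevant
        if all(abs(pearson(columns[j], columns[k])) < max_redundancy for k in chosen):
            chosen.append(j)
    return chosen
```

The returned indices are in decreasing order of relevance, so a model can also be trained on a prefix of the list if fewer attributes are wanted.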
The flow attribute feature computation system Fullstats designed for this invention runs under Linux. Through the command line, it conveniently converts raw network packets into files of attribute feature values.
The computation of flow attribute features can be divided into the following four steps:
(1) Capture network packets with network sniffing technology;
(2) Group the captured packets into flows (packets with the same five-tuple belong to the same flow); each flow contains several packets;
(3) Compute the attribute features of each flow, yielding 248 attribute feature values;
(4) Convert the resulting attribute feature values into the required file format.
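Step (4) leaves the target file format open. Assuming the features are meant for a tool such as Weka, the sketch below writes per-flow attribute dictionaries and class labels as an ARFF document with numeric attributes and a nominal class; the format choice and the name `to_arff` are assumptions for illustration.

```python
from typing import Dict, List, Sequence


def to_arff(relation: str, rows: Sequence[Dict[str, float]], labels: Sequence[str],
            classes: Sequence[str]) -> str:
    """Render per-flow attribute dicts plus a class label as a Weka ARFF document."""
    if not rows or len(rows) != len(labels):
        raise ValueError("need one label per non-empty row list")
    names = sorted(rows[0])                     # fixed attribute order for every row
    out: List[str] = [f"@RELATION {relation}", ""]
    out += [f"@ATTRIBUTE {n} NUMERIC" for n in names]
    out.append("@ATTRIBUTE class {" + ",".join(classes) + "}")
    out += ["", "@DATA"]
    for row, label in zip(rows, labels):
        if label not in classes:
            raise ValueError(f"unknown class {label!r}")
        out.append(",".join(repr(float(row[n])) for n in names) + "," + label)
    return "\n".join(out) + "\n"
```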
Following these four steps, the computation system Fullstats is built from four Linux tools:
Tcpdemux: a flow demultiplexer that splits the data captured by the sniffer, writing each flow to a new file.
Tcptrace: analyzes the characteristics of the captured flows and produces a rich set of statistics, i.e. feature values.
Tcpdump: a well-known Linux sniffer (network packet capturer) with very powerful filtering.
Tcpslice: extracts parts of Tcpdump files or concatenates such files.
Step 1 above is performed by Tcpdump, step 2 by Tcpdemux, and step 3 jointly by Tcpdump, Tcptrace and Tcpslice.
PTCM mechanism: The present invention generates an intelligent control strategy. The P2P traffic control algorithm mainly comprises time-series model prediction, comparison of the predicted traffic against thresholds, and adjustment of the control strategy according to counters.
1) Implement the time-series prediction model in the C programming language;
2) Set the user's maximum bandwidth BS, the upper limit for burst traffic uBS (a maximum value), the lower limit dMS (a minimum value), and the threshold-crossing counters uCount and dCount, where dMS < uBS < BS;
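The counter-based threshold adjustment of the PTCM mechanism can be sketched as a small state machine fed with one predicted value per interval. The sketch interprets dCount as the number of intervals whose prediction falls below dMS, treats the grade $d_n$ as a constant step, resets the counters after every window of m intervals, and, as its own assumed policy, skips any adjustment that would violate 0 <= dMS < uBS < BS. `PTCMController` is a hypothetical name, and the prediction itself is supplied from outside.

```python
from dataclasses import dataclass


@dataclass
class PTCMController:
    bs: float       # BS: user's maximum bandwidth
    ubs: float      # uBS: upper threshold for burst traffic
    dms: float      # dMS: lower threshold
    grade: float    # d_n: adjustment step (assumed constant in this sketch)
    m: int          # adjustment window, in prediction intervals
    u_count: int = 0
    d_count: int = 0
    seen: int = 0

    def __post_init__(self) -> None:
        if not 0 <= self.dms < self.ubs < self.bs:
            raise ValueError("thresholds must satisfy 0 <= dMS < uBS < BS")

    def observe(self, q: float) -> None:
        """Feed one predicted value Q_dt; re-tune the thresholds every m intervals."""
        if q > self.ubs:
            self.u_count += 1
        elif q < self.dms:
            self.d_count += 1
        self.seen += 1
        if self.seen == self.m:
            self._adjust()
            self.u_count = self.d_count = self.seen = 0

    def _adjust(self) -> None:
        # Positive diff widens the band (uBS up, dMS down); negative diff narrows it.
        diff = self.u_count - self.d_count
        new_ubs = self.ubs + diff * self.grade
        new_dms = self.dms - diff * self.grade
        # Policy assumed here: skip an adjustment that would break 0 <= dMS < uBS < BS.
        if 0 <= new_dms < new_ubs < self.bs:
            self.ubs, self.dms = new_ubs, new_dms
```

After each window, the current `ubs` and `dms` values are what would be handed to the message mechanism as the new maximum and minimum bandwidth.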
3) Use the FARIMA model to predict the P2P traffic value $Q_{\Delta t}$ for the next time interval $\Delta t$;
4) When the predicted P2P traffic value for the first interval $\Delta t$ is produced, the timer starts counting;
5) If $Q_{\Delta t} > uBS$, increment uCount by one; if $Q_{\Delta t} < dMS$, increment dCount by one;
6) Within each period of $m\,\Delta t$: if uCount > dCount, raise uBS by (uCount - dCount) grades $d_n$ and lower dMS by (uCount - dCount) grades $d_n$, keeping dMS < uBS < BS; if uCount < dCount, lower uBS by (dCount - uCount) grades $d_n$ and raise dMS by (dCount - uCount) grades $d_n$, keeping dMS < uBS < BS, where the grade