Background of the invention
P2P file-sharing systems can generally be divided into four models:
(1) The centralized-directory P2P model, which uses a centralized topology. Resource discovery depends on a central directory server; the typical representative is Napster.
(2) The pure P2P model, which uses a fully distributed unstructured topology. File queries and sharing between nodes are propagated by broadcast directly through neighboring nodes; the typical representative is Gnutella.
(3) The hybrid P2P model, which uses a partially distributed topology. It adds the concept of super nodes on top of pure P2P, combining the fast search of the centralized-directory model with the decentralization of pure P2P; Kazaa is a typical representative.
(4) The structured P2P model, which uses a fully distributed structured topology. It relies on distributed message routing and Distributed Hash Table (DHT) technology for information location and file sharing. Popular P2P software on the Internet is based on the hybrid and structured P2P models, typically eMule/eDonkey.
There are two main P2P network monitoring models:
(1) Traffic-feature-based monitoring identifies and monitors P2P traffic from the characteristics that the communication of various P2P programs exhibits. It is mainly used by network operators for traffic management.
(2) Traffic-content-based monitoring identifies and monitors P2P traffic from the content that P2P software transmits, but it cannot identify encrypted P2P content.
Document 1 (Thomas Karagiannis, Andre Broido, Michalis Faloutsos, et al. Transport layer identification of P2P traffic[C]. IMC '04, October 25-27, 2004, Taormina, Sicily, Italy. Copyright 2004 ACM: 121-134) discloses a P2P traffic detection method based on transport-layer characteristics. The method relies on general properties that P2P traffic exhibits at the transport layer and combines them with conventional port-based detection. It can detect traffic from new P2P applications and from encrypted P2P applications, but it is complex and cannot classify P2P applications. Overall, traffic-feature-based monitoring is mainly used by network operators to restrict and manage P2P traffic. It is not suited to monitoring the propagation of specific information over P2P or to gathering evidence about it.
Document 2 (Ho Gyun Lee, Taek Yong Nam, Jong Soo Jang. The Method of P2P Traffic Detecting for P2P Harmful Contents Prevention[C]. ICACT 2005, Feb 21-23, 2005, Phoenix Park, Korea) discloses a content-recovery-based method for monitoring specific content in P2P file-sharing systems.

The method first identifies P2P data flows and then divides them, by the type of content transferred, into text, image and video data:
- Text is checked by dictionary comparison: keywords carried in the text are compared against a pre-built dictionary of harmful information.
- For images, the paper addresses only pornographic content. Image processing measures the proportion of "skin area" in the whole image, and an image whose proportion exceeds a threshold is considered to carry pornographic content.
- For video files, two methods are used. One extracts key frames and judges their content; the other recovers a fragment of the video file and judges from that fragment whether the file contains illegal content.

The scheme has several drawbacks:
- The system architecture is complex and lacks a unified detection mechanism.
- Recovering packet contents is technically difficult, and encrypted content cannot be recovered.
- Image and video detection can only be done after the fact and requires complex image processing. It is computationally expensive, has poor real-time performance and has low detection accuracy.

Because content-based monitoring must reconstruct the content that P2P software transmits, it cannot recover or monitor encrypted P2P content.
Existing P2P network monitoring technology, at home and abroad, is mainly used by network operators to identify and manage P2P traffic. It cannot effectively monitor and analyze the propagation of specific information over P2P or the audience of that information, and so it does not meet the needs of network security supervision.
Summary of the invention
Prior-art methods are poor at monitoring and analyzing the propagation of specific eMule information and its audience. To overcome this, the invention provides an active, eMule-based method for monitoring the propagation of specific information. An emulated eMule client sends emulated packets to the eMule network and analyzes the returned packets, so that the propagation of specific information in the eMule network can be effectively monitored and analyzed.
The technical solution adopted by the invention is an active, eMule-based method for monitoring the propagation of specific information, characterized by the following steps:
(a) Obtain eMule resources in two ways.
(1) From the Internet:
- Read from the system the keyword that marks an eMule resource in a web page, '<a href="ed2k://'.
- Read the initial search address from the system configuration information, download the web page at that address to local storage, and open it as text.
- Match the text against the keyword. The content following each match position is saved as the link address of an eMule resource.
- Configure the thread count between 5 and 20. Set it initially to 5 and start acquiring eMule resources. If the computer's CPU usage has not reached 80%, increase the thread count by 1. Repeat until CPU usage reaches 80% or the thread count reaches 20.
While matching the current page, whenever the system finds a link tag <a></a>, it stores the link address in the tag in the unprocessed queue as an address to be handled later. After the current page is processed, it reads the first address from the unprocessed queue and repeats the matching operation, until the unprocessed queue is empty.
(2) From eMule servers: following the public eMule protocol, emulate the query exchange between an eMule client and an eMule server by sending a query packet to the server. On receiving the query command, the server searches its own index and returns the results to the client, organized as an eMule resource list.
(b) Analyze the eMule resources obtained by the eMule resource acquisition device to extract the useful information they contain: file name, file size, file hash value, source address and port number, and the complete hash set. Store this information in a database. When the emulation client later acquires "audience information", relevant eMule resources can then be looked up quickly by the specified query conditions.
(c) Connect to the eMule network with an emulation client to obtain the list of Peers nodes sharing a specific file.
When the eMule emulation client starts, it connects to the eMule network and the server assigns it a client ID. When the task list contains a file to be downloaded, the client performs the interaction simulation appropriate to the eMule network type it is connected to, Ed2k or Kad:
- Based on the file hash value in the eMule resource, the client replicates a genuine link request and sends a query to the Ed2k server or Kad network node.
- On receiving the query, the Ed2k server or Kad node uses the Ed2k or Kad protocol to look up the nodes that can provide the file for that eMule resource.
- The node list is returned to the emulation client in TCP packets (Ed2k) or UDP packets (Kad).
A distributed architecture is used: several emulation clients are deployed, each submitting repeated queries to the eMule network, so that all nodes participating in the file transfer can be obtained.
(d) After the eMule emulation client receives replies from the Ed2k or Kad network, it parses the returned packets according to the eMule protocol and the Ed2k or Kad protocol. This yields the list of Peer nodes downloading the file for the current eMule resource. Each Peer entry contains a client ID and a port number, and the list is laid out in the packet as defined by the protocol specification, so the client decomposes each packet according to that specification to extract the node list. The nodes returned by the Ed2k or Kad network are then reorganized: duplicate nodes are removed, nodes are sorted by client ID, and each remaining node is connected to in order to obtain its state information.
After obtaining the node list, the emulation client does the following for each node:
- Following the eMule protocol specification, it generates an emulated handshake packet from the node's client ID and port number.
- It sends a handshake request to the node to establish a TCP link for data transfer.
- Once the link is established, it emulates the data request packets of the eMule protocol, sends them to the node, and receives the node's reply packets.
- It sends and receives over a fixed period, sums the amount of data received, and averages it over time to obtain the transfer rate of the TCP link and the node's state.
The collected eMule resource information and audience information together form complete audience monitoring information. Because no filtering is performed during collection, only one copy of each duplicate record is kept, and the records are stored in the audience database in a standard data format.
The beneficial effects of the invention are as follows. An emulated eMule client sends emulated packets to the eMule network and the returned packets are analyzed. This effectively solves the problem of monitoring the propagation of specific eMule information and its audience.
The invention can efficiently search for and obtain eMule resource information on the Internet. The emulation client can accurately obtain information about the audience participating in the propagation of specific information, building a target-audience database for that information.
This gives network security supervision departments a means of monitoring audience information and gathering evidence. It also provides base data and a decision basis for macroscopic analysis, early warning and forecasting of the network security situation.
The invention is described in detail below with reference to the drawings and embodiments.
Embodiment
With reference to Figs. 1 and 2, the basic concepts involved in the invention are explained as follows:
eMule resource: the link information eMule requires to start a task that propagates a specific file, including Ed2k links and Kad links.
eMule resource acquisition device: the method of automatically obtaining eMule resources from the Internet and from eMule servers.
The method of automatically obtaining eMule resources from the Internet is dedicated to finding eMule resources published on web pages. It uses a depth-first search strategy:
- Starting from a known link address, it fetches the page content and processes the links in the page.
- Preset matching keywords are used to find eMule resource links, which are taken directly as eMule resources.
- Ordinary web links are placed in the queue of URLs waiting to be fetched.
- The next URL to fetch is selected from that queue according to the depth-first strategy.
The process repeats until the preset stop condition is reached.
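The crawl loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the invention's implementation:
- The page-fetching function is passed in as a parameter.
- The regular expression for ordinary http(s) links is a hypothetical choice.
- The `max_pages` budget stands in for the unspecified "preset stop condition".
- Pending URLs are kept in a list used as a stack, which gives depth-first order.

```python
import re
from typing import Callable

# Matching keyword that marks an eMule resource in a page, as in the description.
ED2K_KEYWORD = '<a href="ed2k://'
# Hypothetical pattern for ordinary web links inside <a> tags.
HREF_PATTERN = re.compile(r'<a\s+[^>]*href="(https?://[^"]+)"', re.IGNORECASE)


def extract_ed2k_links(html: str) -> list:
    """Return every ed2k:// link found after the matching keyword."""
    links = []
    start = 0
    while True:
        pos = html.find(ED2K_KEYWORD, start)
        if pos < 0:
            return links
        begin = pos + len('<a href="')  # keep the "ed2k://" prefix itself
        end = html.find('"', begin)     # closing quote of the href attribute
        if end < 0:
            return links
        links.append(html[begin:end])
        start = end


def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 100) -> list:
    """Depth-first crawl: the most recently discovered URL is fetched next."""
    pending = [start_url]  # URLs waiting to be fetched, used as a stack
    seen = {start_url}
    resources = []
    pages = 0
    # Hypothetical stop condition: empty frontier or page budget exhausted.
    while pending and pages < max_pages:
        url = pending.pop()
        html = fetch(url)
        pages += 1
        resources.extend(extract_ed2k_links(html))
        for link in HREF_PATTERN.findall(html):
            if link not in seen:
                seen.add(link)
                pending.append(link)
    return resources
```

Replacing the list with `collections.deque` and `pop()` with `popleft()` would handle the first queued address first. That is the order the detailed embodiment gives for the unprocessed queue.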
The method of automatically obtaining eMule resources from eMule servers searches for eMule resources published inside the eMule software. It obtains eMule resource information by emulating the query process between an eMule client and an eMule server.
The emulation client starts from existing eMule resources and obtains audience information by emulating the eMule communication process.
A modular architecture is adopted: the modules are implemented independently of each other and communicate through interfaces. The system has two levels:
- The first level is the interface layer, which consists of user interface management. It is the system's main point of interaction with the user: it provides the user-facing functions and invokes the other modules.
- The second level contains six modules: eMule resource acquisition, eMule resource analysis and processing, eMule network access simulation, Peers node list extraction and processing, system parameter setting and reading, and operation help.
Each module is described below:
eMule resource acquisition module: collects the required eMule resources by means of the eMule resource acquisition device;
eMule resource analysis and processing module: analyzes and processes the collected eMule resources;
eMule network access simulation: emulates a node's access to the eMule network to obtain audience information;
Peers node list extraction and processing: processes the replies from the eMule network to obtain the Peers node list it returns, and obtains each Peer node's state by connecting to it;
System parameter setting and reading module: sets the system's operating parameters and reads specified parameters as needed;
Operation help: a module that explains how to operate the system.
The method of the invention is implemented by the following steps:
(1) Obtain eMule resources.
There are two ways to obtain eMule resources:
Automatically obtaining eMule resources from the Internet: first read from the system the keyword that marks an eMule resource in a web page, generally '<a href="ed2k://', for later comparison. Then read the initial search address from the system configuration information, download the page at that address to local storage, open it as text, and match its content from front to back against the keyword. Whenever a match is found, the content following the match position is saved as the link address of an eMule resource.
To speed up acquisition, this step uses multiple threads, so that matching and downloading of subsequent content continue while eMule resources are being saved. For best efficiency the number of concurrent threads is configurable. Balancing acquisition efficiency against the computer's load, the allowed range is 5 to 20 threads. The operator configures the count according to the computer:
- Set the thread count to 5 and start acquiring eMule resources.
- If the computer's CPU usage has not reached 80%, increase the thread count by 1 and restart acquisition.
- Check CPU usage again and repeat until CPU usage reaches 80% or the thread count reaches 20.
While matching the current page, whenever the system finds a link tag <a></a>, it stores the link address in the tag in the unprocessed queue as an address to be handled later. After the current page is processed, it reads the first address from the unprocessed queue and repeats the matching operation, until the unprocessed queue is empty.
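The thread-count rule above can be sketched as a small loop: start at 5, add one thread while CPU usage stays below 80%, and stop at 20. This is an illustrative sketch only. The callback `measure_cpu` is hypothetical: it is assumed to run acquisition with the given number of threads and return the observed CPU usage in percent. In practice the third-party psutil package's `cpu_percent()` could supply that reading.

```python
from typing import Callable

MIN_THREADS = 5    # lower bound stated in the description
MAX_THREADS = 20   # upper bound stated in the description
CPU_TARGET = 80.0  # CPU-usage threshold in percent, stated in the description


def tune_thread_count(measure_cpu: Callable[[int], float]) -> int:
    """Grow the thread count from 5 until CPU usage reaches 80% or the count reaches 20.

    `measure_cpu(n)` is a hypothetical callback: run acquisition with n threads
    and return the CPU usage observed while doing so.
    """
    threads = MIN_THREADS
    while threads < MAX_THREADS and measure_cpu(threads) < CPU_TARGET:
        threads += 1
    return threads
```

For example, if each thread adds about 5% CPU load, the loop stops at 16 threads, the first count whose measured usage reaches 80%.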
Automatically obtaining eMule resources from eMule servers: eMule resources for a specific file that cannot be found on the Internet can be obtained by searching eMule servers. Following the public eMule protocol, the system emulates the query exchange between an eMule client and an eMule server and sends a query packet to the server. On receiving the query command, the server searches its own index and returns the results to the client, organized as an eMule resource list.
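To illustrate the emulated client-to-server query, the sketch below builds a TCP packet with the eDonkey framing shown in the packet table of step (4): protocol byte 0xE3, a 4-byte little-endian size, then the opcode. Two details are assumptions taken from public descriptions of the eDonkey protocol, not from this document, and should be checked against the protocol specification before use:
- the search opcode 0x16;
- the single-keyword encoding: type byte 0x01, 2-byte length, UTF-8 text.

```python
import struct

PROTO_EDONKEY = 0xE3     # protocol byte, as in the packet table of this document
OP_SEARCHREQUEST = 0x16  # assumption: eDonkey server search opcode


def frame(opcode: int, payload: bytes) -> bytes:
    """eDonkey TCP framing: protocol byte, 4-byte little-endian size, opcode, payload.

    The size counts the opcode and payload, not the protocol byte or the size field.
    """
    body = bytes([opcode]) + payload
    return struct.pack("<BI", PROTO_EDONKEY, len(body)) + body


def search_request(keyword: str) -> bytes:
    """Single-keyword query; the string-term encoding (0x01 + uint16 length) is assumed."""
    term = keyword.encode("utf-8")
    return frame(OP_SEARCHREQUEST, struct.pack("<BH", 0x01, len(term)) + term)
```

For the query "007" used in the example below, `search_request("007")` yields a 12-byte packet: a 5-byte header followed by the opcode and the encoded term.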
When the user clicks the eMule resource acquisition button in the user interface, the system invokes the eMule resource acquisition module, which provides two functions:
Automatically obtaining eMule resources from the Internet: the matching keyword read from the system is '<a href="ed2k://'. The initial search address read from the system configuration information is http://www.verycd.com/. 9837 eMule resources were obtained.
Automatically obtaining eMule resources from eMule servers: the keyword "007" is entered in the query edit box and the query button is clicked. 309 eMule resources were obtained.
(2) analyzing and processing of eMule resource.
For the eMule resource of obtaining by eMule resource acquisition device, need analyze it, obtain the effective information that comprised in the eMule resource, comprising: filename, file size, file hash value, address, source, port number information, complete Hash Set information.These information stores in database, are being carried out " audient's information " when obtaining by the emulation client, can search relevant eMule resource fast according to the querying condition of appointment.
An eMule resource is a character string of the form ed2k://|file|<file name>|<file size>|<file hash value>|p=<hash set>|/|source,<IP:port>|/. The string is parsed field by field:
- The field after the keyword file is the file name.
- The next field is the file size.
- The field after the file size is the file hash value.
- The content after the keyword "p=" is the file's complete hash set. It allows the correctness of the file to be verified and helps the propagation of new and rare files.
- The content after the keyword "source" is the IP address and port number of the source.
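The field-by-field parsing above can be sketched as follows. This is a minimal illustration, not the invention's parser. Its assumptions:
- Names may be percent-encoded, so they are decoded with `urllib.parse.unquote`.
- Part hashes after "p=" are colon-separated.
- The source field is accepted with either the "source," spelling used in this document or the "sources," spelling commonly seen in published links.
- Hashes are upper-cased so that database keys are consistent.

```python
from dataclasses import dataclass, field
from urllib.parse import unquote


@dataclass
class Ed2kResource:
    name: str
    size: int
    file_hash: str                                # 32 hex digits, upper-cased
    hash_set: list = field(default_factory=list)  # part hashes listed after "p="
    sources: list = field(default_factory=list)   # (IP, port) pairs


def parse_ed2k(link: str) -> Ed2kResource:
    """Split an ed2k file link on '|' and pick out the fields described above."""
    prefix = "ed2k://|file|"
    if not link.lower().startswith(prefix):
        raise ValueError("not an ed2k file link: " + link)
    fields = link[len(prefix):].split("|")
    # fields[0..2] are name, size and hash; the remaining fields are optional
    res = Ed2kResource(name=unquote(fields[0]), size=int(fields[1]),
                       file_hash=fields[2].upper())
    for part in fields[3:]:
        key = part.lower()
        if key.startswith("p="):
            res.hash_set = [h.upper() for h in part[2:].split(":") if h]
        elif key.startswith(("source,", "sources,")):
            # entries after the keyword are comma-separated "ip:port" pairs
            for entry in part.split(",")[1:]:
                ip, _, port = entry.rpartition(":")
                res.sources.append((ip, int(port)))
    return res
```

Fields that match neither prefix, such as the "/" separators, are simply skipped.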
When the user clicks the eMule resource analysis and processing button in the user interface, the system invokes the eMule resource analysis and processing module. The module analyzes and processes the eMule resources already obtained. It extracts the useful information they contain (file name, file size, file hash value, source address and port number, and the complete hash set) and stores it in the database. 10146 eMule resources were processed successfully, and the database storing them is 1.33 MB in size.
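The storage step can be sketched with Python's built-in sqlite3 module. The table layout below is a hypothetical example, not the invention's schema. It uses the file hash as the primary key, so lookups by hash are fast and re-storing the same resource replaces the old row. It also offers a substring search on the file name as one example of a "specified query condition".

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS resource (
    file_hash TEXT PRIMARY KEY,  -- 32-hex-digit ed2k hash, upper-cased
    name      TEXT NOT NULL,
    size      INTEGER NOT NULL,
    hash_set  TEXT,              -- part hashes joined with ':'
    sources   TEXT               -- 'ip:port' entries joined with ','
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store(conn, name, size, file_hash, hash_set=(), sources=()):
    """Insert one resource; a row with the same hash is replaced."""
    conn.execute(
        "INSERT OR REPLACE INTO resource VALUES (?, ?, ?, ?, ?)",
        (file_hash.upper(), name, size, ":".join(hash_set),
         ",".join(f"{ip}:{port}" for ip, port in sources)),
    )
    conn.commit()


def find_by_name(conn, keyword):
    """Return (hash, name, size) rows whose file name contains the keyword."""
    cur = conn.execute(
        "SELECT file_hash, name, size FROM resource WHERE name LIKE ? ORDER BY name",
        (f"%{keyword}%",),
    )
    return cur.fetchall()
```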
(3) Simulate eMule network access.
In eMule, eMule resources can be used to query the list of Peers nodes sharing a specific file, that is, the audience information. Peers nodes communicate with each other using eMule's own protocol. An emulation client is therefore needed to connect to the eMule network and obtain the list of Peers nodes sharing the specific file.
When the eMule emulation client starts, it connects directly to the eMule network and the server assigns it a client ID. However, it can interact further with the eMule network only when the task list contains a file to be downloaded. When a file is to be downloaded from the eMule network, the client first performs the interaction simulation appropriate to the eMule network type it is connected to. There are two eMule network types: the Ed2k network and the Kad network.
Ed2k network access simulation: based on the file hash value in the eMule resource, the client replicates a genuine link request and sends a query to the Ed2k server. On receiving the query, the server uses the Ed2k protocol to look up the nodes in the Ed2k network that can provide the file for the eMule resource. It returns the node list to the emulation client in TCP packets.
Kad network access simulation: based on the file hash value in the eMule resource, the client replicates a genuine link request and sends a query to the start node of the Kad network. On receiving the query, the start node uses the Kad protocol to look up the nodes in the Kad network that can provide the file for the eMule resource. The node list is returned to the emulation client in UDP packets.
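For the Ed2k case, the query built from the file hash can be sketched as a single TCP packet. The framing (protocol byte 0xE3, 4-byte little-endian size, opcode) follows the packet table in step (4). Two points are assumptions not stated in this document:
- The opcode 0x19 is taken from public descriptions of the eDonkey client-to-server "get sources" request.
- The exact payload may differ between server and client versions.
The Kad query uses a different UDP protocol and is not sketched here.

```python
import struct

PROTO_EDONKEY = 0xE3  # protocol byte, as in the packet table of this document
OP_GETSOURCES = 0x19  # assumption: eDonkey client-to-server "get sources" opcode


def get_sources_request(file_hash_hex: str) -> bytes:
    """Build a get-sources request for one file from its 32-hex-digit ed2k hash."""
    file_hash = bytes.fromhex(file_hash_hex)
    if len(file_hash) != 16:
        raise ValueError("ed2k file hash must be 16 bytes (32 hex digits)")
    body = bytes([OP_GETSOURCES]) + file_hash
    # The size field counts opcode + payload, not the protocol byte or itself.
    return struct.pack("<BI", PROTO_EDONKEY, len(body)) + body
```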
The Ed2k and Kad networks each return only a randomly selected subset of nodes per query, so a single query cannot give the emulation client every node participating in the file transfer. This step can therefore be implemented with a distributed architecture. Several emulation clients are deployed, each submitting repeated queries to the eMule network, so that all nodes participating in the file transfer can be obtained.
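Because each reply is a random subset, the results of repeated queries have to be merged. The sketch below unions successive replies and stops when several rounds in a row bring nothing new. It is an illustration, not the invention's procedure:
- The saturation rule and its parameters `max_rounds` and `patience` are hypothetical.
- `query` is a hypothetical callable returning one reply's (client ID, port) pairs.
- With several emulation clients, `query` could rotate among them.

```python
from typing import Callable, Iterable


def collect_until_stable(query: Callable[[], Iterable[tuple]],
                         max_rounds: int = 50, patience: int = 3) -> set:
    """Union the partial node lists returned by repeated queries.

    Stops after `patience` consecutive rounds add no new (client_id, port)
    pair, or after `max_rounds` queries in total.
    """
    peers = set()
    idle = 0
    for _ in range(max_rounds):
        before = len(peers)
        peers.update(query())
        idle = idle + 1 if len(peers) == before else 0
        if idle >= patience:
            break
    return peers
```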
When the user clicks the "eMule network access simulation" button in the user interface, the system invokes the "eMule network access simulation" module. The module operates on the eMule resources already processed in the database and obtained a total of 10146 reply packets.
(4) Extract and process the Peers node list.
After the eMule emulation client receives replies from the Ed2k and Kad networks, it parses the returned packets according to the eMule protocol and the Kad protocol. This yields the list of Peer nodes that can download the file for the current eMule resource. Each Peer entry contains a client ID and a port number, laid out in the packet as defined by the protocol specification. When the emulation client receives reply packets from the Ed2k and Kad networks, it decomposes them according to the protocol specification to obtain the node lists of the Ed2k and Kad networks.
The packet format of the node list reply returned by the eMule network is shown in the following table:
Field | Size (bytes) | Default value | Note
Protocol type | 1 | 0xE3 | -
Size | 4 | - | Message size, excluding the protocol type and size fields
Type | 1 | 0x42 | Value of the opcode OP_FOUNDSOURCES
File hash | 16 | NA | Hash value of the file concerned
Source count | 1 | NA | Number of sources contained in this message
Source list | Variable | NA | Source entries, 6 bytes each
Each entry in the source list has the following format:
Field | Size (bytes) | Default value | Note
Client ID | 4 | NA | ID of a client sharing this file
Client port | 2 | NA | Port of a client sharing this file
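The two tables above translate directly into a decoder. The Python sketch below is illustrative, not the invention's implementation, and rests on these assumptions:
- Integers are little-endian, the byte order eDonkey uses.
- Only the fields listed in the tables are checked.
- The hypothetical helper `client_id_to_ip` follows the eMule convention that client IDs below 16777216 are "low IDs" of firewalled clients, while higher IDs encode the peer's IPv4 address A.B.C.D as A + 256*B + 65536*C + 16777216*D.

```python
import struct

PROTO_EDONKEY = 0xE3
OP_FOUNDSOURCES = 0x42
LOW_ID_LIMIT = 0x1000000  # assumption: eMule convention, IDs below 2**24 are low IDs


def parse_found_sources(packet: bytes):
    """Decode one OP_FOUNDSOURCES packet laid out as in the tables above.

    Returns (file_hash_hex, [(client_id, port), ...]).
    """
    proto, size = struct.unpack_from("<BI", packet, 0)
    if proto != PROTO_EDONKEY:
        raise ValueError("not an eDonkey packet")
    body = packet[5:5 + size]  # opcode + payload, as counted by the size field
    if len(body) != size or body[0] != OP_FOUNDSOURCES:
        raise ValueError("truncated packet or wrong opcode")
    file_hash = body[1:17].hex().upper()
    count = body[17]
    sources = []
    offset = 18
    for _ in range(count):
        # struct.error is raised here if the packet holds fewer entries than announced
        client_id, port = struct.unpack_from("<IH", body, offset)
        sources.append((client_id, port))
        offset += 6
    return file_hash, sources


def client_id_to_ip(client_id: int):
    """Return the dotted IPv4 address for a high ID, or None for a low ID."""
    if client_id < LOW_ID_LIMIT:
        return None  # low ID: firewalled peer, no directly usable address
    return ".".join(str(b) for b in client_id.to_bytes(4, "little"))
```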
After the node lists returned by the Ed2k and Kad networks are obtained, all nodes are reorganized: duplicate nodes are removed and the nodes are sorted by client ID. Finally, a link is made to each node in the organized list to obtain its state information. This includes whether the node is available, its transfer rate and which resources it holds.
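The reorganization step can be sketched in a few lines. This sketch assumes two entries are duplicates when they share the same (client ID, port) pair. It merges the lists from both networks, keeps one copy of each node, and sorts by client ID, using the port to break ties.

```python
def normalize_nodes(batches):
    """Merge node lists, drop duplicate (client_id, port) pairs, sort by client ID."""
    unique = {node for batch in batches for node in batch}
    return sorted(unique, key=lambda node: (node[0], node[1]))
```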
After obtaining the node list, the emulation client does the following for each node:
- Following the eMule protocol specification, it generates an emulated handshake packet from the node's client ID and port number.
- It sends a handshake request to the node to establish a TCP link for data transfer.
- Once the link is established, it emulates the data request packets of the eMule protocol, sends them to the node, and receives the node's reply packets.
- It sends and receives over a fixed period, sums the data received, and averages it over time to obtain the transfer rate of the TCP link and the node's state.
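The rate measurement can be sketched as follows: count the bytes received over a fixed window and divide by the elapsed time. This is a hedged illustration with three assumptions:
- `recv` is assumed to behave like `socket.recv` on a link where the emulated data request has already been sent.
- The socket should carry a timeout so that one call cannot block far past the window.
- The 10-second default window is hypothetical.
The clock is injectable so the function can be tested without a network.

```python
import time
from typing import Callable


def measure_rate(recv: Callable[[int], bytes], window: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> float:
    """Return the average receive rate in bytes per second over a fixed window."""
    start = clock()
    received = 0
    while clock() - start < window:
        try:
            chunk = recv(65536)
        except OSError:  # includes socket timeouts and connection resets
            break
        if not chunk:    # empty read: the peer closed the connection
            break
        received += len(chunk)
    elapsed = clock() - start
    return received / elapsed if elapsed > 0 else 0.0
```

A node whose link cannot be established, or that returns nothing, ends up with a rate of 0.0. That value can then be recorded as part of the node's state.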
The collected eMule resource information and audience information together form complete audience monitoring information. No filtering is done during collection, so the collected data contain redundant and duplicate records and must be filtered. Only one copy of each duplicate record is kept, and the records are stored in the audience database in a standard data format. This ensures the uniqueness, consistency and integrity of the data in the audience database.
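One way to enforce "keep only one copy" at storage time is a uniqueness constraint in the database, sketched below with Python's sqlite3 module. The table and column names are hypothetical. The sketch treats a record as a duplicate when it repeats the same file hash, client ID and port. It normalizes the hash to upper case as a simple instance of a "standard data format" and lets `INSERT OR IGNORE` discard later duplicates.

```python
import sqlite3


def open_audience_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS audience (
            file_hash TEXT NOT NULL,
            client_id INTEGER NOT NULL,
            port      INTEGER NOT NULL,
            reachable INTEGER NOT NULL,  -- 1 if a TCP link could be set up
            rate_bps  REAL,              -- measured transfer rate, bytes/second
            UNIQUE (file_hash, client_id, port)
        )""")
    return conn


def save_records(conn, records):
    """Insert records; a row repeating an existing (hash, client ID, port) is skipped.

    Returns the number of rows now in the table.
    """
    with conn:  # commits the transaction on success
        conn.executemany(
            "INSERT OR IGNORE INTO audience VALUES (?, ?, ?, ?, ?)",
            [(h.upper(), cid, port, int(ok), rate)
             for h, cid, port, ok, rate in records])
    return conn.execute("SELECT COUNT(*) FROM audience").fetchone()[0]
```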
When the user clicks the "Peers node list extraction and processing" button in the user interface, the system invokes the "Peers node list extraction and processing" module. The module processes the packets already obtained and yields a total of 958533 nodes. Of these, 567395 are actually available; the remaining nodes could not be connected to. Where conditions allow, a distributed deployment can be used to obtain more node information.