Summary of the invention
To overcome the deficiencies of the prior art in monitoring and analyzing specific information propagated over BitTorrent and its audience, namely an overly complex system architecture, the lack of a unified detection mechanism, and long computer processing time, the present invention provides a passive load information monitoring method based on BitTorrent. The method first generates "file sample" information from a file that already exists, and then applies Hash calculation and pattern matching to the intercepted "load information" and the "file sample" to judge whether the "load information" currently being transmitted belongs to the monitored specific information. The method can effectively monitor every file transmitted over the network via BitTorrent for which a "file sample" exists, and can reduce computer processing time.
The technical solution adopted by the present invention to solve this technical problem is a passive load information monitoring method based on BitTorrent, characterized in that it comprises the following steps:
(a) Use BitTorrent software to download the required file to the local hard disk, and generate "file sample" information from the existing file. The "file sample" information is obtained by Hash processing, in which every Hash result must have the same length, and is the data formed by concatenating the Hash results;
(b) Perform the Hash calculation on the intercepted "load information", and compare the hashed "load information" with the Hash-processed "file sample";
(c) Use a string matching method to compare the "load information" with the "file sample": the Hash result of the "load information" is compared with characters n*20+1 through (n+1)*20 of the "file sample", where n = 0, 1, 2, ..., Maxn.
When a "load information" item is found to match a "file sample", record the TCP link information of the current "load information", the "file sample" number, and the number of matching "load information" items, and then continue comparing subsequent "load information". When the number of "load information" items on a given TCP link that match one "file sample" reaches the threshold set in the system, the "load information" carried on that TCP link is judged to belong to the monitored specific file;
(d) Obtain the address information of the audience and the direction of the file transfer from the source IP address, source port, destination IP address, and destination port of the TCP link. Then obtain the information of the file carried on the current P2P link from the "file sample" to which the "load information" belongs, and take the current system time as the file transfer time. Save the audience information and file transfer information in a database. The saved data comprise: source IP address, source port, destination IP address, destination port, name of the transferred file, file hash value, transfer time, number of matching "load information" items, and record time. Only one copy of any duplicated data is kept, and the data are stored in the audience database in a standard data format.
The advantages of the present invention over the prior art are as follows. Because the intercepted "load information" is Hash-calculated and the resulting data are pattern-matched, there is no need to recover the original content of the file or to be concerned with the format of the transmitted file; only Hash calculation and pattern matching on the actually intercepted "load information" are required. File recovery is a complex technology involving several fields of computing, so omitting this step saves a large amount of computer processing time and allows BitTorrent information transmitted over the network to be monitored in real time. Experimental verification and practical tests show that, for BitTorrent information transmitted over a network with 1G bandwidth, the method can perform interception, Hash calculation, and pattern matching in real time and can accurately obtain the information of the audience participating in the propagation of specific information, forming a target audience database for that information. This provides network security supervision departments with a means of audience information monitoring and evidence collection, and also provides basic data and a decision-making basis for macroscopic analysis, early warning, and prediction of the network security situation.
The present invention is further described below with reference to the accompanying drawings and embodiments.
Embodiment
The present invention adopts a modular architecture. Dividing the system into modules keeps the implementation of each functional module independent, and the modules communicate with one another through interfaces. The first level is the interface part, which comprises user interface management; it is the main interface between the system and the user and mainly implements interaction between the user and the software and the invocation of the other modules. The second level is the implementation part, which comprises the file sample generation module, the load information interception and processing module, the load information and file sample comparison module, audience information extraction and processing, the system parameter setting and reading module, and operation help. Each implementation module is described below:
File sample generation module: generates the "file sample" information of a known file;
Load information interception and processing module: intercepts the "load information" transmitted over the network and processes the intercepted "load information";
Load information and file sample comparison module: compares the obtained "load information" with the "file sample" and judges whether the current "load information" belongs to a monitored specific file;
Audience information extraction and processing: according to the comparison result, extracts the audience information from the TCP link and saves it in the database;
System parameter setting and reading module: sets the operating parameters of the system and reads specified parameters as required;
Operation help: a module that describes the operations the system can perform.
The specific implementation steps of the method of the invention are as follows:
1) Generate the "file sample".
The BitTorrent transmission information on the network is monitored by "passively" intercepting BitTorrent traffic on the network. "Load information" refers to the actual data carried on the TCP link when, after a connection has been established between nodes during a BitTorrent transfer, real data transfer takes place. This data is generally the actual content of the transferred file; it is not encrypted and carries no additional information.
When performing passive monitoring based on "load information", the monitored file must first exist, since this file is the basis of all subsequent operations. The file can be obtained in several ways; for example, BitTorrent software can be used to download the required file to the local hard disk. "File sample" information is then generated from the existing file. A "file sample" is the data formed by dividing a specific file into parts of a specified size, applying Hash processing to each part so that every Hash result has the same length, and concatenating the Hash results. By comparing the "file sample" with intercepted load information, it can be judged whether the load information is part of the specific file.
The BitTorrent transfer protocol specifies that the smallest unit transferred between nodes is the "piece". The size of a "piece" is specified in the "seed file" and is generally 64 KB, 128 KB, 256 KB, 512 KB, or another size, but it must be a multiple of 16 KB, because a "piece" must be divided into "slices" of fixed size for transfer, and the size of a "slice" is fixed at 16 KB.
When a "file sample" is made, the size of a "piece" may differ according to the actual "seed file", whereas the size of a "slice" is fixed. If the "piece" size were used to make the "file sample", the same file would need several "file samples", and the "load information" would have to be intercepted in units of "pieces", which would increase both the computation and the comparison workload of the system. The "slice" size is therefore used to generate the "file sample". This reduces the computational complexity of the system and unifies the interception and comparison processes, so the particular circumstances of the "seed file" need not be considered.
On the user interface, click the file selection button, select the file for which a "sample" is to be made in the file selection dialog that pops up, and then click the "file sample" generation button. The system automatically calls the "file sample" generation module, which reads the content of the specified file and generates the "file sample" information. The storage path of the generated "file sample" information file is set in the system parameters; the generated file has the same name as the original file, with the suffix ".lar". In the experiment, 19 files were selected and 19 "file samples" were generated, all successfully.
When a "file sample" is generated from an existing file, the first 16 KB of the specified file is read into memory and a Hash calculation is performed on it. So that every result has the same length and cannot be cracked by other systems, the Secure Hash Algorithm (hereinafter SHA1) is used for the Hash calculation. This algorithm has the following properties: the original information cannot be recovered from the result; different original information produces different results, but the length of the result is fixed. Applying this algorithm to 16 KB of file content produces a result 20 bytes long. The result for the first 16 KB of file content is held temporarily in memory, in what is referred to here as the "file sample staging area". The next 16 KB of the file is then read and hashed, and the Hash result is appended to the end of the "file sample staging area"; these steps are repeated until the whole file has been processed. Because file sizes are not always a multiple of 16 KB, the last portion processed may be smaller than 16 KB; however, the SHA1 result for content shorter than 16 KB is also 20 bytes long, so the generation of the "file sample" is not affected.
Once the entire content of the file has been processed, the content of the "file sample staging area" is the actual content of the "file sample". This information is saved as a binary file for subsequent use.
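The generation procedure described above can be sketched as follows. This is a minimal illustration rather than the implementation of the invention: the function names and the choice to append ".lar" to the full original file name are assumptions made for the example.

```python
import hashlib
from pathlib import Path

SLICE_SIZE = 16 * 1024   # size of one "slice", as stated in the description
DIGEST_SIZE = 20         # length of one SHA1 result in bytes


def build_file_sample(path):
    """Return the concatenated SHA1 results of every 16 KB slice of the file."""
    staging = bytearray()  # plays the role of the "file sample staging area"
    with open(path, "rb") as f:
        while True:
            chunk = f.read(SLICE_SIZE)
            if not chunk:
                break
            # The final chunk may be shorter than 16 KB; its digest is still 20 bytes.
            staging += hashlib.sha1(chunk).digest()
    return bytes(staging)


def save_file_sample(source_path, output_dir):
    """Save the sample as a binary file named after the source (assumed: name + ".lar")."""
    source = Path(source_path)
    target = Path(output_dir) / (source.name + ".lar")
    target.write_bytes(build_file_sample(source))
    return target
```

A file of N bytes yields a sample of 20 * ceil(N / 16384) bytes, so the sample stays small relative to the monitored file.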
2) Interception, analysis, and processing of "load information".
The "load information" is obtained mainly by passively monitoring the BitTorrent traffic carried on the network. Because a great deal of content is carried on the network, accurately intercepting "load information" from this mass of data relies on the transmission characteristics of BitTorrent "load information". First, "load information" is always carried on a TCP link. Second, the BitTorrent protocol places an identifier in front of the "load information"; its content is "0000400907", in which "0040" indicates that the transmitted data size is 16 KB and "07" indicates that the transmitted data is "load information". Because a "slice" carried over TCP is 16 KB in size while a TCP packet is 1406 bytes in size, TCP splits the data into multiple TCP packets for transmission. When an intercepted TCP packet begins with "0000400907", the subsequent TCP packets must therefore be combined to produce the actual "load information".
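The reassembly step can be illustrated with the sketch below, which is not the implementation of the invention. It assumes that the ordered TCP packet payloads of one link are already available as byte strings. It further assumes, following the BitTorrent peer wire protocol rather than the text above, that the identifier is the five bytes 00 00 40 09 07 (a length field of 0x4009 = 16384 + 9 followed by message type 7) and that a 4-byte piece index and a 4-byte offset precede the 16 KB of file data. The function name is hypothetical.

```python
PIECE_PREFIX = bytes.fromhex("0000400907")  # identifier quoted in the description
SLICE_SIZE = 16 * 1024
# Assumption (peer wire protocol): 4-byte piece index + 4-byte offset follow the prefix.
HEADER_SIZE = len(PIECE_PREFIX) + 8


def reassemble_payloads(segments):
    """Collect 16 KB payloads from the ordered TCP packet payloads of one link."""
    payloads = []
    buffer = None  # None while no "load information" message is being collected
    for segment in segments:
        if buffer is None:
            if not segment.startswith(PIECE_PREFIX):
                continue  # packet does not begin a "load information" message
            buffer = bytearray()
        buffer += segment
        if len(buffer) >= HEADER_SIZE + SLICE_SIZE:
            payloads.append(bytes(buffer[HEADER_SIZE:HEADER_SIZE + SLICE_SIZE]))
            buffer = None  # simplification: trailing bytes of this packet are dropped
    return payloads
```

The sketch only recognizes a message whose identifier sits at the start of a packet, as the description states; a message that begins in the middle of a packet is ignored here.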
Once the "load information" has been obtained, comparing it directly with the original file would increase the amount of comparison computation. A Hash calculation is therefore performed on the "load information", and the result is compared with the "file sample" to reduce the comparison workload of the system. The Hash method applied to the "load information" is the same as the one used to generate the "file sample"; both use the SHA1 algorithm.
On the user interface, click the "load information" interception and processing button. The system automatically calls the "load information" interception and processing module, which intercepts the "load information" transmitted over the network and performs the Hash calculation on it; once the calculation is complete, it directly calls the "load information" and "file sample" comparison module. The number of "load information" items successfully intercepted and processed in 1 hour was 38953.
3) Comparison and judgment of "load information" against the "file sample".
When the system starts, it first calls the initialization method of the "load information" and "file sample" comparison module. This method reads the "file samples" of all files to be monitored into memory in order to speed up comparison and judgment.
After "load information" has been intercepted and hashed, the judgment method of this module is called. It quickly compares the "load information" passed in with the "file samples" in memory to judge whether the current "load information" belongs to a monitored file. If it does not, the method returns directly; if it does, the audience information extraction and processing module is called directly to obtain the audience information of the current "load information".
The number of "load information" items intercepted and judged was 38953, of which 7795 belonged to monitored files.
The comparison of "load information" with the "file sample" uses a string matching method. Because the result of the SHA1 algorithm is 20 characters (bytes) long, the Hash result of the "load information" is compared with characters n*20+1 through (n+1)*20 of the "file sample", where n = 0, 1, 2, ..., Maxn.
To speed up comparison, the "file samples" to be monitored are read into memory when the system starts, and comparisons are performed directly in memory.
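The comparison against in-memory samples can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and Maxn is assumed to be the index of the last 20-byte entry in the sample, which the description does not state explicitly.

```python
import hashlib

DIGEST_SIZE = 20  # length of one SHA1 result in bytes


def find_digest(digest, sample):
    """Return the index n of the sample entry equal to digest, or None if absent."""
    max_n = len(sample) // DIGEST_SIZE - 1  # assumed meaning of Maxn
    for n in range(max_n + 1):
        # Zero-based bytes n*20 .. (n+1)*20 - 1, i.e. characters n*20+1 .. (n+1)*20.
        if sample[n * DIGEST_SIZE:(n + 1) * DIGEST_SIZE] == digest:
            return n
    return None


def match_payload(payload, samples):
    """Hash one payload and return (sample_id, n) for every sample that contains it."""
    digest = hashlib.sha1(payload).digest()
    hits = []
    for sample_id, sample in samples.items():
        n = find_digest(digest, sample)
        if n is not None:
            hits.append((sample_id, n))
    return hits
```

Here `samples` stands for the dictionary of "file samples" loaded into memory at start-up, keyed by a sample number. A linear scan mirrors the string matching described above; a set of 20-byte entries per sample would answer the same question faster.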
When a "load information" item is found to match a "file sample", it cannot be concluded immediately that it is part of the monitored file, because a single "load information" item may coincide with the sample by chance. The TCP link information of the current "load information", the "file sample" number, and the number of matching "load information" items are therefore recorded, and comparison continues with subsequent "load information". When the number of "load information" items on a given TCP link that match one "file sample" reaches the threshold set in the system, the "load information" carried on that TCP link is judged to belong to the monitored specific file.
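The threshold rule can be sketched as a counter keyed by TCP link and "file sample" number. The class and method names are hypothetical, and the representation of a TCP link as a (source IP, source port, destination IP, destination port) tuple is an assumption made for the example.

```python
from collections import defaultdict


class MatchTracker:
    """Count matching "load information" items per (TCP link, file sample) pair."""

    def __init__(self, threshold):
        self.threshold = threshold       # threshold set in the system parameters
        self.counts = defaultdict(int)

    def record_match(self, link, sample_id):
        """Record one match; return True once this link has reached the threshold."""
        key = (link, sample_id)
        self.counts[key] += 1
        return self.counts[key] >= self.threshold
```

For example, with a threshold of 3, the first two matches on a link return False and the third returns True, at which point audience information extraction would be triggered for that link.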
4) extraction of audient's information and processing.
When the "load information" carried on a TCP link has been judged to belong to a monitored specific file, the audience information contained in the current TCP link is extracted. First, the address information of the audience and the direction of the file transfer are obtained from the source IP address, source port, destination IP address, and destination port of the TCP link. Next, the information of the file carried on the current P2P link is obtained from the "file sample" to which the "load information" belongs, and the current system time is taken as the file transfer time.
The audience information and file transfer information obtained are saved in a database for use in other operations. The saved data comprise: source IP address, source port, destination IP address, destination port, name of the transferred file, file hash value, transfer time, number of matching "load information" items, and record time.
Organizing the collected audience information yields complete audience monitoring information. However, because no filtering is performed during collection, the collected data contain redundant and duplicate entries and must be filtered: only one copy of any duplicated data is kept, and the data are stored in the audience database in a standard data format, ensuring the uniqueness, consistency, and integrity of the data in the audience database.
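The saved record and the filtering of duplicates can be sketched as follows. The field names are hypothetical renderings of the data items listed above, and the rule that two records are duplicates when they share the same link endpoints and file hash is an assumption; the description does not define the duplicate criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudienceRecord:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    file_name: str
    file_hash: str
    transfer_time: str
    match_count: int
    record_time: str


def deduplicate(records):
    """Keep the first record for each (link endpoints, file hash) key, in order."""
    seen = set()
    unique = []
    for r in records:
        # Assumed duplicate criterion: same endpoints and same file.
        key = (r.src_ip, r.src_port, r.dst_ip, r.dst_port, r.file_hash)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique
```

In a database-backed system the same effect could be obtained with a uniqueness constraint on the key columns; the in-memory version is shown only to make the rule explicit.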
When "load information" is judged to be data of a monitored file, the system automatically calls the audience information extraction and processing module to obtain the audience information of the P2P link carrying the "load information". The audience information obtained comprises: source IP address, source port, destination IP address, destination port, name of the transferred file, file hash value, transfer time, number of matching "load information" items, and record time. Audience information was extracted for the "load information" judged to belong to monitored files, yielding 7795 audience information records; after filtering and organizing, 3 distinct audience information entries were obtained.