CN100477668C - Stream sampling device and method for detecting high speed network super connection host - Google Patents

Publication number
CN100477668C
Authority
CN
China
Prior art keywords
sampling
stream
module
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2006100993217A
Other languages
Chinese (zh)
Other versions
CN1901545A (en)
Inventor
王洪波
程时端
林宇
金跃辉
王文东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CNB2006100993217A priority Critical patent/CN100477668C/en
Publication of CN1901545A publication Critical patent/CN1901545A/en
Application granted granted Critical
Publication of CN100477668C publication Critical patent/CN100477668C/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a flow sampling device and method for detecting super connection hosts in high-speed networks. The device comprises a new-flow detection module, an error absorption module, and a random sampling module. In the sampling method, the new-flow detection module examines each arriving packet and judges whether it belongs to a new flow. If it does, a new flow has arrived: the error absorption module computes the detection error probability of the new-flow detection module and adjusts the sampling probability for the new flow accordingly, and the random sampling module then generates a random number and compares it with the sampling probability computed by the error absorption module to decide whether to sample the flow. The invention thereby achieves equal-probability random sampling that is independent of the flow identifier, with line-rate processing capability at 10 Gbps and low storage space complexity.

Description

Flow sampling device and method for detecting super connection hosts in high-speed networks
Technical field
The present invention relates to network security detection technology, and more precisely to a flow sampling (stream sampling) device for detecting super connection hosts in high-speed networks and a flow sampling method implemented with it. It belongs to the technical field of network interconnection and communication.
Background art
In recent years, network security incidents have occurred frequently and malicious traffic of many kinds has proliferated, for example distributed denial-of-service (DDoS) attacks, the propagation of worms, and port scans. Some of these incidents cause enormous economic losses. As the Internet penetrates every aspect of economic and social activity, detecting and responding to network security incidents accurately and quickly has become an indispensable technology in the Internet field.
Research has found that network security incidents exhibit similar behaviors, which helps detection. For example, when a scanning worm propagates, an infected host usually sends packets to a large number of other hosts within a short time. To find hosts with security vulnerabilities, an intruder often uses port scanning, in which one host sends probe packets to the same port on a large number of other hosts or to different ports on the same host. From the network-layer point of view, if a series of IP packets observed within a short time share the same source IP address but have different destination IP addresses (or the same destination IP address and different port numbers), the network security detection system can treat the host at that source IP address as a suspect and analyze and respond further. Symmetrically, in a distributed denial-of-service (DDoS) attack a large number of different hosts send attack packets to the same destination host within a short time; in that case a series of IP packets within a short time share the same destination IP address but have different source IP addresses.
What these situations have in common is that a large number of distinct IP flows exist between one host and other hosts within a short time. Such a host is called a super connection host. An IP flow is a set of IP packets that share common attributes, where the common attributes are determined by the flow identifier (flow ID). In the examples above, the host propagating a worm, the host performing a port scan, and the victim of a DDoS attack are all super connection hosts. The presence of a super connection host indicates that a network security incident is occurring and can serve as the trigger for further analysis and response. Detecting super connection hosts is therefore an important foundation of network security detection.
At present, network security detection systems (such as intrusion detection systems and firewalls) are widely deployed at the edge of local area networks or at network entry points to protect the internal networks of enterprises and institutions. In recent years, DDoS attacks have consumed large amounts of network bandwidth in addition to hitting their targets, and worm propagation often saturates the affected links rapidly. These incidents degrade the quality of service of the core networks of Internet service providers (ISPs), so more and more ISPs have deployed, or plan to deploy, network security detection systems in their core networks. In addition, the internal networks and Web sites of large enterprises are frequent DDoS targets, and the access bandwidth of these networks keeps rising as business grows. Network security detection must therefore keep up with higher link rates (2.5 Gbps and above) to meet the needs of ISPs and large enterprises.
Traditional super connection host detection techniques mostly maintain per-flow state. For example, to detect port scans, the Snort system stores information about each flow (that is, a flow identifier formed from the source and destination addresses) and counts how many different hosts (or ports) each source host communicates with. High link speeds, however, pose a serious challenge to super connection host detection. On the one hand, a high-speed link carries a large number of flows per unit time, so storing their identifiers requires high-capacity memory. On the other hand, processing IP packets at line rate in a high-speed network requires high-speed memory. With conventional semiconductor technology, dynamic random-access memory (DRAM) has large capacity but slow access (tens of nanoseconds) and cannot meet line-rate processing demands in high-speed networks, whereas static random-access memory (SRAM) has fast access (a few nanoseconds) but limited capacity (only tens of Mbits) and cannot hold per-flow state.
To cope with the scalability challenge brought by ever faster networks, methods based on flow sampling have recently become common. The basic idea is to sample each newly observed flow with a fixed probability. The hosts associated with a sampled flow (the source IP address host or the destination IP address host) are treated as key measurement objects, and their traffic is measured (for example, per-flow state is maintained for the flows of these hosts in order to count flows). If, in subsequent measurement, the number of flows of a key measurement object exceeds a set threshold, that object is judged to be a super connection host. Because a super connection host is associated with many flows, it will be selected as a key measurement object with a guaranteed probability as long as the flow sampling probability is chosen sensibly.
For super connection host detection, flow sampling must satisfy two basic requirements. (1) One-time sampling: at the network layer the basic unit of measurement is the packet, and a flow may contain many packets. One-time sampling requires that each flow be subjected to a single sampling decision, regardless of how many packets of that flow are actually observed. This guarantees that the basic unit of sampling is the flow rather than the packet. (2) Equal-probability random sampling: every flow is sampled with the same probability, regardless of the content of its flow identifier. This guarantees that the probability of a host being selected as a key measurement object grows with its number of flows, so that a super connection host is selected as a key measurement object with a guaranteed probability.
At present, the prior art performs flow sampling with hash-based flow sampling. The basic idea is as follows. A hash function h is given whose domain is the space F of flow identifiers and whose range is H = [0, 1). A sampling domain S = [x, x + r) is also given, where the starting position x (a real number) and the sampling probability r satisfy 0 ≤ x < x + r ≤ 1. For each arriving packet, let f be the flow identifier of the flow it belongs to and compute the hash value h(f). If h(f) ∈ S, the flow containing this packet is sampled; otherwise it is not.
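The prior-art scheme just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the flow identifier format, and the use of truncated SHA-256 as the hash function h are all hypothetical choices, not taken from the patent.

```python
import hashlib


def hash_to_unit_interval(flow_id: str) -> float:
    """Map a flow ID to [0, 1). Truncated SHA-256 is an assumed stand-in for h."""
    digest = hashlib.sha256(flow_id.encode("utf-8")).digest()
    # 48 bits fit exactly in a float, so the result is always strictly below 1.0.
    return int.from_bytes(digest[:6], "big") / 2**48


def is_sampled(flow_id: str, x: float, r: float) -> bool:
    """Return True when h(f) falls in the sampling domain S = [x, x + r)."""
    if not (0.0 <= x < x + r <= 1.0):
        raise ValueError("require 0 <= x < x + r <= 1")
    return x <= hash_to_unit_interval(flow_id) < x + r


# Hypothetical flow IDs of the form "source->destination".
flows = [f"10.0.0.{i}->192.168.1.1" for i in range(1, 201)]
sampled = [f for f in flows if is_sampled(f, x=0.25, r=0.1)]
print(len(sampled), "of", len(flows), "flows sampled")
```

Every packet of a flow carries the same flow ID and therefore yields the same decision, which is how this scheme obtains one-time sampling; the decision, however, is a deterministic function of the flow ID.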
Once the hash function h and the sampling domain S = [x, x + r) are fixed, whether a flow is sampled depends only on its flow identifier f, that is, on whether its hash value h(f) lies in S. To satisfy the equal-probability random sampling requirement, h must have the uniform random hashing property: for any flow identifier f, the hash value h(f) is a random variable uniformly distributed on H = [0, 1). The existence of a hash function with this property cannot be proven theoretically, so hash-based flow sampling rests on the basic assumption that a hash function exists that approximates uniform random hashing closely enough.
The key to a hash-based flow sampling algorithm is the choice of hash function. The chosen hash function must satisfy three requirements simultaneously: (1) fast computation, (2) security, and (3) a sufficiently good uniform random hashing property.
Because every IP packet arriving on the link must be processed at line rate, the chosen hash function must complete one hash computation within the shortest packet inter-arrival time. For example, on OC-48 (2.5 Gbps) and OC-192 (10 Gbps) links, each hash value must be computed in less than 128 nanoseconds and 32 nanoseconds, respectively. The hash function calculator must therefore be very fast to keep up with different high-speed links.
Super connection host detection is mainly used in network security, so the security of the detection system itself matters all the more. Because hash-based flow sampling uses a hash function to sample flows, it is vulnerable to algorithmic complexity attacks and carries a security risk. For example, in a DDoS attack, an attacker who has learned in advance which hash function the flow sampling uses can forge source IP addresses to produce specific attack flows whose flow identifier hash values all (or mostly) fall outside the sampling domain, and thereby evade super connection host detection.
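The evasion described above can be illustrated with a short Python sketch. It assumes, purely for illustration, that the attacker knows the hash function (here a truncated SHA-256 stand-in), the sampling domain bounds `X` and `R`, and the flow ID format; all of these names and values are hypothetical.

```python
import hashlib


def h(flow_id: str) -> float:
    """Assumed stand-in for the sampler's hash function, mapping into [0, 1)."""
    digest = hashlib.sha256(flow_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:6], "big") / 2**48


X, R = 0.0, 0.05            # sampling domain S = [X, X + R), assumed known to the attacker
VICTIM = "203.0.113.10"     # documentation-range address used as the DDoS target


def craft_evasive_sources(count: int) -> list:
    """Enumerate spoofed source addresses and keep those whose flow hash misses S."""
    sources = []
    candidate = 0
    while len(sources) < count:
        src = f"10.{(candidate >> 16) & 255}.{(candidate >> 8) & 255}.{candidate & 255}"
        if not (X <= h(f"{src}->{VICTIM}") < X + R):
            sources.append(src)
        candidate += 1
    return sources


attack_sources = craft_evasive_sources(1000)
sampled = sum(X <= h(f"{s}->{VICTIM}") < X + R for s in attack_sources)
print(sampled)  # 0: by construction, none of the attack flows is ever sampled
```

Because the sampling decision is a deterministic function of the flow ID, the attacker only has to discard the small fraction of candidate addresses that would hash into S.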
Although a hash function can provide a pseudorandom sequence of hash values, a hash function is deterministic, so the hash value still depends on the input. If the input is itself random, good output randomness is easy to obtain; if the input is not random and sufficiently random output is still required, a hash function with a strong uniform random hashing property must be used.
Cryptographic hash functions such as MD5 and SHA-1 are currently recognized as producing the most random output sequences, but they are complex to compute and hard to make fast. Even dedicated hardware needs tens of clock cycles to produce one output, which makes them hard to use in high-speed packet processing. For example, some existing dedicated hardware needs at least 64 clock cycles per operation, which exceeds the minimum packet processing time in a high-speed network: even on the Virtex-4 (500 MHz), a high-end product in the Xilinx Virtex FPGA series, 64 clock cycles take at least 128 nanoseconds, whereas the processing time for a minimum-size packet must be less than 128 nanoseconds on a 2.5 Gbps link and less than 32 nanoseconds on a 10 Gbps link. Moreover, because algorithms such as MD5 and SHA-1 are inherently sequential, their speed is hard to improve through parallelization and pipelining. Likewise, the CRC32 algorithm is also slow because it requires a large number of multiplication and division operations. Hash functions based on simple division or multiplication lack security because their mathematical structure is too simple. In short, among existing hash functions, those that compute quickly lack security or even randomness, while those with high security and strong uniform randomness compute too slowly.
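The timing figures above can be reproduced with simple arithmetic. The sketch below assumes a minimum IP packet of 40 bytes, a value that is not stated in the text but is consistent with the 128 ns and 32 ns budgets it quotes; the function names are illustrative.

```python
MIN_PACKET_BITS = 40 * 8  # assumption: 40-byte minimum-size IP packet


def packet_time_ns(link_bps: float) -> float:
    """Transmission time of one minimum-size packet, in nanoseconds."""
    return MIN_PACKET_BITS * 1e9 / link_bps


def hash_latency_ns(cycles: int, clock_hz: float) -> float:
    """Latency of a hash unit that needs `cycles` clock cycles per output."""
    return cycles * 1e9 / clock_hz


budget_oc48 = packet_time_ns(2.5e9)        # per-packet budget on a 2.5 Gbps link
budget_oc192 = packet_time_ns(10e9)        # per-packet budget on a 10 Gbps link
md5_like = hash_latency_ns(64, 500e6)      # 64 cycles at 500 MHz

print(budget_oc48, budget_oc192, md5_like)
print("fits strictly within the OC-48 budget:", md5_like < budget_oc48)
print("fits strictly within the OC-192 budget:", md5_like < budget_oc192)
```

Under these assumptions a 64-cycle hash unit at 500 MHz only just matches the 2.5 Gbps budget and takes four times the 10 Gbps budget, which is the gap the invention sets out to close.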
Existing hash-based flow sampling methods can therefore be used only at lower rates (below 2.5 Gbps). Designing a new flow sampling method that can detect super connection hosts at higher speeds has become a new problem for those skilled in the art.
Summary of the invention
In view of this, the object of the present invention is to provide a flow sampling device for detecting super connection hosts in high-speed networks and a flow sampling method implemented with it. The invention overcomes the defects of the prior art described above: it has 10 Gbps line-rate processing capability and low space complexity, achieves equal-probability random sampling independent of the flow identifier, and can be used to detect security incidents in 10 Gbps high-speed networks.
To achieve the above object, the invention provides a flow sampling device for detecting super connection hosts in high-speed networks, characterized in that the device comprises the following three modules:
A new-flow detection module, which consists of a plurality of mutually independent hash function calculators $h_1, h_2, \ldots, h_k$ and, connected in sequence after them, an m-bit bit vector memory, a NAND gate, and a flow counter. The module is placed at the edge of a local area network, at a network entry point, or inside a core network. It applies the hash functions to each incoming IP packet to judge whether a new flow has arrived. If so, it passes the flow identifier of that flow to the random sampling module so that the flow can undergo equal-probability random sampling, and it increments the flow counter by 1 after the error absorption module has calculated the error probability of the new-flow detection module;
A random sampling module, which generates a random number, compares it with the sampling probability calculated by the error absorption module, and samples the new flow according to the result of the comparison, outputting the sampled flows; the sampling is independent of the flow identifier of each new flow;
An error absorption module, connected to both of the above modules, which obtains the number of new flows counted by the flow counter in the new-flow detection module, calculates the error probability p of the new-flow detection module, adjusts the sampling probability from the preset sampling probability r to $\frac{r}{1-p}$, and outputs it to the random sampling module, so as to satisfy the design requirement of equal-probability random sampling.
In the new-flow detection module, the range of each hash function is {1, 2, ..., m}. The number k of hash function calculators and the number m of bits in the bit vector memory are both natural numbers, with k ≥ 2 and m > k.
The hash function calculator is an $H_3$ hash function calculator built from AND gates and XOR gates, the two kinds of logic devices that carry out the matrix multiplication and addition quickly. The $H_3$ hash function maps a w-bit binary string $A = a_1 a_2 \cdots a_w$ to a z-bit binary string $B = b_1 b_2 \cdots b_z$ through the linear transformation
$$\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_z \end{pmatrix} = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1w} \\ r_{21} & r_{22} & \cdots & r_{2w} \\ \vdots & \vdots & \ddots & \vdots \\ r_{z1} & r_{z2} & \cdots & r_{zw} \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_w \end{pmatrix}$$
where each matrix element $r_{ij}$ is 0 or 1, and the row index i and column index j range over {1, 2, ..., z} and {1, 2, ..., w}, respectively.
The m-bit bit vector memory is a multi-port SRAM device or two or more parallel dual-port SRAM devices, so that the processing time per packet does not exceed 10 nanoseconds and line-rate processing of IP packets on a 10 Gbps link can be supported.
The random sampling module is a random number generator. For each sampling probability calculated by the error absorption module, it generates a random number in the interval [0, 1), which is used to achieve equal-probability random sampling independent of the flow identifier. If the random number is less than the sampling probability output by the error absorption module, the newly arrived flow is sampled; otherwise it is not.
To achieve the above object, the invention also provides a method of flow sampling using the flow sampling device for detecting super connection hosts in high-speed networks, characterized in that the method comprises the following steps:
(1) Initialization: at the start of each measurement period, all m bits of the bit vector memory and the flow counter in the new-flow detection module are initialized to zero;
(2) New-flow detection: the new-flow detection module applies the hash functions to each newly arrived packet and judges whether it is a packet of a new flow, while recognizing subsequent packets that belong to the same flow, so as to guarantee the one-time property of flow sampling, that is, each flow is subjected to a single sampling decision. If no new flow has arrived, this step is repeated; once a new flow arrives, the following steps are carried out in order;
(3) Sampling probability adjustment: the error absorption module obtains the number of new flows detected so far from the flow counter and calculates the error probability p of the new-flow detection module, so that each new flow is successfully detected by the new-flow detection module with probability $1-p$. The flow counter in the new-flow detection module is then incremented by 1, and the sampling probability of the random sampling module is adjusted from the preset sampling probability r to $\frac{r}{1-p}$, which guarantees that the probability of any new flow being sampled equals $(1-p) \times \frac{r}{1-p} = r$ and thus satisfies the design requirement of equal-probability random sampling;
(4) Random sampling: the random sampling module generates a random number greater than or equal to 0 and less than 1, which is used to sample the new flow at random. If the random number is less than the adjusted sampling probability $\frac{r}{1-p}$, the newly arrived flow is sampled; otherwise it is not;
(5) Deciding how flow sampling proceeds according to the flow counter: if the count n of the flow counter has reached a set upper limit (for example, the value given in Figure C20061009932100123), jump to step (1) and start a new measurement period; otherwise jump to step (2) and continue with the next packet. The upper limit is set so that the actual error probability produced by the new-flow detection module stays small and close to the theoretical value calculated from the formula $p = \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k}$, so that equal-probability random sampling is achieved.
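Steps (1) to (5) can be put together in a compact software sketch. This is an illustration only, not the hardware device: the class name `FlowSampler`, the parameter `n_max` for the counter upper limit, the flow ID format, and the use of salted SHA-256 in place of the k independent hash functions are all assumptions, and positions are numbered from 0 rather than from 1.

```python
import hashlib
import random


class FlowSampler:
    """Software sketch of new-flow detection, error absorption and random sampling."""

    def __init__(self, m, k, r, n_max, seed=0):
        self.m, self.k, self.r, self.n_max = m, k, r, n_max
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Step (1): clear the m-bit vector (one byte per bit here) and the flow counter.
        self.bits = bytearray(self.m)
        self.n = 0

    def _positions(self, flow_id):
        # k hash values in {0, ..., m-1}; salted SHA-256 is an assumed stand-in for h_1..h_k.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{flow_id}".encode("utf-8")).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def process_packet(self, flow_id):
        """Return True if the flow of this packet is sampled."""
        positions = self._positions(flow_id)
        # Step (2): the packet starts a new flow iff at least one of its k bits is 0.
        if all(self.bits[pos] for pos in positions):
            return False
        for pos in positions:
            self.bits[pos] = 1
        # Step (3): error probability from the n flows counted so far, then n = n + 1.
        p = (1.0 - (1.0 - 1.0 / self.m) ** (self.k * self.n)) ** self.k
        self.n += 1
        p_s = 1.0 if p >= 1.0 else min(1.0, self.r / (1.0 - p))
        # Step (4): the random number does not depend on the flow identifier.
        sampled = self.rng.random() < p_s
        # Step (5): start a new measurement period once the counter reaches its limit.
        if self.n >= self.n_max:
            self.reset()
        return sampled


sampler = FlowSampler(m=100_000, k=3, r=0.1, n_max=10_000, seed=1)
flows = [f"10.0.{i // 256}.{i % 256}->192.0.2.1" for i in range(5000)]
first = sum(sampler.process_packet(f) for f in flows)   # first packet of each flow
repeat = sum(sampler.process_packet(f) for f in flows)  # later packets of the same flows
print(first, repeat)
```

In this run the second pass samples nothing, because every flow already has all k of its bits set, which is the one-time property; the first pass samples roughly a fraction r of the flows.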
Step (2) further comprises the following operations:
(21) Each time a new packet arrives, the k hash function calculators $h_1, h_2, \ldots, h_k$ in the new-flow detection module first compute the k hash values $h_1(f), h_2(f), \ldots, h_k(f)$ of the packet's flow identifier f, and the corresponding positions of the bit vector memory are accessed according to these hash values;
(22) If at least one of the bits at positions $h_1(f), h_2(f), \ldots, h_k(f)$ in the bit vector memory is "0", the packet is judged to belong to a new flow, that is, a new flow has arrived, and the subsequent operations are carried out; otherwise, return to step (21) and process the next packet;
(23) After the bits at positions $h_1(f), h_2(f), \ldots, h_k(f)$ in the bit vector memory that are "0" have been set to "1", go to step (3).
Step (3) further comprises the following operations:
(31) First read the value of the flow counter, that is, the number n of new flows correctly detected so far; at most kn bits of the bit vector memory are then "1", where k is the number of hash function calculators;
(32) Calculate the miss probability p with which the new-flow detection module makes a detection error: $p = \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k}$, which is the probability that all k bits corresponding to the currently arriving new flow are already "1" in the bit vector memory, where m is the number of bits in the bit vector memory. Each new flow is thus successfully detected by the new-flow detection module with probability $1-p$;
(33) Increment the flow counter by 1, that is, n = n + 1;
(34) Adjust the sampling probability of the random sampling module from the preset sampling probability r to $\frac{r}{1-p}$.
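As a numerical illustration of operations (31) to (34), the following Python sketch evaluates the miss probability and the adjusted sampling probability for a few counter values. The values m = 10^7 and k = 3 follow the embodiment; the preset probability r = 0.01 and the function names are assumptions made for the example.

```python
def detection_error_probability(m: int, k: int, n: int) -> float:
    """p = (1 - (1 - 1/m)^(k*n))^k, the probability that a new flow is missed."""
    return (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k


def adjusted_sampling_probability(r: float, p: float) -> float:
    """Sampling probability handed to the random sampling module: r / (1 - p)."""
    return r / (1.0 - p)


m, k, r = 10**7, 3, 0.01   # r is an assumed preset sampling probability
for n in (0, 10**5, 10**6):
    p = detection_error_probability(m, k, n)
    p_s = adjusted_sampling_probability(r, p)
    # (1 - p) * p_s is the overall probability that a new flow is sampled.
    print(f"n={n:>8}  p={p:.6f}  p_s={p_s:.6f}  overall={(1.0 - p) * p_s:.6f}")
```

With an empty bit vector (n = 0) the miss probability is zero and the adjusted probability equals r; as n grows toward 10^6 the miss probability rises to a little under 2 percent, and the adjustment compensates so that the overall probability stays at r.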
The invention is a flow sampling device for detecting super connection hosts in high-speed networks and a flow sampling method implemented with it. Its main technical innovation is to separate the function of equal-probability random sampling from the hash function itself and to have a dedicated random sampling module perform it. This weakens the requirement on the uniform random hashing property of the hash function, so the method can use a fast hash function for detection. As a result, the device and method of the invention have 10 Gbps line-rate processing capability and low storage space complexity, and they achieve equal-probability random sampling independent of the flow identifier. In addition, the device is structurally simple (only three modules), stable in operation, secure, reliable, and inexpensive. The invention can therefore be widely used in present and future high-speed network environments and has good prospects for adoption.
Brief description of the drawings
Fig. 1 is a schematic diagram of the structure of the flow sampling device of the invention for detecting super connection hosts in high-speed networks.
Fig. 2 is a block diagram of the operating steps of the flow sampling method implemented with the flow sampling device of the invention for detecting super connection hosts in high-speed networks.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the invention clearer, the implementation of the invention is described in further detail below with reference to the drawings and embodiments; finally, experimental results are used to illustrate the advantages of the method.
Referring to Fig. 1, the structure of the flow sampling device of the invention for detecting super connection hosts in high-speed networks is as follows. The device consists of three modules: a new-flow detection module, an error absorption module, and a random sampling module. The new-flow detection module, placed at the edge of a local area network, at a network entry point, or inside a core network, applies hash functions to each incoming IP packet to judge whether a new flow has arrived; if so, it passes the flow identifier of the new flow to the random sampling module so that the flow can be sampled. The random sampling module generates a random number, compares it with the sampling probability calculated by the error absorption module, samples new flows at random, and outputs the sampled flows; the sampling is independent of the flow identifier of each new flow. Because the new-flow detection module has a measurement error (some new flows are not recognized as new), the error absorption module obtains the number of new flows detected so far from the flow counter in the new-flow detection module, calculates the error probability of the new-flow detection module, and adjusts the sampling probability accordingly. In this way the randomness of the misses caused by the new-flow detection module failing to recognize new flows is absorbed by the randomness of the sampling, which offsets the detection error and satisfies the design requirement of equal-probability random sampling.
The new-flow detection module consists of k mutually independent hash function calculators $h_1, h_2, \ldots, h_k$ (k is a natural number greater than or equal to 2; k = 3 in Fig. 1) and, connected in sequence after them, an m-bit bit vector memory, a NAND gate, and a flow counter. The range of each hash function is {1, 2, ..., m}; the natural number m is much larger than k and can be, for example, $10^7$.
The $H_3$ hash function is the recommended choice for the hash function calculators of the invention. It maps a w-bit binary string $A = a_1 a_2 \cdots a_w$ to a z-bit binary string $B = b_1 b_2 \cdots b_z$ through the linear transformation
$$\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_z \end{pmatrix} = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1w} \\ r_{21} & r_{22} & \cdots & r_{2w} \\ \vdots & \vdots & \ddots & \vdots \\ r_{z1} & r_{z2} & \cdots & r_{zw} \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_w \end{pmatrix}$$
where each matrix element $r_{ij}$ is 0 or 1, and the row index i and column index j range over {1, 2, ..., z} and {1, 2, ..., w}, respectively.
Because the $H_3$ hash function calculator consists only of AND and XOR logic gates, that is, the two kinds of devices that carry out the matrix multiplication and addition quickly, k parallel $H_3$ hash computations are easy to implement, and the processing time can reach the nanosecond level.
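A software sketch of an $H_3$ hash may help make the AND/XOR structure concrete. It assumes that each matrix row is stored as a w-bit integer and that the matrix is filled with random bits from a seed; the function names and the sizes w = 64 and z = 20 are illustrative choices, not values from the patent.

```python
import random


def make_h3_matrix(z: int, w: int, seed: int) -> list:
    """Draw a random z-by-w binary matrix; row i is stored as a w-bit integer."""
    rng = random.Random(seed)
    return [rng.getrandbits(w) for _ in range(z)]


def h3_hash(rows: list, a: int) -> int:
    """Compute B = R * A over bits: AND each row with A, then XOR-reduce to one bit."""
    b = 0
    for row in rows:
        bit = bin(row & a).count("1") & 1   # parity of (row AND a) = XOR of the products
        b = (b << 1) | bit
    return b


# Hypothetical 64-bit flow ID (e.g. 32-bit source and destination addresses concatenated).
rows = make_h3_matrix(z=20, w=64, seed=7)
flow_a = 0x0A000001_C0A80101
flow_c = 0x0A000002_C0A80101
print(h3_hash(rows, flow_a), h3_hash(rows, flow_c))
```

The hash is linear over XOR, so `h3_hash(rows, a ^ c)` equals `h3_hash(rows, a) ^ h3_hash(rows, c)`. That structure is what makes a single-cycle gate implementation possible, and the design tolerates the weaker randomness because the hash is used only for new-flow detection, not for the sampling decision.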
Because each stream only takies k bit of bit vector memory, the needed memory space of apparatus of the present invention is just fewer like this, can make the bit vector memory with SRAM.According to current semiconductor technology, even the capacity of (on-chip) SRAM also can reach 10Mbits in the sheet.For example, most of modern Commercial field programmable logic array FPGA comprises the embedded on-chip SRAM of a plurality of dual-ports.For example the Virtex-4 product of Xilinx company can comprise 552 18kbits dual-port SRAM modules, the nearly 10Mbits of total capacity at most.For the concurrent access of k Hash address is provided, bit vector memory of the present invention adopts high speed multi-port SRAM module or a plurality of parallel dual-port SRAM to make.Because present on-chip SRAM access speed reached for 1~2 nanosecond, and outer (off-chip) SRAM of sheet also can reach for 2~5 nanoseconds.The time that therefore whole device is handled a grouping can be controlled in about 10 nanoseconds, enough supported the linear speed of IP grouping on the 10Gbps link to handle.
The random sampling module of the invention is a random number generator. For each sampling probability $p_s$ calculated by the error absorption module, it generates a random number $r_n$ in the interval [0, 1). If $r_n < p_s$, the newly arrived flow is sampled; otherwise it is not. Note that the random number generated here does not depend on the flow identifier of the newly arrived flow, so the content of the flow identifier does not affect its randomness. Whether each new flow is sampled is therefore an equal-probability random decision independent of its flow identifier.
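The equal-probability property can be verified in one line, assuming that the error absorption module supplies $p_s = \frac{r}{1-p}$ with $p_s \le 1$, that a new flow is detected with probability $1-p$, and that $r_n$ is uniform on [0, 1) and drawn independently of the flow identifier f:

$$P(\text{flow } f \text{ is sampled}) = P(\text{detected as new}) \cdot P(r_n < p_s) = (1-p) \cdot \frac{r}{1-p} = r$$

The right-hand side contains neither f nor p, so under these assumptions every flow is sampled with the preset probability r regardless of its identifier and of how full the bit vector is.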
Although both the invention and the prior art use hash functions, the difference between them is clear. The traditional hash-based sampling methods try to achieve equal-probability random sampling through the hash function itself. Because the assumption that the hash function has the uniform random hashing property is unrealistic, that prior art cannot be used in high-speed network environments. The innovation of the invention is that it uses hash functions only to judge whether a new flow has arrived and places no special requirement on their uniform random hashing property; the sampling of flows is performed by the dedicated random sampling module, which is independent of the flow identifier. This weakens the uniform random hashing requirement on the hash function and ensures that equal-probability random sampling can also be achieved in high-speed network environments.
The present invention discloses the method that a kind of stream sampling device that is used for the high speed network super connection host detection flows sampling again: by new stream detection module the grouping that each arrives stream sampling device is detected earlier, judge whether this grouping belongs to a new stream; If this grouping belongs to new stream, promptly represent to have newly to flow to reach, then absorb the detection probability of error that module is calculated new stream detection module, and adjust the sampling probability of this new stream according to this numerical value by error; Then, absorb the sampling probability generation random number that module is calculated by stochastic sampling module according to error, to determine whether to sample this stream.
Referring to Fig. 2, the specific operation steps of the stream sampling method of the present invention are as follows:
(1) Initialization: at the start of each measurement period, the m bits of the bit vector memory in the new stream detection module and the flow counter are all initialized to zero;
(2) New stream detection: the new stream detection module uses hash functions to examine each newly arrived packet, judges whether it is a packet of a new stream, and at the same time recognizes subsequent packets that belong to the same stream. Because all packets of one stream carry the same stream identifier, a new stream is considered to have arrived only when the first packet of the stream arrives. When subsequent packets of that stream arrive, the corresponding bits in the bit vector memory have already been set to "1", so they are not treated as a newly arrived stream. This guarantees the "one-time" property of stream sampling, that is, each stream undergoes only a single sampling decision.
This step can be further divided into the following operations:
(21) Each time a new packet arrives, the k hash function units $h_1, h_2, \ldots, h_k$ in the new stream detection module compute the k hash values $h_1(f), h_2(f), \ldots, h_k(f)$ of the packet's stream identifier f, and the corresponding positions of the bit vector memory are accessed according to these hash values;
(22) If at least one of the bits at positions $h_1(f), h_2(f), \ldots, h_k(f)$ in the bit vector memory is "0", the packet is judged to belong to a new stream (that is, a new stream has arrived), and the subsequent operations are carried out after the flow counter is incremented by 1; otherwise, return to step (21) and begin processing the next packet;
(23) After the bits that are "0" among positions $h_1(f), h_2(f), \ldots, h_k(f)$ in the bit vector memory have been set to "1", perform step (3).
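Operations (21) to (23) amount to a membership test followed by an insertion in a Bloom-filter-style bit vector. The sketch below illustrates this in software. The class name is hypothetical, and salted SHA-256 digests are used only as a convenient stand-in for the k independent hash function units, which the device implements in hardware.

```python
import hashlib


class NewStreamDetector:
    """Bit-vector test for new streams (illustrative sketch)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m                 # number of bits in the bit vector
        self.k = k                 # number of hash functions
        self.bits = bytearray(m)   # one byte per bit, for readability

    def _positions(self, stream_id: bytes) -> list:
        # Assumption: salting SHA-256 with the index i gives k hash values.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + stream_id).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def is_new_stream(self, stream_id: bytes) -> bool:
        positions = self._positions(stream_id)                  # step (21)
        is_new = any(self.bits[pos] == 0 for pos in positions)  # step (22)
        for pos in positions:                                   # step (23)
            self.bits[pos] = 1
        return is_new
```

The first packet of a stream makes `is_new_stream` return `True`; later packets with the same identifier find all k bits already set and return `False`.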
(3) Sampling rate adjustment: the error absorption module reads the number of new streams detected so far from the flow counter and calculates the miss probability p of the new stream detection module, so that each new stream is successfully detected by the new stream detection module with probability $1-p$. The sampling probability of the random sampling module is then adjusted, according to the preset sampling probability r, to $\frac{r}{1-p}$. This guarantees that the probability that any new stream is sampled equals $(1-p) \times \frac{r}{1-p} = r$, which satisfies the design requirement of "equal-probability random sampling".
This step, the key operation of the method of the present invention, is described in detail below:
As new streams keep arriving, more and more bits in the bit vector memory are set to "1". The following situation can therefore occur: when the first packet of a new stream arrives, hash collisions cause all k hash values of its stream identifier to coincide with hash values computed from the identifiers of streams that arrived earlier, so the k bits of the bit vector memory determined by the new stream have already been set to "1" by those other streams. In this case the new stream is not correctly detected by the new stream detection module, which produces a detection error.
Suppose that, when a new stream arrives, the number of streams correctly detected by the new stream detection module is n (the value of the flow counter); then at most kn bits of the bit vector memory are "1". Assuming uniformly distributed hash values, probability theory gives the probability that the k bits determined by the new stream are all "1" as $p = \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k}$. That is, the miss probability with which the new stream detection module makes a detection error is p, and a new stream is successfully detected by the new stream detection module with probability $1-p$.
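The formula for p can be reached in four short steps under the uniform-hashing assumption. The intermediate steps below are the usual Bloom filter argument rather than text from the patent, and they treat the k bit positions as independent, which is an approximation.

```latex
\begin{align*}
\Pr[\text{a given bit is not hit by one hash value}]
  &= 1 - \frac{1}{m} \\
\Pr[\text{a given bit is still 0 after } kn \text{ hash values}]
  &= \left(1 - \frac{1}{m}\right)^{kn} \\
\Pr[\text{a given bit is 1}]
  &= 1 - \left(1 - \frac{1}{m}\right)^{kn} \\
p = \Pr[\text{all } k \text{ bits of the new stream are 1}]
  &\approx \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k}
\end{align*}
```

Here m is the number of bits in the bit vector memory, k is the number of hash functions, and n is the number of streams already recorded.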
Therefore, the operation of this step is to adjust the sampling probability of the random sampling module to $\frac{r}{1-p}$ (r is the preset sampling probability of the method of the present invention). This guarantees that the probability that any new stream is sampled by this method equals r (that is, $(1-p) \times \frac{r}{1-p} = r$), which satisfies the "equal-probability random sampling" requirement. Although the formula for p theoretically requires uniformly distributed hash values, when the number n of detected streams is below a set value (for example $n < \frac{3}{5}m$), the actual error probability produced by the new stream detection module with existing hash functions agrees with the theoretical value given by the formula $p = \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k}$. This is described and verified in "Practical performance of Bloom filters and parallel free-text searching", Communications of the ACM, 1989, 32(10): 1237-1239.
(31) First read the value of the flow counter, that is, the number n of new streams correctly detected so far; at most kn bits in the bit vector memory are then "1", where k is the number of hash function units;
(32) Assuming uniformly distributed hash values, calculate by probability theory the miss probability p with which the new stream detection module makes a detection error: $p = \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k}$, which is the probability that the k bits in the bit vector memory corresponding to the currently arriving new stream are all "1"; m is the number of bits of the bit vector memory. Each new stream is then successfully detected by the new stream detection module with probability $1-p$;
(33) Increment the flow counter by 1, that is, n = n + 1;
(34) According to the preset sampling probability r, adjust the sampling probability of the random sampling module to $\frac{r}{1-p}$.
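Operations (31) to (34) can be illustrated with two small functions. The function names are hypothetical, and clamping the result to 1.0 is an added safeguard that the text does not mention.

```python
def miss_probability(n: int, m: int, k: int) -> float:
    """Miss probability p of the new stream detector, step (32).

    n: streams counted so far, m: bits in the bit vector,
    k: number of hash functions.
    """
    bit_is_one = 1.0 - (1.0 - 1.0 / m) ** (k * n)
    return bit_is_one ** k


def adjusted_sampling_probability(r: float, n: int, m: int, k: int) -> float:
    """Sampling probability r / (1 - p) handed to the sampler, step (34)."""
    p = miss_probability(n, m, k)
    # Assumption: clamp to 1.0 so the value remains a valid probability.
    return min(1.0, r / (1.0 - p))
```

With an empty bit vector (n = 0) the miss probability is zero and the adjusted probability equals r; as n grows, the adjusted probability rises to compensate for the streams the detector misses.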
(4) Random sampling: the random sampling module generates a random number and, according to the sampling probability adjusted in step (3), performs random sampling of the new stream;
(5) Decide how stream sampling proceeds according to the flow counter: if the count n of the flow counter has reached the set upper limit (for example, the value given in formula image C20061009932100183), jump to step (1) and start a new measurement period; otherwise jump to step (2) and continue with the next packet. The upper limit is set so that the actual error probability of the new stream detection module stays small and close to the theoretical value calculated by the formula $p = \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k}$, so that equal-probability random sampling is achieved.
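Steps (1) to (5) can be combined into one loop. The generator below is a minimal software sketch, not the hardware design: the function name and parameters are hypothetical, salted SHA-256 stands in for the hash function units, and the counter is incremented once per detected stream as in operation (33).

```python
import hashlib
import random


def stream_sampler(stream_ids, m, k, r, n_max, rng=None):
    """Yield the stream identifiers selected by steps (1) to (5).

    stream_ids: iterable of bytes, one stream identifier per packet.
    m, k: bit vector size and number of hash functions.
    r: preset sampling probability; n_max: upper limit of the counter.
    """
    rng = rng or random.Random()
    bits = bytearray(m)   # step (1): bit vector cleared
    n = 0                 # step (1): flow counter cleared
    for f in stream_ids:  # one iteration per arriving packet
        # Step (2): the k bit positions for stream identifier f.
        positions = [
            int.from_bytes(hashlib.sha256(bytes([i]) + f).digest()[:8], "big") % m
            for i in range(k)
        ]
        if all(bits[pos] for pos in positions):
            continue      # not detected as a new stream: next packet
        for pos in positions:
            bits[pos] = 1
        # Step (3): miss probability from the current count, then n = n + 1.
        p = (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k
        n += 1
        p_s = min(1.0, r / (1.0 - p))  # clamping is an added safeguard
        # Step (4): the decision does not depend on f.
        if rng.random() < p_s:
            yield f
        # Step (5): start a new measurement period at the upper limit.
        if n >= n_max:
            bits = bytearray(m)
            n = 0
```

For example, `list(stream_sampler(packets, m=4096, k=3, r=0.05, n_max=1000))` returns about 5% of the distinct streams seen in each measurement period, each at most once per period.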
The method of the present invention has been verified experimentally. The experiment uses real Internet data provided by the Cooperative Association for Internet Data Analysis (CAIDA) and the National Laboratory for Applied Network Research (NLANR). Information about the experimental data is given in the following table:
In the experiment, the one-level filtering algorithm (One level Filtering) from "New streaming algorithms for fast detection of superspreaders" (In Proceedings of the 12th Annual Network and Distributed System Security Symposium, San Diego, California, 2005) is combined with the stream sampling algorithm to detect super connection hosts. Suppose a host whose number of streams exceeds $k_s$ is a super connection host. Given a user-supplied parameter b (b > 1), a "false positive" occurs if a host with fewer than $k_s/b$ streams is judged to be a super connection host. The cited paper proves that, for any given confidence parameter δ (0 < δ < 1), the one-level filtering algorithm can automatically choose a suitable sampling rate so that both the "miss rate" and the "false positive rate" are below δ. Its premise, however, is that stream sampling satisfies "equal-probability random sampling". In the experiment, the one-level filtering algorithm is implemented with hash-based stream sampling (the "old method") and with the method of the present invention (the "new method"), and both sampling algorithms use the $H_3$ hash function.
The experimental results are shown in the following table:
Figure C20061009932100191
As can be seen, when the stream identifier sequence is linear, the "miss rate" and "false positive rate" produced by the new method are both below the theoretical value (3%). The "miss rate" produced by the old method exceeds the theoretical value in most cases and is 2 to 9 percentage points higher than that of the new method, which is a very high figure for security applications. Although the "false positive rate" produced by the old method is below the theoretical value, it is still 10 to 40 times higher than that of the new method, which means that the memory used to keep track of suspected super connection hosts is 10 to 40 times larger. Although the absolute value of the "false positive rate" is small, the number of normal streams is enormous, so the memory consumed by false positives is considerable. The reason for the higher error rates of the old method is as described earlier: the old method cannot perform equal-probability random sampling on linear stream identifiers. For random stream identifiers, the detection results of both sampling algorithms agree with the theoretical values. Note that the results produced by the new method with random stream identifiers do not differ significantly from its results with linear stream identifiers, which shows that the new method satisfies "equal-probability random sampling" for both linear and random stream identifiers and adapts well.
The test results show that the experiment of the present invention was successful and achieved the purpose of the invention; the stream sampling device and its method of use have good application prospects.

Claims (8)

1. A stream sampling device for detecting super connection hosts in a high-speed network, characterized in that the device comprises the following three modules:
A new stream detection module, composed of a plurality of mutually independent hash function units $h_1, h_2, \ldots, h_k$ and, connected to them in sequence, an m-bit bit vector memory, a NAND gate, and a flow counter. The module is placed at the edge of a local area network, at the entrance of a network, or inside a core network. It uses hash functions to examine incoming IP packets and judges whether a new stream has arrived; if so, it passes the stream identifier of that stream to the random sampling module so that the stream undergoes equal-probability random sampling, and it increments the flow counter by 1 after the error absorption module has calculated the error probability of the new stream detection module;
A random sampling module, which randomly samples new streams according to the result of comparing a random number it generates with the sampling probability calculated by the error absorption module, and outputs the sampled streams; this sampling is independent of the stream identifier of each new stream;
An error absorption module, connected to both of the above modules, which obtains the number of new streams from the flow counter in the new stream detection module, calculates the error probability p of the new stream detection module, adjusts the sampling rate according to the preset sampling probability r to
$\frac{r}{1-p}$
and outputs it to the random sampling module, so as to satisfy the design requirement of "equal-probability random sampling".
2. The stream sampling device according to claim 1, characterized in that: in the new stream detection module, the range of each hash function of the hash function units is {1, 2, ..., m}; the number k of hash function units and the number of bits m of the bit vector memory are both natural numbers, k is a natural number greater than or equal to 2, and m is greater than k.
3. The stream sampling device according to claim 2, characterized in that: the hash function units are $H_3$ hash function units built from two kinds of logic devices, AND gates and XOR gates, which perform the multiplication and addition operations of a matrix product. The $H_3$ hash function is the linear transformation given by the following equation: $$\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_z \end{pmatrix} = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1w} \\ r_{21} & r_{22} & \cdots & r_{2w} \\ \vdots & \vdots & \ddots & \vdots \\ r_{z1} & r_{z2} & \cdots & r_{zw} \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_w \end{pmatrix}$$ It maps the w-bit binary string $A = a_1 a_2 \cdots a_w$ to the z-bit binary string $B = b_1 b_2 \cdots b_z$, where each matrix element $r_{ij}$ is 0 or 1, and the row and column indices i and j range over {1, 2, ..., z} and {1, 2, ..., w} respectively.
4. The stream sampling device according to claim 2, characterized in that: the m-bit bit vector memory is a multi-port SRAM device or two or more parallel dual-port SRAM devices, so that the processing time of each packet does not exceed 10 nanoseconds and line-rate processing of IP packets on a 10 Gbps link can be supported.
5. The stream sampling device according to claim 1, characterized in that: the random sampling module is a random number generator which, for each sampling probability calculated by the error absorption module, generates a random number in the interval [0, 1), used to achieve equal-probability random sampling independent of the stream identifier; if the random number is less than the sampling probability output by the error absorption module, the newly arrived stream is sampled; otherwise it is not sampled.
6. A method of stream sampling using the stream sampling device for detecting super connection hosts in a high-speed network according to claim 1, characterized in that the method comprises the following steps:
(1) Initialization: at the start of each measurement period, the m bits of the bit vector memory in the new stream detection module and the flow counter are all initialized to zero;
(2) New stream detection: the new stream detection module uses hash functions to examine each newly arrived packet, judges whether it is a packet of a new stream, and at the same time recognizes subsequent packets that belong to the same stream, so as to guarantee the "one-time" property of stream sampling, that is, each stream undergoes only a single sampling decision; if no new stream has arrived, this detection step is repeated until a new stream arrives, after which the subsequent steps are performed in order;
(3) Sampling rate adjustment: the error absorption module reads the number of new streams detected so far from the flow counter and calculates the error probability p of the new stream detection module, so that each new stream is successfully detected by the new stream detection module with probability $1-p$; the flow counter in the new stream detection module is then incremented by 1, and the sampling probability of the random sampling module is adjusted according to the preset sampling probability r to $\frac{r}{1-p}$, which guarantees that the probability that any new stream is sampled equals $(1-p) \times \frac{r}{1-p} = r$ and satisfies the design requirement of "equal-probability random sampling";
(4) Random sampling: the random sampling module generates a random number for randomly sampling the new stream; the random number is greater than or equal to 0 and less than 1; if it is less than the adjusted sampling probability
$\frac{r}{1-p}$
the newly arrived stream is sampled; otherwise it is not sampled;
(5) Decide how stream sampling proceeds according to the flow counter: if the count n of the flow counter has reached the set upper limit, jump to step (1) and start a new measurement period; otherwise jump to step (2) and continue with the next packet.
7. The method of stream sampling according to claim 6, characterized in that step (2) further comprises the following operations:
(21) Each time a new packet arrives, the k hash function units $h_1, h_2, \ldots, h_k$ in the new stream detection module compute the k hash values $h_1(f), h_2(f), \ldots, h_k(f)$ of the packet's stream identifier f, and the corresponding positions of the bit vector memory are accessed according to these hash values;
(22) If at least one of the bits at positions $h_1(f), h_2(f), \ldots, h_k(f)$ in the bit vector memory is "0", the packet is judged to belong to a new stream, that is, a new stream has arrived, and the subsequent operations are carried out; otherwise, return to step (21) and begin processing the next packet;
(23) After the bits that are "0" among positions $h_1(f), h_2(f), \ldots, h_k(f)$ in the bit vector memory have been set to "1", perform step (3).
8. The method of stream sampling according to claim 6, characterized in that step (3) further comprises the following operations:
(31) First read the value of the flow counter, that is, the number n of new streams correctly detected so far; at most kn bits in the bit vector memory are then "1", where k is the number of hash function units;
(32) Calculate the miss probability p with which the new stream detection module makes a detection error: $p = \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k}$, which is the probability that the k bits in the bit vector memory corresponding to the currently arriving new stream are all "1"; m is the number of bits of the bit vector memory. Each new stream is then successfully detected by the new stream detection module with probability $1-p$;
(33) Increment the flow counter by 1, that is, n = n + 1;
(34) According to the preset sampling probability r, adjust the sampling probability of the random sampling module to
$\frac{r}{1-p}$
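The $H_3$ hash function recited in claim 3 is a matrix product over bits, in which AND acts as multiplication and XOR as addition. The sketch below illustrates it in software. Drawing the matrix at random and the function names are assumptions, since the claim only requires each $r_{ij}$ to be 0 or 1; the output here lies in 0 to $2^z - 1$ and would still have to be mapped onto the bit vector addresses.

```python
import random


def make_h3_matrix(z: int, w: int, rng: random.Random) -> list:
    """Return z rows; row i is a w-bit integer holding r_i1 ... r_iw."""
    return [rng.getrandbits(w) for _ in range(z)]


def h3_hash(matrix: list, a: int) -> int:
    """Map the w-bit input a to the z-bit output b = R a over GF(2)."""
    b = 0
    for row in matrix:
        # AND keeps the products r_ij * a_j; the parity of the remaining
        # one-bits is their XOR sum, which gives the output bit b_i.
        bit = bin(row & a).count("1") & 1
        b = (b << 1) | bit
    return b
```

Because the transformation is linear, `h3_hash(R, x ^ y)` equals `h3_hash(R, x) ^ h3_hash(R, y)` for any inputs x and y, which is what allows a hardware implementation using only AND and XOR gates.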
CNB2006100993217A 2006-07-17 2006-07-17 Stream sampling device and method for detecting high speed network super connection host Expired - Fee Related CN100477668C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006100993217A CN100477668C (en) 2006-07-17 2006-07-17 Stream sampling device and method for detecting high speed network super connection host

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2006100993217A CN100477668C (en) 2006-07-17 2006-07-17 Stream sampling device and method for detecting high speed network super connection host

Publications (2)

Publication Number Publication Date
CN1901545A CN1901545A (en) 2007-01-24
CN100477668C true CN100477668C (en) 2009-04-08

Family

ID=37657283

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2006100993217A Expired - Fee Related CN100477668C (en) 2006-07-17 2006-07-17 Stream sampling device and method for detecting high speed network super connection host

Country Status (1)

Country Link
CN (1) CN100477668C (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101309179B (en) * 2007-05-18 2011-03-16 北京启明星辰信息技术股份有限公司 Real-time flux abnormity detection method on basis of host activity and communication pattern analysis
US9112771B2 (en) 2009-02-06 2015-08-18 The Chinese University Of Hong Kong System and method for catching top hosts
CN107196826A (en) * 2017-07-12 2017-09-22 东南大学 A kind of network flow programming method algorithm based on sampling
CN114826955B (en) * 2022-05-26 2023-03-21 电子科技大学 Dynamic grouping sampling method for service flow in IPv6 network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
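A flow information sampling measurement framework and algorithm based on ... . Zhang Feng, Tan Xingye, Lei Zhenming. Application Research of Computers (计算机应用研究), No. 6, 2005 *
BF-based sampling measurement technique for one-way network performance. Zhang Feng, Lei Zhenming. Journal of Jilin University (吉林大学学报), Vol. 23, No. 3, 2005 *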

Also Published As

Publication number Publication date
CN1901545A (en) 2007-01-24


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090408

Termination date: 20140717

EXPY Termination of patent right or utility model