A wavelet analysis boundary processing method based on traffic statistics
Technical field
The present invention relates to a boundary processing method, based on traffic statistics, for use when the Hurst (self-similarity) parameter is estimated by wavelet analysis. It is mainly applied to the boundary handling of the signal during wavelet-based Hurst parameter estimation; the resulting Hurst value is then used to judge whether a DDoS (Distributed Denial of Service) attack has occurred. The invention belongs to the field of network security technology.
Background technology
As networks have become an indispensable part of daily life and work, network security has increasingly become a key concern for network applications. Cybercrime has never stopped since networks became widespread; on the contrary, it keeps intensifying. Research on DDoS attacks has become a focal point and frontier of network security research at home and abroad, and has attracted great attention from governments, research institutions, and industry. At present, server-side defense against DDoS attacks consists mainly of two parts: DDoS attack detection and DDoS attack filtering. Detection is the prerequisite for filtering, so the efficiency of DDoS attack detection is the key to defending against attacks. The detection methods mainly used at present, namely extraction of abnormal packet features and statistics-based detection, both have shortcomings. As the theory of self-similar network traffic models has matured, DDoS attack detection based on the network self-similarity model has been proposed.
In the early 1990s, Leland et al. showed that real network traffic measured at Bellcore (Bell Communications Research) is self-similar in the statistical sense. Many researchers subsequently measured other networks: Paxson measured WAN (wide area network) traffic, including many TCP (Transmission Control Protocol) arrival processes, and Crovella et al. observed WWW (World Wide Web) traffic. These studies all showed that real network traffic is correlated over long time ranges, that is, it is long-range dependent. Current network traffic is therefore better described by models with long-range dependence, such as self-similar process models. The self-similarity of network traffic is caused by the heavy-tailed distribution of file sizes. When a DDoS attack occurs, the attack packets block the transmission of normal TCP packets in the network, and the self-similarity of the traffic decreases. When the attack congests the network almost completely, nearly all packets in the network are DDoS attack packets. The traffic then tends toward a Poisson process, that is, the Hurst parameter tends toward 0.5. In other words, the Hurst parameter changes markedly when a DDoS attack occurs, and this change can be used to detect whether an attack has taken place.
Fig. 1, the DDoS attack detection model based on network traffic self-similarity, shows the detection process. Packets are first captured and their features extracted; the features are sent to a feature database, and the Hurst value is computed from the information stored there to judge whether a DDoS attack has occurred. The key step is estimating the Hurst parameter, so the accuracy and speed of the Hurst estimation determine the accuracy and efficiency of DDoS attack detection. Earlier work offers many methods for estimating the Hurst coefficient, such as the R/S (rescaled range analysis) method and the variance-time method. Wavelets have brought a new approach to this problem, because wavelet analysis examines a signal in both the scale domain and the time domain; in particular, the multiscale nature of wavelets matches the scale invariance of self-similar processes naturally. Wavelets have therefore become a natural choice for analyzing the characteristics of self-similar processes.
The fast wavelet transform proposed by Mallat (the Mallat algorithm) turned wavelet analysis from a mathematical theory into a practical technique and established it as a fast computational tool. In practice, wavelet decomposition first convolves the input signal with the analysis filters and then downsamples the convolution result by two, producing the high-frequency and low-frequency signals. In the multiscale decomposition of the Mallat algorithm (Fig. 2), the original signal S passes through the high-pass filter Hi_D and is downsampled to give the detail signal d, and passes through the low-pass filter Lo_D and is downsampled to give the approximation signal a. Decomposing the approximation signal a repeatedly yields the multiscale detail coefficients $d_{j,k}$. Here the wavelet filter length is WLen, MaxLev is the corresponding maximum decomposition level [formula omitted in source], j is the decomposition level with j ∈ [1, MaxLev], and k runs over the length of the signal produced at each decomposition level:
[formula for $d_{j,k}$ omitted in source]
where $\psi_{j,k}(t)$ is obtained from the wavelet function $\psi(t)$ by dilation and translation,
[formula for $\psi_{j,k}(t)$ omitted in source]
and $\psi$ is a dyadic orthogonal wavelet with regularity index R. The variance of $d_{j,k}$ is then
$$\mathrm{Var}[d_{j,k}] = \mathrm{Var}\left[2^{j(H+1/2)}\, d_{0,k}\right] = 2^{j(2H+1)}\,\mathrm{Var}[d_{0,k}] = C\,2^{j(2H+1)} = C\,2^{j\gamma} \tag{1}$$
Taking the base-2 logarithm of both sides of (1) and performing a least-squares linear fit, with j as the independent variable and $\log_2 \mathrm{Var}[d_{j,k}]$ as the dependent variable, gives a straight line with slope γ. The value of H then follows from γ = 2H + 1.
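The fit described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `hurst_from_variances` is hypothetical, NumPy is assumed to be available, and the input is assumed to be the per-level variances of the detail coefficients, ordered from level 1 upward.

```python
import numpy as np

def hurst_from_variances(variances):
    """Estimate H from per-level detail-coefficient variances.

    variances[i] is taken to be Var[d_{j,k}] at level j = i + 1 (assumption).
    """
    variances = np.asarray(variances, dtype=float)
    levels = np.arange(1, len(variances) + 1)
    # Least-squares line through the points (j, log2 Var[d_j]); its slope is gamma.
    gamma, _intercept = np.polyfit(levels, np.log2(variances), 1)
    # Invert gamma = 2H + 1.
    return (gamma - 1.0) / 2.0

# Synthetic check: variances that follow formula (1) exactly with H = 0.8.
true_h, c = 0.8, 3.0
synthetic = [c * 2 ** (j * (2 * true_h + 1)) for j in range(1, 8)]
estimate = hurst_from_variances(synthetic)
```

On the synthetic input the logarithms lie exactly on a line, so the estimate recovers the chosen H up to floating-point error; on measured traffic the points scatter around the line and the fit averages over the scales.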
In summary, the wavelet-based algorithm for estimating the Hurst exponent proceeds as follows:
(1) Initialize the original signal.
(2) Convolve the signal S with the wavelet high-pass and low-pass filters and downsample to obtain the detail signal d and the approximation signal a, respectively; store the multiscale detail coefficients $d_{j,k}$ of each scale.
(3) Check whether the last level has been reached. If so, go to step (4); otherwise return to step (2).
(4) Perform the linear-fit parameter estimation in the least-squares sense.
(5) Compute the Hurst parameter.
The Mallat algorithm is formulated for signals of infinite length. When a finite digital signal is wavelet transformed, the values beyond its end are unknown, so the original signal must be extended in some way. The boundary handling methods in common use are zero extension, periodic extension, and symmetric extension. If the original signal length is NLen and the filter length is WLen, the signal that must be convolved has length NLen+WLen-1.
Zero extension: the original signal is padded with zeros at the boundary.
Periodic extension: the original signal is treated as a periodic signal.
Symmetric extension: the original signal is mirrored at its end points. Depending on where the mirror line is placed, there are two variants: in the first the mirror line passes through the end point, and in the second it lies midway between two sample points. The first variant is illustrated by the extension at the left end point in the figure, and the second by the extension at the right end point. With either variant the extended signal is symmetrically distributed.
None of these methods reflects the true signal faithfully, so the Hurst parameter cannot be estimated accurately, which reduces the accuracy of DDoS attack detection.
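For comparison, the conventional extensions can be reproduced with NumPy's `np.pad`. The sketch below is only illustrative: the four-sample signal and the pad width of 2 are made-up values, and padding only on the right is a simplification chosen here for readability.

```python
import numpy as np

signal = np.array([1.0, 2.0, 3.0, 4.0])  # illustrative data
pad = 2                                  # samples appended on the right (assumption)

# Zero extension: pad with zeros.
zero_ext = np.pad(signal, (0, pad), mode="constant", constant_values=0.0)
# Periodic extension: the signal wraps around to its beginning.
periodic_ext = np.pad(signal, (0, pad), mode="wrap")
# Symmetric extension, mirror line through the end point: end sample not repeated.
through_point_ext = np.pad(signal, (0, pad), mode="reflect")
# Symmetric extension, mirror line midway between two points: end sample repeated.
between_points_ext = np.pad(signal, (0, pad), mode="symmetric")
```

In each case the appended samples depend on zeros, on the start of the signal, or on its last few samples only, which is the limitation the text points out.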
Summary of the invention
Technical problem: the object of the invention is to provide a wavelet analysis boundary processing method based on traffic statistics that solves the problem of estimating the Hurst parameter accurately, and thereby the problem of detecting DDoS attacks accurately. With the proposed method, DDoS detection based on self-similarity theory can estimate the Hurst parameter more accurately and so judge accurately whether a DDoS attack has occurred.
Technical scheme: building on the principle of network traffic self-similarity, the method uses traffic statistics for boundary processing to improve the precision of Hurst parameter estimation, so that DDoS attacks are detected more accurately.
The packet capture and extraction module performs fast feature extraction on captured packets: it extracts the traffic features of each packet and stores them in the feature database as input for the Hurst parameter estimation. The packet traffic feature extraction module is responsible for storing the captured traffic feature information in the feature database, and the attack detector reads the traffic feature information from the feature database for detection and analysis. The feature database sits between the two and acts as a buffer. Judging by its position in the system, it mainly performs database-like functions, but because detection must run in real time, its data is held entirely in memory rather than in stored files. For the same reason, a feature value that has been in the feature database for some time is historical data of little use to the detector. The feature database should therefore refresh itself automatically, removing historical data so that it does not grow excessively. In the overall framework of the feature database (Fig. 3), the traffic features PktHdr extracted by the feature extraction module on the left are delivered to the feature database and stored there. Internally the feature database is organized as a linked list. The attack detector on the right takes a feature value sequence of a given length from the feature database. The detector's computation requires data in a specific format, so the raw data must be preprocessed before it is passed to the detector for analysis. The detector input is formed by dividing the raw data into equal intervals by arrival time; each interval is one time slot, and the total length of the packets received in a slot is the signal value to be detected for that slot. After this preprocessing, the Hurst parameter is estimated and the attack detection analysis is carried out.
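An in-memory feature database with automatic removal of historical data might look like the following sketch. Everything in it is an assumption made for illustration: the class name `FeatureBuffer`, the age-based expiry rule, the `max_age` parameter, and the reduction of a PktHdr record to an (arrival time, packet length) pair. Python's `collections.deque` is used as a stand-in for the linked list mentioned in the text.

```python
from collections import deque

class FeatureBuffer:
    """In-memory buffer between feature extraction and the attack detector."""

    def __init__(self, max_age):
        self.max_age = max_age   # seconds a record stays useful (assumption)
        self._items = deque()    # (arrival_time, packet_length), oldest first

    def add(self, arrival_time, packet_length):
        """Store one record, then drop records that have become historical."""
        self._items.append((arrival_time, packet_length))
        self._expire(arrival_time)

    def _expire(self, now):
        # Automatic refresh: remove records older than max_age.
        while self._items and now - self._items[0][0] > self.max_age:
            self._items.popleft()

    def latest(self, count):
        """Return the most recent `count` records for the detector."""
        if count <= 0:
            return []
        return list(self._items)[-count:]

    def __len__(self):
        return len(self._items)

buf = FeatureBuffer(max_age=1.0)
buf.add(0.0, 100)
buf.add(0.5, 200)
buf.add(1.2, 300)  # the record from t=0.0 is now older than max_age
```

Expiring on insertion keeps the buffer bounded without a separate cleanup thread; a real detector might instead expire records by count or by the length of the analysis window.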
The method applies the Mallat algorithm to perform a discrete wavelet decomposition. (The fast wavelet transform proposed by Mallat turned wavelet analysis from a mathematical theory into a practical technique. In practice, wavelet decomposition first convolves the input signal with the analysis filters and then downsamples the result by two to form the high-frequency and low-frequency signals. The Mallat algorithm is formulated for signals of infinite length; when a finite digital signal is wavelet transformed, the values beyond its end are unknown, so the original signal must be extended in some way.) During the multi-level decomposition, traffic statistics are computed for the signal at each level and its average traffic is used as the extension signal. Let the preprocessed signal to be detected be S=[S1, S2, ... Sn], where n is the signal length and S1, S2, ..., Sn are the traffic volumes counted over equal, specified time intervals. Let the wavelet filter length be WLen, which determines the maximum decomposition level MaxLev [formula omitted in source].
The method comprises the following steps:
1) Compute the total traffic of [S1, S2, ... Sn]:
sum=S1+S2+...+Sn, where sum is the total traffic;
2) Compute the average traffic:
average=sum/n, where average is the average traffic;
3) Extend the boundary of signal S:
[S1, S2, ... Sn, average, average, ... average], that is, the average traffic average is used for the boundary extension, and the signal length after extension is n+WLen;
4) Convolve the signal S with the wavelet high-pass and low-pass filters and downsample to obtain the detail signal d and the approximation signal a, respectively;
5) Process the approximation signal $a=[a_{j,1}, a_{j,2}, \ldots, a_{j,n_j}]$, where j is the decomposition level and j ∈ [1, MaxLev], in the same way as in steps 1) to 3):
● first compute the total traffic $sum=a_{j,1}+a_{j,2}+\ldots+a_{j,n_j}$, where sum is the total traffic of the approximation signal a and $n_j$ is the length of the approximation signal a at decomposition level j;
● compute the average traffic $average=sum/n_j$;
● extend the approximation signal a to $[a_{j,1}, a_{j,2}, \ldots, a_{j,n_j}, average, average, \ldots, average]$, that is, the average traffic average is used for the boundary extension, and the signal length after extension is $n_j$+WLen.
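Steps 1) to 3) amount to appending WLen copies of the signal mean. A minimal sketch, assuming NumPy and using the hypothetical helper name `mean_extend`:

```python
import numpy as np

def mean_extend(signal, wlen):
    """Extend a signal on the right with wlen copies of its average."""
    signal = np.asarray(signal, dtype=float)
    total = signal.sum()             # step 1: total traffic
    average = total / len(signal)    # step 2: average traffic
    # Step 3: append WLen copies of the average; the result has length n + WLen.
    return np.concatenate([signal, np.full(wlen, average)])

extended = mean_extend([1.0, 2.0, 3.0, 6.0], wlen=2)
```

The same helper applies unchanged to the approximation signal at each level in step 5), since it only needs the current signal and the filter length.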
The signal is preprocessed as follows:
The raw data is divided into equal intervals by arrival time, each interval being one time slot. The total length of the packets received in each slot is computed and used as the signal value to be detected for that slot. If n time slots are chosen in total, the preprocessed signal is S=[S1, S2, ... Sn]. (For example, if the traffic is counted every 10 ms over 1024 time slots, that is, 1024 counts in total, the signal to be detected is S=[S1, S2, ... S1024].)
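The preprocessing can be sketched as a binning step. The function name `bin_traffic`, its parameters, and the choice to discard packets that fall outside the observation window are assumptions made for this illustration; times are in seconds and packet lengths in bytes.

```python
import numpy as np

def bin_traffic(arrival_times, packet_lengths, slot_width, num_slots, start=0.0):
    """Sum packet lengths per equal-width time slot to form S = [S1, ..., Sn]."""
    arrival_times = np.asarray(arrival_times, dtype=float)
    packet_lengths = np.asarray(packet_lengths, dtype=float)
    # Index of the time slot each packet falls into.
    slots = np.floor((arrival_times - start) / slot_width).astype(int)
    # Keep only packets inside the observation window (assumption).
    keep = (slots >= 0) & (slots < num_slots)
    return np.bincount(slots[keep], weights=packet_lengths[keep],
                       minlength=num_slots)

# Four packets binned into four 10 ms slots; the third slot receives nothing.
S = bin_traffic([0.001, 0.004, 0.012, 0.035], [100, 50, 200, 400],
                slot_width=0.010, num_slots=4)
```

For the example in the text, `slot_width=0.010` and `num_slots=1024` would give the 1024-sample signal to be detected.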
Whether a DDoS attack has occurred is judged as follows:
Boundary processing based on traffic statistics is used to obtain and store the multiscale detail coefficients of each scale, $d_{1,k_1}, d_{2,k_2}, \ldots, d_{i,k_i}, \ldots, d_{MaxLev,k_{MaxLev}}$, where $k_i$ is the length of the signal produced at decomposition level i [formula omitted in source].
A linear-fit parameter estimation over the levels 1, 2, ..., MaxLev is performed in the least-squares sense and the Hurst parameter is computed. Whether a DDoS attack has occurred is then judged from the value of the Hurst parameter:
● when the Hurst parameter is greater than 0.6, the network traffic is considered normal;
● when the Hurst parameter lies between 0.5 and 0.6, a "DDoS may be occurring" warning is issued;
● when the Hurst parameter is less than 0.5, a DDoS attack is judged to have occurred.
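The three-way decision rule is simple enough to state as a function. The name `classify_hurst` and the returned labels are hypothetical; treating the end points 0.5 and 0.6 themselves as part of the warning band is an assumption, since the text does not say which side they belong to.

```python
def classify_hurst(h):
    """Map a Hurst estimate to one of the three outcomes in the text."""
    if h > 0.6:
        return "normal"    # traffic considered normal
    if h < 0.5:
        return "attack"    # DDoS attack judged to have occurred
    return "warning"       # 0.5 <= h <= 0.6: "DDoS may be occurring"

labels = [classify_hurst(h) for h in (0.75, 0.55, 0.40)]
```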
Beneficial effects: the invention proposes a wavelet analysis boundary processing method aimed at the accuracy problem that arises when the Hurst parameter is estimated by wavelet transform. The proposed method avoids the arbitrariness of earlier boundary processing methods, reflects the true state of the network signal better, and so allows an accurate judgment of whether a DDoS attack has occurred. Specifically:
Accuracy: zero extension ignores the actual content of the signal; periodic extension reflects only the beginning of the signal; symmetric extension reflects only the final portion of the intercepted signal. The wavelet analysis boundary processing method based on traffic statistics reflects the characteristics of the whole signal intercepted over a given period, which improves the accuracy of the Hurst parameter estimation and hence of the judgment of whether a DDoS attack has occurred.
Description of drawings
Fig. 1 shows the DDoS attack detection model based on network traffic self-similarity.
Fig. 2 shows the multiscale signal decomposition of the Mallat algorithm.
Fig. 3 shows the overall framework of the feature database.
Embodiment
The wavelet analysis boundary processing method based on traffic statistics applies the Mallat algorithm to perform a discrete wavelet decomposition. During the multi-level decomposition, traffic statistics are computed for the signal at each level and its average traffic is used as the extension signal. Let the preprocessed signal to be detected be S=[S1, S2, ... Sn], where n is the signal length and S1, S2, ..., Sn are the traffic volumes counted over equal, specified time intervals. Let the wavelet filter length be WLen, which determines the maximum decomposition level MaxLev [formula omitted in source].
The method comprises the following steps:
1) Compute the total traffic of [S1, S2, ... Sn]:
sum=S1+S2+...+Sn, where sum is the total traffic;
2) Compute the average traffic:
average=sum/n, where average is the average traffic;
3) Extend the boundary of signal S:
[S1, S2, ... Sn, average, average, ... average], that is, the average traffic average is used for the boundary extension, and the signal length after extension is n+WLen;
4) Convolve the signal S with the wavelet high-pass and low-pass filters and downsample to obtain the detail signal d and the approximation signal a, respectively;
5) Process the approximation signal $a=[a_{j,1}, a_{j,2}, \ldots, a_{j,n_j}]$, where j is the decomposition level and j ∈ [1, MaxLev], in the same way as in steps 1) to 3):
● first compute the total traffic $sum=a_{j,1}+a_{j,2}+\ldots+a_{j,n_j}$, where sum is the total traffic of the approximation signal a and $n_j$ is the length of the approximation signal a at decomposition level j;
● compute the average traffic $average=sum/n_j$;
● extend the approximation signal a to $[a_{j,1}, a_{j,2}, \ldots, a_{j,n_j}, average, average, \ldots, average]$, that is, the average traffic average is used for the boundary extension, and the signal length after extension is $n_j$+WLen.
For ease of description, consider the following application example:
The traffic is counted every 10 ms for a total of 1024 counts, giving the signal to be detected S=[S1, S2, ... S1024]. The Symlets wavelet Sym3 is used, whose filter length is 6, so the maximum decomposition level is MaxLev=7. The method then proceeds as follows:
1) Compute the total traffic of [S1, S2, ... S1024]:
sum=S1+S2+...+S1024, where sum is the total traffic;
2) Compute the average traffic:
average=sum/1024, where average is the average traffic;
3) Extend the boundary of signal S:
[S1, S2, ... S1024, average, average, ... average], that is, the average traffic average is used for the boundary extension, and the signal length after extension is 1030;
4) Convolve the signal S with the wavelet high-pass and low-pass filters and downsample to obtain the detail signal d and the approximation signal a, respectively;
5) Process the approximation signal $a=[a_{j,1}, a_{j,2}, \ldots, a_{j,n_j}]$, where j is the decomposition level and j ∈ [1, MaxLev], in the same way as in steps 1) to 3):
● first compute the total traffic $sum=a_{j,1}+a_{j,2}+\ldots+a_{j,n_j}$, where sum is the total traffic of the approximation signal a and $n_j$ is the length of the approximation signal a at decomposition level j;
● compute the average traffic $average=sum/n_j$;
● extend the approximation signal a to $[a_{j,1}, a_{j,2}, \ldots, a_{j,n_j}, average, average, \ldots, average]$, that is, the average traffic average is used for the boundary extension, and the signal length after extension is $n_j$+WLen.
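Steps 1) to 5) can be sketched as a loop over decomposition levels. Several things here are assumptions rather than the patent's specification: the maximum-level rule floor(log2(n/(WLen-1))) is a commonly used one that happens to reproduce MaxLev=7 for n=1024 and WLen=6, but the source formula is not reproduced in the text; the Haar filters (length 2) stand in for Sym3, whose coefficients are not given; and keeping only the fully overlapping part of the convolution is one possible convention. The function names are hypothetical.

```python
import math
import numpy as np

def max_level(n, wlen):
    # Assumed rule: floor(log2(n / (WLen - 1))).
    return int(math.floor(math.log2(n / (wlen - 1))))

def mean_extend(signal, wlen):
    # Steps 1) to 3): append WLen copies of the average.
    return np.concatenate([signal, np.full(wlen, signal.mean())])

def decompose(signal, lo_d, hi_d, levels):
    """Multi-level decomposition with mean extension at every level."""
    wlen = len(lo_d)
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        ext = mean_extend(approx, wlen)
        # Step 4): convolve with both filters, keep every second sample.
        lo = np.convolve(ext, lo_d, mode="valid")[::2]
        hi = np.convolve(ext, hi_d, mode="valid")[::2]
        details.append(hi)   # detail coefficients d at this level
        approx = lo          # step 5): repeat on the approximation signal
    return approx, details

# Haar filters, used here only as a stand-in for Sym3 (assumption).
haar_lo = np.array([1.0, 1.0]) / math.sqrt(2.0)
haar_hi = np.array([1.0, -1.0]) / math.sqrt(2.0)

levels = max_level(1024, 6)  # the example's n and WLen
approx, details = decompose(np.full(1024, 5.0), haar_lo, haar_hi, levels)
detail_lengths = [len(d) for d in details]
```

A constant input is used because its mean extension continues the signal exactly, so every detail coefficient should vanish up to rounding. With PyWavelets installed, the Sym3 analysis filters could be taken from `pywt.Wavelet("sym3").dec_lo` and `dec_hi` instead of the Haar stand-ins.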
Boundary processing based on traffic statistics is used to obtain and store the multiscale detail coefficients of each scale, $d_{1,k_1}, d_{2,k_2}, \ldots, d_{i,k_i}, \ldots, d_{MaxLev,k_{MaxLev}}$, where $k_i$ is the length of the signal produced at decomposition level i [formula omitted in source].
A linear-fit parameter estimation over the levels 1, 2, ..., 7 is performed in the least-squares sense and the Hurst parameter is computed. Whether a DDoS attack has occurred is then judged from the value of the Hurst parameter:
● when the Hurst parameter is greater than 0.6, the network traffic is considered normal;
● when the Hurst parameter lies between 0.5 and 0.6, a "DDoS may be occurring" warning is issued;
● when the Hurst parameter is less than 0.5, a DDoS attack is judged to have occurred.