CN105187411B - Method for distributed anomaly detection of network data flows - Google Patents

Method for distributed anomaly detection of network data flows

Publication number
CN105187411B
Authority
CN
China
Prior art keywords
feature
test point
string
characteristic
data flow
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510506829.3A
Other languages
Chinese (zh)
Other versions
CN105187411A (en)
Inventor
蓝友枢
陈健
张章学
叶松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FUJIAN STRAIT INFORMATION Corp
Original Assignee
FUJIAN STRAIT INFORMATION Corp
Application filed by FUJIAN STRAIT INFORMATION Corp
Priority to CN201510506829.3A
Publication of CN105187411A
Application granted
Publication of CN105187411B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present invention relates to a method for distributed anomaly detection of network data flows, comprising the following steps. Step S1: a node at the connection interface between a terminal device and the network is defined as a common node, and each test point monitors one common node to detect whether abnormal data packets are present at that node. Step S2: hierarchical anomaly detection is performed using three detection methods: the first layer uses anomaly detection based on feature matching, the second layer uses anomaly detection based on statistical analysis, and the third layer uses anomaly detection based on machine learning and data mining. Step S3: the test points share information via the common nodes, so that the feature databases of all test points are effectively updated. The invention can reduce the number of test points while balancing the load across them, effectively protect terminal devices, reduce the number of abnormal packets in the network, and preserve the precision of the data.

Description

Method for distributed anomaly detection of network data flows
Technical Field
The present invention relates to the field of network technology, and in particular to a method for distributed anomaly detection of network data flows.
Background
A network data flow anomaly is a phenomenon that affects network transmission and prevents end users from using the network normally. Data flow anomalies degrade network performance to some degree, consume device resources, and in severe cases can even paralyze the network. For example, in a distributed denial-of-service attack, the attacker controls multiple computers with different IP addresses to attack a server. By sending a large number of data packets, the attack not only drives the server's resource utilization too high but also occupies network bandwidth, causing network congestion or even a server crash, so that legitimate requests from normal users receive no response from the server. Detecting anomalies in network data flows and taking measures to handle them is therefore particularly important for ensuring network security.
Depending on where the test points for anomaly detection are deployed, research in this area can be divided into single-point anomaly detection and distributed anomaly detection.
Single-point anomaly detection configures a detection system on an individual host to identify anomalies occurring on that host. It can analyze host anomalies in detail and raise alarms, and it performs well for detecting host anomalies in small networks. However, as networks continue to grow, the cost of setting up a detection node on every host becomes too high, and this drawback becomes prominent. In addition, this approach cannot perceive network traffic information and can only detect host anomalies.
Distributed anomaly detection is a method of anomaly detection widely used in the field of network security. Its main idea is to set up multiple test points and a processing center in the network to be monitored, which detect abnormal data flows by working together. Usually part of the network data flow is examined by sampling, and the state of abnormal data flows in the whole network is inferred by processing and analyzing the samples.
Typical distributed logical structures for network data flow anomaly detection fall into two classes. The first is distributed data collection with centralized data processing; the second is hierarchical data collection and processing.
The first class, distributed data collection with centralized data processing, is shown in Fig. 4 and requires two kinds of monitoring points: probe nodes and an analysis node. The probe nodes collect network information, and the central analysis node performs all computation and analysis tasks. The main advantage of this structure is relatively high data accuracy, because collected data is handed directly to the analysis node without passing through any intermediate stage. The disadvantage is that one analysis node serves many collection nodes, which overloads the analysis node.
The second class, hierarchical data collection and processing, requires multiple analysis nodes, as shown in Fig. 5, divided by node depth into upper-layer and bottom-layer nodes. Bottom-layer analysis nodes collect and process information and output the results to higher-layer analysis nodes, which perform correlation analysis. The advantage is that the layered structure greatly reduces the load on each analysis node. The disadvantage is that collected data must be analyzed and passed upward layer by layer, so data accuracy decreases with each layer, making it difficult for high-layer analysis nodes to judge the current network situation accurately.
Once the test points are deployed, the techniques for anomaly detection at a single test point fall into three main classes:
1. Anomaly detection based on feature matching. The main idea: the feature database stores the feature data of known anomalies, and the test point extracts the corresponding feature attributes of a data flow and matches them against the feature database to determine whether the flow is abnormal. Matching is either fuzzy or exact. In fuzzy matching, the flow is considered abnormal when the similarity exceeds a specified threshold. In exact matching, the flow is regarded as abnormal only when the similarity reaches one hundred percent. This method detects the anomalies recorded in the existing feature database with high accuracy and reliability, but it can only detect known anomalies and cannot identify unknown abnormal data flows.
2. Anomaly detection based on statistical analysis. The main idea: data flow information in the network is collected over a period of time, a mathematical model is built, and statistical analysis yields a threshold that separates normal flows from abnormal ones. For example, all data packets carried by the data flows in a given time period are obtained and the frequency of occurrence of the packets is computed; flows whose frequency exceeds a certain threshold are regarded as normal, and the others as abnormal. This method does not need to know the specific behavioral features of abnormal data flows in advance and can guard against newly emerging anomalies. The difficulty lies in choosing a reasonable threshold: if the threshold is set too high, some anomalies go undetected and the false-negative rate rises; if it is set too low, normal flows are misjudged, that is, some normal data flows are treated as abnormal.
3. Anomaly detection based on machine learning and data mining. The main idea: a large amount of data is collected and analyzed to identify abnormal conditions. Many data mining methods exist today, such as classification algorithms, clustering algorithms, neural network algorithms, and pattern analysis. This method can detect both known and unknown abnormal flows and can perform correlation analysis on massive data. However, the data volume is large and the computation is heavy, so it consumes considerable system resources.
Summary of the Invention
In view of this, the object of the present invention is to provide a method for distributed anomaly detection of network data flows that performs distributed detection of network data flows while protecting terminals. The method effectively reduces the number of abnormal packets in the whole network and prevents terminals from receiving abnormal packets. Placing the test points at the interface between terminal and network effectively reduces the load on each test point, which lowers the hardware cost of the detection nodes; and because detection and analysis are integrated in one place, the loss of data accuracy seen in layered structures does not occur.
The present invention is implemented by the following scheme: a method for distributed anomaly detection of network data flows, comprising the following steps:
Step S1: Deploy N test points. A node at the connection interface between a terminal device and the network is defined as a common node, and the test point is independent of, and external to, the common node. Each test point is connected to a common node and monitors that one common node, so as to detect abnormal data packets sent by the terminal device at the common node while preventing the terminal from receiving abnormal data packets from the network;
Step S2: The test points monitor for abnormal network data flows. Hierarchical anomaly detection is performed using three detection methods: the first layer uses anomaly detection based on feature matching, the second layer uses anomaly detection based on statistical analysis, and the third layer uses anomaly detection based on machine learning and data mining;
Step S3: The test points share information. Information is shared among the test points via the common nodes, so that the feature databases of all test points are effectively updated.
Further, step S1 specifically comprises the following steps:
Step S11: Provide N terminal devices and the router or switch connected to each terminal device, giving N routers or switches. One test point is set up on each router or switch, forming a test point set D = {D_i | i = 1, 2, 3, ..., N};
Step S12: Design the feature database, which contains feature strings and characteristic values. A feature string is a character string of a certain length, and its characteristic value is the result of XORing together every character in the feature string. The feature database is organized as linked lists, with the feature strings of each length chained into one linked list;
Step S13: Initialize all test points: load the feature database containing the feature information of known abnormal data flows into each test point, and connect the test point to its corresponding router or switch, so that the test point analyzes network data flows through the connected router or switch.
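The feature database layout of step S12 can be sketched as follows. This is a minimal illustration and not the patent's implementation: the names `characteristic_value`, `FeatureDatabase`, `add`, and `candidates` are hypothetical, feature strings are assumed to be byte strings, and Python dictionaries and lists stand in for the linked lists described in the text.

```python
from collections import defaultdict
from functools import reduce


def characteristic_value(feature: bytes) -> int:
    """XOR all bytes of the feature string into one 8-bit value."""
    return reduce(lambda acc, b: acc ^ b, feature, 0)


class FeatureDatabase:
    """Known-anomaly feature strings grouped by length, then by XOR value.

    Assumption: a dict of dicts of lists replaces the per-length linked lists.
    """

    def __init__(self) -> None:
        # length -> characteristic value -> list of feature strings
        self._by_length = defaultdict(lambda: defaultdict(list))

    def add(self, feature: bytes) -> bool:
        """Insert a feature string; return False if it was already stored."""
        bucket = self._by_length[len(feature)][characteristic_value(feature)]
        if feature in bucket:
            return False
        bucket.append(feature)
        return True

    def candidates(self, length: int, value: int) -> list:
        """Return stored strings with the given length and characteristic value."""
        if length not in self._by_length:
            return []
        return list(self._by_length[length].get(value, []))
```

Grouping by length first and by characteristic value second means a lookup touches only the strings that could possibly match, which is the purpose the text gives for this layout.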
Preferably, every node in the network is a common node, and the monitored common nodes are directly connected to terminals. To protect the terminal machines and to reduce the abnormal packets they send into the network, a test point is arranged at the connection interface between each terminal machine and the network. A test point is responsible only for this interface, which reduces its load, lowers the hardware requirements on the test point, and thereby reduces cost. Such a test point not only detects abnormal packets sent by the terminal machine but also prevents the terminal from receiving abnormal data packets from the network, which greatly increases security.
Further, the first-layer detection in step S2, which uses anomaly detection based on feature matching, specifically comprises the following steps:
Step S211: Let the set of features in the feature database be denoted C = {C_i | i = 0, 1, 2, ...};
Step S212: When the test point receives a network data flow, that is, when the flow passes through the router or switch monitored by the test point, the test point extracts the feature string and characteristic value of the flow. The feature string comprises the IP address, port number, and protocol type. Let the length of the feature string be L; XORing all characters of the feature string together yields the corresponding characteristic value T, given by $T = A \oplus B \oplus \cdots$, where the feature string of length L is AB...;
Step S213: Suppose the feature database contains m feature strings whose length equals that of the feature string with characteristic value T; then for j = 1, 2, 3, ..., m, C_ij is a characteristic value stored under that length. The test point checks whether $T \oplus C_{ij}$ is zero. If $T \oplus C_{ij} \neq 0$, the feature string does not match this characteristic value and the stored feature string is ignored; if $T \oplus C_{ij} = 0$, the feature string matches this characteristic value, and the stored feature string is retained for the subsequent complete match;
Step S214: The feature strings retained after the screening of step S213 are completely matched using the BF (brute-force) algorithm. Specifically, the first character of the target feature string is compared with the first character of the pattern feature string. If they are equal, the second character of the target feature string is compared with the second character of the pattern feature string, and so on. If they are not equal, the second character of the target feature string is compared with the first character of the pattern feature string. Comparison continues in this way until the final matching result is obtained. If the target feature string and the pattern feature string match, the data flow is an abnormal data flow; if not, the flow proceeds to the next layer of detection.
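Steps S212 to S214 can be illustrated with a short sketch. It assumes feature strings are byte strings and that the known abnormal features are held in a plain list; the function names `xor_value`, `bf_match`, and `first_layer_is_abnormal` are hypothetical and do not come from the patent.

```python
from functools import reduce


def xor_value(s: bytes) -> int:
    """Characteristic value T: XOR of every byte in the feature string."""
    return reduce(lambda acc, b: acc ^ b, s, 0)


def bf_match(target: bytes, pattern: bytes) -> int:
    """Brute-force search: index of the first occurrence of pattern, or -1."""
    n, m = len(target), len(pattern)
    for start in range(n - m + 1):
        k = 0
        # Compare character by character from the current start position.
        while k < m and target[start + k] == pattern[k]:
            k += 1
        if k == m:
            return start
    return -1


def first_layer_is_abnormal(flow_feature: bytes, known: list) -> bool:
    """Length filter, then XOR filter, then complete BF comparison."""
    t = xor_value(flow_feature)
    for pattern in known:
        if len(pattern) != len(flow_feature):
            continue  # different length: cannot match
        if (t ^ xor_value(pattern)) != 0:
            continue  # characteristic values differ: cannot match
        if bf_match(flow_feature, pattern) == 0:
            return True  # complete match: known abnormal flow
    return False
```

The XOR filter can only rule candidates out: two different strings made of the same characters in a different order share one characteristic value, which is why the complete BF comparison is still needed afterwards.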
In particular, this layer relies on a feature database of abnormal data flow feature information that is established beforehand. The feature information of an abnormal data flow may include the payload features of the packets, transmission features, connection features, and so on. When a data flow passes a test point, the test point first extracts the feature information of the flow and matches it exactly against the feature attributes in the feature database. If the similarity reaches one hundred percent, the flow is identified as abnormal, a warning is issued, and the network administrator is prompted to handle the anomaly. This method removes known abnormal flows simply, and the remaining data flows proceed to the next layer of detection.
Further, the second-layer detection in step S2, which uses anomaly detection based on statistical analysis, specifically comprises the following steps:
Step S221: The test point collects feature information from the network data flows to be examined. The feature information comprises source IP, destination IP, source port, destination port, and data size;
Step S222: Let $d_{ij}$ denote the size of the j-th value of the i-th feature. Define a feature set $S_s = \{d_{ij} \mid i = 1, 2, 3, \ldots, m;\ j = 1, 2, 3, \ldots, y_i\}$, where $Y = \{y_i \mid i = 1, 2, 3, \ldots, m\}$ is the statistics set of the m features and $y_i$ is the number of distinct values of feature i within the statistics time interval. The characteristic entropy of the i-th feature, $H(h_i)$, is defined over these values,
where $S_i$ denotes the total number of packets for the i-th feature within the statistics time interval;
Step S223: A weighted averaging method is used to eliminate the random fluctuations in entropy caused by external interference during measurement and to find the overall trend of the characteristic entropy, in order to predict the characteristic entropy in the next statistics time interval. The prediction formula is:
$y_t = a x_{t-1} + (1-a) y_{t-1}$,
where $y_t$ and $y_{t-1}$ are the predicted values of the characteristic entropy at times t and t-1 respectively, a is a weighting factor between 0 and 1, and $x_{t-1}$ is the measured value of the characteristic entropy at time t-1;
Step S224: Using the prediction formula of step S223, the characteristic parameters of normal data flows can be predicted by weighted averaging. Let $H_{ti}$ be the actual characteristic entropy of the i-th feature in the t-th statistics time interval and $\hat{H}_{ti}$ the predicted characteristic entropy of the i-th feature in the t-th statistics time interval. Then $\hat{H}_{ti} = a H_{(t-1)i} + (1-a)\hat{H}_{(t-1)i}$.
Substituting the expressions for $\hat{H}_{(t-1)i}, \hat{H}_{(t-2)i}, \ldots$ into the above formula in turn and expanding gives: $\hat{H}_{ti} = a\sum_{n=1}^{k}(1-a)^{n-1}H_{(t-n)i} + (1-a)^{k}\hat{H}_{(t-k)i}$
Here k is defined as follows: k statistics time intervals are chosen as one observation period, and its value must be tuned experimentally to find the optimum. The initial value $\hat{H}_{(t-k)i}$ of the observation period can be taken directly as the observed value at that moment. a is chosen according to the variation characteristics of the corresponding observation interval;
Step S225: Let $\sigma_{ti}$ denote the standard deviation of the characteristic entropy of the i-th feature in the t-th time interval, and $T_{ti}$ the normal interval of the characteristic entropy of the i-th feature in the t-th time interval, defined in terms of $\sigma_{ti}$.
If the characteristic entropy of the i-th feature in the t-th statistics time interval falls within the normal interval $T_{ti}$, the network data flow in that time interval is a normal data flow; otherwise the network data flow in that time interval is an abnormal data flow, and the abnormal data flow proceeds to third-layer detection.
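The second-layer procedure can be sketched numerically as follows. Several points here are assumptions rather than statements from the text: the entropy is taken to be the Shannon entropy in base 2 of the value frequencies $d_{ij}/S_i$, the normal interval is taken to be the prediction plus or minus `n_sigma` standard deviations (the multiplier is hypothetical), and the names `feature_entropy`, `predict_next`, and `is_normal` are illustrative only.

```python
import math
from collections import Counter


def feature_entropy(values) -> float:
    """Assumed Shannon entropy (base 2) of one feature within one interval."""
    counts = Counter(values)          # d_ij: occurrences of each distinct value
    total = sum(counts.values())      # S_i: total packets for this feature
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def predict_next(measured, a: float) -> float:
    """Apply y_t = a*x_{t-1} + (1-a)*y_{t-1} over the measured entropies.

    The first measurement seeds the prediction, as the text allows.
    """
    prediction = measured[0]
    for x in measured:
        prediction = a * x + (1 - a) * prediction
    return prediction


def is_normal(actual: float, predicted: float, sigma: float,
              n_sigma: float = 3.0) -> bool:
    """Hypothetical interval test: |actual - predicted| <= n_sigma * sigma."""
    return abs(actual - predicted) <= n_sigma * sigma
```

For example, a feature that takes two values equally often has an entropy of one bit, and a sudden collapse to a single value (entropy zero) would fall outside a narrow normal interval around the predicted value.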
In particular, because the first layer's feature-matching detection can only find anomalies already in the feature database, a data flow that the first layer does not flag as abnormal is not necessarily normal; it may be a new anomaly, so the second layer's detection based on statistical analysis is required. This method models and analyzes historical records to obtain a threshold interval that separates normal data flows from abnormal ones. After the test point obtains traffic data, it statistically analyzes the data and compares the result with the threshold interval. If the result for the actual data flow does not fall within the threshold interval, the flow is identified as abnormal and proceeds to third-layer anomaly analysis; otherwise it is identified as a normal data flow.
Further, the third-layer detection in step S2, which uses anomaly detection based on machine learning and data mining, comprises two parts: machine learning and data mining. In detecting abnormal data flows, the machine learning part continuously updates the feature database by learning from historical data, thereby improving the ability to detect abnormal data flows. The data mining part uses a cluster analysis algorithm; the third-layer abnormal data flow detection specifically comprises the following steps:
Step S231: Input the number of clusters k and a database containing n objects; output k clusters that satisfy the minimum-variance criterion;
Step S232: Arbitrarily select k of the n data objects as the initial cluster centers;
Step S233: Based on the center object of each cluster, compute the distance between each object and these center objects, and reassign each object to the cluster at minimum distance;
Step S234: Recompute the center object of each cluster that has changed;
Step S235: Compute the criterion function. If the criterion function has converged, terminate the algorithm; otherwise return to step S233.
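Steps S231 to S235 describe a k-means style clustering loop, which can be sketched in plain Python as below. The sketch makes two simplifying assumptions that are not in the text: the first k points are used as the arbitrary initial centers, and convergence is detected when the cluster assignments stop changing rather than by evaluating a criterion function. The names `sq_dist` and `kmeans` are hypothetical.

```python
def sq_dist(p, q) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Return (centers, assignment) for points given as tuples of floats."""
    centers = [tuple(p) for p in points[:k]]  # S232: arbitrary initial centers
    assignment = None
    for _ in range(max_iter):
        # S233: assign every object to its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        # S235 (simplified): stop once assignments no longer change.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # S234: recompute the center of each non-empty cluster.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centers, assignment
```

In an anomaly detection setting, each point would be a numeric feature vector for one flow; how flows falling in small or distant clusters are then treated is left open by the text.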
In particular, the second layer's detection based on statistical analysis can only exclude abnormal data flows whose frequency of occurrence deviates from the normal frequency threshold interval, and its ability to discriminate newly emerging abnormal data flows is relatively low. The data flows that reach the third layer may therefore include anomalies whose frequency of occurrence appears normal but which have no corresponding feature in the feature database. These anomalies must be found by data mining, with the machine learning them automatically and updating the feature database in time: possible new abnormal data flows are mined and detected, and the feature database of the current test point is updated.
Further, after the three layers of detection in step S2, if the abnormal data flow detected by the test point is a known one, the test point directly notifies the network administrator to handle it. If the abnormal data flow yields a new abnormal feature through data mining and machine learning, the feature database is updated and the information sharing among test points of step S3 is carried out, specifically comprising the following steps:
Step S31: The test point that found the new abnormal feature updates and expands its own feature database. It first computes the length L of the new feature string and assigns the feature string to the linked-list branch for that length, then computes the characteristic value of the feature string by XOR and inserts the string into the linked list for that characteristic value;
Step S32: After this test point's feature database has been updated, the new feature information is propagated to the feature databases of all test points using breadth-first search (BFS).
Preferably, the breadth-first search BFS specifically comprises the following steps:
Step S321: Take the test point whose feature database was updated first as the root node of the BFS;
Step S322: Using a queue structure, first put the root node into the queue;
Step S323: Take a node out of the queue. If the node is a test point whose feature database has not yet been updated, update its feature database; if the node is a common node, pass the information onward;
Step S324: Check whether the queue is empty. If it is empty, the feature databases of the test points in the whole network have all been updated and the procedure ends; otherwise return to step S323.
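The propagation of steps S321 to S324 can be sketched as a standard breadth-first traversal. The representation is assumed for illustration: the network is an adjacency dictionary, each test point's feature database is a Python set, and a `visited` set (not mentioned in the text) is added so that the traversal terminates on networks with cycles. The function name `propagate_feature` is hypothetical.

```python
from collections import deque


def propagate_feature(graph, test_points, root, databases, feature):
    """Spread a new feature from the root test point to all test points.

    graph: node -> list of neighbouring nodes
    test_points: set of nodes that are test points
    databases: test point -> set of feature strings
    Returns the test points in the order their databases were updated.
    """
    databases[root].add(feature)      # S321: root already holds the feature
    updated = [root]
    visited = {root}
    queue = deque([root])             # S322: queue seeded with the root
    while queue:                      # S324: stop when the queue is empty
        node = queue.popleft()        # S323: take one node from the queue
        if node in test_points and feature not in databases[node]:
            databases[node].add(feature)
            updated.append(node)
        # Both common nodes and test points forward to unvisited neighbours.
        for nxt in graph.get(node, ()):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return updated
```

With three routers in a line and one test point attached to each, a feature discovered at the first test point reaches the other two in order of their distance from it.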
Preferably, step S3 uses the common nodes to share information among the test points so that the feature databases of all test points can be effectively updated. When a test point discovers a new anomaly, it sends the new anomaly information out through the common node it monitors. Then, following graph theory, breadth-first search is used so that all other test points in the network obtain the new anomaly information, achieving information sharing among the test points. Put simply, the common nodes relay the information in order to update the feature databases of all test points.
Compared with the prior art, the beneficial effects of the present invention are that terminals are protected, the number of abnormal packets in the network is reduced, and, while the number of test points is reduced, the load on each test point is lowered and the precision of the data is ensured.
Brief Description of the Drawings
Fig. 1 is a flow chart of the steps of the distributed network data flow anomaly detection method of the present invention.
Fig. 2 is a schematic diagram of the test point deployment step in the distributed network data flow anomaly detection of the present invention.
Fig. 3 is a flow chart of the step in which the test points monitor for anomalies in the distributed network data flow anomaly detection of the present invention.
Fig. 4 is a schematic diagram of the first class of distributed structure described in the background: distributed data collection with centralized data processing.
Fig. 5 is a schematic diagram of the second class of distributed structure described in the background: hierarchical data collection and processing.
Detailed Description
The present invention is further described below with reference to the accompanying drawings and an embodiment.
This embodiment provides a method for distributed anomaly detection of network data flows, as shown in Fig. 1, Fig. 2, and Fig. 3, specifically comprising the following steps:
Step S1: Deploy N test points. A node at the connection interface between a terminal device and the network is defined as a common node, and the test point is independent of, and external to, the common node. Each test point is connected to a common node and monitors that one common node, so as to detect abnormal data packets sent by the terminal device at the common node while preventing the terminal from receiving abnormal data packets from the network;
Step S2: The test points monitor for abnormal network data flows. Hierarchical anomaly detection is performed using three detection methods: the first layer uses anomaly detection based on feature matching, the second layer uses anomaly detection based on statistical analysis, and the third layer uses anomaly detection based on machine learning and data mining;
Step S3: The test points share information. Information is shared among the test points via the common nodes, so that the feature databases of all test points are effectively updated.
In this embodiment, step S1 specifically comprises the following steps:
Step S11: Provide N terminal devices and the router or switch connected to each terminal device, giving N routers or switches. One test point is set up on each router or switch, so there are N test points, forming a test point set D = {D_i | i = 1, 2, 3, ..., N}. Compared with arranging test points throughout the entire network, this arrangement greatly reduces the number of test points;
Step S12: Design the feature database, which contains feature strings and characteristic values. A feature string is a character string of a certain length, and its characteristic value is the result of XORing together every character in the feature string. The feature database is organized as linked lists, with the feature strings of each length chained into one linked list;
Step S13: Initialize all test points: load the feature database containing the feature information of known abnormal data flows into each test point, and connect the test point to its corresponding router or switch, so that the test point analyzes network data flows through the connected router or switch.
In this embodiment, to speed up matching, the feature database is built to support two-pass matching. The first pass matches on string length and on characteristic value, eliminating the large majority of feature strings that certainly do not match. The second pass uses the BF matching algorithm for a complete match. Accordingly, the feature database is organized as linked lists, with the feature strings of each length chained into one linked list. Each character string is indexed with its characteristic value as the key, and the characteristic value of a character string is the XOR of its characters, which yields an eight-bit ASCII code.
In addition, if too few test points are set up, the load on each test point is high. In this embodiment a test point is therefore set up at each common node directly connected to a terminal. This not only gives good detection results but also greatly reduces the number of test points, lowering the overall cost of detection. From the standpoint of protecting terminals, it effectively prevents terminals from receiving abnormal packets while also reducing the number of abnormal packets in the whole network.
In the present embodiment, first layer detection is specific using the matched method for detecting abnormality of feature based in the step S2 Include the following steps:
Step S211:Assuming that the characteristic set C={ C in feature databasei| i=0,1,2 ... } it indicates;
Step S212:When the test point receiving network data stream, i.e., what the described network data flow was detected by test point Router or interchanger when, the feature string and characteristic value of network data flow described in the test point extraction number, the feature String includes IP address, port numbers and protocol type;The feature string length is set as L, then by the total each word of the feature string Symbol phase exclusive or obtains a corresponding characteristic value T, and characteristic value T is obtained using following formula:The wherein described spy Sign string L is AB ...;
First, the test point with feature string length carry out fast search, find meet the length fraction it is extremely special Sign string, can directly ignore most feature string, greatly improve matching speed, and due to only comparing a number, then technical speed is very Soon, the efficiency of whole system is had no effect on;Then, in the calculating and comparison procedure that carry out characteristic value, due to using hardware realization Then calculating speed is also very fast, it is the same can to filter out a small amount of characteristic value from relying in the fraction feature string that length screens Feature string;
Step S213: Suppose the feature database holds m feature strings of the same length as the feature string that produced the characteristic value T; then for j = 1, 2, 3, …, m, $C_{ij}$ is a characteristic value under that length. The test point judges whether $T \oplus C_{ij}$ is zero. If $T \oplus C_{ij}$ is not equal to 0, the feature string does not match this characteristic value and is ignored. If $T \oplus C_{ij}$ is equal to 0, the feature string matches this characteristic value and is retained for the subsequent complete match;
Step S214: After the screening of step S213, the retained feature strings are matched completely using the BF (brute-force) algorithm. The BF algorithm matches the first character of the target feature string against the first character of the pattern feature string and judges whether the two are equal. If they are equal, it goes on to compare the second character of the target feature string with the second character of the pattern feature string. If they are not equal, it compares the second character of the target feature string with the first character of the pattern feature string, and so on, until the final matching result is obtained. If the target feature string and the pattern feature string match, the data flow is an abnormal data flow; if they do not match, the data flow enters the next layer of detection.
Preferably, because prior-art detection matches entire character strings, it consumes a large amount of system resources and severely reduces the detection rate of the system. In a real network, abnormal data flows account for only a very small share of traffic. If full string matching were used for everything, most resources would be wasted on exhaustively matching normal data flows. In this embodiment, therefore, an abnormal data flow is taken to carry certain specific feature strings, and the characteristic value is the 8-bit ASCII code value obtained by XORing all the characters of the feature string; this can be computed quickly with a hardware circuit. A feature string is a character string within a message that distinguishes normal data flows from abnormal data flows, and it is much smaller than the whole message. Matching only on characteristic values in the data flow therefore greatly reduces the matching time, and the characteristic value T serves as an important basis for the matching decision in this layer of detection.
In particular, in this embodiment the feature-matching anomaly detection method is implemented with the following code:
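As a rough illustration of steps S211 to S214, and not the patent's own listing, the following Python sketch buckets known abnormal feature strings first by length and then by XOR characteristic value, and runs the brute-force comparison only on the few candidates that survive both filters. The names `xor_value`, `bf_match` and `FeatureDB`, and the `ip:port:protocol` layout of the feature string, are assumptions made for the example.

```python
from collections import defaultdict


def xor_value(feature: str) -> int:
    """XOR the 8-bit ASCII codes of all characters into one characteristic value T."""
    t = 0
    for byte in feature.encode("ascii"):
        t ^= byte
    return t


def bf_match(target: str, pattern: str) -> bool:
    """Brute-force (BF) search: does pattern occur in target?"""
    n, m = len(target), len(pattern)
    for start in range(n - m + 1):
        j = 0
        while j < m and target[start + j] == pattern[j]:
            j += 1
        if j == m:
            return True
    return False


class FeatureDB:
    """Known abnormal feature strings, grouped by length and then by XOR value."""

    def __init__(self):
        self.buckets = defaultdict(lambda: defaultdict(list))

    def add(self, feature: str) -> None:
        self.buckets[len(feature)][xor_value(feature)].append(feature)

    def is_abnormal(self, feature: str) -> bool:
        # Filter 1: keep only stored strings of the same length.
        by_value = self.buckets.get(len(feature), {})
        # Filter 2: keep only strings whose value C satisfies T xor C == 0.
        candidates = by_value.get(xor_value(feature), [])
        # Complete match on the few remaining candidates.
        return any(bf_match(feature, c) for c in candidates)
```

Because the candidates already have the same length as the extracted feature string, the brute-force search here reduces to a character-by-character equality check. Two strings such as "ab" and "ba" share both length and XOR value, which is why the complete match is still needed after the two filters.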
In this embodiment, the second-layer detection in step S2 uses an anomaly detection method based on statistical analysis. During data flow transmission, the information entropy H expresses the amount of information contained. For a data flow that needs to be detected, information is collected on attributes such as source IP, destination IP, source port, destination port and data size. The detected data flow is treated as a discrete independent information source, and each attribute feature as a set of random events, on which entropy analysis is performed. The information entropy reflects how dispersed or how concentrated the corresponding source data flow is;
The method specifically includes the following steps:
Step S221: The test point collects feature information from the network data flow to be detected; the feature information includes source IP, destination IP, source port, destination port and data size.
Step S222: Let $d_{ij}$ denote the size of the j-th value of the i-th piece of feature information. Define a feature set $S_s = \{d_{ij} \mid i = 1, 2, 3, \dots, m;\ j = 1, 2, 3, \dots, y_i\}$, where $Y = \{y_i \mid i = 1, 2, 3, \dots, m\}$ is the statistics set of the m pieces of feature information and $y_i$ is the number of distinct values of feature i within the statistical time interval. Define the entropy of the i-th feature as $H(h_i) = -\sum_{j=1}^{y_i} \frac{d_{ij}}{S_i} \log \frac{d_{ij}}{S_i}$,
where $S_i$ is the total packet count of the i-th feature within the statistical time interval;
For example, with m = 5, $h_i$ (i = 1, 2, 3, …, 5) can represent five features: source IP, destination IP, source port, destination port and protocol type. $d_{12} = 100$ means that the 2nd value of the first feature is 100, that is, the source IP occurs 100 times in the second statistical time interval; $d_{23} = 125$ means that the third value of the second feature is 125, that is, the destination IP occurs 125 times in the third statistical time interval. Using the feature entropy $H(h_i)$, the entropies of the source IP, destination IP, source port, destination port and protocol type can be calculated, which gives the distribution of these features within the statistical time interval;
Step S223: A weighted-average method is used to eliminate the random fluctuations of the entropy caused by external interference during statistics and to find the overall trend of the feature entropy, so as to predict the feature entropy in the next statistical time interval. The prediction formula is:
$y_t = a x_{t-1} + (1-a) y_{t-1}$,
where $y_t$ and $y_{t-1}$ are the predicted values of the feature entropy at times t and t-1 respectively, a is a weighting factor with a value between 0 and 1, and $x_{t-1}$ is the measured value of the feature entropy at time t-1;
Step S224: Based on the prediction formula in step S223, the weighted average can be used to predict the feature parameters of normal data flows. Let $H_{ti}$ denote the actual feature entropy of the i-th feature in the t-th statistical time interval and $\hat{H}_{ti}$ the predicted feature entropy of the i-th feature in the t-th statistical time interval; then $\hat{H}_{ti} = a H_{(t-1)i} + (1-a) \hat{H}_{(t-1)i}$.
Substituting the expressions for $\hat{H}_{(t-1)i}, \hat{H}_{(t-2)i}, \dots$ successively into the above formula and expanding gives: $\hat{H}_{ti} = a \sum_{n=1}^{k} (1-a)^{n-1} H_{(t-n)i} + (1-a)^{k} \hat{H}_{(t-k)i}$.
Here k is defined as follows: k statistical time intervals are chosen as one observation period, and its value must be tuned experimentally to find the optimum. For example, if the statistical time interval is 2 minutes and the observation period is 20 minutes, then k = 20 ÷ 2 = 10. $\hat{H}_{(t-k)i}$ is the initial value for the corresponding observation period and can be set equal to the observed value at that time. a is chosen according to the variation characteristics of the corresponding observation interval;
Step S225: Let $\sigma_{ti}$ denote the standard deviation of the feature entropy of the i-th feature in the t-th time interval, and let $T_{ti}$ denote the normal interval of the feature entropy of the i-th feature in the t-th time interval.
If the value of the feature entropy of the i-th feature in the t-th statistical time interval falls within the normal interval $T_{ti}$, that is, between its lower and upper bounds, the network data flow in that time interval is a normal data flow. Otherwise the network data flow in that time interval is an abnormal data flow, and the abnormal data flow enters the third-layer detection.
In short, the idea of anomaly detection based on statistical analysis is to use existing data to predict the probability with which each feature attribute will occur over a coming period. If the predicted probability of feature attribute d is very small, yet d does occur within that future time interval, a small-probability event is considered to have occurred, an anomaly is declared, and an abnormal data flow is judged to exist. For abnormal data flows, the test point raises an alarm so that the network administrator can take appropriate measures; the remaining data flows enter the third-layer detection.
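The statistical layer can be pictured with a short Python sketch. It is an illustration under stated assumptions rather than the patent's implementation: the logarithm base (2), the symmetric band of `width` standard deviations around the prediction, and the function names are all choices made for the example, since the text does not fix them.

```python
import math
from collections import Counter


def feature_entropy(values):
    """Entropy of one feature over the values seen in a statistical interval.

    Base-2 logarithm is an assumption; the text does not state the base.
    """
    counts = Counter(values)          # d_ij: occurrences of each distinct value
    total = sum(counts.values())      # S_i: total packet count for the feature
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def predict_entropy(history, a):
    """Apply y_t = a*x_(t-1) + (1-a)*y_(t-1) across one observation period.

    history lists the measured entropies of the k previous intervals, oldest
    first; the oldest measurement doubles as the initial prediction.
    """
    prediction = history[0]
    for measured in history:
        prediction = a * measured + (1 - a) * prediction
    return prediction


def is_normal(measured, predicted, sigma, width=3.0):
    """Hypothetical normal interval: predicted +/- width * sigma."""
    return abs(measured - predicted) <= width * sigma
```

A test point could, for instance, compute `feature_entropy` over the source IPs seen in each interval, keep the last k results as `history`, and flag the interval when `is_normal` returns `False`. The standard deviation could be taken over the same history, for example with `statistics.pstdev`.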
In this embodiment, the third-layer detection in step S2 uses an anomaly detection method based on machine learning and data mining, which comprises a machine learning part and a data mining part. When detecting abnormal data flows, the machine learning part continually updates the feature database by learning from historical data, thereby improving the ability to detect abnormal data flows. The data mining part uses a cluster analysis algorithm to carry out the third-layer abnormal data flow detection, which specifically includes the following steps:
Step S231: The inputs are the number of clusters k and a database containing n objects; the output is k clusters that satisfy the minimum-variance criterion. Among these k clusters, objects belonging to the same cluster have high similarity, while objects belonging to different clusters have low similarity. Similarity is calculated with respect to a "center object" obtained as the mean of the objects in a cluster;
Step S232: Arbitrarily select k objects from the n data objects as the initial cluster centers;
Step S233: Based on the center object of each cluster, calculate the distance between each object and these center objects, and reassign each object according to the minimum distance;
Step S234: Recalculate the center object of each cluster that has changed;
Step S235: Calculate the standard measure function. If the function converges, the algorithm terminates; otherwise, return to step S233.
In particular, in this embodiment the value of k is crucial in the cluster analysis algorithm: it determines the clustering result and hence the quality of the algorithm's implementation. An unsuitable value of k can make the mining and clustering results unsatisfactory or even invalid. The algorithm is specifically implemented with the following code:
Assume the data flow is $S_{da} = \{x_1, x_2, x_3, \dots, x_n\}$ and the number of clusters is k:
For each cluster that has changed: calculate the new cluster center
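Steps S231 to S235 describe what is commonly known as k-means clustering. The sketch below is a minimal Python illustration of those steps, not the patent's own listing; representing each flow as a numeric tuple, the squared Euclidean distance, the fixed random seed and the iteration cap `max_iter` are assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length numeric tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of tuples (the center object)."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def k_means(data, k, max_iter=100, seed=0):
    """Cluster data (a list of numeric tuples) into k groups."""
    rng = random.Random(seed)
    centers = rng.sample(data, k)            # S232: arbitrary initial centers
    assignment = None
    for _ in range(max_iter):
        # S233: assign every object to its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(x, centers[c]))
            for x in data
        ]
        if new_assignment == assignment:     # S235: no object moved, so stop
            break
        assignment = new_assignment
        # S234: recompute the center of every non-empty cluster.
        for c in range(k):
            members = [x for x, a in zip(data, assignment) if a == c]
            if members:
                centers[c] = mean_point(members)
    return centers, assignment
```

The sketch stops when no object changes cluster, which is one simple stand-in for the convergence test of step S235. How the resulting clusters are mapped to normal or abnormal traffic is left open here, as the text does not specify it.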
In this embodiment, after the three-layer detection in step S2, if the abnormal data flow detected by a test point is already known, the test point directly notifies the network administrator to handle it. If a new anomaly feature is derived from the detected abnormal data flow through data mining and machine learning, the feature database is updated and the test points share the information as in step S3, which specifically includes the following steps:
Step S31: The test point that identified the new anomaly feature updates and expands its own feature database. It first calculates the length L of the new feature string and files the feature string under the linked-list branch for that length, and then inserts the characteristic value of the feature string, obtained by XOR, into the linked list of the corresponding characteristic value;
Step S32: After the feature database of that test point has been updated, the new feature information is transmitted to the feature databases of all test points using the breadth-first search algorithm BFS, which greatly simplifies detection of this kind of abnormal flow the next time it occurs.
In this embodiment, preferably, during this updating and data sharing the dynamic route learning of the routers is taken into account, and because the whole network is a graph structure, a graph search algorithm is used. Furthermore, because the test points are all located at the edge of the network, the breadth-first search algorithm BFS is adopted, which specifically includes the following steps:
Step S321: Take the test point whose feature database was updated first as the root node of the BFS;
Step S322: Using a queue structure, first put the root node into the queue;
Step S323: Take a node out of the queue. If the node is a test point whose feature database has not yet been updated, update its feature database; if the node is a common node, pass the information onward;
Step S324: Judge whether the queue is empty. If it is empty, the feature databases of all test points in the network have been updated and the algorithm ends; otherwise return to step S323.
In particular, in this embodiment the breadth-first search algorithm BFS is specifically implemented with the following code:
s is the initial node
R = {s}, Q = {s}
while Q is not empty
    take a node out of Q
    if (the node is a test point)
        then update its feature database
    else
        pass the information on
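For readers who want something executable, the same propagation can be sketched in Python. This is an illustration of steps S321 to S324 under assumptions, not the patent's code: the adjacency-dictionary layout of `graph`, the per-test-point sets in `feature_dbs` and the function name `propagate_feature` are all hypothetical.

```python
from collections import deque


def propagate_feature(graph, test_points, root, feature, feature_dbs):
    """Spread a new feature breadth-first from root to every reachable test point.

    graph: dict mapping each node to an iterable of neighbouring nodes
    test_points: set of nodes that host a test point
    feature_dbs: dict mapping each test point to its set of known features
    Returns the test points in the order their databases were updated.
    """
    visited = {root}                  # R in the pseudocode
    queue = deque([root])             # Q in the pseudocode (S322)
    updated = []
    while queue:                      # S324: finished once the queue is empty
        node = queue.popleft()        # S323: take one node out of the queue
        if node in test_points and feature not in feature_dbs[node]:
            feature_dbs[node].add(feature)
            updated.append(node)
        for neighbour in graph.get(node, ()):   # pass the information on
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return updated
```

The `visited` set keeps the search from revisiting nodes in a cyclic topology, and the root is skipped as an update target because its feature database already contains the new feature.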
The foregoing is merely a preferred embodiment of the present invention; all equivalent changes and modifications made within the scope of the claims of the present invention shall fall within the scope covered by the present invention.

Claims (7)

1. A method for distributed anomaly detection of network data flows, characterized by comprising the following steps:
Step S1: Deploy N test points: a node at which a terminal device interfaces with the network is a common node, and the test points are independent of the common nodes. Each test point is connected to a common node and monitors one common node, so as to detect abnormal data packets sent by the terminal device at that common node while preventing the terminal from receiving abnormal data packets from the network;
Step S2: The test points monitor abnormal network data flows: hierarchical anomaly detection is performed using three detection methods. The first-layer detection uses a feature-matching anomaly detection method; when no abnormal data flow is detected, the data flow enters the second-layer detection. The second-layer detection uses an anomaly detection method based on statistical analysis; when an abnormal data flow is detected, the abnormal data flow enters the third-layer detection. The third layer uses an anomaly detection method based on machine learning and data mining;
Step S3: The test points share information: information is shared between the test points by way of the common nodes, so that the feature databases of all test points are effectively updated.
2. The method for distributed anomaly detection of network data flows according to claim 1, characterized in that step S1 specifically includes the following steps:
Step S11: Provide N terminal devices and the router or switch connected to each terminal device, that is, there are N routers or switches; one test point is set on each router or switch, forming a test point set $D = \{D_i \mid i = 1, 2, 3, \dots, N\}$;
Step S12: Design the feature database, which includes feature strings and characteristic values. A feature string is a character string of a certain length, and its characteristic value is the result of XORing each character in the feature string. The feature database uses a linked-list design, with the features of each length chained into one linked list;
Step S13: Initialize all test points: the feature database holding the feature information of known abnormal data flows is loaded into each test point, and the test point is connected to the corresponding router or switch; the test point then analyzes network data flows through the connected router or switch.
3. The method for distributed anomaly detection of network data flows according to claim 1, characterized in that the first-layer detection in step S2 uses a feature-matching anomaly detection method that specifically includes the following steps:
Step S211: Let the feature set in the feature database be denoted $C = \{C_i \mid i = 0, 1, 2, \dots\}$;
Step S212: When the test point receives a network data flow, that is, when the network data flow passes through the router or switch monitored by the test point, the test point extracts the feature string of the network data flow and computes its characteristic value. The feature string includes the IP address, port number and protocol type. Let the length of the feature string be L; XORing all the characters of the feature string together yields one corresponding characteristic value T, obtained with the formula $T = A \oplus B \oplus \cdots$, where the feature string of length L is AB…;
Step S213: Suppose the feature database holds m feature strings of the same length as the feature string that produced the characteristic value T; then for j = 1, 2, 3, …, m, $C_{ij}$ is a characteristic value under that length. The test point judges whether $T \oplus C_{ij}$ is zero. If $T \oplus C_{ij}$ is not equal to 0, the feature string does not match this characteristic value and is ignored. If $T \oplus C_{ij}$ is equal to 0, the feature string matches this characteristic value and is retained for the subsequent complete match;
Step S214: After the screening of step S213, the retained feature strings are matched completely using the BF algorithm. The BF algorithm matches the first character of the target feature string against the first character of the pattern feature string and judges whether the two are equal. If they are equal, it goes on to compare the second character of the target feature string with the second character of the pattern feature string. If they are not equal, it compares the second character of the target feature string with the first character of the pattern feature string, and so on, until the final matching result is obtained. If the target feature string and the pattern feature string match, the data flow is an abnormal data flow; if they do not match, the data flow enters the next layer of detection.
4. The method for distributed anomaly detection of network data flows according to claim 1, characterized in that the second-layer detection in step S2 uses an anomaly detection method based on statistical analysis that specifically includes the following steps:
Step S221: The test point collects feature information from the network data flow to be detected; the feature information includes source IP, destination IP, source port, destination port and data size;
Step S222: Let $d_{ij}$ denote the size of the j-th value of the i-th piece of feature information. Define a feature set $S_s = \{d_{ij} \mid i = 1, 2, 3, \dots, m;\ j = 1, 2, 3, \dots, y_i\}$, where $Y = \{y_i \mid i = 1, 2, 3, \dots, m\}$ is the statistics set of the m pieces of feature information and $y_i$ is the number of distinct values of feature i within the statistical time interval. Define the entropy of the i-th feature as $H(h_i) = -\sum_{j=1}^{y_i} \frac{d_{ij}}{S_i} \log \frac{d_{ij}}{S_i}$,
where $S_i$ is the total packet count of the i-th feature within the statistical time interval;
Step S223: A weighted-average method is used to eliminate the random fluctuations of the entropy caused by external interference during statistics and to find the overall trend of the feature entropy, so as to predict the feature entropy in the next statistical time interval. The prediction formula is:
$y_t = a x_{t-1} + (1-a) y_{t-1}$,
where $y_t$ and $y_{t-1}$ are the predicted values of the feature entropy at times t and t-1 respectively, a is a weighting factor with a value between 0 and 1, and $x_{t-1}$ is the measured value of the feature entropy at time t-1;
Step S224: Based on the prediction formula in step S223, the weighted average can be used to predict the feature parameters of normal data flows. Let $H_{ti}$ denote the actual feature entropy of the i-th feature in the t-th statistical time interval and $\hat{H}_{ti}$ the predicted feature entropy of the i-th feature in the t-th statistical time interval; then $\hat{H}_{ti} = a H_{(t-1)i} + (1-a) \hat{H}_{(t-1)i}$.
Substituting the expressions for $\hat{H}_{(t-1)i}, \hat{H}_{(t-2)i}, \dots$ successively into the above formula and expanding gives: $\hat{H}_{ti} = a \sum_{n=1}^{k} (1-a)^{n-1} H_{(t-n)i} + (1-a)^{k} \hat{H}_{(t-k)i}$.
Here k is defined as follows: k statistical time intervals are chosen as one observation period, and its value must be tuned experimentally to find the optimum. $\hat{H}_{(t-k)i}$ is the initial value for the corresponding observation period and can be set equal to the observed value at that time. a is chosen according to the variation characteristics of the corresponding observation interval;
Step S225: Let $\sigma_{ti}$ denote the standard deviation of the feature entropy of the i-th feature in the t-th time interval, and let $T_{ti}$ denote the normal interval of the feature entropy of the i-th feature in the t-th time interval.
If the value of the feature entropy of the i-th feature in the t-th statistical time interval falls within the normal interval $T_{ti}$, that is, between its lower and upper bounds, the network data flow in that time interval is a normal data flow. Otherwise the network data flow in that time interval is an abnormal data flow, and the abnormal data flow enters the third-layer detection.
5. The method for distributed anomaly detection of network data flows according to claim 1, characterized in that the third-layer detection in step S2 uses an anomaly detection method based on machine learning and data mining, which comprises a machine learning part and a data mining part. When detecting abnormal data flows, the machine learning part continually updates the feature database by learning from historical data, thereby improving the ability to detect abnormal data flows. The data mining part uses a cluster analysis algorithm to carry out the third-layer abnormal data flow detection, which specifically includes the following steps:
Step S231: The inputs are the number of clusters k and a database containing n objects; the output is k clusters that satisfy the minimum-variance criterion;
Step S232: Arbitrarily select k objects from the n data objects as the initial cluster centers;
Step S233: Based on the center object of each cluster, calculate the distance between each object and these center objects, and reassign each object according to the minimum distance;
Step S234: Recalculate the center object of each cluster that has changed;
Step S235: Calculate the standard measure function. If the function converges, the algorithm terminates; otherwise, return to step S233.
6. The method for distributed anomaly detection of network data flows according to claim 1, characterized in that after the three-layer detection in step S2, if the abnormal data flow detected by a test point is already known, the test point directly notifies the network administrator to handle it; if a new anomaly feature is derived from the detected abnormal data flow through data mining and machine learning, the feature database is updated and the test points share the information as in step S3, which specifically includes the following steps:
Step S31: The test point that identified the new anomaly feature updates and expands its own feature database. It first calculates the length L of the new feature string and files the feature string under the linked-list branch for that length, and then inserts the characteristic value of the feature string, obtained by XOR, into the linked list of the corresponding characteristic value;
Step S32: After the feature database of that test point has been updated, the new feature information is transmitted to the feature databases of all test points using the breadth-first search algorithm BFS.
7. The method for distributed anomaly detection of network data flows according to claim 6, characterized in that the breadth-first search algorithm BFS specifically includes the following steps:
Step S321: Take the test point whose feature database was updated first as the root node of the BFS;
Step S322: Using a queue structure, first put the root node into the queue;
Step S323: Take a node out of the queue. If the node is a test point whose feature database has not yet been updated, update its feature database; if the node is a common node, pass the information onward;
Step S324: Judge whether the queue is empty. If it is empty, the feature databases of all test points in the network have been updated and the algorithm ends; otherwise return to step S323.
CN201510506829.3A 2015-08-18 2015-08-18 A kind of method of distribution abnormality detection network data flow Active CN105187411B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510506829.3A CN105187411B (en) 2015-08-18 2015-08-18 A kind of method of distribution abnormality detection network data flow

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510506829.3A CN105187411B (en) 2015-08-18 2015-08-18 A kind of method of distribution abnormality detection network data flow

Publications (2)

Publication Number Publication Date
CN105187411A CN105187411A (en) 2015-12-23
CN105187411B true CN105187411B (en) 2018-09-14

Family

ID=54909255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510506829.3A Active CN105187411B (en) 2015-08-18 2015-08-18 A kind of method of distribution abnormality detection network data flow

Country Status (1)

Country Link
CN (1) CN105187411B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107153630B (en) * 2016-03-04 2020-11-06 阿里巴巴集团控股有限公司 Training method and training system of machine learning system
CN107391359B (en) * 2016-05-17 2020-11-27 腾讯科技(深圳)有限公司 Service testing method and device
CN105978897B (en) * 2016-06-28 2019-05-07 南京南瑞继保电气有限公司 A kind of detection method of electric power secondary system Botnet
CN106254321B (en) * 2016-07-26 2019-03-19 中国人民解放军防空兵学院 A kind of whole network abnormal data stream classification method
CN106330975A (en) * 2016-11-03 2017-01-11 上海三零卫士信息安全有限公司 Method for periodic exception detection based on SCADA system
CN106643907B (en) * 2017-01-16 2018-10-16 大连理工大学 Weighted principal component analyzing method for the identification of structure monitoring data exception
CN108737336B (en) * 2017-04-18 2021-01-15 中国移动通信有限公司研究院 Block chain-based threat behavior processing method and device, equipment and storage medium
CN107835201A (en) * 2017-12-14 2018-03-23 华中师范大学 Network attack detecting method and device
CN108829715B (en) * 2018-05-04 2022-03-25 慧安金科(北京)科技有限公司 Method, apparatus, and computer-readable storage medium for detecting abnormal data
CN108900542B (en) * 2018-08-10 2021-03-19 海南大学 DDoS attack detection method and device based on LSTM prediction model
CN109889619B (en) * 2019-01-28 2022-01-21 中国互联网络信息中心 Abnormal domain name monitoring method and device based on block chain
CN110175200A (en) * 2019-05-31 2019-08-27 国网上海市电力公司 A kind of abnormal energy analysis method and system based on intelligent algorithm
CN110213287B (en) * 2019-06-12 2020-07-10 北京理工大学 Dual-mode intrusion detection device based on integrated machine learning algorithm
CN111177513B (en) * 2019-12-31 2023-10-31 北京百度网讯科技有限公司 Determination method and device of abnormal access address, electronic equipment and storage medium
CN114070899B (en) * 2020-07-27 2023-05-12 深信服科技股份有限公司 Message detection method, device and readable storage medium
CN113890746B (en) * 2021-08-16 2024-05-07 曙光信息产业(北京)有限公司 Attack traffic identification method, device, equipment and storage medium
CN117412315B (en) * 2023-12-12 2024-03-15 深圳通诚无限科技有限公司 Wireless communication network data optimization method based on data analysis

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101848160A (en) * 2010-05-26 2010-09-29 钱叶魁 Method for detecting and classifying all-network flow abnormity on line
CN102801738A (en) * 2012-08-30 2012-11-28 中国人民解放军国防科学技术大学 Distributed DoS (Denial of Service) detection method and system on basis of summary matrices
CN104022999A (en) * 2013-09-05 2014-09-03 北京科能腾达信息技术股份有限公司 Network data processing method and system based on protocol analysis
CN104079452A (en) * 2014-06-30 2014-10-01 电子科技大学 Data monitoring technology and network traffic abnormality classifying method
CN104301895A (en) * 2014-09-28 2015-01-21 北京邮电大学 Double-layer trigger intrusion detection method based on flow prediction
CN104468631A (en) * 2014-12-31 2015-03-25 国家电网公司 Network intrusion identification method based on anomaly flow and black-white list library of IP terminal
CN104601553A (en) * 2014-12-26 2015-05-06 北京邮电大学 Internet-of-things tampering invasion detection method in combination with abnormal monitoring

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
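Yang Libin, Mu Dejun, Cai Xiaoyan; Research on Intrusion Detection in Wireless Sensor Networks (无线传感器网络入侵检测研究); Application Research of Computers (《计算机应用研究》); 2008-11-30; Vol. 25, No. 11; pp. 3204-3208 *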

Also Published As

Publication number Publication date
CN105187411A (en) 2015-12-23

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant