CN101516099B - Test method for sensor network anomaly - Google Patents

Publication number
CN101516099B
Authority
CN
China
Prior art keywords
node
high dimensional
dimensional data
nodes
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009100615378A
Other languages
Chinese (zh)
Other versions
CN101516099A (en)
Inventor
刘文予
蒋洪波
舒乐
张松涛
刘文平
陈金华
刘军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN2009100615378A
Publication of CN101516099A
Application granted
Publication of CN101516099B
Legal status: Expired - Fee Related

Landscapes

  • Mobile Radio Communication Systems (AREA)
  • Arrangements For Transmission Of Measured Signals (AREA)

Abstract

The invention relates to a method for detecting anomalies in a sensor network. The network is first divided into clusters. In each cluster, the cluster head collects the high-dimensional data sequences of all nodes in the cluster for each unit time and uses a hidden Markov model (HMM) construction method to build a node high-dimensional data transition model for each unit time. The models are then classified, with model similarity as the classification criterion. For each class, all high-dimensional data sequences from the unit times whose models fall into that class are pooled, and the HMM construction method is applied again to build a new node high-dimensional data transition model, which is used for anomaly detection within the cluster. The method makes full use of the temporal and spatial correlation of the data in the sensor network, effectively reduces data redundancy and communication overhead, and prolongs the service life of the sensor nodes while achieving anomaly detection.

Description

A test method for sensor network anomaly
Technical field
The present invention relates to wireless sensor networks, and in particular to a method for anomaly detection in wireless sensor networks.
Background technology
In a wireless sensor network, the large volume of raw sensor data is generally highly correlated in space or time, so transmitting all of it to the base station node (sink) wastes energy and is unnecessary. The main idea of data fusion (also called data aggregation) is for the nodes along a transmission path to cooperate interactively and remove correlated redundancy, greatly reducing the data volume so that the same or an equivalent amount of information is represented with less data.
Some research results already exist on data fusion based on the spatial or temporal correlation of sensor node data, but no theoretical system has formed. In particular, spatial correlation and temporal correlation have been exploited in isolation and have not been combined organically and systematically, which has hindered further improvement of data fusion efficiency.
Existing data fusion models are all locally independent and offer no principled explanation or model of the relationships between local regions or of multi-level information fusion. In fact, the correlation between data is hierarchical and weakens gradually as distance increases, and correlation also exists between local regions. Existing studies deterministically delimit a zone or boundary as a local region and assume that only the data inside it are locally correlated.
First, existing data fusion models focus on the sensor data themselves and apply only simple processing when modeling the detected events that the data reflect; the gap is comparable to that between amplitude coding and model-based coding in speech coding. Second, existing data processing models give insufficient support to higher-level application features. For example, for the same temperature data, data fusion that needs topology information (such as extracting contour lines) uses methods completely different from data fusion that needs only statistical information, and there is no unified mathematical model. Research on unified fusion of multiple kinds of sensor data (for example, temperature and humidity together) has not produced significant results either. These data processing theories and methods often require converting the original low-dimensional data set into a high-dimensional one, yet current research is mainly confined to low-dimensional applications, and the analysis and modeling of high-dimensional data sets has not been fully developed. Finally, existing data processing models mainly use linear systems to analyze and handle the spatial and temporal correlation of the data, for example using a linear prediction model to predict values and so reduce data transmission; real systems, however, are often more complex, and linear system models frequently lack generality.
Many existing outlier detection algorithms decide whether a sensor node's data are outliers by examining the difference between that node's data and the data of its neighboring nodes, which takes a long time and cannot guarantee accuracy. Within a data fusion framework, outlier detection can instead be recast as an inference problem under a given data fusion model: for a given sequence, infer the most probable corresponding state sequence according to the existing model. Judging by the likelihood value that the existing model assigns to the sequence gives a more practical result.
Summary of the invention
The object of the present invention is to provide an anomaly detection method for wireless sensor networks that takes full account of the temporal and spatial correlation of the data, effectively reduces data redundancy and communication overhead, prolongs the life of the sensor nodes, and achieves anomaly detection.
In this test method for sensor network anomaly, the network is divided into clusters, and each cluster performs anomaly detection as follows:
Step 1) The cluster head collects the high-dimensional data sequences of all nodes in the cluster during the i-th unit time and, taking these sequences as training samples, uses a hidden Markov model (HMM) construction method to build the node high-dimensional data transition model of the i-th unit time, i = 1, 2, ..., N, where N is the number of unit times sampled;
Step 2) Taking transition model similarity as the classification criterion, classify the node high-dimensional data transition models of the 1st, 2nd, ..., N-th unit times;
Step 3) For the initial node high-dimensional data transition models belonging to class j, pool all high-dimensional data sequences from their corresponding unit times to form a new training sample, and use the HMM construction method to build the class-j node high-dimensional data transition model, j = 1, 2, ..., N1, where N1 is the number of classes obtained in step 2);
Step 4) Use the class-j node high-dimensional data transition models to perform anomaly detection on the data collected by all nodes in the cluster.
The technical effects of the present invention are as follows:
(1) Multidimensional data fusion. Data fusion is performed on the high-dimensional data sets in the network (for example, sets containing both temperature and humidity). An HMM is used to model the high-dimensional data set, giving a data fusion model suitable for general data sets of any dimension;
(2) Reduced redundancy and overhead. The invention makes full use of the temporal and spatial correlation of the data collected by the sensor network. First, a clustering algorithm exploits spatial correlation: the data collected by the nodes within a cluster differ little, so modeling per cluster reduces model redundancy and also simplifies the subsequent anomaly detection on the data collected by the nodes in the cluster. Second, temporal correlation is exploited by classifying the models, rebuilding a training sample for each class, and building new models, which further reduces model redundancy. Finally, because the HMM models the transitions of the high-dimensional data set, member nodes need not send all collected data to the cluster head during anomaly detection; they send only the transition sequence (that is, the sampled sequence), which significantly reduces communication overhead.
Description of drawings
Fig. 1 is a schematic diagram of the wireless sensor network structure of the present invention;
Fig. 2 is the overall flow chart of the anomaly detection method of the present invention;
Fig. 3 is a schematic diagram of the wireless sensor network clustering result;
Fig. 4 is a schematic diagram of the training samples;
Fig. 5 shows the anomaly detection results: Fig. 5(a) is the two-dimensional data transition plot of the nodes in the cluster on March 17; Fig. 5(b) is the two-dimensional data transition plot of the nodes in the cluster on March 18; Fig. 5(c) is the two-dimensional data plot of the nodes in the cluster on March 17 (node 3 excluded; the thick line is the transition curve of node 4); Figs. 5(d), 5(e), and 5(f) are the two-dimensional data plots of the nodes in the cluster on March 18 (node 3 excluded; the thick lines are the transition curves of nodes 2, 7, and 9, respectively).
Embodiment
To show the purpose, technical solution, and advantages of this patent more clearly, the method is described in detail below with reference to the drawings and a concrete example.
The present invention addresses anomaly detection in wireless sensor networks. It performs data fusion on the high-dimensional data set in the network (readings that include both temperature and humidity), extracts the useful information, and finally produces the anomaly detection result.
Fig. 1 shows the layout of the sensor network used in the embodiment. The sensor data used in the embodiment were collected by the Intel Berkeley Research Lab from nodes 1 to 54 between February 28, 2004 and April 5. The unit time is an integer multiple of 12 hours; the example uses a unit time of one day (24 hours).
Fig. 2 is the flow chart of the embodiment, which comprises the following steps:
Step 1: Cluster the network according to spatial proximity.
For the network structure shown in Fig. 1, the network is clustered as follows. Each node has a status flag, initialized to 0, where 0 means the node's state is undetermined and 1 means the node has determined whether it is a cluster head or a member node. Each node has a level flag, initialized to ∞. Each node also has a parent flag that identifies its parent node. Each node randomly selects its own temporary ID in (0, N^3), where N is the total number of nodes in the network.
Step 11: The nodes communicate with one another, and each node obtains information about its neighbors within h hops. h may take any value; in this experiment h = 2. Each node whose status flag is 0 compares its own ID with the IDs of its neighbors within h hops whose status flag is 0. If its own ID is the largest, it confirms itself as a cluster head: it sets its level flag to 0, indicating that a cluster head is at layer 0, sets its status flag to 1, and proceeds to step 12.
Step 12: Every node whose status flag is 0 computes its similarity to the cluster heads within h hops. In the present invention the similarity of two nodes is expressed by the correlation coefficient: let x_k and x_l be the sequences of readings taken by nodes S_k and S_l during one day; the correlation coefficient r_{k,l} is defined as

$$ r_{k,l} = \frac{E(x_k x_l) - E(x_k)\,E(x_l)}{\sqrt{E(x_k^2) - E^2(x_k)}\;\sqrt{E(x_l^2) - E^2(x_l)}} $$

where E(·) denotes the mean.
The more similar two curves are, the closer the correlation coefficient is to 1. If the correlation coefficient exceeds the correlation threshold (0.97 in this experiment), the node sets its status flag to 1 and becomes a member node.
Step 13: Repeat steps 11 and 12 until the status flags of all nodes in the network are 1.
Step 14: Each member node computes its similarity to all cluster heads within one hop and finds the cluster head with the largest correlation coefficient. If this coefficient exceeds the correlation threshold (for example, 0.97), the member node sets its level flag to 1 and its parent flag to the ID of that cluster head.
Step 15: Each node with level = ∞ obtains information about all nodes with level ≠ ∞ within h hops and selects the node s that is most similar to it among those with level ≤ h-1. It sets its level flag to the level flag of node s plus 1 and its parent flag to s.
This completes the cluster head election. Every node in the network now belongs to exactly one cluster, and the maximum hop count within each cluster is h, where h can be set when the program starts. The clustering result is shown in Fig. 3.
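The election in steps 11 to 13 can be sketched in Python as follows. This is a minimal, centralized illustration and not the patent's distributed implementation: the function names, the adjacency-dictionary representation of the network, and the handling of constant sequences are assumptions made for the example, and the temporary IDs are assumed to be distinct. The level and parent assignment of steps 14 and 15 is omitted.

```python
import math
from collections import deque


def correlation(x, y):
    """Correlation coefficient r = (E[xy] - E[x]E[y]) / (std(x) * std(y))."""
    n = len(x)
    ex, ey = sum(x) / n, sum(y) / n
    exy = sum(a * b for a, b in zip(x, y)) / n
    var_x = sum(a * a for a in x) / n - ex * ex
    var_y = sum(b * b for b in y) / n - ey * ey
    if var_x <= 0 or var_y <= 0:
        return 0.0  # assumption: a constant sequence is treated as uncorrelated
    return (exy - ex * ey) / math.sqrt(var_x * var_y)


def within_hops(adjacency, start, h):
    """Return the set of nodes reachable from `start` in at most h hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == h:
            continue  # do not expand beyond h hops
        for neighbor in adjacency[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[start]
    return set(hops)


def elect_cluster_heads(adjacency, temp_id, readings, h=2, threshold=0.97):
    """Steps 11-13: repeat until every node is a cluster head or a member."""
    undecided = set(adjacency)
    heads, members = set(), set()
    while undecided:
        # Step 11: an undecided node whose ID is the largest among its
        # undecided neighbors within h hops becomes a cluster head.
        new_heads = {
            n for n in undecided
            if all(temp_id[n] > temp_id[m]
                   for m in within_hops(adjacency, n, h) & undecided)
        }
        heads |= new_heads
        undecided -= new_heads
        # Step 12: an undecided node that correlates strongly with some
        # cluster head within h hops becomes a member node.
        joined = {
            n for n in undecided
            if any(correlation(readings[n], readings[c]) > threshold
                   for c in within_hops(adjacency, n, h) & heads)
        }
        members |= joined
        undecided -= joined
    return heads, members
```

With distinct IDs the loop always terminates, because the undecided node with the largest ID becomes a cluster head in every round.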
Next, each cluster, centered on its cluster head, analyzes the data in the cluster and produces the anomaly detection result for that cluster.
Step 2: Each cluster collects samples to train hidden Markov models (HMMs). The samples comprise the data sequences of all nodes in the cluster over N days. Each node records one data value every two hours (the mean of all data the node collected during those two hours), giving 12 values per day; the sequence formed by these 12 values is one sample. To keep the training stable, the sequences of all nodes in the cluster are gathered at the cluster head for training. The parameters of the HMMs are computed iteratively with the Baum-Welch algorithm and include the initial probability vector, the state transition probability matrix, the parameters of the Gaussian mixture model, the mixture matrix, and so on. Fig. 4 (showing the cluster formed by nodes 4, 6, 7, 8, 9, 10, 11, 12, and 13 on March 1) shows that the transition curves of the data within one cluster are essentially consistent. The training uses multiple sequences, and the Baum-Welch iteration maximizes the total probability of the sequences under the model.
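The sample construction described in step 2 (one mean value per two-hour window, 12 values per day) can be sketched as follows. The function name, the `(hour_of_day, temperature, humidity)` tuple layout, and the use of `None` for empty windows are assumptions made for illustration; the patent does not specify how missing windows are handled, and the Baum-Welch training itself is not shown.

```python
from collections import defaultdict


def daily_sequence(samples, bin_hours=2):
    """Average one day's raw samples into fixed-width time windows.

    `samples` is an iterable of (hour_of_day, temperature, humidity) tuples
    with 0 <= hour_of_day < 24. The result has 24 // bin_hours entries, each
    a (mean_temperature, mean_humidity) pair, or None for an empty window.
    """
    bins = defaultdict(list)
    for hour, temperature, humidity in samples:
        bins[int(hour // bin_hours)].append((temperature, humidity))

    sequence = []
    for index in range(24 // bin_hours):
        values = bins.get(index)
        if not values:
            sequence.append(None)  # assumption: mark missing windows explicitly
            continue
        count = len(values)
        sequence.append((sum(v[0] for v in values) / count,
                         sum(v[1] for v in values) / count))
    return sequence
```

Each node would produce one such 12-element sequence per day, and the cluster head would gather the sequences of all its nodes as the training set for that day's model.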
Step 3: To further reduce model redundancy, similar models are grouped into classes. Because of temporal correlation, the one-day transition curves of the high-dimensional nonlinear data set in a cluster are similar on adjacent days, so the K-means algorithm can be used to classify the HMMs trained per day. After classification, the data of the days whose models belong to the same class are pooled to form a new training sample, and an HMM is trained again. In the experiment, 15 days of data were divided into 4 classes with the K-means algorithm.
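A rough sketch of step 3 is given below. The patent does not state how an HMM is turned into a point for K-means, so the example assumes each per-day model has already been summarized as a numeric feature vector (for instance its flattened parameters); that representation, the function names, and the choice of the first k vectors as initial centroids are all assumptions for illustration.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def k_means(vectors, k, max_iterations=100):
    """Plain Lloyd's algorithm; returns one class label per input vector."""
    centroids = [list(v) for v in vectors[:k]]  # assumption: seed with first k
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for c in range(k):
            group = [v for v, label in zip(vectors, labels) if label == c]
            if group:  # keep the old centroid if a class becomes empty
                centroids[c] = [sum(col) / len(group) for col in zip(*group)]
    return labels


def pool_days_by_class(day_sequences, labels):
    """Merge the training sequences of all days whose models share a class."""
    pooled = {}
    for day, label in enumerate(labels):
        pooled.setdefault(label, []).extend(day_sequences[day])
    return pooled
```

Each pooled list would then serve as the new training sample from which the class model is retrained.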
Step 4: Given all transition sequences of the data sets of all nodes in any one cluster over some days, the forward algorithm for hidden Markov models is used to compute the probability of each sequence under each HMM trained in step 3, and the HMM with the largest sum of probabilities is selected. If the probability of a node's transition sequence under this model is below the preset anomaly threshold, the data of that node are judged anomalous. The smaller the probability, the worse the fit to the model and the more likely the data are anomalous; in this experiment the anomaly threshold is 0.1. Let m_j denote the j-th HMM, let P(i|m_j) be the probability of the sequence labeled i under m_j, and let T_i be the length of that sequence; the normalized log-probability is L(i|m_j) = (1/T_i) log P(i|m_j).
The data of March 17 and March 18 were used to check the accuracy of the four models trained in step 3, taking the cluster formed by nodes 4, 6, 7, 8, 9, 10, 11, 12, and 13 as the example. The fitting results are given in Table 1, whose entries are the values of L(i|m_j) for the two days of data of the 9 nodes in this cluster under models 1, 2, 3, and 4; the corresponding curves are shown in Fig. 5. The table shows that the March 17 data fit model 3 best and the March 18 data fit model 4 best.
The temperature readings of node 3 are abnormal on both days, exceeding 100 degrees; its curves are shown in Fig. 5(a) and Fig. 5(b) (the solid lines are temperature curves, the dashed lines are voltage curves, and the bold lines show the temperature and voltage of node 3). For ease of observation, node 3 is then removed. On March 17, among the remaining nodes, node 4 has the smallest L(i|m_j); its transition curve shows that the voltage of node 4 is higher than that of the other nodes, as shown in Fig. 5(c). On March 18, among the remaining nodes, L(i|m_j) for nodes 2, 7, and 9 is much smaller than for the other nodes; their transition curves are shown in Figs. 5(d), 5(e), and 5(f), and these nodes can be identified as anomalous.
Figure G2009100615378D00071
Table 1: Results of fitting the March 17 and March 18 data to the models
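The scoring in step 4 can be illustrated with a scaled forward algorithm. The sketch below makes several simplifying assumptions that differ from the embodiment: observations are discrete symbols rather than Gaussian-mixture emissions, the best model is chosen by the summed log-probability rather than the summed probability, and the threshold is applied to the normalized value L(i|m_j). All names and the model tuple layout `(start, trans, emit)` are hypothetical.

```python
import math


def log_likelihood(observations, start, trans, emit):
    """Scaled forward algorithm: return log P(observations | model).

    start[s] is the initial probability of state s, trans[r][s] the
    probability of moving from state r to s, and emit[s][o] the probability
    of emitting symbol o in state s.
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][observations[0]] for s in range(n_states)]
    log_prob = 0.0
    for t in range(len(observations)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states))
                * emit[s][observations[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under the model
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob


def detect_anomalies(sequences, models, threshold):
    """Pick the best-fitting model, then flag nodes with a low normalized score.

    sequences: {node: list of observation symbols}
    models: list of (start, trans, emit) tuples
    threshold: lower bound on L = (1/T) * log P (assumed form of the test)
    """
    log_p = [
        {node: log_likelihood(obs, *model) for node, obs in sequences.items()}
        for model in models
    ]
    best = max(range(len(models)), key=lambda j: sum(log_p[j].values()))
    normalized = {
        node: log_p[best][node] / len(sequences[node]) for node in sequences
    }
    anomalous = sorted(n for n, score in normalized.items() if score < threshold)
    return best, normalized, anomalous
```

Normalizing by the sequence length makes scores comparable between sequences of different lengths, which is the role L(i|m_j) plays in the text.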
In summary, the test method for sensor network anomaly based on hierarchical decision-making and HMMs is effective.

Claims (3)

1. A test method for sensor network anomaly, in which the network is divided into clusters and each cluster performs anomaly detection as follows:
Step 1) The cluster head collects the high-dimensional data sequences of all nodes in the cluster during the i-th unit time and, taking these sequences as training samples, uses an HMM construction method to build the node high-dimensional data transition model of the i-th unit time, i = 1, 2, ..., N, where N is the number of unit times sampled;
Step 2) Taking transition model similarity as the classification criterion, classify the node high-dimensional data transition models of the 1st, 2nd, ..., N-th unit times;
Step 3) For the node high-dimensional data transition models belonging to class j, pool all high-dimensional data sequences from their corresponding unit times to form a new training sample, and use the HMM construction method to build the class-j node high-dimensional data transition model, j = 1, 2, ..., N1, where N1 is the number of classes obtained in step 2);
Step 4) Given the transition sequences of the data sets of all nodes in the cluster over some days, use the hidden Markov forward algorithm to compute the probability of each sequence under each class-j node high-dimensional data transition model, and select the node high-dimensional data transition model with the largest sum of probabilities; if the probability of a node's transition sequence under this model is below the preset anomaly threshold, judge the data of that node to be anomalous;
The network clustering is carried out as follows:
Step 01: Each node whose identity has not been determined compares its own ID with the IDs of its neighbors within h hops; if its own ID is the largest, it identifies itself as a cluster head and sets its level flag to 0;
Step 02: Each node whose identity has not been determined computes its similarity to the cluster heads within h hops; if any similarity exceeds the correlation threshold, the node identifies itself as a member node;
Step 03: Repeat steps 01 and 02 until the identities of all nodes are determined;
Step 04: Each member node computes its similarity to all cluster heads within one hop and finds the cluster head T with the largest similarity; if the largest similarity exceeds the correlation threshold, the member node is confirmed as a child node of cluster head T and its level flag is set to 1;
Step 05: Each member node without a level flag computes its similarity to all member nodes within h hops that already have a level flag and selects the member node S with the largest similarity and a level flag ≤ h-1; its level flag is set to the level flag of member node S plus 1, and it is confirmed as a child node of member node S;
Step 06: Repeat step 05 until the level flags of all member nodes have been assigned;
The similarity is computed as follows: let x_k and x_l be the high-dimensional data sequences collected by nodes S_k and S_l in one unit time; the similarity r_{k,l} is defined as:

$$ r_{k,l} = \frac{E(x_k x_l) - E(x_k)\,E(x_l)}{\sqrt{E(x_k^2) - E^2(x_k)}\;\sqrt{E(x_l^2) - E^2(x_l)}} $$

where E(·) denotes the mean.
2. The test method for sensor network anomaly according to claim 1, wherein step 2) uses the K-means algorithm for classification.
3. The test method for sensor network anomaly according to claim 1, characterized in that the unit time is an integer multiple of 12 hours.
CN2009100615378A 2009-04-07 2009-04-07 Test method for sensor network anomaly Expired - Fee Related CN101516099B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009100615378A CN101516099B (en) 2009-04-07 2009-04-07 Test method for sensor network anomaly

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009100615378A CN101516099B (en) 2009-04-07 2009-04-07 Test method for sensor network anomaly

Publications (2)

Publication Number Publication Date
CN101516099A CN101516099A (en) 2009-08-26
CN101516099B true CN101516099B (en) 2010-12-01

Family

ID=41040335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009100615378A Expired - Fee Related CN101516099B (en) 2009-04-07 2009-04-07 Test method for sensor network anomaly

Country Status (1)

Country Link
CN (1) CN101516099B (en)

Also Published As

Publication number Publication date
CN101516099A (en) 2009-08-26

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20101201

Termination date: 20160407
