CN102761888B

CN102761888B - Sensor network anomaly detection method and device based on feature selection

Info

Publication number: CN102761888B
Application number: CN201210253661.6A
Authority: CN
Inventors: 李�瑞; 刘克彬; 赵季中; 何源; 刘云浩
Original assignee: WUXI RUIAN TECHNOLOGY CO LTD
Current assignee: Ruan Internet Of Things Technology Group Co ltd; Run Technology Co ltd
Priority date: 2012-07-20
Filing date: 2012-07-20
Publication date: 2016-01-13
Anticipated expiration: 2032-07-20
Also published as: CN102761888A

Abstract

The invention discloses a sensor network anomaly detection method based on feature selection. The method comprises: sorting the collected feature attributes according to a criterion based on the correlation coefficient; selecting a representative feature attribute set according to the result of a feature selection calculation based on the correlation coefficient; and verifying, through cross validation, the reliability of the selected representative feature attribute set, which is used to represent the running state of the system. Anomaly detection based on feature selection is simple, efficient, and easy to implement, and it can also be integrated with other sensor network diagnostic tools as a fusible component. The invention also discloses a sensor network anomaly detection device based on feature selection.

Description

Sensing network anomaly detection method and device based on feature selection

Technical Field

The invention relates to the technical field of sensor network diagnosis, in particular to a sensor network anomaly detection method and device based on feature selection.

Background

The sensing network is a wireless network formed by a large number of static or mobile sensors in a self-organizing and multi-hop mode, and aims to cooperatively sense, collect, process and transmit monitoring information of a sensing object in a network coverage geographical area and report the monitoring information to a user. The technology has the obvious advantages of sensing and monitoring various complex environments with low cost. The sensing network has a wide application prospect and has been widely applied to a plurality of fields such as environment monitoring, building safety monitoring, logistics, military battlefields, forest ecological monitoring and the like.

The main components of a sensor network include sensor nodes and base station nodes (sink nodes). In general, the sensor nodes form a communication network in a wireless, multi-hop, self-organizing manner and transmit the collected data back to a base station. Each sensor node is composed of a data acquisition module (a sensor and an A/D converter), a data processing and control module (a microprocessor and a memory), a communication module (a wireless transceiver), a power supply module (a battery and a DC/AC energy converter) and the like.

Network management and anomaly detection are key to whether a wireless sensor network can operate reliably. At present, most anomaly and error detection relies on collecting data from the wireless sensor network and deriving the cause of an anomaly from packet analysis. These anomaly detection methods mainly include locating errors with decision trees and passively inferring a model from observed data. Whether the information is gathered by active collection or by passive monitoring, it must still be collected, and because the network carries a large amount of redundant information, the overhead of these detection systems is relatively high. A feature selection method greatly reduces this redundant information: the selected data set can represent the current operating state of the network, and changes in the selected representative feature attribute set reflect the occurrence of network anomalies.

Disclosure of Invention

The invention aims to provide a sensing network anomaly detection method and device based on feature selection, which can greatly reduce the overhead of anomaly detection and can be well fused with the traditional sensing network diagnostic tool.

In order to achieve the purpose, the invention adopts the following technical scheme:

a sensing network anomaly detection method based on feature selection comprises the following steps:

sorting the collected feature attributes according to a criterion based on the correlation coefficient;

selecting a representative feature attribute set according to a feature selection calculation result based on the correlation coefficient;

and verifying the reliability of the selected representative characteristic attribute set used for representing the running state of the system through cross validation.

Said sorting the collected feature attributes according to a correlation coefficient based criterion further comprises:

preprocessing the attribute values of the collected characteristic attributes by negative value removal;

calculating correlation coefficients among different characteristic attributes by using the preprocessed attribute values;

and sorting the characteristic attributes according to the correlation coefficient and a preset sorting criterion.

The correlation coefficient between different feature attributes is:

\rho(f_{i}, f_{j}) = \frac{\mathrm{cov}(f_{i}, f_{j})}{\sqrt{\mathrm{var}(f_{i})\,\mathrm{var}(f_{j})}}

where f_i and f_j represent two feature attributes, cov denotes the covariance, and var denotes the variance.

The selecting of the representative feature attribute set according to the feature selection calculation result based on the correlation coefficient further comprises:

taking the first k sequenced characteristic attributes as an initial characteristic attribute set, and calculating correlation coefficients between the initial characteristic attribute set and the rest of characteristic attributes and correlation coefficients between the k characteristic attributes;

and calculating a correlation coefficient between each characteristic attribute in the characteristic attribute set and the characteristic attribute set, and deleting or adding each characteristic attribute according to a result.

The calculation formula for calculating the correlation coefficient between each feature attribute in the feature attribute set and the feature attribute set is as follows:

R(f_{k}, F) = \frac{k\, r_{kf}}{\sqrt{k + k(k-1)\, r_{kk}}};

where r_{kf} represents the average correlation coefficient between a feature attribute and the feature attribute set; r_{kk} represents the average correlation coefficient between different feature attributes; and F denotes the currently selected feature attribute set.

A sensing network anomaly detection device based on feature selection comprises:

a sorting module for sorting the collected characteristic attributes according to a criterion based on the correlation coefficient;

the selection module is used for selecting the representative characteristic attribute set according to the characteristic selection calculation result based on the correlation coefficient;

and the verification module is used for proving the reliability of the selected representative characteristic attribute set through cross verification.

The sorting module further comprises:

the preprocessing submodule is used for preprocessing the attribute values of the collected characteristic attributes by removing negative values;

the first operation submodule is used for calculating correlation coefficients among different characteristic attributes by utilizing the preprocessed attribute values;

and the sorting submodule is used for sorting the characteristic attributes according to the correlation coefficient and a preset sorting criterion.

The selecting module further comprises:

the second operation submodule is used for taking the front k sequenced characteristic attributes as an initial characteristic attribute set, and calculating correlation coefficients between the initial characteristic attribute set and the rest of the characteristic attributes and correlation coefficients between the k characteristic attributes;

and the characteristic selection submodule is used for calculating a correlation coefficient between each characteristic attribute in the characteristic attribute set and the characteristic attribute set, and deleting or adding each characteristic attribute according to a result.

By adopting the technical scheme of the invention, the data anomaly detection in the sensor network is realized, and the high-efficiency anomaly detection capability is realized; a feature attribute subset selection mechanism is provided, and the correctness of anomaly detection is ensured; the anomaly detection based on feature selection has the characteristics of simplicity, high efficiency and easiness in implementation; and the sensor network diagnostic tool can be integrated with other sensor network diagnostic tools as a fusible component.

Drawings

Fig. 1 is a flowchart of a method for detecting an anomaly in a sensor network based on feature selection according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of a calculation method of correlation coefficients between different characteristic attributes in the embodiment of the present invention.

Fig. 3 is a schematic structural diagram of a sensor network anomaly detection device based on feature selection according to an embodiment of the present invention.

Detailed Description

The technical scheme of the invention is further explained by the specific implementation mode in combination with the attached drawings.

As shown in fig. 1, a method for detecting an anomaly of a sensor network based on feature selection according to an embodiment of the present invention includes:

S101, sorting the collected feature attributes according to a criterion based on the correlation coefficient.

The collected feature attributes are all of the state parameters that each sensor feeds back through the sensor network. Each parameter corresponds to one feature attribute, and the source data shown in fig. 1 are these feature attributes. Descriptions of some of the feature attributes are shown in the following table:

After the feature attributes are collected, their attribute values are preprocessed. The preprocessing consists of removing negative values, that is, retaining only non-negative values.
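The negative-value removal can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each feature attribute is stored as a list of readings of equal length, and it drops a whole collection round whenever any attribute in that round is negative so that the attributes stay aligned for the later correlation step. The function name `remove_negative_values` and the attribute names are hypothetical.

```python
def remove_negative_values(samples):
    """Keep only the collection rounds in which every attribute is non-negative.

    `samples` maps an attribute name to a list of readings; all lists are
    assumed to have the same length (one reading per collection round).
    Dropping the whole round is an assumption made to keep attributes aligned.
    """
    names = list(samples)
    n_rows = len(samples[names[0]]) if names else 0
    keep = [
        row for row in range(n_rows)
        if all(samples[name][row] >= 0 for name in names)
    ]
    return {name: [samples[name][row] for row in keep] for name in names}


# Hypothetical readings; -1 marks a value that the preprocessing removes.
raw = {
    "voltage": [2.9, -1.0, 2.8, 2.7],
    "neighbor_count": [5, 6, -1, 4],
}
clean = remove_negative_values(raw)
```

Only the first and last rounds survive here, because each of the two middle rounds contains a negative reading.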

The correlation coefficients among different feature attributes are then calculated from the preprocessed attribute values. The way in which these correlation coefficients are calculated is shown in fig. 2. The correlation coefficient between different feature attributes is:

\rho(f_{i}, f_{j}) = \frac{\mathrm{cov}(f_{i}, f_{j})}{\sqrt{\mathrm{var}(f_{i})\,\mathrm{var}(f_{j})}} .

Because ρ(f_i, f_j) = ρ(f_j, f_i), the matrix of correlation coefficients among the feature attributes is symmetric and can be held as a triangular matrix.

Here f_i and f_j represent two feature attributes, cov denotes the covariance, and var denotes the variance; r_{kf} represents the average correlation coefficient between a feature attribute and the feature attribute set; r_{kk} represents the average correlation coefficient between different feature attributes; and F denotes the currently selected feature attribute set.

For a feature attribute f_i with sampled values x_n and a feature attribute f_j with sampled values y_n, the correlation coefficient is estimated as:

\rho(f_{i}, f_{j}) = \frac{\sum_{n} (x_{n} - \bar{x})(y_{n} - \bar{y})}{\sqrt{\sum_{n} (x_{n} - \bar{x})^{2} \sum_{n} (y_{n} - \bar{y})^{2}}} ,

where the sums run over the samples and \bar{x} and \bar{y} are the sample means of the two attributes.
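The estimate above is the sample Pearson correlation coefficient, and it can be sketched in a few lines. The sketch assumes each feature attribute is a list of numeric samples of equal length; the names `pearson` and `correlation_triangle` are hypothetical, and returning 0.0 for a constant attribute is an assumption, since the text does not cover that case.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 samples")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant attribute is treated as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_triangle(samples):
    """Compute each unordered pair once, since rho(f_i, f_j) == rho(f_j, f_i)."""
    names = list(samples)
    return {
        (names[i], names[j]): pearson(samples[names[i]], samples[names[j]])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
```

Storing only the pairs with i < j mirrors the triangular matrix mentioned in the text: for n attributes, n(n-1)/2 coefficients are computed instead of n².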

The user can define what counts as high correlation according to the settings of the user's own system. For example, a threshold of 0.95 may be chosen for fully correlated attribute values: if two attribute values are correlated at or above this level, one of them may be considered redundant information. Values with a correlation between 0.75 and 0.95 are considered to have a relatively high correlation coefficient, but whether the data are redundant still needs to be determined further.
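The example thresholds can be expressed as a small classifier. This is a sketch only: the function name `classify_correlation` and the labels are hypothetical, the use of the absolute value and the treatment of the boundary values (0.95 counted as redundant, 0.75 counted as high) are assumptions, and the defaults simply reuse the example figures from the text.

```python
def classify_correlation(rho, full=0.95, high=0.75):
    """Label a correlation coefficient using two user-chosen thresholds."""
    strength = abs(rho)  # assumption: strong negative correlation also counts
    if strength >= full:
        return "redundant"   # treated as fully correlated
    if strength >= high:
        return "high"        # relatively high; redundancy still to be decided
    return "low"
```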

There are two sorting criteria:

1) A feature attribute f_i that has more high correlation coefficients is ranked higher.

2) If two or more feature attributes have the same high correlation coefficient value, the one with the larger average correlation coefficient is ranked higher.
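One possible reading of the two criteria is sketched below. It assumes that "more high correlation coefficients" means a larger count of coefficients at or above a user-chosen threshold, and that ties on that count are broken by the larger average coefficient; both readings, the use of absolute values, the name `rank_attributes`, and the sample coefficients are assumptions made for illustration.

```python
def rank_attributes(names, corr, high=0.75):
    """Sort attributes by (count of high coefficients, average coefficient)."""

    def rho(a, b):
        # The coefficients are stored once per unordered pair.
        return corr[(a, b)] if (a, b) in corr else corr[(b, a)]

    def sort_key(name):
        others = [abs(rho(name, other)) for other in names if other != name]
        high_count = sum(1 for r in others if r >= high)
        average = sum(others) / len(others) if others else 0.0
        # Negate both so that larger values sort first.
        return (-high_count, -average)

    return sorted(names, key=sort_key)


# Hypothetical pairwise coefficients for four attributes.
corr = {
    ("a", "b"): 0.90, ("a", "c"): 0.80, ("a", "d"): 0.10,
    ("b", "c"): 0.20, ("b", "d"): 0.85, ("c", "d"): 0.30,
}
ranking = rank_attributes(["a", "b", "c", "d"], corr)
```

In this data, `a` and `b` each have two coefficients at or above 0.75, and `b` is placed first because its average coefficient (0.65) exceeds that of `a` (0.60).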

S102, selecting the representative feature attribute set according to the feature selection calculation result based on the correlation coefficient.

The top k feature attributes, as ordered by the sorting criteria, are taken as the initial feature attribute set. The correlation coefficients between this feature attribute set and the remaining feature attributes, as well as the correlation coefficients among the k feature attributes themselves, can then be calculated.

The correlation of the attribute set means that its correlation coefficient increases as the correlation coefficient between a feature attribute and the attribute set containing k feature attributes increases, and decreases as that correlation coefficient decreases.

A correlation coefficient is then calculated between each feature attribute in the feature attribute set and the feature attribute set, and each feature attribute is deleted or retained according to the result.
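The two averages used in this step can be sketched as follows, assuming the pairwise coefficients are already available as a dictionary keyed by unordered attribute pairs. The helper name `average_correlations`, the sample coefficients, and the choice k = 2 are hypothetical; defining the inter-attribute average as 0.0 for a one-element set is also an assumption.

```python
from itertools import combinations


def average_correlations(candidate, selected, corr):
    """Return (r_kf, r_kk) for one candidate attribute and a selected set.

    r_kf: mean coefficient between the candidate and each selected attribute.
    r_kk: mean coefficient over all pairs inside the selected set.
    """

    def rho(a, b):
        return corr[(a, b)] if (a, b) in corr else corr[(b, a)]

    r_kf = sum(rho(candidate, s) for s in selected) / len(selected)
    pairs = list(combinations(selected, 2))
    # Assumption: a single-attribute set has no internal pairs, so r_kk = 0.
    r_kk = sum(rho(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return r_kf, r_kk


corr = {
    ("a", "b"): 0.90, ("a", "c"): 0.80, ("a", "d"): 0.10,
    ("b", "c"): 0.20, ("b", "d"): 0.85, ("c", "d"): 0.30,
}
ranked = ["b", "a", "c", "d"]      # hypothetical output of the sorting step
initial_set = ranked[:2]           # top k attributes with k = 2
r_kf, r_kk = average_correlations("c", initial_set, corr)
```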

Define the average correlation coefficient between a feature attribute and the feature attribute set (the selected feature attribute set capable of representing the system operating state) as r_{kf}, and the average correlation coefficient between different feature attributes as r_{kk}. The correlation coefficient that measures the correlation with the feature attribute set is then calculated as:

R(f_{k}, F) = \frac{k\, r_{kf}}{\sqrt{k + k(k-1)\, r_{kk}}} .
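The formula for R can be evaluated directly. In this sketch k is taken to be the number of feature attributes in the set, as the phrase "attribute set containing k feature attributes" suggests; the function name `set_correlation` and the sample values are hypothetical.

```python
import math


def set_correlation(k, r_kf, r_kk):
    """Evaluate R = k * r_kf / sqrt(k + k * (k - 1) * r_kk)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    denominator_sq = k + k * (k - 1) * r_kk
    if denominator_sq <= 0:
        raise ValueError("k + k(k-1) r_kk must be positive")
    return k * r_kf / math.sqrt(denominator_sq)


uncorrelated_set = set_correlation(4, 0.5, 0.0)   # members independent of each other
redundant_set = set_correlation(4, 0.5, 1.0)      # members fully correlated
```

With the same r_kf, the value falls from 1.0 to 0.5 as r_kk rises from 0 to 1, which shows how correlation among the members of the set lowers R.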

In the feature selection method based on the correlation coefficient, the correlation coefficient between a feature attribute group (a single feature attribute can also serve as a feature attribute group) and the feature attribute set is calculated according to the above formula. If the correlation coefficient is greater than or equal to a threshold set by the user, the feature attribute is deleted; if it is smaller than the threshold set by the user, the feature attribute is added. A feature attribute group whose correlation coefficient is extremely low, below a threshold set by the user, is ignored because it has no correlation with the feature attribute set.
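The delete, add, and ignore outcomes can be sketched as a single decision function. The text does not say whether the "extremely low" cutoff is the same user-set threshold as the deletion threshold, so this sketch assumes two separate thresholds; the names `decide`, `redundancy_threshold`, and `ignore_threshold`, and the default of 0.05, are hypothetical (0.95 reuses the earlier example figure).

```python
def decide(r_value, redundancy_threshold=0.95, ignore_threshold=0.05):
    """Map a set correlation coefficient to one of three actions."""
    if r_value >= redundancy_threshold:
        return "delete"   # at or above the threshold: treated as redundant
    if r_value < ignore_threshold:
        return "ignore"   # extremely low: no correlation with the set
    return "add"          # below the redundancy threshold: kept in the set
```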

In addition, the search over feature attributes in this step may use a best-first search strategy to guide the selection of the feature attribute subset. Best-first search is a heuristic search strategy that always expands the candidate predicted to lie closest to the best solution; this particular type of search is called greedy best-first search.
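A generic best-first search over attribute subsets can be sketched as below. This is not the patent's specific procedure: the scoring function is a caller-supplied placeholder (in this setting it would be derived from the correlation coefficient R), and the stopping rule of a fixed number of non-improving expansions, the name `best_first_search`, and the toy weights are all assumptions.

```python
import heapq
import itertools


def best_first_search(attributes, score, start=(), max_stale=5):
    """Expand the highest-scoring known subset first; return the best found."""
    start = frozenset(start)
    counter = itertools.count()  # tie-breaker so the heap never compares sets
    best, best_score = start, score(start)
    frontier = [(-best_score, next(counter), start)]  # max-heap via negation
    visited = {start}
    stale = 0  # consecutive expansions that did not improve the best score
    while frontier and stale < max_stale:
        _, _, subset = heapq.heappop(frontier)
        improved = False
        for attr in attributes:
            child = subset ^ {attr}  # add attr if absent, remove it if present
            if child in visited:
                continue
            visited.add(child)
            child_score = score(child)
            heapq.heappush(frontier, (-child_score, next(counter), child))
            if child_score > best_score:
                best, best_score = child, child_score
                improved = True
        stale = 0 if improved else stale + 1
    return best, best_score


# Toy score: each attribute carries a hypothetical usefulness weight.
weights = {"a": 3, "b": 2, "c": -1}
best, best_score = best_first_search(
    list(weights), lambda subset: sum(weights[name] for name in subset)
)
```

The priority queue is what distinguishes best-first search from plain hill climbing: when the most promising branch stops improving, the search resumes from the next best subset already on the queue.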

S103, verifying the reliability of the selected representative feature attribute set through cross validation.

Leave-one-out cross validation is performed on the selected feature attribute subset: a single feature attribute in the feature attribute set is used as the validation data, and the remaining feature attributes are used as the training data. This step is repeated until every feature attribute has served as the validation data once. This is equivalent to k-fold cross validation with k equal to the number of feature attributes in the set.

The leave-one-out cross validation determines whether the output representative feature attribute subset satisfies the sorting rules of S101. The representative feature attribute set is used to represent the operating state of the system.
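The leave-one-out splitting itself can be sketched as a generator. Only the partitioning is shown; what is computed on each split is left to the caller. The name `leave_one_out` and the attribute names are hypothetical.

```python
def leave_one_out(items):
    """Yield (held_out, training) pairs, holding out each item exactly once."""
    for index, held_out in enumerate(items):
        training = items[:index] + items[index + 1:]
        yield held_out, training


splits = list(leave_one_out(["voltage", "rssi", "hop_count"]))
```

A set of n feature attributes produces n splits, which is why the procedure matches k-fold cross validation with k = n.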

Many sensor network diagnostic tools include a diagnostic data collection stage as part of network diagnosis. The representative feature attribute subset selection of this embodiment can be integrated into the diagnostic data collection stage of other diagnostic tools as a fusible component, reducing the amount of diagnostic information collected and therefore the network overhead.

Correspondingly, the embodiment of the invention provides a sensing network anomaly detection device based on feature selection. As shown in fig. 3, the apparatus includes: the device comprises a sorting module, a selecting module and a verifying module. Wherein:

The sorting module further comprises: a preprocessing submodule, a first operation submodule and a sorting submodule. Wherein:

the first operation submodule is connected with the preprocessing submodule and used for calculating correlation coefficients among different characteristic attributes by using the attribute values after preprocessing;

and the sorting submodule is connected with the first operation submodule and is used for sorting the characteristic attributes according to the correlation coefficient and a preset sorting criterion.

The selecting module further comprises: a second operation submodule and a feature selection submodule. Wherein:

the second operation submodule is connected with the sorting submodule and used for taking the sorted front k characteristic attributes as an initial characteristic attribute set and calculating a correlation coefficient between the initial characteristic attribute set and the rest characteristic attributes and a correlation coefficient between the k characteristic attributes;

and the feature selection submodule is connected with the second operation submodule and the verification module and is used for calculating a correlation coefficient between each feature attribute in the feature attribute set and the feature attribute set, and deleting or adding each feature attribute according to the result.

The first operation submodule and the second operation submodule calculate the correlation coefficient between different feature attributes as:

\rho(f_{i}, f_{j}) = \frac{\mathrm{cov}(f_{i}, f_{j})}{\sqrt{\mathrm{var}(f_{i})\,\mathrm{var}(f_{j})}};

The feature selection submodule calculates the correlation coefficient between each feature attribute in the feature attribute set and the feature attribute set according to the formula:

R(f_{k}, F) = \frac{k\, r_{kf}}{\sqrt{k + k(k-1)\, r_{kk}}};

where r_{kf} represents the average correlation coefficient between a feature attribute and the feature attribute set; r_{kk} represents the average correlation coefficient between different feature attributes; and F denotes the currently selected feature attribute set.
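The way the three modules hand data to one another can be sketched as a small pipeline. This shows only the wiring: each module is a caller-supplied function, and the class name `AnomalyDetectionDevice`, its field names, and the stub functions in the usage lines are hypothetical placeholders rather than the device's actual logic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Samples = Dict[str, List[float]]


@dataclass
class AnomalyDetectionDevice:
    sort: Callable[[Samples], List[str]]               # sorting module
    select: Callable[[Samples, List[str]], List[str]]  # selection module
    verify: Callable[[Samples, List[str]], bool]       # verification module

    def run(self, samples: Samples) -> Tuple[List[str], bool]:
        """Sort, then select, then verify, passing each result onward."""
        ranked = self.sort(samples)
        chosen = self.select(samples, ranked)
        return chosen, self.verify(samples, chosen)


# Stub modules used only to show the data flow.
device = AnomalyDetectionDevice(
    sort=lambda samples: sorted(samples),
    select=lambda samples, ranked: ranked[:2],
    verify=lambda samples, chosen: len(chosen) > 0,
)
result = device.run({"b": [1.0], "a": [2.0], "c": [3.0]})
```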

By adopting the technical scheme of the invention, the data anomaly detection in the sensor network is realized, and the high-efficiency anomaly detection capability is realized; a feature attribute subset selection mechanism is provided, and the correctness of anomaly detection is ensured; the anomaly detection based on feature selection has the characteristics of simplicity, high efficiency and easiness in implementation; the sensor network diagnostic tool can also be integrated with other sensor network diagnostic tools as a fusible component.

The above description is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A sensing network anomaly detection method based on feature selection is characterized by comprising the following steps:

verifying the reliability of the selected representative characteristic attribute set through cross validation, wherein the representative characteristic attribute set is used for representing the running state of the system;

wherein selecting the representative feature attribute set according to the feature selection calculation result based on the correlation coefficient further comprises:

calculating a correlation coefficient between each characteristic attribute in the characteristic attribute set and the characteristic attribute set, and deleting or adding each characteristic attribute according to a result;

wherein the calculation formula for calculating the correlation coefficient between each feature attribute in the feature attribute set and the feature attribute set is as follows:

R(f_{k}, F) = \frac{k\, r_{kf}}{\sqrt{k + k(k-1)\, r_{kk}}};

wherein r_{kf} represents the average correlation coefficient between a feature attribute and the feature attribute set; r_{kk} represents the average correlation coefficient between different feature attributes; F represents the currently selected feature attribute set;

wherein the sorting the collected feature attributes according to a criterion based on a correlation coefficient specifically comprises:

a feature attribute f_i that has more high correlation coefficients is ranked higher; or, if two or more feature attributes have the same high correlation coefficient value, the feature attribute with the larger average correlation coefficient is ranked higher;

wherein the deleting or adding each feature attribute according to the result specifically comprises:

and if the correlation coefficient is larger than or equal to the threshold set by the user, deleting the characteristic attribute, and if the correlation coefficient is smaller than the threshold set by the user, adding the characteristic attribute.

2. The method of claim 1, wherein sorting the collected feature attributes according to a correlation coefficient based criterion further comprises:

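3. The method of claim 2, wherein the correlation coefficient between different feature attributes is \rho(f_{i}, f_{j}) = \frac{\mathrm{cov}(f_{i}, f_{j})}{\sqrt{\mathrm{var}(f_{i})\,\mathrm{var}(f_{j})}}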

4. A sensing network anomaly detection device based on feature selection is characterized by comprising:

a verification module for verifying the reliability of the selected representative feature attribute set by cross-validation;

wherein, the selecting module further comprises:

a feature selection submodule for calculating, using the following formula, a correlation coefficient between each feature attribute within the feature attribute set and the feature attribute set:

R(f_{k}, F) = \frac{k\, r_{kf}}{\sqrt{k + k(k-1)\, r_{kk}}};

wherein r_{kf} represents the average correlation coefficient between a feature attribute and the feature attribute set; r_{kk} represents the average correlation coefficient between different feature attributes; F represents the currently selected feature attribute set; and for deleting or adding each feature attribute according to the result;

wherein the sorting module is specifically configured to rank a feature attribute f_i higher when it has more high correlation coefficients; or, if two or more feature attributes have the same high correlation coefficient value, to rank the feature attribute with the larger average correlation coefficient higher;

the feature selection submodule is specifically configured to delete the feature attribute if the correlation coefficient is greater than or equal to a threshold set by a user, and add the feature attribute if the correlation coefficient is less than the threshold set by the user.

5. The apparatus of claim 4, wherein the ranking module further comprises: