CN107423435B - Multi-level anomaly detection method for multi-dimensional space-time data - Google Patents

Multi-level anomaly detection method for multi-dimensional space-time data

Info

Publication number
CN107423435B
Authority
CN
China
Prior art keywords
data
abnormal
attribute
correlation
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710660034.7A
Other languages
Chinese (zh)
Other versions
CN107423435A (en)
Inventor
陈爱国
罗光春
田玲
卢国明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201710660034.7A
Publication of CN107423435A
Application granted
Publication of CN107423435B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases

Abstract

The invention relates to a multi-level anomaly detection method for multi-dimensional space-time data, comprising the following steps: A. providing a sensing data layer, a sinking layer and a gateway layer; B. obtaining a related attribute set, an associated data matrix M of the related attribute data, and a correlation coefficient matrix R from historical event anomaly data measured in advance; C. at the sensing data layer, judging whether the data acquired by each sensor in the current period is abnormal according to the fluctuation difference between adjacent data of the same sensor, and transmitting the result to the corresponding node of the sinking layer; D. at the sinking layer, marking whether the data of each node is abnormal according to the proportion of abnormal data in the detection results received by that node; E. at the gateway layer, extracting an abnormal attribute set according to the anomaly detection results of the sinking layer. The calculation is simple: abnormal events are found quickly by measuring the fluctuation of the data, and a subsequent correlation check across the attributes of the data improves the accuracy of anomaly detection.

Description

Multi-level anomaly detection method for multi-dimensional space-time data
Technical Field
The invention relates to data mining and abnormal data analysis, in particular to a multi-level abnormal detection method of multi-dimensional space-time data.
Background
A sensor is a device commonly used to sense attribute data describing the state of an environment; it is inexpensive, small, and requires no manual maintenance. Users deploy a sensor network in an environment in order to monitor its various states. The monitored environment is usually not static, and its changes are ultimately reflected in the values the sensors read. For a given sensor, however, a change in readings usually stems from one of two factors:
(1) An event in the environment changes the environmental value. Different events can change the value of an environmental attribute, for example a fire raising the temperature or rainfall affecting the humidity of the monitored environment. When an event pushes an attribute value outside its normal range, the result is abnormal data caused by the event (event anomaly data). Such changes genuinely reflect the environment, and the data the sensor collects is correct and meaningful.
(2) The sensor is subject to disturbances that change its readings. There are two causes. The first is external: sensors are generally deployed directly in a real physical environment, which is complex and changeable, so they are easily subject to direct external interference such as noise and dust, as well as deliberate human interference, and the collected data therefore contains errors. The second is internal: a sensor has a limited service life, and the accuracy of its data acquisition degrades to varying degrees with longer use or continued exposure to wind, sun and rain. As a result, part of the data a sensor collects is error data, which is useless and does not represent the state of the environment.
It is therefore very important to run anomaly detection on the data and to distinguish correct data, event anomaly data, and error data. Current sensor data anomaly detection falls into two categories, detection based on Euclidean distance and detection based on the spatio-temporal attributes of the data, and both have shortcomings when applied to multi-dimensional correlated data:
(1) Euclidean distance-based anomaly detection
The main idea is to partition the collected data into classes with a clustering method and then find outliers according to the number of data points in each class. Each data point is assigned to the class whose center point is nearest in Euclidean distance. The Euclidean distance, however, does not account for the correlation between attributes when the collected data is multi-dimensional.
In practice, a system monitoring an environment collects several kinds of data, such as temperature, humidity, illumination intensity and atmospheric pressure. A complete record contains several such attributes, and the attributes are correlated; temperature, for instance, affects gas pressure. Consider monitoring a greenhouse, where temperature and pressure data inside the greenhouse are collected. By the ideal gas law PV = nRT, gas pressure is proportional to temperature at constant volume. Suppose a large number of temperature and pressure sensors are deployed and yield the data pairs <T, P>: <26,110.5>, <26,110.6>, <26,110.4>, <26,110.3>, <24,110.4>, and suppose the center point <25,100> has been selected. The squared Euclidean distances of the pairs from the center point are 111.25, 113.36, 109.16, 107.09 and 109.16 respectively. These differ little, so all the pairs are assigned to the same class. Yet <24,110.4> clearly differs from the other pairs: when the pressure is around 110 the temperature should be about 26, so this pair is error data produced by interference, and Euclidean-distance outlier detection fails to flag it.
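The arithmetic of this example can be checked with a short sketch. The function name `squared_distance` is a hypothetical helper introduced here for illustration; the point of the sketch is only that the anomalous pair <24,110.4> lands at exactly the same distance from the center as the normal pair <26,110.4>, so distance alone cannot separate them.

```python
import math

def squared_distance(point, center):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((p - c) ** 2 for p, c in zip(point, center))

# <T, P> pairs and center point taken from the greenhouse example.
center = (25.0, 100.0)
samples = [(26, 110.5), (26, 110.6), (26, 110.4), (26, 110.3), (24, 110.4)]

squared = [round(squared_distance(s, center), 2) for s in samples]
distances = [round(math.sqrt(v), 2) for v in squared]

for s, sq, dist in zip(samples, squared, distances):
    print(s, sq, dist)
```

Whether squared or plain distances are compared, the spread across the five pairs is small, which is why a distance-based clustering step places them all in one class.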
(2) Detection based on data spatiotemporal attributes
The main idea is as follows. When a sensor network monitors an area, a large number of sensors of the same type are deployed, and because communication range is limited, they are placed close to one another. At the same time, the physical environment changes continuously rather than abruptly. Consequently, for a single sensor, the data collected consecutively within one acquisition period should be similar; and because several sensors of the same type within a given range all sample that range, the data they collect at the same moment should also be similar. Applied to the temperature and pressure pairs <26,110.5>, <26,110.6>, <26,110.4>, <26,110.3>, <24,110.4>, this similarity criterion flags <24,110.4> as abnormal. However, detection based on spatio-temporal correlation is effective only when abnormal data occurs very rarely. In practice, if one sensor is disturbed, the abnormal data within a sampling period may differ only slightly from normal data, or may even make up the majority of the samples; detection based on temporal correlation then fails and may treat the abnormal data as normal. Likewise, sensors over a large area may be deliberately jammed by an adversary, so that geographically adjacent sensors all report erroneous data, which detection based on spatial correlation cannot catch. Finally, spatio-temporal checks operate on single-attribute data and ignore the correlation between attributes.
Disclosure of Invention
The invention provides a multi-level anomaly detection method of multi-dimensional space-time data, which can more accurately detect anomalous data by detecting the correlation between attributes of the anomalous data, so that the method can be suitable for a system with high real-time requirement.
The invention relates to a multi-level anomaly detection method of multi-dimensional space-time data, which comprises the following steps:
A. providing a sensing data layer, which includes sensors of various types and collects data for the various attributes;
a sinking layer, which collects the data uploaded by sensors of the same type, each type of sensor corresponding to one node of the sinking layer;
a gateway layer, which collects the data uploaded by all nodes of the sinking layer;
B. mining, with a data mining technique, the attributes that are correlated in historical event anomaly data measured in advance, to obtain a related attribute set; for each element of the related attribute set, calculating the mean value of each attribute, and obtaining from these means an associated data matrix M of the related attribute data and a correlation coefficient matrix R;
C. performing, at the sensing data layer, anomaly detection based on temporal correlation for each sensor: acquiring the data of each sensor, calculating the fluctuation difference between adjacent data of the same sensor, and determining from the fluctuation differences whether the data acquired by the sensor in the current period is abnormal; transmitting the data and detection results of all sensors to the corresponding nodes of the sinking layer;
D. collecting, at each node of the sinking layer, the data and detection results uploaded by sensors of the same type, performing anomaly detection and data fusion based on spatial correlation, and uploading the fusion result and the anomaly detection result to the gateway layer: calculating the proportion of data flagged abnormal in step C among all data received by the node, and marking the data of the node as abnormal or normal by comparing that proportion with a threshold; for example, if the proportion of abnormal data is higher than a threshold c (c is the spatial-correlation anomaly threshold and takes any value in [0.7, 0.9]), abnormal data dominates and the data of the node is marked "abnormal"; if the proportion is lower than the threshold c, normal data dominates and the data of the node is marked "normal";
E. collecting, at the gateway layer, the data and anomaly detection results uploaded by all nodes of the sinking layer, and extracting an abnormal attribute set according to the anomaly detection results.
The method first mines the correlated attributes among the multi-dimensional attributes with a data mining technique to obtain a related attribute set, then performs anomaly detection on each kind of data collected by the sensors according to the spatio-temporal correlation of the data, and finally performs correlation-based anomaly detection across attributes using the results of the spatio-temporal detection.
Specifically, the step B includes:
B1. collecting historical event abnormal data sets, extracting abnormal attributes and values of the abnormal attributes in each piece of event abnormal data, and forming an abnormal attribute set;
B2. according to the abnormal attribute set, all related attributes in the abnormal attribute set are mined through an association analysis algorithm (such as an Apriori algorithm) in the data mining technology to form an associated attribute set.
B3. And for each associated attribute, finding out a corresponding numerical value in the event abnormal data set in the step B1 according to the category of the attribute to form an associated data matrix M.
B4. And calculating a correlation coefficient matrix R of the correlation data matrix M according to the numerical value of each correlation attribute.
Further, in step B3, a mean value of the corresponding values found from the event anomaly data set is calculated, and the associated data matrix M is obtained through the mean value calculation.
Specifically, the step E includes:
E1. collecting attribute data and anomaly detection results uploaded by all nodes of a sinking layer, and constructing an attribute data set and an anomaly detection result;
E2. extracting corresponding abnormal data in the data set according to the attributes in the abnormal detection result to form an abnormal data set;
E3. if the abnormal data set is empty, all the data collected in the step C are correct data; if the abnormal data set is not empty, judging whether the attribute corresponding to the formed abnormal data set is in the related attribute set in the step B, if not, judging that the data corresponding to the attribute is error data; if yes, calculating a correlation coefficient matrix corresponding to the abnormal data set;
E4. performing a similarity calculation between this correlation coefficient matrix and the correlation coefficient matrix of the corresponding attributes calculated from historical data: each row of the two matrices is regarded as a vector, and the cosine of the angle between each pair of corresponding row vectors is calculated; if every cosine value lies within a preset range (for example [d, 1], where d is a set cosine threshold taking any value in [0.7, 0.9]), the two correlation matrices are similar and the data collected in step C is judged to be event anomaly data; if at least one cosine value lies outside the preset range, the two matrices are not similar and the data collected in step C is judged to be error data.
The multi-level anomaly detection method of the multi-dimensional space-time data has a simple calculation process, can quickly find out an abnormal event in time by calculating the fluctuation of the data, and then carries out correlation detection on the attributes of the data, thereby effectively improving the detection accuracy of the abnormal data.
The present invention will be described in further detail with reference to the following examples. This should not be understood as limiting the scope of the above-described subject matter of the present invention to the following examples. Various substitutions and alterations according to the general knowledge and conventional practice in the art are intended to be included within the scope of the present invention without departing from the technical spirit of the present invention as described above.
Drawings
FIG. 1 is a flow chart of the multi-level anomaly detection method for multi-dimensional space-time data according to the present invention.
Detailed Description
As shown in FIG. 1, the multi-level anomaly detection method for multi-dimensional spatio-temporal data of the present invention uses the following notation. The attribute set $A = \{A_1, A_2, A_3, \ldots, A_n\}$ represents the n detection attributes. Each piece of detection data contains the values of all attributes and is represented by a data set $D = \{D_1, D_2, D_3, \ldots, D_n\}$, where $D_i$ is the value of attribute $A_i$. Each data set D corresponds to an abnormal mark set $E = \{E_1, E_2, E_3, \ldots, E_n\}$, where $E_i$ is the abnormal state of data $D_i$. There are 3 states, namely normal data, event anomaly data and error data, represented by the integers 0, 1 and 2 respectively. If any one item in a data set D is error data, the data set D is error data; if D contains event anomaly data but no error data, it is event anomaly data; if it contains only normal data, it is normal data.
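The rule for deriving the state of a whole data set D from its per-attribute marks can be sketched as follows. The constant and function names are hypothetical and chosen only for illustration; the integer codes 0, 1 and 2 are those given in the text.

```python
# State codes from the description: 0 normal, 1 event anomaly, 2 error.
NORMAL, EVENT_ANOMALY, ERROR = 0, 1, 2

def record_status(marks):
    """Overall state of one data set D given its mark set E = {E_1, ..., E_n}."""
    if ERROR in marks:
        return ERROR            # any error item makes the whole record error data
    if EVENT_ANOMALY in marks:
        return EVENT_ANOMALY    # event anomaly present, and no error item
    return NORMAL               # only normal items

examples = {
    "all normal": [0, 0, 0],
    "event only": [0, 1, 0],
    "event and error": [1, 0, 2],
}
for name, marks in examples.items():
    print(name, record_status(marks))
```

With this particular integer encoding the rule coincides with taking the maximum of the marks, although the explicit checks make the precedence easier to read.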
The detection method comprises the following steps:
A. providing a sensing data layer, which includes sensors of various types and collects data for the various attributes;
a sinking layer, which collects the data uploaded by sensors of the same type, each type of sensor corresponding to one node of the sinking layer;
a gateway layer, which collects the data uploaded by all nodes of the sinking layer;
B1. collecting historical event anomaly data sets, and extracting from each piece of event anomaly data the abnormal attributes and their values to form an abnormal attribute set;
B2. mining, from the abnormal attribute set, all related attributes with the Apriori association analysis algorithm to form a related attribute set;
B3. for each related attribute, finding the corresponding values in the event anomaly data set of step B1 according to the category of the attribute, and forming the associated data matrix M from the mean values of those values;
B4. calculating the correlation coefficient matrix R of the associated data matrix M from the values of each related attribute.
Steps B1-B4 are explained taking 5 attributes as an example. A complete data set $D = \{D_1, D_2, D_3, D_4, D_5\}$ contains 5 data items, one for each of the 5 attributes, and corresponds to an abnormal set $E = \{E_1, E_2, E_3, E_4, E_5\}$. First, the event anomaly data is found in the historical data, and an abnormal attribute set is extracted according to the abnormal set. For example, if $E_1$, $E_3$ and $E_5$ in an abnormal set indicate an event anomaly, the abnormal attribute set $\{A_1, A_3, A_5\}$ can be extracted. By analogy, after all event anomaly data has been processed, an attribute set A can be formed, with the basic form:
$$A = \{\{A_1, A_3, A_5\}, \{A_4, A_5\}, \{A_1, A_3\}, \{A_i, A_j, A_k, \ldots\}, \ldots\}$$
Each individual item in the attribute set A is an object (a transaction, in association-rule terms), and the attributes within it are data items. Using the Apriori association-rule mining algorithm with a minimum support minsup (minsup takes any value from 70% to 90%), all frequent itemsets in the attribute set A can be obtained. The frequent itemsets are the related attribute sets, and together they form a related attribute database DB_RA.
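A minimal sketch of the frequent-itemset step follows, assuming a plain level-wise Apriori-style search written from scratch rather than any particular library. The function name `frequent_itemsets` and the sample history are hypothetical; only the minsup range and the role of the result (the related attribute database DB_RA) come from the text.

```python
from itertools import combinations

def frequent_itemsets(transactions, minsup):
    """Level-wise (Apriori-style) search for itemsets with support >= minsup."""
    transactions = [frozenset(t) for t in transactions]
    total = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / total

    items = sorted({a for t in transactions for a in t})
    level = [frozenset([a]) for a in items if support(frozenset([a])) >= minsup]
    result = {}
    k = 1
    while level:
        for itemset in level:
            result[itemset] = support(itemset)
        k += 1
        previous = set(level)
        # Join frequent (k-1)-itemsets, then prune candidates that contain an
        # infrequent (k-1)-subset (the Apriori property).
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = [c for c in candidates
                 if all(frozenset(s) in previous for s in combinations(c, k - 1))
                 and support(c) >= minsup]
    return result

# Made-up abnormal attribute sets standing in for the attribute set A.
history = [{"A1", "A3", "A5"}, {"A4", "A5"}, {"A1", "A3"}, {"A1", "A3", "A5"}]
db_ra = frequent_itemsets(history, minsup=0.7)
for itemset, sup in db_ra.items():
    print(sorted(itemset), sup)
```

On this toy history only {A1}, {A3}, {A5} and {A1, A3} reach 70% support; a real deployment would likely keep just the multi-attribute itemsets as related attribute sets, which is a design choice the text does not spell out.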
The related attribute sets have the same form as the attribute set. For any related attribute set $\{A_i, A_j, A_k, \ldots\}$, the correlation coefficient matrix of the set is calculated. Suppose a related attribute set has n related attributes $(A_1, A_2, A_3, \ldots, A_n)$ and that m pieces of the historical event anomaly data contain these n related attributes. The basic steps are as follows:
1. Extract the corresponding data from the event anomaly data according to the related attributes. The n related attribute values extracted from the first piece of data are $\{x_{11}, x_{12}, x_{13}, \ldots, x_{1n}\}$, those from the second piece are $\{x_{21}, x_{22}, x_{23}, \ldots, x_{2n}\}$, and those from the m-th piece are $\{x_{m1}, x_{m2}, x_{m3}, \ldots, x_{mn}\}$.
2. Construct the associated data matrix M from all the extracted data:
$$M = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix}$$
Each row of the associated data matrix M represents one sample, and each column holds the m values of one attribute. The i-th column holds the m values of attribute $A_i$ and is denoted $X_i = [x_{1i}, x_{2i}, x_{3i}, \ldots, x_{mi}]$.
3. Calculate the correlation coefficient matrix R:
$$R = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nn} \end{pmatrix}$$
where
$$r_{ij} = \frac{\operatorname{cov}(X_i, X_j)}{\sqrt{D(X_i)}\,\sqrt{D(X_j)}}$$
represents the degree of correlation between attributes $A_i$ and $A_j$. $D(X_i)$ is the variance of the vector $X_i$, and $\operatorname{cov}(X_i, X_j)$ is the covariance of $X_i$ and $X_j$, calculated as follows:
$$\operatorname{cov}(X_i, X_j) = E\big((X_i - E(X_i))(X_j - E(X_j))\big)$$
where E(X) denotes the expectation of a set of vector data. Each related attribute set thus yields one correlation coefficient matrix R, and all the matrices R together form a correlation coefficient matrix database DB_RM.
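The calculation of R from M can be sketched in plain Python, assuming the standard correlation coefficient (covariance divided by the product of the standard deviations) and population-form expectations. The helper names and the sample matrix are hypothetical, and the sketch assumes no attribute column is constant, since a zero variance would make the ratio undefined.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    """cov(X, Y) = E((X - E(X)) (Y - E(Y))), using population expectations."""
    mx, my = mean(x), mean(y)
    return mean([(a - mx) * (b - my) for a, b in zip(x, y)])

def correlation_matrix(M):
    """M has m rows (samples) and n columns (attributes); returns n x n R."""
    columns = list(zip(*M))           # column i is the vector X_i
    n = len(columns)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # D(X) = cov(X, X); assumes neither column is constant.
            denom = math.sqrt(covariance(columns[i], columns[i]) *
                              covariance(columns[j], columns[j]))
            R[i][j] = covariance(columns[i], columns[j]) / denom
    return R

# Made-up samples: column 2 rises with column 1, column 3 falls with it.
M = [[20.0, 100.0, 5.0],
     [22.0, 108.0, 4.0],
     [24.0, 116.0, 3.0]]
R = correlation_matrix(M)
for row in R:
    print([round(v, 3) for v in row])
```

One such matrix would be computed and stored per related attribute set, keyed by that set, to play the role of DB_RM.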
C. Anomaly detection based on temporal correlation is performed for each sensor at the sensing data layer: the data of each sensor is acquired, the fluctuation difference between adjacent data of the same sensor is calculated, and whether the data acquired by the sensor in the current period is abnormal is determined from the fluctuation differences; the data and detection results of all sensors are then transmitted to the corresponding nodes of the sinking layer. The specific process is as follows:
Suppose a sensor has collected n data in one period, $d_1, d_2, \ldots, d_n$. Anomaly detection based on the temporal correlation of the data proceeds as follows:
1. Initialize the fluctuation count to count = 0, set the fluctuation threshold b = n/3, and set the amplitude threshold a.
2. Calculate the difference between $d_i$ and $d_{i+1}$ (i < n): diff = $|d_i - d_{i+1}|$.
3. If diff > a, increment the fluctuation count: count = count + 1.
4. Advance i to the next pair; if i < n, return to step 2 and continue, otherwise go to step 5.
5. If count > b, the anomaly detection flag e of the sensing data layer is set to 1, indicating abnormal data; otherwise e is set to 0, indicating normal data.
6. Fuse the n data into a single value d with a data fusion algorithm (for example, averaging).
7. Combine the fused data d, the anomaly detection flag e, the sensor label $S_i$, and the start and end times $t_{begin}$ and $t_{end}$ of the acquisition period into a data tuple $T_1 = \{S_i, d, e, t_{begin}, t_{end}\}$, and upload $T_1$ to the sinking layer node corresponding to $S_i$.
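The seven steps above can be sketched as a single function per sensor period. The function names, the dictionary layout of $T_1$, and the sample readings are assumptions made for illustration; the thresholds a and b = n/3 and the averaging fusion follow the text.

```python
def temporal_check(readings, amplitude_threshold):
    """Return (d, e): fused value and temporal anomaly flag for one period."""
    n = len(readings)
    fluctuation_threshold = n / 3                 # b = n/3 (step 1)
    count = 0
    for previous, current in zip(readings, readings[1:]):
        diff = abs(previous - current)            # step 2
        if diff > amplitude_threshold:            # step 3
            count += 1
    e = 1 if count > fluctuation_threshold else 0 # step 5
    d = sum(readings) / n                         # step 6, fusion by averaging
    return d, e

def build_t1(sensor_id, readings, amplitude_threshold, t_begin, t_end):
    """Assemble the tuple T1 = {S_i, d, e, t_begin, t_end} (step 7)."""
    d, e = temporal_check(readings, amplitude_threshold)
    return {"S": sensor_id, "d": d, "e": e, "t_begin": t_begin, "t_end": t_end}

# Made-up temperature readings for two sensors over the same period.
steady = build_t1("S1", [26.0, 26.1, 26.0, 26.1, 26.0, 26.1], 1.0, 0, 60)
noisy = build_t1("S2", [26.0, 30.0, 25.0, 31.0, 24.0, 30.0], 1.0, 0, 60)
print(steady)
print(noisy)
```

Because the check counts large jumps rather than comparing against an absolute range, a slow drift caused by a real event would not trip it, while erratic readings would; the choice of the amplitude threshold a is left to the deployment.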
D. Each node of the sinking layer collects the data and detection results uploaded by sensors of the same type, performs anomaly detection and data fusion based on spatial correlation, and uploads the fusion result and the anomaly detection result to the gateway layer: the proportion of data flagged abnormal in step C among all data received by the node is calculated, and the data of the node is marked abnormal or normal by comparing that proportion with a threshold. If the proportion of abnormal data is higher than the threshold c (c is the spatial-correlation anomaly threshold and takes any value in [0.7, 0.9]), abnormal data dominates and the data of the node is marked "abnormal"; if the proportion is lower than c, normal data dominates and the data of the node is marked "normal". Specifically:
Each sinking layer node is responsible for collecting the data tuples $T_1$ of one attribute. A sinking layer node is labeled $SK_i$, denoting the node that collects the data of the i-th attribute. Anomaly detection is then carried out, with result $E_i$, where i denotes the i-th attribute. For each node, anomaly detection and data fusion based on spatial correlation proceed as follows:
1. Each node collects the data tuples $T_1$ uploaded by the corresponding sensors of the sensing data layer.
2. According to the time tags $t_{begin}$ and $t_{end}$, select the data tuples $T_1$ that share the same time tags, and extract the data d and the anomaly detection flag e from each.
3. Construct a data set $\{d_1, d_2, d_3, \ldots, d_i\}$ from the data d and an abnormal result set $\{e_1, e_2, e_3, \ldots, e_i\}$, where $d_i$ is the data of sensor $S_i$ and $e_i$ is the time-correlation-based anomaly detection result uploaded by sensor $S_i$.
4. In the abnormal result set $\{e_1, e_2, e_3, \ldots, e_i\}$, count the number count1 of results marked normal and the number count2 of results marked abnormal.
5. Calculate the proportion p of abnormal data: p = count2/(count1 + count2).
6. If p is greater than the threshold c (c takes any value in [0.7, 0.9]), set the spatial-correlation anomaly detection flag $E_i$ to 1; otherwise set it to 0.
7. If $E_i = 1$, select from the data set $\{d_1, d_2, d_3, \ldots, d_i\}$ the data marked abnormal to construct the data set $X_i$; if $E_i = 0$, select the data marked normal to construct $X_i$. $X_i = [x_{1i}, x_{2i}, x_{3i}, \ldots, x_{mi}]$.
8. Combine the data set $X_i$, the anomaly detection result $E_i$, the node label $SK_i$, and the start and end times $t_{begin}$ and $t_{end}$ from the tuples $T_1$ into a data tuple $T_2 = \{SK_i, X_i, E_i, t_{begin}, t_{end}\}$, and upload $T_2$ to the gateway layer.
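A minimal sketch of the sinking layer node logic follows, assuming the incoming $T_1$ tuples are dictionaries with keys "d" and "e" and have already been filtered to one time window. The function name `spatial_check`, the dictionary layout, and the sample values are hypothetical; the proportion test and the selection of $X_i$ follow steps 4 to 7.

```python
def spatial_check(t1_records, c=0.8):
    """Return (X, E, p) for one sinking layer node and one time window.

    t1_records: non-empty list of dicts with fused value "d" and flag "e".
    c: spatial-correlation threshold, any value in [0.7, 0.9] per the text.
    """
    flags = [r["e"] for r in t1_records]
    count2 = sum(1 for e in flags if e == 1)      # results marked abnormal
    count1 = len(flags) - count2                  # results marked normal
    p = count2 / (count1 + count2)
    E = 1 if p > c else 0
    # Keep only the readings whose own flag agrees with the node-level verdict.
    X = [r["d"] for r in t1_records if r["e"] == E]
    return X, E, p

# Made-up windows: in the first, four of five sensors report an anomaly.
mostly_abnormal = [{"d": 41.0, "e": 1}, {"d": 40.5, "e": 1}, {"d": 42.0, "e": 1},
                   {"d": 41.5, "e": 1}, {"d": 26.0, "e": 0}]
mostly_normal = [{"d": 26.0, "e": 0}, {"d": 26.1, "e": 0}, {"d": 25.9, "e": 0},
                 {"d": 26.2, "e": 0}, {"d": 39.0, "e": 1}]

X1, E1, p1 = spatial_check(mostly_abnormal, c=0.7)
X2, E2, p2 = spatial_check(mostly_normal, c=0.7)
print(X1, E1, p1)
print(X2, E2, p2)
```

The returned X and E, together with the node label and the window times, would then be packed into $T_2$ for upload.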
E1. The gateway layer collects attribute data and abnormal detection results uploaded by all nodes of the sinking layer, and constructs an attribute data set and an abnormal detection result;
E2. extracting corresponding abnormal data in the data set according to the attributes in the abnormal detection result to form an abnormal data set;
E3. if the abnormal data set is empty, all the data collected in step C is correct data; if the abnormal data set is not empty, judging whether the attributes corresponding to the abnormal data set are in the related attribute set of step B; if not, judging the data corresponding to those attributes to be error data; if so, calculating the correlation coefficient matrix corresponding to the abnormal data set;
E4. performing a similarity calculation between this correlation coefficient matrix and the correlation coefficient matrix of the corresponding attributes calculated from historical data: each row of the two matrices is regarded as a vector, and the cosine of the angle between each pair of corresponding row vectors is calculated; if every cosine value lies within the preset range [d, 1], the two correlation matrices are similar and the data collected in step C is judged to be event anomaly data; if at least one cosine value lies outside the preset range, the two matrices are not similar and the data collected in step C is judged to be error data. Here d is a set cosine threshold taking any value in [0.7, 0.9].
The specific steps are as follows:
1. Collect the $T_2$-type data tuples uploaded by all sinking layer nodes.
2. According to the time tags $t_{begin}$ and $t_{end}$, extract a data set $D = \{X_1, X_2, X_3, \ldots\}$ and an abnormal set $E = \{E_1, E_2, E_3, \ldots\}$. The two correspond: the anomaly detection result of the i-th element of D is $E_i$.
3. If $E_i$ indicates an anomaly, the corresponding attribute $A_i$ is an abnormal attribute. From the abnormal set E, an abnormal attribute set $EA = \{A_i, A_j, A_k, \ldots\}$ can be extracted.
4. If the abnormal attribute set EA is empty, judge the data set D to be a normal data set and go to step 9. If EA is not empty, check whether it is in the related attribute database DB_RA: if not, judge the data set D to be error data, delete it, and go to step 10; if it is, go to step 5.
5. According to the abnormal attribute set EA, extract from the data set D the data vector corresponding to each abnormal attribute to form an abnormal data matrix $ED = \{X_i, X_j, X_k, \ldots\}$.
6. Calculate the correlation coefficient matrix R' of the abnormal data matrix ED, where n is the number of related attributes:
$$R' = \begin{pmatrix} r'_{11} & r'_{12} & \cdots & r'_{1n} \\ r'_{21} & r'_{22} & \cdots & r'_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r'_{n1} & r'_{n2} & \cdots & r'_{nn} \end{pmatrix}$$
7. Find the corresponding correlation coefficient matrix R in the correlation coefficient matrix database DB_RM and, for each row, calculate the cosine of the angle between that row of R and the corresponding row of R'. Writing $R_i$ and $R'_i$ for the i-th rows of R and R':
$$\cos A_i = \frac{R_i \cdot R'_i}{\lVert R_i \rVert\,\lVert R'_i \rVert}$$
This yields the cosine set $\{\cos A_i, \cos A_j, \cos A_k, \ldots\}$.
8. Compare each value in the cosine set with the cosine threshold d. If any cosine value is less than d, the two matrices are not similar: judge the data set D to be error data and delete it. If all cosine values are greater than or equal to d (that is, within [d, 1]), the two matrices are similar: judge the data set D to be event anomaly data.
9. Fuse each vector $X_i$ into a single attribute value $D_i$, finally obtaining $D = \{D_1, D_2, D_3, \ldots\}$.
10. End.
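The gateway layer decision can be sketched as follows, using the rule stated in step E4: the data is event anomaly data only if every row-wise cosine lies in [d, 1]. The function names, the use of a `frozenset` as the lookup key, and the sample matrices are assumptions made for illustration; the sketch also assumes the rows of the historical and the new matrix are listed in the same attribute order.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def classify(abnormal_attrs, R_new, db_ra, db_rm, d=0.8):
    """Return "normal", "error" or "event" for one time window."""
    key = frozenset(abnormal_attrs)
    if not key:
        return "normal"                 # EA is empty
    if key not in db_ra:
        return "error"                  # EA is not a known related attribute set
    R_hist = db_rm[key]
    cosines = [cosine(h, r) for h, r in zip(R_hist, R_new)]
    # Similar only if every row-wise cosine lies in [d, 1].
    return "event" if all(c >= d for c in cosines) else "error"

# Hypothetical stand-ins for DB_RA and DB_RM with one related attribute set.
pair = frozenset({"A1", "A3"})
db_ra = {pair}
db_rm = {pair: [[1.0, 0.9], [0.9, 1.0]]}

similar = classify({"A1", "A3"}, [[1.0, 0.85], [0.85, 1.0]], db_ra, db_rm)
dissimilar = classify({"A1", "A3"}, [[1.0, -0.9], [-0.9, 1.0]], db_ra, db_rm)
unknown = classify({"A2"}, [[1.0]], db_ra, db_rm)
nothing = classify(set(), [], db_ra, db_rm)
print(similar, dissimilar, unknown, nothing)
```

In the second call the attributes move in opposite directions, unlike the historical events, so the row cosines fall far below d and the window is treated as error data.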

Claims (3)

1. The multi-level anomaly detection method of the multi-dimensional space-time data is characterized by comprising the following steps:
A. providing a sensing data layer, which includes sensors of various types and collects data for the various attributes;
a sinking layer, which collects the data uploaded by sensors of the same type, each type of sensor corresponding to one node of the sinking layer;
a gateway layer, which collects the data uploaded by all nodes of the sinking layer;
B. mining, with a data mining technique, the attributes that are correlated in historical event anomaly data measured in advance, to obtain a related attribute set; for each element of the related attribute set, calculating the mean value of each attribute, and obtaining from these means an associated data matrix M of the related attribute data and a correlation coefficient matrix R;
C. performing, at the sensing data layer, anomaly detection based on temporal correlation for each sensor: acquiring the data of each sensor, calculating the fluctuation difference between adjacent data of the same sensor, and determining from the fluctuation differences whether the data acquired by the sensor in a period is abnormal; transmitting the data and detection results of all sensors to the corresponding nodes of the sinking layer;
D. collecting, at each node of the sinking layer, the data and detection results uploaded by sensors of the same type, performing anomaly detection and data fusion based on spatial correlation, and uploading the fusion result and the anomaly detection result to the gateway layer: calculating the proportion of data flagged abnormal in step C among all data received by the node, and marking whether the data of the node is abnormal by comparing that proportion with a threshold;
E. the gateway layer collects data and abnormal detection results uploaded by all nodes of the sinking layer, and extracts an abnormal attribute set according to the abnormal detection results:
E1. collecting attribute data and anomaly detection results uploaded by all nodes of a sinking layer, and constructing an attribute data set and an anomaly detection result;
E2. extracting corresponding abnormal data in the data set according to the attributes in the abnormal detection result to form an abnormal data set;
E3. if the abnormal data set is empty, all the data collected in the step C are correct data; if the abnormal data set is not empty, judging whether the attribute corresponding to the formed abnormal data set is in the related attribute set in the step B, if not, judging that the data corresponding to the attribute is error data; if yes, calculating a correlation coefficient matrix corresponding to the abnormal data set;
E4. and C, similarity calculation is carried out on the correlation coefficient matrix and the correlation coefficient matrix of the corresponding attribute calculated through historical data, each row of the two matrixes is regarded as a vector, cosine values of each vector included angle are calculated, if each cosine value is within a preset range, the two relation matrixes are similar, the data acquired in the step C is judged to be event abnormal data, if at least one cosine value is not within the preset range, the two relation matrixes are not similar, and the data acquired in the step C is judged to be error data.
2. The multi-level anomaly detection method for multi-dimensional space-time data according to claim 1, characterized in that step B comprises:
B1. collecting historical event anomaly data sets, and extracting the abnormal attributes and their values in each piece of event anomaly data to form an abnormal attribute set;
B2. mining, from the abnormal attribute set, all related attributes by means of a correlation analysis algorithm from data mining, to form a related attribute set;
B3. for each related attribute, finding the corresponding values in the event anomaly data set of step B1 according to the category of the attribute, to form a related data matrix M;
B4. calculating the correlation coefficient matrix R of the related data matrix M from the values of each related attribute.
3. The multi-level anomaly detection method for multi-dimensional space-time data according to claim 2, characterized in that: in step B3, the mean of the corresponding values found in the event anomaly data set is calculated, and the related data matrix M is obtained from the means.
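The lower-layer checks of steps C and D in claim 1 can be illustrated with a short Python sketch. The claims do not fix the exact decision rules, so the following are assumptions made only for illustration: the fluctuation difference is taken as the absolute difference between adjacent readings, a period is flagged when any such difference exceeds a hypothetical `max_jump`, and a node is flagged when the share of flagged sensors reaches a hypothetical `ratio_threshold`. Data fusion is not shown.

```python
from typing import Sequence


def sensor_period_is_abnormal(readings: Sequence[float], max_jump: float) -> bool:
    """Step C sketch: compare adjacent readings of one sensor.

    Assumed rule: the period is abnormal if any absolute difference
    between consecutive readings exceeds max_jump.
    """
    diffs = [abs(later - earlier) for earlier, later in zip(readings, readings[1:])]
    return any(diff > max_jump for diff in diffs)


def node_is_abnormal(sensor_flags: Sequence[bool], ratio_threshold: float) -> bool:
    """Step D sketch: share of flagged sensors among all sensors of a node.

    Assumed rule: the node's data is abnormal if the proportion of
    sensors flagged in step C is at least ratio_threshold.
    """
    if not sensor_flags:
        return False  # no data received, nothing to flag
    ratio = sum(sensor_flags) / len(sensor_flags)
    return ratio >= ratio_threshold
```

In this sketch each sensor reports one flag per period, and the sinking-layer node for that sensor type combines the flags of all its sensors before uploading the result to the gateway layer.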
CN201710660034.7A 2017-08-04 2017-08-04 Multi-level anomaly detection method for multi-dimensional space-time data Active CN107423435B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710660034.7A CN107423435B (en) 2017-08-04 2017-08-04 Multi-level anomaly detection method for multi-dimensional space-time data

Publications (2)

Publication Number Publication Date
CN107423435A CN107423435A (en) 2017-12-01
CN107423435B (en) 2020-05-12

Family

ID=60436486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710660034.7A Active CN107423435B (en) 2017-08-04 2017-08-04 Multi-level anomaly detection method for multi-dimensional space-time data

Country Status (1)

Country Link
CN (1) CN107423435B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241043B (en) * 2018-08-13 2022-10-14 蜜小蜂智慧(北京)科技有限公司 Data quality detection method and device
CN109697247B (en) * 2018-12-30 2021-05-18 北京奇艺世纪科技有限公司 Method and device for detecting data accuracy
CN110662220B (en) * 2019-11-15 2021-04-30 江南大学 Wireless sensor network anomaly detection method based on time-space correlation and information entropy
CN111189488B (en) * 2019-12-13 2020-12-04 精英数智科技股份有限公司 Sensor value abnormity identification method, device, equipment and storage medium
CN111044176A (en) * 2020-01-02 2020-04-21 中电投电力工程有限公司 Method for monitoring temperature abnormity of generator
CN112506913B (en) * 2021-02-02 2021-07-09 广东工业大学 Big data architecture construction method for manufacturing industry data space
CN116756136B (en) * 2023-08-16 2023-10-31 深圳市明心数智科技有限公司 Automatic data processing method, device, equipment and medium for fishpond monitoring equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7693589B2 (en) * 2007-08-14 2010-04-06 International Business Machines Corporation Anomaly anti-pattern
CN103887886A (en) * 2014-04-14 2014-06-25 杭州昊美科技有限公司 Power network detection system and method based on sensor network
CN104994535A (en) * 2015-06-04 2015-10-21 浙江农林大学 Sensor data flow abnormality detection method based on multidimensional data model
CN105764162A (en) * 2016-05-10 2016-07-13 江苏大学 Wireless sensor network abnormal event detecting method based on multi-attribute correlation
CN106502234A (en) * 2016-10-17 2017-03-15 重庆邮电大学 Industrial control system method for detecting abnormality based on double skeleton patterns

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
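Design and Development of an Internet of Things Gateway and Research on Data Anomaly Detection; Long Ying (龙滢); China Master's Theses Full-text Database (中国优秀硕士学位论文全文数据库); 2015-08-15 (No. 08, 2015); pp. I136-189 *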

Also Published As

Publication number Publication date
CN107423435A (en) 2017-12-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant