CN109639463A

CN109639463A - Method for determining neighbor relations of Internet of Things monitoring points

Info

Publication number: CN109639463A
Application number: CN201811407765.1A
Authority: CN
Inventors: 李永飞; 田立勤; 赵巧芳; 陈振国; 郭晓欣; 王德志; 王养廷
Original assignee: North China Institute of Science and Technology
Current assignee: North China Institute of Science and Technology
Priority date: 2018-11-23
Filing date: 2018-11-23
Publication date: 2019-04-16

Abstract

A method for determining neighbor relations of Internet of Things (IoT) monitoring points. The method first reads the historical monitoring data of each monitoring point within a set time window to obtain a set of monitoring data sequences. It then clusters the sequences in this set with several clustering algorithms, each of which produces multiple clustering results by varying the number of clusters. Next, the silhouette coefficient of every clustering result is calculated, and the result with the largest silhouette coefficient is taken as the optimal result. Finally, the neighbor relations of the monitoring points are determined from the optimal result. The invention is based on historical monitoring data and uses clustering to determine logical neighbor relations from the inherent similarity between the data of the monitoring points. Experimental results show that the neighbor relations determined by this method are stable and agree better with objective reality than those of conventional methods, and that they can provide a more scientific and reasonable basis for validity checking of IoT monitoring data and for other data processing.

Description

Method for determining neighbor relations of Internet of Things monitoring points

Technical field

The present invention relates to a clustering-based method for analyzing and determining neighbor relations of Internet of Things (IoT) monitoring points, and belongs to the technical field of data mining and IoT monitoring.

Background technique

In current IoT monitoring systems, large amounts of invalid or abnormal data are common, owing to failures of sensing devices and transmission networks and even to deliberate human interference. In real-time air quality monitoring systems, for example, about 0.95% to 3.18% of the data are abnormal in one way or another. Such abnormal data reduce overall data availability, so data validity checking is required. When judging whether data are abnormal and when correcting abnormal data, it is usually necessary to refer to the corresponding monitored values of neighboring monitoring points. For example, when abnormal data are found, the abnormal value is corrected using the average (general handling) or the maximum (punitive handling) of the monitoring data of the neighboring monitoring points. Determining the neighbor relations of IoT monitoring points is therefore a basic problem that must be solved in handling abnormal IoT monitoring data.

Existing IoT monitoring data processing systems usually take the administrative region to which a monitoring point belongs, or its geographic location, as the basis for judging neighbor relations. This approach is intuitive and simple to implement. However, many administrative regions have very irregular shapes, so some monitoring points are geographically too far from the other nodes in the same adjacent region, and their monitored values have little reference value for abnormal data detection and abnormal value correction. In addition, the monitored objects are complex and changeable. As a result, existing methods cannot meet practical needs well, and a more scientific and reasonable determination method is needed.

Summary of the invention

The object of the invention is to address the shortcomings of the prior art by providing a method for determining neighbor relations of IoT monitoring points, which gives a more scientific and reasonable basis for validity checking of IoT monitoring data.

The problem is solved by the following technical solution:

A method for determining neighbor relations of IoT monitoring points, in which the historical monitoring data of each monitoring point within a set time window are first read to obtain a monitoring data sequence set; the monitoring data sequences in the set are then clustered with several clustering algorithms, each algorithm producing multiple clustering results by varying the number of clusters; the silhouette coefficient of every clustering result is then calculated, and the clustering result with the largest silhouette coefficient is taken as the optimal result; finally, the neighbor relations of the IoT monitoring points are determined from the optimal result.

The above method for determining neighbor relations of IoT monitoring points comprises the following steps:

A. Extract the monitoring data

First set a time window, then read the historical monitoring data of each monitoring point within the window. Assuming there are K monitoring points and denoting the monitoring data sequence read from the i-th monitoring point by D_i, the monitoring data sequence set D = {D₁, D₂, …, D_K} is obtained;

B. Determine the number of clusters

Set the range of the number of clusters to n₁~n₂, where n₁ and n₂ are natural numbers and n₁ < n₂;

C. Perform cluster analysis

1. Specify a set of clustering algorithms;

2. Set the number of clusters to n₁;

3. Cluster the monitoring data sequences in the sequence set with each clustering algorithm in the specified set in turn;

4. Increase the number of clusters by 1 and repeat step 3 until the number of clusters reaches n₂;

5. Calculate the silhouette coefficient of each clustering result;

D. Determine the neighbor relations

Select the clustering result with the largest silhouette coefficient as the optimal result; monitoring points assigned to the same cluster in the optimal result are neighboring monitoring points of one another.

In the above method, when the monitoring data sequences in the sequence set are clustered, the distance between monitoring data sequences is calculated as follows:

For monitoring data sequences D_i and D_j in the sequence set D = {D₁, D₂, …, D_K}, the distance between them is defined as:

$$d(D_i, D_j) = \left| \sum_{m=1}^{n} \left( D_{im} - D_{jm} \right) \right|$$

where n is the length of the monitoring data sequences, D_im is the m-th dimension of sequence D_i, and D_jm is the m-th dimension of sequence D_j.

In the above method, the silhouette coefficient of a clustering result is calculated as follows:

The silhouette coefficient of the i-th object in the data set is:

$$S(i) = \frac{b_i - a_i}{\max(a_i, b_i)}$$

where a_i is the average distance from the i-th object to the other objects in its own cluster, and b_i is the minimum of the average distances from the i-th object to each of the other clusters;

The silhouette coefficient of the clustering result is the average of the silhouette coefficients of all objects in the data set.

In the above method, when the range of the number of clusters is set, the average of n₁ and n₂ is the number closest to [expression not reproduced in the source text], where K is the number of monitoring points.

The invention is based on historical monitoring data and uses clustering to determine logical neighbor relations from the inherent similarity between the data of the monitoring points. Experimental results show that the neighbor relations determined by this method are stable and easy to interpret, agree better with objective reality than those of conventional methods, and can provide a more scientific and reasonable basis for validity checking of IoT monitoring data and for other data processing.

Brief description of the drawings

The invention is further described below with reference to the accompanying drawings.

Fig. 1 is a flow chart of the invention;

Fig. 2 is a distribution map of the monitoring points.

Detailed description of the embodiments

Basic concepts of neighbor relations of IoT monitoring points

Definition 1 (neighbor relation of monitoring points): an equivalence relation R defined on the set A of IoT monitoring points, that is, a relation satisfying reflexivity, symmetry and transitivity, is called a neighbor relation of the monitoring points.

Definition 2 (adjacent region of a monitoring point): the R-equivalence class [a]_R formed by an IoT monitoring point a on the monitoring point set A is called the adjacent region of monitoring point a.

Definition 3 (neighbor nodes of a monitoring point): the other monitoring points in the set A that belong to the same adjacent region as monitoring point a are called the neighbor nodes of monitoring point a.

Definition 4 (adjacent partition of monitoring points): a partition of the monitoring point set A is called an adjacent partition of the monitoring points.

The following theorem holds for neighbor relations of IoT monitoring points.

Theorem 1: the quotient set A/R of the monitoring point set A with respect to a neighbor relation R is an adjacent partition of A.

Proof: the quotient set A/R is the set of equivalence classes of the neighbor relation R, that is, A/R = {[x]_R | x ∈ A}, where the equivalence class [x]_R = {y ∈ A | (x, y) ∈ R}.

A partition of A is a set {A_i} of non-empty subsets of A satisfying the following conditions: A_i ∩ A_j = ∅ for i ≠ j, and ∪A_i = A.

We now prove that the quotient set A/R is a partition of the monitoring point set A.

First, for every x ∈ A, [x]_R is non-empty, since it contains x by reflexivity;

Second, for all x, y ∈ A, if [x]_R ≠ [y]_R, then [x]_R ∩ [y]_R = ∅;

Finally, for every x ∈ A we have [x]_R ⊆ A, therefore ∪_{x∈A} [x]_R ⊆ A; moreover A = ∪_{x∈A} {x} ⊆ ∪_{x∈A} [x]_R, so ∪_{x∈A} [x]_R = A.

Hence the quotient set A/R is a partition of the monitoring point set A.

By Definition 4, the quotient set A/R of the monitoring point set A with respect to the neighbor relation R is an adjacent partition of A.

Property 1: an adjacent partition of the monitoring points is a set of adjacent regions.

By Theorem 1, the quotient set A/R is an adjacent partition of the monitoring point set A. Since A/R is the set of equivalence classes of the neighbor relation R, an adjacent partition is a set of R-equivalence classes.

By Definition 2, an adjacent region is an R-equivalence class. An adjacent partition of the monitoring points is therefore a set of adjacent regions.

Property 2: each neighbor relation corresponds to exactly one adjacent partition, and each adjacent partition corresponds to exactly one neighbor relation.

By Definition 1, a neighbor relation is an equivalence relation on the monitoring point set A.

By Definition 4, an adjacent partition is a partition of the monitoring point set A.

By the one-to-one correspondence between equivalence relations and partitions, each neighbor relation corresponds to exactly one adjacent partition, and each adjacent partition corresponds to exactly one neighbor relation.

From the above theorem and properties, for a set of IoT monitoring points, once a neighbor relation is given, an adjacent partition is determined, and with it the adjacent region and neighbor nodes of every monitoring point.
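The correspondence between a neighbor relation and an adjacent partition can be illustrated with a minimal Python sketch. The point names, the region labels and the helper names (`partition_from_key`, `relation_from_partition`, `neighbors`) are hypothetical and are not part of the patent; the relation used here is "has the same label", which is an equivalence relation by construction.

```python
from collections import defaultdict
from itertools import product


def partition_from_key(points, key):
    """Group points into equivalence classes: x ~ y iff key(x) == key(y)."""
    classes = defaultdict(set)
    for p in points:
        classes[key(p)].add(p)
    return [frozenset(c) for c in classes.values()]


def relation_from_partition(partition):
    """Recover the relation R as the set of all pairs lying in a common block."""
    return {(x, y) for block in partition for x, y in product(block, repeat=2)}


def neighbors(point, partition):
    """Neighbor nodes of `point`: the other members of its adjacent region."""
    for block in partition:
        if point in block:
            return set(block) - {point}
    raise KeyError(point)


# Hypothetical monitoring points tagged with a made-up region label.
region = {"a": "north", "b": "north", "c": "east", "d": "east", "e": "middle"}
partition = partition_from_key(region, region.get)   # the quotient set A/R
relation = relation_from_partition(partition)        # back to the relation R
```

Here `partition` plays the role of the adjacent partition and each block is an adjacent region; `neighbors("a", partition)` returns the neighbor nodes of point `a` in the sense of Definition 3.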

Definition 5 (administrative neighbor nodes of a monitoring point): the neighbor relation of monitoring points is defined as belonging to the same administrative region. The monitoring points that have this neighbor relation with monitoring point a are called the administrative neighbor nodes of monitoring point a.

For example, the monitoring points in the same prefecture-level administrative region are grouped into one adjacent region. The advantage of this way of determining neighbor relations is that it matches the administrative jurisdiction of the monitoring points and is convenient for management. However, many administrative regions have very irregular shapes, so some monitoring points are geographically too far from the other nodes in the same adjacent region, and their monitored values have little reference value for abnormal data detection and abnormal value correction.

Definition 6 (geographic neighbor nodes of a monitoring point): several geographic center points are selected within the monitoring range, and the neighbor relation of monitoring points is defined as lying within a specified distance of the same geographic center point. The monitoring points that have this neighbor relation with monitoring point a are called the geographic neighbor nodes of monitoring point a.

This way of determining neighbor relations avoids the problems caused by irregularly shaped administrative regions. However, analysis of actual monitoring data shows that monitoring points that are geographically close may have very different monitoring data, while monitoring points that are geographically far apart may have similar data. In air quality monitoring, for example, the influencing factors are numerous and the mechanisms of influence are complex. For some monitoring points that are geographically close, the surrounding air quality differs greatly, so they are not suitable as references for one another.

Definition 7 (physical neighbor nodes of a monitoring point): when some relation that already exists between IoT monitoring points in the real world is used as the neighbor relation, the neighbor nodes determined from it are called the physical neighbor nodes of the monitoring point. The administrative neighbor nodes and geographic neighbor nodes described above are both physical neighbor nodes.

The neighbor relation of physical neighbor nodes is determined from some existing rule and is easy to implement. However, because some neighboring monitoring points have little reference value, processing monitoring data on the basis of such a neighbor relation often gives unreasonable results in practice. This is because the neighbor relation used may not be consistent with the internal associations of the monitored objects, so it cannot accurately reflect the essential characteristics of the monitoring data.

Clustering-based determination of neighbor relations

By Property 2, if a more reasonable adjacent partition can be provided, a better neighbor relation is determined. To overcome the shortcomings of physical neighbor nodes in data validity analysis, the neighbor relations are determined from historical monitoring data, according to the characteristics of the data themselves.

Definition 8 (logical neighbor nodes of a monitoring point): using cluster analysis, the set of monitoring points is divided into a group of adjacent regions based on the characteristics of the historical monitoring data themselves, and the neighbor relation of the monitoring points is then determined from the resulting adjacent partition. The neighbor nodes under this neighbor relation are called the logical neighbor nodes of the monitoring point.

1. Cluster analysis of IoT monitoring data

The basic form of IoT monitoring is to deploy a group of monitoring points within a given area and to install a group of sensors at each monitoring point to collect monitoring data. The resulting monitoring data are usually a group of monitored values stored as time series; the general data format is shown in Table 1. It is assumed here that each monitoring point is equipped with N kinds of sensors and that data are collected at hourly intervals. Each monitoring point then generates one group of monitoring data every hour.

Table 1. Monitoring data format of an IoT monitoring point

Cluster analysis groups samples into clusters according to their similarity; its goal is to maximize sample similarity within clusters and to minimize sample similarity between clusters. Cluster analysis can be used to determine the neighbor relations of monitoring points with respect to a given parameter T. The sequence of monitored values of parameter T is extracted for every monitoring point; this sequence describes the data characteristics of the monitoring point with respect to T. By clustering all of these sequences, all monitoring points are assigned to different clusters, and the resulting clusters are used as the basis for judging the neighbor relations of the monitoring points.

2. Algorithm for determining logical neighbor relations of monitoring points

The algorithm that uses cluster analysis to determine neighbor relations of IoT monitoring points is shown in Fig. 1.

The processing steps are as follows:

(1) Extract the monitoring data. The basic format of the monitoring data is shown in Table 1. Air quality monitoring data are used here as an example to illustrate the extraction. The monitored objects are 8 types of air pollutants, and the data are hourly averages. The format of the atmospheric monitoring data of a monitoring point is shown in Table 2. Taking the determination of neighbor relations with respect to PM2.5 as an example, each monitoring point generates 24 monitored values per day; if n days of historical data are used, 24 × n monitored values describe the data characteristics of that monitoring point with respect to PM2.5. These monitored values form a data sequence.

Table 2. Monitoring data of an air quality monitoring point

Applying the same processing to the data of all monitoring points yields a group of data sequences, one describing each monitoring point.
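The extraction step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the record layout `(point_id, timestamp, parameter_name, value)`, the function name `extract_sequences` and the sample values are all hypothetical, and dropping monitoring points with missing hours is a simplification chosen here, since the source does not say how gaps are handled.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def extract_sequences(records, parameter, start, days):
    """Build one hourly sequence of length 24*days per monitoring point.

    `records` is an iterable of (point_id, timestamp, parameter_name, value).
    Points with any missing hour in the window are dropped (an assumption).
    """
    hours = 24 * days
    end = start + timedelta(hours=hours)
    by_point = defaultdict(dict)
    for point_id, ts, name, value in records:
        if name == parameter and start <= ts < end:
            slot = int((ts - start).total_seconds() // 3600)
            by_point[point_id][slot] = value
    return {
        pid: [slots[h] for h in range(hours)]
        for pid, slots in by_point.items()
        if len(slots) == hours
    }


# Tiny made-up example: two points with one full day of hourly PM2.5 values.
t0 = datetime(2018, 1, 1)
records = [(pid, t0 + timedelta(hours=h), "PM2.5", float(h + 10 * pid))
           for pid in (1, 2) for h in range(24)]
records.append((3, t0, "PM2.5", 5.0))  # point 3 has only one hour: dropped
sequences = extract_sequences(records, "PM2.5", t0, days=1)
```

With `days=n`, each returned sequence has 24 × n values, matching the sequence length described in the text.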

(2) Determine the number of clusters. Determining the number of clusters is a key issue in cluster analysis. The number is generally determined from business requirements or the motivation of the analysis, or an empirical value is used [expression not reproduced in the source text], where K is the total number of objects to be analyzed. Alternatively, cluster analysis can be run with several different numbers of clusters, after which evaluation indices are calculated from the results, or the trend of such indices is analyzed, and a suitable number of clusters is determined accordingly.

(3) Perform cluster analysis. Choosing a suitable clustering algorithm is also a key factor affecting the analysis result. In practical applications, the choice should take into account specifics such as the data type and the purpose of clustering.

(4) Determine the neighbor relations.

Organize the cluster analysis results: monitoring points assigned to the same cluster are neighbor nodes of one another and form an adjacent region. The neighbor relations among all monitoring points can be determined on this basis.

3. Definition of sample distance

The sample distance measures the similarity between samples and serves as the basis of cluster analysis. Traditional distance definitions include the Euclidean distance and the Manhattan distance. To obtain better analysis results, researchers have studied the use of fractional norms, the DTW (Dynamic Time Warping) distance, the edit distance with real penalty and others for measuring sample similarity. In fact, the way a distance is defined depends directly on the characteristics of the objects being clustered and on the goal of the analysis, and it is difficult to find a similarity measure that suits all cluster analyses.

The purpose of clustering the monitoring data in the present invention is to find how close the values of the monitoring data of different monitoring points are. To this end, the sample distance is defined as follows:

Definition 9 (distance between monitoring data sequences): for monitoring data sequences D_i and D_j, the distance between them is defined as:

$$d(D_i, D_j) = \left| \sum_{m=1}^{n} \left( D_{im} - D_{jm} \right) \right|$$

where n is the length of the monitoring data sequences, D_im is the m-th dimension of sequence D_i, and D_jm is the m-th dimension of sequence D_j.

This distance sums the differences between the two data sequences over all corresponding dimensions and then takes the absolute value of the sum.
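Read literally, the distance described above (sum the element-wise differences, then take the absolute value) can be sketched in Python as below. The function names and sample values are illustrative only, and the Manhattan distance is included purely for comparison; it is not the distance the text defines.

```python
def sequence_distance(d_i, d_j):
    """|sum_m (d_i[m] - d_j[m])| for two equal-length sequences."""
    if len(d_i) != len(d_j):
        raise ValueError("sequences must have the same length")
    return abs(sum(x - y for x, y in zip(d_i, d_j)))


def manhattan_distance(d_i, d_j):
    """Sum of |d_i[m] - d_j[m]|, shown only for comparison."""
    return sum(abs(x - y) for x, y in zip(d_i, d_j))


a = [10, 20, 30]
b = [12, 18, 33]   # differences a - b: -2, +2, -3
c = [20, 10, 30]   # differences a - c: -10, +10, 0

print(sequence_distance(a, b), manhattan_distance(a, b))  # 3 7
print(sequence_distance(a, c), manhattan_distance(a, c))  # 0 20
```

Under this reading, differences of opposite sign cancel, so the distance effectively compares the overall level (the totals) of two sequences rather than their hour-by-hour shape: `a` and `c` are at distance 0 even though they differ at two positions.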

4. Selection of the algorithm and the number of clusters based on the silhouette coefficient

In the absence of a reference (ground-truth) clustering, the silhouette coefficient uses the similarity measure between objects in the data set to examine the compactness within clusters and the separation between clusters, and thereby evaluates a clustering result.

Definition 10 (silhouette coefficient): the silhouette coefficient of the i-th object in the data set is:

$$S(i) = \frac{b_i - a_i}{\max(a_i, b_i)}$$

where a_i is the average distance from the i-th object to the other objects in its own cluster, and b_i is the minimum of the average distances from the i-th object to each of the other clusters.

S(i) takes values between -1 and 1. The closer it is to 1, the more compact the cluster containing the i-th object is and the farther it is from the other clusters. A value close to 0 indicates that the clusters are not clearly distinguished, and a value close to -1 indicates that the object has been assigned to the wrong cluster. The average silhouette coefficient of all objects in the data set can be used as an evaluation index of clustering quality.
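As a worked illustration, the three regimes can be checked with made-up values of a_i and b_i, assuming the standard silhouette definition, which matches the a_i and b_i described above. The numbers are illustrative and do not come from the experiments.

```latex
% Well-separated object: small a_i, large b_i
a_i = 2,\; b_i = 8:\quad S(i) = \frac{8 - 2}{\max(2, 8)} = 0.75
% Object on the boundary between two clusters
a_i = 5,\; b_i = 5:\quad S(i) = \frac{5 - 5}{\max(5, 5)} = 0
% Object closer to another cluster than to its own
a_i = 8,\; b_i = 2:\quad S(i) = \frac{2 - 8}{\max(8, 2)} = -0.75
```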

In the monitoring point neighbor relation determination algorithm, the number of clusters must be determined and a suitable clustering algorithm selected. Since the neighbor structure of the IoT monitoring points is unknown in advance, the present invention uses the silhouette coefficient as the basis for both choices. Specifically, cluster analysis is run repeatedly with several clustering algorithms and several numbers of clusters, the silhouette coefficient of each result is calculated, and the result with the largest silhouette coefficient is taken as the final result.
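The selection procedure can be sketched end to end in pure Python. This is a minimal illustration, not the patented implementation: the "set of clustering algorithms" is assumed here to be naive agglomerative clustering under three linkage rules, the distance follows the textual description (absolute value of the summed differences), singleton clusters are given silhouette 0 by a common convention, n₁ is assumed to be at least 2, and the sequences and point names are made up.

```python
from itertools import combinations


def seq_dist(x, y):
    """Distance as described in the text: |sum of element-wise differences|."""
    return abs(sum(a - b for a, b in zip(x, y)))


def agglomerative(dist, k, linkage):
    """Naive bottom-up clustering on a distance matrix until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        # Merge the pair of clusters with the smallest linkage distance.
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: linkage([dist[i][j]
                                    for i in clusters[pq[0]]
                                    for j in clusters[pq[1]]]),
        )
        clusters[p] += clusters.pop(q)  # p < q, so index p stays valid
    return clusters


def silhouette(dist, clusters):
    """Mean silhouette over all objects (singletons score 0 by convention)."""
    scores = []
    for c in clusters:
        for i in c:
            if len(c) == 1:
                scores.append(0.0)
                continue
            a = sum(dist[i][j] for j in c if j != i) / (len(c) - 1)
            b = min(sum(dist[i][j] for j in o) / len(o)
                    for o in clusters if o is not c)
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Assumed algorithm set: three linkage rules for the same agglomerative scheme.
LINKAGES = {
    "single": min,
    "complete": max,
    "average": lambda d: sum(d) / len(d),
}


def best_clustering(sequences, n1, n2, linkages=LINKAGES):
    """Try every (linkage, k) pair; keep the result with the largest silhouette."""
    ids = list(sequences)
    dist = [[seq_dist(sequences[u], sequences[v]) for v in ids] for u in ids]
    best = None
    for name, linkage in linkages.items():
        for k in range(n1, n2 + 1):  # requires n1 >= 2
            clusters = agglomerative(dist, k, linkage)
            score = silhouette(dist, clusters)
            if best is None or score > best[0]:
                best = (score, name, k, [[ids[i] for i in c] for c in clusters])
    return best


# Made-up sequences for six hypothetical monitoring points.
sequences = {
    "p1": [10, 11, 12], "p2": [11, 12, 10], "p3": [12, 10, 12],
    "p4": [50, 52, 51], "p5": [51, 50, 53],
    "p6": [100, 100, 100],
}
score, method, k, regions = best_clustering(sequences, n1=2, n2=4)
```

Each list in `regions` is one adjacent region; points in the same list are logical neighbor nodes of one another. For real data, a library implementation of hierarchical clustering would normally replace the naive `agglomerative` function, which is written for readability rather than speed.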

Experimental results and analysis

Using the monitoring point neighbor relation determination algorithm of the invention, with the hierarchical clustering algorithm provided by the R language, a neighbor relation analysis with respect to PM2.5 was carried out on the monitoring points.

1. Experimental data

The experimental data consist of 30 days of PM2.5 monitoring data from 28 monitoring points around Beijing; Fig. 2 shows the locations of these monitoring points. The monitoring points surround Beijing roughly evenly. Their geographic settings include both plains and mountainous areas and cover both industrially developed areas and agricultural production areas. The monitoring points were numbered at random from 1 to 28 and are marked accordingly in Fig. 2. The PM2.5 monitoring data of the 28 monitoring points over the 30 days were extracted from the raw monitoring data as the experimental data set.

2. Experimental results

Using the hierarchical clustering algorithm, with the inter-cluster distance measure set in turn to complete, average, simple, Ward, median, mcquitty and others, and the number of clusters set to 3~6, cluster analysis was performed on the experimental data set. Table 3 lists the silhouette coefficient of each clustering result.

Table 3. Silhouette coefficients of the cluster analyses

It can be seen that the clustering is best when the average distance (average) is used and the number of clusters K is 5. Tables 4 to 7 give the best clustering result for K = 3 to 6 respectively.

Table 4. Clustering result for K=3, method=complete

Table 5. Clustering result for K=4, method=complete

Table 6. Clustering result for K=5, method=average

Table 7. Clustering result for K=6, method=ward

3. Results of the comparison methods

For comparison, Tables 8 and 9 give the results of determining administrative neighbor relations and geographic neighbor relations respectively.

In determining administrative neighbor relations, the 28 monitoring points were divided into three regions, northern, eastern and central, according to the administrative region they belong to.

Table 8. Analysis results of administrative neighbor relations

To determine geographic neighbor nodes, 5 geographic center points were specified over the whole coverage area following the principle of uniform distribution, and all monitoring points were then divided into 5 different adjacent regions according to geographic distance.
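One plausible way to carry out such a geographic division is to assign every monitoring point to its nearest center point, as in the Python sketch below. The nearest-center rule, the haversine distance, the helper names and all coordinates are assumptions made for illustration; the source does not state how the geographic distance was computed or how points were assigned.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def geographic_partition(points, centers):
    """Assign every monitoring point to its nearest geographic centre."""
    regions = {name: [] for name in centers}
    for pid, coord in points.items():
        nearest = min(centers, key=lambda c: haversine_km(coord, centers[c]))
        regions[nearest].append(pid)
    return regions


# Made-up coordinates, for illustration only.
centers = {"north": (41.0, 116.0), "south": (39.0, 116.0)}
points = {1: (40.9, 115.8), 2: (39.2, 116.3), 3: (40.8, 116.4)}
regions = geographic_partition(points, centers)
```

Assigning each point to exactly one center guarantees that the result is a partition, and therefore a valid neighbor relation in the sense of Definition 1, which a pure distance threshold with overlapping circles would not guarantee.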

Table 9. Analysis results of geographic neighbor relations

4. Analysis of the experimental results

The experimental results show that when cluster analysis is used to determine neighbor relations, the silhouette coefficients of the results obtained with the various algorithms are all greater than 0.5, which indicates that both the within-cluster compactness and the between-cluster separation are reasonable.

When the number of clusters is 3, the three clusters A, B and C contain 11, 9 and 8 monitoring points respectively, so the cluster sizes are fairly balanced. When the number of clusters is 4, cluster A in Table 4 is split into two clusters, with monitoring points 13 and 14 forming a separate cluster, while the other two clusters remain unchanged. When the number of clusters is 5, monitoring points 9 and 26 are separated into an independent cluster, and the other clusters remain essentially unchanged. When the number of clusters is 6, clusters B and C in Table 6 are divided into three clusters, and the other clusters remain unchanged. The cluster names A~F serve only as labels to distinguish the clusters and carry no judgment of quality. It can be seen that the cluster sizes in the clustering results are fairly balanced, that the distinction between clusters becomes finer as the number of clusters increases, and that the composition of the clusters remains logically consistent.

Comparing the cluster analysis results with the physical neighbor relation results, class A in the 3-class logical neighbor result overlaps considerably with the northern region in the administrative neighbor result, and class A in the 5-class logical neighbor result also overlaps considerably with class A in the geographic neighbor result. This is because the monitoring points in the northern administrative region and in geographic class A are located in the Bashang Grassland and Taihang Mountain areas, where the degree of industrialization is generally low, so the air quality at these monitoring points is relatively good. These monitoring points are therefore also neighbors in the logical sense, which explains the large overlap.

For the other adjacent regions, the experimental results differ considerably from the physical neighbor relation results. The silhouette coefficients of both physical neighbor relation results are around 0.1, which indicates that their cluster division is not reasonable; this is consistent with the earlier analysis.

Cluster analysis divides unlabeled samples into clusters based on similarity, according to the intrinsic features of the data, and objectively reflects the regularities implied by the data themselves. By extracting the monitoring data sequences of a specific parameter, the present invention applies a hierarchical clustering algorithm to a group of air quality monitoring points. The experimental results show that the neighbor relations determined from the cluster analysis results are stable and can be explained reasonably in the light of actual conditions. Compared with the traditional practice of determining neighbor relations from administrative regions or geographic locations, they agree better with objective reality and can provide a more scientific and reasonable basis for validity checking of IoT monitoring data and for other data processing.

Claims

1. A method for determining neighbor relations of Internet of Things monitoring points, characterized in that the method first reads the historical monitoring data of each monitoring point within a set time window to obtain a monitoring data sequence set; the monitoring data sequences in the set are then clustered with several clustering algorithms, each algorithm producing multiple clustering results by varying the number of clusters; the silhouette coefficient of every clustering result is then calculated, and the clustering result with the largest silhouette coefficient is taken as the optimal result; finally, the neighbor relations of the Internet of Things monitoring points are determined from the optimal result.

2. The method for determining neighbor relations of Internet of Things monitoring points according to claim 1, characterized in that the method comprises the following steps:

A. Extract the monitoring data

First set a time window, then read the historical monitoring data of each monitoring point within the window. Assuming there are K monitoring points and denoting the monitoring data sequence read from the i-th monitoring point by D_i, the monitoring data sequence set D = {D₁, D₂, …, D_K} is obtained;

B. Determine the number of clusters

Set the range of the number of clusters to n₁~n₂, where n₁ and n₂ are natural numbers and n₁ < n₂;

C. Perform cluster analysis

1. Specify a set of clustering algorithms;

2. Set the number of clusters to n₁;

3. Cluster the monitoring data sequences in the sequence set with each clustering algorithm in the specified set in turn;

4. Increase the number of clusters by 1 and repeat step 3 until the number of clusters reaches n₂;

5. Calculate the silhouette coefficient of each clustering result;

D. Determine the neighbor relations

Select the clustering result with the largest silhouette coefficient as the optimal result; monitoring points assigned to the same cluster in the optimal result are neighboring monitoring points of one another.

3. The method for determining neighbor relations of Internet of Things monitoring points according to claim 1 or 2, characterized in that, when the monitoring data sequences in the monitoring data sequence set are clustered, the distance between monitoring data sequences is calculated as follows:

For monitoring data sequences D_i and D_j in the sequence set D = {D₁, D₂, …, D_K}, the distance between them is defined as:

$$d(D_i, D_j) = \left| \sum_{m=1}^{n} \left( D_{im} - D_{jm} \right) \right|$$

where n is the length of the monitoring data sequences, D_im is the m-th dimension of sequence D_i, and D_jm is the m-th dimension of sequence D_j.

4. The method for determining neighbor relations of Internet of Things monitoring points according to claim 3, characterized in that the silhouette coefficient of a clustering result is calculated as follows:

The silhouette coefficient of the i-th object in the data set is:

$$S(i) = \frac{b_i - a_i}{\max(a_i, b_i)}$$

where a_i is the average distance from the i-th object to the other objects in its own cluster, and b_i is the minimum of the average distances from the i-th object to each of the other clusters;

5. The method for determining neighbor relations of Internet of Things monitoring points according to claim 4, characterized in that, when the range of the number of clusters is set, the average of n₁ and n₂ is the number closest to [expression not reproduced in the source text], where K is the number of monitoring points.