CN106055689A - Spatial clustering method based on time sequence correlation - Google Patents
Spatial clustering method based on time sequence correlation
- Publication number
- CN106055689A (application CN201610404636.1A)
- Authority
- CN
- China
- Prior art keywords
- cluster
- result
- spatial
- clustering
- spatial point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2474—Sequence data queries, e.g. querying versioned data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Fuzzy Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a spatial clustering method based on time sequence correlation. The method comprises the steps of: 1, selecting a set of spatial points to be clustered; 2, according to geographical relationships of the spatial points, carrying out first-time clustering, and clustering the spatial points belonging to the same geographical relationship into one category; 3, determining a time interval T of time sequence data, which is used in the process of carrying out second-time clustering, obtaining a data value of each spatial point in the time interval T, and forming a time sequence; 4, according to clustering results obtained in the step 2 and the time sequence obtained in the step 3, calculating time sequence correlation between any two spatial points in the same category; and 5, for each clustering result in the step 2, combining the time sequence correlation obtained in the step 4 to carry out second-time clustering on each clustering result so as to form a final clustering result. According to the spatial clustering method disclosed by the invention, two-step clustering is used in the spatial object clustering process, and consideration on the characteristics of time sequence correlation between the objects is added, so that the clustering result is more accurate and has greater practical significance.
Description
Technical field
The invention belongs to the application of big data and data mining to spatial analysis, and specifically relates to a spatial clustering method based on time-series correlation.
Background art
Clustering is an important part of the data mining field and of its analysis methods. With the wide application of big data and data mining, cluster analysis, a method commonly used in data analysis, has been explored extensively and has been applied effectively in many fields such as image processing, bioinformatics, spatial databases and artificial intelligence.
The main idea of clustering is to group data objects with high similarity into one cluster, while data objects in different clusters have little or no similarity: objects are similar within a cluster and dissimilar between clusters. In cluster analysis, measuring the similarity between data objects is therefore the key step, and the quality of a clustering result depends on the similarity measure the method uses and on whether the method uncovers more hidden patterns.
Usually, the method for common cluster generally uses method for measuring similarity based on distance.Containing of distance
Justice is relatively wide, as long as being that the function of four conditions meeting distance definition all can be as calculating the range formula of similarity, and these four
Condition is uniqueness, nonnegativity, symmetry and triangle inequality respectively.Conventional distance calculating method specifically include that European away from
From, mahalanobis distance, manhatton distance and Chebyshev's distance.Euclidean distance is the distance of a usual employing, is mainly described in
The natural length of two points and actual distance in space;Mahalanobis distance is intended to indicate that the covariance distance of data, mahalanobis distance
Unlike Euclidean distance, it mainly considers the relation between the various characteristic of sample;Manhatton distance be then a kind of for
The metric form of degree of geometrical quantity space, it designates the summation of two points absolute wheelbase in coordinate system;And Chebyshev away from
From being a kind of metric form in vector space, its main thought be by two points between distance definition be its each coordinate values
The maximum of difference.In clustering method based on distance, during more typically clustering algorithm specifically includes that k-means clustering algorithm, k-
Heart point clustering algorithm, coagulation type hierarchical clustering algorithm and disintegrated type hierarchical clustering algorithm etc..
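As an illustration of three of the distance measures named above, the following is a minimal Python sketch. The function names and the sample points are hypothetical and are not part of the patent; the Mahalanobis distance is left out because it additionally needs a covariance matrix estimated from the data.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def chebyshev(p, q):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(p, q))

p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q), manhattan(p, q), chebyshev(p, q))  # 5.0 7.0 4.0
```

All three functions satisfy the four distance conditions listed above, which is why any of them could be plugged into a distance-based clustering algorithm.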
However, for objects that have different spatial positions and also carry time-series features, traditional clustering methods are limited and cannot obtain a better clustering result.
Summary of the invention
In view of some practical characteristics of such objects, the object of the invention is to provide a spatial clustering method based on time-series correlation. When clustering spatial objects, the method uses two-step clustering and takes the time-series correlation between the objects into account, so that the clustering result is more accurate and has greater practical significance.
Specifically, the technical solution of the invention is as follows:
A spatial clustering method based on time-series correlation, comprising the following steps:
1) selecting the set of spatial points to be clustered;
2) carrying out a first clustering according to the geographical relationships of the spatial points, so that spatial points belonging to the same geographical relationship are grouped into one class;
3) for the analysis task, determining the time interval T of the time-series data used in the second clustering, and taking the data values of each spatial point within the time interval T to form a time series;
4) according to the clustering result obtained in step 2) and the time series obtained in step 3), calculating the time-series correlation between any two spatial points in the same class;
5) for each clustering result of step 2), combining the time-series correlation between spatial points obtained in step 4) and carrying out a second clustering on each clustering result by a bottom-up method, to form the final clustering result.
Compared with the prior art, the beneficial effects of the invention are as follows:
When clustering real spatial objects, the invention considers not only their spatial distance but also the time-series correlation between the data objects, so that the result of clustering spatial objects is more realistic and has greater practical research significance.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the method of the invention.
Fig. 2 shows how the cluster ratio varies with distance; the horizontal axis represents distance and the vertical axis represents the cluster ratio.
Detailed description of the invention
To make the above objects, features and advantages of the invention easier to understand, the invention is further described below through a specific embodiment and the accompanying drawings.
The steps of the spatial clustering method based on time-series correlation of this embodiment are shown in Fig. 1 and specifically include the following:
First step: select the set of spatial points to be clustered. The set includes all points within a certain spatial range, and each point carries time-series data over a time period. For example, when clustering China's air quality monitoring stations, the set of spatial points includes all air quality monitoring stations, and each monitoring station carries hourly air quality measurements.
The set of spatial points may be all the spatial points within the spatial range, or the spatial points selected after applying a filtering rule. Such filtering rules include, but are not limited to: the distance lies within a particular value, or another index (such as precipitation) lies within a particular range of values.
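The filtering rules described above can be sketched as follows. This is only an illustration under stated assumptions: the record layout (`lat`, `lon`, `index` keys), the function names and the sample stations are hypothetical, and great-circle (haversine) distance is one possible choice of distance that the patent does not prescribe.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) pairs in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def filter_points(points, center, max_km, index_range=None):
    """Keep points within max_km of center; if index_range=(low, high) is given,
    also require the point's 'index' value to lie inside that closed range."""
    kept = []
    for p in points:
        if haversine_km(center[0], center[1], p["lat"], p["lon"]) > max_km:
            continue
        if index_range is not None:
            low, high = index_range
            if not (low <= p["index"] <= high):
                continue
        kept.append(p)
    return kept

# Hypothetical stations; "index" stands for any extra indicator such as precipitation.
stations = [
    {"name": "a", "lat": 39.9, "lon": 116.4, "index": 5.0},
    {"name": "b", "lat": 40.0, "lon": 116.5, "index": 50.0},
    {"name": "c", "lat": 31.2, "lon": 121.5, "index": 5.0},
]
nearby = filter_points(stations, center=(39.9, 116.4), max_km=100.0)
print([s["name"] for s in nearby])  # ['a', 'b']
```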
Second step: carry out the first clustering according to the geographical relationships of the spatial points, so that spatial points belonging to the same geographical relationship are grouped into one class. The geographical relationship may be an administrative division, such as country, province or city, which can be adjusted according to the situation, for example the extent of the whole space, the temporal density of the data set, or the computing power of the host. It may also be a user-defined region, for example a region segmented along mountain ranges or rivers, or a region divided by urban construction objects such as railways and expressways.
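A minimal sketch of this first clustering is shown below, assuming each spatial point can be mapped to a region label. The function name `first_pass`, the `region_of` callback and the sample records are hypothetical; the label could equally be a user-defined region.

```python
from collections import defaultdict

def first_pass(points, region_of):
    """Group points by the region label that region_of returns for each point."""
    groups = defaultdict(list)
    for p in points:
        groups[region_of(p)].append(p)
    return dict(groups)

# Hypothetical monitoring stations tagged with the city they belong to.
stations = [
    {"name": "s1", "city": "Beijing"},
    {"name": "s2", "city": "Shanghai"},
    {"name": "s3", "city": "Beijing"},
]
classes = first_pass(stations, region_of=lambda p: p["city"])
print({city: [p["name"] for p in pts] for city, pts in classes.items()})
# {'Beijing': ['s1', 's3'], 'Shanghai': ['s2']}
```

Passing the labelling rule as a function keeps the grouping independent of whether the region is an administrative division or a custom partition.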
Third step: for the analysis task, determine the time interval T of the time-series data used in the second clustering, and take the data values of each spatial point within the time interval T to form a time series.
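Forming the time series for one spatial point might look like the sketch below. The half-open interval convention (start included, end excluded), the `(timestamp, value)` record layout and the sample readings are assumptions made for illustration only.

```python
from datetime import datetime

def series_in_interval(readings, start, end):
    """Return the values whose timestamp satisfies start <= ts < end, in time order."""
    ordered = sorted(readings, key=lambda r: r[0])
    return [value for ts, value in ordered if start <= ts < end]

# Hypothetical hourly readings for one station, deliberately out of order.
readings = [
    (datetime(2016, 6, 1, 2), 41.0),
    (datetime(2016, 6, 1, 0), 35.0),
    (datetime(2016, 6, 1, 1), 38.0),
    (datetime(2016, 6, 1, 3), 50.0),
]
T = (datetime(2016, 6, 1, 0), datetime(2016, 6, 1, 3))
series = series_in_interval(readings, *T)
print(series)  # [35.0, 38.0, 41.0]
```

For the correlation computed in the next step, the series of different points need to cover the same moments, so missing readings would have to be handled before comparing two points.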
Fourth step: according to the result of the first clustering and the time series obtained in the third step, calculate the time-series correlation between any two spatial points in the same class.
For example, in this embodiment the first clustering uses administrative divisions: according to the administrative division in which each point lies, the points in the same administrative area are grouped into one cluster. For any two points in each cluster, the Pearson correlation index between the two points is calculated, which is defined as follows:

$$r_{XY} = \frac{\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i-\bar{y})^2}}$$

where $r_{XY}$ ranges from -1 to 1; its sign indicates whether the correlation is positive or negative, and the larger its absolute value, the higher the degree of correlation. $\bar{x}$ and $\bar{y}$ denote the means of time series X and Y respectively, $x_i$ and $y_i$ denote the values of time series X and Y at the i-th moment, and N denotes the number of moments in each time series. In addition to the Pearson correlation index, the invention may also use other indices to calculate the time-series correlation, such as the Spearman rank correlation coefficient (Spearman's rank correlation coefficient) or the Kendall rank correlation coefficient (Kendall rank correlation coefficient).
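The Pearson correlation index named above, applied to every pair of points in one class, can be sketched in plain Python as follows. The function names and the sample series are hypothetical, and returning 0.0 for a constant series (where the index is undefined) is an assumption of this sketch, not something the patent specifies.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length, at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def pairwise_correlations(series_by_point):
    """Correlation for every unordered pair of points in one first-pass class."""
    names = sorted(series_by_point)
    return {(a, b): pearson(series_by_point[a], series_by_point[b])
            for a, b in combinations(names, 2)}

# Hypothetical series for three stations in the same city.
city = {"s1": [1.0, 2.0, 3.0], "s2": [2.0, 4.0, 6.0], "s3": [3.0, 2.0, 1.0]}
corr = pairwise_correlations(city)
print(corr[("s1", "s2")], corr[("s1", "s3")])
```

Only pairs inside the same first-pass class are compared, which keeps the number of correlation computations far below what an all-pairs comparison over the whole data set would need.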
Fifth step: carry out a second clustering on each clustering result by a bottom-up method, to form the final clustering result.
The pseudocode of this second clustering method is given as Algorithm 1 (recluster). The method uses a bottom-up clustering approach, called the recluster algorithm, which is an iterative process. The input parameters of the algorithm are the already clustered result clustered, the not yet clustered result unclustered, and length, the number of points left unclustered after the previous execution of the recluster algorithm. For each result of the first clustering, the initial parameter values of recluster are as follows: clustered is an empty set that stores the clustering results produced while recluster executes, unclustered is the result of the first clustering, and length is the length of unclustered. When the algorithm is executed for the first time, its steps are as follows:
1. If the number of spatial points in the unclustered result equals the number after the previous execution of the recluster algorithm, no spatial points in the result meet the clustering requirement; the algorithm terminates and returns, and clustered is the result of the second clustering.
2. If the number of spatial points in the unclustered result is 0, all spatial points have been clustered; the algorithm terminates and returns, and clustered is the result of the second clustering.
3. Assign the length of unclustered to length, and create a new variable remaining to store the points that are not clustered in this execution of recluster. If neither condition 1 nor condition 2 is met, then for all spatial points in unclustered, evaluate the correlation of any two points A and B, where the correlation is the time-series correlation calculated in the fourth step. If the correlation is below a certain threshold, or it is not significant (a significant difference here refers to the statistical evaluation of differences in the data; only when a significant difference exists between the data, indicating that the compared data do not come from the same population, is the resulting correlation value interpretable), add B to remaining and remove B from unclustered.
4. After step 3 has finished, group the points left in unclustered into one class and add it to cluster (cluster denotes a "class").
5. Execute the recluster algorithm again with the parameters cluster, remaining and length.
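The steps above can be sketched as the following Python function, which should be read as one possible interpretation rather than the patent's Algorithm 1. Assumptions of this sketch: the recursion is written as a loop; "any two points A and B" is read as requiring each kept point to be correlated at or above the threshold with every point already kept in the current pass; the significance test is omitted; and the `corr` callback and the sample table are hypothetical. Under this reading each pass keeps at least its first point, so the no-progress stop of step 1 cannot fire and only the empty-set stop of step 2 is needed.

```python
def recluster(points, corr, threshold=0.6):
    """Split one first-pass class into sub-clusters of mutually correlated points.

    corr(a, b) returns the time-series correlation of two points.
    """
    clustered = []
    unclustered = list(points)
    while unclustered:  # step 2: stop when nothing is left to cluster
        kept, remaining = [], []
        for b in unclustered:
            # step 3: b stays only if it correlates with every point kept so far
            if all(corr(a, b) >= threshold for a in kept):
                kept.append(b)
            else:
                remaining.append(b)
        clustered.append(kept)   # step 4: the survivors form one class
        unclustered = remaining  # step 5: repeat on the points set aside
    return clustered

# Hypothetical symmetric correlation table for four stations.
table = {
    frozenset("ab"): 0.9, frozenset("ac"): 0.1, frozenset("ad"): 0.0,
    frozenset("bc"): 0.2, frozenset("bd"): 0.1, frozenset("cd"): 0.8,
}
result = recluster("abcd", corr=lambda a, b: table[frozenset((a, b))])
print(result)  # [['a', 'b'], ['c', 'd']]
```

The default threshold of 0.6 mirrors the value used in the air quality example of this embodiment; the outcome depends on the order in which points are visited, which is a property of this greedy reading.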
Taking China's air quality monitoring stations as an example, the method of the invention is executed with PM2.5 as the analysis dimension, the division by city as the basis of the first clustering step, and the correlation threshold preset to 0.6. With each city as the center and the distance from the city as the radius r, the change of the cluster ratio with r is calculated; the result is shown in Fig. 2. In Fig. 2, the horizontal axis represents distance, that is, the set of all spatial points within that distance of a given center point; the vertical axis represents the cluster ratio, that is, the number of clusters after clustering divided by the number of all spatial points. The different colors represent the maximum, minimum and mean of all the cluster ratios obtained with different spatial points as centers. The cluster ratio is defined as the number of final clustered results divided by the number of all stations. The result in Fig. 2 shows that, as the distance changes, the overall cluster ratio stays at about 40%.
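The cluster ratio defined above, and the maximum, minimum and mean taken over different centers, can be computed as in this small sketch. The function names and sample values are hypothetical and do not reproduce the patent's data.

```python
def cluster_ratio(clusters):
    """Number of clusters divided by the number of all spatial points."""
    total = sum(len(c) for c in clusters)
    if total == 0:
        raise ValueError("no spatial points")
    return len(clusters) / total

def summarize(ratios):
    """Maximum, minimum and mean of the cluster ratios of several centers."""
    return max(ratios), min(ratios), sum(ratios) / len(ratios)

# Hypothetical final result for one center: 2 clusters over 5 stations.
print(cluster_ratio([["s1", "s2", "s3"], ["s4", "s5"]]))  # 0.4
print(summarize([0.25, 0.5, 0.75]))  # (0.75, 0.25, 0.5)
```

A ratio near 1 means almost every station ended up in its own cluster, while a ratio near 0 means the stations collapsed into very few clusters.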
In this method, spatial objects are no longer clustered purely by the traditional distance-based approach; instead, the proposed two-step clustering is used. The method considers not only the distance characteristics of the spatial objects but also their time-series correlation. Because the corresponding time-series data are also taken into account during clustering, the result obtained by the two-step clustering method has greater practical significance. The method also extends the application of traditional clustering methods to spatio-temporal data.
The above embodiment is intended only to illustrate the technical solution of the invention, not to limit it. A person of ordinary skill in the art may modify the technical solution of the invention or replace it with equivalents without departing from the spirit and scope of the invention, and the scope of protection of the invention shall be as defined in the claims.
Claims (8)
1. A spatial clustering method based on time-series correlation, characterized by comprising the following steps:
1) selecting the set of spatial points to be clustered;
2) carrying out a first clustering according to the geographical relationships of the spatial points, so that spatial points belonging to the same geographical relationship are grouped into one class;
3) for the analysis task, determining the time interval of the time-series data used in the second clustering, and taking the data values of each spatial point within this time interval to form a time series;
4) according to the clustering result obtained in step 2) and the time series obtained in step 3), calculating the time-series correlation between any two spatial points in the same class;
5) for each clustering result of step 2), combining the time-series correlation obtained in step 4) and carrying out a second clustering on each clustering result, to form the final clustering result.
2. the method for claim 1, it is characterised in that step 1) in the set of described spatial point is certain spatial dimension
Whole spatial point, or the spatial point filtered out after applying certain filtering rule, and each spatial point comprises
Time series data in time period.
3. method as claimed in claim 2, it is characterised in that described filtering rule includes: distance within a particular value,
Or other indexs are within the scope of certain special value.
4. the method for claim 1, it is characterised in that step 2) described geographical relationship be by administrative division divide ground
Reason relation, or self-defining region.
5. method as claimed in claim 4, it is characterised in that described administrative division includes but not limited to country, province, city
City, and can being adjusted according to different situations, including the scope according to whole spaces, data set sequential density, main frame
Computing capability is adjusted.
6. method as claimed in claim 4, it is characterised in that described self-defining region is according to mountain range, river trend
The region divided, or the region divided according to the spatial object of urban construction.
7. the method for claim 1, it is characterised in that step 4) index that calculates described timing dependence includes: skin
Ademilson correlation metric, Spearman rank correlation coefficient, Kendall rank correlation coefficient.
8. the method for claim 1, it is characterised in that step 5) by bottom-up clustering method to each cluster
Result carries out secondary cluster;Described bottom-up clustering method is referred to as recluster algorithm, and its input parameter is for have gathered
Result clustered of class, result unclustered not clustered and last recluster algorithm do not cluster after performing
Length length;The execution step of this recluster algorithm is as follows:
If in the result a) not clustered, after spatial point number performs with last recluster algorithm, number is identical, and knot is described
Without meeting the spatial point clustered required in Guo, algorithm performs to terminate, and returns;Wherein clustered result is secondary
Cluster result;
If b) in non-cluster result, spatial point number is 0, illustrating that all spatial point are complete cluster, algorithm performs to terminate,
Return;Wherein clustered result is the result of secondary cluster;
C) length of length is entered as the length of unclustered, and creates a new variable save at this
Recluster does not carries out the some remaining clustered;Condition as being unsatisfactory for a), b), then in unclustered
All spatial point, it is judged that the timing dependence of its any two point A Yu B, if its dependency is less than a certain threshold value, or it is not
There is significance, then B is added in remaining, and B is removed from unclustered;
D) after step c) is finished, residue unclustered being gathered is a class, and adds in cluster.
E) re-executing algorithm recluster, the parameter of use is cluster, remaining, length.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610404636.1A CN106055689A (en) | 2016-06-08 | 2016-06-08 | Spatial clustering method based on time sequence correlation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610404636.1A CN106055689A (en) | 2016-06-08 | 2016-06-08 | Spatial clustering method based on time sequence correlation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106055689A true CN106055689A (en) | 2016-10-26 |
Family
ID=57169893
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610404636.1A Pending CN106055689A (en) | 2016-06-08 | 2016-06-08 | Spatial clustering method based on time sequence correlation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106055689A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108564110A (en) * | 2018-03-26 | 2018-09-21 | 上海电力学院 | A kind of Air Quality Forecast method based on clustering algorithm |
CN110134839A (en) * | 2019-03-27 | 2019-08-16 | 平安科技(深圳)有限公司 | Time series data characteristic processing method, apparatus and computer readable storage medium |
CN110147843A (en) * | 2019-05-22 | 2019-08-20 | 哈尔滨工程大学 | Voice Time Series Similar measure based on metric learning |
CN110263791A (en) * | 2019-05-31 | 2019-09-20 | 京东城市(北京)数字科技有限公司 | A kind of method and apparatus in identification function area |
CN110288140A (en) * | 2019-06-14 | 2019-09-27 | 西北大学 | A kind of opioid drug spatial prediction technique based on geo-relevance model |
CN110706004A (en) * | 2019-06-27 | 2020-01-17 | 华南农业大学 | Farmland heavy metal pollutant tracing method based on hierarchical clustering |
CN113537311A (en) * | 2021-06-30 | 2021-10-22 | 北京百度网讯科技有限公司 | Spatial point clustering method and device and electronic equipment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103942325A (en) * | 2014-04-29 | 2014-07-23 | 中南大学 | Method for association rule mining of ocean-land climate events with combination of climate subdivision thought |
CN105550244A (en) * | 2015-12-07 | 2016-05-04 | 武汉大学 | Adaptive clustering method |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108564110B (en) * | 2018-03-26 | 2021-07-20 | 上海电力学院 | Air quality prediction method based on clustering algorithm |
CN108564110A (en) * | 2018-03-26 | 2018-09-21 | 上海电力学院 | A kind of Air Quality Forecast method based on clustering algorithm |
CN110134839A (en) * | 2019-03-27 | 2019-08-16 | 平安科技(深圳)有限公司 | Time series data characteristic processing method, apparatus and computer readable storage medium |
CN110134839B (en) * | 2019-03-27 | 2023-06-06 | 平安科技(深圳)有限公司 | Time sequence data characteristic processing method and device and computer readable storage medium |
CN110147843A (en) * | 2019-05-22 | 2019-08-20 | 哈尔滨工程大学 | Voice Time Series Similar measure based on metric learning |
CN110263791A (en) * | 2019-05-31 | 2019-09-20 | 京东城市(北京)数字科技有限公司 | A kind of method and apparatus in identification function area |
CN110263791B (en) * | 2019-05-31 | 2021-11-09 | 北京京东智能城市大数据研究院 | Method and device for identifying functional area |
CN110288140B (en) * | 2019-06-14 | 2023-04-07 | 西北大学 | Opioid spatial propagation prediction method based on geographical correlation model |
CN110288140A (en) * | 2019-06-14 | 2019-09-27 | 西北大学 | A kind of opioid drug spatial prediction technique based on geo-relevance model |
CN110706004A (en) * | 2019-06-27 | 2020-01-17 | 华南农业大学 | Farmland heavy metal pollutant tracing method based on hierarchical clustering |
CN110706004B (en) * | 2019-06-27 | 2022-03-29 | 华南农业大学 | Farmland heavy metal pollutant tracing method based on hierarchical clustering |
CN113537311A (en) * | 2021-06-30 | 2021-10-22 | 北京百度网讯科技有限公司 | Spatial point clustering method and device and electronic equipment |
US20230004751A1 (en) * | 2021-06-30 | 2023-01-05 | Beijing Baidu Netcom Science Technology Co., Ltd. | Clustering Method and Apparatus for Spatial Points, and Electronic Device |
CN113537311B (en) * | 2021-06-30 | 2023-08-04 | 北京百度网讯科技有限公司 | Spatial point clustering method and device and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106055689A (en) | Spatial clustering method based on time sequence correlation | |
CN109448370B (en) | Traffic control subarea division method based on vehicle track data | |
CN107610469B (en) | Day-dimension area traffic index prediction method considering multi-factor influence | |
CN108596362B (en) | Power load curve form clustering method based on adaptive piecewise aggregation approximation | |
Prat-Pérez et al. | Shaping communities out of triangles | |
CN107529651A (en) | A kind of urban transportation passenger flow forecasting and equipment based on deep learning | |
CN104462184B (en) | A kind of large-scale data abnormality recognition method based on two-way sampling combination | |
CN105163326B (en) | A kind of cell clustering method and system based on wireless network traffic feature | |
CN101178703B (en) | Failure diagnosis chart clustering method based on network dividing | |
CN101276420A (en) | Classification method for syncretizing optical spectrum information and multi-point simulation space information | |
CN108959958A (en) | A kind of method for secret protection and system being associated with big data | |
Pietrucha-Urbanik | Multidimensional comparative analysis of water infrastructures differentiation | |
CN109871638A (en) | A kind of lake and marshland Evaluation of Eutrophication model building method | |
CN106228190A (en) | Decision tree method of discrimination for resident's exception water | |
CN111125285A (en) | Animal geographic zoning method based on species spatial distribution relation | |
CN111307164A (en) | Low-sampling-rate track map matching method | |
CN110716998B (en) | Fine scale population data spatialization method | |
CN114219370B (en) | Social network-based multidimensional influence factor weight analysis method for river water quality | |
CN113641733B (en) | Real-time intelligent estimation method for river cross section flow | |
CN105243503A (en) | Coastal zone ecological safety assessment method based on space variables and logistic regression | |
Jarvis | New measure of the topologic structure of dendritic drainage networks | |
CN112052405B (en) | Passenger searching area recommendation method based on driver experience | |
CN109285219A (en) | A kind of grid type hydrological model grid calculation order encoding method based on DEM | |
CN112819208A (en) | Spatial similarity geological disaster prediction method based on feature subset coupling model | |
CN109255433B (en) | Community detection method based on similarity |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20161026 |