CN111651502A - City functional area identification method based on multi-subspace model - Google Patents

Publication number
CN111651502A
Authority
CN
China
Prior art keywords
functional
matrix
functional area
similarity
subspace
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010484901.8A
Other languages
Chinese (zh)
Other versions
CN111651502B (en)
Inventor
朱佳玮
陶超
李海峰
肖俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University filed Critical Central South University
Priority to CN202010484901.8A priority Critical patent/CN111651502B/en
Publication of CN111651502A publication Critical patent/CN111651502A/en
Application granted granted Critical
Publication of CN111651502B publication Critical patent/CN111651502B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Remote Sensing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for identifying urban functional areas based on a multi-subspace model, comprising the following steps: acquiring taxi trajectory data and check-in data for a study area; constructing a region-oriented, visit-purpose-based time sequence feature matrix C; inputting the time sequence feature matrix C into a sparse subspace clustering algorithm to obtain the correspondence between geographic units and urban functional areas; and obtaining the salient feature locations of each functional area and thereby identifying its main function. The method uses the human activity information provided by geographic big data and, by adopting a multi-subspace model, overcomes the shortcomings of the prior art, so that urban functional areas can be identified more accurately. It also analyzes the uniqueness and abundance of each functional area based on the geometric properties of the subspaces, providing fine-grained quantitative indicators for the management and development of urban functional areas.

Description

City functional area identification method based on multi-subspace model
Technical Field
The invention belongs to the technical field of geographic spatial information identification, relates to an urban geographic information identification method, and particularly relates to an urban functional area identification method based on a multi-subspace model.
Background
The urban spatial structure is a core research topic in urban geographic information science and a concentrated reflection of the human-land relationship. Because urban space is shaped by human activities and in turn influences people's production and daily life, the urban spatial structure bears on urban planning, site selection, and travel and location recommendation. In urban spatial structure analysis, the distribution of urban functional areas is the result, presented in geographic space, of the combined influence of many factors.
There are many methods for analyzing urban functional areas, such as social surveys. However, collecting survey data is time-consuming and labor-intensive, the analysis may be strongly affected by subjective factors, and, most importantly, such methods cannot directly reflect human activity, which is a key factor in urban development. With the rapid development of mobile communication, Internet and satellite positioning technologies, mobile devices with positioning functions generate a series of electronic footprints. These footprints are real records of the activities of urban residents, so urban functional areas can be explored from the perspective of human activity. Existing methods use social media check-in data, mobile phone data and taxi trajectory data to detect urban functional areas.
However, the models that the prior art uses to analyze these data are not fully developed. The general procedure is as follows. First, when processing geospatial big data, the time sequence features of human activity are mapped onto manually divided geographic units, so that each geographic unit can be expressed as a vector and the information is stored in a high-dimensional vector space. Then, features of these geographic units are extracted with algorithms such as singular value decomposition, latent semantic analysis and latent Dirichlet allocation. Finally, the geographic units are clustered according to the similarity of their feature representations, and each cluster represents one functional area, which yields the distribution of urban functional areas. However, these models have the following disadvantages.
First, during feature extraction, some algorithms make strict assumptions about the features, for example that the samples have only one set of features or follow the same distribution. Because the samples are reduced from a high-dimensional space to a single low-dimensional subspace after feature extraction, these algorithms can be called single subspace algorithms. The strict assumptions of a single subspace algorithm make it convenient to obtain feature patterns, and the distribution of functional areas can then be obtained by clustering according to the relationship between samples and features. However, samples whose information carries little weight are marginalized after feature extraction, which makes the clustering result inaccurate. In addition, functional areas differ in their features, so a single set of features cannot describe every functional area concisely and accurately. When the data volume is very large and the feature patterns are very complex, the assumptions that a single subspace model makes about the feature patterns limit feature mining, and such models cannot handle more complex data.
Second, these models ignore the geometric meaning of the vector space. The geometrical properties of the subspace are related to the characteristics of the urban functional area, and the prior art has ignored discussion and consideration of this.
Disclosure of Invention
In view of the above, the present invention provides a method for identifying an urban functional area based on a multi-subspace model, which utilizes human activity information provided by geographic big data, overcomes the defects in the prior art by using the multi-subspace model, and can identify the urban functional area more accurately.
The object of the invention is achieved as follows: the method for identifying urban functional areas based on the multi-subspace model comprises the following steps:
step 1, taxi track data and check-in data in a research area are obtained;
step 2, constructing a time sequence characteristic matrix C facing to partitions and based on the visiting purpose;
step 3, inputting the time sequence characteristic matrix C to a sparse subspace clustering algorithm, and calculating to obtain the corresponding relation between the geographic unit and the urban functional area;
step 4, acquiring the salient feature locations of each functional area, and thereby identifying the main function of each functional area.
Specifically, the process of constructing the time sequence feature matrix C in step 2 includes the following steps:
step 201, dividing the research area to obtain N geographic units;
step 202, preprocessing the taxi track data, eliminating abnormal points, extracting the end point and the arrival time of each journey, and mapping the end point and the geographic unit to obtain the visit record of the geographic unit;
step 203, matching the check-in data records with visit records of the geographic units, and classifying the purpose of each visit;
step 204, constructing a time sequence feature matrix C with M rows and N columns, which represents the dynamics of human activity carried by the geographic units over a period of time, where M = T × D, T is the number of time periods, D is the number of visit-purpose categories, and each column of C gives the number of visits to the corresponding geographic unit for each purpose in each time period.
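Steps 201 to 204 can be illustrated with a minimal Python sketch that counts visits into the M × N matrix C. The function name `build_feature_matrix`, the tuple layout of a visit record, and the row ordering (all time periods of purpose 0 first, then purpose 1, and so on) are assumptions made for illustration; the patent does not specify how the T × D rows are ordered.

```python
from collections import Counter


def build_feature_matrix(visits, n_units, n_periods, n_purposes):
    """Return C as a list of M = n_periods * n_purposes rows with n_units columns.

    visits: iterable of (unit_index, period_index, purpose_index) tuples,
            one tuple per classified visit record (hypothetical layout).
    Row layout (an assumption): row = purpose_index * n_periods + period_index.
    """
    counts = Counter(visits)
    n_rows = n_periods * n_purposes
    C = [[0] * n_units for _ in range(n_rows)]
    for (unit, period, purpose), n in counts.items():
        C[purpose * n_periods + period][unit] = n
    return C


# Toy data: two visits to unit 0 at 08:00 for purpose 2, and two other visits.
visits = [(0, 8, 2), (0, 8, 2), (1, 19, 3), (2, 23, 0)]
C = build_feature_matrix(visits, n_units=3, n_periods=24, n_purposes=6)
```

Each column of `C` is then the activity profile of one geographic unit, which is the form expected by the clustering step.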
Specifically, the sparse subspace clustering algorithm described in step 3 includes the following steps:
step 301, solving for a coefficient matrix Z of size N × N, where Z is obtained by l1-norm minimization under the following constraints:
$$\min_{Z} \|Z\|_1 \quad \text{s.t.} \quad CZ = C,\ Z_{ii} = 0$$
where $\|\cdot\|_1$ denotes the l1 norm; minimizing the l1 norm makes the coefficient matrix Z sparse, so that the time sequence features of each geographic unit are represented only by a linear combination of the time sequence features of other geographic units in the same subspace;
step 302, a similarity matrix $W = |Z| + |Z|^{T}$ of the data is then established using the coefficient matrix; W is N × N, and each entry is the similarity of the time sequence features between the geographic units corresponding to its indices;
the similarity matrix W is a block diagonal matrix, that is, only the sub-matrices on the main diagonal are non-zero and all other sub-matrices are zero; each non-zero sub-matrix is a subspace, the geographic units within one subspace have very similar time sequence features, and geographic units in different subspaces differ greatly in their time sequence features, so each subspace is an urban functional area to be detected;
step 303, calculating the number of subspaces using the normalized Laplacian matrix L of the similarity matrix W, where $L = I - D^{-1/2} W D^{-1/2}$, I is the identity matrix, and D is the diagonal degree matrix with $D_{jj} = \sum_i W_{ij}$; sorting the eigenvalues of L in ascending order and calculating the difference $\lambda_{k+1} - \lambda_k$ between every two adjacent eigenvalues; the k corresponding to the maximum difference is the number of subspaces, namely the number of urban functional areas to be detected;
step 304, applying the K-means clustering method to the similarity matrix W with the number of clusters set to the k obtained in step 303, obtaining the correspondence between the geographic units and the k categories, namely the correspondence between the geographic units and the k urban functional areas, which completes the detection of the urban functional areas.
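Steps 302 and 303 can be sketched in a few lines of NumPy. The sketch below assumes that a coefficient matrix Z has already been obtained from some l1 solver (step 301 is not implemented here), and that every geographic unit has a non-zero degree; the function names and the toy matrix are illustrative only.

```python
import numpy as np


def similarity_from_coefficients(Z):
    """Step 302: W = |Z| + |Z|^T, a symmetric non-negative similarity matrix."""
    A = np.abs(Z)
    return A + A.T


def estimate_num_subspaces(W):
    """Step 303: eigengap of the normalized Laplacian L = I - D^{-1/2} W D^{-1/2}."""
    degree = W.sum(axis=0)                      # D_jj = sum_i W_ij
    d_inv_sqrt = 1.0 / np.sqrt(degree)          # assumes no zero-degree unit
    L = np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals = np.sort(np.linalg.eigvalsh(L))    # ascending eigenvalues
    gaps = np.diff(eigvals)                     # lambda_{k+1} - lambda_k
    return int(np.argmax(gaps)) + 1             # k with the largest gap


# Toy coefficient matrix with two blocks of three units each and a zero diagonal.
block = 0.5 * (np.ones((3, 3)) - np.eye(3))
Z = np.block([[block, np.zeros((3, 3))], [np.zeros((3, 3)), block]])
W = similarity_from_coefficients(Z)
k = estimate_num_subspaces(W)
```

On real data W is only approximately block diagonal, so the largest eigengap is a heuristic rather than an exact count.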
Specifically, obtaining the salient feature locations of each functional area in step 4 comprises: using the correspondence from step 304, extracting from the similarity matrix W generated in step 302 the subspace matrices $S_1, \ldots, S_i, \ldots, S_k$ corresponding to the urban functional areas, and performing principal component analysis on each to obtain the eigenvectors $[e_1, e_2, \ldots, e_p, \ldots, e_M]_i$, which are called the feature locations of $S_i$; the first r eigenvectors $[e_1, e_2, \ldots, e_r]_i$, whose cumulative eigenvalue percentage is higher than 90%, are the salient feature locations of $S_i$.
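The 90% rule can be sketched with NumPy as follows. The sketch assumes that the input is an M × n matrix whose columns are the feature vectors of the geographic units assigned to one functional area, and that PCA is carried out through the eigen-decomposition of the M × M covariance matrix; the function name `salient_feature_basis` is hypothetical.

```python
import numpy as np


def salient_feature_basis(S, threshold=0.9):
    """Return the first r eigenvectors whose cumulative eigenvalue share exceeds threshold.

    S: M x n array, one column per geographic unit (assumed layout).
    """
    cov = np.cov(S)                                  # rows are variables: M x M
    eigvals, eigvecs = np.linalg.eigh(cov)           # ascending order
    order = np.argsort(eigvals)[::-1]                # re-sort descending
    eigvals = np.clip(eigvals[order], 0.0, None)     # guard against tiny negatives
    eigvecs = eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    # Index of the first cumulative share strictly greater than the threshold.
    r = int(np.searchsorted(ratio, threshold, side="right")) + 1
    return eigvecs[:, :r], ratio[:r]


# Toy data: almost all variance lies along the first feature.
S = np.array([[1.0, -1.0, 2.0, -2.0],
              [0.1, 0.1, -0.1, -0.1]])
basis, shares = salient_feature_basis(S)
```

The returned columns are orthonormal, which is convenient for the reconstruction-error calculation used later for abundance.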
Specifically, identifying the main function of each functional area comprises: reshaping each salient feature location of each functional area into a matrix with D rows and T columns, in which each row represents the change in the activity level of the feature location for one of the D visit purposes over the T time periods; obtaining the main activity pattern of the functional area; and labeling the functional area with the most active function in its main activity pattern, which completes the identification of the urban functional area.
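The reshaping step can be sketched in plain Python. Two assumptions are made here that the text does not settle: the length-M vector is laid out purpose by purpose (T values for purpose 0, then T values for purpose 1, and so on), and "most active" is measured by the sum of absolute values in a row, since the sign of an eigenvector is arbitrary. The function name and the toy numbers are illustrative.

```python
def dominant_purpose(feature, n_purposes, n_periods, names):
    """Reshape a length D*T vector into D rows x T columns and pick the most active row."""
    assert len(feature) == n_purposes * n_periods
    table = [feature[d * n_periods:(d + 1) * n_periods] for d in range(n_purposes)]
    # Activity level per purpose: sum of absolute values (an assumption).
    activity = [sum(abs(v) for v in row) for row in table]
    best = max(range(n_purposes), key=activity.__getitem__)
    return table, names[best]


# Toy vector with D = 2 purposes and T = 3 time periods.
feature = [0.1, 0.2, 0.1, 0.5, 0.9, 0.4]
table, label = dominant_purpose(feature, n_purposes=2, n_periods=3, names=["home", "work"])
```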
Further, the method for identifying the urban functional area further comprises the following steps:
step 5, calculating the similarity of each functional area;
step 6, calculating the uniqueness of each functional area, and ranking the functional areas by uniqueness.
Specifically, the similarity of functional areas is calculated from the principal angles between the corresponding subspaces. For the subspaces $S_k$ and $S_l$ corresponding to any two functional areas, the similarity $\mathrm{aff}(S_k, S_l)$ is calculated as follows:
[Formula image BDA0002518778500000051: definition of $\mathrm{aff}(S_k, S_l)$]
where $\cos\theta^{(i)}$ is the i-th largest singular value of $U_k^{T} U_l$, $U_k$ and $U_l$ are orthogonal bases of $S_k$ and $S_l$ respectively, $\theta^{(i)}$ is the i-th principal angle between the subspaces, and $d_k \wedge d_l$ denotes the smaller of the dimensions $d_k$ and $d_l$ of $S_k$ and $S_l$.
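The principal angles described above can be computed with NumPy as sketched below. The cosine computation follows the text directly (singular values of the product of the two orthonormal bases). The aggregation into a single affinity value is an assumption: the formula image is not reproduced in this text, so the sketch uses the normalized affinity that is common in the subspace clustering literature, the square root of the mean squared cosine over the $d_k \wedge d_l$ principal angles, which may differ from the patent's exact formula.

```python
import numpy as np


def principal_cosines(A, B):
    """Cosines of the principal angles between the column spaces of A and B.

    A, B: arrays with full column rank whose columns span the two subspaces.
    """
    Uk, _ = np.linalg.qr(A)                      # orthonormal basis of span(A)
    Ul, _ = np.linalg.qr(B)                      # orthonormal basis of span(B)
    s = np.linalg.svd(Uk.T @ Ul, compute_uv=False)
    return np.clip(s, 0.0, 1.0)                  # min(d_k, d_l) values, descending


def affinity(A, B):
    """Assumed normalized affinity: sqrt(mean of squared principal cosines)."""
    c = principal_cosines(A, B)
    return float(np.sqrt(np.mean(c ** 2)))


plane = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # span(e1, e2) in R^3
z_axis = np.array([[0.0], [0.0], [1.0]])                 # orthogonal to the plane
tilted = np.array([[1.0], [0.0], [1.0]])                 # 45 degrees to the plane
```

With this definition, identical subspaces have affinity 1 and orthogonal subspaces have affinity 0.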
Specifically, the uniqueness of a functional area is inversely related to similarity: if the similarity between subspaces is high, the functions of the corresponding functional areas are largely similar and the uniqueness of the functional area is low. The uniqueness of each functional area $S_i$ is calculated as follows:
[Formula image BDA0002518778500000055: uniqueness of $S_i$]
where k is the total number of functional areas and $S_{-i}$ denotes the functional areas other than $S_i$.
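Because the uniqueness formula itself is an image that is not reproduced in this text, the following Python sketch is only one plausible reading, not the patent's formula: it assumes that the uniqueness of area i is one minus its mean similarity to the other k − 1 areas, which at least respects the stated inverse relationship. The function name and the toy similarity values are hypothetical.

```python
def uniqueness_scores(aff):
    """Assumed uniqueness: 1 - mean similarity to all other functional areas.

    aff: k x k symmetric similarity matrix as nested lists; the diagonal is ignored.
    """
    k = len(aff)
    scores = []
    for i in range(k):
        others = [aff[i][j] for j in range(k) if j != i]
        scores.append(1.0 - sum(others) / (k - 1))
    return scores


# Toy similarity matrix for three functional areas.
aff = [[0.0, 0.8, 0.2],
       [0.8, 0.0, 0.4],
       [0.2, 0.4, 0.0]]
scores = uniqueness_scores(aff)
# Rank areas from most to least unique.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```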
Further, the method for identifying the urban functional area further comprises the following steps:
step 7, calculating the abundance of each functional area, and ranking the functional areas by abundance.
Specifically, the abundance of a functional area is related to the reconstruction error of the salient feature locations of that functional area, which is calculated as follows:
[Formula image BDA0002518778500000061: reconstruction error of $S_i$]
where $C(S_i)$ is the matrix formed by the original vectors belonging to subspace $S_i$, $\hat{C}(S_i)$ is the matrix reconstructed from the salient feature locations of $S_i$, and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix.
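A reconstruction error of this kind can be sketched with NumPy as below. The sketch assumes that the reconstruction is the orthogonal projection of the original vectors onto the span of the salient feature locations, taken as orthonormal columns, and that the error is the unnormalized Frobenius norm of the residual. Any mean-centering or normalization in the patent's formula image is not reproduced here.

```python
import numpy as np


def reconstruction_error(C_i, basis):
    """Frobenius norm of C_i minus its projection onto span(basis).

    C_i:   M x n matrix of original vectors belonging to one subspace.
    basis: M x r matrix with orthonormal columns (salient feature locations).
    """
    C_hat = basis @ (basis.T @ C_i)              # projection onto the salient basis
    return float(np.linalg.norm(C_i - C_hat, ord="fro"))


C_i = np.array([[3.0, 0.0],
                [0.0, 4.0]])
first_axis = np.array([[1.0], [0.0]])            # keeps only the first feature
err_partial = reconstruction_error(C_i, first_axis)
err_full = reconstruction_error(C_i, np.eye(2))  # full basis reconstructs exactly
```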
The method provides a model based on multiple subspaces and considers that urban functional areas have multiple groups of features. When the spatio-temporal activity information of geographic units is expressed as vectors, the vector samples lie in a high-dimensional space formed by a union of subspaces. Geographic units located in the same subspace carry similar dynamic features of human activity and can be clustered into one functional area, so urban functional areas are identified by finding the subspaces. The uniqueness and abundance of each functional area are then analyzed based on the geometric properties of the subspaces, providing fine-grained quantitative indicators for the management and development of urban functional areas.
Drawings
FIG. 1 is a schematic flow diagram of the method of the present invention;
FIG. 2 is a schematic flow chart of an embodiment of the method of the present invention;
FIG. 3 is the similarity matrix obtained by sparse subspace clustering in an embodiment of the present invention;
FIG. 4 shows the urban functional areas detected in an embodiment of the present invention;
FIG. 5 shows the functional activity levels of the salient feature locations of each functional area in an embodiment of the present invention;
FIG. 6 shows the similarity of the functional areas calculated in an embodiment of the present invention;
FIG. 7 shows the uniqueness and abundance of the functional areas calculated in an embodiment of the present invention.
Detailed Description
The present invention is further illustrated by the following examples and the accompanying drawings, but the present invention is not limited thereto in any way, and any modifications or alterations based on the teaching of the present invention are within the scope of the present invention.
As shown in fig. 1, a city functional area identification method based on a multi-subspace model includes the following steps:
step 1, taxi track data and check-in data in a research area are obtained;
step 2, constructing a time sequence characteristic matrix C facing to partitions and based on the visiting purpose;
step 3, inputting the time sequence characteristic matrix C to a sparse subspace clustering algorithm, and calculating to obtain the corresponding relation between the geographic unit and the urban functional area;
step 4, acquiring the salient feature locations of each functional area, and thereby identifying the main function of each functional area.
Specifically, the process of constructing the time sequence feature matrix C in step 2 includes the following steps:
step 201, dividing the research area to obtain N geographic units;
step 202, preprocessing the taxi track data, eliminating abnormal points, extracting the end point and the arrival time of each journey, and mapping the end point and the geographic unit to obtain the visit record of the geographic unit;
step 203, matching the check-in data records with visit records of the geographic units, and classifying the purpose of each visit;
step 204, constructing a time sequence feature matrix C with M rows and N columns, which represents the dynamics of human activity carried by the geographic units over a period of time, where M = T × D, T is the number of time periods, D is the number of visit-purpose categories, and each column of C gives the number of visits to the corresponding geographic unit for each purpose in each time period.
Specifically, the sparse subspace clustering algorithm described in step 3 includes the following steps:
step 301, solving for a coefficient matrix Z of size N × N, where Z is obtained by l1-norm minimization under the following constraints:
$$\min_{Z} \|Z\|_1 \quad \text{s.t.} \quad CZ = C,\ Z_{ii} = 0$$
where $\|\cdot\|_1$ denotes the l1 norm; minimizing the l1 norm makes the coefficient matrix Z sparse, so that the time sequence features of each geographic unit are represented only by a linear combination of the time sequence features of other geographic units in the same subspace;
step 302, a similarity matrix $W = |Z| + |Z|^{T}$ of the data is then established using the coefficient matrix; W is N × N, and each entry is the similarity of the time sequence features between the geographic units corresponding to its indices;
the similarity matrix W is a block diagonal matrix, that is, only the sub-matrices on the main diagonal are non-zero and all other sub-matrices are zero; each non-zero sub-matrix is a subspace, the geographic units within one subspace have very similar time sequence features, and geographic units in different subspaces differ greatly in their time sequence features, so each subspace is an urban functional area to be detected;
step 303, calculating the number of subspaces using the normalized Laplacian matrix L of the similarity matrix W, where $L = I - D^{-1/2} W D^{-1/2}$, I is the identity matrix, and D is the diagonal degree matrix with $D_{jj} = \sum_i W_{ij}$; sorting the eigenvalues of L in ascending order and calculating the difference $\lambda_{k+1} - \lambda_k$ between every two adjacent eigenvalues; the k corresponding to the maximum difference is the number of subspaces, namely the number of urban functional areas to be detected;
step 304, applying the K-means clustering method to the similarity matrix W with the number of clusters set to the k obtained in step 303, obtaining the correspondence between the geographic units and the k categories, namely the correspondence between the geographic units and the k urban functional areas, which completes the detection of the urban functional areas.
Specifically, obtaining the salient feature locations of each functional area in step 4 comprises: using the correspondence from step 304, extracting from the similarity matrix W generated in step 302 the subspace matrices $S_1, \ldots, S_i, \ldots, S_k$ corresponding to the urban functional areas, and performing principal component analysis on each to obtain the eigenvectors $[e_1, e_2, \ldots, e_p, \ldots, e_M]_i$, which are called the feature locations of $S_i$; the first r eigenvectors $[e_1, e_2, \ldots, e_r]_i$, whose cumulative eigenvalue percentage is higher than 90%, are the salient feature locations of $S_i$.
Specifically, identifying the main function of each functional area comprises: reshaping each salient feature location of each functional area into a matrix with D rows and T columns, in which each row represents the change in the activity level of the feature location for one of the D visit purposes over the T time periods; obtaining the main activity pattern of the functional area; and taking the most active function in the main activity pattern as the main function of the area.
Further, the method for identifying the urban functional area further comprises the following steps:
step 5, calculating the similarity of each functional area;
step 6, calculating the uniqueness of each functional area, and ranking the functional areas by uniqueness.
Specifically, the similarity of functional areas is calculated from the principal angles between the subspaces. For any two functional areas $S_k$ and $S_l$, the similarity is calculated as follows:
[Formula image BDA0002518778500000091: definition of the similarity of $S_k$ and $S_l$]
where $\cos\theta^{(i)}$ is the i-th largest singular value of $U_k^{T} U_l$, $U_k$ and $U_l$ are orthogonal bases of $S_k$ and $S_l$ respectively, $\theta^{(i)}$ is the i-th principal angle between the subspaces, and $d_k \wedge d_l$ denotes the smaller of the dimensions $d_k$ and $d_l$ of $S_k$ and $S_l$.
Specifically, the uniqueness of a functional area is inversely related to similarity: if the similarity between subspaces is high, the functions of the corresponding functional areas are largely similar and the uniqueness of the functional area is low. The uniqueness of each functional area $S_i$ is calculated as follows:
[Formula image BDA0002518778500000101: uniqueness of $S_i$]
where k is the total number of functional areas and $S_{-i}$ denotes the functional areas other than $S_i$.
Further, the method for identifying the urban functional area further comprises the following steps:
step 7, calculating the abundance of each functional area, and ranking the functional areas by abundance.
Specifically, the abundance of a functional area is related to the reconstruction error of the salient feature locations of that functional area, which is calculated as follows:
[Formula image BDA0002518778500000102: reconstruction error of $S_i$]
where $C(S_i)$ is the matrix formed by the original vectors belonging to subspace $S_i$, and $\hat{C}(S_i)$ is the matrix reconstructed from the salient feature locations of $S_i$.
The reconstruction error describes the difference between the original subspace matrix and the matrix restored from the salient feature locations. The larger the reconstruction error, the more feature locations beyond the dominant salient ones are needed to depict the dynamic changes within the functional area. Abundance thus reflects how rich people's activity patterns in the area are and how well developed the functions supporting those activity patterns are.
As shown in the flow chart of FIG. 2, the experiment of the present invention includes the following steps.
(1) Data processing
Step 1.1: the main urban area of Shanghai is selected as the study area and divided into a grid of 500 m × 500 m cells; after removing water cells, 3166 geographic units are obtained.
Step 1.2: GPS trajectory data from 6600 taxis in Shanghai are preprocessed: abnormal points are eliminated, the end point and arrival time of each trip are extracted, and the end points are mapped to geographic units, giving 7,852,724 arrival records.
Step 1.3: the check-in data records are matched with the visit records of the geographic units, and the purpose of each visit is classified into one of six types: home, traffic, work, dining, entertainment and others (places such as parks, museums and libraries).
Step 1.4: one day is divided by the hour into 24 time periods, and the number of visits to each geographic unit for each of the 6 purposes in each of the 24 time periods is counted, giving a time sequence feature matrix C with 144 rows and 3166 columns.
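The mapping of a trip end point to a 500 m grid cell and an hourly time period (Steps 1.1, 1.2 and 1.4) can be sketched in plain Python. The sketch assumes that coordinates have already been projected to metres in some planar coordinate system and that the arrival time is given as seconds since midnight; the function names, the grid origin and the grid size in the usage lines are hypothetical.

```python
import math


def grid_cell(x, y, x_min, y_min, n_cols, n_rows, cell_size=500.0):
    """Map projected coordinates in metres to a (row, col) cell, or None if outside."""
    col = math.floor((x - x_min) / cell_size)
    row = math.floor((y - y_min) / cell_size)
    if 0 <= col < n_cols and 0 <= row < n_rows:
        return row, col
    return None


def hour_slot(seconds_since_midnight):
    """Map an arrival time to one of the 24 hourly time periods."""
    return int(seconds_since_midnight // 3600) % 24


# A drop-off 1250 m east and 499.9 m north of the grid origin, at 08:00:59.
cell = grid_cell(1250.0, 499.9, x_min=0.0, y_min=0.0, n_cols=10, n_rows=10)
slot = hour_slot(8 * 3600 + 59)
outside = grid_cell(-1.0, 0.0, x_min=0.0, y_min=0.0, n_cols=10, n_rows=10)
```

Cells covering water would still need to be dropped and the remaining cells renumbered to obtain the final list of geographic units.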
(2) Urban functional area identification
Step 2.1: the time sequence feature matrix C is input into the sparse subspace clustering algorithm to obtain the similarity matrix W. The visualized similarity matrix is shown in FIG. 3; it reveals the similarity between geographic units, with non-zero similarity values colored black. Five diagonal blocks can be seen, and this structure indicates that the number of urban functional areas is 5.
Step 2.2: the number of subspaces is computed using the normalized Laplacian matrix L of W, where $L = I - D^{-1/2} W D^{-1/2}$, I is the identity matrix, and D is the diagonal degree matrix with $D_{jj} = \sum_i W_{ij}$. The eigenvalues of L are sorted in ascending order and the difference $\lambda_{k+1} - \lambda_k$ between every two adjacent eigenvalues is calculated. The k corresponding to the maximum difference is 5, that is, the number of subspaces (urban functional areas) is 5, which is consistent with the reading of FIG. 3 in Step 2.1.
Step 2.3: therefore, the K-means clustering method is used for W, the clustering number is set to 5, and the urban functional area detection is completed to obtain urban functional areas 1, 2, 3, 4 and 5. The results of the visualization of the clustering results on the map are shown in fig. 4, where it can be seen that the central area is mainly covered by the functional area 5.
Step 2.4: since the main function of a functional area is determined by its salient activity features, principal component analysis is performed on the subspace matrix corresponding to each functional area to obtain its feature locations and thereby determine the actual function of each detected urban functional area. The first 5 eigenvalues account for more than 90% in functional areas 1, 2, 3 and 4, but for less than 90% in functional area 5. Therefore, the basis vectors corresponding to the first 5 eigenvalues are used as the salient feature locations of each functional area, and when analyzing functional area 5, the basis vectors corresponding to the first 10 eigenvalues are used as its salient feature locations.
Step 2.5: each salient feature location of each functional area is reshaped into a matrix of 6 rows and 24 columns, in which each row represents the change in the activity level of the feature location over 24 hours for one purpose: home (H), traffic (Tr), work (W), dining (D), entertainment (E) and others (O, places such as parks, museums and libraries). The salient feature locations of all functional areas are shown in FIG. 5. As can be seen from the figure, home activity (H) is the most active in the salient feature locations of functional area 1, dining activity (D) ranks second, and entertainment activity (E) is also prominent, so functional area 1 can be regarded as a residential area with well-developed dining and entertainment facilities. Similarly, traffic activity (Tr) is prominent in functional area 2, which is a traffic hub. The main activity in functional area 3 is work (W), so it is a work area. For functional area 5, the first 10 salient feature locations are considered; dining activity (D) and entertainment activity (E) are found to be active, so it is regarded as a commercial area. Functional area 4 corresponds to other functions, such as parks, museums and gas stations.
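The K-means clustering of Step 2.3, applied to the rows of W, can be sketched with NumPy as follows. This is a plain Lloyd iteration with a deterministic farthest-point initialization, which is an assumption chosen to keep the example reproducible; the patent does not state which initialization or implementation was used, and the toy W has two blocks rather than the five found in the experiment.

```python
import numpy as np


def kmeans_rows(X, k, n_iter=100):
    """Cluster the rows of X into k groups with Lloyd's algorithm."""
    X = np.asarray(X, dtype=float)
    # Deterministic farthest-point initialization (an assumption).
    centers = [X[0]]
    for _ in range(1, k):
        nearest = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(nearest))])
    centers = np.array(centers)
    labels = None
    for _ in range(n_iter):
        # Squared distance of every row to every center: shape (n, k).
        dists = np.stack([((X - c) ** 2).sum(axis=1) for c in centers], axis=1)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels


# Toy block-diagonal similarity matrix: two groups of three geographic units.
W = np.kron(np.eye(2), np.ones((3, 3)) - np.eye(3))
labels = kmeans_rows(W, k=2)
```

Each label is then the index of the functional area assigned to the corresponding geographic unit.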
(3) Urban functional area analysis
Step 3.1: the proximity of the subspaces, i.e. the similarity of the functional areas, is calculated from the principal angles between the subspaces; see FIG. 6. The similarity of a functional area with itself is not calculated and is set to 0. The residential and commercial areas in FIG. 6 have the highest similarity, because residential areas tend to have dining and entertainment facilities, and the locations of the commercial areas in FIG. 3 are themselves mixed with a large number of residential areas.
Step 3.2: the uniqueness of the functional areas is calculated from their similarity; the results are shown in FIG. 7. The uniqueness values are high overall, indicating that the functional areas of the study area differ markedly from one another. The uniqueness of the residential and commercial areas is low, which is consistent with the result of Step 3.1.
Step 3.3: the abundance of the functional areas is calculated; the results are shown in FIG. 7. The other functional area (the area providing other services) has the largest reconstruction error, which means that its activity patterns are the most complex, because it involves many kinds of facilities whose dynamic activity patterns differ greatly. The residential and commercial areas have the smallest reconstruction errors, because they concentrate on living and on dining and entertainment respectively, so their dynamic activity patterns are relatively uniform.
As can be seen from the summary and the embodiments, the invention addresses the problems of the prior art by providing a model based on multiple subspaces. The model treats a city's functional areas as having multiple groups of features: when the spatio-temporal activity information of a geographic unit is expressed as a vector, the vector samples lie in a high-dimensional space formed by a union of subspaces, and geographic units lying in the same subspace carry similar dynamic features of human activity and can be clustered into one functional area. City functional areas are thus identified by searching for the subspaces, and the uniqueness and abundance of each functional area are analyzed from the geometric properties of the subspaces, providing fine-grained quantitative indicators for the management and development of city functional areas.

Claims (4)

1. A city functional area identification method based on a multi-subspace model is characterized by comprising the following steps:
step 1, taxi track data and check-in data in a research area are obtained;
step 2, constructing a partition-oriented time sequence characteristic matrix C based on visit purpose;
step 3, inputting the time sequence characteristic matrix C to a sparse subspace clustering algorithm, and calculating to obtain the corresponding relation between the geographic unit and the urban functional area;
step 4, acquiring the significant characteristic location of each functional area, and further identifying the main function of each functional area;
the construction process of the time sequence characteristic matrix C in the step 2 comprises the following steps:
step 201, dividing the research area to obtain N geographic units;
step 202, preprocessing the taxi track data, eliminating abnormal points, extracting the end point and the arrival time of each journey, and mapping the end points to the geographic units to obtain the visit records of the geographic units;
step 203, matching the check-in data records with visit records of the geographic units, and classifying the purpose of each visit;
step 204, constructing a time sequence characteristic matrix C with M rows and N columns, wherein the time sequence characteristic matrix C represents the human activity dynamics carried by the geographic units over a period of time, M = T × D, T represents the number of divided time segments, D represents the number of categories of visit purposes, and each column in C represents the number of people visiting the corresponding geographic unit in the different time segments for the different purposes;
the sparse subspace clustering algorithm in the step 3 comprises the following steps:
step 301, solving for a coefficient matrix Z of size N × N, wherein the matrix Z is obtained by minimization under the $\ell_1$ constraint:
$\min \|Z\|_1$
$\text{s.t. } CZ = C,\ Z_{ii} = 0$
wherein $\|\cdot\|_1$ denotes the $\ell_1$ norm; $\ell_1$ norm minimization makes the coefficient matrix Z sparse, forcing the time sequence characteristic of each geographic unit to be represented only by a linear combination of the time sequence characteristics of other geographic units in the same subspace;
step 302, establishing a similarity matrix of the data from the coefficient matrix, $W = |Z| + |Z|^T$, wherein W is of size N × N and each value in the matrix is the similarity of the time sequence characteristics between the geographic units corresponding to its indexes;
step 303, calculating the number of subspaces by using the normalized Laplacian matrix L of the similarity matrix W, $L = I - D^{-1/2} W D^{-1/2}$, where I is the identity matrix and D is the diagonal matrix with entries $D_{jj} = \sum_i W_{ij}$; sorting the eigenvalues of L in ascending order and calculating the difference $\lambda_{k+1} - \lambda_k$ between every two adjacent eigenvalues, the k corresponding to the maximum difference being the number of subspaces, namely the number of urban functional areas to be detected;
step 304, applying the K-means clustering method to the similarity matrix W with the number of clusters set to the k obtained in step 303, to obtain the correspondence between the geographic units and the k categories, namely the correspondence between the geographic units and the k urban functional areas, thereby completing the detection of the urban functional areas.
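Steps 302 and 303 can be sketched as follows, assuming the coefficient matrix Z of step 301 has already been obtained from some $\ell_1$ solver (not shown). The function names are hypothetical and the sketch only illustrates the symmetrized similarity matrix, the normalized Laplacian, and the eigenvalue-gap rule; it is not the implementation of the invention.

```python
import numpy as np


def similarity_from_coefficients(Z):
    """Step 302: W = |Z| + |Z|^T."""
    A = np.abs(np.asarray(Z, dtype=float))
    return A + A.T


def normalized_laplacian(W):
    """L = I - D^{-1/2} W D^{-1/2}, with D the diagonal degree matrix."""
    deg = W.sum(axis=0)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])  # isolated units stay 0
    return np.eye(W.shape[0]) - inv_sqrt[:, None] * W * inv_sqrt[None, :]


def estimate_num_subspaces(W):
    """Step 303: index of the largest gap between adjacent eigenvalues."""
    eigvals = np.linalg.eigvalsh(normalized_laplacian(W))  # ascending order
    gaps = np.diff(eigvals)                                # lambda_{k+1} - lambda_k
    return int(np.argmax(gaps)) + 1


if __name__ == "__main__":
    # Two groups of three units that only represent each other.
    block = np.ones((3, 3)) - np.eye(3)
    Z = np.block([[block, np.zeros((3, 3))], [np.zeros((3, 3)), block]])
    print(estimate_num_subspaces(similarity_from_coefficients(Z)))
```

The resulting k would then be passed as the number of clusters to a K-means routine in step 304.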
2. The urban functional area identification method according to claim 1, wherein the obtaining of the significant feature location of each functional area in step 4 comprises: extracting, from the similarity matrix W generated in step 302 and by using the correspondence obtained in step 304, the subspace matrices $S_1, \ldots, S_i, \ldots, S_k$ corresponding to the city functional areas, and performing principal component analysis on each to obtain eigenvectors $[e_1, e_2, \ldots, e_p, \ldots, e_M]_i$, which are called the feature locations of $S_i$; the first r eigenvectors $[e_1, e_2, \ldots, e_r]_i$, whose cumulative eigenvalue percentage is higher than 90%, are the significant feature locations of $S_i$;
the main function identification of each functional area comprises: reshaping each significant feature location of each functional area into a matrix with D rows and T columns, wherein each row represents the change in activity level at the feature location for one of the D visit purposes over the T time segments, so as to obtain the main activity pattern of the functional area; and labeling the functional area with the most active function in the main activity pattern, thereby completing city functional area identification.
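The selection of the leading eigenvectors by cumulative eigenvalue percentage can be sketched as below. The sketch assumes the matrix for one functional area has features in rows and geographic units in columns and that the data are mean-centered before the eigendecomposition; both choices, and the function name, are assumptions made for illustration.

```python
import numpy as np


def significant_feature_locations(S, threshold=0.90):
    """Return the first r eigenvectors whose cumulative eigenvalue share
    exceeds `threshold`, together with their eigenvalues.

    S: (M x n) matrix, one column per geographic unit (assumed layout).
    """
    S = np.asarray(S, dtype=float)
    centered = S - S.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / max(S.shape[1] - 1, 1)
    eigvals, eigvecs = np.linalg.eigh(cov)        # ascending order
    order = np.argsort(eigvals)[::-1]             # re-sort descending
    eigvals = np.clip(eigvals[order], 0.0, None)  # remove round-off negatives
    eigvecs = eigvecs[:, order]
    total = eigvals.sum()
    if total == 0:
        return eigvecs[:, :0], eigvals[:0]
    cumulative = np.cumsum(eigvals) / total
    # Smallest r such that the cumulative share is strictly above the threshold.
    r = int(np.searchsorted(cumulative, threshold, side="right")) + 1
    r = min(r, eigvals.size)
    return eigvecs[:, :r], eigvals[:r]
```

Each returned column has length M = T × D and can then be reshaped into a D × T matrix for the main function identification.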
3. The urban functional area identification method according to claim 1 or 2, wherein the urban functional area identification method further comprises the steps of:
step 5, calculating the similarity of each functional area;
step 6, calculating the uniqueness of each functional area, and sequencing each functional area according to the uniqueness;
the similarity of the functional areas is calculated from the principal angles between the corresponding subspaces; for the subspaces $S_k$ and $S_l$ corresponding to any two functional areas, the similarity $\mathrm{aff}(S_k, S_l)$ is calculated as follows:
Figure FDA0002518778490000031
wherein $\cos\theta^{(i)}$ is the i-th largest singular value of $U_k^T U_l$, $U_k$ and $U_l$ are orthogonal bases of $S_k$ and $S_l$ respectively, $\theta^{(i)}$ is the i-th principal angle between the subspaces, and $d_k \wedge d_l$ denotes the smaller of the dimensions $d_k$ and $d_l$ of $S_k$ and $S_l$;
the uniqueness of a functional area is inversely proportional to its similarity to the other functional areas: if the similarity between subspaces is high, the functions of the corresponding functional areas are largely alike and their uniqueness is low; the uniqueness of each functional area $S_i$ is calculated as follows:
Figure FDA0002518778490000035
where k is the total number of functional areas and $S_{-i}$ denotes the functional areas other than $S_i$.
4. The urban functional area identification method according to claim 3, wherein said urban functional area identification method further comprises the steps of:
step 7, calculating the abundance of each functional region, and sequencing each functional region according to the abundance;
the abundance of the functional region is related to the reconstruction error of the significant feature site of each functional region, which is calculated as follows:
Figure FDA0002518778490000036
wherein $C(S_i)$ is the matrix formed by the original vectors belonging to the subspace $S_i$, Figure FDA0002518778490000037 is the matrix reconstructed from the significant feature locations of $S_i$, and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix.
CN202010484901.8A 2020-06-01 2020-06-01 City functional area identification method based on multi-subspace model Active CN111651502B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010484901.8A CN111651502B (en) 2020-06-01 2020-06-01 City functional area identification method based on multi-subspace model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010484901.8A CN111651502B (en) 2020-06-01 2020-06-01 City functional area identification method based on multi-subspace model

Publications (2)

Publication Number Publication Date
CN111651502A true CN111651502A (en) 2020-09-11
CN111651502B CN111651502B (en) 2021-09-14

Family

ID=72344015

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010484901.8A Active CN111651502B (en) 2020-06-01 2020-06-01 City functional area identification method based on multi-subspace model

Country Status (1)

Country Link
CN (1) CN111651502B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112559909A (en) * 2020-12-18 2021-03-26 浙江工业大学 Business area discovery method based on GCN embedded spatial clustering model
CN113343781A (en) * 2021-05-17 2021-09-03 武汉大学 Urban functional area identification method comprehensively using remote sensing data and taxi track data
CN113806419A (en) * 2021-08-26 2021-12-17 西北大学 Urban area function identification model and method based on space-time big data
CN113902185A (en) * 2021-09-30 2022-01-07 北京百度网讯科技有限公司 Method and device for determining regional land property, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105117595A (en) * 2015-08-19 2015-12-02 大连理工大学 Floating car data based private car travel data integration method
US20170300566A1 (en) * 2016-04-19 2017-10-19 Strava, Inc. Determining clusters of similar activities
CN108764193A (en) * 2018-06-04 2018-11-06 北京师范大学 Merge the city function limited region dividing method of POI and remote sensing image
CN108876475A (en) * 2018-07-12 2018-11-23 青岛理工大学 A kind of urban function region recognition methods, server and storage medium based on point of interest acquisition
CN110298500A (en) * 2019-06-19 2019-10-01 大连理工大学 A kind of urban transportation track data set creation method based on taxi car data and city road network

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
YE ZHI等: "Latent spatio-temporal activity structures: a new approach to inferring intra-urban functional regions via social media check-in data", 《GEO-SPATIAL INFORMATION SCIENCE》 *
刘旭: "基于出租车和POI数据的城市土地利用现状变化研究", 《中国优秀硕士学位论文全文数据库 基础科技辑》 *
宁鹏飞等: "基于签到数据的城市热点功能区识别研究", 《测绘地理信息》 *
张慧杰等: "基于轨迹和兴趣点数据的城市功能区动态识别与时变规律可视分析", 《计算机辅助设计与图形学学报》 *
柯文聪等: "基于Landsat与DMSP-OLS的非监督城区提取方法研究", 《测绘与空间地理信息》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112559909A (en) * 2020-12-18 2021-03-26 浙江工业大学 Business area discovery method based on GCN embedded spatial clustering model
CN113343781A (en) * 2021-05-17 2021-09-03 武汉大学 Urban functional area identification method comprehensively using remote sensing data and taxi track data
CN113343781B (en) * 2021-05-17 2022-02-01 武汉大学 City functional area identification method using remote sensing data and taxi track data
CN113806419A (en) * 2021-08-26 2021-12-17 西北大学 Urban area function identification model and method based on space-time big data
CN113806419B (en) * 2021-08-26 2024-04-12 西北大学 Urban area function recognition model and recognition method based on space-time big data
CN113902185A (en) * 2021-09-30 2022-01-07 北京百度网讯科技有限公司 Method and device for determining regional land property, electronic equipment and storage medium
CN113902185B (en) * 2021-09-30 2023-10-31 北京百度网讯科技有限公司 Determination method and device for regional land property, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111651502B (en) 2021-09-14

Similar Documents

Publication Publication Date Title
CN111651502B (en) City functional area identification method based on multi-subspace model
CN112949413B (en) City landscape element classification and locality measurement method based on street view picture
CN111651545A (en) Urban marginal area extraction method based on multi-source data fusion
CN110264709A (en) The prediction technique of the magnitude of traffic flow of road based on figure convolutional network
CN109493119B (en) POI data-based urban business center identification method and system
CN107633067A (en) A kind of Stock discrimination method based on human behavior rule and data digging method
CN110674858B (en) Traffic public opinion detection method based on space-time correlation and big data mining
CN111737605A (en) Travel purpose identification method and device based on mobile phone signaling data
CN112819207A (en) Geological disaster space prediction method, system and storage medium based on similarity measurement
CN109840272B (en) Method for predicting user demand of shared electric automobile station
CN108898244B (en) Digital signage position recommendation method coupled with multi-source elements
CN111652198A (en) Urban edge area identification method and system
CN108566620A (en) A kind of indoor orientation method based on WIFI
CN106227884A (en) A kind of recommendation method of calling a taxi online based on collaborative filtering
CN114708521A (en) City functional area identification method and system based on street view image target relation perception network
CN112561401A (en) City vitality measurement and characterization method and system based on multi-source big data
Renigier-Biłozor et al. Modern challenges of property market analysis-homogeneous areas determination
Bajat et al. Spatial hedonic modeling of housing prices using auxiliary maps
CN117408167A (en) Debris flow disaster vulnerability prediction method based on deep neural network
Qiu et al. RPSBPT: A route planning scheme with best profit for taxi
Haining Data problems in spatial econometric modeling
CN110633890A (en) Land utilization efficiency judgment method and system
CN112650949B (en) Regional POI (point of interest) demand identification method based on multi-source feature fusion collaborative filtering
CN115879594A (en) Urban settlement population distribution trend prediction method based on geographic detector
Keskin et al. Cohort fertility heterogeneity during the fertility decline period in Turkey

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant