CN111651502A - City functional area identification method based on multi-subspace model

Info

Publication number: CN111651502A
Application number: CN202010484901.8A
Authority: CN
Inventors: 朱佳玮; 陶超; 李海峰; 肖俊
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2020-06-01
Filing date: 2020-06-01
Publication date: 2020-09-11
Anticipated expiration: 2040-06-01
Also published as: CN111651502B

Abstract

The invention discloses a method for identifying urban functional areas based on a multi-subspace model, comprising the following steps: acquiring taxi track data and check-in data in a research area; constructing a partition-oriented time sequence feature matrix C based on visit purpose; inputting the time sequence feature matrix C to a sparse subspace clustering algorithm to obtain the correspondence between geographic units and urban functional areas; and acquiring the salient feature locations of each functional area to identify its main function. The method uses the human activity information provided by geographic big data and, by adopting a multi-subspace model, overcomes the shortcomings of the prior art and identifies urban functional areas more accurately. It also analyzes the uniqueness and abundance of each functional area from the geometric properties of the subspaces, providing fine-grained quantitative indicators for the management and development of urban functional areas.

Description

City functional area identification method based on multi-subspace model

Technical Field

The invention belongs to the technical field of geographic spatial information identification, relates to an urban geographic information identification method, and particularly relates to an urban functional area identification method based on a multi-subspace model.

Background

The urban spatial structure is a core research topic in urban geographic informatics and a concentrated reflection of the human-land relationship. Because urban space both shapes and is shaped by human production and activity, the urban spatial structure bears on urban planning, site selection, and travel and location recommendation. In the analysis of urban spatial structure, the distribution of urban functional areas is the result, expressed in geographic space, of many interacting factors.

There are many methods for analyzing urban functional areas, such as social surveys. However, collecting survey data is time-consuming and labor-intensive, and the analysis may be strongly affected by subjective factors; the greatest drawback is that such methods cannot directly reflect human activity, which is a key factor in urban development. With the rapid development of mobile communication, Internet and satellite positioning technologies, mobile devices with positioning functions generate a series of electronic footprints. These footprints are real records of the activities of urban residents, so urban functional areas can be explored from the perspective of human activity. Existing methods use social media check-in data, mobile phone data and taxi track data to detect urban functional areas.

However, the models that the prior art uses to analyze these data are not fully developed. The general procedure is as follows. First, when processing geospatial big data, the time sequence features of human activity are mapped onto manually divided geographic units, so that each geographic unit is expressed as a vector and the information is stored in a high-dimensional vector space. Next, the geographic units are given a feature representation by algorithms such as singular value decomposition, latent semantic analysis and latent Dirichlet allocation. Finally, the geographic units are clustered by the similarity of their feature representations, and each cluster represents one functional area, which yields the distribution of urban functional areas. These models have the following disadvantages.

First, during feature representation, some algorithms make strict assumptions about the features, for example that the samples have only one set of features or obey the same distribution. Because feature representation reduces the samples from a high-dimensional space to a single low-dimensional subspace, these algorithms can be called single-subspace algorithms. The strict assumptions of a single-subspace algorithm make it easy to obtain feature patterns, and the functional area distribution can then be obtained by clustering on the relation between samples and features. However, samples whose information carries little weight are marginalized after feature representation, which makes the clustering result inaccurate. Moreover, functional areas differ in their features, and a single set of features cannot describe every functional area concisely and accurately. When the data volume is large and the feature patterns are complex, the assumptions that a single-subspace model places on the feature patterns limit feature mining, so such models cannot handle more complex data.

Second, these models ignore the geometric meaning of the vector space. The geometrical properties of the subspace are related to the characteristics of the urban functional area, and the prior art has ignored discussion and consideration of this.

Disclosure of Invention

In view of the above, the present invention provides a method for identifying an urban functional area based on a multi-subspace model, which utilizes human activity information provided by geographic big data, overcomes the defects in the prior art by using the multi-subspace model, and can identify the urban functional area more accurately.

The object of the invention is achieved as follows. The method for identifying urban functional areas based on a multi-subspace model comprises the following steps:

step 1, taxi track data and check-in data in a research area are obtained;

step 2, constructing a partition-oriented time sequence feature matrix C based on visit purpose;

step 3, inputting the time sequence feature matrix C to a sparse subspace clustering algorithm, and calculating the correspondence between the geographic units and the urban functional areas;

step 4, acquiring the salient feature locations of each functional area, and thereby identifying the main function of each functional area.

Specifically, the process of constructing the time sequence feature matrix C in step 2 includes the following steps:

step 201, dividing the research area to obtain N geographic units;

step 202, preprocessing the taxi track data, eliminating abnormal points, extracting the end point and the arrival time of each journey, and mapping the end point and the geographic unit to obtain the visit record of the geographic unit;

step 203, matching the check-in data records with visit records of the geographic units, and classifying the purpose of each visit;

step 204, constructing a time sequence feature matrix C with M rows and N columns, which represents the human activity dynamics carried by the geographic units over a period of time, wherein $M = T \times D$, T represents the number of time segments, D represents the number of visit-purpose categories, and each column of C records the number of visits to the corresponding geographic unit for each purpose in each time segment.
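As a rough illustration of step 204, the following Python sketch accumulates classified visit records into the matrix C. The record layout `(unit, slot, purpose)`, the function name `build_feature_matrix` and the purpose-major row ordering are assumptions made for this example; the source does not state how the M = T × D rows are ordered.

```python
import numpy as np

def build_feature_matrix(visits, n_units, n_slots, n_purposes):
    """Count visits per (purpose, time slot) row and per geographic-unit column.

    visits: iterable of (unit_index, time_slot, purpose_index) tuples (assumed layout).
    """
    # Assumed row ordering: row = purpose * n_slots + time_slot (purpose-major).
    C = np.zeros((n_slots * n_purposes, n_units))
    for unit, slot, purpose in visits:
        C[purpose * n_slots + slot, unit] += 1
    return C

# Hypothetical records: two work visits to unit 0 at 08:00, and so on.
visits = [(0, 8, 2), (0, 8, 2), (1, 19, 3), (2, 23, 0)]
C = build_feature_matrix(visits, n_units=3, n_slots=24, n_purposes=6)
print(C.shape)
```

Each column of the result is the time sequence feature vector of one geographic unit, which is the form the clustering step expects.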

Specifically, the sparse subspace clustering algorithm described in step 3 includes the following steps:

step 301, solving for a coefficient matrix Z of size N × N, where Z is obtained by $\ell_1$ minimization under constraints:

$$\min_{Z} \|Z\|_1 \quad \text{s.t.} \quad CZ = C,\ Z_{ii} = 0$$

wherein $\|\cdot\|_1$ denotes the $\ell_1$ norm; $\ell_1$-norm minimization makes the coefficient matrix Z sparse, so that the time sequence features of each geographic unit are represented only by linear combinations of the time sequence features of other geographic units in the same subspace;

step 302, a similarity matrix of the data, $W = |Z| + |Z|^{T}$, is then established using the coefficient matrix; W is of size N × N, and each entry is the similarity of the time sequence features between the geographic units corresponding to its indices;

the similarity matrix W is a block diagonal matrix, that is, only the sub-matrices on the main diagonal are non-zero and all other sub-matrices are zero matrices; each non-zero sub-matrix corresponds to one subspace; geographic units in the same subspace have very similar time sequence features, while geographic units in different subspaces differ greatly in their time sequence features, so each subspace is an urban functional area to be detected;

step 303, calculating the number of subspaces using the normalized Laplacian matrix L of the similarity matrix W, $L = I - D^{-1/2} W D^{-1/2}$, where I is the identity matrix and D is the diagonal degree matrix with $D_{jj} = \sum_i W_{ij}$; sorting the eigenvalues of L in ascending order and calculating the difference $\lambda_{k+1} - \lambda_k$ between every two adjacent eigenvalues; the k corresponding to the maximum difference is the number of subspaces, namely the number of urban functional areas to be detected;

step 304, applying the K-means clustering method to the similarity matrix W with the number of clusters set to the K obtained in step 303, to obtain the correspondence between the geographic units and K categories, namely the correspondence between the geographic units and K urban functional areas, which completes the detection of the urban functional areas.
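Steps 301 to 304 can be sketched end to end in NumPy as follows. This is an illustration under stated assumptions, not the patented implementation: the equality-constrained $\ell_1$ problem is replaced by a lasso relaxation solved with proximal gradient descent (ISTA), the penalty `lam` and the iteration counts are hypothetical, and the K-means routine is a minimal Lloyd iteration with farthest-point initialisation. All function names are invented for this example.

```python
import numpy as np

def sparse_coefficients(C, lam=0.05, n_iter=2000):
    """Lasso relaxation of step 301, solved by proximal gradient descent (ISTA)."""
    n = C.shape[1]
    Z = np.zeros((n, n))
    step = 1.0 / np.linalg.norm(C, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        Z = Z - step * (C.T @ (C @ Z - C))  # gradient step on 0.5 * ||CZ - C||_F^2
        Z = np.sign(Z) * np.maximum(np.abs(Z) - step * lam, 0.0)  # soft threshold
        np.fill_diagonal(Z, 0.0)  # enforce Z_ii = 0
    return Z

def count_subspaces(W):
    """Step 303: position of the largest eigengap of the normalized Laplacian."""
    deg = np.maximum(W.sum(axis=0), 1e-12)  # guard against isolated units
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals = np.sort(np.linalg.eigvalsh(L))
    return int(np.argmax(np.diff(eigvals))) + 1

def kmeans_rows(X, k, n_iter=100):
    """Step 304: minimal Lloyd K-means on the rows of X."""
    centers = [X[0]]  # farthest-point initialisation keeps the example deterministic
    for _ in range(1, k):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Toy data: 5 units in one 2-D subspace and 5 units in an orthogonal 3-D subspace.
rng = np.random.default_rng(0)
C = np.zeros((6, 10))
C[:2, :5] = rng.random((2, 5))
C[2:5, 5:] = rng.random((3, 5))

Z = sparse_coefficients(C)
W = np.abs(Z) + np.abs(Z).T   # step 302
K = count_subspaces(W)        # step 303
labels = kmeans_rows(W, K)    # step 304
print(K, labels)
```

On real data a dedicated convex solver and a library K-means would normally replace the hand-written loops; the sketch only shows how the four steps connect.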

Specifically, obtaining the salient feature locations of each functional area in step 4 includes: using the correspondence from step 304, extracting from the similarity matrix W generated in step 302 the subspace matrices $S_1, \ldots, S_i, \ldots, S_k$ corresponding to the urban functional areas, and performing principal component analysis on each; the resulting eigenvectors $[e_1, e_2, \ldots, e_p, \ldots, e_M]_i$ are called the feature locations of $S_i$, and the first r eigenvectors $[e_1, e_2, \ldots, e_r]_i$, whose cumulative eigenvalue percentage is higher than 90%, are the salient feature locations of $S_i$.

Specifically, identifying the main function of each functional area includes reshaping each salient feature location of the functional area into a matrix with D rows and T columns, in which each row represents the change in activity level of that feature location for one of the D visit purposes over the T time periods; this yields the main activity pattern of the functional area, the functional area is labeled with the most active function in the main activity pattern, and the urban functional area identification is complete.
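The selection of salient feature locations and their reshaping into a D × T pattern might look like the sketch below. It assumes that each functional area is represented by an M × n matrix of feature vectors, that PCA is computed by an SVD of the mean-centred matrix, and that rows are ordered purpose-major so that `reshape(D, T)` is valid; none of these details is stated in the source, and the random input is placeholder data.

```python
import numpy as np

def salient_feature_locations(S, threshold=0.90):
    """Return the first r principal directions of the columns of S (M x n)."""
    centered = S - S.mean(axis=1, keepdims=True)  # centre each feature (assumption)
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)    # cumulative eigenvalue share
    r = int(np.argmax(ratio > threshold)) + 1     # smallest r exceeding the threshold
    return U[:, :r]

T, D = 24, 6
rng = np.random.default_rng(0)
S_i = rng.random((T * D, 40))        # hypothetical feature vectors of one functional area
E = salient_feature_locations(S_i)
pattern = E[:, 0].reshape(D, T)      # row d = activity of purpose d over the T periods
# Eigenvector signs are arbitrary, so activity is compared by absolute magnitude.
most_active = int(np.argmax(np.abs(pattern).sum(axis=1)))
print(E.shape, pattern.shape, most_active)
```

Ranking purposes by summed absolute loading is only one plausible way to read off the "most active function"; the source determines it by inspecting the plotted patterns.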

Further, the method for identifying the urban functional area further comprises the following steps:

step 5, calculating the similarity of each functional area;

step 6, calculating the uniqueness of each functional area, and ranking the functional areas by uniqueness.

Specifically, the similarity of functional areas is calculated from the principal angles between the corresponding subspaces; for the subspaces $S_k$ and $S_l$ corresponding to any two functional areas, the similarity $\mathrm{aff}(S_k, S_l)$ is calculated by the following formula,

[formula not reproduced in the source text]

wherein $\cos\theta^{(i)}$ is the i-th largest singular value of $(U^{k})^{T} U^{l}$, $U^{k}$ and $U^{l}$ are orthogonal bases of $S_k$ and $S_l$ respectively, $\theta^{(i)}$ is the i-th principal angle between the subspaces, and $d_k \wedge d_l$ denotes the smaller of the dimensions $d_k$ and $d_l$ of $S_k$ and $S_l$;
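The cosines of the principal angles between two subspaces are the singular values of the product of their orthonormal bases, which NumPy can compute directly. The sketch below combines them into a normalized affinity, the square root of the mean squared cosine over the $d_k \wedge d_l$ angles. That combination is an assumption chosen because it uses exactly the symbols defined in the text; the source formula itself is not available here, and the function names are hypothetical.

```python
import numpy as np

def orthonormal_basis(S, tol=1e-10):
    """Orthonormal basis of the column space of S, obtained from the SVD."""
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    return U[:, s > tol * s.max()]

def subspace_affinity(S_k, S_l):
    """Assumed affinity: sqrt(mean of cos^2 of the principal angles)."""
    U_k, U_l = orthonormal_basis(S_k), orthonormal_basis(S_l)
    # Singular values of U_k^T U_l are the cosines of the principal angles.
    cosines = np.linalg.svd(U_k.T @ U_l, compute_uv=False)
    d = min(U_k.shape[1], U_l.shape[1])           # d_k ∧ d_l
    cosines = np.clip(cosines[:d], 0.0, 1.0)      # remove round-off outside [0, 1]
    return float(np.sqrt(np.sum(cosines ** 2) / d))

A = np.eye(4)[:, :2]   # span of e1, e2
B = np.eye(4)[:, 2:]   # span of e3, e4, orthogonal to A
print(subspace_affinity(A, A), subspace_affinity(A, B))
```

Under this assumed definition, identical subspaces score 1 and mutually orthogonal subspaces score 0, so larger values mean more similar functional areas.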

Specifically, the uniqueness of a functional area is inversely proportional to its similarity to the others: if the similarity between subspaces is high, the functions of the corresponding functional areas are largely similar and their uniqueness is low. The uniqueness of each functional area $S_i$ is calculated by the following formula:

[formula not reproduced in the source text]

where k is the total number of functional areas and $S_{-i}$ denotes the functional areas other than $S_i$.

step 7, calculating the abundance of each functional area, and ranking the functional areas by abundance.

Specifically, the abundance of a functional area is related to the reconstruction error of its salient feature locations, which is calculated as follows:

[formula not reproduced in the source text]

wherein $C(S_i)$ is the matrix formed by the original vectors belonging to the subspace $S_i$, the second matrix in the formula is the matrix reconstructed from the salient feature locations of $S_i$, and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix.
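A reconstruction error of this kind can be sketched as follows. The code assumes the error is the unnormalized Frobenius norm of the difference between the original matrix and its projection onto the salient feature locations, with the locations taken from a mean-centred PCA at a 90% threshold; the exact formula is not available in the source, so the normalization and the names `reconstruction_error` and `C_hat` are assumptions.

```python
import numpy as np

def reconstruction_error(C_i, threshold=0.90):
    """Frobenius error after rebuilding C_i from its salient feature locations."""
    mean = C_i.mean(axis=1, keepdims=True)
    X = C_i - mean                                  # centred data (assumption)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = int(np.argmax(ratio > threshold)) + 1       # number of salient locations
    E = U[:, :r]
    C_hat = mean + E @ (E.T @ X)                    # matrix rebuilt from r locations
    return float(np.linalg.norm(C_i - C_hat, "fro"))

rng = np.random.default_rng(0)
simple = np.outer(np.arange(1.0, 7.0), np.arange(1.0, 5.0))  # one activity pattern
mixed = rng.random((6, 4))                                    # several patterns
print(reconstruction_error(simple), reconstruction_error(mixed))
```

A matrix driven by a single activity pattern is rebuilt almost exactly, while a matrix mixing several patterns leaves a residual, which matches the reading of a larger error as greater abundance.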

The method provides a model based on multiple subspaces, which holds that urban functional areas have multiple groups of features. When the spatio-temporal activity information of the geographic units is expressed as vectors, the vector samples lie in a high-dimensional space formed by a union of subspaces. Geographic units lying in the same subspace carry similar human activity dynamics and can be clustered into one functional area, so urban functional areas are identified by searching for the subspaces. The uniqueness and abundance of each functional area are then analyzed from the geometric properties of the subspaces, which provides fine-grained quantitative indicators for the management and development of urban functional areas.

Drawings

FIG. 1 is a schematic flow diagram of the process of the present invention;

FIG. 2 is a schematic flow chart of an embodiment of the method of the present invention.

FIG. 3 is a similarity matrix obtained by using sparse subspace clustering in an embodiment of the present invention;

FIG. 4 shows the results of detecting urban functional areas according to an embodiment of the present invention;

FIG. 5 shows the functional activity level of the salient feature locations of each functional area in an embodiment of the present invention;

FIG. 6 illustrates the similarity of functional areas calculated according to an embodiment of the present invention;

FIG. 7 shows the uniqueness and abundance of the functional regions calculated by the example of the present invention.

Detailed Description

The present invention is further illustrated by the following examples and the accompanying drawings, but the present invention is not limited thereto in any way, and any modifications or alterations based on the teaching of the present invention are within the scope of the present invention.

As shown in fig. 1, a city functional area identification method based on a multi-subspace model includes the following steps:

step 1, taxi track data and check-in data in a research area are obtained;

step 201, dividing the research area to obtain N geographic units;

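step 301, solving for the N × N coefficient matrix Z by $\ell_1$ minimization under the constraints $CZ = C$, $Z_{ii} = 0$;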

Specifically, obtaining the salient feature locations of each functional area in step 4 includes: using the correspondence from step 304, extracting from the similarity matrix W generated in step 302 the subspace matrices $S_1, \ldots, S_i, \ldots, S_k$ corresponding to the urban functional areas, and performing principal component analysis on each; the resulting eigenvectors $[e_1, e_2, \ldots, e_p, \ldots, e_M]_i$ are called the feature locations of $S_i$, and the first r eigenvectors $[e_1, e_2, \ldots, e_r]_i$, whose cumulative eigenvalue percentage is higher than 90%, are the salient feature locations of $S_i$.

Specifically, identifying the main function of each functional area includes reshaping each salient feature location of the functional area into a matrix of D rows and T columns, in which each row represents the change in activity level of that feature location for one of the D visit purposes over the T time periods; this yields the main activity pattern of the functional area, and the most active function in the main activity pattern is regarded as the main function of the area.

step 5, calculating the similarity of each functional area;

Specifically, the similarity of functional areas is calculated from the principal angles between the subspaces; the similarity of any two functional areas $S_k$ and $S_l$ is calculated by the following formula,

[formula not reproduced in the source text]

wherein $\theta^{(i)}$ is the i-th principal angle between the subspaces, and $d_k \wedge d_l$ denotes the smaller of the dimensions $d_k$ and $d_l$ of $S_k$ and $S_l$.

In the reconstruction error used for the abundance, the reconstructed matrix is formed from the salient feature locations of $S_i$.

The reconstruction error describes how far the matrix restored from the salient feature locations is from the original subspace matrix. A larger reconstruction error indicates that, beyond the dominant salient feature locations, more feature locations are needed to depict the dynamic changes within the functional area. Abundance thus reflects the richness of people's activity patterns in the area and the functional development that supports those patterns.

As shown in the flow chart of FIG. 2, the experiment of the present invention includes the following steps.

(1) Data processing

Step 1.1: selecting the main urban area of Shanghai as the research area, dividing it into a grid of 500 m × 500 m cells, and removing water-body cells, to obtain 3166 geographic units.

Step 1.2: preprocessing GPS track data from 6600 taxis in Shanghai, eliminating abnormal points, extracting the end point and arrival time of each journey, and mapping the end points to the geographic units, to obtain 7852724 arrival records.

Step 1.3: matching the check-in data records with the visit records of the geographic units, and classifying the purpose of each visit into one of six types: home, traffic, work, dining, entertainment and others (places such as parks, museums and libraries).

Step 1.4: dividing one day into 24 one-hour time periods, and counting the number of visits to each geographic unit in each of the 24 periods for each of the 6 purposes, to obtain a time sequence feature matrix C with 144 rows ($24 \times 6$) and 3166 columns.
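The mapping of trip end points to 500 m grid cells in steps 1.1 and 1.2 can be sketched as below. The sketch assumes the end points are already in a projected coordinate system measured in metres and that cells are numbered row by row from the lower-left corner; the function name, the origin and the grid width are hypothetical, and the removal of water-body cells would require an additional mask that is not shown.

```python
import numpy as np

def grid_cell_index(x, y, x_min, y_min, n_cols, cell_size=500.0):
    """Map projected coordinates (metres) to a row-major grid cell index."""
    col = np.floor((np.asarray(x) - x_min) / cell_size).astype(int)
    row = np.floor((np.asarray(y) - y_min) / cell_size).astype(int)
    return row * n_cols + col

# Hypothetical trip end points, in metres from the grid origin.
xs = np.array([0.0, 750.0, 4999.0])
ys = np.array([0.0, 1200.0, 499.0])
cells = grid_cell_index(xs, ys, x_min=0.0, y_min=0.0, n_cols=10)
print(cells)
```

Combined with the arrival hour and the classified visit purpose, each cell index identifies the entry of C that a visit record increments.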

(2) Urban functional area identification

Step 2.1: inputting the time sequence feature matrix C to the sparse subspace clustering algorithm to obtain the similarity matrix W. The visualized similarity matrix is shown in FIG. 3, which reveals the similarity between geographic units; entries with a nonzero similarity value are colored black. Five diagonal blocks can be seen, and this structure indicates that the number of urban functional areas is 5.

Step 2.2: computing the number of subspaces using the normalized Laplacian matrix L of W, $L = I - D^{-1/2} W D^{-1/2}$, where I is the identity matrix and D is the diagonal degree matrix with $D_{jj} = \sum_i W_{ij}$. Sorting the eigenvalues of L in ascending order and calculating the difference $\lambda_{k+1} - \lambda_k$ between every two adjacent eigenvalues, the k corresponding to the maximum difference is 5; that is, the number of subspaces (urban functional areas) is 5, which is consistent with the reading of FIG. 3 in step 2.1.

Step 2.3: the K-means clustering method is therefore applied to W with the number of clusters set to 5, which completes the urban functional area detection and yields urban functional areas 1, 2, 3, 4 and 5. The clustering result visualized on the map is shown in FIG. 4, where it can be seen that the central area is mainly covered by functional area 5.

Step 2.4: since the main function of a functional area is determined by its salient activity features, principal component analysis is performed on the subspace matrix corresponding to each functional area to obtain its feature locations and thereby determine the actual function of each detected urban functional area. The first 5 eigenvalues account for more than 90% of the total in functional areas 1, 2, 3 and 4, but for less than 90% in functional area 5. Therefore, the basis vectors corresponding to the first 5 eigenvalues are used as the salient feature locations of each functional area, and when analyzing functional area 5 the basis vectors corresponding to the first 10 eigenvalues are used.

Step 2.5: each salient feature location of each functional area is reshaped into a matrix of 6 rows and 24 columns, in which each row represents the change in activity level of the feature location over 24 hours for one purpose: home (H), traffic (Tr), work (W), dining (D), entertainment (E) and others (O, places such as parks, museums and libraries). The salient feature locations of all functional areas are shown in FIG. 5. As the figure shows, in the salient feature locations of functional area 1 the home activity (H) is the most active, dining (D) ranks second, and entertainment (E) is also prominent, so functional area 1 can be regarded as a residential area with well-developed catering and entertainment facilities. Similarly, functional area 2 is dominated by traffic activity (Tr) and is a traffic hub; functional area 3 is dominated by work (W) and is therefore a work area. For functional area 5, the influence of the first 10 salient feature locations is considered; dining (D) and entertainment (E) are found to be active, so it is regarded as a commercial area. Functional area 4 corresponds to other functions, such as parks, museums and gas stations.

(3) Urban functional area analysis

Step 3.1: the proximity of the subspaces, that is, the similarity of the functional areas, is calculated from the principal angles between the subspaces (see FIG. 6); the similarity of a functional area with itself is not calculated and is set to 0. In FIG. 6 the residential and commercial areas have the highest similarity, because residential areas are more likely to contain dining and entertainment facilities, and the commercial areas in FIG. 3 are themselves interspersed with a large number of residential areas.

Step 3.2: the uniqueness of each functional area is calculated from the similarities of the functional areas, and the results are shown in FIG. 7. The uniqueness values are high overall, indicating that the functional areas of the research area differ significantly from one another. The uniqueness of the residential and commercial areas is low, which is consistent with the result of step 3.1.

Step 3.3: the abundance of each functional area is calculated, and the results are shown in FIG. 7. The "other" functional area (the area providing other services) has the largest reconstruction error, which means its activity patterns are the most complex, because it involves many kinds of facilities whose dynamic activity patterns differ greatly. The residential and commercial areas have the smallest reconstruction errors, because they concentrate on living and on catering and entertainment respectively, so their dynamic activity patterns are comparatively uniform.

As the summary and the embodiment show, to solve the problems of the prior art the invention provides a model based on multiple subspaces, which holds that urban functional areas have multiple groups of features. When the spatio-temporal activity information of the geographic units is expressed as vectors, the vector samples lie in a high-dimensional space formed by a union of subspaces. Geographic units lying in the same subspace carry similar human activity dynamics and can be clustered into one functional area, so urban functional areas are identified by searching for the subspaces. The uniqueness and abundance of each functional area are analyzed from the geometric properties of the subspaces, which provides fine-grained quantitative indicators for the management and development of urban functional areas.

Claims

1. A city functional area identification method based on a multi-subspace model is characterized by comprising the following steps:

step 1, taxi track data and check-in data in a research area are obtained;

step 4, acquiring the significant characteristic location of each functional area, and further identifying the main function of each functional area;

the construction process of the time sequence characteristic matrix C in the step 2 comprises the following steps:

step 201, dividing the research area to obtain N geographic units;

step 204, constructing a time sequence characteristic matrix C with M rows and N columns, wherein the time sequence characteristic matrix C represents the human activity dynamic carried by the geographic unit in a period of time, M is T multiplied by D, T represents the number of divided time segments, D represents the number of categories of visited destinations, and each column in C represents the number of people visiting the corresponding geographic unit in different time segments for different destinations;

the sparse subspace clustering algorithm in the step 3 comprises the following steps:

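step 301, solving for the N × N coefficient matrix Z by $\ell_1$ minimization under the constraints $CZ = C$, $Z_{ii} = 0$;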

step 303, calculating the number of subspaces using the normalized Laplacian matrix L of the similarity matrix W, $L = I - D^{-1/2} W D^{-1/2}$, where I is the identity matrix and D is the diagonal degree matrix with $D_{jj} = \sum_i W_{ij}$; sorting the eigenvalues of L in ascending order and calculating the difference $\lambda_{k+1} - \lambda_k$ between every two adjacent eigenvalues; the k corresponding to the maximum difference is the number of subspaces, namely the number of urban functional areas to be detected;

2. The urban functional area identification method according to claim 1, wherein obtaining the salient feature locations of each functional area in step 4 comprises: using the correspondence from step 304, extracting from the similarity matrix W generated in step 302 the subspace matrices $S_1, \ldots, S_i, \ldots, S_k$ corresponding to the urban functional areas, and performing principal component analysis on each; the resulting eigenvectors $[e_1, e_2, \ldots, e_p, \ldots, e_M]_i$ are called the feature locations of $S_i$, and the first r eigenvectors $[e_1, e_2, \ldots, e_r]_i$, whose cumulative eigenvalue percentage is higher than 90%, are the salient feature locations of $S_i$;

identifying the main function of each functional area comprises reshaping each salient feature location of the functional area into a matrix with D rows and T columns, wherein each row represents the change in activity level of that feature location for one of the D visit purposes over the T time periods, to obtain the main activity pattern of the functional area, labeling the functional area with the most active function in the main activity pattern, and thereby completing the urban functional area identification.

3. The urban functional area identification method according to claim 1 or 2, wherein the urban functional area identification method further comprises the steps of:

step 5, calculating the similarity of each functional area;

step 6, calculating the uniqueness of each functional area, and sequencing each functional area according to the uniqueness;

the similarity of functional areas is calculated from the principal angles between the corresponding subspaces; for the subspaces $S_k$ and $S_l$ corresponding to any two functional areas, the similarity $\mathrm{aff}(S_k, S_l)$ is calculated by the following formula,

[formula not reproduced in the source text]

the uniqueness of a functional area is inversely proportional to its similarity to the others: if the similarity between subspaces is high, the functions of the corresponding functional areas are largely similar and their uniqueness is low; the uniqueness of each functional area $S_i$ is calculated by the following formula:

4. The urban functional area identification method according to claim 3, wherein said urban functional area identification method further comprises the steps of:

step 7, calculating the abundance of each functional region, and sequencing each functional region according to the abundance;

the abundance of the functional region is related to the reconstruction error of the significant feature site of each functional region, which is calculated as follows: