CN114661393B

CN114661393B - Urban aggregation effect visual analysis method based on floating population data feature clustering

Info

Publication number: CN114661393B
Application number: CN202210193379.7A
Authority: CN
Inventors: 秦红星; 徐超群
Original assignee: Chongqing University of Posts and Telecommunications
Current assignee: Chongqing University of Posts and Telecommunications
Priority date: 2022-03-01
Filing date: 2022-03-01
Publication date: 2024-03-22
Anticipated expiration: 2042-03-01
Also published as: CN114661393A

Abstract

The invention discloses a visual analysis method for the urban aggregation effect based on feature clustering of floating population data, comprising the following steps: S1: converting the format of the original data set d1 and unifying all geographic positions contained in the data items into longitude and latitude coordinates; S2: screening out the relevant data items in the data set using prior knowledge to form a new data set d2; S3: performing DBSCAN density clustering on the geographic coordinates of the cities belonging to city groups (data set d3); S4: performing K-Means clustering on the percentage of the tertiary industry in the inflow population of all cities in the data set d2; S5: labeling the two clustering results in the data sets d2 and d3; S6: visually displaying the data sets d2 and d3 on a front-end page using the ECharts chart library and adding mouse interaction; S7: analyzing the radiating capacity of the central cities and analyzing the links between cities using a gravity model. The invention provides a novel visual method for judging the strength of the aggregation effect of city groups, and the method is applicable to cities nationwide.

Description

Urban aggregation effect visual analysis method based on floating population data feature clustering

Technical Field

The invention belongs to the field of visual computing, and in particular relates to visual analysis of the urban aggregation effect using floating population dynamic monitoring data and clustering.

Background

With the continuing advance of urbanization in China, city groups are also continuously expanding. Growth in a city's inflow population is a manifestation of its core competitiveness and also a cause of the urban aggregation effect. The aggregation effect refers to the economic effects produced by the spatial concentration of industries and economic activities, and to the centripetal force that attracts economic activity toward a certain area; it is a fundamental factor in the formation and expansion of cities. The urban "aggregation effect" refers to the various influences or economic effects that arise from the spatial aggregation of socioeconomic activities. Essentially, external economies are a typical manifestation of the aggregation effect of the urban economic system. The urban aggregation effect is an all-round external economic effect, a huge energy released by modern cities, and an important driving force for the development of modern cities and of urbanization. Urban aggregation effects are further divided into:

(1) Neighbor effect: the influence of the spatial relationships between enterprises and departments on the development of urban economic activity, that is, the economies brought by the concentration of economic activity in the city. The geographic concentration of enterprises benefits their innovation and development.

(2) Division-of-labor effect: almost any unit that locates together with others enjoys the benefits of specialized division of labor, such as socialized services and collaboration in production. The specialization of urban industry also implies the development of regional specialization. Specialization goes hand in hand with spatial aggregation.

(3) Structural effect: the way in which the aggregated elements are combined, and the degree of aggregation among them, affect urban aggregation.

(4) Scale effect: this includes both producer benefits, namely economies of scale in production, and consumer benefits, namely economies of scale in consumption.

(5) Depression effect: the interdependence between cities and regions objectively gives rise to a "depression effect" in geographic space, also called the "city field effect".

How to identify a city's strong fields, potential directions, and weaknesses is currently one of the important subjects of urban research. By comparing the occupational composition of the floating population across different cities, the general industrial structure of each city can be analyzed, the share of each industry in the city's overall economy can be found, and development directions can be suggested for the urbanization process.

Visual analysis after clustering the characteristics of domestic population flow can be used to study connections within the urban system and to identify city groups. It can reflect urban functions and urban attractiveness, and can be used commercially to assess the market development and investment prospects of cities, and regionally to identify development potential, help cities grow, and remedy their weaknesses. In this way the state and patterns of population flow can be understood, so as to guide service and management work for the floating population, anticipate employment trends, and improve employment quality. Because population flow data has a complex hierarchy, prior research (for example, on the identification of city groups) suffers from poor feature clustering and covers relatively limited study areas. To solve these problems, a method combining K-Means fast clustering and DBSCAN density clustering is proposed for nationwide data visualization, which can improve the accuracy of the analysis.

A search found application publication number CN107609107B, a visual analysis method for travel co-occurrence phenomena based on multi-source city data, which comprises the following steps: first, dividing the city into regions using basic road network data and simulation tools; then modeling co-occurrence among the regions and, using the model and user-set parameters, mining association rules among the regions from taxi trajectory data; then mining region functions in combination with urban point-of-interest (POI) data; and finally visually displaying the co-occurrence mining results and the region functions. That invention uses multi-source city data (taxi trajectory data, urban road network data, and POI data) to carry out comprehensive, multi-angle visual analysis and exploration of regional co-occurrence and urban region functions, provides useful information for urban traffic planning, and makes it convenient to analyze the inherent associations in the data with strong operability. That patent analyzes taxi trajectory data and urban road network data to provide useful information for future urban traffic planning, whereas the present invention uses floating population dynamic monitoring data to analyze the aggregation effect of city groups and the correlation of industrial structures, points out the current weaknesses of cities or regions, and suggests future development directions for cities.

Application publication number CN109254984A discloses a visual analysis method for perceiving the evolution of urban dynamic structure based on OD data, comprising the following steps: step 1: collecting OD data and storing it in a database; step 2: clustering the positions, and clustering the trajectories by position and hour; step 3: constructing a sequence of position-cluster networks by hour, representing the flow relationships among all clusters in each hour; step 4: defining an LDA model based on the position-cluster network sequence, training it to obtain a topic model, and ranking the topics by importance; step 5: designing a topic-time view that visualizes the probability distribution of the different topics in each position network and shows how the topics evolve over time; step 6: designing an edge association view that intuitively displays the spatial distribution of important regions and the flow relationships between them; step 7: designing an edge-flow time distribution view that displays the probability of each arc in the edge association view at different time steps. That patent designs the different topic views after computing over all OD data mixed with the time dimension and treats visual analysis as the final step, which is inflexible when the visual result for one or several particular areas is wanted, so its method cannot accommodate the user's own choices. In the present invention, the user can freely select any area (including provinces, cities, and districts) on the map with the mouse; computation and visualization are performed on the data of the selected area, so the development of cities in any area can be compared conveniently and with strong interactivity.

Disclosure of Invention

The present invention aims to solve the above problems of the prior art by providing a visual analysis method for the urban aggregation effect, based on feature clustering of floating population data, for identifying urban functions, potential directions, and weak industries. The technical scheme of the invention is as follows:

A visual analysis method for the urban aggregation effect based on floating population data feature clustering, comprising the following steps:

S1: inputting an original floating population dynamic monitoring data set d1, converting the dta-format data of the original data set d1 into csv or json files, unifying the geographic position information of each record (inflow place, outflow place, household registration place, and the like) into longitude and latitude coordinates, and writing the coordinates into the data set d1;

S2: screening out the relevant data items in the data set using prior knowledge to form a new data set d2;

S3: extracting the longitude and latitude coordinates of the cities in all city groups nationwide to form a data set d3, and performing DBSCAN density clustering on d3, the algorithm having two parameters: the radius eps and the density threshold MinPts;

S4: performing K-Means clustering on the percentage of the tertiary industry in the inflow population of all cities in the data set d2;

S5: labeling the two clustering results in the data sets d2 and d3;

S6: visually displaying the data sets d2 and d3 on a front-end page using the ECharts chart library and adding mouse interaction;

S7: analyzing the radiating capacity of the central cities and analyzing the links between cities using a gravity model.
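As an illustration of step S1, the following is a minimal sketch of the coordinate unification and the csv-to-json conversion. It assumes the dta file has already been exported to CSV (for example with pandas.read_stata, which is not used here so that the sketch needs only the standard library). The column names, the GAZETTEER lookup table, and its approximate coordinates are hypothetical and are not taken from the source.

```python
import csv
import io
import json

# Hypothetical gazetteer: place name -> (longitude, latitude), approximate values.
GAZETTEER = {
    "Chongqing": (106.55, 29.56),
    "Chengdu": (104.07, 30.57),
}

# Assumed column names for the three location fields named in step S1.
LOCATION_FIELDS = ("inflow_place", "outflow_place", "household_place")


def unify_coordinates(record, gazetteer=GAZETTEER):
    """Return a copy of record with <field>_lon and <field>_lat added per location field."""
    out = dict(record)
    for field in LOCATION_FIELDS:
        # Unknown or missing place names map to (None, None) instead of raising.
        lon, lat = gazetteer.get(record.get(field, ""), (None, None))
        out[field + "_lon"] = lon
        out[field + "_lat"] = lat
    return out


def csv_to_json(csv_text):
    """Parse CSV text, unify the coordinates of every row, and return a JSON string."""
    rows = [unify_coordinates(r) for r in csv.DictReader(io.StringIO(csv_text))]
    return json.dumps(rows, ensure_ascii=False)


sample = "inflow_place,outflow_place,household_place\nChongqing,Chengdu,Chengdu\n"
records = json.loads(csv_to_json(sample))
```

Returning None for unknown places keeps such records visible for later cleaning; whether the original method drops or repairs them is not stated in the source.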

Further, step S2 specifically comprises: screening out the valuable data items in the data set d1, including inflow place, occupation, industry, salary, local traffic evaluation, and community life evaluation, to form the new data set d2.
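The screening in step S2 amounts to keeping a fixed subset of fields from each record. A minimal sketch follows; the field names in KEPT_FIELDS and the sample record are hypothetical, chosen only to mirror the items listed above.

```python
# Assumed field names for the data items listed in step S2.
KEPT_FIELDS = ["inflow_place", "occupation", "industry", "salary",
               "traffic_eval", "community_eval"]


def screen_record(record, kept=KEPT_FIELDS):
    """Keep only the fields of interest; fields absent from the record are skipped."""
    return {k: record[k] for k in kept if k in record}


def build_d2(d1):
    """Form data set d2 by screening every record of d1."""
    return [screen_record(r) for r in d1]


# Made-up record with one field ("interview_id") that is not of interest.
d1 = [{"inflow_place": "Chongqing", "occupation": "driver", "industry": "transport",
       "salary": 5000, "traffic_eval": 4, "community_eval": 3, "interview_id": 17}]
d2 = build_d2(d1)
```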

Further, in step S3, the longitude and latitude coordinates of the cities in all city groups nationwide are extracted to form a data set d3, and DBSCAN density clustering is performed on d3. The algorithm has two parameters, the radius eps and the density threshold MinPts, and comprises the following specific steps:

(1) With each data point x_i as the center, draw a circle of radius eps, called the eps-neighborhood of x_i; count the points contained in the circle, and if the number of points in a circle exceeds the density threshold MinPts, mark the center of that circle as a core point, also called a core object;

(2) If the number of points in the eps-neighborhood of a point is smaller than the density threshold but the point falls within the neighborhood of a core point, the point is called a boundary point; points that are neither core points nor boundary points are noise points; all points in the eps-neighborhood of a core point x_i are directly density-reachable from x_i;

(3) If x_j is directly density-reachable from x_i, x_k is directly density-reachable from x_j, ..., and x_n is directly density-reachable from x_k, then x_n is density-reachable from x_i; that is, density-reachability follows from the transitivity of direct density-reachability;

(4) If there exists x_k such that both x_i and x_j are density-reachable from x_k, then x_i and x_j are density-connected; density-connected points are joined together to form a cluster.

Further, the sample points processed by the DBSCAN algorithm are divided into core points, boundary points, and noise points, defined as follows:

Core point: for the data set d3, if the ε-neighborhood of a sample p (ε being the radius eps) contains at least MinPts samples, including p itself, then p is called a core point; that is, the number of samples in the ε-neighborhood of a core point p satisfies:

|N_ε(p)| ≥ MinPts

where dist(p, q) is the distance between p and any point q, and N_ε(p) is given by:

N_ε(p) = {q ∈ d3 | dist(p, q) ≤ ε}

Boundary point: for a non-core sample b, if b lies within the ε-neighborhood of some core point p, then b is called a boundary point, namely:

|N_ε(b)| < MinPts and b ∈ N_ε(p) for some core point p

Noise point: for a non-core sample n, if n does not lie within the ε-neighborhood of any core point p, then n is called a noise point, namely:

|N_ε(n)| < MinPts and n ∉ N_ε(p) for every core point p

Any two sample points that are directly density-reachable or density-reachable from one another are assigned to the same cluster. Accordingly, the DBSCAN algorithm randomly selects a core point from the data set d3 as a seed and determines the corresponding cluster starting from that seed; the algorithm ends when all core points have been traversed, yielding the clustering result.
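The DBSCAN procedure described above can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation. The source does not state which distance is used on longitude and latitude, so the great-circle (haversine) distance in kilometers is an assumption here, as are the sample coordinates and the values eps = 50 km and MinPts = 3. The core-point test uses the "at least MinPts, including the point itself" form of the formal definition.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def dbscan(points, eps, min_pts, dist=haversine_km):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    n = len(points)
    # eps-neighborhood of every point (each neighborhood includes the point itself).
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = [-1] * n
    cluster = 0
    for seed in range(n):
        if not is_core[seed] or labels[seed] != -1:
            continue  # only unassigned core points start a new cluster
        labels[seed] = cluster
        frontier = [seed]
        while frontier:
            p = frontier.pop()  # p is always a core point
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster      # q is directly density-reachable from p
                    if is_core[q]:
                        frontier.append(q)   # only core points extend the cluster
        cluster += 1
    return labels


# Made-up coordinates: two tight groups of three points and one isolated point.
cities = [(106.55, 29.56), (106.60, 29.60), (106.50, 29.50),
          (104.07, 30.57), (104.10, 30.60), (104.00, 30.50),
          (121.47, 31.23)]
labels = dbscan(cities, eps=50.0, min_pts=3)
```

The sketch visits core points in index order rather than at random; the partition of core points into clusters is the same either way, and only the assignment of boundary points shared by two clusters can depend on the visiting order.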

Further, in step S4, performing K-Means clustering on the percentage of the tertiary industry in the inflow population of all cities in the data set d2 specifically comprises:

calculating, for each city, the percentage of the inflow population working in the tertiary industry, and using K-Means clustering to divide the percentages into 4 classes which, from largest to smallest, represent core cities, secondary cities, tertiary cities, and common cities (rural areas), with the following specific steps:

(1) Select initial centers for the 4 classes; in the k-th iteration, compute the distance from each sample to the 4 centers;

(2) Assign the sample to the class whose center is nearest, and update that class's center using the mean of its members;

(3) If none of the 4 cluster centers changes after the update, end the iteration; otherwise, continue iterating until the clustering result is obtained.
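The three steps above can be sketched for one-dimensional data as follows. The source does not say how the initial centers are chosen, so spreading them evenly over the sorted values is an assumption; the city names and percentages are made up for illustration.

```python
def kmeans_1d(values, k=4, max_iter=100):
    """Cluster scalar values into k groups; return (centers, labels)."""
    ordered = sorted(values)
    # Assumed initialization: centers spread evenly over the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Step (2): assign each sample to the nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # The mean updates the center; an empty class keeps its old center.
            new_centers.append(sum(members) / len(members) if members else centers[c])
        # Step (3): stop once no center changes.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


TIERS = ["core city", "secondary city", "tertiary city", "common city"]


def tier_by_share(shares):
    """Map each city to a tier name based on its tertiary-industry share (percent)."""
    names = list(shares)
    centers, labels = kmeans_1d([shares[n] for n in names], k=len(TIERS))
    # Rank classes by center value, largest first, so rank 0 is the core tier.
    by_center = sorted(range(len(centers)), key=lambda c: -centers[c])
    rank = {c: r for r, c in enumerate(by_center)}
    return {n: TIERS[rank[lab]] for n, lab in zip(names, labels)}


shares = {"A": 82.0, "B": 80.0, "C": 61.0, "D": 59.0,
          "E": 41.0, "F": 39.0, "G": 21.0, "H": 19.0}
tiers = tier_by_share(shares)
```

With these sample values the centers settle at 20, 40, 60, and 81 after one update, and cities A and B fall in the core tier.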

Further, in step S6, the Django framework is used, the back-end data is visualized at the front end using the ECharts chart library, the attributes in china.js in the chart library are modified so that the boundaries of every province, city, and county on the map of China can be displayed, and mouse circle-selection and chart linkage functions are added.
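To illustrate how back-end data might reach the ECharts front end in step S6, the sketch below builds an ECharts-style option dictionary in Python and serializes it to JSON. The function name build_geo_scatter_option, the record fields, and the inflow figure are hypothetical; the option keys used (geo.map, geo.roam, a scatter series with coordinateSystem set to geo) are standard ECharts options, but the source does not say which series types the authors used. In a Django view the dictionary could be returned with django.http.JsonResponse, which is omitted here to keep the sketch dependency-free.

```python
import json


def build_geo_scatter_option(cities, map_name="china"):
    """Build an ECharts-style option dict with one scatter point per city on a geo map."""
    return {
        "tooltip": {"trigger": "item"},
        "geo": {"map": map_name, "roam": True},
        "series": [{
            "type": "scatter",
            "coordinateSystem": "geo",
            # ECharts geo scatter data uses [longitude, latitude, value].
            "data": [
                {"name": c["name"], "value": [c["lon"], c["lat"], c["inflow"]]}
                for c in cities
            ],
        }],
    }


# Made-up record; the inflow count is illustrative only.
cities = [{"name": "Chongqing", "lon": 106.55, "lat": 29.56, "inflow": 1200}]
payload = json.dumps(build_geo_scatter_option(cities), ensure_ascii=False)
```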

Further, step S7, analyzing the radiating capacity of the central cities and analyzing the links between cities using a gravity model, specifically comprises: analyzing, according to the K-Means clustering result, the economic driving capacity of the central city within the range selected by the mouse on the map, judging the type of the city, and judging which industries are strong and which weaknesses exist; comparing the DBSCAN clustering result with the actual distribution of city groups, and analyzing the latent conditions for city group formation; and judging the intensity of personnel flow within the range selected by the mouse using a gravity model, with i and j denoting two cities or regions and the gravity model representing the regional connection.

Further, the gravity model is used to represent the regional connection, and the expression is as follows:

where I_ij is the connection attraction value between two cities or regions, Q_i and Q_j are the numbers of people traveling to and from the two cities (regions), D_ij is the straight-line distance between the cities, and g is the inter-city gravity adjustment coefficient, which can be tuned to optimize the visual effect. The thickness of the connecting line between regions represents the strength: the larger the attraction value, the stronger the regional connection.
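The expression itself is not reproduced in this text. Purely as an assumption, the sketch below uses the common gravity-model form I_ij = g * Q_i * Q_j / D_ij^2, built from the symbols defined above; the distance exponent of 2, the linear mapping from attraction value to line width, and all numbers are illustrative and are not confirmed by the source.

```python
def attraction(q_i, q_j, d_ij, g=1.0, distance_exponent=2.0):
    """Assumed gravity form: I_ij = g * Q_i * Q_j / D_ij ** distance_exponent."""
    if d_ij <= 0:
        raise ValueError("distance must be positive")
    return g * q_i * q_j / d_ij ** distance_exponent


def line_widths(values, min_width=1.0, max_width=8.0):
    """Linearly rescale attraction values to line widths for drawing connections."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [min_width for _ in values]
    return [min_width + (v - lo) * (max_width - min_width) / (hi - lo) for v in values]


# Same populations, different distances: the nearer pair is more strongly connected.
near = attraction(1000, 800, 100.0)  # 1000 * 800 / 100**2
far = attraction(1000, 800, 400.0)   # 1000 * 800 / 400**2
widths = line_widths([near, far])
```

Under this assumed form, quadrupling the distance divides the attraction value by sixteen, which is why the nearer pair is drawn with the thicker line.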

The invention has the advantages and beneficial effects as follows:

The invention provides a visual analysis method for the urban aggregation effect based on clustering of floating population dynamic monitoring data. The method makes full use of the relevant items in the floating population dynamic monitoring data set, performs K-Means fast clustering or DBSCAN density clustering on selected features, and then visualizes the result using a gravity model. Using front-end technology, a specific area can be circle-selected with the mouse and the data of the selected area displayed and analyzed in detail, so that more latent information (such as the industries in which a city is currently weak and the degree of connection between cities) can be explored accurately and a suitable development direction can be suggested for the urbanization process.

Specific innovation 1: in S2, the relevant data items in the data set are screened out to form a new data set d2. The original data set is several years of floating population dynamic monitoring data; each year comprises more than 800,000 records, each record comprises more than 70 data items, and only a little over ten of these are relevant to this study, for example: source address, destination address, education level, occupation, industry, reasons for being willing to stay locally, traffic, infrastructure, community life, medical insurance, and the completeness of social insurance. The selection of data items is self-designed, and different data items are given different weights, so that the invention can analyze the urbanization process from a novel angle through floating population dynamic monitoring data.
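The idea of giving different data items different weights can be sketched as a weighted average of the evaluation items. The weights and field names below are hypothetical; the source states only that the weights differ and that the scoring rules are designed by the authors.

```python
# Hypothetical weights; the source does not disclose the actual values.
WEIGHTS = {"traffic_eval": 0.25, "community_eval": 0.75}


def life_score(record, weights=WEIGHTS):
    """Weighted average of the evaluation items that are present in the record."""
    present = {k: w for k, w in weights.items() if record.get(k) is not None}
    total = sum(present.values())
    if total == 0:
        return None  # no evaluation item available for this record
    # Dividing by the total renormalizes the weights when an item is missing.
    return sum(record[k] * w for k, w in present.items()) / total


score = life_score({"traffic_eval": 5, "community_eval": 3})
```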

Specific innovation 2: the invention combines DBSCAN density clustering and K-Means fast clustering. In past studies analyzing urban data, DBSCAN has rarely been used to perform density clustering on city locations, yet this clustering method is very effective for visualizing the aggregation effect of city groups. By combining the two clustering methods, the invention shows not only the division of cities by external geographic position but also the influence of each city's floating population dynamic monitoring data on the urban hierarchy.

Specific innovation 3: in S6, conventional studies are usually carried out on the data of one specific city or region, which is a limitation. When visualizing data at the front end, the invention implements a new interaction technique: the user can select any area on the map with the mouse, the data of that area is processed and computed in JavaScript, and the computed data is displayed, so that the development differences among the cities in the area can be compared clearly. This is difficult to implement, and no prior use of this interaction method has been found.
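The core of such an area selection is deciding which cities fall inside the shape drawn with the mouse. The source performs this in JavaScript at the front end and does not describe the algorithm; the sketch below uses the standard ray-casting point-in-polygon test as one possible approach, written in Python for illustration, with a made-up selection polygon and approximate coordinates.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point lies inside the polygon given as (x, y) vertices."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


def select_cities(cities, polygon):
    """Return the names of the cities whose (lon, lat) fall inside the selection."""
    return [c["name"] for c in cities if point_in_polygon((c["lon"], c["lat"]), polygon)]


# Hypothetical selection drawn as a rectangle in (lon, lat) coordinates.
selection = [(105.0, 29.0), (108.0, 29.0), (108.0, 31.0), (105.0, 31.0)]
cities = [{"name": "Chongqing", "lon": 106.55, "lat": 29.56},
          {"name": "Chengdu", "lon": 104.07, "lat": 30.57}]
selected = select_cities(cities, selection)
```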

Drawings

FIG. 1 is a flow chart of a preferred embodiment of the present invention;

FIG. 2 shows the DBSCAN density clustering process;

FIG. 3 shows the K-Means fast clustering process.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and specifically described below with reference to the drawings in the embodiments of the present invention. The described embodiments are only a few embodiments of the present invention.

The technical scheme for solving the technical problems is as follows:

Referring to FIGS. 1 to 3, the visual analysis method for the urban aggregation effect based on clustering of floating population dynamic monitoring data provided by the invention comprises the following steps:

S1: converting the dta-format data of the original data set d1 into csv or json files, unifying the geographic position information of each record (inflow place, outflow place, household registration place, and the like) into longitude and latitude coordinates, and writing the coordinates into the data set d1. This involves details such as mapping between longitude and latitude data and region names, and converting between the coordinate encodings of Baidu Maps and Gaode Maps.

S2: screening out the valuable data items in the data set d1, such as inflow place, occupation, industry, salary, local traffic evaluation, and community life evaluation, to form a new data set d2. Data items such as local traffic evaluation and community life evaluation are each given a different scoring weight and serve as the basis for the floating population's rating of living conditions in the city; the scoring result is added to the data set d2. The scoring rules were designed independently by the authors on the basis of prior knowledge.

S3: extracting the longitude and latitude coordinates of the cities in all city groups nationwide to form a data set d3, and performing DBSCAN density clustering on d3. The purpose of density clustering is to compare the clustering result with the actual division of city groups and to analyze whether urban economic zones have formed among the city groups.

The algorithm has two parameters, the radius eps and the density threshold MinPts, and comprises the following specific steps:

(1) With each data point x_i as the center, draw a circle of radius eps. This circle is called the eps-neighborhood of x_i; the points contained in it are counted. If the number of points inside a circle exceeds the density threshold MinPts, the center of that circle is marked as a core point, also called a core object.

(2) If the number of points in the eps-neighborhood of a point is less than the density threshold but the point falls within the neighborhood of a core point, the point is called a boundary point. Points that are neither core points nor boundary points are noise points. All points in the eps-neighborhood of a core point x_i are directly density-reachable from x_i.

(3) If x_j is directly density-reachable from x_i, x_k is directly density-reachable from x_j, ..., and x_n is directly density-reachable from x_k, then x_n is density-reachable from x_i. That is, density-reachability follows from the transitivity of direct density-reachability.

(4) If there exists x_k such that both x_i and x_j are density-reachable from x_k, then x_i and x_j are density-connected. Density-connected points are joined together to form clusters.

The sample points processed by the DBSCAN algorithm are divided into core points, boundary points, and noise points, defined as follows:

Core point: for the data set d3, if the ε-neighborhood of a sample p (ε being the radius eps) contains at least MinPts samples, including p itself, then p is called a core point, that is:

|N_ε(p)| ≥ MinPts

where N_ε(p) is the set of points q whose distance dist(p, q) from p is at most ε:

N_ε(p) = {q ∈ d3 | dist(p, q) ≤ ε}

Boundary point: for a non-core sample b, if b lies within the ε-neighborhood of some core point p, then b is called a boundary point, namely:

|N_ε(b)| < MinPts and b ∈ N_ε(p) for some core point p

Noise point: for a non-core sample n, if n does not lie within the ε-neighborhood of any core point p, then n is called a noise point, namely:

|N_ε(n)| < MinPts and n ∉ N_ε(p) for every core point p

Any two sample points that are directly density-reachable or density-reachable from one another are assigned to the same cluster. Accordingly, the DBSCAN algorithm randomly selects a core point from the data set d3 as a seed and determines the corresponding cluster starting from that seed; the algorithm ends when all core points have been traversed, yielding the clustering result.

S4: in the data set d2, for each city nationwide, counting the number of people in the inflow population whose industry belongs to the tertiary industry and calculating their percentage, then using K-Means clustering to divide the percentages into 4 classes which, from largest to smallest, represent core cities, secondary cities, tertiary cities, and common cities (rural areas), with the following specific steps:

(1) Choose suitable initial centers for the 4 classes; in the k-th iteration, compute the distance from each sample to the 4 centers;

(2) Assign the sample to the class whose center is nearest, and update that class's center using a method such as the mean;

(3) If none of the 4 cluster centers changes after the update, end the iteration; otherwise, continue iterating until the clustering result is obtained.

S5: marking the clustering results of S3 and S4 in the data sets d2 and d3. At this point, the data items in d2 comprise each floating person's source place, destination place, occupation, salary, industry, local industrial structure, local social life score, and local social life class; d3 mainly comprises each city's name, geographic position, the city group it belongs to, and its density clustering result within the city group. The clustering results are marked in advance so that the data can be used directly during visualization, which avoids the long delays that would result from performing the clustering computation in JavaScript during visual analysis.

S6: using the Django framework, visualizing the back-end data at the front end with the ECharts chart library, modifying the attributes in china.js in the chart library so that the boundaries of every province, city, and county on the map of China can be displayed, and adding mouse circle-selection and chart linkage functions. The Django back end uses Python as the data processing language and returns the processed data to the front end; the front end can also be combined with frameworks such as Bootstrap and Node.js to achieve a complete interactive design.

S7: analyzing, according to the K-Means clustering result, the economic driving capacity of the central city within the range selected by the mouse on the map, judging the type of the city, and judging which industries are strong and which weaknesses exist; comparing the DBSCAN clustering result with the actual distribution of city groups, and analyzing the latent conditions for city group formation; judging the intensity of personnel flow within the range selected by the mouse using a gravity model, with i and j denoting two cities (regions) and the gravity model representing the regional connection, the expression being as follows:

where I_ij is the connection attraction value between two cities (regions), Q_i and Q_j are the numbers of people traveling to and from the two cities (regions), D_ij is the straight-line distance between the cities, and g is the inter-city gravity adjustment coefficient, which can be tuned to optimize the visual effect. The thickness of the connecting line between regions represents the strength: the larger the attraction value, the stronger the regional connection. The larger a city's point on the map, the larger its inflow population.
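Step S5 above, which marks the clustering results in the data sets ahead of time so the front end never has to cluster, can be sketched as a simple join. The field names city_tier and density_cluster, the sample records, and the labels assigned to them are hypothetical.

```python
def label_records(d2, d3, tier_by_city, cluster_by_city):
    """Return labeled copies of d2 and d3; the input records are not modified."""
    # Each person record in d2 gets the K-Means tier of its inflow city.
    d2_labeled = [dict(r, city_tier=tier_by_city.get(r["inflow_place"])) for r in d2]
    # Each city record in d3 gets its DBSCAN cluster id (-1 when unclustered).
    d3_labeled = [dict(c, density_cluster=cluster_by_city.get(c["name"], -1)) for c in d3]
    return d2_labeled, d3_labeled


# Made-up records and labels for illustration only.
d2 = [{"inflow_place": "Chongqing", "salary": 5000}]
d3 = [{"name": "Chongqing", "lon": 106.55, "lat": 29.56,
       "city_group": "Chengdu-Chongqing"}]
d2_out, d3_out = label_records(d2, d3, {"Chongqing": "core city"}, {"Chongqing": 0})
```

Storing the labels alongside the records means the visualization only filters and aggregates, which matches the stated goal of keeping the clustering computation out of JavaScript.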

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.

The above examples should be understood as illustrative only and not limiting the scope of the invention. Various changes and modifications to the present invention may be made by one skilled in the art after reading the teachings herein, and such equivalent changes and modifications are intended to fall within the scope of the invention as defined in the appended claims.

Claims

1. The urban aggregation effect visual analysis method based on the floating population data characteristic clustering is characterized by comprising the following steps of:

S1: inputting an original floating population dynamic monitoring data set d1, converting the dta-format data of the original data set d1 into csv or json files, unifying the geographic position information of each record (inflow place, outflow place, and household registration place) into longitude and latitude coordinates, and writing the coordinates into the data set d1;

S5: labeling the two clustering results in the data sets d2 and d3;

S7: analyzing the radiating capacity of the central cities, and analyzing the links between cities using a gravity model;

the step S2 specifically comprises: screening out the valuable data items in the data set d1, including inflow place, occupation, industry, salary, local traffic evaluation, and community life evaluation, to form a new data set d2;

in the step S3, the longitude and latitude coordinates of the cities in all city groups nationwide are extracted to form a data set d3, and DBSCAN density clustering is performed on d3, the algorithm having two parameters, the radius eps and the density threshold MinPts, and comprising the following specific steps:

(1) With each data point x_i as the center, draw a circle of radius eps, called the eps-neighborhood of x_i; count the points contained in the circle, and if the number of points in a circle exceeds the density threshold MinPts, mark the center of that circle as a core point, also called a core object;

(3) If x_j is directly density-reachable from x_i, x_k is directly density-reachable from x_j, ..., and x_n is directly density-reachable from x_k, then x_n is density-reachable from x_i; that is, density-reachability follows from the transitivity of direct density-reachability;

(4) If there exists x_k such that both x_i and x_j are density-reachable from x_k, then x_i and x_j are density-connected, and density-connected points are joined together to form a cluster;

the sample points processed by the DBSCAN algorithm are divided into core points, boundary points, and noise points, defined as follows:

|N_ε(p)| ≥ MinPts

N_ε(p) = {q ∈ d3 | dist(p, q) ≤ ε}

any two sample points that are directly density-reachable or density-reachable from one another are assigned to the same cluster; accordingly, the DBSCAN algorithm randomly selects a core point from the data set d3 as a seed and determines the corresponding cluster starting from that seed, and the algorithm ends when all core points have been traversed, yielding the clustering result;

in the step S4, performing K-Means clustering on the percentage of the tertiary industry in the inflow population of all cities in the data set d2 specifically comprises:

calculating the percentage of the inflow population working in the tertiary industry, and using K-Means clustering to divide the percentages into 4 classes which, from largest to smallest, represent core cities, secondary cities, tertiary cities, and common cities, with the following specific steps:

(3) If none of the 4 cluster centers changes after the update, end the iteration; otherwise, continue iterating until the clustering result is obtained;

in the step S6, the Django framework is used, the back-end data is visualized at the front end using the ECharts chart library, the attributes in china.js in the chart library are modified so that the boundaries of every province, city, and county on the map of China are displayed, and mouse circle-selection and chart linkage functions are added;

the step S7, analyzing the radiating capacity of the central cities and analyzing the links between cities using a gravity model, specifically comprises: analyzing, according to the K-Means clustering result, the economic driving capacity of the central city within the range selected by the mouse on the map, judging the type of the city, and judging which industries are strong and which weaknesses exist; comparing the DBSCAN clustering result with the actual distribution of city groups, and analyzing the latent conditions for city group formation; judging the intensity of personnel flow within the range selected by the mouse using a gravity model, with i and j denoting two cities or regions and the gravity model representing the regional connection;

the gravity model is used to represent the regional connection, and the expression is as follows:

where I_ij is the connection attraction value between two cities or regions, Q_i and Q_j are the numbers of people traveling to and from the two cities or regions, D_ij is the straight-line distance between the cities, and g is the inter-city gravity adjustment coefficient, which is tuned to optimize the visual effect; the thickness of the connecting line between regions represents the strength, and the larger the attraction value, the stronger the regional connection.