CN114661393B - Urban aggregation effect visual analysis method based on floating population data feature clustering


Info

Publication number
CN114661393B
CN114661393B CN202210193379.7A CN202210193379A CN114661393B CN 114661393 B CN114661393 B CN 114661393B CN 202210193379 A CN202210193379 A CN 202210193379A CN 114661393 B CN114661393 B CN 114661393B
Authority
CN
China
Prior art keywords
cities
points
density
data
clustering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210193379.7A
Other languages
Chinese (zh)
Other versions
CN114661393A (en
Inventor
秦红星
徐超群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202210193379.7A
Publication of CN114661393A
Application granted
Publication of CN114661393B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/451Execution arrangements for user interfaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/26Government or public services
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A30/00Adapting or protecting infrastructure or their operation
    • Y02A30/60Planning or developing urban green infrastructure

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Administration (AREA)
  • General Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Probability & Statistics with Applications (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Primary Health Care (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a visual analysis method for the urban aggregation effect based on feature clustering of floating population data, comprising the following steps: S1: converting the format of the original data set d1 and unifying all geographic locations contained in the data items into longitude and latitude coordinates; S2: using prior knowledge to screen out the relevant data items in the data set to form a new data set d2; S3: performing DBSCAN density clustering on the geographic coordinates of the cities belonging to urban agglomerations; S4: performing K-Means clustering on the percentage of the tertiary industry in the inflow population of every city in data set d2; S5: labeling the two clustering results in data sets d2 and d3; S6: visualizing data sets d2 and d3 on a front-end page using the ECharts chart library and adding mouse interaction; S7: analyzing the radiation capacity of central cities and analyzing the links between cities using a gravity model. The invention provides a new visual method for judging the strength of the aggregation effect of urban agglomerations, and the method is applicable to cities nationwide.

Description

Urban aggregation effect visual analysis method based on floating population data feature clustering
Technical Field
The invention belongs to the field of visual computing, and in particular relates to the visual analysis of the urban aggregation effect using dynamic monitoring data of the floating population and clustering methods.
Background
With the continuing advance of urbanization in China, urban agglomerations keep expanding. Growth in a city's inflow population reflects the city's core competitiveness and is also a cause of the urban aggregation effect. The aggregation effect is the economic effect produced when industries and economic activities concentrate in space, together with the centripetal force that draws economic activity toward a particular area; it is the fundamental factor behind the formation and expansion of cities. The urban aggregation effect refers to the various influences or economic effects that socioeconomic activities produce through spatial concentration. In essence, external economies are a typical manifestation, or realization, of the aggregation effect of the urban economic system. The urban aggregation effect is an all-round external economic effect, a large amount of energy released by modern cities, and an important driving force for the development of modern cities and of urbanization. It is further divided into:
(1) Proximity effect: the influence of the spatial relationships among enterprises and departments on the development of urban economic activity, that is, the economies that arise because economic activity is concentrated in the city. The geographic concentration of enterprises favors their innovation and development.
(2) Division-of-labor effect: almost any unit gains the benefits of specialized division of labor by locating together with others, such as socialized services and cooperation in the division of production. Specialization of urban industries also implies the development of regional specialization. Specialization and spatial aggregation go hand in hand.
(3) Structural effect: the way in which the aggregated elements are combined, and the degree of aggregation among them, affect urban aggregation.
(4) Scale effect: this includes both benefits in production, namely economies of scale in production, and benefits in consumption, namely economies of scale in consumption.
(5) Depression effect: the interdependence between a city and its region objectively gives rise to a "depression effect" in geographic space, also called the "urban field effect".
How to identify a city's leading fields, potential directions, and weak points is currently one of the important topics in urban research. Comparing the occupational composition of the floating population across cities makes it possible to analyze each city's overall industrial structure, determine the share of each industry in the city's economy, and suggest development directions for urbanization.
Visual analysis performed after clustering the characteristics of domestic population flows can be used to study the links within the urban system and to identify urban agglomerations. It reflects urban functions and urban attractiveness, and in business it can be used to assess the market development and investment prospects of cities, to identify regions with development potential, and to help cities build on their strengths and remedy their weak points. In this way the state and regularities of population flow can be grasped, which helps guide service and management work for the floating population, anticipate where that work should be directed, and improve employment quality. Because population flow data has a complex hierarchy, existing research (for example, on identifying urban agglomerations) suffers from poor feature clustering results and covers only limited study areas. To address these problems, a method combining K-Means fast clustering and DBSCAN density clustering is proposed for nationwide data visualization, which can improve the accuracy of the analysis.
A search found application publication number CN107609107B, a visual analysis method for travel co-occurrence phenomena based on multi-source urban data, comprising the following steps: first, dividing the city into regions using basic road network data and simulation tools; then modeling co-occurrence among regions and, using the model and user-set parameters, mining association rules between regions from taxi trajectory data; then mining regional functions in combination with urban point-of-interest (POI) data; and finally visualizing the co-occurrence mining results and the regional functions. That invention uses multi-source urban data (taxi trajectory data, urban road network data, and POI data) to explore regional co-occurrence phenomena and urban regional functions visually from multiple angles, provides useful information for urban traffic planning, and makes it convenient to analyze the inherent associations in the data with strong operability. That patent analyzes taxi trajectory data and urban road network data to provide information for future urban traffic planning, whereas the present invention uses dynamic monitoring data of the floating population to analyze the aggregation effect of urban agglomerations and its relation to industrial structure, points out the current weak points of cities and regions, and offers suggestions for their future development.
Application publication number CN109254984A discloses a visual analysis method for perceiving the evolution of urban dynamic structure based on OD data, comprising the following steps: step 1: collecting OD data and storing it in a database; step 2: clustering the locations, and clustering the trajectories by location and hour; step 3: constructing an hourly sequence of location-cluster networks that represents the flows among all clusters in each hour; step 4: defining an LDA model on the location-cluster network sequence, training it to obtain a topic model, and ranking the topics by importance; step 5: designing a topic-time view that visualizes the probability distribution of the topics in each location network and shows how the topics evolve over time; step 6: designing an edge association view that shows the spatial distribution of important regions and the flows between them; step 7: designing an edge flow time distribution view that shows the probability of each arc in the edge association view at different time steps. That patent designs its topic views only after computing over all OD data together with the time dimension, and treats visual analysis as the final step. This is inflexible when the user wants visual results for one particular region or for several, so the method cannot accommodate the user's own initiative. In the present invention, the user can freely select any region on the map (province, city, or district) with the mouse; the data of that region is then computed and visualized, which gives strong interactivity and makes it easy to compare the development of the cities in any region.
Disclosure of Invention
The present invention aims to solve the above problems of the prior art. It provides a visual analysis method for the urban aggregation effect, based on feature clustering of floating population data, for identifying urban functions, potential directions, and weak industries. The technical scheme of the invention is as follows:
A visual analysis method for the urban aggregation effect based on feature clustering of floating population data, comprising the following steps:
S1: inputting the original floating population dynamic monitoring data set d1, converting the dta-format data of d1 into csv or json files, unifying the geographic location information of each record (inflow place, outflow place, place of household registration, and so on) into longitude and latitude coordinates, and writing the coordinates into d1;
S2: using prior knowledge to screen out the valuable data items in the data set to form a new data set d2;
S3: extracting the longitude and latitude coordinates of the cities in all urban agglomerations nationwide to form a data set d3, and performing DBSCAN density clustering on d3, the algorithm having two parameters: the radius eps and the density threshold MinPts;
S4: performing K-Means clustering on the percentage of the tertiary industry in the inflow population of every city in data set d2;
S5: labeling the two clustering results in data sets d2 and d3;
S6: visualizing data sets d2 and d3 on a front-end page using the ECharts chart library and adding mouse interaction;
S7: analyzing the radiation capacity of central cities and analyzing the links between cities using a gravity model.
Further, step S2 specifically comprises: screening out the valuable data items in data set d1, including inflow place, occupation, industry, salary, local traffic evaluation, and community life evaluation, to form the new data set d2.
Further, in step S3, the longitude and latitude coordinates of the cities in all urban agglomerations nationwide are extracted to form data set d3, and DBSCAN density clustering is performed on d3. The algorithm has two parameters, the radius eps and the density threshold MinPts, and proceeds as follows:
(1) A circle of radius eps is drawn around each data point $x_i$; this circle is called the eps-neighborhood of $x_i$. The points inside the circle are counted, and if their number is at least the density threshold MinPts, the center of the circle is marked as a core point, also called a core object;
(2) If the number of points in the eps-neighborhood of a point is below the density threshold but the point lies in the neighborhood of a core point, the point is called a boundary point. Points that are neither core points nor boundary points are noise points. Every point in the eps-neighborhood of a core point $x_i$ is directly density-reachable from $x_i$;
(3) If $x_j$ is directly density-reachable from $x_i$, $x_k$ is directly density-reachable from $x_j$, and so on until $x_n$ is directly density-reachable from $x_k$, then $x_n$ is density-reachable from $x_i$. Density-reachability is thus obtained by chaining direct density-reachability;
(4) If there is a point $x_k$ from which both $x_i$ and $x_j$ are density-reachable, then $x_i$ and $x_j$ are called density-connected. Density-connected points together form a cluster.
Further, the sample points processed by the DBSCAN algorithm are divided into core points, boundary points, and noise points, defined as follows (ε denotes the radius eps):
Core point: for data set d3, if the ε-neighborhood of a sample p contains at least MinPts samples, including p itself, then p is called a core point; that is, the number of samples in the ε-neighborhood of p satisfies:
$|N_\varepsilon(p)| \ge \mathrm{MinPts}$
where, with dist(p, q) denoting the distance between p and a point q, the ε-neighborhood $N_\varepsilon(p)$ is:
$N_\varepsilon(p) = \{\, q \in d3 \mid \mathrm{dist}(p, q) \le \varepsilon \,\}$
Boundary point: a non-core sample b is called a boundary point if b lies within the ε-neighborhood of some core point p.
Noise point: a non-core sample n is called a noise point if n does not lie within the ε-neighborhood of any core point p.
Any two sample points that are directly density-reachable or density-reachable are assigned to the same cluster. The DBSCAN algorithm therefore selects a core point at random from data set d3 as a seed and grows the corresponding cluster from that seed; the algorithm ends once all core points have been traversed, which yields the clustering result.
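The three point types can also be written as sets, which makes clear that they partition d3. The block below is a restatement of the verbal definitions, not a formula taken from the patent; the set names $\mathcal{C}$, $\mathcal{B}$, and $\mathcal{O}$ are notation assumed here for illustration.

```latex
\begin{align*}
\mathcal{C} &= \{\, p \in d3 \;:\; |N_\varepsilon(p)| \ge \mathrm{MinPts} \,\}
  && \text{(core points)} \\
\mathcal{B} &= \{\, b \in d3 \setminus \mathcal{C} \;:\; \exists\, p \in \mathcal{C},\ b \in N_\varepsilon(p) \,\}
  && \text{(boundary points)} \\
\mathcal{O} &= d3 \setminus (\mathcal{C} \cup \mathcal{B})
  && \text{(noise points)}
\end{align*}
```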
Further, in step S4, K-Means clustering of the percentage of the tertiary industry in the inflow population of every city in data set d2 specifically comprises:
calculating, for each city, the percentage of the inflow population working in the tertiary industry, and using K-Means clustering to divide the percentages into 4 classes which, from the largest values to the smallest, represent core cities, secondary cities, tertiary cities, and ordinary cities (rural areas). The specific steps are as follows:
(1) Select initial centers for the 4 classes; in the kth iteration, compute the distance from each sample to the 4 centers;
(2) Assign the sample to the class whose center is nearest, and update the center of that class using the mean;
(3) If none of the 4 cluster centers changes after the update, end the iteration; otherwise continue iterating. The final assignment is the clustering result.
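For one-dimensional percentages, the three steps above correspond to the usual K-Means assignment and update rules. The following is a sketch in standard notation; the symbols $\mu_j^{(k)}$ (center of class $j$ at iteration $k$) and $S_j^{(k)}$ (samples assigned to class $j$) are assumed here and do not appear in the patent text.

```latex
\begin{align*}
S_j^{(k)} &= \{\, x \;:\; |x - \mu_j^{(k)}| \le |x - \mu_l^{(k)}| \ \text{for all } l = 1,\dots,4 \,\}
  && \text{(assignment)} \\
\mu_j^{(k+1)} &= \frac{1}{|S_j^{(k)}|} \sum_{x \in S_j^{(k)}} x
  && \text{(mean update)} \\
\mu_j^{(k+1)} &= \mu_j^{(k)} \ \text{for all } j
  && \text{(stopping condition)}
\end{align*}
```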
Further, in step S6, the Django framework is used and the back-end data is visualized at the front end with the ECharts chart library; the attributes in china.js in the chart library are modified so that the boundaries of every province, city, and county on the map of China can be displayed, and mouse selection and chart linkage functions are added.
Further, step S7, analyzing the radiation capacity of central cities and analyzing the links between cities using the gravity model, specifically comprises: according to the K-Means clustering result, analyzing the economic driving capacity of the central city in the range selected with the mouse on the map, determining the type of the city, and determining which industries are strong and which are weak points; comparing the DBSCAN clustering result with the actual distribution of urban agglomerations and analyzing the implicit conditions under which the agglomerations form; and using a gravity model to judge the intensity of population flow within the range selected with the mouse, where i and j denote two cities or regions and the gravity model represents the connection between them.
Further, the gravity model is used to represent the regional connection, where $I_{ij}$ is the connection attraction value of the two cities or regions, $Q_i$ and $Q_j$ are the numbers of people traveling to and from the two cities (regions), $D_{ij}$ is the straight-line distance between the cities, and $g$ is the gravity adjustment coefficient between cities. This parameter can be adjusted to improve the visual effect. The thickness of the line connecting two regions can be used to represent the strength of the connection: the larger the attraction value, the stronger the regional connection.
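The expression itself is not reproduced in this text. For orientation only, gravity models of spatial interaction are conventionally written in the following form; the distance exponent $b$ is an assumption introduced here (a value of 2 is a common choice), and the patent's own expression may differ.

```latex
I_{ij} = g \, \frac{Q_i \, Q_j}{D_{ij}^{\,b}}
```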
The advantages and beneficial effects of the invention are as follows:
The invention provides a visual analysis method for the urban aggregation effect based on clustering of floating population dynamic monitoring data. The method makes full use of the relevant items in the floating population dynamic monitoring data set, applies K-Means fast clustering or DBSCAN density clustering to selected features, and then uses a gravity model for visualization. With front-end techniques, the user can circle a specific area with the mouse and obtain a dedicated display and analysis of the data inside it, which helps uncover further information (such as a city's current weak industries and the degree of connection between cities) and suggests suitable development directions for urbanization.
Specific innovation 1: in S2 the valuable data items in the data set are screened out to form a new data set d2. The original data set consists of several years of floating population dynamic monitoring data; each year contains more than 800,000 records, each record contains more than 70 data items, and only a dozen or so of these are relevant to this study, for example: source address, destination address, education level, occupation, industry, reasons for being willing to stay locally, traffic, infrastructure, community life, medical insurance, and completeness of social insurance. The selection of data items is the inventors' own design, and different data items are given different weights, so that the invention can analyze urbanization from a new angle using floating population dynamic monitoring data.
Specific innovation 2: the invention combines DBSCAN density clustering and K-Means fast clustering. In previous analyses of city-related data, DBSCAN has rarely been used for density clustering of city locations, yet this clustering method is very effective for visualizing the aggregation effect of urban agglomerations. By combining the two clustering methods, the invention shows both the partition that results from the cities' geographic locations and the influence of each city's floating population dynamic monitoring data on the city's tier.
Specific innovation 3: in S6, previous studies usually work with data for one specific city or region, which is a limitation. For front-end data visualization, the invention implements a new interaction technique: the user can select any area on the map with the mouse, the data of that area is processed and computed with JavaScript, and the computed results are displayed, so that differences in development among the cities in the area can be compared clearly. This is difficult to implement, and no other work has been observed to use this form of interaction.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the present invention;
FIG. 2 is a DBSCAN density clustering process;
FIG. 3 is a K-Means fast clustering process.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and in detail below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the above technical problems is as follows:
Referring to FIGS. 1 to 3, the visual analysis method for the urban aggregation effect based on clustering of floating population dynamic monitoring data provided by the invention comprises the following steps:
S1: The dta-format data of the original data set d1 is converted into csv or json files, and the geographic location information of each record (inflow place, outflow place, place of household registration, and so on) is unified into longitude and latitude coordinates and written into d1. This step involves details such as mapping between coordinates and place names and converting between the coordinate encodings of Baidu Maps and Amap (Gaode Maps).
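A minimal sketch of the coordinate unification and export in S1, using only the Python standard library. The field names, the `GAZETTEER` lookup table, and its coordinate values are hypothetical and chosen for illustration; reading the .dta file itself is left out (a reader such as `pandas.read_stata` could supply the records), and the conversion between map providers' coordinate encodings is not shown.

```python
import csv
import io
import json

# Hypothetical gazetteer: place name -> (longitude, latitude); values are illustrative.
GAZETTEER = {
    "Chongqing": (106.55, 29.56),
    "Chengdu": (104.07, 30.57),
}

# Assumed field names for the three kinds of place mentioned in S1.
PLACE_FIELDS = ("inflow_place", "outflow_place", "household_place")


def unify_coordinates(record, gazetteer=GAZETTEER):
    """Return a copy of the record with lon/lat columns added for each place field."""
    out = dict(record)
    for field in PLACE_FIELDS:
        lon, lat = gazetteer.get(record.get(field), (None, None))
        out[field + "_lon"] = lon
        out[field + "_lat"] = lat
    return out


def to_csv(records):
    """Serialize a list of flat dict records as CSV text."""
    if not records:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


d1 = [{"inflow_place": "Chongqing", "outflow_place": "Chengdu", "household_place": "Chengdu"}]
d1 = [unify_coordinates(r) for r in d1]
csv_text = to_csv(d1)
json_text = json.dumps(d1, ensure_ascii=False)
```

Places missing from the lookup table get empty coordinates rather than raising an error, so unresolved names can be reviewed after the conversion.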
S2: The valuable data items in data set d1, such as inflow place, occupation, industry, salary, local traffic evaluation, and community life evaluation, are screened out to form a new data set d2. Data items such as local traffic evaluation and community life evaluation each carry a different scoring weight; together they serve as the basis for scoring how the floating population rates living conditions in the city, and the resulting score is added to d2. The scoring rules were designed independently by the inventors after summarizing prior knowledge.
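The screening and weighted scoring in S2 might look like the sketch below. The field names and the weights are hypothetical: the patent states that items carry different weights but does not give the values.

```python
# Fields kept from d1 (assumed names) and hypothetical weights for the evaluation items.
KEEP = ("inflow_place", "occupation", "industry", "salary")
WEIGHTS = {"traffic_eval": 0.4, "community_eval": 0.6}


def build_d2(d1):
    """Keep only the selected fields and add a weighted living-conditions score."""
    d2 = []
    for row in d1:
        item = {key: row[key] for key in KEEP}
        item["life_score"] = sum(weight * row[key] for key, weight in WEIGHTS.items())
        d2.append(item)
    return d2


d1_sample = [{
    "inflow_place": "Chongqing", "occupation": "clerk", "industry": "services",
    "salary": 5000, "traffic_eval": 3, "community_eval": 5, "unused_item": "x",
}]
d2_sample = build_d2(d1_sample)
```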
S3: The longitude and latitude coordinates of the cities in all urban agglomerations nationwide are extracted to form data set d3, and DBSCAN density clustering is performed on d3. The purpose of density clustering is to compare the clustering result with the actual division of urban agglomerations and to analyze whether urban economic zones have formed among them.
The algorithm has two parameters, the radius eps and the density threshold MinPts, and proceeds as follows:
(1) A circle of radius eps is drawn around each data point $x_i$. This circle is called the eps-neighborhood of $x_i$. The points inside the circle are counted; if their number is at least the density threshold MinPts, the center of the circle is marked as a core point, also called a core object.
(2) If the number of points in the eps-neighborhood of a point is below the density threshold but the point lies in the neighborhood of a core point, the point is called a boundary point. Points that are neither core points nor boundary points are noise points. Every point in the eps-neighborhood of a core point $x_i$ is directly density-reachable from $x_i$.
(3) If $x_j$ is directly density-reachable from $x_i$, $x_k$ is directly density-reachable from $x_j$, and so on until $x_n$ is directly density-reachable from $x_k$, then $x_n$ is density-reachable from $x_i$. Density-reachability is thus obtained by chaining direct density-reachability.
(4) If there is a point $x_k$ from which both $x_i$ and $x_j$ are density-reachable, then $x_i$ and $x_j$ are called density-connected. Density-connected points together form a cluster.
The sample points processed by the DBSCAN algorithm are divided into core points, boundary points, and noise points, defined as follows (ε denotes the radius eps):
Core point: for data set d3, if the ε-neighborhood of a sample p contains at least MinPts samples, including p itself, then p is called a core point; that is, the number of samples in the ε-neighborhood of p satisfies:
$|N_\varepsilon(p)| \ge \mathrm{MinPts}$
where, with dist(p, q) denoting the distance between p and a point q, the ε-neighborhood $N_\varepsilon(p)$ is:
$N_\varepsilon(p) = \{\, q \in d3 \mid \mathrm{dist}(p, q) \le \varepsilon \,\}$
Boundary point: a non-core sample b is called a boundary point if b lies within the ε-neighborhood of some core point p.
Noise point: a non-core sample n is called a noise point if n does not lie within the ε-neighborhood of any core point p.
Any two sample points that are directly density-reachable or density-reachable are assigned to the same cluster. The DBSCAN algorithm therefore selects a core point at random from data set d3 as a seed and grows the corresponding cluster from that seed; the algorithm ends once all core points have been traversed, which yields the clustering result.
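The procedure above can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the patent's code: distances are great-circle (haversine) distances in kilometers, which the patent does not specify; seeds are visited in index order rather than at random; and the city coordinates, `eps`, and `min_pts` values are illustrative.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def dbscan(points, eps, min_pts, dist=haversine_km):
    """Return (labels, is_core); labels[i] is a cluster id or NOISE."""
    n = len(points)
    # eps-neighborhood of every point (each point is in its own neighborhood).
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = [NOISE] * n
    cluster = 0
    for seed in range(n):
        if not is_core[seed] or labels[seed] != NOISE:
            continue  # only unassigned core points start a new cluster
        labels[seed] = cluster
        frontier = [seed]
        while frontier:
            p = frontier.pop()
            for q in neighbors[p]:  # q is directly density-reachable from core point p
                if labels[q] == NOISE:
                    labels[q] = cluster
                    if is_core[q]:
                        frontier.append(q)  # boundary points are labeled but not expanded
        cluster += 1
    return labels, is_core


# Three nearby cities and one distant city (illustrative coordinates).
cities = [(106.55, 29.56), (106.60, 29.60), (106.50, 29.50), (121.47, 31.23)]
labels, is_core = dbscan(cities, eps=50.0, min_pts=3)
```

With these inputs the three nearby points form one cluster and the distant point is left as noise. The neighbor search is quadratic in the number of points, which is acceptable for a few hundred cities.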
S4: In data set d2, the number of people in the inflow population of each city nationwide who work in the tertiary industry is counted and their percentage is calculated. K-Means clustering is then used to divide the percentages into 4 classes which, from the largest values to the smallest, represent core cities, secondary cities, tertiary cities, and ordinary cities (rural areas). The specific steps are as follows:
(1) Select suitable initial centers for the 4 classes; in the kth iteration, compute the distance from each sample to the 4 centers;
(2) Assign the sample to the class whose center is nearest, and update the center of that class, for example by taking the mean;
(3) If none of the 4 cluster centers changes after the update, end the iteration; otherwise continue iterating. The final assignment is the clustering result.
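A minimal one-dimensional K-Means sketch for the steps above. The initialization (centers spread evenly over the sorted values) is an assumption, since the patent only asks for suitable initial centers; the percentages in the example are made up for illustration.

```python
TIERS = ("core city", "secondary city", "tertiary city", "ordinary city")


def kmeans_1d(values, k=4, max_iter=100):
    """Cluster scalar values into k groups and return the final centers."""
    ordered = sorted(values)
    # Assumed initialization: pick k values spread evenly across the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Mean update; an empty group keeps its previous center.
        updated = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if updated == centers:
            break  # no center moved, so the iteration has converged
        centers = updated
    return centers


def tier_of(value, centers):
    """Map a percentage to a tier; larger centers correspond to higher tiers."""
    ranked = sorted(centers, reverse=True)
    nearest = min(range(len(ranked)), key=lambda j: abs(value - ranked[j]))
    return TIERS[nearest]


# Illustrative tertiary-industry percentages for eight cities.
shares = [82, 80, 61, 59, 41, 39, 21, 19]
centers = kmeans_1d(shares)
```

Here `tier_of(80, centers)` gives "core city" and `tier_of(19, centers)` gives "ordinary city".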
S5: The clustering results of S3 and S4 are marked in data sets d2 and d3. At this point, the data items in d2 include each migrant's source location, destination location, occupation, salary, industry, local industrial structure, local social life score, and local social life class; d3 mainly contains each city's name, geographic location, the urban agglomeration it belongs to, and its density clustering result within that agglomeration. Marking the clustering results in advance allows the data to be retrieved directly during visualization and avoids the long delays that would result from running the clustering computation in JavaScript at analysis time.
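One way to precompute the labels on the back end is sketched below. The field names `density_cluster` and `city_tier`, the sample records, and the tier assigned to the sample city are all hypothetical.

```python
import json


def label_cities(d3, cluster_labels):
    """Attach the DBSCAN cluster id to each city record in d3."""
    return [dict(city, density_cluster=label) for city, label in zip(d3, cluster_labels)]


def label_records(d2, tier_by_city):
    """Attach the K-Means tier of the inflow city to each record in d2."""
    return [dict(row, city_tier=tier_by_city.get(row["inflow_place"])) for row in d2]


d3 = [{"name": "Chongqing", "lon": 106.55, "lat": 29.56}]
d2 = [{"inflow_place": "Chongqing", "industry": "services"}]
d3_labeled = label_cities(d3, [0])
d2_labeled = label_records(d2, {"Chongqing": "core city"})  # tier value is illustrative

# Serialized once, so the front end only has to read the labels.
payload = json.dumps({"d2": d2_labeled, "d3": d3_labeled}, ensure_ascii=False)
```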
S6: Using the Django framework, the back-end data is visualized at the front end with the ECharts chart library. The attributes in the library's china.js file are modified so that the boundaries of every province, city and county can be drawn on the map of China, and mouse circle selection and chart linkage functions are added. The Django back end uses Python as the data processing language and returns the processed data to the front end; the front end can additionally combine frameworks such as Bootstrap and Node.js to build a complete interactive design.
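As an illustration of the back-end side of this step, the sketch below shapes labeled city records into scatter series of the kind ECharts draws on a `geo` coordinate system, where each data item carries `[longitude, latitude, value]`. The function name `to_geo_scatter_series`, the record fields and the sample rows are assumptions; in a Django view, the resulting list could plausibly be returned with `django.http.JsonResponse`, which is left out here so that the example runs with the standard library alone.

```python
import json


def to_geo_scatter_series(cities):
    """Group city records by tier into ECharts-style scatter series."""
    by_tier = {}
    for c in cities:
        item = {"name": c["city"], "value": [c["lng"], c["lat"], c["inflow"]]}
        by_tier.setdefault(c["tier"], []).append(item)
    return [
        {"name": tier, "type": "scatter", "coordinateSystem": "geo", "data": data}
        for tier, data in by_tier.items()
    ]


# Made-up rows; names, coordinates and inflow counts are placeholders.
cities = [
    {"city": "City A", "lng": 106.5, "lat": 29.5, "inflow": 1200, "tier": "core city"},
    {"city": "City B", "lng": 104.1, "lat": 30.7, "inflow": 300, "tier": "common city"},
    {"city": "City C", "lng": 113.3, "lat": 23.1, "inflow": 900, "tier": "core city"},
]
series = to_geo_scatter_series(cities)
print(json.dumps(series, ensure_ascii=False, indent=2))
```

Emitting one series per tier lets the front end assign each category its own color and legend entry without further processing.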
S7: Based on the K-Means clustering result, analyze the economic driving capability of the central city within the range selected by the mouse on the map, determine the city's type, and identify which of its industries are strong and where its weaknesses lie. Compare the DBSCAN clustering result with the actual distribution of city groups and analyze the latent conditions under which the city groups formed. Use a gravity model to judge the intensity of population flow within the selected range: i and j denote two cities (regions), and the gravity model represents the connection between them, with the expression as follows:
I_ij is the connection attraction value of the two cities (regions), Q_i and Q_j are the numbers of people travelling to and from the two cities (regions), D_ij is the straight-line distance between the cities, and g is the inter-city gravity adjustment coefficient, a parameter that can be tuned to optimize the visual effect. The thickness of the line connecting two regions represents the strength: the larger the attraction value, the stronger the regional connection. The larger a city's point, the larger its inflow population.
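A minimal sketch of how such a gravity value could be computed and mapped to line thickness. It assumes the conventional gravity-model form I_ij = g · Q_i · Q_j / D_ij^2; the distance exponent of 2 is an assumption, not something stated in this text, and the names `gravity` and `line_widths` as well as the width range are hypothetical.

```python
def gravity(q_i, q_j, d_ij, g=1.0, exponent=2.0):
    """Connection attraction value between two cities.

    Assumed form: g * Q_i * Q_j / D_ij ** exponent (exponent is an assumption).
    """
    if d_ij <= 0:
        raise ValueError("distance must be positive")
    return g * q_i * q_j / d_ij ** exponent


def line_widths(values, min_w=1.0, max_w=8.0):
    """Linearly rescale attraction values to line widths in [min_w, max_w]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [min_w for _ in values]
    return [min_w + (v - lo) / (hi - lo) * (max_w - min_w) for v in values]


# Made-up flows and distances for three city pairs.
pairs = [(100, 200, 10.0), (100, 100, 10.0), (50, 40, 20.0)]
values = [gravity(q_i, q_j, d) for q_i, q_j, d in pairs]
print(values)               # [200.0, 100.0, 5.0]
print(line_widths(values))
```

Tuning `g` rescales every attraction value by the same factor, so after the linear rescaling it changes the widths only if the rescaling bounds are fixed rather than taken from the data, as they are in this sketch.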
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
The above examples should be understood as illustrative only and not limiting the scope of the invention. Various changes and modifications to the present invention may be made by one skilled in the art after reading the teachings herein, and such equivalent changes and modifications are intended to fall within the scope of the invention as defined in the appended claims.

Claims (1)

1. A visual analysis method for urban aggregation effects based on feature clustering of floating population data, characterized by comprising the following steps:
S1: inputting an original floating population dynamic monitoring data set d1, converting the dta-format data of d1 into csv or json files, unifying the geographic information of the inflow place, outflow place and household registration place of each record into longitude and latitude coordinates, and writing these coordinates into data set d1;
S2: screening out the valuable data items in the data set using prior knowledge to form a new data set d2;
S3: extracting the longitude and latitude coordinates of the cities in all city groups nationwide to form a data set d3, and performing DBSCAN density clustering on d3, the algorithm having two parameters: the radius eps and the density threshold MinPts;
S4: performing K-Means clustering on the percentage of the tertiary industry in the inflow population of all cities in data set d2;
S5: labeling the two clustering results in data sets d2 and d3;
S6: visually displaying data sets d2 and d3 on a front-end page using the ECharts chart library, and adding mouse interaction;
S7: analyzing the radiating capacity of central cities, and analyzing the relations among cities using a gravity model;
step S2 specifically comprises: screening out the valuable data items in data set d1, including inflow place, occupation, industry, salary, local traffic evaluation and community life evaluation, to form the new data set d2;
in step S3, the longitude and latitude coordinates of the cities in all city groups nationwide are extracted to form data set d3, and DBSCAN density clustering is performed on d3, the algorithm having two parameters, the radius eps and the density threshold MinPts, with the following specific steps:
(1) taking each data point x_i as the center, drawing a circle of radius eps, called the eps neighborhood of x_i; counting the points contained in the circle, and if the number of points in the circle exceeds the density threshold MinPts, marking the center of the circle as a core point, also called a core object;
(2) if the number of points in the eps neighborhood of a point is smaller than the density threshold but the point falls within the neighborhood of a core point, the point is called a boundary point; points that are neither core points nor boundary points are noise points; all points in the eps neighborhood of a core point x_i are directly density-reachable from x_i;
(3) if x_j is directly density-reachable from x_i, x_k is directly density-reachable from x_j, ..., and x_n is directly density-reachable from x_k, then x_n is density-reachable from x_i; this property reflects the transitivity of direct density-reachability, from which density-reachability is derived;
(4) if there exists a point x_k such that x_i and x_j are both density-reachable from x_k, then x_i and x_j are said to be density-connected, and density-connected points are linked together to form a cluster;
the sample points processed by the DBSCAN algorithm are divided into core points, boundary points and noise points, defined as follows:
core point: for data set d3, if the ε-neighborhood of a sample p contains at least MinPts samples, including p itself, then p is called a core point, and the number of samples in the ε-neighborhood of core point p satisfies:
|N_ε(p)| ≥ MinPts
where, with dist(p, q) denoting the distance between core point p and any point q, N_ε(p) is expressed as:
N_ε(p) = {q ∈ d3 | dist(p, q) ≤ ε}
boundary point: a non-core sample b is called a boundary point if it lies within the ε-neighborhood of some core point p, namely:
|N_ε(b)| < MinPts and b ∈ N_ε(p) for some core point p
noise point: a non-core sample n is called a noise point if it does not lie within the ε-neighborhood of any core point p, namely:
|N_ε(n)| < MinPts and n ∉ N_ε(p) for every core point p
any two sample points are assigned to the same cluster as long as one is directly density-reachable or density-reachable from the other; the DBSCAN algorithm therefore randomly selects a core point from data set d3 as a seed and grows the corresponding cluster from that seed, and once all core points have been traversed, the algorithm terminates and the clustering result is obtained;
in step S4, performing K-Means clustering on the percentage of the tertiary industry in the inflow population of all cities in data set d2 specifically comprises:
calculating the percentage of the inflow population working in the tertiary industry, and using K-Means clustering to divide the percentage range into 4 classes which, from highest to lowest, represent core cities, secondary cities, tertiary cities and common cities, with the following specific steps:
(1) selecting the initial centers of the 4 classes; in the k-th iteration, computing the distance from each sample to each of the 4 centers;
(2) assigning each sample to the class whose center is nearest, and updating that class's center value using the mean;
(3) if none of the 4 cluster centers changes after the update, ending the iteration; otherwise continuing to iterate, thereby obtaining the clustering result;
in step S6, using the Django framework, the back-end data is visualized at the front end with the ECharts chart library; the attributes in the library's china.js file are modified so that the boundaries of every province, city and county are drawn on the map of China, and mouse circle selection and chart linkage functions are added;
step S7, analyzing the radiating capacity of central cities and analyzing the relations among cities using a gravity model, specifically comprises: based on the K-Means clustering result, analyzing the economic driving capability of the central city within the range selected by the mouse on the map, determining the city's type, and identifying which of its industries are strong and where its weaknesses lie; comparing the DBSCAN clustering result with the actual distribution of city groups, and analyzing the latent conditions under which the city groups formed; using a gravity model to judge the intensity of population flow within the range selected by the mouse, with i and j denoting two cities or regions and the gravity model representing the connection between them;
the gravity model represents the regional connection, with the expression as follows:
I_ij is the connection attraction value of the two cities or regions, Q_i and Q_j are the numbers of people travelling to and from the two cities or regions, D_ij is the straight-line distance between the cities, and g is the inter-city gravity adjustment coefficient, a parameter that is tuned to optimize the visual effect; the thickness of the line connecting two regions represents the strength, and the larger the attraction value, the stronger the regional connection.
CN202210193379.7A 2022-03-01 2022-03-01 Urban aggregation effect visual analysis method based on floating population data feature clustering Active CN114661393B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210193379.7A CN114661393B (en) 2022-03-01 2022-03-01 Urban aggregation effect visual analysis method based on floating population data feature clustering

Publications (2)

Publication Number Publication Date
CN114661393A (en) 2022-06-24
CN114661393B (en) 2024-03-22

Family

ID=82026766

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant