CN108038734B - Urban commercial facility spatial distribution detection method and system based on comment data - Google Patents

Publication number
CN108038734B
Authority
CN
China
Prior art keywords
commercial
centrality
road network
facility
facilities
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201711425589.XA
Other languages
Chinese (zh)
Other versions
CN108038734A (en)
Inventor
王艳东
赵晓明
王腾
付小康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Application filed by Wuhan University WHU
Priority to CN201711425589.XA
Publication of CN108038734A
Application granted
Publication of CN108038734B
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Abstract

The invention provides a method and system for detecting the spatial distribution of urban commercial facilities based on comment data. After initialization, the method detects, through the network K function of the commercial facilities, differences in the spatial distribution patterns of commercial facilities with different user ratings under road network constraints; detects, through the network K function between commercial facilities and other facilities, differences in how commercial facilities with different user ratings are distributed relative to subway stations and commercial centers; explores the relationship between the spatial distribution of commercial facilities with different user ratings and road network centrality by calculating the correlation coefficients between them; and, starting from population distribution, traffic facilities and road network centrality and taking the road network structure into account, detects how the influence of each factor on the number of shops, the number of user comments and the user rating varies with spatial position. The invention explores a novel large-scale data source for research on the spatial distribution and site selection of urban commercial facilities.

Description

Urban commercial facility spatial distribution detection method and system based on comment data
Technical Field
The invention relates to the field of social media data applications, and in particular to a method for detecting the spatial distribution of urban commercial facilities based on comment data.
Background
In recent years, social media such as Twitter, Sina Weibo and Dianping (dianping.com) have developed rapidly and become important communication media. In 2011 Twitter users posted about 200 million tweets per day on average; by 2012 the number had doubled to 400 million tweets per day, or roughly 270,000 tweets per minute. At the same time, Flickr users uploaded more than 3,000 pictures per minute, and YouTube users uploaded nearly 72 hours of video per minute. These social media are important sources of big data. The growth of social media big data has been accompanied by a proliferation of location-aware devices, so that the content most users contribute through web services carries geographic information. This has gradually given rise to a new type of geospatial information: user-generated multimedia data with geographic location information and varied subject matter. Such data can convey real-time information on a wide variety of important events.
Dianping.com is one of the social media applications that have become popular in China in recent years. Dianping contains information on city merchants and continuously generates massive data related to the daily life of consumers, with attributes such as time and user comments. Merchant information covers the geographic position, name, category, ID and district of the merchant; consumer reviews include the user rating and the comment text. This information provides data support for analyzing customer visits and the spatial distribution of customer satisfaction.
Over the last 30 years, with the rapid development of the urban economy and rising living standards, the income of the retail industry in China has grown rapidly, far outpacing national GDP growth. The rapid development of urban commercial facilities plays an extremely important role in national economic development, in expanding employment channels and in the quality of life of urban residents. Because large cities in China are still in a stage of major transformation and rapid development, a reasonable layout of commercial facilities benefits both the economic development of the city and the quality of life of its residents, whereas an unreasonable layout not only hinders economic development but also harms the living standards of the people in the city. Research on the spatial distribution patterns and characteristics of urban commercial facilities is urgently needed by urban planners, governments and merchants, and is of great significance for the reasonable allocation of urban resources, the site selection of commercial facilities and the healthy development of the urban economy.
Disclosure of Invention
In view of the above problems, the invention provides a technical scheme for studying the spatial distribution of urban commercial facilities based on comment data.
The technical scheme of the invention provides a method for detecting the spatial distribution of urban commercial facilities based on comment data, which comprises the following steps:
step one, initialization, which comprises selecting the research area and research objects, collecting and preprocessing comment data that carry location information and comment information, and classifying the commercial facilities according to user ratings;
step two, initial detection, which comprises detecting, through the network K function of the commercial facilities, differences in the spatial distribution patterns of commercial facilities with different user ratings under road network constraints;
step three, first detection, which comprises detecting, through the network K function between commercial facilities and other facilities, differences in how commercial facilities with different user ratings are distributed relative to subway stations and commercial centers;
step four, second detection, which comprises exploring the relationship between the spatial distribution of commercial facilities with different user ratings and road network centrality by calculating the correlation coefficients between them, as follows: calculating the betweenness centrality, closeness centrality and straightness centrality index values of each node of the road network;
calculating planar kernel density estimates for commercial facilities with high user ratings, commercial facilities with low user ratings, and all commercial facilities;
calculating planar kernel density estimates for each of the three road network centrality indices (betweenness, closeness and straightness centrality) in the research area;
calculating, on the same grid cells, the correlation coefficients between each of the three groups of commercial facilities (high user rating, low user rating, and all) and the three road network centrality indices;
step five, third detection, which comprises starting from three factors that influence the distribution of urban commercial facilities, namely population distribution, traffic facilities and road network centrality, and, in combination with the road network structure, detecting how the influence of each factor on the number of shops, the number of user comments and the user rating varies with spatial position.
Furthermore, step five is implemented as follows:
dividing the research area into grid cells;
extracting, for each grid cell, the number of commercial facilities, the number of check-ins at commercial facilities by commenting users, the distance to the subway station and the road network centrality value;
checking the collinearity among the variables by principal component analysis and correlation coefficient testing, taking the calculated population, distance to the subway station and road network centrality as variables;
taking the calculated population, distance to the subway station and road network centrality as the model independent variables, taking the number of commercial facilities, the commercial facility user rating and the number of commercial facility user comments in turn as the model dependent variable, fitting both a GWR model and an OLS model to obtain model evaluation indices, and comparing the two models;
finally, using the GWR model to obtain the minimum, maximum and mean of the coefficients of the three factors (population, distance to the subway station and road network centrality);
visualizing in space the coefficients of the three factors for the number of commercial facilities, the commercial facility user rating and the number of commercial facility user comments, analyzing the spatial inconsistency of the degree to which the three factors influence these quantities, and identifying how the degree of influence differs among them.
The invention also provides a system for detecting the spatial distribution of urban commercial facilities based on comment data, which comprises the following modules:
an initialization module, used for collecting data, selecting the research area, preprocessing the data as needed and classifying the commercial facilities according to user ratings;
an initial detection module, used for detecting, through the network K function of the commercial facilities, differences in the spatial distribution patterns of commercial facilities with different user ratings under road network constraints;
a first detection module, used for detecting, through the network K function between commercial facilities and other facilities, differences in how commercial facilities with different user ratings are distributed relative to subway stations and commercial centers;
a second detection module, used for exploring the relationship between the spatial distribution of commercial facilities with different user ratings and road network centrality by calculating the correlation coefficients between them, as follows: calculating the betweenness centrality, closeness centrality and straightness centrality index values of each node of the road network;
calculating planar kernel density estimates for commercial facilities with high user ratings, commercial facilities with low user ratings, and all commercial facilities;
calculating planar kernel density estimates for each of the three road network centrality indices (betweenness, closeness and straightness centrality) in the research area;
calculating, on the same grid cells, the correlation coefficients between each of the three groups of commercial facilities (high user rating, low user rating, and all) and the three road network centrality indices;
a third detection module, used for starting from three factors that influence the distribution of urban commercial facilities, namely population distribution, traffic facilities and road network centrality, and, in combination with the road network structure, detecting how the influence of each factor on the number of shops, the number of user comments and the user rating varies with spatial position.
Furthermore, the third detection module is implemented as follows:
dividing the research area into grid cells;
extracting, for each grid cell, the number of commercial facilities, the number of check-ins at commercial facilities by commenting users, the distance to the subway station and the road network centrality value;
checking the collinearity among the variables by principal component analysis and correlation coefficient testing, taking the calculated population, distance to the subway station and road network centrality as variables;
taking the calculated population, distance to the subway station and road network centrality as the model independent variables, taking the number of commercial facilities, the commercial facility user rating and the number of commercial facility user comments in turn as the model dependent variable, fitting both a GWR model and an OLS model to obtain model evaluation indices, and comparing the two models;
finally, using the GWR model to obtain the minimum, maximum and mean of the coefficients of the three factors (population, distance to the subway station and road network centrality);
visualizing in space the coefficients of the three factors for the number of commercial facilities, the commercial facility user rating and the number of commercial facility user comments, analyzing the spatial inconsistency of the degree to which the three factors influence these quantities, and identifying how the degree of influence differs among them.
With the technical scheme provided by the invention for detecting the spatial distribution of urban commercial facilities based on comment data, data acquisition is convenient, time and labor are saved, and research costs are reduced. Most importantly, the invention explores a new method for urban spatial distribution research that uses a new data source and broadens the approach of traditional urban facility research. It helps reveal the spatial distribution of urban facilities, can provide an important basis for decisions on urban commercial facility planning and policy making, can guide intelligent management by city administrators and informed choices by consumers, and has significant market value.
Drawings
Fig. 1 is a schematic diagram of an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings in conjunction with specific embodiments.
The method takes full account of the insufficient sample size and sparse attributes of traditional data sources and of the unreasonable layout of urban commercial facilities. It detects the spatial distribution of urban commercial facilities using social network comment data combined with several network spatial analysis methods, examines the spatial distribution of commercial facilities in the city and the factors that influence the distribution of commercial facilities, of user comments and of user ratings, and reveals the spatial inconsistency of those factors. The invention explores the possibility of a novel large-scale data source for research on the spatial distribution and site selection of urban commercial facilities, and contributes to the reasonable allocation of urban resources, the site selection of commercial facilities and the healthy development of the urban economy.
Referring to Fig. 1, and considering that Dianping reviews contain information such as user ratings and comments, the invention provides a method for detecting the spatial distribution of urban commercial facilities using comment data. The embodiment uses Dianping as a supplementary data source for urban commercial facility points and combines it with road network data, urban subway station data and other data to reveal how reasonable the spatial distribution of commercial facilities is.
First, the theoretical basis is introduced:
The network K function, one of the most common methods for studying spatial point patterns, was first proposed by Ripley and later improved by Okabe. It can be divided into a univariate K-function method, which studies a single type of geographic object, and a bivariate K-function method, which studies two types. The univariate method analyzes the distribution pattern of one type of geographic object in network space. The bivariate method examines whether the spatial distribution of one type of point object depends on, or is influenced by, the spatial distribution of another type of point object.
Road network centrality (street centrality) is an index for measuring the importance of nodes in a road network and reflects the accessibility of those nodes. Exploring the relationship between road network centrality and urban facilities or locations can help analyze their spatial distribution patterns. Road network centrality is generally described by three main parameters: betweenness centrality, closeness centrality and straightness centrality. The three parameters represent different aspects of road accessibility.
Betweenness centrality expresses the status of a node within the road network and is one way of measuring its importance: it is the number of shortest paths between all node pairs in the road network that pass through node i. Betweenness is often extremely important in a network; if a particularly large number of node pairs must pass through a certain node to reach each other, that node clearly plays a major role in the network. Closeness centrality measures the shortest-path distances from a given node to the other reachable nodes in the road network; it is a global index that can reveal the center of the network. The closeness of node i is defined as the reciprocal of the sum of the distances from that node to the other nodes in the network. Straightness centrality measures how closely the shortest-path distance from one node to another approaches the Euclidean distance between them, and so represents how direct travel is from one node to another in the network.
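As an illustration of the three indices described above, the following is a minimal sketch and not the patent's implementation. It assumes an undirected road graph stored as a dictionary of weighted adjacency lists, with hypothetical node names and coordinates. Betweenness is accumulated with a Brandes-style pass over shortest paths; dividing straightness by the number of other nodes is a normalization assumed here, since the text does not state one.

```python
import heapq
import math

def shortest_paths(graph, source):
    """Dijkstra from source: distances, shortest-path counts, predecessors, visit order."""
    dist, sigma, preds = {source: 0.0}, {source: 1}, {source: []}
    order, done = [], set()
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        order.append(u)
        for v, w in graph[u].items():
            nd = d + w
            if v not in dist or nd < dist[v] - 1e-9:
                dist[v], sigma[v], preds[v] = nd, sigma[u], [u]
                heapq.heappush(heap, (nd, v))
            elif abs(nd - dist[v]) <= 1e-9 and v not in done:
                sigma[v] += sigma[u]          # another equally short path
                preds[v].append(u)
    return dist, sigma, preds, order

def centralities(graph, coords):
    """Return (betweenness, closeness, straightness) dictionaries keyed by node."""
    nodes = list(graph)
    betweenness = dict.fromkeys(nodes, 0.0)
    closeness, straightness = {}, {}
    for s in nodes:
        dist, sigma, preds, order = shortest_paths(graph, s)
        others = [t for t in dist if t != s]
        total = sum(dist[t] for t in others)
        closeness[s] = 1.0 / total if total > 0 else 0.0
        # Ratio of Euclidean to network distance, averaged over the other nodes (assumed normalization).
        straightness[s] = sum(math.dist(coords[s], coords[t]) / dist[t]
                              for t in others) / (len(nodes) - 1)
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):             # accumulate path dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
    for v in nodes:                           # undirected: each pair was counted twice
        betweenness[v] /= 2.0
    return betweenness, closeness, straightness

# Hypothetical three-node network with a right-angle bend at B.
graph = {"A": {"B": 1.0}, "B": {"A": 1.0, "C": 1.0}, "C": {"B": 1.0}}
coords = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (1.0, 1.0)}
bet, clo, stra = centralities(graph, coords)
```

In this toy network only B lies on a shortest path between another pair of nodes, so only B has non-zero betweenness, while the bend lowers the straightness of A and C below 1.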
Correlation analysis is widely used in statistics. It analyzes two or more related variables to determine whether a certain correlation exists between them. Correlation analysis is not applied to arbitrary variables, but only where some correlation between two variables is possible or known to exist. The definition of correlation differs between disciplines. In correlation analysis, the linear correlation coefficient r measures the linear relationship between the two variables under study, and its value lies between -1 and 1. When r is 0, the two variables are said to be uncorrelated; when |r| is 1, they are said to be completely correlated; when 0 < |r| < 1, a change in one variable is accompanied by a partial change in the other, and the larger the absolute value of r, the stronger this effect. When |r| > 0.8 the correlation is called high, when |r| < 0.3 it is called low, and in between it is called medium.
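The linear correlation coefficient can be sketched as follows. This is a minimal illustration using the standard Pearson formula; the function names are hypothetical, and the strength bands assume that "high" means |r| > 0.8, "low" means |r| < 0.3 and "medium" covers the range in between.

```python
import math

def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_strength(r):
    """Label |r| with the (assumed) bands: high > 0.8, low < 0.3, otherwise medium."""
    magnitude = abs(r)
    if magnitude > 0.8:
        return "high"
    if magnitude < 0.3:
        return "low"
    return "medium"
```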
The Geographically Weighted Regression (GWR) model was first proposed by Brunsdon and Fotheringham. It rests on the idea that neighboring elements tend to have similar values. In this study, it is assumed that similar customers exhibit similar preferences. Under this method, the parameters for a particular location are estimated on the assumption that observations near that location have a greater influence than observations farther away. The estimator is very similar to global weighted least squares, except that the weights depend on the positions of the observations.
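The idea of location-dependent weights can be sketched as follows. This is a deliberately simplified illustration and not the model used in the embodiment: it assumes a single explanatory variable, a Gaussian distance kernel and a fixed bandwidth, none of which are specified in the text, and it fits a separate weighted least-squares line at each observation location.

```python
import math

def gwr_local_fits(locations, x, y, bandwidth):
    """Return one (intercept, slope) pair per location from a locally weighted fit."""
    fits = []
    for ux, uy in locations:
        # Gaussian kernel (assumed): nearby observations get weights close to 1.
        w = [math.exp(-0.5 * (math.hypot(px - ux, py - uy) / bandwidth) ** 2)
             for px, py in locations]
        sw = sum(w)
        mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
        mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx
        fits.append((mean_y - slope * mean_x, slope))
    return fits

# Hypothetical data: the effect of x on y is 1 in the west cluster and 3 in the east cluster.
locations = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0, 3.0, 6.0, 9.0]
fits = gwr_local_fits(locations, x, y, bandwidth=1.0)
```

A single global regression would report one averaged slope for these data, whereas the local fits recover a slope near 1 in the west and near 3 in the east, which is the kind of spatial variation in coefficients the method is meant to expose.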
The technical scheme provided by the invention is a method for detecting the spatial distribution of urban commercial facilities using comment data. The flow of the embodiment comprises the following steps:
First, initialization:
The research area and research objects are selected first. In practice, those skilled in the art can preset the research area and research objects as needed. Taking Beijing as an example, since the area where people are most frequently active lies within the Third Ring Road, the embodiment selects the area within the Third Ring Road of Beijing as the research area; the research objects are the merchant data of all coffee shops in that area and the consumer review data for those coffee shops.
Comment data carrying location information and comment information are then collected and preprocessed.
In specific implementation, comment data with location and comment information are acquired mainly by a web crawler. Those skilled in the art can preset the data area and the facility type as needed. In the embodiment, the selected region is "Beijing" and the facility type is "coffee shop". Data on subway stations and business centers are acquired mainly through the Baidu Maps Open Platform API.
The comment data are preprocessed as follows: for each coffee shop, the number of user comments and the average rating are calculated; the coffee shops are classified according to their ratings; and the number of coffee shops at each level of user satisfaction is counted.
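The preprocessing just described can be sketched as follows. This is a minimal illustration that assumes reviews arrive as (shop ID, rating) pairs; the field names are hypothetical, and the class boundaries reuse the thresholds mentioned later in the embodiment (rating above 4 and rating below 3), with everything in between labeled "medium" as an assumption.

```python
from collections import Counter, defaultdict

def summarize_reviews(reviews, high=4.0, low=3.0):
    """Per shop: number of comments, mean rating and rating class."""
    ratings = defaultdict(list)
    for shop_id, rating in reviews:
        ratings[shop_id].append(rating)
    summary = {}
    for shop_id, values in ratings.items():
        mean = sum(values) / len(values)
        if mean > high:
            label = "high"
        elif mean < low:
            label = "low"
        else:
            label = "medium"      # assumed label for ratings between the two thresholds
        summary[shop_id] = {"n_comments": len(values), "mean_rating": mean, "class": label}
    return summary

# Hypothetical crawled records.
reviews = [("shop_a", 5.0), ("shop_a", 4.5), ("shop_b", 2.0), ("shop_c", 3.5)]
summary = summarize_reviews(reviews)
shops_per_class = Counter(item["class"] for item in summary.values())
```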
Second, initial detection, which detects, through the network K function of the commercial facilities, differences in the spatial distribution patterns of commercial facilities with different user ratings under road network constraints. This step calculates the network K function of the commercial facilities. In specific implementation, the observed network K function and the expected network K function are calculated separately for all coffee shops in the research area, for coffee shops with a user rating above 4 and for coffee shops with a user rating below 3.
Third, first detection, which detects, through the network K function between commercial facilities and other facilities, differences in how commercial facilities with different user ratings are distributed relative to subway stations and commercial centers. This step calculates the network K function between the commercial facilities and other facilities. In specific implementation, the observed bivariate network K function and the expected bivariate network K function are calculated between coffee shops with a user rating above 4 or below 3 and the subway stations and business centers in the research area.
Fourth, second detection, which explores the relationship between the spatial distribution of commercial facilities with different user ratings and road network centrality by calculating the correlation coefficients between them. The betweenness centrality, closeness centrality and straightness centrality index values of the road network are calculated; to ensure the accuracy of the three centrality indices, the road network used for the calculation is extended to the Fourth Ring Road of Beijing;
the bandwidth and the minimum cell size are set to 800 meters and 20 meters respectively, and planar kernel density estimates are calculated for coffee shops with high user ratings, coffee shops with low user ratings, and all coffee shops;
similarly, with the bandwidth and the minimum cell size set to 800 meters and 20 meters, planar kernel density estimates are calculated for each of the three road network centrality indices (betweenness, closeness and straightness centrality) within the Fourth Ring Road of Beijing;
on the same grid cells, the correlation coefficients between each group of coffee shops (high user rating, low user rating, and all) and the three road network centrality indices are calculated.
The correlation between road traffic and the spatial distribution of all coffee shops, of coffee shops with higher user ratings and of coffee shops with lower user ratings is then analyzed, which further explains how reasonable the spatial distribution of commercial facilities with different levels of user satisfaction is.
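The kernel density and correlation steps can be sketched as follows. This is a minimal illustration, not the embodiment's GIS workflow: it assumes a quartic kernel (the text does not name a kernel), evaluates densities at the centers of a single row of 20 m cells, weights road nodes by a hypothetical centrality value, and then correlates the two density surfaces cell by cell.

```python
import math

def kernel_density(points, cells, bandwidth, weights=None):
    """Quartic-kernel density (assumed kernel) at each cell center."""
    weights = weights or [1.0] * len(points)
    norm = 3.0 / (math.pi * bandwidth ** 2)
    density = []
    for cx, cy in cells:
        total = 0.0
        for (px, py), w in zip(points, weights):
            u = math.hypot(px - cx, py - cy) / bandwidth
            if u < 1.0:
                total += w * norm * (1.0 - u * u) ** 2
        density.append(total)
    return density

def pearson_r(x, y):
    """Pearson correlation between two rasters flattened to lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

BANDWIDTH, CELL = 800.0, 20          # meters, as in the embodiment
cells = [(float(x), 0.0) for x in range(0, 2001, CELL)]
shops = [(0.0, 0.0), (50.0, 0.0), (1000.0, 0.0)]          # hypothetical shop locations
nodes = [(0.0, 0.0), (1000.0, 0.0), (2000.0, 0.0)]        # hypothetical road nodes
node_centrality = [0.9, 0.5, 0.1]                         # hypothetical index values

shop_kde = kernel_density(shops, cells, BANDWIDTH)
centrality_kde = kernel_density(nodes, cells, BANDWIDTH, node_centrality)
r = pearson_r(shop_kde, centrality_kde)
```

Converting both the point facilities and the node-based centrality values to densities on the same cells is what makes a cell-by-cell correlation possible.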
Fifth, third detection: after the first and second detections are completed, starting from three factors that may influence the distribution of urban commercial facilities, namely population distribution, traffic facilities and road network centrality, and in combination with the road network structure, detect how the influence of each factor on the number of shops, the number of user comments and the user rating varies with spatial position. This is implemented as follows:
1) The research area is divided into grid cells. In specific implementation, a cell size of 400 m × 400 m is chosen with reference to the grid sizes commonly used in urban planning analysis.
2) For each grid cell, the number of coffee shops, the number of check-ins at commercial facilities by Sina Weibo users, the distance to the subway station and the road network centrality value are extracted. The specific steps are:
taking each grid cell as the unit, counting the number of coffee shops in the cell, and standardizing;
taking each grid cell as the unit, counting the number of check-ins at commercial facilities by Dianping users in the cell, which is used to represent the population of the cell, and standardizing;
taking each grid cell as the unit, calculating the network distance from the centroid of the cell to the subway stations using the Dijkstra algorithm, which is used as the distance from the cell to the subway station, and standardizing;
taking each grid cell as the unit, calculating the mean road network centrality of the road network nodes contained in the cell, which is used as the road network centrality value of the cell, and standardizing.
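The per-cell extraction can be sketched as follows. This is a minimal illustration under several assumptions not stated in the text: projected coordinates in meters, min-max scaling as the standardization, each cell centroid already snapped to a road node, and the distance to the nearest station used as the cell's subway distance. All names are hypothetical.

```python
import heapq
from collections import Counter

CELL = 400.0  # grid size in meters, as in the embodiment

def cell_of(x, y, cell=CELL):
    """Index of the grid cell containing a projected coordinate."""
    return (int(x // cell), int(y // cell))

def count_per_cell(points, cell=CELL):
    """Number of points (for example coffee shops or check-ins) in each cell."""
    return Counter(cell_of(x, y, cell) for x, y in points)

def min_max(values):
    """Scale a {cell: value} mapping to [0, 1] (assumed standardization)."""
    lo, hi = min(values.values()), max(values.values())
    span = (hi - lo) or 1.0
    return {key: (value - lo) / span for key, value in values.items()}

def distance_to_nearest(graph, source, targets):
    """Dijkstra network distance from source to the closest node in targets."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        if u in targets:
            return d
        done.add(u)
        for v, w in graph[u].items():
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

# Hypothetical data: three shops and a tiny road graph with two stations.
shop_counts = count_per_cell([(10.0, 10.0), (399.0, 399.0), (401.0, 10.0)])
scaled_counts = min_max(shop_counts)
roads = {"centroid": {"a": 300.0}, "a": {"centroid": 300.0, "s1": 500.0, "s2": 900.0},
         "s1": {"a": 500.0}, "s2": {"a": 900.0}}
subway_distance = distance_to_nearest(roads, "centroid", {"s1", "s2"})
```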
3) The collinearity among the variables is checked. In specific implementation, the population, distance to the subway station and road network centrality obtained in step seven are taken as variables, and the collinearity among them is checked by principal component analysis and correlation coefficient testing. Both are prior art and are not described in detail here.
4) A geographically weighted regression model is constructed. In specific implementation, the three factors obtained in step seven (population, distance to the subway station and road network centrality) are taken as the model independent variables, and the number of coffee shops, the coffee shop user rating and the number of coffee shop user comments are taken in turn as the model dependent variable. A GWR model and an OLS model are then fitted to obtain model evaluation indices, chiefly R² and AICc. The GWR and OLS models are prior art and are not described in detail here.
5) The range of the three coefficients in the model is calculated. In specific implementation, the GWR model of step nine is used to calculate the minimum, maximum and mean of the coefficients of population, distance to the subway station and road network centrality.
In a specific implementation, a person skilled in the art can implement the above process with computer software and can adjust it flexibly as needed. In general, the process may include the following basic steps:
step 1, collecting coffee shop information and coffee shop comment data, mainly by web crawler; collecting subway station and business center data, mainly through the Baidu open platform API;
step 2, preprocessing the comment data: for each coffee shop, calculating the number of user comments and the average score; classifying the coffee shops according to score; counting the number of coffee shops at each satisfaction level;
step 3, calculating the observed network K function and the expected network K function for all coffee shops in the research range, for coffee shops with a user score greater than 4, and for coffee shops with a user score less than 3, respectively. The specific steps are as follows:
for each coffee shop, drawing a circle centered on the coffee shop with the network distance t as its radius;
calculating the number of other coffee shops within that circle;
averaging these counts over all coffee shops for the same radius, and dividing the mean by the density of coffee shops in the research area to obtain the observed network K function;
testing the distribution pattern with the Monte Carlo method: randomly simulating a CSR (complete spatial randomness) point pattern 99 times within the Third Ring Road network area of Beijing, and then calculating the upper bound and the lower bound of the expected K function value from the simulated patterns;
comparing the network K function results for all coffee shops, coffee shops with a user score greater than 4 and coffee shops with a user score less than 3, and analyzing the reasons for the differences among the three results;
step 4, calculating the observed bivariate network K function and the expected bivariate network K function between coffee shops with a user score greater than 4 or less than 3 and the subway stations and business centers in the research range, respectively. The specific steps are as follows:
for each subway station, drawing a circle centered on the subway station with the network distance t as its radius;
calculating the number of coffee shops within that circle;
averaging, for the same radius t, the counts of coffee shops with scores greater than 4 (or less than 3), and dividing the mean by the density of coffee shops with scores greater than 4 (or less than 3) in the research area to obtain the observed network K function;
testing the distribution pattern with the Monte Carlo method: randomly simulating a CSR point pattern 99 times within the Third Ring Road network area of Beijing, and then calculating the upper bound and the lower bound of the expected K function value from the simulated patterns;
traversing all subway stations in turn to obtain, for each network distance, the observed cross K function value and the upper and lower bounds of the expected cross K function;
comparing the network K function results of coffee shops with a user score greater than 4 and of coffee shops with a user score less than 3 with respect to the subway stations and business centers, so as to analyze the spatial aggregation patterns between differently rated coffee shops and subway stations and the reasons for those patterns;
step 5, calculating the betweenness centrality, closeness centrality and straightness centrality index values of the road network, the road network calculation range being extended to the Fourth Ring Road of Beijing to ensure the accuracy of the three centrality indexes;
step 6, setting the bandwidth to 800 meters and the minimum cell size to 20 meters, and calculating planar kernel density estimates for coffee shops with high user scores, coffee shops with low user scores and all coffee shops;
step 7, with the same bandwidth of 800 meters and cell size of 20 meters, calculating planar kernel density estimates of the three road network centrality indexes (betweenness centrality, closeness centrality and straightness centrality) within the Fourth Ring Road area of Beijing;
step 8, calculating, on the same grid cells, the correlation coefficients between coffee shops with high user scores, coffee shops with low user scores and all coffee shops and the three road network centralities, and analyzing the results;
step 9, dividing the research area into grids of 400 m by 400 m;
step 10, extracting, for each grid, the number of coffee shop facilities, the number of Sina Weibo user check-ins at commercial facilities, the distance from the subway station and the road network centrality value. The specific steps are as follows:
taking each grid as the unit, counting the coffee shop facilities in the grid, and standardizing;
taking each grid as the unit, counting the Sina Weibo user check-ins at commercial facilities in the grid as a proxy for the population of the grid area, and standardizing;
taking each grid as the unit, calculating with the Dijkstra algorithm the network distance from the grid centroid to the different subway stations as the distance from the grid to the subway station, and standardizing;
taking each grid as the unit, calculating the mean road network centrality of the road network nodes contained in the grid as the road network centrality value of the grid, and standardizing;
step 11, taking the population, the distance from the subway station and the road network centrality calculated in step 10 as variables, and testing the collinearity among them with principal component analysis and a correlation coefficient test;
step 12, taking the three factors of population, distance from the subway station and road network centrality calculated in step 10 as model independent variables and the number of coffee shop facilities, the coffee shop user score and the number of coffee shop user comments respectively as model dependent variables, fitting a GWR model and an OLS model to obtain model evaluation indexes, the main ones being R² and AICc, and comparing the results of the GWR model and the OLS model;
step 13, using the GWR model of step 12 to obtain the minimum value, the maximum value and the mean value of the coefficients of the three factors of population, distance from the subway station and road network centrality, and analyzing how the degree of influence of the three coefficients on the number of coffee shop facilities, the coffee shop user score and the number of coffee shop user comments differs across space.
Further, after step 3 is executed, the univariate network K function results are obtained for all coffee shops, coffee shops with a user score greater than 4 and coffee shops with a user score less than 3, and the differences among their spatial aggregation patterns are then analyzed.
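The univariate network K function of step 3 and its Monte Carlo test can be sketched as follows. This is a minimal illustration under stated assumptions: shops are snapped to road network nodes, density is taken as the number of shops per unit of network length, and the CSR pattern is simulated by drawing nodes at random instead of points along edges. The function names and the toy ring network are hypothetical.

```python
import heapq
import random


def dijkstra(graph, source):
    """Shortest network distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def network_length(graph):
    """Total length of an undirected network (each edge is stored twice)."""
    return sum(w for nbrs in graph.values() for w in nbrs.values()) / 2.0


def network_k(graph, shops, radii):
    """Observed K: mean count of other shops within t, divided by density."""
    density = len(shops) / network_length(graph)  # assumed: shops per metre
    dists = {s: dijkstra(graph, s) for s in set(shops)}
    result = []
    for t in radii:
        total = 0
        for i, s in enumerate(shops):
            total += sum(
                1
                for j, other in enumerate(shops)
                if j != i and dists[s].get(other, float("inf")) <= t
            )
        result.append(total / len(shops) / density)
    return result


def csr_envelope(graph, n_points, radii, n_sim=99, seed=0):
    """Lower and upper K bounds from n_sim random patterns (nodes drawn at random)."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    sims = [
        network_k(graph, [rng.choice(nodes) for _ in range(n_points)], radii)
        for _ in range(n_sim)
    ]
    lower = [min(s[k] for s in sims) for k in range(len(radii))]
    upper = [max(s[k] for s in sims) for k in range(len(radii))]
    return lower, upper


# Toy ring road: 12 nodes joined by 100 m edges (hypothetical data).
ring = {i: {} for i in range(12)}
for i in range(12):
    j = (i + 1) % 12
    ring[i][j] = ring[j][i] = 100.0

shops = [0, 1, 2, 3]  # four shops clustered on adjacent nodes
radii = [100.0, 200.0, 300.0]
observed = network_k(ring, shops, radii)
lower, upper = csr_envelope(ring, len(shops), radii)
for t, o, lo, hi in zip(radii, observed, lower, upper):
    print(f"t={t:.0f} m  observed={o:.1f}  envelope=[{lo:.1f}, {hi:.1f}]")
```

An observed value above the upper bound at a given distance would indicate clustering at that scale, and a value below the lower bound would indicate dispersion.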
Further, after step 4 is executed, the bivariate network K function results between coffee shops with a user score greater than 4 or less than 3 and the subway stations and business centers are obtained, and the differences in the spatial aggregation patterns between these coffee shops and the subway stations and business centers are then analyzed.
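The bivariate (cross) network K function of step 4 differs from the univariate case in that counts are taken around subway stations and not around the shops themselves. A minimal sketch follows, assuming stations and shops are snapped to network nodes and density is the number of shops per unit of network length; the function names and the toy path network are hypothetical.

```python
import heapq


def network_distances(graph, source):
    """Dijkstra shortest network distances from source."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def cross_k(graph, stations, shops, radii):
    """Mean number of shops within t of a station, divided by shop density."""
    length = sum(w for nbrs in graph.values() for w in nbrs.values()) / 2.0
    density = len(shops) / length  # assumed: shops per metre of network
    dist_from = {s: network_distances(graph, s) for s in stations}
    values = []
    for t in radii:
        counts = [
            sum(1 for shop in shops if dist_from[s].get(shop, float("inf")) <= t)
            for s in stations
        ]
        values.append(sum(counts) / len(stations) / density)
    return values


# Toy road: nodes 0..5 in a line, 100 m apart (hypothetical data).
path = {i: {} for i in range(6)}
for i in range(5):
    path[i][i + 1] = path[i + 1][i] = 100.0

stations = [0]
high_rated = [1, 2, 5]  # shops with a user score greater than 4
low_rated = [4, 5]      # shops with a user score less than 3
radii = [200.0, 500.0]
print("high:", cross_k(path, stations, high_rated, radii))
print("low: ", cross_k(path, stations, low_rated, radii))
```

Running the two groups through the same function makes their curves directly comparable at each network distance.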
Further, after step 8 is executed, the correlation between coffee shops with high user scores, coffee shops with low user scores and all coffee shops and the three road network centralities is analyzed, which also provides a basis for the subsequent modeling.
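Two of the three road network centrality indexes can be sketched compactly. The formulas below are the commonly used definitions and are an assumption here, since the text does not spell them out: closeness as the number of other nodes divided by the sum of network distances to them, and straightness as the mean ratio of straight-line distance to network distance. Betweenness is omitted for brevity. The function names and the toy square network are hypothetical, and the network is assumed to be connected.

```python
import math


def all_pairs(nodes, edges):
    """Floyd-Warshall shortest network distances; edges are (u, v, length)."""
    d = {u: {v: (0.0 if u == v else math.inf) for v in nodes} for u in nodes}
    for u, v, w in edges:
        d[u][v] = d[v][u] = min(d[u][v], w)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def closeness(d, i):
    """(N - 1) divided by the total network distance from i to all other nodes."""
    others = [d[i][j] for j in d if j != i]
    return len(others) / sum(others)


def straightness(d, xy, i):
    """Mean ratio of Euclidean distance to network distance from i."""
    ratios = [math.dist(xy[i], xy[j]) / d[i][j] for j in d if j != i]
    return sum(ratios) / len(ratios)


# Toy square block with 100 m sides (hypothetical coordinates in metres).
xy = {"A": (0.0, 0.0), "B": (100.0, 0.0), "C": (100.0, 100.0), "D": (0.0, 100.0)}
edges = [("A", "B", 100.0), ("B", "C", 100.0), ("C", "D", 100.0), ("D", "A", 100.0)]
dist = all_pairs(list(xy), edges)

for node in xy:
    print(node, round(closeness(dist, node), 5), round(straightness(dist, xy, node), 4))
```

Floyd-Warshall is used only because the toy network is tiny; for a city-scale road network a per-node Dijkstra search would be the more practical choice.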
In a specific implementation, a corresponding system can be provided in modular form. A system for detecting the spatial distribution of urban commercial facilities using comment data comprises the following modules:
the initialization module is used for acquiring data, selecting a research area and performing necessary preprocessing on the data;
the commercial facility classification module is used for classifying commercial facilities according to user scores;
an initial detection module for detecting, under the constraint of the road network, the differences in the spatial distribution patterns of commercial facilities with different user scores, wherein the spatial aggregation pattern of all coffee shops is detected as follows,
under the constraint of the road network, for each coffee shop, drawing a circle centered on the coffee shop with the network distance t as its radius;
calculating the number of other coffee shops within that circle;
averaging these counts over all coffee shops for the same radius, and dividing the mean by the density of coffee shops in the research area to obtain the observed network K function;
testing the distribution pattern with the Monte Carlo method: randomly simulating a CSR point pattern 99 times within the Third Ring Road network area of Beijing, and then calculating the upper bound and the lower bound of the expected K function value from the simulated patterns;
calculating in the same way the observed network K function and the expected network K function of all coffee shops, coffee shops with a user score greater than 4 and coffee shops with a user score less than 3, comparing the three network K function results, and analyzing the reasons for the differences;
a first detection module for detecting the differences in the spatial distribution patterns between commercial facilities with different user scores and the subway stations and commercial centers, wherein the spatial aggregation pattern between coffee shops and subway stations is detected as follows,
for each subway station, drawing a circle centered on the subway station with the network distance t as its radius;
calculating the number of coffee shops within that circle;
averaging, for the same radius t, the counts of coffee shops with scores greater than 4 (or less than 3), and dividing the mean by the density of coffee shops with scores greater than 4 (or less than 3) in the research area to obtain the observed network K function;
testing the distribution pattern with the Monte Carlo method: randomly simulating a CSR point pattern 99 times within the Third Ring Road network area of Beijing, and then calculating the upper bound and the lower bound of the expected K function value from the simulated patterns;
traversing all subway stations in turn to obtain, for each network distance, the observed cross K function value and the upper and lower bounds of the expected cross K function;
calculating in the same way the observed network K function and the expected network K function between coffee shops with a user score greater than 4 or less than 3 and the subway stations, and further analyzing the spatial aggregation patterns between differently rated coffee shops and subway stations and the reasons for those patterns.
A second detection module for exploring the relation between the spatial distribution of commercial facilities with different user scores and road network centrality by calculating the correlation coefficients between the two, implemented as follows,
calculating the betweenness centrality, closeness centrality and straightness centrality index values of each node of the road network;
calculating planar kernel density estimates for coffee shops with high user scores, coffee shops with low user scores and all coffee shops;
calculating planar kernel density estimates of the three road network centrality indexes (betweenness centrality, closeness centrality and straightness centrality) within the Fourth Ring Road area of Beijing;
calculating, on the same 20 m grid cells, the correlation coefficients between coffee shops with high user scores, coffee shops with low user scores and all coffee shops and the three road network centralities. The correlation between road traffic and the spatial distribution of coffee shops with high user scores, coffee shops with low user scores and all coffee shops is analyzed, which further explains the rationality of the spatial distribution of commercial facilities with different levels of user satisfaction.
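The planar kernel density estimation used by the second detection module can be sketched as below. The 800 m bandwidth and 20 m cell size come from the text; the quartic kernel, the function name and the sample coordinates are assumptions, since the text does not name a kernel.

```python
import math


def quartic_kde(points, xmin, ymin, ncols, nrows, cell=20.0, bandwidth=800.0):
    """Planar kernel density at cell centres using a quartic (biweight) kernel.

    Each point contributes 3 / (pi * h^2) * (1 - d^2 / h^2)^2 within distance h,
    so that one point integrates to 1 over the plane.
    """
    norm = 3.0 / (math.pi * bandwidth ** 2)
    raster = []
    for r in range(nrows):
        cy = ymin + (r + 0.5) * cell
        row = []
        for c in range(ncols):
            cx = xmin + (c + 0.5) * cell
            total = 0.0
            for px, py in points:
                u = ((cx - px) ** 2 + (cy - py) ** 2) / bandwidth ** 2
                if u < 1.0:
                    total += norm * (1.0 - u) ** 2
            row.append(total)
        raster.append(row)
    return raster


# Hypothetical shop coordinates in metres inside a 2 km by 2 km extent.
shops = [(1000.0, 1000.0), (1100.0, 1050.0), (400.0, 1600.0)]
raster = quartic_kde(shops, xmin=0.0, ymin=0.0, ncols=100, nrows=100)
peak = max(max(row) for row in raster)
print(f"peak density: {peak:.3e} shops per square metre")
```

Applying the same function to road network nodes weighted by a centrality index would yield a raster on the same cells, so that the two surfaces can be compared cell by cell with a correlation coefficient.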
A third detection module for detecting, after the first and second detection modules have finished and in combination with the road network structure, how the influence of each of three factors that may affect the distribution of urban commercial facilities, namely population distribution, transport facilities and road network centrality, on the number of shops, the number of user comments and the user score varies with spatial position.
The implementation is as follows,
dividing the research area into grids of 400 m by 400 m;
extracting, for each grid, the number of coffee shop facilities, the number of Sina Weibo user check-ins at commercial facilities, the distance from the subway station and the road network centrality value;
taking the calculated population, distance from the subway station and road network centrality as variables, and testing the collinearity among them with principal component analysis and a correlation coefficient test;
taking the three calculated factors of population, distance from the subway station and road network centrality as model independent variables and the number of coffee shop facilities, the coffee shop user score and the number of coffee shop user comments as model dependent variables, fitting a GWR model and an OLS model to obtain model evaluation indexes, the main ones being R² and AICc, and comparing the results of the GWR model and the OLS model.
Finally, the GWR model is used to obtain the minimum value, the maximum value and the mean value of the coefficients of the three factors of population, distance from the subway station and road network centrality.
The coefficients of the three factors of population, distance from the subway station and road network centrality on the number of coffee shop facilities, the coffee shop user score and the number of coffee shop user comments are visualized in space; the spatial inconsistency of the influence of the three factors on these quantities is analyzed and their differing degrees of influence are identified, and the rationality of the spatial distribution of coffee shops is further analyzed.
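The GWR fitting and the coefficient summary can be illustrated with a minimal sketch. It assumes a fixed Gaussian distance-decay kernel with a hand-picked bandwidth and solves one weighted least-squares problem per grid location; bandwidth selection, R² and AICc are omitted. The function names and the synthetic data are hypothetical and are not taken from the patent.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def weighted_ls(X, y, w):
    """Coefficients minimising sum_i w_i (y_i - x_i . beta)^2; rows of X start with 1."""
    k = len(X[0])
    xtwx = [[sum(wi * xi[p] * xi[q] for xi, wi in zip(X, w)) for q in range(k)]
            for p in range(k)]
    xtwy = [sum(wi * xi[p] * yi for xi, yi, wi in zip(X, y, w)) for p in range(k)]
    return solve(xtwx, xtwy)


def gwr(coords, X, y, bandwidth):
    """One local regression per location, weighted by a Gaussian kernel (assumed)."""
    betas = []
    for cx, cy in coords:
        w = [math.exp(-0.5 * (math.hypot(cx - px, cy - py) / bandwidth) ** 2)
             for px, py in coords]
        betas.append(weighted_ls(X, y, w))
    return betas


def summarize(betas, names):
    """Minimum, maximum and mean of each local coefficient."""
    out = {}
    for k, name in enumerate(names):
        col = [b[k] for b in betas]
        out[name] = (min(col), max(col), sum(col) / len(col))
    return out


# Synthetic 4 x 4 layout of 400 m grids with standardized predictors (hypothetical).
rng = random.Random(0)
coords = [(400.0 * i + 200.0, 400.0 * j + 200.0) for i in range(4) for j in range(4)]
X = [[1.0, rng.random(), rng.random(), rng.random()] for _ in coords]
# Response whose population effect grows eastward, so local coefficients vary.
y = [0.5 + (1.0 + cx / 1600.0) * x[1] - 0.8 * x[2] + 0.3 * x[3]
     for (cx, _), x in zip(coords, X)]

names = ["intercept", "population", "subway_distance", "centrality"]
print("OLS:", [round(b, 3) for b in weighted_ls(X, y, [1.0] * len(y))])
for name, (lo, hi, mean) in summarize(gwr(coords, X, y, bandwidth=800.0), names).items():
    print(f"{name}: min={lo:.3f} max={hi:.3f} mean={mean:.3f}")
```

Setting every weight to 1 reduces the local fit to the global OLS fit, which is why the same `weighted_ls` helper serves both models in this sketch.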
Further, after the initial detection module finishes working, the differences in the spatial aggregation patterns of commercial facilities with different levels of user satisfaction are revealed through the univariate network K functions of all coffee shops, coffee shops with high user scores and coffee shops with low user scores.
Further, after the first detection module finishes working, the differences in the spatial aggregation patterns between commercial facilities with different levels of user satisfaction and the subway stations and commercial centers are revealed through the bivariate network K functions between coffee shops with high or low user scores and the subway stations and commercial centers.
Further, after the second detection module finishes working, correlation analysis is carried out between differently rated coffee shops and road network centrality, revealing how strongly the spatial distribution of commercial facilities with different levels of user satisfaction depends on road network centrality.
Further, after the third detection module finishes working, the GWR model results for the three factors influencing commercial facility distribution, namely population, subway stations and road network centrality, are analyzed, revealing how the influence of each factor on the number of shops, the number of user comments and the user score varies with spatial position.
In a specific implementation, a human-computer interaction interface can be provided so that a user can conveniently take part in the analysis and adjustment.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and they are included in the scope of the present invention.

Claims (4)

1. A method for detecting the spatial distribution of urban commercial facilities based on comment data, comprising the following steps,
step one, initialization, comprising acquiring data, selecting a research area and performing necessary preprocessing on the data, thereby selecting the research area and the research objects and acquiring and preprocessing comment data carrying position information and comment information, and classifying commercial facilities according to user scores;
step two, initial detection, comprising detecting, through the commercial facility network K function, the differences in the spatial distribution patterns of commercial facilities with different user scores under the constraint of the road network;
step three, first detection, comprising detecting, through the network K function between commercial facilities and other facilities, the differences in the spatial distribution patterns between commercial facilities with different user scores and the subway stations and commercial centers;
step four, second detection, comprising exploring the relation between the spatial distribution of commercial facilities with different user scores and road network centrality, and calculating the correlation coefficients between the distribution of commercial facilities with different user scores and road network centrality, implemented as follows,
calculating the betweenness centrality, closeness centrality and straightness centrality index values of each node of the road network;
calculating planar kernel density estimates for commercial facilities with high user scores, commercial facilities with low user scores and all commercial facilities; calculating planar kernel density estimates of the three road network centrality indexes, namely betweenness centrality, closeness centrality and straightness centrality, in the research area;
calculating, on the same grid cells, the correlation coefficients between commercial facilities with high user scores, commercial facilities with low user scores and all commercial facilities and the three road network centralities;
step five, third detection, comprising detecting, in combination with the road network structure, how the influence of each of three factors affecting the distribution of urban commercial facilities, namely population distribution, transport facilities and road network centrality, on the number of shops, the number of user comments and the user score varies with spatial position.
2. The method for detecting the spatial distribution of urban commercial facilities based on comment data according to claim 1, wherein step five is implemented as follows,
dividing the research area into grids;
extracting, for each grid, the number of commercial facilities, the number of check-ins at commercial facilities by commenting users, the distance from the subway station and the road network centrality value;
taking the calculated population, distance from the subway station and road network centrality as variables, and testing the collinearity among them with principal component analysis and a correlation coefficient test;
taking the three calculated factors of population, distance from the subway station and road network centrality as model independent variables and the number of commercial facilities, the commercial facility user score and the number of commercial facility user comments as model dependent variables, fitting a GWR model and an OLS model to obtain model evaluation indexes, and comparing the results of the GWR model and the OLS model;
finally, using the GWR model to obtain the minimum value, the maximum value and the mean value of the coefficients of the three factors of population, distance from the subway station and road network centrality;
visualizing in space the coefficients of the three factors of population, distance from the subway station and road network centrality on the number of commercial facilities, the commercial facility user score and the number of commercial facility user comments, analyzing the spatial inconsistency of the influence of the three factors on these quantities, and identifying their differing degrees of influence.
3. A system for detecting the spatial distribution of urban commercial facilities based on comment data, characterized by comprising the following modules,
an initialization module for acquiring data, selecting a research area, performing necessary preprocessing on the data, and classifying commercial facilities according to user scores;
an initial detection module for detecting, through the commercial facility network K function, the differences in the spatial distribution patterns of commercial facilities with different user scores under the constraint of the road network;
a first detection module for detecting, through the network K function between commercial facilities and other facilities, the differences in the spatial distribution patterns between commercial facilities with different user scores and the subway stations and commercial centers;
a second detection module for exploring the relation between the spatial distribution of commercial facilities with different user scores and road network centrality, and calculating the correlation coefficients between the distribution of commercial facilities with different user scores and road network centrality, implemented as follows,
calculating the betweenness centrality, closeness centrality and straightness centrality index values of each node of the road network;
calculating planar kernel density estimates for commercial facilities with high user scores, commercial facilities with low user scores and all commercial facilities; calculating planar kernel density estimates of the three road network centrality indexes, namely betweenness centrality, closeness centrality and straightness centrality, in the research area;
calculating, on the same grid cells, the correlation coefficients between commercial facilities with high user scores, commercial facilities with low user scores and all commercial facilities and the three road network centralities;
a third detection module for detecting, in combination with the road network structure, how the influence of each of three factors affecting the distribution of urban commercial facilities, namely population distribution, transport facilities and road network centrality, on the number of shops, the number of user comments and the user score varies with spatial position.
4. The system for detecting the spatial distribution of urban commercial facilities based on comment data according to claim 3, wherein the third detection module is implemented as follows,
dividing the research area into grids;
extracting, for each grid, the number of commercial facilities, the number of check-ins at commercial facilities by commenting users, the distance from the subway station and the road network centrality value;
taking the calculated population, distance from the subway station and road network centrality as variables, and testing the collinearity among them with principal component analysis and a correlation coefficient test;
taking the three calculated factors of population, distance from the subway station and road network centrality as model independent variables and the number of commercial facilities, the commercial facility user score and the number of commercial facility user comments as model dependent variables, fitting a GWR model and an OLS model to obtain model evaluation indexes, and comparing the results of the GWR model and the OLS model;
finally, using the GWR model to obtain the minimum value, the maximum value and the mean value of the coefficients of the three factors of population, distance from the subway station and road network centrality;
visualizing in space the coefficients of the three factors of population, distance from the subway station and road network centrality on the number of commercial facilities, the commercial facility user score and the number of commercial facility user comments, analyzing the spatial inconsistency of the influence of the three factors on these quantities, and identifying their differing degrees of influence.
CN201711425589.XA 2017-12-25 2017-12-25 Urban commercial facility spatial distribution detection method and system based on comment data Expired - Fee Related CN108038734B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711425589.XA CN108038734B (en) 2017-12-25 2017-12-25 Urban commercial facility spatial distribution detection method and system based on comment data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711425589.XA CN108038734B (en) 2017-12-25 2017-12-25 Urban commercial facility spatial distribution detection method and system based on comment data

Publications (2)

Publication Number Publication Date
CN108038734A CN108038734A (en) 2018-05-15
CN108038734B true CN108038734B (en) 2021-07-20

Family

ID=62101183

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711425589.XA Expired - Fee Related CN108038734B (en) 2017-12-25 2017-12-25 Urban commercial facility spatial distribution detection method and system based on comment data

Country Status (1)

Country Link
CN (1) CN108038734B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115569A (en) * 2020-09-15 2020-12-22 中国科学院城市环境研究所 Urban road network density graph generation method, medium and equipment
CN112541683A (en) * 2020-12-17 2021-03-23 广东晟腾地信科技有限公司 Satisfaction evaluation method, system, electronic device and storage medium
CN113473483A (en) * 2021-06-29 2021-10-01 航天海鹰机电技术研究院有限公司 Positioning method and system for full users
CN113822048B (en) * 2021-09-16 2023-03-21 电子科技大学 Social media text denoising method based on space-time burst characteristics

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004104762A2 (en) * 2003-05-16 2004-12-02 Booz Allen Hamilton, Inc. Apparatus, method and computer readable medium for evaluating a network of entities and assets
CN102289581A (en) * 2011-08-10 2011-12-21 武汉大学 Method for simulating city expansion based on space function division

Also Published As

Publication number Publication date
CN108038734A (en) 2018-05-15

Similar Documents

Publication Publication Date Title
Yu et al. The analysis and delimitation of Central Business District using network kernel density estimation
CN108038734B (en) Urban commercial facility spatial distribution detection method and system based on comment data
CN103795613B (en) Method for predicting friend relationships in online social network
Geertman et al. Introduction to ‘planning support systems and smart cities’
CN105183870B (en) A kind of urban function region detection method and system using microblogging location information
Liu et al. Relationships between street centrality and land use intensity in Wuhan, China
Dong et al. Understanding the mesoscopic scaling patterns within cities
Rui et al. Network-constrained and category-based point pattern analysis for Suguo retail stores in Nanjing, China
CN104182517A (en) Data processing method and data processing device
CN111260392B (en) Automatic vending machine site selection method and device based on multi-source big data
CN105678590A (en) topN recommendation method for social network based on cloud model
CN108256724B (en) Power distribution network open capacity planning method based on dynamic industry coefficient
Deng Towards objective benchmarking of electronic government: an inter‐country analysis
CN106844626B (en) Method and system for simulating air quality by using microblog keywords and position information
Zhang et al. Space–time visualization analysis of bus passenger big data in Beijing
Caset et al. Mapping the spatial conditions of polycentric urban development in Europe: An open‐source software tool
CN107121143B (en) Road selection method for collaborative POI data
Sauter et al. Exploratory study of urban resilience in the region of Stuttgart based on OpenStreetMap and literature resilience indicators
Martino et al. Ocean of information: fusing aggregate & individual dynamics for metropolitan analysis
Lagarias Exploring land use policy scenarios with the use of a cellular automata-based model: urban sprawl containment and sustainable development in Thessaloniki
Chen et al. On a method for location and mobility analytics using location-based services: a case study of retail store recommendation
Kurowska et al. The use of gravity model in spatial planning
Xia et al. Predicting human mobility using sina weibo check-in data
Nadinta et al. A clustering-based approach for reorganizing bus route on bus rapid transit system
Li et al. Estimating dynamic distribution condition of pedestrian concentration on an urban scale

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210720

Termination date: 20211225
