CN109857983B - Catering venue heat degree analysis method for super-large cities - Google Patents

Publication number
CN109857983B
Authority
CN
China
Prior art keywords
data
catering
regression model
factors
restaurant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811631005.9A
Other languages
Chinese (zh)
Other versions
CN109857983A (en
Inventor
王禹
魏涛
程浩
张劳模
陈素霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan Institute of Engineering
Original Assignee
Henan Institute of Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan Institute of Engineering filed Critical Henan Institute of Engineering
Priority to CN201811631005.9A priority Critical patent/CN109857983B/en
Publication of CN109857983A publication Critical patent/CN109857983A/en
Application granted granted Critical
Publication of CN109857983B publication Critical patent/CN109857983B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a catering venue popularity ("heat") analysis method for super-large cities, comprising the following steps: selecting a target super-large city; selecting a data source; performing a preliminary analysis of all sample data and selecting the main category factors from multiple category factors; performing data checking and data transformation on the raw data of the main category factors; establishing a linear regression model and optimizing it to obtain an optimized regression model; and restoring the optimized regression model through the inverse of the data transformation to obtain the quantitative relationship between catering venue popularity in the target super-large city and the various factors. The invention verifies that a restaurant's comprehensive experience and the group effect have the greatest influence on restaurant popularity. It provides a useful reference for site selection, operating models, and consumption upgrading of restaurants in super-large cities, as well as for urban regional development planning and optimization of the catering landscape, thereby promoting the healthy and sustainable development of the catering industry in super-large cities.

Description

Catering venue heat degree analysis method for super-large cities
Technical Field
The invention relates to the technical field of restaurant heat degree analysis, in particular to a catering venue heat degree analysis method for a super-large city.
Background
With rapid population migration and the concentration of economic advantages, the formation of super-large cities is an increasingly evident trend worldwide. As economies continue to grow and populations continue to migrate, the number of super-large cities is increasing, and their influence extends to every aspect of urban life. Because catering consumption is one of the most basic human consumption behaviors, factors such as population size, complex mobility, traffic conditions, and diverse spending power in a super-large city inevitably affect consumption patterns and urban development.
Researchers have carried out a series of studies on the catering industry in developing and developed countries and regions. Ip et al. analyzed recent information-technology developments relating to competition in the catering industry in Australia and China, and pointed out the importance of geospatial information, social data, and data mining. Mattera et al. focused on the development of small and medium-sized enterprises in Spain and studied which factors can support data mining and thereby help catering companies grow faster. Reviews are regarded as an important way for consumers to evaluate and judge restaurant quality; Zhai et al. used online review data, including review length, review time, and review readability, to study the influence of reviews on catering consumption. Traffic factors also appear to contribute to urban retail price trends, and in large cities convenient transport, such as subway and bus lines, is likely to create new commercial corridors for catering consumption.
Today, as people pay more and more attention to food culture, consumers are sometimes willing to wait for hours to taste a particular meal. What factors drive such eager pursuit? In other words, why do some restaurants become more and more popular? The prior art does not provide a specific analysis method.
Disclosure of Invention
To address the technical problem that the prior art does not disclose how to analyze the factors influencing restaurant popularity, the invention provides a catering venue popularity analysis method for super-large cities. The method obtains and computes restaurant consumption data samples using the Dianping review website and Baidu Maps, establishes a linear regression model based on statistical methods, and identifies the main factors that influence restaurant consumption popularity.
To achieve the above purpose, the technical solution of the invention is as follows: a catering venue popularity analysis method for super-large cities, comprising the following steps:
step one: selecting a target super-large city: a super-large city has a very large population, a high economic and cultural level, and well-developed supporting facilities for work and daily life;
step two: selecting a data source: preliminarily defining multiple categories of factors that potentially influence catering venue popularity in the target super-large city, selecting suitable data sources according to these categories, and acquiring the corresponding sample data;
step three: performing a preliminary analysis of all sample data, selecting the main category factors from the multiple category factors, selecting a variable that can represent restaurant popularity as the dependent variable, and taking the category factors that potentially influence popularity as independent variables;
step four: performing data checking and data transformation on the raw data of the main category factors from step three, so that the transformed data more closely follow a normal distribution;
step five: establishing a linear regression model using the data processed in step four, and optimizing the linear regression model to obtain an optimized regression model;
step six: restoring the optimized regression model through the inverse of the data transformation in step four to obtain the relationship between the independent variables and the dependent variable, that is, the quantitative relationship between catering venue popularity in the target super-large city and the various factors.
The sample data come from restaurants listed on a publicly accessible catering consumption website; the data for each restaurant to be analyzed are complete, and the restaurants are located in a representative area of the super-large city. For example, the samples are restaurants in the food section for the Beijing area on Dianping whose data are available for 2016 and which are located within 2 kilometers of Beijing's Second Ring Road.
In step two, the categories of factors that may affect the operation of catering venues in a super-large city are fully considered, including restaurant quality, dining environment, traffic conditions, surrounding consumption facilities, and/or the population group effect. Information on geographic location, public transport, and landmark buildings is derived from Baidu Maps. The sample data are acquired by web crawling, through public APIs (application programming interfaces), or from paid third-party sources.
The main category factors of the sample data comprise the total number of reviews α, the comprehensive star rating β, the per-capita consumption υ, the distance to the nearest subway station τ_s, the number of nearby bus lines τ_b, the number of surrounding large commercial centers π_c, and the number of large residential areas and universities π_r. Here τ_s and τ_b are category factors relating to the convenience of public transport; β is the category factor for the restaurant's comprehensive experience; and π_c, π_r, and υ are category factors relating to the population aggregation effect. The total number of reviews α is the dependent variable, and β, υ, τ_s, τ_b, π_c, and π_r are the independent variables.
The data check is performed with the Shapiro-Wilk normality test applied to the sample data: at the 95% confidence level, a test result (p-value) greater than 0.05 indicates that the sample does not differ significantly from a normal distribution. In this way the sample data are checked for conformity with a normal distribution.
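The normality check described above can be sketched as follows. This is a minimal illustration in Python with SciPy, not the inventors' code; the helper name `is_plausibly_normal` and the synthetic log-normal "review counts" are assumptions made for the example.

```python
import numpy as np
from scipy import stats

def is_plausibly_normal(sample, alpha=0.05):
    """Run the Shapiro-Wilk test and return (W statistic, p-value, verdict)."""
    w_stat, p_value = stats.shapiro(sample)
    # p > alpha: no significant departure from normality at the chosen level
    return w_stat, p_value, bool(p_value > alpha)

rng = np.random.default_rng(0)
# Hypothetical right-skewed review counts for 200 restaurants (synthetic data)
review_counts = rng.lognormal(mean=6.0, sigma=1.0, size=200)

w, p, ok = is_plausibly_normal(review_counts)
w_log, p_log, ok_log = is_plausibly_normal(np.log(review_counts))
print(f"raw:  W={w:.3f}, p={p:.3g}, normal={ok}")
print(f"log:  W={w_log:.3f}, p={p_log:.3g}, normal={ok_log}")
```

A strongly right-skewed sample of this kind is rejected by the test, which is the situation that motivates transforming the dependent variable before regression.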
The data transformation uses the Box-Cox transformation so that the transformed data more closely follow a normal distribution. The transformation rule of the Box-Cox transformation is:
$$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln y, & \lambda = 0 \end{cases}$$
where y is the original value before the Box-Cox transformation, y(λ) is the value after the transformation, and λ is the transformation parameter to be determined from the sample data, chosen so that the dependent variable y(λ) satisfies:
$$y^{(\lambda)} = X\beta_1 + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^{2} I_n)$$
where X is the matrix of independent variables, β_1 is the parameter vector to be estimated, ε is the random error (residual) vector, σ is the standard error of the errors, and I_n is the n × n identity matrix.
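As an illustration of the transformation rule, the sketch below implements the standard Box-Cox transform and its inverse with NumPy. The function names `box_cox` and `inverse_box_cox` and the sample values are hypothetical; the inverse is the operation later used to restore the regression model to the original scale.

```python
import numpy as np

def box_cox(y, lam):
    """Standard Box-Cox transform: (y**lam - 1) / lam, or ln(y) when lam == 0."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam

def inverse_box_cox(z, lam):
    """Inverse transform: (lam * z + 1) ** (1 / lam), or exp(z) when lam == 0."""
    z = np.asarray(z, dtype=float)
    if lam == 0:
        return np.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

# Hypothetical review counts, for illustration only
counts = np.array([12.0, 150.0, 980.0, 4300.0])
transformed = box_cox(counts, 0.2)
restored = inverse_box_cox(transformed, 0.2)
print(transformed)
print(restored)
```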
The linear regression model is a multiple linear regression model or a generalized linear regression model. The regression model obtained with the multiple linear regression model is:
$$\mathrm{ylam} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6$$
where ylam denotes restaurant popularity after the Box-Cox transformation, derived from the total number of reviews α in the sample data; x_1 is the comprehensive star rating that consumers give the restaurant, combining taste, environment, and service, derived from the comprehensive star rating β; x_2 is the restaurant's average spending per person, derived from the per-capita consumption υ; x_3 is the distance from the restaurant to the nearest subway station, derived from τ_s; x_4 is the number of bus lines within 1 km of the restaurant, derived from τ_b; x_5 is the number of large residential areas and universities within 1 km of the restaurant, derived from π_r; x_6 is the number of city-level commercial service centers, derived from π_c; and (β_0, β_1, β_2, β_3, β_4, β_5, β_6) are unknown parameters.
The method for optimizing the linear regression model in step five comprises: (1) judging whether each independent variable is significantly correlated with the dependent variable using the Spearman or Pearson method; (2) if collinearity exists among the independent variables, reducing it by ridge regression; (3) using stepwise regression to compute and analyze the values of the parameters in the initial linear regression model, judging how strongly each independent variable influences catering venue popularity, and eliminating the independent variables with weak influence; and (4) obtaining the final analysis model.
The beneficial effects of the invention are as follows. Catering venues within 2 km of Beijing's Second Ring Road are selected as the research target. Samples are first crawled and computed from Dianping and Baidu Maps, and a system of factors influencing restaurant popularity is established by analyzing the popularity characteristics of the restaurants; the factors include the comprehensive rating, the average price, the shortest distance to the nearest subway station, the number of city-level commercial service centers within 1 km, and others. A multiple regression model of restaurant popularity is established through significance testing and the Box-Cox transformation, and the model is optimized by stepwise regression. The analysis is carried out on 200 valid restaurant samples. The results show that: (1) diners attach great importance to the comprehensive experience a restaurant provides, including taste, environment, and service, which is the most central element of the catering industry; (2) the group effect is important to the catering industry, which also shows from another angle that restaurant location has a positive influence on popularity; (3) factors such as public transport and the level of dining expenditure in a super-large city have no direct influence on restaurant popularity. The invention verifies that, within the central area of a super-large city, a restaurant's comprehensive experience and the group effect have the greatest influence on its popularity, and proposes that the composition of catering consumption capacity and the improvement of operation and management capacity be fully considered in the overall planning of urban business districts.
The invention provides a useful reference for site selection, operating models, and consumption upgrading of restaurants in super-large cities, as well as for urban regional development planning and optimization of the catering landscape, thereby promoting the healthy and sustainable development of the catering industry in super-large cities.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a schematic diagram of a source region of sample data according to the present invention.
FIG. 3 is a scatter plot of the independent variables of the present invention.
FIG. 4 is a histogram of residual values of the regression model after optimization according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
As shown in FIG. 1, a catering venue popularity analysis method for super-large cities comprises the following steps:
Step one: selecting a target super-large city: a super-large city has a large population, a high economic and cultural level, and complete supporting facilities for work and daily life.
To analyze catering consumption behavior in a super-large city, and in particular to identify the factors behind restaurant popularity, all relevant data need to be defined and described. China currently has four de facto super-large cities, namely Beijing, Shanghai, Guangzhou, and Shenzhen, among which Beijing has an undisputed overall lead, whether politically, economically, or culturally. According to data from the Beijing Municipal Bureau of Statistics, as of 2017 the permanent resident population of Beijing was nearly 21.70 million, consumption reached 912 million yuan (counting only enterprises with annual turnover above 200 million), and the annual growth rate reached 9.6%. Given its large and highly varied catering consumption, Beijing is taken as an ideal research object.
Step two: selecting a data source: multiple types of factors potentially influencing the heat degree of the catering venue in the target super-large city are initially defined, a proper data source is selected according to the multiple types of factors, and sample data corresponding to the multiple types of factors is obtained.
The sample data are the data of restaurants in the food section for the Beijing area on Dianping whose data are available for 2016 and which are located within 2 kilometers of Beijing's Second Ring Road.
As the Internet becomes increasingly important in every area of society, consumers' catering consumption behavior is increasingly reflected online, and the related reviews have become a very valuable basis for data analysis. Accordingly, the currently mainstream and influential consumption website Dianping (http://www.dianping.com/) is taken as the research target for obtaining catering consumption data.
As in other mature super-large cities, the most active commercial activity and the most complete infrastructure in Beijing are concentrated in the central urban area, so the area within 2 km of Beijing's Second Ring Road is selected as the research area, as shown in FIG. 2, for collecting catering information. Given the diversity of cuisine types in China, the sample data cover as many cuisine types as possible. Consumption information for a total of 200 restaurants covering various catering types is obtained from the food section for the Beijing area on Dianping. Because some merchants have been online on the website for too short a time, only the 200 restaurants with data available from January 2016 through December 2016 are selected. In addition, information on geographic location, public transport, landmark buildings, and the like is computed from Baidu Maps.
Step three: all sample data are initially analyzed, main category factors are selected from the multiple category factors, variables capable of representing the heat degree of the restaurant are selected as dependent variables, and the multiple category factors potentially influencing the heat degree are used as independent variables.
In step two, the categories of factors that may affect the operation of catering venues in a super-large city are fully considered, including restaurant quality, dining environment, traffic conditions, surrounding consumption facilities, and/or the population group effect. Information on geographic location, public transport, and landmark buildings is derived from Baidu Maps. The sample data are acquired by web crawling, through public APIs, or from paid third-party sources.
The main category factors of the sample data comprise the total number of reviews α, the comprehensive star rating β, the per-capita consumption υ, the distance to the nearest subway station τ_s, the number of nearby bus lines τ_b, the number of surrounding large commercial centers π_c, and the number of large residential areas and universities π_r. Here τ_s and τ_b are category factors relating to the convenience of public transport; β is the category factor for the restaurant's comprehensive experience; and π_c, π_r, and υ are category factors relating to the population aggregation effect. As shown in Table 1, all sample data obtained by crawling and computation fall into 7 categories, all of which are directly related to the analysis of restaurant popularity. Note that the number of reviews α appropriately reflects the level of attention a restaurant receives. Undeniably, even a seemingly unremarkable café may attract numerous fans, regardless of its price level. Therefore, the total number of reviews α is the dependent variable, and β, υ, τ_s, τ_b, π_c, and π_r are the independent variables.
TABLE 1 consumption information Classification
Figure BDA0001929004550000051
Figure BDA0001929004550000061
Step four: performing data checking and data transformation on the raw data of the main category factors from step three, so that the transformed data more closely follow a normal distribution.
The data check is performed with the Shapiro-Wilk normality test applied to the sample data: at the 95% confidence level, a test result (p-value) greater than 0.05 indicates that the sample does not differ significantly from a normal distribution. The Shapiro-Wilk test shows that the dependent-variable data in the sample are right-skewed. To meet the normality requirement, the Box-Cox transformation is applied to the dependent-variable data so that the transformed data more closely follow a normal distribution and can serve as the basis for subsequent analysis.
The transformation rule of the Box-Cox transformation is:
$$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln y, & \lambda = 0 \end{cases}$$
where y is the original value of the dependent variable before the Box-Cox transformation, y(λ) is the value of the dependent variable after the transformation, and λ is the transformation parameter to be determined from the sample data, chosen so that the dependent variable y(λ) satisfies:
$$y^{(\lambda)} = X\beta_1 + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^{2} I_n)$$
where X is the matrix of independent variables, β_1 is the parameter vector to be estimated, ε is the random error (residual) vector, σ is the standard error of the errors, and I_n is the n × n identity matrix.
For the sample space containing 200 samples, the value of the transformation parameter λ is calculated by maximum likelihood estimation (MLE). As shown in Table 2, the preferred range of λ is [0.0855, 0.2597], and the optimal estimate is 0.17. For simplicity of calculation, the invention takes λ = 0.2.
TABLE 2 dependent variable Box-Cox transform coefficients
Figure BDA0001929004550000063
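The estimation of λ can be sketched with SciPy's `scipy.stats.boxcox`, which returns the maximum-likelihood estimate and, when `alpha` is given, a confidence interval. The data below are synthetic (log-normal), so the estimate will not match Table 2; treating the patent's "preferred range" as a 95% confidence interval is an assumption of this sketch.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical right-skewed review counts for 200 restaurants (synthetic data)
review_counts = rng.lognormal(mean=6.0, sigma=1.0, size=200)

# MLE of lambda plus a 95% confidence interval (alpha=0.05)
transformed, lam_hat, (ci_low, ci_high) = stats.boxcox(review_counts, alpha=0.05)
print(f"lambda_hat={lam_hat:.4f}, 95% CI=[{ci_low:.4f}, {ci_high:.4f}]")

# Re-apply the transform with a fixed, rounded lambda, as the text does with 0.2
transformed_fixed = stats.boxcox(review_counts, lmbda=0.2)
```

Choosing a rounded value inside the interval, as the text does, keeps the inverse transformation simple (a fifth power for λ = 0.2).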
Step five: establishing a linear regression model using the data processed in step four, and optimizing the linear regression model to obtain an optimized regression model.
The linear regression model is a multiple linear regression model or a generalized linear regression model. Trial calculation shows that the multiple linear regression model explains the influence of the various factors on catering consumption popularity well. The regression model obtained with the multiple linear regression model is:
$$\mathrm{ylam} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6$$
where ylam denotes restaurant popularity after the Box-Cox transformation, derived from the total number of reviews α in the sample data; x_1 is the comprehensive star rating that consumers give the restaurant, combining taste, environment, and service, derived from the comprehensive star rating β; x_2 is the restaurant's average spending per person, derived from the per-capita consumption υ; x_3 is the distance from the restaurant to the nearest subway station, derived from τ_s; x_4 is the number of bus lines within 1 km of the restaurant, derived from τ_b; x_5 is the number of large residential areas and universities within 1 km of the restaurant, derived from π_r; x_6 is the number of city-level commercial service centers, derived from π_c; and (β_0, β_1, β_2, β_3, β_4, β_5, β_6) are unknown parameters.
Linear fitting and analysis are then performed to obtain the coefficient of each independent variable. As shown in Table 3, the linear fit yields a coefficient of determination of 0.2171 with a p-value of 8.538e-10. Both values are obtained by calling a linear regression routine. The coefficient of determination measures how much of the variation in ylam is jointly explained by the independent variables x_1 to x_6 in the fitted regression equation; the p-value is the criterion for significance testing, with p < 0.05 generally regarded as significant and p < 0.01 as highly significant, so this fit is highly significant.
TABLE 3 initial Linear fitting coefficients
Figure BDA0001929004550000071
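A minimal sketch of such an initial fit is given below, using NumPy least squares and SciPy's F distribution instead of the R routine the text refers to. The helper `fit_ols`, the synthetic predictors, and the "true" coefficients are assumptions chosen only to make the example runnable; the numbers do not reproduce Table 3.

```python
import numpy as np
from scipy import stats

def fit_ols(X, y):
    """Ordinary least squares with intercept.

    Returns (coefficients incl. intercept, R^2, p-value of the overall F-test).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    p_value = float(stats.f.sf(f_stat, k, n - k - 1))
    return beta, r2, p_value

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 6))  # stand-ins for x_1 .. x_6 (synthetic)
true_beta = np.array([-3.6, 3.4, 0.0, 0.0, 0.0, 0.3, 0.0])  # hypothetical
ylam = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=3.8, size=n)

beta_hat, r2, p_value = fit_ols(X, ylam)
print(beta_hat.round(3), round(r2, 4), p_value)
```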
The method for optimizing the linear regression model in step five comprises: (1) judging whether each independent variable is significantly correlated with the dependent variable using the Spearman or Pearson method; (2) if collinearity exists among the independent variables, reducing it by ridge regression; (3) using stepwise regression to compute and analyze the values of the parameters in the initial linear regression model, judging how strongly each independent variable influences catering venue popularity, and eliminating the independent variables with weak influence; and (4) obtaining the final analysis model.
To judge whether multicollinearity exists among the independent variables, the invention uses the variance inflation factor (VIF). The values obtained by calling the VIF method in the R language are shown in Table 4, and all independent variables satisfy 0 < VIF < 10. The judgment is further supported by constructing scatter plots between the variables, using the independent variables listed above and calling the Plot method, as shown in FIG. 3. The linear regression model therefore has no multicollinearity problem.
TABLE 4 VIF test results
Figure BDA0001929004550000081
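The VIF check can be sketched directly from its definition, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing the j-th independent variable on the others. The code below is a NumPy illustration on synthetic data, not the R call used in the text; `variance_inflation_factors` is a hypothetical helper name.

```python
import numpy as np

def variance_inflation_factors(X):
    """Compute VIF_j = 1 / (1 - R_j^2) for every column j of X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))  # independent synthetic predictors
# Add a fourth column that nearly duplicates the first one
X_collinear = np.column_stack([X, X[:, 0] + 0.05 * rng.normal(size=200)])

vif_ok = variance_inflation_factors(X)
vif_bad = variance_inflation_factors(X_collinear)
print(vif_ok.round(2))
print(vif_bad.round(2))
```

Values close to 1 indicate little collinearity, while the nearly duplicated column pushes the VIF of both affected variables far above the threshold of 10 used in the text.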
Next, to achieve a better fit, correlation analysis is performed between the independent variables x_1 to x_6 and the dependent variable ylam. As can be seen from Table 5, x_3 and x_4 show no correlation with the dependent variable ylam: both have p-values well above 0.05, that is, at the 95% confidence level the hypothesis that a correlation exists cannot be accepted.
TABLE 5 analysis of the independent variable correlation
Figure BDA0001929004550000082
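A correlation screen of this kind might look like the following sketch, which uses `scipy.stats.spearmanr` on synthetic data. The variable names and the generated values are hypothetical and do not reproduce Table 5; only the decision rule (p-value against 0.05) follows the text.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 200
# Synthetic stand-ins: x_1 drives ylam, x_3 is unrelated to it
predictors = {
    "x_1": rng.uniform(3.0, 5.0, size=n),      # star rating
    "x_3": rng.uniform(50.0, 1500.0, size=n),  # subway distance in meters
}
ylam = 3.4 * predictors["x_1"] + rng.normal(scale=2.0, size=n)

results = {}
for name, values in predictors.items():
    rho, p_value = stats.spearmanr(values, ylam)
    results[name] = (float(rho), float(p_value))
    verdict = "correlated" if p_value < 0.05 else "no significant correlation"
    print(f"{name}: rho={rho:.3f}, p={p_value:.3g} -> {verdict}")
```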
On this basis, the initial linear regression model is optimized using stepwise regression, in which the bidirectional elimination method removes uncorrelated independent variables from the model. By comparison, ylam ~ x_1 + x_5 is found to be the best model. The fitting results are shown in Table 6: the coefficient of determination is 0.2248 and the p-value is 4.722e-12, both better than those of the initial linear regression model.
TABLE 6 optimized independent variable coefficient
Figure BDA0001929004550000083
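The sketch below illustrates one common form of bidirectional stepwise selection, scored by AIC. The text does not state which criterion was used, so AIC is an assumption, as are the helper names `aic` and `stepwise_both` and the synthetic data in which columns 0 and 4 play the roles of x_1 and x_5.

```python
import numpy as np

def aic(X, y, cols):
    """AIC (up to a constant) of an OLS fit using the given columns plus intercept."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(((y - design @ beta) ** 2).sum())
    return n * np.log(rss / n) + 2 * design.shape[1]

def stepwise_both(X, y):
    """Start from the full model; at each step drop or re-add one variable if AIC falls."""
    selected = list(range(X.shape[1]))
    best = aic(X, y, selected)
    improved = True
    while improved:
        improved = False
        candidates = [[s for s in selected if s != c] for c in selected]
        candidates += [sorted(selected + [c])
                       for c in range(X.shape[1]) if c not in selected]
        best_cols = selected
        for cols in candidates:
            score = aic(X, y, cols)
            if score < best - 1e-9:
                best, best_cols, improved = score, cols, True
        selected = best_cols
    return selected, best

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 6))
# Hypothetical data: only columns 0 and 4 influence the response
y = 1.0 + 3.4 * X[:, 0] + 1.5 * X[:, 4] + rng.normal(scale=1.0, size=200)

selected, best_aic = stepwise_both(X, y)
print(selected, round(best_aic, 2))
```

The loop terminates because the AIC strictly decreases at every accepted step and there are finitely many variable subsets.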
With the optimized independent-variable coefficients, the optimized regression model is obtained, as shown in formula (4). The residuals of the regression model follow a normal distribution with mean 0.00 and standard deviation 3.7744; the residual histogram is shown in FIG. 4.
$$\mathrm{ylam} = -3.64965 + 3.38566\,x_1 + 0.30572\,x_5 \qquad (4)$$
Step six: restoring the optimized regression model through the inverse of the data transformation in step four to obtain the relationship between the independent variables and the dependent variable, that is, the quantitative relationship between catering venue popularity in the target super-large city and the various factors.
Because the Box-Cox transformation is applied before regression, formula (4) cannot directly show the original relationship between restaurant popularity and each factor. Formula (4) is therefore restored according to the Box-Cox transformation to obtain the following relationship between the independent variables and the dependent variable:
$$y = (0.27 + 0.677\,x_1 + 0.0611\,x_5)^{5} \qquad (5)$$
The linear fitting model restored from the Box-Cox transformation is shown in formula (5). Among the factors that affect the number of restaurant reviews, the comprehensive star rating x_1 has the coefficient with the largest absolute value, which indicates that the overall enjoyment a restaurant gives its diners, combining taste, environment, and service, is the most important factor and influences restaurant popularity most markedly. The second major factor is the number of large residential areas and universities, which in fact reflects the restaurant's location. This factor illustrates the importance of the group effect to the catering industry: greater population flow is likely to bring more diners.
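The restoration from formula (4) to formula (5) can be checked by hand. The working below assumes the standard Box-Cox form with λ = 0.2, the value chosen in step four; it is a worked check rather than part of the original disclosure.

$$y^{(\lambda)} = \frac{y^{\lambda} - 1}{\lambda} \;\Longrightarrow\; y = \left(1 + \lambda\, y^{(\lambda)}\right)^{1/\lambda}$$

$$y = \left(1 + 0.2\,(-3.64965 + 3.38566\,x_1 + 0.30572\,x_5)\right)^{5} = \left(0.27007 + 0.677132\,x_1 + 0.061144\,x_5\right)^{5}$$

Each coefficient of formula (4) is thus scaled by λ = 0.2, and the intercept becomes 1 − 0.2 × 3.64965 ≈ 0.27. Rounded, the coefficients are 0.27, 0.677, and 0.0611.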
Contrary to the initial hypothesis, the public transportation factors $x_3$ and $x_4$ appear to have no influence on the total number of restaurant reviews. A possible explanation is that public transportation usually runs on fixed operating hours, while the duration of a meal is uncertain, so customers do not necessarily travel to restaurants by public transportation; the quality of public transportation therefore has no significant effect on restaurant heat. Equally unexpected, per-capita consumption has essentially no effect on the dependent variable: given Beijing's relatively high consumption level and extremely rich dining resources, customers pay more attention to the quality of the catering than to its price.
The invention uses Dianping (大众点评网) and Baidu Maps to obtain and compute restaurant consumption data samples, establishes a linear regression model based on statistical methods, and analyzes the factors influencing restaurant consumption heat along Beijing's 2nd Ring Road. The following conclusions are obtained: (1) diners attach great importance to the overall experience a restaurant provides, including taste, environment and service, and this experience is the most central element of the catering industry; (2) the group effect is important to the catering industry, which also shows, from another angle, that restaurant site selection has a positive influence on heat; (3) in contrast, factors such as public transportation and per-capita dining consumption have no direct influence on restaurant heat in a super-large city. In conclusion, the method offers a useful reference for the site selection, operation modes and consumption upgrading of restaurants in super-large cities, and can inform urban regional development planning and the optimization of the catering landscape, thereby promoting the healthy and sustainable development of the catering industry in super-large cities.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (8)

1. A catering venue heat analysis method for a super-large city is characterized by comprising the following steps:
step one: selecting a target super-large city: a super-large city has a huge population, a high economic and cultural level, and well-developed supporting facilities for work and life;
step two: selecting a data source: initially defining multiple types of factors potentially influencing the heat degree of a catering venue in a target super-large city, selecting a proper data source according to the multiple types of factors, and acquiring sample data corresponding to the multiple types of factors;
step three: initially analyzing all sample data, selecting main category factors from the multiple category factors, selecting variables capable of representing the heat degree of the restaurant as dependent variables, and taking the multiple category factors potentially influencing the heat degree as independent variables;
step four: performing data inspection and data transformation on the original data of the main category factors processed in the step three to enable the transformed data to be close to strict normal distribution;
step five: establishing a linear regression model by using the data processed in the fourth step, and optimizing the linear regression model to obtain an optimized regression model;
step six: restoring the optimized regression model through the inverse of the data transformation in step four to obtain the relationship between the independent variables and the dependent variable, thereby obtaining the interdependent quantitative relationship between the catering venue heat of the target super-large city and the various factors.
2. The catering venue heat analysis method for a super-large city according to claim 1, wherein the sample data come from restaurants listed on a catering consumption website publicly accessible on the internet, the relevant data of the restaurants to be analyzed are complete, and the restaurants to be analyzed are located in typical areas of the super-large city.
3. The catering venue heat analysis method for a super-large city according to claim 1 or 2, wherein the multiple category factors in step two take full account of the various factors that may influence the operation of catering venues in a super-large city, the multiple category factors including restaurant quality, dining environment, traffic conditions, surrounding consumption facilities and/or population group effects, wherein the information on geographic locations, public transportation and landmark buildings is obtained from Baidu Maps; and the sample data are acquired by web crawling, through a public API (application program interface), or through a paid third-party service.
4. The catering venue heat analysis method for a super-large city according to claim 3, wherein the main category factors of the sample data include the total number of reviews $\alpha$, the comprehensive star rating $\beta$, the per-capita consumption $\upsilon$, the distance to the nearest subway station $\tau_s$, the number of nearby bus lines $\tau_b$, the number of surrounding large commercial centers $\pi_c$, and the number of large residential areas and universities $\pi_r$; wherein the distance to the nearest subway station $\tau_s$ and the number of nearby bus lines $\tau_b$ are category factors related to the convenience of public transportation, the comprehensive star rating $\beta$ is the category factor for the overall restaurant experience, and the number of surrounding large commercial centers $\pi_c$, the number of large residential areas and universities $\pi_r$ and the per-capita consumption $\upsilon$ are category factors related to the population aggregation effect; the total number of reviews $\alpha$ is the dependent variable, and the comprehensive star rating $\beta$, the per-capita consumption $\upsilon$, the distance to the nearest subway station $\tau_s$, the number of nearby bus lines $\tau_b$, the number of surrounding large commercial centers $\pi_c$ and the number of large residential areas and universities $\pi_r$ are the independent variables.
5. The catering venue heat analysis method for a super-large city according to claim 1, wherein the data check is implemented with the Shapiro-Wilk normality test: the sample data of the Shapiro-Wilk normality test are subjected to a homogeneity-of-variance test, and, at the 95% confidence level, a test result greater than 0.05 indicates no significant difference among the sample data, thereby checking whether the sample data conform to a normal distribution.
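As an illustration of the normality check, the following minimal sketch applies the Shapiro-Wilk test to a synthetic, right-skewed sample standing in for raw review counts. It assumes SciPy is installed and uses `scipy.stats.shapiro`; the helper name, the sample and the 0.05 threshold wiring are illustrative, and the homogeneity-of-variance check mentioned in the claim is not shown.

```python
# Sketch of a Shapiro-Wilk normality check (synthetic data, not the patent's samples).
import random

from scipy import stats  # third-party dependency; assumed to be installed


def is_plausibly_normal(values, alpha=0.05):
    """Return (passes, p_value): passes is True when p_value > alpha."""
    _statistic, p_value = stats.shapiro(values)
    return bool(p_value > alpha), float(p_value)


rng = random.Random(0)
# Hypothetical right-skewed sample standing in for raw review counts.
skewed_counts = [rng.expovariate(1 / 200.0) for _ in range(300)]

if __name__ == "__main__":
    passes, p = is_plausibly_normal(skewed_counts)
    print(f"consistent with normality? {passes} (p = {p:.3g})")
```

A sample that fails this check is the kind of data for which a transformation toward normality would be considered before regression.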
6. The catering venue heat analysis method for a super-large city according to claim 4 or 5, wherein the data transformation is performed using the Box-Cox transformation, so that the transformed data are closer to a strict normal distribution; the transformation rule of the Box-Cox transformation is:
Figure FDA0001929004540000021
wherein $y$ denotes the original value before the Box-Cox transformation, $y^{(\lambda)}$ denotes the value after the transformation, and $\lambda$ denotes the transformation parameter to be determined for the sample data, such that the dependent variable $y^{(\lambda)}$ satisfies:
Figure FDA0001929004540000022
wherein $X$ denotes the independent variable vector, $\beta_1$ is the parameter vector, $X$ and $\beta_1$ are both parameters to be estimated, $\varepsilon$ denotes the random error (residual) vector, $\sigma$ is the standard deviation of the error, and $I_n$ is the $n \times n$ identity matrix.
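For reference, the textbook form of the Box-Cox transformation and of the associated linear model is written out below. This is the standard definition, given on the assumption that the two formula images in this claim show the same expressions; the images themselves are not reproduced in this text.

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[4pt]
\ln y, & \lambda = 0,
\end{cases}
\qquad
y^{(\lambda)} = X\beta_1 + \varepsilon, \quad \varepsilon \sim N\!\left(0,\ \sigma^{2} I_n\right).
```

Under this definition the inverse for $\lambda \neq 0$ is $y = (\lambda\, y^{(\lambda)} + 1)^{1/\lambda}$, which is the operation used to restore the fitted model to the original scale.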
7. The catering venue heat analysis method for a super-large city according to claim 6, wherein the linear regression model is a multiple linear regression model or a generalized linear regression model, and the regression model obtained with the multiple linear regression model is:
$$\mathrm{ylam} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6$$
wherein ylam denotes the restaurant heat after the Box-Cox transformation, derived from the total number of reviews $\alpha$ in the sample data; $x_1$ denotes the comprehensive star rating that consumers give the restaurant, combining taste, environment and service, derived from the comprehensive star rating $\beta$ in the sample data; $x_2$ denotes the average consumption amount at the restaurant, derived from the per-capita consumption $\upsilon$ in the sample data; $x_3$ denotes the distance between the restaurant and the nearest subway station, derived from the distance to the nearest subway station $\tau_s$ in the sample data; $x_4$ denotes the number of public transportation lines within 1 km of the restaurant, derived from the number of nearby bus lines $\tau_b$ in the sample data; $x_5$ denotes the number of large residential areas and universities within 1 km of the restaurant, derived from the number of large residential areas and universities $\pi_r$ in the sample data; $x_6$ denotes the number of city-level commercial service centers, derived from the number of surrounding large commercial centers $\pi_c$ in the sample data; and $(\beta_0, \beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \beta_6)$ are unknown parameters.
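A minimal sketch of estimating the seven unknown parameters by ordinary least squares is given below. It uses only the Python standard library and solves the normal equations directly; the data are synthetic and noise-free, and the "true" coefficients are hypothetical values chosen for the demonstration, not results from the patent.

```python
# Sketch: ordinary least squares for ylam = b0 + b1*x1 + ... + b6*x6.
# Pure standard library; the data below are synthetic, not the patent's samples.
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, targets):
    """Return [b0, b1, ..., bk] minimising squared error via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


rng = random.Random(42)
TRUE_BETA = [-3.0, 3.4, 0.0, 0.0, 0.0, 0.3, 0.0]  # hypothetical b0..b6
samples = [[rng.uniform(0, 5) for _ in range(6)] for _ in range(50)]
ylam = [TRUE_BETA[0] + sum(b * x for b, x in zip(TRUE_BETA[1:], s)) for s in samples]

if __name__ == "__main__":
    print([round(b, 4) for b in fit_ols(samples, ylam)])
```

In practice a statistics package would normally be used instead, since it also reports standard errors and significance levels for each coefficient.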
8. The catering venue heat analysis method for a super-large city according to claim 6, wherein the linear regression model in step five is optimized as follows: first, judging whether each independent variable has a significant correlation with the dependent variable using the Spearman or Pearson method; second, if the independent variables and the dependent variable exhibit collinearity, reducing the collinearity using a multiple ridge regression method; third, calculating and analyzing the values of all parameters in the initial linear regression model using stepwise regression, judging how strongly each independent variable affects the catering venue heat, and eliminating the independent variables with weak influence; and fourth, obtaining the final analysis model.
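The first optimization step, correlation screening, can be sketched as follows. This is a simplified illustration using only the standard library: it ranks predictors by the magnitude of their Spearman coefficient against an assumed threshold of 0.3, rather than running a formal significance test, and the data and variable names are hypothetical. Ridge regression and stepwise regression are not shown.

```python
# Sketch of correlation screening with Pearson and Spearman coefficients.
# Data and the 0.3 threshold are hypothetical, chosen only for illustration.
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def screen(predictors, target, threshold=0.3):
    """Keep the names whose |Spearman rho| with the target reaches the threshold."""
    return [name for name, col in predictors.items()
            if abs(spearman(col, target)) >= threshold]


target = [1, 8, 27, 64, 125, 216]   # hypothetical heat values
x1 = [1, 2, 3, 4, 5, 6]             # monotone with the target
x3 = [3, 5, 1, 6, 2, 4]             # nearly unrelated to the target

if __name__ == "__main__":
    print(screen({"x1": x1, "x3": x3}, target))
```

Spearman is used for the screen because it captures monotone but nonlinear relationships, which matter when the dependent variable is later power-transformed.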
CN201811631005.9A 2018-12-29 2018-12-29 Catering venue heat degree analysis method for super-large cities Active CN109857983B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811631005.9A CN109857983B (en) 2018-12-29 2018-12-29 Catering venue heat degree analysis method for super-large cities

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811631005.9A CN109857983B (en) 2018-12-29 2018-12-29 Catering venue heat degree analysis method for super-large cities

Publications (2)

Publication Number Publication Date
CN109857983A CN109857983A (en) 2019-06-07
CN109857983B true CN109857983B (en) 2022-09-30

Family

ID=66893159

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811631005.9A Active CN109857983B (en) 2018-12-29 2018-12-29 Catering venue heat degree analysis method for super-large cities

Country Status (1)

Country Link
CN (1) CN109857983B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783213B (en) * 2020-07-09 2023-10-24 沈阳建筑大学 Cavity ventilation design optimization method and system for oversized train station waiting hall
CN112434262A (en) * 2020-11-22 2021-03-02 同济大学 Waterfront public space activity influence factor identification method and terminal
CN113379269B (en) * 2021-06-21 2023-08-18 华南理工大学 Urban business function partitioning method, device and medium for multi-factor spatial clustering
CN113743540B (en) * 2021-11-04 2022-02-18 华能(天津)煤气化发电有限公司 Coal quality melting point prediction method based on multi-model fusion Stacking algorithm
CN114048910B (en) * 2021-11-17 2024-03-29 东南大学 Community public green land group layout optimization method based on big data

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106776868A (en) * 2016-11-29 2017-05-31 浙江工业大学 A kind of restaurant score in predicting method based on multiple linear regression model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10643245B2 (en) * 2016-07-15 2020-05-05 NXT-ID, Inc. Preference-driven advertising systems and methods

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
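Distribution pattern and influencing factors of the catering industry in China's prefecture-level cities: an empirical study based on data from Dianping (大众点评网); Xia Lingjun et al.; Economic Geography (《经济地理》); 2018-05-26 (No. 05); full text *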

Also Published As

Publication number Publication date
CN109857983A (en) 2019-06-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant