CN111242697A

CN111242697A - Merchant site selection method and system based on pollination heuristic clustering

Info

Publication number: CN111242697A
Application number: CN202010065995.5A
Authority: CN
Inventors: 屈洪春; 吴晶晶; 吕强; 张兴成; 尹力
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Chongqing University of Post and Telecommunications
Priority date: 2020-01-20
Filing date: 2020-01-20
Publication date: 2020-06-05
Anticipated expiration: 2040-01-20
Also published as: CN111242697B

Abstract

The invention relates to a merchant site selection method and system based on pollination heuristic clustering, and belongs to the field of data mining and machine learning. The method comprises the following steps: data acquisition: collecting a data set from a survey of mobile terminal users in a given area or street; system construction: integrating a database unit, an information classification and analysis unit and an information query unit into a platform that provides an integrated website and database; database input: entering the collected information into the database promptly and keeping it updated; database classification and analysis: classifying and sorting the data in the database, mining user interests and preferences with the IPCA algorithm, assessing the business investment prospects of each candidate address, and then selecting an initial merchant address using the SSE (sum of squared errors) principle and a geographic information system; intelligent query: entering the address to be queried into the system together with the weights of the relevant factors to obtain the final site. The invention enables fast and efficient selection of commercial sites.

Description

Merchant site selection method and system based on pollination heuristic clustering

Technical Field

The invention belongs to the field of data mining and machine learning, and relates to a merchant site selection method and system based on pollination heuristic clustering.

Background

At present, when a merchant chooses a site for a restaurant, a retail store or a vending machine, suitable locations for the store or equipment are generally judged by manually counting passenger flow. This requires substantial manpower and time, so site selection is inefficient. Moreover, K-means clustering, the method most commonly used for merchant location, requires the number of clusters to be specified before clustering begins, and in many practical applications that number cannot be determined.

Therefore, a method is needed for determining the best merchant location when the number of clusters is unknown.

Disclosure of Invention

In view of the above, the present invention provides a method that combines a pollination-inspired clustering algorithm with Geographic Information System (GIS) spatial analysis. It addresses two problems of conventional site selection: K-means clustering cannot be used when the number of clusters cannot be predicted, and manually counting passenger flow is inefficient.

To achieve the above objective, the invention provides the following technical solution:

A merchant site selection method based on pollination heuristic clustering comprises the following steps:

S1: data acquisition: collecting a data set from a survey of mobile terminal users in a given area, street or similar region, including place of residence, income level, consumption level and other information;

S2: system construction: integrating the database unit, the information classification and analysis unit and the information query unit into a platform that provides an integrated website, a database and related functions;

S3: database input: entering the collected information into the database promptly and continuously updating it over time;

S4: database classification and analysis: classifying and sorting the data in the database, mining user interests and preferences with a pollination-inspired clustering algorithm (IPCA), assessing the business investment prospects of each candidate address, and then selecting an initial merchant address using the SSE principle together with geographic information system analysis;

S5: intelligent query: the user enters the address to be queried into the system, followed by the weights of relevant factors such as rent and the consumption level of mobile terminal users in each residential cell of the area, to obtain the final site.

Further, in step S2, the constructed system comprises a database unit, an information classification and analysis unit and an information query unit:

constructing the database unit: building an address database (including a residential-area database) containing a large number of addresses, and an associated business information database that stores, for each address, the business information the querying user needs, and building an index;

constructing the information classification and analysis unit: spatially locating each user address in the address database and processing the data with the pollination heuristic clustering algorithm;

constructing the information query unit: comprehensively analyzing the influence of each factor on site selection, querying the index according to the business information of existing merchants in the region, outputting the results as a report, and finally determining the merchant's site.

Further, in step S4, the information is classified and analyzed with the IPCA clustering algorithm to find the geographic location of the initial merchant site, specifically comprising the following steps:

S41: constructing a clustering statistical model based on the IPCA clustering algorithm;

S42: performing cluster analysis on the mobile terminal user data to obtain clusters, inferring user characteristics from each cluster (such as where the user lives and works, which restaurants the user frequents and which subway stations the user uses), and analyzing the user's interests, preferences, consumption records and similar information;

S43: information analysis: inferring a user's points of interest from the user's location information in a given area, and combining them with the user's consumption records, information on the merchant seeking a site and information on existing merchants to carry out merchant site selection;

S44: initial merchant selection: taking the geographic position of each residential cell as a center, and using the k-means objective function SSE to minimize the total distance from the users of each cell to the merchant address, thereby computing the initial merchant position.

Further, in step S4, the IPCA clustering algorithm specifically comprises:

1) following the ecological process of insect-mediated plant pollination, the interaction between insect pollinators and plants is abstracted, and the clustering algorithm is formulated under the following assumptions: a) each datum represents a separate annual plant, and the attributes of each datum represent the floral traits of that plant; b) each individual agent represents an insect pollinator with sensing, memory, foraging, a pollen basket, and pollen extraction and deposition; c) the similarity between data equals the pollen-grain similarity between different plant species; d) each plant dies after one year, but if it receives enough pollen to reproduce, its offspring either replace the parent's position or are randomly re-placed anywhere in the space. After all plants have evolved over repeated iterations, their positions are taken as the clustering result of the data set;

2) in the algorithm, insects are agents and plants are mobile terminal data points. Insect foraging comprises pollination and pollen collection. Pollination means the insect randomly deposits some pollen from its pollen basket onto the stigma of the current plant, representing the exchange of information and similarity between an agent and a mobile terminal data point. Pollen collection means the insect loads pollen from the current plant into its pollen basket, representing an agent obtaining data from a mobile terminal data point;

3) in the initial stage, a data set and a number of agents are selected. Each data point in the data set represents a mobile terminal data point, and its feature vector represents the features of the mobile terminal data. All agents and mobile terminal data points are placed randomly in the space, and when the algorithm starts each agent randomly selects one data point as its foraging target;

4) each agent flies to the mobile terminal data point closest to its position to forage;

5) for each mobile terminal data point, count the total number N of data points it has received from agents, use the Euclidean distance as the measure of feature difference between data, and compute the similarity S between data points with a Laplace kernel function;

6) record each agent's foraging history: after a foraging visit is completed, add the mobile terminal data point to the agent's foraging history (a linear queue);

7) determine whether the length of the linear queue exceeds the agent's memory depth $M_d$; if it does, delete the labels of the excess data points; otherwise leave the queue unchanged;

8) determine whether the number of foraging visits has reached the threshold. If it has not, the agent continues to forage at the nearest mobile terminal data point on the grid that is not already in its foraging history. If it has, count the total number N of data points each mobile terminal data point has received from agents and the similarity S between data points. Once every agent has reached the foraging limit, compute each data point's total data benefit from all agents from the data benefit it received from each agent, and then compute the survival probability of the current mobile terminal data point;

9) compare the survival probability $P_i$ at each data point's current position with a random number $R_1$ to decide whether the point's position changes. If $P_i > R_1$ for a data point i, the mobile terminal data point at that position survives and its position remains unchanged; if $P_i < R_1$, the data point at that position dies, and the number of other mobile terminal data points in its neighborhood is counted;

10) if the number of mobile terminal data points in the neighborhood of data point i is not zero, compute its neighborhood fitness $Fit_i$; otherwise, use the global position update strategy to generate a new position that does not coincide with any currently surviving mobile terminal data point, replacing the position of the dead data point;

11) compare the neighborhood fitness $Fit_i$ of data point i with a random number $R_2$. If $Fit_i > R_2$, the data point's offspring for the next year occupy the same position. If $Fit_i < R_2$, data point i is reassigned by the global position update strategy to a position $position_1$ that does not coincide with any currently surviving mobile terminal data point, and the local position update strategy then adjusts $position_1$ locally so that the data point moves to a position $position_2$ with higher fitness;

12) when the algorithm has run 1000 iterations, count the number $N_c$ of data points whose positions changed in the iteration results; if $N_c$ is greater than $0.2N$, update the parameter α to α + 0.04; if $N_c$ is less than $0.2N$, α remains unchanged;

13) the algorithm terminates when a termination condition is met;

14) once every mobile terminal data point has reached its final clustering position on the grid, the clusters are obtained from these final positions.
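The foraging memory (steps 4 and 6 to 8) and the survive-or-replace decision (steps 9 to 11) can be illustrated with a minimal Python sketch. It assumes a 2-D grid, squared Euclidean distance for "nearest", and that step 7 drops the oldest labels when the queue exceeds $M_d$ (the text does not say which labels form the excess part). The names `Agent`, `forage` and `life_cycle` are hypothetical, and $P_i$ and $Fit_i$ are passed in as already computed values.

```python
from collections import deque
import random


class Agent:
    """Hypothetical agent (insect pollinator) with a bounded foraging memory."""

    def __init__(self, position, memory_depth):
        self.position = position
        # Foraging history as a linear queue. maxlen plays the role of M_d:
        # ASSUMPTION, appending beyond M_d drops the oldest label (step 7).
        self.history = deque(maxlen=memory_depth)

    def choose_target(self, point_positions):
        """Index of the nearest data point not already in the history (steps 4, 8)."""
        ax, ay = self.position
        candidates = [i for i in range(len(point_positions)) if i not in self.history]
        if not candidates:
            return None
        return min(candidates,
                   key=lambda i: (point_positions[i][0] - ax) ** 2
                   + (point_positions[i][1] - ay) ** 2)

    def forage(self, point_positions):
        """Fly to the chosen data point and record the visit (step 6)."""
        target = self.choose_target(point_positions)
        if target is not None:
            self.position = point_positions[target]
            self.history.append(target)
        return target


def life_cycle(p_survive, fitness, has_neighbors, rng=random):
    """Fate of one data point in the current 'year' (steps 9 to 11)."""
    if p_survive > rng.random():       # step 9: P_i > R_1, position unchanged
        return "survive"
    if not has_neighbors:              # step 10: empty neighborhood
        return "global_update"
    if fitness > rng.random():         # step 11: Fit_i > R_2, offspring stay
        return "offspring_in_place"
    return "global_then_local_update"  # step 11: Fit_i < R_2
```

With `memory_depth=2`, an agent that has visited three points has already forgotten the first one, so it may select that point again on its next flight.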

Further, in step 5), the feature difference between pollen grains is measured by the Euclidean distance:

$$d(x_i, x_j) = \sqrt{\sum_{k=1}^{n} (x_{ik} - x_{jk})^2}$$

where $x_i = (x_{i1}, x_{i2}, x_{i3}, \ldots, x_{in})$ is the feature vector of all attribute values carried by the pollen of mobile terminal data point i, and $x_j = (x_{j1}, x_{j2}, x_{j3}, \ldots, x_{jn})$ is the corresponding feature vector of mobile terminal data point j;

The similarity between pollen grains, i.e. between mobile terminal data points, is measured with a Laplace kernel function, where α denotes the similarity parameter, i.e. the adaptive parameter of the algorithm, whose initial value is set to 0.06. The larger the feature difference between two pollen grains, the lower their similarity, and vice versa.
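As a hedged illustration, the sketch below computes the Euclidean feature difference and a Laplace-kernel similarity. The kernel form $S_{ij} = e^{-\alpha\, d(x_i, x_j)}$ is an assumption (a common way to write a Laplace kernel over a Euclidean distance), not the patent's own formula; the default `alpha=0.06` is the initial value stated above, and the function names are hypothetical.

```python
import math


def feature_difference(x_i, x_j):
    """Euclidean distance between two feature vectors (step 5)."""
    if len(x_i) != len(x_j):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


def similarity(x_i, x_j, alpha=0.06):
    """Laplace-kernel similarity.

    ASSUMED form S = exp(-alpha * d): identical vectors give 1, and the
    similarity falls as the feature difference grows.
    """
    return math.exp(-alpha * feature_difference(x_i, x_j))
```

Under this assumed form, raising α (step 12) makes the same feature difference count as less similar.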

Further, in step 8), the data benefit that each mobile terminal data point obtains from an agent is:

and the total data benefit that each data point obtains from all agents is:

where $D_{ik}$ is the amount of data that mobile terminal data point i receives from the k-th mobile terminal data point (the pollination amount is a fixed value in the algorithm), $S_{ik}$ is the similarity between mobile terminal data points i and k, m is the number of agents that visit data point i, and n is the number of mobile terminal data points from which data point i receives data via the agents;

The survival probability of the current data point is:

where $P_0$ is the initial pollen yield, which is related to the neighborhood radius of the data points in the algorithm; its initial value is set to 0.79.
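A minimal sketch of the data benefit and survival probability follows. Because the formulas are not reproduced here, it assumes that the benefit from one agent is $\sum_k D_{ik} S_{ik}$, that the total benefit $E_i$ sums over the m visiting agents, and that $P_i = 1 - e^{-E_i / P_0}$. All three forms and the function names are hypothetical.

```python
import math


def benefit_from_agent(deliveries):
    """Data benefit from one agent.

    `deliveries` holds (D_ik, S_ik) pairs, one per source point k whose data
    the agent deposited on point i. ASSUMED form: sum of D_ik * S_ik.
    """
    return sum(d_ik * s_ik for d_ik, s_ik in deliveries)


def total_benefit(per_agent):
    """Sum of the benefits from the m agents that visited point i."""
    return sum(benefit_from_agent(deliveries) for deliveries in per_agent)


def survival_probability(total, p0=0.79):
    """ASSUMED saturating form P_i = 1 - exp(-total / P_0).

    Zero benefit gives 0 and a large benefit approaches 1.
    """
    return 1.0 - math.exp(-total / p0)
```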

Further, in step 10), the neighborhood fitness $Fit_i$ of data point i is determined by the local density:

where ZOI is the neighborhood size of mobile terminal data point i and t is the number of mobile terminal data points within that neighborhood.
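The local-density fitness can be sketched as below, assuming a square neighborhood of radius 1 on the grid (so ZOI = 8 cells) and the simple ratio $Fit_i = t / \mathrm{ZOI}$. Both the neighborhood shape and the ratio are assumptions for illustration, and the function names are hypothetical.

```python
def neighbors(i, positions, radius=1):
    """Indices of data points inside a square (Chebyshev) neighborhood of point i."""
    xi, yi = positions[i]
    return [j for j, (x, y) in enumerate(positions)
            if j != i and max(abs(x - xi), abs(y - yi)) <= radius]


def neighborhood_fitness(i, positions, radius=1):
    """ASSUMED local-density form Fit_i = t / ZOI.

    ZOI is the number of grid cells around point i and t is the number of
    data points occupying them.
    """
    zoi = (2 * radius + 1) ** 2 - 1
    t = len(neighbors(i, positions, radius))
    return t / zoi
```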

Further, in step 11), let $x_1$ be the position of a given mobile terminal data point and $x_i$ (i = 2, 3, 4, …) the other data points within its neighborhood; the adjustment is:

$X_1$ is then the position $position_2$ obtained by the local position update strategy, where the feature-difference term represents the feature difference between mobile terminal data points i and j, $position_i$ is the current position of data point i, and Neighborhood is the neighborhood range of mobile terminal data point i.
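One plausible reading of the local position update, sketched in Python: the reassigned point moves to the mean of its neighbors' grid positions, weighted by the feature similarity $e^{-\alpha d}$. This weighting rule and the function name are assumptions, not the patent's formula, and the sketch does not enforce the non-overlap constraint that applies to the global update.

```python
import math


def local_update(own_position, own_features, neighbors, alpha=0.06):
    """Move a reassigned point toward similar neighbors.

    `neighbors` is a list of ((x, y), feature_vector) pairs inside the
    neighborhood. ASSUMED rule: the new grid cell is the mean of the neighbor
    positions weighted by exp(-alpha * feature difference).
    """
    if not neighbors:
        return own_position  # nothing to adjust toward
    weights = [math.exp(-alpha * math.dist(own_features, feats))
               for _, feats in neighbors]
    total = sum(weights)
    x = sum(w * pos[0] for w, (pos, _) in zip(weights, neighbors)) / total
    y = sum(w * pos[1] for w, (pos, _) in zip(weights, neighbors)) / total
    return (round(x), round(y))  # snap back onto the grid
```

A neighbor whose features closely match the point pulls it harder than a dissimilar one, which is the intended effect of moving to a position with higher fitness.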

Further, in step 13), the termination condition specifically comprises:

A. the iteration result shows that no data point has changed position;

B. the number of iterations of the algorithm reaches a threshold.
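Step 12 and the termination test of step 13 reduce to a few comparisons. In this sketch N is assumed to be the total number of data points, the 0.2 ratio and the 0.04 increment come from step 12, and `max_iterations` is a hypothetical stand-in for the unspecified threshold.

```python
def update_alpha(alpha, n_changed, n_total, ratio=0.2, step=0.04):
    """Step 12: raise alpha by `step` when more than `ratio` of the points moved.

    ASSUMPTION: n_total is the total number of data points N.
    """
    if n_changed > ratio * n_total:
        return alpha + step
    return alpha


def should_terminate(n_changed, iteration, max_iterations):
    """Step 13: stop when no point moved (A) or the iteration cap is hit (B)."""
    return n_changed == 0 or iteration >= max_iterations
```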

Beneficial effects of the invention: the pollination-inspired clustering method lets the number of clusters emerge automatically from the feature relationships among data objects, which makes the classification more scientific. It effectively finds the points of interest that users frequently visit and thereby mines their interests and preferences, helping merchants deliver better marketing and services, meet users' daily needs and earn profits. In addition, entering and analyzing business information linked to address information makes merchant site selection faster and more controllable.

Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.

Drawings

For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:

FIG. 1 is a flow chart of a method for locating a merchant according to the present invention;

FIG. 2 is a flow chart of a pollination heuristic clustering algorithm for information classification and analysis;

FIG. 3 is a flow chart of the pollination-inspired clustering algorithm;

FIG. 4 is a diagram of the distribution of data positions before clustering;

FIG. 5 is a diagram of the distribution of data positions after clustering.

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.

Referring to FIGS. 1 to 5, and as shown in FIG. 1, the merchant site selection system of the present invention includes: a database unit, which builds a database containing a large amount of mobile terminal user information and the associated addresses, stores the business information databases that querying users need for each address, and builds an index; an information classification and analysis unit, which analyzes users' terminal location data and mobile terminal consumption data to discover their interests and preferences; and an information query unit, which comprehensively analyzes the influence of each factor on site selection, queries the index according to the business information of merchants in the region, outputs the results as a report, and finally determines the merchant's site.

The business information data for each spatially located address level are arranged in time order: the business information data of each address over a period are sorted chronologically and entered side by side into a table. The time-ordered data in the table are then aggregated and summarized to analyze how they change, and the analysis is output as charts or analysis reports. In the information query unit, inputs are selected according to the weight of each factor, the address or business information to be queried is looked up in the index, and charts and analysis reports related to that address or merchant can also be output.
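The time-ordering and change analysis described above can be sketched as grouping records by address, sorting each group chronologically and taking period-to-period differences. The record layout `(address, day, value)` and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def time_sequence(records):
    """Group (address, day, value) records by address, each group sorted by day."""
    table = defaultdict(list)
    for address, day, value in records:
        table[address].append((day, value))
    for rows in table.values():
        rows.sort(key=lambda row: row[0])  # chronological order
    return dict(table)


def period_changes(rows):
    """Change in value between consecutive periods of one address."""
    return [(rows[k][0], rows[k][1] - rows[k - 1][1]) for k in range(1, len(rows))]


if __name__ == "__main__":
    demo = [("A", date(2020, 2, 1), 12), ("A", date(2020, 1, 1), 10)]
    print(period_changes(time_sequence(demo)["A"]))
```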

The main business information to be collected in the early stage comprises a macro investment environment data group module and a micro environment data group module. The macro module comprises population, business, income level and resident consumption level data groups, and is mainly used to screen out, for each associated address, macro environments with poor business conditions, so that the merchant can make a profit. The micro module comprises passenger flow, rent, merchant mix and property management data groups, and is mainly used to screen out micro environments with poor conditions for merchants at the associated address. Both the micro and the macro requirements serve the profitability and sustainable development of merchants, and on this basis the daily needs of resident consumers can also be met.

The raw data undergo preprocessing, including collection, extraction, cleaning and sorting, to produce high-quality data. In the invention, the data are users' mobile terminal data, such as the GeoLife data set (mobile terminal positioning data). The invention prunes the data based on speed, screening out meaningless points recorded while the user is travelling, to obtain the desired data for further analysis. The analysis of the processed data with the pollination heuristic clustering algorithm (IPCA) is described in detail below.
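A minimal sketch of speed-based pruning, assuming each point is `(t_seconds, latitude, longitude)` in time order, that the points to discard are those reached faster than a hypothetical 1 m/s threshold (i.e. recorded while the user is travelling), and that great-circle (haversine) distance is adequate. Parsing the GeoLife files is not shown, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def prune_by_speed(points, max_speed=1.0):
    """Keep the first point and every point reached at <= max_speed m/s.

    ASSUMPTION: faster points belong to travel segments and are dropped.
    Speed is measured against the immediately preceding raw point.
    """
    if not points:
        return []
    kept = [points[0]]
    for prev, cur in zip(points, points[1:]):
        dt = cur[0] - prev[0]
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speed = haversine_m(prev[1], prev[2], cur[1], cur[2]) / dt
        if speed <= max_speed:
            kept.append(cur)
    return kept
```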

As shown in FIG. 1, the merchant site selection system of the present invention is constructed as follows:

First step: data acquisition: collect a data set from a survey of mobile terminal users in, for example, a given area or street, including place of residence, income level, consumption level and other information. The builder manually surveys the business information data for each address, such as the types of industry in the region and regional rents.

Second step: system construction: integrate the database unit, the information classification and analysis unit and the information query unit into a platform that provides an integrated website, a database and related functions.

Third step: database input: enter the collected information into the database promptly and continuously update it over time.

Fourth step: database classification and analysis: classify and sort the data in the database, mine user interests and preferences with the pollination-inspired clustering algorithm, assess the business investment prospects of each candidate address, and select an initial merchant address using the SSE principle combined with geographic information system analysis. The specific process of information classification and analysis is shown in FIG. 2.

Fifth step: intelligent query: after the user enters the address to be queried and the weights of relevant factors such as rent, the system returns the final site.

As shown in FIG. 3, the IPCA clustering algorithm follows steps 1) to 14) described above.

Example 1

As shown in FIG. 2, in the information classification and analysis of the present invention, the IPCA clustering algorithm is used to discover users' interests and preferences, and a preliminary merchant location is obtained by combining them with users' consumption records. The implementation comprises the following steps, in which data points represent the mobile terminal data points of users in a given area and insects represent agents:

1) select a mobile terminal positioning data set, such as the GeoLife data set. The number of data points is the number of mobile terminal positioning points, each data point represents one mobile terminal position, and its feature vector represents features of that position such as longitude, latitude and elevation. Data points are colored according to their features, a number of agents is selected, and all agents and data points are randomly distributed in a 40 × 40 grid space, as shown in FIG. 4.

2) cluster the mobile terminal data points with the IPCA algorithm to obtain clusters.
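The random initial layout of step 1) can be sketched as follows, assuming data points occupy distinct cells of the 40 × 40 grid (consistent with the rule that surviving points do not share positions) while agents may share cells. The function name and the `seed` parameter are illustrative.

```python
import random


def random_layout(n_points, n_agents, grid_size=40, seed=None):
    """Scatter data points on distinct grid cells and agents on arbitrary cells."""
    rng = random.Random(seed)
    cells = [(x, y) for x in range(grid_size) for y in range(grid_size)]
    if n_points > len(cells):
        raise ValueError("more data points than grid cells")
    point_cells = rng.sample(cells, n_points)  # no two data points share a cell
    agent_cells = [rng.choice(cells) for _ in range(n_agents)]
    return point_cells, agent_cells
```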

Example 2

1) As shown in FIG. 5, each cluster is analyzed to obtain information such as the user's interests and preferences (for example, that the user likes Zhou Hei Ya braised duck), and each user's spending on these preferences and restaurants (e.g. Zhou Hei Ya) is analyzed to judge how much the user favors different restaurants.

2) Determine the initial site: query the residence information of users in the data set with a Geographic Information System (GIS), take the central geographic position of each residential cell as $C_1, C_2, C_3, \ldots, C_k$, and let $A_1, A_2, A_3, \ldots, A_l$ denote the candidate merchant positions in the area. The k-means objective function is then used to compute the distance associated with each candidate position and determine the initial address of the restaurant. The formula is as follows:

The resulting $SSE(A_i)$ values are sorted in ascending order to obtain the preliminary site sequence $Position_i$; the candidate with the smallest $SSE(A_i)$ is the desired initial address $Position_0$.

3) After the initial site selection result is obtained, the shop position is further adjusted according to factors such as the user consumption level $D_1$, the rent in each area $D_2$ (rents differ by location) and the traffic situation $D_3$, each weighted by $\omega_1, \omega_2, \omega_3$, to obtain the final shop position. The calculation formula is as follows:

$$Position_{last,i} = (\omega_1 D_1 + \omega_2 D_2 + \omega_3 D_3 + \cdots) \times Position_i$$

If $Position_0$ (the minimum of the sequence) remains the minimum after all factors are considered, the final site is $Position_0$. Otherwise, the remaining values of the preliminary sequence $Position_i$ are updated with the factors taken into account to form a new sequence: the adjusted values $Position_{last,i}$ are computed as above and sorted, and the most suitable value $Position_{last}$ (typically the minimum of the sequence) is taken as the final site.
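Steps 2) and 3) of this example can be sketched together. The SSE formula is not reproduced in this text, so the sketch assumes $SSE(A_i) = \sum_{j=1}^{k} \lVert A_i - C_j \rVert^2$ in planar coordinates (optionally weighted, for example by users per cell), then applies the weighted adjustment $Position_{last,i}$ given above. It also assumes every factor D is oriented so that smaller is better, since the text does not say how rent or consumption level is normalized. All names are hypothetical.

```python
def sse(candidate, centers, weights=None):
    """ASSUMED SSE(A_i): weighted sum of squared distances to each cell center C_j."""
    if weights is None:
        weights = [1.0] * len(centers)
    if len(weights) != len(centers):
        raise ValueError("one weight per cell center is required")
    ax, ay = candidate
    return sum(w * ((cx - ax) ** 2 + (cy - ay) ** 2)
               for w, (cx, cy) in zip(weights, centers))


def rank_candidates(candidates, centers, weights=None):
    """Sort candidates by ascending SSE; the first entry corresponds to Position_0."""
    return sorted(((sse(a, centers, weights), a) for a in candidates),
                  key=lambda pair: pair[0])


def adjusted_scores(sse_values, factors, weights):
    """Position_last,i = (w1*D1 + w2*D2 + ...) * Position_i for every candidate.

    factors[i] holds (D1, D2, D3, ...) for candidate i.
    """
    scores = []
    for position_i, d in zip(sse_values, factors):
        if len(d) != len(weights):
            raise ValueError("one weight per factor is required")
        scores.append(sum(w * x for w, x in zip(weights, d)) * position_i)
    return scores


def final_site(candidates, sse_values, factors, weights):
    """Candidate with the smallest adjusted score, together with that score."""
    scores = adjusted_scores(sse_values, factors, weights)
    best = min(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]
```

Taking the minimum of the adjusted scores covers both cases described above: if $Position_0$ stays smallest it is returned, otherwise the next best candidate is.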

Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims

1. A merchant site selection method based on pollination heuristic clustering is characterized by comprising the following steps:

S1: data acquisition: collecting a data set from a survey of mobile terminal users in a given area or street;

S2: system construction: integrating the database unit, the information classification and analysis unit and the information query unit into a platform that provides an integrated website and a database;

S4: database classification and analysis: classifying and sorting the data in the database, mining user interests and preferences with a pollination-inspired clustering algorithm (IPCA), assessing the business investment prospects of different addresses, and then selecting an initial merchant address using the SSE principle together with geographic information system analysis;

S5: intelligent query: the user enters the address to be queried into the system, followed by the weights of the relevant factors, to obtain the final site.

2. The merchant site selection method based on pollination heuristic clustering as claimed in claim 1, wherein in step S2, the constructed system comprises a database unit, an information classification and analysis unit and an information query unit;

constructing the database unit: building an address database containing a large number of addresses and an associated business information database that stores, for each address, the business information the querying user needs, and building an index;

constructing the information classification and analysis unit: spatially locating the user addresses in the address database and clustering the data with the pollination heuristic clustering algorithm;

3. The merchant site selection method based on pollination heuristic clustering as claimed in claim 1, wherein in step S4, an IPCA clustering algorithm is used to classify and analyze the information to find the geographical location of the initial merchant site selection, specifically comprising the steps of:

S42: performing cluster analysis on the mobile terminal user data to obtain clusters, inferring user characteristics from each cluster, and analyzing the user's interests, preferences and consumption records;

4. The merchant location method based on pollination heuristic clustering as claimed in claim 3, wherein in step S4, the IPCA clustering algorithm specifically comprises:

9) compare the survival probability $P_i$ at each data point's current position with a random number $R_1$ to decide whether the point's position changes. If $P_i > R_1$ for a data point i, the mobile terminal data point at that position survives and its position remains unchanged; if $P_i < R_1$, the data point at that position dies, and the number of other mobile terminal data points in its neighborhood is counted;

13) the algorithm terminates when a termination condition is met;

5. The merchant site selection method based on pollination heuristic clustering as claimed in claim 4, wherein in step 5), the measure of the feature difference between pollen grains is:

where $x_i = (x_{i1}, x_{i2}, x_{i3}, \ldots, x_{in})$ is the feature vector of all attribute values carried by the pollen of mobile terminal data point i, and $x_j = (x_{j1}, x_{j2}, x_{j3}, \ldots, x_{jn})$ is the corresponding feature vector of mobile terminal data point j;

where α denotes the similarity parameter, i.e. the adaptive parameter.

6. The merchant site selection method based on pollination heuristic clustering as claimed in claim 5, wherein in step 8), the data yield that each mobile terminal data point obtains from an agent is:

the total data yield from all agents for each data point is:

the probability of survival for the current data point is:

where $P_0$ denotes the initial pollen yield, i.e. the initial data yield.

7. The merchant site selection method based on pollination heuristic clustering as claimed in claim 6, wherein in step 10), the neighborhood fitness $Fit_i$ of data point i is determined by the local density:

8. The merchant site selection method based on pollination heuristic clustering as claimed in claim 7, wherein in step 11), let $x_1$ be the position of a given mobile terminal data point and $x_i$ (i = 2, 3, 4, …) the other data points within its neighborhood; the adjustment is:

9. The merchant site selection method based on pollination heuristic clustering as claimed in claim 8, wherein in step 13), the termination condition specifically comprises:

B. the number of iterations of the algorithm reaches a threshold.