CN111242697A - Merchant site selection method and system based on pollination heuristic clustering - Google Patents
- Publication number
- CN111242697A (CN202010065995.5A)
- Authority
- CN
- China
- Prior art keywords
- mobile terminal
- data point
- information
- terminal data
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0282—Rating or review of business operators or products
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0204—Market segmentation
- G06Q30/0205—Location or geographical consideration
Abstract
The invention relates to a merchant site selection method and system based on pollination heuristic clustering, and belongs to the field of data mining and machine learning. The method comprises the following steps. Acquiring data: collecting a survey data set of mobile terminal users in a certain area or street. Constructing a system: integrating a database unit, an information classification and analysis unit and an information query unit into a platform that combines a website and a database. Inputting database information: entering the collected information into the database promptly and keeping the information in the database updated. Classifying and analyzing database information: classifying and sorting the data in the database, mining users' interests and hobbies with the IPCA algorithm, judging the business investment prospect of each address, and then selecting an initial merchant address using the SSE principle and a geographic information system. Intelligent query: entering the address to be queried into the system together with the weights of the relevant factors to obtain the final site location. The invention can select commercial sites quickly and efficiently.
Description
Technical Field
The invention belongs to the field of data mining and machine learning, and relates to a merchant site selection method and system based on pollination heuristic clustering.
Background
At present, when a site is selected for a merchant such as a restaurant, a retail store or a vending machine, suitable locations for the store or dispensing equipment are generally judged by manually counting passenger flow. This usually requires a large investment of manpower and time, so site selection is inefficient. Moreover, K-means, the clustering method most commonly used in merchant site selection, requires the number of clusters to be specified before clustering begins, and in many practical applications that number cannot be determined in advance.
Therefore, a method for analyzing the best merchant location under the premise of unknown cluster number is needed.
Disclosure of Invention
In view of the above, the present invention provides a method that combines a pollination-inspired clustering algorithm with Geographic Information System (GIS) spatial analysis. It addresses two problems: the K-means clustering used in conventional site selection cannot be applied when the number of clusters cannot be predicted, and the existing approach of manually counting passenger flow is inefficient.
In order to achieve the purpose, the invention provides the following technical scheme:
A merchant site selection method based on pollination heuristic clustering specifically comprises the following steps:
S1: acquiring data: collecting a survey data set of mobile terminal users in a certain area or street, the data set including information such as place of residence, income level and consumption level;
S2: constructing a system: integrating a database unit, an information classification and analysis unit and an information query unit into a platform that provides functions such as an integrated website and a database;
S3: inputting database information: entering the collected information into the database promptly, and continuously updating the information in the database as time passes;
S4: classifying and analyzing database information: classifying and sorting the data in the database, mining users' interests and hobbies with the pollination-inspired clustering algorithm (IPCA), judging the business investment prospect of each address, and then selecting an initial merchant address using the SSE principle together with a geographic information system analysis method;
S5: intelligent query: the inquirer enters the address to be queried into the system and then enters the weights of the relevant factors, such as rent and the consumption level of the mobile terminal users of each cell in the area, to obtain the final site location.
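The weighted query in S5 can be pictured as a weighted sum over candidate sites. The sketch below is only an illustration under stated assumptions: the text does not say how the weights are combined, so the linear score, the `Candidate` class, the factor names and the convention that every factor is normalised to [0, 1] with higher meaning better are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    # ASSUMPTION: each factor is pre-normalised to [0, 1], higher is better
    # (so a high "rent" score means cheap rent).
    factors: dict


def rank_candidates(candidates, weights):
    """Return candidates sorted by normalised weighted score, best first."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    def score(c):
        return sum(w * c.factors.get(f, 0.0) for f, w in weights.items()) / total

    return sorted(candidates, key=score, reverse=True)


candidates = [
    Candidate("site_a", {"rent": 0.4, "consumption_level": 0.9}),
    Candidate("site_b", {"rent": 0.8, "consumption_level": 0.5}),
]
# An inquirer who cares three times more about consumption level than rent.
best = rank_candidates(candidates, {"rent": 1.0, "consumption_level": 3.0})[0]
```

Changing the weights changes the ranking, which is the point of letting the inquirer supply them.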
Further, in step S2, the constructed system includes a database unit, an information classification and analysis unit, and an information query unit;
constructing the database unit: constructing an address database (including a residential area database) containing a large number of addresses, and a business information database that is associated with the address database and stores, for each address, the business information an inquirer needs to know, and establishing an index;
constructing the information classification and analysis unit: spatially locating the users' addresses in the address database, and processing the data with the pollination heuristic clustering algorithm;
constructing the information query unit: comprehensively analyzing the influence of each factor on site selection, querying the index according to the business information of the existing merchants in the region, outputting the result as a report, and finally determining the merchant's site.
Further, in step S4, the information is classified and analyzed with the IPCA clustering algorithm to find the geographical location of the initial merchant address, specifically comprising the following steps:
S41: constructing a clustering statistical model based on the IPCA clustering algorithm;
S42: performing cluster analysis on the mobile terminal user data to obtain clusters, inferring user characteristics from each cluster, such as where the user lives and works, which restaurants the user often eats at and which subway the user often takes, and analyzing information such as the user's interests, hobbies and consumption records;
S43: information analysis: inferring the user's points of interest from the user's location information within a certain area, and combining the user's consumption records, the information of the merchant to be sited and the information of existing merchants to carry out merchant site selection;
S44: initial merchant site selection: taking the geographical position of each cell as a center, and using the objective function SSE of the k-means algorithm to minimize the total distance from the users of each cell to the merchant address, thereby calculating the initial position of the merchant.
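Step S44 can be illustrated with a small sketch. It assumes planar coordinates for the cell centres and weights each cell by its number of users (both are assumptions, as are the function names); under the SSE objective the minimiser is then simply the user-weighted centroid.

```python
def sse(site, cell_centers, user_counts):
    """User-weighted sum of squared distances from each cell centre to the site."""
    return sum(
        n * ((cx - site[0]) ** 2 + (cy - site[1]) ** 2)
        for (cx, cy), n in zip(cell_centers, user_counts)
    )


def initial_site(cell_centers, user_counts):
    """Weighted centroid of the cell centres, which minimises sse() above."""
    total = sum(user_counts)
    x = sum(n * cx for (cx, _), n in zip(cell_centers, user_counts)) / total
    y = sum(n * cy for (_, cy), n in zip(cell_centers, user_counts)) / total
    return (x, y)


# Hypothetical cells: three centres with 100, 300 and 100 users.
cells = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
users = [100, 300, 100]
site = initial_site(cells, users)
```

The site is pulled toward the most populous cell; any other candidate location gives a larger SSE for the same data.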
Further, in step S4, the IPCA clustering algorithm specifically includes:
1) according to the ecological process of insect-mediated plant pollination, the interaction between insect pollinators and plants is abstracted, and the clustering algorithm is formulated under the following assumptions: a) each datum represents a separate annual plant, and the attributes of each datum represent the floral traits of that plant; b) each individual agent represents an insect pollinator with the abilities of sensing, memory, foraging, carrying a pollen basket, and pollen extraction and deposition; c) the similarity between data equals the pollen grain similarity between different plant species; d) each plant dies after one year, but if it has obtained sufficient pollen to reproduce, its progeny either take the position of the parent or are randomly placed again anywhere in space; after all plants have evolved over repeated iterations, the positions of all plants are regarded as the clustering result of the data set;
2) the insects in the algorithm are agents, and the plants are mobile terminal data points; the foraging activity of an insect comprises pollination and pollen collection, wherein pollination means that the insect randomly scatters a certain amount of pollen from its pollen basket onto the stigma of the current plant, which represents the exchange of information and of similarity between an agent and a mobile terminal data point; pollen collection means that the insect loads pollen obtained from the current plant into its pollen basket, which represents an agent obtaining data information from a mobile terminal data point;
3) in the initial stage, a data set and a number of agents are selected, each data point in the data set represents a mobile terminal data point, and the feature vector of the data point represents the features of the mobile terminal data; the positions of all agents and mobile terminal data points are randomly distributed in the space, and when the algorithm starts each agent randomly selects one data point as its foraging object;
4) the agent flies to the mobile terminal data point closest to its own position to carry out foraging activity;
5) counting the total number N of data points obtained by a mobile terminal data point from the agents, selecting the Euclidean distance as the measure of characteristic difference between data, and calculating the similarity S between data points with a Laplace kernel function;
6) recording the past foraging history of the agent, and adding the mobile terminal data point to the agent's foraging history after the foraging process is completed;
7) determining whether the length of the linear queue exceeds the memory depth $M_d$ of the agent; if it does, deleting the labels of the data points in the excess part; if not, leaving the linear queue unchanged;
8) judging whether the number of foraging trips has reached a threshold; if the agent's foraging count has not reached the threshold, the agent continues to select, for foraging, the mobile terminal data point that does not appear in its foraging history and is closest to it on the grid; if the foraging count has reached the threshold, counting the total number N of data points obtained by each mobile terminal data point from the agent and the similarity S between the data points; after the foraging counts of all agents have reached the upper limit, calculating the total data benefit obtained by each data point from all agents from the data benefit obtained by each mobile terminal data point from each agent; and calculating the survival probability of the current mobile terminal data point;
9) judging whether the current position of each data point changes by comparing its survival probability $P_i$ at the current position with a random number $R_1$; if $P_i > R_1$ for a data point i, the mobile terminal data point at that position survives and its current position remains unchanged; if $P_i < R_1$, the mobile terminal data point at that position dies, and the number of other mobile terminal data points in the neighborhood of the data point is counted;
10) if the number of mobile terminal data points in the neighborhood of mobile terminal data point i is not zero, calculating the neighborhood fitness $Fit_i$ of the mobile terminal data point; otherwise, using the global position update strategy to generate for the mobile terminal data point a new position that does not coincide with any currently surviving mobile terminal data point, to replace the position of the dead data point;
11) comparing the neighborhood fitness $Fit_i$ of data point i with a random number $R_2$; if $Fit_i > R_2$, the next-year descendant of the data point stays at the same position; if $Fit_i < R_2$, mobile terminal data point i is reassigned, using the global position update strategy, to a position $position_1$ that does not coincide with the position of any currently surviving mobile terminal data point; the local position update strategy is then used to adjust $position_1$ locally, so that the mobile terminal data point is updated to a position $position_2$ with higher fitness;
12) when the algorithm has iterated 1000 times, counting the number $N_c$ of data points whose positions changed in the iteration result; if $N_c$ is higher than 0.2 times $N_c$, updating the parameter α to α + 0.04; if $N_c$ is less than 0.2 times $N_c$, leaving α unchanged;
13) when a certain condition is met, the algorithm terminates;
14) when each mobile terminal data point has obtained its final clustering position on the grid, the clusters are obtained from the final positions of the mobile terminal data.
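The agent-side bookkeeping in steps 4), 6), 7) and 8) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Agent` class and its method names are hypothetical, and it assumes that the "excess part" of the linear queue is its oldest entries, which the text does not state.

```python
import math
from collections import deque


class Agent:
    def __init__(self, position, memory_depth):
        self.position = position
        # ASSUMPTION: once the queue exceeds the memory depth M_d, the oldest
        # labels are the ones dropped; deque(maxlen=...) does this automatically.
        self.history = deque(maxlen=memory_depth)

    def next_target(self, points):
        """Index of the nearest data point not in the foraging history, or None."""
        candidates = [i for i in range(len(points)) if i not in self.history]
        if not candidates:
            return None
        return min(candidates, key=lambda i: math.dist(self.position, points[i]))

    def forage(self, points):
        """Fly to the chosen data point and record its label in the history."""
        target = self.next_target(points)
        if target is not None:
            self.position = points[target]
            self.history.append(target)
        return target


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
agent = Agent(position=(0.2, 0.0), memory_depth=2)
visits = [agent.forage(points) for _ in range(4)]  # foraging threshold of 4 trips
```

With a memory depth of 2 the agent forgets point 0 after visiting points 1 and 2, so it is free to return to point 0 on the fourth trip.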
Further, in step 3), the measure of the characteristic difference between the pollens is the Euclidean distance between their feature vectors,
wherein $x_i$ represents the feature vector containing all attribute values of the pollen of mobile terminal data point i, $x_i = (x_{i1}, x_{i2}, x_{i3}, \ldots, x_{in})$, and $x_j$ represents the feature vector containing all attribute values of the pollen of mobile terminal data point j, $x_j = (x_{j1}, x_{j2}, x_{j3}, \ldots, x_{jn})$;
the similarity between pollens, namely the similarity between mobile terminal data points, is measured with the Laplace kernel function, wherein α denotes a similarity parameter, i.e. an adaptive parameter.
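The text names the Euclidean distance and the Laplace kernel, but the formulas themselves are not reproduced here. Assuming the standard textbook forms, they could be written as below; the symbols $d$ and $S_{ij}$ and, in particular, the placement of α in the denominator of the exponent are assumptions rather than the patent's own notation.

```latex
d(x_i, x_j) = \sqrt{\sum_{k=1}^{n} \left(x_{ik} - x_{jk}\right)^2},
\qquad
S_{ij} = \exp\!\left(-\frac{d(x_i, x_j)}{\alpha}\right)
```

Under this reading $S_{ij}$ lies in $(0, 1]$, equals 1 for identical feature vectors, and decays as the characteristic difference grows.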
Further, in step 3), the measure of the characteristic difference between the pollens is the Euclidean distance between their feature vectors,
wherein $x_i$ represents the feature vector containing all attribute values of the pollen of plant i, $x_i = (x_{i1}, x_{i2}, x_{i3}, \ldots, x_{in})$, and $x_j$ represents the feature vector containing all attribute values of the pollen of plant j, $x_j = (x_{j1}, x_{j2}, x_{j3}, \ldots, x_{jn})$; the greater the characteristic difference between pollens, the lower their similarity, and vice versa;
the similarity between pollens is measured with the Laplace kernel function, where α denotes the similarity parameter, i.e. the adaptive parameter of the algorithm, whose initial value is set to 0.06.
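A short sketch of the similarity computation, using the initial value α = 0.06 stated above. The kernel form `exp(-d / alpha)` is the common Laplace kernel and is an assumption here, since the exact formula is not reproduced in the text; the function names are hypothetical.

```python
import math

ALPHA_INITIAL = 0.06  # initial value of the similarity parameter given in the text


def feature_difference(x_i, x_j):
    """Euclidean distance between two feature vectors of equal length."""
    return math.dist(x_i, x_j)


def similarity(x_i, x_j, alpha=ALPHA_INITIAL):
    # ASSUMPTION: Laplace kernel in the form exp(-d / alpha).
    return math.exp(-feature_difference(x_i, x_j) / alpha)


p = (0.0, 0.0)
near = (0.03, 0.04)   # distance 0.05 from p
far = (0.3, 0.4)      # distance 0.5 from p
s_near = similarity(p, near)
s_far = similarity(p, far)
```

As the text requires, the larger characteristic difference yields the lower similarity, and a point compared with itself has similarity 1.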
Further, in step 6), the data benefit obtained from the agent by each mobile terminal data point is:
the total data benefit obtained by each data point from all agents is:
wherein $D_{nk}$ denotes the amount of data that mobile terminal data point i receives from the k-th mobile terminal data point, $S_{ik}$ denotes the similarity between mobile terminal data point i and mobile terminal data point k, the pollination amount in the algorithm is a fixed value, m denotes the number of agents that have visited mobile terminal data point i, and n denotes that data point i has received, via the agents, data from n mobile terminal data points;
the survival probability of the current data point is:
wherein $P_0$ denotes the initial pollen yield, which is related to the neighborhood radius of the data points in the algorithm; its initial value is set to 0.79.
Further, in step 8), the neighborhood fitness $Fit_i$ of data point i is determined by the local density, with the calculation formula:
wherein ZOI represents the neighborhood size of mobile terminal data point i, and t represents the number of mobile terminal data points within the neighborhood range of mobile terminal data point i.
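The fitness formula itself is not reproduced in the text, so the sketch below covers only its input t, the number of data points inside the neighborhood of point i. Treating ZOI as a circular radius around the point's grid position is an assumption, as is the function name.

```python
import math


def neighbours_in_zoi(i, positions, zoi_radius):
    """Indices of the other data points lying within zoi_radius of point i."""
    return [
        j
        for j, p in enumerate(positions)
        if j != i and math.dist(positions[i], p) <= zoi_radius
    ]


# Hypothetical grid positions of four data points.
positions = [(0, 0), (1, 0), (0, 2), (5, 5)]
t = len(neighbours_in_zoi(0, positions, zoi_radius=2))
```

A point with t = 0 has no neighbors, which is the case step 10) handles with the global position update strategy.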
Further, in step 9), the position of a given mobile terminal data point is denoted $x_1$, and $x_i$ denotes the other data points within the neighborhood of that mobile terminal data point, where i = 2, 3, 4; the adjustment is as follows:
then $X_1$ is the position $position_2$ obtained by the local position update strategy; the formula uses the characteristic difference between mobile terminal data point i and mobile terminal data point j, $position_i$ represents the current location of data point i, and Neighborhood represents the neighborhood range of mobile terminal data point i.
Further, in step 11), meeting a certain condition specifically means either of the following:
A. the iteration result shows that the position of no data point has changed;
B. the number of iterations of the algorithm reaches a threshold.
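The two termination conditions amount to a loop of the following shape. This is a generic skeleton under stated assumptions: `step` stands in for one full iteration of the clustering algorithm, and both it and `run_until_converged` are hypothetical names.

```python
def run_until_converged(step, positions, max_iterations):
    """Apply step() until no position changes (A) or the iteration cap is hit (B).

    step(positions) must return the new list of positions.
    Returns the final positions and the number of iterations performed.
    """
    for iteration in range(1, max_iterations + 1):
        new_positions = step(positions)
        if new_positions == positions:       # condition A: nothing moved
            return new_positions, iteration
        positions = new_positions
    return positions, max_iterations         # condition B: threshold reached


# Toy stand-in for one iteration: every coordinate moves one step toward 0.
toy_step = lambda ps: [max(p - 1, 0) for p in ps]
converged = run_until_converged(toy_step, [3, 1], max_iterations=10)
capped = run_until_converged(toy_step, [3, 1], max_iterations=2)
```

The first call stops on condition A after the positions stop changing; the second stops on condition B before convergence.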
The beneficial effects of the invention are as follows. The invention adopts a pollination-inspired clustering method in which the number of clusters emerges automatically from the feature relationships among data objects, which makes the classification more principled. It effectively finds the points of interest that users visit frequently and further mines users' interests and hobbies, so that merchants can provide better marketing services, users' living needs are met, and merchants benefit. In addition, entering and analyzing the business information associated with each address makes merchant site selection fast and more controllable.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a flow chart of a method for locating a merchant according to the present invention;
FIG. 2 is a flow chart of a pollination heuristic clustering algorithm for information classification and analysis;
FIG. 3 is a flow chart of a clustering algorithm for pollination elicitation;
FIG. 4 is a diagram of the effect of data location distribution before clustering;
FIG. 5 is a diagram showing the effect of data position distribution after clustering.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
Referring to FIG. 1 to FIG. 5, and as shown in FIG. 1, the merchant site selection system according to the present invention includes the following units. A database unit: a database containing a large amount of mobile terminal user information and the associated addresses is constructed, together with a business information database that stores, for each address, the business information inquirers want to know, and an index is established. An information classification and analysis unit: the location data of the user terminals and the consumption data of the mobile terminals are analyzed to discover users' interests and hobbies. An information query unit: the influence of each factor on site selection is comprehensively analyzed, the index is queried according to the business information of the merchants in the region, the result is output as a report, and the merchant's site is finally determined.
The business information data corresponding to each spatially located address at every level is ordered in time; that is, the business information data of each address over a period is sorted chronologically and entered into a table in parallel. The time-ordered data in the table is then summarized statistically and the changes in the data are analyzed; after summarization, the analysis is output as charts or as an analysis report. In the information query unit, input is selected according to the weight of each factor, the address or business information to be queried is looked up in the index, and charts and analysis reports related to the address or merchant information can also be output.
The main business information data to be collected in the early stage comprises a macro investment environment requirement data group module and a micro environment requirement data group module. The macro investment environment requirement data group module comprises a population data group, a business data group, an income level data group and a resident consumption level data group; it is mainly used to screen out macro environments in which merchants at the associated addresses would perform poorly, so that profit can be realized. The micro environment requirement data group module comprises a passenger flow data group, a rent data group, a merchant mix data group and a property management data group; it is mainly used to screen out micro environments in which merchants at the associated addresses would perform poorly. Both the micro and the macro environment requirements serve the profit and sustainable development of merchants, and on this basis the living needs of resident consumers can also be met.
The raw data undergoes a series of preprocessing steps, such as collection, extraction, cleaning and sorting, to form high-quality data. The data in the invention is user mobile terminal data, such as the GeoLife data set (mobile terminal positioning data). The invention prunes the data based on speed, screening out meaningless points recorded while the user is moving, so as to obtain the desired data for further analysis. The analysis of the processed data with the pollination-inspired clustering algorithm (IPCA) is described in detail below.
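Speed-based pruning of positioning data can be sketched as follows. The text does not give the pruning rule, so this is one plausible reading: a point is kept only if the speed from the preceding recorded point is at or below a threshold. The threshold value, the `(timestamp, latitude, longitude)` tuple layout and both function names are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def prune_by_speed(track, max_speed_mps):
    """Keep the first point, then every point reached at <= max_speed_mps.

    track: list of (timestamp_seconds, lat, lon) tuples sorted by time.
    """
    kept = track[:1]
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        if haversine_m(lat0, lon0, lat1, lon1) / dt <= max_speed_mps:
            kept.append((t1, lat1, lon1))
    return kept


# Hypothetical track: the third point is a jump of roughly 1.1 km in 10 s.
track = [
    (0, 39.90000, 116.4),
    (10, 39.90001, 116.4),
    (20, 39.91000, 116.4),
    (30, 39.91001, 116.4),
]
kept = prune_by_speed(track, max_speed_mps=2.0)
```

A low threshold such as the 2 m/s used here keeps points where the user is nearly stationary, which are the ones likely to mark places of interest.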
As shown in FIG. 1, the merchant site selection system according to the present invention is constructed as follows:
The first step: data acquisition. A survey data set of mobile terminal users in, for example, a certain area or street is collected, including information such as place of residence, income level and consumption level. The builder manually surveys the business information data corresponding to each address, such as the types of industry in the region and the regional rent levels.
The second step: system construction. A platform providing functions such as an integrated website and a database is built by integrating the database unit, the information classification and analysis unit and the information query unit.
The third step: database information input. The collected information is entered into the database promptly, and the information in the database is continuously updated as time passes.
The fourth step: classification and analysis of database information. The data in the database is classified and sorted, users' interests and hobbies are mined with the pollination-inspired clustering algorithm, the business investment prospect of each address is judged, and an initial merchant address is selected using the SSE principle combined with a geographic information system analysis method. As shown in FIG. 2, the specific process of information classification and analysis includes:
S41: constructing a clustering statistical model based on the IPCA clustering algorithm;
S42: performing cluster analysis on the mobile terminal user data to obtain clusters, inferring user characteristics from each cluster, such as where the user lives and works, which restaurants the user often eats at and which subway the user often takes, and analyzing information such as the user's interests, hobbies and consumption records;
S43: information analysis: inferring the user's points of interest from the user's location information within a certain area, and combining the user's consumption records, the information of the merchant to be sited and the information of existing merchants to carry out merchant site selection;
S44: initial merchant site selection: taking the geographical position of each cell as a center, and using the objective function SSE of the k-means algorithm to minimize the total distance from the users of each cell to the merchant address, thereby calculating the initial position of the merchant.
The fifth step: intelligent query. After the inquirer enters the address to be queried into the system and enters the weights of the relevant factors, such as rent, the final site location is obtained.
As shown in FIG. 3, the IPCA clustering algorithm specifically includes:
1) according to the ecological process of insect-mediated plant pollination, the interaction between insect pollinators and plants is abstracted, and the clustering algorithm is formulated under the following assumptions: a) each datum represents a separate annual plant, and the attributes of each datum represent the floral traits of that plant; b) each individual agent represents an insect pollinator with the abilities of sensing, memory, foraging, carrying a pollen basket, and pollen extraction and deposition; c) the similarity between data equals the pollen grain similarity between different plant species; d) each plant dies after one year, but if it has obtained sufficient pollen to reproduce, its progeny either take the position of the parent or are randomly placed again anywhere in space; after all plants have evolved over repeated iterations, the positions of all plants are regarded as the clustering result of the data set;
2) the insects in the algorithm are agents, and the plants are mobile terminal data points; the foraging activity of an insect comprises pollination and pollen collection, wherein pollination means that the insect randomly scatters a certain amount of pollen from its pollen basket onto the stigma of the current plant, which represents the exchange of information and of similarity between an agent and a mobile terminal data point; pollen collection means that the insect loads pollen obtained from the current plant into its pollen basket, which represents an agent obtaining data information from a mobile terminal data point;
3) in the initial stage, a data set and a number of agents are selected, each data point in the data set represents a mobile terminal data point, and the feature vector of the data point represents the features of the mobile terminal data; the positions of all agents and mobile terminal data points are randomly distributed in the space, and when the algorithm starts each agent randomly selects one data point as its foraging object;
4) the agent flies to the mobile terminal data point closest to its own position to carry out foraging activity;
5) counting the total number N of data points obtained by a mobile terminal data point from the agents, selecting the Euclidean distance as the measure of characteristic difference between data, and calculating the similarity S between data points with a Laplace kernel function;
6) recording the past foraging history of the agent, and adding the mobile terminal data point to the agent's foraging history after the foraging process is completed;
7) determining whether the length of the linear queue exceeds the memory depth M_d of the agent; if the length of the linear queue exceeds M_d, deleting the labels of the data points in the excess part; if not, leaving the linear queue unchanged;
8) judging whether the number of foraging trips has reached a threshold; if an agent's number of foraging trips has not reached the threshold, the agent continues to select, for foraging, the closest mobile terminal data point on the grid that does not already appear in its foraging history; if it has reached the threshold, the total number N of data points obtained by each mobile terminal data point from the agent and the similarity S between the data points are counted; after the foraging trips of all agents have reached the upper limit, the total data yield that each data point obtains from all agents is calculated from the data yield that each mobile terminal data point obtains from each agent, and the survival probability of the current mobile terminal data point is calculated;
9) judging whether the current position of each data point changes by comparing the survival probability P_i of the data point at its current position with a random number R_1; if P_i > R_1 for a data point i, the mobile terminal data point at that position survives and its current position remains unchanged; if P_i < R_1, the mobile terminal data point at that position dies, and the number of other mobile terminal data points in the neighborhood of the data point is counted;
10) if the number of mobile terminal data points in the neighborhood of mobile terminal data point i is not zero, calculating the neighborhood fitness Fit_i of the mobile terminal data point; if it is zero, using the global position update strategy to generate for the mobile terminal data point a new position that does not coincide with any currently surviving mobile terminal data point, to replace the position of the dead data point;
11) comparing the neighborhood fitness Fit_i of data point i with a random number R_2; if Fit_i > R_2, the mobile terminal data point keeps its position, that is, the next-year offspring of the data point occupies that position; if Fit_i < R_2, data point i is reassigned by the global position update strategy to a position position_1 that does not coincide with the position of any currently surviving mobile terminal data point, and position_1 is then locally adjusted by the local position update strategy so that the mobile terminal data point is updated to a position position_2 with higher fitness;
The algorithm terminates when the following conditions are satisfied:
A. if the iteration result shows that the positions of all data points are not changed;
B. the number of iterations of the algorithm reaches a threshold.
12) when the algorithm has iterated 1000 times, counting the number N_c of data points whose positions changed in the iteration result; if N_c is higher than 0.2 times N_c, parameter α is updated to α + 0.04; if N_c is lower than 0.2 times N_c, α remains unchanged;
13) when a certain condition is met, the algorithm is terminated;
14) once each mobile terminal data point has obtained its final clustering position on the grid, obtaining the clusters from the final positions of the mobile terminal data points.
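The agent-side mechanics of steps 4) to 11) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the kernel width `sigma`, the `Agent` class, and the function names are hypothetical; the sketch assumes that the labels dropped when the memory depth M_d is exceeded are the oldest ones; and the survival probability P_i and neighborhood fitness Fit_i are passed in as numbers because their formulas are not reproduced in this text.

```python
import math
from collections import deque


def similarity(x_i, x_j, sigma=1.0):
    """Laplace-kernel similarity on the Euclidean distance (step 5).

    sigma is an assumed kernel width; the text does not give a value.
    """
    return math.exp(-math.dist(x_i, x_j) / sigma)


class Agent:
    """Hypothetical agent with a bounded foraging memory (steps 6 and 7)."""

    def __init__(self, position, memory_depth):
        self.position = position
        # Assumption: when the queue exceeds M_d, the oldest labels are dropped.
        self.history = deque(maxlen=memory_depth)

    def next_target(self, positions):
        """Label of the nearest data point not in the foraging history."""
        unvisited = [label for label in positions if label not in self.history]
        if not unvisited:
            return None
        return min(
            unvisited,
            key=lambda label: math.dist(self.position, positions[label]),
        )

    def forage(self, label, positions):
        """Fly to the data point and record it in the foraging history."""
        self.position = positions[label]
        self.history.append(label)


def position_action(p_i, neighbor_count, fit_i, r1, r2):
    """Decision flow of steps 9 to 11 for one data point.

    r1 and r2 stand for the random numbers R_1 and R_2.
    """
    if p_i > r1:
        return "keep"               # the data point survives in place
    if neighbor_count == 0:
        return "global"             # dead, no neighbors: global relocation
    if fit_i > r2:
        return "keep"               # offspring occupies the same position
    return "global_then_local"      # relocate globally, then adjust locally
```

With `positions = {"a": (0, 0), "b": (1, 0), "c": (5, 5)}` and an agent at `(0, 0)` with memory depth 2, the agent visits "a", "b" and "c" in that order; by then "a" has dropped out of its memory and becomes eligible again.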
Example 1
As shown in fig. 2, in the information classification and analysis of the present invention, the IPCA clustering algorithm is used to discover the interests and hobbies of users, and a preliminary merchant location is obtained by combining the users' consumption records. The implementation includes the following steps. In this embodiment, the data points represent the data points of user mobile terminals in a certain area and the insects represent agents; the clustering algorithm is explained as follows:
1) selecting a mobile terminal positioning data set, such as the GeoLife data set; the number of data points represents the number of mobile terminal positioning points, each data point in the data set represents a mobile terminal position, and the feature vector of the data point represents the features of that position, such as longitude, latitude and elevation. Each datum is given a color according to its features; a number of agents are selected at the same time, and all agents and data points are randomly distributed in a 40 × 40 grid space, as shown in fig. 4.
2) clustering the mobile terminal data points by using the IPCA algorithm to obtain clusters.
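The random initial placement described in step 1) might look like the following Python sketch. The function name `init_grid` and the `seed` parameter are hypothetical, and the sketch assumes that data points occupy distinct grid cells (suggested by the later requirement that relocated positions must not coincide with surviving data points), while agents may share cells; the text itself only says that both are randomly distributed.

```python
import random


def init_grid(n_points, n_agents, grid=40, seed=None):
    """Randomly place data points and agents on a grid x grid lattice.

    Returns (point_pos, agent_pos): a dict mapping data point index to a
    (row, col) cell, and a list of (row, col) cells for the agents.
    """
    rng = random.Random(seed)
    cells = [(r, c) for r in range(grid) for c in range(grid)]
    if n_points > len(cells):
        raise ValueError("more data points than grid cells")
    # Assumption: each data point gets its own cell.
    point_pos = dict(enumerate(rng.sample(cells, n_points)))
    # Assumption: agents may start on any cell, including occupied ones.
    agent_pos = [rng.choice(cells) for _ in range(n_agents)]
    return point_pos, agent_pos
```

For example, `init_grid(100, 10, seed=0)` places 100 data points on 100 distinct cells of the 40 × 40 grid and returns 10 agent start cells.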
Example 2
1) As shown in fig. 5, data analysis is performed on each cluster to obtain information such as the interests and hobbies of the users (for example, a user likes Zhou Hei Ya duck), and each user's consumption at the preferred restaurants (such as Zhou Hei Ya) is analyzed to judge how much the user likes different restaurants.
2) Determining the initial site selection position: the residence information of the users in the data set is queried with a Geographic Information System (GIS), the central geographic position of each cell is taken as A_1, A_2, A_3, …, A_l, and C_1, C_2, C_3, …, C_k respectively represent the alternative merchant positions in the area; the objective function of k-means is then used to obtain the corresponding distance for each position and thereby determine the initial address of the catering shop. The formula is as follows:
The SSE(a_i) values obtained above are sorted in ascending order to obtain a preliminary site selection sequence Position_i; the position with the minimum SSE(a_i) is the desired initial address Position_0.
3) After the initial site selection result is obtained, the position of the shop is further adjusted according to factors such as the consumption level D_1 of the users, the rental situation D_2 of each area (rents differ between locations) and the traffic situation D_3, each weighted by a weight such as ω_1, ω_2, ω_3, to obtain the final position of the shop. The calculation formula is as follows:
Position_last,i = (ω_1 × D_1 + ω_2 × D_2 + ω_3 × D_3 + …) × Position_i
If the value of Position_0 (the minimum of the preliminary sequence) is still the minimum of the sequence after all factors are taken into account, the final site selection result is Position_0. Otherwise, the preliminary sequence is updated from the other values of Position_i after the factors are taken into account. The adjusted values Position_last,i obtained in this way are sorted, and the most suitable value after sorting, Position_last (typically the minimum of the sequence), is taken as the final site.
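Steps 2) and 3) can be illustrated with a short Python sketch. It rests on several assumptions that the text does not settle: the SSE of a candidate site is taken as the sum of squared Euclidean distances from that site to all cell centers (the usual k-means objective with a single center); Position_i is treated as the SSE score of candidate i; the factor values D_1, D_2, D_3 are assumed to be scaled so that a smaller adjusted score is better; and the names `sse` and `select_site` are hypothetical.

```python
def sse(candidate, cell_centers):
    """Sum of squared Euclidean distances from a candidate site to all cell centers."""
    cx, cy = candidate
    return sum((cx - x) ** 2 + (cy - y) ** 2 for x, y in cell_centers)


def select_site(candidates, cell_centers, factors, weights):
    """Return (preliminary_site, final_site).

    factors[i] holds the assumed factor values (D_1, D_2, D_3, ...) of
    candidate i, and weights holds the matching (w_1, w_2, w_3, ...).
    """
    scores = [sse(c, cell_centers) for c in candidates]
    indices = range(len(candidates))
    # Preliminary choice: the candidate with the smallest SSE.
    preliminary = candidates[min(indices, key=scores.__getitem__)]
    # Adjusted score: (w_1*D_1 + w_2*D_2 + w_3*D_3 + ...) * score_i.
    adjusted = [
        sum(w * d for w, d in zip(weights, f)) * s
        for f, s in zip(factors, scores)
    ]
    # Final choice: the candidate with the smallest adjusted score.
    final = candidates[min(indices, key=adjusted.__getitem__)]
    return preliminary, final
```

With candidates `(0, 0)` and `(2, 0)` and cell centers `(0, 0)` and `(0, 2)`, the SSE scores are 4 and 12, so `(0, 0)` is the preliminary choice. If the second candidate has much more favorable factor values, for example `(0.2, 0.2, 0.2)` against `(1, 1, 1)` with weights `(0.5, 0.3, 0.2)`, its adjusted score falls below that of the first and it becomes the final choice.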
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.
Claims (9)
1. A merchant site selection method based on pollination heuristic clustering is characterized by comprising the following steps:
S1: acquiring data: collecting a data set of mobile terminal users by surveying a certain area or street;
S2: constructing a system: integrating the database unit, the information classification and analysis unit and the information query unit into a platform that presents an integrated website and database;
S3: inputting database information: entering the collected information into the database promptly, and continuously updating the information in the database over time;
S4: classifying and analyzing database information: classifying and sorting the data in the database, mining the interests and hobbies of users by using the pollination-inspired clustering algorithm (IPCA), judging the business investment prospects of different addresses, and then selecting an initial merchant address by using the SSE principle and a geographic information system analysis method;
S5: intelligent query: the inquirer inputs the address to be queried into the system and then inputs the weights of the relevant factors to obtain the final site selection.
2. The merchant site selection method based on pollination heuristic clustering as claimed in claim 1, wherein in step S2, the constructed system comprises a database unit, an information classification and analysis unit and an information query unit;
constructing a database unit: constructing an address database containing a large number of addresses, and a business information database that is associated with the address database and stores the information an inquirer needs to know for each address, and establishing an index;
constructing an information classification and analysis unit: spatially locating the users' addresses in the address database, and clustering the data by using the pollination-inspired clustering algorithm;
constructing an information query unit: comprehensively analyzing the influence of each factor on site selection, querying the index according to the business information of the existing merchants in the region, outputting the result as a report, and finally determining the merchant's site.
3. The merchant site selection method based on pollination heuristic clustering as claimed in claim 1, wherein in step S4, an IPCA clustering algorithm is used to classify and analyze the information to find the geographical location of the initial merchant site selection, specifically comprising the steps of:
S41: constructing a clustering statistical model based on the IPCA clustering algorithm;
S42: performing cluster analysis on the mobile terminal user data to obtain clusters, inferring the characteristics of the users from each cluster, and analyzing the users' interests, hobbies and consumption records;
S43: information analysis: inferring the points of interest of users from their position information in a certain area, and combining the users' consumption records, the information of the merchant to be sited and the information of the existing merchants to carry out the merchant site selection;
S44: preliminary merchant site selection: taking the geographical position of each cell as a center, and using the objective function SSE of the k-means algorithm to minimize the total distance from the users of each cell to the merchant address, thereby calculating the initial position of the merchant.
4. The merchant location method based on pollination heuristic clustering as claimed in claim 3, wherein in step S4, the IPCA clustering algorithm specifically comprises:
1) based on the ecological process of insect-mediated plant pollination, the interaction between insect pollinators and plants is abstracted, and the clustering algorithm is formulated under the following assumptions: a) each datum represents an individual annual plant, and the attributes of each datum represent the floral traits of that plant; b) each agent represents an insect pollinator with the abilities of sensing, memory and foraging, a pollen basket, and pollen extraction and deposition; c) the similarity between data is equal to the pollen grain similarity between different plant species; d) each plant dies after one year, but if it obtains sufficient pollen to reproduce, its progeny either take the position of the parent or are randomly re-placed anywhere in the space; after all plants have undergone repeated iterative evolution, the positions of all plants are regarded as the clustering result of the data set;
2) the insects in the algorithm are agents, and the plants are mobile terminal data points; the foraging activity of an insect comprises pollination and pollen collection, wherein pollination means that the insect randomly scatters a certain amount of pollen from its pollen basket onto the stigma of the current plant, which represents the exchange of information and of similarity between an agent and a mobile terminal data point; pollen collection means that the insect loads pollen obtained from the current plant into its pollen basket, which represents the agent obtaining data information from a mobile terminal data point;
3) in the initial stage, a data set and a number of agents are selected, each data point in the data set represents a mobile terminal data point, and the feature vector of the data point represents the features of the mobile terminal data; the positions of all agents and mobile terminal data points are randomly distributed in the space, and each agent randomly selects one data point as its foraging object when the algorithm starts;
4) each agent flies to the mobile terminal data point closest to its own position to carry out foraging activity;
5) counting the total number N of data points obtained by a mobile terminal data point from an agent, selecting the Euclidean distance as the measure of feature difference between data, and calculating the similarity S between the data points by using a Laplace kernel function;
6) recording the foraging history of the agent, and adding the mobile terminal data point to the agent's foraging history after the foraging process is completed;
7) determining whether the length of the linear queue exceeds the memory depth M_d of the agent; if the length of the linear queue exceeds M_d, deleting the labels of the data points in the excess part; if not, leaving the linear queue unchanged;
8) judging whether the number of foraging trips has reached a threshold; if an agent's number of foraging trips has not reached the threshold, the agent continues to select, for foraging, the closest mobile terminal data point on the grid that does not already appear in its foraging history; if it has reached the threshold, the total number N of data points obtained by each mobile terminal data point from the agent and the similarity S between the data points are counted; after the foraging trips of all agents have reached the upper limit, the total data yield that each data point obtains from all agents is calculated from the data yield that each mobile terminal data point obtains from each agent, and the survival probability of the current mobile terminal data point is calculated;
9) judging whether the current position of each data point changes by comparing the survival probability P_i of the data point at its current position with a random number R_1; if P_i > R_1 for a data point i, the mobile terminal data point at that position survives and its current position remains unchanged; if P_i < R_1, the mobile terminal data point at that position dies, and the number of other mobile terminal data points in the neighborhood of the data point is counted;
10) if the number of mobile terminal data points in the neighborhood of mobile terminal data point i is not zero, calculating the neighborhood fitness Fit_i of the mobile terminal data point; if it is zero, using the global position update strategy to generate for the mobile terminal data point a new position that does not coincide with any currently surviving mobile terminal data point, to replace the position of the dead data point;
11) comparing the neighborhood fitness Fit_i of data point i with a random number R_2; if Fit_i > R_2, the mobile terminal data point keeps its position, that is, the next-year offspring of the data point occupies that position; if Fit_i < R_2, data point i is reassigned by the global position update strategy to a position position_1 that does not coincide with the position of any currently surviving mobile terminal data point, and position_1 is then locally adjusted by the local position update strategy so that the mobile terminal data point is updated to a position position_2 with higher fitness;
12) when the algorithm has iterated 1000 times, counting the number N_c of data points whose positions changed in the iteration result; if N_c is higher than 0.2 times N_c, parameter α is updated to α + 0.04; if N_c is lower than 0.2 times N_c, α remains unchanged;
13) when a certain condition is met, the algorithm is terminated;
14) once each mobile terminal data point has obtained its final clustering position on the grid, obtaining the clusters from the final positions of the mobile terminal data points.
5. The merchant site selection method based on pollination heuristic clustering as claimed in claim 4, wherein in step 3), the measure of the feature difference between the pollens is:
wherein x_i represents the feature vector containing all attribute values of the pollen of mobile terminal data point i, x_i = (x_i1, x_i2, x_i3, …, x_in), and x_j represents the feature vector containing all attribute values of the pollen of mobile terminal data point j, x_j = (x_j1, x_j2, x_j3, …, x_jn);
6. The merchant location method based on pollination heuristic clustering as claimed in claim 5, wherein in step 6), the data yield obtained from the agent for each mobile terminal data point is:
the total data yield from all agents for each data point is:
wherein D_nk represents the amount of data that mobile terminal data point i receives from the k-th mobile terminal data point, S_ik represents the similarity between mobile terminal data point i and mobile terminal data point k, the pollination amount in the algorithm is a fixed value, m represents the number of agents that have visited mobile terminal data point i, and n represents the number of mobile terminal data points from which data point i receives data via the agents;
the probability of survival for the current data point is:
wherein P_0 represents the initial pollen yield, i.e. the initial data yield.
7. The merchant location method based on pollination heuristic clustering as claimed in claim 6, wherein in step 8), the neighborhood fitness Fit_i of data point i is determined by the local density and is calculated as:
wherein ZOI represents the neighborhood size of mobile terminal data point i, and t represents the number of mobile terminal data points within the neighborhood range of mobile terminal data point i.
8. The merchant location method based on pollination heuristic clustering as claimed in claim 7, wherein in step 9), the position of a given mobile terminal data point is denoted x_1, and x_i (i = 2, 3, 4, …) denotes the other data points within the neighborhood of that mobile terminal data point; the adjustment is as follows:
then X_1 is position_2, the position after updating by the local position update strategy; wherein the distance term in the formula represents the feature difference between mobile terminal data point i and mobile terminal data point j, position_i represents the current position of data point i, and Neighborhood represents the neighborhood range of mobile terminal data point i.
9. The merchant site selection method based on pollination heuristic clustering as claimed in claim 8, wherein in step 11), said meeting a certain condition specifically comprises:
A. if the iteration result shows that the positions of all data points are not changed;
B. the number of iterations of the algorithm reaches a threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010065995.5A CN111242697B (en) | 2020-01-20 | 2020-01-20 | Merchant site selection method based on pollination heuristic clustering |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010065995.5A CN111242697B (en) | 2020-01-20 | 2020-01-20 | Merchant site selection method based on pollination heuristic clustering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111242697A true CN111242697A (en) | 2020-06-05 |
CN111242697B CN111242697B (en) | 2022-09-09 |
Family
ID=70879759
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010065995.5A Active CN111242697B (en) | 2020-01-20 | 2020-01-20 | Merchant site selection method based on pollination heuristic clustering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111242697B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116028829A (en) * | 2021-01-20 | 2023-04-28 | 国义招标股份有限公司 | Correction clustering processing method, device and storage medium based on transmission step length adjustment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090150345A1 (en) * | 2007-12-06 | 2009-06-11 | Leviathan Entertainment | Web Domain Data Replication System |
CN102842088A (en) * | 2012-07-12 | 2012-12-26 | 孙军 | Intelligent business address selection system |
CN105516928A (en) * | 2016-01-15 | 2016-04-20 | 中国联合网络通信有限公司广东省分公司 | Position recommending method and system based on position crowd characteristics |
CN106599928A (en) * | 2016-12-22 | 2017-04-26 | 重庆邮电大学 | Bio-Inspired adaptive clustering method |
CN106650790A (en) * | 2016-11-21 | 2017-05-10 | 中国科学院东北地理与农业生态研究所 | Remote sensing image cluster method based on swarm intelligence |
CN109764884A (en) * | 2019-01-02 | 2019-05-17 | 北京科技大学 | A kind of school bus paths planning method and device for planning |
CN110309887A (en) * | 2019-07-09 | 2019-10-08 | 哈尔滨理工大学 | Based on the Fuzzy C-Means Clustering method for detecting abnormality for improving flower pollination |
Non-Patent Citations (7)
Title |
---|
HONG CHUN, QU 等: "A New Clustering Algorithm Inspired by Insect Pollination", 《2019 CHINESE AUTOMATION CONGRESS (CAC)》 * |
YANG X.: "Flower pollination algorithm for global optimization", 《INTERNATIONAL CONFERENCE ON UNCONVENTIONAL COMPUTATION AND NATURAL COMPUTATION》 * |
吕强: "基于虫媒传粉启发的自适应聚类算法研究", 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》 * |
吴晶晶: "传粉策略在聚类算法中的研究与应用", 《中国优秀博硕士学位论文全文数据库(硕士) 信息科技辑》 * |
张姣: "基于智能算法的聚类算法研究", 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》 * |
王永贵 等: "一种优化聚类的协同过滤推荐算法", 《计算机工程与应用》 * |
陶志勇 等: "基于改进花朵授粉的K-均值聚类算法", 《HTTPS://KNS.CNKI.NET/KCMS/DETAIL/51.1196.TP.20180811.1324.022.HTML》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116028829A (en) * | 2021-01-20 | 2023-04-28 | 国义招标股份有限公司 | Correction clustering processing method, device and storage medium based on transmission step length adjustment |
CN116028829B (en) * | 2021-01-20 | 2023-10-24 | 国义招标股份有限公司 | Correction clustering processing method, device and storage medium based on transmission step length adjustment |
Also Published As
Publication number | Publication date |
---|---|
CN111242697B (en) | 2022-09-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||