CN111242697A - Merchant site selection method and system based on pollination heuristic clustering - Google Patents


Info

Publication number
CN111242697A
Authority
CN
China
Prior art keywords
mobile terminal
data point
information
terminal data
data
Legal status
Granted
Application number
CN202010065995.5A
Other languages
Chinese (zh)
Other versions
CN111242697B (en)
Inventor
屈洪春
吴晶晶
吕强
张兴成
尹力
Current Assignee
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202010065995.5A
Publication of CN111242697A
Application granted
Publication of CN111242697B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0282 Rating or review of business operators or products
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0204 Market segmentation
    • G06Q30/0205 Location or geographical consideration

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a merchant site selection method and system based on pollination heuristic clustering, in the field of data mining and machine learning. The method comprises the following steps: data acquisition: collecting a data set from a survey of mobile terminal users in a given area or street; system construction: integrating a database unit, an information classification and analysis unit, and an information query unit into a platform that provides an integrated website and database; database entry: entering the collected information into the database promptly and keeping it updated; classification and analysis of database information: classifying and sorting the data in the database, mining users' interests and hobbies with the IPCA algorithm, assessing the business investment prospects of each address, and then selecting an initial merchant address using the SSE (sum of squared errors) criterion and a geographic information system; intelligent query: entering the address to be queried into the system, together with the weights of the relevant factors, to obtain the final site. The invention enables fast and efficient selection of commercial sites.

Description

Merchant site selection method and system based on pollination heuristic clustering
Technical Field
The invention belongs to the field of data mining and machine learning, and relates to a merchant site selection method and system based on pollination heuristic clustering.
Background
At present, when selecting a site for a business such as a restaurant, a retail store, or a vending machine, merchants generally decide where to set up the store or equipment by manually counting passenger flow. This usually requires a large investment of labor and time, so site selection is inefficient. Moreover, K-means, the most common clustering method in merchant site selection, requires the number of clusters to be specified at the start, whereas in many practical applications that number cannot be determined in advance.
Therefore, a method is needed that can determine the best merchant location when the number of clusters is unknown.
Disclosure of Invention
In view of this, the present invention provides a method that combines a pollination-heuristic clustering algorithm with Geographic Information System (GIS) spatial analysis. It addresses two problems of conventional site selection: K-means clustering cannot be used when the number of clusters cannot be predicted, and manually counting passenger flow is inefficient.
In order to achieve the purpose, the invention provides the following technical scheme:
A merchant site selection method based on pollination heuristic clustering specifically comprises the following steps:
S1: data acquisition: collect a data set from a survey of mobile terminal users in a given area, street, or similar, including information such as place of residence, income level, and consumption level;
S2: system construction: integrate a database unit, an information classification and analysis unit, and an information query unit into a platform that provides an integrated website, a database, and related functions;
S3: database entry: enter the collected information into the database promptly and keep updating it over time;
S4: classification and analysis of database information: classify and sort the data in the database, mine users' interests and hobbies with the pollination-inspired clustering algorithm (IPCA), assess the business investment prospects of each address, and then select an initial merchant address using the SSE criterion and GIS analysis;
S5: intelligent query: the inquirer enters the address to be queried into the system, together with the weights of relevant factors such as rent and the consumption level of mobile terminal users in each cell of the area, to obtain the final site.
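Step S5 does not specify how the weighted factors are combined. One plausible reading is a weighted sum over normalized factors; the sketch below assumes min-max normalization, treats rent as a cost (lower is better), and uses hypothetical field names (`rent`, `consumption_level`, `foot_traffic`). It illustrates the idea only and is not the patent's scoring rule.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical candidate site with three illustrative factors."""
    name: str
    rent: float               # cost factor: lower is better
    consumption_level: float  # consumption level of nearby users: higher is better
    foot_traffic: float       # passenger flow: higher is better


def min_max(values):
    """Scale values to [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_sites(candidates, weights):
    """Return (score, name) pairs sorted best first, using an ASSUMED weighted sum."""
    rent = min_max([c.rent for c in candidates])
    cons = min_max([c.consumption_level for c in candidates])
    flow = min_max([c.foot_traffic for c in candidates])
    scored = []
    for c, r, s, f in zip(candidates, rent, cons, flow):
        score = (weights["rent"] * (1.0 - r)  # cheaper rent scores higher
                 + weights["consumption_level"] * s
                 + weights["foot_traffic"] * f)
        scored.append((score, c.name))
    scored.sort(reverse=True)
    return scored
```

The inquirer's weights enter only through the `weights` dictionary, so changing the emphasis on rent versus foot traffic reorders the candidates without touching the data.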
Further, in step S2, the constructed system includes a database unit, an information classification and analysis unit, and an information query unit;
Constructing the database unit: construct an address database (including a residential-area database) containing a large number of addresses, together with an associated business information database that stores, for each address, the business information the inquirer needs, and build an index;
Constructing the information classification and analysis unit: spatially locate the user addresses in the address database and process the data with the pollination heuristic clustering algorithm;
Constructing the information query unit: comprehensively analyze how each factor influences site selection, query the index according to the business information of existing merchants in the region, output the results as a report, and finally determine the merchant's site.
Further, in step S4, classifying and analyzing the information by using an IPCA clustering algorithm to find the geographical location of the initial merchant address, specifically comprising the following steps:
S41: build a clustering statistical model based on the IPCA clustering algorithm;
S42: perform cluster analysis on the mobile terminal user data to obtain clusters, infer user characteristics from each cluster, such as where the user lives and works, which restaurants they eat at frequently, and which subway stations they use frequently, and analyze information such as the user's interests, hobbies, and consumption records;
S43: information analysis: infer the user's points of interest from their location information within the area, and combine them with the user's consumption records, information on the merchant seeking a site, and information on existing merchants to select the merchant's site;
S44: initial merchant site selection: taking the geographical position of each cell as a center, minimize the total distance from the users of each cell to the merchant address using the SSE objective function of the k-means algorithm, thereby computing the initial merchant position.
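Step S44 minimizes the k-means SSE objective, which sums squared distances. For a single site serving several cells, the point that minimizes that sum is the user-weighted centroid of the cells (minimizing plain, unsquared distances would instead give the geometric median). The sketch below assumes planar (projected) coordinates and a hypothetical `(x, y, users)` tuple per cell.

```python
def sse(site, cells):
    """Sum of squared distances from every user in each cell to the site."""
    sx, sy = site
    return sum(n * ((x - sx) ** 2 + (y - sy) ** 2) for x, y, n in cells)


def initial_site(cells):
    """User-weighted centroid of the cells: the minimizer of sse() above."""
    total = sum(n for _, _, n in cells)
    if total == 0:
        raise ValueError("cells must contain at least one user")
    cx = sum(x * n for x, _, n in cells) / total
    cy = sum(y * n for _, y, n in cells) / total
    return cx, cy
```

For example, cells at (0, 0) and (4, 0) with one user each and a cell at (0, 4) with two users give an initial site of (1.0, 2.0).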
Further, in step S4, the IPCA clustering algorithm specifically includes:
1) Based on the ecological process of insect-mediated plant pollination, the interaction between insect pollinators and plants is abstracted into a clustering algorithm under the following assumptions: a) each datum represents an individual annual plant, and the attributes of each datum represent the flower traits of that plant; b) each agent represents an insect pollinator with sensing, memory, foraging, a pollen basket, and pollen extraction and deposition; c) the similarity between data equals the pollen-grain similarity between different plant species; d) each plant dies after one year, but if it receives enough pollen to reproduce, its offspring either replace the parent's position or are placed at a random position anywhere in the space. After all plants have evolved over repeated iterations, their final positions are taken as the clustering result of the data set;
2) In the algorithm, insects are agents and plants are mobile terminal data points. Insect foraging comprises pollination and pollen collection. Pollination means the insect randomly scatters some pollen from its pollen basket onto the stigma of the current plant, representing the agent communicating information, including similarity, to the mobile terminal data point. Pollen collection means the insect loads pollen from the current plant into its pollen basket, representing the agent obtaining data information from the mobile terminal data point;
3) In the initial stage, a data set and a number of agents are selected; each data point in the data set is a mobile terminal data point, and its feature vector represents the features of the mobile terminal data. All agents and mobile terminal data points are placed at random positions in the space, and when the algorithm starts, each agent randomly selects one data point as its foraging target;
4) Each agent flies to the mobile terminal data point closest to its position and forages there;
5) Count the total number N of data points that a mobile terminal data point receives from an agent, use Euclidean distance as the measure of feature difference between data, and compute the similarity S between data points with a Laplace kernel function;
6) Record the agent's past foraging history: when a foraging visit completes, append that mobile terminal data point to the agent's foraging history;
7) Determine whether the length of the linear queue (the foraging history) exceeds the agent's memory depth M_d. If it does, delete the labels of the excess data points; otherwise, leave the queue unchanged;
8) Determine whether the number of foraging visits has reached the threshold. If not, the agent continues foraging at the closest mobile terminal data point on the grid that is not already in its foraging history. If it has, count the total number N of data points each mobile terminal data point has received from the agent and the similarity S between data points. After all agents have reached the foraging limit, compute each data point's total data benefit from all agents from the benefit it obtained from each agent, then compute the survival probability of the current mobile terminal data point;
9) Compare the survival probability P_i at each data point's current position with a random number R_1 to decide whether the position changes. If P_i > R_1 for data point i, the mobile terminal data point at that position survives and its current position is unchanged; if P_i < R_1, the data point at that position dies, and the number of other mobile terminal data points in its neighborhood is counted;
10) If the number of mobile terminal data points in the neighborhood of data point i is not zero, compute its neighborhood fitness Fit_i; otherwise, use the global position update strategy to generate a new position for it, one that does not coincide with any currently surviving mobile terminal data point, to replace the position of the dead data point;
11) Compare the neighborhood fitness Fit_i of data point i with a random number R_2. If Fit_i > R_2, the data point's offspring occupies the same position in the next year. If Fit_i < R_2, data point i is reassigned by the global position update strategy to a position position_1 that does not coincide with any currently surviving mobile terminal data point, and the local position update strategy then adjusts position_1 locally so that the data point moves to a position position_2 with higher fitness;
12) When the algorithm has iterated 1000 times, count the number N_c of data points whose positions changed in the iteration result. If N_c is greater than 0.2 times N, update the parameter α to α + 0.04; if N_c is less than 0.2 times N, α remains unchanged;
13) When a termination condition is met, the algorithm stops;
14) Once each mobile terminal data point has reached its final clustering position on the grid, the clusters are obtained from the final positions of the mobile terminal data.
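Steps 4) and 6) to 8) describe an agent that repeatedly flies to the nearest data point it has not recently visited, keeping a bounded history of length M_d. A minimal sketch, assuming Euclidean positions and that the oldest labels are the ones dropped when the history overflows (a `deque` with `maxlen` does exactly that); the class and method names are hypothetical.

```python
import math
from collections import deque


class Agent:
    """Hypothetical pollinator agent: position, bounded memory, foraging counter."""

    def __init__(self, position, memory_depth):
        self.position = position
        # Linear queue of visited point labels; oldest labels drop off beyond M_d.
        self.memory = deque(maxlen=memory_depth)
        self.forage_count = 0

    def choose_target(self, points):
        """Index of the nearest data point not in memory, or None if all are remembered."""
        best, best_d = None, math.inf
        for idx, p in enumerate(points):
            if idx in self.memory:
                continue
            d = math.dist(self.position, p)
            if d < best_d:
                best, best_d = idx, d
        return best

    def forage(self, points, max_forages):
        """Visit nearest unremembered points until the foraging threshold; return visit order."""
        visits = []
        while self.forage_count < max_forages:
            target = self.choose_target(points)
            if target is None:
                break
            self.position = points[target]  # fly to the data point
            self.memory.append(target)      # record it in the foraging history
            self.forage_count += 1
            visits.append(target)
        return visits
```

With a memory depth of 2, an agent that has visited points 1, 0, 2 in that order remembers only [0, 2], so point 1 becomes eligible again on its next visit.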
Further, in step 5), the characteristic difference between pollen grains is measured as:
$$d(x_i, x_j) = \sqrt{\sum_{k=1}^{n} (x_{ik} - x_{jk})^2}$$
where x_i is the feature vector of all attribute values contained in the pollen of mobile terminal data point i, x_i = (x_i1, x_i2, x_i3, …, x_in), and x_j is the corresponding feature vector for mobile terminal data point j, x_j = (x_j1, x_j2, x_j3, …, x_jn);
The similarity between pollen grains, i.e. between mobile terminal data points, is measured with a Laplace kernel function:
Figure BDA0002375974820000042
where α denotes the similarity parameter, i.e. the adaptive parameter.
Further, in step 5), the characteristic difference between pollen grains is measured as:
$$d(x_i, x_j) = \sqrt{\sum_{k=1}^{n} (x_{ik} - x_{jk})^2}$$
where x_i is the feature vector of all attribute values contained in the pollen of plant i, x_i = (x_i1, x_i2, x_i3, …, x_in), and x_j is the corresponding feature vector for plant j, x_j = (x_j1, x_j2, x_j3, …, x_jn). The larger the characteristic difference between pollen grains, the lower their similarity, and vice versa;
The similarity between pollen grains is measured with a Laplace kernel function:
Figure BDA0002375974820000044
where α denotes the similarity parameter, i.e. the adaptive parameter of the algorithm, whose initial value is set to 0.06.
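The Laplace-kernel formula survives only as an image reference, so the sketch below assumes the common form S_ij = exp(-α · d(x_i, x_j)) built on the Euclidean distance, with the stated initial α = 0.06. Function names are illustrative; the sketch only demonstrates the stated property that a larger feature difference gives a lower similarity.

```python
import math


def feature_difference(xi, xj):
    """Euclidean distance between two feature vectors (pollen traits)."""
    if len(xi) != len(xj):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))


def similarity(xi, xj, alpha=0.06):
    """Laplace-kernel similarity in the ASSUMED form exp(-alpha * d); 1.0 for identical vectors."""
    return math.exp(-alpha * feature_difference(xi, xj))
```

Because α multiplies the distance, raising α (as step 12) does) makes similarity fall off faster with feature difference.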
Further, in step 8), the data benefit that each mobile terminal data point obtains from an agent is:
Figure BDA0002375974820000045
and the total data benefit that each data point obtains from all agents is:
Figure BDA0002375974820000046
where D_nk denotes the amount of data that mobile terminal data point i receives from the k-th mobile terminal data point, S_ik denotes the similarity between mobile terminal data points i and k, the pollination amount is a fixed value in the algorithm, m denotes the number of agents that have visited data point i, and n denotes the number of mobile terminal data points from which data point i has received data via the agents;
The survival probability of the current data point is:
Figure BDA0002375974820000047
where P_0 denotes the initial pollen yield, which is related to the neighborhood radius of the data points in the algorithm; its initial value is set to 0.79.
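The benefit and survival-probability formulas appear only as image references, so the sketch below takes the survival probabilities P_i as given and implements only the decision rule of step 9): a data point survives when P_i > R_1, with R_1 drawn uniformly from [0, 1). Ties (P_i = R_1) are treated as deaths here, an assumption the text does not settle; the function name is hypothetical.

```python
import random


def survival_step(survival_probs, rng=None):
    """Step 9) sketch: point i survives if P_i > R_1, R_1 ~ U[0, 1).

    Returns (survivors, dead) as lists of indices. Pass a seeded
    random.Random for reproducible runs.
    """
    rng = rng or random.Random()
    survivors, dead = [], []
    for i, p in enumerate(survival_probs):
        r1 = rng.random()
        if p > r1:
            survivors.append(i)  # position kept unchanged
        else:
            dead.append(i)       # neighborhood count follows in step 10)
    return survivors, dead
```

Points with P_i = 1.0 always survive and points with P_i = 0.0 always die, which makes the rule easy to check in isolation.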
Further, in step 10), the neighborhood fitness Fit_i of data point i is determined by its local density:
Figure BDA0002375974820000051
Figure BDA0002375974820000052
where ZOI denotes the neighborhood size of mobile terminal data point i, and t denotes the number of mobile terminal data points within the neighborhood of data point i.
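Step 10) branches on whether data point i has any neighbors within its ZOI. The fitness formula itself is not recoverable here, but the neighbor count t can be sketched. This assumes ZOI is a Euclidean radius in the clustering space; the function names are hypothetical.

```python
import math


def neighbors_in_zoi(i, positions, zoi):
    """Indices of the other data points within distance `zoi` of point i.

    len(result) is the neighbor count t.
    """
    xi = positions[i]
    return [j for j, xj in enumerate(positions) if j != i and math.dist(xi, xj) <= zoi]


def needs_global_relocation(i, positions, zoi):
    """Step 10): a dead point with no neighbors is relocated by the global strategy."""
    return len(neighbors_in_zoi(i, positions, zoi)) == 0
```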
Further, in step 11), let x_1 denote the position of a given mobile terminal data point and x_i (i = 2, 3, 4) the other data points within its neighborhood; the position is adjusted as follows:
Figure BDA0002375974820000053
Figure BDA0002375974820000054
Figure BDA0002375974820000055
X_1 is then the position position_2 obtained by the local position update strategy, where
Figure BDA0002375974820000056
denotes the characteristic difference between mobile terminal data points i and j, position_i denotes the current location of data point i, and Neighborhood denotes the neighborhood range of mobile terminal data point i.
Further, in step 13), the termination condition is met when either of the following holds:
A. the positions of all data points are unchanged in an iteration;
B. the number of iterations of the algorithm reaches a threshold.
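Step 12) and the termination conditions above reduce to two small checks. This sketch reads N in step 12) as the total number of data points (an assumption; the extracted text is garbled at that point) and uses the stated increment of 0.04 and ratio of 0.2; the function names are hypothetical.

```python
def update_alpha(alpha, n_changed, n_total, step=0.04, ratio=0.2):
    """Step 12): raise alpha by `step` when more than `ratio` of the points moved."""
    if n_changed > ratio * n_total:
        return alpha + step
    return alpha  # N_c at or below the ratio: alpha unchanged


def should_terminate(n_changed, iteration, max_iterations):
    """Condition A: no data point moved; condition B: iteration cap reached."""
    return n_changed == 0 or iteration >= max_iterations
```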
Beneficial effects of the invention: the invention adopts a pollination-inspired clustering method in which the number of clusters emerges automatically from the feature relationships among data objects, making the classification more scientific. It can effectively find the points of interest users visit frequently and thereby mine their interests and hobbies, helping merchants market their services better, meet users' everyday needs, and profit. In addition, entering and analyzing business information tied to address information makes merchant site selection faster and its influencing factors more controllable.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a flow chart of a method for locating a merchant according to the present invention;
FIG. 2 is a flow chart of a pollination heuristic clustering algorithm for information classification and analysis;
FIG. 3 is a flow chart of a clustering algorithm for pollination elicitation;
FIG. 4 is a diagram of the effect of data location distribution before clustering;
FIG. 5 is a diagram showing the effect of data position distribution after clustering.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
Referring to FIGS. 1 to 5, as shown in FIG. 1, the merchant site selection system of the present invention includes: a database unit, which constructs a database containing a large amount of mobile terminal user information and the associated addresses, stores a business information database holding, for each address, the information inquirers want to know, and builds an index; an information classification and analysis unit, which analyzes the users' terminal location data and mobile terminal consumption data to find their interests and hobbies; and an information query unit, which comprehensively analyzes how each factor influences site selection, queries the index according to the business information of merchants in the region, outputs the results as a report, and finally determines the merchant's site.
The business information data for each spatially located address, at each address level, is ordered in time: the business information of each address over a period is sorted chronologically and entered side by side into a table. The time-ordered data in the table is then summarized statistically to analyze how it changes, and the analysis is output as charts or an analysis report. Information query unit: queries are weighted according to the weight of each factor; the address or business information to be queried is looked up in the index, and charts and analysis reports on that address or merchant can also be output.
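The time-sequencing described above can be sketched as grouping records by address, sorting each group chronologically, and differencing consecutive periods. The record layout `(address, period, value)` with ISO-style period strings (so that lexicographic order is chronological) is an assumption for illustration, as are the function names.

```python
from collections import defaultdict


def build_time_table(records):
    """Group (address, period, value) records by address, each sorted by period."""
    table = defaultdict(list)
    for address, period, value in records:
        table[address].append((period, value))
    for rows in table.values():
        rows.sort()  # ISO-style periods such as "2019-01" sort chronologically
    return dict(table)


def period_changes(rows):
    """Change in value between consecutive periods for one address."""
    return [(later[0], later[1] - earlier[1]) for earlier, later in zip(rows, rows[1:])]
```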
The main business information to be collected in the early stage is grouped into a macro investment-environment data module and a micro-environment data module. The macro module comprises population, business, income level, and resident consumption level data groups; it is mainly used to screen out addresses whose broader business environment is poor, so that the selected site can be profitable. The micro module comprises passenger flow, rent, merchant mix, and property management data groups; it is mainly used to screen out addresses whose immediate merchant environment is poor. Together, the micro and macro requirements serve the merchants' profit and sustainable development, and on that basis resident consumers' everyday needs are also met.
The raw data undergoes a series of preprocessing steps, including collection, extraction, cleaning, and sorting, to produce high-quality data. In the invention, the data is users' mobile terminal data, for example the GeoLife data set (mobile terminal positioning data). The invention prunes the data based on speed, removing meaningless points, such as those recorded while the user is travelling, to obtain the desired data for further analysis. The analysis of the processed data with the pollination heuristic clustering algorithm (IPCA) is described in detail below.
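Speed-based pruning of GeoLife-style trajectories might look like the sketch below. It assumes fixes as `(timestamp_seconds, lat, lon)` sorted by time, uses the haversine great-circle distance, and keeps only fixes whose speed from the previous fix is at or below a hypothetical threshold `max_speed_mps`, i.e. points where the user is roughly stationary. The threshold and the keep/drop rule are illustrative, not values from the patent.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def prune_by_speed(points, max_speed_mps=1.0):
    """Keep fixes where the user is (nearly) stationary.

    The first fix is always kept; each later fix is dropped when its speed
    from the previous raw fix exceeds max_speed_mps.
    """
    if not points:
        return []
    kept = [points[0]]
    for prev, cur in zip(points, points[1:]):
        dt = cur[0] - prev[0]
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp
        speed = haversine_m(prev[1], prev[2], cur[1], cur[2]) / dt
        if speed <= max_speed_mps:
            kept.append(cur)
    return kept
```

A jump of 0.01 degrees of latitude in 60 seconds (about 18.5 m/s) is dropped, while repeated fixes at the same spot are kept.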
As shown in fig. 1, the merchant location system according to the present invention is constructed as follows:
The first step: data acquisition: collect a data set from a survey of mobile terminal users in, for example, a given area or street, including information such as place of residence, income level, and consumption level. The builder manually surveys the business information corresponding to each address, such as the types of industry in the region and the regional rent levels.
The second step: system construction: integrate the database unit, the information classification and analysis unit, and the information query unit into a platform that provides an integrated website, a database, and related functions.
The third step: database entry: enter the collected information into the database promptly and keep updating it over time.
The fourth step: classification and analysis of database information: classify and sort the data in the database, mine users' interests and hobbies with the pollination-inspired clustering algorithm, assess the business investment prospects of each address, and select an initial merchant address using the SSE criterion combined with GIS analysis. As shown in FIG. 2, the information classification and analysis process specifically includes:
S41: build a clustering statistical model based on the IPCA clustering algorithm;
S42: perform cluster analysis on the mobile terminal user data to obtain clusters, infer user characteristics from each cluster, such as where the user lives and works, which restaurants they eat at frequently, and which subway stations they use frequently, and analyze information such as the user's interests, hobbies, and consumption records;
S43: information analysis: infer the user's points of interest from their location information within the area, and combine them with the user's consumption records, information on the merchant seeking a site, and information on existing merchants to select the merchant's site;
S44: initial merchant site selection: taking the geographical position of each cell as a center, minimize the total distance from the users of each cell to the merchant address using the SSE objective function of the k-means algorithm, thereby computing the initial merchant position.
The fifth step: intelligent query: after the inquirer enters the address to be queried into the system and the weights of relevant factors such as rent, the final site is obtained.
As shown in fig. 3, the IPCA clustering algorithm specifically includes:
1) Based on the ecological process of insect-mediated plant pollination, the interaction between insect pollinators and plants is abstracted, and the clustering algorithm is formulated under the following assumptions: a) each datum represents an individual annual plant, and the attributes of each datum represent the floral traits of that plant; b) each agent represents an insect pollinator with sensing, memory, foraging, a pollen basket, and pollen extraction and deposition; c) the similarity between data equals the similarity of pollen grains between different plant species; d) each plant dies after one year, but if it receives enough pollen to reproduce, its offspring take the parent's position or are placed again at random anywhere in the space. After all plants have evolved over repeated iterations, their positions are taken as the clustering result for the data set;
2) The insects in the algorithm are agents, and the plants are mobile terminal data points. The foraging activity of an insect comprises pollination and pollen collection: pollination means that the insect randomly scatters a certain amount of pollen from its pollen basket onto the stigma of the current plant, representing the exchange of information, including similarity, between an agent and a mobile terminal data point; pollen collection means that the insect loads pollen obtained from the current plant into its pollen basket, representing an agent obtaining data information from a mobile terminal data point;
3) In the initial stage, a data set and a number of agents are selected; each data point in the data set represents a mobile terminal data point, and its feature vector represents the features of the mobile terminal data. All agents and mobile terminal data points are placed at random positions in the space, and when the algorithm starts, each agent randomly selects one data point as its foraging target;
4) Each agent flies to the mobile terminal data point closest to its position and forages there;
5) The total number N of data points that a mobile terminal data point obtains from the agents is counted; the Euclidean distance is used to measure the feature difference between data, and the similarity S between data points is calculated with a Laplace kernel function;
6) The agent's past foraging history is recorded: after each foraging visit, the mobile terminal data point is appended to the agent's foraging history;
7) Whether the length of this linear queue exceeds the agent's memory depth M_d is determined; if it does, the labels of the excess data points are deleted; otherwise, the queue is left unchanged;
8) Whether the number of foraging visits has reached the threshold is determined. If an agent's foraging count has not reached the threshold, the agent continues foraging at the closest mobile terminal data point on the grid that is not in its foraging history. If it has, the total number N of data points that each mobile terminal data point obtained from the agent and the similarity S between data points are counted. After all agents have reached the foraging limit, the total data yield that each data point obtained from all agents is calculated from the data yield it obtained from each agent, and the survival probability of the current mobile terminal data point is calculated;
9) Whether the current position of each data point changes is decided by comparing the survival probability P_i at its current position with a random number R_1. If P_i > R_1 for data point i, the mobile terminal data point at that position survives and keeps its current position; if P_i < R_1, the mobile terminal data point at that position dies, and the number of other mobile terminal data points in its neighborhood is counted;
10) If the number of mobile terminal data points in the neighborhood of mobile terminal data point i is not zero, the neighborhood fitness Fit_i of the data point is calculated; otherwise, a global position update strategy generates for the data point a new position that does not coincide with any currently surviving mobile terminal data point, replacing the position of the dead data point;
11) The fitness Fit_i of data point i is compared with a random number R_2. If Fit_i > R_2, the data point's offspring for the next year occupies the same position. If Fit_i < R_2, data point i is reassigned by the global position update strategy to a position position_1 that does not coincide with any currently surviving mobile terminal data point; position_1 is then locally adjusted with the local position update strategy, so that the mobile terminal data point moves to a position position_2 with higher fitness;
The algorithm terminates when either of the following conditions is satisfied:
A. the iteration result shows that the positions of all data points no longer change;
B. the number of iterations of the algorithm reaches a threshold.
12) After the algorithm has iterated 1000 times, the number N_c of data points whose positions changed in the iteration result is counted; if N_c is higher than 0.2 times the total number of data points, the parameter α is updated to α + 0.04; if N_c is lower than that, α remains unchanged;
13) The algorithm terminates when a termination condition (A or B above) is met;
14) Once each mobile terminal data point has reached its final clustering position on the grid, the clusters are obtained from the final positions of the mobile terminal data.
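The per-step mechanics of steps 4) to 12) can be sketched as a set of small Python helpers. This is an illustration under stated assumptions, not the patented implementation. The Laplace kernel is assumed to take the form exp(-α·d). The "excess" memory labels are assumed to be the oldest ones. "0.2 times" is read as 0.2 × the total number of data points. The survival-probability, fitness and local-update formulas, which the source gives only as images, are left out, so `p_i` and `fit_i` are passed in as inputs. All function names are hypothetical.

```python
import math
import random
from collections import Counter, deque


def feature_distance(xi, xj):
    """Step 5: Euclidean distance between two feature vectors."""
    if len(xi) != len(xj):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))


def laplace_similarity(xi, xj, alpha):
    """Step 5: Laplace-kernel similarity, ASSUMED here to be exp(-alpha * d)."""
    return math.exp(-alpha * feature_distance(xi, xj))


def forage(agent_pos, grid_pos, max_forages, memory_depth):
    """Steps 4 and 6-8: repeatedly visit the nearest data point (on the grid)
    that is not in the agent's memory; memory is a FIFO queue of length M_d
    (oldest labels dropped, an assumption). Returns visit counts per point."""
    memory = deque(maxlen=memory_depth)
    visits = Counter()
    pos = agent_pos
    for _ in range(max_forages):
        candidates = [i for i in range(len(grid_pos)) if i not in memory]
        if not candidates:
            break
        target = min(candidates, key=lambda i: math.dist(pos, grid_pos[i]))
        visits[target] += 1
        memory.append(target)
        pos = grid_pos[target]
    return visits


def survives(p_i, rng):
    """Step 9: the point keeps its position if P_i > R_1, R_1 ~ U[0, 1)."""
    return p_i > rng.random()


def global_position(occupied, grid_size, rng):
    """Steps 10-11: a random grid cell not held by a surviving data point."""
    free = [(x, y) for x in range(grid_size) for y in range(grid_size)
            if (x, y) not in occupied]
    if not free:
        raise ValueError("no free cell left on the grid")
    return rng.choice(free)


def offspring_position(current, neighbour_count, fit_i, occupied, grid_size, rng):
    """Steps 10-11 for a dead point: keep the position if it has neighbours
    and Fit_i > R_2, otherwise relocate globally (local adjustment omitted)."""
    if neighbour_count > 0 and fit_i > rng.random():
        return current
    return global_position(occupied, grid_size, rng)


def update_alpha(n_changed, n_points, alpha, ratio=0.2, step=0.04):
    """Step 12: raise alpha by `step` when more than ratio * n_points moved."""
    return alpha + step if n_changed > ratio * n_points else alpha


def should_stop(prev, new, iteration, max_iterations):
    """Conditions A/B: no position changed, or the iteration cap is reached."""
    return prev == new or iteration >= max_iterations
```

Keeping grid positions (used for flying to the nearest point) separate from feature vectors (used for similarity) mirrors the text, where agents move on the grid while similarity is computed from the data attributes.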
Example 1
As shown in fig. 2, the information classification and analysis of the present invention uses the IPCA clustering algorithm to discover users' interests and hobbies, and the preliminary merchant location is obtained by combining these with the users' consumption records. In this embodiment, the data points represent user mobile terminal data points in a given area, and the insects represent agents. The clustering algorithm is implemented as follows:
1) A mobile terminal positioning data set, such as the GeoLife data set, is selected. The number of data points equals the number of mobile terminal positioning points; each data point in the data set represents one mobile terminal position, and its feature vector represents the features of that position, such as longitude, latitude and elevation. Data points are given different colors according to their features, a number of agents are selected, and all agents and data points are distributed at random in a 40 × 40 grid space, as shown in fig. 4.
2) The mobile terminal data points are clustered with the IPCA algorithm to obtain the clusters.
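The random initialization of Example 1 can be sketched as follows. The sketch assumes that data points occupy distinct grid cells, consistent with the later rule that relocated points must not coincide with surviving ones, while agents may share cells. Each agent also draws a random first foraging target, as in step 3) of the algorithm. `init_grid` and its parameters are hypothetical.

```python
import random


def init_grid(num_points, num_agents, grid_size=40, seed=None):
    """Place data points on distinct cells of a grid_size x grid_size grid,
    place agents on random cells, and give each agent a random first target."""
    rng = random.Random(seed)
    cells = [(x, y) for x in range(grid_size) for y in range(grid_size)]
    if num_points > len(cells):
        raise ValueError("more data points than grid cells")
    point_pos = rng.sample(cells, num_points)          # distinct cells
    agent_pos = [rng.choice(cells) for _ in range(num_agents)]
    first_target = [rng.randrange(num_points) for _ in range(num_agents)]
    return point_pos, agent_pos, first_target


if __name__ == "__main__":
    points, agents, targets = init_grid(100, 10, seed=3)
    print(len(points), len(agents), targets)
```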
Example 2
1) As shown in fig. 5, data analysis is performed on each cluster to obtain information such as the users' interests and hobbies (for example, a user likes Zhou Hei Ya braised duck), and each user's spending on these interests and at the corresponding restaurants (such as Zhou Hei Ya) is analyzed to judge how much the user likes different restaurants.
2) Determining the initial site: the residence information of the users in the data set is queried with a geographic information system (GIS), and the central geographic position of each residential community is selected; A_1, A_2, A_3, …, A_l and C_1, C_2, C_3, …, C_k respectively denote candidate merchant positions within the area. The k-means objective function is then used to obtain the corresponding distance for each position and determine the initial address of the catering shop. The formula is as follows:
[Equation image not reproduced in the source text: the SSE(A_i) objective]
The SSE(A_i) values obtained above are sorted in ascending order to obtain the preliminary site sequence Position_i; the site with the minimum SSE(A_i) is the desired initial address Position_0.
3) After the preliminary site selection result is obtained, the shop position is further adjusted according to factors such as the users' consumption level D_1, the rent in each area D_2 (rent differs by location) and the traffic situation D_3, each weighted by ω_1, ω_2, ω_3, …, to obtain the final shop position. The calculation formula is as follows:
$$\mathrm{Position}_{\mathrm{last},i} = (\omega_1 D_1 + \omega_2 D_2 + \omega_3 D_3 + \cdots) \times \mathrm{Position}_i$$
If Position_0 (the minimum of the sequence) still has the minimum value in the sequence after all factors are taken into account, the final site is Position_0. Otherwise, the other values in the preliminary site sequence Position_i are updated with the factors taken into account to form a new sequence. The adjusted values Position_last,i obtained in this way are sorted, and the most suitable value Position_last (typically the minimum of the sequence) is taken as the final site.
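The weighted re-ranking of step 3) can be sketched as below. The sketch assumes that Position_i in the formula stands for the SSE score of candidate site i, that every factor D_k is a per-site score where lower is better, and that the weighted factors are summed. `final_site` and the example values are hypothetical.

```python
def final_site(sse_by_site, factors_by_site, weights):
    """Multiply each site's SSE score by the weighted sum of its factors
    (omega_1*D_1 + omega_2*D_2 + ...), sort ascending, return the best site."""
    ranked = []
    for site, score in sse_by_site.items():
        factors = factors_by_site[site]
        if len(factors) != len(weights):
            raise ValueError(f"site {site!r}: expected {len(weights)} factors")
        multiplier = sum(w * d for w, d in zip(weights, factors))
        ranked.append((multiplier * score, site))
    ranked.sort(key=lambda t: t[0])
    return ranked[0][1], ranked


if __name__ == "__main__":
    best, ranked = final_site({"A": 2.0, "B": 4.0},
                              {"A": [1.0, 3.0], "B": [1.0, 0.5]},
                              [0.5, 0.5])
    print(best, ranked)
```

In this example, site A has the lower SSE (it plays the role of Position_0), but its high rent factor pushes it behind B after weighting. This is the "otherwise" branch described above, in which the final site is taken from the re-sorted sequence.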
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims (9)

1. A merchant site selection method based on pollination heuristic clustering is characterized by comprising the following steps:
s1: acquiring data: collecting a data set of the mobile terminal users in the surveyed area or street;
s2: constructing the system: integrating the database unit, the information classification and analysis unit and the information query unit into a platform that presents an integrated website and database;
s3: entering database information: entering the collected information into the database promptly and continuously updating the information in the database over time;
s4: classifying and analyzing database information: classifying and sorting the data in the database, mining the users' interests and hobbies with the insect pollination-inspired clustering algorithm (IPCA), judging the business investment prospects of different addresses, and then selecting an initial merchant address using the SSE criterion and geographic information system analysis;
s5: intelligent query: the inquirer enters the address to be queried into the system and then enters the weights of the relevant factors to obtain the final site.
2. The merchant site selection method based on pollination heuristic clustering as claimed in claim 1, wherein in step S2, the constructed system comprises a database unit, an information classification and analysis unit and an information query unit;
constructing the database unit: constructing an address database containing a large number of addresses, together with an associated business information database that stores, for each address, the business information the inquirer needs, and building an index;
constructing the information classification and analysis unit: spatially locating the users' addresses in the address database and clustering the data with the pollination-inspired clustering algorithm;
constructing the information query unit: comprehensively analyzing the influence of each factor on site selection, querying the index according to the business information of existing merchants in the region, outputting the results as a report, and finally determining the merchant's site.
3. The merchant site selection method based on pollination heuristic clustering as claimed in claim 1, wherein in step S4, an IPCA clustering algorithm is used to classify and analyze the information to find the geographical location of the initial merchant site selection, specifically comprising the steps of:
s41: constructing a clustering statistical model based on an IPCA clustering algorithm;
s42: performing cluster analysis on the mobile terminal user data to obtain clusters, inferring the characteristics of the users from each cluster, and analyzing the users' interests, hobbies and consumption records;
s43: information analysis: inferring the user's points of interest from the user's location information in a given area, and combining them with the user's consumption records, information on the merchant to be sited and information on existing merchants to select the merchant's site;
s44: preliminary merchant site selection: taking the geographic position of each residential community as a center, and using the k-means objective function SSE to minimize the total distance from the users of each community to the merchant address, thereby calculating the initial position of the merchant.
4. The merchant location method based on pollination heuristic clustering as claimed in claim 3, wherein in step S4, the IPCA clustering algorithm specifically comprises:
1) Based on the ecological process of insect-mediated plant pollination, the interaction between insect pollinators and plants is abstracted, and the clustering algorithm is formulated under the following assumptions: a) each datum represents an individual annual plant, and the attributes of each datum represent the floral traits of that plant; b) each agent represents an insect pollinator with sensing, memory, foraging, a pollen basket, and pollen extraction and deposition; c) the similarity between data equals the similarity of pollen grains between different plant species; d) each plant dies after one year, but if it receives enough pollen to reproduce, its offspring take the parent's position or are placed again at random anywhere in the space. After all plants have evolved over repeated iterations, their positions are taken as the clustering result for the data set;
2) The insects in the algorithm are agents, and the plants are mobile terminal data points. The foraging activity of an insect comprises pollination and pollen collection: pollination means that the insect randomly scatters a certain amount of pollen from its pollen basket onto the stigma of the current plant, representing the exchange of information, including similarity, between an agent and a mobile terminal data point; pollen collection means that the insect loads pollen obtained from the current plant into its pollen basket, representing an agent obtaining data information from a mobile terminal data point;
3) In the initial stage, a data set and a number of agents are selected; each data point in the data set represents a mobile terminal data point, and its feature vector represents the features of the mobile terminal data. All agents and mobile terminal data points are placed at random positions in the space, and when the algorithm starts, each agent randomly selects one data point as its foraging target;
4) Each agent flies to the mobile terminal data point closest to its position and forages there;
5) The total number N of data points that a mobile terminal data point obtains from the agents is counted; the Euclidean distance is used to measure the feature difference between data, and the similarity S between data points is calculated with a Laplace kernel function;
6) The agent's past foraging history is recorded: after each foraging visit, the mobile terminal data point is appended to the agent's foraging history;
7) Whether the length of this linear queue exceeds the agent's memory depth M_d is determined; if it does, the labels of the excess data points are deleted; otherwise, the queue is left unchanged;
8) Whether the number of foraging visits has reached the threshold is determined. If an agent's foraging count has not reached the threshold, the agent continues foraging at the closest mobile terminal data point on the grid that is not in its foraging history. If it has, the total number N of data points that each mobile terminal data point obtained from the agent and the similarity S between data points are counted. After all agents have reached the foraging limit, the total data yield that each data point obtained from all agents is calculated from the data yield it obtained from each agent, and the survival probability of the current mobile terminal data point is calculated;
9) Whether the current position of each data point changes is decided by comparing the survival probability P_i at its current position with a random number R_1. If P_i > R_1 for data point i, the mobile terminal data point at that position survives and keeps its current position; if P_i < R_1, the mobile terminal data point at that position dies, and the number of other mobile terminal data points in its neighborhood is counted;
10) If the number of mobile terminal data points in the neighborhood of mobile terminal data point i is not zero, the neighborhood fitness Fit_i of the data point is calculated; otherwise, a global position update strategy generates for the data point a new position that does not coincide with any currently surviving mobile terminal data point, replacing the position of the dead data point;
11) The fitness Fit_i of data point i is compared with a random number R_2. If Fit_i > R_2, the data point's offspring for the next year occupies the same position. If Fit_i < R_2, data point i is reassigned by the global position update strategy to a position position_1 that does not coincide with any currently surviving mobile terminal data point; position_1 is then locally adjusted with the local position update strategy, so that the mobile terminal data point moves to a position position_2 with higher fitness;
12) After the algorithm has iterated 1000 times, the number N_c of data points whose positions changed in the iteration result is counted; if N_c is higher than 0.2 times the total number of data points, the parameter α is updated to α + 0.04; if N_c is lower than that, α remains unchanged;
13) when a certain condition is met, the algorithm is terminated;
14) Once each mobile terminal data point has reached its final clustering position on the grid, the clusters are obtained from the final positions of the mobile terminal data.
5. The merchant site selection method based on pollination heuristic clustering as claimed in claim 4, wherein in step 3), the feature difference between pollen grains is measured as:
$$d(x_i, x_j) = \sqrt{\sum_{k=1}^{n} (x_{ik} - x_{jk})^2}$$
where x_i = (x_i1, x_i2, x_i3, …, x_in) is the feature vector containing all attribute values of the pollen of mobile terminal data point i, and x_j = (x_j1, x_j2, x_j3, …, x_jn) is the feature vector containing all attribute values of the pollen of mobile terminal data point j;
The similarity between pollen grains, i.e. between mobile terminal data points, is measured with a Laplace kernel function:
[Equation image not reproduced in the source text: Laplace kernel similarity between data points i and j]
Wherein α denotes a similarity parameter, i.e. an adaptation parameter.
6. The merchant location method based on pollination heuristic clustering as claimed in claim 5, wherein in step 6), the data yield obtained from the agent for each mobile terminal data point is:
[Equation image not reproduced in the source text: data yield of mobile terminal data point i from one agent]
the total data yield from all agents for each data point is:
[Equation image not reproduced in the source text: total data yield of data point i from all agents]
where D_nk denotes the amount of data that mobile terminal data point i receives from the k-th mobile terminal data point, S_ik denotes the similarity between mobile terminal data points i and k (the pollination amount in the algorithm is a fixed value), m denotes the number of agents that visited mobile terminal data point i, and n denotes the number of mobile terminal data points from which data point i receives data via the agents;
the probability of survival for the current data point is:
[Equation image not reproduced in the source text: survival probability P_i of the current data point]
where P_0 denotes the initial pollen yield, i.e. the initial data yield.
7. The merchant location method based on pollination heuristic clustering as claimed in claim 6, wherein in step 8), the neighborhood fitness Fit_i of data point i is determined by the local density:
[Equation images not reproduced in the source text: local density and neighborhood fitness Fit_i]
where ZOI denotes the neighborhood size of mobile terminal data point i, and t denotes the number of mobile terminal data points within the neighborhood of mobile terminal data point i.
8. The merchant location method based on pollination heuristic clustering as claimed in claim 7, wherein in step 9), the position of a given mobile terminal data point is denoted x_1, and x_i (i = 2, 3, 4) denotes the other data points within the neighborhood of that mobile terminal data point; the adjustment is as follows:
[Equation images not reproduced in the source text: the three local position update equations]
X_1 is then the position position_2 obtained with the local position update strategy, where the feature difference between mobile terminal data points i and j is denoted by a symbol shown only as an image in the source, position_i represents the current location of data point i, and Neighborhood represents the neighborhood range of mobile terminal data point i.
9. The merchant site selection method based on pollination heuristic clustering as claimed in claim 8, wherein in step 11), said meeting a certain condition specifically comprises:
A. the iteration result shows that the positions of all data points no longer change;
B. the number of iterations of the algorithm reaches a threshold.
CN202010065995.5A 2020-01-20 2020-01-20 Merchant site selection method based on pollination heuristic clustering Active CN111242697B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010065995.5A CN111242697B (en) 2020-01-20 2020-01-20 Merchant site selection method based on pollination heuristic clustering

Publications (2)

Publication Number Publication Date
CN111242697A true CN111242697A (en) 2020-06-05
CN111242697B CN111242697B (en) 2022-09-09

Family

ID=70879759

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010065995.5A Active CN111242697B (en) 2020-01-20 2020-01-20 Merchant site selection method based on pollination heuristic clustering

Country Status (1)

Country Link
CN (1) CN111242697B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116028829A (en) * 2021-01-20 2023-04-28 Guoyi Tendering Co., Ltd. Correction clustering processing method, device and storage medium based on transmission step length adjustment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090150345A1 (en) * 2007-12-06 2009-06-11 Leviathan Entertainment Web Domain Data Replication System
CN102842088A (en) * 2012-07-12 2012-12-26 Sun Jun Intelligent business address selection system
CN105516928A (en) * 2016-01-15 2016-04-20 China United Network Communications Co., Ltd. Guangdong Branch Position recommending method and system based on position crowd characteristics
CN106599928A (en) * 2016-12-22 2017-04-26 Chongqing University of Posts and Telecommunications Bio-Inspired adaptive clustering method
CN106650790A (en) * 2016-11-21 2017-05-10 Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences Remote sensing image cluster method based on swarm intelligence
CN109764884A (en) * 2019-01-02 2019-05-17 University of Science and Technology Beijing School bus path planning method and planning device
CN110309887A (en) * 2019-07-09 2019-10-08 Harbin University of Science and Technology Fuzzy C-means clustering anomaly detection method based on improved flower pollination

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
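QU Hongchun et al.: "A New Clustering Algorithm Inspired by Insect Pollination", 2019 Chinese Automation Congress (CAC) *
YANG X.: "Flower pollination algorithm for global optimization", International Conference on Unconventional Computation and Natural Computation *
Lü Qiang: "Research on an Adaptive Clustering Algorithm Inspired by Insect Pollination", China Excellent Master's Theses Full-text Database, Information Science and Technology *
Wu Jingjing: "Research and Application of Pollination Strategies in Clustering Algorithms", China Excellent Master's Theses Full-text Database, Information Science and Technology *
Zhang Jiao: "Research on Clustering Algorithms Based on Intelligent Algorithms", China Excellent Master's Theses Full-text Database, Information Science and Technology *
Wang Yonggui et al.: "A Collaborative Filtering Recommendation Algorithm with Optimized Clustering", Computer Engineering and Applications *
Tao Zhiyong et al.: "K-means Clustering Algorithm Based on Improved Flower Pollination", HTTPS://KNS.CNKI.NET/KCMS/DETAIL/51.1196.TP.20180811.1324.022.HTML *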

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116028829A (en) * 2021-01-20 2023-04-28 Guoyi Tendering Co., Ltd. Correction clustering processing method, device and storage medium based on transmission step length adjustment
CN116028829B (en) * 2021-01-20 2023-10-24 Guoyi Tendering Co., Ltd. Correction clustering processing method, device and storage medium based on transmission step length adjustment

Also Published As

Publication number Publication date
CN111242697B (en) 2022-09-09

Similar Documents

Publication Publication Date Title
CN110245981B (en) Crowd type identification method based on mobile phone signaling data
CN109002492B (en) Performance point prediction method based on LightGBM
Store et al. A GIS-based multi-scale approach to habitat suitability modeling
CN115618021B (en) Method and device for recommending planting area suitable for crop variety
CN107578277B (en) Rental house client positioning method for electric power marketing
CN112837087A (en) User portrait construction method oriented to consultation service system
CN110020712A (en) Cluster-based BP neural network prediction method and system with population optimization
CN113076484A (en) Product recommendation method, device, equipment and storage medium based on deep learning
CN107220676A (en) Smart city planning system
CN111242697B (en) Merchant site selection method based on pollination heuristic clustering
CN109919227A (en) Density peaks clustering method for mixed-attribute data sets
CN115018357A (en) Farmer portrait construction method and system for production performance improvement
CN117036061B (en) Intelligent solution providing method and system for intelligent agricultural insurance
CN110633401A (en) Prediction model of store data and establishment method thereof
CN111985576B (en) Shop site selection method based on decision tree
Scott et al. Forest inventory
CN113379269A (en) Urban business function zoning method, device and medium for multi-factor spatial clustering
Moller-Jensen et al. Patterns of population change in Ghana (1984–2000): urbanization and frontier development
CN108647189B (en) Method and device for identifying user crowd attributes
CN116662860A (en) User portrait and classification method based on energy big data
CN108133296B (en) Event attendance prediction method combining environmental data under social network based on events
CN110377805A (en) Sensor resource recommendation method based on a fast-branching matching sort algorithm
CN109583712B (en) Data index analysis method and device and storage medium
Feng et al. Comparing two neighbourhood classifications: a multilevel analysis of London property price 2011-2014
CN111260443A (en) Dispatching optimization method and system based on house owner demand and tenant demand

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant