WO2021203728A1 - Method and apparatus for selecting the location of a business area, computer device, and medium

Info

Publication number
WO2021203728A1
Authority
WO
WIPO (PCT)
Prior art keywords
area
category
data
landmarks
clustering
Prior art date
Application number
PCT/CN2020/135617
Other languages
English (en)
French (fr)
Inventor
周敏芳
薛淼
邓坤
王建明
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021203728A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201: Market modelling; Market analysis; Collecting market data
    • G06Q30/0204: Market segmentation
    • G06Q30/0205: Location or geographical consideration

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a method for selecting a location for a business area, a device for selecting a location for a business area, computer equipment, and a computer-readable storage medium.
  • at present, business areas are generally selected manually: business personnel consult map software or review-software recommendations, or the team relies on experience, to pick landmarks such as transportation hubs, business districts, or shopping malls as the business area.
  • the transportation hubs, business districts, and shopping malls recommended by map software or review software generally follow artificially drawn administrative or commercial boundaries, so they have ambiguous scope, uncertain size, and a single character. Moreover, most popular business districts are tourist attractions or shopping districts, which cover only a small part of the city and reach only a small share of its population. Selecting a region that meets business needs therefore costs business personnel a great deal of time.
  • the purpose of the embodiments of this application is to propose a location selection method for a business area, a location selection device for a business area, computer equipment, and a computer-readable storage medium, so as to solve the problem of how to conveniently determine, from a map, a business area that meets preset business needs.
  • an embodiment of the present application provides a method for selecting a location for a business area, and the method for selecting a location may include:
  • acquiring first landmark data, where the first landmark data is used to indicate geographic location information of different landmarks that can be included in the target geographic area;
  • performing clustering processing on the first landmark data using the dbscan algorithm, and determining, among the at least one category obtained by clustering, the area corresponding to a category that meets the preset requirements as the business area, where the preset requirements may include requirements on the size of the area and/or requirements on the number of landmarks in the area.
  • performing clustering processing on the first landmark data using the dbscan algorithm, and determining the area corresponding to a category that meets the preset requirements among the at least one category obtained by clustering as the business area, may include:
  • the preset requirements may include that the number of landmarks within the category is less than or equal to a first preset value and the maximum point distance is less than or equal to a second preset value.
  • the method may further include:
  • the location method may further include:
  • the target data may include one or more of landmark attribute data, customer behavior data, basic customer data, business personnel data, business history data, and passenger flow data;
  • the fusion model may include a GBDT model, a GRU model, and an RF model.
  • the target data is input into a preset fusion model for potential prediction, and the recommendation score of the business area is output, which may include:
  • performing clustering processing on the first landmark data using the dbscan algorithm to obtain at least one first category may include:
  • Step A: Set the radius to r, the minimum number of landmarks in the initial area to m, and the step size by which the minimum number of landmarks increases for each newly added area to step_m. Before the algorithm is executed for the first time, set the minimum number of landmarks in a newly added area equal to the minimum number of landmarks in the initial area;
  • Step B: Use the dbscan algorithm to perform a clustering operation on the first landmark data to obtain n+2 clustering categories labeled -1 to n;
  • Step C: Determine each clustering category whose number of landmarks is less than m + N × step_m as a first category, and change the minimum number of landmarks in a newly added area to m + N × step_m, where N is the number of times the clustering operation has been executed;
  • Step D: Perform a clustering operation on the part of the target geographic area other than the areas corresponding to the first categories;
  • Step E: Repeat Step C and Step D until all the first categories that can be included in the target geographic area are determined.
  • the determining the center point and the maximum point distance of each corresponding area of the first category may include:
  • the geographic location information of the different landmarks that can be included in each corresponding area of the first category is input into the following formula to calculate the center point coordinates:
  • μ_1, μ_2, ..., μ_K respectively refer to the center points of the areas corresponding to the first categories
  • Q is the number of landmarks within the category
  • x_i is the coordinate value of a landmark that can be included in the area corresponding to each first category
  • the point distance between the center point and each landmark in the area corresponding to the first category is calculated, and the largest distance among the point distances is determined to be the largest point distance.
  • an embodiment of the present application provides a location selection device for a business area, and the location selection device may include:
  • the first acquisition module is configured to acquire first landmark data, where the first landmark data is used to indicate geographic location information of different landmarks that can be included in the target geographic area;
  • the first clustering processing module is configured to perform clustering processing on the first landmark data using the dbscan algorithm, and determine that the area corresponding to a category that meets the preset requirements, among the at least one category obtained by clustering, is the business area, where the preset requirements may include area size requirements and/or requirements for the number of landmarks in the area.
  • an embodiment of the present application also provides a computer device, which may include a memory and a processor. The memory stores computer-readable instructions, and the processor executes the computer-readable instructions to implement the following steps of the location selection method for the business area:
  • acquiring first landmark data, where the first landmark data is used to indicate geographic location information of different landmarks included in the target geographic area;
  • performing clustering processing on the first landmark data using the dbscan algorithm, and determining, among the at least one category obtained by clustering, an area corresponding to a category that meets a preset requirement as a business area, where the preset requirement includes area size requirements and/or requirements for the number of landmarks in the area.
  • the embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile. Computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by a processor, the following steps of the location selection method for the business area are implemented:
  • acquiring first landmark data, where the first landmark data is used to indicate geographic location information of different landmarks included in the target geographic area;
  • performing clustering processing on the first landmark data using the dbscan algorithm, and determining, among the at least one category obtained by clustering, an area corresponding to a category that meets a preset requirement as a business area, where the preset requirement includes area size requirements and/or requirements for the number of landmarks in the area.
  • in the embodiments of this application, the dbscan algorithm is used to perform a clustering operation on the first landmark data in the target geographic area, and an area that meets the preset requirements is selected, from the areas corresponding to the multiple categories obtained by clustering, as the business area. Since the dbscan algorithm is a density-based clustering algorithm, processing the first landmark data with it identifies areas in the target geographic area whose landmark density is sufficiently high, and the landmark density of an area reflects, to a certain extent, the area's potential for dense crowds. Therefore, in the solution of the present application, selecting an area that meets the preset requirements from the areas corresponding to the clustered categories makes it convenient to select a business area with the potential for dense crowd flow.
  • FIG. 1 is a schematic diagram of an embodiment of a method for selecting a location for a business area in an embodiment of the present application;
  • FIG. 2 is a schematic diagram of an embodiment after step S120 in FIG. 1;
  • FIG. 3 is a schematic diagram of an embodiment of step S140 in FIG. 2;
  • FIG. 4 is a schematic diagram of an embodiment of step S120 in FIG. 1;
  • FIG. 5 is a schematic diagram of an embodiment after step S122 in FIG. 4;
  • FIG. 6 is a schematic diagram of an embodiment of a location selection device for a business area in an embodiment of the present application;
  • FIG. 7 is a schematic diagram of an embodiment of the first clustering processing module 602 in the embodiment shown in FIG. 6;
  • FIG. 8 is a schematic diagram of another embodiment of a location selection device for a business area provided by this application;
  • FIG. 9 is a schematic diagram of another embodiment of a location selection device for a business area provided by this application;
  • FIG. 10 is a schematic diagram of an embodiment of a computer device provided by this application.
  • FIG. 1 is a schematic diagram of an embodiment of a method for selecting a location for a business area in an embodiment of the application.
  • the method for selecting a location for a business area may include:
  • Step S110 Acquire first landmark data, where the first landmark data is used to indicate geographic location information of different landmarks included in the target geographic area.
  • the location selection method for the business area can run on an electronic device.
  • the electronic device can be a server or a terminal device, which can respond to a user's operation input or to an instruction input from an external device and perform the corresponding operation.
  • the target geographic area may be an area on a map selected in response to a user operation.
  • the range of the area may be large or small.
  • a large area may be a city, an urban district, or a street, and a small area may be a commercial area, a residential area, etc.
  • the first landmark data corresponds to the target geographic area, and may include geographic location information of different landmarks in the target geographic area, and the information may be obtained from various map databases.
  • the landmarks included in the first landmark data can be divided into categories such as: shopping services (subcategories: shopping malls, convenience stores, home appliance and electronics stores, supermarkets, furniture markets, flower and bird markets, etc.); commercial residences (subcategories: office buildings, residential areas, industrial parks); companies (subcategories: companies, factories, etc.); healthcare services (subcategories: hospitals, medical and healthcare retail stores, etc.); transportation facility services (subcategory: airports); road ancillary facilities (subcategories: gas stations, etc.); colleges and universities (subcategories: university towns, etc.); and tourist attractions (subcategories: scenic spots, etc.). The geographic location information of a landmark can be its latitude and longitude, and the user can preset the level and type of the landmarks to be extracted.
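As a minimal sketch of how the first landmark data described above might be represented, the snippet below models a landmark record and extracts coordinates for user-selected categories. The `Landmark` class, its field names, and the `first_landmark_data` helper are hypothetical and do not come from the application; real map databases will have their own schemas.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Landmark:
    """Hypothetical landmark record; field names are assumptions."""
    name: str
    category: str      # e.g. "shopping services"
    subcategory: str   # e.g. "supermarket"
    latitude: float
    longitude: float

def first_landmark_data(landmarks, wanted_categories):
    """Keep only landmarks of the preset categories and return (lat, lon) pairs."""
    return [
        (lm.latitude, lm.longitude)
        for lm in landmarks
        if lm.category in wanted_categories
    ]

# Example input with made-up names and coordinates.
SAMPLE = [
    Landmark("Mall A", "shopping services", "shopping mall", 22.54, 114.05),
    Landmark("Hospital B", "healthcare services", "hospital", 22.55, 114.06),
    Landmark("Airport C", "transportation facility services", "airport", 22.64, 113.81),
]
```

Filtering before clustering keeps the clustering input limited to the landmark types the user cares about.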
  • Step S120 Cluster the first landmark data using the dbscan algorithm, and determine, among the at least one category obtained by clustering, an area corresponding to a category that meets preset requirements as a business area, where the preset requirements include area size requirements and/or requirements for the number of landmarks in the area.
  • the density-based spatial clustering of applications with noise (DBSCAN) algorithm is a density-based clustering algorithm.
  • the algorithm divides areas with sufficient density into clusters, that is, categories, and finds clusters of arbitrary shapes in a noisy spatial database.
  • clusters are defined as maximal sets of density-connected points.
  • the algorithm uses the concept of density-based clustering, which requires that the number of objects (points or other spatial objects) contained in a certain area in the clustering space is not less than a given threshold.
  • the significant advantage of the dbscan algorithm is that the clustering speed is fast and it can effectively deal with noise points and find spatial clusters of arbitrary shapes.
  • various parameters of the dbscan algorithm need to be set in advance, including the radius r and minPts, the minimum number of sample points that the r-neighborhood of a point must contain.
  • the sample points in this application refer to landmark points.
  • the dbscan algorithm is used to cluster the first landmark data, and multiple categories or clusters are obtained by clustering, and each category or cluster corresponds to a region in the target geographic area.
  • the dbscan algorithm is used to perform clustering processing on the first landmark data to obtain multiple categories in the same way that the dbscan algorithm is applied to other data in the prior art, so the details are not repeated here.
  • the region corresponding to each category obtained by the above clustering operation is evaluated separately to determine whether the size of the region and the number of landmarks in it meet the preset requirements; if so, the region is determined to be a business area.
  • the preset requirements may include requirements on the size of the area and/or the number of landmarks in the area, for example, that the radius or area of the region is within a certain range and that the number of landmarks in the region is within a certain range.
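The clustering and screening in step S120 can be sketched as follows. This is a simplified, pure-Python DBSCAN written for illustration only, assuming planar coordinates and Euclidean distance (real latitude and longitude would need a geographic distance). The function names, the use of a landmark-count range, and the use of the largest pairwise distance as the size measure are all assumptions, not details taken from the application.

```python
from itertools import combinations
from math import dist

NOISE = -1

def dbscan(points, r, min_pts):
    """Return one label per point: a cluster id 0..n, or NOISE (-1)."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within radius r of point i (including i).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= r]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE              # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:       # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels

def select_business_areas(points, labels, count_range, max_diameter):
    """Keep categories whose landmark count and size meet the preset requirements."""
    groups = {}
    for p, lab in zip(points, labels):
        if lab != NOISE:
            groups.setdefault(lab, []).append(p)
    low, high = count_range
    selected = {}
    for lab, members in groups.items():
        # Assumed size measure: the largest distance between any two landmarks.
        diameter = max((dist(a, b) for a, b in combinations(members, 2)), default=0.0)
        if low <= len(members) <= high and diameter <= max_diameter:
            selected[lab] = members
    return selected
```

In practice a library implementation of DBSCAN with a spatial index would replace the quadratic neighbor search used here.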
  • the location selection method for the business area provided in the embodiments of this application can also be used for location selection in any scenario that focuses on landmarks or foot traffic.
  • for example, it can be used to select locations for physical stores by setting different preset requirements that reflect the stores' location needs.
  • in this embodiment, the dbscan algorithm is used to perform a clustering operation on the obtained first landmark data in the target geographic area, and an area that meets the preset requirements is then selected, from the areas corresponding to the multiple categories obtained by clustering, as the business area. Since the dbscan algorithm is a density-based clustering algorithm, processing the first landmark data with it identifies areas in the target geographic area whose landmark density is sufficiently high. The landmark density of an area reflects, to a certain extent, the area's potential for dense crowds, which matches the business area's demand for crowd flow. Therefore, in the solution of the present application, selecting an area that meets the preset requirements from the areas corresponding to the clustered categories makes it convenient to select a business area with the potential for dense crowd flow.
  • FIG. 2 is a schematic diagram of an embodiment after step S120, which may include:
  • Step S130 Obtain target data of the business area.
  • the target data includes one or more of landmark attribute data, customer behavior data, basic customer data, business personnel data, business history data, and passenger flow data.
  • after a business area is determined, the target data corresponding to the business area can be obtained from the server or the network through the interfaces corresponding to the various kinds of target data.
  • the specific target data format and actual content examples can be referred to as shown in Table 1 below.
  • the input includes some data uploaded when applying for registration, for example, the data uploaded when applying for a savings card, and the submission includes data collected when the registration or processing is successful.
  • the target data may also be stored in a node of a blockchain.
  • A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • Blockchain can include the underlying blockchain platform, platform product service layer, and application service layer.
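To illustrate the idea of data blocks linked by cryptographic methods, the sketch below chains blocks by storing each predecessor's SHA-256 hash. It is a toy illustration only: the block layout and the `make_block` and `chain_is_valid` helpers are hypothetical, and the application does not specify how the target data would actually be stored on a blockchain.

```python
import hashlib
import json

def _digest(records, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    body = json.dumps({"records": records, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def make_block(records, prev_hash):
    """Create a block holding a batch of records and a link to its predecessor."""
    return {"records": records, "prev_hash": prev_hash,
            "hash": _digest(records, prev_hash)}

def chain_is_valid(chain):
    """Check every block's own hash and its link to the previous block."""
    for i, block in enumerate(chain):
        if _digest(block["records"], block["prev_hash"]) != block["hash"]:
            return False               # contents were altered after hashing
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False               # link to the predecessor is broken
    return True
```

Because each hash covers the previous hash, changing any stored record invalidates the block that holds it.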
  • Step S140 input the target data into the preset fusion model, and output the recommendation degree score of the business area.
  • a fusion model combining multiple models is used to process the target data, so as to calculate the recommendation degree score of the output business area.
  • the recommendation score indicates the predicted potential of the business area to attract dense crowds over a certain period in the future.
  • the fusion model is adopted because the predictive ability of a single model is limited and each model has certain shortcomings when used alone; a fusion model obtained by combining multiple models can therefore be used to improve predictive ability.
  • in this embodiment, the target data of each business area is acquired and then processed with the preset fusion model to calculate the recommendation score of each business area, which better helps the user select a suitable business area.
  • the preset fusion model may include a gradient boosting decision tree (GBDT) model, a GRU (Gated Recurrent Unit) model, and a random forest (RF) model .
  • S141 Enter the target data into the GBDT model, and identify and determine the most important feature data and feature combinations in the target data.
  • the GBDT model is an iterative decision tree algorithm, composed of multiple decision trees, used to automatically process a large number of sparse features, identify high-importance features, and obtain new feature combinations.
  • the GBDT model can be used to identify and determine the more important data in the target data to combine the data to generate corresponding data features.
  • S142 Use the GRU model to process the feature data and feature combination in a time series, and output the feature integration data.
  • the GRU model is a variant of the long short-term memory neural network (LSTM), which is mainly good at processing time series data and making time series predictions.
  • the GRU model can be used to process, as a time series, the important data and combined data output by the GBDT model. For example, it can capture how passenger flow and application submissions in the business area change over time, so as to obtain the feature integration data.
  • S143 Divide the target data according to its time dimension to obtain static data and time series data.
  • the target data can be divided by time into static data and time series data.
  • static data is also called cross-sectional data, which refers to data collected at the same or similar time points.
  • Time series data also known as dynamic data, refers to data collected in chronological order and used to describe changes in phenomena over time.
  • static data can refer to Table 2 below
  • time series data can refer to Table 3 below.
  • Table 3 Card-opening consumption data of each customer in a time series over a period of time
  • it should be noted that there is no required execution order between step S143 and step S141.
  • S141 may be executed first, or S143 may be executed first; the specific execution order is not limited here.
  • S144 Input the feature integration data, static data and time series data into the RF model for potential prediction, and output the recommendation degree scores of all business areas in the target geographic area.
  • the RF model is trained on samples drawn by uniform sampling with replacement (bagging), which makes it insensitive to outliers. Because the trees are independent of one another, they can be trained in parallel and the model is not prone to overfitting, so the model can achieve higher prediction accuracy and training speed.
  • the RF model is composed of multiple decision trees; the predicted probabilities of the individual trees are averaged to give the probability value of the entire random forest model.
  • the feature integration data, static data, and time series data are used as input, and the pre-trained RF model is used to predict the potential and generate a corresponding score, which is the recommendation score of the business area.
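The averaging step described above can be shown with a small sketch. It covers only the final aggregation, assuming each tree has already produced a probability for each business area; the function names and the dictionary layout are hypothetical, and training of the GBDT, GRU, and RF models is not shown.

```python
def forest_probability(tree_probabilities):
    """Average the per-tree predicted probabilities into one forest-level value."""
    if not tree_probabilities:
        raise ValueError("at least one tree prediction is required")
    return sum(tree_probabilities) / len(tree_probabilities)

def recommendation_scores(per_area_tree_probs):
    """Map each business area id to its averaged probability (the score)."""
    return {
        area: forest_probability(probs)
        for area, probs in per_area_tree_probs.items()
    }

def rank_areas(scores):
    """Order business areas from highest to lowest recommendation score."""
    return sorted(scores, key=scores.get, reverse=True)
```

Ranking the areas by score is one possible way to present the output to the user; the application itself only states that a score is produced.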
  • the target data is processed by using a fusion model combining GBDT, GRU, and RF, so that a highly accurate recommendation degree score of the business area can be obtained.
  • FIG. 4 is a schematic diagram of an embodiment of step S120, which may include:
  • S121 Perform clustering processing on the first landmark data by using the dbscan algorithm to obtain at least one first category.
  • the technical means of clustering processing may be similar to that in the foregoing step S120, and will not be repeated here.
  • using the dbscan algorithm to perform clustering processing on the first landmark data may be multiple rounds of clustering to obtain at least one first category, and the multiple rounds of clustering may include:
  • Step A: Set the radius eps to r, the minimum number of landmarks in the first area (first_min_sample) to m, and the step size by which the minimum number of landmarks (min_sample) increases for each new area to step_m.
  • before the algorithm is executed for the first time, min_sample = first_min_sample.
  • min_sample is the parameter minPts of the dbscan algorithm.
  • Step B: Use the dbscan algorithm to perform a clustering operation on the first landmark data to obtain n+2 clustering categories labeled -1 to n.
  • Step C: Determine each clustering category whose number of landmarks is less than m + N × step_m as a first category, and change min_sample to m + N × step_m, where N is the number of times the clustering operation has been executed;
  • Step D: Perform a clustering operation on the regions of the target geographic region other than the regions corresponding to the first categories;
  • Step E: Repeat Step C and Step D until all the first categories included in the target geographic area are determined. Specifically, Steps C and D are repeated until only the area corresponding to the last category whose number of landmarks is less than m + N × step_m remains in the target geographic area, at which point all the first categories included in the target geographic area have been determined.
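Steps A to E can be read as the loop sketched below. This is one interpretation under stated assumptions, not the applicant's implementation: `cluster` is a hypothetical callable with the signature `cluster(points, r, min_sample) -> labels` (for example a wrapper around a DBSCAN implementation), points labeled -1 are treated as noise and dropped, and `max_rounds` is a safety limit added here.

```python
def multi_round_clustering(points, r, m, step_m, cluster, max_rounds=20):
    """Return the first categories, each as a list of landmark points."""
    first_categories = []
    remaining = list(points)
    min_sample = m                      # Step A: min_sample = first_min_sample
    for n_round in range(1, max_rounds + 1):
        if not remaining:
            break
        # Step B on the first pass, Step D on later passes.
        labels = cluster(remaining, r, min_sample)
        groups = {}
        for p, lab in zip(remaining, labels):
            if lab != -1:               # assumption: noise points are discarded
                groups.setdefault(lab, []).append(p)
        if not groups:
            break
        # Step C: categories below the threshold become first categories.
        threshold = m + n_round * step_m
        large = []
        for members in groups.values():
            if len(members) < threshold:
                first_categories.append(members)
            else:
                large.append(members)
        # Step D input: only landmarks outside the first categories remain.
        remaining = [p for members in large for p in members]
        min_sample = threshold          # Step C: raise min_sample for next round
    return first_categories
```

Raising min_sample each round means later rounds only keep denser groupings, so large categories are re-examined under stricter density requirements until they fall below the threshold or dissolve into noise.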
  • S122 Analyze the area corresponding to each first category, and determine the center point and the maximum point distance of that area, where the maximum point distance is the maximum of the distances between the center point and all landmarks in the area.
  • the area corresponding to each first category can be parsed and determined.
  • the specific analysis operation may include: determining all landmark points in each first category; determining the coverage area of each landmark point in all landmark points, where the coverage area is a circular area with the landmark point as the center and a radius of r; The union of the coverage areas of all landmark points in the first category is used as the area corresponding to the first category.
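The union-of-circles definition of a category's area gives a simple membership test, sketched below. It assumes planar coordinates and Euclidean distance; the function name is hypothetical.

```python
from math import dist

def in_category_area(point, landmarks, r):
    """True if `point` lies in the union of radius-r circles around the landmarks."""
    return any(dist(point, landmark) <= r for landmark in landmarks)
```

A point belongs to the area as soon as it falls inside the coverage circle of any one landmark of the category, so no explicit polygon needs to be built for this check.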
  • Each first category corresponds to an area, and the center point of the corresponding area of each first category can be calculated, and then the maximum point distance can be determined according to the center point.
  • determining the center point and the maximum point distance may include:
  • μ_1, μ_2, ..., μ_K respectively refer to the center points of the K areas corresponding to the first categories
  • x_i is the coordinate value of a landmark included in the area corresponding to each first category
  • Q_j is the number of landmarks included in the area corresponding to the j-th first category.
  • the point distance between the center point and each landmark in the corresponding area of the first category can be calculated, and the largest distance among these point distances is determined as the maximum point distance.
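The formula itself is not reproduced in this text. Given the symbol definitions (a center point per category, the landmark coordinates x_i, and the landmark count Q_j), a natural reading is that the center point is the arithmetic mean of the landmark coordinates, μ_j = (1/Q_j) Σ x_i; that reading is an assumption. The sketch below computes the center and the maximum point distance under that assumption, using planar Euclidean distance.

```python
from math import dist

def center_and_max_distance(landmarks):
    """Return (center, max_point_distance) for one category's landmarks.

    Assumes the center is the coordinate mean and distances are Euclidean.
    """
    q = len(landmarks)                       # Q_j: number of landmarks
    if q == 0:
        raise ValueError("a category must contain at least one landmark")
    center = (
        sum(x for x, _ in landmarks) / q,    # mean of first coordinates
        sum(y for _, y in landmarks) / q,    # mean of second coordinates
    )
    max_distance = max(dist(center, p) for p in landmarks)
    return center, max_distance
```

The maximum point distance acts as an effective radius of the category, which is what the preset requirement on area size is compared against.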
  • S123 Count the number of landmarks in the area corresponding to each first category, and determine the first area, corresponding to a first category that meets the preset requirements, as the business area.
  • the preset requirement includes that the number of landmarks in the category is less than or equal to the first preset value and the maximum point distance is less than or equal to the second preset value.
  • the preset requirements may include that the number of landmarks in the category is less than or equal to the first preset value and/or the maximum point distance is less than or equal to the second preset value.
  • FIG. 5 is a schematic diagram of an embodiment after step S122, which may include:
  • S124 Determine a second area corresponding to a category that does not meet the preset requirements in the first category, and extract geographic location information of different landmarks included in the second area as second landmark data.
  • in this embodiment, the second area corresponding to each first category that does not meet the preset requirements can be determined, and the geographic location information of the different landmarks included in the second area can be extracted as the second landmark data.
  • S125 Perform clustering processing on the second landmark data by using the dbscan algorithm to obtain at least one second category.
  • intra-category clustering processing can be performed on the second area corresponding to each category that does not meet the preset requirements, so as to obtain at least one second category, where each second category corresponds to a sub-area within the area of a first category. It should be noted that the clustering processing in step S125 is similar to that in step S120 and will not be repeated here.
  • S126 Analyze the corresponding area of each second category, and determine the center point and the maximum point distance of the corresponding area of each second category.
  • S127 Count the number of landmarks in the area corresponding to each second category, and determine the area corresponding to a second category that meets the preset requirements as the business area.
  • steps S126 and S127 are similar to steps S122 and S123, and will not be repeated here.
  • steps S124 to S127 can be repeated over multiple rounds of clustering. After each round, the area of each category that meets the preset requirements is determined as a business area, and the area of each category that does not meet the preset requirements is clustered again within that category, until all the business areas in the target geographic area that meet the preset requirements are determined. Because of the repeated clustering and the screening against the preset requirements, the business areas finally obtained are more similar in size and shape, which is convenient for deploying actual business activities.
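The repeated intra-category clustering of steps S124 to S127 can be sketched as a recursion. This is an illustrative reading only: `cluster(points, depth)` and `meets_requirements(members)` are hypothetical callables supplied by the caller, the `depth` argument is an assumption that lets the caller tighten the clustering parameters at each level, and `max_depth` is a guard added here to bound the recursion.

```python
def select_areas(points, cluster, meets_requirements, depth=0, max_depth=5):
    """Return all categories (lists of points) that meet the preset requirements."""
    labels = cluster(points, depth)
    groups = {}
    for p, lab in zip(points, labels):
        if lab != -1:                       # assumption: -1 marks noise
            groups.setdefault(lab, []).append(p)
    areas = []
    for members in groups.values():
        if meets_requirements(members):
            areas.append(members)           # qualifies as a business area
        elif depth < max_depth:
            # Category fails the requirements: cluster again inside it.
            areas.extend(
                select_areas(members, cluster, meets_requirements,
                             depth + 1, max_depth)
            )
    return areas
```

Categories that still fail at the depth limit are simply left out of the result in this sketch; the application does not say how such leftovers should be handled.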
  • the second region corresponding to the category that does not meet the preset requirements can be subjected to intra-class clustering processing, so as to achieve further identification and determination of the business area in the target geographic area.
  • This application can be used in many general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multi-processor systems, microprocessor-based systems, set-top boxes, programmable consumer electronic devices, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
  • This application may be described in the general context of computer-executable instructions executed by a computer, such as a program module.
  • program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • This application can also be practiced in distributed computing environments. In these distributed computing environments, tasks are performed by remote processing devices connected through a communication network.
  • program modules can be located in local and remote computer storage media including storage devices.
  • the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (Read-Only Memory, ROM), or may be a random access memory (Random Access Memory, RAM), etc.
  • FIG. 6 is a schematic diagram of an embodiment of a site selection device for a business development area provided by this application, which may include:
  • the first obtaining module 601, configured to obtain first landmark data, where the first landmark data indicates the geographic location information of the different landmarks included in the target geographic area;
  • the first clustering processing module 602, configured to cluster the first landmark data using the dbscan algorithm and to determine, among the at least one category obtained by clustering, the area corresponding to a category that meets a preset requirement as a business development area, wherein the preset requirement includes an area size requirement and/or a requirement on the number of landmarks in the area.
  • FIG. 7 is a schematic diagram of an embodiment of the first clustering processing module 602 in the embodiment shown in FIG. 6.
  • the first clustering processing module 602 includes a clustering processing sub-module 6021, a calculation sub-module 6022, and a judgment sub-module 6023, in which:
  • the clustering processing sub-module 6021 is configured to cluster the first landmark data using the dbscan algorithm to obtain at least one first category;
  • the calculation sub-module 6022 is configured to analyze the area corresponding to each first category and determine the center point and the maximum point distance of that area, the maximum point distance being the largest of the distances between all landmarks in the area and the center point;
  • the judgment sub-module 6023 is configured to count the number of landmarks in the area corresponding to each first category, and to determine the first area corresponding to a first category that meets the preset requirement as a business development area, where the preset requirement includes the number of landmarks in the category being less than or equal to a first preset value and/or the maximum point distance being less than or equal to a second preset value.
  • FIG. 8 is a schematic diagram of another embodiment of a site selection device for a business development area provided by this application.
  • the site selection device further includes a second clustering processing module 603, wherein:
  • the second clustering processing module 603 is specifically configured to: determine the second area corresponding to a first category that does not meet the preset requirement, and extract the geographic location information of the different landmarks included in the second area as second landmark data; cluster the second landmark data using the dbscan algorithm to obtain at least one second category; analyze the area corresponding to each second category to determine the center point and the maximum point distance of that area; and count the number of landmarks in the area corresponding to each second category, and determine the first area corresponding to a second category that meets the preset requirement as the business development area.
  • FIG. 9 is a schematic diagram of another embodiment of a site selection device for a business development area provided by this application.
  • the site selection device further includes a second acquisition module 604 and a recommendation degree evaluation module 605, where:
  • the second acquisition module 604 is configured to acquire target data of the business development area, where the target data includes one or more of landmark attribute data, customer behavior data, basic customer data, business development personnel data, business development history data, and foot-traffic data;
  • the recommendation degree evaluation module 605 is configured to input the target data into a preset fusion model and output the recommendation score of the business development area.
  • the fusion model includes a GBDT model, a GRU model, and an RF model;
  • the recommendation degree evaluation module 605 is specifically configured to: input the target data into the GBDT model to identify the highly important feature data and feature combinations in the target data; use the GRU model to process the feature data and the feature combinations over the time series and output integrated feature data; organize the target data by time to obtain static data and time series data; and input the integrated feature data, the static data, and the time series data into the RF model for potential prediction, and output the recommendation score of the business development area.
  • the clustering processing sub-module 6021 is specifically configured to perform the following steps:
  • Step A: set the radius to r, the minimum number of landmarks in the initial area to m, and the step by which the minimum number of landmarks in each newly added area increases to step_m; before the algorithm is executed for the first time, set the minimum number of landmarks in a newly added area equal to the minimum number of landmarks in the initial area;
  • Step B: use the dbscan algorithm to perform a clustering operation on the first landmark data to obtain n+2 cluster categories, labelled -1 to n;
  • Step C: determine the cluster categories whose number of landmarks is less than m+N×step_m as first categories, and change the minimum number of landmarks in a newly added area to m+N×step_m, where N is the number of times the clustering operation has been executed;
  • Step D: perform the clustering operation on the parts of the target geographic area other than the areas corresponding to the first categories;
  • Step E: repeat Step C and Step D until all the first categories included in the target geographic area are determined.
  • the calculation sub-module 6022 is specifically configured to perform the following steps:
  • analyze the area corresponding to each first category, and input the geographic location information of the different landmarks included in that area into the following formula to obtain the coordinates of the center point: $\mu = \frac{1}{Q}\sum_{i=1}^{Q} x_i$
  • where $\mu$ is the center point of the area corresponding to the first category, $Q$ is the number of landmarks in the category, and $x_i$ is the coordinate value of the $i$-th landmark included in the area corresponding to the first category;
  • calculate the point distance between the center point and each landmark in the area corresponding to the first category, and determine the largest of these point distances as the maximum point distance.
  • After obtaining the first landmark data in the target geographic area, the site selection device for the business development area uses the dbscan algorithm to perform a clustering operation on that data, and then selects, from the areas corresponding to the multiple categories obtained by clustering, the areas that meet the preset requirement as business development areas.
  • Because the dbscan algorithm is a density-based clustering algorithm, processing the first landmark data with it identifies areas of the target geographic area whose landmark density is sufficiently high, and the landmark density of an area reflects, to a certain extent, its potential for dense foot traffic.
  • Therefore, by selecting the areas that meet the preset requirement from the areas corresponding to the multiple categories obtained by clustering, the site selection device can conveniently select business development areas that are required to have a potential for dense foot traffic.
  • FIG. 10 is a basic structural block diagram of a computer device in an embodiment of this application.
  • the computer device includes a memory 1001, a processor 1002, and a network interface 1003 that are communicatively connected to one another via a system bus. It should be pointed out that the figure only shows a computer device with components 1001-1003, but it should be understood that not all of the components shown are required, and more or fewer components may be implemented instead. Those skilled in the art can understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing in accordance with preset or stored instructions.
  • Its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, etc.
  • the computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
  • the memory 1001 includes at least one type of readable storage medium, which may be non-volatile or volatile. The readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc.
  • the memory 1001 may be an internal storage unit of the computer device, such as a hard disk or memory of the computer device.
  • the memory 1001 may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the computer device.
  • the memory 1001 may also include both an internal storage unit of the computer device and an external storage device thereof.
  • the memory 1001 is generally used to store the operating system and the various application software installed in the computer device, such as the computer-readable instructions of the site selection method for the business development area shown in FIG. 1.
  • the memory 1001 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 1002 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or other data processing chips.
  • the processor 1002 is generally used to control the overall operation of the computer device.
  • the processor 1002 is configured to run the computer-readable instructions stored in the memory 1001 or to process data, for example to run the computer-readable instructions of the site selection method for the business development area shown in FIG. 1.
  • the network interface 1003 may include a wireless network interface or a wired network interface, and the network interface 1003 is generally used to establish a communication connection between the computer device and other electronic devices.
  • After the first landmark data in the target geographic area is obtained, the dbscan algorithm is used to perform a clustering operation on that data, and the areas that meet the preset requirement are then selected from the areas corresponding to the multiple categories obtained by clustering as business development areas.
  • Because the dbscan algorithm is a density-based clustering algorithm, processing the first landmark data with it identifies areas of the target geographic area whose landmark density is sufficiently high, and the landmark density of an area reflects, to a certain extent, its potential for dense foot traffic. Therefore, in the solution of this application, selecting the areas that meet the preset requirement from the areas corresponding to the multiple categories obtained by clustering conveniently achieves the selection of business development areas that are required to have a potential for dense foot traffic.
  • This application also provides another embodiment, namely a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores computer-readable instructions that can be executed by at least one processor, so that the at least one processor executes the steps of the site selection method for the business development area shown in FIG. 1 and of any of the optional embodiments described above.
  • The technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the various embodiments of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

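A site selection method for a business development area, belonging to the field of artificial intelligence, addresses the problem of how to conveniently determine, from a map, a business development area that meets preset business development needs when selecting such an area. The site selection method includes: obtaining first landmark data, the first landmark data indicating the geographic location information of the different landmarks included in a target geographic area (S110); and clustering the first landmark data using the dbscan algorithm, and determining, among the at least one category obtained by clustering, the area corresponding to a category that meets a preset requirement as a business development area, the preset requirement including an area size requirement and/or a requirement on the number of landmarks in the area (S120).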

Description

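Site selection method and apparatus for a business development area, computer device, and medium
This application is based on, and claims priority to, Chinese invention patent application No. 202011157555.9, filed on October 26, 2020 and entitled "Site selection method and apparatus for a business development area, computer device, and medium".
Technical Field
This application relates to the field of artificial intelligence technology, and in particular to a site selection method for a business development area, a site selection apparatus for a business development area, a computer device, and a computer-readable storage medium.
Background
Area selection is very important in offline business development. Traditional offline business development (insurance, credit cards, etc.) requires areas with a high potential for dense foot traffic, a well-defined center, and similar radii, so that the foot-traffic needs of business development activities are met while coverage is as good as possible.
At present, business development areas are generally selected manually: business personnel pick landmarks such as transport hubs, business districts, or shopping malls using map software or the recommendations of review applications, or the team chooses them from experience. However, the transport hubs, business districts, or shopping malls recommended by existing map or review software are generally defined by people along administrative or commercial boundaries, and so have vague extents, inconsistent sizes, and a single character. Popular business districts are also mostly tourist attractions or shopping districts, which cover only a small part of a city and reach few people. Selecting an area that meets business development needs therefore costs business personnel a great deal of time.
The inventors thus realized that how to conveniently determine, from a map, a business development area that meets preset business development needs is a problem yet to be solved.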
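Summary
The purpose of the embodiments of this application is to provide a site selection method for a business development area, a site selection apparatus for a business development area, a computer device, and a computer-readable storage medium, to solve the problem of how to conveniently determine, from a map, a business development area that meets preset business development needs when selecting such an area.
To solve the above technical problem, the embodiments of this application adopt the following technical solutions:
In a first aspect, an embodiment of this application provides a site selection method for a business development area, which may include:
obtaining first landmark data, the first landmark data indicating the geographic location information of the different landmarks included in a target geographic area;
clustering the first landmark data using the dbscan algorithm, and determining, among the at least one category obtained by clustering, the area corresponding to a category that meets a preset requirement as a business development area, where the preset requirement may include an area size requirement and/or a requirement on the number of landmarks in the area.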
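In some possible implementations, clustering the first landmark data using the dbscan algorithm and determining, among the at least one category obtained by clustering, the area corresponding to a category that meets the preset requirement as a business development area may include:
clustering the first landmark data using the dbscan algorithm to obtain at least one first category;
analyzing the area corresponding to each first category, and determining the center point and the maximum point distance of that area, the maximum point distance being the largest of the distances between all landmarks in the area and the center point;
counting the number of landmarks in the category for the area corresponding to each first category, and determining the first area corresponding to a first category that meets the preset requirement as a business development area, where the preset requirement may include the number of landmarks in the category being less than or equal to a first preset value and the maximum point distance being less than or equal to a second preset value.
In some possible implementations, after analyzing the area corresponding to each first category and determining the center point and the maximum point distance of that area, the method may further include:
determining the second area corresponding to a first category that does not meet the preset requirement, and extracting the geographic location information of the different landmarks included in the second area as second landmark data;
clustering the second landmark data using the dbscan algorithm to obtain at least one second category;
analyzing the area corresponding to each second category, and determining the center point and the maximum point distance of that area;
counting the number of landmarks in the category for the area corresponding to each second category, and determining the first area corresponding to a second category that meets the preset requirement as the business development area.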
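In some possible implementations, after determining the first area corresponding to a first category that meets the preset requirement as a business development area, the site selection method may further include:
obtaining target data of the business development area, where the target data may include one or more of landmark attribute data, customer behavior data, basic customer data, business development personnel data, business development history data, and foot-traffic data;
inputting the target data into a preset fusion model and outputting the recommendation score of the business development area.
In some possible implementations, the fusion model may include a GBDT model, a GRU model, and an RF model, and inputting the target data into the preset fusion model for potential prediction and outputting the recommendation score of the business development area may include:
inputting the target data into the GBDT model to identify the highly important feature data and feature combinations in the target data;
using the GRU model to process the feature data and the feature combinations over the time series, and outputting integrated feature data;
organizing the target data by time to obtain static data and time series data;
inputting the integrated feature data, the static data, and the time series data into the RF model for potential prediction, and outputting the recommendation score of the business development area.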
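In some possible implementations, clustering the first landmark data using the dbscan algorithm to obtain at least one first category may include:
Step A: set the radius to r, the minimum number of landmarks in the initial area to m, and the step by which the minimum number of landmarks in each newly added area increases to step_m; before the algorithm is executed for the first time, set the minimum number of landmarks in a newly added area equal to the minimum number of landmarks in the initial area;
Step B: use the dbscan algorithm to perform a clustering operation on the first landmark data to obtain n+2 cluster categories, labelled -1 to n;
Step C: determine the cluster categories whose number of landmarks is less than m+N×step_m as first categories, and change the minimum number of landmarks in a newly added area to m+N×step_m, where N is the number of times the clustering operation has been executed;
Step D: perform the clustering operation on the parts of the target geographic area other than the areas corresponding to the first categories;
Step E: repeat Step C and Step D until all the first categories included in the target geographic area are determined.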
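In some possible implementations, determining the center point and the maximum point distance of the area corresponding to each first category may include:
inputting the geographic location information of the different landmarks included in the area corresponding to each first category into the following formula to obtain the coordinates of the center point:
Figure PCTCN2020135617-appb-000001
where, in the above formula, $\mu$ is the center point of the area corresponding to the first category, $Q$ is the number of landmarks in the category, and $x_i$ is the coordinate value of a landmark included in the area corresponding to the first category;
calculating the point distance between the center point and each landmark in the area corresponding to the first category, and determining the largest of these point distances as the maximum point distance.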
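In a second aspect, an embodiment of this application provides a site selection apparatus for a business development area, which may include:
a first obtaining module, configured to obtain first landmark data, the first landmark data indicating the geographic location information of the different landmarks included in a target geographic area;
a first clustering processing module, configured to cluster the first landmark data using the dbscan algorithm and to determine, among the at least one category obtained by clustering, the area corresponding to a category that meets a preset requirement as a business development area, where the preset requirement may include an area size requirement and/or a requirement on the number of landmarks in the area.
In a third aspect, an embodiment of this application further provides a computer device, which may include a memory and a processor. The memory stores computer-readable instructions, and when executing them the processor implements the following steps of the site selection method for a business development area:
obtaining first landmark data, the first landmark data indicating the geographic location information of the different landmarks included in a target geographic area;
clustering the first landmark data using the dbscan algorithm, and determining, among the at least one category obtained by clustering, the area corresponding to a category that meets a preset requirement as a business development area, where the preset requirement includes an area size requirement and/or a requirement on the number of landmarks in the area.
In a fourth aspect, an embodiment of this application further provides a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores computer-readable instructions which, when executed by a processor, implement the following steps of the site selection method for a business development area:
obtaining first landmark data, the first landmark data indicating the geographic location information of the different landmarks included in a target geographic area;
clustering the first landmark data using the dbscan algorithm, and determining, among the at least one category obtained by clustering, the area corresponding to a category that meets a preset requirement as a business development area, where the preset requirement includes an area size requirement and/or a requirement on the number of landmarks in the area.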
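Compared with the prior art, the embodiments of this application mainly have the following beneficial effects:
In the embodiments of this application, after the first landmark data in the target geographic area is obtained, the dbscan algorithm is used to perform a clustering operation on that data, and the areas that meet the preset requirement are then selected from the areas corresponding to the multiple categories obtained by clustering as business development areas. Because the dbscan algorithm is a density-based clustering algorithm, processing the first landmark data with it identifies areas of the target geographic area whose landmark density is sufficiently high, and the landmark density of an area reflects, to a certain extent, its potential for dense foot traffic. Therefore, in the solution of this application, selecting the areas that meet the preset requirement from the areas corresponding to the multiple categories obtained by clustering conveniently achieves the selection of business development areas that are required to have a potential for dense foot traffic.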
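Brief Description of the Drawings
To explain the solutions in this application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of this application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an embodiment of a site selection method for a business development area in the embodiments of this application;
FIG. 2 is a schematic diagram of an embodiment following step S120 in FIG. 1;
FIG. 3 is a schematic diagram of an embodiment of step S140 in FIG. 2;
FIG. 4 is a schematic diagram of an embodiment of step S120 in FIG. 1;
FIG. 5 is a schematic diagram of an embodiment following step S122 in FIG. 4;
FIG. 6 is a schematic diagram of an embodiment of a site selection apparatus for a business development area in the embodiments of this application;
FIG. 7 is a schematic diagram of an embodiment of the first clustering processing module 602 in the embodiment shown in FIG. 6;
FIG. 8 is a schematic diagram of another embodiment of a site selection apparatus for a business development area provided by this application;
FIG. 9 is a schematic diagram of yet another embodiment of a site selection apparatus for a business development area provided by this application;
FIG. 10 is a schematic diagram of an embodiment of a computer device provided by this application.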
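Detailed Description
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of this application. The terms used in the specification are only for the purpose of describing specific embodiments and are not intended to limit this application. The terms "including" and "having" and any variations thereof in the specification, the claims, and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second", etc. in the specification, the claims, or the above drawings are used to distinguish different objects, not to describe a particular order.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of this application. The appearance of the phrase in various places in the specification does not necessarily refer to the same embodiment, nor to a separate or alternative embodiment that is mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
To enable those skilled in the art to better understand the solutions of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings.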
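Referring to FIG. 1, which is a schematic diagram of one embodiment of a site selection method for a business development area in the embodiments of this application, the method may include:
Step S110: obtain first landmark data, the first landmark data indicating the geographic location information of the different landmarks included in a target geographic area.
In this and subsequent embodiments, the site selection method may run on an electronic device, which may be a server or a terminal device. The device can respond to instructions entered by user operation or by an external device and perform the corresponding operations.
The target geographic area may be a region on a map selected in response to a user operation. Its extent may be large or small: a large extent may be a city, an urban district, or a street, and a small extent may be a commercial area, a residential area, and so on. The first landmark data corresponds to the target geographic area and may include the geographic location information of the different landmarks within it; this information can be obtained from various map databases. By major category, the landmark types included in the first landmark data may be divided into: shopping services (sub-categories: shopping malls, convenience stores, home appliance and electronics stores, supermarkets, furniture markets, flower and bird markets, etc.), business and residential (sub-categories: office buildings, residential areas, industrial parks), companies and enterprises (sub-categories: companies, factories, etc.), medical and health services (sub-categories: hospitals, pharmacies and health product stores, etc.), transport facilities (sub-categories: airports), road ancillary facilities (sub-categories: petrol stations, etc.), higher education institutions (sub-categories: university towns, etc.), tourist attractions (sub-categories: scenic spots, etc.), and so on. The geographic location information of a landmark may be its latitude and longitude, and the user may preset the level and type of landmarks to be extracted.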
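Step S120: cluster the first landmark data using the dbscan algorithm, and determine, among the at least one category obtained by clustering, the area corresponding to a category that meets a preset requirement as a business development area, where the preset requirement includes an area size requirement and/or a requirement on the number of landmarks in the area.
In this embodiment, the density-based spatial clustering of applications with noise (DBSCAN) algorithm is a density-based clustering algorithm. It divides regions of sufficient density into clusters, that is, categories, and can discover clusters of arbitrary shape in a spatial database with noise; it defines a cluster as the maximal set of density-connected points. The algorithm relies on the concept of density-based clustering, which requires that the number of objects (points or other spatial objects) contained in a given region of the clustering space is not less than a given threshold. Notable advantages of the dbscan algorithm are that it clusters quickly, handles noise points effectively, and finds spatial clusters of arbitrary shape.
Before clustering, the parameters of the dbscan algorithm must be set in advance, including the radius r and minPts, the number of sample points included within the r-neighborhood of a sample point. In this application, sample points are landmark points.
The first landmark data is then clustered with the dbscan algorithm, yielding multiple categories or clusters, each corresponding to a region of the target geographic area. Clustering the first landmark data with the dbscan algorithm in this embodiment is similar to prior-art use of the dbscan algorithm on other data, and is not described further here.
Finally, the area corresponding to each category obtained by the clustering operation is evaluated to judge whether its size and number of landmarks meet the preset requirement; if so, it is determined as a business development area. The preset requirement may include an area size requirement and/or a requirement on the number of landmarks in the area, for example that the area radius or the area size falls within a certain range and that the number of landmarks in the area falls within a certain range.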
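The clustering step described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the applicant's implementation: the function names `haversine_m` and `dbscan`, the sample coordinates, and the choice of great-circle distance for latitude/longitude pairs are all assumptions made for the example. The neighborhood count includes the point itself.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, used by the haversine formula


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def dbscan(points, r, min_pts, dist=haversine_m):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        # The r-neighbourhood of point i, including i itself.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= r]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_pts:
                queue.extend(reach)  # j is a core point, so keep expanding
    return labels


# Hypothetical landmark coordinates: two dense groups and one isolated landmark.
landmarks = [
    (22.540, 114.050), (22.541, 114.050), (22.540, 114.051), (22.541, 114.051),
    (22.600, 114.100), (22.601, 114.100), (22.600, 114.101),
    (22.700, 114.200),
]
labels = dbscan(landmarks, r=200.0, min_pts=3)
```

With a 200 m radius and `min_pts=3`, the two groups become categories 0 and 1 and the isolated landmark is labelled -1 (noise). A production system would more likely use a library implementation with a spatial index, since this sketch compares every pair of points.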
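In some possible scenarios, the site selection method provided by the embodiments of this application can also be used to select a geographic location in any scenario that emphasizes landmarks or foot-traffic needs. For example, it can be used to choose the site of a physical store, with different preset requirements set to meet the store's site selection needs.
Compared with the prior art, the embodiments of this application mainly have the following beneficial effects:
In the embodiments of this application, after the first landmark data in the target geographic area is obtained, the dbscan algorithm is used to perform a clustering operation on that data, and the areas that meet the preset requirement are then selected from the areas corresponding to the multiple categories obtained by clustering as business development areas. Because the dbscan algorithm is a density-based clustering algorithm, processing the first landmark data with it identifies areas of the target geographic area whose landmark density is sufficiently high, and the landmark density of an area reflects, to a certain extent, its potential for dense foot traffic, which satisfies the foot-traffic needs of a business development area. Therefore, in the solution of this application, selecting the areas that meet the preset requirement from the areas corresponding to the multiple categories obtained by clustering conveniently achieves the selection of business development areas that are required to have a potential for dense foot traffic.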
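In some possible implementations, referring to FIG. 2, which is a schematic diagram of one embodiment following step S120, the method may include:
Step S130: obtain target data of the business development area, the target data including one or more of landmark attribute data, customer behavior data, basic customer data, business development personnel data, business development history data, and foot-traffic data.
In this embodiment, after the business development area is determined, the corresponding target data can further be obtained from a server or the network through the interface for each type of target data. The target data may include one or more of landmark attribute data, customer behavior data, basic customer data, business development personnel data, business development history data, and foot-traffic data. Examples of the form and actual content of the target data are shown in Table 1 below.
Figure PCTCN2020135617-appb-000002
Table 1
In the business development history data above, application intake covers the material uploaded when applying or registering, for example the data uploaded when applying for a debit card, and approval covers the data collected when registration or processing succeeds.
In some possible implementations, to further ensure the privacy and security of the target data, the target data may also be stored in a node of a blockchain after it is obtained.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods, each block containing a batch of network transaction information used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and so on.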
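To make the phrase "a chain of data blocks generated in association with one another using cryptographic methods" concrete, the following toy sketch links blocks by SHA-256 hashes. It is only an illustration of the linking idea: the block layout, the helper names `make_block` and `chain_is_valid`, and the sample records are hypothetical, and a real blockchain adds consensus, signatures, and peer-to-peer replication that are not shown here.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block


def make_block(records, prev_hash):
    """Build a block whose hash covers both its records and the previous hash."""
    body = json.dumps({"records": records, "prev_hash": prev_hash}, sort_keys=True)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {"records": records, "prev_hash": prev_hash, "hash": digest}


def chain_is_valid(chain):
    """Check that every block is intact and points at its predecessor."""
    prev = GENESIS_PREV_HASH
    for block in chain:
        recomputed = make_block(block["records"], block["prev_hash"])["hash"]
        if block["prev_hash"] != prev or block["hash"] != recomputed:
            return False
        prev = block["hash"]
    return True


# Hypothetical target-data records for two business development areas.
first = make_block([{"area_id": 1, "foot_traffic": 1200}], GENESIS_PREV_HASH)
second = make_block([{"area_id": 2, "foot_traffic": 800}], first["hash"])
chain = [first, second]
```

Changing any stored record after the fact changes the recomputed hash, so `chain_is_valid` returns `False` for a tampered chain; this is the anti-counterfeiting property the paragraph refers to.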
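Step S140: input the target data into a preset fusion model and output the recommendation score of the business development area.
In this embodiment, a fusion model that combines multiple models processes the target data to compute and output the recommendation score of the business development area. The recommendation score indicates the predicted potential of the business development area for dense foot traffic over a certain period in the future. A fusion model is used because a single model has limited predictive power and each model has certain shortcomings when used alone; combining multiple models into a fusion model improves predictive power.
Compared with the prior art, the embodiments of this application mainly have the following beneficial effects:
In the embodiments of this application, the target data of each business development area is obtained and processed with the preset fusion model to compute a recommendation score for each area, which makes it easier for the user to choose a suitable business development area.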
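In some possible implementations, the preset fusion model in step S140 may include a gradient boosting decision tree (GBDT) model, a gated recurrent unit (GRU) model, and a random forest (RF) model. As shown in FIG. 3, which is a schematic diagram of one embodiment of step S140, the step may include:
S141: input the target data into the GBDT model to identify the highly important feature data and feature combinations in the target data.
The GBDT model is an iterative decision tree algorithm composed of multiple decision trees. It is used to process massive sparse features automatically, identify features of high importance, and derive new feature combinations. In this embodiment, the GBDT model can identify the more important data in the target data, combine that data, and generate the corresponding data features.
S142: use the GRU model to process the feature data and the feature combinations over the time series, and output integrated feature data.
The GRU model is a variant of the long short-term memory (LSTM) neural network. It is mainly good at processing time-ordered data and making predictions over time series. In this embodiment, the GRU model can be applied to the highly important data and the combined data output by the GBDT model to process them over the time series, for example to capture changes over time in the foot traffic and application intake of a business development area, thereby obtaining the integrated feature data.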
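For readers unfamiliar with the GRU, its recurrence can be written out as follows. This is the commonly used textbook formulation, not something stated in the application; the symbols $W$, $U$, $b$ (weights and biases), $x_t$ (the feature vector at time step $t$), and $h_t$ (the hidden state) are notation introduced here for illustration.

```latex
% Update gate and reset gate at time step t
z_t = \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right)
r_t = \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right)

% Candidate hidden state, with the reset gate applied to the previous state
\tilde{h}_t = \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right)

% New hidden state: interpolation between the previous and the candidate state
h_t = \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t
```

Some references swap the roles of $z_t$ and $1 - z_t$ in the last line; the two conventions are equivalent up to relabelling the gate. In the context above, the final hidden state would presumably serve as the integrated feature data passed on to the RF model.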
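S143: organize the target data by time to obtain static data and time series data.
In this embodiment, after the target data is obtained, it can be organized by time into static data and time series data. Static data, also called cross-sectional data, is data collected at the same or similar points in time. Time series data, also called dynamic data, is data collected in chronological order that describes how a phenomenon changes over time.
Specifically, the static data may be as shown in Table 2 below, and the time series data may be as shown in Table 3 below.
Figure PCTCN2020135617-appb-000003
Table 2
Table 3: card-opening and spending data of each customer over a period of time, as a time series
Figure PCTCN2020135617-appb-000004
Figure PCTCN2020135617-appb-000005
Table 3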
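Because the tables above survive only as image references, the following sketch uses an invented record layout to show one way of separating static data from time series data. The field names `area_id`, `timestamp`, and `values`, and the rule "a record without a timestamp is static", are assumptions for illustration only.

```python
from collections import defaultdict


def split_static_and_series(records):
    """Split records into per-area static attributes and time-ordered series."""
    static = {}
    series = defaultdict(list)
    for rec in records:
        if rec.get("timestamp") is None:
            # No timestamp: treat as cross-sectional (static) data.
            static.setdefault(rec["area_id"], {}).update(rec["values"])
        else:
            series[rec["area_id"]].append((rec["timestamp"], rec["values"]))
    for observations in series.values():
        observations.sort(key=lambda item: item[0])  # chronological order
    return static, dict(series)


# Hypothetical target data for one business development area.
records = [
    {"area_id": "A", "timestamp": None, "values": {"landmark_count": 12}},
    {"area_id": "A", "timestamp": "2020-03", "values": {"applications": 40}},
    {"area_id": "A", "timestamp": "2020-01", "values": {"applications": 25}},
    {"area_id": "A", "timestamp": "2020-02", "values": {"applications": 31}},
]
static_data, series_data = split_static_and_series(records)
```

Here `static_data["A"]` holds the landmark count, while `series_data["A"]` holds the monthly application counts sorted from January to March, ready to be fed to a sequence model.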
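It should be noted that there is no required execution order between step S143 and step S141: S141 may be executed first, or S143 may be executed first, and the specific order is not limited here.
S144: input the integrated feature data, the static data, and the time series data into the RF model for potential prediction, and output the recommendation scores of all business development areas in the target geographic area.
The RF model is trained on samples drawn by uniform sampling with replacement (Bagging) and is insensitive to outliers. Because its trees are independent of one another, they can be built in parallel and the model does not overfit easily, so it can achieve high prediction accuracy and training speed. An RF model consists of multiple decision trees; the average of the probabilities predicted by the individual trees is the probability output by the whole random forest.
In this embodiment, the integrated feature data, the static data, and the time series data are used as input, a pre-trained RF model performs the potential prediction and generates a corresponding score, and that score is the recommendation score of the business development area.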
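The two random forest ingredients mentioned above, sampling with replacement and averaging per-tree probabilities, can be shown without any machine learning library. The helper names and the per-tree probabilities below are made up for the example; training the trees themselves is out of scope for this sketch.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows uniformly at random with replacement (Bagging)."""
    return [rows[rng.randrange(len(rows))] for _ in rows]


def forest_probability(tree_probabilities):
    """Average the probabilities predicted by the individual trees."""
    return sum(tree_probabilities) / len(tree_probabilities)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
training_rows = ["row0", "row1", "row2", "row3", "row4"]
sample = bootstrap_sample(training_rows, rng)

# Hypothetical probabilities that one area has high potential, one per tree.
per_tree = [0.2, 0.6, 0.7]
recommendation_score = forest_probability(per_tree)
```

Each tree would be trained on its own bootstrap sample, which may contain repeated rows and omit others. For the three hypothetical trees, the forest's output is their mean, 0.5.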
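Compared with the prior art, the embodiments of this application mainly have the following beneficial effects:
In the embodiments of this application, the target data is processed with a fusion model that combines GBDT, GRU, and RF, so that a fairly accurate recommendation score can be obtained for the business development area.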
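In some possible implementations, referring to FIG. 4, which is a schematic diagram of one embodiment of step S120, the step may include:
S121: cluster the first landmark data using the dbscan algorithm to obtain at least one first category.
In one specific embodiment, the clustering may use technical means similar to those of step S120 above, which are not repeated here.
In another specific embodiment, clustering the first landmark data using the dbscan algorithm may be performed as multiple rounds of clustering in a loop to obtain at least one first category, and the loop may include:
Step A: set the radius eps to r, the minimum number of landmarks in the initial area (first_min_sample) to m, and the step by which the minimum number of landmarks in each newly added area (min_sample) increases to step_m; before the algorithm is executed for the first time, let min_sample = first_min_sample. Here min_sample is the minPts parameter of the dbscan algorithm.
Step B: use the dbscan algorithm to perform a clustering operation on the first landmark data to obtain n+2 cluster categories, labelled -1 to n.
Step C: determine the cluster categories whose number of landmarks is less than m+N×step_m as first categories, and change min_sample to m+N×step_m, where N is the number of times the clustering operation has been executed. For example, if m=5 and step_m=2, then after the first clustering operation m+N×step_m = 5+1×2 = 7. It should be noted that the initial value of N is 0, and N is incremented by 1 each time the clustering operation is executed.
Step D: perform the clustering operation on the parts of the target geographic area other than the areas corresponding to the first categories.
Step E: repeat Step C and Step D until all the first categories included in the target geographic area are determined. That is, Step C and Step D are repeated until the area corresponding to the last category in the target geographic area whose number of landmarks is less than m+N×step_m has been retained; at that point all the first categories included in the target geographic area have been determined.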
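Steps A to E leave some details open, so the loop below is only one possible reading of them. It assumes that noise points (label -1) are discarded after each round, that categories at or above the threshold are re-clustered with the raised `min_sample`, and that the clustering routine is passed in as `cluster_fn(points, r, min_pts)` returning one label per point. The names `iterative_first_categories`, `cluster_fn`, and `max_rounds` are introduced here and do not appear in the application.

```python
def iterative_first_categories(points, r, m, step_m, cluster_fn, max_rounds=50):
    """Return lists of point indices, one list per accepted first category."""
    remaining = list(range(len(points)))  # indices still waiting to be assigned
    accepted = []
    min_sample = m   # Step A: min_sample starts at first_min_sample
    n_runs = 0       # N, the number of clustering operations executed so far
    while remaining and n_runs < max_rounds:
        # Steps B and D: cluster whatever has not been assigned yet.
        labels = cluster_fn([points[i] for i in remaining], r, min_sample)
        n_runs += 1
        threshold = m + n_runs * step_m

        groups = {}
        for index, label in zip(remaining, labels):
            if label != -1:  # assumption: noise is dropped from later rounds
                groups.setdefault(label, []).append(index)
        if not groups:
            break  # nothing but noise is left

        # Step C: small categories become first categories, the rest go again.
        still_too_large = set()
        for members in groups.values():
            if len(members) < threshold:
                accepted.append(members)
            else:
                still_too_large.update(members)
        remaining = [i for i in remaining if i in still_too_large]
        min_sample = threshold  # Step C: raise minPts for the next round
    return accepted
```

Any DBSCAN implementation with the signature above can be supplied as `cluster_fn`. Raising `min_sample` each round means that ever denser cores are required, so a large category is either split or eventually dissolved into noise, which is what makes the loop terminate; `max_rounds` is an extra safety limit.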
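S122: analyze the area corresponding to each first category, and determine the center point and the maximum point distance of that area, the maximum point distance being the largest of the distances between all landmarks in the area and the center point.
In this embodiment, after at least one first category is determined, the area corresponding to each first category can be determined by analysis. The analysis may include: determining all the landmark points in each first category; determining the coverage area of each of those landmark points, the coverage area being the circular region of radius r centered on the landmark point; and taking the union of the coverage areas of all landmark points in the first category as the area corresponding to that first category.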
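Since the area of a category is defined as a union of circles, testing whether a location lies inside it reduces to checking its distance to the nearest landmark of that category. The sketch below assumes planar coordinates in metres (for example after projecting latitude and longitude); the function name `in_category_area` and the sample points are illustrative only.

```python
import math


def in_category_area(point, category_landmarks, r):
    """True if point lies in the union of radius-r circles around the landmarks."""
    return any(math.dist(point, landmark) <= r for landmark in category_landmarks)


# Hypothetical category with two landmarks 100 m apart and r = 60 m.
category = [(0.0, 0.0), (100.0, 0.0)]
inside_overlap = in_category_area((50.0, 0.0), category, 60.0)
outside = in_category_area((50.0, 50.0), category, 60.0)
near_second = in_category_area((150.0, 0.0), category, 60.0)
```

The point (50, 0) is within 60 m of both landmarks, (150, 0) is within 60 m of the second only, and (50, 50) is about 70.7 m from each, so it falls outside the area.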
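Each first category corresponds to one area. The center point of the area corresponding to each first category can be calculated, and the maximum point distance can then be determined from the center point.
Specifically, determining the center point and the maximum point distance may include:
inputting the geographic location information of the different landmarks included in the area corresponding to each first category into the following formula:
Figure PCTCN2020135617-appb-000006
where $\mu_1, \mu_2, \ldots, \mu_K$ are the center points of the areas corresponding to the $K$ first categories, $x_i$ is the coordinate value of a landmark included in the area corresponding to a first category, and $Q_j$ is the number of landmarks included in the area corresponding to the $j$-th first category.
Taking the derivative of the above formula:
Figure PCTCN2020135617-appb-000007
and setting
Figure PCTCN2020135617-appb-000008
gives the formula for the coordinates of the center point of the area corresponding to the $j$-th first category:
Figure PCTCN2020135617-appb-000009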
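The four formulas in this passage survive only as image references. Based on the symbol definitions given in the text, they are probably the usual sum-of-squared-distances derivation of a centroid; the reconstruction below is offered under that assumption and may differ in notation from the original images.

```latex
% Assumed objective: squared distances between each landmark and its category center
J(\mu_1, \ldots, \mu_K) = \sum_{j=1}^{K} \sum_{i=1}^{Q_j} \lVert x_i - \mu_j \rVert^2

% Derivative with respect to the j-th center (only the j-th inner sum depends on it)
\frac{\partial J}{\partial \mu_j} = -2 \sum_{i=1}^{Q_j} \left( x_i - \mu_j \right)

% Setting the derivative to zero yields the mean of the landmark coordinates
\frac{\partial J}{\partial \mu_j} = 0
\quad\Longrightarrow\quad
\mu_j = \frac{1}{Q_j} \sum_{i=1}^{Q_j} x_i
```

In the inner sums, $x_i$ ranges over the $Q_j$ landmarks of the $j$-th category. The result agrees with the single-category form used elsewhere in the application, where $\mu$ is the center point and $Q$ the number of landmarks in the category.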
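After the coordinates of the center point are calculated, the point distance between the center point and each landmark in the area corresponding to the first category can be calculated, and the largest of these point distances is determined as the maximum point distance.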
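As a small worked example of the center point and the maximum point distance, the sketch below averages the landmark coordinates and takes the largest distance to that average. Planar coordinates in metres are assumed for simplicity, and the function name `center_and_max_distance` is introduced here for illustration.

```python
import math


def center_and_max_distance(landmarks):
    """Return the mean of the landmark coordinates and the farthest distance to it."""
    q = len(landmarks)  # Q, the number of landmarks in the category
    center = tuple(sum(axis) / q for axis in zip(*landmarks))
    max_distance = max(math.dist(center, landmark) for landmark in landmarks)
    return center, max_distance


# Hypothetical category: four landmarks at the corners of a 4 m x 3 m rectangle.
corners = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
center, max_distance = center_and_max_distance(corners)
```

For the rectangle, the center is (2.0, 1.5) and every corner lies 2.5 m from it, so the maximum point distance is 2.5. With latitude and longitude inputs, a great-circle distance would be the more appropriate choice over larger areas.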
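S123: count the number of landmarks in the category for the area corresponding to each first category, and determine the first area corresponding to a first category that meets the preset requirement as a business development area, the preset requirement including the number of landmarks in the category being less than or equal to a first preset value and the maximum point distance being less than or equal to a second preset value.
In this embodiment, the preset requirement may include the number of landmarks in the category being less than or equal to the first preset value and/or the maximum point distance being less than or equal to the second preset value. After the maximum point distance of each first category is determined, it can be judged whether the area corresponding to each first category meets the preset requirement; if so, the area is determined as a business development area. The first preset value may be set to the initial radius parameter r of the dbscan algorithm to improve convergence.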
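The screening in S123 amounts to a simple filter over per-category statistics. The sketch below implements the "and" form of the preset requirement; the function name `screen_categories`, the dictionary layout, and the threshold values are illustrative assumptions rather than values taken from the application.

```python
def screen_categories(category_stats, max_landmarks, max_distance):
    """Split category ids into those that meet the preset requirement and the rest.

    category_stats maps a category id to (landmark_count, max_point_distance).
    """
    accepted, rejected = [], []
    for category_id, (count, distance) in category_stats.items():
        if count <= max_landmarks and distance <= max_distance:
            accepted.append(category_id)   # becomes a business development area
        else:
            rejected.append(category_id)   # candidate for further clustering
    return accepted, rejected


# Hypothetical statistics: (number of landmarks, maximum point distance in metres).
stats = {0: (12, 350.0), 1: (48, 900.0), 2: (20, 520.0)}
areas, to_recluster = screen_categories(stats, max_landmarks=30, max_distance=500.0)
```

Category 0 passes both checks, category 1 fails both, and category 2 fails only the distance check, so only category 0 is accepted. The rejected categories are the ones a later round of intra-category clustering would work on.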
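Compared with the prior art, the embodiments of this application mainly have the following beneficial effects:
The embodiments of this application provide a specific way of setting the preset requirement and of selecting business development areas.
In some possible implementations, referring to FIG. 5, which is a schematic diagram of one embodiment following step S122, the method may include:
S124: determine the second area corresponding to a first category that does not meet the preset requirement, and extract the geographic location information of the different landmarks included in the second area as second landmark data.
In this embodiment, after the initial clustering, the second area corresponding to a first category that does not meet the preset requirement can be determined, and the geographic location information of the different landmarks included in the second area is extracted as second landmark data.
S125: cluster the second landmark data using the dbscan algorithm to obtain at least one second category.
In this embodiment, after the categories that do not meet the preset requirement are determined, intra-category clustering can be performed on the second area corresponding to each such category to obtain at least one second category, each second category corresponding to a sub-area of the area of a first category. It should be noted that step S124 is similar to the clustering in step S120 and is not described again here.
S126: analyze the area corresponding to each second category, and determine the center point and the maximum point distance of that area.
S127: count the number of landmarks in the category for the area corresponding to each second category, and determine the first area corresponding to a second category that meets the preset requirement as a business development area.
In this embodiment, the technical means used in steps S126 and S127 are similar to those of steps S122 and S123 and are not described again here.
It should be noted that the usual dbscan algorithm sets a fixed radius eps and a fixed minimum number of points minPts within that range, and tunes these two parameters to form the best density-reachable clusters. The drawback of that approach is obvious: the resulting clusters generally differ greatly in size and vary in shape. In this embodiment, steps S124 to S127 can be repeated over multiple rounds of clustering: after each round, the area of each category that meets the preset requirement is determined as a business development area, and the area of each category that does not meet it is clustered again internally, until all business development areas in the target geographic area that meet the preset requirement have been determined. Because of the repeated clustering and the screening against the preset requirement, the resulting business development areas are similar in size and shape, which is convenient for deploying actual business development activities.
Compared with the prior art, the embodiments of this application mainly have the following beneficial effects:
In this embodiment, after the initial clustering, intra-category clustering can be performed on the second area corresponding to a category that does not meet the preset requirement, so that business development areas within the target geographic area are further identified and determined.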
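This application can be used in many general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multi-processor systems, microprocessor-based systems, set-top boxes, programmable consumer electronic devices, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices. This application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types. This application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
Those of ordinary skill in the art can understand that all or part of the processes of the methods in the above embodiments can be completed by computer-readable instructions directing the relevant hardware. The computer-readable instructions can be stored in a computer-readable storage medium, and when executed, the program may include the processes of the embodiments of the above methods. The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or may be a random access memory (RAM), etc.
It should be understood that although the steps in the flowcharts of the drawings are shown in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are executed, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times, and which are not necessarily executed one after another but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.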
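Referring now to FIG. 6, which is a schematic diagram of one embodiment of a site selection apparatus for a business development area provided by this application, the apparatus may include:
a first obtaining module 601, configured to obtain first landmark data, the first landmark data indicating the geographic location information of the different landmarks included in a target geographic area;
a first clustering processing module 602, configured to cluster the first landmark data using the dbscan algorithm and to determine, among the at least one category obtained by clustering, the area corresponding to a category that meets a preset requirement as a business development area, where the preset requirement includes an area size requirement and/or a requirement on the number of landmarks in the area.
In some possible implementations, referring to FIG. 7, which is a schematic diagram of one embodiment of the first clustering processing module 602 in the embodiment shown in FIG. 6, the first clustering processing module 602 includes a clustering processing sub-module 6021, a calculation sub-module 6022, and a judgment sub-module 6023, in which:
the clustering processing sub-module 6021 is configured to cluster the first landmark data using the dbscan algorithm to obtain at least one first category;
the calculation sub-module 6022 is configured to analyze the area corresponding to each first category and determine the center point and the maximum point distance of that area, the maximum point distance being the largest of the distances between all landmarks in the area and the center point;
the judgment sub-module 6023 is configured to count the number of landmarks in the category for the area corresponding to each first category, and to determine the first area corresponding to a first category that meets the preset requirement as a business development area, the preset requirement including the number of landmarks in the category being less than or equal to a first preset value and/or the maximum point distance being less than or equal to a second preset value.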
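In some possible implementations, referring to FIG. 8, which is a schematic diagram of another embodiment of a site selection apparatus for a business development area provided by this application, the site selection apparatus further includes a second clustering processing module 603, in which:
the second clustering processing module 603 is specifically configured to: determine the second area corresponding to a first category that does not meet the preset requirement, and extract the geographic location information of the different landmarks included in the second area as second landmark data; cluster the second landmark data using the dbscan algorithm to obtain at least one second category; analyze the area corresponding to each second category to determine the center point and the maximum point distance of that area; and count the number of landmarks in the category for the area corresponding to each second category, and determine the first area corresponding to a second category that meets the preset requirement as the business development area.
In some possible implementations, referring to FIG. 9, which is a schematic diagram of yet another embodiment of a site selection apparatus for a business development area provided by this application, the site selection apparatus further includes a second obtaining module 604 and a recommendation degree evaluation module 605, in which:
the second obtaining module 604 is configured to obtain target data of the business development area, the target data including one or more of landmark attribute data, customer behavior data, basic customer data, business development personnel data, business development history data, and foot-traffic data;
the recommendation degree evaluation module 605 is configured to input the target data into a preset fusion model and output the recommendation score of the business development area.
In some possible implementations, the fusion model includes a GBDT model, a GRU model, and an RF model, and
the recommendation degree evaluation module 605 is specifically configured to: input the target data into the GBDT model to identify the highly important feature data and feature combinations in the target data; use the GRU model to process the feature data and the feature combinations over the time series and output integrated feature data; organize the target data by time to obtain static data and time series data; and input the integrated feature data, the static data, and the time series data into the RF model for potential prediction, and output the recommendation score of the business development area.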
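In some possible implementations, the clustering processing sub-module 6021 is specifically configured to perform the following steps:
Step A: set the radius to r, the minimum number of landmarks in the initial area to m, and the step by which the minimum number of landmarks in each newly added area increases to step_m; before the algorithm is executed for the first time, set the minimum number of landmarks in a newly added area equal to the minimum number of landmarks in the initial area;
Step B: use the dbscan algorithm to perform a clustering operation on the first landmark data to obtain n+2 cluster categories, labelled -1 to n;
Step C: determine the cluster categories whose number of landmarks is less than m+N×step_m as first categories, and change the minimum number of landmarks in a newly added area to m+N×step_m, where N is the number of times the clustering operation has been executed;
Step D: perform the clustering operation on the parts of the target geographic area other than the areas corresponding to the first categories;
Step E: repeat Step C and Step D until all the first categories included in the target geographic area are determined.
In some possible implementations, the calculation sub-module 6022 is specifically configured to perform the following steps:
analyzing the area corresponding to each first category, and inputting the geographic location information of the different landmarks included in that area into the following formula to obtain the coordinates of the center point:
Figure PCTCN2020135617-appb-000010
where $\mu$ is the center point of the area corresponding to the first category, $Q$ is the number of landmarks in the category, and $x_i$ is the coordinate value of a landmark included in the area corresponding to the first category;
calculating the point distance between the center point and each landmark in the area corresponding to the first category, and determining the largest of these point distances as the maximum point distance.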
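Compared with the prior art, the embodiments of this application mainly have the following beneficial effects:
In the embodiments of this application, after obtaining the first landmark data in the target geographic area, the site selection apparatus uses the dbscan algorithm to perform a clustering operation on that data, and then selects, from the areas corresponding to the multiple categories obtained by clustering, the areas that meet the preset requirement as business development areas. Because the dbscan algorithm is a density-based clustering algorithm, processing the first landmark data with it identifies areas of the target geographic area whose landmark density is sufficiently high, and the landmark density of an area reflects, to a certain extent, its potential for dense foot traffic. Therefore, in the solution of this application, by selecting the areas that meet the preset requirement from the areas corresponding to the multiple categories obtained by clustering, the site selection apparatus can conveniently select business development areas that are required to have a potential for dense foot traffic.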
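An embodiment of this application further provides a computer device. Refer to FIG. 10, which is a block diagram of the basic structure of a computer device in an embodiment of this application.
The computer device includes a memory 1001, a processor 1002, and a network interface 1003 that are communicatively connected to one another via a system bus. It should be pointed out that the figure only shows a computer device with components 1001-1003, but it should be understood that not all of the components shown are required, and more or fewer components may be implemented instead. Those skilled in the art can understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing in accordance with preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, etc.
The computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, a voice control device, or the like.
The memory 1001 includes at least one type of readable storage medium, which may be non-volatile or volatile. The readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 1001 may be an internal storage unit of the computer device, such as its hard disk or internal memory. In other embodiments, the memory 1001 may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the computer device. Of course, the memory 1001 may also include both an internal storage unit of the computer device and an external storage device thereof. In this embodiment, the memory 1001 is generally used to store the operating system and the various application software installed in the computer device, such as the computer-readable instructions of the site selection method for the business development area shown in FIG. 1. In addition, the memory 1001 can also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 1002 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 1002 is generally used to control the overall operation of the computer device. In this embodiment, the processor 1002 is configured to run the computer-readable instructions stored in the memory 1001 or to process data, for example to run the computer-readable instructions of the site selection method for the business development area shown in FIG. 1.
The network interface 1003 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device and other electronic devices.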
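Compared with the prior art, the embodiments of this application mainly have the following beneficial effects:
After the first landmark data in the target geographic area is obtained, the dbscan algorithm is used to perform a clustering operation on that data, and the areas that meet the preset requirement are then selected from the areas corresponding to the multiple categories obtained by clustering as business development areas. Because the dbscan algorithm is a density-based clustering algorithm, processing the first landmark data with it identifies areas of the target geographic area whose landmark density is sufficiently high, and the landmark density of an area reflects, to a certain extent, its potential for dense foot traffic. Therefore, in the solution of this application, selecting the areas that meet the preset requirement from the areas corresponding to the multiple categories obtained by clustering conveniently achieves the selection of business development areas that are required to have a potential for dense foot traffic.
This application also provides another embodiment, namely a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores computer-readable instructions that can be executed by at least one processor, so that the at least one processor executes the steps of the site selection method for the business development area shown in FIG. 1 and of any of the optional embodiments described above.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the various embodiments of this application.
Obviously, the embodiments described above are only some, not all, of the embodiments of this application. The drawings show preferred embodiments of this application but do not limit its patent scope. This application can be implemented in many different forms; rather, these embodiments are provided so that the disclosure of this application is understood more thoroughly and comprehensively. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments, or make equivalent replacements of some of their technical features. Any equivalent structure made using the contents of the specification and drawings of this application, and applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of this application.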

Claims (20)

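  1. A site selection method for a business development area, wherein the site selection method comprises:
    obtaining first landmark data, the first landmark data indicating the geographic location information of the different landmarks included in a target geographic area;
    clustering the first landmark data using the dbscan algorithm, and determining, among the at least one category obtained by clustering, the area corresponding to a category that meets a preset requirement as a business development area, wherein the preset requirement includes an area size requirement and/or a requirement on the number of landmarks in the area.
  2. The site selection method according to claim 1, wherein clustering the first landmark data using the dbscan algorithm and determining, among the at least one category obtained by clustering, the area corresponding to a category that meets the preset requirement as a business development area comprises:
    clustering the first landmark data using the dbscan algorithm to obtain at least one first category;
    analyzing the area corresponding to each first category, and determining the center point and the maximum point distance of that area, the maximum point distance being the largest of the distances between all landmarks in the area and the center point;
    counting the number of landmarks in the category for the area corresponding to each first category, and determining the first area corresponding to a first category that meets the preset requirement as a business development area, the preset requirement including the number of landmarks in the category being less than or equal to a first preset value and/or the maximum point distance being less than or equal to a second preset value.
  3. The site selection method according to claim 2, wherein, after analyzing the area corresponding to each first category and determining the center point and the maximum point distance of that area, the method further comprises:
    determining the second area corresponding to a first category that does not meet the preset requirement, and extracting the geographic location information of the different landmarks included in the second area as second landmark data;
    clustering the second landmark data using the dbscan algorithm to obtain at least one second category;
    analyzing the area corresponding to each second category, and determining the center point and the maximum point distance of that area;
    counting the number of landmarks in the category for the area corresponding to each second category, and determining the first area corresponding to a second category that meets the preset requirement as the business development area.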
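  4. The site selection method according to claim 3, wherein, after determining the first area corresponding to a first category that meets the preset requirement as a business development area, the site selection method further comprises:
    obtaining target data of the business development area, the target data including one or more of landmark attribute data, customer behavior data, basic customer data, business development personnel data, business development history data, and foot-traffic data;
    inputting the target data into a preset fusion model and outputting the recommendation score of the business development area.
  5. The site selection method according to claim 4, wherein the fusion model includes a GBDT model, a GRU model, and an RF model, and inputting the target data into the preset fusion model for potential prediction and outputting the recommendation score of the business development area comprises:
    inputting the target data into the GBDT model to identify the highly important feature data and feature combinations in the target data;
    using the GRU model to process the feature data and the feature combinations over the time series, and outputting integrated feature data;
    organizing the target data by time to obtain static data and time series data;
    inputting the integrated feature data, the static data, and the time series data into the RF model for potential prediction, and outputting the recommendation score of the business development area.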
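  6. The site selection method according to any one of claims 2-5, wherein clustering the first landmark data using the dbscan algorithm to obtain at least one first category comprises:
    Step A: setting the radius to r, the minimum number of landmarks in the initial area to m, and the step by which the minimum number of landmarks in each newly added area increases to step_m, and, before the algorithm is executed for the first time, setting the minimum number of landmarks in a newly added area equal to the minimum number of landmarks in the initial area;
    Step B: using the dbscan algorithm to perform a clustering operation on the first landmark data to obtain n+2 cluster categories, labelled -1 to n;
    Step C: determining the cluster categories whose number of landmarks is less than m+N×step_m as first categories, and changing the minimum number of landmarks in a newly added area to m+N×step_m, where N is the number of times the clustering operation has been executed;
    Step D: performing the clustering operation on the parts of the target geographic area other than the areas corresponding to the first categories;
    Step E: repeating Step C and Step D until all the first categories included in the target geographic area are determined.
  7. The site selection method according to any one of claims 2-5, wherein analyzing the area corresponding to each first category and determining the center point and the maximum point distance of that area comprises:
    analyzing the area corresponding to each first category, and inputting the geographic location information of the different landmarks included in that area into the following formula to obtain the coordinates of the center point:
    Figure PCTCN2020135617-appb-100001
    where $\mu$ is the center point of the area corresponding to the first category, $Q$ is the number of landmarks in the category, and $x_i$ is the coordinate value of a landmark included in the area corresponding to the first category;
    calculating the point distance between the center point and each landmark in the area corresponding to the first category, and determining the largest of these point distances as the maximum point distance.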
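  8. A site selection apparatus for a business development area, comprising:
    a first obtaining module, configured to obtain first landmark data, the first landmark data indicating the geographic location information of the different landmarks included in a target geographic area;
    a first clustering processing module, configured to cluster the first landmark data using the dbscan algorithm and to determine, among the at least one category obtained by clustering, the area corresponding to a category that meets a preset requirement as a business development area, wherein the preset requirement includes an area size requirement and/or a requirement on the number of landmarks in the area.
  9. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, implements the following steps of a site selection method for a business development area:
    obtaining first landmark data, the first landmark data indicating the geographic location information of the different landmarks included in a target geographic area;
    clustering the first landmark data using the dbscan algorithm, and determining, among the at least one category obtained by clustering, the area corresponding to a category that meets a preset requirement as a business development area, wherein the preset requirement includes an area size requirement and/or a requirement on the number of landmarks in the area.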
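  10. The computer device according to claim 9, wherein the clustering of the first landmark data using the dbscan algorithm and the determining of the area corresponding to a category, among the at least one category obtained by the clustering, that meets the preset requirement as the business development area comprise:
    clustering the first landmark data using the dbscan algorithm to obtain at least one first category;
    parsing the area corresponding to each first category, and determining the center point and the maximum point distance of the area corresponding to each first category, the maximum point distance being the maximum of the distances between the center point and all the landmarks in the area corresponding to each first category;
    counting the number of in-category landmarks of the area corresponding to each first category, and determining a first area, corresponding to a category among the first categories that meets the preset requirement, as the business development area, wherein the preset requirement comprises the number of in-category landmarks being less than or equal to a first preset value and/or the maximum point distance being less than or equal to a second preset value.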
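  11. The computer device according to claim 10, wherein after the parsing of the area corresponding to each first category and the determining of the center point and the maximum point distance of the area corresponding to each first category, the processor, when executing the computer-readable instructions, further implements the following steps:
    determining a second area corresponding to a category among the first categories that does not meet the preset requirement, and extracting the geographic location information of the different landmarks included in the second area as second landmark data;
    clustering the second landmark data using the dbscan algorithm to obtain at least one second category;
    parsing the area corresponding to each second category, and determining the center point and the maximum point distance of the area corresponding to each second category;
    counting the number of in-category landmarks of the area corresponding to each second category, and determining the first area corresponding to a category among the second categories that meets the preset requirement as the business development area.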
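  12. The computer device according to claim 11, wherein after determining the first area corresponding to the category among the first categories that meets the preset requirement as the business development area, the processor, when executing the computer-readable instructions, further implements the following steps:
    acquiring target data of the business development area, the target data comprising one or more of landmark attribute data, customer behavior data, customer basic data, business development personnel data, business development history data, and customer flow data;
    inputting the target data into a preset fusion model, and outputting a recommendation score for the business development area.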
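  13. The computer device according to claim 12, wherein the fusion model comprises a GBDT model, a GRU model, and an RF model, and the inputting of the target data into the preset fusion model for potential prediction and the outputting of the recommendation score for the business development area comprise:
    inputting the target data into the GBDT model to identify high-importance feature data and feature combinations in the target data;
    processing the feature data and the feature combinations over the time series using the GRU model, and outputting feature integration data;
    organizing the target data by time to obtain static data and time-series data;
    inputting the feature integration data, the static data, and the time-series data into the RF model for potential prediction, and outputting the recommendation score for the business development area.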
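  14. The computer device according to any one of claims 10 to 13, wherein the clustering of the first landmark data using the dbscan algorithm to obtain at least one first category comprises:
    Step A: setting a radius r, an initial minimum number of landmarks in an area m, and a step size step_m by which the minimum number of landmarks in a newly added area increases each time, and, before the algorithm is executed for the first time, setting the minimum number of landmarks in a newly added area equal to the initial minimum number of landmarks in an area;
    Step B: performing a clustering operation on the first landmark data using the dbscan algorithm to obtain n+2 cluster categories labeled -1 to n;
    Step C: determining the cluster categories whose number of landmarks is less than m+N×step_m as first categories, and updating the minimum number of landmarks in a newly added area to m+N×step_m, where N is the number of times the clustering operation has been executed;
    Step D: performing the clustering operation on the areas of the target geographic area other than the areas corresponding to the first categories;
    Step E: repeating Step C and Step D until all the first categories included in the target geographic area have been determined.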
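  15. A computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement the following steps of the site selection method for a business development area:
    acquiring first landmark data, the first landmark data indicating geographic location information of different landmarks included in a target geographic area;
    clustering the first landmark data using the dbscan algorithm, and determining, among the at least one category obtained by the clustering, the area corresponding to a category that meets a preset requirement as the business development area, wherein the preset requirement comprises an area size requirement and/or a requirement on the number of landmarks in the area.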
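  16. The computer-readable storage medium according to claim 15, wherein the clustering of the first landmark data using the dbscan algorithm and the determining of the area corresponding to a category, among the at least one category obtained by the clustering, that meets the preset requirement as the business development area comprise:
    clustering the first landmark data using the dbscan algorithm to obtain at least one first category;
    parsing the area corresponding to each first category, and determining the center point and the maximum point distance of the area corresponding to each first category, the maximum point distance being the maximum of the distances between the center point and all the landmarks in the area corresponding to each first category;
    counting the number of in-category landmarks of the area corresponding to each first category, and determining a first area, corresponding to a category among the first categories that meets the preset requirement, as the business development area, wherein the preset requirement comprises the number of in-category landmarks being less than or equal to a first preset value and/or the maximum point distance being less than or equal to a second preset value.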
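  17. The computer-readable storage medium according to claim 16, wherein after the parsing of the area corresponding to each first category and the determining of the center point and the maximum point distance of the area corresponding to each first category, the computer-readable instructions, when executed by the processor, further cause the processor to perform the following steps:
    determining a second area corresponding to a category among the first categories that does not meet the preset requirement, and extracting the geographic location information of the different landmarks included in the second area as second landmark data;
    clustering the second landmark data using the dbscan algorithm to obtain at least one second category;
    parsing the area corresponding to each second category, and determining the center point and the maximum point distance of the area corresponding to each second category;
    counting the number of in-category landmarks of the area corresponding to each second category, and determining the first area corresponding to a category among the second categories that meets the preset requirement as the business development area.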
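  18. The computer-readable storage medium according to claim 17, wherein after determining the first area corresponding to the category among the first categories that meets the preset requirement as the business development area, the computer-readable instructions, when executed by the processor, further cause the processor to perform the following steps:
    acquiring target data of the business development area, the target data comprising one or more of landmark attribute data, customer behavior data, customer basic data, business development personnel data, business development history data, and customer flow data;
    inputting the target data into a preset fusion model, and outputting a recommendation score for the business development area.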
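  19. The computer-readable storage medium according to claim 18, wherein the fusion model comprises a GBDT model, a GRU model, and an RF model, and the inputting of the target data into the preset fusion model for potential prediction and the outputting of the recommendation score for the business development area comprise:
    inputting the target data into the GBDT model to identify high-importance feature data and feature combinations in the target data;
    processing the feature data and the feature combinations over the time series using the GRU model, and outputting feature integration data;
    organizing the target data by time to obtain static data and time-series data;
    inputting the feature integration data, the static data, and the time-series data into the RF model for potential prediction, and outputting the recommendation score for the business development area.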
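  20. The computer-readable storage medium according to any one of claims 16 to 19, wherein the clustering of the first landmark data using the dbscan algorithm to obtain at least one first category comprises:
    Step A: setting a radius r, an initial minimum number of landmarks in an area m, and a step size step_m by which the minimum number of landmarks in a newly added area increases each time, and, before the algorithm is executed for the first time, setting the minimum number of landmarks in a newly added area equal to the initial minimum number of landmarks in an area;
    Step B: performing a clustering operation on the first landmark data using the dbscan algorithm to obtain n+2 cluster categories labeled -1 to n;
    Step C: determining the cluster categories whose number of landmarks is less than m+N×step_m as first categories, and updating the minimum number of landmarks in a newly added area to m+N×step_m, where N is the number of times the clustering operation has been executed;
    Step D: performing the clustering operation on the areas of the target geographic area other than the areas corresponding to the first categories;
    Step E: repeating Step C and Step D until all the first categories included in the target geographic area have been determined.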
PCT/CN2020/135617 2020-10-26 2020-12-11 Site selection method and apparatus for a business development area, computer device, and medium WO2021203728A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011157555.9 2020-10-26
CN202011157555 2020-10-26

Publications (1)

Publication Number Publication Date
WO2021203728A1 (zh) 2021-10-14

Family

ID=75989499

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/135617 WO2021203728A1 (zh) 2020-10-26 2020-12-11 Site selection method and apparatus for a business development area, computer device, and medium

Country Status (2)

Country Link
CN (1) CN112861972B (zh)
WO (1) WO2021203728A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
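CN113329037B (zh) * 2021-08-02 2021-11-16 平安科技(深圳)有限公司 High-dimensional-pattern-based early-warning method for abnormal access data and related device
CN116308501B (zh) * 2023-05-24 2023-10-17 北京骑胜科技有限公司 Method, apparatus, device, and medium for managing operating areas of shared vehicles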

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934265A (zh) * 2019-02-15 2019-06-25 同盾控股有限公司 Method and apparatus for determining a resident address
CN110310153A (zh) * 2019-06-18 2019-10-08 平安普惠企业管理有限公司 Transaction prediction method and apparatus
US20200058025A1 (en) * 2018-08-15 2020-02-20 Royal Bank Of Canada System, methods, and devices for payment recovery platform
CN111723959A (zh) * 2019-03-19 2020-09-29 腾讯科技(深圳)有限公司 Region division method and apparatus, storage medium, and electronic apparatus

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008134595A1 (en) * 2007-04-27 2008-11-06 Pelago, Inc. Determining locations of interest based on user visits
CN103185581B (zh) * 2011-12-28 2017-03-08 上海博泰悦臻电子设备制造有限公司 Information prompting apparatus and method for prompting POI search results

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
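CN114611624A (zh) * 2022-03-22 2022-06-10 广东贤能数字科技有限公司 Artificial-intelligence-based system and method for evaluating the business activity of shops or business halls
CN114611624B (zh) * 2022-03-22 2023-04-07 广东贤能数字科技有限公司 Artificial-intelligence-based system and method for evaluating the business activity of shops or business halls
CN116151839A (zh) * 2023-04-18 2023-05-23 中汽传媒(天津)有限公司 Dynamic planning method and system for automobile after-sales service points
CN116151839B (zh) * 2023-04-18 2023-06-27 中汽传媒(天津)有限公司 Dynamic planning method and system for automobile after-sales service points
CN116756439A (zh) * 2023-08-16 2023-09-15 中移(苏州)软件技术有限公司 Site selection method and apparatus, server, and computer-readable storage medium
CN116756439B (zh) * 2023-08-16 2024-01-26 中移(苏州)软件技术有限公司 Site selection method and apparatus, server, and computer-readable storage medium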

Also Published As

Publication number Publication date
CN112861972B (zh) 2023-09-26
CN112861972A (zh) 2021-05-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20930516

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20930516

Country of ref document: EP

Kind code of ref document: A1