CN114510988A

CN114510988A - Method, device, computer equipment and storage medium for building a site location model

Info

Publication number: CN114510988A
Application number: CN202111572516.XA
Authority: CN
Inventors: 刘权盼
Original assignee: China Construction Bank Corp
Current assignee: China Construction Bank Corp
Priority date: 2021-12-21
Filing date: 2021-12-21
Publication date: 2022-05-17

Abstract

The present application relates to a method, apparatus, computer device, storage medium, and computer program product for constructing a site selection model for outlets. The method includes: obtaining original feature data for site selection from multi-source channel data based on a preset feature index system; obtaining the weight of evidence and the information value of each feature in the original feature data, and screening the smart features for the site selection model based on the weight of evidence and the information value; and constructing the site selection model from the smart features through principal component analysis and the entropy method. The application first screens features against a pre-established feature index system, then selects the smart features from the screened features, and finally builds the site selection model through principal component analysis and the entropy method. By analyzing the multi-source site selection data through several rounds of feature screening, the solution yields a site selection model that takes the factors affecting site selection into comprehensive account.

Description

Method, device, computer equipment and storage medium for building a site location model

Technical Field

The present application relates to the field of computer technology, and in particular to a method, apparatus, computer device, storage medium, and computer program product for constructing a site selection model for outlets.

Background

With the development of network technology, big data technology has emerged. Big data refers to data sets so large that mainstream software tools cannot capture, manage, process, and organize them within a reasonable time into information that supports better business decisions.

An enterprise's outlets are its main channel for providing offline services to the public, and they play an important role in attracting high-quality customers, improving profitability, and realizing the enterprise's development strategy. At the same time, outlet resources are limited because of the large investment, fixed location, and relevant policy requirements. Using big data technology to solve the site selection problem has therefore become an option.

In conventional technology, site selection is generally computed by quantitative analysis, including simulation methods and heuristic methods such as genetic algorithms, particle swarm optimization, and ant colony optimization. These methods focus on finding the optimal solution of the algorithm, but they are not comprehensive: they consider few influencing factors and cannot analyze the various kinds of multi-source data in depth.

Summary of the Invention

In view of the above technical problem, it is necessary to provide a method, apparatus, computer device, computer-readable storage medium, and computer program product for constructing a site selection model that can analyze multi-source data and thereby take the relevant factors into comprehensive account.

In a first aspect, the present application provides a method for constructing a site selection model. The method includes:

obtaining original feature data for site selection from multi-source channel data based on a preset feature index system;

obtaining the weight of evidence and the information value of each feature in the original feature data, and screening the smart features for the site selection model based on the weight of evidence and the information value;

constructing the site selection model from the smart features through principal component analysis and the entropy method.

In one embodiment, obtaining the original feature data for site selection from the multi-source channel data based on the preset feature index system includes:

obtaining the multi-source channel data for site selection;

classifying and screening the multi-source channel data based on the preset feature index system to obtain the original feature data for site selection.

In one embodiment, obtaining the weight of evidence and the information value of each feature in the original feature data, and screening the smart features for the site selection model based on the weight of evidence and the information value, includes:

obtaining the null rate of each type of feature data in the original feature data;

screening the original feature data based on the null rate;

obtaining the weight of evidence and the information value of the screened original feature data;

removing the original feature data whose information value is below a preset information value threshold to obtain first initial features;

identifying, based on the weight of evidence, the nonlinear relationship of each first initial feature, and taking the first initial features whose nonlinear relationship is an inverted U-shaped relationship as second initial features;

obtaining the smart features for the site selection model based on the second initial features.

In one embodiment, before obtaining the smart features for the site selection model based on the initial features, the method further includes:

training an initial random forest model based on a global search method to obtain a random forest model;

and obtaining the smart features for the site selection model based on the initial features includes:

screening the initial features with the random forest model to obtain the smart features.

In one embodiment, constructing the site selection model from the smart features through principal component analysis and the entropy method includes:

reducing the dimensionality of the smart features through principal component analysis to obtain dimension-reduced features;

processing the dimension-reduced features with the entropy method to obtain the feature weight of each dimension-reduced feature;

constructing the site selection model based on the dimension-reduced features and their feature weights.

In one embodiment, the preset feature index system includes a channel service dimension index system and a regional development dimension index system, and after constructing the site selection model from the smart features through principal component analysis and the entropy method, the method further includes:

obtaining the site selection data of each candidate location in a target site selection area;

inputting the site selection data into the site selection model to obtain a channel service capability score and a regional development potential score for each candidate location;

obtaining the selected location for the target site selection area based on the channel service capability scores and the regional development potential scores.

In a second aspect, the present application further provides an apparatus for constructing a site selection model. The apparatus includes:

a data screening module, configured to obtain original feature data for site selection from multi-source channel data based on a preset feature index system;

a feature screening module, configured to obtain the weight of evidence and the information value of each feature in the original feature data, and to screen the smart features for the site selection model based on the weight of evidence and the information value;

a model building module, configured to construct the site selection model from the smart features through principal component analysis and the entropy method.

In a third aspect, the present application further provides a computer device. The computer device includes a memory and a processor, the memory stores a computer program, and the processor implements the following steps when executing the computer program:

In a fourth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the following steps:

In a fifth aspect, the present application further provides a computer program product. The computer program product includes a computer program which, when executed by a processor, implements the following steps:

In the above method, apparatus, computer device, storage medium, and computer program product for constructing a site selection model, the method obtains original feature data for site selection from multi-source channel data based on a preset feature index system; obtains the weight of evidence and the information value of each feature in the original feature data, and screens the smart features for the site selection model based on the weight of evidence and the information value; and constructs the site selection model from the smart features through principal component analysis and the entropy method. The method first screens features against a pre-established feature index system, then selects the smart features from the screened features, and finally builds the site selection model through principal component analysis and the entropy method. By analyzing the multi-source site selection data through several rounds of feature screening, the solution yields a site selection model that takes the factors affecting site selection into comprehensive account.

Description of Drawings

FIG. 1 is a diagram of the application environment of the method for constructing a site selection model in one embodiment;

FIG. 2 is a schematic flowchart of the method for constructing a site selection model in one embodiment;

FIG. 3 is a schematic sub-flowchart of step 201 of FIG. 2 in one embodiment;

FIG. 4 is a schematic diagram of the data classification and screening process based on multi-source data aggregation and an evaluation index system in one embodiment;

FIG. 5 is a schematic sub-flowchart of step 203 of FIG. 2 in one embodiment;

FIG. 6 is a schematic sub-flowchart of step 205 of FIG. 2 in one embodiment;

FIG. 7 is a schematic flowchart of the steps of selecting a site according to the site selection model in one embodiment;

FIG. 8 is a schematic diagram of the classification quadrants corresponding to the site selection scores in one embodiment;

FIG. 9 is a structural block diagram of the apparatus for constructing a site selection model in one embodiment;

FIG. 10 is a diagram of the internal structure of a computer device in one embodiment.

Detailed Description

To make the purpose, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain the present application and not to limit it.

The method for constructing a site selection model provided by the embodiments of the present application can be applied in the application environment shown in FIG. 1, in which a terminal 102 communicates with a server 104 through a network. A data storage system can store the data that the server 104 needs to process while constructing the site selection model. The data storage system can be integrated on the server 104, or placed on the cloud or on another network server. The terminal 102 may send a site selection model construction request to the server 104. After receiving the request, the server 104 obtains original feature data for site selection from multi-source channel data based on a preset feature index system; obtains the weight of evidence and the information value of each feature in the original feature data, and screens the smart features for the site selection model based on the weight of evidence and the information value; and constructs the site selection model from the smart features through principal component analysis and the entropy method. The terminal 102 can be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, Internet of Things device, or portable wearable device. The Internet of Things device may be a smart speaker, smart TV, smart air conditioner, smart in-vehicle device, or the like. The portable wearable device may be a smart watch, smart bracelet, head-mounted device, or the like. The server 104 can be implemented as an independent server or as a server cluster composed of multiple servers.

In one embodiment, as shown in FIG. 2, a method for constructing a site selection model is provided. Taking the application of the method to the server 104 in FIG. 1 as an example, the method includes the following steps:

Step 201: based on a preset feature index system, obtain original feature data for site selection from multi-source channel data.

Here, the preset feature index system is a set of indicators used to determine the features that affect site selection. Every candidate location can be analyzed and evaluated against the preset feature index system to obtain its site selection features. The preset feature index system specifically includes a business district ecology evaluation system, a transportation convenience evaluation system, a competition distribution evaluation system, a customer value evaluation system, and a business demand evaluation system. Site selection may be affected by regional economic base factors, population distribution factors, market competition factors, road traffic factors, and other factors. Therefore, when constructing the site selection model, an index system covering these factors can be established in advance, and the original feature data for site selection is then obtained from the multi-source channel data based on that preset feature index system. Economic base factors refer to the economic level, surrounding business districts, infrastructure, and similar properties of the area under consideration. The higher the economic level of a region, the higher its users' consumption level and the greater their demand for the enterprise's outlet services, so the more suitable the region is for an outlet. Population distribution factors refer to the number of users, age distribution, purchasing power index, financial characteristics, and customer group distribution within the area covered. As the offline carrier through which an enterprise serves its users, an outlet should serve as many high-quality customers as possible. Market competition factors reflect that, as the market matures, users' choices become more diverse; if outlets (for example, bank outlets) of comparable service level are densely distributed within the same area, competition intensifies and users are dispersed, which is detrimental to long-term development. Road traffic factors mean that the site selection decision should take into account the condition and construction quality of the roads, the distribution and convenience of public transport facilities, and road connectivity. The road traffic conditions of an area determine whether it can offer customers maximum convenience. The original feature data refers to the feature data screened, based on the preset feature index system, from the basic data collected from the various channels.

Specifically, site selection first requires the feature data used to build the model. The data sources that affect site selection are complex and diverse, so a preset feature index system corresponding to the site selection process can be established, and the original feature data that affects site selection is collected on the basis of that index system. Basic site selection data such as GIS (Geographic Information System) data, mobile signaling data, web crawler data, and map software API data can be collected in advance and then evaluated and identified against the preset feature index system to obtain the original feature data for site selection.

Step 203: obtain the weight of evidence and the information value of each feature in the original feature data, and screen the smart features for the site selection model based on the weight of evidence and the information value.

Here, the weight of evidence (WOE) indicates the predictive power of an independent variable with respect to the dependent variable, while the information value (IV) is used to rank variables by importance. WOE and IV are commonly used as benchmarks for screening variables in credit risk modeling projects (for example, probability-of-default models). In the present application they are instead used as benchmarks for screening features in the site selection model. Smart features are the features that contribute most to site selection. The site selection process can be optimized on the basis of these smart features, so that a more suitable outlet location is chosen from multiple candidate locations.
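As an illustration of the two quantities, the following is a minimal sketch of the usual WOE and IV calculation for one binned feature. The patent does not state how the binary outcome is defined or how features are binned; here it is assumed that existing outlets have already been labeled "good" or "bad" by some performance criterion and that the feature has been split into ordered bins. The function name `woe_iv` and the smoothing constant `eps` are hypothetical.

```python
import math

def woe_iv(bins, eps=0.5):
    """Compute per-bin weight of evidence and the feature's information value.

    bins: list of (good_count, bad_count) tuples, one per bin of the feature.
    eps:  additive smoothing (an assumption) that avoids log(0) for empty cells.
    """
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    n_bins = len(bins)
    woes, iv = [], 0.0
    for good, bad in bins:
        # Share of all good (bad) outlets that fall into this bin.
        p_good = (good + eps) / (total_good + eps * n_bins)
        p_bad = (bad + eps) / (total_bad + eps * n_bins)
        woe = math.log(p_good / p_bad)
        woes.append(woe)
        # Each bin's contribution (p_good - p_bad) * woe is non-negative.
        iv += (p_good - p_bad) * woe
    return woes, iv

# Hypothetical counts for a feature split into three ordered bins.
woes, iv = woe_iv([(10, 30), (20, 20), (30, 10)])
```

A feature whose bins separate good from bad outlets strongly receives a large IV, while a feature whose bins all have the same good/bad mix receives an IV near zero.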

The features in the original feature data are complex and diverse, and many of them have little or no influence on site selection. After the original feature data is obtained, corresponding feature screening is therefore required: the weight of evidence and the information value of each feature in the original feature data are obtained, and the smart features for the site selection model are screened on that basis. This reduces the complexity of the model while preserving the accuracy of model construction, and it improves both the efficiency of model construction and the computational efficiency of subsequent site selection. When screening smart features, the contribution of each original feature to site selection can be calculated, and the smart features are then determined from the ranking of contributions; for example, original features whose contribution exceeds a preset threshold are taken as smart features. In one embodiment, the original features ranked in the top 50% by contribution may be taken as smart features.
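The top-50% rule of this embodiment can be sketched as follows. This is only an illustration: the feature names and contribution values are invented, and how the contribution itself is measured (for example by information value or by a random forest importance) is left open here.

```python
import math

def select_smart_features(contributions, keep_fraction=0.5):
    """Keep the features ranked in the top `keep_fraction` by contribution.

    contributions: dict mapping feature name -> contribution score.
    """
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    # Round up so that at least one feature survives when any are present.
    keep = max(1, math.ceil(len(ranked) * keep_fraction))
    return [name for name, _ in ranked[:keep]]

# Hypothetical contribution scores for four original features.
smart = select_smart_features({
    "foot_traffic": 0.42,
    "competitor_density": 0.31,
    "bus_stops": 0.08,
    "avg_income": 0.19,
})
```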

Step 205: construct the site selection model from the smart features through principal component analysis and the entropy method.

Here, principal component analysis is a statistical method that uses an orthogonal transformation to convert a set of possibly correlated variables into a set of linearly uncorrelated variables, which are called principal components. In many cases the variables are correlated to some degree, and when two variables are correlated, the information they carry about the subject overlaps. Principal component analysis removes this redundancy among the original variables by building as few new variables as possible, such that the new variables are pairwise uncorrelated and retain as much of the original information as possible. After the smart features are obtained, they can therefore be combined through principal component analysis to reduce their dimensionality, which further reduces the complexity of the analysis and improves computational efficiency. The entropy method is a mathematical method for judging the degree of dispersion of an indicator by means of its entropy value. The greater the dispersion of an indicator, the greater its influence on the comprehensive evaluation.

Specifically, after the smart features are obtained, the correlations among them can first be determined, and the correlated smart features are then converted into dimension-reduced features through an orthogonal transformation. This removes the redundant variables that may exist among the smart features and reduces their dimensionality while retaining as much of the original information as possible. After the principal component analysis is complete, each dimension-reduced feature is evaluated with the entropy method to obtain its weight, which completes the construction of the site selection model. Once the site selection model is available, the features of each candidate location can be input into it. The model then computes a site selection score for each candidate location from the dimension-reduced features and their weights, so that a better outlet location can be identified and the site selection process is optimized.
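The pipeline described above (dimension reduction, entropy weighting, weighted scoring) might be sketched with NumPy as follows. This is a sketch under stated assumptions, not the patented implementation: the 85% cumulative-variance cutoff, the min-max normalization, and the function names are all assumptions, and every input column is assumed to be non-constant.

```python
import numpy as np

def pca_reduce(X, var_threshold=0.85):
    """Project standardized features onto the leading principal components."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)   # assumes no constant column
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # re-sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    # Smallest number of components whose cumulative variance reaches the cutoff.
    k = int(np.searchsorted(ratio, var_threshold)) + 1
    return Xc @ eigvecs[:, :k]

def entropy_weights(Z):
    """Entropy method: more dispersed columns receive larger weights."""
    span = Z.max(axis=0) - Z.min(axis=0)
    N = (Z - Z.min(axis=0)) / span               # min-max normalize to [0, 1]
    P = (N + 1e-12) / (N + 1e-12).sum(axis=0)    # column-wise proportions
    entropy = -(P * np.log(P)).sum(axis=0) / np.log(Z.shape[0])
    divergence = 1.0 - entropy
    return divergence / divergence.sum(), N

def site_scores(X):
    """Score each candidate location (row of X) as a weighted sum."""
    Z = pca_reduce(X)
    weights, N = entropy_weights(Z)
    return N @ weights

# Hypothetical data: 20 candidate locations, 5 smart features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
scores = site_scores(X)
```

One caveat of this sketch: the sign of each principal component is arbitrary, so a practical system would have to orient every component (for example, so that larger values mean "more favorable") before the scores can be compared meaningfully.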

The above method for constructing a site selection model obtains original feature data for site selection from multi-source channel data based on a preset feature index system; obtains the weight of evidence and the information value of each feature in the original feature data, and screens the smart features for the site selection model based on the weight of evidence and the information value; and constructs the site selection model from the smart features through principal component analysis and the entropy method. The method first screens features against a pre-established feature index system, then selects the smart features from the screened features, and finally builds the site selection model through principal component analysis and the entropy method. By analyzing the multi-source site selection data through several rounds of feature screening, the solution yields a site selection model that takes the factors affecting site selection into comprehensive account.

In one embodiment, as shown in FIG. 3, step 201 includes:

Step 302: obtain the multi-source channel data for site selection.

Step 304: classify and screen the multi-source channel data based on the preset feature index system to obtain the original feature data for site selection.

Here, the multi-source channel data comprises basic data collected from multiple different channels, specifically the basic site selection data of existing outlets such as GIS data, mobile signaling data, web crawler data, and map software API data. These data come from different information channels. Collecting them makes it possible to present the site selection information of existing outlets from both internal and external channels.

Specifically, constructing the site selection model first requires determining the feature data that affects site selection. A feature index system can therefore be established before the model is constructed, and the multi-source channel data is then classified and screened based on this preset feature index system. Referring to FIG. 4, in one embodiment concerning site selection for bank outlets, the multi-source channel data is first classified into data on economic base factors, population distribution factors, market competition factors, road traffic factors, and other factors that affect site selection. The corresponding data is then classified and screened under each preset feature index system to determine the data belonging to each index system, and the original feature data for site selection is built from the result. In this embodiment, classifying and screening the multi-source channel data based on the preset feature index system makes it possible to extract the original feature data for site selection from the multi-source channel data effectively.
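The classification and screening of steps 302 and 304 can be pictured with the following minimal sketch. The index system contents, indicator names, and record layout are all hypothetical; the patent names the factor categories but does not enumerate the indicators in each.

```python
from collections import defaultdict

# Hypothetical index system: factor category -> indicator names it covers.
INDEX_SYSTEM = {
    "economic_base": {"gdp_per_capita", "nearby_malls"},
    "population": {"resident_count", "age_distribution"},
    "competition": {"competitor_outlets"},
    "traffic": {"bus_stops", "road_connectivity"},
}

def classify_records(records, index_system=INDEX_SYSTEM):
    """Group multi-source records by factor category, dropping the rest.

    records: iterable of dicts with at least an "indicator" key.
    """
    # Invert the index system for direct indicator -> category lookup.
    lookup = {ind: cat for cat, inds in index_system.items() for ind in inds}
    grouped = defaultdict(list)
    for rec in records:
        category = lookup.get(rec["indicator"])
        if category is not None:          # screening: keep indexed indicators only
            grouped[category].append(rec)
    return dict(grouped)

# Hypothetical records from GIS, crawler, and mobile signaling sources.
raw = [
    {"source": "gis", "indicator": "bus_stops", "value": 7},
    {"source": "crawler", "indicator": "competitor_outlets", "value": 3},
    {"source": "signaling", "indicator": "resident_count", "value": 12000},
    {"source": "crawler", "indicator": "page_title", "value": "n/a"},
]
original_features = classify_records(raw)
```

Records whose indicator does not appear in any index system, such as the `page_title` entry above, are discarded, which corresponds to the screening half of step 304.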

In one embodiment, as shown in FIG. 5, step 203 includes:

Step 502: obtain the null rate corresponding to each type of feature data in the original feature data.

Step 504: screen the original feature data based on the null rate.

Step 506: obtain the weight of evidence and information value corresponding to the screened original feature data.

Step 508: remove the original feature data whose information value is below a preset information value threshold to obtain first initial features.

Step 510: identify, based on the weight of evidence, the nonlinear relationship corresponding to each first initial feature, and take the first initial features whose nonlinear relationship is an inverted-U relationship as second initial features.

Step 512: obtain, based on the second initial features, the smart features corresponding to the outlet site selection model.

Here, the null rate, also called the field null rate, is the probability that the field value corresponding to a given feature in the original feature data is null. Of the weight of evidence (WOE) and the information value (IV), the weight of evidence indicates the predictive power of an independent variable with respect to the dependent variable, while the information value is used to rank variables by importance. WOE and IV are currently commonly used as criteria for screening variables in credit risk modeling projects (for example, probability-of-default models). In the present application, they are instead used as the criteria for screening initial features for the outlet site selection model.
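For reference, the definitions of WOE and IV commonly used in credit scoring can be written as below. The application does not state these formulas, so the notation is an assumption: a feature is split into k bins, P_i and N_i are the numbers of positive and negative samples in bin i, and P and N are the totals. Some texts use the reciprocal ratio inside the logarithm, which flips the sign of WOE but leaves IV unchanged.

```latex
\mathrm{WOE}_i = \ln\!\left(\frac{P_i / P}{N_i / N}\right),
\qquad
\mathrm{IV} = \sum_{i=1}^{k} \left(\frac{P_i}{P} - \frac{N_i}{N}\right)\mathrm{WOE}_i
```

Each term of the IV sum is non-negative, so a larger IV indicates a feature whose bins separate the two classes more strongly.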

Specifically, the features involved in the original feature data are complex and varied, and many of them have no influence, or too little influence, on the outlet site selection process. After the original feature data are obtained, feature screening is therefore required: the weight of evidence and information value corresponding to each feature in the original feature data are obtained, and the smart features corresponding to the outlet site selection model are screened based on them. This process can apply a first round of screening based on the null rate of each type of feature data in the original feature data, followed by a second round based on the weight of evidence and information value of each type of feature data. When screening by null rate, feature data whose null rate exceeds a preset null rate threshold can be filtered out. The completeness of feature data with an excessively high null rate cannot be guaranteed, so to keep the site selection model usable, such feature data can be removed first. When screening by weight of evidence and information value, features whose information value is below the preset information value threshold can be removed, for example feature data with an information value below 0.5, and the features whose weight of evidence follows an inverted-U shape are selected as the initial features in the original feature data. The initial features can then be screened further to obtain the smart features used to construct the outlet site selection model. In this embodiment, screening the various types of feature data in the original feature data based on null rate, weight of evidence, and information value effectively yields the initial features used for outlet site selection modeling, thereby ensuring the accuracy of smart feature identification.
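The two screening rounds described above might look like the following sketch. Several things here are assumptions rather than details from the application: each feature is already binned with positive/negative counts per bin, every bin has non-zero counts, the null rate threshold of 0.3 is arbitrary, and an "inverted U" is taken to mean that WOE rises strictly to a single interior peak and then falls strictly. The IV threshold of 0.5 follows the example in the text.

```python
import math


def null_rate(values):
    """Fraction of entries that are None."""
    return sum(value is None for value in values) / len(values)


def woe_iv(bins):
    """bins: list of (positives, negatives) per bin, in bin order.

    Returns (list of per-bin WOE, total IV). Assumes no zero counts.
    """
    total_pos = sum(pos for pos, _ in bins)
    total_neg = sum(neg for _, neg in bins)
    woes, iv = [], 0.0
    for pos, neg in bins:
        dist_pos = pos / total_pos
        dist_neg = neg / total_neg
        woe = math.log(dist_pos / dist_neg)
        woes.append(woe)
        iv += (dist_pos - dist_neg) * woe
    return woes, iv


def is_inverted_u(woes):
    """True if WOE rises strictly to one interior peak, then falls strictly."""
    peak = woes.index(max(woes))
    if peak == 0 or peak == len(woes) - 1:
        return False
    rising = all(a < b for a, b in zip(woes[:peak], woes[1:peak + 1]))
    falling = all(a > b for a, b in zip(woes[peak:], woes[peak + 1:]))
    return rising and falling


def screen(features, max_null_rate=0.3, min_iv=0.5):
    """features: name -> {"values": [...], "bins": [(pos, neg), ...]}."""
    selected = []
    for name, feature in features.items():
        if null_rate(feature["values"]) > max_null_rate:
            continue  # first round: too many nulls
        woes, iv = woe_iv(feature["bins"])
        if iv < min_iv:
            continue  # second round: information value too low
        if is_inverted_u(woes):
            selected.append(name)
    return selected


features = {
    "daytime_population": {"values": [1, 2, 3, 4, 5, 6],
                           "bins": [(10, 40), (40, 10), (10, 40)]},
    "mostly_missing": {"values": [None, None, None, 1],
                       "bins": [(10, 40), (40, 10), (10, 40)]},
    "monotone": {"values": [1, 2, 3, 4],
                 "bins": [(5, 45), (20, 30), (35, 15)]},
}
initial_features = screen(features)
```

In this toy data, `mostly_missing` fails the null rate check and `monotone` has a high IV but a WOE curve that only increases, so only `daytime_population` survives.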

In one embodiment, before step 510, the method further includes: training an initial random forest model based on a global search method to obtain a random forest model. Step 510 includes: screening the initial features based on the random forest model to obtain the smart features.

Here, a global search method is one that searches the entire feasible set to find extreme points. Such methods only need to evaluate the objective function and do not require its derivatives. A random forest is a classifier that uses multiple trees to train on and predict samples.

Specifically, after the initial features are obtained, further feature screening can be performed using the random forest algorithm. The model is trained based on the global search method, with n_estimators, max_features, and max_depth as the random forest parameters to be optimized; cross-validation is then run in a loop, and the importance of each initial feature is averaged to screen the smart features. In a specific embodiment, the random forest model can be used to determine the importance of each initial feature, and the features ranked in the top 50% are then selected as the smart features of the outlet site selection model. In another embodiment, the lasso algorithm can be used to screen the smart features instead: the penalty term can be set to 0.01, and the features with non-zero coefficients are taken as the smart features. In this embodiment, screening the initial features with a random forest model yields more effective smart features, thereby improving the success rate of constructing the outlet site selection model and the accuracy of outlet site selection.
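The averaging and top-50% selection described above can be illustrated with a short sketch. It deliberately leaves out the random forest training itself: the per-fold importance scores are taken as given (in practice they might come from a tuned random forest fitted on each cross-validation fold), and both the function name `select_top_half` and the sample numbers are hypothetical.

```python
def select_top_half(fold_importances):
    """fold_importances: one dict per CV fold, mapping feature -> importance.

    Returns (features in the top 50% by mean importance, mean importances).
    """
    names = list(fold_importances[0])
    mean_importance = {
        name: sum(fold[name] for fold in fold_importances) / len(fold_importances)
        for name in names
    }
    ranked = sorted(names, key=lambda name: mean_importance[name], reverse=True)
    keep = max(1, len(ranked) // 2)  # always keep at least one feature
    return ranked[:keep], mean_importance


# Hypothetical importances from two cross-validation folds.
folds = [
    {"a": 0.50, "b": 0.30, "c": 0.15, "d": 0.05},
    {"a": 0.40, "b": 0.35, "c": 0.05, "d": 0.20},
]
smart_features, mean_importance = select_top_half(folds)
```

Averaging across folds before ranking reduces the chance that a feature is kept or dropped because of one unusual data split.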

In one embodiment, as shown in FIG. 6, step 205 includes:

Step 601: perform dimensionality reduction on the smart features through principal component analysis to obtain dimension-reduced features.

Step 603: process the dimension-reduced features through the entropy method to obtain the feature weights corresponding to the dimension-reduced features.

Step 605: construct the outlet site selection model based on the dimension-reduced features and their corresponding feature weights.

Here, principal component analysis (PCA) is a statistical method that uses an orthogonal transformation to convert a set of possibly correlated variables into a set of linearly uncorrelated variables; the transformed variables are called principal components. In the present application, once the smart features are obtained, different smart features can be combined through principal component analysis to reduce their dimensionality and obtain dimension-reduced features, which further reduces the complexity of the analysis and improves computational efficiency.
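As a rough illustration of the dimensionality reduction, the sketch below extracts only the first principal component using power iteration on the sample covariance matrix. This is a simplification chosen to stay dependency-free; the application does not say how PCA is computed or how many components are kept, and the function name and sample data are hypothetical. It assumes the data are not constant and that the starting vector is not orthogonal to the dominant direction.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, projected scores) for the first component."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered row onto the direction to get one value per row.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Two perfectly correlated smart features collapse onto one component.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, scores = first_principal_component(rows)
```

Here the second feature is exactly twice the first, so the direction found is proportional to (1, 2) and a single component carries all of the variance.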

The entropy method is a mathematical method for judging the degree of dispersion of an indicator: the entropy value measures that dispersion, and the greater the dispersion, the greater the indicator's influence on the comprehensive evaluation. In the present application, the entropy method is mainly used to evaluate each dimension-reduced feature and derive its weight, which enables construction of the outlet site selection model. Once the dimension-reduced features and their weights are obtained, the outlet site selection model can be constructed from them. The features corresponding to each candidate location can be input into the model, which derives the corresponding dimension-reduced feature values and then, using the corresponding feature weights, calculates a site selection score for each location. A better outlet site can thus be identified, optimizing the outlet site selection process. In this embodiment, combining PCA with the entropy method in model construction effectively improves the computational efficiency of the model and thereby the processing efficiency of outlet site selection.
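The weighting and scoring described above can be sketched with the usual form of the entropy weight method. The application gives no formulas, so this is an assumed formulation: each column is normalized to proportions, its entropy is scaled by 1/ln(n), and the weight is proportional to one minus the entropy. The inputs are assumed to be non-negative (for example after min-max scaling), with at least one column that is not constant.

```python
import math


def entropy_weights(matrix):
    """matrix: rows are candidate locations, columns are non-negative features."""
    n, d = len(matrix), len(matrix[0])
    k = 1.0 / math.log(n)
    divergences = []
    for j in range(d):
        column_sum = sum(row[j] for row in matrix)
        entropy = 0.0
        for row in matrix:
            p = row[j] / column_sum
            if p > 0:  # the term 0 * ln(0) is treated as 0
                entropy -= k * p * math.log(p)
        divergences.append(1.0 - entropy)  # more dispersion -> larger value
    total = sum(divergences)
    return [value / total for value in divergences]


def site_scores(matrix, weights):
    """Weighted sum of feature values for each candidate location."""
    return [sum(w * x for w, x in zip(weights, row)) for row in matrix]


# The first column is identical everywhere, so it carries no information.
matrix = [[1.0, 1.0], [1.0, 3.0], [1.0, 5.0]]
weights = entropy_weights(matrix)
scores = site_scores(matrix, weights)
```

The constant first column receives a weight of essentially zero, so the ranking of the three locations is driven entirely by the second, more dispersed column.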

In one embodiment, the preset feature index system includes a channel service dimension index system and a regional development dimension index system. As shown in FIG. 7, after step 205, the method further includes:

Step 702: obtain the outlet site selection data corresponding to each candidate location in the target site selection area.

Step 704: input the outlet site selection data into the outlet site selection model to obtain the channel service capability score and the regional development potential score corresponding to each candidate location.

Step 706: obtain the selected site corresponding to the target site selection area based on the channel service capability scores and the regional development potential scores.

Here, the outlet site selection data can be determined according to the smart features obtained by screening the initial features. After the candidate locations are determined, the outlet site selection data can be extracted from the data corresponding to the candidate locations based on the smart features used to construct the model.

Specifically, once the outlet site selection model is obtained, it can be used to select outlet sites within the target site selection area. Several candidate locations are first chosen from the target site selection area. The outlet site selection data corresponding to these candidate locations are then input into the outlet site selection model to obtain the channel service capability score and the regional development potential score for each candidate location, and the selected site for the target site selection area is obtained based on these two scores. In one embodiment, as shown in FIG. 8, a classification quadrant chart based on the channel service capability score and the regional development potential score can be constructed to determine the score quadrant of each candidate location, so that the selected site for the target site selection area can be determined according to requirements. In another embodiment, the outlet site selection model of the present application can also be used to assess the health of existing outlet channels, thereby optimizing channel layout, improving resource utilization, and increasing market share. In this embodiment, outlet site selection is decided using the outlet site selection data and the outlet site selection model, so that the optimal outlet site can be effectively chosen from the candidate locations, optimizing the outlet site selection process.
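The quadrant classification mentioned above could be sketched as follows. The cut-off of 0.5 on each axis, the label wording, and the candidate scores are all hypothetical; the application does not specify how the quadrant boundaries in FIG. 8 are chosen.

```python
def classify_quadrant(service, potential, service_cut=0.5, potential_cut=0.5):
    """Return a quadrant label for one candidate location."""
    service_level = "high" if service >= service_cut else "low"
    potential_level = "high" if potential >= potential_cut else "low"
    return f"{service_level} service / {potential_level} potential"


# Hypothetical (service capability, development potential) scores.
candidates = {"A": (0.8, 0.9), "B": (0.8, 0.2), "C": (0.1, 0.7), "D": (0.2, 0.1)}
quadrants = {
    name: classify_quadrant(service, potential)
    for name, (service, potential) in candidates.items()
}
```

Which quadrant counts as the preferred one depends on the requirement: a location with low current service capability but high development potential may, for example, be the more attractive place for a new outlet.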

It should be understood that, although the steps in the flowcharts of the above embodiments are shown in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be executed at different times, and they are not necessarily executed in sequence: they may be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, an embodiment of the present application further provides an apparatus for constructing an outlet site selection model, which implements the method for constructing an outlet site selection model described above. The solution provided by the apparatus is similar to that described for the method, so for the specific limitations in the one or more apparatus embodiments provided below, reference may be made to the limitations of the method above, which are not repeated here.

In one embodiment, as shown in FIG. 9, an apparatus for constructing an outlet site selection model is provided, including:

A data screening module 902, configured to obtain the original feature data corresponding to outlet site selection from channel multi-source data based on a preset feature index system.

A feature screening module 904, configured to obtain the weight of evidence and information value corresponding to each feature in the original feature data, and to screen the smart features corresponding to the outlet site selection model based on the weight of evidence and information value.

A model construction module 906, configured to construct the outlet site selection model from the smart features through principal component analysis and the entropy method.

In one embodiment, the data screening module 902 is specifically configured to: obtain the channel multi-source data corresponding to outlet site selection; and classify and screen the channel multi-source data based on the preset feature index system to obtain the original feature data corresponding to outlet site selection.

In one embodiment, the feature screening module 904 is specifically configured to: obtain the null rate corresponding to each type of feature data in the original feature data; screen the original feature data based on the null rate; obtain the weight of evidence and information value corresponding to the screened original feature data; remove the original feature data whose information value is below a preset information value threshold to obtain first initial features; identify, based on the weight of evidence, the nonlinear relationship corresponding to each first initial feature, and take the first initial features whose nonlinear relationship is an inverted-U relationship as second initial features; and obtain, based on the second initial features, the smart features corresponding to the outlet site selection model.

In one embodiment, the feature screening module 904 is specifically configured to: train an initial random forest model based on a global search method to obtain a random forest model; and, when obtaining the smart features corresponding to the outlet site selection model based on the initial features, screen the initial features based on the random forest model to obtain the smart features.

In one embodiment, the model construction module 906 is specifically configured to: perform dimensionality reduction on the smart features through principal component analysis to obtain dimension-reduced features; process the dimension-reduced features through the entropy method to obtain the feature weights corresponding to the dimension-reduced features; and construct the outlet site selection model based on the dimension-reduced features and their corresponding feature weights.

In one embodiment, the preset feature index system includes a channel service dimension index system and a regional development dimension index system, and the apparatus further includes an outlet site selection module configured to: obtain the outlet site selection data corresponding to each candidate location in the target site selection area; input the outlet site selection data into the outlet site selection model to obtain the channel service capability score and the regional development potential score corresponding to each candidate location; and obtain the selected site corresponding to the target site selection area based on the channel service capability scores and the regional development potential scores.

Each module in the above apparatus for constructing an outlet site selection model can be implemented in whole or in part by software, hardware, or a combination of the two. The modules can be embedded in, or independent of, a processor in a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.

In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 10. The computer device includes a processor, a memory, and a network interface connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database stores the data used to construct the outlet site selection model. The network interface is used to communicate with external terminals over a network connection. When executed by the processor, the computer program implements a method for constructing an outlet site selection model.

Those skilled in the art will understand that the structure shown in FIG. 10 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided, including a memory and a processor. The memory stores a computer program, and the processor implements the following steps when executing the computer program:

Obtaining the weight of evidence and information value corresponding to each feature in the original feature data, and screening the smart features corresponding to the outlet site selection model based on the weight of evidence and information value;

Constructing the outlet site selection model from the smart features through principal component analysis and the entropy method.

In one embodiment, the processor further implements the following steps when executing the computer program: obtaining the channel multi-source data corresponding to outlet site selection; and classifying and screening the channel multi-source data based on the preset feature index system to obtain the original feature data corresponding to outlet site selection.

In one embodiment, the processor further implements the following steps when executing the computer program: obtaining the null rate corresponding to each type of feature data in the original feature data; screening the original feature data based on the null rate; obtaining the weight of evidence and information value corresponding to the screened original feature data; removing the original feature data whose information value is below a preset information value threshold to obtain first initial features; identifying, based on the weight of evidence, the nonlinear relationship corresponding to each first initial feature, and taking the first initial features whose nonlinear relationship is an inverted-U relationship as second initial features; and obtaining, based on the second initial features, the smart features corresponding to the outlet site selection model.

In one embodiment, the processor further implements the following steps when executing the computer program: training an initial random forest model based on a global search method to obtain a random forest model; and screening the initial features based on the random forest model to obtain the smart features.

In one embodiment, the processor further implements the following steps when executing the computer program: performing dimensionality reduction on the smart features through principal component analysis to obtain dimension-reduced features; processing the dimension-reduced features through the entropy method to obtain the feature weights corresponding to the dimension-reduced features; and constructing the outlet site selection model based on the dimension-reduced features and their corresponding feature weights.

In one embodiment, the processor further implements the following steps when executing the computer program: obtaining the outlet site selection data corresponding to each candidate location in the target site selection area; inputting the outlet site selection data into the outlet site selection model to obtain the channel service capability score and the regional development potential score corresponding to each candidate location; and obtaining the selected site corresponding to the target site selection area based on the channel service capability scores and the regional development potential scores.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the following steps:

In one embodiment, the computer program, when executed by the processor, further implements the following steps: obtaining the channel multi-source data corresponding to outlet site selection; and classifying and screening the channel multi-source data based on the preset feature index system to obtain the original feature data corresponding to outlet site selection.

In one embodiment, the computer program, when executed by the processor, further implements the following steps: obtaining the null rate corresponding to each type of feature data in the original feature data; screening the original feature data based on the null rate; obtaining the weight of evidence and information value corresponding to the screened original feature data; removing the original feature data whose information value is below a preset information value threshold to obtain first initial features; identifying, based on the weight of evidence, the nonlinear relationship corresponding to each first initial feature, and taking the first initial features whose nonlinear relationship is an inverted-U relationship as second initial features; and obtaining, based on the second initial features, the smart features corresponding to the outlet site selection model.

In one embodiment, the computer program, when executed by the processor, further implements the following steps: training an initial random forest model based on a global search method to obtain a random forest model; and screening the initial features based on the random forest model to obtain the smart features.

In one embodiment, the computer program, when executed by the processor, further implements the following steps: performing dimensionality reduction on the smart features through principal component analysis to obtain dimension-reduced features; processing the dimension-reduced features through the entropy method to obtain the feature weights corresponding to the dimension-reduced features; and constructing the outlet site selection model based on the dimension-reduced features and their corresponding feature weights.

In one embodiment, the computer program, when executed by the processor, further implements the following steps: obtaining the outlet site selection data corresponding to each candidate location in the target site selection area; inputting the outlet site selection data into the outlet site selection model to obtain the channel service capability score and the regional development potential score corresponding to each candidate location; and obtaining the selected site corresponding to the target site selection area based on the channel service capability scores and the regional development potential scores.

In one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, implements the following steps:

In one embodiment, the computer program, when executed by the processor, further implements the following steps: obtaining the null rate corresponding to each type of feature data in the original feature data; screening the original feature data based on the null rate; obtaining the weight of evidence and information value corresponding to the screened original feature data; identifying the initial features in the original feature data based on the weight of evidence and information value; and obtaining the smart features corresponding to the outlet site selection model based on the initial features.

It should be noted that the user information (including but not limited to user device information and user personal information) and data (including but not limited to data used for analysis, stored data, and displayed data) involved in this application are all information and data authorized by the user or fully authorized by all parties.

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to a memory, a database, or another medium used in the embodiments provided in this application may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases involved in the embodiments provided in this application may include at least one of a relational database and a non-relational database. Non-relational databases may include, without limitation, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, or data processing logic devices based on quantum computing.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as a combination of these technical features contains no contradiction, it should be considered to fall within the scope of this specification.

The above embodiments represent only several implementations of the present application. Their descriptions are relatively specific and detailed, but they should not be construed as limiting the patent scope of the present application. It should be noted that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, and all of these fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be determined by the appended claims.

Claims

1. A method for constructing an outlet site selection model, characterized by comprising:

acquiring, based on a preset feature index system, original feature data corresponding to outlet site selection from channel multi-source data;

acquiring an evidence weight and an information value corresponding to each feature in the original feature data, and screening smart features corresponding to the site selection model based on the evidence weight and the information value; and

constructing the outlet site selection model through principal component analysis and an entropy method based on the smart features.

2. The method according to claim 1, wherein the acquiring, based on a preset feature index system, of original feature data corresponding to outlet site selection from channel multi-source data comprises:

acquiring channel multi-source data corresponding to outlet site selection; and

classifying and screening the channel multi-source data based on the preset feature index system to obtain the original feature data corresponding to outlet site selection.

3. The method according to claim 1, wherein the acquiring of an evidence weight and an information value corresponding to each feature in the original feature data, and the screening of smart features corresponding to the site selection model based on the evidence weight and the information value, comprises:

acquiring a null value rate corresponding to each type of feature data in the original feature data;

screening the original feature data based on the null value rate;

acquiring the evidence weight and the information value corresponding to the screened original feature data;

removing original feature data whose information value is lower than a preset information value threshold to obtain first initial features;

identifying a nonlinear relation corresponding to each first initial feature based on the evidence weight, and taking the first initial features whose nonlinear relation is an inverted U-shaped relation as second initial features; and

acquiring the smart features corresponding to the site selection model based on the second initial features.
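
The null value rate and information value screening steps of claim 3 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the function names, the binary label convention (1 for a well-performing site, 0 otherwise), the assumption that each feature is already binned, the smoothing count `eps`, and both thresholds are hypothetical choices made only for illustration. It uses the conventional definitions WOE = ln(share of positives in bin / share of negatives in bin) and IV = sum over bins of (positive share - negative share) × WOE.

```python
import math
from collections import defaultdict


def null_rate(values):
    """Fraction of entries that are missing (represented here as None)."""
    return sum(v is None for v in values) / len(values)


def woe_iv(bins, labels, eps=0.5):
    """Return ({bin: WOE}, IV) for one binned feature and binary labels.

    eps is a hypothetical smoothing count added to every cell so that
    empty cells do not produce log(0).
    """
    pos = defaultdict(float)
    neg = defaultdict(float)
    for b, y in zip(bins, labels):
        if y == 1:
            pos[b] += 1
        else:
            neg[b] += 1
    keys = sorted(set(pos) | set(neg))
    total_pos = sum(pos[k] for k in keys) + eps * len(keys)
    total_neg = sum(neg[k] for k in keys) + eps * len(keys)
    woe, iv = {}, 0.0
    for k in keys:
        p_pos = (pos[k] + eps) / total_pos
        p_neg = (neg[k] + eps) / total_neg
        woe[k] = math.log(p_pos / p_neg)
        iv += (p_pos - p_neg) * woe[k]
    return woe, iv


def screen_features(features, labels, max_null_rate, iv_threshold):
    """Drop features with too many nulls, then features with low IV.

    Returns {feature name: IV} for the features that survive both filters.
    """
    kept = {}
    for name, bins in features.items():
        if null_rate(bins) > max_null_rate:
            continue
        # Compute WOE/IV on the non-missing observations only.
        pairs = [(b, y) for b, y in zip(bins, labels) if b is not None]
        _, iv = woe_iv([b for b, _ in pairs], [y for _, y in pairs])
        if iv >= iv_threshold:
            kept[name] = iv
    return kept
```

The per-bin WOE values returned by `woe_iv` are what a later step would inspect, in bin order, to judge whether a feature's relation to the label is inverted U-shaped; that check is not sketched here because the source does not specify how it is performed.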

4. The method according to claim 3, wherein before the acquiring of the smart features corresponding to the site selection model based on the second initial features, the method further comprises:

training an initial random forest model based on a global search method to obtain a random forest model;

and wherein the acquiring of the smart features corresponding to the site selection model based on the second initial features comprises:

screening the second initial features based on the random forest model to obtain the smart features.

5. The method according to claim 1, wherein the constructing of the outlet site selection model through principal component analysis and an entropy method based on the smart features comprises:

performing dimensionality reduction on the smart features through principal component analysis to obtain dimension-reduced features;

processing the dimension-reduced features through the entropy method to obtain a feature weight corresponding to each dimension-reduced feature; and

constructing the outlet site selection model based on the dimension-reduced features and the feature weights corresponding to the dimension-reduced features.
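
The entropy weighting step of claim 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: it assumes the principal component analysis step has already produced a table with one row per site and one column per dimension-reduced feature, it uses the commonly described entropy weight method (min-max scaling, column-wise proportions, entropy normalized by ln n, weights proportional to 1 - entropy), and it assumes the final model is a weighted sum. The function names are hypothetical.

```python
import math


def entropy_weights(rows):
    """Entropy-method weights for the columns of a row-major table.

    rows: list of equal-length lists, one row per site (at least two rows).
    Returns one weight per column; the weights sum to 1.
    """
    n = len(rows)
    m = len(rows[0])
    k = 1.0 / math.log(n)  # normalizes entropy into [0, 1]
    divergences = []
    for j in range(m):
        col = [row[j] for row in rows]
        lo, hi = min(col), max(col)
        span = hi - lo
        # Min-max scale to [0, 1]; a constant column is treated as uniform.
        scaled = [(v - lo) / span if span else 1.0 for v in col]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        entropy = -k * sum(p * math.log(p) for p in probs if p > 0)
        # Clamp tiny negative values caused by floating-point rounding.
        divergences.append(max(0.0, 1.0 - entropy))
    total_div = sum(divergences)
    if total_div == 0:
        # No column carries information; fall back to equal weights.
        return [1.0 / m] * m
    return [d / total_div for d in divergences]


def weighted_scores(rows, weights):
    """Weighted sum of each row's features (an assumed model form)."""
    return [sum(w * v for w, v in zip(weights, row)) for row in rows]
```

A column whose values barely differ between sites has entropy close to 1 and therefore receives a weight close to 0, which is the property that makes the entropy method an objective weighting scheme for the dimension-reduced features.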

6. The method according to claim 1, wherein the preset feature index system comprises a service dimension index system and a regional development dimension index system, and wherein after the constructing of the outlet site selection model through principal component analysis and an entropy method based on the smart features, the method further comprises:

acquiring outlet site selection data corresponding to each candidate site in a target site selection area;

inputting the outlet site selection data into the outlet site selection model to obtain a channel service capability score and a regional development potential score corresponding to each candidate site; and

acquiring the selected site corresponding to the target site selection area based on the channel service capability scores and the regional development potential scores.

7. An apparatus for constructing an outlet site selection model, characterized by comprising:

a data screening module, configured to acquire, based on a preset feature index system, original feature data corresponding to outlet site selection from channel multi-source data;

a feature screening module, configured to acquire an evidence weight and an information value corresponding to each feature in the original feature data, and to screen smart features corresponding to the site selection model based on the evidence weight and the information value; and

a model construction module, configured to construct the outlet site selection model through principal component analysis and an entropy method based on the smart features.

8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 6.

9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.

10. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 6 when executed by a processor.