CN110633401A

CN110633401A - A prediction model of store data and its establishment method

Info

Publication number: CN110633401A
Application number: CN201910683454.6A
Authority: CN
Inventors: 付恩照; 曹建昌; 孙炜; 张言; 何同昕
Original assignee: Suning Cloud Computing Co Ltd
Current assignee: Suning Cloud Computing Co Ltd
Priority date: 2019-07-26
Filing date: 2019-07-26
Publication date: 2019-12-31

Abstract

The embodiment of the invention discloses a prediction model of store data and a method for establishing it. The method comprises the following steps: acquiring first store information about the surroundings of a store through web crawler technology and second store information through an enterprise internal management system; cleaning and structuring the data to obtain different types of sales data; dividing the sales data into continuous variable data and discrete variable data; applying normalization correction to the continuous variable data and continuous-value assignment optimization to the discrete variable data; learning on a training set with a machine learning method to obtain a prediction model of sales scale; and reviewing and optimizing the prediction model against the actual sales scale data of the store to be predicted. The disclosed prediction model and establishment method can predict store sales data reasonably and accurately, so that store operations can be arranged sensibly and the floor efficiency (sales per unit floor area) of stores improved.

Description


Technical Field

The invention relates to the field of big data, and in particular to a prediction model for store data and a method for establishing it.

Background Art

Traditionally, companies have forecast store sales volume manually, relying on business experts: sales staff predict a store's sales from experience combined with manual analysis. This approach, called the expert method, produces a forecast quickly and simply. However, it does not use sales data to reach a reasoned, convincing judgment and has no theoretical support, so forecast quality is uneven, which in turn affects the store's construction planning, staffing, stocking, operation and management.

With the wide adoption of machine learning, enterprises now often import the data of different stores directly into a computer and predict store sales with a machine learning model. This is only a simplistic application of machine learning. A company's store sales are affected not only by the store's own management data but also by multiple factors such as geographic location, urban development, and the stores of other enterprises, and this approach does not analyze or process these factors. In addition, feeding store management data into machine learning directly introduces considerable uncertainty, which also degrades the final forecast.

Summary of the Invention

To solve the problems of the prior art, the embodiments of the present invention provide a method and system for establishing a store data prediction model that can predict a store's sales data reasonably and accurately, so that store operations can be arranged sensibly and the store's floor efficiency improved.

To solve the above technical problems, the present invention adopts the following technical solution:

In a first aspect, an embodiment of the present invention provides a method for establishing a store data prediction model, comprising the following steps:

Obtaining first store information about the surroundings of the store through web crawler technology, obtaining second store information through the enterprise's internal management system, and storing the first store information and the second store information in a database;

Cleaning and structuring the first store information and the second store information to obtain different types of sales data, dividing the sales data into continuous variable data and discrete variable data, applying normalization correction to the continuous variable data, and applying continuous-value assignment optimization to the discrete variable data;

Using a machine learning method to score the feature data obtained after the normalization correction and the continuous-value optimization, automatically selecting the highest-scoring feature data for training-set learning to obtain a prediction model of sales scale, and reviewing and optimizing the prediction model using the actual sales scale data of the store to be predicted.

Further, according to the specific location and retail scenario of the store to be predicted, the prediction model is used to output a predicted value of that store's sales scale.

Further, the sources crawled by the crawler include at least map data, online shopping data, and official statistical data.

Further, the first store information includes city data features, business district data features, and geographic data features, and the second store information includes store membership data features and store project data features.

Further, the normalization correction includes: first applying a logarithmic skew correction to the continuous variable data that is approximately normally distributed, and then applying mean normalization to the corrected continuous variable data.

Further, the continuous-value assignment optimization includes: performing a one-way analysis of variance on the discrete variable data, selecting the discrete variables that have a larger effect on store floor efficiency, converting each selected discrete variable into the mean floor efficiency of the stores in each of its categories, and assigning ordered quantitative values to the categories according to those mean floor efficiencies.

Further, before the prediction model is used to output a prediction result, the method also includes specifying the business format of the retail scenario and, according to that business format, automatically associating and matching the corresponding designated prediction model.

In another aspect, an embodiment of the present invention further provides a prediction model for store data, comprising:

A data collection module, including a web crawler unit and an enterprise data collection unit, wherein the web crawler unit is used to obtain first store information about the surroundings of the store through web crawler technology, the enterprise data collection unit is used to obtain second store information through the enterprise's internal management system, and the data collection module stores the first store information and the second store information in a database;

A data processing module, used to clean and structure the first store information and the second store information to obtain different types of sales data, divide the sales data into continuous variable data and discrete variable data, apply normalization correction to the continuous variable data, and apply continuous-value assignment optimization to the discrete variable data;

A machine learning module, used to score the feature data obtained after the normalization correction and the continuous-value optimization, automatically select the highest-scoring feature data for training-set learning to obtain a prediction model of sales scale, and review and optimize the prediction model using the actual sales scale data of the store to be predicted.

Further, the data processing module includes a continuous variable correction unit, used to first apply a logarithmic skew correction to the continuous variable data that is approximately normally distributed and then apply mean normalization to the corrected data.

Further, the data processing module includes a discrete variable optimization unit, used to perform a one-way analysis of variance on the discrete variable data, select the discrete variables that have a larger effect on store floor efficiency, convert each selected variable into the mean floor efficiency of the stores in each of its categories, and assign ordered quantitative values to the categories according to those means.

The technical solutions provided by the embodiments of the present invention bring the following beneficial effects:

The embodiments of the present invention provide a method and system for establishing a store data prediction model. Information about a store's surroundings is collected by a web crawler; because a crawler can gather first store information in large volume and accurately, both the quantity and the accuracy of the collected information are significantly improved. This is combined with second store information obtained from the enterprise's internal management system, and both are analyzed by computer, cleaned, and optimized into reasonable, accurate target feature data. Because this processing removes multiple interfering factors from the collected data before machine learning, the accuracy of the resulting model is significantly improved. The prediction model is then reviewed and optimized against the store's actual sales data, which further improves its output accuracy, so that the data it predicts are reasonable and useful. Using these predictions, store operations can be arranged sensibly and store floor efficiency improved.

Description of Drawings

To explain the technical solutions of the embodiments more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of the method for establishing a store data prediction model disclosed in an embodiment of the present invention;

Fig. 2 is a flowchart of the data processing and analysis in the method for establishing a store data prediction model disclosed in an embodiment of the present invention;

Fig. 3 is a schematic architecture diagram of the prediction model for store data disclosed in an embodiment of the present invention.

Detailed Description of the Embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

Embodiment 1:

As shown in Fig. 1, a method for establishing a store data prediction model according to an embodiment of the present invention comprises the following steps:

S1: Obtain first store information about the surroundings of the store through web crawler technology, obtain second store information through the enterprise's internal management system, and store both in a database;

S2: Clean and structure the first store information and the second store information to obtain different types of sales data; divide the sales data into continuous variable data and discrete variable data; apply normalization correction to the continuous variable data and continuous-value assignment optimization to the discrete variable data;

S3: Use a machine learning method to score the feature data obtained after the normalization correction and continuous-value optimization, automatically select the highest-scoring feature data for training-set learning to obtain a prediction model of sales scale, and review and optimize the prediction model using the actual sales scale data of the store to be predicted.

Preferably, according to the specific location and retail scenario of the store to be predicted, the prediction model outputs a predicted value of that store's sales scale; with reliable predictions, store operations can be arranged sensibly and the store's floor efficiency improved.

A crawler is a program that "browses the web automatically": following certain rules, it automatically fetches the information a user needs from the World Wide Web. As the Internet has grown, the web has become the carrier of a large amount of information, and crawling has become an important part of data collection and the most basic step in big data analysis. Preferably, the crawler's sources include, but are not limited to, map data, online shopping data, and official statistical data. Specifically, a web crawler tool is used to crawl data from websites such as AutoNavi (Gaode) Maps, National Data, Dianping, and Meituan to obtain information such as surrounding foot traffic, population, commercial distribution, distribution of points of interest, and traffic conditions.
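The crawling step can be illustrated by its parsing half. The following is a minimal sketch only, assuming a hypothetical listing page in which each point of interest is rendered as `<li class="poi" data-category="...">`; real map and review sites use different, frequently changing markup, and any real crawler must respect the site's robots.txt and terms of use. Fetching is omitted, and the names `PoiListParser` and `count_poi_categories` are illustrative, not part of the original method.

```python
from collections import Counter
from html.parser import HTMLParser


class PoiListParser(HTMLParser):
    """Counts POI categories on a hypothetical listing page.

    Assumed markup: <li class="poi" data-category="...">name</li>.
    """

    def __init__(self):
        super().__init__()
        self.categories = Counter()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # The class attribute may hold several tokens, so split before checking.
        if tag == "li" and "poi" in (attr.get("class") or "").split():
            self.categories[attr.get("data-category", "unknown")] += 1


def count_poi_categories(html_text):
    """Return {category: count} for all POI entries found in html_text."""
    parser = PoiListParser()
    parser.feed(html_text)
    parser.close()
    return dict(parser.categories)
```

The resulting counts (for example, subway stations or malls near a site) are the kind of raw values that later become geographic data features.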

Preferably, the first store information includes city data features, business district data features, and geographic data features. Because a city's macroeconomic characteristics correlate to some degree with the quantity and quality of potential customers, city data features are considered first. A detailed analysis of population structure is also needed, and it is more meaningful when carried out within the catchment radius of the business district. The collected city data features include: resident population, registered (hukou) population, total GDP, urbanization rate, per capita disposable income of urban residents, and total retail sales of consumer goods. The collected business district data features include: demographic features, foot traffic features, and customer profile features. According to "retail cluster" theory, retailers opening stores in the same area produce a "spillover effect" that attracts more foot traffic, so the number and distribution of competitors must also be considered. Transport facilities and POIs clearly attract foot traffic: people usually choose more accessible places to shop, and a concentration of traffic generators brings customers directly. The collected geographic data features therefore include competitor data, transport facilities, and traffic-generating POIs (points of interest).
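Geographic features such as "competitors within the catchment radius" can be computed from crawled coordinates. The sketch below is an illustration under stated assumptions: it uses the haversine great-circle distance, a placeholder radius of 3 km, and hypothetical POI kinds ("competitor", "transit", and so on); the patent does not specify a radius or a distance formula.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def trade_area_features(store, pois, radius_km=3.0):
    """Count POIs of each kind within radius_km of the store.

    store: (lat, lon); pois: iterable of (lat, lon, kind).
    The 3 km default radius is a placeholder assumption.
    """
    counts = {}
    for lat, lon, kind in pois:
        if haversine_km(store[0], store[1], lat, lon) <= radius_km:
            counts[kind] = counts.get(kind, 0) + 1
    return counts
```

In practice the radius would be chosen per business district type, and the per-kind counts would be stored alongside the city-level statistics.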

The second store information includes store membership data features and store project data features. Because members are a store's main consumers, their characteristics need to be analyzed; the membership data features include: number of members, member penetration rate, annual member transaction rate, annual member purchase amount, and average member transaction value. The characteristics of the store project itself are directly related to sales, and the subsequent feature analysis is mainly based on them; the store project data features include: city category, market level, city tier, business district, business district type, store nature, opening date, annual tax-inclusive sales, annual rent, floor distribution, interior (usable) area, and gross floor area.
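Membership features are aggregates over the member and order records held in the internal management system. The patent does not define the metrics precisely, so the definitions below are assumptions for illustration: penetration is taken as the members' share of store sales, the annual transaction rate as the share of members who bought at least once in the year, and the average transaction value as member sales divided by member orders. The function name `member_features` is hypothetical.

```python
def member_features(members, orders, store_total_sales):
    """Aggregate member-level features for one store over one year.

    members: set of member ids registered to the store.
    orders: list of (member_id, amount) for the year; non-member orders are ignored.
    store_total_sales: all sales of the store in the same year.
    All metric definitions here are illustrative assumptions.
    """
    member_orders = [(m, amt) for m, amt in orders if m in members]
    buyers = {m for m, _ in member_orders}
    member_sales = sum(amt for _, amt in member_orders)
    n = len(members)
    return {
        "member_count": n,
        "member_sales_share": member_sales / store_total_sales if store_total_sales else 0.0,
        "annual_transaction_rate": len(buyers) / n if n else 0.0,
        "annual_purchase_amount": member_sales,
        "avg_transaction_value": member_sales / len(member_orders) if member_orders else 0.0,
    }
```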

As shown in the data processing and analysis flowchart of Fig. 2, the normalization correction includes first applying a logarithmic skew correction to the continuous variable data that is approximately normally distributed and then applying mean normalization to the corrected data. Specifically, after the data above have been organized, the data types are diverse. Analysis of the continuous variables that strongly affect store sales, such as floor efficiency, sales, interior area, and rent, shows that they are close to normally distributed but clearly right-skewed (positively skewed), so normalizing them directly does not give the desired result. They are therefore first transformed with log(1+x) to reduce the positive skew, where x is the continuous variable value before the transformation. Analyzing the transformed result again shows that it is approximately normally distributed. On this basis, mean normalization (also known as z-score standardization) is applied: $x_{\mathrm{scale}} = (x_1 - \mathrm{mean}) / \mathrm{std}$, where mean and std are the mean and standard deviation of a set of log-transformed continuous variable values, $x_1$ is one element of that set, and $x_{\mathrm{scale}}$ is the normalized result. Mean normalization puts the continuous variables on a common scale and reduces their fluctuation, which helps the optimization reach the global minimum, makes the learner in the subsequent machine learning step more convincing and accurate, and reduces error.

Preferably, the continuous-value assignment optimization includes: performing a one-way analysis of variance (ANOVA) on the discrete variable data, selecting the discrete variables that have a larger effect on store floor efficiency, converting each selected discrete variable into the mean floor efficiency of the stores in each of its categories, and assigning ordered quantitative values to the categories according to those means. Specifically, if the one-way ANOVA of a discrete variable yields a large statistic (large between-category variance relative to within-category variance), the variable has a larger effect on store floor efficiency. The computer therefore runs a one-way ANOVA on each discrete variable, and this analysis shows that several discrete variables, such as store nature and city tier, strongly affect floor efficiency. Each category of such a variable can then be assigned a value of 1, 2, 3, 4, and so on, ranked by the mean floor efficiency of the stores in that category; these values quantitatively describe the variable's effect on floor efficiency, which is equivalent to converting the discrete variable into a continuous one. This handles discrete data differently from one-hot encoding and retains more information than one-hot encoding, significantly increasing the breadth of the important sales data used and the depth of its processing, and thereby ensuring the accuracy and representativeness of the subsequent machine learning.
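The screening and assignment steps can be sketched with SciPy's `f_oneway` for the one-way ANOVA and a plain dictionary for the rank encoding. The significance threshold `alpha=0.05` and the function names are assumptions; the patent only says that variables with a large one-way ANOVA result are kept.

```python
from collections import defaultdict

from scipy.stats import f_oneway


def anova_screen(categories, target, alpha=0.05):
    """One-way ANOVA of the target (floor efficiency) across levels of one discrete variable.

    Returns (F statistic, p-value, keep flag); alpha is an assumed threshold.
    """
    groups = defaultdict(list)
    for level, y in zip(categories, target):
        groups[level].append(y)
    f_stat, p_value = f_oneway(*groups.values())
    return f_stat, p_value, p_value < alpha


def rank_encode(categories, target):
    """Map each level to 1, 2, 3, ... ordered by its mean target value."""
    sums, counts = defaultdict(float), defaultdict(int)
    for level, y in zip(categories, target):
        sums[level] += y
        counts[level] += 1
    means = {level: sums[level] / counts[level] for level in sums}
    ordered = sorted(means, key=means.get)  # lowest mean floor efficiency first
    return {level: rank for rank, level in enumerate(ordered, start=1)}
```

Because the encoding uses the target, it should be computed on the training split only; otherwise test-set floor efficiency leaks into the features.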

Preferably, before the prediction model outputs a prediction result, the method also includes specifying the business format of the retail scenario and, according to that format, automatically associating and matching the corresponding designated prediction model. Because different retail scenarios involve different business formats, machine learning produces a separate prediction model for each retail scenario. Before a prediction model is used, the retail scenario of the store to be predicted must first be specified, and the computer automatically associates and matches the designated model according to the input scenario. Compared with the traditional approach, in which machine learning produces a single unified prediction model, this matching is more targeted and yields more accurate predictions; the sales scale output for the enterprise's stores is more accurate, so store operations can be arranged sensibly and floor efficiency improved.
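Matching a business format to its designated model amounts to a lookup keyed by format. A minimal sketch, assuming each trained model exposes a `predict` method (as scikit-learn estimators do); the class name and the format strings are hypothetical.

```python
class FormatModelRegistry:
    """Holds one trained model per retail business format and dispatches to it."""

    def __init__(self):
        self._models = {}

    def register(self, business_format, model):
        self._models[business_format] = model

    def predict(self, business_format, features):
        try:
            model = self._models[business_format]
        except KeyError:
            raise ValueError(f"no model trained for format {business_format!r}") from None
        return model.predict(features)
```

Failing loudly for an unknown format is a design choice: silently falling back to another format's model would reintroduce the "one unified model" behavior the text argues against.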

Machine learning is a multidisciplinary field involving probability theory, statistics, approximation theory, convex analysis, computational complexity theory, and other subjects. It studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance. Specifically, based on the existing feature data and preliminary research, when the prediction model is built automatically, and given that ensemble learning algorithms fall into two families, bagging and boosting, this embodiment preferably uses random forest regression from the bagging family. A random forest is a strong learner composed of multiple classification and regression trees acting as weak learners; it can automatically select the best features and score them. When building the store prediction model, 20 processed features that strongly affect the store are input, namely: 'store level', 'region', 'city tier', 'city category', 'market level', 'distance to core enterprise', 'commercial outlet planning type', 'store nature', 'opening date', 'whether it is a shopping mall store', 'annual rent', 'interior area_SUM', 'interior area_B2F', 'interior area_B1F', 'interior area_1F', 'interior area_2F', 'interior area_3F', 'interior area_4F', 'interior area_5F', 'interior area_6F', 'interior area_7F', 'interior area_8F', 'interior area_mezzanine', 'tax-exclusive sales floor efficiency (interior area)', and 'restated full-year tax-inclusive sales'. The computer then automatically selects the 16 highest-scoring features and uses 500 regression trees as weak learners; 80% of the sample set is used as the training set and the remaining 20% as the test set. The prediction model learned from the training set has a score of 0.84.
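The training procedure can be sketched with scikit-learn, which the patent does not name; `RandomForestRegressor` is used here as a stand-in for "random forest regression". Two points are assumptions: the "feature score" is taken to be the forest's impurity-based `feature_importances_`, and the reported score is computed on the held-out 20% (for regressors, scikit-learn's `score()` returns the coefficient of determination R²). `fit_store_model` is a hypothetical name, and the features must already be numeric (for example after the encoding and normalization steps).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def fit_store_model(X, y, feature_names, n_keep=16, n_trees=500, seed=0):
    """Score features with a random forest, keep the top n_keep, refit, and evaluate.

    Returns (model, selected feature names, R^2 on the held-out 20 %).
    """
    X = np.asarray(X, dtype=float)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    # 1) Score all features with a forest fitted on the training split only.
    scorer = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
    scorer.fit(X_train, y_train)
    top = np.argsort(scorer.feature_importances_)[::-1][:n_keep]
    # 2) Refit on the selected columns with n_trees regression trees.
    model = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
    model.fit(X_train[:, top], y_train)
    selected = [feature_names[i] for i in top]
    return model, selected, model.score(X_test[:, top], y_test)
```

With the settings described in the text this would be called as `fit_store_model(X, y, names, n_keep=16, n_trees=500)` on the 20 processed features.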

Preferably, the prediction model of this embodiment can be used to predict not only stores that are about to open but also stores that are already open. For a store about to open, the model shows clearly at which location a store would achieve the largest sales scale, the highest floor efficiency, and so on. For a store already open, the computer compares the store's actual sales scale with the predicted sales scale in real time; besides verifying the prediction, the actual data are fed back into the machine learning process to complete the review and optimization of the model. The prediction model thus undergoes continuous dynamic optimization and correction, which further improves its usability and accuracy and narrows its error range.
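The review-and-refit loop for open stores can be sketched as: predict, compare with actual sales, append the actuals to the training data, and refit. This assumes any estimator with `fit` and `predict` methods and uses mean absolute percentage error (MAPE) as the comparison metric; both the metric and the function name `review_and_refit` are illustrative choices, not taken from the patent.

```python
import numpy as np


def review_and_refit(model, X_train, y_train, X_open, y_actual):
    """Compare predictions for open stores with actual sales, then refit on all data.

    Returns (MAPE of the old model on the open stores, new X, new y).
    Assumes y_actual contains no zeros, since MAPE divides by it.
    """
    X_open = np.asarray(X_open, dtype=float)
    y_actual = np.asarray(y_actual, dtype=float)
    predicted = np.asarray(model.predict(X_open), dtype=float)
    mape = float(np.mean(np.abs(predicted - y_actual) / np.abs(y_actual)))
    # Feed the actual results back into the training data and retrain.
    X_new = np.vstack([np.asarray(X_train, dtype=float), X_open])
    y_new = np.concatenate([np.asarray(y_train, dtype=float), y_actual])
    model.fit(X_new, y_new)
    return mape, X_new, y_new
```

Tracking the returned MAPE across review rounds gives a simple way to check whether the dynamic correction is actually narrowing the model's error range.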

Embodiment 2:

Fig. 3 shows a schematic architecture diagram of the prediction model for store data provided by an embodiment of the present invention, which includes:

A data collection module 1, including a web crawler unit 11 and an enterprise data collection unit 12, wherein the web crawler unit 11 is used to obtain first store information about the surroundings of the store through web crawler technology, the enterprise data collection unit 12 is used to obtain second store information through the enterprise's internal management system, and the data collection module 1 then stores the first store information and the second store information in the database;

A data processing module 2, used to clean and structure the first store information and the second store information to obtain different types of sales data, divide the sales data into continuous variable data and discrete variable data, apply normalization correction to the continuous variable data, and apply continuous-value assignment optimization to the discrete variable data;

A machine learning module 3, used to score the feature data obtained after the normalization correction and the continuous-value optimization, automatically select the highest-scoring feature data for training-set learning to obtain a prediction model of sales scale, and review and optimize the prediction model using the actual sales scale data of the store to be predicted.
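As an illustration of how data collection module 1 might persist both sources in one database, the following sketch uses Python's built-in `sqlite3` with a long (store, source, feature, value) table. The schema, the class name, and the source labels "crawler" (unit 11) and "enterprise" (unit 12) are hypothetical, and only numeric feature values are handled; categorical fields would need a text column.

```python
import sqlite3


class DataCollectionModule:
    """Stores crawled (first) and internal (second) store information in one table."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS store_info "
                     "(store_id TEXT, source TEXT, feature TEXT, value REAL)")

    def save(self, store_id, source, features):
        # source: "crawler" for web crawler unit 11, "enterprise" for unit 12
        self.conn.executemany(
            "INSERT INTO store_info VALUES (?, ?, ?, ?)",
            [(store_id, source, k, float(v)) for k, v in features.items()])
        self.conn.commit()

    def load(self, store_id):
        """Return all stored features of one store as {feature: value}."""
        rows = self.conn.execute(
            "SELECT feature, value FROM store_info WHERE store_id = ?", (store_id,))
        return dict(rows.fetchall())
```

The data processing module 2 would then read merged feature dictionaries from `load` rather than querying each source separately.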

Specifically, the data collected by the web crawler unit 11 include city data features, business district data features, and geographic data features, and the data collected by the enterprise data collection unit 12 include membership data features and store project data features. Together, the web crawler unit 11 and the enterprise data collection unit 12 cover almost all of the key features relevant to the sales scale of the store to be predicted; the details of each collected feature are described in Embodiment 1 and are not repeated here.

Preferably, the data processing module 2 includes a continuous variable correction unit 21. This unit first applies a logarithmic skew correction to continuous variable data that is approximately normally distributed, and then applies mean normalization to the corrected data. Further, the data processing module 2 includes a discrete variable optimization unit 22, which works in four steps. It performs a one-way analysis of variance (ANOVA) on the discrete variable data. It selects the discrete variables that strongly influence the store's floor efficiency (sales per unit of floor area). It converts each selected variable into the mean floor efficiency of the corresponding stores. Finally, it ranks the different mean values with consecutive quantitative values. The web crawler unit 11 and the enterprise data collection unit 12 only collect and acquire data. The data written to the database must therefore be further optimized and corrected so that the data entering the prediction model is representative and accurate.
Specifically, the continuous variable correction unit 21 first takes the logarithm $x_1 = \log(1 + x)$ to reduce the positive skew, where $x$ is the continuous variable before the transformation and $x_1$ is the log-transformed value. The transformed result is analyzed again, and computer analysis shows that it is now approximately normally distributed. On this basis, mean normalization is applied: $x_{\text{scale}} = (x_1 - \text{mean}) / \text{std}$. The detailed normalization correction method is described in Embodiment 1 and is not repeated here. The discrete variable optimization unit 22 first computes a one-way analysis of variance over the groups of discrete variable data. The analysis shows that several discrete variables, such as store type and city tier, have a large influence on the store's floor efficiency. Each value of such a variable is therefore assigned a quantitative level of 1, 2, 3 or 4 according to the mean floor efficiency of the stores taking that value. These levels quantitatively describe the variable's influence on floor efficiency. In effect, values are assigned to the discrete variable data so that it becomes continuous variable data.
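The two corrections above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. First, `std` is taken to be the population standard deviation. Second, the F statistic from a one-way ANOVA is used to judge which discrete variables matter. Third, the category with the lowest mean floor efficiency receives level 1 (the source does not fix the direction). All function names are hypothetical.

```python
import math
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence


def log_mean_normalize(values: Sequence[float]) -> List[float]:
    """Continuous variables: x1 = log(1 + x), then x_scale = (x1 - mean) / std."""
    logged = [math.log1p(v) for v in values]  # reduces positive skew; needs x > -1
    mean = statistics.fmean(logged)
    std = statistics.pstdev(logged)  # population std (assumption)
    if std == 0:
        return [0.0] * len(logged)  # constant column: nothing to scale
    return [(x1 - mean) / std for x1 in logged]


def _group(categories: Sequence[str], target: Sequence[float]) -> Dict[str, List[float]]:
    """Collect the floor-efficiency values observed for each category."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for c, y in zip(categories, target):
        groups[c].append(y)
    return groups


def one_way_anova_f(categories: Sequence[str], target: Sequence[float]) -> float:
    """F statistic of a one-way ANOVA of floor efficiency across category values."""
    groups = _group(categories, target)
    n, k = len(target), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two categories and more samples than categories")
    grand = statistics.fmean(target)
    means = {c: statistics.fmean(g) for c, g in groups.items()}
    ss_between = sum(len(g) * (means[c] - grand) ** 2 for c, g in groups.items())
    ss_within = sum((y - means[c]) ** 2 for c, g in groups.items() for y in g)
    if ss_within == 0:
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_encode(categories: Sequence[str], target: Sequence[float]) -> List[int]:
    """Replace each category by the rank (1, 2, 3, ...) of its mean floor efficiency."""
    groups = _group(categories, target)
    means = {c: statistics.fmean(g) for c, g in groups.items()}
    ordered = sorted(means, key=means.get)  # lowest mean floor efficiency -> 1
    codes = {c: i + 1 for i, c in enumerate(ordered)}
    return [codes[c] for c in categories]
```

For six hypothetical stores with city tiers `["tier1", "tier2", "tier3", "tier1", "tier2", "tier3"]` and floor efficiencies `[30, 20, 10, 34, 22, 12]`, the ANOVA gives F of about 55.2, so the tier would be kept. `rank_encode` then maps the stores to `[3, 2, 1, 3, 2, 1]`.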

Preferably, when a computer uses the prediction model to predict a store's sales scale, it uses a prediction model pairing unit. Before the prediction model outputs a result, the pairing unit determines the business format of the retail scene. Based on that format, it automatically associates and matches the corresponding designated prediction model. Further, the machine learning module 3 also includes a model review and optimization unit 31. Different retail scenes produce different prediction models and correspond to different business formats. The user therefore inputs the store's retail scene directly, and the model pairing unit matches the corresponding designated prediction model, which then predicts the store's sales scale. The prediction result is shown by a prediction result display unit on the computer.
To further improve the accuracy of the prediction model, the model review and optimization unit 31 compares the actual sales scale of stores that have already opened with their predicted sales scale. This verifies the prediction results. At the same time, the actual data is fed back into the machine learning process, which completes the review-based optimization of the prediction model. The prediction model is thus continuously corrected and optimized, which improves its usability and accuracy and narrows its error range.
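The pairing and review steps above can be sketched in Python. Two assumptions apply. First, each business format maps to one callable model. Second, review quality is measured by mean absolute percentage error (MAPE), a metric the source does not name. `ModelPairingUnit`, `review` and the business-format strings are hypothetical names. Retraining itself is left out; `review` only returns the records that would be appended to the training data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]
Predictor = Callable[[Features], float]
Record = Tuple[str, Features, float]  # (business format, features, actual sales)


@dataclass
class ModelPairingUnit:
    """Maps each business format to its designated prediction model."""
    models: Dict[str, Predictor] = field(default_factory=dict)

    def register(self, business_format: str, model: Predictor) -> None:
        self.models[business_format] = model

    def predict(self, business_format: str, features: Features) -> float:
        if business_format not in self.models:
            raise KeyError(f"no prediction model for business format {business_format!r}")
        return self.models[business_format](features)


def review(pairing: ModelPairingUnit, opened: List[Record]) -> Tuple[float, List[Record]]:
    """Compare predicted and actual sales of opened stores.

    Returns the MAPE and the records to feed back into the next training run.
    """
    if not opened:
        raise ValueError("no opened stores to review")
    errors = []
    for fmt, features, actual in opened:
        predicted = pairing.predict(fmt, features)
        errors.append(abs(predicted - actual) / actual)  # assumes actual > 0
    return sum(errors) / len(errors), list(opened)
```

Keeping one model per business format makes the pairing step a plain lookup. An unknown format fails loudly instead of silently falling back to a model trained on a different retail scene.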

All of the above optional technical solutions can be combined in any way to form optional embodiments of the present invention, which are not described one by one here.

It should be noted that the division into functional modules above is only an example of how the store data prediction model of the above embodiment predicts a store's sales scale. In practice, the functions may be allocated to different functional modules as needed. That is, the internal structure of the store data prediction model may be divided into different functional modules to complete all or part of the functions described above. In addition, the store data prediction model of the above embodiment and the embodiment of the store data prediction method share the same concept. The method embodiment gives the specific implementation, which is not repeated here.

Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware, or by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk or an optical disk.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A method for building a store data prediction model, characterized by comprising the following steps:

acquiring first store information about the area around a store through web crawler technology, acquiring second store information through an enterprise internal management system, and storing the first store information and the second store information in a database;

performing cleaning and structured data processing on the first store information and the second store information to obtain different types of sales data, dividing the sales data into continuous variable data and discrete variable data, performing normalization correction on the continuous variable data, and performing assignment continuous optimization on the discrete variable data;

and scoring, by a machine learning method, the feature data after the normalization correction and the continuous optimization, automatically selecting the high-scoring feature data for training-set learning to obtain a prediction model of sales scale, and then reviewing and optimizing the prediction model using the actual sales scale data of the store to be predicted.

2. The method for building the store data prediction model according to claim 1, wherein the prediction model is used to output the predicted value of the sales scale of the store to be predicted according to the specific location and retail scene of the store to be predicted.

3. The method for building the store data prediction model according to claim 1, wherein the paths crawled by the web crawler technology include at least map data, online shopping data and official statistics.

4. The method for building the store data prediction model according to claim 1, wherein the first store information includes city data features, business district data features and geographic data features, and the second store information includes store member data features and store project data features.

5. The method for building the store data prediction model according to claim 1, wherein the normalization correction comprises: performing logarithmic skew correction on the approximately normally distributed continuous variable data, and then performing mean normalization on the corrected continuous variable data.

6. The method for building the store data prediction model according to claim 1, wherein the assignment continuous optimization comprises: performing one-way analysis of variance on the discrete variable data, selecting the discrete variable data that strongly influences store floor efficiency, converting the selected discrete variable data into the mean floor efficiency of the corresponding stores, and ranking the different mean floor efficiency values with consecutive quantitative values.

7. The method for building the store data prediction model according to claim 2, further comprising: before the prediction model outputs the prediction result, specifying the business format of the retail scene, and automatically associating and matching the corresponding designated prediction model according to that business format.

8. A store data prediction model, comprising:

a data acquisition module and a database, wherein the data acquisition module comprises a web crawler unit and an enterprise data acquisition unit; the web crawler unit is configured to acquire first store information about the area around a store through web crawler technology, the enterprise data acquisition unit is configured to acquire second store information through an enterprise internal management system, and the data acquisition module stores the first store information and the second store information in the database;

a data processing module, configured to perform cleaning and structured data processing on the first store information and the second store information to obtain different types of sales data, divide the sales data into continuous variable data and discrete variable data, perform normalization correction on the continuous variable data, and perform assignment continuous optimization on the discrete variable data;

and a machine learning module, configured to score the feature data after the normalization correction and the continuous optimization, automatically select the high-scoring feature data for training-set learning to obtain a prediction model of sales scale, and review and optimize the prediction model using the actual sales scale data of the store to be predicted.

9. The store data prediction model according to claim 8, wherein the data processing module comprises a continuous variable correction unit configured to perform logarithmic skew correction on the approximately normally distributed continuous variable data and then perform mean normalization on the corrected continuous variable data.

10. The store data prediction model according to claim 8, wherein the data processing module comprises a discrete variable optimization unit configured to perform one-way analysis of variance on the discrete variable data, select the discrete variable data that strongly influences store floor efficiency, convert the selected discrete variable data into the mean floor efficiency of the corresponding stores, and rank the different mean floor efficiency values with consecutive quantitative values.