CN117745340B - Cigarette market grid capacity rationality prediction method and system based on big data - Google Patents

Publication number
CN117745340B
Authority
CN
China
Prior art keywords
model
business
prediction
data
cigarette
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410188091.XA
Other languages
Chinese (zh)
Other versions
CN117745340A (en)
Inventor
王再东
胡佑安
姜兵仁
涂鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Xiaoxiang Big Data Technology Co ltd
Hunan Xiaoxiang Big Data Research Institute
Original Assignee
Hunan Xiaoxiang Big Data Technology Co ltd
Hunan Xiaoxiang Big Data Research Institute
Application filed by Hunan Xiaoxiang Big Data Technology Co ltd and Hunan Xiaoxiang Big Data Research Institute
Priority to CN202410188091.XA
Publication of CN117745340A
Application granted
Publication of CN117745340B

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a big data-based method and system for predicting the rationality of cigarette market grid capacity, and belongs to the technical field of big data analysis.

Description

Cigarette market grid capacity rationality prediction method and system based on big data
Technical Field
The invention relates to the technical field of big data analysis, in particular to a cigarette market grid capacity rationality prediction method and system based on big data.
Background
Accurate delivery of cigarette products is extremely important for commercial companies, because the sales volume that results from product delivery directly affects their economic benefits. Traditional prediction methods do not consider the complexity of market dynamics and consumer behavior, and the data that can be collected are limited, which leads to inaccurate cigarette delivery. Traditional machine learning methods do not fully consider the factors that influence cigarette market capacity, which leads to insufficient accuracy and poor stability of model prediction.
Disclosure of Invention
In view of the above, and to overcome the defects of the prior art, the invention provides a big data-based method and system for predicting the rationality of cigarette market grid capacity. To address the problem that traditional prediction methods do not consider the complexity of market dynamics and consumer behavior, and that limited collectable data leads to inaccurate cigarette delivery, the scheme introduces business circle data from outside the tobacco database, drives intelligent strategies based on the data of different market segments, generates customized marketing strategies, and achieves intelligent and accurate cigarette delivery. To address the problem that traditional machine learning methods do not fully consider the factors affecting cigarette market capacity, which leads to insufficient accuracy and poor stability of model prediction, the scheme uses an ensemble learning algorithm that combines the advantages of ARIMA, Holt-Winters and RF to improve the accuracy and stability of model prediction.
The technical scheme adopted by the invention is as follows: the invention provides a big data-based cigarette market grid capacity rationality prediction method, which comprises the following steps:
Step S1: business circle definition: according to the area interaction theory, the business circle is defined as the spatial range of a retailer's cigarette sales capacity and the geographic area over which cigarette consumers are distributed;
Step S2: business circle expansion: the business circle is expanded outward, taking the retailer's initial position as the initial value, to obtain the expanded business circle;
Step S3: data preprocessing: data of the expanded business circle are acquired, a business circle data set is generated, and the data set is divided into a training set and a test set;
Step S4: cigarette market grid capacity prediction: an ensemble learning algorithm is trained on the training set to predict cigarette market grid capacity, yielding integrated model A;
Step S5: integrated model A is evaluated on the test set to obtain integrated model B;
Step S6: cigarette market grid capacity rationality prediction: new grid data are input into integrated model B, which predicts the cigarette market grid capacity to give the prediction result.
Further, in step S1, the business circle is the spatial range of a retailer's sales capacity and the geographic area over which cigarette consumers are distributed. According to the area interaction theory, the probability that a consumer purchases cigarettes at a retailer is determined by the retailer's scale (area) and the distance between the consumer and the retailer, using the following formula:
$$P_{ij} = \frac{S_j / d_{ij}^{\lambda}}{\sum_{k=1}^{n} S_k / d_{ik}^{\lambda}}$$
where $P_{ij}$ is the probability that a customer located at $i$ travels to $j$ to purchase cigarettes, $n$ is the total number of retailers in the business circle, $S_j$ is the scale of retailer $j$, $d_{ij}$ is the distance between $i$ and $j$, and $\lambda$ indicates how much importance the customer places on time and distance when purchasing cigarettes.
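The purchase probability of step S1 can be illustrated with a minimal sketch. It assumes the ratio form in which each retailer's attraction is its scale divided by distance raised to the power $\lambda$; the function name `huff_probabilities` and the sample numbers are hypothetical and are not taken from the patent.

```python
def huff_probabilities(scales, distances, lam=2.0):
    """Purchase probabilities for one customer location over all retailers.

    scales[j]    -- scale S_j of retailer j (assumed positive)
    distances[j] -- distance d_ij from the customer to retailer j (assumed positive)
    lam          -- sensitivity of the customer to time and distance
    """
    if len(scales) != len(distances):
        raise ValueError("scales and distances must have the same length")
    # Attraction of each retailer: scale discounted by distance ** lam.
    attractions = [s / (d ** lam) for s, d in zip(scales, distances)]
    total = sum(attractions)
    # Normalising makes the probabilities sum to 1 across the business circle.
    return [a / total for a in attractions]


# Hypothetical example: three retailers seen from one customer location.
probs = huff_probabilities(scales=[100.0, 50.0, 50.0],
                           distances=[1.0, 1.0, 2.0],
                           lam=2.0)
```

A larger `lam` makes distant retailers fall off faster, which is one way to read "importance placed on time and distance".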
Further, in step S2, the business circle expansion specifically comprises the following steps:
Step S21: defining the initial value: calculate, for each retailer, the probability that surrounding customers purchase cigarettes there, and define the retailer's position as the initial value;
Step S22: calculating the geographic range of the business circle: define a distance representing the distance between each retailer's location and the center of the business circle, where n is initially the number of retailers contained in the grid, and take the value obtained for the retailers of the initial grid as the initial value;
Step S23: expanding the business circle range: centered on the initial grid, expand the business circle into a square and calculate the area of the expanded shopping area by the same calculation method used for the initial grid; while the stopping condition is not met, continue to expand the shopping area and recalculate, until the condition is met and the expanded business circle is obtained.
Further, in step S3, the data preprocessing specifically comprises the following steps:
Step S31: acquiring data: acquire the basic attributes, crowd characteristics and consumption capacity of the expanded business circle and integrate them into business circle data 1; acquire the current market situation, consumption index and consumption preference of the expanded business circle and integrate them into business circle data 2; acquire POI data related to cigarette industry sales and interpret the POI data using location and attribute characteristics as constraints, so as to extract the number of enterprises, shopping area, traffic type, walking distance, business type, and longitude and latitude, giving a POI data set;
Step S32: data conversion: business circle data 1 and business circle data 2 contain both numerical data and categorical data; the numerical data are transformed with the log1p function so that their distribution is closer to Gaussian, and the categorical data are label-encoded into numerical features, giving business circle data A and business circle data B;
Step S33: constructing the data set: PiFlow is used to fuse business circle data A, business circle data B and the POI data set into a cigarette market data set;
Step S34: dividing the data set into a training set and a test set;
Step S35: storing the data set: the business circle data set is stored in a distributed manner in a Hive database.
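The conversions in steps S32 and S34 can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the field names `footfall` and `business_type`, the 80/20 split ratio, and the order-preserving split are hypothetical choices, and the PiFlow fusion and Hive storage steps are not shown.

```python
import math


def log1p_transform(values):
    """Apply log(1 + x) to compress right-skewed numerical features."""
    return [math.log1p(v) for v in values]


def label_encode(categories):
    """Map each distinct category to an integer code (sorted for determinism)."""
    mapping = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping


def split_ordered(rows, train_fraction=0.8):
    """Split rows into training and test sets, keeping their original order."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Hypothetical business circle features.
footfall = [0, 9, 99, 999, 9999]
business_type = ["mall", "street", "mall", "station", "street"]

x_num = log1p_transform(footfall)
x_cat, mapping = label_encode(business_type)
rows = list(zip(x_num, x_cat))
train, test = split_ordered(rows, train_fraction=0.8)
```

Keeping the original order in the split is a deliberate choice for time-ordered sales data; a shuffled split would be an equally plausible reading of step S34.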
Further, in step S4, the cigarette market grid capacity prediction specifically comprises the following steps:
Step S41: ARIMA model training: the parameters of the ARIMA model are $(p, d, q)$, and ARIMA$(p, d, q)$ is fitted to the training set, where $p$ is the number of autoregressive terms, $d$ is the differencing order, and $q$ is the number of moving average terms. ARIMA model training comprises the following steps:
Step S411: determine the number of autoregressive terms and the number of moving average terms by inspecting the autocorrelation function (ACF) plot and the partial autocorrelation function (PACF) plot;
Step S412: determine the differencing order. The first-order difference $\nabla y_t$ of $y_t$ is calculated as:
$$\nabla y_t = y_t - y_{t-1}$$
where $y_t$ is the value of the time series at time $t$ and $y_{t-1}$ is its value at time $t-1$;
The second-order difference $\nabla^2 y_t$ of $y_t$ is calculated as:
$$\nabla^2 y_t = \nabla y_t - \nabla y_{t-1} = y_t - 2y_{t-1} + y_{t-2}$$
where $y_{t-2}$ is the value of the time series at time $t-2$;
The differenced series is then fitted with a model whose terms are the parameters $\varphi$ of the autoregressive part, the moving average terms $\theta$, and the estimation error $\varepsilon$;
Step S413: model checking: select a suitable combination of the number of autoregressive terms, the number of moving average terms and the differencing order, then test the ARIMA model for significance;
Step S414: use the AIC to evaluate prediction accuracy, with the following formula:
$$\mathrm{AIC} = n \ln \hat{\sigma}^2 + 2k$$
where $\hat{\sigma}^2$ is the estimated error variance, $n$ is the sample size, and $k$ is the number of parameters;
According to the AIC, the optimal ARIMA model for predicting the studied cigarette market capacity is selected, and the fit of the ARIMA model is verified using the white noise hypothesis;
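The differencing of step S412 and the AIC comparison of step S414 can be sketched as follows. This is a minimal illustration, assuming the least-squares form of the AIC ($n \ln \hat{\sigma}^2 + 2k$); the helper names `difference` and `aic` and the sample series are hypothetical, and no ARIMA fitting is performed here.

```python
import math


def difference(series, order=1):
    """Return the order-th difference of a series (each pass shortens it by one)."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def aic(residuals, n_params):
    """AIC = n * ln(sigma^2) + 2k, with sigma^2 the mean squared residual."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    return n * math.log(sigma2) + 2 * n_params


# Hypothetical sales series with a quadratic trend.
sales = [10, 13, 18, 25, 34, 45]
first = difference(sales, order=1)   # still trending
second = difference(sales, order=2)  # constant, so d = 2 removes the trend

# Two hypothetical candidate models: the lower AIC is preferred.
aic_small = aic([1.0, -1.0, 1.0, -1.0], n_params=2)
aic_large = aic([0.5, -0.5, 0.5, -0.5], n_params=3)
```

Here the candidate with one extra parameter still wins because its residual variance is four times smaller, which is the trade-off the AIC is meant to capture.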
step S42: holt-windows model training, calculating model equations, the following formulas are used:
In the method, in the process of the invention, Representing the time series at time points/>Actual observations of/(v)Representing intercept,/>The slope is indicated as such,Representing the time series at time points/>Seasonal component of/>Is an irregular component;
Three smoothing equations are calculated using the following formulas:
In the method, in the process of the invention, Is a smooth constant,/>Is the time sequence at time point/>Level of/>Is the time sequence at time point/>Trend of/(I)Is the time sequence at time point/>Seasonal component of/>Representing the time series at time points/>Inputting a training set into a Holt-windows model, solving parameters of three smooth equations by using a maximum likelihood estimation method, and evaluating the prediction accuracy of the Holt-windows model by using a mean square error MSE and a mean absolute error MAE;
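The three smoothing equations can be illustrated with a small sketch. It assumes the additive variant of Holt-Winters (the text does not make clear whether the patent uses the additive or the multiplicative form), a simple initialisation from the first two seasons, and fixed smoothing constants rather than estimated ones; all function names are hypothetical.

```python
def holt_winters_additive(y, period, alpha, beta, gamma):
    """One-step-ahead forecasts for y[period:], additive Holt-Winters.

    Requires len(y) >= 2 * period for the initialisation used here.
    """
    # Initial level, trend and seasonal offsets from the first two seasons.
    level = sum(y[:period]) / period
    trend = (sum(y[period:2 * period]) - sum(y[:period])) / period ** 2
    season = [y[i] - level for i in range(period)]

    forecasts = []
    for t in range(period, len(y)):
        s = season[t % period]
        forecasts.append(level + trend + s)  # forecast made before seeing y[t]
        new_level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season[t % period] = gamma * (y[t] - new_level) + (1 - gamma) * s
        level = new_level
    return forecasts


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical, perfectly seasonal series with period 3 and no trend.
series = [1.0, 2.0, 3.0] * 3
fc = holt_winters_additive(series, period=3, alpha=0.5, beta=0.5, gamma=0.5)
err_mse = mse(series[3:], fc)
err_mae = mae(series[3:], fc)
```

In practice the smoothing constants would be chosen by an optimiser; minimising the squared error coincides with maximum likelihood only under an assumed Gaussian error model.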
Step S43: RF model training, comprising the following steps:
Step S431: generating the RF: first, N samples are drawn at random with replacement from the training set and used as the samples at the root node to train one decision tree; second, given that each sample has M attributes, whenever a node of the decision tree needs to be split, m attributes are selected at random from the M attributes, where in general m << M; from these m attributes, one attribute is chosen by information gain as the splitting attribute of the node, and nodes are split until they can no longer be split, with no pruning during the whole tree-growing process; these steps are repeated to construct multiple decision trees, which together form the RF model;
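The node-splitting rule of step S431 can be sketched as follows: draw a bootstrap sample, pick m of the M attributes at random, and keep the one with the highest information gain. The sketch assumes categorical attributes and a discretised target (for example capacity levels), since information gain is defined on class labels; the data and function names are hypothetical, and growing full trees is not shown.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attribute index attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def bootstrap(rows, labels, rng):
    """Draw len(rows) samples with replacement."""
    idx = [rng.randrange(len(rows)) for _ in rows]
    return [rows[i] for i in idx], [labels[i] for i in idx]


def choose_split(rows, labels, m, rng):
    """Pick m of the M attributes at random, return the one with the best gain."""
    candidates = rng.sample(range(len(rows[0])), m)
    return max(candidates, key=lambda a: information_gain(rows, labels, a))


# Hypothetical data: attribute 0 determines the label, attributes 1 and 2 do not.
rows = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
labels = ["low", "low", "high", "high"]
rng = random.Random(0)
boot_rows, boot_labels = bootstrap(rows, labels, rng)
best = choose_split(rows, labels, m=3, rng=rng)
```

With m equal to M, as in this toy call, the random subset contains every attribute; in a real forest m would be much smaller than M so that the trees decorrelate.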
Step S432: the MGF (mean generating function) prediction model, which specifically comprises the following steps:
Step S4321: assume the time series $x(t)$, $t = 1, 2, \dots, N$, has mean $\bar{x} = \frac{1}{N}\sum_{t=1}^{N} x(t)$; the MGF of the time series $x(t)$ is expressed as:
$$\bar{x}_l(i) = \frac{1}{n_l} \sum_{j=0}^{n_l - 1} x(i + jl), \quad i = 1, \dots, l, \quad 1 \le l \le m$$
where $n_l = \mathrm{INT}(N/l)$ and $m = \mathrm{INT}(N/2)$ or $\mathrm{INT}(N/3)$. Using this formula, $m$ mean generating functions of the time series can be obtained, and each is extended periodically to $N$;
Step S4322: calculate the first-order difference sequence $\Delta x(t)$, using the following formula:
$$\Delta x(t) = x(t+1) - x(t)$$
where $x(t+1)$ is the value of the time series at time $t+1$ and $x(t)$ is its value at time $t$;
Step S4323: calculate the second-order difference sequence $\Delta^2 x(t)$, using the following formula:
$$\Delta^2 x(t) = \Delta x(t+1) - \Delta x(t)$$
where $\Delta x(t+1)$ is the first-order difference sequence at time $t+1$;
The MGF of the original sequence $x(t)$ is denoted $\bar{x}_l^{(0)}$, and the MGFs of the first-order difference sequence $\Delta x(t)$ and the second-order difference sequence $\Delta^2 x(t)$ are denoted $\bar{x}_l^{(1)}$ and $\bar{x}_l^{(2)}$ respectively; their extension sequences $f_l(t)$ can be obtained from $f_l(t) = \bar{x}_l\left(t - l \cdot \mathrm{INT}\left(\frac{t-1}{l}\right)\right)$;
Step S4324: a cumulative extension sequence is established from the extension sequences of the MGFs of the original sequence and of the first-order difference sequence;
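The mean generating function and its periodic extension can be sketched as follows. This assumes the usual definition, in which the MGF of period length l averages every l-th value of the series and the extension simply repeats those l averages; the function names are hypothetical, and the cumulative extension sequence of the last sub-step is not reproduced.

```python
def mean_generating_function(x, l):
    """MGF of period length l: average of x[i], x[i + l], x[i + 2l], ...

    Indices are 0-based here; n_l = INT(N / l) terms enter each average.
    """
    n_l = len(x) // l
    return [sum(x[i + j * l] for j in range(n_l)) / n_l for i in range(l)]


def periodic_extension(mgf, length):
    """Repeat the l MGF values periodically up to the requested length."""
    l = len(mgf)
    return [mgf[t % l] for t in range(length)]


def first_difference(x):
    """First-order difference sequence: x(t + 1) - x(t)."""
    return [b - a for a, b in zip(x, x[1:])]


# Hypothetical series of N = 6 observations, so m = INT(N / 2) = 3.
series = [1, 2, 3, 4, 5, 6]
m = len(series) // 2
mgfs = {l: mean_generating_function(series, l) for l in range(1, m + 1)}
extended = periodic_extension(mgfs[2], length=8)
diff1 = first_difference(series)
```

Extending past the original length, as in the last call, is how the extension sequences can serve as candidate predictors for future time points.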
Step S433: RF-MGF model prediction: one set of prediction data is obtained using the RF model, and another set is obtained using the MGF prediction model;
Step S434: the weights for mixing the two methods are calculated, each weight being associated with one of the two predictions and computed from the total business circle sales given by the sales history data of each retailer in the training set;
Step S435: the actual value, the first predicted value and the error value are weighted, using the benchmark predicted value and the historical average;
Step S436: with the time variable and two further variables as the input variables and the error value as the output variable, fitting analysis is performed by the response surface method to obtain the final predicted value;
Step S44: model fusion: weights are assigned to the ARIMA model, the Holt-Winters model and the RF model by the weighted average method to obtain integrated model A.
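The weighted-average fusion of step S44 can be sketched as below. The text does not state how the weights are chosen, so the inverse-MSE weighting shown here is purely an assumption for illustration; the model names, forecasts and error values are hypothetical.

```python
def fuse_forecasts(forecasts, weights):
    """Weighted average of per-model forecasts over a common horizon."""
    names = list(forecasts)
    total = sum(weights[name] for name in names)
    horizon = len(forecasts[names[0]])
    return [
        sum(weights[name] * forecasts[name][t] for name in names) / total
        for t in range(horizon)
    ]


def inverse_mse_weights(mse_by_model):
    """Assumed weighting rule: weight each model by 1 / MSE, normalised to 1."""
    inverse = {name: 1.0 / value for name, value in mse_by_model.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


# Hypothetical two-step forecasts from the three base models.
forecasts = {
    "arima": [100.0, 110.0],
    "holt_winters": [90.0, 120.0],
    "rf": [110.0, 100.0],
}
fused = fuse_forecasts(
    forecasts, {"arima": 0.5, "holt_winters": 0.25, "rf": 0.25}
)
weights = inverse_mse_weights({"arima": 1.0, "holt_winters": 1.0, "rf": 2.0})
```

Dividing by the weight total inside `fuse_forecasts` means the weights need not be normalised beforehand.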
Further, in step S5, the test set is input into integrated model A, and accuracy is adopted as the evaluation index, which compares the predicted sales with the actual sales;
The hyperparameters of integrated model A, including the learning rate and the batch size, are set; training is stopped when the test-set error of integrated model A has not decreased for a number of consecutive iterations, and the hyperparameters are adjusted according to the performance of integrated model A on the test set to obtain integrated model B.
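The stopping rule of step S5 (stop once the error has not decreased for several consecutive iterations) can be sketched as a patience counter. The patience value of 3 and the error sequence are hypothetical; the text only says "a number of consecutive iterations".

```python
def early_stopping(errors, patience=3):
    """Return (stop_iteration, best_iteration) for a sequence of errors.

    Training stops at the first iteration where the error has failed to
    improve on the best value seen so far for `patience` iterations in a row.
    If that never happens, the last iteration is returned as the stop point.
    """
    best_error = float("inf")
    best_iteration = -1
    stale = 0
    for i, err in enumerate(errors):
        if err < best_error:
            best_error, best_iteration, stale = err, i, 0
        else:
            stale += 1
            if stale >= patience:
                return i, best_iteration
    return len(errors) - 1, best_iteration


# Hypothetical error curve: improvement stalls after iteration 2.
errors = [0.9, 0.7, 0.6, 0.61, 0.62, 0.60, 0.65]
stop_at, best_at = early_stopping(errors, patience=3)
never_stops = early_stopping(errors, patience=10)
```

The model state from `best_at` would normally be the one kept, since the iterations between it and `stop_at` brought no improvement.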
Further, in step S6, for the cigarette market grid capacity rationality prediction, data of a new grid area are input into integrated model B and the grid capacity is predicted to obtain the prediction result.
The invention also provides a big data-based cigarette market grid capacity rationality prediction system, which comprises a business circle definition module, a business circle expansion module, a data preprocessing module, a cigarette market grid capacity prediction module, an evaluation module, and a cigarette market grid capacity rationality prediction module;
The business circle definition module defines the concept of the business circle, calculates the probability that a customer makes a purchase at a store, and sends this probability to the business circle expansion module;
The business circle expansion module receives the purchase probability sent by the business circle definition module, expands the range taking the retailer's initial position as the initial value to obtain the expanded business circle, and sends the expanded business circle to the data preprocessing module;
The data preprocessing module receives the expanded business circle sent by the business circle expansion module, collects business circle data, constructs a business circle data set, divides it into a training set and a test set, sends the training set to the cigarette market grid capacity prediction module, and sends the test set to the evaluation module;
The cigarette market grid capacity prediction module receives the training set sent by the data preprocessing module, trains a model with an ensemble learning algorithm combining ARIMA, Holt-Winters and RF to obtain integrated model A, and sends integrated model A to the evaluation module;
The evaluation module receives integrated model A sent by the cigarette market grid capacity prediction module and the test set sent by the data preprocessing module, evaluates integrated model A on the test set to obtain integrated model B, and sends integrated model B to the cigarette market grid capacity rationality prediction module;
The cigarette market grid capacity rationality prediction module receives integrated model B sent by the evaluation module, takes the data of a new grid area as input, and predicts the capacity of the new grid area to obtain a predicted value.
By adopting the above scheme, the invention achieves the following beneficial effects:
(1) To address the problem that traditional prediction methods do not consider the complexity of market dynamics and consumer behavior, and that limited collectable data leads to inaccurate cigarette delivery, the scheme introduces business circle data from outside the tobacco database, drives intelligent strategies based on the data of different market segments, generates customized marketing strategies, and achieves intelligent and accurate cigarette delivery.
(2) To address the problem that traditional machine learning methods do not fully consider the factors affecting cigarette market capacity, which leads to insufficient accuracy and poor stability of model prediction, the scheme uses an ensemble learning algorithm that combines the advantages of ARIMA, Holt-Winters and RF to improve the accuracy and stability of model prediction.
Drawings
FIG. 1 is a flow diagram of a big data based cigarette market grid capacity rationality prediction method provided by the invention;
FIG. 2 is a schematic diagram of a big data based cigarette market grid capacity rationality prediction system provided by the invention;
FIG. 3 is a flow chart of step S2;
FIG. 4 is a flow chart of step S3;
FIG. 5 is a flow chart of step S4.
The accompanying drawings provide a further understanding of the invention and constitute a part of this specification; together with the embodiments of the invention, they serve to explain the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention; all other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the description of the present invention, it should be understood that the terms "upper," "lower," "front," "rear," "left," "right," "top," "bottom," "inner," "outer," and the like indicate orientation or positional relationships based on those shown in the drawings, merely to facilitate description of the invention and simplify the description, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus should not be construed as limiting the invention.
In a first embodiment, referring to FIG. 1, the big data-based cigarette market grid capacity rationality prediction method provided by the invention comprises the following steps:
Step S1: business circle definition: according to the area interaction theory, the business circle is defined as the spatial range of a retailer's cigarette sales capacity and the geographic area over which cigarette consumers are distributed;
Step S2: business circle expansion: the business circle is expanded outward, taking the retailer's initial position as the initial value, to obtain the expanded business circle;
Step S3: data preprocessing: data of the expanded business circle are acquired, a business circle data set is generated, and the data set is divided into a training set and a test set;
Step S4: cigarette market grid capacity prediction: an ensemble learning algorithm is trained on the training set to predict cigarette market grid capacity, yielding integrated model A;
Step S5: integrated model A is evaluated on the test set to obtain integrated model B;
Step S6: cigarette market grid capacity rationality prediction: new grid data are input into integrated model B, which predicts the cigarette market grid capacity to give the prediction result.
In a second embodiment, referring to FIG. 1 and FIG. 3, and building on the above embodiment, in step S2 the business circle expansion specifically comprises the following steps:
Step S21: defining the initial value: calculate, for each retailer, the probability that surrounding customers purchase cigarettes there, and define the retailer's position as the initial value;
Step S22: calculating the geographic range of the business circle: define a distance representing the distance between each retailer's location and the center of the business circle, where n is initially the number of retailers contained in the grid, and take the value obtained for the retailers of the initial grid as the initial value;
Step S23: expanding the business circle range: centered on the initial grid, expand the business circle into a square and calculate the area of the expanded shopping area by the same calculation method used for the initial grid; while the stopping condition is not met, continue to expand the shopping area and recalculate, until the condition is met and the expanded business circle is obtained.
In a third embodiment, referring to FIG. 1 and FIG. 4, and building on the above embodiment, in step S3 the data preprocessing specifically comprises the following steps:
Step S31: acquiring data: acquire the basic attributes, crowd characteristics and consumption capacity of the expanded business circle and integrate them into business circle data 1; acquire the current market situation, consumption index and consumption preference of the expanded business circle and integrate them into business circle data 2; acquire POI data related to cigarette industry sales and interpret the POI data using location and attribute characteristics as constraints, so as to extract the number of enterprises, shopping area, traffic type, walking distance, business type, and longitude and latitude, giving a POI data set;
Step S32: data conversion: business circle data 1 and business circle data 2 contain both numerical data and categorical data; the numerical data are transformed with the log1p function so that their distribution is closer to Gaussian, and the categorical data are label-encoded into numerical features, giving business circle data A and business circle data B;
Step S33: constructing the data set: PiFlow is used to fuse business circle data A, business circle data B and the POI data set into a cigarette market data set;
Step S34: dividing the data set into a training set and a test set;
Step S35: storing the data set: the business circle data set is stored in a distributed manner in a Hive database.
By performing the above operations, the scheme solves the problem that traditional prediction methods do not consider the complexity of market dynamics and consumer behavior and that limited collectable data leads to inaccurate cigarette delivery.
In a fourth embodiment, referring to FIG. 1 and FIG. 5, and building on the above embodiment, in step S4 the cigarette market grid capacity prediction specifically comprises the following steps:
Step S41: ARIMA model training: the parameters of the ARIMA model are $(p, d, q)$, and ARIMA$(p, d, q)$ is fitted to the training set, where $p$ is the number of autoregressive terms, $d$ is the differencing order, and $q$ is the number of moving average terms. ARIMA model training comprises the following steps:
Step S411: determine the number of autoregressive terms and the number of moving average terms by inspecting the autocorrelation function (ACF) plot and the partial autocorrelation function (PACF) plot;
Step S412: determine the differencing order. The first-order difference $\nabla y_t$ of $y_t$ is calculated as:
$$\nabla y_t = y_t - y_{t-1}$$
where $y_t$ is the value of the time series at time $t$ and $y_{t-1}$ is its value at time $t-1$;
The second-order difference $\nabla^2 y_t$ of $y_t$ is calculated as:
$$\nabla^2 y_t = \nabla y_t - \nabla y_{t-1} = y_t - 2y_{t-1} + y_{t-2}$$
where $y_{t-2}$ is the value of the time series at time $t-2$;
The differenced series is then fitted with a model whose terms are the parameters $\varphi$ of the autoregressive part, the moving average terms $\theta$, and the estimation error $\varepsilon$;
Step S413: model checking: select a suitable combination of the number of autoregressive terms, the number of moving average terms and the differencing order, then test the ARIMA model for significance;
Step S414: use the AIC to evaluate prediction accuracy, with the following formula:
$$\mathrm{AIC} = n \ln \hat{\sigma}^2 + 2k$$
where $\hat{\sigma}^2$ is the estimated error variance, $n$ is the sample size, and $k$ is the number of parameters;
According to the AIC, the optimal ARIMA model for predicting the studied cigarette market capacity is selected, and the fit of the ARIMA model is verified using the white noise hypothesis;
step S42: holt-windows model training, calculating model equations, the following formulas are used:
In the method, in the process of the invention, Representing the time series at time points/>Actual observations of/(v)Representing intercept,/>The slope is indicated as such,Representing the time series at time points/>Seasonal component of/>Is an irregular component;
Three smoothing equations are calculated using the following formulas:
In the method, in the process of the invention, Is a smooth constant,/>Is the time sequence at time point/>Level of/>Is the time sequence at time point/>Trend of/(I)Is the time sequence at time point/>Seasonal component of/>Representing the time series at time points/>Inputting a training set into a Holt-windows model, solving parameters of three smooth equations by using a maximum likelihood estimation method, and evaluating the prediction accuracy of the Holt-windows model by using a mean square error MSE and a mean absolute error MAE;
Step S43: RF model training, comprising the following steps:
Step S431: generating the RF: first, N samples are drawn at random with replacement from the training set and used as the samples at the root node to train one decision tree; second, given that each sample has M attributes, whenever a node of the decision tree needs to be split, m attributes are selected at random from the M attributes, where in general m << M; from these m attributes, one attribute is chosen by information gain as the splitting attribute of the node, and nodes are split until they can no longer be split, with no pruning during the whole tree-growing process; these steps are repeated to construct multiple decision trees, which together form the RF model;
Step S432: the MGF (mean generating function) prediction model, which specifically comprises the following steps:
Step S4321: assume the time series $x(t)$, $t = 1, 2, \dots, N$, has mean $\bar{x} = \frac{1}{N}\sum_{t=1}^{N} x(t)$; the MGF of the time series $x(t)$ is expressed as:
$$\bar{x}_l(i) = \frac{1}{n_l} \sum_{j=0}^{n_l - 1} x(i + jl), \quad i = 1, \dots, l, \quad 1 \le l \le m$$
where $n_l = \mathrm{INT}(N/l)$ and $m = \mathrm{INT}(N/2)$ or $\mathrm{INT}(N/3)$. Using this formula, $m$ mean generating functions of the time series can be obtained, and each is extended periodically to $N$;
Step S4322: calculate the first-order difference sequence $\Delta x(t)$, using the following formula:
$$\Delta x(t) = x(t+1) - x(t)$$
where $x(t+1)$ is the value of the time series at time $t+1$ and $x(t)$ is its value at time $t$;
Step S4323: calculate the second-order difference sequence $\Delta^2 x(t)$, using the following formula:
$$\Delta^2 x(t) = \Delta x(t+1) - \Delta x(t)$$
where $\Delta x(t+1)$ is the first-order difference sequence at time $t+1$;
The MGF of the original sequence $x(t)$ is denoted $\bar{x}_l^{(0)}$, and the MGFs of the first-order difference sequence $\Delta x(t)$ and the second-order difference sequence $\Delta^2 x(t)$ are denoted $\bar{x}_l^{(1)}$ and $\bar{x}_l^{(2)}$ respectively; their extension sequences $f_l(t)$ can be obtained from $f_l(t) = \bar{x}_l\left(t - l \cdot \mathrm{INT}\left(\frac{t-1}{l}\right)\right)$;
Step S4324: a cumulative extension sequence is established from the extension sequences of the MGFs of the original sequence and of the first-order difference sequence;
Step S433: RF-MGF model prediction: one set of prediction data is obtained using the RF model, and another set is obtained using the MGF prediction model;
Step S434: the weights for mixing the two methods are calculated, each weight being associated with one of the two predictions and computed from the total business circle sales given by the sales history data of each retailer in the training set;
Step S435: the actual value, the first predicted value and the error value are weighted, using the benchmark predicted value and the historical average;
Step S436: with the time variable and two further variables as the input variables and the error value as the output variable, fitting analysis is performed by the response surface method to obtain the final predicted value;
Step S44: model fusion: weights are assigned to the ARIMA model, the Holt-Winters model and the RF model by the weighted average method to obtain integrated model A.
By performing the above operations, the scheme solves the problem that traditional machine learning methods do not fully consider the factors affecting cigarette market capacity, which leads to insufficient accuracy and poor stability of model prediction; an ensemble learning algorithm combining the advantages of ARIMA, Holt-Winters and RF improves the accuracy and stability of model prediction.
Embodiment 5: referring to Fig. 2, this embodiment is based on the above embodiments. The big data based cigarette market grid capacity rationality prediction system provided by the invention comprises a business circle definition module, a business circle expansion mode module, a data preprocessing module, a cigarette market grid capacity prediction method module, an evaluation module, and a cigarette market grid capacity rationality prediction module;
The business circle definition module defines the concept of the business circle, calculates the probability that a customer makes a purchase at a store, and sends this probability to the business circle expansion mode module;
The business circle expansion mode module receives the purchase probability sent by the business circle definition module, expands the range with the retailer's initial position as the initial value to obtain an expanded business circle, and sends the expanded business circle to the data preprocessing module;
The data preprocessing module receives the expanded business circle sent by the business circle expansion mode module, collects business circle data, constructs a business circle data set, divides the business circle data set into a training set and a test set, sends the training set to the cigarette market grid capacity prediction method module, and sends the test set to the evaluation module;
The cigarette market grid capacity prediction method module receives the training set sent by the data preprocessing module, trains a model using an ensemble learning algorithm combining ARIMA, Holt-Winters and RF to obtain integrated model A, and sends integrated model A to the evaluation module;
The evaluation module receives integrated model A sent by the cigarette market grid capacity prediction method module and the test set sent by the data preprocessing module, evaluates integrated model A using the test set to obtain integrated model B, and sends integrated model B to the cigarette market grid capacity rationality prediction module;
The cigarette market grid capacity rationality prediction module receives integrated model B sent by the evaluation module, takes the data of a new grid area as input, and predicts the capacity of the new grid area to obtain a predicted value.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
The invention and its embodiments have been described above. The description is not limiting, and what is shown in the drawings is only one embodiment of the invention, to which the actual structure is not limited. In summary, if a person of ordinary skill in the art, informed by this disclosure and without departing from the gist of the invention, devises without creative effort a structure and embodiment similar to this technical solution, it shall fall within the scope of protection of the invention.

Claims (4)

1. A big data based cigarette market grid capacity rationality prediction method, characterized in that the method comprises the following steps:
Step S1: business circle definition, namely defining the business circle, according to the area interaction theory, as the spatial range of a retailer's cigarette sales capability and the geographic area over which cigarette consumers are distributed;
Step S2: business circle expansion, namely expanding the range with the retailer's initial position as the initial value to obtain an expanded business circle;
Step S3: data preprocessing, namely collecting data of the expanded business circle, generating a business circle data set, and dividing the business circle data set into a training set and a test set;
Step S4: cigarette market grid capacity prediction, namely training on the training set with an ensemble learning algorithm to predict the cigarette market grid capacity and obtain integrated model A;
Step S5: evaluating integrated model A using the test set to obtain integrated model B;
Step S6: cigarette market grid capacity rationality prediction, namely inputting new grid data into integrated model B and predicting the cigarette market grid capacity to obtain a prediction result;
in step S4, the cigarette market grid capacity prediction includes the following steps:
Step S41: ARIMA model training. The parameters of the ARIMA model are $(p, d, q)$, and ARIMA$(p, d, q)$ is fitted to the training set, where $p$ is the number of autoregressive terms, $d$ is the differencing order, and $q$ is the number of moving-average terms. ARIMA model training comprises the following steps:
step S411: determining the number of autoregressive terms and the number of moving average terms by observing an autocorrelation graph ACF and a partial autocorrelation graph PACF;
Step S412: determining the differencing order. The first-order difference $\nabla y_t$ of $y_t$ is calculated using the following formula:
$$\nabla y_t = y_t - y_{t-1}$$
where $y_t$ denotes the value of the time series at time $t$ and $y_{t-1}$ denotes the value of the time series at time $t-1$;
The second-order difference $\nabla^2 y_t$ of $y_t$ is calculated using the following formula:
$$\nabla^2 y_t = \nabla y_t - \nabla y_{t-1} = y_t - 2y_{t-1} + y_{t-2}$$
where $y_{t-2}$ denotes the value of the time series at time $t-2$;
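the series differenced to order $d$, written $w_t = \nabla^d y_t$, is modelled using the following formula:
$$w_t = \sum_{i=1}^{p} \varphi_i w_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$
where $\varphi_i$ are the parameters of the autoregressive part, $\theta_j$ are the parameters of the moving-average part, and $\varepsilon_t$ is the estimation error;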
Step S413: model checking, namely selecting a proper autoregressive term number, a sliding average term number and a differential order number combination, and then performing significance checking on an ARIMA model;
Step S414: the Akaike information criterion (AIC) is used to evaluate the prediction accuracy, using the following formula:
$$\mathrm{AIC} = n \ln \hat{\sigma}^2 + 2k$$
where $\hat{\sigma}^2$ is the estimated error variance, $n$ is the sample size, and $k$ is the number of parameters;
The ARIMA model with the lowest AIC is selected as the optimal model for predicting the cigarette market capacity under study, and its goodness of fit is verified by testing whether the residuals are white noise;
step S42: holt-windows model training, calculating model equations, the following formulas are used:
In the method, in the process of the invention, Representing the time series at time points/>Actual observations of/(v)Representing intercept,/>Representing slope,/>Representing the time series at time points/>Seasonal component of/>Is an irregular component;
Three smoothing equations are calculated using the following formulas:
In the method, in the process of the invention, Is a smooth constant,/>Is the time sequence at time point/>Level of/>Is a time sequence at a time pointTrend of/(I)Is the time sequence at time point/>Seasonal component of/>Representing the time series at time points/>Inputting a training set into a Holt-windows model, solving parameters of three smooth equations by using a maximum likelihood estimation method, and evaluating the prediction accuracy of the Holt-windows model by using a mean square error MSE and a mean absolute error MAE;
step S43: RF model training, comprising the steps of:
Step S431: generating the RF. First, N samples are drawn at random with replacement from the training set to train one decision tree, serving as the samples at the root node of the tree. Second, when each sample has M attributes, each time a node of the decision tree needs to be split, m attributes are randomly selected from the M attributes, where m << M; one attribute is then chosen from these m attributes by information gain as the splitting attribute of the node, and the node is split until it can no longer be split; no pruning is performed during the construction of the decision tree. These steps are repeated to build a plurality of decision trees, which form the RF model;
step 432: the MGF prediction model specifically comprises the following steps:
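Step 4321: assume that the time series $x(t) = \{x(1), x(2), \dots, x(N)\}$ has mean $\bar{x} = \frac{1}{N}\sum_{t=1}^{N} x(t)$. The mean generating function (MGF) of the time series $x(t)$ is expressed by the following formula:
$$\bar{x}_l(i) = \frac{1}{n_l}\sum_{j=0}^{n_l-1} x(i + jl), \quad i = 1, \dots, l;\ 1 \le l \le m$$
where $n_l = \mathrm{INT}(N/l)$, and $m = \mathrm{INT}(N/2)$ or $m = \mathrm{INT}(N/3)$. Using this formula, $m$ mean generating functions of the time series can be obtained, and each is periodically extended over the whole time range;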
Step 4322: calculating the first-order difference sequence $\Delta x(t)$ using the following formula:
$$\Delta x(t) = x(t+1) - x(t)$$
where $x(t+1)$ denotes the value of the time series at time $t+1$ and $x(t)$ denotes the value of the time series at time $t$;
Step 4323: calculating the second-order difference sequence $\Delta^2 x(t)$ using the following formula:
$$\Delta^2 x(t) = \Delta x(t+1) - \Delta x(t)$$
where $\Delta x(t+1)$ denotes the first-order difference sequence of the time series at time $t+1$;
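The mean generating function of the original sequence $x(t)$ is denoted $\bar{x}_l^{(0)}(t)$, and the mean generating functions of the first-order difference sequence $\Delta x(t)$ and the second-order difference sequence $\Delta^2 x(t)$ are denoted $\bar{x}_l^{(1)}(t)$ and $\bar{x}_l^{(2)}(t)$ respectively. Their extension sequences $f_l^{(k)}(t)$, $k = 0, 1, 2$, can be obtained using the formula $f_l(t) = \bar{x}_l\left(t - l\,\mathrm{INT}\left(\frac{t-1}{l}\right)\right)$;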
step 4324: based on the extension sequence of MGF of the original sequence and the first order difference sequence, a cumulative extension sequence is established, and the following formula is used:
Step 433: RF-MGF model prediction, namely obtaining prediction data using the RF model and obtaining prediction data using the MGF prediction model;
Step 434: the weight of the mixture of the two methods is calculated, and the formula is as follows:
In the method, in the process of the invention, Is/>Weights of/>Representing the total sales of business circles of sales history data of each retailer in the training set;
Step 435: for actual values First predictive value/>And error value/>Weighting is performed using the following formula:
In the method, in the process of the invention, Representing the benchmark predicted value,/>Representing a historical average;
Step 436: by time variation, And/>Error value/>, as input variablePerforming fitting analysis on the output variable by adopting a response surface method to obtain a final predicted value;
Step S44: model fusion, namely assigning weights to the ARIMA model, the Holt-Winters model and the RF model using the weighted average method to obtain integrated model A.
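The differencing of step S412 and the AIC comparison of step S414 can be illustrated with a short Python sketch. It is a sketch under assumptions rather than the claimed procedure: the helper names `difference` and `aic` are hypothetical, and the AIC is computed in the common least-squares form n·ln(σ̂²) + 2k, which may differ from the exact form used in the patent.

```python
import math


def difference(series, order=1):
    """Apply first differencing `order` times: each pass maps y to y[t] - y[t-1]."""
    out = list(series)
    for _ in range(order):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


def aic(residuals, n_params):
    """AIC in the form n * ln(sigma^2) + 2k, with sigma^2 the mean squared residual."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    return n * math.log(sigma2) + 2 * n_params


sales = [1, 4, 9, 16, 25]  # hypothetical series with a quadratic trend
first = difference(sales, order=1)
second = difference(sales, order=2)

# Hypothetical residuals from a candidate model with 2 estimated parameters.
score = aic([1.0, -1.0, 1.0, -1.0], n_params=2)
```

In this toy series the second difference is constant, which is the usual sign that differencing twice is enough; among candidate models, the one with the lowest AIC would be preferred.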
2. The big data based cigarette market grid capacity rationality prediction method of claim 1, wherein: in step S1, the business circle is the spatial range of the retailer's cigarette sales capability and the geographic area over which cigarette consumers are distributed; according to the area interaction theory, the probability that a consumer purchases cigarettes at a retailer is determined by the scale of the retailer and the distance between the consumer and the retailer, using the following formula:
$$P_{ij} = \frac{S_j / T_{ij}^{\lambda}}{\sum_{k=1}^{n} S_k / T_{ik}^{\lambda}}$$
where $P_{ij}$ is the probability that a customer located at $i$ travels to $j$ to purchase cigarettes, $n$ is the total number of retailers in the business circle, $S_j$ is the scale of retailer $j$, $T_{ij}$ is the distance between $i$ and $j$, and $\lambda$ indicates how much importance the customer places on time and distance when purchasing cigarettes.
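The purchase probability described in claim 2 has the shape of a Huff-type gravity model, which can be sketched in Python as follows. The function name `huff_probabilities`, the default exponent `lam=2.0` and the sample values are assumptions made for illustration; the patent does not specify a value for the distance exponent.

```python
def huff_probabilities(scales, distances, lam=2.0):
    """Probability that a customer at one location buys from each retailer.

    scales[j]    : scale of retailer j
    distances[j] : distance from the customer to retailer j (must be > 0)
    lam          : exponent expressing sensitivity to distance (assumed value)
    """
    attraction = [s / (d ** lam) for s, d in zip(scales, distances)]
    total = sum(attraction)
    return [a / total for a in attraction]


# Two hypothetical retailers: the second is twice as large but twice as far away.
probs = huff_probabilities(scales=[100.0, 200.0], distances=[1.0, 2.0], lam=2.0)
```

With an exponent of 2, doubling the distance outweighs doubling the scale, so the nearer, smaller retailer receives the higher probability.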
3. The big data based cigarette market grid capacity rationality prediction method of claim 2, wherein: in step S2, the business circle expansion comprises the following steps:
Step S21: defining an initial value, namely calculating, for each retailer, the probability that surrounding customers purchase cigarettes there, and taking the position of the retailer as the initial value;
step S22: calculating geographical range of business district, defining distance Representing the distance between the location of the retailer and the center of the business circle, where the initial n is the number of retailers contained in the grid, will/>Initial/>, as initial grid retailer
Step S23: expanding a business circle range, centering on an initial grid, expanding the business circle range into a square, and calculating the area of the expanded shopping area according to a calculation method of the initial gridIf/>Continuing to expand shopping area calculationUntil/>And obtaining the expanded business circle.
4. A big data based cigarette market grid capacity rationality prediction system for implementing the big data based cigarette market grid capacity rationality prediction method according to any one of claims 1-3, characterized in that: the system comprises a business circle definition module, a business circle expansion mode module, a data preprocessing module, a cigarette market grid capacity prediction method module, an evaluation module, and a cigarette market grid capacity rationality prediction module;
The business circle definition module defines the concept of the business circle, calculates the probability that a customer makes a purchase at a store, and sends this probability to the business circle expansion mode module;
The business circle expansion mode module receives the purchase probability sent by the business circle definition module, expands the range with the retailer's initial position as the initial value to obtain an expanded business circle, and sends the expanded business circle to the data preprocessing module;
The data preprocessing module receives the expanded business circle sent by the business circle expansion mode module, collects business circle data, constructs a business circle data set, divides the business circle data set into a training set and a test set, sends the training set to the cigarette market grid capacity prediction method module, and sends the test set to the evaluation module;
The cigarette market grid capacity prediction method module receives the training set sent by the data preprocessing module, trains a model using an ensemble learning algorithm combining ARIMA, Holt-Winters and RF to obtain integrated model A, and sends integrated model A to the evaluation module;
The evaluation module receives integrated model A sent by the cigarette market grid capacity prediction method module and the test set sent by the data preprocessing module, evaluates integrated model A using the test set to obtain integrated model B, and sends integrated model B to the cigarette market grid capacity rationality prediction module;
The cigarette market grid capacity rationality prediction module receives integrated model B sent by the evaluation module, takes the data of a new grid area as input, and predicts the capacity of the new grid area to obtain a predicted value.
CN202410188091.XA 2024-02-20 2024-02-20 Cigarette market grid capacity rationality prediction method and system based on big data Active CN117745340B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410188091.XA CN117745340B (en) 2024-02-20 2024-02-20 Cigarette market grid capacity rationality prediction method and system based on big data

Publications (2)

Publication Number Publication Date
CN117745340A CN117745340A (en) 2024-03-22
CN117745340B true CN117745340B (en) 2024-05-24

Family

ID=90251184

Country Status (1)

Country Link
CN (1) CN117745340B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant