CN117745340A - Cigarette market grid capacity rationality prediction method and system based on big data - Google Patents

Publication number
CN117745340A
Authority
CN
China
Prior art keywords
model
business
prediction
data
cigarette
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410188091.XA
Other languages
Chinese (zh)
Other versions
CN117745340B (en)
Inventor
王再东
胡佑安
姜兵仁
涂鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Xiaoxiang Big Data Technology Co ltd
Hunan Xiaoxiang Big Data Research Institute
Original Assignee
Hunan Xiaoxiang Big Data Technology Co ltd
Hunan Xiaoxiang Big Data Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Xiaoxiang Big Data Technology Co ltd, Hunan Xiaoxiang Big Data Research Institute filed Critical Hunan Xiaoxiang Big Data Technology Co ltd
Priority to CN202410188091.XA priority Critical patent/CN117745340B/en
Publication of CN117745340A publication Critical patent/CN117745340A/en
Application granted granted Critical
Publication of CN117745340B publication Critical patent/CN117745340B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a big data-based method and system for predicting the rationality of cigarette market grid capacity, and belongs to the technical field of big data analysis.

Description

Cigarette market grid capacity rationality prediction method and system based on big data
Technical Field
The invention relates to the technical field of big data analysis, in particular to a cigarette market grid capacity rationality prediction method and system based on big data.
Background
Accurate delivery of cigarette products is extremely important for commercial companies, because product delivery drives cigarette sales volume and therefore directly affects their economic benefits. Traditional prediction methods do not consider the complexity of market dynamics and consumer behavior, and the data they can collect are limited, which leads to inaccurate cigarette delivery. Traditional machine learning methods do not fully consider the factors that influence cigarette market capacity, which leads to insufficient accuracy and poor stability of model predictions.
Disclosure of Invention
In view of the above, and in order to overcome the defects of the prior art, the invention provides a big data-based method and system for predicting the rationality of cigarette market grid capacity. To address the problem that traditional prediction methods do not consider the complexity of market dynamics and consumer behavior and can collect only limited data, which leads to inaccurate cigarette delivery, the scheme introduces business circle data from outside the tobacco database, drives intelligent strategies with data for different market segments, generates customized marketing strategies, and realizes intelligent and accurate cigarette delivery. To address the problem that traditional machine learning methods do not fully consider the factors affecting cigarette market capacity, which leads to insufficient accuracy and poor stability of model predictions, the scheme uses an ensemble learning algorithm that combines the advantages of ARIMA, Holt-Winters, and RF (random forest) to improve the accuracy and stability of model predictions.
The technical scheme adopted by the invention is as follows: the invention provides a big data-based cigarette market grid capacity rationality prediction method, which comprises the following steps:
step S1: business circle definition: according to the area interaction theory, the business circle is defined as the spatial range of a retailer's cigarette sales capability and the geographic area over which cigarette consumers are distributed;
step S2: business circle expansion: the business circle is expanded by taking the initial position of a retailer as the initial value to obtain an expanded business circle;
step S3: data preprocessing: data of the expanded business circle are acquired, a business circle data set is generated, and the business circle data set is divided into a training set and a test set;
step S4: cigarette market grid capacity prediction: an ensemble learning algorithm is trained on the training set to predict the cigarette market grid capacity, yielding an integrated model A;
step S5: evaluation: the integrated model A is evaluated using the test set to obtain an integrated model B;
step S6: cigarette market grid capacity rationality prediction: new grid data are input into the integrated model B to predict the cigarette market grid capacity and obtain a prediction result.
Further, in step S1, the business circle is defined as the spatial range of a retailer's cigarette sales capability and the geographic area over which its cigarette consumers are distributed. According to the area interaction theory, the probability that a consumer purchases cigarettes at a retailer is determined by the scale of the retailer and the distance between the consumer and the retailer, using the following formula:
$$P_{ij} = \frac{S_j / d_{ij}^{\lambda}}{\sum_{k=1}^{n} S_k / d_{ik}^{\lambda}}$$
where $P_{ij}$ is the probability that a consumer located at $i$ goes to retailer $j$ to purchase cigarettes, $n$ is the total number of retailers in the business circle, $S_j$ is the scale of retailer $j$, $d_{ij}$ is the distance between $i$ and $j$, and $\lambda$ indicates how much importance a customer places on time and distance when purchasing cigarettes.
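As an illustration of the purchase probability in step S1, the following minimal Python sketch evaluates a Huff-style attraction model, assuming the probability is proportional to retailer scale divided by distance raised to a decay exponent. The function name `huff_probabilities`, its parameters, the default exponent, and the sample numbers are hypothetical and are not taken from the patent.

```python
def huff_probabilities(scales, distances, lam=2.0):
    """Return the probability that one consumer buys at each retailer.

    scales[j]    -- scale S_j of retailer j
    distances[j] -- distance from the consumer to retailer j (must be > 0)
    lam          -- distance-decay exponent (assumed default, not from the patent)
    """
    if len(scales) != len(distances):
        raise ValueError("scales and distances must have the same length")
    # Attraction of each retailer: scale discounted by distance ** lam.
    attractions = [s / (d ** lam) for s, d in zip(scales, distances)]
    total = sum(attractions)
    # Normalise so the probabilities over all retailers sum to 1.
    return [a / total for a in attractions]


# Hypothetical example: a large retailer 2 km away and a small one 1 km away.
probs = huff_probabilities(scales=[100.0, 50.0], distances=[2.0, 1.0], lam=2.0)
```

With these sample values the nearer, smaller retailer receives the larger share, which shows how the exponent trades off scale against distance.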
Further, in step S2, the business circle expansion specifically comprises the following steps:
step S21: define the initial value: calculate the probability that the customers surrounding each retailer purchase cigarettes there, and define the position of the retailer as the initial value;
step S22: calculate the geographic range of the business circle: define a distance representing the distance between the location of a retailer and the center of the business circle, where n is initially the number of retailers contained in the grid, and compute the initial value of the business circle range [symbols and expression not reproduced in the source text];
step S23: expand the business circle range: centered on the initial grid, expand the business circle range into a square and calculate the area of the expanded shopping area using the same calculation as for the initial grid; while the expansion condition holds, continue expanding and recalculating the shopping area until the stopping condition is met [conditions not reproduced in the source text], thereby obtaining the expanded business circle.
Further, in step S3, the data preprocessing specifically comprises the following steps:
step S31: data acquisition: acquire the basic attributes, crowd characteristics, and consumption capacity of the expanded business circle and integrate them into business circle data 1; acquire the current market situation, consumption index, and consumption preference of the expanded business circle and integrate them into business circle data 2; acquire POI (point of interest) data related to cigarette sales and parse the POI data using location and attribute characteristics as constraints, so as to extract the number of enterprises, shopping area, traffic type, walking distance, business type, and longitude and latitude, obtaining a POI data set;
step S32: data conversion: business circle data 1 and business circle data 2 contain numerical data and categorical data; the numerical data are transformed with the log1p function to bring their distribution closer to Gaussian, and the categorical data are label-encoded to obtain numerical features, yielding business circle data A and business circle data B;
step S33: data set construction: use PiFlow to fuse business circle data A, business circle data B, and the POI data set into a cigarette market data set;
step S34: data set division: divide the data set into a training set and a test set;
step S35: data set storage: store the business circle data set in a Hive database in a distributed manner.
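The conversion and splitting in steps S32 and S34 can be sketched in plain Python. This is a minimal illustration rather than the patent's implementation: the helper names (`log1p_transform`, `label_encode`, `split_rows`), the 80/20 split ratio, the random shuffle, and the sample values are all assumptions, and the PiFlow fusion and Hive storage steps are not shown.

```python
import math
import random


def log1p_transform(values):
    """Apply log(1 + x) to non-negative numerical features."""
    return [math.log1p(v) for v in values]


def label_encode(categories):
    """Map each distinct category to an integer code (sorted for determinism)."""
    mapping = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical sample values, for illustration only.
spending = log1p_transform([0.0, 9.0, 99.0])
codes, mapping = label_encode(["mall", "street", "mall"])
train, test = split_rows(list(range(10)))
```

For time-ordered sales data, a chronological split may be more appropriate than the random shuffle assumed here, since the later models forecast forward in time.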
Further, in step S4, the cigarette market grid capacity prediction specifically comprises the following steps:
step S41: ARIMA model training: the parameters of the ARIMA model are $(p, d, q)$, and ARIMA$(p, d, q)$ is fitted to the training set, where $p$ is the number of autoregressive terms, $d$ is the differencing order, and $q$ is the number of moving-average terms; ARIMA model training comprises the following steps:
step S411: determine the number of autoregressive terms and the number of moving-average terms in the ARIMA model by inspecting the autocorrelation function (ACF) plot and the partial autocorrelation function (PACF) plot;
step S412: determine the differencing order: calculate the first-order difference $\nabla y_t$ of $y_t$ using the following formula:
$$\nabla y_t = y_t - y_{t-1}$$
where $y_t$ represents the value of the time series at time $t$ and $y_{t-1}$ represents the value of the time series at time $t-1$;
calculate the second-order difference $\nabla^2 y_t$ of $y_t$ using the following formula:
$$\nabla^2 y_t = \nabla y_t - \nabla y_{t-1} = y_t - 2y_{t-1} + y_{t-2}$$
where $y_{t-2}$ represents the value of the time series at time $t-2$;
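with the differencing order determined, the differenced series is described by the following formula:
$$y_t = \sum_{i=1}^{p} \varphi_i y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$
where $\varphi_i$ are the parameters of the autoregressive part, $\theta_j$ are the parameters of the moving-average part, and $\varepsilon_t$ is the estimation error;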
step S413: model checking: select a suitable combination of the number of autoregressive terms, the number of moving-average terms, and the differencing order, and then perform a significance test on the ARIMA model;
step S414: evaluate prediction accuracy with the Akaike information criterion (AIC), using the following formula:
$$\mathrm{AIC} = n \ln \hat{\sigma}^2 + 2k$$
where $\hat{\sigma}^2$ is the estimated error variance, $n$ is the sample size, and $k$ is the number of parameters;
according to the AIC, select the optimal ARIMA model for predicting the cigarette market capacity under study, and verify the goodness of fit of the ARIMA model by testing whether its residuals are white noise;
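The differencing and AIC calculations used in step S41 can be sketched without any fitting library. The snippet below is only an illustration under stated assumptions: `difference` and `aic` are hypothetical helper names, the AIC is taken in the common form n·ln(σ̂²) + 2k with σ̂² estimated as the mean squared residual, and the residuals are made-up numbers rather than the output of a fitted ARIMA model.

```python
import math


def difference(series, order=1):
    """Return the series differenced `order` times (each pass drops one point)."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def aic(residuals, n_params):
    """AIC = n * ln(sigma^2) + 2k, with sigma^2 the mean squared residual."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    return n * math.log(sigma2) + 2 * n_params


# Hypothetical series with a quadratic trend: it needs two differences
# before it becomes constant, which suggests d = 2.
y = [1, 4, 9, 16, 25]
first_diff = difference(y, order=1)
second_diff = difference(y, order=2)

# Made-up residuals from a candidate model with k = 2 parameters.
score = aic([1.0, -1.0, 1.0, -1.0], n_params=2)
```

Candidate $(p, d, q)$ combinations would each yield a residual series, and the combination with the lowest AIC would be preferred.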
step S42: Holt-Winters model training: the model equation is as follows:
$$y_t = a + bt + S_t + \varepsilon_t$$
where $y_t$ represents the value of the time series at time point $t$, $a$ represents the intercept, $b$ represents the slope, $S_t$ represents the seasonal component of the time series at time point $t$, and $\varepsilon_t$ is the irregular component;
the three smoothing equations, in the standard additive form, are as follows:
$$L_t = \alpha (y_t - S_{t-s}) + (1 - \alpha)(L_{t-1} + T_{t-1})$$
$$T_t = \beta (L_t - L_{t-1}) + (1 - \beta) T_{t-1}$$
$$S_t = \gamma (y_t - L_t) + (1 - \gamma) S_{t-s}$$
where $\alpha$, $\beta$, and $\gamma$ are smoothing constants, $L_t$ is the level of the time series at time point $t$, $T_t$ is the trend of the time series at time point $t$, $S_t$ is the seasonal component of the time series at time point $t$, $y_t$ represents the value of the time series at time point $t$, and $s$ is the length of the seasonal period; the training set is input into the Holt-Winters model, the parameters of the three smoothing equations are solved by maximum likelihood estimation, and the prediction accuracy of the Holt-Winters model is evaluated using the mean square error (MSE) and the mean absolute error (MAE);
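To make the level, trend, and seasonal updates of step S42 concrete, here is a minimal sketch of additive Holt-Winters smoothing in plain Python. It assumes the additive form, a simple initialisation from the first two seasons, and fixed smoothing constants supplied by the caller; the patent instead estimates the constants by maximum likelihood, which is not shown. The function name and the sample series are hypothetical.

```python
def holt_winters_additive(series, season_len, alpha, beta, gamma, horizon):
    """Forecast `horizon` steps ahead with additive Holt-Winters smoothing."""
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons of data")
    first = series[:season_len]
    second = series[season_len:2 * season_len]
    # Simple initialisation (an assumption): level = mean of the first season,
    # trend = average per-step change between the first two seasons.
    level = sum(first) / season_len
    trend = (sum(second) - sum(first)) / season_len ** 2
    seasonal = [x - level for x in first]
    for t in range(season_len, len(series)):
        y = series[t]
        s_prev = seasonal[t % season_len]
        prev_level = level
        # Level, trend, and seasonal updates.
        level = alpha * (y - s_prev) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % season_len] = gamma * (y - level) + (1 - gamma) * s_prev
    n = len(series)
    return [
        level + h * trend + seasonal[(n - 1 + h) % season_len]
        for h in range(1, horizon + 1)
    ]


# Hypothetical sales series with a period of 2 and no trend.
forecast = holt_winters_additive(
    [10.0, 20.0] * 4, season_len=2, alpha=0.5, beta=0.5, gamma=0.5, horizon=2
)
```

On this purely periodic input the forecast simply continues the alternating pattern; MSE or MAE on held-out points would be computed from such forecasts.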
step S43: RF model training, comprising the following steps:
step S431: generate the RF: first, randomly draw N samples with replacement from the training set and use them as the samples at the root node to train one decision tree; second, given that each sample has M attributes, whenever a node of the decision tree needs to be split, randomly select m attributes from the M attributes, where generally $m \ll M$; from these m attributes, select one attribute as the splitting attribute of the node using information gain, and keep splitting until the node can no longer be split, with no pruning performed while the decision tree is grown; repeat these steps to construct a plurality of decision trees, which form the RF model;
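The two sources of randomness in step S431, bootstrap sampling and random attribute selection, together with the information-gain criterion, can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, the labels are treated as discrete classes so that entropy-based information gain applies, and building the full tree and forest is not shown.

```python
import math
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) samples with replacement."""
    return [rng.choice(rows) for _ in rows]


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attribute index `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def choose_split(rows, labels, n_attrs, m, rng):
    """Pick m of the n_attrs attributes at random, return the best by gain."""
    candidates = rng.sample(range(n_attrs), m)
    return max(candidates, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: attribute 0 determines the label, attribute 1 does not.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ["low", "low", "high", "high"]
rng = random.Random(0)
sample = bootstrap_sample(rows, rng)
best_attr = choose_split(rows, labels, n_attrs=2, m=2, rng=rng)
```

In practice m would be much smaller than the number of attributes, so each tree sees a different subset of candidate splits.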
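step S432: MGF (mean generating function) prediction model, which specifically comprises the following steps:
step S4321: for a time series $x(t)$, $t = 1, 2, \ldots, N$, with mean value $\bar{x} = \frac{1}{N}\sum_{t=1}^{N} x(t)$, the MGF of the time series is given by the following formula:
$$\bar{x}_l(i) = \frac{1}{n_l} \sum_{j=0}^{n_l - 1} x(i + jl)$$
where $i = 1, 2, \ldots, l$; $1 \le l \le m$; $n_l = \mathrm{INT}(N/l)$; and $m = \mathrm{INT}(N/2)$ or $\mathrm{INT}(N/3)$; using this formula, $m$ mean generating functions of the time series can be obtained, and each is periodically extended to $f_l(t) = \bar{x}_l(i)$, with $t \equiv i \pmod{l}$ and $t = 1, 2, \ldots, N$;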
step S4322: calculate the first-order difference sequence $x^{(1)}(t)$ using the following formula:
$$x^{(1)}(t) = x(t+1) - x(t)$$
where $x(t+1)$ represents the value of the time series at time $t+1$ and $x(t)$ represents the value of the time series at time $t$;
step S4323: calculate the second-order difference sequence $x^{(2)}(t)$ using the following formula:
$$x^{(2)}(t) = x^{(1)}(t+1) - x^{(1)}(t)$$
where $x^{(1)}(t)$ represents the first-order difference sequence at time $t$;
the mean generating function of the original sequence $x(t)$ is denoted $\bar{x}_l^{(0)}$, and the mean generating functions of the first-order difference sequence $x^{(1)}(t)$ and the second-order difference sequence $x^{(2)}(t)$ are denoted $\bar{x}_l^{(1)}$ and $\bar{x}_l^{(2)}$ respectively; their extension sequences are obtained with the MGF formula of step S4321;
step S4324: based on the extension sequences of the MGFs of the original sequence and of the first-order difference sequence, establish a cumulative extension sequence [formula not reproduced in the source text];
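The mean generating function of step S432 can be illustrated with a short sketch. It assumes the usual definition, in which the MGF of period l averages every l-th element of the series and is then repeated periodically; the function names, zero-based indexing, and sample series are assumptions, and the cumulative extension sequence of the last sub-step is not shown.

```python
def mean_generating_function(x, l):
    """Average every l-th element of x, giving l values (one per phase)."""
    if not 1 <= l <= len(x):
        raise ValueError("l must be between 1 and len(x)")
    n_l = len(x) // l  # number of complete periods of length l
    return [sum(x[i + j * l] for j in range(n_l)) / n_l for i in range(l)]


def periodic_extension(mgf, length):
    """Repeat the MGF values periodically to the requested length."""
    l = len(mgf)
    return [mgf[t % l] for t in range(length)]


def first_difference(x):
    """First-order difference sequence x(t+1) - x(t)."""
    return [b - a for a, b in zip(x, x[1:])]


# Hypothetical series of length N = 6.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
mgf_2 = mean_generating_function(x, 2)
mgf_3 = mean_generating_function(x, 3)
extended = periodic_extension(mgf_2, 8)
diff_mgf = mean_generating_function(first_difference(x), 1)
```

Extending the sequences beyond the original length, as in `extended`, is what lets the periodic components be used as predictors for future time points.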
step S433: RF-MGF model prediction: obtain one set of prediction data using the RF model and another set using the MGF prediction model;
step S434: calculate the weights for mixing the two methods [formula not reproduced in the source text], where each weight is associated with one of the two predictions and the formula uses the total business circle sales computed from the sales history data of each retailer in the training set;
step S435: weight the actual value, the first predicted value, and the error value [formula not reproduced in the source text], where the formula involves a baseline predicted value and a historical average;
step S436: taking the time variable and the two predictions as inputs and the error value as the output variable, perform a fitting analysis using the response surface method to obtain the final predicted value;
step S44: model fusion: assign weights to the ARIMA model, the Holt-Winters model, and the RF model using a weighted average method to obtain the integrated model A.
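The weighted-average fusion of step S44 can be sketched as below. The patent does not state how the weights are assigned, so this example assumes weights inversely proportional to each model's validation error; that choice, the helper names, and the numbers are illustrative assumptions only.

```python
def inverse_error_weights(errors):
    """Weights proportional to 1 / error, normalised to sum to 1 (assumed rule)."""
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def fuse(predictions, weights):
    """Weighted average of several models' forecasts, point by point."""
    horizon = len(predictions[0])
    return [
        sum(w * p[t] for w, p in zip(weights, predictions))
        for t in range(horizon)
    ]


# Hypothetical validation errors for ARIMA, Holt-Winters, and RF.
weights = inverse_error_weights([1.0, 2.0, 4.0])
# Hypothetical one-step forecasts from the three models.
fused = fuse([[7.0], [14.0], [28.0]], weights)
```

Any other normalised weighting scheme could be substituted for `inverse_error_weights` without changing `fuse`.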
Further, in step S5, the test set is input into the integrated model A, and accuracy is adopted as the evaluation index [formula not reproduced in the source text]; the accuracy compares the predicted sales with the actual sales and is computed from these two quantities;
the hyperparameters of the integrated model A, including the learning rate and the batch size, are set; training is stopped when the test-set error of the integrated model A no longer decreases over several consecutive iterations, and the hyperparameters of the integrated model A are adjusted according to its performance on the test set to obtain the integrated model B.
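The stopping rule described for step S5, halting once the error has failed to improve for several consecutive iterations, can be sketched as a patience counter. The function name, the `patience` parameter, and the error values are hypothetical; the patent does not give the number of iterations to wait.

```python
def best_iteration(errors, patience):
    """Return the index of the best error seen before early stopping triggers."""
    best_error = float("inf")
    best_index = -1
    waited = 0
    for i, err in enumerate(errors):
        if err < best_error:
            # Improvement: remember it and reset the patience counter.
            best_error, best_index, waited = err, i, 0
        else:
            waited += 1
            if waited >= patience:
                break  # no improvement for `patience` consecutive iterations
    return best_index


# Hypothetical per-iteration errors.
history = [5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 2.0]
stopped_at = best_iteration(history, patience=3)
ran_to_end = best_iteration(history, patience=5)
```

A short patience stops at iteration 2 here and never sees the later improvement, while a longer patience reaches it, which illustrates the trade-off in choosing the window.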
Further, in step S6, the cigarette market grid capacity rationality prediction specifically comprises inputting new grid area data into the integrated model B and predicting the grid capacity to obtain a prediction result.
The invention further provides a big data-based cigarette market grid capacity rationality prediction system, which comprises a business circle definition module, a business circle expansion mode module, a data preprocessing module, a cigarette market grid capacity prediction method module, an evaluation module, and a cigarette market grid capacity rationality prediction module;
the business circle definition module defines the concept of the business circle, calculates the probability that a customer makes a purchase at a store, and sends this probability to the business circle expansion mode module;
the business circle expansion mode module receives the purchase probability sent by the business circle definition module, expands the range by taking the initial position of a retailer as the initial value to obtain an expanded business circle, and sends the expanded business circle to the data preprocessing module;
the data preprocessing module receives the expanded business circle sent by the business circle expansion mode module, collects business circle data, constructs a business circle data set, divides the business circle data set into a training set and a test set, sends the training set to the cigarette market grid capacity prediction method module, and sends the test set to the evaluation module;
the cigarette market grid capacity prediction method module receives the training set sent by the data preprocessing module, trains a model using an ensemble learning algorithm combining ARIMA, Holt-Winters, and RF to obtain the integrated model A, and sends the integrated model A to the evaluation module;
the evaluation module receives the integrated model A sent by the cigarette market grid capacity prediction method module and the test set sent by the data preprocessing module, evaluates the integrated model A using the test set to obtain the integrated model B, and sends the integrated model B to the cigarette market grid capacity rationality prediction module;
the cigarette market grid capacity rationality prediction module receives the integrated model B sent by the evaluation module, takes data of a new grid area as input, and predicts the capacity of the new grid area to obtain a predicted value.
By adopting the above scheme, the invention obtains the following beneficial effects:
(1) To address the problem that traditional prediction methods do not consider the complexity of market dynamics and consumer behavior and can collect only limited data, which leads to inaccurate cigarette delivery, the scheme introduces business circle data from outside the tobacco database, drives intelligent strategies with data for different market segments, generates customized marketing strategies, and realizes intelligent and accurate cigarette delivery.
(2) To address the problem that traditional machine learning methods do not fully consider the factors affecting cigarette market capacity, which leads to insufficient accuracy and poor stability of model predictions, the scheme uses an ensemble learning algorithm that combines the advantages of ARIMA, Holt-Winters, and RF to improve the accuracy and stability of model predictions.
Drawings
FIG. 1 is a flow diagram of a big data based cigarette market grid capacity rationality prediction method provided by the invention;
FIG. 2 is a schematic diagram of a big data based cigarette market grid capacity rationality prediction system provided by the invention;
FIG. 3 is a flow chart of step S2;
FIG. 4 is a flow chart of step S3;
FIG. 5 is a flow chart of step S4.
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention; all other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the description of the present invention, it should be understood that the terms "upper," "lower," "front," "rear," "left," "right," "top," "bottom," "inner," "outer," and the like indicate orientation or positional relationships based on those shown in the drawings, merely to facilitate description of the invention and simplify the description, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus should not be construed as limiting the invention.
In a first embodiment, referring to FIG. 1, the big data-based cigarette market grid capacity rationality prediction method provided by the invention comprises the following steps:
step S1: business circle definition: according to the area interaction theory, the business circle is defined as the spatial range of a retailer's cigarette sales capability and the geographic area over which cigarette consumers are distributed;
step S2: business circle expansion: the business circle is expanded by taking the initial position of a retailer as the initial value to obtain an expanded business circle;
step S3: data preprocessing: data of the expanded business circle are acquired, a business circle data set is generated, and the business circle data set is divided into a training set and a test set;
step S4: cigarette market grid capacity prediction: an ensemble learning algorithm is trained on the training set to predict the cigarette market grid capacity, yielding an integrated model A;
step S5: evaluation: the integrated model A is evaluated using the test set to obtain an integrated model B;
step S6: cigarette market grid capacity rationality prediction: new grid data are input into the integrated model B to predict the cigarette market grid capacity and obtain a prediction result.
In a second embodiment, referring to FIG. 1 and FIG. 3 and building on the above embodiment, the business circle expansion in step S2 specifically comprises the following steps:
step S21: define the initial value: calculate the probability that the customers surrounding each retailer purchase cigarettes there, and define the position of the retailer as the initial value;
step S22: calculate the geographic range of the business circle: define a distance representing the distance between the location of a retailer and the center of the business circle, where n is initially the number of retailers contained in the grid, and compute the initial value of the business circle range [symbols and expression not reproduced in the source text];
step S23: expand the business circle range: centered on the initial grid, expand the business circle range into a square and calculate the area of the expanded shopping area using the same calculation as for the initial grid; while the expansion condition holds, continue expanding and recalculating the shopping area until the stopping condition is met [conditions not reproduced in the source text], thereby obtaining the expanded business circle.
In a third embodiment, referring to FIG. 1 and FIG. 4 and building on the above embodiments, the data preprocessing in step S3 specifically comprises the following steps:
step S31: data acquisition: acquire the basic attributes, crowd characteristics, and consumption capacity of the expanded business circle and integrate them into business circle data 1; acquire the current market situation, consumption index, and consumption preference of the expanded business circle and integrate them into business circle data 2; acquire POI (point of interest) data related to cigarette sales and parse the POI data using location and attribute characteristics as constraints, so as to extract the number of enterprises, shopping area, traffic type, walking distance, business type, and longitude and latitude, obtaining a POI data set;
step S32: data conversion: business circle data 1 and business circle data 2 contain numerical data and categorical data; the numerical data are transformed with the log1p function to bring their distribution closer to Gaussian, and the categorical data are label-encoded to obtain numerical features, yielding business circle data A and business circle data B;
step S33: data set construction: use PiFlow to fuse business circle data A, business circle data B, and the POI data set into a cigarette market data set;
step S34: data set division: divide the data set into a training set and a test set;
step S35: data set storage: store the business circle data set in a Hive database in a distributed manner.
By performing the above operations, the scheme addresses the problem that traditional prediction methods do not consider the complexity of market dynamics and consumer behavior and can collect only limited data, which leads to inaccurate cigarette delivery.
In a fourth embodiment, referring to FIG. 1 and FIG. 5 and building on the above embodiments, the cigarette market grid capacity prediction in step S4 specifically comprises the following steps:
step S41: ARIMA model training: the parameters of the ARIMA model are $(p, d, q)$, and ARIMA$(p, d, q)$ is fitted to the training set, where $p$ is the number of autoregressive terms, $d$ is the differencing order, and $q$ is the number of moving-average terms; ARIMA model training comprises the following steps:
step S411: determine the number of autoregressive terms and the number of moving-average terms in the ARIMA model by inspecting the autocorrelation function (ACF) plot and the partial autocorrelation function (PACF) plot;
step S412: determine the differencing order: calculate the first-order difference $\nabla y_t$ of $y_t$ using the following formula:
$$\nabla y_t = y_t - y_{t-1}$$
where $y_t$ represents the value of the time series at time $t$ and $y_{t-1}$ represents the value of the time series at time $t-1$;
calculate the second-order difference $\nabla^2 y_t$ of $y_t$ using the following formula:
$$\nabla^2 y_t = \nabla y_t - \nabla y_{t-1} = y_t - 2y_{t-1} + y_{t-2}$$
where $y_{t-2}$ represents the value of the time series at time $t-2$;
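with the differencing order determined, the differenced series is described by the following formula:
$$y_t = \sum_{i=1}^{p} \varphi_i y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$
where $\varphi_i$ are the parameters of the autoregressive part, $\theta_j$ are the parameters of the moving-average part, and $\varepsilon_t$ is the estimation error;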
step S413: model checking: select a suitable combination of the number of autoregressive terms, the number of moving-average terms, and the differencing order, and then perform a significance test on the ARIMA model;
step S414: evaluate prediction accuracy with the Akaike information criterion (AIC), using the following formula:
$$\mathrm{AIC} = n \ln \hat{\sigma}^2 + 2k$$
where $\hat{\sigma}^2$ is the estimated error variance, $n$ is the sample size, and $k$ is the number of parameters;
according to the AIC, select the optimal ARIMA model for predicting the cigarette market capacity under study, and verify the goodness of fit of the ARIMA model by testing whether its residuals are white noise;
step S42: Holt-Winters model training: the model equation is as follows:
$$y_t = a + bt + S_t + \varepsilon_t$$
where $y_t$ represents the value of the time series at time point $t$, $a$ represents the intercept, $b$ represents the slope, $S_t$ represents the seasonal component of the time series at time point $t$, and $\varepsilon_t$ is the irregular component;
the three smoothing equations, in the standard additive form, are as follows:
$$L_t = \alpha (y_t - S_{t-s}) + (1 - \alpha)(L_{t-1} + T_{t-1})$$
$$T_t = \beta (L_t - L_{t-1}) + (1 - \beta) T_{t-1}$$
$$S_t = \gamma (y_t - L_t) + (1 - \gamma) S_{t-s}$$
where $\alpha$, $\beta$, and $\gamma$ are smoothing constants, $L_t$ is the level of the time series at time point $t$, $T_t$ is the trend of the time series at time point $t$, $S_t$ is the seasonal component of the time series at time point $t$, $y_t$ represents the value of the time series at time point $t$, and $s$ is the length of the seasonal period; the training set is input into the Holt-Winters model, the parameters of the three smoothing equations are solved by maximum likelihood estimation, and the prediction accuracy of the Holt-Winters model is evaluated using the mean square error (MSE) and the mean absolute error (MAE);
step S43: RF model training, comprising the following steps:
step S431: generate the RF: first, randomly draw N samples with replacement from the training set and use them as the samples at the root node to train one decision tree; second, given that each sample has M attributes, whenever a node of the decision tree needs to be split, randomly select m attributes from the M attributes, where generally $m \ll M$; from these m attributes, select one attribute as the splitting attribute of the node using information gain, and keep splitting until the node can no longer be split, with no pruning performed while the decision tree is grown; repeat these steps to construct a plurality of decision trees, which form the RF model;
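step S432: MGF (mean generating function) prediction model, which specifically comprises the following steps:
step S4321: for a time series $x(t)$, $t = 1, 2, \ldots, N$, with mean value $\bar{x} = \frac{1}{N}\sum_{t=1}^{N} x(t)$, the MGF of the time series is given by the following formula:
$$\bar{x}_l(i) = \frac{1}{n_l} \sum_{j=0}^{n_l - 1} x(i + jl)$$
where $i = 1, 2, \ldots, l$; $1 \le l \le m$; $n_l = \mathrm{INT}(N/l)$; and $m = \mathrm{INT}(N/2)$ or $\mathrm{INT}(N/3)$; using this formula, $m$ mean generating functions of the time series can be obtained, and each is periodically extended to $f_l(t) = \bar{x}_l(i)$, with $t \equiv i \pmod{l}$ and $t = 1, 2, \ldots, N$;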
step S4322: calculate the first-order difference sequence $x^{(1)}(t)$ using the following formula:
$$x^{(1)}(t) = x(t+1) - x(t)$$
where $x(t+1)$ represents the value of the time series at time $t+1$ and $x(t)$ represents the value of the time series at time $t$;
step S4323: calculate the second-order difference sequence $x^{(2)}(t)$ using the following formula:
$$x^{(2)}(t) = x^{(1)}(t+1) - x^{(1)}(t)$$
where $x^{(1)}(t)$ represents the first-order difference sequence at time $t$;
the mean generating function of the original sequence $x(t)$ is denoted $\bar{x}_l^{(0)}$, and the mean generating functions of the first-order difference sequence $x^{(1)}(t)$ and the second-order difference sequence $x^{(2)}(t)$ are denoted $\bar{x}_l^{(1)}$ and $\bar{x}_l^{(2)}$ respectively; their extension sequences are obtained with the MGF formula of step S4321;
step S4324: based on the extension sequences of the MGFs of the original sequence and of the first-order difference sequence, establish a cumulative extension sequence [formula not reproduced in the source text];
step S433: RF-MGF model prediction: obtain one set of prediction data using the RF model and another set using the MGF prediction model;
step S434: calculate the weights for mixing the two methods [formula not reproduced in the source text], where each weight is associated with one of the two predictions and the formula uses the total business circle sales computed from the sales history data of each retailer in the training set;
step S435: weight the actual value, the first predicted value, and the error value [formula not reproduced in the source text], where the formula involves a baseline predicted value and a historical average;
step S436: taking the time variable and the two predictions as inputs and the error value as the output variable, perform a fitting analysis using the response surface method to obtain the final predicted value;
step S44: model fusion: assign weights to the ARIMA model, the Holt-Winters model, and the RF model using a weighted average method to obtain the integrated model A.
By performing the above operations, the scheme addresses the problem that traditional machine learning methods do not fully consider the factors affecting cigarette market capacity, which leads to insufficient accuracy and poor stability of model predictions; the ensemble learning algorithm combining the advantages of ARIMA, Holt-Winters, and RF improves the accuracy and stability of model predictions.
Embodiment five: referring to FIG. 2, and based on the above embodiments, the big data based cigarette market grid capacity rationality prediction system provided by the invention comprises a business circle definition module, a business circle expansion mode module, a data preprocessing module, a cigarette market grid capacity prediction method module, an evaluation module and a cigarette market grid capacity rationality prediction module;
the business circle definition module defines the business circle, calculates the probability that a customer makes a purchase at a store, and sends this probability to the business circle expansion mode module;
the business circle expansion mode module receives the purchase probability data sent by the business circle definition module, expands the range with the retailer's initial position as the initial value to obtain an expanded business circle, and sends the expanded business circle to the data preprocessing module;
the data preprocessing module receives the expanded business circle sent by the business circle expansion mode module, collects business circle data, constructs a business circle data set, divides the data set into a training set and a test set, sends the training set to the cigarette market grid capacity prediction method module, and sends the test set to the evaluation module;
the cigarette market grid capacity prediction method module receives the training set sent by the data preprocessing module, trains a model using an ensemble learning algorithm combining ARIMA, Holt-Winters and RF to obtain integrated model A, and sends integrated model A to the evaluation module;
the evaluation module receives integrated model A sent by the cigarette market grid capacity prediction method module and the test set sent by the data preprocessing module, evaluates integrated model A using the test set to obtain integrated model B, and sends integrated model B to the cigarette market grid capacity rationality prediction module;
the cigarette market grid capacity rationality prediction module receives integrated model B sent by the evaluation module, takes data of a new grid area as input, and predicts the capacity of the new grid area to obtain a predicted value.
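The hand-off from the data preprocessing module to the evaluation module can be sketched as follows. This is only an illustration under stated assumptions: the split is taken to be chronological with an 80/20 ratio, and the mean absolute error (MAE) and mean squared error (MSE) are used as evaluation metrics. The function names and sample numbers are hypothetical, and the text does not specify how integrated model B is derived from the evaluation.

```python
from typing import List, Sequence, Tuple


def chronological_split(series: Sequence[float],
                        train_fraction: float = 0.8
                        ) -> Tuple[List[float], List[float]]:
    """Split a time-ordered series into a leading training part and a trailing test part."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = int(len(series) * train_fraction)
    return list(series[:cut]), list(series[cut:])


def mean_absolute_error(actual: Sequence[float],
                        predicted: Sequence[float]) -> float:
    """Average absolute deviation between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual: Sequence[float],
                       predicted: Sequence[float]) -> float:
    """Average squared deviation between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical sales history of one business circle (illustrative numbers only).
sales = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0, 14.0, 16.0]
train, test = chronological_split(sales)

# Stand-in for the output of integrated model A on the test periods.
predicted = [15.0, 15.0]

mae = mean_absolute_error(test, predicted)
mse = mean_squared_error(test, predicted)
print(train, test, mae, mse)
```

A chronological split keeps future observations out of the training set, which is why it is assumed here for time-series data.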
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
The invention and its embodiments have been described above without limitation; what is shown in the drawings is only one embodiment of the invention, and the actual construction is not limited thereto. In summary, structures and embodiments similar to this technical solution that a person of ordinary skill in the art, informed by this disclosure, devises without creative effort and without departing from the gist of the invention shall fall within the scope of protection of the invention.

Claims (5)

1. A big data based cigarette market grid capacity rationality prediction method, characterized in that the method comprises the following steps:
Step S1: business circle definition: according to the area interaction theory, the business circle is defined as the spatial range of a retailer's cigarette sales capability and the geographic area over which cigarette consumers are distributed;
Step S2: business circle expansion: the range is expanded with the retailer's initial position as the initial value to obtain an expanded business circle;
Step S3: data preprocessing: data of the expanded business circle are collected to generate a business circle data set, which is divided into a training set and a test set;
Step S4: cigarette market grid capacity prediction: an ensemble learning algorithm is applied to the training set to predict the cigarette market grid capacity, yielding integrated model A;
Step S5: integrated model A is evaluated using the test set to obtain integrated model B;
Step S6: cigarette market grid capacity rationality prediction: new grid data are input into integrated model B to predict the cigarette market grid capacity and obtain a prediction result.
2. The big data based cigarette market grid capacity rationality prediction method of claim 1, wherein: in step S1, the business circle is the spatial range of a retailer's cigarette sales capability and the geographic area over which cigarette consumers are distributed; according to the area interaction theory, the probability that a consumer purchases cigarettes at a retailer is determined by the area of the retailer and the distance between the consumer and the retailer, using the following formula:
$$P_{ij} = \frac{S_j / d_{ij}^{\lambda}}{\sum_{k=1}^{n} S_k / d_{ik}^{\lambda}}$$
where $P_{ij}$ is the probability that a consumer located at $i$ goes to $j$ to purchase cigarettes, $n$ is the total number of retailers in the business circle, $S_j$ is the scale of retailer $j$, $d_{ij}$ is the distance between $i$ and $j$, and $\lambda$ indicates how much importance a customer places on time and distance when purchasing cigarettes.
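The probability described in claim 2 has the shape of a Huff-type gravity model. As a sketch only, assuming that a retailer's attraction equals its scale divided by the distance raised to the power λ, the probabilities for one consumer location could be computed as below. The function name `huff_probabilities`, the default λ = 2 and the sample values are hypothetical.

```python
from typing import List, Sequence


def huff_probabilities(scales: Sequence[float],
                       distances: Sequence[float],
                       lam: float = 2.0) -> List[float]:
    """Probability that a consumer at one location buys from each retailer.

    scales[j]    : scale (for example floor area) of retailer j
    distances[j] : distance from the consumer's location to retailer j (> 0)
    lam          : sensitivity of the consumer to distance
    """
    if not scales or len(scales) != len(distances):
        raise ValueError("need one distance per retailer")
    if any(d <= 0 for d in distances):
        raise ValueError("distances must be positive")
    # Attraction of each retailer: larger scale attracts, larger distance deters.
    attractions = [s / (d ** lam) for s, d in zip(scales, distances)]
    total = sum(attractions)
    if total <= 0:
        raise ValueError("total attraction must be positive")
    return [a / total for a in attractions]


# Two equally sized retailers; the second is twice as far away (illustrative values).
probs = huff_probabilities([100.0, 100.0], [1.0, 2.0], lam=2.0)
print(probs)
```

With λ = 2, doubling the distance divides the attraction by four, so the nearer of two equally sized retailers receives four times the probability of the farther one.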
3. The big data based cigarette market grid capacity rationality prediction method of claim 2, wherein: in step S2, the business circle expansion method comprises the following steps:
Step S21: defining the initial value: the probability that surrounding customers purchase cigarettes at each retailer is calculated, and the retailer's position is defined as the initial value;
Step S22: calculating the geographical range of the business circle: a distance is defined that represents the distance between the retailer's location and the center of the business circle, where the initial $n$ is the number of retailers contained in the grid, and the distance is set to its initial value;
Step S23: expanding the business circle range: centered on the initial grid, the business circle range is expanded as a square, and the expanded business circle is calculated by the same method as for the initial grid; if the stopping condition is not yet met, the business circle continues to be expanded and recalculated until the condition is met, yielding the expanded business circle.
4. The big data based cigarette market grid capacity rationality prediction method of claim 3, wherein: in step S4, the cigarette market grid capacity prediction comprises the following steps:
Step S41: ARIMA model training: the parameters of the ARIMA model are $(p, d, q)$, and ARIMA$(p, d, q)$ is fitted to the training set, where $p$ is the number of autoregressive terms, $d$ is the differencing order and $q$ is the number of moving-average terms; ARIMA model training comprises the following steps:
Step S411: determining the number of autoregressive terms and the number of moving-average terms in the ARIMA model by inspecting the autocorrelation function (ACF) plot and the partial autocorrelation function (PACF) plot;
Step S412: determining the differencing order: the first-order difference $\nabla x_t$ of $x_t$ is calculated using the following formula:
$$\nabla x_t = x_t - x_{t-1}$$
where $x_t$ represents the value of the time series at time $t$ and $x_{t-1}$ represents the value of the time series at time $t-1$;
the second-order difference $\nabla^2 x_t$ of $x_t$ is calculated using the following formula:
$$\nabla^2 x_t = \nabla x_t - \nabla x_{t-1} = x_t - 2x_{t-1} + x_{t-2}$$
where $x_{t-2}$ represents the value of the time series at time $t-2$;
the differencing order is calculated using the following formula:
where $\varphi$ is the parameter of the autoregressive part, $\theta$ is the parameter of the moving-average part and $\varepsilon$ is the estimation error;
Step S413: model checking: after a suitable combination of the number of autoregressive terms, the number of moving-average terms and the differencing order has been selected, a significance test is performed on the ARIMA model;
Step S414: the Akaike information criterion (AIC) is used to evaluate prediction accuracy, using the following formula:
$$\mathrm{AIC} = n \ln \hat{\sigma}^2 + 2k$$
where $\hat{\sigma}^2$ is the estimated error variance, $n$ is the sample size and $k$ is the number of parameters;
according to the AIC, the optimal ARIMA model for predicting the cigarette market capacity under study is selected, and the goodness of fit of the ARIMA model is verified using the white-noise hypothesis;
step S42: holt-windows model training, calculating model equations, the following formulas are used:
in the method, in the process of the invention,representing the time sequence at the time point +.>Is>Represents the intercept (I)>Indicating slope, & lt->Representing the time sequence at the time point +.>Seasonal component of->Is an irregular component;
three smoothing equations are calculated using the following formulas:
in the method, in the process of the invention,is a smooth constant +.>Is a time sequence at the time point +.>Level of->Is a time sequence at a time pointTrend of->Is a time sequence at the time point +.>Season component of->Representing the time sequence at the time point +.>Inputting a training set into a Holt-windows model, solving parameters of three smooth equations by using a maximum likelihood estimation method, and evaluating the prediction accuracy of the Holt-windows model by using a mean square error MSE and a mean absolute error MAE;
Step S43: RF model training, comprising the following steps:
Step S431: generating the RF: first, N samples are drawn at random with replacement from the training set and used as the samples at the root node to train one decision tree; second, given that each sample has M attributes, whenever a node of the decision tree needs to be split, m attributes are randomly selected from the M attributes, where generally m << M; one of the m attributes is then chosen as the splitting attribute of the node according to information gain, and nodes are split in this way until no further split is possible, with no pruning during the whole construction of the decision tree; these steps are repeated to construct multiple decision trees, which together form the RF model;
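Step 432: the mean generating function (MGF) prediction model, specifically comprising the following steps:
Step 4321: for a time series $x(t)$, $t = 1, 2, \dots, n$, with mean $\bar{x} = \frac{1}{n}\sum_{t=1}^{n} x(t)$, the MGF of the time series is defined by the following formula:
$$\bar{x}_l(i) = \frac{1}{n_l}\sum_{j=0}^{n_l-1} x(i + jl), \quad i = 1, 2, \dots, l; \quad 1 \le l \le m$$
where $n_l = \mathrm{INT}(n/l)$ and $m = \mathrm{INT}(n/2)$ or $m = \mathrm{INT}(n/3)$; using this formula, $m$ mean generating functions of the time series can be obtained, and each is periodically extended to
$$f_l(t) = \bar{x}_l\left(t - l \cdot \mathrm{INT}\left(\frac{t-1}{l}\right)\right), \quad t = 1, 2, \dots, n$$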
Step 4322: the first-order difference sequence $\Delta x(t)$ is calculated using the following formula:
$$\Delta x(t) = x(t+1) - x(t)$$
where $x(t+1)$ represents the value of the time series at time $t+1$ and $x(t)$ represents the value of the time series at time $t$;
Step 4323: the second-order difference sequence $\Delta^2 x(t)$ is calculated using the following formula:
$$\Delta^2 x(t) = \Delta x(t+1) - \Delta x(t)$$
where $\Delta x(t+1)$ denotes the first-order difference sequence of the time series at time $t+1$;
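The mean generating functions (MGF) of the original sequence $x(t)$, the first-order difference sequence $\Delta x(t)$ and the second-order difference sequence $\Delta^2 x(t)$ are denoted $\bar{x}_l(i)$, $\bar{x}^{(1)}_l(i)$ and $\bar{x}^{(2)}_l(i)$, respectively; their extension sequences $f_l(t)$, $f^{(1)}_l(t)$ and $f^{(2)}_l(t)$ are obtained by the periodic extension formula;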
Step 4324: based on the MGF extension sequences of the original sequence and of the first-order difference sequence, a cumulative extension sequence is established using the following formula:
Step 433: RF-MGF model prediction: the RF model is used to obtain one set of prediction data, and the MGF prediction model is used to obtain a second set of prediction data;
Step 434: the mixing weight of the two methods is calculated using the following formula:
where the weight is that of the corresponding prediction, and the sales term represents the total business-circle sales in the sales history data of each retailer in the training set;
Step 435: the actual value, the first predicted value and the error value are weighted using the following formula:
where the remaining terms represent the baseline predicted value and the historical average, respectively;
Step 436: with the time variable and the two quantities above as input variables and the error value as the output variable, fitting analysis is performed by the response surface method to obtain the final predicted value;
Step S44: model fusion: weights are assigned to the ARIMA model, the Holt-Winters model and the RF model using a weighted-average method to obtain integrated model A.
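As an illustration of the mean generating function (MGF) referred to in step 4321, the sketch below computes the MGF of a given period and extends it periodically. It assumes the commonly used definition in which the i-th MGF value of period l averages every l-th observation starting at position i, using only complete cycles. The function names and the sample series are hypothetical, and the sketch does not cover the difference sequences, the cumulative extension or the combination with RF.

```python
from typing import List, Sequence


def mean_generating_function(x: Sequence[float], period: int) -> List[float]:
    """MGF of the given period: average of x[i], x[i + period], x[i + 2*period], ...

    Only the INT(n / period) complete cycles are used, so every position
    within the period is averaged over the same number of observations.
    """
    n = len(x)
    if not 1 <= period <= n:
        raise ValueError("period must be between 1 and the series length")
    cycles = n // period  # INT(n / l)
    return [
        sum(x[i + j * period] for j in range(cycles)) / cycles
        for i in range(period)
    ]


def extend_periodically(mgf: Sequence[float], length: int) -> List[float]:
    """Repeat the MGF values cyclically to obtain a sequence of the given length."""
    if not mgf:
        raise ValueError("mgf must not be empty")
    return [mgf[t % len(mgf)] for t in range(length)]


# Illustrative series of six observations.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Period 2: positions 1, 3, 5 and positions 2, 4, 6 are averaged separately.
mgf_2 = mean_generating_function(series, 2)

# Extending beyond the observed length gives the periodic continuation.
extended = extend_periodically(mgf_2, 8)
print(mgf_2, extended)
```

For period 1 the MGF reduces to the ordinary mean of the series, which is a quick sanity check on any implementation.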
5. A big data based cigarette market grid capacity rationality prediction system for implementing the big data based cigarette market grid capacity rationality prediction method according to any one of claims 1-4, characterized in that the system comprises a business circle definition module, a business circle expansion mode module, a data preprocessing module, a cigarette market grid capacity prediction method module, an evaluation module and a cigarette market grid capacity rationality prediction module;
the business circle definition module defines the business circle, calculates the probability that a customer makes a purchase at a store, and sends this probability to the business circle expansion mode module;
the business circle expansion mode module receives the purchase probability data sent by the business circle definition module, expands the range with the retailer's initial position as the initial value to obtain an expanded business circle, and sends the expanded business circle to the data preprocessing module;
the data preprocessing module receives the expanded business circle sent by the business circle expansion mode module, collects business circle data, constructs a business circle data set, divides the data set into a training set and a test set, sends the training set to the cigarette market grid capacity prediction method module, and sends the test set to the evaluation module;
the cigarette market grid capacity prediction method module receives the training set sent by the data preprocessing module, trains a model using an ensemble learning algorithm combining ARIMA, Holt-Winters and RF to obtain integrated model A, and sends integrated model A to the evaluation module;
the evaluation module receives integrated model A sent by the cigarette market grid capacity prediction method module and the test set sent by the data preprocessing module, evaluates integrated model A using the test set to obtain integrated model B, and sends integrated model B to the cigarette market grid capacity rationality prediction module;
the cigarette market grid capacity rationality prediction module receives integrated model B sent by the evaluation module, takes data of a new grid area as input, and predicts the capacity of the new grid area to obtain a predicted value.
CN202410188091.XA 2024-02-20 2024-02-20 Cigarette market grid capacity rationality prediction method and system based on big data Active CN117745340B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410188091.XA CN117745340B (en) 2024-02-20 2024-02-20 Cigarette market grid capacity rationality prediction method and system based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410188091.XA CN117745340B (en) 2024-02-20 2024-02-20 Cigarette market grid capacity rationality prediction method and system based on big data

Publications (2)

Publication Number Publication Date
CN117745340A true CN117745340A (en) 2024-03-22
CN117745340B CN117745340B (en) 2024-05-24

Family

ID=90251184

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410188091.XA Active CN117745340B (en) 2024-02-20 2024-02-20 Cigarette market grid capacity rationality prediction method and system based on big data

Country Status (1)

Country Link
CN (1) CN117745340B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118071404A (en) * 2024-04-17 2024-05-24 湖南潇湘大数据科技有限公司 Multi-objective optimization-based grid reasonable capacity calculation method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060155633A1 (en) * 2005-01-12 2006-07-13 International Business Machines Corporation Automatically distributing a bid request for a grid job to multiple grid providers and analyzing responses to select a winning grid provider
CN110009419A (en) * 2019-02-21 2019-07-12 国家电网有限公司 Improvement time series electricity sales amount prediction technique and system based on Economic Climate method
US20220036610A1 (en) * 2020-07-29 2022-02-03 International Business Machines Corporation Visualization of a model selection process in an automated model selection system
CN114266395A (en) * 2021-12-22 2022-04-01 四川省烟草公司成都市公司 Cigarette logistics distribution center information system based on combined prediction method
CN114372848A (en) * 2021-12-30 2022-04-19 辽宁省烟草公司鞍山市公司 Tobacco industry intelligent marketing system based on machine learning
KR102520597B1 (en) * 2022-11-16 2023-04-10 윤지혜 Product matching method considering market analysis and company needs

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060155633A1 (en) * 2005-01-12 2006-07-13 International Business Machines Corporation Automatically distributing a bid request for a grid job to multiple grid providers and analyzing responses to select a winning grid provider
CN101080736A (en) * 2005-01-12 2007-11-28 国际商业机器公司 Automatically distributing a bid request for a grid job to multiple grid providers and analyzing responses to select a winning grid provider
CN110009419A (en) * 2019-02-21 2019-07-12 国家电网有限公司 Improvement time series electricity sales amount prediction technique and system based on Economic Climate method
US20220036610A1 (en) * 2020-07-29 2022-02-03 International Business Machines Corporation Visualization of a model selection process in an automated model selection system
CN114266395A (en) * 2021-12-22 2022-04-01 四川省烟草公司成都市公司 Cigarette logistics distribution center information system based on combined prediction method
CN114372848A (en) * 2021-12-30 2022-04-19 辽宁省烟草公司鞍山市公司 Tobacco industry intelligent marketing system based on machine learning
KR102520597B1 (en) * 2022-11-16 2023-04-10 윤지혜 Product matching method considering market analysis and company needs

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
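吴涛;毛嘉莉;谢青成;杨艳秋;王锦: "基于实时路况的top-κ载客热门区域推荐" [Top-κ popular passenger pick-up area recommendation based on real-time road conditions], 华东师范大学学报(自然科学版) [Journal of East China Normal University (Natural Science)], no. 05, pages 3035 *
张圣泉, 张雁白, 王树花: "连锁零售企业扩张中的商圈分析" [Business circle analysis in the expansion of chain retail enterprises], 当代经济管理 [Contemporary Economic Management], no. 06, 30 December 2004 (2004-12-30), pages 38 - 42 *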

Also Published As

Publication number Publication date
CN117745340B (en) 2024-05-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant