CN115860800A - Festival and holiday commodity sales volume prediction method and device and computer storage medium - Google Patents
- Publication number
- CN115860800A (application number CN202211655938.8A)
- Authority
- CN
- China
- Prior art keywords
- sales
- commodity
- holiday
- data set
- prediction
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0202—Market predictions or forecasting for commercial activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/10—Pre-processing; Data cleansing
- G06F18/15—Statistical pre-processing, e.g. techniques for normalisation or restoring missing data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Accounting & Taxation (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Finance (AREA)
- Artificial Intelligence (AREA)
- Development Economics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Software Systems (AREA)
- Entrepreneurship & Innovation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Economics (AREA)
- Marketing (AREA)
- Game Theory and Decision Science (AREA)
- Probability & Statistics with Applications (AREA)
- Medical Informatics (AREA)
- General Business, Economics & Management (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a method and a device for predicting holiday commodity sales, and a computer storage medium. The method comprises: acquiring historical sales data of all commodities sold by a store to form a commodity sales standard data set; feeding the commodity sales standard data set into a machine learning reference model, screening out, according to the prediction accuracy for each commodity's target holiday sales, the commodities whose prediction accuracy is below a preset threshold, and forming a screened commodity data set from the historical sales data of these commodities; performing holiday feature item expansion and commodity weight index feature item expansion on the screened commodity data set; and feeding the feature-expanded screened commodity data set into a machine learning sales prediction model to obtain predicted sales of all commodities sold by the store during the target holiday. By screening out the commodities whose holiday sales fluctuate abnormally and building holiday feature items specifically for them to support model prediction, the invention effectively improves the holiday sales prediction accuracy for the screened commodities.
Description
Technical Field
The invention relates to the technical field of data analysis, and specifically to a method and a device for predicting holiday commodity sales and a computer storage medium.
Background
Accurate sales forecasting is an important prerequisite for enterprises to draw up reasonable operating plans and manage their supply chains efficiently. Because commodity sales on ordinary days follow regular temporal patterns, enterprises can usually forecast them fairly accurately with traditional optimization algorithms. By contrast, commodity sales during special holidays follow patterns that are hard to capture and tend to be highly volatile. In practice, holiday forecasts produced by traditional algorithms designed for ordinary days often deviate greatly from actual sales. Yet the surge in sales during particular holidays can bring retailers high profits, so forecast accuracy is critical to them.
Retailers usually face a peak in customer purchases around holidays. Whether an enterprise's forecast is over-optimistic or over-pessimistic, an unreliable holiday sales forecast harms purchasing, inventory management, production control, and staffing, causing unnecessary operating costs and lost profit. For example, to capture sales profits during the Spring Festival, an enterprise may ship goods in advance from a regional distribution center to a forward distribution center closer to its retail stores, based on its forecast Spring Festival sales. If the forecast is too high, the surplus stock, especially fresh products with short shelf lives that spoil easily, crowds the warehouse, incurs high inventory costs and operating risk, and wastes staff and resources. Conversely, if the forecast is lower than actual sales, the inventory at the forward distribution center cannot meet actual demand, which lengthens delivery times, lowers customer satisfaction, and ultimately loses customers.
Therefore, improving the accuracy of commodity sales forecasts during special holidays is an urgent problem for retail enterprises seeking to strengthen their core competitiveness. The spread of enterprise informatization and advances in related hardware have made it possible to collect historical transaction data: every commodity transaction during a holiday, including the commodity price, sales volume, and customer order volume, can be captured and stored. This accumulated data creates an opportunity to forecast holiday sales scientifically, but how to use it to improve forecast accuracy is an active problem for both industry and academia.
The present invention is proposed in view of this.
Disclosure of Invention
To solve the above technical problems, the invention provides a processing method, a device, and a storage medium suited to large-scale sales data, together with a new approach in which order processing, commodity screening, and sales prediction are supported by expanding holiday features and weekly weight index features during the prediction process.
Specifically, the following technical scheme is adopted:
A method for predicting holiday commodity sales comprises the following steps:
acquiring historical sales data of all commodities sold by a store to form a commodity sales standard data set;
feeding the commodity sales standard data set into a machine learning reference model, screening out, according to the prediction accuracy for each commodity's target holiday sales, the commodities whose prediction accuracy is below a preset threshold, and forming a screened commodity data set from the historical sales data of these screened commodities;
performing holiday feature item expansion and commodity weight index feature item expansion on the screened commodity data set;
feeding the feature-expanded screened commodity data set into a machine learning sales prediction model to obtain predicted sales of all commodities sold by the store during the target holiday.
As an optional embodiment of the present invention, in the holiday commodity sales prediction method, the step of feeding the commodity sales standard data set into the machine learning reference model and screening out the commodities whose prediction accuracy for target holiday sales is below the preset threshold comprises:
feeding the commodity sales standard data set into the machine learning reference model for training;
after training, using the machine learning reference model to produce sales predictions on the test set for all commodities sold by the store;
calculating, for each commodity in turn, a sales prediction error value from these predictions;
selecting a critical error value from the sales prediction error values of all commodities sold by the store, selecting the commodities whose sales prediction error value is higher than this critical value, and collecting them into a screened commodity list;
optionally, the machine learning reference model is an XGBoost model;
optionally, one or more of date data, weather data for the region where the store is located, commodity category data, and commodity price data are also fed into the machine learning reference model to assist prediction.
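The text does not fix the error metric or how the critical point is chosen. The sketch below is one possible reading, assuming a weighted absolute percentage error (WMAPE) per commodity on the test set and a nearest-rank percentile of all errors as the critical point; `wmape`, `nearest_rank`, `screen_commodities`, and `pct` are hypothetical names, not taken from the patent.

```python
import math


def wmape(actual: list[float], predicted: list[float]) -> float:
    """Weighted absolute percentage error: sum|a - p| / sum(a) (assumed metric)."""
    total_actual = sum(actual)
    if total_actual == 0:
        return math.inf  # no sales in the test window, so the ratio is undefined
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / total_actual


def nearest_rank(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of values, with pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def screen_commodities(
    test_results: dict[str, tuple[list[float], list[float]]], pct: float = 80.0
) -> list[str]:
    """Return the ids of commodities whose error exceeds the critical point.

    test_results maps a commodity id to (actual, predicted) daily sales on the
    test set. Commodities with an undefined (infinite) error are also returned,
    since the reference model cannot be judged accurate for them.
    """
    errors = {cid: wmape(a, p) for cid, (a, p) in test_results.items()}
    finite = [e for e in errors.values() if math.isfinite(e)]
    if not finite:
        return sorted(errors)
    critical_point = nearest_rank(finite, pct)  # assumed choice of critical point
    return sorted(cid for cid, e in errors.items() if e > critical_point)
```

A percentile threshold adapts to each store's overall error level; a fixed error cut-off would be an equally valid reading of "preset threshold".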
As an optional embodiment of the present invention, in the holiday commodity sales prediction method, expanding the holiday feature items of the screened commodity data set comprises:
collecting the dates of the target holidays, extending each target holiday by a first preset time interval Ts1 before it and a second preset time interval Ts2 after it to form a complete target holiday cycle, and constructing the corresponding holiday feature items from that complete cycle;
assigning separate codes to the dates in the first preset time interval Ts1 before each target holiday, the dates of the target holiday itself, and the dates in the second preset time interval Ts2 after it, to form a holiday period interval coding feature item that identifies each interval within each holiday cycle; this feature item is 0 on all other dates;
optionally, the first preset time interval Ts1 equals the second preset time interval Ts2.
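As a sketch of the interval coding feature item, assume the codes 1 (the Ts1 days before the holiday), 2 (the holiday itself), and 3 (the Ts2 days after it), with 0 on all other dates; the patent only requires that the intervals be numbered, so these values and the function name `interval_codes` are illustrative.

```python
from datetime import date, timedelta

PRE, HOLIDAY, POST = 1, 2, 3  # assumed interval codes; 0 marks ordinary dates


def interval_codes(
    dates: list[date],
    holidays: list[tuple[date, date]],
    ts1: int,
    ts2: int,
) -> list[int]:
    """Encode each date by its interval within an extended holiday cycle.

    holidays holds inclusive (first_day, last_day) pairs; each cycle runs from
    ts1 days before first_day to ts2 days after last_day.
    """
    codes: dict[date, int] = {}
    for first, last in holidays:
        for k in range(1, ts1 + 1):
            codes.setdefault(first - timedelta(days=k), PRE)
        d = first
        while d <= last:
            codes[d] = HOLIDAY  # holiday days always take precedence
            d += timedelta(days=1)
        for k in range(1, ts2 + 1):
            codes.setdefault(last + timedelta(days=k), POST)
    # where pre/post windows of different holidays collide, the first code assigned is kept
    return [codes.get(d, 0) for d in dates]
```

For an illustrative 7-day holiday from 2023-01-21 to 2023-01-27 with Ts1 = Ts2 = 2, the dates 2023-01-18 to 2023-01-30 encode as `[0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 0]`.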
As an optional embodiment of the present invention, in the holiday commodity sales prediction method, when expanding the holiday feature items of the screened commodity data set, if commodity sales for several target holidays are predicted at the same time, a holiday category coding feature item is added to identify the dates belonging to each target holiday cycle, so that the machine learning sales prediction model can distinguish between the different target holiday cycles.
As an optional embodiment of the present invention, in the holiday commodity sales prediction method, when expanding the holiday feature items of the screened commodity data set, if commodity sales for several target holidays are predicted at the same time and the cycles of different target holidays partially overlap, a holiday overlap identification feature item is added to mark the dates on which these cycles overlap, so that the machine learning sales prediction model can properly learn the sales of each target holiday on the overlapping dates and give the predicted sales of the corresponding target holiday at prediction time.
As an optional embodiment of the present invention, in the holiday commodity sales prediction method, when expanding the holiday feature items of the screened commodity data set, if commodity sales for several target holidays are predicted at the same time and these holidays fall into clearly distinct classes, a holiday major-class code is added to identify the class of each holiday, so that the machine learning sales prediction model can learn how commodity sales change during the target holiday cycles of each major class.
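The three multi-holiday feature items above (category code, overlap flag, major-class code) can be sketched per date as follows. This assumes each holiday's extended cycle (already widened by Ts1 and Ts2) is given, and that on overlapping dates the category and major-class codes come from the holiday with the lowest category code; that tie-break rule and all names here are hypothetical.

```python
from datetime import date


def multi_holiday_features(
    dates: list[date],
    cycles: dict[str, tuple[date, date]],
    category_code: dict[str, int],
    major_code: dict[str, int],
) -> list[dict[str, int]]:
    """Per-date holiday category, overlap flag and major-class code.

    cycles maps a holiday name to its inclusive extended cycle (start, end).
    All three features are 0 outside every cycle.
    """
    rows = []
    for d in dates:
        covering = sorted(
            (name for name, (start, end) in cycles.items() if start <= d <= end),
            key=lambda name: category_code[name],
        )
        primary = covering[0] if covering else None  # assumed tie-break on overlaps
        rows.append({
            "holiday_category": category_code[primary] if primary is not None else 0,
            "holiday_overlap": int(len(covering) > 1),
            "holiday_major": major_code[primary] if primary is not None else 0,
        })
    return rows
```

An alternative design is one indicator column per holiday, which avoids the tie-break entirely at the cost of more columns.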
As an optional embodiment of the present invention, in the holiday commodity sales prediction method, expanding the commodity weight index feature items of the screened commodity data set comprises:
grouping the records in the screened commodity data set by cycle period T, summing the sales of all days that fall on the same position in the cycle (the same cycle day) to obtain that cycle day's average sales, and calculating the average sales of every cycle day in turn; selecting one cycle day as the reference cycle day with weight index a, and setting the weight index of each other cycle day to (Q/Q0) × a, where Q is the average sales of that cycle day and Q0 is the average sales of the reference cycle day, thereby obtaining the daily weight index feature of each cycle day in the cycle period;
appending, in order, the daily weight indexes of all cycle days of each commodity as the cycle period weight index features of the corresponding commodity;
optionally, for each commodity in the data set, calculating the average sales over all Mondays, all Tuesdays, and so on through all Sundays; taking the day with the lowest average sales as the reference cycle day, with daily weight index a, and setting the weight coefficient of each of the remaining 6 days to (Q/Q0) × a, where Q is the average sales of that day and Q0 is the average sales of the reference cycle day, thereby obtaining daily weight index features for Monday through Sunday; and appending, in order, the Monday-to-Sunday daily weight indexes of each commodity as the weekly weight index features of the corresponding commodity;
and merging the daily weight index features and the weekly weight index features of all commodities into the screened commodity data set.
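A minimal sketch of the weekly variant (T = 7), assuming one commodity's daily sales are given as a `date -> quantity` mapping. The reference day is the weekday with the lowest average sales and receives weight a; every other weekday gets (Q/Q0) × a. The function names, the column names, and the choice of 0.0 for weekdays with no data are assumptions.

```python
from collections import defaultdict
from datetime import date


def weekly_weight_index(daily_sales: dict[date, float], a: float = 1.0) -> dict[int, float]:
    """Weekday weight index for one commodity (0 = Monday ... 6 = Sunday)."""
    totals: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for day, qty in daily_sales.items():
        totals[day.weekday()] += qty
        counts[day.weekday()] += 1
    averages = {wd: totals[wd] / counts[wd] for wd in totals}
    q0 = min(averages.values())  # lowest average sales marks the reference day
    if q0 <= 0:
        raise ValueError("reference weekday has zero average sales; Q/Q0 is undefined")
    return {wd: averages[wd] / q0 * a for wd in sorted(averages)}


def add_weight_features(rows: list[dict], weights: dict[int, float]) -> None:
    """Attach each row's own weekday weight (daily feature) and all seven weights (weekly feature)."""
    for row in rows:
        row["day_weight"] = weights.get(row["date"].weekday(), 0.0)
        for wd in range(7):
            row[f"week_weight_{wd}"] = weights.get(wd, 0.0)  # 0.0 if a weekday never occurs
```

If Monday averages 10 units and Thursday 40, then with a = 1 Monday is the reference day and Thursday's weight index is 40/10 × 1 = 4.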
As an optional embodiment of the present invention, in the holiday commodity sales prediction method, acquiring historical sales data of all commodities sold by the store to form the commodity sales standard data set comprises:
applying the data preprocessing steps of abnormal order processing, missing value processing, daily sales aggregation, and data set division to the historical sales data of the commodities sold by the store to form the commodity sales standard data set;
abnormal order processing deletes orders whose sales quantity is less than or equal to a preset sales threshold;
missing value processing deletes orders with a missing sales value or fills in the missing value;
daily sales aggregation sums all sales orders by commodity and date to form a daily sales record data set for each commodity;
and data set division splits the processed commodity sales data set into a training set and a test set by a selected time range; these are used to train and test the machine learning reference model, respectively.
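The four preprocessing steps can be sketched in plain Python as below, assuming each raw order is a dict such as `{"sku": "A", "date": date(2023, 1, 1), "qty": 2}` and that division uses a single cutoff date. The field names, `min_qty`, `fill_missing`, and `cutoff` are hypothetical. Missing values are handled before the threshold check so that the comparison never sees `None`.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


def preprocess(
    orders: list[dict],
    min_qty: float = 0.0,
    fill_missing: Optional[float] = None,
) -> dict[tuple[str, date], float]:
    """Clean raw orders and aggregate them into daily sales per commodity."""
    daily: dict[tuple[str, date], float] = defaultdict(float)
    for order in orders:
        qty = order.get("qty")
        if qty is None:  # missing value processing: drop the order or fill it
            if fill_missing is None:
                continue
            qty = fill_missing
        if qty <= min_qty:  # abnormal order processing: quantity at or below threshold
            continue
        daily[(order["sku"], order["date"])] += qty  # daily sales aggregation
    return dict(daily)


def split_by_date(
    daily: dict[tuple[str, date], float], cutoff: date
) -> tuple[dict[tuple[str, date], float], dict[tuple[str, date], float]]:
    """Data set division: records before cutoff form the training set, the rest the test set."""
    train = {k: v for k, v in daily.items() if k[1] < cutoff}
    test = {k: v for k, v in daily.items() if k[1] >= cutoff}
    return train, test
```

A time-based split, rather than a random one, keeps future days out of the training set, which matches the forecasting setting.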
The invention also provides a device for predicting holiday commodity sales, comprising:
a data processing module, configured to acquire historical sales data of all commodities sold by a store to form a commodity sales standard data set;
a commodity data screening module, configured to feed the commodity sales standard data set into a machine learning reference model, screen out, according to the prediction accuracy for each commodity's target holiday sales, the commodities whose prediction accuracy is below a preset threshold, and form a screened commodity data set from the historical sales data of these commodities;
a feature item expansion module, configured to perform holiday feature item expansion and commodity weight index feature item expansion on the screened commodity data set;
and a sales prediction module, configured to feed the feature-expanded screened commodity data set into a machine learning sales prediction model to obtain predicted sales of all commodities sold by the store during the target holiday.
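The four modules form a linear pipeline. The sketch below wires them together as injected callables; the class name, field names, and the untyped data passed between stages are hypothetical placeholders, since the patent does not define concrete interfaces.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class HolidaySalesPredictionDevice:
    """Hypothetical wiring of the four modules; each module is an injected callable."""

    data_processing: Callable[[Any], Any]      # raw orders -> commodity sales standard data set
    commodity_screening: Callable[[Any], Any]  # standard data set -> screened commodity data set
    feature_expansion: Callable[[Any], Any]    # screened data set -> feature-expanded data set
    sales_prediction: Callable[[Any], Any]     # expanded data set -> predicted holiday sales

    def run(self, raw_orders: Any) -> Any:
        standard = self.data_processing(raw_orders)
        screened = self.commodity_screening(standard)
        expanded = self.feature_expansion(screened)
        return self.sales_prediction(expanded)
```

Injecting each stage keeps the modules independently replaceable, for example swapping the reference model used for screening without touching feature expansion.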
The invention also provides a computer storage medium storing a computer-executable program which, when executed, implements the above method for predicting holiday commodity sales.
Compared with the prior art, the invention has the following beneficial effects:
the holiday commodity sales prediction method of the invention follows an order processing, commodity screening, and feature expansion workflow and can be used to forecast the holiday sales of commodities in retail stores. It first preprocesses the store's commodity sales order data into a sales data set in a standard format; it then computes prediction evaluation indexes from the results of running the standard data set through the machine learning reference model and screens all commodities to form a screened commodity data set; finally, it constructs holiday feature items for the target holiday cycle, adds them to the screened commodity data set, and feeds that data set into the machine learning sales prediction model to produce predicted sales of all commodities for the holiday period. The method can therefore single out the commodities whose holiday sales fluctuate abnormally (those whose sales the reference model cannot predict accurately) and build holiday feature items specifically to help the model predict them, effectively improving the holiday sales prediction accuracy for the screened commodities.
The method trains and predicts all commodities with the machine learning reference model and then, according to the accuracy of the results, determines which commodities need further improvement by the holiday sales prediction model. The machine learning sales prediction model can thus fit the holiday sales fluctuation patterns of those commodities specifically, which reduces overall running time and improves its overall prediction accuracy.
The invention also provides a feature expansion method: for the target holiday cycle, it constructs several types of feature items that mark the key time points, intervals, order, and categories of the holidays. These help a holiday sales prediction model built mainly on tree models to learn the various patterns of commodity sales change during holidays, effectively improving its holiday sales prediction accuracy.
In summary, the invention provides a method for predicting holiday commodity sales that processes orders using the store's historical sales data, screens commodities with a reference model, expands holiday feature items for the target holiday, and produces predicted commodity sales for the whole holiday with a machine learning sales prediction model, thereby guiding the retail store's commodity sales plan and replenishment plan during holidays.
Description of the drawings:
FIG. 1 is a flowchart of a method for predicting holiday commodity sales according to an embodiment of the invention;
FIG. 2 is a flowchart of building the prediction model in the method for predicting holiday commodity sales according to an embodiment of the invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The embodiments described are only some, not all, of the embodiments of the present invention.
Thus, the following detailed description of the embodiments is not intended to limit the scope of the invention as claimed, but merely represents some embodiments of the invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
It should be noted that the embodiments of the present invention and their features and technical solutions may be combined with one another provided they do not conflict.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present invention, it should be noted that terms such as "upper" and "lower" indicate orientations or positional relationships as shown in the drawings, as the product of the invention is customarily placed in use, or as commonly understood by those skilled in the art. These terms are used only for convenience and simplicity of description; they do not indicate or imply that the referenced device or element must have a particular orientation or be constructed and operated in a particular orientation, and therefore should not be construed as limiting the present invention. Furthermore, the terms "first", "second", and the like are used only to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Referring to FIGS. 1 and 2, the method for predicting holiday commodity sales in this embodiment comprises:
acquiring historical sales data of all commodities sold by a store to form a commodity sales standard data set;
feeding the commodity sales standard data set into a machine learning reference model, screening out, according to the prediction accuracy for each commodity's target holiday sales, the commodities whose prediction accuracy is below a preset threshold, and forming a screened commodity data set from the historical sales data of these screened commodities;
performing holiday feature item expansion and commodity weight index feature item expansion on the screened commodity data set;
feeding the feature-expanded screened commodity data set into a machine learning sales prediction model to obtain predicted sales of all commodities sold by the store during the target holiday.
The holiday commodity sales prediction method of this embodiment follows the order processing, commodity screening, and feature expansion workflow: the store's commodity sales order data are preprocessed into a sales data set in a standard format; prediction evaluation indexes are computed from the reference model's results on that data set, and all commodities are screened to form the screened commodity data set; holiday feature items are then constructed for the target holiday cycle, added to the screened commodity data set, and fed into the machine learning sales prediction model, which outputs predicted sales of all commodities for the holiday period. In this way, commodities whose holiday sales fluctuate abnormally (those the reference model cannot predict accurately) are singled out and given targeted holiday feature items, which effectively improves their holiday sales prediction accuracy.
By first training and predicting all commodities with the machine learning reference model and then using the accuracy of those predictions to decide which commodities the holiday sales prediction model should further improve, the method lets the machine learning sales prediction model fit the holiday sales fluctuation patterns of those commodities specifically, reducing overall running time and improving overall prediction accuracy.
In the method for predicting the commodity sales in the holidays according to the embodiment, the step of inputting the commodity sales standard dataset into the machine learning benchmark model and screening screened commodities with the prediction result lower than the preset threshold value according to the prediction precision of the commodity target holiday sales comprises the following steps:
inputting the commodity sales standard data set into a machine learning reference model for model training;
running the trained machine learning reference model to obtain sales prediction values on the test set for all commodities sold by the store;
calculating, in turn, the sales prediction error of each commodity from these sales prediction values;
selecting a critical prediction error value from the sales prediction errors of all commodities sold by the store, screening out the commodities whose sales prediction error exceeds this critical value, and collecting them into a screened commodity list.
Optionally, the machine learning reference model is built with an XGBoost model as its core. XGBoost (eXtreme Gradient Boosting) is an open-source gradient boosting framework created by Tianqi Chen, then a PhD student at the University of Washington, and is an efficient, flexible and portable machine learning algorithm. It is an additive model composed of many weak learners, in which each newly added weak learner corrects the deviation of the predictions made by all previous weak learners. Building on the GBDT objective function, it uses second-derivative information when optimizing the objective, which makes the optimization target more precise and speeds up training; it also adds a regularization term to the objective function to limit model complexity and prevent overfitting. Overall, XGBoost offers strong predictive performance while running fast.
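To make the "additive model" idea concrete, the sketch below implements plain gradient boosting with one-split regression stumps on a single numeric feature under squared loss. It is a simplified stand-in, not XGBoost: it uses only first-order gradients (the residuals) and has no regularization term, both of which XGBoost adds. The function names fit_stump, boost and predict are hypothetical.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, value if x <= threshold, value otherwise)


def fit_stump(xs: List[float], residuals: List[float]) -> Stump:
    """Least-squares fit of a single-split regression stump to the residuals."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # all xs identical: apply a constant correction
        mean = sum(residuals) / len(residuals)
        return float("inf"), mean, mean
    return best[1], best[2], best[3]


def boost(xs: List[float], ys: List[float], n_rounds: int = 50, learning_rate: float = 0.3):
    """Additive model: each new stump is fitted to what the previous ones got wrong."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # For squared loss the negative gradient is simply the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        preds = [p + learning_rate * (lv if x <= t else rv) for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x: float) -> float:
    """Sum the base value and the shrunken contribution of every stump."""
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (lv if x <= t else rv) for t, lv, rv in stumps)
```

With training targets that step from 10 to 30 between x = 3 and x = 4, each round shrinks the remaining residual, so predictions approach 10 and 30 as rounds are added.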
Optionally, date data, weather data for the store's area, commodity category data and/or commodity price data are also input into the machine learning reference model to assist prediction.
The method also provides a feature expansion approach: for the target holiday period, several types of feature items are constructed that label the key holiday time points, intervals, date order and holiday categories. This helps a holiday sales prediction model built mainly on tree models learn the various patterns of commodity sales changes during holidays, effectively improving its holiday sales prediction accuracy.
Further, the process by which this embodiment screens all commodities of the retail store comprises the following steps:
Converting the date data into feature items (year number, quarter number, month number, week number and day number), merging them into the commodity sales standard data set, and storing them as integers so that they can be input directly into the XGBoost model later.
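A minimal sketch of the date feature conversion, using only Python's standard library. The source does not define "week number" and "day number" precisely; here they are assumed to be the ISO week of the year and the ISO day of the week (1 = Monday), with the day of the month added as well. The function name date_features is hypothetical.

```python
import datetime
from typing import Dict


def date_features(d: datetime.date) -> Dict[str, int]:
    """Convert a date into integer feature items (week/day interpretation assumed)."""
    _, iso_week, iso_weekday = d.isocalendar()
    return {
        "year": d.year,                     # calendar year
        "quarter": (d.month - 1) // 3 + 1,  # 1..4
        "month": d.month,
        "week": iso_week,                   # ISO week of the year, 1..53
        "weekday": iso_weekday,             # 1 = Monday ... 7 = Sunday
        "day": d.day,                       # day of the month
    }
```

For example, 22 January 2023 (a Sunday) maps to year 2023, quarter 1, month 1, week 3, weekday 7, day 22. Near year boundaries the ISO week can belong to the neighbouring year while the year feature here stays the calendar year.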
Converting the weather data into feature items (weather state, average daytime temperature, average nighttime temperature, wind force and air quality), merging them into the commodity sales standard data set, and storing them as integers so that they can be input directly into the XGBoost model later.
Optionally, the weather information may be extended: precipitation, atmospheric pressure, humidity, wind direction and extreme-weather warning information may also be converted to integers and added to the commodity sales standard data set as feature items.
Converting the commodity category information at every level into feature items (department number, major category number, middle category number, minor category number and commodity number), merging them into the commodity sales standard data set, and storing them as integers so that they can be input directly into the XGBoost model.
Converting the commodity unit price information into feature items, merging them into the commodity sales standard data set, and storing them as floating-point values so that they can be input directly into the XGBoost model later.
Optionally, several kinds of price information, such as the standard unit price and the selling unit price, may be converted to floating-point values and added to the commodity sales standard data set as feature items.
Inputting the data set, now containing these four major groups of feature items, into a reference model built mainly on the XGBoost model for training, and outputting the predicted sales of all commodities in the target holiday period, which form the commodity target-holiday predicted sales data set.
As an optional implementation of the invention, the XGBoost model parameters are optimized over the whole training data set using grid search.
The optimal parameters are stored, and this parameter set is loaded to build the prediction model whenever sales are predicted.
The key parameters to tune in XGBoost training are booster, objective, learning_rate, max_depth, n_estimators, n_jobs, subsample, colsample_bytree and colsample_bylevel, corresponding respectively to the booster type (model solver), the loss function, the learning rate, the maximum tree depth, the number of sub-models (trees), the number of parallel threads, the subsample ratio of training instances, the feature sampling ratio when building each tree, and the feature sampling ratio when splitting tree nodes.
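Grid search evaluates every combination of candidate parameter values and keeps the best one. The stdlib-only sketch below assumes a caller-supplied score function (lower is better, for example WMAPE on a validation split). In practice that function would train a model such as xgboost.XGBRegressor, whose constructor accepts the parameter names listed above, but that training step is left out here; scikit-learn's GridSearchCV is a common ready-made alternative. The function names and the JSON storage format are hypothetical.

```python
import itertools
import json
from typing import Any, Callable, Dict, List


def grid_search(param_grid: Dict[str, List[Any]],
                score: Callable[[Dict[str, Any]], float]) -> Dict[str, Any]:
    """Return the parameter combination with the lowest score."""
    names = sorted(param_grid)
    best_params: Dict[str, Any] = {}
    best_score = float("inf")
    # Cartesian product of all candidate values, one value per parameter name.
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        s = score(params)
        if s < best_score:
            best_params, best_score = params, s
    return best_params


def save_params(params: Dict[str, Any], path: str) -> None:
    """Store the optimal parameters so they can be reloaded at prediction time."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f)
```

The cost grows multiplicatively with each parameter added to the grid, which is why the grid is usually kept to a few values per parameter.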
The commodity target-holiday predicted sales data set is grouped by commodity, and the error between predicted and actual sales over each commodity's target holiday period is calculated in turn using the WMAPE (weighted mean absolute percentage error) formula.
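The source does not write out the WMAPE formula. A commonly used definition, assumed here, for commodity i over the days t of its target holiday period, with actual sales y and predicted sales ŷ:

$$
\mathrm{WMAPE}_i = \frac{\sum_{t} \left| y_{i,t} - \hat{y}_{i,t} \right|}{\sum_{t} y_{i,t}}
$$

Because it divides the total absolute error by total actual sales, low-sales days do not inflate the error the way they would in a plain MAPE, although the ratio is undefined for a commodity with no actual sales in the period.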
The mean WMAPE over all commodities is calculated, and a selected multiple of this mean is used as the threshold (critical value): commodities whose WMAPE exceeds the threshold are screened out to form the screened commodity list.
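A sketch of the per-commodity WMAPE and mean-multiple threshold screening. The input format of (commodity id, actual, predicted) triples, the default multiple of 1.5 and the function name screen_commodities are assumptions for illustration; the source leaves the multiple to be chosen. Commodities with zero actual sales in the period are skipped because their WMAPE is undefined.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def screen_commodities(records: Iterable[Tuple[str, float, float]],
                       multiple: float = 1.5) -> Tuple[List[str], Dict[str, float], float]:
    """records: (commodity_id, actual_sales, predicted_sales) for each target-holiday day."""
    abs_err: Dict[str, float] = defaultdict(float)
    actual: Dict[str, float] = defaultdict(float)
    for cid, y, y_hat in records:
        abs_err[cid] += abs(y - y_hat)
        actual[cid] += y
    # WMAPE per commodity: total absolute error divided by total actual sales.
    errors = {cid: abs_err[cid] / actual[cid] for cid in abs_err if actual[cid] > 0}
    if not errors:
        return [], errors, 0.0
    # Threshold: a chosen multiple of the mean WMAPE over all commodities.
    threshold = multiple * sum(errors.values()) / len(errors)
    screened = sorted(cid for cid, e in errors.items() if e > threshold)
    return screened, errors, threshold
```

In the test data, commodity C (WMAPE 1.0) is the only one above 1.5 times the mean WMAPE of 0.4.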
In the holiday commodity sales prediction method of this embodiment, expanding the screened commodity data set with holiday feature items comprises:
collecting the dates of the target holidays, extending each target holiday by a first preset time interval Ts1 before its start and by a second preset time interval Ts2 after its end to form a complete target holiday cycle, and constructing the corresponding holiday feature items from this complete cycle;
numbering the dates in the interval Ts1 before each target holiday, in the target holiday period itself, and in the interval Ts2 after the holiday, to form a holiday-cycle interval coding feature item that identifies each interval of each holiday cycle; this feature item takes the value 0 on all other dates.
Optionally, the first preset time interval Ts1 is equal to the second preset time interval Ts2.
Specifically, in one implementation of this embodiment, Ts1 and Ts2 are both 7 days (one week). The dates in the three intervals (the week before each holiday, the holiday period and the week after it) are numbered 1, 2 and 3 respectively, forming a holiday-cycle interval coding feature item that identifies each interval in each holiday cycle; its value on all other dates is 0;
further, the days of the week before each holiday are coded -7, -6, -5, -4, -3, -2, -1 in order, and the days of the holiday period and of the following week are coded 1, 2, 3, ... in order, which labels the order of the dates within each holiday cycle; this feature item is also 0 on all other dates.
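The interval and order codes for a single holiday cycle can be sketched as follows, with Ts1 and Ts2 given in days. The function name and the dictionary output are hypothetical; any date absent from the returned dictionary takes the code 0 for both feature items. The dates used in the test are only example inputs.

```python
import datetime
from typing import Dict, Tuple


def holiday_cycle_features(start: datetime.date, end: datetime.date,
                           ts1: int = 7, ts2: int = 7) -> Dict[datetime.date, Tuple[int, int]]:
    """Map each date of one holiday cycle to (interval_code, order_code).

    interval_code: 1 = the ts1 days before, 2 = the holiday itself, 3 = the ts2 days after.
    order_code: -ts1 .. -1 before the holiday, then 1, 2, 3, ... through the holiday
    and the following ts2 days.
    """
    one_day = datetime.timedelta(days=1)
    features: Dict[datetime.date, Tuple[int, int]] = {}
    for k in range(ts1, 0, -1):
        features[start - k * one_day] = (1, -k)
    order, d = 1, start
    while d <= end + ts2 * one_day:
        features[d] = (2 if d <= end else 3, order)
        order += 1
        d += one_day
    return features
```

A lookup such as `features.get(some_date, (0, 0))` then yields 0 for both codes on dates outside the cycle.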
Those skilled in the art will understand that Ts1 and Ts2 in this embodiment may be equal or different and may be preset according to the target holiday.
Optionally, in the holiday commodity sales prediction method of this embodiment, when expanding the screened commodity data set with holiday feature items, if commodity sales for several target holidays are predicted at the same time, a holiday category coding feature item is added to identify the dates belonging to each target holiday cycle, so that the machine learning sales prediction model can distinguish between the different target holidays.
Optionally, in the holiday commodity sales prediction method of this embodiment, when expanding the screened commodity data set with holiday feature items, if commodity sales for several target holidays are predicted at the same time and some target holiday cycles partially overlap, a holiday overlap identification feature item is added to mark the dates on which different target holiday cycles overlap. This lets the machine learning sales prediction model learn the sales of the different target holidays on the overlapping dates and output the predicted sales for the corresponding target holiday at prediction time.
Optionally, in the holiday commodity sales prediction method of this embodiment, when expanding the screened commodity data set with holiday feature items, if commodity sales for several target holidays are predicted at the same time and those holidays fall into clear classes (for example, ordinary festivals and statutory holidays), a holiday major-class code is added to identify the class of each holiday, so that the machine learning sales prediction model can learn the sales change pattern within the target holiday periods of each major class.
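When several target holidays are predicted together, the category code, major-class code and overlap flag can be derived per date from each holiday's extended cycle window. This is a sketch under assumptions: the window tuple layout and integer codes are hypothetical, and on overlapping dates the category and major-class codes are taken from the first listed window, a tie-break the source does not specify.

```python
import datetime
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout: (holiday_category_code, holiday_major_class_code, first_day, last_day)
# where first_day..last_day is the holiday's extended cycle (Ts1 before to Ts2 after).
Window = Tuple[int, int, datetime.date, datetime.date]


def multi_holiday_features(windows: List[Window],
                           days: Iterable[datetime.date]) -> Dict[datetime.date, Tuple[int, int, int]]:
    """Map each date to (category_code, major_class_code, overlap_flag); 0 means no holiday."""
    out: Dict[datetime.date, Tuple[int, int, int]] = {}
    for d in days:
        hits = [w for w in windows if w[2] <= d <= w[3]]
        category, major = (hits[0][0], hits[0][1]) if hits else (0, 0)
        # Overlap flag is 1 when the date falls inside more than one holiday cycle.
        out[d] = (category, major, 1 if len(hits) > 1 else 0)
    return out
```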
In the holiday commodity sales prediction method of this embodiment, expanding the screened commodity data set with commodity weight index feature items comprises:
for the data in the screened commodity data set, assigning each day to a cycle day according to a cycle period T (for example, the day of the week when T is one week); averaging the sales of all days that fall on the same cycle day to obtain that cycle day's average sales, and doing so for every cycle day in the period; selecting one cycle day as the reference cycle day and setting its weight index to a; and setting the weight index of every other cycle day to (Q/Q0) × a, where Q is the average sales of that cycle day and Q0 is the average sales of the reference cycle day. This yields the daily weight index feature for every cycle day in the cycle period;
summing the daily weight indexes of all cycle days for each commodity to obtain that commodity's cycle-period weight index feature.
Optionally, for each commodity in the screened commodity data set, the average sales of all its Mondays, all its Tuesdays, and so on through all its Sundays are calculated. The day with the lowest average sales is taken as the reference cycle day with weight index a, and the weight index of each of the remaining six days is (Q/Q0) × a, where Q is the average sales of that day and Q0 is the average sales of the reference day; this gives the daily weight index features from Monday to Sunday. The daily weight indexes from Monday to Sunday (the preset period Tm) of each commodity are then summed to give that commodity's weekly weight index feature;
merging the daily weight index features and weekly weight index features of all commodities into the screened commodity data set.
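A sketch of the daily and cycle-period weight index for one commodity. It assumes the caller has already mapped each day to a cycle day (for a weekly period, `datetime.date.weekday()` gives 0 = Monday to 6 = Sunday) and, following the weekly example, takes the cycle day with the lowest average sales as the reference with weight a. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def weight_indices(daily_sales: Iterable[Tuple[int, float]],
                   a: float = 1.0) -> Tuple[Dict[int, float], float]:
    """daily_sales: (cycle_day, sales) pairs for one commodity.

    Returns ({cycle_day: daily weight index}, cycle-period weight index).
    """
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for cycle_day, sales in daily_sales:
        totals[cycle_day] += sales
        counts[cycle_day] += 1
    averages = {cd: totals[cd] / counts[cd] for cd in totals}
    q0 = min(averages.values())  # reference cycle day: lowest average sales
    if q0 <= 0:
        raise ValueError("reference cycle day has no positive average sales")
    daily = {cd: (q / q0) * a for cd, q in averages.items()}  # (Q / Q0) x a
    return daily, sum(daily.values())
```

With a = 1.0 this reproduces the example embodiment, where the lowest-selling weekday gets 1.0 and every other day is its average divided by that lowest average.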
In the above, the daily and weekly weight index features are calculated with a one-week (7-day) cycle period; this embodiment may also use other cycle periods, for example 10 days.
Further, the training and prediction process of the machine learning sales prediction model of this embodiment comprises:
inputting the screened commodity sales data set, now including the holiday feature items and the weekly and daily weight index feature items, into a holiday sales prediction model built mainly on the XGBoost model for training, and outputting the predicted sales of the screened commodities during the target holiday period.
Optionally, the overall prediction performance of the machine learning sales prediction model can be analyzed by calculating the corresponding WMAPE from the predicted commodity sales.
As an optional implementation of the invention, the XGBoost model parameters are optimized over the whole training data set using grid search;
the optimal parameters are stored, and this parameter set is loaded to build the prediction model whenever sales are predicted;
the key parameters to tune in XGBoost training are booster, objective, learning_rate, max_depth, n_estimators, n_jobs, subsample, colsample_bytree and colsample_bylevel, corresponding respectively to the booster type (model solver), the loss function, the learning rate, the maximum tree depth, the number of sub-models (trees), the number of parallel threads, the subsample ratio of training instances, the feature sampling ratio when building each tree, and the feature sampling ratio when splitting tree nodes.
In the holiday commodity sales prediction method of this embodiment, the step of obtaining the historical sales data of all commodities sold by the store and forming the commodity sales standard data set comprises:
applying the preprocessing steps of abnormal order processing, missing value processing, daily sales aggregation and data set division to the store's historical commodity sales data to form the commodity sales standard data set;
abnormal order processing deletes orders whose sales quantity is less than or equal to a preset sales quantity threshold; optionally, this threshold is 0;
missing value processing deletes or fills orders whose sales quantity is missing;
daily sales aggregation sums the sales quantities of all orders by commodity and date to form a daily sales record data set for each commodity;
data set division splits the processed commodity sales data set into a training set and a test set according to a selected time range; these are used to train and test the machine learning reference model.
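The four preprocessing steps can be sketched on plain Python tuples. The record layout (commodity id, date, quantity, with None for a missing quantity), deletion rather than filling of missing values, and splitting at a single cutoff date are all assumptions chosen for brevity; the function name is hypothetical.

```python
import datetime
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Order = Tuple[str, datetime.date, Optional[float]]
Row = Tuple[str, datetime.date, float]


def preprocess(orders: Iterable[Order], split_date: datetime.date,
               min_qty: float = 0.0) -> Tuple[List[Row], List[Row]]:
    daily: Dict[Tuple[str, datetime.date], float] = defaultdict(float)
    for cid, day, qty in orders:
        if qty is None:       # missing value processing: delete the order
            continue
        if qty <= min_qty:    # abnormal order processing: quantity <= threshold (0 by default)
            continue
        daily[(cid, day)] += qty  # daily sales aggregation by commodity and date
    rows = sorted((cid, day, q) for (cid, day), q in daily.items())
    # Data set division: earlier dates train the model, later dates test it.
    train = [r for r in rows if r[1] < split_date]
    test = [r for r in rows if r[1] >= split_date]
    return train, test
```

Splitting on time rather than at random keeps test dates strictly after training dates, which matches how the model is later used to forecast an upcoming holiday.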
Specifically, the processed data set is divided into a training data set and a testing data set according to a set proportion, wherein the daily sales volume record number proportion of each commodity in the two data sets is about 8.
The data set is then converted to a uniform data format and each row is checked for missing values, forming the commodity sales standard data set used by the machine learning reference model.
Optionally, the sales of every commodity in the data set can be completed over all target holiday periods, with 0 filled in on dates that have no sales record, so that later models can capture the complete pattern of sales changes during the target holidays.
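Completing the holiday-period records with zeros can be sketched with a dictionary keyed by (commodity, date); the data layout and function name are hypothetical.

```python
import datetime
from typing import Dict, Iterable, Tuple

Key = Tuple[str, datetime.date]


def fill_holiday_zeros(daily_sales: Dict[Key, float], commodities: Iterable[str],
                       holiday_dates: Iterable[datetime.date]) -> Dict[Key, float]:
    """Return a copy in which every (commodity, holiday date) pair has a record, 0 if none existed."""
    filled = dict(daily_sales)
    dates = list(holiday_dates)  # materialize so it can be reused for every commodity
    for cid in commodities:
        for day in dates:
            filled.setdefault((cid, day), 0.0)  # keep existing sales, add 0 otherwise
    return filled
```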
Optionally, outlier checks can be run on each commodity's daily sales in turn, and abnormally high or low daily sales values can be removed or smoothed.
Optionally, commodities with no or very few sales records during the target holiday periods can be removed from the sales order data set to reduce the overall data set size.
The invention also provides a device for predicting commodity sales during holidays, comprising:
a data processing module, configured to obtain the historical sales data of all commodities sold by a store and form a commodity sales standard data set;
a commodity data screening module, configured to input the commodity sales standard data set into the machine learning reference model, screen out, according to the prediction accuracy of each commodity's target-holiday sales, the commodities whose prediction accuracy is below a preset threshold, and form a screened commodity data set from the historical sales data of the screened commodities;
a feature item expansion module, configured to expand the screened commodity data set with holiday feature items and commodity weight index feature items;
and a sales prediction module, configured to input the expanded screened commodity data set into the machine learning sales prediction model to obtain the target-holiday predicted sales of the commodities sold by the store.
This embodiment also provides a computer storage medium storing a computer-executable program which, when executed, implements the holiday commodity sales prediction method described above.
The computer storage medium of this embodiment may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate or transport a program for use by or in connection with an instruction execution system, apparatus or device. Program code embodied on a readable medium may be transmitted over any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, or any suitable combination of the foregoing.
This embodiment also provides an electronic device comprising a processor and a memory, the memory storing a computer-executable program which, when executed by the processor, causes the processor to perform the holiday commodity sales prediction method.
The electronic device takes the form of a general-purpose computing device. There may be one or more processors working together. The invention also does not exclude distributed processing, i.e. the processors may be spread across different physical devices. The electronic device of the invention is therefore not limited to a single physical entity and may be a combination of several physical devices.
The memory stores a computer-executable program, typically machine-readable code, which the processor can execute so that the electronic device performs the method of the invention, or at least some of its steps.
The memory may include volatile memory, such as random access memory (RAM) and/or cache memory, and may also include non-volatile memory, such as read-only memory (ROM).
It should be understood that elements or components not shown in the above examples may also be included in the electronic device of the present invention. For example, some electronic devices further include a display unit such as a display screen, and some electronic devices further include a human-computer interaction element such as a button, a keyboard, and the like. Electronic devices are considered to be covered by the present invention as long as the electronic devices are capable of executing a computer-readable program in a memory to implement the method of the present invention or at least a part of the steps of the method.
From the above description of the embodiments, those skilled in the art will readily appreciate that the invention can be implemented by hardware capable of executing a specific computer program, such as the system of the invention and the electronic processing units, servers, clients, mobile phones, control units and processors included in it. The invention may also be implemented by computer software that performs the method of the invention, for example control software executed by a microprocessor, an electronic control unit, a client or a server. Note that the computer software performing the method of the invention is not limited to execution by one specific hardware entity and may also be realized in a distributed manner on non-specific hardware. The software product may be stored in a computer-readable storage medium (such as a CD-ROM, USB flash drive or removable hard disk) or distributed over a network, as long as it enables an electronic device to perform the method of the invention.
Example one
This example addresses the problem of predicting commodity sales of retail stores during the Spring Festival;
referring to Fig. 2, the order processing applied to the retail store sales data in this example comprises:
extracting the sales quantities of the retail store's sales orders and removing orders whose sales quantity is less than or equal to zero;
extracting all sales quantities of the retail store's sales orders, and filling in data for orders with missing sales quantities or correcting orders with a wrong data format;
aggregating by day the sales order data after abnormal order processing and missing value filling to obtain the daily sales of every commodity in the retail store, sorted by commodity and then by date;
dividing the processed data set into a training data set and a test data set according to a set proportion, the ratio of each commodity's daily sales records in the two data sets being about 8;
converting the data set to a uniform data format and checking each row for missing values, forming the commodity sales standard data set used by the subsequent model.
Optionally, the sales of every commodity in the data set can be completed over all Spring Festival periods, with 0 filled in on dates that have no sales record, so that the subsequent model can capture the complete pattern of sales changes during the Spring Festival;
optionally, outlier checks can be run on each commodity's daily sales in turn, and abnormally high or low daily sales values can be removed or smoothed;
optionally, commodities with no or very few sales records during the Spring Festival periods can be removed from the sales order data set to reduce the overall data set size.
Further, the process by which this example screens all commodities of the retail store comprises:
converting the date data into feature items (year number, quarter number, month number, week number and day number), merging them into the commodity sales standard data set, and storing them as integers so that they can be input directly into the XGBoost model later;
converting the weather data into feature items (weather state, average daytime temperature, average nighttime temperature, wind force and air quality), merging them into the commodity sales standard data set, and storing them as integers so that they can be input directly into the XGBoost model later;
optionally, the weather information can be extended: precipitation, atmospheric pressure, humidity, wind direction and extreme-weather warning information can be converted to integers and added to the commodity sales standard data set as feature items;
converting the commodity category information at every level into feature items (department number, major category number, middle category number, minor category number and commodity number), merging them into the commodity sales standard data set, and storing them as integers so that they can be input directly into the XGBoost model;
converting the commodity unit price information into feature items, merging them into the commodity sales standard data set, and storing them as floating-point values so that they can be input directly into the XGBoost model later;
optionally, several kinds of price information, such as the standard unit price and the selling unit price, can be converted to floating-point values and added to the commodity sales standard data set as feature items;
inputting the data set containing these four major groups of feature items into a reference model built mainly on the XGBoost model for training, and outputting the predicted Spring Festival sales of all commodities to form the commodity Spring Festival predicted sales data set;
as an optional implementation of the invention, the XGBoost model parameters are optimized over the whole training data set using grid search;
the optimal parameters are stored, and this parameter set is loaded to build the prediction model whenever sales are predicted;
the key parameters to tune in XGBoost training are booster, objective, learning_rate, max_depth, n_estimators, n_jobs, subsample, colsample_bytree and colsample_bylevel, corresponding respectively to the booster type (model solver), the loss function, the learning rate, the maximum tree depth, the number of sub-models (trees), the number of parallel threads, the subsample ratio of training instances, the feature sampling ratio when building each tree, and the feature sampling ratio when splitting tree nodes;
grouping the commodity Spring Festival predicted sales data set by commodity and calculating, for each commodity in turn, the error between predicted and actual sales over the Spring Festival period using the WMAPE formula;
calculating the mean WMAPE over all commodities and using a selected multiple of this mean as the threshold (critical value), i.e. screening out the commodities whose WMAPE exceeds the threshold to form the screened commodity list;
according to the screened commodity list, combining the daily sales data of the screened commodities and the corresponding feature item data (date, weather, commodity code and commodity price features) into the screened commodity data set;
dividing the screened commodity data set at a selected split point into a training data set and a test data set, the ratio of each commodity's daily sales records in the two data sets being about 8;
optionally, a suitable threshold range can be chosen by plotting the WMAPE of all commodities, and the optimal threshold selected by testing several threshold values within that range in turn.
Further, in this example the feature expansion of the screened commodity sales data set adds three groups of feature items (the Spring Festival cycle feature items, the weekly weight index and the daily weight index), and the process comprises:
constructing the Spring Festival cycle feature items:
extracting from the screened commodity sales data set the dates of the seven-day statutory Spring Festival holiday together with the week before and the week after it, forming a complete Spring Festival sales cycle for each year;
numbering the dates of the three intervals (before the Spring Festival, the Spring Festival holiday, and after the Spring Festival) as 1, 2 and 3 respectively, forming a Spring Festival cycle interval coding feature item that identifies each interval of each year's Spring Festival cycle; its value on all other dates is 0;
further, coding the days of the week before the Spring Festival as -7, -6, -5, -4, -3, -2, -1 in order and the days of the holiday and the following week as 1, 2, 3, ... in order, which labels the date order within each year's Spring Festival cycle; this feature item is 0 on all other dates;
merging the constructed Spring Festival cycle feature items into the screened commodity data set;
optionally, if commodity sales for several holiday periods are predicted at the same time, a holiday category coding feature item can be added to identify the dates of the different holiday periods, so that the subsequent model can distinguish between them;
optionally, if commodity sales for several holiday periods are predicted at the same time and some holiday periods partially overlap, a holiday overlap identification feature item can be added to mark the overlapping dates, so that the subsequent model can learn the sales on those dates properly and output the corresponding predicted sales;
optionally, if commodity sales for several holidays are predicted at the same time and the holidays fall into clear classes (for example, ordinary festivals and statutory holidays), a holiday major-class code can be added to identify the class of each holiday, so that the subsequent model can learn the sales change pattern of each major class separately.
The week weight index and day weight index feature items are constructed as follows:
calculating in turn, for each commodity in the screened commodity data set, its average sales volume on each day from Monday to Sunday; setting the day weight index of the day with the lowest average sales volume to 1.0, and dividing the average sales volume of each of the remaining 6 days by that lowest value to obtain the day weight indices from Monday to Sunday;
summing each commodity's day weight indices from Monday to Sunday to obtain that commodity's week weight index;
merging the day weight indices and week weight indices of all commodities into the screened commodity data set as feature items.
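A minimal Python sketch of the day and week weight indices for a single commodity, assuming its daily sales are available as (date, quantity) pairs. The function name and the `cycle_day` and `a` parameters are hypothetical. The last two generalize the procedure to match the cycle period T and reference weight a of claim 7. With the defaults (weekday position, a = 1.0), the sketch reproduces the Monday-to-Sunday procedure described here.

```python
from collections import defaultdict
from datetime import date, timedelta


def day_and_week_weights(records, cycle_day=lambda d: d.weekday(), a=1.0):
    """records: (date, quantity) daily sales of ONE commodity.
    cycle_day maps a date to its cycle position (default 0=Monday .. 6=Sunday).
    Returns ({cycle_day: daily weight index}, cycle-period weight index)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for d, qty in records:
        k = cycle_day(d)
        totals[k] += qty
        counts[k] += 1
    means = {k: totals[k] / counts[k] for k in totals}   # average sales per cycle day
    q0 = min(means.values())                              # reference day: lowest average
    if q0 <= 0:
        raise ValueError("reference cycle day has no sales; weights undefined")
    day_weights = {k: means[k] / q0 * a for k in sorted(means)}   # (Q / Q0) * a
    return day_weights, sum(day_weights.values())


# Example: two weeks of daily sales starting on Monday 2023-01-02.
pattern = [10, 20, 20, 20, 30, 40, 40]
records = [(date(2023, 1, 2) + timedelta(days=i), pattern[i % 7]) for i in range(14)]
day_w, week_w = day_and_week_weights(records)
```

Here Monday has the lowest average sales, so it becomes the reference day. The day weight indices are 1.0, 2.0, 2.0, 2.0, 3.0, 4.0 and 4.0, which gives a week weight index of 18.0.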
Further, the training and prediction process of the holiday sales prediction model in this embodiment includes:
inputting the screened commodity sales data set, with the Spring Festival cycle, week weight index and day weight index feature items merged in, into a holiday sales prediction model built around an XGBoost model for training, which then gives the predicted Spring Festival sales of the screened commodities;
optionally, the overall prediction performance of the holiday sales prediction model can be assessed by computing the prediction error WMAPE from the predicted and actual commodity sales;
as an optional implementation of the invention, the XGBoost model parameters are tuned on the whole training data set using grid search;
the optimal parameters are stored, and this parameter set is loaded to build the model and make predictions when sales are forecast;
the important parameters to tune in XGBoost training are booster, objective, learning_rate, max_depth, n_estimators, n_jobs, subsample, colsample_bytree and colsample_bylevel. These correspond respectively to the booster type, the loss function, the learning rate, the maximum tree depth, the number of sub-models (trees), the number of parallel threads, the training subsample ratio, the feature sampling ratio when building each tree, and the feature sampling ratio when splitting tree nodes.
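The grid search over these parameters can be sketched with the standard library alone. The parameter names below are real keyword arguments of `xgboost.XGBRegressor`. The candidate values, the function name `grid_search` and the `score_fn` callback are assumptions for illustration, because the patent does not state the search ranges.

```python
from itertools import product

# Illustrative search space: the keys are XGBRegressor arguments,
# the candidate values are assumptions, not taken from the patent.
PARAM_GRID = {
    "booster": ["gbtree"],
    "objective": ["reg:squarederror"],
    "learning_rate": [0.05, 0.1],
    "max_depth": [4, 6],
    "n_estimators": [200, 500],
    "n_jobs": [4],
    "subsample": [0.8, 1.0],
    "colsample_bytree": [0.8, 1.0],
    "colsample_bylevel": [0.8, 1.0],
}


def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination exhaustively.
    score_fn(params) returns a validation error (lower is better).
    Returns (best_params, best_score); ties keep the first combination."""
    keys = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In practice, `score_fn` would fit `xgboost.XGBRegressor(**params)` on the training split and return an error such as WMAPE on held-out data. The winning dictionary can then be saved (for example with `json.dump`) and reloaded at prediction time, as the description suggests. scikit-learn's `GridSearchCV` is an off-the-shelf alternative to the hand-written loop.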
The holiday commodity sales prediction method based on order processing, commodity screening and feature expansion gives predicted commodity sales for the Spring Festival. After aggregation, these predictions can guide retail stores' Spring Festival sales and replenishment plans.
This embodiment of the invention provides an implementation of a holiday commodity sales prediction method based on order processing, commodity screening and feature expansion. In order processing, retail store sales order data are processed into a data set in a standard format. The commodities whose holiday sales predictions need further improvement are then screened out from the predictions of a reference model. Corresponding feature items are constructed for the screened commodities and the target holiday to expand the data set. As a result, the holiday sales prediction model can effectively predict the sales of commodities during the holiday period, and the parameter optimization strategy helps ensure that the model's overall prediction performance generalizes.
The above embodiments are only intended to illustrate the invention, not to limit the technical solutions described in it. Although the present invention has been described in detail in this specification with reference to the above embodiments, it is not limited to them. Any modification or equivalent replacement of the present invention, and all such modifications and variations, are intended to fall within the scope of this disclosure and the appended claims.
Claims (10)
1. A method for predicting holiday commodity sales, characterized by comprising the following steps:
acquiring historical sales data of all commodities sold by a store to form a commodity sales standard data set;
inputting the commodity sales standard data set into a machine learning reference model, screening out the commodities whose prediction accuracy for target-holiday sales is below a preset threshold, and forming a screened commodity data set from the historical sales data of those screened commodities;
performing holiday feature item expansion and commodity weight index feature item expansion on the screened commodity data set;
inputting the screened commodity data set with the expanded feature items into a machine learning sales prediction model to obtain the predicted sales, during the target holiday, of all commodities sold by the store.
2. The method according to claim 1, wherein inputting the commodity sales standard data set into a machine learning reference model and screening out the commodities whose prediction accuracy for target-holiday sales is below a preset threshold comprises the following steps:
inputting the commodity sales standard data set into the machine learning reference model for training;
running the trained machine learning reference model to give predicted sales of all commodities sold by the store on the test set;
calculating in turn the sales prediction error of each commodity from the predicted sales of all commodities sold by the store;
selecting a prediction error critical point from the sales prediction errors of all commodities sold by the store, screening out the commodities whose sales prediction error is above the critical point, and collecting them into a screened commodity list;
optionally, the machine learning reference model is built with an XGBoost model as its main model;
optionally, date data, and/or weather data for the store's region, and/or commodity category data, and/or commodity price data are input into the machine learning reference model to assist prediction.
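Claim 2 does not fix the error measure or say how the critical point is chosen. The sketch below assumes per-commodity WMAPE, the error measure named in the description, defined as sum(|y - ŷ|) / sum(|y|) over a commodity's test-set days. It takes the critical point as a given parameter. The row format and the function names are hypothetical. For instance, actual sales (10, 20) against predictions (11, 19) give a WMAPE of 2/30 ≈ 0.067.

```python
from collections import defaultdict


def wmape(actual, predicted):
    """Weighted MAPE: sum(|y - y_hat|) / sum(|y|)."""
    denom = sum(abs(y) for y in actual)
    if denom == 0:
        # No actual sales on the test set: treated as maximally inaccurate (assumption).
        return float("inf")
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / denom


def screen_commodities(test_rows, critical_point):
    """test_rows: (commodity_id, actual, predicted) rows on the test set.
    Returns (sorted ids whose error exceeds critical_point, {id: error})."""
    actual, predicted = defaultdict(list), defaultdict(list)
    for cid, y, y_hat in test_rows:
        actual[cid].append(y)
        predicted[cid].append(y_hat)
    errors = {cid: wmape(actual[cid], predicted[cid]) for cid in actual}
    screened = sorted(cid for cid, e in errors.items() if e > critical_point)
    return screened, errors
```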
3. The method according to claim 1 or 2, wherein performing holiday feature item expansion on the screened commodity data set comprises:
counting the dates of the target holidays, extending each target holiday by a first preset time interval Ts1 before it and a second preset time interval Ts2 after it to form a complete target holiday cycle, and constructing the corresponding holiday feature items from that complete target holiday cycle;
numbering, respectively, the dates in the first preset time interval Ts1 before each target holiday, the dates of the target holiday period, and the dates in the second preset time interval Ts2 after the target holiday. This forms a target holiday cycle interval coding feature item that identifies each interval of each holiday cycle. The holiday feature item takes the value 0 on all other dates;
optionally, the first preset time interval Ts1 is equal to the second preset time interval Ts2.
4. The method according to claim 3, wherein, in the holiday feature item expansion of the screened commodity data set, if commodity sales over several target holiday periods are predicted at the same time, a holiday category coding feature item is added to identify the dates of the different target holiday periods, so that the machine learning sales prediction model can distinguish between the different target holiday periods.
5. The method according to claim 3, wherein, in the holiday feature item expansion of the screened commodity data set, if commodity sales over several target holiday periods are predicted at the same time and different target holiday periods partially overlap, a holiday overlap identification feature item is added to mark the dates on which they overlap. This lets the machine learning sales prediction model learn the sales of the different target holidays on the overlapping dates and give the predicted sales for the corresponding target holiday during prediction.
6. The method according to claim 3, wherein, in the holiday feature item expansion of the screened commodity data set, if commodity sales over several target holiday periods are predicted at the same time and the target holidays fall into clear categories, a holiday major-class code is added to identify the class of each holiday, so that the machine learning sales prediction model can learn the pattern of change in commodity sales during the target holiday periods of each major class.
7. The method according to claim 1, wherein performing commodity weight index feature item expansion on the screened commodity data set comprises:
grouping the records in the screened commodity data set cyclically by a cycle period T; aggregating the sales on days at the same position in the cycle and averaging them to obtain the average sales of that cycle day; calculating in turn the average sales of every cycle day in the cycle period; selecting one cycle day as the reference cycle day and setting its weight index to a; and setting the weight index of each other cycle day to (Q/Q0) × a, where Q is the average sales of that cycle day and Q0 is the average sales of the reference cycle day, thereby obtaining the daily weight index feature of each cycle day in the cycle period;
summing, for each commodity, the daily weight indices of all its cycle days to obtain the cycle-period weight index feature of that commodity;
optionally, calculating in turn, for each commodity in the commodity data set, the average sales on all Mondays, on all Tuesdays, and so on through all Sundays; taking the day with the lowest average sales as the reference cycle day with daily weight index a, and setting the weight coefficient of each of the remaining 6 days to (Q/Q0) × a, where Q is the average sales of that day and Q0 is the average sales of the reference cycle day, so as to obtain the daily weight index features from Monday to Sunday;
summing, for each commodity, the daily weight indices from Monday to Sunday (the preset period Tm) to obtain that commodity's weekly weight index feature;
merging the daily weight index features and weekly weight index features of all commodities into the screened commodity data set.
8. The method for predicting holiday commodity sales according to claim 1, wherein acquiring historical sales data of all commodities sold by the store to form a commodity sales standard data set comprises:
applying the data preprocessing steps of abnormal order processing, missing value processing, daily sales aggregation and data set division to the historical sales data of the commodities sold by the store, thereby forming the commodity sales standard data set;
the abnormal order processing deletes orders whose quantity sold is less than or equal to a preset sales threshold;
the missing value processing deletes or fills orders whose sales value is missing;
the daily sales aggregation sums all sales orders by commodity and date to form a daily sales record data set for each commodity;
the data set division splits the processed commodity sales data set into a training set and a test set according to selected time intervals; these sets are used to train and test the machine learning reference model.
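The four preprocessing steps of claim 8 can be sketched as below. The sketch makes several assumptions: the order schema (dicts with `sku`, `date` and `qty` keys), the default threshold of 0, the optional `fill_value`, and a split at a single cut-off date. The claim itself only says that the split follows selected time intervals. Missing quantities are handled before the threshold check so that a missing value is never compared with a number.

```python
from collections import defaultdict
from datetime import date


def build_standard_dataset(orders, min_qty=0, fill_value=None, split_date=None):
    """orders: dicts with 'sku', 'date' (datetime.date) and 'qty' (None if missing).
    Returns (train_rows, test_rows), each a sorted list of (sku, date, daily_qty)."""
    daily = defaultdict(float)
    for order in orders:
        qty = order.get("qty")
        if qty is None:                      # missing value processing
            if fill_value is None:
                continue                     # delete the order
            qty = fill_value                 # or fill it
        if qty <= min_qty:                   # abnormal order processing
            continue
        daily[(order["sku"], order["date"])] += qty   # daily sales aggregation
    rows = sorted((sku, d, q) for (sku, d), q in daily.items())
    if split_date is None:
        return rows, []
    train = [r for r in rows if r[1] < split_date]    # data set division
    test = [r for r in rows if r[1] >= split_date]
    return train, test


# Example orders (hypothetical): one negative quantity and one missing quantity.
orders = [
    {"sku": "A", "date": date(2023, 1, 1), "qty": 2},
    {"sku": "A", "date": date(2023, 1, 1), "qty": 3},
    {"sku": "A", "date": date(2023, 1, 2), "qty": -1},
    {"sku": "B", "date": date(2023, 1, 2), "qty": None},
    {"sku": "B", "date": date(2023, 1, 3), "qty": 4},
]
train, test = build_standard_dataset(orders, split_date=date(2023, 1, 3))
```

Here the negative order and the order with a missing quantity are dropped. Commodity A's two orders on 2023-01-01 are summed to 5, and the cut-off date places commodity B's 2023-01-03 record in the test set.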
9. A holiday commodity sales prediction device, characterized by comprising:
a data processing module for acquiring historical sales data of all commodities sold by a store to form a commodity sales standard data set;
a commodity data screening module for inputting the commodity sales standard data set into a machine learning reference model, screening out the commodities whose prediction accuracy for target-holiday sales is below a preset threshold, and forming a screened commodity data set from the historical sales data of those screened commodities;
a feature item expansion module for performing holiday feature item expansion and commodity weight index feature item expansion on the screened commodity data set;
and a sales prediction module for inputting the screened commodity data set with the expanded feature items into the machine learning sales prediction model to obtain the predicted sales, during the target holiday, of all commodities sold by the store.
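The module structure of claim 9 can be pictured as a small pipeline in which each module is a pluggable function. This is a hypothetical skeleton. The class name, the `sku` record key, and the use of plain callables in place of real models are assumptions. They serve only to show how data flows between the four modules.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Set

Row = Dict[str, Any]   # hypothetical record schema with an "sku" key


@dataclass
class HolidaySalesPredictionDevice:
    process: Callable[[List[Row]], List[Row]]           # data processing module
    screen: Callable[[List[Row]], Set[str]]             # commodity data screening module
    expand: Callable[[List[Row]], List[Row]]            # feature item expansion module
    predict: Callable[[List[Row]], Dict[str, float]]    # sales prediction module

    def run(self, raw_orders: List[Row]) -> Dict[str, float]:
        standard = self.process(raw_orders)             # commodity sales standard data set
        keep = self.screen(standard)                    # commodities flagged by the reference model
        screened = [r for r in standard if r["sku"] in keep]
        return self.predict(self.expand(screened))      # predicted holiday sales
```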
10. A computer storage medium storing a computer-executable program, wherein the program, when executed, implements the holiday commodity sales prediction method according to any one of claims 1 to 8.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211655938.8A CN115860800A (en) | 2022-12-22 | 2022-12-22 | Festival and holiday commodity sales volume prediction method and device and computer storage medium |
CN202310380967.6A CN116579804A (en) | 2022-12-22 | 2023-04-11 | Holiday commodity sales prediction method, holiday commodity sales prediction device and computer storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211655938.8A CN115860800A (en) | 2022-12-22 | 2022-12-22 | Festival and holiday commodity sales volume prediction method and device and computer storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115860800A (en) | 2023-03-28 |
Family
ID=85653815
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211655938.8A Pending CN115860800A (en) | 2022-12-22 | 2022-12-22 | Festival and holiday commodity sales volume prediction method and device and computer storage medium |
CN202310380967.6A Pending CN116579804A (en) | 2022-12-22 | 2023-04-11 | Holiday commodity sales prediction method, holiday commodity sales prediction device and computer storage medium |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310380967.6A Pending CN116579804A (en) | 2022-12-22 | 2023-04-11 | Holiday commodity sales prediction method, holiday commodity sales prediction device and computer storage medium |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN115860800A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116151861A (en) * | 2023-04-21 | 2023-05-23 | 杭州比智科技有限公司 | Sales volume prediction model constructed based on intermittent time sequence samples and construction method |
CN116188061A (en) * | 2023-04-27 | 2023-05-30 | 北京永辉科技有限公司 | Commodity sales predicting method and device, electronic equipment and storage medium |
CN116188061B (en) * | 2023-04-27 | 2023-10-17 | 北京永辉科技有限公司 | Commodity sales predicting method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN116579804A (en) | 2023-08-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
AU2002353396B2 (en) | Sales optimization | |
US11372896B2 (en) | Method and apparatus for grouping data records | |
US11790383B2 (en) | System and method for selecting promotional products for retail | |
US11875367B2 (en) | Systems and methods for dynamic demand sensing | |
CN115860800A (en) | Festival and holiday commodity sales volume prediction method and device and computer storage medium | |
EP3876177A1 (en) | System and method for retail price optimization | |
CN110555578B (en) | Sales prediction method and device | |
CN109740624B (en) | Logistics supply chain demand prediction method based on big data | |
US11537825B2 (en) | Systems and methods for features engineering | |
WO2021072128A1 (en) | Systems and methods for big data analytics | |
CN111768243A (en) | Sales prediction method, prediction model construction method, device, equipment and medium | |
CN114782065A (en) | Commodity sales volume prediction method and device based on model combination and storage medium | |
CN111784385A (en) | Manufacturing industry-oriented client portrait construction method and device and computer storage medium | |
US20230418563A1 (en) | Dynamic application builder for multidimensional database environments | |
CN117076770A (en) | Data recommendation method and device based on graph calculation, storage value and electronic equipment | |
JP2007122264A (en) | Prediction system for management or demand and prediction program to be used for the same | |
CN116308494A (en) | Supply chain demand prediction method | |
CN115689713A (en) | Abnormal risk data processing method and device, computer equipment and storage medium | |
CN112669093A (en) | Ocean economy prediction method, system, electronic device and storage medium | |
CN117520624B (en) | Configuration and calculation method and device for big data index | |
CN116308465B (en) | Big data analysis system based on mobile payment | |
CN113095870B (en) | Prediction method, prediction device, computer equipment and storage medium | |
CN113240353B (en) | Cross-border e-commerce oriented export factory classification method and device | |
Polam | Sales and Logistics Analysis in E-Commerce using Machine Learning Models: UK | |
Jebali | Transforming Retail Dynamics: Exploration of a Machine Learning Approach for Sales Forecasting: Case Study of an Athletic Digital Retailer |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20230328 |