CN116596580A - Seasonal clothing short-term sales prediction method based on weather forecast information - Google Patents

Info

Publication number
CN116596580A
Authority
CN
China
Prior art keywords
data
sales
characteristic
seasonal
weather
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310579820.XA
Other languages
Chinese (zh)
Inventor
韩曙光
吕杰妮
胡觉亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Sci Tech University ZSTU
Original Assignee
Zhejiang Sci Tech University ZSTU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Sci Tech University ZSTU
Priority to CN202310579820.XA
Publication of CN116596580A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Artificial Intelligence (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Databases & Information Systems (AREA)
  • Algebra (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Fuzzy Systems (AREA)
  • Evolutionary Biology (AREA)
  • Marketing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Operations Research (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Game Theory and Decision Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • Medical Informatics (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a short-term sales prediction method for seasonal clothing based on weather forecast information, comprising the following steps: S1, collecting sales data and the weather data corresponding to each sales day; S2, aggregating and sorting the sales data and the weather data in time order to obtain time-series summary data of sales with weather information; S3, performing feature variable analysis and feature variable screening on the time-series summary data; S4, dividing the data set formed by the feature indicator data and the sales data according to a rolling time window method; S5, establishing a Stacking ensemble regression model; S6, inputting new feature indicator data into the Stacking ensemble learning model to predict seasonal clothing sales. The method effectively handles the nonlinear relationship between short-term weather and seasonal clothing sales, improves on the prediction performance of a single model for seasonal clothing sales, and improves the generalization of the model.

Description

Seasonal clothing short-term sales prediction method based on weather forecast information
Technical Field
The invention relates to a clothing sales prediction method, in particular to a short-term sales prediction method for seasonal clothing based on weather forecast information.
Background
Competition in the clothing retail market is fierce. Reasonable, reliable and accurate sales prediction is of great significance to the clothing industry: it is a precondition for enterprises to control production scientifically, it can effectively prevent profit losses caused by inventory backlog, and it improves the profit margin and overall benefit of enterprises. However, clothing differs from other products: it is strongly fashion-driven and seasonal, with a short life cycle and a long production lead time. These factors significantly increase the risk and difficulty of sales prediction for clothing. One of the major sources of uncertainty is the weather, which has a significant impact on clothing sales.
At present, many researchers have studied clothing sales prediction. For example, Patent 1, "An artificial intelligence based clothing sales prediction method, system and equipment", publication number CN112686713A, publication date 2021.04.20, discloses: extracting clothing sales data and performing data preparation; identifying distorted sales data in the clothing sales data and performing data cleaning and data repair; establishing an artificial intelligence based prediction algorithm model on the cleaned clothing sales data and predicting the sales trend of the next sales period; performing air temperature correction; and performing holiday and promotion correction. Patent 2, "A clothing sales prediction method and system based on artificial intelligence and big data", publication number CN113506144A, publication date 2021.10.15, discloses: processing clothing collocation images with a clothing analysis neural network to obtain the clothing style and the feature vector of each clothing item, and constructing a style feature distribution map from the clothing styles and feature vectors; obtaining clothing style popularity and clothing item popularity from network platform data; resetting the pixel values at the corresponding positions of the style feature distribution maps according to the clothing style popularity and the clothing item popularity to obtain a popularity map for each clothing item, and calibrating and updating the popularity maps through the cluster center of each style feature distribution map to obtain standard clothing item popularity maps; and obtaining predicted clothing sales from the standard clothing item popularity maps.
Although both of the above published patents disclose a method for predicting clothing sales, the method of Patent 1 builds an artificial intelligence based prediction model on the cleaned clothing sales data and then applies corrections to the basic prediction result of the model, which adds prediction steps, and the accuracy of the corrected result is not high. The method of Patent 2 takes clothing style sentiment, clothing item sentiment, the number of likes on clothing style reviews, the number of likes on clothing item reviews, and clothing item sales as influencing factors, and needs to collect a large amount of data from network platforms. These data include text and image data, are usually sparse and disordered, take a long time to collect, and are costly to process. Moreover, neither of the above published patents addresses the impact of short-term weather on clothing sales.
Disclosure of Invention
The invention aims to provide a short-term sales prediction method for seasonal clothing based on weather forecast information. The method effectively handles the nonlinear relationship between short-term weather and seasonal clothing sales, improves on the prediction performance of a single model for seasonal clothing sales, and improves the generalization of the model.
The technical scheme of the invention is as follows: a short-term sales prediction method for seasonal clothing based on weather forecast information, comprising the following steps:
S1. Data collection: collecting sales data and the weather data corresponding to each sales day;
S2. Data processing: aggregating and sorting the sales data and the weather data in time order to obtain time-series summary data of sales with weather information;
S3. Feature engineering: performing feature variable analysis and feature variable screening on the time-series summary data to obtain feature indicator data;
S4. Dividing the data set formed by the feature indicator data and the sales data into a training set and a validation set according to a rolling time window method;
S5. Establishing a Stacking ensemble regression model;
S6. Inputting the new feature indicator data into the Stacking ensemble regression model to predict seasonal clothing sales.
In the foregoing method, the sales data in step S1 includes basic product attributes and historical sales data.
In the foregoing method, in step S2, when the sales data is processed, missing values are filled using an adjacent-mean method, and outliers are deleted or marked using a box plot.
In the foregoing method, deleting and marking outliers using a box plot specifically means: when the value of a data point in the box plot exceeds the upper or lower limit, and the data point shows no periodic trend and is a sudden change caused by a non-reproducible event, the data point is judged to be an outlier and is deleted.
In the foregoing method, in step S3, the feature variable analysis includes the following steps:
a. performing a date decomposition (de-time-series) operation on the time-series summary data according to the sales date, adding the feature variables year, month, day, week, working day and non-working day;
b. adding the feature variables wind chill index and apparent (feels-like) temperature;
c. adding holiday and promotion day feature variables according to the national calendar.
In the foregoing method, in step S3, the feature variable screening includes the following steps:
a. Establishing feature indicators: dividing the time-series summary data into feature variables and label data;
b. Screening feature indicators: from the divided feature variables, selecting the 15 feature variables with the highest Pearson correlation coefficients with the label data as feature indicators.
In the foregoing method, if the feature indicators contain categorical data, the categorical data is feature-encoded.
In the foregoing method, step S5 specifically includes the following steps:
S501. Establishing a Stacking ensemble learning model: selecting random forest, XGB and GBDT as the base learners of the first layer and linear regression as the meta learner of the second layer to establish the Stacking ensemble learning model;
S502. Obtaining the Stacking ensemble regression model: inputting the training set and validation set data into the Stacking ensemble learning model for model training and hyperparameter tuning to obtain the final Stacking ensemble regression model.
In the foregoing method, in step S6, the new feature indicator data is feature indicator data that includes weather information features.
Compared with the prior art, the invention has the following beneficial effects:
The method predicts the future daily sales of a seasonal clothing product by building a Stacking ensemble learning model from the product's recent sales data and short-term weather forecast data. Adding weather information to the model allows the nonlinear relationship between weather and seasonal clothing sales to be handled effectively, allows large amounts of sales data to be processed, reduces the prediction error, and improves the accuracy of the sales prediction. This minimizes the influence of weather on seasonal clothing sales plans, provides a good reference for short-term sales planning, and reduces inventory.
The Stacking ensemble learning model improves on the prediction performance of a single model for seasonal clothing sales, further reduces the prediction error, improves prediction accuracy and overall generalization, and avoids the risk of local minima that comes with selecting a single model. The data set is divided with a rolling time window and the model is tuned with random grid search, which reduces the number of modeling runs and the parameter search space. The running speed of the model is thus improved while its precision is preserved, so that the model remains fast and accurate on large data sets.
In addition, the method adds apparent temperature, wind chill index, holiday and promotion day information to the model, which improves the robustness of the model and allows clothing sales during holidays to be predicted effectively.
The time-series summary data is decomposed by date into specific month, week and other fields, and holidays, major sales days and other information are marked, so that a complete and comprehensive set of feature indicators is constructed that is suitable for day-level data.
Drawings
FIG. 1 is a schematic diagram of the structure of the present invention;
FIG. 2 is a diagram of the data variables included in the sample data of Example 2;
FIG. 3 shows the filling of missing values in the sample data of Example 2;
FIG. 4 is an identification chart of outliers in the sample data of Example 2;
FIG. 5 shows the marking and processing of outliers in the sample data of Example 2;
FIG. 6 is a heat map of the 15 variables with the highest correlation to seasonal sales in the sample data of Example 2;
FIG. 7 is a diagram of the rolling time window division in the sample data of Example 2;
FIG. 8 is a diagram of the random grid search in the sample data of Example 2.
Detailed Description
The invention is further illustrated by the following examples, which are not intended to be limiting.
Example 1:
As shown in FIG. 1, a short-term sales prediction method for seasonal clothing based on weather forecast information includes the following steps:
S1. Data collection: collecting sales data and the weather data corresponding to each sales day;
S2. Data processing: aggregating and sorting the sales data and the weather data in time order to obtain time-series summary data of sales with weather information;
When the sales data is processed, missing values are filled using an adjacent-mean method, and outliers are deleted or marked using a box plot.
S3. Feature engineering: performing feature variable analysis and feature variable screening on the feature variables in the time-series summary data to obtain feature indicator data;
S301. Feature variable analysis:
a. A date decomposition (de-time-series) operation is performed on the time-series summary data according to the sales date: the clothing sales date is decomposed, and feature variables such as year, month, day, week, working day and non-working day are added.
b. The feature variables wind chill index and apparent (feels-like) temperature are added.
Wind chill index:
$$K = 13.127 + 0.6215\,T - 13.947\,v^{0.16} + 0.486\,T\,v^{0.16}$$
where K is the wind chill index, v is the wind speed (miles per hour), and T is the air temperature (degrees Celsius).
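Apparent temperature (heat index):
$$\begin{aligned}
\mathrm{HI} = {} & -42.379 + 2.049\,T + 10.1433\,\mathrm{RH} - 0.2248\,T\,\mathrm{RH} \\
& - 6.83783\times10^{-3}\,T^{2} - 5.481717\times10^{-2}\,\mathrm{RH}^{2} + 1.22874\times10^{-3}\,T^{2}\,\mathrm{RH} \\
& + 8.5282\times10^{-4}\,T\,\mathrm{RH}^{2} - 1.99\times10^{-6}\,T^{2}\,\mathrm{RH}^{2}
\end{aligned}$$
where HI is the heat index, i.e. the apparent temperature in degrees Fahrenheit, T is the air temperature in degrees Fahrenheit, and RH is the relative humidity in percent.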
If the relative humidity is less than 13% and the temperature is between 80 and 112 degrees Fahrenheit, the following adjustment is subtracted from HI:
$$\frac{13-\mathrm{RH}}{4}\,\sqrt{\frac{17-\lvert T-95\rvert}{17}}$$
If the relative humidity is greater than 85% and the temperature is between 80 and 87 degrees Fahrenheit, the following adjustment is added to HI:
$$\frac{\mathrm{RH}-85}{10}\cdot\frac{87-T}{5}$$
c. All statutory holidays and common clothing promotion days are marked according to the national calendar, and feature variables such as holiday and promotion day are added. Clothing sales promotion events include, for example, the Goddess Festival, six, eight, twenty, etc.
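The two weather indices in item b can be sketched in a few lines of Python. This is a minimal sketch rather than the patent's implementation: the function names are hypothetical, the wind chill function uses the coefficients and units exactly as given in the text, and the heat index function assumes the Rothfusz regression published by the US National Weather Service for the `RH**2`, `T**2 * RH` and `T * RH**2` coefficients and for the two humidity adjustments.

```python
import math


def wind_chill(temp_c: float, wind_speed: float) -> float:
    """Wind chill index K with the coefficients given in the text.

    temp_c is the air temperature in degrees Celsius; wind_speed is the
    wind speed in the unit stated in the text.
    """
    v_pow = wind_speed ** 0.16
    return 13.127 + 0.6215 * temp_c - 13.947 * v_pow + 0.486 * temp_c * v_pow


def rothfusz(temp_f: float, rh: float) -> float:
    """Unadjusted heat index in degrees Fahrenheit (assumed NWS regression)."""
    t, r = temp_f, rh
    return (-42.379 + 2.049 * t + 10.1433 * r - 0.2248 * t * r
            - 6.83783e-3 * t * t - 5.481717e-2 * r * r
            + 1.22874e-3 * t * t * r + 8.5282e-4 * t * r * r
            - 1.99e-6 * t * t * r * r)


def heat_index(temp_f: float, rh: float) -> float:
    """Heat index with the low- and high-humidity adjustments applied."""
    hi = rothfusz(temp_f, rh)
    if rh < 13 and 80 <= temp_f <= 112:
        # Assumed NWS adjustment for very dry air: subtracted from HI.
        hi -= ((13 - rh) / 4) * math.sqrt((17 - abs(temp_f - 95)) / 17)
    elif rh > 85 and 80 <= temp_f <= 87:
        # Assumed NWS adjustment for very humid air: added to HI.
        hi += ((rh - 85) / 10) * ((87 - temp_f) / 5)
    return hi
```

Both functions take one day's weather values, so they can be applied row by row to the daily weather table to produce the two extra feature columns.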
S302. Feature variable screening:
a. Establishing feature indicators: the time-series summary data is divided into feature variables and label data; the feature variables include time, temperature and the like, and the label data is the sales data;
b. Screening feature indicators: from the divided feature variables, the 15 feature variables with the highest Pearson correlation coefficients with the sales data are selected as feature indicators;
c. Feature indicator encoding: if the feature indicators contain categorical data, the categorical data is feature-encoded as shown in Table 1:
Table 1 Feature encoding of categorical data
S4. The data set formed by the feature indicator data and the sales data is divided into a training set and a validation set according to a rolling time window method. Each sample in the training set consists of 15 feature indicators x plus 1 sales value y; the validation set likewise consists of 15 feature indicators x plus 1 sales value y.
S5. Establishing a Stacking ensemble regression model:
S501. Establishing a Stacking ensemble learning model: random forest, XGB and GBDT are selected as the base learners of the first layer, linear regression is selected as the meta learner of the second layer, and the Stacking ensemble learning model is established.
S502. Obtaining the Stacking ensemble regression model: the divided training set and validation set data are input into the Stacking ensemble learning model for training to obtain the final Stacking ensemble regression model.
S6. The finally obtained Stacking ensemble regression model is evaluated, and the new 15 feature indicator data containing the weather data are fed into the Stacking ensemble regression model as a test set to predict seasonal clothing sales.
The new 15 feature indicator data consist of the average retail unit price/average discount rate over a certain number of future days plus the weather forecast data. For example, if the new 15 feature indicator data use the average retail unit price/average discount rate and the weather forecast data for the next 30 days, feeding them into the Stacking ensemble regression model predicts the short-term seasonal clothing sales for the next 30 days.
Example 2:
Taking a women's fashion retail brand in Hangzhou as an example, a short-term sales prediction method for seasonal clothing based on weather forecast information includes the following steps:
S1. Data collection: sales data from the stores and the weather data corresponding to each sales day are collected.
Sales data: daily sales transaction data from 2018 to 2019 were taken from the POS systems of all physical stores of a women's fashion retail brand in the Hangzhou area, twenty thousand records in total, as shown in Table 2.
Table 2 Daily sales transaction data in the POS system of a Hangzhou women's clothing brand
Weather data:
The weather data comes from a comatic weather data website and is collected at daily granularity. It comprises 12 weather variables, including average temperature, highest temperature, lowest temperature, humidity, wind speed, wind level, daily rainfall, visibility, average total cloud amount, air pressure and weather type.
S2. Data processing:
S201. The sales data and the weather data are aggregated and sorted in time order to obtain the overall time-series summary data of sales with weather information;
Specifically, the original sales data is first aggregated: in the time dimension by day, and at the product level by individual clothing category. The specific method is: the sales data of the same SKU on the same date in different stores are summarized into one table, and the quantities of the different product categories are then counted.
When the sales data is processed, missing values are filled using an adjacent-mean method, and outliers are deleted or marked using a box plot.
Filling of missing values: the sales data is analyzed with SPSS software. According to the analysis, if the sales data contains isolated missing values and runs of consecutive blank values of no more than 3 days, then, considering that sales do not change abruptly between adjacent time intervals and are highly correlated over a continuous period, the blanks are filled by the adjacent-mean method, i.e. with the average of the data immediately before and after the blank. The filling is shown in FIG. 3. The calculation formula is:
$$\hat{y}_i = \frac{y_{i-1} + y_{i+1}}{2}$$
where $\hat{y}_i$ is the missing value to be filled, and $y_{i+1}$ and $y_{i-1}$ are the data immediately after and before the missing value.
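The adjacent-mean filling described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `fill_adjacent_mean` and the `max_gap` parameter are hypothetical, missing days are represented as `None`, and a run of up to 3 consecutive blanks is filled with the mean of the two values bounding the run, which is one reading of the text. Longer runs and blanks at the ends of the series are left untouched.

```python
from typing import List, Optional


def fill_adjacent_mean(values: List[Optional[float]],
                       max_gap: int = 3) -> List[Optional[float]]:
    """Fill short runs of missing values with the mean of their neighbours."""
    filled = list(values)
    n = len(values)
    i = 0
    while i < n:
        if values[i] is not None:
            i += 1
            continue
        start = i
        while i < n and values[i] is None:
            i += 1
        end = i  # index of the first known value after the gap (or n)
        has_both_neighbours = start > 0 and end < n
        if has_both_neighbours and (end - start) <= max_gap:
            mean = (values[start - 1] + values[end]) / 2
            for j in range(start, end):
                filled[j] = mean
    return filled
```

For a single missing day this reduces exactly to the formula above, the average of the previous and the next day's sales.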
Treatment of outliers:
First, box plots of the sales of the different clothing categories are drawn and the abnormal points are found, as shown in FIG. 4. When the value of a data point in the box plot exceeds the upper or lower limit, and the data point shows no periodic trend and is a sudden change caused by a non-reproducible event, the data point is judged to be an outlier and is deleted.
The points lying outside the box plot limits are treated as candidate outliers and examined together with the time-series curve. The outliers produced by data fluctuation are observed, the possible causes are analyzed, and the factors influencing clothing sales, such as promotions, new product launches and holiday events, are identified. The influencing factor is then marked at the corresponding fluctuation point, or the outlier is deleted. The specific method is as follows. For periodic outliers, such as fixed promotion dates like e-commerce events, the event is marked at and around that point in time to guide accurate prediction later. For sudden changes caused by events that have no periodic trend and cannot be reproduced, the influencing factor will not appear again in the subsequent time series, and the earlier sudden change would adversely affect the fitting of the subsequent model. Therefore, to eliminate the impact of these events, these outliers are removed directly, as shown in FIG. 5.
For example, in the box plot, sales are high on March 7 because March 8 is Women's Day; sales around this date are high every year, so the holiday shows a periodic trend. Although the sales exceed the limits of the box plot, such data points are not true outliers; they are marked as holidays and are not deleted.
For example, in the box plot there is a point beyond the upper and lower limits dated April 5. There is no holiday and no periodic trend on that date, and such high sales will not necessarily be reproduced in the future. This point is therefore judged to be an outlier and is deleted.
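The mark-or-delete rule can be illustrated with a short sketch. It makes several assumptions that are not stated in the text: the box plot limits are taken as the conventional whiskers at 1.5 times the interquartile range, recurring events are supplied as a hypothetical set of `(month, day)` pairs, and the function names are placeholders.

```python
import statistics
from datetime import date
from typing import Dict, List, Set, Tuple


def box_plot_bounds(values: List[float], k: float = 1.5) -> Tuple[float, float]:
    """Lower and upper box plot limits (assumed: quartiles -/+ k * IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def classify_points(sales_by_day: Dict[date, float],
                    recurring_days: Set[Tuple[int, int]]):
    """Keep normal points, mark recurring spikes, delete one-off spikes."""
    lower, upper = box_plot_bounds(list(sales_by_day.values()))
    kept: Dict[date, float] = {}
    marked: List[date] = []
    deleted: List[date] = []
    for day, qty in sales_by_day.items():
        if lower <= qty <= upper:
            kept[day] = qty
        elif (day.month, day.day) in recurring_days:
            kept[day] = qty      # periodic event: keep the point, flag it
            marked.append(day)
        else:
            deleted.append(day)  # non-reproducible spike: drop the point
    return kept, marked, deleted
```

With `recurring_days={(3, 8)}`, a spike on March 8 would be kept and flagged, while an unexplained spike on April 5 would be dropped, mirroring the two examples above.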
Second, the collected weather data is matched by date and merged into the sales data table.
Finally, the time-series summary data of the sales records of the different products in the Hangzhou area with weather information is obtained. The feature variables in the summary data are shown in FIG. 2.
S3. Feature engineering: feature variable analysis and feature variable screening are performed on the feature variables in the time-series summary data to obtain feature indicator data;
S301. Feature variable analysis:
a. A date decomposition (de-time-series) operation is performed on the time-series summary data according to the sales date: the clothing sales date is decomposed, and feature variables such as year, month, day, week, working day and non-working day are added.
b. Feature variables such as wind chill index and apparent (feels-like) temperature are added.
c. All statutory holidays and common clothing promotion days are marked according to the national calendar, and feature variables such as holiday and promotion day are added. Clothing sales promotion events include, for example, the Goddess Festival, six, eight, twenty, etc.
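Items a and c of the feature variable analysis can be sketched with the standard library alone. The calendar contents below are placeholders, not the patent's data: `HOLIDAYS` and `PROMO_DAYS` are hypothetical tables that would be filled from the national calendar and the brand's promotion schedule, and "week" is interpreted here as the day of the week.

```python
from datetime import date
from typing import Dict

# Hypothetical calendar tables, for illustration only.
HOLIDAYS = {date(2019, 1, 1)}   # statutory holidays
PROMO_DAYS = {(3, 8)}           # (month, day) of recurring promotion days


def date_features(d: date) -> Dict[str, int]:
    """Decompose a sales date into calendar feature variables."""
    is_holiday = d in HOLIDAYS
    weekday = d.isoweekday()    # 1 = Monday ... 7 = Sunday
    return {
        "year": d.year,
        "month": d.month,
        "day": d.day,
        "weekday": weekday,
        # Simplification: weekends made into working days are ignored.
        "is_workday": int(weekday <= 5 and not is_holiday),
        "is_holiday": int(is_holiday),
        "is_promo_day": int((d.month, d.day) in PROMO_DAYS),
    }
```

Each returned dictionary becomes one row of additional columns next to the daily sales and weather values.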
S302. Feature variable screening:
a. Establishing feature indicators: the summarized data is divided into feature variables and label data; the divided feature variables are shown in Table 3, and the label data is the sales data.
Table 3 Feature variables included in the summarized data
Too many feature variables can lead to the curse of dimensionality and degrade model performance. Feeding in too much data also increases computation time and slows down model training. Therefore, when many feature variables have been created, the features used for modeling must be screened carefully.
b. Screening feature indicators: from the feature variables divided in Table 3, the 15 features with the highest Pearson correlation coefficients with the sales data are selected as feature indicators, as shown in Table 4;
Table 4 The 15 feature variables with the highest correlation to seasonal sales and their correlation coefficients
The Pearson correlation coefficient is calculated as follows:
$$\rho = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^{2}}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^{2}}}$$
where n is the number of samples, $\bar{X}$ and $\bar{Y}$ are the sample means of X and Y, and $\rho$ is the correlation coefficient. If $\rho$ equals 1, the two variables are perfectly positively correlated; if $\rho$ equals -1, they are perfectly negatively correlated; if $\rho$ equals 0, there is no linear correlation between them.
The heat map of the 15 variables with the highest correlation to seasonal sales is shown in FIG. 6.
c. Feature indicator encoding: if the feature indicators contain categorical data, the categorical data is feature-encoded.
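The screening in item b can be sketched as a plain Pearson ranking. The names `pearson` and `top_k_features` are hypothetical, and ranking by the absolute value of the coefficient is an assumption: the text only says "highest correlation coefficient", which could also mean the signed value.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sd_x * sd_y)


def top_k_features(features: Dict[str, Sequence[float]],
                   target: Sequence[float],
                   k: int = 15) -> List[Tuple[str, float]]:
    """Return the k features most correlated with the target (by |rho|)."""
    scores = {name: pearson(col, target) for name, col in features.items()}
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return [(name, scores[name]) for name in ranked[:k]]
```

Here `features` maps each candidate feature variable to its daily values and `target` holds the daily sales; the default `k=15` matches the number of feature indicators used in the text.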
S4. The data set formed by the feature indicator data and the sales data is divided into training set and validation set data according to a rolling time window method, as shown in FIG. 7; each selected validation set is 30 days long.
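One way to realize the rolling time window division is sketched below. Only the 30-day validation length comes from the text; the fixed training window length, the step size and the function name `rolling_window_splits` are assumptions, since the exact layout is given only in FIG. 7.

```python
from typing import List, Optional, Tuple


def rolling_window_splits(n_days: int,
                          train_len: int,
                          val_len: int = 30,
                          step: Optional[int] = None) -> List[Tuple[range, range]]:
    """Return (train_indices, validation_indices) pairs in time order.

    Each validation window directly follows its training window, so no
    future data leaks into training. The window then slides forward by
    `step` days (assumed to default to the validation length).
    """
    step = step or val_len
    splits = []
    start = 0
    while start + train_len + val_len <= n_days:
        train = range(start, start + train_len)
        val = range(start + train_len, start + train_len + val_len)
        splits.append((train, val))
        start += step
    return splits
```

For example, `rolling_window_splits(150, 90)` yields two splits, with validation windows covering days 90 to 119 and days 120 to 149 of the series.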
S5, building a learning model:
s501, establishing a Stacking integrated learning model: random forests, XGB and GBDT are selected as a base learner of a first layer, linear regression is selected as a meta learner of a second layer, and a Stacking integrated learning model is established.
S502, acquiring a Stacking integrated regression model: and inputting the training set and the verification set data into a Stacking integrated learning model to perform training, verification and parameter optimization, so as to obtain a final Stacking integrated regression model.
Specifically: the training set is divided into a plurality of subsets. For each base learner, a portion of the subsets is used for training, the remaining subsets of the training set are used for verification, and the verification set is used for testing. The verification results and test results of each base learner are then used as new features to construct a new data set. The meta learner is then trained on the newly constructed data set, with the predictions of the base learners as input and the true values as output. Finally, the trained meta learner predicts the verification set, the error between the predicted values and the true values is calculated, and the model parameters are tuned by random grid search to obtain the final Stacking integrated regression model.
The random grid search method selects only some of the parameter combinations to construct a hyperparameter subspace and searches only within that subspace, as shown in Fig. 8. Because the search space is reduced, fewer parameter combinations need to be enumerated and compared and the overall search time falls; the method improves operation speed without excessively reducing search accuracy.
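The two-layer procedure can be sketched in plain Python as follows. To keep the example self-contained, two single-feature least-squares lines stand in for the RF, XGB and GBDT base learners, and the training set is cut into contiguous subsets; these stand-ins, all function names, and the toy data are assumptions of this sketch and not the patented implementation.

```python
def fit_line(col):
    """Return a fit function for a least-squares line on feature column `col`."""
    def fit(X, y):
        xs = [row[col] for row in X]
        mx, my = sum(xs) / len(xs), sum(y) / len(y)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (t - my) for x, t in zip(xs, y))
        slope = sxy / sxx if sxx else 0.0
        return lambda rows: [my + slope * (r[col] - mx) for r in rows]
    return fit

def fit_linear(Z, y):
    """Meta learner: linear regression with intercept via the normal equations."""
    rows = [[1.0] + list(z) for z in Z]
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    w = [M[i][p] / M[i][i] for i in range(p)]
    return lambda Zn: [w[0] + sum(wi * zi for wi, zi in zip(w[1:], z)) for z in Zn]

def out_of_fold(fit, X, y, k=3):
    """Predict each subset with a model trained on the other k-1 subsets."""
    n = len(X)
    oof = [0.0] * n
    for f in range(k):
        lo, hi = n * f // k, n * (f + 1) // k
        keep = [i for i in range(n) if not lo <= i < hi]
        model = fit([X[i] for i in keep], [y[i] for i in keep])
        oof[lo:hi] = model(X[lo:hi])
    return oof

def fit_stacking(base_fits, X, y, k=3):
    """Train base learners out-of-fold, then a linear meta learner on their outputs."""
    Z = list(zip(*[out_of_fold(fit, X, y, k) for fit in base_fits]))
    meta = fit_linear(Z, y)
    bases = [fit(X, y) for fit in base_fits]  # refit on the full training set
    def predict(rows):
        return meta(list(zip(*[b(rows) for b in bases])))
    return predict, Z

# Hypothetical toy data: two features per day and a sales value
X = [[i, (i * 7) % 5] for i in range(12)]
y = [2 * a + 3 * b + 1 for a, b in X]
predict, Z = fit_stacking([fit_line(0), fit_line(1)], X, y, k=3)
forecast = predict([[12, 4]])
```

The meta learner only ever sees predictions made on subsets that the corresponding base learner did not train on, which is what limits over-fitting in the second layer.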
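The idea of searching only a randomly chosen subspace of the full grid can be sketched as follows. The grid values, the scoring function, and the function name are hypothetical; in practice the score would be a verification-set error such as MSE.

```python
import itertools
import random

def random_search(grid, score, n_iter=10, seed=0):
    """Evaluate a random subset of the full grid and return the best combination."""
    names = list(grid)
    all_combos = [dict(zip(names, vals))
                  for vals in itertools.product(*grid.values())]
    rng = random.Random(seed)
    # The sampled combinations form the hyperparameter subspace
    subspace = rng.sample(all_combos, min(n_iter, len(all_combos)))
    best = min(subspace, key=score)  # lower score is better
    return best, subspace

# Hypothetical grid (36 combinations) and a stand-in scoring function
grid = {
    "n_estimators": [100, 200, 400],
    "max_depth": [3, 5, 7, 9],
    "learning_rate": [0.01, 0.05, 0.1],
}

def toy_score(params):
    return abs(params["max_depth"] - 5) + abs(params["learning_rate"] - 0.05)

best, tried = random_search(grid, toy_score, n_iter=8)
```

Here only 8 of the 36 combinations are scored, which is the source of the time saving described above.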
S6, evaluating the finally obtained Stacking integrated regression model, and inputting new characteristic index data into the model to predict seasonal clothing sales.
To verify the accuracy of the model, statistical metrics must be used to evaluate the difference between the predicted values of the established model and the actual values.
MSE, MAE and $R^2$ are selected to evaluate the model. MSE is calculated from the squared errors, so if outliers exist in the data, the squared errors of those outliers may be very large. MAE is calculated from the absolute errors and is therefore more robust to outliers than MSE. The smaller the MSE or MAE, the higher the prediction accuracy of the model. The mean square error (MSE), mean absolute error (MAE) and $R^2$ are calculated as follows:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|$$
$$R^{2} = 1-\frac{\mathrm{SSE}}{\mathrm{SST}} = 1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^{2}}$$
where $y_i$ represents the actual sales of a clothing category on a given day, $\hat{y}_i$ represents the predicted sales of that clothing category on the same day, $\bar{y}$ represents the sample mean of the sales of that clothing category, and $n$ represents the number of samples. SST represents the total sum of squared deviations and SSE represents the residual sum of squares.
Comparison of experimental prediction results:
Data set 1 and data set 2 are respectively input into the Stacking integrated learning model to predict clothing sales. Data set 1 is characteristic index data containing weather data; data set 2 is characteristic index data without weather data, as shown in Table 5. The comparison of clothing sales prediction errors is shown in Table 6.
Table 5. Data set 1 and data set 2
Table 6. Comparison of seasonal clothing sales prediction errors
The experimental results show that, after weather information is added to the sales prediction model, the MSE is reduced by 12%; the prediction error MSE of XGB is reduced by 10%; the prediction error MSE of GBDT is reduced by 9%; and the prediction error of Stacking integrated learning is reduced by 12%. This shows that adding weather information to the model does improve the accuracy of seasonal clothing sales prediction. In addition, the prediction error MSE of the Stacking integrated learning model is lower than that of any single machine learning model: the MSE of the Stacking integrated learning model with weather information is 18% lower than that of RF, the best single machine learning model. This shows that the established Stacking integration strategy can effectively improve on the prediction accuracy of a single model. By combining the advantages of several learners, the model improves the generalization performance of the whole. Moreover, through the training of the meta model, the Stacking fusion model can effectively reduce the over-fitting risk of the base learners and avoid the risk of becoming trapped in a local minimum that comes with selecting a single model.

Claims (9)

1. A seasonal clothing short-term sales prediction method based on weather forecast information, characterized in that the method comprises the following steps:
S1, data collection: collecting sales data and weather data corresponding to the sales days;
S2, data processing: summarizing and sorting the sales data and the weather data in time order to obtain time sequence summary data of the sales data with weather information;
S3, feature engineering: carrying out characteristic variable analysis and characteristic variable screening on the time sequence summary data to obtain characteristic index data;
S4, dividing the data set formed by the characteristic index data and the sales data into a training set and a verification set according to a rolling time window method;
S5, establishing a Stacking integrated regression model;
S6, inputting new characteristic index data into the Stacking integrated regression model to predict seasonal clothing sales.
2. The seasonal apparel short-term sales prediction method based on weather forecast information according to claim 1, wherein: the sales data in step S1 includes: product basic attributes, historical sales data.
3. The seasonal apparel short-term sales prediction method based on weather forecast information according to claim 1, wherein: in step S2, when the sales data are processed, missing values are filled in by using a near-average method, and abnormal values are marked and deleted by using a box plot.
4. The seasonal apparel short-term sales prediction method based on weather forecast information according to claim 3, characterized in that: marking and deleting abnormal values by using a box plot specifically comprises: when the value of a data point in the box plot exceeds the upper or lower limit, and the data point shows no periodic trend and is a sudden change caused by a non-recurring event, the data point is judged to be an abnormal point and is deleted.
5. The seasonal apparel short-term sales prediction method based on weather forecast information according to claim 1, wherein: in step S3, the feature variable analysis includes the steps of:
a. splitting the sales dates of the time sequence summary data into time components, and adding characteristic variables for year, month, day, week, working day and non-working day;
b. adding characteristic variables for the wind chill index and the apparent temperature;
c. adding holiday and promotional event day characteristic variables according to the national calendar.
6. The seasonal apparel short-term sales prediction method based on weather forecast information according to claim 1, wherein: in step S3, the feature variable screening includes the steps of:
a. establishing characteristic indexes: dividing characteristic variables and tag data according to the time sequence summary data;
b. screening characteristic indexes: from the divided characteristic variables, the 15 characteristic variables with the highest correlation coefficients with the tag data are selected as characteristic indexes by the Pearson correlation coefficient method.
7. The seasonal apparel short-term sales prediction method based on weather forecast information of claim 6, wherein: if the characteristic indexes contain categorical data, feature encoding is performed on the categorical data.
8. The seasonal apparel short-term sales prediction method based on weather forecast information according to claim 1, wherein: the step S5 specifically comprises the following steps:
S501, establishing a Stacking integrated learning model: selecting random forest, XGB and GBDT as the base learners of the first layer, selecting linear regression as the meta learner of the second layer, and establishing a Stacking integrated learning model;
S502, acquiring a Stacking integrated regression model: inputting the training set and verification set data into the Stacking integrated learning model for model training and hyperparameter tuning to obtain the final Stacking integrated regression model.
9. The seasonal apparel short-term sales prediction method based on weather forecast information according to claim 1, wherein: in step S6, the new feature index data is feature index data including weather information features.
CN202310579820.XA 2023-05-22 2023-05-22 Seasonal clothing short-term sales prediction method based on weather forecast information Pending CN116596580A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310579820.XA CN116596580A (en) 2023-05-22 2023-05-22 Seasonal clothing short-term sales prediction method based on weather forecast information

Publications (1)

Publication Number Publication Date
CN116596580A true CN116596580A (en) 2023-08-15

Family

ID=87595197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310579820.XA Pending CN116596580A (en) 2023-05-22 2023-05-22 Seasonal clothing short-term sales prediction method based on weather forecast information

Country Status (1)

Country Link
CN (1) CN116596580A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117035848A (en) * 2023-10-10 2023-11-10 山东浪潮新世纪科技有限公司 Instant lottery sales predicting method, device, equipment and medium
CN117710008A (en) * 2024-02-06 2024-03-15 贵州师范大学 Ecological product sales information management system suitable for karst region
CN117710008B (en) * 2024-02-06 2024-04-30 贵州师范大学 Ecological product sales information management system suitable for karst region

Similar Documents

Publication Publication Date Title
CN116596580A (en) Seasonal clothing short-term sales prediction method based on weather forecast information
JP7120649B2 (en) Information processing system, information processing device, prediction model extraction method, and prediction model extraction program
CN109741082B (en) Seasonal commodity demand prediction method based on time series decomposition
CN101783004A (en) Fast intelligent commodity recommendation system
CN116431931B (en) Real-time incremental data statistical analysis method
CN112149352B (en) Prediction method for marketing activity clicking by combining GBDT automatic characteristic engineering
CN113159461A (en) Small and medium-sized micro-enterprise credit evaluation method based on sample transfer learning
JP5031715B2 (en) Product demand forecasting system, product sales volume adjustment system
US20030120370A1 (en) Electric power consumer data analyzing method
CN116579804A (en) Holiday commodity sales prediction method, holiday commodity sales prediction device and computer storage medium
CN116468536A (en) Automatic risk control rule generation method
CN114372848A (en) Tobacco industry intelligent marketing system based on machine learning
WO2021252815A1 (en) Activity level measurement using deep learning and machine learning
CN116304374B (en) Customer matching method and system based on package data
CN111950775A (en) Economic operation monitoring method based on big data
CN115952914A (en) Big data-based electric power metering operation and maintenance work judgment planning method
CN115689713A (en) Abnormal risk data processing method and device, computer equipment and storage medium
CN114971805A (en) Electronic commerce platform commodity intelligent analysis recommendation system based on deep learning
TW202312060A (en) Prediction devices and methods for predicting whether users belong to valuable user groups based on short-term user characteristics, and storage media for storing the methods
Arai et al. Customer Profiling Method with Big Data based on BDT and Clustering for Sales Prediction
Huang et al. Sales forecast for O2O services-based on incremental random forest method
WO2013055257A1 (en) Method for predicting a target for events on the basis of an unlimited number of characteristics
JPH08212191A (en) Commodity sales estimation device
CN116777508B (en) Medical supply analysis management system and method based on big data
Lin et al. A Practical Framework for Forecasting Stock Keeping Unit Level Seasonal Sales

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination