WO2019183973A1 - Prediction method and prediction apparatus for clothing sales based on machine learning - Google Patents

Prediction method and prediction apparatus for clothing sales based on machine learning

Info

Publication number
WO2019183973A1
Authority
WO
WIPO (PCT)
Prior art keywords
social media
data
sales
network
magazines
Prior art date
Application number
PCT/CN2018/081470
Other languages
English (en)
French (fr)
Inventor
葛仪文
姚磊
廖骁
任智锋
Original Assignee
香港纺织及成衣研发中心有限公司
Priority date
Filing date
Publication date
Application filed by 香港纺织及成衣研发中心有限公司
Priority to PCT/CN2018/081470
Publication of WO2019183973A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce

Definitions

  • The present invention relates to the field of information processing technology, and in particular to a machine learning-based prediction method and prediction apparatus for clothing sales.
  • Existing forecasts of fashion trends for fashion products rely mainly on manually screening information and entering it into a system for prediction, so information cannot be obtained, filtered, and fed into the system automatically. Manual screening strongly influences the results and cannot be automated.
  • The results of existing methods are also limited to analyzing fashion color trends; they cannot quantitatively predict future sales of fashion products in different colors.
  • Existing methods cannot make different predictions for different users' products; they only make general predictions for the fashion market as a whole. After obtaining a predicted trend, users still have to plan production and sales themselves.
  • The present invention collects social media posts and then applies statistics and artificial intelligence to build a sales forecasting model for fashion products, such as a color sales forecasting model.
  • One technical problem solved by the present invention is how to use natural language processing together with machine learning and statistical modeling techniques to capture more accurately the relationship between social media posts and the sales of colors and fashion apparel, and thereby build a machine learning-based color sales forecasting model for fashion products.
  • The invention provides a machine learning-based method for predicting clothing sales, comprising the following steps:
  • The social media data includes at least the content posted publicly on social media and information about each post, for example one or more (or all) of: the publisher, the number of reads, the number of reposts, the number of comments, and the number of likes.
  • The publishers include at least one or more (or all) of: brands, designers, magazines, and internet celebrities.
  • The method also includes calculating the time lag between the social media data and actual product sales and taking it into account in the forecast.
  • The social media data includes at least social media post data, Google Trends data, and color sales data.
  • The method further includes computing the mean squared error (MSE) between the social media data and the sales data to obtain the optimal lag time, $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(X_i - Y_i)^2$, where:
  • X i is a given type of standardized social media data for a particular product at a particular time;
  • Y i is the standardized actual sales data for that product at a time offset by a specific lag;
  • n is the number of social media data values summed.
  • The apparel sales forecasting model is a linear model, in which
  • the predicted standardized sales volume Y i is obtained from the following equation:
  • $Y_i = A + W_{i1}X_{i1} + W_{i2}X_{i2} + W_{i3}X_{i3} - W_{i4}X_{i4} + W_{i5}X_{i5} + W_{i6}X_{i6} - W_{i7}X_{i7} - W_{i8}X_{i8} - W_{i9}X_{i9} + W_{i10}X_{i10} + W_{i11}X_{i11} + W_{i12}X_{i12} - W_{i13}X_{i13} + W_{i14}X_{i14}$
  • i denotes the index of the specific product in a specific period;
  • X ij denotes the j-th type of standardized social media data for that product at that time;
  • A is the fitted intercept of the model.
  • The prediction method further comprises predicting with support vector regression, where:
  • i is the index of the particular color product in a particular period;
  • d is the shortest distance from the optimized hyperplane H of the two sets of data to the nearest positive and negative points, this shortest distance being expressed as 1/
  • X 1: discount rate
  • X 2: suggested retail price
  • X 3: number of occurrences of brands in social media
  • X 4: number of favorites or likes of brands in social media
  • X 5: number of comments on brands in social media
  • X 6: number of shares of designers' posts in social media
  • X 7: number of occurrences of magazines in social media
  • X 8: number of shares of magazines' posts in social media
  • X 9: number of favorites or likes of magazines in social media
  • X 10: number of comments on magazines in social media
  • X 11: number of occurrences of internet celebrities in social media
  • X 12: number of favorites or likes of internet celebrities in social media
  • X 13: number of comments on internet celebrities in social media
  • X 14: SVI (search volume index)
  • w and b are the parameters to be estimated, calculated using Lagrange multipliers;
  • y i is the sales volume;
  • two slack variables ξ i and ξ i * are introduced to allow for errors;
  • C is the constant of the regularization term in the Lagrangian.
  • The variables X 1 -X 14 , the parameters w and b (calculated using Lagrange multipliers), and the sales volume y i in the subsequent equation are defined as above.
  • The method uses the apparel sales forecasting model to predict the sales of each color of an apparel product.
  • The present invention also provides a machine learning-based device for predicting clothing sales using the above method, comprising the following modules:
  • a first storage module that stores a sales history database containing sales history data;
  • a collection module that collects social media data from social media networks;
  • a second storage module that stores a social media database containing the collected social media data;
  • a model building module that builds a clothing sales forecasting model from the sales history data and the social media data; and
  • a prediction module that predicts apparel sales using the constructed model.
  • The present invention also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the following steps:
  • predicting clothing sales using the clothing sales forecasting model.
  • When executed by a processor, the computer program implements the method described above.
  • FIG. 1 illustrates a machine learning-based method of predicting clothing sales according to one embodiment of the present invention.
  • FIG. 2 is a schematic diagram showing whether social media data and color sales are linearly correlated, according to one embodiment of the present invention.
  • FIG. 3 shows the percentage of sales of garments of each color from 2015 to 2016.
  • FIG. 4 shows the weekly percentage of sales for each color category from 2015 to 2016.
  • FIG. 5 shows the mean squared error between each standardized variable and color sales for time lags of 2 to 52 weeks, according to one embodiment of the present invention.
  • FIG. 6 shows the coefficients of the emerging social media data according to one embodiment of the present invention.
  • FIGS. 7A-7D show diagnostic analyses performed to check the validity of the assumptions of the linear color sales prediction model, according to one embodiment of the present invention.
  • FIGS. 8A and 8B illustrate the separation of data by an optimized hyperplane according to one embodiment of the present invention.
  • FIG. 9 shows a machine learning-led color sales prediction model created with support vector regression, according to one embodiment of the present invention.
  • FIG. 10 is a schematic diagram of 10-fold cross-validation of color sales quantities using social media post variables and Google Trends data, according to one embodiment of the present invention.
  • FIG. 11 shows the results of a further 10-fold cross-validation according to one embodiment of the present invention.
  • The invention provides an artificial intelligence method for predicting color sales of fashion products from social media post data: it analyzes social media information together with the sales data of each color of a fashion product and models the relationship between the two.
  • One object of the present invention is to determine the time lag between a reaction in social media posts and the sale of the actual product, and then to build a sales forecasting model based on this lag.
  • A preferred embodiment of the present invention provides a method of predicting apparel sales, in particular color sales.
  • FIG. 1 illustrates a machine learning-based method for predicting clothing sales according to a preferred embodiment of the present invention, comprising the steps of: storing sales history data in a sales history database; collecting social media data from social media networks and storing the collected data in a social media database; building a clothing sales forecasting model from the sales history data and the social media data; and using the model to predict clothing sales.
  • Media information relevant to clothing prediction is collected from the web according to preset keywords, for example through social media application programming interfaces and automated testing tools such as Selenium WebDriver, and serves as the basis for the clothing color sales forecast.
  • Social media post data is collected from networks such as Facebook, Weibo, Twitter, blogs, Qzone, and website message boards, that is, from any medium through which data is published on the web.
  • The data includes the content of public posts and information about each post, including the publisher and the numbers of reads, reposts, comments, and likes.
  • Public posts may include text in Chinese and English. Data captured from the web is stored in the database.
  • The data obtained above is analyzed to detect the relationship between social media posts and fashion product sales.
  • The data analysis uses natural language processing to convert the collected information, including the collected text, into corresponding keywords.
  • The keywords include descriptions of garment characteristics and terms for the garment products themselves: the characteristic descriptions are descriptive words for color, style, and fabric, and the product terms are nouns naming garment products.
  • The analysis further includes analyzing the keywords in the stored data together with the information about each post, and counting, for each keyword, the total number of occurrences, reposts, comments, and likes.
  • The statistics can be compiled per week and grouped by publisher category, for example brands, designers, magazines, and internet celebrities.
  • The statistics are stored in memory, in particular in a database, for example on a server.
  • Publishers are selected by applying a threshold to their number of "likes", for example the thresholds shown in Tables 1 and 2 below:
  • This gives the number of accounts in each publisher category that meet the threshold on social media such as Facebook and Weibo.
  • The publisher categories and numbers of accounts according to one embodiment are shown in Tables 3 and 4 below:
  • Google Trends is a public web tool based on Google Search that shows how often a specific search term is entered relative to the total search volume in each region and language worldwide. In this application it is used to mine the search frequency of the terms for the different colors during the period covered by the color sales data.
  • Google Trends data is related to color sales to some extent. The relationship can be examined with statistical software, as shown in FIG. 2, where a darker color indicates a stronger linear relationship.
  • An artificial intelligence prediction model is then built, for example with analysis software. It forecasts the color sales of fashion products according to the characteristics of the user's product market: it finds the time lag between social media posts and actual product color sales, and uses the latest social media posts to predict future color sales of clothing products, thereby producing an apparel sales forecast.
  • Actual sales data is then used to optimize the model so that its predictions better match the characteristics of the product market.
  • A specific method may be as follows:
  • (1) Establish a historical sales database: store the historical sales data of each user's individual products in the historical sales database, including three fashion product characteristics (color, style, and fabric) and four items of business information (suggested unit price, sales volume, actual unit price, and inventory).
  • The historical sales data covers a period of six months or more, such as one year, preferably two years.
  • Social media post data collected from the web is stored in the social media statistics database and covers a period of at least one week, such as 10 weeks, 20 weeks, half a year, or one year.
  • A machine learning algorithm is used to build the prediction model, including adjusting the weights of the four publisher categories in the social media posts, and actual sales data is used to test and optimize the model.
  • The data that affects sales volume is identified, including the information on each social media post: the total number of occurrences, reposts, comments, and likes.
  • Component models, such as a linear model or a support vector machine model, are trained and fitted on historical data to obtain a sales model driven by social media data.
  • The sales model is then tuned against historical data: the weekly sales predicted by the model are compared with the historical data, and the model parameters are adjusted so that the predicted sales volume approaches the actual sales volume. This is how the sales model is trained.
  • A method of predicting apparel sales includes the following steps. First, all keywords (in English and Chinese) relating to colors and fashion apparel, such as denim, jacket, and suit, are identified across different social media channels on the web, and the mined social media posts are filtered by these keywords (see Table 6). There are 955 Facebook and 563 Weibo color keywords, and 872 Facebook and 447 Weibo fashion apparel keywords. Compared with Table 5, filtering by the color and fashion apparel keywords retains 4.7% of the Facebook data and 4.1% of the Weibo data.
  • The natural language processing has three main steps. First, the 5% most frequent words are removed. Second, some adjectives and adverbs, and keywords not directly related to color and fashion apparel, are removed. Third, misleading color phrases such as "vinyl" and "gold" are removed. These steps further filter the social media posts. Table 7 shows the number of Facebook and Weibo posts in each publisher category after natural language processing. Compared with the numbers in Table 6, natural language processing retains 85% of the Facebook data and 73% of the Weibo data.
  • This natural language processing removes social media posts unrelated to color and fashion apparel, which improves the prediction accuracy of the sales prediction model.
  • Before the color sales prediction model is built, the sales data provided by the user can be classified by color, for example into black, gray, red, green, yellow, purple, orange, brown, blue, and white.
  • The sales data is then aggregated according to these color categories.
  • FIG. 3 shows the percentage of sales for each color: black, gray, and blue together account for more than 50% of total sales.
  • Figure 4 shows the percentage of sales per week for each color category from 2015 to 2016.
  • A mean squared error method is used to find the time-lag relationship.
  • The present invention processes the social media post variables with the standard normal variate (SNV) transformation.
  • The standard normal variate is a data preprocessing method that standardizes the social media post variables, namely the total number of occurrences, reposts, comments, and likes of each keyword, by subtracting the mean and scaling by the standard deviation.
  • The likes data and Google Trends data are standardized in the same way, as are the color sales data.
  • The mined social media data is classified, for example by the number of reposts or likes for a given style; this classification is based on the captured keywords.
  • i is the index of a specific product by year, week, and color category; its maximum value is 2 (years, 2015-2016) × 52 (weeks) × 10 (colors) = 1040;
  • M i is the actual sales value of a particular product at a particular time, for example the sales of a garment such as a dress in a certain color (e.g., red) in a certain week of the year;
  • $\bar{M}$ is the mean of the actual sales data for that product over a longer period, which is longer than the particular period and preferably includes it, for example the average sales of the red dress over eight or ten weeks;
  • $\sigma_M$ is the standard deviation of the actual sales data for that product over the same longer period, for example the standard deviation of the sales of the red dress over eight or ten weeks.
  • The standard deviation σ can be calculated in the usual way, for example as the sample standard deviation $S = \sqrt{\frac{1}{n-1}\sum_{k=1}^{n}(X_k - \bar{X})^2}$, where $\bar{X}$ is the mean of the samples X1, X2, ..., Xn used.
  • For formula (1.2), i has the same meaning as above (at most 2 × 52 × 10 = 1040);
  • L i is the value of a given type of social media data for a particular product in a particular period, for example for a dress in a certain color (e.g., red) in a certain week; the data types include social media post data, Google Trends data, and color sales data;
  • $\bar{L}$ is the mean of that social media data for the product over a longer period, which is longer than the particular period and preferably includes it, for example over eight or ten weeks for the red dress;
  • $\sigma_L$ is the standard deviation of that social media data for the product over the same longer period, for example for a dress in a certain color (e.g., red).
  • The optimal lag time is used to calculate the predicted sales when forecasting from the captured media data. For a given type of media data, the smaller the mean squared error, the stronger the correspondence between the actual weekly sales and the social media data.
  • The difference between the week of the actual sales and the week in which the social media data was captured indicates which week a forecast refers to. For example, if the MSE is smallest when the two weeks are 8 weeks apart, the forecast is the sales value 8 weeks later.
  • Σ denotes summation over the specific periods required for the calculation;
  • X i is a given type of standardized social media data for a particular product at a specific time, calculated by formula (1.2) above;
  • Y i is the standardized actual sales data for that product at the lagged time, calculated by formula (1.1) above;
  • n is the number of social media data values summed.
  • A linear model relating color sales to these variables is then established.
  • FIG. 6 shows that the total number of occurrences (brand), the total number of likes (brand), the total number of comments (brand), the total number of occurrences (magazine), and the Google Trends data are highly important, and brand data appears to be more important than the other variables.
  • X 1 -X 14 denote the different types of social media data converted into standardized statistics (for example the total number of occurrences, reposts, comments, and likes) by formula (1.2) above.
  • X i1: discount rate
  • X i2: suggested retail price
  • X i3: number of occurrences of brands in social media
  • X i4: number of favorites or likes of brands in social media
  • X i5: number of comments on brands in social media
  • X i6: number of shares of designers' posts in social media
  • X i7: number of occurrences of magazines in social media
  • X i8: number of shares of magazines' posts in social media
  • X i9: number of favorites or likes of magazines in social media
  • X i10: number of comments on magazines in social media
  • X i11: number of occurrences of internet celebrities in social media
  • X i12: number of favorites or likes of internet celebrities in social media
  • X i13: number of comments on internet celebrities in social media
  • X i14: SVI (search volume index)
  • Y i is the standardized sales volume for a certain color in a certain week of a certain year.
  • W ij are the weights of the linear model, learned from training data, for example with the least-squares estimator $W = (X^{T}X)^{-1}X^{T}Y$, where X is the matrix of (at least one of) the variables X i1 -X i14 , $X^{T}$ is the transpose of X, and $(X^{T}X)^{-1}$ is the inverse of $X^{T}X$.
  • Y i is the standardized sales volume of a specific product in a specific period (for example a certain color in a certain week or year); it is related to the actual sales volume by formula (4), where:
  • i is the index of the specific product in a particular period;
  • Y' i is the actual sales data for that product in that period;
  • $\bar{Y'}$ is the mean of the actual sales data for that product over a longer period;
  • $\sigma_{Y'}$ is the standard deviation of the actual sales data for that product over a longer period.
  • Individual social media data is collected from the web through social media application programming interfaces and automated testing tools such as Selenium WebDriver, according to the preset keywords. Because all data is aggregated as totals by year, week, and color, the social media values (total number of occurrences, reposts, comments, and likes) are counted on that basis. The value 0.60 in the formula is the intercept, obtained by training the model repeatedly on existing data. In practice, the Facebook and Weibo data obtained after natural language processing is used for training first.
  • For example, the figure of "36009" for fashion brands on Facebook in Table 6 is only the number of social media posts; it must be converted into statistics (total number of occurrences, reposts, comments, likes, etc.) before it can be used for model training.
  • The weight of each parameter depends on its importance: data with a positive effect on sales volume is added, and data with a negative effect is subtracted. According to one embodiment of the present invention, the weights may lie in the following ranges.
  • The weights lie in the following ranges, with the value used for prediction in the specific embodiment given in parentheses: X i1 between 1 and 2 (1.44); X i2 between 0.0005 and 0.002 (0.001); X i3 between 0.1 and 0.4 (0.3); X i4 between 3 and 5 (4.64); X i5 between 3 and 5 (4.71); X i6 between 0.05 and 0.2 (0.10); X i7 between 0.05 and 0.2 (0.13); X i8 between 0.02 and 0.1 (0.05); X i9 between 0.3 and 1 (0.86); X i10 between 0.1 and
  • The brand count is the number of times a brand is mentioned on social media. Comments on magazines and internet celebrities, whether positive or negative, are included in the statistics as long as the social media post relates to color and fashion.
  • A machine learning method is used to find the optimal model for prediction from multiple variables.
  • FIGS. 7A-7D show diagnostic analyses performed to check the validity of the assumptions of the linear color sales prediction model.
  • In the normal Q-Q plot at the upper right, the points lie almost on a straight line, consistent with the normality assumption.
  • the results show that the coefficient of determination increases from 0.433 to 0.44.
  • The present invention uses support vector regression to create a machine learning-led color prediction model.
  • Support vector machines are supervised learning models with associated learning algorithms that analyze data for classification and regression. When used as a regression method, the algorithm retains all of its main features (such as the maximum margin).
  • Support vector regression uses the nonlinear mapping functions of the support vector machine to map the training data into a higher-dimensional space, and then performs linear regression there to separate the data and carry out the regression analysis. The mapping is performed with a predetermined kernel function, and the data is separated by finding an optimized hyperplane.
  • FIGS. 8A and 8B illustrate how the optimized hyperplane separates the data: FIG. 8A shows separation by different possible hyperplanes, which leave narrower margins between the two sets of data, and FIG. 8B shows the optimized hyperplane, which maximizes the margin of separation.
  • FIG. 9 shows two sets of data (green and red) and their respective linear function planes H1 and H2.
  • The green and red points on lines H1 and H2 are the support vectors, and H is the optimized hyperplane for the two sets of data. Note that d+ and d- are the shortest distances from plane H to the nearest positive and negative points; their sum is the maximum margin of this hyperplane.
  • i is the index of the specific color product in a specific period;
  • y i is the standardized sales volume of that color product in that period;
  • w and b are calculated using Lagrange multipliers as follows; the results can be compared with those obtained with the linear model.
  • X 1: discount rate
  • X 2: suggested retail price
  • X 3: number of occurrences of brands in social media
  • X 4: number of favorites or likes of brands in social media
  • X 5: number of comments on brands in social media
  • X 6: number of shares of designers' posts in social media
  • X 7: number of occurrences of magazines in social media
  • X 8: number of shares of magazines' posts in social media
  • X 9: number of favorites or likes of magazines in social media
  • X 10: number of comments on magazines in social media
  • X 11: number of occurrences of internet celebrities in social media
  • X 12: number of favorites or likes of internet celebrities in social media
  • X 13: number of comments on internet celebrities in social media
  • X 14: SVI (search volume index)
  • Equations (5) and (6) are used together to find the optimal parameters with Lagrange multipliers.
  • C is the constant of the regularization term in the Lagrangian. It sets the penalty for prediction errors larger than d and balances the training error of the model against its flatness.
  • The aim is to find a value of C that strikes a compromise between the flatness of the linear function and d.
  • This constrained optimization problem is a quadratic programming problem and can be solved with Lagrange multipliers.
  • the equations for the following regression estimates are obtained by the associated algorithm and the optimized process:
  • ⁇ i and ⁇ i * are Lagrangian Multipliers.
  • K(.) is a kernel function that projects the training data into a three-dimensional space so that it can be linearly segmented.
  • Table 8 shows the three most commonly used kernel functions.
  • the Radial basis function kernel can handle nonlinear conditions and is most often used.
  • ⁇ i and ⁇ i * are selected using Lagrangian Multipliers;
  • x i represents the social media data X 1 -X 14 described above;
  • b is selected by Lagrangian multiplier;
  • f(x) represents The sales data of Y i actually establishes the regression equation using equations (5) and (6).
  • Grid Search can be further used to select the optimal parameters of the model.
  • d is set to a distance between (0.0, 0.2), and then 10 cross-validations are performed.
  • the result is as shown in Fig. 11, wherein the optimum d and C are 0.11 and 256, respectively.
  • Y* and Y are the values of the test set that are not used for modeling in the predicted value and historical data, respectively. For example, for 100 historical data, 80 are used for modeling and model training, and 20 are used as test sets for testing.
  • Comparison of methods: the linear model forecast 2,428 units and support vector regression forecast 2,127 units, differing from the actual sales quantity by 1,180 and 1,481 units, respectively.
  • Embodiments of the invention may consist essentially of the features disclosed herein. Alternatively, embodiments of the invention may consist of the features disclosed herein. The invention illustratively disclosed herein may suitably be practiced in the absence of any element not specifically disclosed herein.

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

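A machine-learning-based method and apparatus for predicting clothing sales, and a computer-readable storage medium. The prediction method comprises the following steps: storing historical sales data in a sales history database; collecting social media data from social media networks and storing the collected social media data in a social media database; building a clothing sales prediction model from the historical sales data and the social media data, and using the model to predict clothing sales volume. The method makes effective use of machine learning techniques to improve the accuracy with which the relationship between social media posts and colors and fashion apparel is predicted.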

Description

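Machine-Learning-Based Method and Apparatus for Predicting Clothing Sales
Technical Field
The present invention relates to the field of information processing technology, and in particular to a machine-learning-based method and apparatus for predicting clothing sales.
Background
Clothing sales forecasts are an important input to budgeting and planning at fashion companies, and they matter for reducing inventory and thereby improving competitiveness and profit margins. For clothing companies, however, useful historical data are often scarce for various reasons, which makes it hard to set production volumes that both meet market demand and increase profit. When a sales problem occurs, companies find it hard to respond in time; their decisions lag, which is a disadvantage in the fast-changing clothing market. In addition, clothing companies may struggle to pinpoint the cause of a sales problem, and identifying it often takes a long investigation.
Existing forecasts of color trends for fashion products rely mainly on manually screened information that is then entered into a system for prediction; information cannot be collected, screened and fed into the system automatically. Manual screening strongly influences the results and prevents automated prediction. The results of existing methods are also limited to trend analysis of fashion colors and cannot quantitatively forecast future sales of fashion products in different colors. Moreover, existing methods cannot produce differentiated forecasts for a user's different products; they only forecast the general situation of the fashion market as a whole. After obtaining a predicted trend, users still have to plan production and sales themselves.
In recent years, with the development of information technology, media information spreads quickly and easily. On many social media platforms, brands, designers, magazines and internet influencers can all shape clothing trends and thereby affect clothing sales. Traditional methods cannot deliver the required forecast accuracy and reliability. There is therefore a strong need for a highly accurate and reliable method of predicting clothing sales to guide production at fashion companies effectively.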
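Summary of the Invention
By collecting social media posts and applying statistical methods and artificial intelligence, the present invention builds a widely applicable sales prediction model for fashion products, for example a per-color sales prediction model.
One technical problem addressed by the present invention is to make effective use of natural language processing, machine learning and statistical models to improve the accuracy of relating social media posts to colors and fashion apparel, and to build a machine-learning-based color sales prediction model for fashion products.
The present invention provides a machine-learning-based method for predicting clothing sales, comprising the following steps:
storing historical sales data in a sales history database;
collecting social media data from social media networks and storing the collected social media data in a social media database;
building a clothing sales prediction model from the historical sales data and the social media data; and
predicting clothing sales volume using the model.
In one aspect, the social media data include at least the content of public posts on social media and information about each post, the information including, for example, one, several or all of the publisher, view count, repost count, comment count and number of likes. In a further aspect, the publishers include at least one, several or all of brands, designers, magazines and internet influencers.
In one aspect, the method further comprises computing, during prediction, the time-lag relationship between the social media data and actual product sales. In one aspect, the historical sales data are standardized as $Z_i = (M_i - \mu)/\sigma$, where i is the index of a specific product in a specific period; $M_i$ is the actual sales of that product in that period; μ is the mean of the product's actual sales over a longer period; and σ is the standard deviation of the product's actual sales over that longer period.
In one aspect, the social media data include at least social media post data, Google Trends data and color sales data.
In one aspect, the social media data are standardized as $X_{ij} = (L_{ij} - \mu_j)/\sigma_j$, where i is the index of a specific product in a specific period; j is a specific type of social media data; $L_{ij}$ is a given social media data value of that product in that period; $\mu_j$ is the mean of that social media data for the product over a longer period; and $\sigma_j$ is the standard deviation of that social media data for the product over that longer period.
In one aspect, the method further comprises a step of computing the mean square error (MSE) of the sales data to obtain the most suitable (optimal) lag time: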
Figure PCTCN2018081470-appb-000001
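where $X_i$ is a given type of standardized social media data for a specific product in a specific period, $Y_i$ is the standardized actual sales of that product in another period separated from it by the time lag, and n is the number of social media data types summed.
In one aspect, the clothing sales prediction model is a linear model, and the predicted standardized sales volume $Y_i$ is obtained from the following equation:
$$Y_i = A - W_{i1}X_{i1} + W_{i2}X_{i2} + W_{i3}X_{i3} - W_{i4}X_{i4} + W_{i5}X_{i5} + W_{i6}X_{i6} - W_{i7}X_{i7} - W_{i8}X_{i8} - W_{i9}X_{i9} + W_{i10}X_{i10} + W_{i11}X_{i11} + W_{i12}X_{i12} - W_{i13}X_{i13} + W_{i14}X_{i14}$$
where i is the index of a specific product in a specific period, $X_{ij}$ (j = 1, 2, 3, …, 14) is a given type of standardized social media data for that product in that period, $W_{ij}$ (j = 1, 2, 3, …, 14) is the weight of each standardized social media variable, and A is the model constant (intercept).
In one aspect, in the equation for the predicted standardized sales volume $Y_i$: $X_{i1}$ = discount rate; $X_{i2}$ = suggested retail price; $X_{i3}$ = count of brand appearances in social media; $X_{i4}$ = number of favorites or likes of brands in social media; $X_{i5}$ = number of comments on brands in social media; $X_{i6}$ = number of shares of designers in social media; $X_{i7}$ = count of magazines in social media; $X_{i8}$ = number of shares of magazines in social media; $X_{i9}$ = number of favorites or likes of magazines in social media; $X_{i10}$ = number of comments on magazines in social media; $X_{i11}$ = count of internet influencers in social media; $X_{i12}$ = number of favorites or likes of internet influencers in social media; $X_{i13}$ = number of internet influencer comments in social media; $X_{i14}$ = SVI.
In one aspect, the predicted standardized sales volume is
$$Y_i = 0.60 - 1.44X_{i1} + 0.001X_{i2} + 0.30X_{i3} - 4.64X_{i4} + 4.71X_{i5} + 0.10X_{i6} - 0.13X_{i7} - 0.05X_{i8} - 0.86X_{i9} + 1.03X_{i10} + 0.09X_{i11} + 5.14X_{i12} - 5.12X_{i13} + 0.28X_{i14}$$
where $X_{i1}$ = discount rate; $X_{i2}$ = suggested retail price; $X_{i3}$ = count of brand appearances in social media; $X_{i4}$ = number of favorites or likes of brands in social media; $X_{i5}$ = number of comments on brands in social media; $X_{i6}$ = number of shares of designers in social media; $X_{i7}$ = count of magazines in social media; $X_{i8}$ = number of shares of magazines in social media; $X_{i9}$ = number of favorites or likes of magazines in social media; $X_{i10}$ = number of comments on magazines in social media; $X_{i11}$ = count of internet influencers in social media; $X_{i12}$ = number of favorites or likes of internet influencers in social media; $X_{i13}$ = number of internet influencer comments in social media; $X_{i14}$ = SVI.
In one aspect, the prediction method further comprises making predictions using support vector regression.
In one aspect, two sets of data and their linear function $f(x) = \sum w \cdot x + b$ are used for prediction, and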
Figure PCTCN2018081470-appb-000002
Figure PCTCN2018081470-appb-000003
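where i is the index of a specific color product in a specific period, $x_{ij}$ (j = 1, 2, 3, …, 14) is a given type of standardized social media data for that product in that period, $w_{ij}$ (j = 1, 2, 3, …, 14) is the weight of each standardized social media variable, and d is the shortest distance from the optimized hyperplane H of the two data sets to the nearest positive and negative points; this shortest distance is $1/\lVert w\rVert$, and d is maximized by minimizing $\lVert w\rVert$, thereby optimizing the linear function.
In one aspect, $X_1$ = discount rate; $X_2$ = suggested retail price; $X_3$ = count of brand appearances in social media; $X_4$ = number of favorites or likes of brands in social media; $X_5$ = number of comments on brands in social media; $X_6$ = number of shares of designers in social media; $X_7$ = count of magazines in social media; $X_8$ = number of shares of magazines in social media; $X_9$ = number of favorites or likes of magazines in social media; $X_{10}$ = number of comments on magazines in social media; $X_{11}$ = count of internet influencers in social media; $X_{12}$ = number of favorites or likes of internet influencers in social media; $X_{13}$ = number of internet influencer comments in social media; $X_{14}$ = SVI; w and b are the parameters to be estimated, computed using Lagrange multipliers; and $y_i$ denotes sales volume.
In one aspect, two slack variables $\xi_i$ and $\xi_i^*$ are introduced to account for errors, so as to
minimize: $\tfrac{1}{2}\lVert w\rVert^2 + C\sum_i(\xi_i + \xi_i^*)$
subject to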
Figure PCTCN2018081470-appb-000004
Figure PCTCN2018081470-appb-000005
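$\xi_i, \xi_i^* \ge 0$
where C is the constant of a regularization term in the Lagrangian; $X_1$ = discount rate; $X_2$ = suggested retail price; $X_3$ = count of brand appearances in social media; $X_4$ = number of favorites or likes of brands in social media; $X_5$ = number of comments on brands in social media; $X_6$ = number of shares of designers in social media; $X_7$ = count of magazines in social media; $X_8$ = number of shares of magazines in social media; $X_9$ = number of favorites or likes of magazines in social media; $X_{10}$ = number of comments on magazines in social media; $X_{11}$ = count of internet influencers in social media; $X_{12}$ = number of favorites or likes of internet influencers in social media; $X_{13}$ = number of internet influencer comments in social media; $X_{14}$ = SVI; w and b are the parameters to be estimated, computed using Lagrange multipliers; and $y_i$ denotes sales volume.
In one aspect, the method uses the clothing sales prediction model to predict the sales volume of each color of a clothing product.
The present invention further provides a machine-learning-based apparatus for predicting clothing sales that uses the above method, comprising the following modules:
a first storage module storing a sales history database containing historical sales data;
a collection module that collects social media data from social media networks;
a second storage module storing a social media database containing the collected social media data;
a model building module that builds a clothing sales prediction model from the historical sales data and the social media data;
a prediction module that predicts clothing sales volume using the built model.
The present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
storing historical sales data in a sales history database;
collecting social media data from social media networks and storing the collected social media data in a social media database;
building a clothing sales prediction model from the historical sales data and the social media data; and
predicting clothing sales volume using the clothing sales prediction model.
In one aspect, the computer program, when executed by a processor, implements the method described above.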
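Brief Description of the Drawings
Embodiments of the present disclosure and their advantages are best understood by reference to the following detailed description. It should be understood that like reference numerals are used to denote like elements shown in one or more of the figures.
Figure 1 shows a machine-learning-based method for predicting clothing sales according to an embodiment of the present invention.
Figure 2 is a schematic diagram showing whether social media data are linearly correlated with color sales volume, according to an embodiment of the present invention.
Figure 3 shows the percentage of clothing sales by color from 2015 to 2016.
Figure 4 shows the weekly sales percentage of each color category from 2015 to 2016.
Figure 5 shows the mean square error between each standardized variable and color sales at time lags of 2 to 52 weeks, according to an embodiment of the present invention.
Figure 6 shows the coefficients of the social media data appearing in the model, according to an embodiment of the present invention.
Figures 7A-7D are schematic diagrams of diagnostic analyses performed to check the validity of the assumptions of the linear color sales prediction model, according to an embodiment of the present invention.
Figures 8A and 8B are schematic diagrams of an optimized hyperplane separating data, according to an embodiment of the present invention.
Figure 9 shows a machine-learning-driven color sales prediction model created using support vector regression, according to an embodiment of the present invention.
Figure 10 is a schematic diagram of 10-fold cross-validation of color sales quantity against social media post variables and Google Trends data, according to an embodiment of the present invention.
Figure 11 is a schematic diagram of the results of another 10-fold cross-validation, according to an embodiment of the present invention.
Detailed Description of the Embodiments
The present invention provides an artificial intelligence method for predicting color sales of fashion products from social media post data. It analyzes this media information together with per-color sales data of fashion products and builds a model to find the relationship between them. In other words, one object of the present invention is to find the time difference between a reaction expressed in social media posts and actual product sales, and then to build a sales prediction model on the basis of that time difference.
A preferred embodiment of the present invention provides a method for predicting clothing sales, in particular sales by color. Figure 1 shows a machine-learning-based method for predicting clothing sales according to a preferred embodiment, comprising the following steps: storing historical sales data in a sales history database; collecting social media data from social media networks and storing the collected social media data in a social media database; building a clothing sales prediction model from the historical sales data and the social media data, and using the model to predict clothing sales volume.
First, media information relevant to clothing prediction is collected from the web according to preset keywords, for example through social media application programming interfaces (APIs) and automated testing tools such as Selenium WebDriver, to serve as the basis for clothing color sales prediction. Social media post data are collected over the network; the social media may be Facebook, Weibo, Twitter, blogs, Qzone, website message boards or any other media through which data are published online. The data include the content of public posts and information about each post, including the publisher, view count, repost count, comment count and number of likes. The content of public posts may include text published in Chinese and English. The data scraped from the Internet are entered into a database. These social media posts come mainly from four categories of publisher: brands, designers, magazines and internet influencers.
The data obtained above are analyzed to examine the relationship between social media posts and fashion product sales. The analysis includes using natural language processing to convert the collected information, including the collected text content, into corresponding keywords. The keywords include descriptions of clothing product attributes and terms for the clothing products themselves: attribute descriptions are descriptive words for color, style and fabric, and product terms are nouns denoting clothing products.
According to one specific embodiment, the information analysis further includes analyzing the keywords in the stored data and the information about each post, and counting, for each keyword, the total number of occurrences, total reposts, total comments and total likes. The statistics may be computed per week and grouped by publisher category, for example the four categories of brands, designers, magazines and internet influencers. The statistics are stored in memory, in particular in a database, for example in memory on a server.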
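The weekly statistics grouped by publisher category can be sketched with pandas as follows. This is a minimal illustration under assumptions: the column names (`date`, `publisher_type`, `keyword`, `shares`, `comments`, `likes`) and the one-row-per-keyword-mention layout are hypothetical, not taken from the patent, and `shares` stands in for reposts.

```python
import pandas as pd

# Hypothetical keyword-level records: one row per (post, extracted keyword).
posts = pd.DataFrame({
    "date": pd.to_datetime(["2015-03-02", "2015-03-04", "2015-03-05", "2015-03-10"]),
    "publisher_type": ["brand", "brand", "magazine", "brand"],
    "keyword": ["red", "red", "red", "black"],
    "shares": [10, 5, 2, 7],      # reposts of the post
    "comments": [3, 1, 0, 4],
    "likes": [50, 20, 8, 30],
})


def weekly_keyword_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Total occurrences, shares, comments and likes per week, publisher type and keyword."""
    # Period "W" assigns each date to a Monday-to-Sunday week.
    df = df.assign(week=df["date"].dt.to_period("W"))
    return (
        df.groupby(["week", "publisher_type", "keyword"])
        .agg(
            occurrences=("date", "count"),
            shares=("shares", "sum"),
            comments=("comments", "sum"),
            likes=("likes", "sum"),
        )
        .reset_index()
    )


stats = weekly_keyword_stats(posts)
print(stats)
```

Grouping by week first keeps the output aligned with the weekly sales data used later; a real pipeline would read the records from the social media database rather than build them inline.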
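According to one specific embodiment, publishers are selected by setting a threshold on the number of "likes", for example the thresholds shown in Tables 1 and 2 below:
Facebook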
Figure PCTCN2018081470-appb-000006
Table 1
Weibo
Figure PCTCN2018081470-appb-000007
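Table 2
By setting a threshold on the number of "likes", the number of accounts in each publisher category that meet the threshold on social media such as Facebook and Weibo can be obtained. The publisher categories and account numbers according to one specific embodiment are shown in Tables 3 and 4 below:
Facebook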
Figure PCTCN2018081470-appb-000008
Table 3
Weibo
Figure PCTCN2018081470-appb-000009
Figure PCTCN2018081470-appb-000010
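Table 4
From the accounts obtained above, the total number of social media posts from January 1, 2013 to September 30, 2017 can be mined, as shown in Table 5 below.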
Figure PCTCN2018081470-appb-000011
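Table 5
In addition to counting the total occurrences, reposts, comments and likes of each keyword as above, Google Trends data (Search Volume Index, SVI) can be used to test whether these data are linearly correlated with color sales volume. Google Trends is a public web tool based on Google Search that shows how often a particular search term is entered relative to total search volume across regions and languages worldwide; in this application it can be used to mine the search frequency of terms for different colors during the color sales periods. Statistical analysis, which may be performed with statistical software, shows that Google Trends data bear a certain relationship to color sales volume, as shown in Figure 2, where darker colors indicate a stronger linear relationship.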
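One simple way to quantify the linear relationship between Google Trends data and color sales is the Pearson correlation coefficient. The patent does not name the statistic behind Figure 2, so the choice of Pearson's r here is an assumption, and the weekly series below are made-up illustration values.

```python
import numpy as np


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two equally long series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("series must have the same length")
    return float(np.corrcoef(x, y)[0, 1])


# Hypothetical weekly SVI for a color search term and weekly unit sales of that color.
svi = [40, 43, 50, 55, 61]
sales = [200, 210, 260, 280, 300]
r = pearson_r(svi, sales)
print(f"r = {r:.3f}")  # values near +1 indicate a strong positive linear relationship
```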
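Next, based on the data mined for a particular user's particular products, an artificial intelligence prediction model is built, for example using analysis software to build a sales prediction model. Color sales of fashion products are forecast according to the market characteristics of the user's products: the time-lag relationship between social media posts and actual product color sales is found, and the latest social media posts are used to predict future sales of each color, giving a forecast of clothing product sales. The model is then optimized using actual sales data so that its predictions better match the product's market characteristics.
A specific method of building an artificial intelligence prediction model for a particular user's particular products may be as follows:
(1) Build a historical sales database: store the historical sales data of a particular user's particular products, including three fashion product attributes (color, style and fabric) and four items of business information (suggested unit price, sales volume, actual unit price and inventory). The historical sales data may cover half a year or more, for example 1 year, preferably 2 years.
(2) Predefine the actual production lead time of the particular user's particular products, in weeks.
(3) Build a social media statistics database: store the social media post data collected from the web, covering a period of at least 1 week, for example 10 weeks, 20 weeks, 1 year or 0.5 year.
(4) Based on the actual production lead time, retrieve in turn each week's historical sales data from the database together with the social media statistics stored for the week before the production lead time.
(5) Use machine learning algorithms from artificial intelligence, such as artificial neural networks, decision trees or support vector machines, to build a prediction model for the particular user's particular products.
Preferably, building the prediction model with machine learning algorithms further includes adjusting the weights of the four publisher categories in the social media posts, and testing and optimizing the model against actual sales data. For example, the data that influence sales are first identified; these include the information about each social media post, such as total occurrences, total reposts, total comments and total likes. A model, for example a linear model or a support vector machine model, is built, trained and fitted on the historical data, giving a sales model for the social media data. In addition, at regular intervals, for example weekly, the sales model is retrained against historical data: the model's predicted sales for that week are compared with the historical figures, and the model parameters are adjusted so that predicted sales come closer to actual sales. This is how the sales model is trained.
Natural Language Processing Method
According to one specific embodiment, the method for predicting clothing sales comprises the following steps. First, all keywords (in English and Chinese) relating to colors and fashion apparel, such as denim, coat and suit, are identified across different social media channels, and the mined social media posts are filtered according to these keywords (see Table 6). According to the statistics, 955 color keywords were obtained for Facebook and 563 for Weibo, and 872 fashion-apparel keywords for Facebook and 447 for Weibo. Compared with Table 5, after filtering by color and fashion-apparel keywords, the data retention rates for Facebook and Weibo were 4.7% and 4.1%, respectively.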
Figure PCTCN2018081470-appb-000012
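Table 6
As a check, samples of the above Facebook and Weibo posts were manually reviewed. The results showed that 52% and 65% of the posts, respectively, were related to colors and fashion apparel.
The established color and fashion-apparel keywords are further screened to remove irrelevant social media posts, with the aim of improving the prediction accuracy of the sales prediction model. The natural language processing works in three main ways: first, the 5% of words with the highest frequency are removed; second, some adjectives, adverbs and other keywords not directly related to colors and fashion apparel are removed; third, some misleading color phrases, such as 黑胶 (vinyl, literally "black glue") and 黄金 (gold, literally "yellow gold"), are removed. After this natural language processing, the social media posts are filtered again. Table 7 shows the number of posts in each publisher category on Facebook and Weibo after natural language processing. Compared with the numbers in Table 6, the data retention rates after natural language processing were 85% for Facebook and 73% for Weibo.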
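The three filtering rules can be sketched as a token-level filter. This is a minimal sketch under assumptions: posts are already tokenized, "the 5% of words with the highest frequency" is read as the top 5% of distinct tokens by count, and the list of adjectives and adverbs unrelated to color and fashion (`non_fashion_words`) is a hypothetical input. The two misleading color phrases are the examples given in the description.

```python
from collections import Counter

# Color-like phrases that do not describe a garment's color (examples from the description):
# 黑胶 (vinyl record) and 黄金 (gold).
FALSE_COLOR_PHRASES = {"黑胶", "黄金"}


def filter_tokens(posts_tokens, non_fashion_words, top_fraction=0.05):
    """Apply the three filtering rules to a list of tokenized posts."""
    counts = Counter(tok for toks in posts_tokens for tok in toks)
    # Rule 1: drop the most frequent top_fraction of distinct tokens.
    n_drop = int(len(counts) * top_fraction)
    most_frequent = {tok for tok, _ in counts.most_common(n_drop)}
    # Rules 2 and 3: drop words unrelated to fashion and misleading color phrases.
    drop = most_frequent | set(non_fashion_words) | FALSE_COLOR_PHRASES
    return [[tok for tok in toks if tok not in drop] for toks in posts_tokens]


posts = [["the", "red", "dress"], ["the", "黑胶", "record"], ["the", "very", "black", "coat"]]
print(filter_tokens(posts, non_fashion_words=["very"], top_fraction=0.2))
```

With a vocabulary of only eight tokens, the example raises `top_fraction` to 0.2 so that exactly one token ("the") is dropped by rule 1.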
Figure PCTCN2018081470-appb-000013
Figure PCTCN2018081470-appb-000014
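Table 7
A 10% sample of the Facebook and Weibo posts in Table 7 was reviewed again. The results showed that 79% and 84% of the posts, respectively, were related to colors and fashion apparel; compared with the samples drawn from Table 6, the relevance accuracy for Facebook and Weibo increased by 27 and 19 percentage points, respectively.
Using machine learning, this 10% sample was split into 80% training data and 20% validation data, and a natural language processing machine learning model was trained on it. The resulting accuracy was 81% for Facebook and 85% for Weibo. Once trained, this natural language processing model can be used to predict whether other social media posts are related to colors and fashion apparel.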
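The description does not say which machine learning model was trained to classify posts as relevant or irrelevant. As an assumed stand-in, the sketch below uses a TF-IDF bag-of-words representation with logistic regression from scikit-learn and the 80/20 training/validation split described above; the toy posts and labels are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def train_relevance_classifier(texts, labels, seed=0):
    """Train on 80% of the labeled posts; return the classifier and its validation accuracy."""
    x_train, x_val, y_train, y_val = train_test_split(
        texts, labels, test_size=0.2, random_state=seed, stratify=labels
    )
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(x_train, y_train)
    return clf, clf.score(x_val, y_val)


# Made-up manually labeled sample: 1 = related to colors and fashion apparel, 0 = unrelated.
texts = [
    "new red dress for spring", "black denim jacket trend", "navy suit styling tips",
    "white sneakers and jeans", "pastel pink coat look",
    "stock market closes higher", "football match tonight", "new phone release date",
    "weather forecast says rain", "bank raises interest rates",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
clf, acc = train_relevance_classifier(texts, labels)
print(f"validation accuracy: {acc:.2f}")
```

On a real labeled sample, the validation accuracy printed here plays the role of the 81% and 85% figures reported above; on this toy set it says little.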
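Sales Quantity Statistics
After the social media posts related to fashion and color have been obtained and the posts unrelated to colors and fashion apparel have been removed by the above natural language processing to improve prediction accuracy, and before the color sales prediction model is built, the sales data provided by the user can be classified by color, for example into black, grey, red, green, yellow, purple, orange, brown, blue and white. The sales data are then totaled according to these color categories. Figure 3 shows the percentage of sales for each color; black, grey and blue together account for more than 50% of the total. Figure 4 shows the weekly sales percentage of each color category from 2015 to 2016.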
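Totaling sales by color category and computing each category's share, as plotted in Figure 3, might look like the sketch below. The mapping from raw product color names to the ten categories (`color_map`) and the column names are hypothetical; a real catalogue would need its own mapping.

```python
import pandas as pd

COLOR_CATEGORIES = ["black", "grey", "red", "green", "yellow",
                    "purple", "orange", "brown", "blue", "white"]


def color_sales_share(sales: pd.DataFrame, color_map: dict) -> pd.Series:
    """Map raw color names to categories; return each category's share of total units in %."""
    category = sales["color"].map(color_map)
    if category.isna().any():
        unmapped = sorted(sales.loc[category.isna(), "color"].unique())
        raise ValueError(f"unmapped colors: {unmapped}")
    totals = sales.groupby(category)["units"].sum()
    share = totals * 100.0 / totals.sum()
    # Categories with no sales are reported as 0%.
    return share.reindex(COLOR_CATEGORIES, fill_value=0.0)


# Hypothetical mapping from catalogue color names to the ten categories.
color_map = {"jet black": "black", "charcoal": "grey", "navy": "blue", "crimson": "red"}
sales = pd.DataFrame({
    "color": ["jet black", "charcoal", "navy", "crimson", "navy"],
    "units": [40, 10, 30, 10, 10],
})
share = color_sales_share(sales, color_map)
print(share)
```

Raising on unmapped colors, rather than silently dropping them, keeps the percentages honest when the catalogue introduces a new color name.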
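Time-Lag Relationship Between Social Media Posts and Actual Product Sales
There is usually a time lag between social media posts and actual product sales. In other words, to predict sales more accurately, the lag between a reaction on social media and the eventual purchase must be found. To build a color sales prediction model with predictive power, a preferred embodiment of the present invention uses the mean square error to find this time-lag relationship. In one example, the standard normal variate is used to process the social media post variables. The standard normal variate is a data preprocessing method that standardizes each social media post variable by removing the mean and scaling to unit variance; this covers the total occurrences, total reposts, total comments and total likes of each keyword and the Google Trends data, as well as the color sales data. The mined social media data are classified, for example by how many reposts and likes a given style received, and the classification is obtained through keyword extraction.
For the historical sales data $Z_i$, the standard normal variate (i.e., standardization) equation may be, for example:
$$Z_i = \frac{M_i - \mu}{\sigma} \tag{1.1}$$
where i is the index of a specific product in a specific period, combining year, week and color category; the maximum value of i is 2 (years 2015 to 2016) × 52 weeks × 10 colors = 1,040;
$M_i$ is the actual sales of a specific product in a specific period, for example the actual sales of a garment such as a dress in a particular color (such as red) in a given week of a given year;
μ is the mean of the specific product's actual sales over a longer period, which is longer than and preferably includes the specific period, for example the average sales of a red dress over 8 or 10 consecutive weeks; and
σ is the standard deviation of the specific product's actual sales over that longer period, which is longer than and preferably includes the specific period, for example the standard deviation of the sales of a red dress over 8 or 10 consecutive weeks.
The standard deviation σ can be computed by any existing method, for example as the standard deviation S given by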
Figure PCTCN2018081470-appb-000015
where
Figure PCTCN2018081470-appb-000016
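is the mean of the samples $X_1, X_2, \ldots, X_n$ used.
The social media data $X_{ij}$ are standardized according to the following equation:
$$X_{ij} = \frac{L_{ij} - \mu_j}{\sigma_j} \tag{1.2}$$
where i is the index of a specific product in a specific period, combining year, week and color category; the maximum value of i is 2 (years 2015 to 2016) × 52 weeks × 10 colors = 1,040;
j is a specific type of social media data;
$L_{ij}$ is a given social media data value of a specific product in a specific period, for example the value for a garment such as a dress in a particular color (such as red) in a given week of a given year, covering social media post data, Google Trends data and color sales data;
$\mu_j$ is the mean of that social media data for the specific product over a longer period, which is longer than and preferably includes the specific period, for example the average value for a red dress over 8 or 10 consecutive weeks;
$\sigma_j$ is the standard deviation of that social media data for the specific product over the longer period, which is longer than and preferably includes the specific period, for example the standard deviation for a red dress over 8 or 10 consecutive weeks.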
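Equations (1.1) and (1.2) can be sketched as a rolling z-score in which the "longer period" is a trailing window that includes the current week. The 8-week window follows the example in the text; using the sample standard deviation (pandas' default, ddof=1) is an assumption, since the formula for S appears only as an image, and the weekly values are made up.

```python
import pandas as pd


def standardize(series: pd.Series, window: int = 8) -> pd.Series:
    """Rolling z-score: (value - window mean) / window std, window ending at the current week."""
    mu = series.rolling(window).mean()
    sigma = series.rolling(window).std()  # sample standard deviation (ddof=1), an assumption
    return (series - mu) / sigma


# Weekly red-dress sales (Equation 1.1) and one social media variable (Equation 1.2).
weekly = pd.DataFrame({
    "sales": [120, 135, 150, 110, 160, 170, 140, 155, 180, 165],
    "brand_likes": [900, 950, 1200, 800, 1300, 1250, 1000, 1100, 1400, 1350],
})
z = weekly.apply(standardize, window=8)
print(z.round(3))
```

The first `window - 1` weeks are NaN because a full window is not yet available; a fixed reference period would be an equally valid reading of the text.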
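Next, the mean square error (MSE) is computed to obtain the most suitable lag time. The optimal lag time determines which future week's sales a prediction made from scraped media data refers to. For a given type of media data, a smaller MSE means a stronger correspondence between that week's actual sales and the social media data. The difference between the week of the actual sales and the week in which the social media data were scraped indicates which week the prediction refers to. For example, if the MSE is smallest when the two weeks are 8 weeks apart, the predicted value is the sales 8 weeks later. Each standardized variable (the statistically derived total occurrences, total reposts, total comments and total likes of keywords, and the Google Trends data) is subtracted from the time-lagged color sales data to obtain the mean square error: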
Figure PCTCN2018081470-appb-000017
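where ∑ denotes summation over the specific periods considered; $X_i$ is a given type of standardized social media data for a specific product in a specific period, computed with Equation (1.2); $Y_i$ is the standardized actual sales of the specific product in the specific period, computed with Equation (1.1); and n is the number of social media data types summed.
Taking total occurrences as an example, the data for the ten color categories in each of the 52 weeks of each year (2015 and 2016) are totaled, which in theory gives 2 × 52 × 10 = 1,040 data combinations. The standardized color sales data are then shifted back by 2 to 52 weeks, compared point by point with the standardized total occurrences, and the MSE for each lag is computed with Equation (2) above. Figure 5 shows the MSE between each standardized variable and color sales at lags of 2 to 52 weeks, and Table 8 lists the minimum MSE of each variable in Figure 5 and its best-fitting time lag.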
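The lag search can be sketched as follows: for each candidate lag k of 2 to 52 weeks, the standardized social media series is compared point by point with the standardized sales series shifted by k weeks, and the lag with the smallest MSE is kept. Averaging over the overlapping weeks (rather than dividing by the number of data types, as the text defines n) is an assumption of this sketch, and the synthetic series are constructed so that the true lag is 8 weeks.

```python
import numpy as np


def best_lag(x, y, lags=range(2, 53)):
    """Return (lag, mse) minimising mean((x[t] - y[t + lag])**2) over the overlapping weeks."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must cover the same weeks")
    mse = {}
    for k in lags:
        if k >= len(x):
            break
        # Social media at week t against sales at week t + k.
        mse[k] = float(np.mean((x[:-k] - y[k:]) ** 2))
    k_best = min(mse, key=mse.get)
    return k_best, mse[k_best]


# Synthetic standardized series in which sales echo social media activity 8 weeks later.
rng = np.random.default_rng(0)
z = rng.standard_normal(108)
social = z[8:]     # social[t] = z[t + 8]
sales = z[:100]    # sales[t + 8] = z[t + 8] = social[t]
print(best_lag(social, sales))
```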
Figure PCTCN2018081470-appb-000018
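Table 8
Building the Linear Prediction Model
Using the time-lag relationships obtained, a linear model of color sales against these variables is built. The linear model rests on the following assumptions: first, linearity: there is a linear relationship between the social media post variables (total keyword occurrences, total reposts, total comments, total likes and Google Trends data) and color sales; second, normality: the error terms $\varepsilon_i$ are independent and identically normally distributed; and third, homoscedasticity: for all i = 1, …, N, the variance of the errors $\varepsilon_i$ is constant.
According to one specific embodiment, for a specific garment of a specific color from a specific brand, the linear model built using statistical methods gives the results shown in Figure 6: total occurrences (brand), total likes (brand), total comments (brand), total occurrences (magazine) and Google Trends data are highly significant, and brand data appear more important than the other variables. The multiple linear regression equation is:
$$Y_i = 0.60 - 1.44X_{i1} + 0.001X_{i2} + 0.30X_{i3} - 4.64X_{i4} + 4.71X_{i5} + 0.10X_{i6} - 0.13X_{i7} - 0.05X_{i8} - 0.86X_{i9} + 1.03X_{i10} + 0.09X_{i11} + 5.14X_{i12} - 5.12X_{i13} + 0.28X_{i14} \tag{3}$$
where $X_1$ to $X_{14}$ are different types of social media data converted (standardized) by Equation (1.2) above into statistics (such as total counts, total reposts, total comments and total likes) used for model training: $X_{i1}$ = discount rate; $X_{i2}$ = suggested retail price; $X_{i3}$ = count of brand appearances in social media; $X_{i4}$ = number of favorites or likes of brands in social media; $X_{i5}$ = number of comments on brands in social media; $X_{i6}$ = number of shares of designers in social media; $X_{i7}$ = count of magazines in social media; $X_{i8}$ = number of shares of magazines in social media; $X_{i9}$ = number of favorites or likes of magazines in social media; $X_{i10}$ = number of comments on magazines in social media; $X_{i11}$ = count of internet influencers in social media; $X_{i12}$ = number of favorites or likes of internet influencers in social media; $X_{i13}$ = number of internet influencer comments in social media; $X_{i14}$ = SVI (Search Volume Index, from a public web tool based on Google Search that shows how often a particular search term is entered relative to total search volume across regions and languages worldwide). The index i refers to the combined numbering of year, week and color category in Equation (3); for example, $Y_i$ is the standardized sales of a given color in a given week or year. $W_{ij}$ are the weights obtained by training on the training data under the linear model assumptions, and may be obtained from the least-squares estimate $(X^{T}X)^{-1}X^{T}Y$, where X denotes at least one of $X_{i1}$ to $X_{i14}$, $X^{T}$ is the transpose of X, and $(X^{T}X)^{-1}$ is the inverse of that matrix product.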
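Evaluating Equation (3) for one product-period is a dot product of the signed weights with the fourteen standardized inputs, plus the intercept. The sketch below hard-codes the coefficients reported for this specific embodiment; they apply only to that brand, color and garment and would be re-estimated for other products.

```python
import numpy as np

INTERCEPT = 0.60
# Signed weights of Equation (3), in the order X_i1 ... X_i14.
WEIGHTS = np.array([-1.44, 0.001, 0.30, -4.64, 4.71, 0.10, -0.13,
                    -0.05, -0.86, 1.03, 0.09, 5.14, -5.12, 0.28])


def predict_standardized_sales(x):
    """Standardized sales Y_i from standardized inputs x with 14 values in the last axis."""
    x = np.asarray(x, dtype=float)
    if x.shape[-1] != WEIGHTS.size:
        raise ValueError("expected 14 standardized inputs X_i1 ... X_i14")
    return INTERCEPT + x @ WEIGHTS


# Brand likes (X_i4) one standard deviation above their mean, everything else at its mean.
x = np.zeros(14)
x[3] = 1.0
print(predict_standardized_sales(x))
```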
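$Y_i$ is the standardized sales of a specific product in a specific period (for example, a given color in a given week or year); the actual sales are computed with Equation (4):
$$Y'_i = Y_i \cdot \sigma + \mu \tag{4}$$
where i is the index of a specific product in a specific period; $Y'_i$ is the actual sales of that product in that period; μ is the mean of the previously obtained actual sales data of the product over a longer earlier period; and σ is the standard deviation of the previously obtained actual sales data of the product over that longer earlier period. The previously obtained sales data for the longer earlier period can be updated as needed, and Equations (1.1) to (4) recomputed to make the model more accurate.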
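The least-squares estimate $(X^{T}X)^{-1}X^{T}Y$ and the back-transformation of Equation (4) can be sketched with NumPy as below. Prepending a column of ones so that the intercept (0.60 in Equation (3)) is estimated together with the weights is an assumption about how it was fitted, and solving the normal equations instead of forming the inverse explicitly is a numerical choice, not something the patent specifies. The data are synthetic and noise-free so the recovered coefficients can be checked.

```python
import numpy as np


def fit_ols(X, y) -> np.ndarray:
    """Return [intercept, w_1, ..., w_p] from the normal equations (X^T X) beta = X^T y."""
    X1 = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    return np.linalg.solve(X1.T @ X1, X1.T @ np.asarray(y, dtype=float))


def destandardize(y_std, mu, sigma):
    """Equation (4): actual sales Y'_i = Y_i * sigma + mu."""
    return np.asarray(y_std, dtype=float) * sigma + mu


# Synthetic standardized data generated from known coefficients.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
true_beta = np.array([0.60, -1.44, 0.30, 4.71])
y = true_beta[0] + X @ true_beta[1:]
beta = fit_ols(X, y)
print(beta.round(3))
print(destandardize(0.5, mu=2000.0, sigma=400.0))  # a forecast 0.5 SD above the mean
```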
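Each type of social media data is collected from the web according to preset keywords through social media APIs and automated testing tools such as Selenium WebDriver. Because all data are organized with year, week and color as the aggregation basis, social media values such as total occurrences, total reposts, total comments and total likes are counted on that basis. The value 0.60 in the equation is the intercept, obtained by training the model repeatedly on existing data. In practice, the model is first trained on social media data, for example from Facebook and Weibo, after natural language processing. For example, the figure "36009" for fashion brands on Facebook in Table 6 only represents the number of social media posts; it must be converted into statistics (such as total counts, total reposts, total comments and total likes) before it can be used for model training.
As shown above, the weight of each parameter varies with its importance for the product being predicted. Data that have a positive influence on sales are added, and data that have a negative influence are subtracted. According to one embodiment, the weights may lie in the following ranges (the value used in the prediction of this specific embodiment is given in parentheses): $X_{i1}$, 1 to 2 (1.44); $X_{i2}$, 0.0005 to 0.002 (0.001); $X_{i3}$, 0.1 to 0.4 (0.3); $X_{i4}$, 3 to 5 (4.64); $X_{i5}$, 3 to 5 (4.71); $X_{i6}$, 0.05 to 0.2 (0.10); $X_{i7}$, 0.05 to 0.2 (0.13); $X_{i8}$, 0.02 to 0.1 (0.05); $X_{i9}$, 0.3 to 1 (0.86); $X_{i10}$, 0.5 to 1.5 (1.03); $X_{i11}$, 0.05 to 0.15 (0.09); $X_{i12}$, 3 to 8 (5.14); $X_{i13}$, 3 to 8 (5.12); $X_{i14}$, 0.1 to 0.7 (0.28).
The brand count is the number of times the brand is mentioned on social media. For magazine comments and internet influencer comments, both positive and negative comments are collected; as long as a social media post relates to colors and fashion apparel, its data are included in the statistics. In building the high-dimensional linear model, machine learning is used to find the best-fitting model over the multiple variables.
Figures 7A-7D show the diagnostic analyses performed to check the validity of the assumptions of the linear color sales prediction model. The scatter plot of residuals against fitted values in the upper-left panel shows points lying close to the central axis (residual = 0), consistent with the assumptions of normality and constant variance. In addition, after outliers are removed, the normal Q-Q plot in the upper-right panel shows the points lying almost on a straight line, consistent with the normality assumption. As a result, the coefficient of determination increases from 0.433 to 0.44.
Building the Machine Learning Prediction Model
The present invention uses support vector regression to create a machine-learning-driven color sales prediction model. In machine learning, a support vector machine is a supervised learning model with associated learning algorithms that analyze data for classification and regression. When used for regression, it retains all the main features that characterize the algorithm (such as the maximum margin). Put simply, support vector regression uses the nonlinear mapping of a support vector machine to map the training data into a higher-dimensional space and then performs linear regression there to separate the data and carry out the regression analysis. The mapping is performed with a predefined kernel function, and the data are separated by finding the optimal hyperplane. Figures 8A and 8B illustrate how the optimal hyperplane separates the data: Figure 8A shows candidate hyperplanes whose separation margins between the two data sets are narrow, and Figure 8B shows the optimal hyperplane, which maximizes the separation margin.
The goal is therefore to find a linear function $f(x) = \langle w, x\rangle + b$, also written $f(x) = \sum w \cdot x + b$, that maximizes the separation between the two sets of data points. In a preferred embodiment, two sets of data, green and red, are used together with their respective linear functions $f(x) = \langle w, x\rangle + b$. Figure 9 shows the green and red data sets and their respective linear function planes H1 and H2; the green and red points on H1 and H2 are the support vectors, and H is the optimal hyperplane for the two data sets. Note that d+ and d- are the shortest distances from plane H to the nearest positive and negative points, and their sum is the maximized margin of this hyperplane. Since the distance between H and H1 is $1/\lVert w\rVert$, the distance between H1 and H2 is $2/\lVert w\rVert$. Therefore, to maximize d, $\lVert w\rVert$ must be minimized; in other words, the squared Euclidean norm $\lVert w\rVert^2$ must be minimized, and the optimization problem is:
minimize $\tfrac{1}{2}\lVert w\rVert^2$
subject to: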
Figure PCTCN2018081470-appb-000019
Figure PCTCN2018081470-appb-000020
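where i is the index of a specific color product in a specific period; $y_i$ is the standardized sales of that product in that period; $x_{ij}$ (j = 1, 2, 3, …, 14) is a given type of standardized social media data for the product in that period, i.e., the social media data $X_1$ to $X_{14}$ used in the linear model; $w_{ij}$ (j = 1, 2, 3, …, 14) is the weight of each standardized social media variable; and w and b are the parameters to be estimated. w and b are computed with Lagrange multipliers as described below, and the results can be compared with those of the linear model. Here $X_1$ = discount rate; $X_2$ = suggested retail price; $X_3$ = count of brand appearances in social media; $X_4$ = number of favorites or likes of brands in social media; $X_5$ = number of comments on brands in social media; $X_6$ = number of shares of designers in social media; $X_7$ = count of magazines in social media; $X_8$ = number of shares of magazines in social media; $X_9$ = number of favorites or likes of magazines in social media; $X_{10}$ = number of comments on magazines in social media; $X_{11}$ = count of internet influencers in social media; $X_{12}$ = number of favorites or likes of internet influencers in social media; $X_{13}$ = number of internet influencer comments in social media; $X_{14}$ = SVI.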
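The relationship between the margin and $\lVert w\rVert$ can be checked numerically on a toy two-class data set. The sketch uses scikit-learn's linear-kernel `SVC` with a very large C to approximate the hard-margin problem; this is the classification form pictured in Figures 8 and 9, not the regression model itself. The four points are made up so the answer can be verified by hand: the support vectors are (1, 1) and (3, 3), so the distance between H1 and H2 should be $2\sqrt{2}$, with w = (0.5, 0.5) and b = -2.

```python
import numpy as np
from sklearn.svm import SVC

# Two separable groups on the diagonal; the closest opposite points are (1, 1) and (3, 3).
X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # a very large C approximates the hard margin
w = clf.coef_[0]
b = clf.intercept_[0]
margin = 2.0 / np.linalg.norm(w)  # distance between H1 and H2

print("w =", w.round(3), "b =", round(b, 3), "margin =", round(margin, 3))
```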
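The above optimization problem holds provided that such an f(x) exists and all $(x_i, y_i)$ satisfy the constraints of Equation (5) exactly. However, some errors are possible: in support vector machine analysis, validation against historical data shows that the formulation of Equation (5) leads to errors. To account for them, two slack variables $\xi_i$ and $\xi_i^*$ are introduced, and the optimization problem becomes the minimization of $\tfrac{1}{2}\lVert w\rVert^2 + C\sum_i(\xi_i + \xi_i^*)$, i.e.:
minimize: $\tfrac{1}{2}\lVert w\rVert^2 + C\sum_i(\xi_i + \xi_i^*)$
subject to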
Figure PCTCN2018081470-appb-000021
Figure PCTCN2018081470-appb-000022
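$$\xi_i, \xi_i^* \ge 0 \tag{6}$$
Equation (5) is used together with Equation (6), and Lagrange multipliers are used to find the optimal parameters.
C is the constant of a regularization term in the Lagrangian; it penalizes prediction errors larger than d and balances model training error against model flatness. The aim of this method is to find a value of C that trades off the flatness of the linear function against d. The constrained optimization problem above can be solved as a quadratic programming problem using Lagrange multipliers. Through the associated algorithm and optimization procedure, the following regression estimate is obtained:
$$f(x) = \sum_i (\alpha_i - \alpha_i^*) K(x_i, x) + b \tag{7}$$
$\alpha_i$ and $\alpha_i^*$ are Lagrange multipliers. K(·) is the kernel function, which projects the training data into a higher-dimensional space in which they become linearly separable. Table 8 shows the three most commonly used kernel functions; in general, the radial basis function (RBF) kernel is used most often because it can handle nonlinear cases. In this equation, $\alpha_i$ and $\alpha_i^*$ are obtained using Lagrange multipliers; $x_i$ represents the social media data $X_1$ to $X_{14}$ described above; b is obtained using Lagrange multipliers; and f(x) represents the sales data $Y_i$. In effect, the regression equation is established using Equations (5) and (6).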
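Polynomial kernel: $K(x_i, x_j) = (x_i \cdot x_j + 1)^d$
Hyperbolic tangent (sigmoid) kernel: $K(x_i, x_j) = \tanh\left(c_1 (x_i \cdot x_j) + c_2\right)$
Radial basis function kernel: $K(x_i, x_j) = \exp\left(-\lVert x_j - x_i\rVert^2 / 2p^2\right)$
Table 8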
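The three kernels of Table 8 and the dual-form prediction of Equation (7) can be sketched in NumPy and checked against scikit-learn's `SVR`, whose `dual_coef_`, `support_vectors_` and `intercept_` attributes hold $\alpha_i - \alpha_i^*$, the support vectors $x_i$ and b. The parameter names (`degree`, `c1`, `c2`, `p`) and the synthetic data are illustrative; in scikit-learn's parameterization the RBF width corresponds to `gamma = 1 / (2 p**2)`.

```python
import numpy as np
from sklearn.svm import SVR


def polynomial_kernel(A, B, degree=3):
    """(x_i . x_j + 1)^degree for all row pairs of A and B."""
    return (A @ B.T + 1.0) ** degree


def sigmoid_kernel(A, B, c1=0.01, c2=0.0):
    """tanh(c1 (x_i . x_j) + c2), the hyperbolic tangent kernel."""
    return np.tanh(c1 * (A @ B.T) + c2)


def rbf_kernel(A, B, p=1.0):
    """exp(-||x_i - x_j||^2 / (2 p^2)), the radial basis function kernel."""
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * p ** 2))


def svr_predict_dual(model, X, p=1.0):
    """Equation (7): f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b over the support vectors."""
    K = rbf_kernel(model.support_vectors_, X, p)  # shape (n_support, n_samples)
    return (model.dual_coef_ @ K + model.intercept_).ravel()


rng = np.random.default_rng(2)
X = rng.standard_normal((40, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
p = 1.0
model = SVR(kernel="rbf", gamma=1.0 / (2.0 * p ** 2), C=10.0, epsilon=0.1).fit(X, y)
print(np.abs(svr_predict_dual(model, X, p) - model.predict(X)).max())
```

Only the support vectors contribute to the sum, which is why the fitted model stores them rather than the full training set.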
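Using support vector regression, 10-fold cross-validation was performed on color sales quantity against the social media post variables and the Google Trends data. As shown in Figure 10, lighter colors indicate a better model fit, and the corresponding d and C are 0.1 and 256, respectively.
To improve the performance of support vector regression, a grid search can further be used to select the optimal model parameters. In the grid search, d is searched over the range (0.0, 0.2), followed by another 10-fold cross-validation. As shown in Figure 11, the optimal d and C are 0.11 and 256, respectively.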
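A grid search with 10-fold cross-validation over d (scikit-learn's `epsilon`) and C might be set up as below. The grid values, the RBF kernel with scikit-learn's default `gamma`, the mean-squared-error scoring and the synthetic data are all assumptions for illustration; the patent reports only the search range for d and the selected values d = 0.11 and C = 256.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVR


def tune_svr(X, y, epsilons=None, cs=None, seed=0):
    """10-fold grid search over epsilon (d) and C; returns (best_params, cv_mse)."""
    if epsilons is None:
        epsilons = np.linspace(0.01, 0.19, 19)   # d inside (0.0, 0.2)
    if cs is None:
        cs = [2.0 ** k for k in range(0, 11, 2)]  # 1, 4, ..., 1024 (includes 256)
    search = GridSearchCV(
        SVR(kernel="rbf"),
        param_grid={"epsilon": list(epsilons), "C": list(cs)},
        cv=KFold(n_splits=10, shuffle=True, random_state=seed),
        scoring="neg_mean_squared_error",
    )
    search.fit(X, y)
    return search.best_params_, -search.best_score_


rng = np.random.default_rng(3)
X = rng.standard_normal((80, 4))
y = X @ np.array([1.0, -0.5, 0.2, 0.0]) + 0.1 * rng.standard_normal(80)
# A small grid keeps the example fast; the defaults cover the full d range.
params, cv_mse = tune_svr(X, y, epsilons=[0.05, 0.11, 0.15], cs=[1.0, 16.0, 256.0])
print(params, round(cv_mse, 4))
```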
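To compare the performance of the linear model and support vector regression, the root mean square error (RMSE) is used as the benchmark:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum (Y^* - Y)^2} \tag{8}$$
where Y* and Y are, respectively, the predicted values and the values of the test set, i.e., historical data not used for modeling but held out for testing. For example, of 100 historical records, 80 are used to build and train the model and 20 are held out as the test set.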
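Equation (8) and the 80/20 hold-out comparison can be sketched as follows. Using scikit-learn's `LinearRegression` and an RBF `SVR` with the selected d = 0.11 and C = 256 as the two models is an illustrative assumption, as is the synthetic 100-record data set; the sketch shows the evaluation protocol, not the patent's results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR


def rmse(y_pred, y_true) -> float:
    """Equation (8): sqrt(mean((Y* - Y)^2))."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def compare_models(X, y, seed=0):
    """Fit on 80% of the records and report each model's RMSE on the held-out 20%."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    models = {
        "linear": LinearRegression(),
        "svr": SVR(kernel="rbf", C=256.0, epsilon=0.11),
    }
    return {name: rmse(m.fit(X_tr, y_tr).predict(X_te), y_te) for name, m in models.items()}


rng = np.random.default_rng(4)
X = rng.standard_normal((100, 14))  # 100 historical records, 14 standardized inputs
y = X[:, 0] - 0.5 * X[:, 3] + 0.2 * rng.standard_normal(100)
print(compare_models(X, y))
```

Both models see exactly the same split, so the RMSE values are directly comparable.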
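Comparison of Predicted and Real-Time Sales
Real-time social media post data were used to predict color sales. Social media post data on black clothing from week 10 of 2015 were used to predict sales 8 weeks later (the optimal time lag). Table 9 shows the distribution of social media post data across publisher categories; the Google Trends value was 43. The linear model predicted sales of about 2,428 units and the support vector regression model about 2,127 units, against actual sales of 3,608 units. Table 10 summarizes the sales predicted by the two machine learning methods and their differences from actual sales.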
Figure PCTCN2018081470-appb-000023
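Table 9
Machine learning method  Linear model  Support vector regression
Predicted sales quantity  2,428  2,127
Difference from actual sales quantity  1,180  1,481
Table 10
Unless the context clearly requires otherwise, throughout the description and the claims the words "comprise", "include" and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, in the sense of "including, but not limited to".
As used herein, unless otherwise specified, the use of the ordinal adjectives "first", "second", "third", etc. to describe a common object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, whether temporally, spatially, in ranking or in any other manner.
Reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification do not necessarily all refer to the same embodiment, but may. Furthermore, as would be apparent to one of ordinary skill in the art from this disclosure, the particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, although some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments, as would be understood by those skilled in the art. For example, in the following claims, any of the claimed embodiments can be used in any combination.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure an understanding of this description. Although the invention has been described with reference to specific examples, it will be appreciated by those skilled in the art that the invention may be embodied in many other forms.
It should be appreciated that embodiments of the invention may consist essentially of the features disclosed herein. Alternatively, embodiments of the invention may consist of the features disclosed herein. The invention illustratively disclosed herein may suitably be practiced in the absence of any element not specifically disclosed herein.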

Claims (19)

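  1. A machine-learning-based method for predicting clothing sales, characterized in that it comprises the following steps:
    storing historical sales data in a sales history database;
    collecting social media data from social media networks and storing the collected social media data in a social media database;
    building a clothing sales prediction model from the historical sales data and the social media data; and
    predicting clothing sales volume using the clothing sales prediction model.
  2. The prediction method according to claim 1, characterized in that the social media data include at least the content of public posts on social media and information about each post, the information including one or more of the publisher, view count, repost count, comment count and number of likes.
  3. The prediction method according to claim 2, characterized in that the publishers include at least one or more of brands, designers, magazines and internet influencers.
  4. The prediction method according to claim 1, characterized in that it further comprises computing, during prediction, the time-lag relationship between the social media data and actual product sales.
  5. The prediction method according to claim 4, characterized in that the historical sales data are standardized as $Z_i = (M_i - \mu)/\sigma$, where i is the index of a specific product in a specific period; $M_i$ is the actual sales of that product in that period; μ is the mean of the product's actual sales over a longer period; and σ is the standard deviation of the product's actual sales over that longer period.
  6. The prediction method according to claim 5, characterized in that the social media data include at least social media post data, Google Trends data and color sales data.
  7. The prediction method according to claim 4, characterized in that the social media data are standardized as $X_{ij} = (L_{ij} - \mu_j)/\sigma_j$, where i is the index of a specific product in a specific period; j is a specific type of social media data; $L_{ij}$ is a given social media data value of that product in that period; $\mu_j$ is the mean of that social media data for the product over a longer period; and $\sigma_j$ is the standard deviation of that social media data for the product over that longer period.
  8. The prediction method according to claim 5, characterized in that it further comprises a step of computing the mean square error (MSE) of the sales data to obtain the most suitable (optimal) lag time: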
    Figure PCTCN2018081470-appb-100001
    Figure PCTCN2018081470-appb-100002
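    where $X_i$ is a given type of standardized social media data for a specific product in a specific period, $Y_i$ is the standardized actual sales of that product in another period separated from it by the time lag, and n is the number of social media data types summed.
  9. The prediction method according to claim 5, characterized in that the clothing sales prediction model is a linear model, and the predicted standardized sales volume $Y_i$ is obtained from the following equation:
    $$Y_i = A - W_{i1}X_{i1} + W_{i2}X_{i2} + W_{i3}X_{i3} - W_{i4}X_{i4} + W_{i5}X_{i5} + W_{i6}X_{i6} - W_{i7}X_{i7} - W_{i8}X_{i8} - W_{i9}X_{i9} + W_{i10}X_{i10} + W_{i11}X_{i11} + W_{i12}X_{i12} - W_{i13}X_{i13} + W_{i14}X_{i14}$$
    where i is the index of a specific product in a specific period, $X_{ij}$ (j = 1, 2, 3, …, 14) is a given type of standardized social media data for that product in that period, $W_{ij}$ (j = 1, 2, 3, …, 14) is the weight of each standardized social media variable, and A is the model constant (intercept).
  10. The prediction method according to claim 9, characterized in that, in the equation for the predicted standardized sales volume $Y_i$, $X_{i1}$ = discount rate; $X_{i2}$ = suggested retail price; $X_{i3}$ = count of brand appearances in social media; $X_{i4}$ = number of favorites or likes of brands in social media; $X_{i5}$ = number of comments on brands in social media; $X_{i6}$ = number of shares of designers in social media; $X_{i7}$ = count of magazines in social media; $X_{i8}$ = number of shares of magazines in social media; $X_{i9}$ = number of favorites or likes of magazines in social media; $X_{i10}$ = number of comments on magazines in social media; $X_{i11}$ = count of internet influencers in social media; $X_{i12}$ = number of favorites or likes of internet influencers in social media; $X_{i13}$ = number of internet influencer comments in social media; $X_{i14}$ = SVI.
  11. The prediction method according to claim 9, characterized in that the predicted standardized sales volume is $Y_i = 0.60 - 1.44X_{i1} + 0.001X_{i2} + 0.30X_{i3} - 4.64X_{i4} + 4.71X_{i5} + 0.10X_{i6} - 0.13X_{i7} - 0.05X_{i8} - 0.86X_{i9} + 1.03X_{i10} + 0.09X_{i11} + 5.14X_{i12} - 5.12X_{i13} + 0.28X_{i14}$, where $X_{i1}$ = discount rate; $X_{i2}$ = suggested retail price; $X_{i3}$ = count of brand appearances in social media; $X_{i4}$ = number of favorites or likes of brands in social media; $X_{i5}$ = number of comments on brands in social media; $X_{i6}$ = number of shares of designers in social media; $X_{i7}$ = count of magazines in social media; $X_{i8}$ = number of shares of magazines in social media; $X_{i9}$ = number of favorites or likes of magazines in social media; $X_{i10}$ = number of comments on magazines in social media; $X_{i11}$ = count of internet influencers in social media; $X_{i12}$ = number of favorites or likes of internet influencers in social media; $X_{i13}$ = number of internet influencer comments in social media; $X_{i14}$ = SVI.
  12. The prediction method according to claim 1, characterized in that the prediction method further comprises making predictions using support vector regression.
  13. The prediction method according to claim 12, characterized in that two sets of data and their linear function $f(x) = \sum w \cdot x + b$ are used for prediction, and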
    Figure PCTCN2018081470-appb-100003
    Figure PCTCN2018081470-appb-100004
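    where i is the index of a specific color product in a specific period, $x_{ij}$ (j = 1, 2, 3, …, 14) is a given type of standardized social media data for that product in that period, $w_{ij}$ (j = 1, 2, 3, …, 14) is the weight of each standardized social media variable, and d is the shortest distance from the optimized hyperplane H of the two data sets to the nearest positive and negative points; this shortest distance is $1/\lVert w\rVert$, and d is maximized by minimizing $\lVert w\rVert$, thereby optimizing the linear function.
  14. The prediction method according to claim 13, characterized in that $X_1$ = discount rate; $X_2$ = suggested retail price; $X_3$ = count of brand appearances in social media; $X_4$ = number of favorites or likes of brands in social media; $X_5$ = number of comments on brands in social media; $X_6$ = number of shares of designers in social media; $X_7$ = count of magazines in social media; $X_8$ = number of shares of magazines in social media; $X_9$ = number of favorites or likes of magazines in social media; $X_{10}$ = number of comments on magazines in social media; $X_{11}$ = count of internet influencers in social media; $X_{12}$ = number of favorites or likes of internet influencers in social media; $X_{13}$ = number of internet influencer comments in social media; $X_{14}$ = SVI; w and b are the parameters to be estimated, computed using Lagrange multipliers; and $y_i$ denotes sales volume.
  15. The prediction method according to claim 14, characterized in that two slack variables $\xi_i$ and $\xi_i^*$ are introduced to account for errors, so as to
    minimize: $\tfrac{1}{2}\lVert w\rVert^2 + C\sum_i(\xi_i + \xi_i^*)$
    subject to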
    Figure PCTCN2018081470-appb-100005
    Figure PCTCN2018081470-appb-100006
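    $\xi_i, \xi_i^* \ge 0$
    where C is the constant of a regularization term in the Lagrangian; $X_1$ = discount rate; $X_2$ = suggested retail price; $X_3$ = count of brand appearances in social media; $X_4$ = number of favorites or likes of brands in social media; $X_5$ = number of comments on brands in social media; $X_6$ = number of shares of designers in social media; $X_7$ = count of magazines in social media; $X_8$ = number of shares of magazines in social media; $X_9$ = number of favorites or likes of magazines in social media; $X_{10}$ = number of comments on magazines in social media; $X_{11}$ = count of internet influencers in social media; $X_{12}$ = number of favorites or likes of internet influencers in social media; $X_{13}$ = number of internet influencer comments in social media; $X_{14}$ = SVI; w and b are the parameters to be estimated, computed using Lagrange multipliers; and $y_i$ denotes sales volume.
  16. The prediction method according to any one of claims 1-15, characterized in that the method uses the clothing sales prediction model to predict the sales volume of each color of a clothing product.
  17. A machine-learning-based apparatus for predicting clothing sales that uses the method according to any one of claims 1-16, characterized in that it comprises the following modules:
    a first storage module storing a sales history database containing historical sales data;
    a collection module that collects social media data from social media networks;
    a second storage module storing a social media database containing the collected social media data;
    a model building module that builds a clothing sales prediction model from the historical sales data and the social media data;
    a prediction module that predicts clothing sales volume using the built model.
  18. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the following steps:
    storing historical sales data in a sales history database;
    collecting social media data from social media networks and storing the collected social media data in a social media database;
    building a clothing sales prediction model from the historical sales data and the social media data; and
    predicting clothing sales volume using the clothing sales prediction model.
  19. The computer-readable storage medium according to claim 18, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1-16.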
PCT/CN2018/081470 2018-03-30 Machine-learning-based method and apparatus for predicting clothing sales WO2019183973A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/081470 WO2019183973A1 (zh) 2018-03-30 2018-03-30 Machine-learning-based method and apparatus for predicting clothing sales

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/081470 WO2019183973A1 (zh) 2018-03-30 2018-03-30 Machine-learning-based method and apparatus for predicting clothing sales

Publications (1)

Publication Number Publication Date
WO2019183973A1 true WO2019183973A1 (zh) 2019-10-03

Family

ID=68062133

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/081470 WO2019183973A1 (zh) 2018-03-30 2018-03-30 Machine-learning-based method and apparatus for predicting clothing sales

Country Status (1)

Country Link
WO (1) WO2019183973A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
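CN103984998A (zh) * 2014-05-30 2014-08-13 成都德迈安科技有限公司 Sales prediction method based on big data mining on a cloud service platform
CN105956699A (zh) * 2016-04-29 2016-09-21 连云港天马网络发展有限公司 Product category extraction based on e-commerce sales data and sales volume prediction method under the extracted categories
CN106408483A (zh) * 2016-08-31 2017-02-15 国信优易数据有限公司 Meteorological cloud intelligent business method and system
CN107545461A (zh) * 2017-08-01 2018-01-05 云天弈(北京)信息技术有限公司 Analysis system for publication topic selection and distribution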

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FAN, YOUNING ET AL.: "A Solution for Sales Forecasts of Fashion Products Based on Electronic Word-of-Mouth", JOURNAL OF INFORMATION MANAGEMENT, vol. 19, no. 1, 31 December 2012 (2012-12-31) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376929A (zh) * 2018-10-24 2019-02-22 北京小度信息科技有限公司 Method and apparatus for determining delivery parameters, storage medium and electronic device
US20210326909A1 (en) * 2020-04-17 2021-10-21 Accenture Global Solutions Limited Stakeholder and impact discovery
US11727417B2 (en) * 2020-04-17 2023-08-15 Accenture Global Solutions Limited Stakeholder and impact discovery
CN112819540A (zh) * 2021-02-08 2021-05-18 佛山科学技术学院 Method and apparatus for predicting vending machine product sales, and computer-readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18911815

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 18911815

Country of ref document: EP

Kind code of ref document: A1