CN110826579A - Commodity classification method and device - Google Patents


Info

Publication number
CN110826579A
Authority
CN
China
Prior art keywords
sales
commodity
sales data
data
commodity sales
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810891322.8A
Other languages
Chinese (zh)
Inventor
张建申
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201810891322.8A priority Critical patent/CN110826579A/en
Publication of CN110826579A publication Critical patent/CN110826579A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0202Market predictions or forecasting for commercial activities

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Marketing (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and a device for classifying commodities, and relates to the field of computer technology. One embodiment of the method comprises: acquiring commodity sales data and preprocessing it by filtering, filling, smoothing and standardization; and weighting the preprocessed commodity sales data based on time decay and classifying it using a classification algorithm model. The embodiment addresses the problem of large differences among commodities within the same category and enables accurate commodity classification.

Description

Commodity classification method and device
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for classifying commodities.
Background
Sales prediction and replenishment are central to the retail industry, the supply chain field and similar areas. To make replenishment and prediction more accurate, commodities are generally classified so that different processing strategies can be applied to different types of commodities.
In the process of implementing the invention, the inventor found at least the following problem in the prior art:
most common classification methods rely on existing commodity category information; their accuracy is poor, and commodities within the same category differ greatly.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for classifying commodities, which can solve the problem of huge differences between commodities in the same class, and implement accurate commodity classification.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a method for classifying commodities, including: obtaining commodity sales data and preprocessing it by filtering, filling, smoothing and standardizing; and weighting the preprocessed commodity sales data based on time decay and classifying it using a classification algorithm model.
Optionally, weighting the preprocessed commodity sales data based on time decay and classifying it using a classification algorithm model includes:
weighting the preprocessed commodity sales data based on time decay and classifying it using a K-means method based on DTW (dynamic time warping).
Optionally, weighting the preprocessed commodity sales data based on time decay and classifying it using a DTW-based K-means method includes:
assuming that the sales of the commodity at time n is sales(n), the time-weighted sales become:
sales_new(n)=w(n)*sales(n)
wherein:
Figure BDA0001757009250000021
then selecting K time series as initial centroids, assigning each point to its nearest centroid using the DTW-based K-means algorithm to form K clusters, and recalculating the centroid of each cluster until the clusters no longer change or the maximum number of iterations is reached, thereby obtaining the K classes.
Optionally, the method further comprises:
for the K classes obtained, calculating the offline growth coefficient, skewness, kurtosis and coefficient of variation of each class as class characteristics;
wherein the growth coefficient is: $\dfrac{\sum_i (sales_i - \mathrm{avg}(sales))\,(x_i - \mathrm{avg}(x))}{\sum_i (x_i - \mathrm{avg}(x))^2}$, where x is the sequential time value and sales_i is the sales volume of the commodity on day i;
skewness:
Figure BDA0001757009250000022
where x_i is the sales volume on day i, X is the average sales volume, and var is the standard deviation of the sales volume;
kurtosis:
Figure BDA0001757009250000023
where x_i is the sales volume on day i, X is the average sales volume, and var is the standard deviation of the sales volume;
coefficient of variation: standard deviation of sales / mean of sales.
Optionally, filtering the commodity sales data includes:
filtering out, through model filtering, commodity sales data whose sales time is too short or too long, and filtering the commodity sales data according to rules configured by business requirements;
and taking the intersection of the commodity sales data remaining after model filtering and after rule filtering to obtain the final filtering result.
Optionally, smoothing the commodity sales data comprises:
sorting the time-series sales values in descending order, stripping the i largest sales values, calculating the mean and standard deviation of the remaining commodity sales data, and increasing i until the i-th value is less than or equal to the mean plus m standard deviations, thereby identifying the abnormal commodity sales data;
and filling the identified abnormal commodity sales data.
Optionally, standardizing the commodity sales data includes:
scaling the sales in the commodity sales data to fall within a specified interval.
Optionally, filling the commodity sales data includes:
filling discontinuous commodity sales data according to the configured filling method.
In addition, according to an aspect of the embodiments of the present invention, there is provided a device for classifying commodities, including: an obtaining module, configured to obtain commodity sales data and preprocess it by filtering, filling, smoothing and standardizing; and a classification module, configured to weight the preprocessed commodity sales data based on time decay and classify it using a classification algorithm model.
Optionally, the classification module weighting the preprocessed commodity sales data based on time decay and classifying it using a classification algorithm model includes:
weighting the preprocessed commodity sales data based on time decay and classifying it using a DTW-based K-means method.
Optionally, the classification module weighting the preprocessed commodity sales data based on time decay and classifying it using a DTW-based K-means method includes:
assuming that the sales of the commodity at time n is sales(n), the time-weighted sales become:
sales_new(n)=w(n)*sales(n)
wherein:
Figure BDA0001757009250000041
then selecting K time series as initial centroids, assigning each point to its nearest centroid using the DTW-based K-means algorithm to form K clusters, and recalculating the centroid of each cluster until the clusters no longer change or the maximum number of iterations is reached, thereby obtaining the K classes.
Optionally, the classification module is further configured to:
for the K classes obtained, calculate the offline growth coefficient, skewness, kurtosis and coefficient of variation of each class as class characteristics;
wherein the growth coefficient is: $\dfrac{\sum_i (sales_i - \mathrm{avg}(sales))\,(x_i - \mathrm{avg}(x))}{\sum_i (x_i - \mathrm{avg}(x))^2}$, where x is the sequential time value and sales_i is the sales volume of the commodity on day i;
skewness:
Figure BDA0001757009250000042
where x_i is the sales volume on day i, X is the average sales volume, and var is the standard deviation of the sales volume;
kurtosis:
Figure BDA0001757009250000043
where x_i is the sales volume on day i, X is the average sales volume, and var is the standard deviation of the sales volume;
coefficient of variation: standard deviation of sales / mean of sales.
Optionally, the obtaining module filtering the commodity sales data includes:
filtering out, through model filtering, commodity sales data whose sales time is too short or too long, and filtering the commodity sales data according to rules configured by business requirements;
and taking the intersection of the commodity sales data remaining after model filtering and after rule filtering to obtain the final filtering result.
Optionally, the obtaining module smoothing the commodity sales data includes:
sorting the time-series sales values in descending order, stripping the i largest sales values, calculating the mean and standard deviation of the remaining commodity sales data, and increasing i until the i-th value is less than or equal to the mean plus m standard deviations, thereby identifying the abnormal commodity sales data;
and filling the identified abnormal commodity sales data.
Optionally, the obtaining module standardizing the commodity sales data includes:
scaling the sales in the commodity sales data to fall within a specified interval.
Optionally, the obtaining module filling the commodity sales data includes:
filling discontinuous commodity sales data according to the configured filling method.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the commodity classification method of any of the above embodiments.
According to another aspect of an embodiment of the present invention, there is also provided a computer readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the commodity classification method of any of the above embodiments.
One embodiment of the above invention has the following advantages or benefits: commodity sales data are acquired and preprocessed by filtering, filling, smoothing and standardizing; the preprocessed commodity sales data are then weighted based on time decay and classified using a classification algorithm model. The invention thereby achieves a systematic, fine-grained classification of commodities and can effectively support life-cycle division and the identification of potential best-selling products.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
Fig. 1 is a schematic diagram of the main flow of a method of classifying commodities according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the main flow of a method of classifying commodities according to a referential embodiment of the present invention;
Fig. 3 is a schematic diagram of zero-padding of filter results according to an embodiment of the invention;
Fig. 4 is a schematic diagram of weighting factors according to an embodiment of the invention;
Fig. 5 is a schematic illustration of a time window according to an embodiment of the invention;
Fig. 6 is a schematic diagram of the main modules of an apparatus for classifying commodities according to an embodiment of the present invention;
Fig. 7 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
Fig. 8 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of the main flow of a method of classifying commodities according to an embodiment of the present invention. The method may include:
Step S101: acquire commodity sales data and preprocess it by filtering, filling, smoothing and standardizing.
In an embodiment, the preprocessing operations on the commodity sales data, i.e., filtering, filling, smoothing and standardizing, need not be performed in a fixed order; the sequence can be determined according to actual requirements.
Further, when filtering the commodity sales data, the commodity sales data with too short or too long sales time can be filtered through model filtering. Preferably, the 3sigma criterion is employed as model filtering.
It should be further noted that, when filtering the commodity sales data, the commodity sales data may also be processed according to the rule configured by the business requirement, for example: filtering for a particular SKU, etc. In addition, the invention can adopt one or more filtering modes to process the acquired commodity sales data, and if the filtering modes are multiple (for example, model filtering and rule filtering), the intersection is taken for the data processed by the multiple filtering modes to obtain the final filtering result.
In an embodiment, when the commodity sales data are smoothed, the time-series sales values may be sorted in descending order; the i largest sales values are stripped, the mean and standard deviation of the remaining commodity sales data are calculated, and i is increased until the i-th value is less than or equal to the mean plus m standard deviations, thereby identifying the abnormal commodity sales data. The identified abnormal commodity sales data can then be filled.
Preferably, the manner of filling in the abnormal commodity sales data includes mean filling with non-abnormal data or a manner of local weighted (localized weighted) filling.
In an embodiment, when the commodity sales data is normalized, the sales in the commodity sales data may be scaled to fall within a specific interval. That is, normalization of data is to scale the data to fall within a small specific interval. In some index processing for comparison and evaluation, unit limitation of data is removed and converted into a dimensionless pure numerical value, so that indexes of different units or orders can be compared and weighted conveniently.
In addition, when the commodity sales data is filled, discontinuous commodity sales data can be filled according to the configured filling method. The configured filling method comprises zero value filling, mean value filling, median filling, random filling and the like.
Step S102: weight the preprocessed commodity sales data based on time decay and classify it using a classification algorithm model.
Preferably, the preprocessed commodity sales data are weighted based on time decay and classified using a DTW-based K-means method. In a further embodiment, this includes:
assuming that the sales of the commodity at time n is sales(n), the time-weighted sales become:
sales_new(n)=w(n)*sales(n)
wherein:
Figure BDA0001757009250000081
then selecting K time series as initial centroids, assigning each point to its nearest centroid using the DTW-based K-means algorithm to form K clusters, and recalculating the centroid of each cluster until the clusters no longer change or the maximum number of iterations is reached, thereby obtaining the K classes.
Preferably, for the K classes obtained, the offline growth coefficient, skewness, kurtosis and coefficient of variation of each class are calculated as class characteristics;
wherein the growth coefficient is: $\dfrac{\sum_i (sales_i - \mathrm{avg}(sales))\,(x_i - \mathrm{avg}(x))}{\sum_i (x_i - \mathrm{avg}(x))^2}$, where x is the sequential time value and sales_i is the sales volume of the commodity on day i;
skewness:
Figure BDA0001757009250000082
where x_i is the sales volume on day i, X is the average sales volume, and var is the standard deviation of the sales volume;
kurtosis:
Figure BDA0001757009250000083
where x_i is the sales volume on day i, X is the average sales volume, and var is the standard deviation of the sales volume;
coefficient of variation: standard deviation of sales / mean of sales.
According to the above embodiments, the invention provides a new method for measuring and classifying commodities and achieves fine-grained subdivision, which is important for determining and dividing commodity life cycles, identifying potential best-selling products, and so on. In addition, the resulting commodity classification can better serve supply chain optimization: applying different prediction models and replenishment models to different categories can improve prediction accuracy, reduce turnover and improve the in-stock rate.
Fig. 2 is a schematic diagram of the main flow of a method of classifying commodities according to a referential embodiment of the present invention. The method may include:
Step S201: acquire commodity sales data.
In the embodiment, the commodity sales data to be classified are acquired for a configured time period, for example the most recent week. The data must contain the complete sales information of each commodity: once a commodity starts to be sold, its sales records must not be missing, except that a record may be absent for a day on which the sales volume is 0. For example, as shown in Table 1, the commodity sales data include a date, a SKU identifier and a sales volume, where the SKU is the unique identifier of the commodity.
Table 1
Date (timestamp)    SKU identifier    Sales volume
Date 1              Commodity 1       Sales volume
Date 2              Commodity 1       Sales volume
Date 3              Commodity 1       Sales volume
Date 4              Commodity 1       Sales volume
Date 5              Commodity 1       Sales volume
Date 6              Commodity 1       Sales volume
Date 7              Commodity 1       Sales volume
Date 8              Commodity 1       Sales volume
Date 9              Commodity 1       Sales volume
Date 10             Commodity 1       Sales volume
Date 11             Commodity 1       Sales volume
Step S202: filter out commodities whose sales time is too short or too long using the 3-sigma criterion.
In the embodiment, invalid commodities are filtered out by applying model filtering, namely the 3-sigma criterion, to the acquired commodity sales data in order to identify commodities whose sales time is too short or too long.
The 3-sigma criterion, also called the Pauta criterion, assumes that a set of measurement data contains only random errors. The standard deviation of the data is calculated and an interval is determined for a given probability; any error outside the interval is considered a gross error rather than a random error, and the data containing it are removed. If the data follow a normal distribution, the probability that a value falls within (μ-3σ, μ+3σ) is 99.73%.
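The 99.73% figure can be checked numerically. The following is a small sketch using only the Python standard library; the function name `prob_within` is illustrative and not part of the source.

```python
import math


def prob_within(k: float) -> float:
    """Probability that a normally distributed value lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


print(round(prob_within(3) * 100, 2))  # 99.73
```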
For example: assume there are 100 SKUs, each with a number of days on sale (usually counted in days), and the data are in the form:
Figure BDA0001757009250000091
the mean and standard deviation can be calculated from the days on sale:
mean = sum(days on sale) / number of SKUs
standard deviation = sqrt[ sum((days on sale of SKU - mean)^2) / number of SKUs ]
If the days on sale of a SKU < mean - 3 * standard deviation, the sales time of the commodity is too short;
if the days on sale of a SKU > mean + 3 * standard deviation, the sales time of the commodity is too long.
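The rule above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function name and the dictionary layout (SKU identifier mapped to days on sale) are assumptions. The population standard deviation is used because the formula divides by the number of SKUs.

```python
from statistics import mean, pstdev


def three_sigma_filter(days_on_sale: dict[str, int]) -> dict[str, int]:
    """Keep SKUs whose days on sale lie within mean +/- 3 standard deviations."""
    values = list(days_on_sale.values())
    mu = mean(values)
    sigma = pstdev(values)  # population std: divides by the number of SKUs
    low, high = mu - 3 * sigma, mu + 3 * sigma
    return {sku: d for sku, d in days_on_sale.items() if low <= d <= high}


# Hypothetical data: 20 SKUs on sale for 100 days and one on sale for 10000 days.
data = {f"sku_{i}": 100 for i in range(20)}
data["sku_long"] = 10000
kept = three_sigma_filter(data)
print("sku_long" in kept)  # False
```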
Step S203: filter the commodity sales data according to rules configured by business requirements.
In the embodiment, when filtering the commodity sales data, the data may also be processed according to rules configured by business requirements, for example filtering for a particular SKU.
It should be noted that steps S202 and S203 need not be performed in a particular order: step S202 may be performed first, step S203 may be performed first, or both may be performed simultaneously.
Step S204: take the intersection of the commodity sales data remaining after filtering by the 3-sigma criterion and after filtering by the rules to obtain the final filtering result.
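Taking the intersection in step S204 can be sketched with Python sets. The SKU identifiers and the function name below are hypothetical; each set stands for the SKUs that survive one filtering mode.

```python
def final_filter(model_kept: set[str], rule_kept: set[str]) -> set[str]:
    """Keep only SKUs that pass both model filtering and rule filtering."""
    return model_kept & rule_kept


model_kept = {"sku_1", "sku_2", "sku_3"}  # survived the 3-sigma filter (illustrative)
rule_kept = {"sku_2", "sku_3", "sku_4"}   # survived the business rules (illustrative)
print(sorted(final_filter(model_kept, rule_kept)))  # ['sku_2', 'sku_3']
```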
Step S205: fill the filtering result according to the configured filling method.
In an embodiment, the sales in the commodity sales data may be discontinuous in date and need to be filled; the configurable filling methods include zero-value filling, mean filling, median filling, random filling and the like. For example, zero-value filling is shown in Fig. 3.
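A minimal sketch of gap filling follows, assuming daily data held in a dictionary keyed by date. The function name, the `method` parameter and the sample dates are assumptions; random filling is omitted for brevity.

```python
from datetime import date, timedelta
from statistics import mean, median


def fill_gaps(sales: dict[date, float], method: str = "zero") -> dict[date, float]:
    """Return a series with one entry per day between the first and last sale date."""
    if not sales:
        return {}
    values = list(sales.values())
    if method == "zero":
        fill = 0.0
    elif method == "mean":
        fill = mean(values)
    elif method == "median":
        fill = median(values)
    else:
        raise ValueError(f"unsupported filling method: {method}")
    filled = {}
    day, end = min(sales), max(sales)
    while day <= end:
        filled[day] = sales.get(day, fill)  # keep observed values, fill missing days
        day += timedelta(days=1)
    return filled


observed = {date(2018, 8, 1): 4.0, date(2018, 8, 4): 2.0}
print(fill_gaps(observed)[date(2018, 8, 2)])  # 0.0
```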
Step S206: smooth the filled commodity sales data.
In the embodiment, smoothing is a technique applied to time-series data to make the data smooth and eliminate abnormal values in the series. Smoothing commodity sales data requires selecting a long time window as known prior information, that is, the known long-term sales of the commodity, including abnormal sales values; the abnormal sales values are then smoothed out. The specific implementation process comprises the following steps:
First, abnormal commodity sales data are identified. Specifically:
Step one: sort the time-series values in descending order.
Step two: strip the i largest sales values (the initial value of i is 1), calculate the mean u and the standard deviation sigma of the remaining commodity sales data, and check whether the i-th value satisfies Xi > u + m*sigma. If so, go to step three; otherwise, go to step four. Here m is a preset integer constant.
Step three: increment i (i += 1) and return to step two.
Step four: obtain the abnormal commodity sales data.
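Steps one to four can be sketched as the loop below. This is one reading of the procedure, not a definitive implementation: it assumes that the values found to exceed u + m*sigma before the loop stops are the abnormal ones, and it uses the population standard deviation. The function name and sample data are illustrative.

```python
from statistics import mean, pstdev


def find_anomalies(sales: list[float], m: int = 3) -> list[float]:
    """Return the largest values that exceed mean + m * std of the remaining data."""
    ordered = sorted(sales, reverse=True)    # step one: descending order
    i = 1
    while i < len(ordered):
        rest = ordered[i:]                   # step two: strip the i largest values
        u, sigma = mean(rest), pstdev(rest)
        if ordered[i - 1] > u + m * sigma:   # the i-th value is abnormal
            i += 1                           # step three: try the next value
        else:
            break                            # step four: stop
    return ordered[: i - 1]


print(find_anomalies([10, 12, 11, 9, 10, 11, 200, 10]))  # [200]
```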
Second, the identified abnormal commodity sales data are smoothed. Specifically:
after the abnormal commodity sales data are detected, they are filled either with the mean of the non-abnormal data or by locally weighted filling.
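Mean filling with non-abnormal data can be sketched as follows. The function name and the convention of passing abnormal positions as a set of indices are assumptions; locally weighted filling is not shown.

```python
from statistics import mean


def fill_with_normal_mean(sales: list[float], abnormal_idx: set[int]) -> list[float]:
    """Replace values at abnormal positions with the mean of the non-abnormal values."""
    normal = [v for idx, v in enumerate(sales) if idx not in abnormal_idx]
    if not normal:  # nothing to average over; leave the series unchanged
        return list(sales)
    fill = mean(normal)
    return [fill if idx in abnormal_idx else v for idx, v in enumerate(sales)]


print(fill_with_normal_mean([10, 200, 14], {1}))  # [10, 12, 14]
```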
Step S207: standardize the smoothed commodity sales data.
For example, the sales data of different commodities may differ by orders of magnitude; to eliminate such scale differences, the sales data need to be standardized.
Further, the sales volume is scaled to fall within a small specified interval.
Still further, using statistical methods, the mean μ and standard deviation σ of sales are calculated:
Figure BDA0001757009250000112
sales_normal_i = (sales_i - μ)/σ
where sales_i is the sales in the i-th calculation unit (e.g., day), and sales_normal_i is the standardized sales of the i-th calculation unit.
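The standardization can be sketched as a z-score computation. The source's formula image for μ and σ is not reproduced, so the use of the population standard deviation here is an assumption, as is returning zeros for a constant series.

```python
from statistics import mean, pstdev


def standardize(sales: list[float]) -> list[float]:
    """Compute sales_normal_i = (sales_i - mu) / sigma for each calculation unit."""
    mu, sigma = mean(sales), pstdev(sales)
    if sigma == 0:  # constant series: assumed to map to all zeros
        return [0.0] * len(sales)
    return [(s - mu) / sigma for s in sales]


z = standardize([1.0, 2.0, 3.0])
print([round(v, 4) for v in z])  # [-1.2247, 0.0, 1.2247]
```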
Step S208: classify the commodities based on the standardized commodity sales data.
In the embodiment, when classifying according to the standardized commodity sales data, sales at different times have different influence: the closer a time is to the current time, the greater the weight it should be given; conversely, the farther it is from the current time, the smaller the weight, as shown in Fig. 4.
Further, the weight is determined dynamically as follows:
weight formula:
wherein $W' = 1/(1 + \exp(\alpha n + \beta n^{2}))$; n is the time in a predetermined unit, typically days; alpha is a positive scaling factor; and beta is a scaling factor describing prior distribution information.
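The decay term W' can be sketched as below. Two assumptions are made loudly: the source's formula image relating the final weight w(n) to W' is not reproduced, so W' itself is used as the weight here; and n is taken as the number of time units before the most recent observation, so the newest value has n = 0. The function names are illustrative.

```python
import math


def w_prime(n: float, alpha: float, beta: float) -> float:
    """W' = 1 / (1 + exp(alpha * n + beta * n^2))."""
    x = alpha * n + beta * n ** 2
    if x > 700:  # math.exp would overflow; the weight is effectively zero
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


def weight_series(sales_normal: list[float], alpha: float, beta: float) -> list[float]:
    """Weight a chronologically ordered series; the last element has n = 0 (assumption)."""
    last = len(sales_normal) - 1
    return [w_prime(last - idx, alpha, beta) * s for idx, s in enumerate(sales_normal)]


weighted = weight_series([1.0, 1.0, 1.0], alpha=0.1, beta=0.6)
print(weighted[-1])  # 0.5
```

With positive alpha and beta, the weight decreases monotonically as n grows, which matches the stated intent that older sales count for less.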
Preferably, the standardized commodity sales data are weighted based on time decay and classified using a DTW-based K-means method, which performs trend analysis on time series.
Assuming that the standardized sales of the commodity at time n is sales_normal(n), the time-weighted sales become:
sales_normal_new(n)=w(n)*sales_normal(n)
Then, classification is performed with the DTW-based K-means algorithm:
Step one: select K time series (i.e., the sales series of K commodities) as the initial centroids.
Step two: assign each point to its nearest centroid using DTW-based K-means to form K clusters, then recalculate the centroid of each cluster, i.e., the average of the time series in the cluster at each node, and repeat until the clusters no longer change or the maximum number of iterations is reached.
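The two steps can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: all series have equal length so that the centroid can be a pointwise average, the first K series are taken as initial centroids, and the DTW cost uses absolute differences. None of these choices is specified by the source.

```python
def dtw(a: list[float], b: list[float]) -> float:
    """Dynamic time warping distance with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def dtw_kmeans(series: list[list[float]], k: int, max_iter: int = 20):
    """Cluster equal-length series; returns (labels, centroids)."""
    centroids = [list(s) for s in series[:k]]  # step one: initial centroids
    labels = None
    for _ in range(max_iter):
        # step two: assign each series to the nearest centroid under DTW
        new_labels = [min(range(k), key=lambda c: dtw(s, centroids[c])) for s in series]
        if new_labels == labels:  # clusters no longer change
            break
        labels = new_labels
        for c in range(k):
            members = [s for s, lab in zip(series, labels) if lab == c]
            if members:  # pointwise average of the series in the cluster
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids


data = [[1, 2, 3, 4], [10, 10, 10, 10], [1, 2, 3, 5], [9, 10, 11, 10]]
labels, centroids = dtw_kmeans(data, k=2)
print(labels)  # [0, 1, 0, 1]
```

For production use, a library implementation of DTW-based K-means would normally be preferable to this quadratic-time sketch.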
It should be noted that, when classifying commodities based on time-series data, how much past data to consider, that is, the choice of time window, is a key issue. Here m is a fixed value (as shown in Fig. 5) that can be determined according to business requirements, and n is a variable parameter. A commodity sales sequence of length m + n is selected; the first n values serve as the training set and the last m values serve as the evaluation set. The method for determining n, alpha and beta comprises the following steps:
Step one: traverse the possible combinations of n, alpha and beta.
In an embodiment, n is an integer whose candidate values can be configured, for example [10, 20]. Similarly, candidate values can be configured for alpha and beta, for example [0.1, 0.5] for alpha and [0.6, 0.7] for beta. The possible combinations of (n, alpha, beta) are then [10,0.1,0.6], [10,0.1,0.7], [10,0.5,0.6], [10,0.5,0.7], [20,0.1,0.6], [20,0.1,0.7], [20,0.5,0.6], [20,0.5,0.7].
Step two: for each combination, calculate the weighted sales for the first n values, classify them using DTW-based K-means to obtain class labels, and calculate the overall dispersion of the last m values according to the labels, recorded as v.
It should be noted that each resulting class is generally identified by its cluster center; the main purpose of the classes is subdivision for subsequent prediction, and they are not given descriptive labels.
Step three: select the alpha and beta values corresponding to the minimum dispersion v as the reasonable values.
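The traversal and selection can be sketched with `itertools.product`. The source does not specify how the dispersion v is computed, so the sketch takes it as a caller-supplied function; the toy dispersion used below is purely hypothetical and only demonstrates the selection step.

```python
from itertools import product


def candidate_grid(ns, alphas, betas):
    """Step one: enumerate every (n, alpha, beta) combination."""
    return list(product(ns, alphas, betas))


def select_best(grid, dispersion):
    """Step three: pick the combination with the smallest dispersion v."""
    return min(grid, key=dispersion)


grid = candidate_grid([10, 20], [0.1, 0.5], [0.6, 0.7])
print(len(grid))  # 8


# Hypothetical stand-in for step two (weighting, clustering, dispersion of the last m values).
def toy_dispersion(params):
    n, alpha, beta = params
    return abs(n - 20) + abs(alpha - 0.5) + abs(beta - 0.6)


print(select_best(grid, toy_dispersion))  # (20, 0.5, 0.6)
```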
It is further worth noting that, for the K classes obtained, features such as the offline growth coefficient, skewness, kurtosis and coefficient of variation of each class may be calculated to characterize the class. Specifically:
the following calculations are for a single commodity SKU; the corresponding value for a class is obtained by averaging the values of all SKUs in the class.
The growth coefficient is: $\dfrac{\sum_i (sales_i - \mathrm{avg}(sales))\,(x_i - \mathrm{avg}(x))}{\sum_i (x_i - \mathrm{avg}(x))^2}$, where x is the sequential time value and sales_i is the sales of the commodity SKU on day i.
Skewness:
where x_i is the sales volume on day i, X is the average sales volume, and var is the standard deviation of the sales volume;
Kurtosis:
Figure BDA0001757009250000132
where x_i is the sales volume on day i, X is the average sales volume, and var is the standard deviation of the sales volume.
Coefficient of variation: standard deviation of sales / mean of sales.
If the sales data in the above formulas have been standardized, sales should be read as sales_normal.
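The growth coefficient and the coefficient of variation can be sketched directly from the formulas above. The skewness and kurtosis formula images are not reproduced in the source, so the `standardized_moment` helper below uses the common textbook definition (mean of the k-th power of the standardized deviations) as an assumption; the patent's exact form may differ. Taking x as 0, 1, 2, ... and using the population standard deviation are also assumptions.

```python
from statistics import mean, pstdev


def growth_coefficient(sales: list[float]) -> float:
    """Least-squares slope of sales against the time index x = 0, 1, 2, ..."""
    x = list(range(len(sales)))
    ax, asales = mean(x), mean(sales)
    num = sum((s - asales) * (xi - ax) for s, xi in zip(sales, x))
    den = sum((xi - ax) ** 2 for xi in x)
    return num / den


def coefficient_of_variation(sales: list[float]) -> float:
    """Standard deviation of sales divided by mean of sales."""
    return pstdev(sales) / mean(sales)


def standardized_moment(sales: list[float], k: int) -> float:
    """Assumed textbook form: k = 3 for skewness, k = 4 for kurtosis."""
    mu, sd = mean(sales), pstdev(sales)
    return sum(((s - mu) / sd) ** k for s in sales) / len(sales)


sales = [2.0, 4.0, 6.0, 8.0]
print(growth_coefficient(sales))  # 2.0
```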
Fig. 6 is an apparatus for classifying an article according to an embodiment of the present invention, and as shown in fig. 6, the apparatus for classifying an article includes an obtaining module 601 and a classifying module 602. The obtaining module 601 obtains the commodity sales data to perform preprocessing of filtering, filling, smoothing and standardizing the commodity sales data. And the classification module 602 performs weighting based on time attenuation according to the pre-processed commodity sales data, and performs classification by using a classification algorithm model.
In a preferred embodiment, the classification module 602 may weight the preprocessed commodity sales data based on time decay and classify it using a DTW-based K-means method. In a further embodiment, the method comprises:
assuming that the sales of the commodity at time n is sales(n), the time-weighted sales of the commodity becomes:
sales_new(n)=w(n)*sales(n)
wherein:
Figure BDA0001757009250000141
Then, K time series are selected as the initial centroids. Using a K-means algorithm with DTW as the distance, each point is assigned to its nearest centroid to form K clusters, and the centroid of each cluster is recalculated. This is repeated until the clusters no longer change or the maximum number of iterations is reached, which yields the K classes.
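The weighting and clustering steps above can be sketched as follows. The weight function w(n) is given only as a figure in the source, so the exponential decay used in the example is a hypothetical stand-in. The centroid update by element-wise mean is also a simplifying assumption that requires series of equal length; the names `weight_series`, `dtw`, and `dtw_kmeans` are illustrative.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def weight_series(sales: Sequence[float], w: Callable[[int], float]) -> List[float]:
    """sales_new(n) = w(n) * sales(n) for n = 1..N."""
    return [w(n) * s for n, s in enumerate(sales, start=1)]


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance with |x - y| as the local cost."""
    prev = [0.0] + [math.inf] * len(b)
    for x in a:
        cur = [math.inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = abs(x - y) + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[len(b)]


def dtw_kmeans(
    series: List[List[float]], k: int, max_iter: int = 50, seed: int = 0
) -> Tuple[List[int], List[List[float]]]:
    """K-means that assigns each series to the centroid with the smallest DTW distance."""
    rng = random.Random(seed)
    centroids = [list(s) for s in rng.sample(series, k)]  # K series as initial centroids
    labels = [-1] * len(series)
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: dtw(s, centroids[c])) for s in series
        ]
        if new_labels == labels:  # clusters unchanged: stop
            break
        labels = new_labels
        for c in range(k):
            members = [s for s, lab in zip(series, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                # Element-wise mean (assumes equal-length series).
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


raw = [[1, 2, 3, 4], [1, 2, 3, 5], [10, 9, 8, 7], [10, 9, 8, 6]]
# Hypothetical decay weight: recent days count more. Not the w(n) of the source.
weighted = [weight_series(s, lambda n: math.exp(-0.1 * (4 - n))) for s in raw]
labels, centroids = dtw_kmeans(weighted, k=2)
```

For series of unequal length, a DTW-aware averaging scheme would be needed in place of the element-wise mean; that choice is outside what the text specifies.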
In a preferred embodiment, the classification module 602 calculates, for the K classes obtained, the offline growth coefficient, slope, skewness, and coefficient of variation of each class as the class features.
The growth coefficient is: $\frac{\sum_i (\mathit{sales}_i - \mathrm{avg}(\mathit{sales}))\,(x_i - \mathrm{avg}(x))}{\sum_i (x_i - \mathrm{avg}(x))^2}$, where x is the sequential time value and $\mathit{sales}_i$ is the sales volume of the commodity on day i;
skewness:
xi is the sales volume on day i, X is the average sales volume, and var is the sales volume standard;
kurtosis:
Figure BDA0001757009250000143
xi is the sales volume on day i, X is the average sales volume, and var is the sales volume standard;
coefficient of variation: standard deviation of sales/mean of sales.
As another embodiment, when the obtaining module 601 filters the commodity sales data, commodity sales data whose sales time is too short or too long may be filtered out through model filtering, and the commodity sales data may also be filtered according to rules configured for the business requirements. The intersection of the commodity sales data remaining after model filtering and after rule filtering is then taken as the final filtering result.
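The two-stage filtering can be sketched as two independent filters whose results are intersected. The source does not describe the filtering model itself, so a simple bound on the number of sales days stands in for it here; the thresholds, the rule signature, and the function names are assumptions for illustration.

```python
from typing import Callable, Dict, List, Sequence, Set

Rule = Callable[[str, Sequence[float]], bool]


def model_filter(series_by_sku: Dict[str, List[float]], min_days: int, max_days: int) -> Set[str]:
    """Stand-in for model filtering: keep SKUs sold for neither too few nor too many days."""
    return {sku for sku, s in series_by_sku.items() if min_days <= len(s) <= max_days}


def rule_filter(series_by_sku: Dict[str, List[float]], rules: List[Rule]) -> Set[str]:
    """Keep SKUs that satisfy every configured business rule."""
    return {sku for sku, s in series_by_sku.items() if all(rule(sku, s) for rule in rules)}


def filter_skus(
    series_by_sku: Dict[str, List[float]], min_days: int, max_days: int, rules: List[Rule]
) -> Dict[str, List[float]]:
    """Final result: the intersection of the model-filtered and rule-filtered SKUs."""
    kept = model_filter(series_by_sku, min_days, max_days) & rule_filter(series_by_sku, rules)
    return {sku: series_by_sku[sku] for sku in sorted(kept)}


data = {"A": [1.0] * 5, "B": [1.0] * 40, "C": [0.0] * 40, "D": [2.0] * 400}
has_sales: Rule = lambda sku, s: sum(s) > 0  # hypothetical business rule
result = filter_skus(data, min_days=30, max_days=365, rules=[has_sales])
```

Here SKU A is dropped for too short a history, D for too long a history, and C by the business rule, so only B remains.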
As another example, when the obtaining module 601 smooths the commodity sales data, the sales values of the time series may be sorted from largest to smallest and the i largest values stripped. The mean and standard deviation of the remaining commodity sales data are then calculated, and it is determined whether the ith value is less than or equal to the mean plus m standard deviations, so as to identify abnormal commodity sales data. Thereafter, the identified abnormal commodity sales data is filled.
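One possible reading of this smoothing step is sketched below: the largest values are stripped one at a time, and stripping stops at the first value that is no greater than the mean plus m standard deviations of the values below it. This interpretation, the use of the population standard deviation, and the choice to fill abnormal values with the mean of the normal values are assumptions; the source does not fix them.

```python
import statistics
from typing import List, Sequence


def find_outliers(sales: Sequence[float], m: float = 3.0) -> List[int]:
    """Return the indices of values judged abnormally high."""
    # Indices ordered by sales value, largest first.
    order = sorted(range(len(sales)), key=lambda i: sales[i], reverse=True)
    outliers = []
    for pos, idx in enumerate(order):
        rest = [sales[j] for j in order[pos + 1:]]  # values left after stripping
        if len(rest) < 2:
            break
        threshold = statistics.mean(rest) + m * statistics.pstdev(rest)
        if sales[idx] <= threshold:
            break  # this value is within mean + m * std: stop stripping
        outliers.append(idx)
    return sorted(outliers)


def smooth(sales: Sequence[float], m: float = 3.0) -> List[float]:
    """Fill abnormal values with the mean of the normal values (assumed fill method)."""
    bad = set(find_outliers(sales, m))
    good = [v for i, v in enumerate(sales) if i not in bad]
    fill = statistics.mean(good) if good else 0.0
    return [fill if i in bad else v for i, v in enumerate(sales)]


daily_sales = [10, 12, 11, 200, 9, 10, 11]
cleaned = smooth(daily_sales, m=3.0)
```

In this example only the value 200 is identified as abnormal, and it is replaced by the mean of the remaining six values.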
As another example, when the obtaining module 601 normalizes the commodity sales data, the sales in the commodity sales data may be scaled to fall within a specific interval.
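Scaling sales into a specific interval can be done, for example, with min-max scaling. The source does not name the scaling method, so min-max scaling, the default interval [0, 1], and the function name `scale_to_interval` are assumptions.

```python
from typing import List, Sequence


def scale_to_interval(sales: Sequence[float], low: float = 0.0, high: float = 1.0) -> List[float]:
    """Linearly map the sales values so that they fall within [low, high]."""
    lo, hi = min(sales), max(sales)
    if hi == lo:
        # Constant series: there is no spread to scale, so map everything to `low`.
        return [low for _ in sales]
    return [low + (s - lo) * (high - low) / (hi - lo) for s in sales]


sales_normal = scale_to_interval([10, 20, 30])
```

The result plays the role of the normalized series referred to as sales_normal in the feature formulas.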
As another example, when the obtaining module 601 fills the commodity sales data, discontinuous commodity sales data may be filled according to a configured filling method.
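Filling discontinuous data can be sketched as completing the missing calendar days between the first and last observed day. The two filling methods shown ("zero" and "ffill") and the function name `fill_gaps` are hypothetical examples of a configured filling method; the source does not list the available methods.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def fill_gaps(daily: Dict[date, float], method: str = "zero") -> List[Tuple[date, float]]:
    """Return one (day, sales) pair per calendar day, filling missing days."""
    if not daily:
        return []
    start, end = min(daily), max(daily)
    filled = []
    last = daily[start]  # most recent observed value, used by "ffill"
    day = start
    while day <= end:
        if day in daily:
            last = daily[day]
            filled.append((day, last))
        elif method == "zero":
            filled.append((day, 0.0))  # treat a missing day as no sales
        elif method == "ffill":
            filled.append((day, last))  # carry the previous observation forward
        else:
            raise ValueError(f"unknown filling method: {method}")
        day += timedelta(days=1)
    return filled


observed = {date(2018, 8, 1): 5.0, date(2018, 8, 4): 8.0}
zero_filled = fill_gaps(observed, "zero")
forward_filled = fill_gaps(observed, "ffill")
```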
It should be noted that the implementation details of the apparatus for classifying commodities have already been described in the above method for classifying commodities, and are therefore not repeated here.
Fig. 7 illustrates an exemplary system architecture 700 of a method of item classification or an apparatus of item classification to which embodiments of the present invention may be applied.
As shown in fig. 7, the system architecture 700 may include terminal devices 701, 702, 703, a network 704, and a server 705. The network 704 serves to provide a medium for communication links between the terminal devices 701, 702, 703 and the server 705. Network 704 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 701, 702, 703 to interact with a server 705 over a network 704, to receive or send messages or the like. The terminal devices 701, 702, 703 may have installed thereon various communication client applications, such as a shopping-like application, a web browser application, a search-like application, an instant messaging tool, a mailbox client, social platform software, etc. (by way of example only).
The terminal devices 701, 702, 703 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 705 may be a server providing various services, such as a background management server (for example only) providing support for shopping websites browsed by users using the terminal devices 701, 702, 703. The backend management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (for example, target push information, product information — just an example) to the terminal device.
It should be noted that the method for classifying commodities provided by the embodiment of the present invention is generally executed by the server 705, and accordingly, the apparatus for classifying commodities is generally disposed in the server 705.
It should be understood that the number of terminal devices, networks, and servers in fig. 7 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 8, shown is a block diagram of a computer system 800 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU) 801 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the system 800 are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as necessary, so that a computer program read out therefrom is installed into the storage section 808 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program executes the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 801.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes an acquisition module and a classification module. Wherein the names of the modules do not in some cases constitute a limitation of the module itself.
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to: acquire commodity sales data and preprocess it by filtering, filling, smoothing, and standardizing; and weight the preprocessed commodity sales data based on time decay and classify it using a classification algorithm model.
According to the technical scheme of the embodiment of the invention, the problem of huge commodity difference in the same class can be solved, and accurate commodity classification is realized.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (18)

1. A method of classifying an article, comprising:
acquiring commodity sales volume data to carry out preprocessing of filtering, filling, smoothing and standardization on the commodity sales volume data;
and weighting based on time attenuation according to the preprocessed commodity sales data, and classifying by using a classification algorithm model.
2. The method of claim 1, wherein weighting based on time decay from the pre-processed commodity sales data and classifying using a classification algorithm model comprises:
the pre-processed commodity sales data is weighted based on time decay and classified using a DTW-based K-means method.
3. The method of claim 2, wherein weighting the pre-processed commodity sales data based on time decay and classifying using a DTW-based K-means method comprises:
assuming that the sales of the commodity at time n is sales(n), the time-weighted sales of the commodity becomes:
sales_new(n)=w(n)*sales(n)
wherein:
Figure FDA0001757009240000011
and then selecting K time sequences as initial centroids, assigning each point to the nearest centroid by using a K-means algorithm of DTW to form K clusters, and then recalculating the centroid of each cluster until the cluster is not changed or the maximum iteration number is reached to obtain the divided K classes.
4. The method of claim 3, further comprising:
respectively calculating the offline growth coefficient, the inclination, the skewness and the variation coefficient of each class as class characteristics for the divided K classes;
wherein the growth coefficient is: $\frac{\sum_i (\mathit{sales}_i - \mathrm{avg}(\mathit{sales}))\,(x_i - \mathrm{avg}(x))}{\sum_i (x_i - \mathrm{avg}(x))^2}$, where x is the sequential time value and $\mathit{sales}_i$ is the sales volume of the commodity on day i;
skewness:
Figure FDA0001757009240000021
xi is the sales volume on day i, X is the average sales volume, and var is the sales volume standard;
kurtosis:
Figure FDA0001757009240000022
xi is the sales volume on day i, X is the average sales volume, and var is the sales volume standard;
coefficient of variation: standard deviation of sales/mean of sales.
5. The method of claim 1, wherein filtering commodity sales data comprises:
filtering commodity sales data with too short or too long sales time through model filtering, and processing the commodity sales data according to rule filtering configured by service requirements;
and taking intersection of commodity sales data after model filtering and rule filtering to obtain a final filtering result.
6. The method of claim 1, wherein smoothing the commodity sales data comprises:
sorting the commodity sales data from large to small according to the time sequence value, stripping the first i largest sales, calculating the mean value and standard deviation of the remaining commodity sales data, determining that the ith value is less than or equal to the sum of the mean value and the m standard deviations, and further identifying abnormal commodity sales data;
and filling the identified abnormal commodity sales data.
7. The method of claim 1, wherein normalizing commodity sales data comprises:
the sales in the sales data are scaled to fall within a specified interval.
8. The method of claim 1, wherein populating commodity sales data comprises:
and filling discontinuous commodity sales data according to the configured filling method.
9. An apparatus for sorting articles, comprising:
the acquisition module is used for acquiring commodity sales data so as to carry out preprocessing of filtering, filling, smoothing and standardizing on the commodity sales data;
and the classification module is used for weighting based on time attenuation according to the preprocessed commodity sales data and classifying by utilizing a classification algorithm model.
10. The apparatus of claim 9, wherein the classification module weights the pre-processed commodity sales data based on time decay and performs classification using a classification algorithm model, comprising:
the pre-processed commodity sales data is weighted based on time decay and classified using a DTW-based K-means method.
11. The apparatus of claim 10, wherein the classification module weights the pre-processed merchandise sales data based on time decay and classifies the pre-processed merchandise sales data using a DTW-based K-means method comprising:
assuming that the sales of the commodity at time n is sales(n), the time-weighted sales of the commodity becomes:
sales_new(n)=w(n)*sales(n)
wherein:
Figure FDA0001757009240000031
and then selecting K time sequences as initial centroids, assigning each point to the nearest centroid by using a K-means algorithm of DTW to form K clusters, and then recalculating the centroid of each cluster until the cluster is not changed or the maximum iteration number is reached to obtain the divided K classes.
12. The apparatus of claim 11, wherein the classification module is further configured to:
respectively calculating the offline growth coefficient, the inclination, the skewness and the variation coefficient of each class as class characteristics for the divided K classes;
wherein the growth coefficient is: $\frac{\sum_i (\mathit{sales}_i - \mathrm{avg}(\mathit{sales}))\,(x_i - \mathrm{avg}(x))}{\sum_i (x_i - \mathrm{avg}(x))^2}$, where x is the sequential time value and $\mathit{sales}_i$ is the sales volume of the commodity on day i;
skewness:
xi is the sales volume on day i, X is the average sales volume, and var is the sales volume standard;
kurtosis:
Figure FDA0001757009240000041
xi is the sales volume on day i, X is the average sales volume, and var is the sales volume standard;
coefficient of variation: standard deviation of sales/mean of sales.
13. The apparatus of claim 9, wherein the acquisition module filters commodity sales data comprising:
filtering commodity sales data with too short or too long sales time through model filtering, and processing the commodity sales data according to rule filtering configured by service requirements;
and taking intersection of commodity sales data after model filtering and rule filtering to obtain a final filtering result.
14. The apparatus of claim 9, wherein the obtaining module smoothes commodity sales data, comprising:
sorting the commodity sales data from large to small according to the time sequence value, stripping the first i largest sales, calculating the mean value and standard deviation of the remaining commodity sales data, determining that the ith value is less than or equal to the sum of the mean value and the m standard deviations, and further identifying abnormal commodity sales data;
and filling the identified abnormal commodity sales data.
15. The apparatus of claim 9, wherein the acquisition module normalizes commodity sales data comprising:
the sales in the sales data are scaled to fall within a specified interval.
16. The apparatus of claim 9, wherein the acquisition module populates commodity sales data comprising:
and filling discontinuous commodity sales data according to the configured filling method.
17. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-8.
18. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-8.
CN201810891322.8A 2018-08-07 2018-08-07 Commodity classification method and device Pending CN110826579A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810891322.8A CN110826579A (en) 2018-08-07 2018-08-07 Commodity classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810891322.8A CN110826579A (en) 2018-08-07 2018-08-07 Commodity classification method and device

Publications (1)

Publication Number Publication Date
CN110826579A true CN110826579A (en) 2020-02-21

Family

ID=69533907

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810891322.8A Pending CN110826579A (en) 2018-08-07 2018-08-07 Commodity classification method and device

Country Status (1)

Country Link
CN (1) CN110826579A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673626A (en) * 2021-09-02 2021-11-19 北京智思迪科技有限公司 Hardware sales information classification method and device
CN113888235A (en) * 2021-10-22 2022-01-04 创优数字科技(广东)有限公司 Training method of sales forecasting model, sales forecasting method and related device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8762227B1 (en) * 2011-07-01 2014-06-24 Amazon Technologies, Inc. Automatic product groupings for merchandising
CN105956699A (en) * 2016-04-29 2016-09-21 连云港天马网络发展有限公司 Commodity classification and delivery and sales prediction method based on e-commerce sales data
CN107093122A (en) * 2016-12-02 2017-08-25 北京小度信息科技有限公司 Object classification method and device
CN107273454A (en) * 2017-05-31 2017-10-20 北京京东尚科信息技术有限公司 User data sorting technique, device, server and computer-readable recording medium
CN108021929A (en) * 2017-11-16 2018-05-11 华南理工大学 Mobile terminal electric business user based on big data, which draws a portrait, to establish and analysis method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
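Wang Yonggui et al.: "Collaborative filtering recommendation algorithm based on improved clustering and matrix factorization" (基于改进聚类和矩阵分解的协同过滤推荐算法), Journal of Computer Applications (计算机应用), pages 1 - 2 *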

Similar Documents

Publication Publication Date Title
CN110751497A (en) Commodity replenishment method and device
US20150278813A1 (en) Determining a temporary transaction limit
CN113627846A (en) Inventory adjusting method and device, electronic equipment and computer readable medium
CN113095893A (en) Method and device for determining sales of articles
CN113743971A (en) Data processing method and device
CN113051480A (en) Resource pushing method and device, electronic equipment and storage medium
CN110826579A (en) Commodity classification method and device
CN111612385B (en) Method and device for clustering articles to be distributed
CN114663015A (en) Replenishment method and device
CN109902847B (en) Method and device for predicting amount of orders in branch warehouse
CN112016581A (en) Multidimensional data processing method and device, computer equipment and storage medium
CN112667770A (en) Method and device for classifying articles
CN113780912A (en) Method and device for determining safety stock
CN111832782A (en) Method and device for determining physical distribution attribute of article
CN110766431A (en) Method and device for judging whether user is sensitive to coupon
CN112783468A (en) Target object sorting method and device
CN114677174A (en) Method and device for calculating sales volume of unladen articles
CN112418898A (en) Article demand data analysis method and device based on multi-time window fusion
CN108109002B (en) Data processing method and device
CN112784861A (en) Similarity determination method and device, electronic equipment and storage medium
CN112528103A (en) Method and device for recommending objects
CN111401935A (en) Resource allocation method, device and storage medium
CN113554041B (en) Method and device for marking labels for users
CN112529708B (en) Customer identification method and device and electronic equipment
CN116308465B (en) Big data analysis system based on mobile payment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination