CN109961165B - Method, device, equipment and storage medium for predicting part quantity - Google Patents


Info

Publication number
CN109961165B
Authority
CN
China
Prior art keywords
percentile
predicted
date
random forest
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711429088.9A
Other languages
Chinese (zh)
Other versions
CN109961165A (en
Inventor
黄心远
张耀武
王本玉
郭九双
金晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SF Technology Co Ltd
Original Assignee
SF Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SF Technology Co Ltd
Priority to CN201711429088.9A
Publication of CN109961165A
Application granted
Publication of CN109961165B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Databases & Information Systems (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • Operations Research (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Fuzzy Systems (AREA)
  • Game Theory and Decision Science (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a piece quantity prediction method, device, and equipment, and a storage medium. The method comprises: acquiring historical piece quantity data of a first preset time range, and dividing the historical piece quantity data into a training data set and a verification data set; creating a random forest model based on the training data set and feature attributes; predicting the verification data set using the random forest model to obtain a predicted percentile; and inputting historical piece quantity data of a second preset time range into the random forest model, and obtaining the piece quantity of the date to be predicted according to the predicted percentile. According to the technical solution provided by the embodiments of the application, the random forest model for predicting piece quantity is created by distinguishing feature attributes such as special dates and dates of abnormal piece-quantity fluctuation, which improves the accuracy of the model. A predicted percentile is obtained by statistically analyzing the prediction results, and the model is adjusted using the predicted percentile, which addresses the underfitting problem of the prior art and reduces the verification error.

Description

Method, device, equipment and storage medium for predicting part quantity
Technical Field
The present application relates generally to the field of computers, and in particular, to the field of data mining processing, and more particularly, to a method, apparatus, device, and storage medium for predicting a piece amount.
Background
As information management reform in the logistics industry continues to deepen, informatized management in the industry faces the problem of how to effectively extract useful information from massive and complex data.
Data mining techniques can extract implicit useful information from large volumes of ambiguous data in order to analyze and predict future developments, for example, predicting the piece quantity of a business site on a future day using a common random forest algorithm.
However, the common random forest algorithm currently averages a large number of decision trees across different dates, such as working days or non-working days in different months, and this approach lacks interpretability. Secondly, the random forest algorithm is a non-parametric method and lacks adjustability: prediction accuracy can only be improved by increasing the number of decision trees, which in turn increases modeling time and reduces model efficiency.
Therefore, a new solution is needed to solve the above problems.
Disclosure of Invention
In view of the above drawbacks or deficiencies of the prior art, it is desirable to provide a solution that introduces percentiles into a random forest model to predict piece quantity.
In a first aspect, an embodiment of the present application provides a method for predicting a part quantity, including:
acquiring historical part quantity data of a first preset time range, and dividing the historical part quantity data into a training data set and a verification data set;
creating a random forest model based on the training data set and the characteristic attribute;
predicting the verification data set by using a random forest model to obtain a prediction percentile;
and inputting the historical piece quantity data of the second preset time range into a random forest model, and acquiring the piece quantity of the target date to be predicted according to the prediction percentile.
In a second aspect, an embodiment of the present application provides a piece amount predicting device, including:
the data acquisition unit is used for acquiring historical part quantity data in a first preset time range and dividing the historical part quantity data into a training data set and a verification data set;
the model creation unit is used for creating a random forest model based on the training data set and the characteristic attribute;
the prediction percentile unit is used for predicting the verification data set by utilizing the random forest model to obtain a prediction percentile;
the piece quantity prediction unit is used for inputting historical piece quantity data of a second preset time range into the random forest model, and obtaining the piece quantity of the target date to be predicted according to the prediction percentile.
In a third aspect, an embodiment of the present application provides an apparatus, including a processor, a storage device;
the aforementioned storage means for storing one or more programs;
the program or programs described above are executed by the processor(s) to cause the processor(s) to implement the methods described in embodiments of the present application.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as described in embodiments of the present application.
According to the piece quantity prediction scheme provided by the embodiments of the application, the prediction result is adjusted by introducing a percentile into the random forest model, so that the prediction result follows the trend of the historical data. The random forest model for predicting piece quantity is created by distinguishing feature attributes such as special dates and dates of abnormal piece-quantity fluctuation, which improves the accuracy of the model. A predicted percentile is obtained by statistically analyzing the prediction results, and the model is adjusted using the predicted percentile, which addresses the underfitting problem of the prior art and reduces the verification error.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the accompanying drawings in which:
FIG. 1 shows a schematic flow chart of a piece quantity prediction method provided by an embodiment of the application;
FIG. 2 shows a schematic flow chart of step 103 according to another embodiment of the application;
FIG. 3 shows a schematic structural diagram of a piece quantity prediction device according to an embodiment of the application;
FIG. 4 shows a schematic structural diagram of a prediction percentile unit 303 according to another embodiment of the application;
FIG. 5 shows a schematic diagram of a computer system suitable for implementing the terminal device of an embodiment of the application.
Detailed Description
The application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be noted that, for convenience of description, only the portions related to the application are shown in the drawings.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
Referring to fig. 1, fig. 1 is a flow chart illustrating a method for predicting a quantity of a part according to an embodiment of the present application.
As shown in fig. 1, the method includes:
step 101, acquiring historical part quantity data of a first preset time range, and dividing the historical part quantity data into a training data set and a verification data set.
In the embodiment of the application, historical piece quantity data is acquired in order to predict the piece quantity on a future date. For the first preset time range, the piece quantity data of a business area over a certain period can be extracted according to data selection and processing requirements. The piece quantity data includes delivery quantity data, pickup quantity data, and the like.
For example, the delivery quantity data of a business area, assumed here to be the 010 area, is extracted as the historical piece quantity data, and the first preset time range is January 1, 2015 to August 15, 2017. The data is extracted with the province or city as the business area. The data from January 1, 2015 to June 15, 2017 is used as the training data set, and the data from June 16, 2017 to August 15, 2017 is used as the verification data set.
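The date-based split in this example can be sketched in a few lines. This is a minimal illustration only: the record layout (a date paired with a quantity), the function name split_by_date, and the sample values are assumptions made for the sketch and do not come from the application.

```python
from datetime import date

def split_by_date(records, train_end):
    """Split (day, quantity) records into training and verification sets.

    Records dated on or before train_end go to the training set;
    later records go to the verification set.
    """
    train = [r for r in records if r[0] <= train_end]
    verification = [r for r in records if r[0] > train_end]
    return train, verification

# Hypothetical daily delivery quantities for one site.
records = [
    (date(2017, 6, 14), 120),
    (date(2017, 6, 15), 135),
    (date(2017, 6, 16), 150),
    (date(2017, 8, 15), 142),
]

# June 15, 2017 is the last training day in the example above.
train, verification = split_by_date(records, train_end=date(2017, 6, 15))
```

Splitting on a cutoff date rather than at random keeps the verification period strictly after the training period, which matches how the model is later used to predict future dates.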
Step 102, creating a random forest model based on the training data set and the characteristic attribute.
In the embodiment of the application, it is considered that the limited selection of splitting attributes (also referred to as feature attributes, main features, or influencing factors) in existing random forest models prevents the piece quantity of special dates from being predicted well. Decision trees are therefore generated by introducing more specific temporal features as splitting attributes. The splitting attributes include at least: holidays, special dates, months, days of the week, dates, site locations, dates of abnormal piece-quantity fluctuation, and the like. The special dates include merchant activity dates, such as dates with prominent merchant promotions like Double Eleven (November 11) and Double Twelve (December 12). The site location is a business area divided according to geographic location, for example, a business area divided according to city area code. The dates of abnormal piece-quantity fluctuation are determined from the piece quantities within a preset statistical range: after sorting by piece quantity, dates with a piece quantity higher than a first threshold are marked, and dates with a piece quantity lower than a second threshold are marked. For example, for site 010AA, the piece quantities of 100 Mondays are extracted from the training data set and sorted to obtain an ordered sequence. The first 30 Monday piece quantities in the sequence are all above the first threshold, which may be 200 pieces; the last 20 are all below the second threshold, which may be 50 pieces; and the remaining 50 Mondays, whose piece quantities lie between the two thresholds, are treated as normal piece-quantity dates.
By identifying the date of the abnormal fluctuation of the piece quantity, the random forest model can be helped to capture the date characteristic of the abnormal fluctuation of the piece quantity, so that the accuracy of prediction is improved.
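A minimal sketch of the threshold-based labeling described above follows. The thresholds of 200 and 50 pieces are the example values from the text; the function name label_fluctuation, the label strings, and the sample quantities are hypothetical.

```python
def label_fluctuation(quantity, high_threshold=200, low_threshold=50):
    """Label one day's piece quantity relative to the two thresholds."""
    if quantity > high_threshold:
        return "high"    # abnormally high piece-quantity date
    if quantity < low_threshold:
        return "low"     # abnormally low piece-quantity date
    return "normal"      # between the thresholds

# Hypothetical Monday piece quantities for one site, sorted from high to low.
monday_quantities = sorted([180, 40, 260, 75, 210, 30], reverse=True)
labels = [label_fluctuation(q) for q in monday_quantities]
```

The resulting label can then be supplied to the model as one more feature attribute alongside holidays, special dates, and site location.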
Step 102 may further include:
extracting samples from the training data set with replacement, and creating a plurality of sample sets;
determining feature attributes at least including holidays, special dates, site locations, and dates of abnormal piece-quantity fluctuation;
training by using the plurality of sample sets and the characteristic attributes to obtain a plurality of decision trees so as to construct a random forest model.
Taking the 010 area as an example, data from January 1, 2015 to June 15, 2017 is selected as the training data set, data from June 16, 2017 to August 15, 2017 is selected as the verification data set, and a random forest model is created.
Rows of the training data set are sampled at random: assuming the total sample size is N, n training samples are drawn with replacement. Columns of the training data set are then sampled: from the M attribute columns (that is, feature attributes or splitting attributes), m feature attributes are drawn without replacement. N, M, n, and m are all natural numbers, where n is less than or equal to N and m is much less than M. The m feature attributes may include at least holidays, special dates, site locations, dates of abnormal piece-quantity fluctuation, and the like.
After the sampling is completed, a plurality of decision trees are created in a completely split mode, and the decision trees are combined to form a random forest, namely, a random forest model is built.
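The row and column sampling described above can be sketched as follows, assuming plain Python lists stand in for the training data set. The function name draw_sample_set, the feature names, and the toy rows are illustrative assumptions; fitting the decision trees themselves is not shown.

```python
import random

def draw_sample_set(rows, feature_names, n, m, rng):
    """Draw n rows with replacement and m feature attributes without replacement."""
    sampled_rows = [rng.choice(rows) for _ in range(n)]   # with replacement
    sampled_features = rng.sample(feature_names, m)       # without replacement
    return sampled_rows, sampled_features

# Hypothetical training rows and feature attribute names.
rows = [{"id": i, "quantity": 100 + i} for i in range(10)]
feature_names = ["holiday", "special_date", "month", "weekday",
                 "site", "abnormal_fluctuation"]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# One sample set per decision tree; three trees here for brevity.
sample_sets = [
    draw_sample_set(rows, feature_names, n=len(rows), m=2, rng=rng)
    for _ in range(3)
]
```

Each sample set would then be used to grow one fully split decision tree, and the trees together form the random forest.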
And step 103, predicting the verification data set by using a random forest model to obtain a prediction percentile.
After creating the random forest model, predictions are made using the validation dataset to obtain a number of predicted values. In the prior art, the average value of the predicted values is taken as a predicted final result. However, this approach is prone to model under-fitting problems. In order to overcome the problem, the embodiment of the application processes the plurality of predicted values to obtain the distribution characteristics of the predicted values, thereby introducing the distribution characteristics of the predicted values to adjust the random forest model, solving the problem of under fitting and effectively reducing verification errors.
Optionally, data from June 16, 2017 to August 15, 2017 is used as the verification data set and input into the random forest model, which yields as many predicted values as there are decision trees in the model. For distinction, these are referred to as first predicted values. For example, assuming the random forest model combines 500 decision trees, predicting a sample in the verification data set yields 500 predicted values for that sample. The 500 predicted values are sorted in increasing order, the cumulative percentiles of the sorted result are calculated, and the value at the p% position is called the p-th percentile. For distinction, the resulting percentiles are referred to as the first percentile set.
The p-th percentile divides the predicted piece quantities into two parts: about p% of the predicted piece quantities are less than the p-th percentile, and about (100-p)% are greater than it. For example, suppose 150 pieces are predicted for a Friday in the verification data set based on the historical piece quantity data. On its own, this value does not show how 150 pieces relates to the piece quantities of other Fridays. To clarify the relationship, percentiles can be calculated over all Friday piece quantities. For example, if 150 pieces corresponds exactly to the 70th percentile, then 70% of the Friday piece quantities are less than 150 pieces and 30% are greater than 150 pieces.
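The Friday example can be checked with a short sketch that computes the share of values below a given quantity. The function name percent_below and the ten Friday quantities are made up for illustration.

```python
def percent_below(values, x):
    """Return the percentage of values strictly less than x."""
    return 100 * sum(v < x for v in values) / len(values)

# Hypothetical Friday piece quantities; 7 of the 10 are below 150.
friday_quantities = [90, 100, 110, 120, 130, 140, 145, 150, 160, 170]
share = percent_below(friday_quantities, 150)
```

With these made-up values, 70% of the Friday quantities fall below 150 pieces, so 150 pieces sits at about the 70th percentile.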
In the embodiment of the application, the verification data set is input into the random forest model to obtain a prediction result. The prediction result contains as many predicted values as there are decision trees in the random forest model. Taking 500 decision trees as an example, the prediction result includes 500 predicted values. The 500 predicted values are arranged in increasing order to obtain a first sorting result, and the first percentile set is obtained from the first sorting result. Each percentile corresponds to a value; for example, the 75th percentile corresponds to the 375th predicted value in the first sorting result.
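The mapping from a percentile to a position in the sorted predictions can be sketched with the nearest-rank rule, which reproduces the example of the 75th percentile landing on the 375th of 500 values. The nearest-rank convention itself is an assumption of this sketch, since the application does not specify an interpolation rule, and percentile_value is a hypothetical helper name.

```python
def percentile_value(predictions, p):
    """Return the p-th percentile (p an integer from 1 to 100) by nearest rank."""
    ordered = sorted(predictions)
    # Ceiling of p * n / 100 in integer arithmetic, as a 1-based rank.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]

# Stand-in for the 500 per-tree predicted values of one sample,
# deliberately given in decreasing order to show that sorting is applied.
tree_predictions = list(range(500, 0, -1))
value_at_75 = percentile_value(tree_predictions, 75)
```

Because the stand-in values are 1 through 500, the value returned for the 75th percentile equals its rank, 375.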
After the first percentile set is obtained from the first predicted value set, the real piece quantity of each sample in the verification data set is compared with the predicted values corresponding to all percentiles in the first percentile set. According to the minimum mean absolute percentage error (MAPE) principle, the percentile whose predicted value has the smallest error relative to the real piece quantity is determined as the predicted percentile.
Taking the data from June 16, 2017 to August 15, 2017 as the verification data set, the piece quantity of site 010AA on August 11, 2017 is predicted as an example. Assuming the real piece quantity of site 010AA on August 11, 2017 is 150, the random forest model produces 500 predicted values for site 010AA on that date, and the predicted piece quantity of site 010AA on August 11, 2017 can be determined using the minimum-MAPE principle.
Referring to fig. 2, fig. 2 is a schematic flow chart of step 103 in the embodiment of the present application:
step 103, may include:
step 201, predicting samples in a verification data set by using a random forest model to obtain a first predicted value set, wherein the first predicted value set comprises a plurality of first predicted values;
step 202, calculating a first percentile set according to the first predicted value set;
in step 203, a predicted percentile is determined in the first set of percentiles using the real parts corresponding to the samples in the validation dataset.
Wherein, step 202 may further include:
step 2021, sorting the first set of predicted values to obtain a first sorting result;
at step 2022, the first ranking result is statistically analyzed to obtain a first percentile set.
Wherein, step 203 may further include:
step 2031, obtaining a real part quantity of a sample from a verification data set;
step 2032, determining, according to the minimum mean absolute percentage error principle, the first predicted value with the smallest error relative to the real piece quantity, and taking the percentile corresponding to that first predicted value in the first percentile set as the predicted percentile.
In the embodiment of the application, the predicted percentile is introduced into the random forest model, the parameter options are increased, and the verification error of the model is reduced by utilizing the adjustability of the predicted percentile.
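Steps 201 to 2032 can be sketched as a search over candidate percentiles that keeps the one with the smallest MAPE on the verification samples. This is one plausible reading of the steps, not the application's definitive procedure: the nearest-rank rule, the candidate range of 1 to 100, and all function names and sample values are assumptions.

```python
def percentile_value(predictions, p):
    """Nearest-rank p-th percentile of a list of predicted values."""
    ordered = sorted(predictions)
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]

def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent (actuals must be nonzero)."""
    return 100 * sum(abs(a - f) / a for a, f in zip(actuals, forecasts)) / len(actuals)

def choose_predicted_percentile(per_sample_predictions, actuals, candidates=range(1, 101)):
    """Return the candidate percentile with the smallest MAPE, and that MAPE."""
    best_p, best_error = None, float("inf")
    for p in candidates:
        forecasts = [percentile_value(preds, p) for preds in per_sample_predictions]
        error = mape(actuals, forecasts)
        if error < best_error:  # strict comparison keeps the lowest tied percentile
            best_p, best_error = p, error
    return best_p, best_error

# Two hypothetical verification samples with five per-tree predictions each.
per_sample_predictions = [[100, 120, 140, 160, 180], [200, 220, 240, 260, 280]]
actuals = [160, 260]
predicted_percentile, error = choose_predicted_percentile(per_sample_predictions, actuals)
```

The chosen percentile acts as the single adjustable parameter that the text says is added to the otherwise non-parametric random forest.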
Step 104, inputting historical piece quantity data of a second preset time range into the random forest model, and obtaining the piece quantity of the date to be predicted according to the predicted percentile.
In order to predict the piece quantity on a given date, historical piece quantity data within a second preset time range before that date is determined and used as a new training data set to train the random forest model for predicting the piece quantity on that date. The second preset time range is bounded by the given date and may cover the piece quantity data of a natural number of months before that date, for example, 2 months or 3 months. Preferably, 3 months are chosen.
Suppose that the piece quantity of site 010AA is to be predicted for December 1, 2017 (a Friday). The data from September 1, 2017 to November 1, 2017 is taken as the training data set and input into the random forest model to obtain the 500 predicted values output by the model. The 500 predicted values are arranged in increasing order, the cumulative percentiles of the arrangement result are calculated, and the value at the p% position is called the p-th percentile. For distinction, the resulting percentiles are referred to as the second percentile set.
Using the predicted percentile determined in step 103, the position of the predicted percentile in the second percentile set is determined, and the piece quantity of the date to be predicted is obtained from the piece quantity corresponding to that position.
Step 104 may further include:
determining the second preset time range according to the date to be predicted, acquiring the historical piece quantity data within the second preset time range, and taking it as a new training data set;
inputting the new training data set into the random forest model to obtain a second predicted value set, wherein the second predicted value set comprises a plurality of second predicted values;
calculating a second percentile set according to the second predicted value set;
determining the position of the predicted percentile in the second percentile set;
and determining a corresponding second predicted value as the piece quantity of the date to be predicted based on the position.
Optionally, calculating the second percentile set from the second set of predictors may further include:
sequencing the second predicted value set to obtain a second sequencing result;
and statistically analyzing the second sequencing result to obtain a second percentile set.
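The final lookup in step 104 can be sketched as reading the second predicted value set at the position of the predicted percentile. The nearest-rank rule, the function name forecast_quantity, the ten stand-in predictions, and the percentile of 70 are all assumptions for illustration.

```python
def forecast_quantity(second_predictions, predicted_percentile):
    """Pick the second predicted value at the predicted percentile's position."""
    ordered = sorted(second_predictions)                 # second sorting result
    rank = max(1, (predicted_percentile * len(ordered) + 99) // 100)
    return ordered[rank - 1]

# Hypothetical per-tree predictions for the date to be predicted
# (ten values here instead of 500, for brevity).
second_predictions = [148, 131, 160, 155, 142, 170, 137, 151, 165, 145]

# Assume step 103 selected the 70th percentile.
quantity = forecast_quantity(second_predictions, 70)
```

In this toy case the seventh of the ten sorted values, 155, is returned as the piece quantity for the date to be predicted, in place of the mean that a conventional random forest would output.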
It should be noted that although the operations of the method of the present application are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order or that all of the illustrated operations be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform. For example, calculating the first percentile set from the first predictor set may include ranking the first predictor set to obtain a first ranking result; and statistically analyzing the first sequencing result to obtain a first percentile set.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a part quantity predicting device according to an embodiment of the application.
As shown in fig. 3, the apparatus 300 includes:
the data acquisition unit 301 acquires the historic piece amount data of the first preset time range, and divides the historic piece amount data into a training data set and a verification data set.
In the embodiment of the application, historical piece quantity data is acquired in order to predict the piece quantity on a future date. For the first preset time range, the piece quantity data of a business area over a certain period can be extracted according to data selection and processing requirements. The piece quantity data may be delivery quantity data, pickup quantity data, and the like.
For example, the delivery quantity data of a business area, assumed here to be the 010 area, is extracted as the historical piece quantity data, and the first preset time range is January 1, 2015 to August 15, 2017. The data is extracted with the province or city as the business area. The data from January 1, 2015 to June 15, 2017 is used as the training data set, and the data from June 16, 2017 to August 15, 2017 is used as the verification data set.
The model creation unit 302 creates a random forest model based on the training data set and the feature attributes.
In the embodiment of the application, it is considered that the limited selection of splitting attributes (also referred to as feature attributes, main features, or influencing factors) in existing random forest models prevents the piece quantity of special dates from being predicted well. Decision trees are therefore generated by introducing more specific temporal features as splitting attributes. The splitting attributes include at least: holidays, special dates, months, days of the week, dates, site locations, dates of abnormal piece-quantity fluctuation, and the like. The special dates include merchant activity dates, such as dates with prominent merchant promotions like Double Eleven (November 11) and Double Twelve (December 12). The site location is a business area divided according to geographic location, for example, a business area divided according to city area code. The dates of abnormal piece-quantity fluctuation are determined from the piece quantities within a preset statistical range: after sorting by piece quantity, dates with a piece quantity higher than a first threshold are marked, and dates with a piece quantity lower than a second threshold are marked. For example, for site 010AA, the piece quantities of 100 Mondays are extracted from the training data set and sorted to obtain an ordered sequence. The first 30 Monday piece quantities in the sequence are all above the first threshold, which may be 200 pieces; the last 20 are all below the second threshold, which may be 50 pieces; and the remaining 50 Mondays, whose piece quantities lie between the two thresholds, are treated as normal piece-quantity dates.
By identifying the date of the abnormal fluctuation of the piece quantity, the random forest model can be helped to capture the date characteristic of the abnormal fluctuation of the piece quantity, so that the accuracy of prediction is improved.
The model creation unit 302 may further include:
a sample extraction subunit, used for extracting samples from the training data set with replacement and creating a plurality of sample sets;
a feature attribute determining subunit, used for determining feature attributes at least including holidays, special dates, site locations, and dates of abnormal piece-quantity fluctuation;
the training subunit is used for training by utilizing the plurality of sample sets and the characteristic attributes to obtain a plurality of decision trees so as to construct a random forest model.
Taking the 010 area as an example, data from January 1, 2015 to June 15, 2017 is selected as the training data set, data from June 16, 2017 to August 15, 2017 is selected as the verification data set, and a random forest model is created.
Rows of the training data set are sampled at random: assuming the total sample size is N, n training samples are drawn with replacement. Columns of the training data set are then sampled: from the M attribute columns (that is, feature attributes or splitting attributes), m feature attributes are drawn without replacement. N, M, n, and m are all natural numbers, where n is less than or equal to N and m is much less than M. The m feature attributes may include at least holidays, special dates, site locations, dates of abnormal piece-quantity fluctuation, and the like.
After the sampling is completed, a plurality of decision trees are created in a completely split mode, and the decision trees are combined to form a random forest, namely, a random forest model is built.
The prediction percentile unit 303 is configured to predict the verification data set by using a random forest model, so as to obtain a prediction percentile.
After creating the random forest model, predictions are made using the validation dataset to obtain a number of predicted values. In the prior art, the average value of the predicted values is taken as a predicted final result. However, this approach is prone to model under-fitting problems. In order to overcome the problem, the embodiment of the application processes the plurality of predicted values to obtain the distribution characteristics of the predicted values, thereby introducing the distribution characteristics of the predicted values to adjust the random forest model, solving the problem of under fitting and effectively reducing verification errors.
Optionally, data from 16 days from 6 months in 2017 to 15 days from 8 months in 2017 is used as a verification data set, and is input into a random forest model to obtain a predicted value which is the same as the number of decision trees contained in the model. The representation may be referred to as a first predictor for distinguishing. For example, assuming that the random forest model is formed by combining 500 decision trees, predicting a certain sample in the verification data set by using the random forest model can obtain 500 predicted values of the sample, sorting the 500 predicted values from small to large (i.e., increasing order) to obtain a sorting result, calculating a corresponding cumulative percentile of the sorting result, and obtaining a value at a p% position, which is referred to herein as a p percentile, for distinguishing the first percentile set.
The p-th percentile divides the predicted piece quantities into two parts: about p% of the predicted piece quantities are less than the p-th percentile, and about (100-p)% are greater than it. For example, suppose the piece quantity on a certain Friday in the historical piece quantity data is 150; on its own, this does not reveal how 150 relates to the piece quantities of other Fridays in the historical data. To clarify the relationship, a percentile calculation may be performed on all Friday piece quantities in the historical piece quantity data. If 150 happens to correspond to the 70th percentile, then about 70% of Friday piece quantities are below 150 and about 30% are above 150.
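The Friday example can be checked with a few lines of code. The ten Friday quantities below are invented purely for illustration, and the helper name `percent_below` is hypothetical.

```python
def percent_below(values, x):
    """Return the percentage of values strictly less than x."""
    return 100 * sum(v < x for v in values) / len(values)

# Hypothetical Friday piece quantities from the historical data.
friday_quantities = [110, 118, 125, 131, 137, 142, 148, 150, 163, 171]
share_below_150 = percent_below(friday_quantities, 150)
```

With this made-up data, seven of the ten Fridays fall below 150, so 150 sits at about the 70th percentile.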
After the first percentile set is obtained from the first predicted value set, the real piece quantity of each sample in the verification data set is compared with the predicted values corresponding to all percentiles in the first percentile set, and, according to the minimum mean absolute percentage error (MAPE) principle, the percentile whose predicted value has the smallest error relative to the real piece quantity is determined as the prediction percentile.
Taking data from June 16, 2017 to August 15, 2017 as the verification data set, consider predicting the piece quantity of site 010AA on August 11, 2017. Assuming the real piece quantity of site 010AA on August 11, 2017 is 150, the random forest model produces 500 predicted values for site 010AA on that date from the verification data set, and the predicted piece quantity for that date, together with its corresponding percentile, can be determined using the minimum-MAPE principle.
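One way to read the minimum-MAPE selection is sketched below: for every candidate percentile, the percentile value of each validation sample is compared with that sample's real piece quantity, and the percentile with the lowest mean absolute percentage error is kept. The nearest-rank convention, the function names, and the synthetic predictions are assumptions for illustration, not details from the application.

```python
def nearest_rank(ordered, p):
    """Value at the p% position of an already sorted list (nearest-rank assumption)."""
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]

def select_prediction_percentile(predictions_per_sample, real_values,
                                 percentiles=range(1, 101)):
    """Return (percentile, MAPE in %) minimising MAPE over the validation samples.

    Real values must be non-zero, since MAPE divides by them.
    """
    ordered_sets = [sorted(preds) for preds in predictions_per_sample]
    best_p, best_mape = None, float("inf")
    for p in percentiles:
        errors = [abs(nearest_rank(ordered, p) - real) / real
                  for ordered, real in zip(ordered_sets, real_values)]
        mape = 100 * sum(errors) / len(errors)
        if mape < best_mape:  # strict comparison keeps the smallest p on ties
            best_p, best_mape = p, mape
    return best_p, best_mape

# Two synthetic validation samples, each with 500 per-tree predictions.
predictions_per_sample = [list(range(1, 501)), list(range(101, 601))]
real_values = [150, 250]
prediction_percentile, validation_mape = select_prediction_percentile(
    predictions_per_sample, real_values)
```

In this contrived data the 30th percentile reproduces both real values exactly, so it is selected with a MAPE of zero; real data would yield a non-zero minimum.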
Referring to fig. 4, fig. 4 shows a schematic structural diagram of the prediction percentile unit 303 in an embodiment of the present application:
the prediction percentile unit 303 may include:
a first predicted value subunit 401, configured to predict a sample in the verification data set by using a random forest model, to obtain a first predicted value set, where the first predicted value set includes a plurality of first predicted values;
a first percentile calculation subunit 402, configured to calculate a first percentile set according to the first predicted value set;
a prediction percentile determination subunit 403, configured to determine a prediction percentile in the first percentile set using the real piece quantities corresponding to the samples in the verification data set.
Wherein the first percentile computing subunit 402 comprises:
a first sorting subunit 4021, configured to sort the first set of predicted values to obtain a first sorting result;
the first statistical analysis subunit 4022 is configured to statistically analyze the first ranking result to obtain a first percentile set.
Wherein the prediction percentile determination subunit 403 comprises:
a verification value extraction subunit 4031, configured to obtain the real piece quantity of a sample from the verification data set;
a percentile determining subunit 4032, configured to determine, according to the mean absolute percentage error principle, the first predicted value with the smallest error relative to the real piece quantity, and to take the percentile corresponding to that first predicted value in the first percentile set as the prediction percentile.
In the embodiment of the application, the predicted percentile is introduced into the random forest model, the parameter options are increased, and the verification error of the model is reduced by utilizing the adjustability of the predicted percentile.
The piece quantity prediction unit 304 is configured to input historical piece quantity data in a second preset time range into the random forest model, and to obtain the piece quantity of the date to be predicted according to the prediction percentile.
To predict the piece quantity on a given date, historical piece quantity data within a second preset time range before that date is determined and used as a new training data set to train the random forest model for predicting the piece quantity on that date. The second preset time range is bounded by the date to be predicted and may cover the piece quantity data of a natural number of months before that date, for example 2 months or 3 months; 3 months is preferred.
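Deriving the second preset time range from the date to be predicted can be sketched with standard date arithmetic. The text only says the range covers a whole number of months before the date; that the window ends on the day before the date to be predicted is an assumption of this sketch, and the function names are hypothetical.

```python
from datetime import date, timedelta

def months_before(day, months):
    """Return the date `months` calendar months before `day`, clamping the day of month."""
    total = day.year * 12 + (day.month - 1) - months
    year, month = divmod(total, 12)
    month += 1
    # Clamp to the last valid day of the target month (e.g. May 31 -> Feb 28).
    first_of_next = date(year + (month == 12), month % 12 + 1, 1)
    last_day = (first_of_next - timedelta(days=1)).day
    return date(year, month, min(day.day, last_day))

def second_time_range(target_date, months=3):
    """Window of `months` months ending the day before target_date (assumed end point)."""
    return months_before(target_date, months), target_date - timedelta(days=1)

start, end = second_time_range(date(2017, 12, 1), months=3)
```

For a date to be predicted of December 1, 2017 and a 3-month window, this sketch yields September 1, 2017 through November 30, 2017.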
Suppose the piece quantity of site 010AA is to be predicted for December 1, 2017 (a Friday). Data from September 1, 2017 to November 1, 2017 is taken as the training data set and input into the random forest model to obtain the 500 predicted values output by the model. The 500 predicted values are arranged in increasing order, the cumulative percentile of each position in the arranged result is calculated, and the value at the p% position is taken as the p-th percentile. For distinction, the set of these percentiles is referred to as the second percentile set.
The position of the prediction percentile determined by the prediction percentile unit 303 is then located in the second percentile set, and the piece quantity of the date to be predicted is obtained from the piece quantity corresponding to that position.
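The final lookup can be sketched as follows: the per-tree predictions for the date to be predicted are sorted, and the value at the position given by the previously selected prediction percentile is returned instead of the mean. The nearest-rank convention, the function name `predict_quantity`, the example percentile of 30, and the synthetic predictions are all assumptions made for illustration.

```python
def predict_quantity(second_predictions, prediction_percentile):
    """Return the sorted per-tree prediction at the prediction percentile's position."""
    ordered = sorted(second_predictions)
    # Nearest-rank position of the percentile (assumed convention).
    rank = max(1, (prediction_percentile * len(ordered) + 99) // 100)
    return ordered[rank - 1]

# Synthetic stand-in for the 500 second predicted values: a shuffled 200..699.
second_predictions = [200 + (i * 7) % 500 for i in range(500)]
quantity = predict_quantity(second_predictions, prediction_percentile=30)
```

Because the percentile is tuned on the verification data set, the returned value can sit above or below the forest's mean prediction, which is the adjustability the description relies on.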
Optionally, the piece quantity prediction unit 304 may further include:
the data acquisition subunit is used for acquiring historical piece quantity data in a second preset time range, taking the historical piece quantity data in the second preset time range as a new training data set, and determining the second preset time range according to the date;
the second predicted value subunit is used for inputting the new training data set into the random forest model to obtain a second predicted value set, and the second predicted value set comprises a plurality of second predicted values;
a second percentile calculation subunit, configured to calculate a second percentile set according to the second predicted value set;
a position determination subunit for determining a position of the predictive percentile in the second percentile set;
and the piece quantity determining subunit is used for determining the corresponding second predicted value as the piece quantity of the date to be predicted based on the position.
Optionally, the second percentile computing subunit may further include:
the second sequencing subunit is used for sequencing the second predicted value set to obtain a second sequencing result;
and the second statistical analysis subunit is used for statistically analyzing the second sorting result to obtain a second percentile set.
It should be understood that the elements or modules depicted in apparatus 300 correspond to the various steps in the method described with reference to fig. 1. Thus, the operations and features described above with respect to the method are equally applicable to the apparatus 300 and the units contained therein, and are not described in detail herein. The apparatus 300 may be implemented in advance in a browser or other security application of the electronic device, or may be loaded into the browser or security application of the electronic device by means of downloading or the like. The corresponding units in the apparatus 300 may cooperate with units in an electronic device to implement aspects of embodiments of the present application.
Referring now to FIG. 5, there is illustrated a schematic diagram of a computer system 500 suitable for use in implementing a terminal device or server in accordance with an embodiment of the present application.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU) 501, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, as well as a speaker and the like; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, and the like. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage section 508 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to fig. 1 may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the method of fig. 1. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 509, and/or installed from the removable media 511.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules involved in the embodiments of the present application may be implemented in software or in hardware. The described units or modules may also be provided in a processor, for example, as: a processor includes a data acquisition unit, a model creation unit, a prediction percentile unit, and a quantity prediction unit. Where the names of these units or modules do not constitute a limitation on the unit or module itself in some cases, for example, the model creation unit may also be described as "a unit for creating a model".
As another aspect, the present application also provides a computer-readable storage medium, which may be the computer-readable storage medium contained in the apparatus of the foregoing embodiment, or may be a stand-alone computer-readable storage medium that is not assembled into a device. The computer-readable storage medium stores one or more programs used by one or more processors to perform the piece quantity prediction method described in the present application.
The above description covers only the preferred embodiments of the present application and the technical principles employed. Persons skilled in the art will appreciate that the scope of the application is not limited to the specific combinations of the technical features described above, but also covers other technical solutions formed by any combination of those technical features or their equivalents without departing from the inventive concept described above, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims (13)

1. A method of predicting a quantity of a part, the method comprising:
acquiring historical piece quantity data of a first preset time range, and dividing the historical piece quantity data into a training data set and a verification data set;
creating a random forest model based on the training data set and the characteristic attribute;
predicting the verification data set by utilizing the random forest model to obtain a prediction percentile;
inputting historical piece quantity data of a second preset time range into the random forest model, and acquiring piece quantity of a date to be predicted according to the prediction percentile;
the creating a random forest model based on the training data set and the characteristic attribute comprises:
extracting samples from the training data set with replacement, and creating a plurality of sample sets;
determining that the characteristic attribute at least comprises holidays, special dates, site locations, and dates of abnormal piece quantity fluctuation; wherein the dates of abnormal piece quantity fluctuation are obtained by counting the piece quantities within a preset statistical range, sorting by piece quantity, marking the dates whose piece quantity is higher than a first threshold, and marking the dates whose piece quantity is lower than a second threshold;
training by using the plurality of sample sets and the characteristic attribute to obtain a plurality of decision trees so as to construct the random forest model;
inputting historical piece quantity data of a second preset time range into the random forest model, and acquiring the piece quantity of a date to be predicted according to the prediction percentile, wherein the method comprises the following steps:
acquiring historical piece quantity data in a second preset time range, and taking the historical piece quantity data in the second preset time range as a new data set, wherein the second preset time range is determined according to the date to be predicted;
inputting the new data set into the random forest model to obtain a second prediction set, wherein the second prediction set comprises a plurality of second prediction values;
calculating a second percentile set according to the second predicted value set;
determining a location of the predictive percentile in the second percentile set;
and determining a corresponding second predicted value as the piece quantity of the date to be predicted based on the position.
2. The method of claim 1, wherein predicting the validation data set using the random forest model to obtain a prediction percentile comprises:
predicting samples in the verification data set by using the random forest model to obtain a first predicted value set, wherein the first predicted value set comprises a plurality of first predicted values;
calculating a first percentile set according to the first predicted value set;
and determining a prediction percentile in the first percentile set by utilizing the real piece quantity corresponding to the sample in the verification data set.
3. The method of claim 2, wherein said calculating a first set of percentiles from said first set of predictors comprises:
sequencing the first predicted value set to obtain a first sequencing result;
and statistically analyzing the first sequencing result to obtain a first percentile set.
4. A method according to claim 3, wherein said determining a predictive percentile in said first set of percentiles using the corresponding real parts amounts of the samples in said validation dataset comprises:
obtaining a real-world volume of a sample from the verification dataset;
and determining a first predicted value with the smallest error of the real piece quantity according to the average absolute percentage error minimum principle, and taking the percentile corresponding to the first predicted value in the first percentile data set as a predicted percentile.
5. The method of claim 1, wherein said calculating a second set of percentiles from said second set of predictors comprises:
sequencing the second predicted value set to obtain a second sequencing result;
and statistically analyzing the second sequencing result to obtain a second percentile set.
6. The method according to any one of claims 1 to 5, wherein,
the holiday is a national legal holiday;
the special date is the merchant activity date;
the network point positions are business areas divided according to geographic positions;
the date of the abnormal quantity fluctuation comprises a positive marking value and a negative marking value, the quantity of the pieces in a preset statistical range is counted, the pieces are ordered, the date of the piece with the quantity higher than a first threshold value is defined as the positive marking value, and the date of the piece with the quantity lower than a second threshold value is defined as the negative marking value.
7. A piece quantity predicting apparatus, characterized by comprising:
the data acquisition unit is used for acquiring historical part quantity data in a first preset time range and dividing the historical part quantity data into a training data set and a verification data set;
a model creation unit for creating a random forest model based on the training data set and the feature attribute;
the prediction percentile unit is used for predicting the verification data set by utilizing the random forest model to obtain a prediction percentile;
the piece quantity predicting unit is used for inputting historical piece quantity data of a second preset time range into the random forest model, and acquiring the piece quantity of a date to be predicted according to the prediction percentile;
the model creation unit includes:
a sample extraction subunit, configured to extract samples from the training data set with replacement, and create a plurality of sample sets;
a characteristic attribute determining subunit, configured to determine that the characteristic attribute at least comprises holidays, special dates, site locations, and dates of abnormal piece quantity fluctuation; wherein the dates of abnormal piece quantity fluctuation are obtained by counting the piece quantities within a preset statistical range, sorting by piece quantity, marking the dates whose piece quantity is higher than a first threshold, and marking the dates whose piece quantity is lower than a second threshold;
the training subunit is used for training by utilizing the plurality of sample sets and the characteristic attribute to obtain a plurality of decision trees so as to construct the random forest model;
the piece amount prediction unit includes:
the data acquisition subunit is used for acquiring historical piece quantity data in a second preset time range, taking the historical piece quantity data in the second preset time range as a new data set, and determining the second preset time range according to the date to be predicted;
the second predicted value subunit is used for inputting the new data set into a random forest model to obtain a second predicted set, and the second predicted value set comprises a plurality of second predicted values;
a second percentile calculation subunit, configured to calculate a second percentile set according to the second predicted value set;
a position determination subunit configured to determine a position of the prediction percentile in the second percentile set;
and the piece quantity determining subunit is used for determining a corresponding second predicted value as the piece quantity of the date to be predicted based on the position.
8. The apparatus of claim 7, wherein the predictive percentile unit comprises:
the first predicted value subunit is used for predicting the samples in the training data set by utilizing the random forest model to obtain a first predicted value set, and the first predicted value set comprises a plurality of first predicted values;
a first percentage calculating subunit, configured to calculate a first percentile set according to the first predicted value set;
and the prediction percentile determining subunit is used for determining the prediction percentile in the first percentile set by utilizing the real part quantity corresponding to the sample in the verification data set.
9. The apparatus of claim 8, wherein the first percentage calculation subunit comprises:
the first sequencing subunit is used for sequencing the first predicted value set to obtain a first sequencing result;
and the first statistical analysis subunit is used for statistically analyzing the first sequencing result to obtain the first percentile set.
10. The apparatus according to claim 8 or 9, wherein the prediction percentile determination subunit comprises:
a verification value extraction subunit for obtaining a real quantity of the sample from the verification dataset;
and the percentile determining subunit is used for determining a first predicted value with the smallest real part quantity error according to an average absolute percent error principle, and taking the percentile corresponding to the first predicted value in the first percentile data set as a predicted percentile.
11. The apparatus of claim 10, wherein the second percentile computing subunit comprises:
the second sorting subunit is used for sorting the second predicted value set to obtain a second sorting result;
and the second statistical analysis subunit is used for statistically analyzing the second sequencing result to obtain a second percentile set.
12. A computer device comprising a processor and a storage device, characterized in that:
the storage device is used for storing one or more programs;
the one or more programs, when executed by the processor, cause the processor to implement the method of any of claims 1-5.
13. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of claims 1-5.
CN201711429088.9A 2017-12-25 2017-12-25 Method, device, equipment and storage medium for predicting part quantity Active CN109961165B (en)

Publications (2)

Publication Number Publication Date
CN109961165A CN109961165A (en) 2019-07-02
CN109961165B true CN109961165B (en) 2023-11-28

Family

ID=67021780

Country Status (1)

Country Link
CN (1) CN109961165B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110400024B (en) * 2019-07-31 2021-03-30 京东城市(北京)数字科技有限公司 Order prediction method, device, equipment and computer readable storage medium
CN112862137A (en) * 2019-11-27 2021-05-28 顺丰科技有限公司 Method and device for predicting quantity, computer equipment and computer readable storage medium
CN112906930A (en) * 2019-12-04 2021-06-04 顺丰科技有限公司 Site cargo quantity prediction method, device, equipment and storage medium
CN112990520A (en) * 2019-12-13 2021-06-18 顺丰科技有限公司 Mesh point connection quantity prediction method and device, computer equipment and storage medium
CN112966849B (en) * 2019-12-13 2024-06-07 顺丰科技有限公司 Method, device and equipment for establishing part quantity prediction model
CN112990526A (en) * 2019-12-16 2021-06-18 顺丰科技有限公司 Method and device for predicting logistics arrival quantity and storage medium
CN112183832A (en) * 2020-09-17 2021-01-05 上海东普信息科技有限公司 Express pickup quantity prediction method, device, equipment and storage medium
CN114548565A (en) * 2022-02-24 2022-05-27 天津大学 Express prediction method based on random forest

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103257921A (en) * 2013-04-16 2013-08-21 西安电子科技大学 Improved random forest algorithm based system and method for software fault prediction
CN105427194A (en) * 2015-12-21 2016-03-23 西安美林数据技术股份有限公司 Method and device for electricity sales amount prediction based on random forest regression
CN106991437A (en) * 2017-03-20 2017-07-28 浙江工商大学 The method and system of sewage quality data are predicted based on random forest
CN107092751A (en) * 2017-04-24 2017-08-25 厦门大学 Variable weight model combination forecasting method based on Bootstrap
CN107194216A (en) * 2017-05-05 2017-09-22 中南大学 A kind of mobile identity identifying method and system of the custom that swiped based on user

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant