CN109767024B

CN109767024B - Method and device for predicting quantity of components, electronic equipment and storage medium

Info

Publication number: CN109767024B
Application number: CN201711108284.6A
Authority: CN
Inventors: 王本玉; 许颖聪; 金晶
Original assignee: SF Technology Co Ltd
Current assignee: SF Technology Co Ltd
Priority date: 2017-11-09
Filing date: 2017-11-09
Publication date: 2023-04-07
Anticipated expiration: 2037-11-09
Also published as: CN109767024A

Abstract

The invention provides a method and a device for predicting parcel volume, an electronic device and a storage medium. The method comprises the following steps: acquiring historical data, and generating a data set labeled with a plurality of express industry features; inputting the training set into a linear regression model, extracting first linear sequence information and first residual sequence information, and generating a period pool according to the first residual sequence information; inputting the verification set into the linear regression model to extract second linear sequence information and second residual sequence information, and extracting a plurality of periodic sequences of the second residual sequence information according to the period pool to check whether the second residual sequence information has periodicity; if so, extracting periodic effect and lag phase information through a state space model, and generating and outputting a second prediction result according to the second linear sequence information and the periodic effect and lag phase information. The invention achieves more effective long-term parcel volume prediction.

Description

Method and device for predicting quantity of component, electronic equipment and storage medium

Technical Field

The application relates to the technical field of express delivery management, and in particular to a method and a device for predicting parcel volume, an electronic device and a storage medium.

Background

For the express delivery industry, long-term parcel volume prediction is very important: it supports resource allocation, personnel scheduling, peak-period command and dispatch, and similar matters. However, parcel volume in the express delivery industry exhibits multiple periods (such as weekly, monthly and quarterly), an obvious long-term trend and a lag period; it differs greatly between outlets, is affected by exogenous impacts such as weather and major holidays, and shows pre-holiday and post-holiday effects. The prediction scenario is therefore very complex.

Fig. 1 is a schematic diagram of the real dispatch volume of a certain outlet. As shown in Fig. 1, at this outlet, holidays such as the Qingming Festival, the Dragon Boat Festival, May Day and National Day exert a downward exogenous impact on parcel volume; shopping festivals such as Double 11 and Double 12 exert an upward exogenous impact; and the Spring Festival, the Mid-Autumn Festival and the like exert both upward and downward exogenous impacts.

These complex scenarios pose great challenges to existing parcel volume prediction techniques, so their prediction results deviate considerably. The prediction methods currently applied in the industry all have significant defects: the autoregressive integrated moving average (ARIMA) model drifts in long-term prediction and cannot produce effective long-term forecasts; the random forest, XGBoost and Prophet models do not take the temporal order of the series into account, that is, samples far back in time carry the same weight as recent samples, so the information cannot be fully used and the long-term prediction effect is poor.

Disclosure of Invention

In view of the above-mentioned defects or shortcomings in the prior art, it is desirable to provide a method, an apparatus, a device and a storage medium for predicting a long-term parcel quantity in a complex scene in the express delivery industry.

In a first aspect, the present invention provides a method for predicting parcel volume, including:

acquiring historical data, generating a data set labeled with a plurality of express industry features, and dividing a training set and a verification set from the data set; the express industry features include long-term trend features, and at least one of: holiday features, shopping festival features, and other exogenous impact features;

establishing a linear regression model of a first parcel volume sequence of the training set and the express industry features, and extracting first linear sequence information;

obtaining first residual sequence information according to the first parcel volume sequence and the first linear sequence information, and generating a period pool according to the first residual sequence information;

inputting a second parcel volume sequence of the verification set into the linear regression model to extract second linear sequence information;

obtaining second residual sequence information according to the second parcel volume sequence and the second linear sequence information, and extracting a plurality of periodic sequences of the second residual sequence information according to the period pool to check whether the second residual sequence information has periodicity;

if so, extracting periodic effect and lag phase information from the second residual sequence information through a state space model, and generating and outputting a second prediction result according to the second linear sequence information and the periodic effect and lag phase information.

In a second aspect, the present invention provides a parcel volume prediction apparatus including a data set configuration unit, a first linear extraction unit, a period pool generation unit, a second linear extraction unit, a periodicity verification unit, and a second prediction unit.

The data set configuration unit is configured to acquire historical data, generate a data set labeled with a plurality of express industry features, and divide a training set and a verification set from the data set. The express industry features include long-term trend features, and at least one of: holiday features, shopping festival features, and other exogenous impact features.

The first linear extraction unit is configured to establish a linear regression model of a first parcel volume sequence of the training set and the express industry features, and extract first linear sequence information.

The period pool generation unit is configured to obtain first residual sequence information according to the first parcel volume sequence and the first linear sequence information, and generate a period pool according to the first residual sequence information.

The second linear extraction unit is configured to input a second parcel volume sequence of the verification set into the linear regression model to extract second linear sequence information.

The periodicity verification unit is configured to obtain second residual sequence information according to the second parcel volume sequence and the second linear sequence information, and extract a plurality of periodic sequences of the second residual sequence information according to the period pool to verify whether the second residual sequence information has periodicity.

The second prediction unit is configured to extract periodic effect and lag phase information from the second residual sequence information through a state space model when the second residual sequence information has periodicity, and generate and output a second prediction result according to the second linear sequence information and the periodic effect and lag phase information.

In a third aspect, the present invention also provides an electronic device comprising one or more processors and a memory, wherein the memory contains instructions executable by the one or more processors to cause the one or more processors to perform the parcel volume prediction method provided according to embodiments of the present invention.

In a fourth aspect, the present invention also provides a computer-readable storage medium storing a computer program that causes a computer to execute the parcel volume prediction method provided according to embodiments of the present invention.

According to the parcel volume prediction method, device, equipment and storage medium provided by the embodiments of the invention, long-term trend features are labeled in the data set, linear information representing the long-term trend and exogenous impacts is extracted through a linear regression model, and nonlinear information representing periodicity and the lag phase is extracted when the residual sequence information has periodicity, so that effective long-term parcel volume prediction is finally achieved;

the parcel volume prediction method, device, equipment and storage medium further determine power-spectrum periods from a power spectrogram generated from the residual sequence information of the training set, and then generate the period pool by combining them with the periods commonly used in the express industry that are known from experience, which safeguards the accuracy and comprehensiveness of the period pool and thus the prediction effect;

the parcel volume prediction method, device, equipment and storage medium provided by some embodiments of the invention further perform a white noise test on the third residual sequence information that remains after the periodic effect and lag phase information are extracted, to judge whether all effective information has been extracted; when it has not, they return to optimize the linear regression model or extract error information, which further safeguards the accuracy of the prediction;

the parcel volume prediction method, device, equipment and storage medium further use a test set to evaluate the prediction effect with MAPE as the evaluation criterion, so that the prediction effect is evaluated objectively.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:

Fig. 1 is a schematic diagram of the real dispatch volume of a certain outlet.

Fig. 2 is a flowchart of a parcel volume prediction method according to an embodiment of the present invention.

Fig. 3 is a schematic diagram of the second linear sequence information extracted in the method shown in Fig. 2.

Fig. 4 is a schematic diagram of the periodic effect and lag phase information extracted in the method shown in Fig. 2.

Fig. 5 is a schematic diagram comparing the second prediction result output in the method shown in Fig. 2 with the real dispatch volume.

FIG. 6 is a flow chart of a preferred embodiment of the method shown in FIG. 2.

Fig. 7 is a flow chart of step S10 in a preferred embodiment of the method of fig. 2.

Fig. 8 is a flowchart of a preferred embodiment of step S10 shown in fig. 7.

Fig. 9 is a flow chart of step S30 in a preferred embodiment of the method shown in fig. 2.

Fig. 10 is a power spectrum generated in the method of fig. 9.

Fig. 11 is a flow chart of step S50 in a preferred embodiment of the method of fig. 2.

Fig. 12 is a flow chart of step S70 in a preferred embodiment of the method of fig. 2.

FIG. 13 is a flow chart of a preferred embodiment of the method of FIG. 2.

Fig. 14 is a schematic structural diagram of a parcel volume prediction apparatus according to an embodiment of the present invention.

Fig. 15 is a schematic structural view of a preferred embodiment of the apparatus shown in fig. 14.

Fig. 16 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

As shown in Fig. 2, in this embodiment, the parcel volume prediction method provided by the present invention includes:

S10: acquiring historical data, generating a data set labeled with a plurality of express industry features, and dividing a training set and a verification set from the data set; the express industry features include long-term trend features, and at least one of: holiday features, shopping festival features, and other exogenous impact features;

S20: establishing a linear regression model of a first parcel volume sequence of the training set and the express industry features, and extracting first linear sequence information;

S30: obtaining first residual sequence information according to the first parcel volume sequence and the first linear sequence information, and generating a period pool according to the first residual sequence information;

S40: inputting a second parcel volume sequence of the verification set into the linear regression model to extract second linear sequence information;

S50: obtaining second residual sequence information according to the second parcel volume sequence and the second linear sequence information, and extracting a plurality of periodic sequences of the second residual sequence information according to the period pool to check whether the second residual sequence information has periodicity:

if yes, the flow proceeds to step S70: extracting periodic effect and lag phase information from the second residual sequence information through a state space model, and generating and outputting a second prediction result according to the second linear sequence information and the periodic effect and lag phase information.

Specifically, in step S10 of this embodiment, the historical data is imported into a template whose column names are the express industry features, which completes the labeling of the express industry features. For example, the first column of the template is "date", the second column is "is holiday", the third column is "holiday name", and so on. In further embodiments, the labeling may be accomplished in different ways, such as automatic labeling by a configured program or manual labeling.

In this embodiment, the courier industry features include long term trend features, holiday features, shopping day features, and other exogenous impact features.

Specifically, the long-term trend features include: a daily trend index, which is set to 1 for the first day in the acquired historical data and increases by 1 for each subsequent day; and a weekly trend index, which is set to 1 for every day of the first week in the acquired historical data and increases by 1 for each subsequent week. Further, the long-term trend features may also include a monthly trend index, a quarterly trend index, a yearly trend index, a periodic trend index, and the like.

The holiday features specifically include: whether the day is a holiday, the holiday name, whether the day falls before a holiday, whether the day falls after a holiday, whether the day is a rest day, whether the day is a weekend, and the like.

The shopping festival features specifically include: whether the day is a shopping festival, whether it is Double 11, whether it is 618, and the like.

Other exogenous impact features specifically include: weather, whether the outlet is operating, whether a particular customer is operating, and the like.

In addition, the express industry features may also include some commonly used periodic features, such as whether the day is Monday, whether it is Tuesday, …, whether it is Sunday, and the like.

In further embodiments, the express industry features can be configured differently according to requirements such as the actual conditions of different outlets: provided that the long-term trend features are included, the features may include any one or more of the holiday features, the shopping festival features and other exogenous impact features, and each category may include any one or more of its individual features. For example, if an outlet has had no e-commerce customers for a long time for geographical reasons, predictions for that outlet can be made without configuring the shopping festival features. In addition, different initial values and increments can be configured for each long-term trend feature according to actual conditions.
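The daily and weekly trend indices described above can be sketched in a few lines of Python. This is a minimal illustration rather than the patent's implementation: the function name `trend_indices`, its parameters, and the choice to count weeks as blocks of seven days starting from the first day of the data (rather than calendar weeks) are assumptions made here. The `start` and `step` parameters stand in for the configurable initial value and increment.

```python
from datetime import date, timedelta


def trend_indices(first_day, n_days, start=1, step=1):
    """Build (day, daily_index, weekly_index) rows for n_days consecutive days.

    Assumption: a "week" is a block of 7 days counted from first_day.
    start and step are the configurable initial value and increment.
    """
    rows = []
    for offset in range(n_days):
        day = first_day + timedelta(days=offset)
        daily_index = start + step * offset          # grows once per day
        weekly_index = start + step * (offset // 7)  # grows once per week
        rows.append((day, daily_index, weekly_index))
    return rows


# Hypothetical start date; 15 days cover two full weeks plus one day.
rows = trend_indices(date(2017, 1, 1), 15)
for day, daily_index, weekly_index in rows[5:9]:
    print(day.isoformat(), daily_index, weekly_index)
```

With the defaults, the daily index runs 1, 2, 3, ... while the weekly index stays at 1 for the first seven days and then moves to 2.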

After a data set marked with a plurality of express industry characteristics is generated, the data set can be divided into a training set and a verification set to respectively train and optimize the models for prediction, and after a predicted time period passes, the prediction effect is evaluated by using data of the predicted time period; the data set can also be divided into a training set, a verification set and a test set, and after the prediction of the time period of the test set is completed, the prediction effect is directly evaluated by using the test set. The former approach is typically employed in practical application scenarios, while the latter approach is employed in model and method evaluation scenarios.

In step S20, a linear regression model of the first quantity sequence of the training set and the express industry feature is established, so as to extract first linear sequence information. Specifically, the purpose of extracting the linear sequence information in step S20 is to enable step S30 to extract a more accurate period from the remaining residual sequence information by extracting the linear sequence information representing the long-term trend and the exogenous impact.
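To make step S20 concrete, the following Python sketch fits an ordinary least squares regression of a parcel volume sequence on two labeled features, then derives the fitted linear sequence and the residual left after subtracting it. The patent does not specify the estimator, the feature encoding or any function names; `fit_linear`, `predict_linear`, the two features (a daily trend index and a 0/1 holiday flag) and the toy data are all assumptions for illustration.

```python
def fit_linear(features, volumes):
    """Ordinary least squares with an intercept, via the normal equations."""
    X = [[1.0] + list(row) for row in features]   # prepend intercept column
    k = len(X[0])
    # Augmented system [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(X, volumes))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    # (assumes the features are not collinear).
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict_linear(coef, features):
    """Fitted values: intercept plus weighted sum of the features."""
    return [coef[0] + sum(c * x for c, x in zip(coef[1:], row))
            for row in features]


# Toy training data (assumed): volume = 100 + 2 * trend - 30 * holiday.
features = [(t, 1.0 if t in (4, 9) else 0.0) for t in range(1, 11)]
volumes = [100 + 2 * t - 30 * h for t, h in features]

coef = fit_linear(features, volumes)
linear_info = predict_linear(coef, features)               # linear sequence information
residual = [y - f for y, f in zip(volumes, linear_info)]   # residual sequence information
print([round(c, 6) for c in coef])
```

Because the toy data is exactly linear, the residual here is numerically zero; with real data, the residual is the part in which periodic structure is then sought.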

In step S30, the first linear sequence information extracted in step S20 is subtracted from the first parcel volume sequence of the training set to obtain the first residual sequence information. Since the linear sequence information characterizing the long-term trend and exogenous impacts has been removed, several periods can be extracted from the first residual sequence information, thereby generating a period pool comprising several periods. Furthermore, the period pool can be generated in combination with the common periods of the express industry that are known from experience. In this embodiment, if the total number of periods within a predetermined range (e.g., 3 days to 91.3 days, or another customized range) exceeds 8, the largest 8 periods are selected to generate the period pool; otherwise, all of the periods are added to the period pool. In further embodiments, different numbers of periods and screening methods may be configured for the period pool based on actual demand.

In step S40, after the period pool is generated, it must be verified with the verification set, because the period pool was generated from the training set data and therefore cannot be verified against the training set.

Fig. 3 is a schematic diagram of the second linear sequence information extracted in the method shown in Fig. 2. As shown in Fig. 3, in the same manner as in step S20, the second linear sequence information is extracted from the second parcel volume sequence of the verification set by the linear regression model.

In step S50, the second linear sequence information extracted in step S40 is subtracted from the second parcel volume sequence of the verification set to obtain the second residual sequence information. The period pool generated in step S30 is then used to check whether the second residual sequence information has periodicity. Specifically, the check method shown in Fig. 11, described later, may be used, or other periodicity check algorithms commonly used in the art may be used.

When the check in step S50 shows that the second residual sequence information has no periodicity, the linear regression model may be optimized to extract the linear sequence information in the second parcel volume sequence more fully, and a first prediction result is generated and output directly, as described for the method shown in Fig. 6 below; the process may also return to step S20 and loop after the parameters of the linear regression model are adjusted.

When the second residual sequence information has periodicity as a result of the check in step S50, it is necessary to further extract nonlinear information representing the cycle and lag phase of the express delivery industry, and therefore step S70 is performed.

Fig. 4 is a schematic diagram of the periodic effect and lag phase information extracted in the method shown in Fig. 2. As shown in Fig. 4, the periodic effect and lag phase information are extracted from the second residual sequence information by a state space model.

Finally, the second prediction result is generated and output by combining the second linear sequence information, which represents the long-term trend, weather and other exogenous impacts, with the periodic effect and lag phase information, which represents the periods and lag phase of the express industry. Fig. 5 is a schematic diagram comparing the second prediction result output by the method shown in Fig. 2 with the real dispatch volume. In Fig. 5, the solid line is the second prediction result obtained by the method and the dotted line is the real parcel volume; the parcel volume prediction method shown in Fig. 2 achieves a good long-term prediction effect.

FIG. 6 is a flow chart of a preferred embodiment of the method shown in FIG. 2.

As shown in fig. 6, in a preferred embodiment, the method further includes:

if the check in step S50 shows that the second residual sequence information has no periodicity, the process proceeds to step S60: optimizing the linear regression model to generate and output a first prediction result.

Specifically, when the second residual sequence information has no periodicity, nonlinear information representing periodicity and the lag phase cannot be extracted from it, so step S60 is performed: the linear regression model is optimized to extract the linear sequence information in the second residual sequence more fully, and the first prediction result is finally generated from the linear sequence information extracted in step S60 and output. The optimization consists of establishing a linear regression between the second parcel volume sequence and a more comprehensive set of express industry features.

Fig. 7 is a flow chart of step S10 in a preferred embodiment of the method shown in fig. 2.

As shown in fig. 7, in a preferred embodiment, step S10 includes:

S13: acquiring historical data, and storing the historical data as an original data set;

S15: importing the original data set into a holiday template to generate a data set labeled with a plurality of express industry features;

S17: dividing the data set into a training set, a verification set, and a test set.

Specifically, in step S13, the historical data is stored as a comma-separated values (CSV) text file; in step S15, the CSV file generated in step S13 and the preset holiday template are read with the read.csv() function of the R language, and the generated data set is adjusted according to actual requirements, for example by extracting the month of each row from its date as a new column; in step S17, the data set generated in step S15 is divided into a training set, a verification set, and a test set. In further embodiments, different programming languages may also be used to implement the data set configuration process of steps S13-S17 described above.
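As an illustration of steps S13-S17 in a language other than R, the Python sketch below reads a small CSV text, derives a month column from the date, and divides the rows into three sets. The column names, the toy values, the helper names `load_rows` and `split_chronologically`, and the decision to split in chronological order with fixed set sizes are all assumptions; the patent does not state how the division is made.

```python
import csv
import io

# Hypothetical stand-in for the CSV file written in step S13 (toy values).
RAW_CSV = """date,volume,is_holiday
2017-01-27,150,0
2017-01-28,60,1
2017-01-29,55,1
2017-01-30,58,1
2017-01-31,70,1
2017-02-01,75,1
2017-02-02,90,1
2017-02-03,130,0
2017-02-04,140,0
2017-02-05,135,0
"""


def load_rows(text):
    """Parse the CSV text and add a month column derived from the date."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["volume"] = int(row["volume"])
        row["month"] = int(row["date"][5:7])  # new column extracted from the date
    return rows


def split_chronologically(rows, n_valid, n_test):
    """Oldest rows train the model; the newest rows are held out."""
    ordered = sorted(rows, key=lambda r: r["date"])  # ISO dates sort as text
    n_train = len(ordered) - n_valid - n_test
    return (ordered[:n_train],
            ordered[n_train:n_train + n_valid],
            ordered[n_train + n_valid:])


rows = load_rows(RAW_CSV)
train_set, valid_set, test_set = split_chronologically(rows, n_valid=2, n_test=2)
print(len(train_set), len(valid_set), len(test_set))
```

Splitting by time rather than at random keeps the verification and test periods strictly after the training period, which matches the forecasting setting described here.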

Fig. 8 is a flowchart of a preferred embodiment of step S10 shown in fig. 7.

As shown in fig. 8, in a preferred embodiment, before step S13, the method further includes:

S11: generating a holiday template labeled with a plurality of express industry features.

Specifically, the long-term trend features include at least one of: a daily trend index, a weekly trend index, a monthly trend index, a quarterly trend index, a yearly trend index, and a periodic trend index;

the holiday features include at least one of: the holiday name, whether the day falls before a holiday, whether the day falls after a holiday, and whether the day is a rest day;

the shopping festival features include at least one of: the name of the shopping festival, whether the day is a shopping festival, whether the day falls before a shopping festival, and whether the day falls after a shopping festival;

other exogenous impact features include at least one of: weather, whether the outlet is operating, and whether a particular customer is operating.

Fig. 9 is a flow chart of step S30 in a preferred embodiment of the method shown in fig. 2. Fig. 10 is a power spectrum generated in the method of fig. 9.

As shown in Figs. 9 and 10, in a preferred embodiment, step S30 screens the periods in the first residual sequence information through a power spectrogram, and specifically includes:

S31: obtaining first residual sequence information according to the first parcel volume sequence and the first linear sequence information;

S33: generating a power spectrogram according to the first residual sequence information, so as to screen out several power-spectrum periods within a preset range;

S35: generating a period pool according to the power-spectrum periods and several common periods of the express industry.

In further embodiments, other different periodic screening methods known in the art may also be used.
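Steps S31-S35 can be sketched as follows, assuming a plain discrete Fourier transform periodogram. The patent does not say how the power spectrogram is computed or how periods are ranked; here the strongest Fourier frequencies within the preset range are kept, ranked by spectral power, and merged with a list of common industry periods. The names `periodogram` and `build_period_pool`, the default common periods (7 and 30 days) and the toy residual are assumptions.

```python
import cmath
import math


def periodogram(residuals):
    """Return (period_in_days, power) pairs from a direct DFT."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    pairs = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        pairs.append((n / k, abs(coeff) ** 2 / n))
    return pairs


def build_period_pool(residuals, common_periods=(7, 30),
                      low=3.0, high=91.3, max_size=8):
    """Keep the strongest in-range spectral periods, then add common periods.

    Assumption: "largest" is read as highest spectral power, and the cap
    applies to the spectral periods before the common periods are merged in.
    """
    in_range = [(p, pw) for p, pw in periodogram(residuals) if low <= p <= high]
    strongest = sorted(in_range, key=lambda item: item[1], reverse=True)[:max_size]
    pool = {round(p, 1) for p, _ in strongest}
    pool.update(float(p) for p in common_periods)
    return sorted(pool)


# Toy residual with a pure weekly cycle over 12 weeks.
residuals = [10 * math.sin(2 * math.pi * t / 7) for t in range(84)]
print(build_period_pool(residuals, max_size=1))
```

A production version would normally detect peaks in the spectrum instead of taking raw Fourier frequencies, since neighbouring frequencies around one true period can otherwise crowd the pool.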

As shown in fig. 11, in a preferred embodiment, step S50 includes:

S51: obtaining second residual sequence information according to the second parcel volume sequence and the second linear sequence information;

S53: generating a plurality of periodic sequences according to the period pool, combining the periodic sequences, and screening out the several periodic sequence combinations with the smallest mean absolute percentage error (MAPE);

S55: judging whether the MAPE values of the selected periodic sequence combinations are all the same: if yes, the second residual sequence information has no periodicity; otherwise, the second residual sequence information has periodicity.

Specifically, in step S53, a periodic sequence refers to the sequence information extracted from the second residual sequence information according to one period in the period pool; for example, if the period pool includes 8 periods, 8 periodic sequences can be extracted from the second residual sequence information, one for each period. Then, taking no more than 3 periodic sequences as a group (that is, a group may contain 1, 2 or 3 sequences), combination yields $C_8^1 + C_8^2 + C_8^3 = 8 + 28 + 56 = 92$ periodic sequence combinations. Then, the 5 periodic sequence combinations with the smallest MAPE are selected from the 92 periodic sequence combinations.

In step S55, it is determined whether the second residual sequence information has periodicity by comparing whether MAPEs of the 5 cyclic sequence combinations selected in step S53 are identical. That is, after the linear information is extracted, if the MAPE of the 5 periodic sequence combinations is still the same, it indicates that the remaining second residual sequence information has no periodicity, otherwise it has periodicity.

In the above embodiment, the period pool includes 8 periods, the permutation and combination of the period sequence does not exceed 3 items, and finally 5 items are screened for comparison.
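The enumeration and screening in steps S53 and S55 can be sketched with `itertools.combinations`. How the MAPE of a combination is computed is not spelled out in the text, so the scoring function is passed in as a hypothetical callback `mape_of`; the scorer used in the example is a placeholder with no forecasting meaning, and the period values are toy numbers.

```python
from itertools import combinations


def period_combinations(pool, max_terms=3):
    """All groups of 1..max_terms periods drawn from the pool."""
    combos = []
    for size in range(1, max_terms + 1):
        combos.extend(combinations(pool, size))
    return combos


def best_combinations(pool, mape_of, max_terms=3, keep=5):
    """Score every combination and keep the ones with the smallest MAPE."""
    scored = [(mape_of(combo), combo)
              for combo in period_combinations(pool, max_terms)]
    scored.sort(key=lambda item: item[0])
    return scored[:keep]


def has_periodicity(best, tol=1e-9):
    """Rule of step S55: identical MAPE values mean no periodicity."""
    mapes = [m for m, _ in best]
    return max(mapes) - min(mapes) > tol


pool = [3, 5, 7, 14, 30, 45, 60, 91]            # assumed 8-period pool
print(len(period_combinations(pool)))           # number of candidate groups


def toy_scorer(combo):
    # Placeholder only: pretends the combination (7,) fits best.
    return abs(sum(combo) - 7) / 100 + 0.05


best = best_combinations(pool, toy_scorer)
print(best[0][1], has_periodicity(best))
```

The tolerance `tol` is an added safeguard so that floating-point noise alone does not make otherwise identical MAPE values look different.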

As shown in fig. 12, in a preferred embodiment, step S70 includes:

S71: extracting periodic effect and lag phase information from the second residual sequence information through a state space model;

S73: generating third residual sequence information according to the second residual sequence information and the periodic effect and lag phase information;

S75: performing a white noise test on the third residual sequence information:

if the test passes, step S77 is executed: generating and outputting the second prediction result according to the second linear sequence information and the periodic effect and lag phase information;

if the test fails, step S79 is executed: returning to optimize the linear regression model, or extracting error information from the third residual sequence information and combining it with the second linear sequence information and the periodic effect and lag phase information to generate and output a third prediction result.

Specifically, in step S75, the white noise test determines whether the information extraction in the preceding two steps has completely extracted the effective information in the second parcel volume sequence. If the test passes, the effective information has been completely extracted, and the prediction result is generated; if the test fails, the effective information has not been completely extracted.

When the test fails, there are two cases in practical application. In one case, comparison with weather information confirms that the failure is caused by unpredictable factors such as extremely abnormal weather (for example, a typhoon); the error information can then be extracted directly from the third residual sequence information, and the prediction result is generated in combination with the error information;

in the other case, after it is determined that the failure is not caused by unpredictable factors such as extremely abnormal weather, the process returns to step S20 to optimize the parameters of the linear regression model.

Further, when the verification is successful, the training set and the verification set may be merged, and the process returns to step S40 to perform a loop, that is, on the premise that the model verification is successful, the prediction may be performed by using more comprehensive data.
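The white noise test of step S75 is not named in the text. One common choice is the Ljung-Box test, sketched below under that assumption: it sums the squared autocorrelations of the third residual sequence over a number of lags and compares the statistic with a chi-square critical value. The function names, the choice of 10 lags and the 5% significance level are assumptions; the constant 18.307 is the 95% quantile of the chi-square distribution with 10 degrees of freedom.

```python
import math


def ljung_box_q(series, lags=10):
    """Ljung-Box statistic Q = n(n+2) * sum_k rho_k^2 / (n - k)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    total = 0.0
    for k in range(1, lags + 1):
        # Sample autocorrelation at lag k.
        rho = sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        total += rho * rho / (n - k)
    return n * (n + 2) * total


CHI2_95_DF10 = 18.307  # 95% chi-square quantile for 10 degrees of freedom


def passes_white_noise_test(series, lags=10, critical=CHI2_95_DF10):
    """True when no significant autocorrelation remains (assumed 5% level)."""
    return ljung_box_q(series, lags) <= critical


# A residual that still contains a weekly cycle should fail the test.
periodic = [math.sin(2 * math.pi * t / 7) for t in range(70)]
print(passes_white_noise_test(periodic))
```

If a different number of lags is used, the critical value must be replaced by the chi-square quantile for that number of degrees of freedom.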

FIG. 13 is a flow chart of a preferred embodiment of the method of FIG. 2.

As shown in fig. 13, in a preferred embodiment, the method further comprises:

S80: evaluating the prediction effect according to the mean absolute percentage error (MAPE) between the third parcel volume sequence of the test set and the output prediction result.

Specifically, a smaller MAPE indicates a smaller error and a better prediction effect. MAPE is easy to interpret: it represents the ratio of the error to the true value. The prediction method using the state space model and prediction methods using other models commonly used in the field are compared on the same data sample, with MAPE as the criterion:

Therefore, the parcel volume prediction method provided by this patent can achieve an excellent prediction effect.
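For reference, the MAPE criterion used in step S80 can be computed as in the sketch below. The function name `mape`, the toy numbers and the decision to skip days whose true volume is zero (where the ratio is undefined) are assumptions made for illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumption: days with a true volume of zero are skipped, because the
    error-to-true-value ratio is undefined there.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


# Toy values: errors of 10%, 10% and 0% of the true volume.
actual = [100, 200, 400]
predicted = [110, 180, 400]
print(round(mape(actual, predicted), 2))
```

Because each error is divided by the true value, days with small volumes weigh as much as days with large volumes, which is worth keeping in mind when outlets differ greatly in size.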

Fig. 14 is a schematic structural diagram of a parcel volume prediction apparatus according to an embodiment of the present invention. The apparatus of Fig. 14 may correspondingly perform the method shown in Fig. 2.

As shown in fig. 14, in the present embodiment, the present invention provides a component prediction apparatus including a data set configuration unit 10, a first linear extraction unit 20, a cycle pool generation unit 30, a second linear extraction unit 40, a periodicity verification unit 50, and a second prediction unit 70.

The data set configuration unit 10 is configured to acquire historical data, generate a data set labeled with a plurality of express industry characteristics, and partition a training set and a verification set from the data set. Wherein the express industry characteristics include long-term trend characteristics, and at least one of: holiday features, shopping day features, and other exogenous impact features.

The first linear extraction unit 20 is configured to establish a linear regression model of a first quantity sequence of the training set and express industry characteristics, and extract first linear sequence information.

The cycle pool generation unit 30 is configured to obtain first residual sequence information according to the first quantity sequence and the first linear sequence information, and generate a periodic pool according to the first residual sequence information.

The second linear extraction unit 40 is configured to input the second quantity sequence of the verification set into the linear regression model to extract second linear sequence information.

The periodicity verification unit 50 is configured to obtain second residual sequence information according to the second quantity sequence and the second linear sequence information, and extract a number of periodic sequences of the second residual sequence information according to the periodic pool to verify whether the second residual sequence information has periodicity.

The second prediction unit 70 is configured to extract periodic effect and lag phase information from the second residual sequence information through the state space model when the second residual sequence information has periodicity, and generate and output a second prediction result according to the second linear sequence information and the periodic effect and lag phase information.

The prediction principle of the above device specifically refers to the method shown in fig. 1-5, and is not described herein again.
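The work of the linear extraction units and the residual extraction can be sketched as an ordinary least-squares fit followed by a subtraction. This is only an illustration under stated assumptions: the helper names `fit_ols` and `linear_part`, the three features (intercept, trend index, holiday flag), and all sample values are hypothetical, and the document does not specify how the regression is solved.

```python
def fit_ols(rows, y):
    """Least-squares coefficients for y ~ rows, via the normal equations."""
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("features are collinear")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def linear_part(rows, coef):
    """Linear sequence information: fitted value for each feature row."""
    return [sum(c * x for c, x in zip(coef, r)) for r in rows]


# Hypothetical training features per day: [intercept, trend index, holiday flag].
train_rows = [[1.0, float(t), 1.0 if t in (3, 7) else 0.0] for t in range(10)]
train_qty = [50.0 + 2.0 * r[1] - 20.0 * r[2] for r in train_rows]

coef = fit_ols(train_rows, train_qty)
first_linear = linear_part(train_rows, coef)
first_residual = [q - l for q, l in zip(train_qty, first_linear)]

# The same fitted model is applied to the verification set.
valid_rows = [[1.0, float(t), 0.0] for t in range(10, 13)]
valid_qty = [71.0, 71.0, 75.0]
second_linear = linear_part(valid_rows, coef)
second_residual = [q - l for q, l in zip(valid_qty, second_linear)]
```

In this sketch the residual sequences are what would be passed on for period screening and periodicity verification, while the linear sequences are retained for the final prediction.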

Fig. 15 is a schematic structural view of a preferred embodiment of the apparatus shown in fig. 14. The apparatus shown in fig. 15 may correspondingly perform any of the methods shown in fig. 6-13.

As shown in fig. 15, in a preferred embodiment, the quantity prediction apparatus further includes a first prediction unit 60. The first prediction unit 60 is configured to optimize the linear regression model when the second residual sequence information does not have periodicity, generate and output a first prediction result.

In a preferred embodiment, the data set configuration unit 10 comprises a data acquisition subunit 13, a data set generation subunit 15 and a data set partitioning subunit 17.

The data acquisition subunit 13 is configured to acquire history data, and store the history data as an original data set.

The data set generating subunit 15 is configured to import the original data set into a holiday template, and generate a data set labeled with a plurality of express industry characteristics.

The data set partitioning subunit 17 is configured to partition the data set into a training set, a validation set, and a test set.

Further preferably, the data set configuration unit 10 further includes a template generation subunit 11 configured to generate a holiday template labeled with several express industry characteristics.

The configuration principle of the data set configuration unit 10 can be referred to the methods shown in fig. 7-8, respectively, and will not be described herein again.
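As a rough illustration of the data set configuration described above, the sketch below labels raw daily records using a holiday template and then splits them chronologically. The template contents, the field names, the split fractions, and the helper names `label_row` and `chronological_split` are all assumptions made for the example.

```python
from datetime import date, timedelta

# Hypothetical holiday template and shopping-day table (illustrative entries).
HOLIDAY_TEMPLATE = {date(2017, 10, 1): "National Day"}
SHOPPING_DAYS = {date(2017, 11, 11): "Double 11"}


def label_row(day, quantity, index):
    """Attach express industry characteristics to one raw record."""
    return {
        "date": day,
        "quantity": quantity,
        "trend_index": index,  # long-term trend feature
        "holiday_name": HOLIDAY_TEMPLATE.get(day, ""),
        "is_pre_holiday": (day + timedelta(days=1)) in HOLIDAY_TEMPLATE,
        "is_post_holiday": (day - timedelta(days=1)) in HOLIDAY_TEMPLATE,
        "shopping_day_name": SHOPPING_DAYS.get(day, ""),
    }


def chronological_split(rows, train_frac=0.6, valid_frac=0.2):
    """Split in time order so later data never leaks into earlier sets."""
    n = len(rows)
    a = round(n * train_frac)
    b = a + round(n * valid_frac)
    return rows[:a], rows[a:b], rows[b:]


# Hypothetical original data set: ten consecutive days of quantities.
start = date(2017, 9, 28)
raw = [(start + timedelta(days=i), 1000 + 10 * i) for i in range(10)]
dataset = [label_row(d, q, i) for i, (d, q) in enumerate(raw)]
train, valid, test = chronological_split(dataset)
```

A chronological split is assumed here because the data is a time series; the document itself only states that the data set is divided into a training set, a validation set, and a test set.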

As also shown in fig. 15, in a preferred embodiment, the period pool generating unit 30 includes a first residual extracting sub-unit 31, a power spectrogram period screening sub-unit 33, and a period pool generating sub-unit 35.

The first residual extraction subunit 31 is configured to obtain first residual sequence information from the first quantitative sequence and the first linear sequence information.

The power spectrum period screening subunit 33 is configured to generate a power spectrum according to the first residual sequence information to screen out power spectrum periods within a number of predetermined ranges.

The cycle pool generation subunit 35 is configured to generate a cycle pool according to each power spectrum cycle and a plurality of common cycles in the express delivery industry.

The generation principle of the period pool generation unit 30 can refer to the methods shown in fig. 9-10, and will not be described herein.
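One way to picture the power spectrum screening and the period pool is the sketch below. It computes a plain discrete Fourier transform periodogram of the residual, keeps the strongest periods inside a chosen range, and merges them with a list of common periods. The function names, the range limits, the `top` parameter, and the common periods (7, 30, 365) are assumptions for illustration; the document does not give these values here.

```python
import cmath
import math


def periodogram(x):
    """Return (period, power) pairs from a plain DFT of the demeaned series."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    out = []
    for k in range(1, n // 2 + 1):
        coef = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                   for t, v in enumerate(centered))
        out.append((n / k, abs(coef) ** 2 / n))
    return out


def build_period_pool(residual, min_period, max_period, top=2,
                      common=(7, 30, 365)):
    """Merge the strongest in-range spectrum periods with common periods."""
    in_range = [(p, pw) for p, pw in periodogram(residual)
                if min_period <= p <= max_period]
    strongest = sorted(in_range, key=lambda item: item[1], reverse=True)[:top]
    return sorted({round(p) for p, _ in strongest} | set(common))


# Hypothetical first residual sequence with an 8-day cycle.
residual = [math.sin(2 * math.pi * t / 8) for t in range(64)]
pool = build_period_pool(residual, min_period=2, max_period=32, top=1)
print(pool)
```

The direct DFT is quadratic in the series length, which is acceptable for a sketch; a fast Fourier transform from a numerical library would be the usual choice for long histories.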

As also shown in fig. 15, in a preferred embodiment, the periodicity verification unit 50 includes a second residual extraction subunit 51, a screening subunit 53, and a judging subunit 55.

The second residual extraction subunit 51 is configured to obtain second residual sequence information according to the second component sequence and the second linear sequence information.

The screening subunit 53 is configured to generate a plurality of periodic sequences according to the periodic pool, arrange and combine the periodic sequences, and screen out a plurality of periodic sequence combinations with the minimum Mean Absolute Percent Error (MAPE) in each periodic sequence combination.

The judging subunit 55 is configured to judge whether the MAPEs of the screened combinations of the periodic sequences are the same: if yes, the second residual sequence information has no periodicity; otherwise, the second residual sequence information has periodicity.

The verification principle of the periodicity verification unit 50 can refer to the method shown in fig. 11, and is not described herein again.

As also shown in fig. 15, in a preferred embodiment, the second prediction unit 70 includes an extraction sub-unit 71, a third residual extraction sub-unit 73, a white noise check sub-unit 75, a prediction sub-unit 77, and an optimization sub-unit 79.

The extraction subunit 71 is configured to extract the periodicity effect and the lag phase information from the second residual sequence information through a state space model.

The third residual extraction subunit 73 is configured to generate third residual sequence information from the second residual sequence information and the periodicity effect and lag phase information.

The white noise check subunit 75 is configured to perform white noise check on the third residual sequence information.

The predicting subunit 77 is configured to generate and output a second prediction result according to the second linear sequence information and the periodic effect and lag phase information when the white noise check passes.

The optimization subunit 79 is configured to optimize the linear regression model when the white noise check fails, or extract error information from the third residual sequence information, so that the prediction subunit 77 generates and outputs a third prediction result according to the second linear sequence information, the periodic effect, the lag phase information, and the error information.

The prediction principle of the second prediction unit 70 can refer to the method shown in fig. 12, and is not described herein again.
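The white noise check and the branching that follows it can be sketched as below. The document does not name a specific test, so the Ljung-Box Q statistic is used here purely as an assumed stand-in; the function names, the lag count, the 95% chi-square critical values, and the returned step descriptions are likewise illustrative.

```python
import math
import random

# 95% chi-square critical values for the lag counts supported by this sketch.
CHI2_95 = {5: 11.070, 10: 18.307}


def ljung_box_q(x, lags):
    """Ljung-Box Q statistic over autocorrelations at lags 1..lags."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    denom = sum(v * v for v in d)
    q = 0.0
    for k in range(1, lags + 1):
        r_k = sum(d[t] * d[t - k] for t in range(k, n)) / denom
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q


def is_white_noise(x, lags=5):
    """True when Q stays below the 95% critical value (no autocorrelation found)."""
    return ljung_box_q(x, lags) < CHI2_95[lags]


def next_step(check_passed, extreme_weather):
    """Branching after the white noise check, as described in the text."""
    if check_passed:
        return "output second prediction"
    if extreme_weather:
        return "extract error information and output third prediction"
    return "optimize the linear regression model"


# A residual that still contains a 7-day cycle should fail the check.
periodic = [math.sin(2 * math.pi * t / 7) for t in range(70)]
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(200)]
print(is_white_noise(periodic), is_white_noise(noise))
```

If the third residual sequence still shows autocorrelation, structure remains that the linear and periodic parts did not capture, which is why a failed check leads either to error extraction or back to model optimization.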

As also shown in FIG. 15, in a preferred embodiment, the apparatus further comprises an evaluation unit 90.

The evaluation unit 90 is configured to evaluate the prediction effect based on the third sequence of components of the test set and a mean absolute percentage error value (MAPE) of the outputted prediction result.

The evaluation principle of the evaluation unit 90 can refer to the method shown in fig. 13, and is not described herein again.

As shown in fig. 16, as another aspect, the present application also provides an electronic device 1600 including one or more Central Processing Units (CPUs) 1601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1602 or a program loaded from a storage portion 1608 into a Random Access Memory (RAM) 1603. In the RAM1603, various programs and data necessary for the operation of the electronic apparatus 1600 are also stored. The CPU1601, ROM1602, and RAM1603 are connected to one another via a bus 1604. An input/output (I/O) interface 1605 is also connected to the bus 1604.

The following components are connected to the I/O interface 1605: an input portion 1606 including a keyboard, a mouse, and the like; an output portion 1607 including a display device such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage portion 1608 including a hard disk and the like; and a communication portion 1609 including a network interface card such as a LAN card, a modem, or the like. The communication portion 1609 performs communication processing via a network such as the Internet. A drive 1610 is also connected to the I/O interface 1605 as needed. A removable medium 1611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1610 as necessary, so that a computer program read out therefrom is installed into the storage portion 1608 as necessary.

In particular, according to an embodiment of the present disclosure, the component prediction method described in any of the above embodiments may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing a component prediction method. In such embodiments, the computer program may be downloaded and installed from a network via the communication portion 1609, and/or installed from the removable media 1611.

As yet another aspect, the present application also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus of the above-described embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the method for component prediction described herein.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units or modules described in the embodiments of the present application may be implemented by software or hardware. The described units or modules may also be provided in a processor, for example, each of the described units may be a software program provided in a computer or a mobile intelligent device, or may be a separately configured hardware device. Wherein the designation of such a unit or module does not in some way constitute a limitation on the unit or module itself.

The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the present application. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims

1. A method for predicting a quantity, comprising:

acquiring historical data, generating a data set marked with a plurality of express industry characteristics, and dividing a training set and a verification set from the data set; the express industry features include long-term trend features, and at least one of: holiday features, shopping day features, and other exogenous impact features;

establishing a linear regression model of a first express quantity sequence of the training set and the express industry characteristics, and extracting first linear sequence information;

obtaining first residual sequence information according to the first quantitative sequence and the first linear sequence information, and generating a periodic pool according to the first residual sequence information;

inputting a second sequence of components of the validation set into the linear regression model to extract second linear sequence information;

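obtaining second residual sequence information according to the second sequence of components and the second linear sequence information, and extracting a plurality of periodic sequences of the second residual sequence information according to the periodic pool to check whether the second residual sequence information has periodicity:

if yes, extracting periodic effect and lag phase information from the second residual sequence information through a state space model, and generating and outputting a second prediction result according to the second linear sequence information and the periodic effect and lag phase information.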

2. The method of claim 1, wherein the obtaining historical data, generating a dataset labeled with express industry characteristics, and partitioning a training set and a validation set from the dataset comprises:

acquiring historical data, and storing the historical data as an original data set;

importing the original data set into a holiday template to generate a data set marked with a plurality of express industry characteristics;

the data set is divided into a training set, a validation set, and a test set.

3. The method of claim 2, wherein, before the acquiring historical data and storing the historical data as an original data set, the method further comprises:

generating a holiday template marked with a plurality of express industry characteristics;

the long-term trend characteristics include at least one of: a daily trend index, a weekly trend index, a monthly trend index, a quarterly trend index, a yearly trend index, a periodic trend index;

the holiday features include at least one of: the name of the holiday, whether the day is before the holiday, whether the day is after the holiday, and whether the day is a rest day;

the shopping day features include at least one of: the name of the shopping day, whether the day is a shopping day, whether the day is before the shopping day, and whether the day is after the shopping day;

the other exogenous impact characteristics include at least one of: weather, whether a branch is on a workday, whether a particular customer is on a workday.

4. The method according to any one of claims 1-3, further comprising:

and when the second residual sequence information does not have periodicity, optimizing the linear regression model, generating a first prediction result and outputting the first prediction result.

5. The method according to any of claims 1-3, wherein the obtaining first residual sequence information from the first quantitative sequence and the first linear sequence information, and generating a periodic pool from the first residual sequence information comprises:

obtaining first residual sequence information according to the first quantity sequence and the first linear sequence information;

generating a power spectrogram according to the first residual sequence information so as to screen out power spectrogram cycles in a plurality of preset ranges;

and generating a period pool according to each power spectrogram period and a plurality of common periods of the express industry.

6. The method according to any of claims 1-3, wherein said extracting a number of periodic sequences of the second residual sequence information from the periodic pool to check whether the second residual sequence information has periodicity comprises:

generating a plurality of periodic sequences according to the periodic pool, arranging and combining the periodic sequences, and screening out a plurality of periodic sequence combinations with the minimum average absolute percentage error value MAPE in the periodic sequence combinations;

judging whether MAPE of each selected cycle sequence combination is the same:

if yes, the second residual sequence information has no periodicity;

and if not, the second residual sequence information has periodicity.

7. The method according to any one of claims 1 to 3, wherein the extracting, by the state space model, the periodic effect and the lag phase information from the second residual sequence information, and generating and outputting a second prediction result according to the second linear sequence information and the periodic effect and the lag phase information comprises:

extracting periodic effect and lag phase information from the second residual sequence information through a state space model;

generating third residual sequence information according to the second residual sequence information and the periodic effect and lag phase information;

performing white noise test on the third residual sequence information:

if the test is passed, generating and outputting a second prediction result according to the second linear sequence information and the periodic effect and lag phase information;

and if the test fails, returning to optimize the linear regression model, or extracting error information from the third residual sequence information and generating and outputting a third prediction result by combining the second linear sequence information, the periodic effect and lag phase information, and the error information.

8. The method of claim 2 or 3, further comprising:

and evaluating the prediction effect according to the third component sequence of the test set and the average absolute percentage error value MAPE of the output prediction result.

9. A quantity prediction apparatus, comprising:

a data set configuration unit configured to acquire historical data, generate a data set marked with a plurality of express industry characteristics, and divide a training set and a verification set from the data set; the express industry features include long-term trend features, and at least one of: holiday features, shopping day features, and other exogenous impact features;

the first linear extraction unit is configured to establish a linear regression model of a first quantity sequence of the training set and the express industry characteristics, and extract first linear sequence information;

the cycle pool generating unit is configured to obtain first residual sequence information according to the first quantity sequence and the first linear sequence information, and generate a cycle pool according to the first residual sequence information;

a second linear extraction unit configured to input a second sequence of components of the verification set into the linear regression model to extract second linear sequence information;

a periodicity checking unit configured to obtain second residual sequence information according to the second component sequence and the second linear sequence information, and extract a plurality of periodic sequences of the second residual sequence information according to the periodic pool to check whether the second residual sequence information has periodicity;

and the second prediction unit is configured to extract periodic effect and lag phase information from the second residual sequence information through a state space model when the second residual sequence information is periodic, and generate and output a second prediction result according to the second linear sequence information and the periodic effect and lag phase information.

10. The apparatus of claim 9, wherein the data set configuration unit comprises:

the data acquisition subunit is configured to acquire historical data and store the historical data as an original data set;

the data set generating subunit is configured to import the original data set into a holiday template to generate a data set marked with a plurality of express industry characteristics;

and the data set dividing subunit is configured to divide the data set into a training set, a verification set and a test set.

11. The apparatus of claim 10, wherein the data set configuration unit further comprises:

the template generating subunit is configured to generate a holiday template marked with a plurality of express industry characteristics;

the holiday features include at least one of: the name of the holiday, whether the day is before the holiday, whether the day is after the holiday, and whether the day is a rest day;

12. The apparatus of any one of claims 9-11, further comprising:

and the first prediction unit is configured to optimize the linear regression model when the second residual sequence information does not have periodicity, generate a first prediction result and output the first prediction result.

13. The apparatus according to any one of claims 9-11, wherein the periodic pool generating unit comprises:

a first residual extraction subunit configured to obtain first residual sequence information according to the first quantity sequence and the first linear sequence information;

a power spectrogram period screening subunit configured to generate a power spectrogram according to the first residual sequence information to screen out power spectrogram periods within a plurality of predetermined ranges;

and the cycle pool generation subunit is configured to generate a cycle pool according to each power spectrogram period and a plurality of common periods of the express industry.

14. The apparatus of any one of claims 9-11, wherein the periodicity checking unit comprises:

a second residual extraction subunit, configured to obtain second residual sequence information according to the second component sequence and the second linear sequence information;

the screening subunit is configured to generate a plurality of periodic sequences according to the periodic pool, arrange and combine the periodic sequences, and screen out a plurality of periodic sequence combinations with the minimum average absolute percentage error value MAPE in each periodic sequence combination;

a judging subunit configured to judge whether the MAPEs of the screened periodic sequence combinations are the same:

if yes, the second residual sequence information has no periodicity;

and if not, the second residual sequence information has periodicity.

15. The apparatus according to any of claims 9-11, wherein the second prediction unit comprises:

the extraction subunit is configured to extract periodic effect and lag phase information from the second residual sequence information through a state space model;

a third residual extraction subunit, configured to generate third residual sequence information according to the second residual sequence information and the periodic effect and lag phase information;

a white noise checking subunit configured to perform white noise checking on the third residual sequence information;

the predicting subunit is configured to generate and output a second prediction result according to the second linear sequence information and the periodic effect and lag phase information when the white noise check passes;

and the optimization subunit is configured to optimize the linear regression model when the white noise test fails, or extract error information from the third residual sequence information, so that the prediction subunit generates and outputs a third prediction result according to the second linear sequence information, the periodic effect and lag phase information and the error information.

16. The apparatus of claim 10 or 11, further comprising:

and the evaluation unit is configured to evaluate the prediction effect according to the third component sequence of the test set and the average absolute percentage error value MAPE of the output prediction result.

17. An electronic device, characterized in that the electronic device comprises:

one or more processors;

a memory for storing one or more programs,

the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method recited in any of claims 1-8.

18. A storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-8.