CN111401940B - Feature prediction method, device, electronic equipment and storage medium - Google Patents


Info

Publication number: CN111401940B
Authority: CN (China)
Prior art keywords: sequence, feature, characteristic, historical, time
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202010148299.0A
Other languages: Chinese (zh)
Other versions: CN111401940A
Inventors: 王迪, 肖伟集, 朱旭律, 杨杰
Current Assignee: Hangzhou Netease Zaigu Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Hangzhou Netease Zaigu Technology Co Ltd
Application filed by Hangzhou Netease Zaigu Technology Co Ltd
Priority to CN202010148299.0A
Publication of CN111401940A
Application granted
Publication of CN111401940B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201: Market modelling; Market analysis; Collecting market data
    • G06Q30/0202: Market predictions or forecasting for commercial activities
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Abstract

The application discloses a feature prediction method, an apparatus, an electronic device and a storage medium for improving prediction accuracy. The method comprises the following steps: training a feature prediction model on sample data of a plurality of products to obtain a trained feature prediction model; acquiring the reference time and the prediction time of a target product, and acquiring a historical feature sequence and a historical associated feature sequence of the target product according to the reference time; acquiring a future time sequence feature sequence of the target product from the reference time to the prediction time; and inputting the historical feature sequence, the historical associated feature sequence and the future time sequence feature sequence into the trained feature prediction model to obtain a predicted value of the specified feature. Because the feature prediction model is trained on sample data of multiple products, and feature prediction uses feature data of multiple dimensions of the target product together with the model, the data volume and coverage of the input data are enlarged and the accuracy of the feature prediction is improved.

Description

Feature prediction method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a feature prediction method, a feature prediction device, an electronic device, and a storage medium.
Background
This section is intended to provide a background or context for embodiments of the present application that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
In recent years, with the development of the internet, the retail industry has also developed rapidly. To support better business decisions, accurate prediction of characteristics such as product sales is often required.
Most traditional prediction methods place high requirements on data continuity, data volume, stability and the like, and cannot adapt to complex business scenarios. Since most real data and business scenarios cannot meet these requirements well, traditional prediction methods produce large errors and an ideal prediction effect is difficult to obtain.
Thus, how to improve the accuracy of prediction is a problem to be solved.
Disclosure of Invention
In view of the foregoing technical problems, there is a great need for an improved method to increase the accuracy of feature predictions.
In one aspect, an embodiment of the present application provides a feature prediction method, including:
acquiring a reference time and a prediction time of a target product, wherein the reference time is used for dividing historical data from future data;
acquiring a historical feature sequence and a historical associated feature sequence of the target product according to the reference time;
acquiring a future time sequence feature sequence of the target product from the reference time to the prediction time; and
inputting the historical feature sequence, the historical associated feature sequence and the future time sequence feature sequence into a trained feature prediction model to obtain a predicted value of a specified feature, wherein the feature prediction model is trained on sample data of a plurality of products.
Preferably, acquiring the historical feature sequence and the historical associated feature sequence of the target product according to the reference time includes:
acquiring feature values of a specified feature of the target product before the reference time;
forming the historical feature sequence from the acquired feature values of the specified feature;
acquiring feature values of a feature associated with the specified feature of the target product before the reference time; and
forming the historical associated feature sequence from the acquired feature values of the associated feature.
Preferably, the step of acquiring the future time sequence feature sequence of the target product from the reference time to the prediction time includes:
acquiring feature values of a future time sequence feature of the target product from the reference time to the prediction time, wherein each feature value of the future time sequence feature is preset; and
forming the future time sequence feature sequence from the feature values of the future time sequence feature.
Preferably, before inputting the historical feature sequence, the historical associated feature sequence and the future time sequence feature sequence into the trained feature prediction model to obtain the predicted value of the specified feature, the method further comprises:
filling any historical associated feature sequence or future time sequence feature sequence with missing data using a specified filling value, so as to update that sequence; and
generating, for each filled historical associated feature sequence and each filled future time sequence feature sequence, a corresponding auxiliary feature sequence, wherein the auxiliary feature sequence indicates whether each element of the filled sequence was filled.
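As a concrete illustration of the filling step above, the sketch below pads missing elements and emits the auxiliary indicator sequence. The helper name and the filling value are hypothetical; the application leaves the concrete filling value open.

```python
import math

FILL_VALUE = 0.0  # hypothetical placeholder; the specified filling value is configurable


def pad_with_mask(sequence):
    """Replace missing entries (None/NaN) with FILL_VALUE and build an
    auxiliary feature sequence: 1 where the element was filled, 0 where
    the original value was observed."""
    padded, mask = [], []
    for v in sequence:
        missing = v is None or (isinstance(v, float) and math.isnan(v))
        padded.append(FILL_VALUE if missing else v)
        mask.append(1 if missing else 0)
    return padded, mask


padded, mask = pad_with_mask([3.0, None, 5.0, float("nan")])
# padded == [3.0, 0.0, 5.0, 0.0], mask == [0, 1, 0, 1]
```

The auxiliary sequence lets the model distinguish a genuine zero from a filled gap.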
Preferably, before inputting the historical feature sequence, the historical associated feature sequence and the future time sequence feature sequence into the trained feature prediction model to obtain the predicted value of the specified feature, the method further comprises:
acquiring a static feature value for each static feature of the target product, wherein a static feature is a feature that is related to the product and independent of time; and
forming a static feature sequence of the target product from the acquired static feature values.
In this case, inputting the historical feature sequence, the historical associated feature sequence and the future time sequence feature sequence into the trained feature prediction model to obtain the predicted value of the specified feature includes:
inputting the historical feature sequence, the historical associated feature sequence, the future time sequence feature sequence, the auxiliary feature sequence and the static feature sequence into the feature prediction model to obtain the predicted value of the specified feature.
Preferably, before acquiring the reference time and the prediction time of the target product, the method further comprises:
training the feature prediction model according to the historical feature sample sequence, the historical associated feature sample sequence, the future time sequence feature sample sequence, the auxiliary feature sequence, the predicted sample value and the static feature sequence of each product, to obtain the trained feature prediction model.
Preferably, this training includes:
combining every two sampling times within a set time period to obtain corresponding two-tuples, wherein the first sampling time contained in each two-tuple is earlier than the second sampling time;
taking each two-tuple of each product as a sample point, taking the first sampling time contained in the two-tuple as the reference time of the corresponding sample point, and taking the second sampling time contained in the two-tuple as the prediction time of the corresponding sample point;
determining, according to the reference time and the prediction time corresponding to each sample point, the historical feature sample sequence, the historical associated feature sample sequence, the future time sequence feature sample sequence, the auxiliary feature sequence, the predicted sample value and the static feature sequence corresponding to that sample point; and
training the feature prediction model on the sequences and predicted sample values corresponding to the sample points, to obtain the trained feature prediction model.
In one aspect, an embodiment of the present application provides a feature prediction apparatus, including:
a first acquisition unit, configured to acquire a reference time and a prediction time of a target product, wherein the reference time is used for dividing historical data from future data;
a second acquisition unit, configured to acquire a historical feature sequence and a historical associated feature sequence of the target product according to the reference time;
a third acquisition unit, configured to acquire a future time sequence feature sequence of the target product from the reference time to the prediction time; and
a prediction unit, configured to input the historical feature sequence, the historical associated feature sequence and the future time sequence feature sequence into a trained feature prediction model to obtain a predicted value of a specified feature, wherein the feature prediction model is trained on sample data of a plurality of products.
Preferably, the second acquisition unit is configured to:
acquire feature values of a specified feature of the target product before the reference time;
form the historical feature sequence from the acquired feature values of the specified feature;
acquire feature values of a feature associated with the specified feature of the target product before the reference time; and
form the historical associated feature sequence from the acquired feature values of the associated feature.
Preferably, the third acquisition unit is configured to:
acquire feature values of a future time sequence feature of the target product from the reference time to the prediction time, wherein each feature value of the future time sequence feature is preset; and
form the future time sequence feature sequence from the feature values of the future time sequence feature.
Preferably, the prediction unit is further configured to:
fill any historical associated feature sequence or future time sequence feature sequence with missing data using a specified filling value, so as to update that sequence; and
generate, for each filled historical associated feature sequence and each filled future time sequence feature sequence, a corresponding auxiliary feature sequence, wherein the auxiliary feature sequence indicates whether each element of the filled sequence was filled.
Preferably, the prediction unit is further configured to:
acquire a static feature value for each static feature of the target product, wherein a static feature is a feature that is related to the product and independent of time; and
form a static feature sequence of the target product from the acquired static feature values.
In this case, the prediction unit is configured to:
input the historical feature sequence, the historical associated feature sequence, the future time sequence feature sequence, the auxiliary feature sequence and the static feature sequence into the feature prediction model to obtain the predicted value of the specified feature.
Preferably, the first acquisition unit is further configured to:
train the feature prediction model according to the historical feature sample sequence, the historical associated feature sample sequence, the future time sequence feature sample sequence, the auxiliary feature sequence, the predicted sample value and the static feature sequence of each product, to obtain the trained feature prediction model.
Preferably, the first acquisition unit is further configured to:
combine every two sampling times within a set time period to obtain corresponding two-tuples, wherein the first sampling time contained in each two-tuple is earlier than the second sampling time;
take each two-tuple of each product as a sample point, taking the first sampling time contained in the two-tuple as the reference time of the corresponding sample point, and the second sampling time contained in the two-tuple as the prediction time of the corresponding sample point;
determine, according to the reference time and the prediction time corresponding to each sample point, the historical feature sample sequence, the historical associated feature sample sequence, the future time sequence feature sample sequence, the auxiliary feature sequence, the predicted sample value and the static feature sequence corresponding to that sample point; and
train the feature prediction model on the sequences and predicted sample values corresponding to the sample points, to obtain the trained feature prediction model.
In one aspect, an embodiment of the present application provides a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of any of the methods described above.
In one aspect, an embodiment of the present application provides a computer program product comprising a computer program stored on a computer readable storage medium, the computer program comprising program instructions which, when executed by a processor, implement the steps of any of the methods described above.
According to the feature prediction method, apparatus, electronic device and storage medium provided by the embodiments of the present application, the feature prediction model is trained on sample data of a plurality of products to obtain a trained feature prediction model; the reference time and the prediction time of a target product are acquired, and a historical feature sequence and a historical associated feature sequence of the target product are acquired according to the reference time; a future time sequence feature sequence of the target product from the reference time to the prediction time is acquired; and the historical feature sequence, the historical associated feature sequence and the future time sequence feature sequence are input into the trained feature prediction model to obtain a predicted value of the specified feature. Because the feature prediction model is trained on sample data of multiple products, and feature prediction uses feature data of multiple dimensions of the target product together with the model, the data volume and coverage of the input data are enlarged, the requirements on data quality are reduced, and the accuracy of feature prediction is improved.
Drawings
The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present application will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present application are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which:
FIG. 1 is a flowchart of a method for training a feature prediction model according to an embodiment of the present application;
FIG. 2 is an exemplary diagram of two-tuple data according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating a feature prediction method according to an embodiment of the present disclosure;
FIG. 4 is a sample point illustration provided in accordance with one embodiment of the present application;
FIG. 5 is a schematic structural diagram of a feature prediction apparatus according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The principles and spirit of the present application will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are presented merely to enable one skilled in the art to better understand and practice the present application and are not intended to limit the scope of the present application in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Those skilled in the art will appreciate that embodiments of the present application may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the following forms, namely: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
In this document, it should be understood that any number of elements in the drawings is for illustration and not limitation, and that any naming is used only for distinction and not for any limitation.
For ease of understanding, the terms referred to in the embodiments of the present application are explained below:
the terminal device can be used for installing various applications and displaying objects provided in the installed applications, and can be mobile or fixed. Desktop computers, mobile phones, mobile computers, tablet computers, media players, smart wearable devices, smart televisions, vehicle-mounted devices, personal digital assistants (personal digital assistant, PDAs), point of sale (POS) or other electronic devices capable of achieving the above functions, and the like.
Reference time: for dividing historical data and future data. The data before the reference time is the history data, and the data after the reference time is the future data.
Sample points: representing a binary group of a product, and when the products are different or the binary groups are different, the corresponding sample points are different.
Future timing characteristics: typically a characteristic of a planned or well-defined regular change in data.
Static characteristics: is a product-related and time-independent feature.
Auxiliary feature sequence: for indicating whether an element in the filled feature sequence is filled.
Extreme gradient lifting (eXtreme Gradient Boosting, XGBoost) model: is a decision tree model, which is itself a general machine learning algorithm.
Feature prediction model: the predicted value of the specified feature for predicting the product at the predicted time is obtained by training based on the sample data of each product. The feature prediction model may be obtained by training an XGBoost model, or may be another model, which is not limited herein.
The principles and spirit of the present application are explained in detail below with reference to several representative embodiments thereof.
Summary of The Invention
The inventors of the present application found that most traditional prediction methods place high requirements on a product's data and application scenario, and that most real data and business scenarios cannot meet these requirements well, so the prediction results have large errors and an ideal prediction effect is difficult to obtain. The data requirements include, for example, requirements on data volume, data continuity, data stability and data uniformity; the business scenario requirements include, for example, that the business scenario be simple.
To solve the above problems, an embodiment of the present application provides a feature prediction method, which specifically includes: training a feature prediction model on sample data of a plurality of products to obtain a trained feature prediction model; acquiring the reference time and the prediction time of a target product, and acquiring a historical feature sequence and a historical associated feature sequence of the target product according to the reference time; acquiring a future time sequence feature sequence of the target product from the reference time to the prediction time; and inputting the historical feature sequence, the historical associated feature sequence and the future time sequence feature sequence into the trained feature prediction model to obtain a predicted value of the specified feature. Because the model is trained on sample data of multiple products, and feature prediction uses feature data of multiple dimensions of the target product, the data volume and coverage of the input data are enlarged, the requirements on data quality are reduced, and the accuracy of feature prediction is improved.
Having described the basic principles of the present application, various non-limiting embodiments of the present application are specifically described below.
Exemplary method
To further explain the technical solutions provided by the embodiments of the present application, details are described below with reference to the accompanying drawings and specific embodiments. Although the embodiments of the present application provide the operational steps of the methods shown in the following embodiments or figures, the methods may include more or fewer steps based on routine or non-inventive labor. For steps with no necessary logical causal relationship, the execution order is not limited to that provided by the embodiments of the present application. When applied in an actual process or apparatus, the methods may be executed sequentially or in parallel as shown in the embodiments or figures.
In the embodiments of the present application, the specified feature is sales, and predicting the sales of a product is taken as an example; in practical applications, the feature of the product to be predicted may also be set according to the actual application, which is not limited herein.
In the embodiments of the present application, before feature prediction is performed for a product, the feature prediction model is first trained to obtain a trained feature prediction model. FIG. 1 shows a flowchart of an implementation of the method for training a feature prediction model; the specific flow of the method is as follows:
step 101: the control equipment respectively combines every two sampling times in the set time period to obtain corresponding binary groups.
Specifically, when executing step 101, the control device executes the following steps for each sampling time within the set period of time, respectively:
the control device combines the sampling time with each sampling time after the sampling time respectively to obtain a binary group consisting of two sampling times.
It should be noted that the set time period and the sampling time may be set according to the actual application scenario, which is not limited herein. The doublet consists of two sampling times, the first sampling time being earlier than the second sampling time in the doublet.
For example, referring to FIG. 2, which is an exemplary diagram of two-tuple data, a set time period of January 1 to January 5 yields 10 two-tuples: (1.1, 1.2), (1.1, 1.3), (1.1, 1.4), (1.1, 1.5), (1.2, 1.3), (1.2, 1.4), (1.2, 1.5), (1.3, 1.4), (1.3, 1.5) and (1.4, 1.5).
Thus, the longer the set time period and the smaller the interval between sampling times, the larger the number of generated two-tuples.
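The pairwise combination of sampling times described above can be sketched with Python's standard library (the function name is hypothetical, and daily sampling is assumed):

```python
from datetime import date, timedelta
from itertools import combinations


def make_pairs(start, end):
    """All (reference_time, prediction_time) two-tuples over daily sampling
    times in [start, end], with the first time strictly earlier than the
    second."""
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return list(combinations(days, 2))


pairs = make_pairs(date(2020, 1, 1), date(2020, 1, 5))
# len(pairs) == 10, matching the January 1 to January 5 example above
```

Because the day list is sorted, `combinations` guarantees the first element of each two-tuple is earlier than the second.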
Step 102: the control device takes each two-tuple of each product as a sample point, and determines the reference time and the prediction time of the corresponding sample point from the two-tuple.
Specifically, the control device takes each two-tuple of each product as one sample point, takes the first sampling time contained in the two-tuple as the reference time of the corresponding sample point, and takes the second sampling time contained in the two-tuple as the prediction time of the corresponding sample point.
The reference time is used for dividing the historical data and the future data corresponding to the sample point and may be denoted "today"; the prediction time may be denoted "target".
It should be noted that different products or different two-tuples yield different sample points; that is, each sample point corresponds to one product, one reference time and one prediction time.
For example, referring to FIG. 2, with a set time period of January 1 to January 5, 10 two-tuples, i.e., 10 sample points, may be generated for a single product: (1.1, 1.2), (1.1, 1.3), (1.1, 1.4), (1.1, 1.5), (1.2, 1.3), (1.2, 1.4), (1.2, 1.5), (1.3, 1.4), (1.3, 1.5) and (1.4, 1.5).
It should be noted that the more sample points there are, the more general the patterns the feature prediction model can learn and the better the prediction effect. In the embodiments of the present application, every pair of sampling times is combined, so the longer the set time period, the more sample points are generated; a large number of sample points can therefore be generated within a short time span, which alleviates the problem of insufficient data.
Step 103: the control device determines, according to the reference time and the prediction time corresponding to each sample point, the historical feature sample sequence, the historical associated feature sample sequence, the future time sequence feature sample sequence, the predicted sample value and the static feature sequence corresponding to that sample point.
Specifically, when executing step 103, the control device may execute the following steps for each sample point:
S1031: the control device acquires, according to the reference time corresponding to the sample point, the historical feature sample sequence and the historical associated feature sample sequence of the product corresponding to the sample point.
Specifically, the control device acquires the feature values of the specified feature of the product corresponding to the sample point before the reference time corresponding to the sample point, and forms the historical feature sample sequence from the acquired feature values of the specified feature. The control device likewise acquires the feature values of the feature associated with the specified feature of the product before the reference time, and forms the historical associated feature sample sequence from the acquired feature values of the associated feature.
The historical feature sample sequence consists of the feature values of the specified feature of the product within the set time period and before the reference time. Each historical associated feature sample sequence consists of the feature values of an associated feature of the product within the set time period and before the reference time.
The historical feature sample sequence and the historical associated feature sample sequence are generally statistical indicators of objective facts that have already occurred. The specified feature is the feature to be predicted, such as sales. The associated features are features related to the specified feature; for example, when the specified feature is sales, an associated feature may be flow. The associated features may cover one dimension or multiple dimensions.
For example, if one sample point is the two-tuple (1.3, 1.5) of product A, the specified feature is sales, the associated feature is flow, and the set time period is 1.1 to 1.5, then the control device generates the historical feature sample sequence from the sales of product A on 1.1 and 1.2, and generates the historical associated feature sample sequence from the flow of product A on 1.1 and 1.2.
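The slicing of history by the reference time in S1031 can be sketched as follows; the daily figures and helper name are hypothetical, and only data strictly before the reference time is kept:

```python
from datetime import date

# Hypothetical daily data for product A: date -> (sales, flow)
DAILY = {
    date(2020, 1, 1): (10, 120),
    date(2020, 1, 2): (12, 150),
    date(2020, 1, 3): (9, 110),
    date(2020, 1, 4): (15, 180),
}


def history_before(daily, reference, index):
    """Feature values strictly before the reference time, in date order.
    index selects the feature: 0 for sales, 1 for flow."""
    return [values[index] for day, values in sorted(daily.items()) if day < reference]


hist_sales = history_before(DAILY, date(2020, 1, 3), 0)  # [10, 12]
hist_flow = history_before(DAILY, date(2020, 1, 3), 1)   # [120, 150]
```

With reference time 1.3, only the 1.1 and 1.2 values survive, matching the (1.3, 1.5) sample point example above.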
S1032: the control device obtains, for the product corresponding to the sampling point, the future time series feature sample sequences covering the span from the reference time corresponding to the sampling point to the prediction time corresponding to the sampling point.
Wherein each future time series characteristic sample sequence is composed of characteristic values of the future time series characteristics of the product from a reference time to a predicted time.
Each feature value of a future time series feature is preset and reflects a future feature other than the specified feature. Future time series feature sample sequences are usually planned data or data with explicit regularity, i.e. data that can be known explicitly in advance. For example, a future time series feature may mark future legal holidays.
It should be noted that the future time series feature sample sequence from the reference time to the prediction time includes the feature value corresponding to the reference time, the feature value corresponding to the prediction time, and every feature value corresponding to a time between the two.
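A minimal sketch of such a known-in-advance sequence, using a holiday flag as the future time series feature. The holiday date and the helper are assumptions; the point illustrated is that the sequence covers the reference time and the prediction time inclusively:

```python
from datetime import date, timedelta

def future_timing_sequence(reference, predicted, holidays):
    """Feature values of a known-in-advance feature (here: a holiday flag)
    for every day from the reference time to the prediction time, inclusive."""
    seq = []
    d = reference
    while d <= predicted:
        seq.append(1 if d in holidays else 0)
        d += timedelta(days=1)
    return seq

holidays = {date(2019, 1, 4)}  # assumed legal holiday
seq = future_timing_sequence(date(2019, 1, 3), date(2019, 1, 5), holidays)
print(seq)  # [0, 1, 0] -- covers 1.3, 1.4 and 1.5
```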
S1033: the control device obtains the feature value of the specified feature of the product corresponding to the sampling point at the prediction time corresponding to the sampling point, and uses it as the predicted sample value.
The predicted sample value is a characteristic value corresponding to the specified characteristic of the product at the predicted time.
S1034: the control equipment acquires a static characteristic sequence of the product corresponding to the sampling point.
Specifically, the control device obtains a static characteristic value corresponding to a static characteristic of the product corresponding to the sampling point.
The static characteristic sequence consists of characteristic values of all static characteristics. Static features are product-related and time-independent features.
The nth static feature sequence may be expressed as s_n, where n represents the dimension of the static feature. The set of static feature sequences may be denoted S. There may be one or more static feature sequences.
For example, the static feature is identification information (Identity, ID) of a commodity, a category ID of a commodity, or the like.
S1035: the control device fills the historical associated feature sample sequences and future time series feature sample sequences that are missing data, thereby updating them, and generates a corresponding auxiliary feature sequence for each filled sequence.
The auxiliary feature sequences are generated from the data-missing historical associated feature sample sequences and future time series feature sample sequences.
When S1035 is performed, the control apparatus may employ the steps of:
the control device fills the data-missing historical associated feature sample sequences and future time series feature sample sequences with a specified filling value so as to update them, and generates a corresponding auxiliary feature sequence for each filled historical associated feature sample sequence and each filled future time series feature sample sequence.
The auxiliary feature sequence indicates whether the elements of the filled feature sequence are filling values, that is, whether each element of the filled historical associated feature sample sequence or future time series feature sample sequence was filled in.
It should be noted that, the filled feature sequence and the auxiliary feature sequence are in one-to-one correspondence, that is, each filled history associated feature sample sequence generates a corresponding auxiliary feature sequence, and each filled future time sequence feature sample sequence generates a corresponding auxiliary feature sequence.
In practical application, the specified filling value may be set according to the practical application scenario, for example, may be 0, which is not limited herein.
In one embodiment, the control device may perform the following steps for each history associated feature sample sequence:
step a: the control device determines the history feature length according to the set time period and the reference time.
The historical feature length represents how far back into the past the sequence traces, and equals the number of elements the historical associated feature sample sequence contains when no data is missing.
For example, if the set time period is 1.1-1.5, the reference time is 1.3, and sampling is performed once a day, then the times 1.1 and 1.2 correspond to two element values, so the historical feature length is 2.
Step b: when the length of the history associated feature sample sequence is lower than the history feature length, the control device fills the history associated feature sample sequence by adopting a designated filling value, so that the length of the filled history associated feature sample sequence reaches the history feature length.
Step c: the control device generates a corresponding auxiliary feature sequence for the filled history associated feature sample sequence.
In one embodiment, the control device may perform the following steps for each future sequence of timing feature samples:
step a: the control device determines the future feature length from the reference time to the predicted time.
The future feature length represents how far forward into the future the sequence extends, and equals the number of elements the future time series feature sample sequence contains when no data is missing.
Step b: when the length of the future time sequence characteristic sample sequence is lower than the future characteristic length, the control equipment fills the future time sequence characteristic sample sequence by adopting a designated filling value, so that the length of the filled future time sequence characteristic sample sequence reaches the future characteristic length.
Step c: the control device generates a corresponding auxiliary feature sequence for the filled future sequence of time series feature sample sequences.
This is because data may be missing in practice. For example, when the historical feature length L1 = 30 but a commodity has been on the market for only 10 days, the first 20 days of the window have no data; the missing positions of the feature sequence can therefore be filled, and the corresponding auxiliary feature sequence records which elements are filling values.
Step 104: the control equipment trains the feature prediction model according to the historical feature sample sequence, the historical associated feature sample sequence, the future time sequence feature sample sequence and the predicted sample value of each product to obtain a trained feature prediction model.
The feature prediction model is used for predicting a predicted value of a specified feature of a product at a prediction time and is obtained through training according to sample data of each dimension of each product.
Alternatively, the feature prediction model may be trained as an XGBoost model, though other models may also be used; this is not limited here. Before executing step 104, the control device may initialize the parameters of the feature prediction model in advance.
Furthermore, the control device may further train the feature prediction model according to the historical feature sample sequence, the historical associated feature sample sequence, the future time sequence feature sample sequence, the auxiliary feature sequence and the predicted sample value of each sampling point, to obtain a trained feature prediction model.
In one embodiment, the following steps may be used to train the feature prediction model:
S1041: the control device inputs the historical feature sample sequence, historical associated feature sample sequences, future time series feature sample sequences, auxiliary feature sequences and static feature sequence corresponding to a sample point into the feature prediction model, which outputs a predicted value.
Specifically, the historical feature sample sequence of a sample point may be denoted y_t1, where t1 = (T-L1, T-L1+1, ..., T-2, T-1).
The set of historical associated feature sample sequences may be denoted H_t1, t1 = (T-L1, T-L1+1, ..., T-2, T-1). The ith historical associated feature sample sequence may be denoted h_i, where h_i^t1 represents the feature value of the ith associated feature at time t1.
The set of future time series feature sample sequences may be denoted F_t2, t2 = (T, T+1, ..., T+L2-1, T+L2). The jth future time series feature sample sequence may be denoted f_j, where f_j^t2 represents the feature value of the jth future time series feature at time t2.
The kth auxiliary feature sequence may be denoted p_k^t3, whose elements indicate whether the corresponding elements of the kth filled feature sequence are filling values.
where T represents the reference time, L1 the historical feature length, L2 the future feature length, i the dimension of the associated features, j the dimension of the future time series features, k the dimension of the corresponding filled feature sequences, and t1, t2 and t3 each represent discrete times.
With the above notation, the predicted value Y may be expressed as:
Y = g(y_t1, H_t1, F_t2, p_k^t3, S),
where g is the feature prediction model.
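For a tree-based g such as XGBoost, the inputs would typically be flattened into one feature vector per sample point. The helper below is an illustrative assumption about that flattening, not the patent's prescribed encoding:

```python
def assemble_input(y_hist, assoc_seqs, future_seqs, aux_seqs, static_seq):
    """Flatten all sequences of one sample point into the single feature
    vector that g(y_t1, H_t1, F_t2, p_k^t3, S) would consume."""
    vec = list(y_hist)
    for seq in (*assoc_seqs, *future_seqs, *aux_seqs):
        vec.extend(seq)
    vec.extend(static_seq)
    return vec

# Toy sample point: 2 history values, one associated sequence, one future
# sequence of length 3, one auxiliary mask, one static feature.
vec = assemble_input([1, 2], [[3, 4]], [[5, 6, 7]], [[0, 0]], [9])
print(len(vec))  # 10
```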
S1042: the control device determines a prediction error according to the prediction sample value corresponding to the sample point and the output prediction value.
S1043: the control device adjusts the parameters of the feature prediction model according to the prediction error.
S1044: the control device determines whether the prediction error meets the training termination condition, if so, S1045 is executed, otherwise S1041 is executed.
Alternatively, the training termination condition may be that the prediction error is lower than a preset error threshold, or that the prediction error is lower than the preset error threshold for multiple consecutive times.
In practical application, the training termination condition and the preset error threshold value can be set according to the practical application scene, and the method is not limited.
S1045: the control device obtains a trained feature prediction model.
In this way, in the subsequent prediction step, the target product to be predicted can be predicted according to the trained feature prediction model.
It should be noted that the products used in model training include both the target products to be predicted and products that are not to be predicted. For example, the same model may be trained on the sample points of 100 commodities, and the trained feature prediction model may then be used to predict the 1st commodity.
In an e-commerce scenario, when a commodity has not been online long enough or the number of commodities is insufficient, the amount of data is often inadequate (for example, a commodity marketed for 30 days yields a data length of only 30, whereas fitting a complex model usually requires millions of samples), so machine learning algorithms have been hard to apply widely to e-commerce business prediction. In the embodiment of the present application, a large number of sample points are generated by means of binary groups, and the same model is trained on the sample points of different products. The trained feature prediction model can therefore consider not only multiple dimensions of a single product but also the characteristics of multiple products comprehensively. On the one hand, feature prediction can be performed even with little auxiliary data; on the other hand, the influence of abnormal values appearing in a single product on the feature prediction model is greatly reduced, so a feature prediction model with higher prediction accuracy is obtained.
Referring to fig. 3, a flowchart of an implementation of a feature prediction method provided in the present application is shown, where the specific flow of the method is as follows:
step 300: the control device acquires a reference time and a predicted time of the target product.
Specifically, the target product is a product to be predicted, and the reference time and the predicted time may be obtained according to input of a user, or may be preset, without limitation.
For example, the control device obtains a reference time of 1.3 for the cosmetic B (target product) and a predicted time of 1.5.
Step 301: and the control equipment acquires a historical characteristic sequence and a historical association characteristic sequence of the target product according to the reference time.
Specifically, when step 301 is performed, the control device may employ the following steps:
S3011: the control device obtains the feature values of the specified feature of the target product before the reference time, and forms the historical feature sequence from the obtained feature values.
In one embodiment, the control device obtains feature values of the specified feature of the target product before the set start time to the reference time, and forms a historical feature sequence according to the obtained feature values.
In practical application, the set start time may be set according to the practical application scenario, which is not limited herein.
Wherein the historical feature sequence is determined from feature values of specified features of the product prior to the reference time.
S3012: the control device obtains the feature values of the associated features of the specified feature of the target product before the reference time, and forms the historical associated feature sequences from the obtained feature values.
In one embodiment, the control device obtains, for each associated feature of the specified feature, each feature value of the associated feature of the target product before the set start time to the reference time, and forms a corresponding historical associated feature sequence according to each obtained feature value.
The historical association characteristic sequence is determined according to characteristic values of association characteristics of the product before the reference time.
The history feature sequence and the history-related feature sequence are generally data of statistical indicators of objective facts that have occurred in history.
Thus, the historical characteristic sequence and each historical association characteristic sequence of the target product can be obtained.
Step 302: the control device obtains the future time series feature sequences of the target product over the range from the reference time to the prediction time.
Specifically, the control device acquires feature values of future time sequence features of the target product from the reference time to the predicted time, and forms a future time sequence feature sequence according to each feature value of the future time sequence features.
In one embodiment, the control device acquires, for each future time series feature, each feature value of the future time series feature of the target product between the reference time and the predicted time, and composes the acquired feature values into a corresponding sequence of future time series feature values.
Wherein, each characteristic value of the future time sequence characteristic is preset, and the future time sequence characteristic sequence is composed of characteristic values of the future time sequence characteristic of the product from the reference time to the forecast time.
Furthermore, the control device may also fill the data-missing historical associated feature sequences and future time series feature sequences with the specified filling value so as to update the corresponding feature sequences, and generate a corresponding auxiliary feature sequence for each filled feature sequence.
Specifically, the control device fills the data-missing historical associated feature sequences and future time series feature sequences with the specified filling value so as to update them, and generates a corresponding auxiliary feature sequence for each filled historical associated feature sequence and each filled future time series feature sequence.
The auxiliary feature sequence may also be used to indicate whether elements in the post-population historical correlation feature sequence or the future timing feature sequence are populated.
The principle of filling the data-missing historical associated feature sequences and future time series feature sequences and generating the corresponding auxiliary feature sequences is similar to that of filling the data-missing sample sequences and generating their auxiliary feature sequences, and details are not repeated here.
Further, the control device may also obtain a static feature sequence of the target product.
Specifically, the control device obtains static feature values corresponding to each static feature of the target product, and forms a static feature sequence of the target product according to each obtained static feature value.
In this way, input data of the feature prediction model can be obtained.
Step 303: the control equipment inputs the historical feature sequence, the historical associated feature sequence and the future time sequence feature sequence of the target product into a trained feature prediction model to obtain a predicted value of the specified feature.
Furthermore, the control device can also input the historical feature sequence, the historical association feature sequence, the future time sequence feature sequence, the auxiliary feature sequence and the static feature sequence of the target product into a feature prediction model to obtain a predicted value of the specified feature.
In the present application, the feature prediction model is trained on sample data of multiple dimensions of multiple products, so it can consider different dimensions of a single product as well as the characteristics of multiple products. This enlarges the coverage of the input data and lets the model learn the complex relationships between different features and different products from multiple angles, improving prediction accuracy. A large number of sampling points can be generated from the binary groups of each product, which alleviates both the small data volume of individual products and the influence of abnormal values in a single product on prediction accuracy; good prediction results can still be obtained when data quality is poor. This spares extensive data preprocessing and cleaning and lowers the data quality requirement. Because the model learns independently during training and does not need to be modified manually, the method is applicable to various complex scenarios, with a wide application range and strong universality.
The following describes training and prediction with a specific application scenario. Referring to fig. 4, an exemplary diagram of sample points is shown. The set time period is 1.01-1.04, and the products comprise commodity 1, commodity 2, commodity 3 and commodity 4. Each circle represents a sample point. Taking commodity 1 as an example, the training set of each commodity includes 6 sample points, whose binary groups are, in order, (1.1,1.2), (1.1,1.3), (1.1,1.4), (1.2,1.3), (1.2,1.4) and (1.3,1.4). The control device trains the feature prediction model on all sample points of commodities 1-4 (24 sample points in total) to obtain the trained feature prediction model.
The predicted set for each commodity includes 2 sample points, with corresponding tuples (1.4, 1.5) and (1.4,1.6) in order. The control device predicts the sample points (1.4, 1.5) and (1.4,1.6) of each commodity in turn by using the trained feature prediction model, and obtains the predicted value of each commodity at 1.5 and the predicted value of each commodity at 1.6 respectively.
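The binary groups of this scenario are simply all date pairs with the first date earlier than the second, so they can be enumerated with `itertools.combinations`. A sketch under the scenario's assumptions (4 dates, 4 commodities):

```python
from itertools import combinations

dates = ["1.1", "1.2", "1.3", "1.4"]  # set time period 1.01-1.04
products = ["commodity 1", "commodity 2", "commodity 3", "commodity 4"]

# Every binary group (reference time, prediction time) with the first earlier:
tuples = list(combinations(dates, 2))
print(len(tuples))  # 6 per commodity, from (1.1,1.2) to (1.3,1.4)

# One sample point per (product, binary group) pair:
sample_points = [(p, t) for p in products for t in tuples]
print(len(sample_points))  # 24
```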
The following describes training and predicting a feature prediction model using another specific application scenario.
Assume that the data of 100 commodities from January 1, 2019 to June 30, 2019 are known, and the daily sales of 10 of these commodities from July 1, 2019 to July 30, 2019 need to be predicted.
The control device executes the following steps for each commodity, generating the binary groups over January 1, 2019 to June 30, 2019.
The binary groups are, in order: (20190101, 20190102) ... (20190101, 20190630); (20190102, 20190103) ... (20190102, 20190630); ...; (20190628, 20190629), (20190628, 20190630); (20190629, 20190630).
Next, the control device obtains each sampling point from the binary groups of each commodity and determines, from the data of each commodity between January 1, 2019 and June 30, 2019, the historical feature sample sequence y_t1, the set H_t1 of historical associated feature sample sequences, the set F_t2 of future time series feature sample sequences, the auxiliary feature sequences p_k^t3, the static feature sequence S, and the predicted sample value of each sampling point. In this way, the corresponding training sample data of each sample point are obtained.
Here t1 = (T-L1, T-L1+1, ..., T-2, T-1) and t2 = (T, T+1, ..., T+L2-1, T+L2); t1, t2 and t3 each represent discrete times, and T is the reference time. L1 is the historical feature length and L2 is the future feature length. For each sampling point, its reference time, prediction time, historical feature length L1 and future feature length L2 are all determined.
Then the control device inputs the historical feature sample sequence y_t1, the set H_t1 of historical associated feature sample sequences, the set F_t2 of future time series feature sample sequences, the auxiliary feature sequences p_k^t3 and the static feature sequence S corresponding to each sampling point into the feature prediction model to obtain the trained feature prediction model.
Next, the control device generates 30 binary groups covering July 1, 2019 to July 30, 2019, with 20190630 as the reference time, and generates 10 × 30 = 300 sampling points from the 10 commodities to be predicted and the 30 binary groups.
Wherein, 30 binary groups are in turn: (20190630, 20190701), (20190630, 20190702) … … (20190630, 20190730).
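With the reference time fixed at 20190630, generating the 30 prediction binary groups and the 300 sampling points reduces to a date offset. A sketch using the scenario's dates:

```python
from datetime import date, timedelta

reference = date(2019, 6, 30)  # fixed reference time 20190630
tuples = [(reference, reference + timedelta(days=k)) for k in range(1, 31)]
print(len(tuples))                  # 30 binary groups
print(tuples[0][1], tuples[-1][1])  # 2019-07-01 2019-07-30

n_products = 10                     # commodities to be predicted
print(n_products * len(tuples))     # 300 sampling points
```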
The reference time may be any date from January 1, 2019 to June 30, 2019; the later the date, the better the prediction effect.
Further, the control device acquires a history feature sequence, a history association feature sequence, a future time sequence feature sequence, an auxiliary feature sequence and a static feature sequence corresponding to the 300 sample points.
And finally, the control equipment respectively inputs the historical feature sequence, the historical association feature sequence, the future time sequence feature sequence, the auxiliary feature sequence and the static feature sequence of each sample point into a feature prediction model to obtain corresponding sales prediction values.
Thus, the sales predicted value of each of the 10 commodities on each of days 2019, 7, 1 and 2019, 7, 30 can be obtained.
Exemplary apparatus
Based on the same inventive concept, the embodiment of the present application further provides a device for feature prediction, and since the principle of solving the problem by the device and the equipment is similar to that of a method for feature prediction, the implementation of the device can refer to the implementation of the method, and the repetition is omitted.
Fig. 5 is a schematic structural diagram of an apparatus for feature prediction according to an embodiment of the present application. An apparatus for feature prediction comprising:
a first obtaining unit 501 configured to obtain a reference time and a predicted time of a target product, where the reference time is used to divide historical data and future data;
a second obtaining unit 502, configured to obtain a historical feature sequence and a historical association feature sequence of the target product according to the reference time;
a third obtaining unit 503, configured to obtain a future time sequence feature sequence of the target product from the reference time to the predicted time;
the prediction unit 504 is configured to input the historical feature sequence, the historical associated feature sequence, and the future time sequence feature sequence into a trained feature prediction model, to obtain a predicted value of the specified feature, where the feature prediction model is obtained by training according to sample data of a plurality of products.
Preferably, the second obtaining unit 502 is configured to:
acquiring a characteristic value of a designated characteristic of a target product before a reference time;
according to the obtained characteristic values of the appointed characteristic, forming a historical characteristic sequence;
acquiring a characteristic value of an associated characteristic of a designated characteristic of a target product before a reference time;
and according to the obtained characteristic values of the associated characteristics, forming a historical associated characteristic sequence.
Preferably, the third obtaining unit 503 is configured to:
acquiring characteristic values of future time sequence characteristics of a target product from a reference time to a predicted time, wherein each characteristic value of the future time sequence characteristics is preset;
and forming a future time sequence characteristic sequence according to each characteristic value of the future time sequence characteristic.
Preferably, the prediction unit 504 is further configured to:
filling the history associated feature sequence and the future time sequence feature sequence of the data deletion by adopting a designated filling value so as to update the history associated feature sequence and the future time sequence feature sequence;
for each filled history associated feature sequence and each future time sequence feature sequence, a corresponding auxiliary feature sequence is generated, wherein the auxiliary feature sequence is used for indicating whether elements in the filled history associated feature sequence or the future time sequence feature sequence are filled.
Preferably, the prediction unit 504 is further configured to:
acquiring static characteristic values corresponding to all static characteristics of a target product, wherein the static characteristics are characteristics related to the product and irrelevant to time;
according to the obtained static characteristic values, forming a static characteristic sequence of the target product;
the prediction unit 504 is configured to:
and inputting the historical feature sequence, the historical associated feature sequence, the future time sequence feature sequence, the auxiliary feature sequence and the static feature sequence into a feature prediction model to obtain a predicted value of the specified feature.
Preferably, the first obtaining unit 501 is further configured to:
and training the feature prediction model according to the historical feature sample sequence, the historical associated feature sample sequence, the future time sequence feature sample sequence, the auxiliary feature sequence, the predicted sample value and the static feature sequence of each product to obtain a trained feature prediction model.
Preferably, the first obtaining unit 501 is further configured to:
combining every two sampling times in a set time period respectively to obtain corresponding binary groups, wherein the first sampling time contained in the binary groups is earlier than the second sampling time;
taking each binary group of each product as a sample point, taking the first sampling time contained in the binary group as the reference time of the corresponding sample point, and taking the second sampling time contained in the binary group as the prediction time of the corresponding sample point;
according to the reference time and the prediction time corresponding to each sample point, determining a historical characteristic sample sequence, a historical correlation characteristic sample sequence, a future time sequence characteristic sample sequence, an auxiliary characteristic sequence, a prediction sample value and a static characteristic sequence corresponding to each sample point;
and training the feature prediction model according to the historical feature sample sequence, the historical associated feature sample sequence, the future time sequence feature sample sequence, the auxiliary feature sequence, the predicted sample value and the static feature sequence corresponding to each sample point to obtain a trained feature prediction model.
According to the feature prediction method, device, electronic equipment and storage medium of the present application, the feature prediction model is trained on sample data of multiple products to obtain a trained feature prediction model; the reference time and prediction time of the target product are obtained; the historical feature sequence and historical associated feature sequences of the target product are obtained according to the reference time; the future time series feature sequences of the target product from the reference time to the prediction time are obtained; and the historical feature sequence, historical associated feature sequences and future time series feature sequences are input into the trained feature prediction model to obtain the predicted value of the specified feature. Because the feature prediction model is trained on sample data of multiple products, and feature prediction uses feature data of multiple dimensions of the target product together with this model, the volume and coverage of the input data are enlarged, the data quality requirement is reduced, and the accuracy of feature prediction is improved.
Based on the same inventive concept as the characteristic prediction method, the embodiment of the application also provides electronic equipment, which can be a desktop computer, a portable computer, a server and the like. As shown in fig. 6, the electronic device 60 may include a processor 601 and a memory 602.
The processor 601 may be a general purpose processor such as a Central Processing Unit (CPU), digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in the processor for execution.
The memory 602, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory may include at least one type of storage medium, for example, flash memory, a hard disk, a multimedia card, card memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, a magnetic disk, or an optical disc. The memory may be any medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 602 in this embodiment may also be circuitry or any other device capable of implementing a storage function, for storing program instructions and/or data.
Exemplary program product
Embodiments of the present application provide a computer-readable storage medium storing computer program instructions for use with the above-described electronic device, which contains a program for executing the above-described feature prediction method.
The computer storage media described above can be any available media or data storage device that can be accessed by a computer, including, but not limited to, magnetic storage (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical storage (e.g., CD, DVD, BD, HVD, etc.), and semiconductor storage (e.g., ROM, EPROM, EEPROM, nonvolatile storage (NAND FLASH), solid State Disk (SSD)), etc.
In some possible implementations, the various aspects of the present application may also be implemented as a computer program product comprising program code for causing a server device to perform the steps of the feature prediction method according to the various exemplary embodiments of the present application as described in the "exemplary methods" section of this specification, when the computer program product is run on the server device.
The computer program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer program product according to embodiments of the present application may employ a portable compact disc read-only memory (CD-ROM), include program code, and run on a server device. However, the program product of the present application is not limited thereto; in this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, via the Internet using an Internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the elements described above may be embodied in one element in accordance with embodiments of the present application. Conversely, the features and functions of one unit described above may be further divided into a plurality of units to be embodied.
Furthermore, although the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or suggest that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
While the spirit and principles of the present application have been described with reference to several particular embodiments, it is to be understood that the application is not limited to the particular embodiments disclosed, nor does the division into aspects imply that features in these aspects cannot be combined to advantage; that division is made merely for convenience of description. The application is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (14)

1. A method of feature prediction, comprising:
acquiring reference time and prediction time of a target product, wherein the reference time is used for dividing historical data and future data;
acquiring a historical characteristic sequence and a historical association characteristic sequence of the target product according to the reference time;
acquiring a future time sequence feature sequence of the target product from the reference time to the prediction time;
inputting the historical feature sequence, the historical associated feature sequence and the future time sequence feature sequence into a trained feature prediction model to obtain a predicted value of a specified feature, wherein the feature prediction model is obtained by training according to sample data of a plurality of products;
before the reference time and the predicted time of the target product are acquired, the method further comprises:
training the feature prediction model according to the historical feature sample sequence, the historical associated feature sample sequence, the future time sequence feature sample sequence, the auxiliary feature sequence, the predicted sample value and the static feature sequence of each product to obtain the trained feature prediction model; wherein the static features are features that are related to the product and independent of time; the auxiliary feature sequence is generated by filling, with a specified filling value, a historical associated feature sample sequence and a future time sequence feature sample sequence that have missing data; the future time sequence features are features, other than the specified feature, that reflect the future; and the predicted sample value is the feature value of the specified feature at the prediction time corresponding to the sample point.
2. The method of claim 1, wherein acquiring the historical feature sequence and the historical associated feature sequence of the target product according to the reference time comprises:
acquiring feature values of the specified feature of the target product before the reference time;
forming the historical feature sequence from the acquired feature values of the specified feature;
acquiring feature values of the associated features of the specified feature of the target product before the reference time; and
forming the historical associated feature sequence from the acquired feature values of the associated features.
3. The method of claim 1, wherein acquiring the future time sequence feature sequence of the target product from the reference time to the prediction time comprises:
acquiring feature values of future time sequence features of the target product from the reference time to the prediction time, wherein each feature value of the future time sequence features is preset; and
forming the future time sequence feature sequence from the feature values of the future time sequence features.
4. The method of claim 1, further comprising, prior to inputting the historical feature sequence, the historical correlation feature sequence, and the future time series feature sequence into a trained feature prediction model to obtain a predicted value for a specified feature:
filling, with the specified filling value, the historical associated feature sequence and the future time sequence feature sequence that have missing data, so as to update the historical associated feature sequence and the future time sequence feature sequence; and
generating, for each filled historical associated feature sequence and each filled future time sequence feature sequence, a corresponding auxiliary feature sequence, wherein the auxiliary feature sequence indicates whether elements in the filled historical associated feature sequence or future time sequence feature sequence are filled.
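The filling and auxiliary-sequence generation of claim 4 could be sketched as follows. The fill value of 0.0 and the 0/1 mask convention are assumptions made for illustration, not fixed by the patent:

```python
import math

FILL_VALUE = 0.0  # assumed "specified filling value"

def fill_with_mask(seq):
    """Fill missing elements (None or NaN) with FILL_VALUE and build the
    auxiliary sequence: 1 marks a filled element, 0 an original one."""
    filled, aux = [], []
    for v in seq:
        missing = v is None or (isinstance(v, float) and math.isnan(v))
        filled.append(FILL_VALUE if missing else v)
        aux.append(1 if missing else 0)
    return filled, aux

filled, aux = fill_with_mask([2.5, None, 4.0])
print(filled, aux)  # [2.5, 0.0, 4.0] [0, 1, 0]
```

The auxiliary sequence lets the model distinguish genuinely observed values from placeholders inserted where data was missing.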
5. The method of claim 4, further comprising, prior to inputting the historical feature sequence, the historical correlated feature sequence, and the future time series feature sequence into a trained feature prediction model to obtain a predicted value for a specified feature:
acquiring static feature values corresponding to static features of the target product, wherein the static features are features that are related to the product and independent of time; and
forming a static feature sequence of the target product from the acquired static feature values;
wherein inputting the historical feature sequence, the historical associated feature sequence and the future time sequence feature sequence into the trained feature prediction model to obtain the predicted value of the specified feature comprises:
inputting the historical feature sequence, the historical associated feature sequence, the future time sequence feature sequence, the auxiliary feature sequence and the static feature sequence into the feature prediction model to obtain the predicted value of the specified feature.
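Assembling the five inputs of claim 5 into one flat vector might look like the sketch below; real implementations may instead feed each sequence to a separate model branch, and all names here are illustrative:

```python
import numpy as np

def assemble_model_input(history, assoc, future, aux, static):
    # Flatten the historical, associated, future, auxiliary and static
    # sequences into one input vector for the feature prediction model.
    return np.concatenate([history, assoc, future, aux, static])

x = assemble_model_input([1.0, 2.0], [3.0], [4.0], [0.0, 1.0], [7.0])
print(x.shape)  # (7,)
```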
6. The method of claim 1, wherein training the feature prediction model according to the historical feature sample sequence, the historical associated feature sample sequence, the future time sequence feature sample sequence, the auxiliary feature sequence, the predicted sample value and the static feature sequence of each product to obtain the trained feature prediction model comprises:
pairing every two sampling times within a set time period to obtain corresponding two-tuples, wherein the first sampling time contained in each two-tuple is earlier than the second sampling time;
taking each two-tuple of each product as a sample point, taking the first sampling time contained in the two-tuple as the reference time of the corresponding sample point, and taking the second sampling time contained in the two-tuple as the prediction time of the corresponding sample point;
determining, according to the reference time and the prediction time corresponding to each sample point, the historical feature sample sequence, the historical associated feature sample sequence, the future time sequence feature sample sequence, the auxiliary feature sequence, the predicted sample value and the static feature sequence corresponding to that sample point; and
training the feature prediction model according to the historical feature sample sequence, the historical associated feature sample sequence, the future time sequence feature sample sequence, the auxiliary feature sequence, the predicted sample value and the static feature sequence corresponding to each sample point to obtain the trained feature prediction model.
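The pairing of sampling times in claim 6 amounts to enumerating ordered pairs per product; a minimal sketch (the product identifiers and time values are made up):

```python
from itertools import combinations

def build_sample_points(products, sampling_times):
    """For every product, pair each two sampling times (earlier first):
    the first time becomes the reference time of the sample point and
    the second its prediction time."""
    times = sorted(sampling_times)
    return [(product, ref, pred)
            for product in products
            for ref, pred in combinations(times, 2)]

samples = build_sample_points(["A", "B"], [1, 2, 3])
print(len(samples))  # 2 products x 3 ordered pairs = 6
```

This is how a single time series per product yields many training samples, enlarging the data volume available to the model.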
7. A feature prediction apparatus, comprising:
the first acquisition unit is used for acquiring reference time and prediction time of a target product, wherein the reference time is used for dividing historical data and future data;
the second acquisition unit is used for acquiring a historical characteristic sequence and a historical association characteristic sequence of the target product according to the reference time;
a third acquisition unit, configured to acquire a future time sequence feature sequence of the target product from the reference time to the prediction time;
the prediction unit is used for inputting the historical feature sequence, the historical associated feature sequence and the future time sequence feature sequence into a trained feature prediction model to obtain a predicted value of a specified feature, wherein the feature prediction model is obtained by training according to sample data of a plurality of products;
the first acquisition unit is further configured to:
training the feature prediction model according to the historical feature sample sequence, the historical associated feature sample sequence, the future time sequence feature sample sequence, the auxiliary feature sequence, the predicted sample value and the static feature sequence of each product to obtain the trained feature prediction model; wherein the static features are features that are related to the product and independent of time; the auxiliary feature sequence is generated by filling, with a specified filling value, a historical associated feature sample sequence and a future time sequence feature sample sequence that have missing data; the future time sequence features are features, other than the specified feature, that reflect the future; and the predicted sample value is the feature value of the specified feature at the prediction time corresponding to the sample point.
8. The apparatus of claim 7, wherein the second acquisition unit is configured to:
acquire feature values of the specified feature of the target product before the reference time;
form the historical feature sequence from the acquired feature values of the specified feature;
acquire feature values of the associated features of the specified feature of the target product before the reference time; and
form the historical associated feature sequence from the acquired feature values of the associated features.
9. The apparatus of claim 7, wherein the third acquisition unit is configured to:
acquire feature values of future time sequence features of the target product from the reference time to the prediction time, wherein each feature value of the future time sequence features is preset; and
form the future time sequence feature sequence from the feature values of the future time sequence features.
10. The apparatus of claim 7, wherein the prediction unit is further to:
fill, with the specified filling value, the historical associated feature sequence and the future time sequence feature sequence that have missing data, so as to update the historical associated feature sequence and the future time sequence feature sequence; and
generate, for each filled historical associated feature sequence and each filled future time sequence feature sequence, a corresponding auxiliary feature sequence, wherein the auxiliary feature sequence indicates whether elements in the filled historical associated feature sequence or future time sequence feature sequence are filled.
11. The apparatus of claim 10, wherein the prediction unit is further to:
acquire static feature values corresponding to static features of the target product, wherein the static features are features that are related to the product and independent of time; and
form a static feature sequence of the target product from the acquired static feature values;
wherein the prediction unit is configured to:
input the historical feature sequence, the historical associated feature sequence, the future time sequence feature sequence, the auxiliary feature sequence and the static feature sequence into the feature prediction model to obtain the predicted value of the specified feature.
12. The apparatus of claim 7, wherein the first acquisition unit is further to:
pair every two sampling times within a set time period to obtain corresponding two-tuples, wherein the first sampling time contained in each two-tuple is earlier than the second sampling time;
take each two-tuple of each product as a sample point, take the first sampling time contained in the two-tuple as the reference time of the corresponding sample point, and take the second sampling time contained in the two-tuple as the prediction time of the corresponding sample point;
determine, according to the reference time and the prediction time corresponding to each sample point, the historical feature sample sequence, the historical associated feature sample sequence, the future time sequence feature sample sequence, the auxiliary feature sequence, the predicted sample value and the static feature sequence corresponding to that sample point; and
train the feature prediction model according to the historical feature sample sequence, the historical associated feature sample sequence, the future time sequence feature sample sequence, the auxiliary feature sequence, the predicted sample value and the static feature sequence corresponding to each sample point to obtain the trained feature prediction model.
13. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 6.
14. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the steps of the method of any of claims 1 to 6.
CN202010148299.0A 2020-03-05 2020-03-05 Feature prediction method, device, electronic equipment and storage medium Active CN111401940B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010148299.0A CN111401940B (en) 2020-03-05 2020-03-05 Feature prediction method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010148299.0A CN111401940B (en) 2020-03-05 2020-03-05 Feature prediction method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111401940A CN111401940A (en) 2020-07-10
CN111401940B true CN111401940B (en) 2023-07-04

Family

ID=71430528

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010148299.0A Active CN111401940B (en) 2020-03-05 2020-03-05 Feature prediction method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111401940B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298552A (en) * 2020-12-02 2021-08-24 阿里巴巴集团控股有限公司 Data processing method, server and storage medium
CN113743643B (en) * 2021-02-05 2023-11-03 北京京东振世信息技术有限公司 Method, device, equipment and medium for determining commodity data prediction accuracy
CN113238714A (en) * 2021-05-28 2021-08-10 广东好太太智能家居有限公司 Disk capacity prediction method and system based on historical monitoring data and storage medium
CN113850418A (en) * 2021-09-02 2021-12-28 支付宝(杭州)信息技术有限公司 Method and device for detecting abnormal data in time sequence
CN114117689B (en) * 2022-01-21 2022-04-29 锱云(上海)物联网科技有限公司 Method, system, terminal device and storage medium for preventing production resonance
CN115222164B (en) * 2022-09-20 2022-12-27 国能大渡河大数据服务有限公司 Water pump fault prediction method and system based on empirical coupling function

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018222308A1 (en) * 2017-05-31 2018-12-06 Microsoft Technology Licensing, Llc Time-based features and moving windows sampling for machine learning
WO2019019255A1 (en) * 2017-07-25 2019-01-31 平安科技(深圳)有限公司 Apparatus and method for establishing prediction model, program for establishing prediction model, and computer-readable storage medium
CN110555714A (en) * 2018-06-04 2019-12-10 百度在线网络技术(北京)有限公司 method and apparatus for outputting information
CN110751497A (en) * 2018-07-23 2020-02-04 北京京东尚科信息技术有限公司 Commodity replenishment method and device

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102316046B (en) * 2010-06-29 2016-03-30 国际商业机器公司 To the method and apparatus of the user's recommendation information in social networks
JP6891583B2 (en) * 2017-03-27 2021-06-18 日本電気株式会社 Information processing equipment, information processing methods, programs
CN107515842B (en) * 2017-07-19 2018-06-19 中南大学 A kind of urban population density dynamic prediction method and system
CN108053242B (en) * 2017-12-12 2021-02-09 携程旅游信息技术(上海)有限公司 Scenic spot ticket amount prediction method, system, equipment and storage medium
CN108133391A (en) * 2017-12-22 2018-06-08 联想(北京)有限公司 Method for Sales Forecast method and server
CN108133014B (en) * 2017-12-22 2022-03-22 广州数说故事信息科技有限公司 Triple generation method and device based on syntactic analysis and clustering and user terminal
CN110019401B (en) * 2017-12-25 2024-04-05 顺丰科技有限公司 Method, device, equipment and storage medium for predicting part quantity
CN109143995B (en) * 2018-07-13 2020-09-01 浙江大学 Quality-related slow characteristic decomposition closed-loop system fine operation state monitoring method
CN109067586B (en) * 2018-08-16 2021-11-12 海南大学 DDoS attack detection method and device
CN109815980A (en) * 2018-12-18 2019-05-28 北京三快在线科技有限公司 Prediction technique, device, electronic equipment and the readable storage medium storing program for executing of user type
CN110084438A (en) * 2019-05-09 2019-08-02 上汽安吉物流股份有限公司 Prediction technique and device, the logistics system and computer-readable medium of order

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018222308A1 (en) * 2017-05-31 2018-12-06 Microsoft Technology Licensing, Llc Time-based features and moving windows sampling for machine learning
WO2019019255A1 (en) * 2017-07-25 2019-01-31 平安科技(深圳)有限公司 Apparatus and method for establishing prediction model, program for establishing prediction model, and computer-readable storage medium
CN110555714A (en) * 2018-06-04 2019-12-10 百度在线网络技术(北京)有限公司 method and apparatus for outputting information
CN110751497A (en) * 2018-07-23 2020-02-04 北京京东尚科信息技术有限公司 Commodity replenishment method and device

Also Published As

Publication number Publication date
CN111401940A (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN111401940B (en) Feature prediction method, device, electronic equipment and storage medium
US10846643B2 (en) Method and system for predicting task completion of a time period based on task completion rates and data trend of prior time periods in view of attributes of tasks using machine learning models
US11538086B2 (en) Recommender systems and methods using cascaded machine learning models
US11586880B2 (en) System and method for multi-horizon time series forecasting with dynamic temporal context learning
US20200245009A1 (en) Utilizing a deep generative model with task embedding for personalized targeting of digital content through multiple channels across client devices
US20180240040A1 (en) Training and estimation of selection behavior of target
US11501107B2 (en) Key-value memory network for predicting time-series metrics of target entities
US11556773B1 (en) Machine learning analysis of incremental event causality towards a target outcome
CN110766513A (en) Information sorting method and device, electronic equipment and readable storage medium
CN112988840A (en) Time series prediction method, device, equipment and storage medium
Türkmen et al. Forecasting intermittent and sparse time series: A unified probabilistic framework via deep renewal processes
CN115018552A (en) Method for determining click rate of product
CN115796548A (en) Resource allocation method, device, computer equipment, storage medium and product
CN114757700A (en) Article sales prediction model training method, article sales prediction method and apparatus
US11531694B1 (en) Machine learning based improvements in estimation techniques
WO2022112583A1 (en) Method for evaluating the performance of a prediction algorithm, and associated devices
CN110264306B (en) Big data-based product recommendation method, device, server and medium
CN113869596A (en) Task prediction processing method, device, product and medium
JP6968730B2 (en) Project progress prediction device and project progress prediction system
US20210182696A1 (en) Prediction of objective variable using models based on relevance of each model
CN112561162A (en) Information recommendation method and device
CN111768220A (en) Method and apparatus for generating vehicle pricing models
CN116823407B (en) Product information pushing method, device, electronic equipment and computer readable medium
US20230237386A1 (en) Forecasting time-series data using ensemble learning
US20230351220A1 (en) Prediction model management

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant