CN111401940A - Feature prediction method, feature prediction device, electronic device, and storage medium

Info

Publication number
CN111401940A
Authority
CN
China
Prior art keywords
sequence
characteristic
feature
historical
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010148299.0A
Other languages
Chinese (zh)
Other versions
CN111401940B (en)
Inventor
王迪
肖伟集
朱旭律
杨杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Netease Zaigu Technology Co Ltd
Original Assignee
Hangzhou Netease Zaigu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Netease Zaigu Technology Co Ltd
Priority to CN202010148299.0A
Publication of CN111401940A
Application granted
Publication of CN111401940B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 Commerce
    • G06Q 30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q 30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q 30/0202 Market predictions or forecasting for commercial activities
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a feature prediction method, a feature prediction device, an electronic device, and a storage medium, which are used to improve prediction accuracy. The method includes: training a feature prediction model on sample data of a plurality of products to obtain a trained feature prediction model; acquiring a reference time and a prediction time of a target product, and acquiring a historical feature sequence and a historical associated feature sequence of the target product according to the reference time; acquiring a future time series feature sequence of the target product from the reference time to the prediction time; and inputting the historical feature sequence, the historical associated feature sequence, and the future time series feature sequence into the trained feature prediction model to obtain a predicted value of the specified feature. Because the feature prediction model is trained on sample data of a plurality of products, and feature prediction uses feature data of multiple dimensions of the target product together with the feature prediction model, the volume and coverage of the input data are enlarged and the accuracy of feature prediction is improved.

Description

Feature prediction method, feature prediction device, electronic device, and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a feature prediction method and apparatus, an electronic device, and a storage medium.
Background
This section is intended to provide a background or context to the embodiments of the application that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
In recent years, with the development of the internet, the retail industry has also been rapidly developed. In order to make better business decisions, it is often necessary to accurately predict characteristics such as sales of products.
Most traditional prediction approaches place high demands on data continuity, data volume, and stability, and cannot adapt to complex business scenarios. Because most real-world data and business scenarios cannot satisfy these requirements, traditional approaches produce large prediction errors and rarely achieve a satisfactory prediction effect.
Therefore, how to improve the accuracy of prediction is an urgent problem to be solved.
Disclosure of Invention
In view of the above technical problems, there is a need for an improved method that increases the accuracy of feature prediction.
In one aspect, an embodiment of the present application provides a feature prediction method, including:
acquiring reference time and predicted time of a target product, wherein the reference time is used for dividing historical data and future data;
acquiring a historical characteristic sequence and a historical associated characteristic sequence of a target product according to the reference time;
acquiring a future time sequence characteristic sequence of the target product from the reference time to the predicted time;
inputting the historical characteristic sequence, the historical associated characteristic sequence and the future time sequence characteristic sequence into a trained characteristic prediction model to obtain a predicted value of the specified characteristic, wherein the characteristic prediction model is obtained by training according to sample data of a plurality of products.
Preferably, the obtaining of the historical feature sequence and the historical associated feature sequence of the target product according to the reference time includes:
acquiring a characteristic value of a designated characteristic of a target product before a reference time;
forming a historical characteristic sequence according to the acquired characteristic values of the specified characteristics;
acquiring a characteristic value of the associated characteristic of the specified characteristic of the target product before the reference time;
and forming a historical associated feature sequence according to the obtained feature values of the associated features.
Preferably, the obtaining of the future time sequence feature sequence of the target product from the reference time to the predicted time includes:
acquiring a characteristic value of a future time sequence characteristic of a target product between reference time and predicted time, wherein each characteristic value of the future time sequence characteristic is preset;
and forming a future time sequence characteristic sequence according to the characteristic values of the future time sequence characteristic.
Preferably, before inputting the historical feature sequence, the historical associated feature sequence and the future timing feature sequence into the trained feature prediction model to obtain the predicted value of the specified feature, the method further includes:
filling the historical associated characteristic sequence and the future time sequence characteristic sequence with missing data by adopting a specified filling value so as to update the historical associated characteristic sequence and the future time sequence characteristic sequence;
and respectively generating corresponding auxiliary feature sequences aiming at each filled historical associated feature sequence and each future time sequence feature sequence, wherein the auxiliary feature sequences are used for indicating whether elements in the filled historical associated feature sequences or the future time sequence feature sequences are filled.
Preferably, before inputting the historical feature sequence, the historical associated feature sequence and the future timing feature sequence into the trained feature prediction model to obtain the predicted value of the specified feature, the method further includes:
obtaining static characteristic values corresponding to all static characteristics of a target product, wherein the static characteristics are characteristics related to the product and unrelated to time;
forming a static characteristic sequence of the target product according to the obtained static characteristic values;
inputting the historical characteristic sequence, the historical association characteristic sequence and the future time sequence characteristic sequence into a trained characteristic prediction model to obtain a predicted value of the specified characteristic, wherein the method comprises the following steps:
and inputting the historical characteristic sequence, the historical associated characteristic sequence, the future time sequence characteristic sequence, the auxiliary characteristic sequence and the static characteristic sequence into a characteristic prediction model to obtain a predicted value of the specified characteristic.
Preferably, before obtaining the reference time and the predicted time of the target product, the method further comprises:
and training the characteristic prediction model according to the historical characteristic sample sequence, the historical associated characteristic sample sequence, the future time sequence characteristic sample sequence, the auxiliary characteristic sequence, the prediction sample value and the static characteristic sequence of each product to obtain the trained characteristic prediction model.
Preferably, the training of the feature prediction model according to the historical feature sample sequence, the historical associated feature sample sequence, the future time sequence feature sample sequence, the auxiliary feature sequence, the prediction sample value and the static feature sequence of each product to obtain the trained feature prediction model includes:
respectively combining every two sampling times in a set time period to obtain a corresponding binary group, wherein the first sampling time contained in the binary group is earlier than the second sampling time;
respectively taking each two-tuple of each product as a sample point, taking the first sampling time contained in the two-tuple as the reference time of the corresponding sample point, and taking the second sampling time contained in the two-tuple as the predicted time of the corresponding sample point;
determining a historical characteristic sample sequence, a historical associated characteristic sample sequence, a future time sequence characteristic sample sequence, an auxiliary characteristic sequence, a predicted sample value and a static characteristic sequence corresponding to each sample point according to the reference time and the predicted time corresponding to each sample point;
and training the characteristic prediction model according to the historical characteristic sample sequence, the historical associated characteristic sample sequence, the future time sequence characteristic sample sequence, the auxiliary characteristic sequence, the prediction sample value and the static characteristic sequence corresponding to each sample point to obtain the trained characteristic prediction model.
In one aspect, an embodiment of the present application provides a feature prediction apparatus, including:
the system comprises a first acquisition unit, a second acquisition unit and a third acquisition unit, wherein the first acquisition unit is used for acquiring reference time and predicted time of a target product, and the reference time is used for dividing historical data and future data;
the second acquisition unit is used for acquiring the historical characteristic sequence and the historical associated characteristic sequence of the target product according to the reference time;
the third acquisition unit is used for acquiring a future time sequence characteristic sequence of the target product from the reference time to the predicted time;
and the prediction unit is used for inputting the historical characteristic sequence, the historical associated characteristic sequence and the future time sequence characteristic sequence into a trained characteristic prediction model to obtain a predicted value of the specified characteristic, and the characteristic prediction model is obtained by training sample data of a plurality of products.
Preferably, the second obtaining unit is configured to:
acquiring a characteristic value of a designated characteristic of a target product before a reference time;
forming a historical characteristic sequence according to the acquired characteristic values of the specified characteristics;
acquiring a characteristic value of the associated characteristic of the specified characteristic of the target product before the reference time;
and forming a historical associated feature sequence according to the obtained feature values of the associated features.
Preferably, the third obtaining unit is configured to:
acquiring a characteristic value of a future time sequence characteristic of a target product between reference time and predicted time, wherein each characteristic value of the future time sequence characteristic is preset;
and forming a future time sequence characteristic sequence according to the characteristic values of the future time sequence characteristic.
Preferably, the prediction unit is further configured to:
filling the historical associated characteristic sequence and the future time sequence characteristic sequence with missing data by adopting a specified filling value so as to update the historical associated characteristic sequence and the future time sequence characteristic sequence with missing data;
and respectively generating corresponding auxiliary feature sequences aiming at each filled historical associated feature sequence and each future time sequence feature sequence, wherein the auxiliary feature sequences are used for indicating whether elements in the filled historical associated feature sequences or the future time sequence feature sequences are filled.
Preferably, the prediction unit is further configured to:
obtaining static characteristic values corresponding to all static characteristics of a target product, wherein the static characteristics are characteristics related to the product and unrelated to time;
forming a static characteristic sequence of the target product according to the obtained static characteristic values;
the prediction unit is to:
and inputting the historical characteristic sequence, the historical associated characteristic sequence, the future time sequence characteristic sequence, the auxiliary characteristic sequence and the static characteristic sequence into a characteristic prediction model to obtain a predicted value of the specified characteristic.
Preferably, the first obtaining unit is further configured to:
and training the characteristic prediction model according to the historical characteristic sample sequence, the historical associated characteristic sample sequence, the future time sequence characteristic sample sequence, the auxiliary characteristic sequence, the prediction sample value and the static characteristic sequence of each product to obtain the trained characteristic prediction model.
Preferably, the first obtaining unit is further configured to:
respectively combining every two sampling times in a set time period to obtain a corresponding binary group, wherein the first sampling time contained in the binary group is earlier than the second sampling time;
respectively taking each two-tuple of each product as a sample point, taking the first sampling time contained in the two-tuple as the reference time of the corresponding sample point, and taking the second sampling time contained in the two-tuple as the predicted time of the corresponding sample point;
determining a historical characteristic sample sequence, a historical associated characteristic sample sequence, a future time sequence characteristic sample sequence, an auxiliary characteristic sequence, a predicted sample value and a static characteristic sequence corresponding to each sample point according to the reference time and the predicted time corresponding to each sample point;
and training the characteristic prediction model according to the historical characteristic sample sequence, the historical associated characteristic sample sequence, the future time sequence characteristic sample sequence, the auxiliary characteristic sequence, the prediction sample value and the static characteristic sequence corresponding to each sample point to obtain the trained characteristic prediction model.
In one aspect, an embodiment of the present application provides a computer-readable storage medium having stored thereon computer program instructions, which, when executed by a processor, implement the steps of any of the above-described methods.
In one aspect, an embodiment of the present application provides a computer program product comprising a computer program stored on a computer-readable storage medium, the computer program comprising program instructions which, when executed by a processor, implement the steps of any of the methods described above.
According to the feature prediction method, the feature prediction device, the electronic equipment and the storage medium, the feature prediction model is trained according to the sample data of a plurality of products, and the trained feature prediction model is obtained; acquiring reference time and prediction time of a target product, and acquiring a historical characteristic sequence and a historical associated characteristic sequence of the target product according to the reference time; acquiring a future time sequence characteristic sequence of the target product from the reference time to the predicted time; and inputting the historical characteristic sequence, the historical association characteristic sequence and the future time sequence characteristic sequence into the trained characteristic prediction model to obtain a predicted value of the specified characteristic. Therefore, the feature prediction model is obtained according to sample data training of a plurality of products, and feature prediction is carried out through the feature data of a plurality of dimensions of the target product and the feature prediction model, so that the data volume and the coverage of input data are enlarged, the data quality requirement is lowered, and the accuracy of the feature prediction is improved.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present application will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present application are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
Fig. 1 is a flowchart illustrating an implementation of a method for training a feature prediction model according to an embodiment of the present application;
Fig. 2 is a diagram illustrating exemplary two-tuple data according to an embodiment of the present application;
Fig. 3 is a flowchart illustrating an implementation of a feature prediction method according to an embodiment of the present application;
Fig. 4 is a sample point illustration provided according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a feature prediction apparatus according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The principles and spirit of the present application will be described with reference to a number of exemplary embodiments. It should be understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the present application, and are not intended to limit the scope of the present application in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present application may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
In this document, it is to be understood that any number of elements in the figures are provided by way of illustration and not limitation, and any nomenclature is used for differentiation only and not in any limiting sense.
For convenience of understanding, terms referred to in the embodiments of the present application are explained below:
the terminal device can be a device capable of installing various applications and displaying objects provided in the installed applications, and the terminal device can be mobile or fixed. Desktop computers, mobile phones, mobile computers, tablet computers, media players, smart wearable devices, smart televisions, in-vehicle devices, Personal Digital Assistants (PDAs), point of sale (POS), or other electronic devices capable of implementing the above functions, and the like.
Reference time: for partitioning historical data and future data. Data before the reference time is history data, and data after the reference time is future data.
Sample point: and representing a product, wherein corresponding sample points are different when the product is different or the two groups are different.
Future timing characteristics: typically characterized by a planned or well-defined regular change in data.
Static characteristics: are product-related and time-independent features.
Auxiliary characteristic sequence: for indicating whether elements in the padded feature sequence are padded.
eXtreme Gradient Boosting (XGBoost) model: is a decision tree model that is itself a general machine learning algorithm.
A characteristic prediction model: the predicted value of the designated characteristic used for predicting the product at the prediction time is obtained by training according to the sample data of each product. The feature prediction model may be obtained by training an XGBoost model, or may be obtained by training other models, which is not limited herein.
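For concreteness, a minimal sketch of instantiating such a model is given below, assuming the open-source xgboost Python package; the patent only states that the feature prediction model may be an XGBoost model, and the hyperparameters shown are illustrative rather than taken from the filing.

```python
# Minimal sketch, assuming the xgboost Python package; hyperparameters are
# illustrative and not prescribed by the patent.
import xgboost as xgb

def build_feature_prediction_model() -> xgb.XGBRegressor:
    # Any regressor exposing fit/predict could play the role of the
    # feature prediction model; XGBoost is merely the example named above.
    return xgb.XGBRegressor(
        n_estimators=300,
        max_depth=6,
        learning_rate=0.05,
        objective="reg:squarederror",
    )
```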
The principles and spirit of the present application are explained in detail below with reference to several representative embodiments of the present application.
Summary of The Invention
The inventor of the present application found that most traditional prediction approaches impose strict requirements on the data and the application scenarios of products, and most real-world data and business scenarios cannot satisfy these requirements, so the prediction results have large errors and an ideal prediction effect is difficult to obtain. The data requirements include, for example, requirements on data volume, data continuity, data stability, data uniformity, and the like; the supported business scenarios are typically limited to simple ones.
In order to solve the above problem, an embodiment of the present application provides a feature prediction method, which specifically includes: training the feature prediction model according to sample data of a plurality of products to obtain a trained feature prediction model; acquiring reference time and prediction time of a target product, and acquiring a historical characteristic sequence and a historical associated characteristic sequence of the target product according to the reference time; acquiring a future time sequence characteristic sequence of the target product from the reference time to the predicted time; and inputting the historical characteristic sequence, the historical association characteristic sequence and the future time sequence characteristic sequence into the trained characteristic prediction model to obtain a predicted value of the specified characteristic. Therefore, the feature prediction model is obtained according to sample data training of a plurality of products, and feature prediction is carried out through the feature data of a plurality of dimensions of the target product and the feature prediction model, so that the data volume and the coverage of input data are enlarged, the data quality requirement is lowered, and the accuracy of the feature prediction is improved.
Having described the basic principles of the present application, various non-limiting embodiments of the present application are described in detail below.
Exemplary method
To further illustrate the technical solutions provided by the embodiments of the present application, a detailed description is given below with reference to the accompanying drawings and specific embodiments. Although the embodiments of the present application provide the method steps shown in the following embodiments or figures, the method may include more or fewer steps based on conventional or non-inventive effort. For steps that have no necessary logical causal relationship, the order of execution is not limited to that provided by the embodiments of the present application. When the method is executed in an actual process or by a device, the steps may be executed sequentially or in parallel as shown in the embodiments or figures.
In the embodiments of the present application, the description takes the specified feature to be sales volume, and predicting the sales volume of a product, as an example; in practical applications, the feature of the product to be predicted may be set according to the actual application, which is not limited here.
In the embodiment of the present application, before feature prediction is performed on a product, a feature prediction model is trained to obtain a trained feature prediction model, and referring to fig. 1, a flowchart of an implementation of a method for training a feature prediction model is shown, and a specific flow of the method is as follows:
step 101: and the control equipment combines every two sampling times in a set time period to obtain corresponding binary groups.
Specifically, when step 101 is executed, the control device executes the following steps for each sampling time in the set time period:
the control device combines the sampling time with each sampling time after the sampling time respectively to obtain a binary group consisting of two sampling times.
It should be noted that the set time period and the sampling times may be set according to the actual application scenario and are not limited here. A two-tuple consists of two sampling times, and the first sampling time in the two-tuple is earlier than the second.
For example, referring to fig. 2, which is an exemplary diagram of two-tuple data, if the set time period is January 1 to January 5, 10 two-tuples can be generated: (1.1,1.2), (1.1,1.3), (1.1,1.4), (1.1,1.5), (1.2,1.3), (1.2,1.4), (1.2,1.5), (1.3,1.4), (1.3,1.5) and (1.4,1.5).
Thus, the longer the set time period and the smaller the interval between sampling times, the larger the number of generated two-tuples.
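A minimal sketch of this pairwise combination, assuming daily sampling within the set time period (function and variable names are illustrative):

```python
from datetime import date, timedelta
from itertools import combinations

def generate_two_tuples(start: date, end: date) -> list[tuple[date, date]]:
    """Combine every two sampling times in [start, end]; the first time in
    each two-tuple is always earlier than the second."""
    sampling_times = [start + timedelta(days=i)
                      for i in range((end - start).days + 1)]
    return list(combinations(sampling_times, 2))

# The example above: January 1 to January 5 yields 10 two-tuples.
two_tuples = generate_two_tuples(date(2020, 1, 1), date(2020, 1, 5))
assert len(two_tuples) == 10
```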
Step 102: the control device takes each two-tuple of each product as a sample point, and determines the reference time and the predicted time of the corresponding sample point according to the two-tuple.
Specifically, the control device takes each two-tuple of each product as a sample point, takes the first sampling time contained in the two-tuple as the reference time of the corresponding sample point, and takes the second sampling time contained in the two-tuple as the predicted time of the corresponding sample point.
The reference time is used for dividing the historical data and the future data corresponding to the sample point, and may be represented by today (today), and the predicted time may be represented by target (target).
It should be noted that different products or different binary groups are different sample points, that is, one sample point corresponds to one product, one reference time, and one predicted time.
For example, referring to fig. 2, which is an exemplary diagram of two-tuple data, with the time period set to January 1 to January 5, 10 two-tuples, i.e., 10 sample points, can be generated for one product: (1.1,1.2), (1.1,1.3), (1.1,1.4), (1.1,1.5), (1.2,1.3), (1.2,1.4), (1.2,1.5), (1.3,1.4), (1.3,1.5) and (1.4,1.5).
It should be noted that the more sample points there are, the more rules the feature prediction model can learn, the more universal it becomes, and the better the prediction effect is. In the embodiments of the present application, the sampling times are combined pairwise, so the longer the set time period, the more sample points are generated; a large number of sample points can therefore be produced in a short time, which alleviates the problem of insufficient data volume.
Step 103: and the control equipment determines a historical characteristic sample sequence, a historical associated characteristic sample sequence, a future time sequence characteristic sample sequence, a predicted sample value and a static characteristic sequence corresponding to each sample point according to the reference time and the predicted time corresponding to each sample point.
Specifically, when step 103 is executed, the control device may execute the following steps for each sampling point respectively:
s1031: and the control equipment acquires a historical characteristic sample sequence and a historical associated characteristic sample sequence of a product corresponding to the sampling point according to the reference time corresponding to the sampling point.
Specifically, the control device obtains a characteristic value of a specified characteristic of a product corresponding to the sampling point before a reference time corresponding to the sampling point, and forms a historical characteristic sample sequence according to each obtained characteristic value of the specified characteristic. The control equipment acquires the characteristic value of the associated characteristic of the specified characteristic of the product corresponding to the sampling point before the reference time corresponding to the sampling point, and forms a historical associated characteristic sequence according to the acquired characteristic values of the associated characteristic.
Wherein the historical characteristic sample sequence consists of characteristic values of the specified characteristic of the product within a set time period and before a reference time. Each historical associated feature sample sequence consists of feature values of the associated feature of the product within a set time period and before a reference time.
It should be noted that the historical feature sample sequence and the historical associated feature sample sequences generally consist of statistical indicators of objective facts that have already occurred. The specified feature is the feature to be predicted, e.g., sales volume. The associated features are features related to the specified feature, such as sales amount and traffic. The associated features may be features of one dimension or may include features of multiple dimensions.
For example, if one sample point is the two-tuple (1.3, 1.5) of product A, the specified feature is sales volume, the associated feature is sales amount, and the set time period is 1.1 to 1.5, the control device generates a historical feature sample sequence from the sales volume of product A on 1.1 and 1.2, and generates a historical associated feature sample sequence from the sales amount of product A on 1.1 and 1.2.
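A sketch of S1031 under the assumption that each product's raw data is a pandas DataFrame indexed by date; the column names (sales_volume, sales_amount, traffic) are hypothetical:

```python
import pandas as pd

def build_history_sequences(df: pd.DataFrame,
                            start_time: pd.Timestamp,
                            reference_time: pd.Timestamp,
                            specified_col: str = "sales_volume",
                            associated_cols: tuple = ("sales_amount", "traffic")):
    """Values within the set time period and strictly before the reference
    time form the historical feature sample sequence and the historical
    associated feature sample sequences."""
    history = df.loc[(df.index >= start_time) & (df.index < reference_time)]
    historical_feature_seq = history[specified_col].tolist()
    historical_assoc_seqs = {col: history[col].tolist() for col in associated_cols}
    return historical_feature_seq, historical_assoc_seqs
```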
S1032: and the control equipment acquires a product corresponding to the sampling point, and a future time sequence characteristic sample sequence is obtained from the reference time corresponding to the sampling point to the prediction time corresponding to the sampling point.
Wherein each future time sequence characteristic sample sequence consists of characteristic values of the future time sequence characteristic of the product between the reference time and the predicted time.
Each feature value of a future time series feature is known in advance and reflects a future, non-specified feature. A future time series feature sample sequence usually consists of planned data or data with a definite regular pattern, that is, data whose values can be known with certainty. For example, a future time series feature may indicate future legal holidays and the like.
It should be noted that the future time series characteristic sample sequence between the reference time and the predicted time includes a future time series characteristic value corresponding to the reference time, a future time series characteristic value corresponding to the predicted time, and each future time series characteristic value corresponding to a time within the reference time and the predicted time.
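A sketch of S1032 with a legal-holiday indicator as the future time series feature; the holiday calendar is illustrative and would in practice come from a known plan:

```python
import pandas as pd

# Illustrative holiday calendar; the real one is planned data known in advance.
HOLIDAYS = {pd.Timestamp("2020-01-01"), pd.Timestamp("2020-01-25")}

def build_future_timing_sequence(reference_time: pd.Timestamp,
                                 predicted_time: pd.Timestamp) -> list:
    """Feature values from the reference time up to and including the
    predicted time; every element is known (planned) ahead of time."""
    days = pd.date_range(reference_time, predicted_time, freq="D")
    return [1 if day in HOLIDAYS else 0 for day in days]
```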
S1033: and the control equipment acquires the designated characteristics of the product corresponding to the sampling point, and takes the characteristic value of the prediction time corresponding to the sampling point as a prediction sample value.
And the prediction sample value is a characteristic value corresponding to the specified characteristic of the product at the prediction time.
S1034: and the control equipment acquires the static characteristic sequence of the product corresponding to the sampling point.
Specifically, the control device obtains a static characteristic value corresponding to a static characteristic of a product corresponding to the sampling point.
Wherein the static feature sequence is composed of feature values of each static feature. Static features are product-related and time-independent features.
The nth static feature sequence can be denoted s_n, where n represents the dimension of the static feature. The set of static feature sequences may be denoted S. There may be one or more static feature sequences.
For example, the static characteristics are identification Information (ID) of the product, category ID of the product, and the like.
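A sketch of S1034; the static feature sequence is just the time-independent attributes of the product collected into a fixed-order vector (the integer encoding is an assumption, since the patent does not specify one):

```python
def build_static_sequence(product_id: int, category_id: int) -> list:
    """Static features (e.g., product ID and category ID) do not change with
    time, so the sequence is identical for every sample point of a product."""
    return [product_id, category_id]
```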
S1035: and the control equipment fills the historical associated characteristic sample sequence and the future time sequence characteristic sample sequence of the data loss to update the historical associated characteristic sample sequence and the future time sequence characteristic sample sequence of the data loss, and generates a corresponding auxiliary characteristic sequence according to each filled historical associated characteristic sample sequence and future time sequence characteristic sample sequence.
Wherein the assistant feature sequence is generated according to the historical associated feature sample sequence with data missing and the future time sequence feature sequence.
In executing S1035, the control apparatus may adopt the following steps:
and filling the historical associated characteristic sample sequence and the future time sequence characteristic sample sequence with missing data by adopting a specified filling value to update the historical associated characteristic sample sequence and the future time sequence characteristic sample sequence, and generating a corresponding auxiliary characteristic sequence aiming at each filled historical associated characteristic sample sequence and future time sequence characteristic sample sequence.
The auxiliary feature sequence is used to indicate whether elements in the padded feature sequence are padded, that is, whether elements of the padded historical associated feature sample sequence or future time series feature sample sequence are padded.
It should be noted that the filled feature sequences and the auxiliary feature sequences are in a one-to-one correspondence relationship, that is, each filled historical associated feature sample sequence generates a corresponding auxiliary feature sequence, and each filled future time sequence feature sample sequence generates a corresponding auxiliary feature sequence.
In practical applications, the specified padding value may be set according to practical application scenarios, for example, may be 0, and is not limited herein.
In one embodiment, the control device may perform the following steps for each history associated feature sample sequence:
step a: the control device determines the history feature length based on the set time period and the reference time.
The historical feature length represents how far back into the past the sequence looks, and may be the number of elements the historical associated feature sample sequence contains when no data is missing.
For example, if the set time period is 1.1 to 1.5, the reference time is 1.3, and sampling is performed once a day, there are two element values, corresponding to the times 1.1 and 1.2, and the historical feature length is 2.
Step b: when the length of the historical associated feature sample sequence is lower than the historical feature length, the control device fills the historical associated feature sample sequence by adopting a specified filling value, so that the length of the filled historical associated feature sample sequence reaches the historical feature length.
Step c: and the control equipment generates a corresponding auxiliary characteristic sequence aiming at the filled historical associated characteristic sample sequence.
In one embodiment, the control device may perform the following steps for each future sequence of time series feature samples:
step a: the control device determines a future feature length based on the reference time to the predicted time.
The future feature length represents how far into the future the sequence looks, and may be the number of elements the future time series feature sample sequence contains when no data is missing.
Step b: when the length of the future time sequence feature sample sequence is lower than the future feature length, the control device fills the future time sequence feature sample sequence by adopting a specified filling value, so that the length of the filled future time sequence feature sample sequence reaches the future feature length.
Step c: and the control equipment generates a corresponding auxiliary characteristic sequence aiming at the filled future time sequence characteristic sample sequence.
This is because data may be missing in practice: for example, when the historical feature length L1 is 30 but a commodity has been on the market for only 10 days, the data of the first 20 days is missing. The feature sequence with missing data can therefore be padded, and a corresponding auxiliary feature sequence can be generated according to whether each element is a padded element.
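A sketch of the padding and auxiliary-sequence generation described in the steps above; 0 is used as the specified fill value, and the missing elements are assumed to sit at the head of the sequence (the earliest history), as in the 30-day example:

```python
def pad_with_mask(seq: list, target_len: int, fill_value: float = 0.0):
    """Left-pad `seq` to `target_len` with `fill_value` and return
    (padded_seq, auxiliary_seq), where auxiliary_seq[i] == 1 marks a
    padded element and 0 marks an original element."""
    missing = max(target_len - len(seq), 0)
    padded = [fill_value] * missing + list(seq)
    auxiliary = [1] * missing + [0] * len(seq)
    return padded, auxiliary

# The 30-day example above: L1 = 30 but only 10 days of data exist,
# so 20 positions are filled and flagged in the auxiliary sequence.
padded, auxiliary = pad_with_mask([3.0] * 10, target_len=30)
assert len(padded) == 30 and sum(auxiliary) == 20
```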
Step 104: and the control equipment trains the characteristic prediction model according to the historical characteristic sample sequence, the historical associated characteristic sample sequence, the future time sequence characteristic sample sequence and the prediction sample value of each product to obtain the trained characteristic prediction model.
The feature prediction model is used for predicting the predicted value of the specified feature of the product at the prediction time and is obtained by training according to sample data of each dimension of each product.
Optionally, the feature prediction model may be obtained by training using an XGBoost model, or may be obtained by using another model, which is not limited herein. Before executing step 104, the control device may initialize parameters in the feature prediction model in advance.
Further, the control device may train the feature prediction model according to the historical feature sample sequence, the historical associated feature sample sequence, the future timing sequence feature sample sequence, the auxiliary feature sequence, and the prediction sample value of each sampling point, to obtain the trained feature prediction model.
In one embodiment, the following steps may be adopted when training the feature prediction model:
s1041: and the control equipment inputs the historical characteristic sample sequence, the historical associated characteristic sample sequence, the future time sequence characteristic sample sequence, the auxiliary characteristic sequence and the static characteristic sequence corresponding to the sample points into the characteristic prediction model and outputs a predicted value.
Specifically, the historical feature sample sequence of a sample point can be denoted y_t1, where t1 ∈ (T-L1, T-L1+1, ..., T-2, T-1).
The set of historical associated feature sample sequences may be denoted H_t1, where t1 ∈ (T-L1, T-L1+1, ..., T-2, T-1); the ith historical associated feature sample sequence can be denoted h_i, and h_i^t1 represents the feature value of the ith associated feature at time t1.
The set of future time series feature sample sequences may be denoted F_t2, where t2 ∈ (T, T+1, ..., T+L2-1, T+L2); the jth future time series feature sample sequence can be denoted f_j, and f_j^t2 represents the feature value of the jth future time series feature at time t2.
The auxiliary feature sequence may be denoted p_k^t3; it indicates whether the element of the kth padded feature sequence at time t3 is a padded value (its piecewise definition is given as an equation image in the original filing).
Here, T represents the reference time, L1 the historical feature length, and L2 the future feature length; i indexes the associated features, j indexes the future time series features, and k indexes the padded feature sequences; t1, t2, and t3 all represent discrete times.
According to the above parameters, the predicted value Y can be expressed as:
Y = g(y_t1, H_t1, F_t2, p_k^t3, S).
wherein g is a feature prediction model.
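Since a tree model such as XGBoost consumes flat feature vectors, one straightforward way to realize g is to concatenate all of a sample point's sequences into a single row and fit a regressor on the rows of all sample points; the sketch below makes that assumption (XGBoost's own fit routine then plays the role of the error-driven parameter adjustment described in the surrounding steps). All names are illustrative, and this is one possible realization of g rather than the only one.

```python
import numpy as np
import xgboost as xgb

def to_row(hist_seq, assoc_seqs, future_seqs, aux_seqs, static_seq):
    """Flatten one sample point into a single fixed-length feature vector.
    assoc_seqs, future_seqs and aux_seqs are lists of per-feature sequences
    (h_i, f_j and p_k in the notation above)."""
    parts = [hist_seq] + list(assoc_seqs) + list(future_seqs) + list(aux_seqs) + [static_seq]
    return np.concatenate([np.asarray(p, dtype=float) for p in parts])

def train_feature_prediction_model(sample_points, prediction_sample_values):
    """sample_points: iterable of (hist_seq, assoc_seqs, future_seqs,
    aux_seqs, static_seq) tuples; prediction_sample_values: the Y targets."""
    X = np.vstack([to_row(*sp) for sp in sample_points])
    y = np.asarray(prediction_sample_values, dtype=float)
    model = xgb.XGBRegressor(n_estimators=300, objective="reg:squarederror")
    model.fit(X, y)  # learns g: flattened features -> specified feature value
    return model
```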
S1042: and the control equipment determines a prediction error according to the prediction sample value corresponding to the sample point and the output prediction value.
S1043: the control device judges the prediction error and adjusts the parameters in the characteristic prediction model.
S1044: and the control equipment judges whether the prediction error meets the training termination condition, if so, S1045 is executed, and otherwise, S1041 is executed.
Optionally, the training termination condition may be that the prediction error is lower than a preset error threshold, or that the prediction errors of consecutive times are lower than a preset error threshold.
In practical application, both the training termination condition and the preset error threshold value may be set according to a practical application scenario, which is not limited herein.
S1045: the control device obtains a trained feature prediction model.
Therefore, in the subsequent prediction step, the target product to be predicted can be predicted according to the trained feature prediction model.
It should be noted that the products used in the model training process include both target products to be predicted and products that are not to be predicted. For example, the same model may be trained on the sample points corresponding to 100 commodities, and the trained feature prediction model may then be used to predict the 1st commodity.
In an e-commerce scenario, when a product has not been online long enough and the number of commodities is not large enough, the amount of data is often insufficient (for example, a commodity that has been on the market for 30 days has a data length of only 30, whereas fitting a complex model generally requires millions of data points), so machine learning algorithms cannot be widely used in e-commerce prediction. In the embodiments of the present application, a large number of sample points are generated by the two-tuple scheme, and the same model is trained on the sample points of different products. The trained feature prediction model can therefore consider not only multiple dimensions of a single product but also the characteristics of multiple products taken together. On the one hand, this assists feature prediction for products with little data; on the other hand, it greatly reduces the influence of abnormal values of a single product on the feature prediction model, so a feature prediction model with high prediction accuracy can be obtained.
Referring to fig. 3, a flowchart of an implementation of a feature prediction method provided in the present application is shown, and the specific flow of the method is as follows:
step 300: the control device acquires the reference time and the predicted time of the target product.
Specifically, the target product is a product to be predicted, and the reference time and the predicted time may be obtained according to input of a user, or may be preset, without limitation.
For example, the control apparatus acquires cosmetic B (target product) at a reference time of 1.3 and a predicted time of 1.5.
Step 301: and the control equipment acquires the historical characteristic sequence and the historical associated characteristic sequence of the target product according to the reference time.
Specifically, when step 301 is executed, the control device may adopt the following steps:
s3011: the control equipment acquires the characteristic values of the designated characteristics of the target product before the reference time, and forms a historical characteristic sequence according to the acquired characteristic values of the designated characteristics.
In one embodiment, the control device acquires the feature values of the specified feature of the target product from the set start time to the reference time, and forms a historical feature sequence from the acquired feature values.
In practical application, the set start time may be set according to a practical application scenario, which is not limited herein.
Wherein the historical characteristic sequence is determined according to the characteristic value of the specified characteristic of the product before the reference time.
S3012: the control equipment acquires the characteristic values of the associated characteristics of the specified characteristics of the target product before the reference time, and forms a historical associated characteristic sequence according to the acquired characteristic values of the associated characteristics.
In one embodiment, for each associated feature of the specified feature, the control device obtains the feature values of that associated feature of the target product from the set start time to the reference time, and forms a corresponding historical associated feature sequence from the obtained feature values.
Wherein, the historical associated characteristic sequence is determined according to the characteristic value of the associated characteristic of the product before the reference time.
It should be noted that the historical feature sequence and the historical associated feature sequences generally consist of statistical indicators of objective facts that have already occurred.
Therefore, the historical characteristic sequence and each historical associated characteristic sequence of the target product can be obtained.
Step 302: the control equipment acquires a future time sequence characteristic sequence of the target product from the reference time to the predicted time.
Specifically, the control device obtains a characteristic value of the future time sequence characteristic of the target product between the reference time and the predicted time, and forms a future time sequence characteristic sequence according to each characteristic value of the future time sequence characteristic.
In one embodiment, the control device obtains, for each future time series characteristic, characteristic values of the future time series characteristic of the target product between a reference time and a predicted time, and combines the obtained characteristic values into a corresponding future time series characteristic value sequence.
Wherein, each characteristic value of the future time sequence characteristic is preset, and the future time sequence characteristic sequence is composed of the characteristic values of the future time sequence characteristic of the product between the reference time and the predicted time.
Further, the control device may further fill the historical associated feature sequence and the future time sequence feature sequence with a specified filling value to complete updating of the corresponding feature sequence, and generate a corresponding auxiliary feature sequence for the filled feature sequence.
Specifically, the control device fills the historical associated feature sequence and the future time sequence feature sequence with the specified filling value to update the historical associated feature sequence and the future time sequence feature sequence with the missing data, and generates a corresponding auxiliary feature sequence for each of the historical associated feature sequence and each of the future time sequence feature sequence after filling.
Wherein the assistant feature sequence may also be used to indicate whether elements in the populated historical associated feature sequence or future time series feature sequence are populated.
The historical associated feature sequence and the future time series feature sequence with missing data are padded, and corresponding auxiliary feature sequences are generated, based on the same principle used during training for padding the historical associated feature sample sequences and future time series feature sample sequences and generating their auxiliary feature sequences; the details are not repeated here.
Further, the control device can also acquire a static characteristic sequence of the target product.
Specifically, the control device obtains static feature values corresponding to each static feature of the target product, and forms a static feature sequence of the target product according to the obtained static feature values.
In this way, input data for the feature prediction model may be obtained.
Step 303: and the control equipment inputs the historical characteristic sequence, the historical associated characteristic sequence and the future time sequence characteristic sequence of the target product into the trained characteristic prediction model to obtain the predicted value of the specified characteristic.
Further, the control device can input the historical characteristic sequence, the historical associated characteristic sequence, the future time sequence characteristic sequence, the auxiliary characteristic sequence and the static characteristic sequence of the target product into the characteristic prediction model to obtain the predicted value of the specified characteristic.
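A sketch of steps 300 to 303 for one target product and one (reference time, predicted time) pair, assuming the sequences have already been assembled and padded as described above and reusing to_row and the trained model from the earlier training sketch:

```python
def predict_specified_feature(model, hist_seq, assoc_seqs, future_seqs,
                              aux_seqs, static_seq) -> float:
    """Assemble the target product's feature sequences exactly as during
    training and query the trained feature prediction model."""
    row = to_row(hist_seq, assoc_seqs, future_seqs, aux_seqs, static_seq)
    return float(model.predict(row.reshape(1, -1))[0])
```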
In the embodiments of the present application, the feature prediction model is trained on multi-dimensional sample data of a plurality of products, so it can consider not only different dimensions of a single product but also the characteristics of many products taken together. This expands the coverage of the input data and lets the feature prediction model learn, from multiple angles, different features and the complex relationships among different products, which improves prediction accuracy. A large number of sample points can be generated from the two-tuples of each product, which alleviates the problems that a product has little data and that abnormal values of a single product affect the accuracy of feature prediction, so a good prediction effect can be obtained even when the data quality is poor. This saves extensive data preprocessing and cleaning and lowers the data quality requirements; the model learns autonomously during training without manual re-engineering of its internals, so the method is applicable to a variety of complex scenarios, has a wide range of application, and is highly general.
The following describes training and prediction of the feature prediction model using a specific application scenario. FIG. 4 is a sample point diagram. The set time period is 1.01-1.04, and the products comprise commodity 1, commodity 2, commodity 3 and commodity 4. Each circle represents a sample point. Taking commodity 1 as an example, the training set of each commodity includes 6 sample points, and the corresponding two-tuples are (1.1,1.2), (1.1,1.3), (1.1,1.4), (1.2,1.3), (1.2,1.4), (1.3,1.4) in sequence. The control device trains the feature prediction model according to the sample points (24 sample points in total) of the commodity 1, the commodity 2, the commodity 3 and the commodity 4 to obtain the trained feature prediction model.
The prediction set for each commodity includes 2 sample points, and the corresponding two-tuples are (1.4,1.5) and (1.4,1.6) in turn. The control device adopts the trained feature prediction model to sequentially predict the sample points (1.4,1.5) and (1.4,1.6) of each commodity, and respectively obtain the predicted value of each commodity at 1.5 and the predicted value at 1.6.
The training and prediction of the feature prediction model are described below using another specific application scenario.
Assume that the data of 100 commodities from January 1, 2019 to June 30, 2019 is known, and the sales volume of 10 of these commodities needs to be predicted for each day from July 1, 2019 to July 30, 2019.
The control device executes the following steps for each commodity and generates every two-tuple between January 1, 2019 and June 30, 2019.
The two-tuples are, in order: (20190101, 20190102), ..., (20190101, 20190630); (20190102, 20190103), ..., (20190102, 20190630); ...; (20190628, 20190629), (20190628, 20190630); (20190629, 20190630).
then, the control equipment obtains each sampling point according to the binary group of each commodity, and respectively determines the historical characteristic sample sequence y of each sampling point according to the data of 1 month and 1 day of 2019 year, 6 months and 30 days of each commodityt1Set H of history correlation characteristic sample sequencest1Set F of future time series characteristic sample sequencest2Helper signature sequence pk t3A static feature sequence S, and predicted sample values. Thus, the corresponding training sample data of each sampling point can be obtained.
Where, T1 ═ T-L1, T-L1 +1, … …, T-2, T-1, T2 ═ T, T +1, … …, T + L2-1, T + L2, T1, T2, and T3 all represent discrete times, T is a reference time, L1 is a historical feature length, L2 is a future feature length, for each sample point, the reference time, the predicted time, the historical feature length L1, and the future feature length L2 are all determined.
Then, the control device inputs, for each sampling point, the corresponding historical feature sample sequence y_{t1}, set H_{t1} of historical associated feature sample sequences, set F_{t2} of future time-series feature sample sequences, auxiliary feature sequence p^k_{t3} and static feature sequence S into the feature prediction model to obtain the trained feature prediction model.
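For illustration only, the sketch below shows how the training inputs of a single sampling point might be sliced from per-day arrays using the index sets t1 and t2 defined above; the function name build_sample, the array layout and the NumPy implementation are assumptions, and the auxiliary feature sequence that marks filled elements is omitted here for brevity.

import numpy as np

def build_sample(y, H, F, S, t, L1, L2):
    # y : 1-D array, daily values of the specified feature (e.g. sales volume)
    # H : 2-D array, one row per historical associated feature, one column per day
    # F : 2-D array, one row per future time-series feature known in advance
    # S : 1-D array of static feature values (product-related, time-independent)
    # t : integer index of the reference time; L1 / L2 : historical / future feature lengths
    t1 = np.arange(t - L1, t)        # t-L1, ..., t-1 (history window)
    t2 = np.arange(t, t + L2 + 1)    # t, t+1, ..., t+L2 (future window)

    y_t1 = y[t1]      # historical feature sample sequence y_{t1}
    H_t1 = H[:, t1]   # set of historical associated feature sample sequences H_{t1}
    F_t2 = F[:, t2]   # set of future time-series feature sample sequences F_{t2}
    target = y[t2]    # predicted sample values for the future window
    return y_t1, H_t1, F_t2, S, target

# Example with random placeholder data: 181 days, L1 = 28, L2 = 7, reference index t = 150.
rng = np.random.default_rng(0)
y = rng.random(181)
H = rng.random((3, 181))
F = rng.random((2, 181))
S = np.array([1.0, 0.0])
y_t1, H_t1, F_t2, S_out, target = build_sample(y, H, F, S, t=150, L1=28, L2=7)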
Next, for the period from July 1, 2019 to July 30, 2019, the control device generates 30 two-tuples with the reference time 20190630, and generates 10 × 30 = 300 sampling points from the 10 commodities to be predicted and the 30 two-tuples.
The 30 two-tuples are, in sequence: (20190630, 20190701), (20190630, 20190702), ..., (20190630, 20190730).
The reference time may be any day within the known data (up to June 30, 2019), and the later the date, the better the prediction effect.
Further, the control device obtains the historical feature sequence, historical associated feature sequence, future time-series feature sequence, auxiliary feature sequence and static feature sequence corresponding to each of the 300 sample points.
Finally, the control device inputs the historical feature sequence, historical associated feature sequence, future time-series feature sequence, auxiliary feature sequence and static feature sequence of each sample point into the feature prediction model to obtain the corresponding predicted sales value.
In this way, the predicted sales values of the 10 commodities for each day from July 1, 2019 to July 30, 2019 can be obtained.
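A minimal sketch of this prediction stage is given below; the helper features_for and the stub model are placeholders (the actual feature prediction model and feature extraction are defined elsewhere in the specification), so the sketch only illustrates how the 300 sample points are formed and iterated.

from datetime import date, timedelta

def features_for(commodity, t_ref, t_pred):
    # Hypothetical placeholder: would return the historical, historical associated,
    # future time-series, auxiliary and static sequences for this sample point.
    return {"commodity": commodity, "reference": t_ref, "predicted": t_pred}

class TrainedModelStub:
    # Placeholder for the trained feature prediction model.
    def predict(self, inputs):
        return 0.0  # a real model would return the predicted sales value

trained_model = TrainedModelStub()

reference_time = date(2019, 6, 30)
prediction_days = [reference_time + timedelta(days=k) for k in range(1, 31)]  # 2019-07-01 .. 2019-07-30
commodities = ["commodity_%d" % i for i in range(1, 11)]  # the 10 commodities to be predicted

# 10 commodities x 30 two-tuples = 300 prediction sample points.
sample_points = [(c, reference_time, d) for c in commodities for d in prediction_days]
assert len(sample_points) == 300

predictions = {
    (c, t_pred): trained_model.predict(features_for(c, t_ref, t_pred))
    for c, t_ref, t_pred in sample_points
}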
Exemplary device
Based on the same inventive concept, an embodiment of the present application further provides a feature prediction apparatus. Since the principle by which the apparatus solves the problem is similar to that of the feature prediction method, the implementation of the apparatus may refer to the implementation of the method, and repeated details are omitted.
Fig. 5 is a schematic structural diagram of an apparatus for feature prediction according to an embodiment of the present application.
An apparatus for feature prediction comprising:
a first obtaining unit 501, configured to obtain a reference time and a predicted time of a target product, where the reference time is used to divide historical data and future data;
a second obtaining unit 502, configured to obtain a history feature sequence and a history association feature sequence of the target product according to the reference time;
a third obtaining unit 503, configured to obtain a future time series feature sequence of the target product from the reference time to the predicted time;
the prediction unit 504 is configured to input the historical feature sequence, the historical associated feature sequence, and the future time sequence feature sequence into a trained feature prediction model to obtain a predicted value of the specified feature, where the feature prediction model is obtained by training on sample data of a plurality of products.
Preferably, the second obtaining unit 502 is configured to:
acquiring a characteristic value of a designated characteristic of a target product before a reference time;
forming a historical characteristic sequence according to the acquired characteristic values of the specified characteristics;
acquiring a characteristic value of the associated characteristic of the specified characteristic of the target product before the reference time;
and forming a historical associated feature sequence according to the obtained feature values of the associated features.
Preferably, the third obtaining unit 503 is configured to:
acquiring a characteristic value of a future time sequence characteristic of a target product between reference time and predicted time, wherein each characteristic value of the future time sequence characteristic is preset;
and forming a future time sequence characteristic sequence according to the characteristic values of the future time sequence characteristic.
Preferably, the prediction unit 504 is further configured to:
filling the historical associated feature sequence and the future time sequence feature sequence that have missing data with a specified filling value, so as to update the historical associated feature sequence and the future time sequence feature sequence;
and generating, for each filled historical associated feature sequence and each filled future time sequence feature sequence, a corresponding auxiliary feature sequence, where the auxiliary feature sequence is used to indicate whether elements in the filled historical associated feature sequence or the filled future time sequence feature sequence are filled (illustrated in the sketch below).
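As a sketch of the filling and auxiliary-sequence logic just described (the fill value of 0.0 is an arbitrary example, since the specification only requires a specified filling value, and missing data is represented here as NaN):

import numpy as np

def fill_and_mask(sequence, fill_value=0.0):
    # Fill missing elements (NaN) with the specified filling value and build the
    # auxiliary feature sequence that indicates which elements were filled.
    seq = np.asarray(sequence, dtype=float)
    missing = np.isnan(seq)                  # True where data is missing
    filled = np.where(missing, fill_value, seq)
    auxiliary = missing.astype(int)          # 1 = element was filled, 0 = original data
    return filled, auxiliary

# Example: a historical associated feature sequence with two missing days.
filled, auxiliary = fill_and_mask([3.0, np.nan, 5.0, np.nan])
# filled    -> [3., 0., 5., 0.]
# auxiliary -> [0, 1, 0, 1]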
Preferably, the prediction unit 504 is further configured to:
obtaining static characteristic values corresponding to all static characteristics of a target product, wherein the static characteristics are characteristics related to the product and unrelated to time;
forming a static characteristic sequence of the target product according to the obtained static characteristic values;
the prediction unit 504 is configured to:
and inputting the historical characteristic sequence, the historical associated characteristic sequence, the future time sequence characteristic sequence, the auxiliary characteristic sequence and the static characteristic sequence into a characteristic prediction model to obtain a predicted value of the specified characteristic.
Preferably, the first obtaining unit 501 is further configured to:
and training the characteristic prediction model according to the historical characteristic sample sequence, the historical associated characteristic sample sequence, the future time sequence characteristic sample sequence, the auxiliary characteristic sequence, the prediction sample value and the static characteristic sequence of each product to obtain the trained characteristic prediction model.
Preferably, the first obtaining unit 501 is further configured to:
respectively combining every two sampling times in a set time period to obtain a corresponding binary group, wherein the first sampling time contained in the binary group is earlier than the second sampling time;
respectively taking each two-tuple of each product as a sample point, taking the first sampling time contained in the two-tuple as the reference time of the corresponding sample point, and taking the second sampling time contained in the two-tuple as the predicted time of the corresponding sample point;
determining a historical characteristic sample sequence, a historical associated characteristic sample sequence, a future time sequence characteristic sample sequence, an auxiliary characteristic sequence, a predicted sample value and a static characteristic sequence corresponding to each sample point according to the reference time and the predicted time corresponding to each sample point;
and training the characteristic prediction model according to the historical characteristic sample sequence, the historical associated characteristic sample sequence, the future time sequence characteristic sample sequence, the auxiliary characteristic sequence, the prediction sample value and the static characteristic sequence corresponding to each sample point to obtain the trained characteristic prediction model.
According to the feature prediction method, the feature prediction apparatus, the electronic device and the storage medium described above, the feature prediction model is trained on sample data of a plurality of products to obtain the trained feature prediction model; the reference time and the predicted time of the target product are obtained; the historical feature sequence and the historical associated feature sequence of the target product are obtained according to the reference time; the future time sequence feature sequence of the target product from the reference time to the predicted time is obtained; and the historical feature sequence, the historical associated feature sequence and the future time sequence feature sequence are input into the trained feature prediction model to obtain the predicted value of the specified feature. Because the feature prediction model is trained on sample data of a plurality of products, and feature prediction is performed using feature data of multiple dimensions of the target product together with the feature prediction model, the amount and coverage of the input data are enlarged, the data quality requirement is lowered, and the accuracy of feature prediction is improved.
Based on the same inventive concept as the feature prediction method, the embodiment of the present application further provides an electronic device, which may specifically be a desktop computer, a portable computer, a server, and the like. As shown in fig. 6, the electronic device 60 may include a processor 601 and a memory 602.
The processor 601 may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.
The memory 602, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory may include at least one type of storage medium, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a random access memory (RAM), a static random access memory (SRAM), a programmable read-only memory (PROM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory may also be, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 602 in the embodiments of the present application may also be circuitry or any other device capable of performing a storage function for storing program instructions and/or data.
Exemplary program product
Embodiments of the present application provide a computer-readable storage medium for storing computer program instructions for the above electronic device, and the computer program instructions include a program for executing the above feature prediction method.
The computer storage media may be any available media or data storage devices that can be accessed by a computer, including but not limited to magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical (MO) disks, etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH), solid state disks (SSDs)), etc.
In some possible embodiments, the various aspects of the present application may also be implemented as a computer program product comprising program code for causing a server device to perform the steps of the feature prediction method according to various exemplary embodiments of the present application described in the "exemplary methods" section above of this specification, when the computer program product is run on the server device.
The computer program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer program product according to embodiments of the present application may employ a portable compact disc read-only memory (CD-ROM) and include program code, and may be run on a server device. However, the program product of the present application is not limited thereto; in this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++, or the like, as well as conventional procedural programming languages, such as the "C" language or similar programming languages.
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, the features and functions of two or more units described above may be embodied in one unit, according to embodiments of the application. Conversely, the features and functions of one unit described above may be further divided into embodiments by a plurality of units.
Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the spirit and principles of the application have been described with reference to several particular embodiments, it is to be understood that the application is not limited to the disclosed embodiments, nor to the division into aspects, which is made for convenience of description only; features from these aspects may be combined where beneficial. The application is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A method of feature prediction, comprising:
acquiring reference time and predicted time of a target product, wherein the reference time is used for dividing historical data and future data;
acquiring a historical characteristic sequence and a historical associated characteristic sequence of the target product according to the reference time;
acquiring a future time sequence characteristic sequence of the target product from the reference time to the predicted time;
inputting the historical characteristic sequence, the historical associated characteristic sequence and the future time sequence characteristic sequence into a trained characteristic prediction model to obtain a predicted value of the specified characteristic, wherein the characteristic prediction model is obtained by training according to sample data of a plurality of products.
2. The method of claim 1, wherein obtaining the historical characteristic sequence and the historical associated characteristic sequence of the target product according to the reference time comprises:
acquiring a characteristic value of the designated characteristic of the target product before the reference time;
forming a historical characteristic sequence according to the acquired characteristic values of the specified characteristics;
acquiring a characteristic value of the associated characteristic of the specified characteristic of the target product before the reference time;
and forming a historical associated feature sequence according to the obtained feature values of the associated features.
3. The method of claim 1, wherein obtaining the future time sequence characteristic sequence of the target product from the reference time to the predicted time comprises:
acquiring a characteristic value of a future time sequence characteristic of the target product between reference time and predicted time, wherein each characteristic value of the future time sequence characteristic is preset;
and forming a future time sequence characteristic sequence according to the characteristic values of the future time sequence characteristic.
4. The method of claim 1, wherein before inputting the historical characteristic sequence, the historical associated characteristic sequence and the future time sequence characteristic sequence into a trained characteristic prediction model to obtain a predicted value of the specified characteristic, the method further comprises:
filling a historical associated characteristic sequence and a future time sequence characteristic sequence with missing data by adopting a specified filling value so as to update the historical associated characteristic sequence and the future time sequence characteristic sequence;
and generating, for each filled historical associated characteristic sequence and each filled future time sequence characteristic sequence, a corresponding auxiliary characteristic sequence, wherein the auxiliary characteristic sequence is used for indicating whether elements in the filled historical associated characteristic sequence or the filled future time sequence characteristic sequence are filled.
5. The method of claim 4, wherein before inputting the historical feature sequence, the historical associated feature sequence, and the future time series feature sequence into a trained feature prediction model to obtain a predicted value of a specified feature, further comprising:
obtaining static characteristic values corresponding to all static characteristics of the target product, wherein the static characteristics are characteristics related to the product and unrelated to time;
forming a static characteristic sequence of the target product according to the obtained static characteristic values;
inputting the historical feature sequence, the historical associated feature sequence and the future time sequence feature sequence into a trained feature prediction model to obtain a predicted value of the specified feature comprises:
and inputting the historical characteristic sequence, the historical associated characteristic sequence, the future time sequence characteristic sequence, the auxiliary characteristic sequence and the static characteristic sequence into the characteristic prediction model to obtain a predicted value of the specified characteristic.
6. The method of any one of claims 1 to 5, further comprising, before obtaining the reference time and the predicted time of the target product:
and training the characteristic prediction model according to the historical characteristic sample sequence, the historical associated characteristic sample sequence, the future time sequence characteristic sample sequence, the auxiliary characteristic sequence, the prediction sample value and the static characteristic sequence of each product to obtain the trained characteristic prediction model.
7. The method of claim 6, wherein training the feature prediction model according to the historical feature sample sequence, the historical associated feature sample sequence, the future time sequence feature sample sequence, the auxiliary feature sequence, the predicted sample value, and the static feature sequence of each product to obtain the trained feature prediction model comprises:
respectively combining every two sampling times in a set time period to obtain a corresponding binary group, wherein the first sampling time contained in the binary group is earlier than the second sampling time;
respectively taking each two-tuple of each product as a sample point, taking the first sampling time contained in the two-tuple as the reference time of the corresponding sample point, and taking the second sampling time contained in the two-tuple as the predicted time of the corresponding sample point;
determining a historical characteristic sample sequence, a historical associated characteristic sample sequence, a future time sequence characteristic sample sequence, an auxiliary characteristic sequence, a predicted sample value and a static characteristic sequence corresponding to each sample point according to the reference time and the predicted time corresponding to each sample point;
and training the characteristic prediction model according to the historical characteristic sample sequence, the historical associated characteristic sample sequence, the future time sequence characteristic sample sequence, the auxiliary characteristic sequence, the prediction sample value and the static characteristic sequence corresponding to each sample point to obtain the trained characteristic prediction model.
8. A feature prediction apparatus, comprising:
the system comprises a first acquisition unit, a second acquisition unit and a third acquisition unit, wherein the first acquisition unit is used for acquiring reference time and predicted time of a target product, and the reference time is used for dividing historical data and future data;
the second acquisition unit is used for acquiring the historical characteristic sequence and the historical associated characteristic sequence of the target product according to the reference time;
a third obtaining unit, configured to obtain a future time series feature sequence of the target product from the reference time to the predicted time;
and the prediction unit is used for inputting the historical characteristic sequence, the historical associated characteristic sequence and the future time sequence characteristic sequence into a trained characteristic prediction model to obtain a predicted value of the specified characteristic, and the characteristic prediction model is obtained by training sample data of a plurality of products.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium having computer program instructions stored thereon, which, when executed by a processor, implement the steps of the method of any one of claims 1 to 7.
CN202010148299.0A 2020-03-05 2020-03-05 Feature prediction method, device, electronic equipment and storage medium Active CN111401940B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010148299.0A CN111401940B (en) 2020-03-05 2020-03-05 Feature prediction method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010148299.0A CN111401940B (en) 2020-03-05 2020-03-05 Feature prediction method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111401940A true CN111401940A (en) 2020-07-10
CN111401940B CN111401940B (en) 2023-07-04

Family

ID=71430528

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010148299.0A Active CN111401940B (en) 2020-03-05 2020-03-05 Feature prediction method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111401940B (en)

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102316046A (en) * 2010-06-29 2012-01-11 国际商业机器公司 The method and apparatus of the user's recommendation information in social networks
US20180278644A1 (en) * 2017-03-27 2018-09-27 Nec Corporation Information processing device, information processing method, and computer-readable recording medium
WO2018222308A1 (en) * 2017-05-31 2018-12-06 Microsoft Technology Licensing, Llc Time-based features and moving windows sampling for machine learning
CN107515842A (en) * 2017-07-19 2017-12-26 中南大学 A kind of urban population density dynamic prediction method and system
WO2019019255A1 (en) * 2017-07-25 2019-01-31 平安科技(深圳)有限公司 Apparatus and method for establishing prediction model, program for establishing prediction model, and computer-readable storage medium
CN107633254A (en) * 2017-07-25 2018-01-26 平安科技(深圳)有限公司 Establish device, method and the computer-readable recording medium of forecast model
CN108053242A (en) * 2017-12-12 2018-05-18 携程旅游信息技术(上海)有限公司 Sight spot admission ticket ticket amount Forecasting Methodology, system, equipment and storage medium
CN108133391A (en) * 2017-12-22 2018-06-08 联想(北京)有限公司 Method for Sales Forecast method and server
CN108133014A (en) * 2017-12-22 2018-06-08 广州数说故事信息科技有限公司 Triple generation method, device and user terminal based on syntactic analysis and cluster
CN110019401A (en) * 2017-12-25 2019-07-16 顺丰科技有限公司 Part amount prediction technique, device, equipment and its storage medium
CN110555714A (en) * 2018-06-04 2019-12-10 百度在线网络技术(北京)有限公司 method and apparatus for outputting information
CN109143995A (en) * 2018-07-13 2019-01-04 浙江大学 A kind of fine method for monitoring operation states of closed-loop system sufficiently decomposed based on the related slow feature of quality
CN110751497A (en) * 2018-07-23 2020-02-04 北京京东尚科信息技术有限公司 Commodity replenishment method and device
CN109067586A (en) * 2018-08-16 2018-12-21 海南大学 Ddos attack detection method and device
CN109815980A (en) * 2018-12-18 2019-05-28 北京三快在线科技有限公司 Prediction technique, device, electronic equipment and the readable storage medium storing program for executing of user type
CN110084438A (en) * 2019-05-09 2019-08-02 上汽安吉物流股份有限公司 Prediction technique and device, the logistics system and computer-readable medium of order

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298552A (en) * 2020-12-02 2021-08-24 阿里巴巴集团控股有限公司 Data processing method, server and storage medium
CN113743643A (en) * 2021-02-05 2021-12-03 北京京东振世信息技术有限公司 Method, device, equipment and medium for determining commodity data prediction accuracy
CN113743643B (en) * 2021-02-05 2023-11-03 北京京东振世信息技术有限公司 Method, device, equipment and medium for determining commodity data prediction accuracy
CN113238714A (en) * 2021-05-28 2021-08-10 广东好太太智能家居有限公司 Disk capacity prediction method and system based on historical monitoring data and storage medium
CN113850418A (en) * 2021-09-02 2021-12-28 支付宝(杭州)信息技术有限公司 Method and device for detecting abnormal data in time sequence
CN113850418B (en) * 2021-09-02 2024-07-02 支付宝(杭州)信息技术有限公司 Method and device for detecting abnormal data in time sequence
CN114117689A (en) * 2022-01-21 2022-03-01 锱云(上海)物联网科技有限公司 Method, system, terminal device and storage medium for preventing production resonance
CN114117689B (en) * 2022-01-21 2022-04-29 锱云(上海)物联网科技有限公司 Method, system, terminal device and storage medium for preventing production resonance
CN115222164A (en) * 2022-09-20 2022-10-21 国能大渡河大数据服务有限公司 Water pump fault prediction method and system based on empirical coupling function

Also Published As

Publication number Publication date
CN111401940B (en) 2023-07-04

Similar Documents

Publication Publication Date Title
CN111401940B (en) Feature prediction method, device, electronic equipment and storage medium
CN110366734B (en) Optimizing neural network architecture
US11122333B2 (en) User feature generation method and apparatus, device, and computer-readable storage medium
CN110149540B (en) Recommendation processing method and device for multimedia resources, terminal and readable medium
US20190354810A1 (en) Active learning to reduce noise in labels
US20230049747A1 (en) Training machine learning models using teacher annealing
US11556773B1 (en) Machine learning analysis of incremental event causality towards a target outcome
CN108985489B (en) Risk prediction method, risk prediction device and terminal equipment
CN111783810B (en) Method and device for determining attribute information of user
US11501107B2 (en) Key-value memory network for predicting time-series metrics of target entities
CN111783873A (en) Incremental naive Bayes model-based user portrait method and device
CN112836128A (en) Information recommendation method, device, equipment and storage medium
WO2020227418A1 (en) Semi-supervised training of machine learning models using label guessing
CN111611390B (en) Data processing method and device
CN112182281B (en) Audio recommendation method, device and storage medium
CN116204714A (en) Recommendation method, recommendation device, electronic equipment and storage medium
CN115018552A (en) Method for determining click rate of product
CN114912030A (en) Equity model training method, equity model recommendation method, electronic terminal and computer medium
CN111507471A (en) Model training method, device, equipment and storage medium
CN116703466A (en) System access quantity prediction method based on improved wolf algorithm and related equipment thereof
CN116720946A (en) Credit risk prediction method, device and storage medium based on recurrent neural network
US20210406773A1 (en) Transforming method, training device, and inference device
CN114897607A (en) Data processing method and device for product resources, electronic equipment and storage medium
US11531694B1 (en) Machine learning based improvements in estimation techniques
US20210182696A1 (en) Prediction of objective variable using models based on relevance of each model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant