CN111177657B - Demand determining method, system, electronic device and storage medium

Info

Publication number: CN111177657B
Application number: CN201911425328.7A
Authority: CN
Inventors: 杨秋源; 周超; 许平; 牛世雄; 徐明泉
Original assignee: Beijing SF Intra City Technology Co Ltd
Current assignee: Beijing SF Intra City Technology Co Ltd
Priority date: 2019-12-31
Filing date: 2019-12-31
Publication date: 2023-09-08
Anticipated expiration: 2039-12-31
Also published as: CN111177657A

Abstract

The application provides a demand determining method, a demand determining system, an electronic device and a storage medium, and relates to the technical field of supply chains. The demand determining method comprises the following steps: triggering an analysis task based on task information configured by a user; acquiring related data matched with the task information; and, when the analysis task is executed, performing analysis based on the related data to obtain a demand analysis result. Because the related data is screened according to the task information, the method can obtain analysis results under different tasks, process a plurality of different kinds of demand determination tasks, and determine demand for different objects, which improves the universality of the demand determining method.

Description

Demand determining method, system, electronic device and storage medium

Technical Field

The application relates to the technical field of supply chains, and in particular to a demand determining method, a demand determining system, an electronic device and a storage medium.

Background

In the technical field of supply chains, accurate demand determination (including demand analysis and demand prediction) needs to be applied to links such as store demand prediction and warehouse demand prediction, for use in subsequent replenishment planning, production planning and the like. Current demand determination is mainly performed on a single commodity, for example one commodity in a certain store. It cannot adapt to multiple kinds of commodities or to multiple kinds of demand determination tasks, so demand determination suffers from low universality.

Disclosure of Invention

The embodiment of the application provides a demand determining method, a demand determining system, electronic equipment and a storage medium, which are used for solving the problem of low universality of the current demand determining method.

The embodiment of the application provides a demand determining method, which comprises the following steps: triggering an analysis task based on task information configured by a user; acquiring related data matched with the task information; and, when the analysis task is executed, performing analysis based on the related data to obtain a demand analysis result.

In the implementation process, different types of analysis tasks can be triggered based on the task information configured by a user, so that a variety of different demand determination tasks can be processed and demand can be determined for different objects, which improves the universality of the demand determining method.

Optionally, when the analysis task is executed, the analyzing based on the related data to obtain a demand analysis result includes: preprocessing the related data to obtain preprocessed data; performing feature analysis on the preprocessed data to obtain feature analysis data; and performing data analysis on the feature analysis data to obtain the data analysis result.

In the implementation process, the integrity and the accuracy of the preprocessed data can be improved by preprocessing the related data, so that the integrity and the accuracy of the feature analysis data can be improved, and the accuracy of the data analysis result can be further improved.

Optionally, the preprocessing the related data to obtain preprocessed data includes: performing missing-value processing on the related data to obtain complete data; and discretizing the complete data to obtain the preprocessed data.

In the implementation process, the complete data can supplement the missing part in the related data, and the integrity of the related data can be improved. The discretization processing can compress the data volume of the complete data to obtain the preprocessing data, so that the efficiency of operation based on the discretization data is improved.

Optionally, the performing feature analysis on the preprocessed data to obtain feature analysis data includes: performing feature selection on the preprocessed data to obtain feature selection data; extracting the characteristics of the characteristic selection data to obtain characteristic extraction data; and carrying out feature combination on the feature extraction data to obtain the feature analysis data.

In the implementation process, after feature selection, feature extraction and feature combination are performed on the preprocessed data, the purposes of dimension reduction, optimization and feature classification performance improvement of the data can be achieved, and therefore accuracy of data analysis is improved.

Optionally, the performing data analysis on the feature analysis data to obtain the data analysis result includes: and carrying out one or more of trend mode analysis, data distribution analysis, feature importance analysis, association relation analysis and life cycle analysis on the feature analysis data to obtain the data analysis result.

In the implementation process, various feature analysis types such as trend pattern analysis, data distribution analysis, feature importance analysis, association relation analysis, life cycle analysis and the like are provided, and different types of analysis can be performed based on different requirements, so that the universality of data analysis is improved, and the accuracy of the data analysis result is improved.

Optionally, after the analysis is performed based on the related data to obtain the demand analysis result when the analysis task is executed, the method further includes: selecting a prediction model set based on the demand analysis result; and obtaining a demand prediction result through the prediction model set.

In the implementation process, the demand analysis is itself broadly applicable, so demand prediction performed on the basis of the demand analysis result is correspondingly more broadly applicable.

Optionally, the selecting a prediction model set based on the demand analysis result includes: determining a candidate model set based on the demand analysis result; performing feature extraction, model training and model evaluation on each candidate model in the candidate model set to obtain an evaluation result of each candidate model; performing model fusion on the candidate models in the candidate model set whose evaluation results meet a first preset condition to obtain a first fusion model; taking the first fusion model and each candidate model meeting the first preset condition as the prediction models in a candidate prediction model set; performing feature extraction, model retraining and model reevaluation on each prediction model in the candidate prediction model set to obtain a reevaluation result of each prediction model; and selecting, from the prediction models, the prediction models whose reevaluation results meet a second preset condition and performing model fusion on them, so as to obtain the prediction model set.

In the implementation process, a candidate model set matched with the demand analysis result is selected based on the demand analysis result, and two rounds of screening are performed through model training, model evaluation and model fusion. Because the prediction model set finally obtained has satisfied the first preset condition and then the second preset condition in turn, the prediction accuracy of the prediction models can be improved.

The embodiment of the application also provides a demand determining system, which comprises a data and parallel control subsystem and an algorithm policy engine subsystem; the data and parallel control subsystem is used for triggering an analysis task based on task information configured by a user and acquiring related data matched with the task information; the algorithm policy engine subsystem is used for analyzing based on the related data to obtain a demand analysis result when the analysis task is executed.

Optionally, the algorithm policy engine subsystem is specifically configured to preprocess the related data to obtain preprocessed data; performing feature analysis on the preprocessed data to obtain feature analysis data; and carrying out data analysis on the characteristic analysis data to obtain the data analysis result.

Optionally, the algorithm policy engine subsystem is specifically further configured to perform missing-value processing on the related data to obtain complete data, and to discretize the complete data to obtain the preprocessed data.

In the implementation process, the complete data can supplement the missing part in the related data, and the integrity of the related data can be improved. The discretization processing can compress the data volume of the complete data to obtain the preprocessing data, so that the efficiency of operation based on the preprocessing data is improved.

Optionally, the algorithm policy engine subsystem is specifically further configured to perform feature selection on the preprocessed data to obtain feature selection data; extracting the characteristics of the characteristic selection data to obtain characteristic extraction data; and carrying out feature combination on the feature extraction data to obtain the feature analysis data.

In the implementation process, after the feature selection, feature extraction and feature combination are performed on the preprocessed data, the purposes of dimension reduction, optimization and feature classification performance improvement of the data can be achieved, so that the accuracy of data analysis is improved.

Optionally, the algorithm policy engine subsystem is specifically further configured to perform one or more of trend pattern analysis, data distribution analysis, feature importance analysis, association relation analysis, and life cycle analysis on the feature analysis data, to obtain the data analysis result.

In the implementation process, various characteristic analysis types such as trend pattern analysis, data distribution analysis, characteristic importance analysis, association relation analysis, life cycle analysis and the like are provided, and different types of analysis can be performed based on different requirements, so that the universality of data analysis is improved, and the accuracy of the data analysis result is improved.

Optionally, the algorithm policy engine subsystem is further configured to select a prediction model set based on the demand analysis result, and to obtain a demand prediction result through the prediction model set.

Optionally, the algorithm policy engine subsystem is further configured to determine a candidate model set based on the demand analysis result; performing feature extraction, model training and model evaluation on each candidate model in the candidate model set to obtain an evaluation result of each candidate model; performing model fusion on candidate models of which the evaluation results meet a first preset condition in the candidate model set to obtain a first fusion model; taking the fusion model and each candidate model meeting a first preset condition as a prediction model in a candidate prediction model set; performing feature extraction, model retraining and model reevaluation on each prediction model in the candidate prediction model set to obtain reevaluation results of each prediction model; and selecting a prediction model of which the reevaluation result meets a second preset condition from the prediction models to obtain the candidate prediction model set.

The embodiment of the application also provides electronic equipment, which comprises a memory and a processor, wherein the memory stores program instructions, and the processor executes the steps in the method when running the program instructions.

Embodiments of the present application also provide a storage medium having stored therein computer program instructions which, when executed by a processor, perform the steps of the above-described method.

Drawings

In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should not be considered as limiting its scope; a person skilled in the art can obtain other related drawings from these drawings without inventive effort.

It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.

Fig. 1 is a flowchart of a demand determining method according to an embodiment of the present application.

Fig. 2 is a flowchart of an analysis task for obtaining a demand analysis result based on related data according to an embodiment of the present application.

Fig. 3 is a flowchart of preprocessing related data to obtain preprocessed data according to an embodiment of the present application.

Fig. 4 is a flowchart of performing feature analysis on pre-processed data to obtain feature analysis data according to an embodiment of the present application.

Fig. 5 is a flowchart of a method for obtaining a demand analysis result based on related data in an analysis task according to an embodiment of the present application.

Fig. 6 is a flowchart of selecting a prediction model set based on a demand analysis result according to an embodiment of the present application.

Fig. 7 is a block diagram of a demand determining system according to an embodiment of the present application.

Reference numerals: 60 - demand determining system; 601 - data and parallel control subsystem; 602 - algorithm policy engine subsystem.

Detailed Description

The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.

In the description of the present application, it should be noted that the terms "first," "second," and the like are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.

Additional features and advantages of the application will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the embodiments of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and drawings.

An embodiment of the present application provides a demand determining method. Referring to fig. 1, fig. 1 is a flowchart of the demand determining method provided by an embodiment of the present application. The method comprises the following steps:

Step S1: trigger an analysis task based on task information configured by the user.

The task information is the user's demand information, such as analyzing the sales of brand A in the first quarter. As one embodiment, the user may issue the task information by text input, voice input, or the like. It will be appreciated that different task information may trigger different analysis tasks. In practice, demand analysis can be performed on goods of different kinds, different brands, different time periods, different regions, different suppliers and the like.

Step S2: acquire related data matched with the task information.

In practice, when demand analysis is performed on an article, the related data includes sales information, promotion information, weather information, public opinion data and the like for that article. For example, when demand analysis is performed on brand A, the sales information, promotion information, weather information and public opinion data (that is, user rating information) related to brand A are acquired.

The related data may be matched to the task information by field matching: specific fields are matched against the task information, and the related data corresponding to each specific field is configured in advance, so that the related data can be looked up based on the task information.
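
The field matching described above can be sketched as follows. This is only an illustration under stated assumptions: the field names, the data source names and the `match_related_data` helper are hypothetical and do not come from the application, which does not specify how the fields or their preconfigured data are stored.

```python
# Hypothetical, preconfigured mapping from a specific field to the
# related data sources that should be fetched when that field is matched.
FIELD_TO_SOURCES = {
    "sales": ["sales_records", "promotion_records"],
    "brand": ["public_opinion"],
    "quarter": ["weather_history"],
}


def match_related_data(task_info):
    """Return the data sources whose field appears in the task text."""
    text = task_info.lower()
    matched = []
    for field, sources in FIELD_TO_SOURCES.items():
        if field in text:
            for source in sources:
                if source not in matched:  # keep order, avoid duplicates
                    matched.append(source)
    return matched


sources = match_related_data("Analyze sales of brand A in the first quarter")
print(sources)
```

A real system would likely parse the task information into structured fields first; plain substring matching is used here only to keep the sketch short.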

Step S3: when the analysis task is performed, analysis is performed based on the related data to obtain a demand analysis result.

It can be understood that, because the related data matched with the task information is obtained and the demand analysis is performed on that data, different demand analyses can be completed for different task information, each yielding its corresponding demand analysis result. The method can therefore process a plurality of different kinds of demand determination tasks and determine demand for different objects, which improves the universality of the demand determining method.

Referring to fig. 2, fig. 2 is a flowchart of an analysis task for obtaining a demand analysis result based on related data according to an embodiment of the present application. Optionally, step S3 is specifically divided into the following sub-steps:

Step S31: preprocess the related data to obtain preprocessed data.

Referring to fig. 3, fig. 3 is a flowchart of preprocessing related data to obtain preprocessed data according to an embodiment of the present application. Optionally, step S31 may be divided into the following sub-steps:

Step S31.1: perform missing-value processing on the related data to obtain complete data.

Step S31.2: discretize the complete data to obtain the preprocessed data.

In step S31.1, related data may be missing for various reasons, which fall mainly into mechanical causes and human causes. With a mechanical cause, related data is missing because data collection or storage failed, for example a data storage failure, memory corruption, or a mechanical fault that prevents data from being collected for a certain period of time (for data collected at fixed intervals). With a human cause, related data is missing because of subjective error, historical limitations or intentional concealment, for example when an interviewee in a market survey refuses to answer a question or gives an invalid answer, or when data entry personnel omit data.

Methods for missing-value processing of the related data include deleting the samples that contain missing values and interpolating the missing values. When data is missing for human reasons, human behavior affects the authenticity of the data, and the true values of the other attributes of a sample with a missing value cannot be guaranteed, so interpolation that relies on those attribute values is unreliable. Interpolation is therefore generally not recommended for data missing for human reasons, and the samples with missing values are usually deleted instead. Missing-value interpolation is mainly used for data missing for mechanical reasons, where the reliability of the interpolation can be ensured.

Methods for deleting samples with missing values include the simple deletion method and the weighting method. If the missing-data problem can be resolved by deleting only a small portion of the samples, the simple deletion method is the most effective. When the values are not missing completely at random (that is, the missingness is related to the value of the incomplete variable itself, so that the missing values cause the related data to lose a large amount of useful information, make the uncertainty in the related data more pronounced, and make its deterministic component harder to capture), bias can be reduced by weighting the complete data: after the incomplete samples are marked, the complete samples are assigned different weights. This approach can effectively reduce bias if the explanatory variables include variables that determine the weight estimates.

Missing-value interpolation methods include mean interpolation: if the missing value is a numerical attribute, it is filled with the mean of that attribute over all other objects; if the missing value is a non-numerical attribute, it is filled, following the statistical principle of the mode, with the value of that attribute that occurs most frequently among all other objects. Missing-value interpolation methods further include least-squares interpolation, which fills in the related data based on the principle of the least-squares method so that the sum of squared deviations of the completed related data is minimized, which can improve the accuracy of the related data.
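
As an illustration of the mean and mode rules just described, the following minimal sketch fills one column of data. It assumes that missing entries are represented by `None`; the `impute_column` helper and the sample values are hypothetical.

```python
from collections import Counter


def impute_column(values):
    """Fill None entries: mean for numeric columns, mode for other columns."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to learn from, leave the column as is
    if all(isinstance(v, (int, float)) for v in observed):
        fill = sum(observed) / len(observed)  # mean of the observed values
    else:
        fill = Counter(observed).most_common(1)[0][0]  # most frequent value
    return [fill if v is None else v for v in values]


daily_sales = impute_column([10, None, 14, 12])
weather = impute_column(["sunny", "rain", None, "sunny"])
print(daily_sales)
print(weather)
```

Least-squares interpolation is not shown; it would fit a model to the observed values and predict the missing ones instead of using a single fill value.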

To improve the time and space efficiency of computation, as well as the classification and clustering capability and noise resistance of the complete data, the complete data obtained from missing-value processing needs to be discretized. One-hot encoding may be used for the discretization: the values of a discrete feature of the complete data are mapped into Euclidean space, so that each value of the discrete feature corresponds to a point in that space. One-hot encoding the discrete features makes distance calculations between features more reasonable. After a discrete feature is one-hot encoded, the feature in each encoded dimension can in fact be treated as a continuous feature. One-hot encoding addresses the problem that classifiers handle categorical attribute data poorly, and to some extent it also expands the feature set. Each encoded value is only 0 or 1, and the different categories are stored in separate, mutually orthogonal dimensions; for example, a feature with the categories [red, blue] can be one-hot encoded.
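
A minimal sketch of one-hot encoding for the [red, blue] example is given below. The `one_hot_encode` helper is hypothetical, and ordering the categories alphabetically is an assumption made only so that the output is deterministic.

```python
def one_hot_encode(values):
    """Map each distinct category to a 0/1 vector containing a single 1."""
    categories = sorted(set(values))  # assumed ordering: alphabetical
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # one dimension per category
        encoded.append(row)
    return categories, encoded


categories, encoded = one_hot_encode(["red", "blue", "red"])
print(categories)  # column order of the encoded vectors
print(encoded)
```

Every pair of distinct categories ends up at the same Euclidean distance from each other, which is why distance calculations on the encoded features are more reasonable than on arbitrary integer codes.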

With continued reference to fig. 1 and fig. 4, fig. 4 is a flowchart of performing feature analysis on the preprocessed data to obtain feature analysis data according to an embodiment of the present application.

Step S32: perform feature analysis on the preprocessed data to obtain feature analysis data.

To estimate a function of certain variables to a given accuracy, the required sample size grows exponentially as the dimension of the samples increases, which is known as the curse of dimensionality. Dimension reduction is used to overcome the curse of dimensionality, capture the essential features, reduce the complexity of data processing, save storage space, remove useless noise and enable data visualization. To reduce the dimension, feature selection and feature extraction are required.

Alternatively, step S32 may be divided into the following sub-steps:

Step S32.1: perform feature selection on the preprocessed data to obtain feature selection data.

It will be appreciated that feature selection does not change the meaning of the preprocessed data; it only screens the features, keeping those that have a greater influence on the target. Feature selection picks a subset of the features of the preprocessed data (the selected features are contained in the original ones) without modifying the original feature space. Common feature selection methods include the following. The Filter method scores the feature of each dimension, that is, assigns each feature a weight representing its importance, and then ranks the features by weight. The Wrapper method treats feature selection on the preprocessed data as a search optimization problem: different feature combinations are generated, and each combination is evaluated and compared with the others. Because feature selection is treated as an optimization problem in this way, other optimization algorithms may also be chosen according to specific requirements, such as GA (Genetic Algorithm), PSO (Particle Swarm Optimization) and DE (Differential Evolution).
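
The Filter method can be sketched as scoring each feature and keeping the top-ranked ones. The application does not say which score is used; the absolute Pearson correlation with the target is an assumption chosen here for illustration, and the feature names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def filter_select(features, target, k):
    """Score every feature, rank by score, and keep the k best."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


features = {
    "promotion": [0, 1, 0, 1, 1],
    "temperature": [20, 21, 19, 22, 20],
    "store_id": [7, 7, 7, 7, 7],  # constant, carries no information
}
target = [10, 18, 9, 20, 17]  # hypothetical daily sales
selected, scores = filter_select(features, target, k=2)
print(selected)
```

A Wrapper method would instead train and evaluate a model for each candidate subset, which is more expensive but accounts for interactions between features.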

Step S32.2: perform feature extraction on the feature selection data to obtain feature extraction data.

Optionally, the feature extraction methods in this embodiment include PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), and the like.

PCA is a statistical method that converts a set of possibly correlated variables into a set of linearly uncorrelated variables by an orthogonal transformation; the converted variables are called the principal components. In many cases the variables are correlated to some degree, and when two variables are correlated, the information they carry about the analysis task can be understood to overlap to some degree. Principal component analysis removes the redundant (closely related) variables from all the variables originally proposed and constructs as few new variables as possible, such that the new variables are pairwise uncorrelated and retain as much of the original information about the analysis task as possible.
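
To make the idea concrete, the sketch below estimates only the first principal component, using power iteration on the sample covariance matrix. This is one simple way to compute it and is not stated in the application; a production system would more likely use an eigendecomposition or SVD routine from a numerical library. The helper name and the sample data are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Estimate the first principal component by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix of the centered data (requires n >= 2).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Assumption: the start vector is not orthogonal to the top eigenvector.
    direction = [1.0] * d
    for _ in range(iterations):
        product = [sum(cov[i][j] * direction[j] for j in range(d))
                   for i in range(d)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0:
            break  # no variance in the data along the current direction
        direction = [x / norm for x in product]
    # Project the centered data onto the component (one new variable).
    scores = [sum(c[j] * direction[j] for j in range(d)) for c in centered]
    return direction, scores


# Two perfectly correlated variables: the second is twice the first.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, scores = first_principal_component(rows)
print(direction)
print(scores)
```

Because the two input variables overlap completely, a single new variable (the projection in `scores`) retains all of the variance, which is the redundancy removal the paragraph above describes.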

LDA is a supervised dimension reduction technique, whereas PCA is an unsupervised one. LDA takes the class labels into account when reducing the dimension, so that the resulting projection has the smallest within-class variance and the largest between-class variance.

Step S32.3: perform feature combination on the feature extraction data to obtain feature analysis data.

By way of example, a composite feature may be formed by combining individual features (in this embodiment, by multiplication or by a Cartesian product). Feature combinations help represent nonlinear relationships. Many different kinds of feature combinations can be created; for example, features such as feature A and feature B can be combined in the following ways:

[A x B]: the values of two features are multiplied to form a feature combination.

[A x B x C x D x E]: the values of five features are multiplied to form a feature combination.

[A x A]: the value of a single feature is squared to form a feature combination.
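
The combinations listed above can be sketched as follows. The helper names and the sample columns are hypothetical; the Cartesian product is illustrated for two categorical columns by pairing their values row by row, which is one common reading of the term.

```python
def multiply_features(*columns):
    """Row-wise product of one or more numeric feature columns."""
    crossed = []
    for row in zip(*columns):
        value = 1
        for x in row:
            value *= x
        crossed.append(value)
    return crossed


def cartesian_cross(col_a, col_b):
    """Pair two categorical columns row by row into one combined category."""
    return [(a, b) for a, b in zip(col_a, col_b)]


a = [1, 2, 3]
b = [4, 5, 6]
a_times_b = multiply_features(a, b)  # [A x B]
a_squared = multiply_features(a, a)  # [A x A]
region_season = cartesian_cross(["north", "south"], ["summer", "summer"])
print(a_times_b, a_squared, region_season)
```

The same `multiply_features` call accepts five columns for the [A x B x C x D x E] case.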

With continued reference to fig. 1, step S33: perform data analysis on the feature analysis data to obtain a data analysis result.

Optionally, step S33 includes: performing one or more of trend pattern analysis, data distribution analysis, feature importance analysis, association relation analysis and life cycle analysis on the feature analysis data to obtain the data analysis result.

Trend pattern analysis is suitable for long-term tracking of a product's core indicators, such as click-through rate and number of active users. Trend analysis requires both identifying how the data changes and analyzing the causes of the change. The results of trend analysis are best expressed as ratios. Several concepts need to be clear when performing trend analysis: the month-over-month (period-over-period) ratio, the year-over-year ratio and the fixed-base ratio. The month-over-month ratio compares the current period with the immediately preceding period, for example February 2017 with January 2017; it reveals the recent trend but is affected by seasonal differences. To eliminate seasonal differences, the year-over-year ratio is used, for example comparing February 2017 with February 2016. The fixed-base ratio compares each period with a fixed base point; for example, with January 2016 as the base point, February 2017 is compared with January 2016. For example, a certain app had 20 million monthly active users in one month of 2017, an increase of 2% month over month compared with January and of 20% year over year compared with February of the previous year. Another core objective of trend analysis is to explain the trend: a reasonable explanation should be given for what happened at each obvious inflection point in the trend line, whether the cause is external or internal.
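
The three comparisons (the period-over-period or "ring" ratio, the same-period or year-over-year ratio, and the fixed-base ratio) can be worked through with the active-user example. The sketch assumes the month in the example is February 2017. Only the 20 million figure and the 2% and 20% increases come from the text; the January 2017 and February 2016 values are back-calculated from those percentages, and the January 2016 base value is hypothetical.

```python
def growth_rate(current, base):
    """Relative change of `current` with respect to `base`."""
    return (current - base) / base


monthly_active_users = {
    "2016-01": 16_000_000,  # hypothetical fixed base point
    "2016-02": 16_666_667,  # back-calculated: about 20% below 2017-02
    "2017-01": 19_607_843,  # back-calculated: about 2% below 2017-02
    "2017-02": 20_000_000,  # the 20 million from the example
}

current = monthly_active_users["2017-02"]
month_over_month = growth_rate(current, monthly_active_users["2017-01"])
year_over_year = growth_rate(current, monthly_active_users["2016-02"])
fixed_base = growth_rate(current, monthly_active_users["2016-01"])

print(f"month over month: {month_over_month:.1%}")
print(f"year over year:   {year_over_year:.1%}")
print(f"fixed base:       {fixed_base:.1%}")
```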

Data distribution analysis is an analysis method that groups quantitative data into equal-width or unequal-width intervals according to the purpose of the analysis and studies how the data is distributed across the groups. It is commonly used for consumption distribution, income distribution, age distribution and the like.
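
An equal-width grouping can be sketched as below, using an age distribution as the example. The helper name, the bin width and the ages are hypothetical, and the sketch assumes non-negative integer values.

```python
def equal_width_distribution(values, bin_width):
    """Count how many values fall in each [lower, lower + bin_width) group."""
    counts = {}
    for value in values:
        lower = (value // bin_width) * bin_width
        key = (lower, lower + bin_width)
        counts[key] = counts.get(key, 0) + 1
    return dict(sorted(counts.items()))  # groups in ascending order


ages = [18, 23, 25, 31, 37, 38, 45]  # hypothetical customer ages
distribution = equal_width_distribution(ages, bin_width=10)
for (lower, upper), count in distribution.items():
    print(f"[{lower}, {upper}): {count}")
```

Unequal-width grouping would replace the computed `lower` bound with a lookup in a list of explicit bin edges.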

In one embodiment, when feature importance analysis is performed, feature importance may be measured, in the binary classification case, using the weight vector of the decision function of a support vector machine (SVM). Feature selection algorithms are designed by combining this weight-vector-based measure of feature importance with forward and backward sequential search strategies. The algorithms that may be employed include the SVM_W_SFS1 algorithm and the SVM_W_SFS2 algorithm. The SVM_W_SFS1 algorithm trains a support vector machine using all features to obtain the weight vector of its decision function, sorts the features in descending order of the absolute values of their weight-vector components so that the features contributing most come first, and then, by forward sequential search, adds the features one at a time in that order, recording the classification accuracy on the training set each time, until the last feature has been added.

The SVM_W_SFS2 algorithm trains a support vector machine using all features to obtain the weight vector of its decision function, selects the feature whose weight-vector component has the largest absolute value, adds it to the selected feature subset (initially empty) and at the same time deletes it from the remaining feature set (initially the full feature set); it then trains on the training set using only the selected features and records the classification accuracy on the training set. Next, it trains on the training set using only the features in the remaining feature set, moves the feature with the largest weight-vector component from the remaining feature set to the selected feature subset, trains on the training set using only the selected features to obtain the classification accuracy on the training set, and repeats these steps until the remaining feature set is empty.

Association analysis finds the associations or correlations present in the related data, and thereby describes the rules and patterns by which certain attributes of a thing occur together. For example, in a shopping basket analysis task, a customer's buying habits are analyzed by finding the connections between the different items placed in the shopping basket. By revealing which goods customers frequently purchase together, such associations can help retailers formulate marketing strategies, such as pricing design, sales promotions, product placement and customer segmentation based on purchasing patterns. Commonly used association analysis algorithms include the Apriori algorithm, which uses the Apriori property to generate candidate itemsets; this reduces the size of the candidate sets that must be checked for frequent itemsets and so speeds up association analysis.
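
A minimal sketch of frequent-itemset mining with the Apriori property is shown below: a candidate itemset of size k is kept only if every one of its subsets of size k-1 is already known to be frequent. The baskets and the support threshold are hypothetical, and rule generation (confidence, lift) is omitted.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    total = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / total

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
frequent = apriori(baskets, min_support=0.6)
for itemset, value in sorted(frequent.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), value)
```

With these baskets, the only frequent pair is bread and milk, so no three-item candidate is ever generated or counted.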

Lifecycle analysis is a means of evaluating the overall environmental impact of a product or a class of facilities "from cradle to grave", looking at problems from a regional, national, or even global breadth and from the height of sustainable development. For example, the period from a user's first contact with a product or service, through download and registration, to the final uninstallation and churn is one life cycle. Based on the result of the life cycle analysis, the motivation behind each user behavior can be mined, so as to encourage users to prefer the product and to slow user churn.

Referring to fig. 5, fig. 5 is a flowchart of the method following the analysis performed in the analysis task based on the related data to obtain a demand analysis result, according to an embodiment of the present application. Optionally, after step S3, the method further includes:

Step S4: selecting a prediction model set based on the demand analysis result.

It will be appreciated that different prediction model sets may be selected according to different actual needs. One option is the SARIMA (Seasonal Autoregressive Integrated Moving Average) model, which is one of the methods of time series prediction analysis; besides the SARIMA model, the Holt-Winters method, which is also a time series analysis and prediction method, can be selected. Specifically, the Holt-Winters method is applicable to non-stationary sequences containing linear trends and periodic fluctuations. It uses exponential smoothing to let the model parameters continuously adapt to changes in the non-stationary sequence, and makes short-term forecasts of future trends. The Holt-Winters method introduces seasonal terms on the basis of the Holt model and can be used to handle fixed-period fluctuations in time series such as monthly, quarterly and weekly data.
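A minimal sketch of the Holt-Winters idea is given below, assuming the additive variant (the application does not state which variant is used). The function name, the simple initialization scheme, and the smoothing parameters `alpha`, `beta` and `gamma` are assumptions of this example.

```python
def holt_winters_additive(series, season_len, alpha, beta, gamma, horizon):
    """Additive Holt-Winters smoothing; returns `horizon` forecasts."""
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons to initialise")
    # Initial level: mean of the first season.
    level = sum(series[:season_len]) / season_len
    # Initial trend: average per-step change between the first two seasons.
    second_mean = sum(series[season_len:2 * season_len]) / season_len
    trend = (second_mean - level) / season_len
    # Initial seasonal terms: deviation of the first season from its mean.
    seasonal = [x - level for x in series[:season_len]]
    for t in range(season_len, len(series)):
        s = seasonal[t % season_len]
        prev_level = level
        # Exponentially smooth the level, the trend and the seasonal term.
        level = alpha * (series[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % season_len] = gamma * (series[t] - level) + (1 - gamma) * s
    n = len(series)
    return [
        level + (h + 1) * trend + seasonal[(n + h) % season_len]
        for h in range(horizon)
    ]


# Hypothetical demand history with a fixed period of 4 and no trend.
history = [10, 20, 30, 40] * 3
forecast = holt_winters_additive(
    history, season_len=4, alpha=0.5, beta=0.5, gamma=0.5, horizon=4
)
```

For this purely periodic history the forecast reproduces the seasonal pattern; on real data the three smoothing parameters would normally be tuned against a validation period.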

Referring to fig. 6, fig. 6 is a flowchart of selecting a prediction model set based on a result of demand analysis according to an embodiment of the present application. Optionally, step S4 is divided into the following sub-steps:

Step S41: determining a candidate model set based on the demand analysis result.

Step S42: performing feature extraction, model training and model evaluation on each candidate model in the candidate model set to obtain an evaluation result of each candidate model.

Step S43: performing model fusion on the candidate models in the candidate model set whose evaluation results meet a first preset condition, so as to obtain a first fusion model.

It can be appreciated that, in order to improve the generalization ability of the model, the models in the candidate model set are evaluated and fused. The model evaluation metric may be selected from MAPE (Mean Absolute Percentage Error), RMSE (Root Mean Square Error), MAE (Mean Absolute Error) and other metrics, and the model fusion algorithm may be selected from the Boosting algorithm, the Bagging (Bootstrap aggregating) algorithm, the Stacking algorithm, the Blending algorithm and other algorithms. For example, the MAPE may be an average MAPE, and the calculation formula of the average MAPE is as follows:

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{y_i}\right|$$

Where n represents the number of predicted values of the model, $\hat{y}_i$ represents the i-th of the n predicted values, and $y_i$ represents the true value corresponding to the i-th predicted value.

The MAE is calculated as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|$$

The RMSE is calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$

For example, in price prediction where the price per square meter is expressed in units of ten thousand yuan, the predicted result is also in ten thousand yuan, so the squared difference is in a different, squared unit. Taking the square root restores the original unit, which makes the error easier to describe. Standard deviation measures the degree of dispersion of a set of numbers, whereas root mean square error measures the deviation between the observed values and the true values; the metrics differ in their objects of study and their purposes, and can be selected according to actual conditions.
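The three evaluation metrics can be sketched in Python using their standard definitions. The function names and the sample values below are assumptions made for illustration.

```python
import math


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (true values must be non-zero)."""
    return 100.0 * sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the data."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the data."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical true demand and predicted demand.
actual = [100, 200, 400]
predicted = [110, 180, 400]
scores = {
    "MAPE": mape(actual, predicted),
    "MAE": mae(actual, predicted),
    "RMSE": rmse(actual, predicted),
}
```

Because RMSE squares each error before averaging, the single error of 20 weighs more heavily in RMSE than in MAE for this sample.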

Among the model fusion algorithms, Bagging focuses on obtaining an ensemble model with smaller variance than its component models, while Boosting and Stacking mainly produce strong models with lower bias than their component models (although variance may also be reduced). In particular:

Boosting is a method for improving the accuracy of weak classification algorithms: it constructs a series of prediction functions and then combines them in some way into a single prediction function. The Boosting algorithm can be used to improve the accuracy of any given learning algorithm. Its working mechanism is as follows. First, a first weak learner is trained on the training set with initial sample weights, and the weights of the training samples are updated according to the learning error rate of this weak learner, so that the training samples that the first weak learner misclassifies receive higher weights and are given more attention by the second weak learner. The second weak learner is then trained on the training set with the adjusted weights. These steps are repeated until the number of weak learners reaches a preset number T, and finally the T weak learners are integrated through an aggregation strategy to obtain the final strong learner.
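The reweighting mechanism described above can be sketched with AdaBoost, one well-known Boosting algorithm, which is chosen here only as an example; the application does not name a specific Boosting variant. The decision stump weak learner, the function names and the toy data are assumptions.

```python
import math


def train_stump(xs, ys, weights):
    """Weak learner: pick the threshold and polarity with the lowest weighted error."""
    best = None
    for thr in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x <= thr else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Train `rounds` stumps; labels in ys must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n  # initial sample weights
    ensemble = []
    for _ in range(rounds):
        err, thr, pol = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # weight of this weak learner
        ensemble.append((alpha, thr, pol))
        # Misclassified samples get larger weights, correct ones smaller.
        weights = [
            w * math.exp(-alpha * y * (pol if x <= thr else -pol))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Aggregate the weak learners by a weighted vote."""
    score = sum(a * (pol if x <= thr else -pol) for a, thr, pol in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single stump can classify correctly.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
train_predictions = [predict(ensemble, x) for x in xs]
```

No single threshold separates this data, but the weighted vote of three stumps classifies every training sample correctly.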

The Bagging algorithm is a typical representative of parallel ensemble learning methods and is directly based on bootstrap sampling. Given a data set containing m samples, one sample is randomly drawn into the sampling set and then put back into the original data set, so that it may still be selected in the next draw. After m such random sampling operations, a sampling set containing m samples is obtained; some samples of the initial training set appear in the sampling set multiple times, while others never appear. About 63.2% of the samples in the initial training set appear in the sampling set. From the bias-variance perspective, Bagging mainly focuses on reducing variance, so its effect is more obvious on learners that are easily disturbed by sample perturbation, such as unpruned decision trees and neural networks.
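The 63.2% figure follows from the probability that a given sample is never drawn in m draws, (1 - 1/m)^m, which tends to 1/e ≈ 0.368 as m grows. A short simulation, with an arbitrarily chosen data size and random seed, illustrates this:

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) samples from data with replacement."""
    return [rng.choice(data) for _ in data]


rng = random.Random(0)  # fixed seed so the run is repeatable
data = list(range(10_000))  # hypothetical training set of m = 10,000 samples
sample = bootstrap_sample(data, rng)

# Fraction of the original samples that appear at least once.
unique_fraction = len(set(sample)) / len(data)
# Theoretical value: 1 - (1 - 1/m)^m, close to 1 - 1/e.
expected_fraction = 1 - (1 - 1 / len(data)) ** len(data)
```

In Bagging, each base learner would be trained on one such sampling set, and the predictions of the base learners would then be averaged or voted on.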

The Stacking algorithm usually considers heterogeneous weak learners, trains them in parallel, and combines them by training a "meta model" that outputs the final prediction result based on the predictions of the different weak models.

The Blending algorithm is a simpler fusion approach. There are many ways to choose the classifiers, and different classifier combinations give different results, so in practice most of the time is spent on selecting the classifiers. Blending is substantially the same as Stacking. The main difference is that the features of the second-stage model are not generated from predicted values obtained by a K-Fold cross-validation (CV) strategy; instead, a holdout set is set aside, for example 10% of the training data, and the second-stage stacker model is fitted on the first-stage models' predicted values for this 10% of the training data.
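The holdout idea in Blending can be sketched as follows. The two first-stage models (a mean predictor and a slope-through-origin predictor), the single blending weight used as the second-stage model, and the synthetic data are all simplifying assumptions chosen to keep the example short.

```python
def fit_mean_model(xs, ys):
    """First-stage model A: always predicts the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_slope_model(xs, ys):
    """First-stage model B: least-squares line through the origin."""
    k = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return lambda x: k * x


def fit_blend_weight(preds_a, preds_b, ys):
    """Second-stage model: weight w minimising sum((y - (w*a + (1-w)*b))^2)."""
    num = sum((y - b) * (a - b) for a, b, y in zip(preds_a, preds_b, ys))
    den = sum((a - b) ** 2 for a, b in zip(preds_a, preds_b))
    return num / den if den else 0.5


def sse(preds, ys):
    """Sum of squared errors."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys))


# Synthetic data; the last 10% is set aside as the holdout set.
xs = list(range(1, 21))
ys = [2 * x + 1 for x in xs]
n_holdout = max(1, len(xs) // 10)
train_x, train_y = xs[:-n_holdout], ys[:-n_holdout]
hold_x, hold_y = xs[-n_holdout:], ys[-n_holdout:]

# Stage 1: fit the base models on the training part only.
model_a = fit_mean_model(train_x, train_y)
model_b = fit_slope_model(train_x, train_y)
hold_a = [model_a(x) for x in hold_x]
hold_b = [model_b(x) for x in hold_x]

# Stage 2: fit the blender on the base models' holdout predictions.
w = fit_blend_weight(hold_a, hold_b, hold_y)
blended = [w * a + (1 - w) * b for a, b in zip(hold_a, hold_b)]
```

Because the weight is fitted on the holdout predictions, the blended error on the holdout set cannot exceed that of either base model; performance on new data would still need a separate test set.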

Therefore, in this embodiment the fusion algorithm can be selected according to the characteristics of algorithms such as Boosting, Bagging, Stacking and Blending, combined with the specific requirements of demand prediction.

Step S44: taking the fusion model and each candidate model that meets the first preset condition as prediction models in the candidate prediction model set.

The candidate prediction model set may be tested with a test sequence. As an embodiment, the first preset condition may be an accuracy exceeding 90%, which is used to screen the candidate prediction model set. The test sequence is a known sequence. After the test sequence is input into each candidate prediction model in the candidate prediction model set, different result sequences are output; by comparing the input sequence with each result sequence, the accuracy of each candidate prediction model can be obtained. The test sequence can be set according to the actual situation.

Step S45: performing feature extraction, model retraining and model reevaluation on each prediction model in the candidate prediction model set to obtain a reevaluation result of each prediction model.

Step S46: selecting, from the prediction models, the prediction models whose reevaluation results meet a second preset condition, so as to obtain the candidate prediction model set.

The model evaluation fusion method in step S45 and step S46 is similar to that in step S42 and step S43, and will not be described in detail here. The second preset condition may be similar to the first preset condition, and may limit the accuracy, or may be selected according to the model calculation time.

It will be appreciated that in the model evaluation and fusion in step S42, step S43, step S45 and step S46, when there is only one model to be fused, that single model is taken as the result of model fusion.
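The flow of steps S42 to S46 can be sketched as follows. This is only an illustration: the accuracy measure (one minus the mean absolute percentage error), fusion by simple averaging, the thresholds, and the toy candidate models are assumptions, and feature extraction and retraining are omitted.

```python
def accuracy(model, xs, ys):
    """Assumed accuracy measure: 1 minus the mean absolute percentage error."""
    errors = [abs((model(x) - y) / y) for x, y in zip(xs, ys)]
    return 1.0 - sum(errors) / len(errors)


def fuse(models):
    """Fuse by averaging outputs; a single model is returned unchanged."""
    if len(models) == 1:
        return models[0]
    return lambda x: sum(m(x) for m in models) / len(models)


def select_prediction_models(candidates, xs, ys, first_threshold, second_threshold):
    # S42: evaluate every candidate model.
    scores = {name: accuracy(m, xs, ys) for name, m in candidates.items()}
    # S43: fuse the candidates that meet the first preset condition.
    passed = {n: candidates[n] for n, s in scores.items() if s > first_threshold}
    if not passed:
        return {}
    # S44: the fusion model joins the candidates that passed.
    pool = dict(passed)
    pool["fusion"] = fuse(list(passed.values()))
    # S45: re-evaluate each model in the pool (retraining omitted here).
    rescored = {n: accuracy(m, xs, ys) for n, m in pool.items()}
    # S46: keep the models that meet the second preset condition.
    return {n: pool[n] for n, s in rescored.items() if s > second_threshold}


# Hypothetical test sequence and candidate models.
xs = [1, 2, 3, 4]
ys = [10, 20, 30, 40]
candidates = {
    "low": lambda x: 9.5 * x,   # underestimates by 5%
    "high": lambda x: 10.5 * x,  # overestimates by 5%
    "bad": lambda x: 5.0 * x,   # underestimates by 50%
}
selected = select_prediction_models(
    candidates, xs, ys, first_threshold=0.9, second_threshold=0.97
)
```

Here the two candidates that pass the first condition have opposite errors, so their average passes the stricter second condition while neither does alone.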

With continued reference to fig. 5, step S5: obtaining a demand prediction result through the prediction model.

After step S5, as an embodiment, confidence verification and distribution fitting may be performed on the obtained demand prediction result, so as to prevent the deviation between the analysis result and the prediction result from being too large. The confidence verification includes verification by standard deviation, quantiles and the like, and the distribution fitting may use a normal distribution, a Poisson distribution or a gamma distribution.
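One possible form of such a check, assuming a normal distribution fitted to historical demand and a two-sided quantile interval, is sketched below using Python's standard `statistics.NormalDist`. The function name, the 95% confidence level and the data are assumptions.

```python
from statistics import NormalDist


def check_prediction(history, prediction, confidence=0.95):
    """Fit a normal distribution to history and test the prediction against it."""
    dist = NormalDist.from_samples(history)  # sample mean and standard deviation
    tail = (1 - confidence) / 2
    low, high = dist.inv_cdf(tail), dist.inv_cdf(1 - tail)  # quantile bounds
    return low <= prediction <= high, (low, high)


# Hypothetical historical demand values.
history = [98, 102, 100, 97, 103, 101, 99, 100]
plausible, interval = check_prediction(history, 101)
too_large, _ = check_prediction(history, 150)
```

A prediction outside the interval is not necessarily wrong, but it could be flagged for review; for count data with small values, a Poisson or gamma fit may be more appropriate than a normal fit.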

As an embodiment, before step S1, the method further includes step S5: receiving the task information configured by the user.

Referring to fig. 7, fig. 7 is a block diagram of a demand determining system according to an embodiment of the present application.

In order to better implement the demand determining method provided in the present embodiment, the present embodiment further provides a demand determining system 60. The demand determination system 60 includes:

the data and parallel control subsystem 601 is used for triggering an analysis task based on task information configured by a user and acquiring related data matched with the task information;

the algorithm policy engine subsystem 602 is configured to perform analysis in an analysis task based on the relevant data to obtain a requirement analysis result.

Optionally, the algorithm policy engine subsystem 602 is configured to preprocess the related data to obtain preprocessed data; performing feature analysis on the preprocessed data to obtain feature analysis data; and carrying out data analysis on the feature analysis data to obtain a data analysis result.

Optionally, the algorithm policy engine subsystem 602 is specifically further configured to perform missing-value processing on the related data to obtain complete data; and discretizing the complete data to obtain preprocessed data.
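As one possible illustration of this preprocessing, the sketch below fills missing values with the mean of the observed values and discretizes by equal-width binning. Both choices, as well as the function names and the data, are assumptions; the embodiment does not prescribe a particular technique here.

```python
def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def discretize(values, n_bins):
    """Map each value to an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant data
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical related data with two missing entries.
raw = [10.0, None, 30.0, 50.0, None, 90.0]
complete = fill_missing(raw)
bins = discretize(complete, n_bins=4)
```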

Optionally, the algorithm policy engine subsystem 602 is specifically further configured to perform feature selection on the preprocessed data to obtain feature selection data; extracting the characteristics of the characteristic selection data to obtain characteristic extraction data; and carrying out feature combination on the feature extraction data to obtain feature analysis data.

Optionally, the algorithm policy engine subsystem 602 is specifically further configured to perform one or more of trend pattern analysis, data distribution analysis, feature importance analysis, association analysis, and life cycle analysis on the feature analysis data to obtain a data analysis result.

Optionally, the algorithm policy engine subsystem 602 is further configured to select a prediction model set based on the requirement analysis result, and obtain a requirement prediction result through the prediction model.

Optionally, the algorithm policy engine subsystem 602 is further configured to determine a candidate model set based on the demand analysis result; extracting features, training the models and evaluating the models of each candidate model in the candidate model set to obtain evaluation results of each candidate model; carrying out model fusion on candidate models, of which the evaluation results meet a first preset condition, in the candidate model set to obtain a first fusion model; taking the fusion model and each candidate model meeting the first preset condition as a prediction model in a candidate prediction model set; extracting features of each prediction model in the candidate prediction model set, retraining the model, and reevaluating the model to obtain reevaluation results of each prediction model; and selecting a prediction model with the reevaluation result meeting a second preset condition from the prediction models to obtain a candidate prediction model set.
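The division of work between the two subsystems can be sketched as two small classes. The class names mirror the subsystem names, but the method names, the record layout and the trivial "analysis" (a count and a mean) are hypothetical and serve only to show how task information drives data selection.

```python
class DataAndParallelControlSubsystem:
    """Triggers an analysis task and fetches the data matching the task info."""

    def __init__(self, data_store):
        self.data_store = data_store  # hypothetical: a list of record dicts

    def trigger(self, task_info):
        related = [
            record for record in self.data_store
            if all(record.get(key) == value for key, value in task_info.items())
        ]
        return {"task_info": task_info, "related_data": related}


class AlgorithmPolicyEngineSubsystem:
    """Analyzes the related data of a task to produce a demand analysis result."""

    def analyze(self, task):
        quantities = [record["quantity"] for record in task["related_data"]]
        if not quantities:
            return {"count": 0, "mean_quantity": None}
        return {
            "count": len(quantities),
            "mean_quantity": sum(quantities) / len(quantities),
        }


# Hypothetical sales records and user-configured task information.
records = [
    {"store": "A", "sku": "milk", "quantity": 12},
    {"store": "A", "sku": "milk", "quantity": 18},
    {"store": "B", "sku": "milk", "quantity": 7},
]
control = DataAndParallelControlSubsystem(records)
engine = AlgorithmPolicyEngineSubsystem()
task = control.trigger({"store": "A", "sku": "milk"})
result = engine.analyze(task)
```

Changing only the task information, for example to `{"store": "B"}`, selects different related data and yields a different analysis result without changing either subsystem.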

The embodiment also provides an electronic device, which includes a memory and a processor, where the memory stores program instructions, and the processor performs the steps of any of the methods described above when executing the program instructions.

The present embodiment also provides a storage medium having stored therein computer program instructions which, when executed by a processor, perform the steps of any of the methods described above.

Alternatively, the electronic device may be a personal computer (personal computer, PC), tablet, smart phone, personal digital assistant (personal digital assistant, PDA), or the like.

In summary, the embodiment of the application provides a demand determining method, which includes: triggering an analysis task based on task information configured by a user; acquiring related data matched with the task information; and analyzing in the analysis task based on the related data to obtain a demand analysis result.

In the implementation process, various different kinds of requirement determination tasks can be processed, and requirement determination can be performed on different objects, so that the universality of the requirement determination method can be improved.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. The apparatus embodiments described above are merely illustrative, for example, block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of devices according to various embodiments of the present application. In this regard, each block in the block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams, and combinations of blocks in the block diagrams, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. The present embodiment therefore also provides a readable storage medium having stored therein computer program instructions which, when read and executed by a processor, perform the steps of any one of the demand determining methods described above. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.

The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims

1. A demand determining method, the method comprising:

triggering an analysis task based on task information configured by a user;

acquiring related data matched with the task information;

when the analysis task is executed, analyzing based on the related data to obtain a demand analysis result;

after the analysis task performs analysis based on the related data to obtain a requirement analysis result, the method further includes:

selecting a prediction model set based on the demand analysis result;

obtaining a demand prediction result through the prediction model;

the selecting a prediction model set based on the demand analysis result includes:

determining a candidate model set based on the demand analysis result;

performing feature extraction, model training and model evaluation on each candidate model in the candidate model set to obtain an evaluation result of each candidate model;

carrying out model fusion on candidate models of which the evaluation results meet a first preset condition in the candidate model set to obtain a fusion model;

taking the fusion model and each candidate model meeting a first preset condition as a prediction model in a candidate prediction model set;

performing feature extraction, model retraining and model reevaluation on each prediction model in the candidate prediction model set to obtain reevaluation results of each prediction model;

and selecting the prediction model of which the reevaluation result meets a second preset condition from the prediction models to perform model fusion, so as to obtain the candidate prediction model set.

2. The method of claim 1, wherein the analyzing, when the analyzing task is performed, based on the related data to obtain a demand analysis result, comprises:

preprocessing the related data to obtain preprocessed data;

performing feature analysis on the preprocessed data to obtain feature analysis data;

and carrying out data analysis on the characteristic analysis data to obtain a data analysis result.

3. The method of claim 2, wherein preprocessing the related data to obtain preprocessed data comprises:

performing missing-value processing on the related data to obtain complete data;

and discretizing the complete data to obtain the preprocessing data.

4. The method of claim 2, wherein the performing the feature analysis on the pre-processed data to obtain feature analysis data comprises:

performing feature selection on the preprocessed data to obtain feature selection data;

extracting the characteristics of the characteristic selection data to obtain characteristic extraction data;

and carrying out feature combination on the feature extraction data to obtain the feature analysis data.

5. The method according to claim 2, wherein the performing data analysis on the feature analysis data to obtain the data analysis result includes:

and carrying out one or more of trend mode analysis, data distribution analysis, feature importance analysis, association relation analysis and life cycle analysis on the feature analysis data to obtain the data analysis result.

6. A demand determination system, wherein the system comprises a data and parallel control subsystem and an algorithm policy engine subsystem;

the data and parallel control subsystem is used for triggering an analysis task based on task information configured by a user and acquiring related data matched with the task information;

the algorithm policy engine subsystem is used for analyzing based on the related data to obtain a demand analysis result when the analysis task is executed;

the algorithm policy engine subsystem is also used for selecting a prediction model set based on the demand analysis result; obtaining a demand prediction result through the prediction model;

the algorithm policy engine subsystem is specifically configured to determine a candidate model set based on the requirement analysis result; performing feature extraction, model training and model evaluation on each candidate model in the candidate model set to obtain an evaluation result of each candidate model; carrying out model fusion on candidate models of which the evaluation results meet a first preset condition in the candidate model set to obtain a fusion model; taking the fusion model and each candidate model meeting a first preset condition as a prediction model in a candidate prediction model set; performing feature extraction, model retraining and model reevaluation on each prediction model in the candidate prediction model set to obtain reevaluation results of each prediction model; and selecting the prediction model of which the reevaluation result meets a second preset condition from the prediction models to perform model fusion, so as to obtain the candidate prediction model set.

7. An electronic device comprising a memory and a processor, the memory having stored therein program instructions which, when executed by the processor, perform the steps of the method of any of claims 1-5.

8. A storage medium having stored therein computer program instructions which, when executed by a processor, perform the steps of the method of any of claims 1-5.