CN111177657A - Demand determination method, demand determination system, electronic device, and storage medium

Info

Publication number: CN111177657A
Application number: CN201911425328.7A
Authority: CN
Inventors: 杨秋源; 周超; 许平; 牛世雄; 徐明泉
Original assignee: Beijing SF Intra City Technology Co Ltd
Current assignee: Beijing SF Intra City Technology Co Ltd
Priority date: 2019-12-31
Filing date: 2019-12-31
Publication date: 2020-05-19
Anticipated expiration: 2039-12-31
Also published as: CN111177657B

Abstract

The application provides a demand determination method, a demand determination system, an electronic device, and a storage medium, and relates to the technical field of supply chains. The method comprises the following steps: triggering an analysis task based on task information configured by a user; acquiring related data matched with the task information; and, when executing the analysis task, performing analysis based on the related data to obtain a demand analysis result. The application can screen out the related data matching different task information so as to obtain analysis results under different tasks, can process various different types of demand determination tasks, can determine demands for different objects, and can improve the universality of the demand determination method.

Description

Demand determination method, demand determination system, electronic device, and storage medium

Technical Field

The present application relates to the field of supply chain technologies, and in particular, to a demand determination method, a demand determination system, an electronic device, and a storage medium.

Background

In the technical field of supply chains, accurate demand determination (including demand analysis and demand prediction) needs to be applied to links such as store demand prediction and warehouse demand prediction, in order to support subsequent replenishment plans and production plans. Current demand determination mainly analyzes a single kind of goods, for example, a certain kind of goods in a certain store. It cannot adapt to various goods and various demand determination tasks, so it suffers from low universality.

Disclosure of Invention

Embodiments of the present application provide a demand determination method, a demand determination system, an electronic device, and a storage medium, so as to solve the problem that the current demand determination method is low in universality.

An embodiment of the present application provides a demand determination method, including: triggering an analysis task based on task information configured by a user; acquiring related data matched with the task information; and, when executing the analysis task, performing analysis based on the related data to obtain a demand analysis result.

In the implementation process, different types of analysis tasks can be triggered based on the task information configured by the user, so that various different types of demand determination tasks can be processed and demands can be determined for different objects, which improves the universality of the demand determination method.

Optionally, when the analysis task is executed, performing analysis based on the related data to obtain a demand analysis result includes: preprocessing the related data to obtain preprocessed data; performing feature analysis on the preprocessed data to obtain feature analysis data; and performing data analysis on the feature analysis data to obtain a data analysis result.

In the implementation process, the related data is preprocessed, so that the completeness and the accuracy of the preprocessed data can be improved, the completeness and the accuracy of the feature analysis data can be improved, and the accuracy of the data analysis result can be further improved.

Optionally, the preprocessing the related data to obtain preprocessed data includes: performing missing-value processing on the related data to obtain complete data; and performing discretization processing on the complete data to obtain the preprocessed data.

In the implementation process, missing-value processing fills in the missing parts of the related data, so that the integrity of the related data can be improved. The discretization processing can compress the data size of the complete data to obtain the preprocessed data, which improves the efficiency of operations on the discretized data.

Optionally, the performing feature analysis on the preprocessed data to obtain feature analysis data includes: performing feature selection on the preprocessed data to obtain feature selection data; carrying out feature extraction on the feature selection data to obtain feature extraction data; and carrying out feature combination on the feature extraction data to obtain the feature analysis data.

In the implementation process, after feature selection, feature extraction and feature combination are performed on the preprocessed data, data dimensionality can be reduced and feature classification performance can be optimized, thereby improving the accuracy of data analysis.

Optionally, the performing data analysis on the feature analysis data to obtain the data analysis result includes: performing one or more of trend pattern analysis, data distribution analysis, feature importance analysis, association relation analysis and life cycle analysis on the feature analysis data to obtain the data analysis result.

In the implementation process, a plurality of analysis types such as trend pattern analysis, data distribution analysis, feature importance analysis, association relation analysis and life cycle analysis are provided, and different types of analysis can be performed based on different requirements, so that the universality of data analysis is improved, and the accuracy of a data analysis result can be improved.

Optionally, after performing analysis based on the related data when executing the analysis task to obtain a demand analysis result, the method further includes: selecting a prediction model set based on the demand analysis result; and obtaining a demand prediction result through the prediction model set.

In the implementation process, because the demand analysis is universal across different tasks, predicting demand based on the demand analysis result also improves the universality of demand prediction.

Optionally, the selecting a prediction model set based on the demand analysis result includes: determining a candidate model set based on the demand analysis result; performing feature extraction, model training and model evaluation on each candidate model in the candidate model set to obtain an evaluation result of each candidate model; performing model fusion on the candidate models in the candidate model set whose evaluation results meet a first preset condition to obtain a first fusion model; taking the first fusion model and each candidate model meeting the first preset condition as prediction models in a candidate prediction model set; performing feature extraction, model retraining and model re-evaluation on each prediction model in the candidate prediction model set to obtain a re-evaluation result of each prediction model; and selecting, from the prediction models, those whose re-evaluation results meet a second preset condition and performing model fusion on them to obtain the prediction model set.

In the implementation process, a candidate model set matched with the demand analysis result is selected based on the demand analysis result, and model training, model evaluation and model fusion are performed. The prediction model set finally obtained after two rounds of screening, first against the first preset condition and then against the second preset condition, can improve prediction accuracy.

The embodiment of the application also provides a demand determination system, which comprises a data and parallel control subsystem and an algorithm strategy engine subsystem; the data and parallel control subsystem is used for triggering an analysis task based on task information configured by a user and acquiring related data matched with the task information; the algorithm strategy engine subsystem is used for carrying out analysis based on the related data to obtain a demand analysis result when the analysis task is executed.

Optionally, the algorithm policy engine subsystem is specifically configured to preprocess the related data to obtain preprocessed data; perform feature analysis on the preprocessed data to obtain feature analysis data; and perform data analysis on the feature analysis data to obtain a data analysis result.

Optionally, the algorithm policy engine subsystem is further specifically configured to perform missing-value processing on the related data to obtain complete data; and perform discretization processing on the complete data to obtain the preprocessed data.

In the implementation process, missing-value processing fills in the missing parts of the related data, so that the integrity of the related data can be improved. The discretization processing can compress the data size of the complete data to obtain the preprocessed data, which improves the efficiency of operations on the preprocessed data.

Optionally, the algorithm policy engine subsystem is further specifically configured to perform feature selection on the preprocessed data to obtain feature selection data; carrying out feature extraction on the feature selection data to obtain feature extraction data; and carrying out feature combination on the feature extraction data to obtain the feature analysis data.

Optionally, the algorithm policy engine subsystem is further specifically configured to perform one or more of trend pattern analysis, data distribution analysis, feature importance analysis, association analysis, and life cycle analysis on the feature analysis data to obtain the data analysis result.

In the implementation process, various feature analysis types such as trend pattern analysis, data distribution analysis, feature importance analysis, association relation analysis, life cycle analysis and the like are provided, and different types of analysis can be performed based on different requirements, so that the universality of data analysis is improved, and the accuracy of a data analysis result can be improved.

Optionally, the algorithm policy engine subsystem is further configured to select a prediction model set based on the demand analysis result; and obtaining a demand prediction result through the prediction model.

Optionally, the algorithm policy engine subsystem is further configured to determine a candidate model set based on the demand analysis result; perform feature extraction, model training and model evaluation on each candidate model in the candidate model set to obtain an evaluation result of each candidate model; perform model fusion on the candidate models in the candidate model set whose evaluation results meet a first preset condition to obtain a first fusion model; take the first fusion model and each candidate model meeting the first preset condition as prediction models in a candidate prediction model set; perform feature extraction, model retraining and model re-evaluation on each prediction model in the candidate prediction model set to obtain a re-evaluation result of each prediction model; and select, from the prediction models, those whose re-evaluation results meet a second preset condition and perform model fusion on them to obtain the prediction model set.

An embodiment of the present application further provides an electronic device, where the electronic device includes a memory and a processor, where the memory stores program instructions, and the processor executes the program instructions to perform the steps in the above method.

Embodiments of the present application also provide a storage medium, in which computer program instructions are stored, and when the computer program instructions are executed by a processor, the steps in the above method are executed.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

Fig. 1 is a flowchart of a demand determination method according to an embodiment of the present application.

Fig. 2 is a flowchart illustrating analysis performed based on relevant data in an analysis task to obtain a demand analysis result according to an embodiment of the present application.

Fig. 3 is a flowchart for preprocessing related data to obtain preprocessed data according to an embodiment of the present disclosure.

Fig. 4 is a flowchart for performing feature analysis on preprocessed data to obtain feature analysis data according to an embodiment of the present disclosure.

Fig. 5 is a flowchart of a method after performing analysis based on relevant data in an analysis task to obtain a demand analysis result according to an embodiment of the present application.

Fig. 6 is a flowchart of selecting a prediction model set based on a result of demand analysis according to an embodiment of the present application.

Fig. 7 is a block diagram of a demand determination system according to an embodiment of the present application.

Reference numerals: 60 - demand determination system; 601 - data and parallel control subsystem; 602 - algorithm policy engine subsystem.

Detailed Description

The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.

In the description of the present application, it is noted that the terms "first", "second", and the like are used merely for distinguishing between descriptions and are not intended to indicate or imply relative importance.

Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and drawings.

An embodiment of the present application provides a demand determination method. Referring to fig. 1, fig. 1 is a flowchart of the demand determination method according to the embodiment of the present application. The method comprises the following steps:

Step S1: triggering an analysis task based on the task information configured by the user.

The task information is the user's demand information, for example, analyzing the sales of brand A in the first quarter. In one embodiment, the user may submit the task information by text input, voice input, or the like. It will be appreciated that different task information may trigger different analysis tasks. In practice, demand analysis can be carried out on goods of different types, brands, time periods, regions, suppliers, and the like.

Step S2: acquiring related data matched with the task information.

In practice, when demand analysis is performed on an item, the related data includes sales information, promotion information, weather information, public opinion data, and the like for the item. For example, when analyzing the demand for brand A, the sales information, promotion information, weather information, and public opinion data related to brand A are acquired, where the public opinion data is users' evaluation information about brand A.

The related data may be matched to the task information by performing field matching on specific fields in the task information, with the related data corresponding to each specific field configured in advance, so that the related data can be looked up based on the task information.
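The field matching described above can be sketched as a simple lookup. This is only an illustrative sketch: the field names, the data source names, and the function `match_related_data` are hypothetical and do not come from the application, and a real system would likely use a more robust matching scheme than substring search.

```python
# Hypothetical pre-configured mapping from specific fields to related data sources.
FIELD_TO_SOURCES = {
    "sales": ["sales_information", "promotion_information"],
    "brand": ["public_opinion_data"],
    "quarter": ["weather_information"],
}


def match_related_data(task_information):
    """Return the data sources whose field occurs in the task information text."""
    text = task_information.lower()
    matched = []
    for field, sources in FIELD_TO_SOURCES.items():
        if field in text:
            for source in sources:
                if source not in matched:  # keep each source only once
                    matched.append(source)
    return matched


sources = match_related_data("Analyze brand A sales in the first quarter")
print(sources)
```

Task information that contains none of the configured fields yields an empty list, which a caller could treat as "no analysis task can be served".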

Step S3: when executing the analysis task, performing analysis based on the related data to obtain a demand analysis result.

It can be understood that, once the related data matched with the task information is obtained, demand analysis is performed based on the related data, so different demand analyses can be completed for different task information to obtain the corresponding demand analysis results. The method can process various different types of demand determination tasks and determine demands for different objects, which improves the universality of the demand determination method.

Referring to fig. 2, fig. 2 is a flowchart of performing analysis based on related data when executing an analysis task to obtain a demand analysis result according to an embodiment of the present application. Optionally, step S3 is divided into the following sub-steps:

Step S31: preprocessing the related data to obtain preprocessed data.

Referring to fig. 3, fig. 3 is a flowchart of preprocessing related data to obtain preprocessed data according to an embodiment of the present disclosure. Optionally, step S31 may be divided into the following sub-steps:

Step S31.1: performing missing-value processing on the related data to obtain complete data.

Step S31.2: performing discretization processing on the complete data to obtain preprocessed data.

In step S31.1, related data may be missing for various reasons, which mainly fall into mechanical causes and human causes. Mechanical causes are failures of data collection or storage, such as a storage failure, memory corruption, or a mechanical fault that prevents data from being collected for a certain period of time (in the case of timed data collection). Human causes are subjective mistakes, historical limitations, or intentional concealment, such as an interviewee in a market survey refusing to answer a relevant question or giving an invalid answer, or data entry personnel omitting data by mistake.

Missing values arising from the above causes can be handled by deleting the samples with missing values or by interpolating the missing values. When data is missing for human causes, human involvement affects the authenticity of the data, and the true values of the other attributes of a sample with a missing value cannot be guaranteed, so interpolation that depends on those attribute values is also unreliable. Therefore, for data missing due to human causes, missing value interpolation is generally not recommended, and the samples with missing values are generally deleted. Missing value interpolation is mainly aimed at data missing due to mechanical causes, for which the interpolation is more reliable.

Methods for deleting samples with missing values include the simple deletion method and the weighting method. The simple deletion method deletes the samples that have missing values; if the missing data problem can be resolved by deleting only a small part of the samples, this method is the most effective. When the missing values are not missing completely at random (that is, the missingness is related to the value of the incomplete variable itself; such missing values cause the related data to lose a large amount of useful information, make the uncertainty in the related data more pronounced, and make the deterministic component of the related data harder to grasp), the bias can be reduced by weighting the complete data. After the incomplete data samples are marked, the complete data samples are given different weights. If the explanatory variables include variables that are decisive for estimating the weights, this approach can effectively reduce bias.

Missing value interpolation methods include mean interpolation: if the missing value is a numerical attribute, it is filled with the mean of the values of that attribute in all other objects; if the missing value is a non-numerical attribute, it is filled, according to the statistical mode principle, with the value of that attribute that appears most frequently in all other objects. Missing value interpolation methods further include least squares interpolation, which completes the related data based on the principle of least squares so that the overall sum of squared deviations of the completed related data is minimized, which can improve the accuracy of the related data.
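As a minimal sketch of the mean and mode interpolation just described, the following assumes that each attribute is held as a Python list and that a missing value is represented by `None`. The function name `impute_column` and the sample values are hypothetical.

```python
from collections import Counter


def impute_column(values):
    """Fill None entries with the mean (numeric column) or the mode (otherwise)."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to infer a fill value from
    if all(isinstance(v, (int, float)) for v in observed):
        fill = sum(observed) / len(observed)  # mean of the observed values
    else:
        fill = Counter(observed).most_common(1)[0][0]  # most frequent value
    return [fill if v is None else v for v in values]


daily_sales = impute_column([10, None, 14, 12])
weather = impute_column(["sunny", "rain", None, "sunny"])
print(daily_sales, weather)
```

The fill value is computed only from the observed entries, which matches "all other objects" in the description; least squares interpolation is not covered by this sketch.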

In order to improve the time and space efficiency of operations and to improve the classification and clustering capability and noise immunity of the complete data, discretization processing needs to be performed on the complete data obtained after missing-value processing. One-hot encoding may be used for the discretization: the values of a discrete feature of the complete data are extended into Euclidean space, and each value of the discrete feature corresponds to a point in that space. Using one-hot encoding for discrete features allows the distance between features to be calculated more reasonably. After a discrete feature is one-hot encoded, the encoded feature can be regarded as a continuous feature in each dimension. One-hot encoding addresses the problem that classifiers do not handle categorical attribute data well, and it also expands the feature set to a certain extent. Each encoded value is only 0 or 1, and each category is stored in its own dimension; for example, a color feature with the values [red, blue] can be one-hot encoded.
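To make the one-hot encoding concrete, the sketch below encodes a small color feature. The function name `one_hot_encode` is hypothetical, and ordering the categories alphabetically is an assumption made here only so that the result is deterministic.

```python
def one_hot_encode(values):
    """Map each category to a 0/1 vector with exactly one 1 per sample."""
    categories = sorted(set(values))  # assumed ordering: alphabetical
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # the dimension belonging to this category
        encoded.append(row)
    return categories, encoded


categories, encoded = one_hot_encode(["red", "blue", "red"])
print(categories, encoded)
```

With this encoding, any two different categories are the same Euclidean distance apart, which is why distances between encoded features are more reasonable than distances between arbitrary integer codes.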

With continuing reference to fig. 1 and fig. 4, fig. 4 is a flowchart illustrating a feature analysis performed on preprocessed data to obtain feature analysis data according to an embodiment of the present disclosure.

Step S32: performing feature analysis on the preprocessed data to obtain feature analysis data.

To estimate a function of several variables to a given accuracy, the required number of samples grows exponentially as the number of dimensions increases, which is known as the curse of dimensionality. Dimensionality reduction is used to overcome the curse of dimensionality, obtain essential features, reduce the complexity of data processing, save storage space, remove useless noise, and enable data visualization. To reduce dimensionality, feature selection and feature extraction are required.

Optionally, step S32 may be divided into the following sub-steps:

Step S32.1: performing feature selection on the preprocessed data to obtain feature selection data.

It will be appreciated that feature selection does not change the meaning of the preprocessed data; it only screens the features, retaining those that have a greater impact on the target. Feature selection selects a subset of the features of the preprocessed data, that is, the result is contained in the original feature set, and the original feature space is not altered. Common feature selection methods include the Filter method, whose main idea is to score the feature of each dimension, that is, to assign each feature a weight representing its importance, and then to sort the features by weight; and the Wrapper method, whose main idea is to treat feature selection on the preprocessed data as a search optimization problem, in which different feature combinations are generated, evaluated, and compared with other combinations. Because feature selection is thus treated as an optimization problem, in addition to the two methods above, other algorithms such as GA (Genetic Algorithm), PSO (Particle Swarm Optimization), and DE (Differential Evolution) may be selected according to specific requirements.
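One possible instance of the Filter method is sketched below. The scoring function is an assumption: the application does not say how features are scored, so the absolute Pearson correlation with the target is used here purely for illustration, and the feature names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def filter_select(features, target, k):
    """Score every feature, sort by score, and keep the top k feature names."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


features = {
    "promotion": [0, 1, 0, 1, 1],
    "temperature": [20, 22, 21, 19, 23],
    "store_id": [7, 7, 7, 7, 7],  # constant, carries no information
}
sales = [10, 20, 11, 19, 21]
selected, scores = filter_select(features, sales, k=2)
print(selected)
```

The selected features are a subset of the original ones and their values are unchanged, which is the property that distinguishes feature selection from feature extraction.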

Step S32.2: performing feature extraction on the feature selection data to obtain feature extraction data.

Optionally, the feature extraction methods in this embodiment include PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), and the like.

PCA is a statistical method that uses an orthogonal transformation to convert a set of possibly correlated variables into a set of linearly uncorrelated variables, which are called the principal components. In many cases there is some correlation between variables, and when two variables are correlated, the information they reflect about the analysis task can be interpreted as partially overlapping. Principal component analysis removes redundant variables (closely related variables) from all the originally proposed variables and constructs as few new variables as possible, such that the new variables are pairwise uncorrelated and retain as much of the original information about the analysis task as possible.

LDA is a supervised dimensionality reduction technique, whereas PCA is an unsupervised one. LDA takes class labels into account when reducing dimensionality, so that after projection the within-class variance is minimized and the between-class variance is maximized.

Step S32.3: performing feature combination on the feature extraction data to obtain feature analysis data.

Illustratively, a composite feature may be formed by combining individual features (in this embodiment, by multiplication or by Cartesian product). Feature combinations help represent nonlinear relationships. Many different kinds of feature combinations can be created, for example:

[A × B]: a feature combination formed by multiplying the values of two features.

[A × B × C × D × E]: a feature combination formed by multiplying the values of five features.

[A × A]: a feature combination formed by squaring the value of a single feature.
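The two kinds of combination mentioned above, multiplication of numeric features and the Cartesian product of categorical features, can be sketched as follows. The function names, feature names, and values are hypothetical.

```python
from itertools import product


def multiply_cross(a, b):
    """[A x B]: multiply the values of two numeric features sample by sample."""
    return [x * y for x, y in zip(a, b)]


def cartesian_cross(a, b):
    """Cross two categorical features into one combined category per sample."""
    return [f"{x}_{y}" for x, y in zip(a, b)]


price = [2.0, 3.0]
quantity = [5, 4]
region = ["north", "south"]
season = ["summer", "summer"]

revenue = multiply_cross(price, quantity)    # [A x B]
price_squared = multiply_cross(price, price)  # [A x A]
region_season = cartesian_cross(region, season)

# The full Cartesian product lists every combined category that could occur.
all_categories = sorted(f"{r}_{s}" for r, s in product(set(region), set(season)))
print(revenue, price_squared, region_season, all_categories)
```

A product of more than two features, such as [A × B × C × D × E], can be obtained by applying `multiply_cross` repeatedly.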

With continuing reference to fig. 1, step S33: performing data analysis on the feature analysis data to obtain a data analysis result.

Optionally, step S33 includes: performing one or more of trend pattern analysis, data distribution analysis, feature importance analysis, association relation analysis and life cycle analysis on the feature analysis data to obtain the data analysis result.

Trend pattern analysis is suitable for long-term tracking of core product indicators, such as click-through rate and number of active users. Trend analysis needs to identify how the data changes and to analyze the causes of the change. The output of trend analysis is best expressed as a ratio. Several concepts need to be clarified: the period-over-period (ring) ratio, the year-over-year ratio, and the fixed-base ratio. The period-over-period ratio compares the statistics of the current period with those of the previous period, for example, February 2017 with January 2017; it reveals the most recent trend but is affected by seasonal differences. To eliminate seasonal differences, the year-over-year ratio compares the current period with the same period of the previous year, for example, February 2017 with February 2016. The fixed-base ratio compares the current period with a fixed base point; for example, with January 2016 as the base point, the fixed-base ratio compares February 2017 with January 2016. For example, the number of monthly active users of a certain app in February 2017 is 20 million, a period-over-period increase of 2% compared with January and a year-over-year increase of 20% compared with February of the previous year. Another core goal of trend analysis is to explain the trend: for an obvious inflection point in a trend line, a reasonable explanation of what happened should be given, whether the cause is external or internal.
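The three ratios can be computed with the same growth formula and different reference periods. The sketch below uses hypothetical monthly sales figures (not the figures from the example above) chosen so that the arithmetic is easy to check by hand.

```python
def growth_rate(current, reference):
    """Relative change of the current period with respect to a reference period."""
    return (current - reference) / reference


# Hypothetical monthly sales figures keyed by "YYYY-MM".
monthly_sales = {
    "2016-01": 80,
    "2016-02": 96,
    "2017-01": 100,
    "2017-02": 120,
}

current = monthly_sales["2017-02"]
period_over_period = growth_rate(current, monthly_sales["2017-01"])  # ring ratio
year_over_year = growth_rate(current, monthly_sales["2016-02"])      # same-period ratio
fixed_base = growth_rate(current, monthly_sales["2016-01"])          # base: 2016-01

print(f"{period_over_period:.0%} {year_over_year:.0%} {fixed_base:.0%}")
```

Here the period-over-period growth is 20%, the year-over-year growth is 25%, and the fixed-base growth is 50%.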

Data distribution analysis is an analysis method that groups (quantitative) data at equal or unequal intervals according to the purpose of the analysis and studies the distribution pattern across the groups. It is often used for consumption distribution, income distribution, age distribution, and the like.
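A minimal sketch of equal-interval grouping is shown below, using an age distribution as the example. The function name, the interval boundaries, and the sample ages are hypothetical; each group is the half-open interval [lower, upper).

```python
def equal_interval_groups(values, start, width, n_groups):
    """Count how many values fall into each equal-width interval [lower, upper)."""
    counts = [0] * n_groups
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_groups:  # values outside the covered range are ignored
            counts[index] += 1
    labels = [(start + i * width, start + (i + 1) * width) for i in range(n_groups)]
    return dict(zip(labels, counts))


ages = [18, 22, 25, 31, 34, 38, 45, 52]
distribution = equal_interval_groups(ages, start=10, width=10, n_groups=5)
print(distribution)
```

Unequal intervals could be handled in the same way by replacing the computed index with a lookup against an explicit list of boundaries.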

In one embodiment, when feature importance analysis is performed, feature importance may be measured, in the binary classification case, using the weight vector of the decision function of a support vector machine (SVM). This measure is combined with forward sequential search and backward sequential search strategies to design the corresponding feature selection algorithms. Algorithms that may be employed include the SVM_W_SFS1 algorithm and the SVM_W_SFS2 algorithm. The SVM_W_SFS1 algorithm trains a support vector machine using all features to obtain the weight vector of its decision function, sorts the features in descending order of the absolute value of their weights so that features with large contributions come first, then adds the features one at a time in that order by forward sequential search, recording the classification accuracy on the training set at each step, until the last feature has been added.
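The ranking and forward sequential search part of SVM_W_SFS1 can be sketched as below. This sketch does not train a support vector machine: the weights are assumed to have been obtained already from a trained linear SVM, and `evaluate` is a hypothetical callback standing in for "train on the selected features and return the training-set accuracy". Picking the subset with the highest recorded accuracy at the end is also an assumption, since the description only says that the accuracy is recorded.

```python
def svm_w_sfs1(weights, evaluate):
    """Rank features by |weight| and add them one by one, recording accuracy."""
    order = sorted(weights, key=lambda name: abs(weights[name]), reverse=True)
    selected = []
    history = []
    for name in order:
        selected.append(name)
        history.append((list(selected), evaluate(selected)))
    return order, history


# Hypothetical decision-function weights of an already trained linear SVM.
weights = {"weekday": 0.1, "promotion": 1.8, "temperature": -0.6}


def toy_accuracy(subset):
    """Stand-in for training-set accuracy; the values are made up."""
    return {1: 0.80, 2: 0.90, 3: 0.88}[len(subset)]


order, history = svm_w_sfs1(weights, toy_accuracy)
best_subset = max(history, key=lambda item: item[1])[0]
print(order, best_subset)
```

Sorting by absolute value matters because a large negative weight contributes as much to the decision function as a large positive one.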

The SVM_W_SFS2 algorithm proceeds as follows. A support vector machine is trained using all features to obtain the weight vector of its decision function. The feature with the largest absolute weight is selected, added to the selected feature subset (initially empty), and deleted from the remaining feature set (initially the full feature set). A support vector machine is trained on the training set using only the selected features, and the classification accuracy on the training set is recorded. Next, a support vector machine is trained on the training set using only the features in the remaining feature set; the feature with the largest weight in the decision function is added to the selected feature subset and deleted from the remaining feature set; a support vector machine is trained using only the selected features, and the classification accuracy on the training set is recorded. This step is repeated until the remaining feature set is empty.

Association relation analysis finds associations or correlations in the related data, thereby describing the rules and patterns by which certain attributes of a thing occur together. For example, in a shopping basket analysis task, customers' purchasing habits are analyzed by finding associations between the different items that customers place in their shopping baskets. By revealing which items are frequently purchased together, such findings can help retailers formulate marketing strategies, such as price design, promotions, product placement, and customer segmentation based on buying patterns. Common association analysis algorithms include the Apriori algorithm, which uses the Apriori property to generate candidate sets, so that the number of candidate sets can be reduced and the speed of association analysis can be increased.
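A compact sketch of frequent itemset mining with the Apriori property is given below. It covers only the frequent itemset stage, not rule generation, and the basket contents and the minimum support count are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]
frequent = apriori(baskets, min_support=2)
print(sorted(sorted(itemset) for itemset in frequent))
```

The prune step is where the Apriori property saves work: a candidate is discarded without being counted as soon as one of its subsets is known to be infrequent.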

Life cycle analysis is a means of assessing the overall environmental impact of a product or a class of facilities over the whole "cradle to grave" process, viewing problems at regional, national and even global scope and from the perspective of sustainable development. For example, the period from a user's first contact with a product or service, through downloading and registering to become a user, to finally uninstalling it constitutes one life cycle. Based on the result of the life cycle analysis, the motivation behind each user behavior can be mined so as to increase the user's preference for the product and slow user churn.

Referring to fig. 5, fig. 5 is a flowchart of the steps performed after the analysis based on the relevant data in the analysis task yields a demand analysis result, according to an embodiment of the present application. Optionally, after step S3, the method further includes:

Step S4: selecting a prediction model set based on the demand analysis result.

It is understood that different prediction model sets may be selected according to different actual needs, for example a SARIMA (Seasonal Autoregressive Integrated Moving Average) model, which is one of the time series prediction analysis methods. In addition to the SARIMA model, the Holt-Winters method, which is also a time series analysis and prediction method, can be selected. Specifically, the Holt-Winters method is suitable for non-stationary sequences with a linear trend and periodic fluctuations; it uses exponential smoothing to continuously adapt the model parameters to changes in the non-stationary sequence and makes short-term predictions of the future trend. The Holt-Winters method introduces a seasonal term on the basis of the Holt model, and can be used to handle fixed-period fluctuations in time series such as monthly, quarterly and weekly data.
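As a minimal sketch of the exponential smoothing described above, the following code implements the additive form of the Holt-Winters method with level, trend and seasonal terms. The initialization scheme, the parameter names `alpha`, `beta` and `gamma`, and the sample series are assumptions for illustration and are not taken from the present application.

```python
def holt_winters_additive(series, season_len, alpha, beta, gamma, horizon):
    """Additive Holt-Winters sketch: forecast `horizon` steps beyond `series`."""
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons of data")

    # Assumed initialization: level = mean of the first season.
    level = sum(series[:season_len]) / season_len
    # Trend = average per-step change between the first two seasonal means.
    second_mean = sum(series[season_len:2 * season_len]) / season_len
    trend = (second_mean - level) / season_len
    # Seasonal terms = deviation of the first season from its mean.
    seasonal = [series[i] - level for i in range(season_len)]

    for t in range(season_len, len(series)):
        value = series[t]
        s = seasonal[t % season_len]
        prev_level = level
        # Smooth the deseasonalized observation against the previous projection.
        level = alpha * (value - s) + (1 - alpha) * (level + trend)
        # Smooth the change in level to update the trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
        # Smooth the seasonal term for this position in the cycle.
        seasonal[t % season_len] = gamma * (value - level) + (1 - gamma) * s

    n = len(series)
    return [
        level + (h + 1) * trend + seasonal[(n + h) % season_len]
        for h in range(horizon)
    ]


# Hypothetical demand history with a period of four and no trend.
history = [10, 20, 30, 20] * 3
forecast = holt_winters_additive(
    history, season_len=4, alpha=0.5, beta=0.5, gamma=0.5, horizon=4
)
```

For this purely periodic series the sketch reproduces the next cycle; on real data the smoothing parameters would normally be tuned, for example by minimizing the error on a validation window.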

Referring to fig. 6, fig. 6 is a flowchart illustrating selection of a prediction model set based on a demand analysis result according to an embodiment of the present disclosure. Optionally, step S4 is divided into the following sub-steps:

Step S41: determining a candidate model set based on the demand analysis result.

Step S42: performing feature extraction, model training and model evaluation on each candidate model in the candidate model set to obtain an evaluation result of each candidate model.

Step S43: performing model fusion on the candidate models in the candidate model set whose evaluation results meet the first preset condition to obtain a first fusion model.

It can be understood that, in order to improve the generalization ability of the models, the models in the candidate model set are evaluated and fused. The model evaluation metric may be MAPE (Mean Absolute Percentage Error), RMSE (Root Mean Square Error), MAE (Mean Absolute Error) or a similar metric, and the model fusion algorithm may be Boosting, Bagging (bootstrap aggregating), Stacking, Blending or a similar algorithm. For example, the MAPE may be a mean MAPE, which is calculated as follows:

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_i - y_i}{y_i} \right|$$

where $n$ represents the number of predicted values of the model, $\hat{y}_i$ represents the $i$-th of the $n$ predicted values, and $y_i$ represents the true value corresponding to the $i$-th predicted value.

The MAE is calculated as follows:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|$$

The RMSE is calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2}$$

For example, in house price prediction, if prices are expressed in units of ten thousand yuan per square meter, the prediction results are in the same unit, whereas the squared differences are in the square of that unit. Taking the square root restores the original unit, which makes the error easier to describe. Note that the standard deviation measures the degree of dispersion of a set of values, while the root mean square error measures the deviation between the observed values and the true values; their objects and purposes differ, and the metric can be selected according to the actual situation.
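The three evaluation metrics can be sketched as follows, using their standard definitions. The function names and the sample values are assumptions for illustration; MAPE as written here requires that no true value be zero.

```python
import math


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (true values must be non-zero)."""
    return 100.0 / len(y_true) * sum(
        abs((p - t) / t) for t, p in zip(y_true, y_pred)
    )


def mae(y_true, y_pred):
    """Mean absolute error, in the unit of the data."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error; the square root restores the unit of the data."""
    return math.sqrt(
        sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


# Hypothetical true demand and predicted demand.
actual = [100, 200, 400]
predicted = [110, 180, 400]
scores = {
    "MAPE": mape(actual, predicted),
    "MAE": mae(actual, predicted),
    "RMSE": rmse(actual, predicted),
}
```

Because RMSE squares each difference before averaging, it penalizes the single error of 20 more heavily than MAE does, which is one reason the choice of metric depends on the actual situation.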

Among the model fusion algorithms, Bagging focuses on obtaining an ensemble model with a smaller variance than its components, while Boosting and Stacking mainly produce a strong model with a lower bias than its components (although the variance may also be reduced). Specifically:

The Boosting algorithm improves the accuracy of weak classification algorithms by constructing a series of prediction functions and then combining them in a certain way into a single prediction function; it can be used to improve the accuracy of any given learning algorithm. Its working mechanism is as follows: a first weak learner is trained from the training set using initial weights, and the weights of the training samples are updated according to the learning error rate of that weak learner, so that the training samples with a high error rate under the first weak learner receive higher weights and are given more attention by the second weak learner. The second weak learner is then trained on the training set with the adjusted weights. This process is repeated until the number of weak learners reaches a preset number T, and finally the T weak learners are integrated through a set strategy to obtain the final strong learner.
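The reweighting mechanism can be illustrated with a minimal AdaBoost-style sketch. AdaBoost is one well-known Boosting algorithm and is chosen here as an assumption; the present application does not specify which variant is used. The weak learners are one-dimensional threshold classifiers, and the data and function names are illustrative only.

```python
import math


def stump_predict(x, threshold, polarity):
    # A weak learner: predicts `polarity` on one side of the threshold.
    return polarity if x <= threshold else -polarity


def train_stump(xs, ys, weights):
    """Pick the threshold classifier with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(
                w for w, x, y in zip(weights, xs, ys)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Train `rounds` weak learners (T in the text) with sample reweighting."""
    n = len(xs)
    weights = [1.0 / n] * n  # initial weights
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this weak learner
        ensemble.append((alpha, threshold, polarity))
        # Misclassified samples gain weight, correctly classified ones lose weight.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    # Set strategy assumed here: weighted vote of all weak learners.
    score = sum(a * stump_predict(x, th, pol) for a, th, pol in ensemble)
    return 1 if score >= 0 else -1


# Small illustrative data set that no single threshold can separate.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
```

No single threshold classifies this data correctly, but the weighted vote of three reweighted weak learners does, which illustrates how later learners focus on the samples that earlier learners got wrong.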

The Bagging algorithm is a typical representative of parallel ensemble learning methods and is based directly on bootstrap sampling. Given a data set containing m samples, one sample is randomly drawn into the sampling set and then put back into the original data set, so that it may still be selected at the next draw. After m such random sampling operations, a sampling set containing m samples is obtained; some samples of the initial training set appear in the sampling set multiple times, and some never appear. Approximately 63.2% of the samples in the initial training set appear in the sampling set. From the bias-variance perspective, Bagging mainly focuses on reducing variance, so its effect is more pronounced on learners that are easily perturbed by the samples, such as unpruned decision trees and neural networks.
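The 63.2% figure follows from the probability that a given sample is drawn at least once in m draws with replacement, which is 1 - (1 - 1/m)^m and approaches 1 - 1/e as m grows. A minimal sketch, with an assumed data size and random seed, checks this empirically:

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) samples from data with replacement."""
    m = len(data)
    return [data[rng.randrange(m)] for _ in range(m)]


rng = random.Random(0)      # fixed seed, chosen only for reproducibility
data = list(range(10_000))  # stand-in for an initial training set of m samples
sample = bootstrap_sample(data, rng)

# Fraction of distinct original samples that appear in the sampling set.
unique_fraction = len(set(sample)) / len(data)
# Theoretical value for this m: 1 - (1 - 1/m)^m, close to 1 - 1/e.
expected_fraction = 1 - (1 - 1 / len(data)) ** len(data)
```

In a full Bagging procedure, one base learner would be trained on each such sampling set and their predictions would be averaged or voted on.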

The Stacking algorithm generally considers heterogeneous weak learners, learns the heterogeneous weak learners in parallel, combines the heterogeneous weak learners by training a meta-model, and outputs a final prediction result according to the prediction results of the different weak learners.

The Blending algorithm is a simpler fusion method. The classifiers can be chosen in many ways, and different combinations of classifiers yield different effects, so in practical applications most of the time is spent on selecting the classifiers. Blending is largely the same as Stacking. The main difference is that Blending does not generate the features of the second-stage model from predictions obtained on the training set through K-fold cross-validation; instead, a holdout set is set aside, for example 10% of the training data, and the second-stage stacker model is fitted on the first-stage models' predictions for that 10% of the training data.
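The holdout scheme can be sketched as follows. The two first-stage models (a mean predictor and a least-squares line) and the second-stage model (a single blending weight chosen by grid search on the holdout set) are deliberately simple assumptions for illustration; any base models and stacker could be substituted.

```python
import random


def fit_mean(xs, ys):
    """First-stage model 1: always predicts the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_line(xs, ys):
    """First-stage model 2: least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def blend(xs, ys, holdout_ratio=0.1, seed=0):
    """Fit base models on the training part, then fit the blend on the holdout."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_hold = max(1, int(len(idx) * holdout_ratio))
    hold, train = idx[:n_hold], idx[n_hold:]

    train_x = [xs[i] for i in train]
    train_y = [ys[i] for i in train]
    m0 = fit_mean(train_x, train_y)
    m1 = fit_line(train_x, train_y)

    # Second stage: uses only the first-stage predictions on the holdout set.
    def holdout_error(w):
        return sum(
            (w * m0(xs[i]) + (1 - w) * m1(xs[i]) - ys[i]) ** 2 for i in hold
        )

    best_w = min((k / 20 for k in range(21)), key=holdout_error)
    return (lambda x: best_w * m0(x) + (1 - best_w) * m1(x)), best_w


# Hypothetical data following y = 2x + 1 exactly.
xs = list(range(50))
ys = [2 * x + 1 for x in xs]
blended_model, mean_weight = blend(xs, ys)
```

Since the line fits this data exactly, the second stage assigns no weight to the mean predictor. Unlike K-fold Stacking, the base models never see the holdout samples, which keeps the procedure simple at the cost of using less data for the second stage.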

Therefore, the selection of the fusion algorithm can be performed according to the characteristics of the algorithms such as Boosting, Bagging, Stacking, Blending and the like, and by combining the specific requirements of the demand prediction.

Step S44: taking the first fusion model and each candidate model meeting the first preset condition as prediction models in a candidate prediction model set.

The candidate prediction model set may be tested with a test sequence. As an embodiment, the first preset condition may be that the accuracy exceeds 90%, and the candidate prediction model set is screened according to this condition. The test sequence is a known sequence: after it is input into each candidate prediction model in the candidate prediction model set, each model outputs its own result sequence, and comparing the input sequence with each result sequence gives the accuracy of each candidate prediction model. The test sequence can be set according to the actual situation.

Step S45: performing feature extraction, model retraining and model reevaluation on each prediction model in the candidate prediction model set to obtain a reevaluation result of each prediction model.

Step S46: selecting, from the prediction models, the prediction models whose reevaluation results meet the second preset condition to obtain a candidate prediction model set.

The model evaluation and fusion methods in steps S45 and S46 are similar to those in steps S42 and S43 and are not described again here. The second preset condition may be similar to the first preset condition: it may limit the accuracy, or the screening may be performed according to the model running time.

It is understood that when the models in steps S42, S43, S45 and S46 are evaluated and fused, and there is only one model to be fused, the only model is taken as the result of model fusion.
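The flow of steps S42 to S46 can be sketched as follows. This is a simplified illustration under stated assumptions: models are plain functions, fusion is done by averaging predictions (the description allows Boosting, Bagging, Stacking or Blending instead), accuracy is taken as one minus the mean absolute percentage error on a known test sequence, and retraining is omitted. All names and thresholds are hypothetical.

```python
def select_prediction_models(candidates, evaluate, first_condition, second_condition):
    """Screen, fuse and re-screen candidate models (sketch of steps S42 to S46)."""
    # S42: evaluate every candidate model.
    scores = {name: evaluate(model) for name, model in candidates.items()}

    # S43: fuse the candidates whose evaluation meets the first preset condition.
    passed = {n: m for n, m in candidates.items() if first_condition(scores[n])}
    if not passed:
        return {}
    models = list(passed.values())
    if len(models) == 1:
        fused = models[0]  # a single model is itself the fusion result
    else:
        fused = lambda x: sum(m(x) for m in models) / len(models)

    # S44: the fusion model joins the candidates that passed.
    pool = dict(passed)
    pool["fusion"] = fused

    # S45: reevaluate (retraining omitted in this sketch).
    rescored = {name: evaluate(model) for name, model in pool.items()}

    # S46: keep the models meeting the second preset condition.
    return {n: m for n, m in pool.items() if second_condition(rescored[n])}


# Hypothetical known test sequence and candidate models.
test_inputs = [1, 2, 3, 4]
test_truth = [2, 4, 6, 8]


def accuracy(model):
    errors = [abs(model(x) - y) / abs(y) for x, y in zip(test_inputs, test_truth)]
    return 1 - sum(errors) / len(errors)


candidates = {
    "double": lambda x: 2 * x,
    "double_plus": lambda x: 2 * x + 0.1,
    "identity": lambda x: x,
}
selected = select_prediction_models(
    candidates,
    evaluate=accuracy,
    first_condition=lambda acc: acc > 0.90,   # e.g. accuracy exceeds 90%
    second_condition=lambda acc: acc > 0.98,  # assumed stricter threshold
)
```

In this example the `identity` model fails the first condition, the remaining two are fused, and the second condition then retains the `double` model and the fusion model.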

Continuing with fig. 5, step S5: obtaining a demand prediction result through the prediction model.

As an embodiment, after step S5, a confidence check and distribution fitting may be performed on the obtained demand prediction result, so as to prevent the analysis result and the prediction result from deviating too much. The confidence check may be performed using the standard deviation, quantiles and the like, and the distribution fitting may use a normal distribution, a Poisson distribution or a gamma distribution.
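A confidence check based on the standard deviation can be sketched as below. The rule used here, accepting a prediction only if it lies within k standard deviations of the historical mean, and the value k = 3 are assumptions for illustration; the present application does not specify the exact rule.

```python
import statistics


def confidence_check(history, predictions, k=3.0):
    """Flag each prediction as plausible if it lies within mean +/- k * std."""
    mean = statistics.mean(history)
    std = statistics.stdev(history)  # sample standard deviation
    lower, upper = mean - k * std, mean + k * std
    return [lower <= p <= upper for p in predictions]


# Hypothetical historical demand and predicted demand.
history = [98, 102, 100, 101, 99, 100, 103, 97]
predictions = [101, 120, 95]
plausible = confidence_check(history, predictions)
```

Here the historical mean is 100 and the sample standard deviation is 2, so the accepted range is 94 to 106 and the prediction of 120 is flagged. A quantile-based check would follow the same pattern with the bounds taken from empirical quantiles of the history.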

As an embodiment, before step S1, the method further includes step S5: receiving the task information configured by the user.

An embodiment of the present application provides a demand determination system. Referring to fig. 7, fig. 7 is a block diagram of the demand determination system provided in an embodiment of the present application.

In order to better implement the demand determination method provided by the present embodiment, the present embodiment further provides a demand determination system 60. The demand determination system 60 includes:

the data and parallel control subsystem 601, configured to trigger an analysis task based on task information configured by a user and to acquire related data matched with the task information;

the algorithm policy engine subsystem 602, configured to perform analysis based on the relevant data in the analysis task to obtain a demand analysis result.

Optionally, the algorithm policy engine subsystem 602 is configured to perform preprocessing on the relevant data to obtain preprocessed data; carrying out characteristic analysis on the preprocessed data to obtain characteristic analysis data; and carrying out data analysis on the characteristic analysis data to obtain a data analysis result.

Optionally, the algorithm policy engine subsystem 602 is further specifically configured to perform default processing on the relevant data to obtain complete data; and carrying out discretization on the complete data to obtain preprocessed data.

Optionally, the algorithm policy engine subsystem 602 is further specifically configured to perform feature selection on the preprocessed data to obtain feature selection data; carrying out feature extraction on the feature selection data to obtain feature extraction data; and carrying out feature combination on the feature extraction data to obtain feature analysis data.

Optionally, the algorithm policy engine subsystem 602 is further specifically configured to perform one or more of trend pattern analysis, data distribution analysis, feature importance analysis, association analysis, and life cycle analysis on the feature analysis data to obtain a data analysis result.

Optionally, the algorithm policy engine subsystem 602 is further configured to select a prediction model set based on the demand analysis result, and obtain a demand prediction result through the prediction model.

Optionally, the algorithm policy engine subsystem 602 is further configured to determine a set of candidate models based on the result of the demand analysis; performing feature extraction, model training and model evaluation on each candidate model in the candidate model set to obtain an evaluation result of each candidate model; performing model fusion on the candidate models of which the evaluation results meet a first preset condition in the candidate model set to obtain a first fusion model; taking the fusion model and each candidate model meeting a first preset condition as a prediction model in a candidate prediction model set; performing feature extraction, model retraining and model reevaluation on each prediction model in the candidate prediction model set to obtain reevaluation results of each prediction model; and selecting the prediction models of which the reevaluation results meet the second preset condition from the prediction models to obtain a candidate prediction model set.

The present embodiment also provides an electronic device, where the electronic device includes a memory and a processor, where the memory stores program instructions, and the processor executes the program instructions to perform the steps in any one of the above methods.

The present embodiment also provides a storage medium, in which computer program instructions are stored, and when the computer program instructions are executed by a processor, the steps in any one of the above methods are executed.

Alternatively, the electronic device may be a Personal Computer (PC), a tablet PC, a smart phone, a Personal Digital Assistant (PDA), or other electronic devices.

In summary, an embodiment of the present application provides a demand determining method, including: triggering an analysis task based on task information configured by a user; acquiring related data matched with the task information; and analyzing based on the related data in the analysis task to obtain a demand analysis result.

In the implementation process, various different types of demand determination tasks can be processed, demand can be determined for different objects, and the universality of the demand determination method can be improved.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. The apparatus embodiments described above are merely illustrative, and for example, the block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of devices according to various embodiments of the present application. In this regard, each block in the block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams, and combinations of blocks in the block diagrams, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Therefore, the present embodiment further provides a readable storage medium in which computer program instructions are stored, and when the computer program instructions are read and executed by a processor, the steps of any of the demand determination methods described above are performed. Based on such understanding, the technical solution of the present application, or the portions thereof that substantially contribute to the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A method for demand determination, the method comprising:

triggering an analysis task based on task information configured by a user;

acquiring related data matched with the task information;

while performing the analysis task, performing an analysis based on the relevant data to obtain a demand analysis result.

2. The method of claim 1, wherein analyzing based on the relevant data to obtain a demand analysis result while performing the analysis task comprises:

preprocessing the related data to obtain preprocessed data;

performing characteristic analysis on the preprocessed data to obtain characteristic analysis data;

and carrying out data analysis on the characteristic analysis data to obtain a data analysis result.

3. The method of claim 2, wherein preprocessing the relevant data to obtain preprocessed data comprises:

carrying out default processing on the related data to obtain complete data;

and carrying out discretization processing on the complete data to obtain the preprocessed data.

4. The method of claim 2, wherein said performing feature analysis on said preprocessed data to obtain feature analysis data comprises:

performing feature selection on the preprocessed data to obtain feature selection data;

carrying out feature extraction on the feature selection data to obtain feature extraction data;

and carrying out feature combination on the feature extraction data to obtain the feature analysis data.

5. The method of claim 2, wherein the performing data analysis on the feature analysis data to obtain the data analysis result comprises:

and performing one or more of trend pattern analysis, data distribution analysis, feature importance analysis, association relation analysis and life cycle analysis on the feature analysis data to obtain the data analysis result.

6. The method of claim 1, wherein after performing the analysis based on the relevant data in the analysis task to obtain a demand analysis result, the method further comprises:

selecting a set of predictive models based on the demand analysis results;

and obtaining a demand prediction result through the prediction model.

7. The method of claim 6, wherein selecting a set of predictive models based on the demand analysis results comprises:

determining a set of candidate models based on the demand analysis result;

performing feature extraction, model training and model evaluation on each candidate model in the candidate model set to obtain an evaluation result of each candidate model;

performing model fusion on the candidate models of which the evaluation results meet a first preset condition in the candidate model set to obtain a fusion model;

taking the fusion model and each candidate model meeting the first preset condition as a prediction model in a candidate prediction model set;

performing feature extraction, model retraining and model reevaluation on each prediction model in the candidate prediction model set to obtain reevaluation results of each prediction model;

and selecting the prediction models of which the re-evaluation results meet second preset conditions from the prediction models to perform model fusion, so as to obtain the candidate prediction model set.

8. A demand determination system, comprising a data and parallel control subsystem and an algorithm policy engine subsystem;

the data and parallel control subsystem is used for triggering an analysis task based on task information configured by a user and acquiring related data matched with the task information;

the algorithm policy engine subsystem is used for carrying out analysis based on the related data to obtain a demand analysis result when the analysis task is executed.

9. The system of claim 8, wherein the algorithm policy engine subsystem is further configured to select a set of predictive models based on the demand analysis results; and obtaining a demand prediction result through the prediction model.

10. An electronic device comprising a memory and a processor, the memory having program instructions stored therein, wherein the processor, when executing the program instructions, performs the steps of the method of any of claims 1-7.

11. A storage medium having stored thereon computer program instructions for executing the steps of the method according to any one of claims 1 to 7 when executed by a processor.