WO2022252630A1 - Data prediction method, device, equipment and storage medium based on model set - Google Patents

Info

Publication number
WO2022252630A1
Authority
WO
WIPO (PCT)
Prior art keywords
prediction result
target
model
regression
classification
Prior art date
Application number
PCT/CN2022/071836
Other languages
English (en)
French (fr)
Inventor
张春玲
彭琛
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2022252630A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling

Definitions

  • The present application relates to the technical field of artificial intelligence data prediction, and in particular to a data prediction method, device, computer device and storage medium based on a model set.
  • Embodiments of the present application provide a data prediction method, device, computer device and storage medium based on a model set, which can improve the accuracy of prediction results.
  • An embodiment of the present application provides a data prediction method based on a model set, which includes:
  • The model set includes multiple classification models and multiple regression models;
  • The target classification model and the target regression model are, respectively, the classification model with the highest quality score and the regression model with the highest quality score in the model set;
  • If the regression prediction result is in the same direction as the classification prediction result, the target prediction result is determined from the regression prediction result and the classification prediction result according to the target prediction type;
  • If the directions of the regression prediction result and the classification prediction result are inconsistent, the target prediction result is determined according to a preset correction rule.
  • An embodiment of the present application further provides a data prediction device based on a model set, which includes:
  • An acquisition unit, configured to acquire target industry factor data;
  • An input unit, configured to input the target industry factor data into the target regression model and the target classification model in a preset model set for result prediction, to obtain a regression prediction result and a classification prediction result respectively, where the model set includes multiple classification models and multiple regression models, and the target classification model and the target regression model are, respectively, the classification model with the highest quality score and the regression model with the highest quality score in the model set;
  • A first determining unit, configured to determine whether the directions of the regression prediction result and the classification prediction result are consistent;
  • A second determining unit, configured to determine a target prediction result from the regression prediction result and the classification prediction result according to the target prediction type when the regression prediction result is in the same direction as the classification prediction result;
  • A third determining unit, configured to determine a target prediction result according to a preset correction rule when the directions of the regression prediction result and the classification prediction result are inconsistent.
  • An embodiment of the present application further provides a computer device, which includes a memory and a processor. The memory stores a computer program, and the processor performs the following steps when executing the computer program:
  • The model set includes multiple classification models and multiple regression models;
  • The target classification model and the target regression model are, respectively, the classification model with the highest quality score and the regression model with the highest quality score in the model set;
  • If the regression prediction result is in the same direction as the classification prediction result, the target prediction result is determined from the regression prediction result and the classification prediction result according to the target prediction type;
  • If the directions of the regression prediction result and the classification prediction result are inconsistent, the target prediction result is determined according to a preset correction rule.
  • An embodiment of the present application further provides a computer-readable storage medium. The storage medium stores a computer program that includes program instructions, and when executed by a processor, the program instructions perform the following steps:
  • The model set includes multiple classification models and multiple regression models;
  • The target classification model and the target regression model are, respectively, the classification model with the highest quality score and the regression model with the highest quality score in the model set;
  • If the regression prediction result is in the same direction as the classification prediction result, the target prediction result is determined from the regression prediction result and the classification prediction result according to the target prediction type;
  • If the directions of the regression prediction result and the classification prediction result are inconsistent, the target prediction result is determined according to a preset correction rule.
  • Embodiments of the present application provide a data prediction method, device, computer device and storage medium based on a model set.
  • The embodiments of the present application combine a classification model and a regression model to make predictions and analyze the same target problem from multiple angles, which can improve the accuracy of the prediction result.
  • FIG. 1 is a schematic diagram of an application scenario of the data prediction method based on a model set provided by an embodiment of the present application;
  • FIG. 2 is a schematic flowchart of the data prediction method based on a model set provided by an embodiment of the present application;
  • FIG. 3 is a schematic sub-flowchart of the data prediction method based on a model set provided by an embodiment of the present application;
  • FIG. 4 is another schematic sub-flowchart of the data prediction method based on a model set provided by an embodiment of the present application;
  • FIG. 5 is another schematic sub-flowchart of the data prediction method based on a model set provided by an embodiment of the present application;
  • FIG. 6 is another schematic sub-flowchart of the data prediction method based on a model set provided by an embodiment of the present application;
  • FIG. 7 is a schematic flowchart of a data prediction method based on a model set provided by another embodiment of the present application;
  • FIG. 8 is a schematic block diagram of a data prediction device based on a model set provided by an embodiment of the present application;
  • FIG. 9 is a schematic block diagram of a data prediction device based on a model set provided by another embodiment of the present application;
  • FIG. 10 is a schematic block diagram of a computer device provided by an embodiment of the present application.
  • The execution subject of the data prediction method based on the model set may be the data prediction device based on the model set provided in the embodiments of the present application, or a computer device integrated with that device. The data prediction device based on the model set can be implemented in hardware or software.
  • The computer device can be a terminal or a server.
  • The terminal can be a smartphone, a tablet computer, a palmtop computer or a notebook computer.
  • FIG. 1 is a schematic diagram of an application scenario of a data prediction method based on a model set provided in an embodiment of the present application.
  • The data prediction method based on the model set is applied to the computer device 10 in FIG. 1. The computer device 10 stores the model set corresponding to the target business, and the model set includes multiple classification models and multiple regression models, where:
  • the model with the highest quality score among the classification models is the target classification model;
  • the model with the highest quality score among the regression models is the target regression model.
  • After obtaining the unprocessed first industry factor data, the computer device 10 preprocesses it, including missing value processing and frequency conversion, to obtain second industry factor data. It then performs feature engineering on the second industry factor data to obtain the target industry factor data to be input into the models in the model set. It inputs the target industry factor data into the target regression model and the target classification model in the preset model set to obtain a regression prediction result and a classification prediction result. Finally, it checks whether the direction of the regression prediction result is consistent with that of the classification prediction result. If the directions are consistent, the target prediction result is determined from the regression prediction result and the classification prediction result according to the target prediction type. If they are inconsistent, the target prediction result is determined according to the preset correction rule.
  • FIG. 2 is a schematic flowchart of a data prediction method based on a model set provided in an embodiment of the present application. As shown in FIG. 2, the method includes the following steps S110-S150.
  • The target industry factor data can be constructed according to industry characteristics, industry logic and the like. Industry characteristics can include data such as the revenue, market value and market size of the industry. Industry logic analyzes the upstream and downstream of the industry: which raw materials come from upstream, their specific prices and output, the scale of downstream demand, and similar data. For the automobile industry, for example, this includes upstream steel output and rubber prices, as well as downstream sales volumes of major vehicle models and sales of vehicle electrical appliances. In this embodiment, data corresponding to industry characteristics and industry logic is selected as the industry factor data.
  • The computer device in this embodiment can automatically extract the industry factor data from the corresponding database, or a user can input the industry factor data manually.
  • Step S110 includes:
  • The first industry factor data is unprocessed industry factor data, that is, the original industry factor data obtained from the database. Examples include the raw upstream raw-material price data, downstream demand data, revenue data and market value data of the automobile industry.
  • The data preprocessing includes missing value processing and frequency conversion of the first industry factor data. Missing values can be handled by data simulation. For example, if the data for a certain time period is missing, the gap can be filled by simulating the change curve of the same period in the previous year.
  • The frequency conversion in this embodiment can uniformly convert data recorded by day, week, etc. into monthly data. Step S112 thus turns all factors into structured, valid data.
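The two preprocessing operations above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation. Monthly series are assumed to be dicts keyed by `(year, month)`. A missing month is filled by applying the previous year's month-over-month change for the same period to this year's preceding month. Daily data are converted to monthly data by averaging. The function names and the choice of the mean as the aggregation are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Optional, Tuple

Month = Tuple[int, int]  # (year, month); hypothetical data layout


def prev_month(ym: Month) -> Month:
    """Return the calendar month before ym, wrapping January back to December."""
    year, month = ym
    return (year - 1, 12) if month == 1 else (year, month - 1)


def fill_missing_by_last_year(series: Dict[Month, Optional[float]]) -> Dict[Month, Optional[float]]:
    """Fill None entries by replaying last year's same-period change (assumed rule)."""
    filled = dict(series)
    for ym in sorted(filled):  # chronological order, so earlier fills feed later ones
        if filled[ym] is not None:
            continue
        year, month = ym
        this_prev = filled.get(prev_month(ym))
        last_cur = filled.get((year - 1, month))
        last_prev = filled.get(prev_month((year - 1, month)))
        # last_prev must be non-None and non-zero to form the change ratio
        if this_prev is not None and last_cur is not None and last_prev:
            filled[ym] = this_prev * last_cur / last_prev
    return filled


def daily_to_monthly(points: Iterable[Tuple[date, float]]) -> Dict[Month, float]:
    """Convert daily observations to monthly values by averaging (assumed aggregation)."""
    buckets: Dict[Month, List[float]] = defaultdict(list)
    for day, value in points:
        buckets[(day.year, day.month)].append(value)
    return {ym: sum(vals) / len(vals) for ym, vals in sorted(buckets.items())}


if __name__ == "__main__":
    raw = {(2020, 5): 100.0, (2020, 6): 110.0, (2021, 5): 200.0, (2021, 6): None}
    print(fill_missing_by_last_year(raw)[(2021, 6)])  # 220.0
```

Averaging suits level-type factors such as prices. Flow-type factors such as sales volumes would usually be summed per month instead.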
  • This step mainly performs feature engineering on the structured factors (the second industry factor data). It evaluates the effectiveness weight of each factor by combining characteristics such as factor importance and timeliness, or synthesizes new effective factors through processing.
  • For illustration, take the X factors as the industry factor data and Y as the prediction result. That is, X is the data input to the model and Y is the data output by the model.
  • In this embodiment, Y can specifically be the industry added value, the production capacity output value, etc., and can be adjusted according to the target.
  • When screening the effective factors for the prosperity of the automobile industry, the evaluation mainly covers four aspects. The first is the data disclosure quality of each X factor: the time difference between its update time and its release cycle, and whether the update delay is stable from period to period. The second is the degree of missing data: whether the factor has been updated stably and normally in the recent past. The third is the correlation between each X factor and Y. The fourth is the correlation among the X factors, that is, whether there are many similar factors. When there are too many X factors, Principal Component Analysis (PCA) is applied to retain the principal components or to synthesize effective new factors, followed by a preliminary factor screening. The X factors finally retained are the industry factor data to be input into the models in the model set in this embodiment.
  • PCA: Principal Component Analysis
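A simplified sketch of the correlation part of this screening, written with the standard library only. It ranks the X factors by the absolute Pearson correlation with Y. It skips any factor that is highly correlated with a factor already kept. The thresholds `min_target_corr=0.3` and `max_mutual_corr=0.9` are arbitrary assumptions. The disclosure-quality checks and the PCA step described above are not shown.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def screen_factors(
    factors: Dict[str, List[float]],
    y: List[float],
    min_target_corr: float = 0.3,   # assumed threshold on |corr(X, Y)|
    max_mutual_corr: float = 0.9,   # assumed threshold on |corr(X_i, X_j)|
) -> List[str]:
    """Keep factors correlated with Y, skipping near-duplicates of already kept factors."""
    ranked = sorted(
        ((abs(pearson(series, y)), name) for name, series in factors.items()),
        reverse=True,
    )
    kept: List[str] = []
    for target_corr, name in ranked:
        if target_corr < min_target_corr:
            break  # the remaining factors are even less correlated with Y
        if all(abs(pearson(factors[name], factors[k])) <= max_mutual_corr for k in kept):
            kept.append(name)
    return kept
```

Greedy ranking means that, of two near-duplicate factors, the one more correlated with Y is the one kept.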
  • The model set includes multiple classification models and multiple regression models.
  • The target classification model and the target regression model are, respectively, the classification model with the highest quality score and the regression model with the highest quality score in the model set.
  • n is an integer greater than or equal to 1.
  • The target regression model can be the top n regression models with the highest quality scores.
  • Likewise, the target classification model can be the top n classification models with the highest quality scores.
  • The regression prediction result can be the mean of the n regression prediction results, or the mean after removing the highest and lowest of the n regression prediction results.
  • Likewise, the classification prediction result can be the mean of the n classification prediction results, or the mean after removing the highest and lowest of them.
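The aggregation of the top-n predictions can be sketched as a plain or trimmed mean. This assumes each classification model's output has already been encoded as a number (for example, a probability that Y rises), since the text does not say how class outputs are averaged. The function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def aggregate_predictions(preds: Sequence[float], trim_extremes: bool = False) -> float:
    """Combine the predictions of the top-n models into one result.

    With trim_extremes=True, one highest and one lowest value are dropped
    before averaging, but only when more than two predictions are available.
    """
    if not preds:
        raise ValueError("at least one prediction is required")
    values = sorted(preds)
    if trim_extremes and len(values) > 2:
        values = values[1:-1]
    return mean(values)


print(aggregate_predictions([1.0, 2.0, 3.0, 10.0]))                      # 4.0
print(aggregate_predictions([1.0, 2.0, 3.0, 10.0], trim_extremes=True))  # 2.5
```

Skipping the trim for n = 1 or n = 2 is a design choice here: removing both extremes would otherwise leave nothing to average.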
  • The target industry factor data is input only into the target regression model and the target classification model. It is not necessary to run predictions with all models in the model set at the same time.
  • Predictions by the other models are triggered only when needed, which reduces the power consumption of the computer device.
  • Alternatively, the target industry factor data can be input into all models in the model set to obtain the prediction results of all of them.
  • This makes the prediction results of all models in the model set available in advance. They can then be retrieved directly according to the correction rules when the direction of the regression prediction result is inconsistent with that of the classification prediction result, which speeds up prediction.
  • For example, suppose the classification prediction result predicts that the Y value of the automobile industry in June 2021 will rise, while the regression prediction result estimates that the Y value in June 2021 will fall compared with the actual Y value of the previous period. In this case the directions are inconsistent.
  • Likewise, suppose the classification prediction result predicts that the Y value of the automobile industry in June 2021 will fall, while the regression prediction result estimates that it will rise compared with the actual value of the previous period. The directions are again inconsistent.
  • Suppose instead that the regression prediction result estimates that the Y value in June 2021 will rise compared with the actual Y value of the previous period, and the classification prediction result also predicts a rise. In this case the directions are consistent.
  • Likewise, suppose the classification prediction result predicts that the Y value of the automobile industry in June 2021 will fall, and the regression prediction result estimates that it will fall compared with the actual value of the previous period. The regression prediction result is then in the same direction as the classification prediction result.
  • When the regression prediction result is in the same direction as the classification prediction result and the target prediction type is the regression type, the regression prediction result is determined as the target prediction result.
  • In this case, the classification prediction result of the classification model is used to help verify the accuracy of the regression prediction result, which improves the accuracy of the final prediction result.
  • When the directions are the same and the target prediction type is the classification type, the classification prediction result is determined as the target prediction result.
  • In this case, the regression prediction result of the regression model is used to help verify the accuracy of the classification prediction result, which improves the accuracy of the final prediction result.
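The direction check and the choice of the target prediction result when the directions agree can be sketched as follows. The regression direction is taken relative to the previous period's actual Y value, as in the June 2021 example. Several details are assumptions: the `"up"`/`"down"` label encoding, the treatment of an unchanged regression value as agreeing with neither label, and all function names.

```python
from typing import Optional, Union

# Hypothetical encoding of the classification model's output.
LABEL_DIRECTION = {"up": 1, "down": -1}


def regression_direction(predicted_y: float, previous_actual_y: float) -> int:
    """+1 for a predicted rise over the previous period's actual value, -1 for a fall, 0 if flat."""
    if predicted_y > previous_actual_y:
        return 1
    if predicted_y < previous_actual_y:
        return -1
    return 0


def directions_consistent(predicted_y: float, previous_actual_y: float, class_label: str) -> bool:
    """True when the regression and classification results point the same way."""
    return regression_direction(predicted_y, previous_actual_y) == LABEL_DIRECTION[class_label]


def target_result_if_consistent(
    target_type: str, predicted_y: float, previous_actual_y: float, class_label: str
) -> Optional[Union[float, str]]:
    """Return the target prediction result, or None when the directions disagree
    (in which case the preset correction rules apply instead)."""
    if not directions_consistent(predicted_y, previous_actual_y, class_label):
        return None
    if target_type == "regression":
        return predicted_y   # the classification result only serves as a cross-check
    if target_type == "classification":
        return class_label   # the regression result only serves as a cross-check
    raise ValueError(f"unknown target prediction type: {target_type!r}")
```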
  • Step S150 includes:
  • The target prediction type is the regression type;
  • The quality score of the target classification model is greater than that of the target regression model;
  • When there are multiple target classification models, the quality score of the target classification model can be the total or the average of their quality scores.
  • Likewise, the quality score of the target regression model can be the total or the average of the quality scores of the multiple target regression models.
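The trigger conditions for the two correction branches can be sketched as a small decision function. The group score is the total or the average of the target models' quality scores, as described above. What happens when neither condition holds is not fully specified, so the `"no_correction"` fallback is an assumption. The function names and return values are also assumptions.

```python
from statistics import mean
from typing import Sequence


def group_score(scores: Sequence[float], how: str = "average") -> float:
    """Quality score of a group of target models: total or average of their scores."""
    if how == "total":
        return sum(scores)
    if how == "average":
        return mean(scores)
    raise ValueError(f"unknown aggregation: {how!r}")


def correction_plan(
    target_type: str,
    classification_scores: Sequence[float],
    regression_scores: Sequence[float],
    how: str = "average",
) -> str:
    """Decide which result to correct when the two prediction directions disagree."""
    cls_score = group_score(classification_scores, how)
    reg_score = group_score(regression_scores, how)
    if target_type == "regression" and cls_score > reg_score:
        return "correct_regression"      # step S151: follow the classification direction
    if target_type == "classification" and reg_score > cls_score:
        return "correct_classification"  # step S153: follow the regression direction
    return "no_correction"               # assumed fallback: output the target-type result as is
```

The choice between total and average matters when the two sides have different numbers of target models. A total favors the side with more models, while an average does not.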
  • Step S151 includes:
  • Suppose the direction of the regression prediction result is inconsistent with that of the classification prediction result, the target prediction type is the regression type, and the quality score of the target classification model is greater than that of the target regression model. In this case, the regression result needs to be corrected according to the prediction direction of the classification result.
  • The target regression model can be reselected according to the quality scores of the regression models. All or part of the current target regression models are discarded (only part, if there are multiple target regression models). The regression model whose quality score is second only to that of the original target regression model then serves as the new target regression model.
  • If only the target regression model and the target classification model were run earlier, the target industry factor data needs to be input into the modified target regression model to obtain a modified regression prediction result.
  • If all models in the model set were run earlier, the regression prediction result of the corresponding model can be retrieved directly. There is then no need to input the target industry factor data into the modified target regression model.
  • Step S1513: Determine whether the direction of the modified regression prediction result is the same as that of the classification prediction result. If yes, perform step S1514; if not, return to step S1511.
  • Step S1514: Determine the modified regression prediction result as the corrected regression prediction result. The regression prediction result is thereby corrected, which improves the accuracy of the prediction result.
  • If the direction of the modified regression prediction result is still inconsistent with that of the classification prediction result, the process returns to step S1511. It repeats until the directions are consistent, and the corrected regression prediction result is then output.
  • The target prediction result can also be determined with reference to the historical growth rate for the same period of the previous year.
  • Once the regression prediction result has been corrected according to the classification prediction result, the corrected regression prediction result is output as the target prediction result.
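The loop in steps S1511 to S1514 amounts to walking down the quality-score ranking until a model's prediction agrees with the other side's direction. This is a minimal sketch that assumes a single target model per side (n = 1). `predict` stands for either running the replacement model or looking up its cached result when all models were run in advance. The names are hypothetical. The text does not specify what happens when no model agrees. Returning `None` here leaves room for a fallback, such as referring to the same-period historical growth rate.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

P = TypeVar("P")  # a prediction: a number for regression, a label for classification


def correct_until_consistent(
    scored_models: List[Tuple[str, float]],
    predict: Callable[[str], P],
    agrees_with_reference: Callable[[P], bool],
) -> Optional[Tuple[str, P]]:
    """Return (model name, prediction) of the best-ranked replacement model whose
    prediction agrees with the reference direction, or None if none does.

    scored_models holds (name, quality score) for the side being corrected,
    including the original target model, which is assumed to have disagreed.
    """
    ranked = sorted(scored_models, key=lambda item: item[1], reverse=True)
    for name, _score in ranked[1:]:            # S1511: next model by quality score
        prediction = predict(name)             # S1512: run it, or read a cached result
        if agrees_with_reference(prediction):  # S1513: direction check
            return name, prediction            # S1514: corrected prediction result
    return None


if __name__ == "__main__":
    # Hypothetical cached regression predictions and quality scores.
    cached = {"ridge": 98.0, "linear": 103.0, "lasso": 101.0}
    scored = [("ridge", 0.9), ("linear", 0.8), ("lasso", 0.7)]
    # Reference: the classification side predicted a rise over a previous actual of 100.0.
    print(correct_until_consistent(scored, cached.__getitem__, lambda y: y > 100.0))
    # prints ('linear', 103.0)
```

The same function can cover the classification branch (steps S1531 to S1534). In that case, pass the classification models, their label predictions, and a check against the regression direction.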
  • The target prediction type is the classification type;
  • The quality score of the target regression model is greater than that of the target classification model;
  • Step S153 includes:
  • In this case, the classification result needs to be corrected according to the prediction direction of the regression result.
  • The target classification model can be reselected according to the quality scores of the classification models. All or part of the current target classification models are discarded (only part, if there are multiple target classification models). The classification model whose quality score is second only to that of the original target classification model then serves as the new target classification model.
  • If only the target regression model and the target classification model were run earlier, the target industry factor data needs to be input into the modified target classification model to obtain a modified classification prediction result.
  • Step S1533: Determine whether the direction of the modified classification prediction result is the same as that of the regression prediction result. If yes, perform step S1534; if not, return to step S1531.
  • Step S1534: Determine the modified classification prediction result as the corrected classification prediction result. The classification prediction result is thereby corrected, which improves the accuracy of the prediction result.
  • If the direction of the modified classification prediction result is still inconsistent with that of the regression prediction result, the process returns to step S1531. It repeats until the directions are consistent, and the corrected classification prediction result is then output.
  • The target prediction result can also be determined with reference to the historical growth trend for the same period of the previous year.
  • Once the classification prediction result has been corrected according to the regression prediction result, the corrected classification prediction result is output as the target prediction result.
  • In this case, the classification prediction result can be output directly as the target prediction result.
  • FIG. 7 is a schematic flowchart of a data prediction method based on a model set provided by another embodiment of the present application.
  • The data prediction method based on a model set in this embodiment includes steps S210-S270.
  • Steps S230-S270 are similar to steps S110-S150 in the above embodiment and are not repeated here.
  • Steps S210-S220, which are added in this embodiment, are described in detail below.
  • S210: Construct the model set, which includes multiple classification models and multiple regression models.
  • The classification models include models such as support vector machines and random forests. The regression models include models such as linear regression and ridge regression.
  • After obtaining the multiple classification models and multiple regression models, the computer device constructs the model set based on them.
  • S220: Determine the quality score of each model in the model set according to historical industry factor data and historical results.
  • This step includes inputting the historical industry factor data into each model in the model set to obtain each model's prediction result. The quality score of each model is then determined according to the prediction results and the historical results.
  • The historical industry factor data in this embodiment corresponds to the target industry. For example, if this embodiment needs to predict the industrial added value of the automobile industry, the historical industry factor data is the historical factor data of the automobile industry. Specifically, this embodiment uses backtesting to evaluate the quality score of each model in the model set, as follows:
  • The verification period can be set to the 6 months before June 2021, that is, 2020-12, 2021-01, 2021-02, 2021-03, 2021-04 and 2021-05. The performance of each model during the verification period is then evaluated quantitatively. Examples include:
    the accuracy of a classification model over the verification period, obtained by comparing the Y values output by the model with the actual Y values;
    the error deviation rate of a regression model, obtained the same way;
    the execution performance of the model, such as how fast it produces results;
    the stability of the model's results, such as the degree of deviation across multiple runs.
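The two effectiveness metrics named above can be sketched for the six-month verification period. The accuracy is the share of months in which the predicted label matches the actual one. The error deviation rate is taken here as the mean absolute deviation of the predicted Y relative to the actual Y. That is one plausible reading, not a formula stated in the text. How these metrics, execution speed and stability are combined into a single quality score is not specified, so no weighting is shown.

```python
from statistics import mean
from typing import Sequence

# Verification period from the example: the six months before June 2021.
VERIFICATION_PERIOD = ["2020-12", "2021-01", "2021-02", "2021-03", "2021-04", "2021-05"]


def classification_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Share of verification months whose predicted label equals the actual label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


def regression_error_rate(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute deviation of predicted Y relative to actual Y (assumed definition).

    Actual values of zero would divide by zero and must be handled separately.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    return mean(abs(p - a) / abs(a) for p, a in zip(predicted, actual))


if __name__ == "__main__":
    # Hypothetical backtest labels, one per verification month.
    labels_pred = ["up", "up", "down", "down", "up", "down"]
    labels_true = ["up", "down", "down", "down", "up", "up"]
    print(classification_accuracy(labels_pred, labels_true))  # 4 of 6 months correct
```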
  • The models in this embodiment can also make cross-period predictions. For example, if the data for April and May 2021 is sparse or its updates lag, the data for March 2021 can be used to predict the Y value for June 2021. Economic data tends to be low-frequency, small in volume, updated with a lag and irregularly missing. These problems would otherwise cast doubt on the credibility of the prediction results and weaken their interpretability, and cross-period prediction addresses them.
  • The model set includes multiple classification models and multiple regression models. The target classification model and the target regression model are, respectively, the classification model and the regression model with the highest quality score in the model set. It is then determined whether the directions of the regression prediction result and the classification prediction result are consistent. If they are consistent, the target prediction result is determined from the regression prediction result and the classification prediction result according to the target prediction type. If they are inconsistent, the target prediction result is determined according to the preset correction rule.
  • The beneficial effects of this application also include the following. The data prediction method based on the model set covers as many classification or regression models applicable to structural economic factors as possible, which avoids abnormal deviations from any single model. Both classification and regression models are used for prediction, so the same target problem is analyzed from multiple perspectives, which improves prediction accuracy. The effect of each model is evaluated with quantified evaluation factors, and the most suitable model for the current situation is trained and selected. Because different models suit different scenarios, correction rules are integrated to make the model's prediction results more reasonable.
  • FIG. 8 is a schematic block diagram of a data prediction device based on a model set provided by an embodiment of the present application.
  • The present application also provides a data prediction device based on a model set.
  • The data prediction device based on a model set includes units for executing the above data prediction method based on a model set. The device can be configured in a terminal such as a desktop computer, a tablet computer or a laptop computer.
  • The data prediction device based on a model set includes an acquisition unit 801, an input unit 802, a first determining unit 803, a second determining unit 804 and a third determining unit 805.
  • The input unit 802 is configured to input the target industry factor data into the target regression model and the target classification model in the preset model set for result prediction, to obtain a regression prediction result and a classification prediction result respectively. The model set includes multiple classification models and multiple regression models, and the target classification model and the target regression model are, respectively, the classification model with the highest quality score and the regression model with the highest quality score in the model set;
  • The first determining unit 803 is configured to determine whether the directions of the regression prediction result and the classification prediction result are consistent;
  • The second determining unit 804 is configured to determine a target prediction result from the regression prediction result and the classification prediction result according to the target prediction type when the regression prediction result is in the same direction as the classification prediction result;
  • The third determining unit 805 is configured to determine a target prediction result according to a preset correction rule when the directions of the regression prediction result and the classification prediction result are inconsistent.
  • The third determining unit 805 is specifically configured to:
  • The target prediction type is the regression type;
  • The quality score of the target classification model is greater than that of the target regression model;
  • The target prediction type is the classification type;
  • The quality score of the target regression model is greater than that of the target classification model;
  • Determine the corrected classification prediction result as the target prediction result.
  • the third determining unit 805 is configured to:
  • modified regression prediction result is in the same direction as the classification prediction result, then determining the modified regression prediction result as the corrected regression prediction result;
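  • the third determining unit 805 is further configured to change the target classification model according to the quality scores of the classification models in the model set and, if the modified classification prediction result is in the same direction as the regression prediction result, determine the modified classification prediction result as the corrected classification prediction result;
  • the acquisition unit 801 is specifically configured to acquire first industry factor data, perform data preprocessing on it to obtain second industry factor data, and perform feature engineering on the second industry factor data to obtain the target industry factor data;
  • the first industry factor data is unprocessed industry factor data.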
  • Fig. 9 is a schematic block diagram of a data prediction device based on a model set provided by another embodiment of the present application. As shown in FIG. 9, the data prediction device based on a model set in this embodiment adds a construction unit 806 and a fourth determining unit 807 to the above embodiments.
  • the construction unit 806 is configured to construct the model set;
  • the fourth determining unit 807 is configured to determine the quality score of each model in the model set according to historical industry factor data and historical results.
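  • the fourth determining unit 807 is specifically configured to input the historical industry factor data into each model in the model set to obtain a prediction result for each model;
  • the quality score of each model in the model set is then determined according to the prediction results and the historical results.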
  • the above-mentioned data prediction device based on a model set can be implemented in the form of a computer program, and the computer program can be run on a computer device as shown in FIG. 10 .
  • the computer device 1000 may be a terminal or a server, wherein the terminal may be an electronic device with a communication function such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, and a wearable device.
  • the server can be an independent server, or a server cluster composed of multiple servers.
  • the computer device 1000 includes a processor 1002, a memory, and a network interface 1005 connected through a system bus 1001, wherein the memory may include a non-volatile storage medium 1003 and an internal memory 1004.
  • the non-volatile storage medium 1003 can store an operating system 10031 and a computer program 10032.
  • the computer program 10032 includes program instructions.
  • when executed, the program instructions cause the processor 1002 to execute a data prediction method based on a model set.
  • the processor 1002 is used to provide computing and control capabilities to support the operation of the entire computer device 1000.
  • the internal memory 1004 provides an environment for running the computer program 10032 stored in the non-volatile storage medium 1003.
  • when the computer program 10032 is executed by the processor 1002, it causes the processor 1002 to execute a data prediction method based on a model set.
  • the network interface 1005 is used for network communication with other devices.
  • the structure shown in FIG. 10 is only a block diagram of a partial structure related to the solution of this application, and does not constitute a limitation on the computer device 1000 to which the solution of this application is applied.
  • the specific computer device 1000 may include more or fewer components than shown, or combine certain components, or have a different arrangement of components.
  • the processor 1002 is configured to run a computer program 10032 stored in the memory, so as to realize the following steps:
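  • acquire target industry factor data;
  • input the target industry factor data into the target regression model and the target classification model in the preset model set for result prediction, obtaining a regression prediction result and a classification prediction result respectively, where the model set includes multiple classification models and multiple regression models, and the target classification model and the target regression model are, respectively, the classification model with the highest quality score and the regression model with the highest quality score in the model set;
  • determine whether the directions of the regression prediction result and the classification prediction result are consistent;
  • if the regression prediction result is in the same direction as the classification prediction result, determine the target prediction result from the regression prediction result and the classification prediction result according to the target prediction type;
  • otherwise, determine the target prediction result according to a preset correction rule.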
  • when implementing the step of determining the target prediction result according to the preset correction rule, the processor 1002 specifically implements the following steps:
  • if the target prediction type is a regression type and the quality score of the target classification model is greater than the quality score of the target regression model, correct the regression prediction result according to the classification prediction result and determine the corrected regression prediction result as the target prediction result;
  • if the target prediction type is a classification type and the quality score of the target regression model is greater than the quality score of the target classification model, correct the classification prediction result according to the regression prediction result and determine the corrected classification prediction result as the target prediction result.
  • when implementing the step of correcting the regression prediction result according to the classification prediction result to obtain the corrected regression prediction result, the processor 1002 changes the target regression model according to the quality scores of the regression models in the model set and:
  • if the modified regression prediction result is in the same direction as the classification prediction result, determines the modified regression prediction result as the corrected regression prediction result;
  • when implementing the step of correcting the classification prediction result according to the regression prediction result to obtain the corrected classification prediction result, the processor 1002 changes the target classification model according to the quality scores of the classification models in the model set and, if the modified classification prediction result is in the same direction as the regression prediction result, determines the modified classification prediction result as the corrected classification prediction result;
  • when implementing the step of acquiring target industry factor data, the processor 1002 acquires first industry factor data, preprocesses it into second industry factor data, and applies feature engineering to obtain the target industry factor data;
  • the first industry factor data is unprocessed industry factor data.
  • before implementing the step of inputting the target industry factor data into the target regression model and the target classification model in the preset model set for result prediction, the processor 1002 further constructs the model set;
  • a quality score for each model in the set of models is determined based on historical industry factor data and historical results.
  • when implementing the step of determining the quality score of each model in the model set according to historical industry factor data and historical results, the processor 1002 inputs the historical industry factor data into each model in the model set to obtain a prediction result for each model;
  • the quality score of each model in the model set is respectively determined according to the prediction result and the historical result.
  • the processor 1002 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor, or any conventional processor.
  • the computer program includes program instructions, and the computer program can be stored in a storage medium, which is a computer-readable storage medium.
  • the program instructions are executed by at least one processor in the computer system to implement the process steps of the above method embodiments.
  • the present application also provides a storage medium.
  • the storage medium may be a computer-readable storage medium, and the computer-readable storage medium may be non-volatile or volatile.
  • the storage medium stores a computer program including program instructions. When the program instructions are executed by the processor, the processor performs the following steps:
  • acquire target industry factor data;
  • input the target industry factor data into the target regression model and the target classification model in the preset model set for result prediction, obtaining a regression prediction result and a classification prediction result respectively, where the model set includes multiple classification models and multiple regression models, and the target classification model and the target regression model are, respectively, the classification model with the highest quality score and the regression model with the highest quality score in the model set;
  • determine whether the directions of the regression prediction result and the classification prediction result are consistent;
  • if the regression prediction result is in the same direction as the classification prediction result, determine the target prediction result from the regression prediction result and the classification prediction result according to the target prediction type;
  • otherwise, determine the target prediction result according to a preset correction rule.
  • when the processor executes the program instructions to implement the step of determining the target prediction result according to the preset correction rule, it specifically implements the following steps:
  • if the target prediction type is a regression type and the quality score of the target classification model is greater than the quality score of the target regression model, correct the regression prediction result according to the classification prediction result and determine the corrected regression prediction result as the target prediction result;
  • if the target prediction type is a classification type and the quality score of the target regression model is greater than the quality score of the target classification model, correct the classification prediction result according to the regression prediction result and determine the corrected classification prediction result as the target prediction result.
  • when the processor executes the program instructions to implement the step of correcting the regression prediction result according to the classification prediction result, it changes the target regression model according to the quality scores of the regression models in the model set and, if the modified regression prediction result is in the same direction as the classification prediction result, determines the modified regression prediction result as the corrected regression prediction result;
  • when the processor executes the program instructions to implement the step of correcting the classification prediction result according to the regression prediction result, it changes the target classification model according to the quality scores of the classification models in the model set and, if the modified classification prediction result is in the same direction as the regression prediction result, determines the modified classification prediction result as the corrected classification prediction result;
  • when the processor executes the program instructions to implement the step of acquiring target industry factor data, it acquires first industry factor data, preprocesses it into second industry factor data, and applies feature engineering to obtain the target industry factor data; the first industry factor data is unprocessed industry factor data.
  • before the processor executes the program instructions to implement the step of inputting the target industry factor data into the target regression model and the target classification model in the preset model set for result prediction, it further constructs the model set;
  • a quality score for each model in the set of models is determined based on historical industry factor data and historical results.
  • when the processor executes the program instructions to implement the step of determining the quality score of each model in the model set according to historical industry factor data and historical results, it inputs the historical industry factor data into each model in the model set to obtain a prediction result for each model;
  • the quality score of each model in the model set is respectively determined according to the prediction result and the historical result.
  • the storage medium may be any computer-readable storage medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc.
  • the disclosed devices and methods may be implemented in other ways.
  • the device embodiments described above are illustrative only.
  • the division of each unit is only a logical function division, and there may be another division method in actual implementation.
  • several units or components may be combined or integrated into another system, or some features may be omitted, or not implemented.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • if the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a storage medium.
  • the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied as a software product, and the computer software product is stored in a storage medium.
  • the software product includes several instructions for causing a computer device (which may be a personal computer, a terminal, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Game Theory and Decision Science (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Educational Administration (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

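The embodiments of the present application disclose a data prediction method and apparatus based on a model set, a computer device, and a storage medium. The method includes: first acquiring target industry factor data; then inputting the target industry factor data into a target regression model and a target classification model in a preset model set for result prediction, to obtain a regression prediction result and a classification prediction result respectively; then determining whether the directions of the regression prediction result and the classification prediction result are consistent; if they are consistent, determining a target prediction result from the regression prediction result and the classification prediction result according to a target prediction type; if they are inconsistent, determining the target prediction result according to a preset correction rule. By combining a classification model and a regression model to make predictions, the embodiments analyze the same target problem from multiple angles, which can improve the accuracy of the prediction result.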

Description

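Data prediction method, apparatus, device and storage medium based on a model set
This application claims priority to Chinese patent application No. 202110600017.0, filed with the China Patent Office on May 31, 2021 and entitled "Data prediction method, apparatus, device and storage medium based on a model set", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of artificial-intelligence-based data prediction, and in particular to a data prediction method and apparatus based on a model set, a computer device, and a storage medium.
Background
Economic research usually requires forecasting the future trend of indicators such as industrial added value or production capacity and output value.
Besides the traditional approach of forecasting future trends from expert experience, quantifiable economic factors are commonly used to predict future economic movements. The inventors found that the prediction models generally used in current economic research are relatively traditional machine learning models, such as simulation-based extrapolation or simple linear regression. Errors in their predictions are hard to detect, which makes the predictions inaccurate.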
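Summary
The embodiments of the present application provide a data prediction method and apparatus based on a model set, a computer device, and a storage medium, which can improve the accuracy of prediction results.
In a first aspect, an embodiment of the present application provides a data prediction method based on a model set, including:
acquiring target industry factor data;
inputting the target industry factor data into a target regression model and a target classification model in a preset model set for result prediction, to obtain a regression prediction result and a classification prediction result respectively, wherein the model set includes a plurality of classification models and a plurality of regression models, and the target classification model and the target regression model are, respectively, the classification model with the highest quality score and the regression model with the highest quality score in the model set;
determining whether the directions of the regression prediction result and the classification prediction result are consistent;
if the directions of the regression prediction result and the classification prediction result are consistent, determining a target prediction result from the regression prediction result and the classification prediction result according to a target prediction type;
if the directions of the regression prediction result and the classification prediction result are inconsistent, determining the target prediction result according to a preset correction rule.
In a second aspect, an embodiment of the present application further provides a data prediction apparatus based on a model set, including:
an acquisition unit, configured to acquire target industry factor data;
an input unit, configured to input the target industry factor data into a target regression model and a target classification model in a preset model set for result prediction, to obtain a regression prediction result and a classification prediction result respectively, wherein the model set includes a plurality of classification models and a plurality of regression models, and the target classification model and the target regression model are, respectively, the classification model with the highest quality score and the regression model with the highest quality score in the model set;
a first determining unit, configured to determine whether the directions of the regression prediction result and the classification prediction result are consistent;
a second determining unit, configured to, when the directions of the regression prediction result and the classification prediction result are consistent, determine a target prediction result from the regression prediction result and the classification prediction result according to a target prediction type;
a third determining unit, configured to, when the directions of the regression prediction result and the classification prediction result are inconsistent, determine the target prediction result according to a preset correction rule.
In a third aspect, an embodiment of the present application further provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor performs the following steps when executing the computer program:
acquiring target industry factor data;
inputting the target industry factor data into a target regression model and a target classification model in a preset model set for result prediction, to obtain a regression prediction result and a classification prediction result respectively, wherein the model set includes a plurality of classification models and a plurality of regression models, and the target classification model and the target regression model are, respectively, the classification model with the highest quality score and the regression model with the highest quality score in the model set;
determining whether the directions of the regression prediction result and the classification prediction result are consistent;
if the directions of the regression prediction result and the classification prediction result are consistent, determining a target prediction result from the regression prediction result and the classification prediction result according to a target prediction type;
if the directions of the regression prediction result and the classification prediction result are inconsistent, determining the target prediction result according to a preset correction rule.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program, wherein the computer program includes program instructions which, when executed by a processor, perform the following steps:
acquiring target industry factor data;
inputting the target industry factor data into a target regression model and a target classification model in a preset model set for result prediction, to obtain a regression prediction result and a classification prediction result respectively, wherein the model set includes a plurality of classification models and a plurality of regression models, and the target classification model and the target regression model are, respectively, the classification model with the highest quality score and the regression model with the highest quality score in the model set;
determining whether the directions of the regression prediction result and the classification prediction result are consistent;
if the directions of the regression prediction result and the classification prediction result are consistent, determining a target prediction result from the regression prediction result and the classification prediction result according to a target prediction type;
if the directions of the regression prediction result and the classification prediction result are inconsistent, determining the target prediction result according to a preset correction rule.
The embodiments of the present application provide a data prediction method and apparatus based on a model set, a computer device, and a storage medium. By combining classification models and regression models to make predictions, the embodiments analyze the same target problem from multiple angles, which can improve the accuracy of the prediction result.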
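Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application scenario of the data prediction method based on a model set according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of the data prediction method based on a model set according to an embodiment of the present application;
FIG. 3 is a schematic sub-flowchart of the data prediction method based on a model set according to an embodiment of the present application;
FIG. 4 is another schematic sub-flowchart of the data prediction method based on a model set according to an embodiment of the present application;
FIG. 5 is another schematic sub-flowchart of the data prediction method based on a model set according to an embodiment of the present application;
FIG. 6 is another schematic sub-flowchart of the data prediction method based on a model set according to an embodiment of the present application;
FIG. 7 is a schematic flowchart of a data prediction method based on a model set according to another embodiment of the present application;
FIG. 8 is a schematic block diagram of a data prediction apparatus based on a model set according to an embodiment of the present application;
FIG. 9 is a schematic block diagram of a data prediction apparatus based on a model set according to another embodiment of the present application;
FIG. 10 is a schematic block diagram of a computer device according to an embodiment of the present application.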
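Detailed Description
The embodiments of the present application provide a data prediction method and apparatus based on a model set, a computer device, and a storage medium.
The data prediction method based on a model set may be executed by the data prediction apparatus based on a model set provided in the embodiments of the present application, or by a computer device that integrates the apparatus. The apparatus may be implemented in hardware or software. The computer device may be a terminal or a server, and the terminal may be a smartphone, a tablet computer, a palmtop computer, a notebook computer, or the like.
Refer to FIG. 1, which is a schematic diagram of an application scenario of the data prediction method based on a model set according to an embodiment of the present application. The method is applied in the computer device 10 of FIG. 1, which stores a model set corresponding to a target business. The model set includes a plurality of classification models and a plurality of regression models; the classification model with the highest quality score is the target classification model, and the regression model with the highest quality score is the target regression model. In this embodiment, after acquiring unprocessed first industry factor data, the computer device 10 applies data preprocessing, including missing-value handling and frequency conversion, to the first industry factor data to obtain second industry factor data, and applies feature engineering to the second industry factor data to obtain the target industry factor data to be input into the models of the model set. It then inputs the target industry factor data into the target regression model and the target classification model of the preset model set to obtain a regression prediction result and a classification prediction result. Finally, it checks whether the directions of the regression prediction result and the classification prediction result are consistent: if so, it determines the target prediction result from the regression prediction result and the classification prediction result according to the target prediction type; if not, it determines the target prediction result according to the preset correction rule.
Refer to FIG. 2, which is a schematic flowchart of the data prediction method based on a model set according to an embodiment of the present application. As shown in FIG. 2, the method includes the following steps S110 to S150.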
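S110. Acquire target industry factor data.
In this embodiment, the target industry factor data may be data constructed from industry characteristics, industrial logic, and the like. Industry characteristics may include the industry's revenue, market capitalization, market size, and similar data. Industrial logic means analyzing the industry's upstream and downstream: which raw materials are involved upstream, their specific prices and output, the scale of downstream demand, and so on. For the automobile industry, for example, this includes upstream steel output and rubber prices, and downstream sales volumes of the main vehicle models and sales of in-vehicle electronics. In this embodiment, the data corresponding to these industry characteristics and this industrial logic are selected as the industry factor data.
Specifically, the computer device in this embodiment may extract the industry factor data automatically from a corresponding database, or a user may input the industry factor data manually.
In some embodiments, as shown in FIG. 3, step S110 includes:
S111. Acquire first industry factor data.
In this embodiment, the first industry factor data is unprocessed industry factor data, that is, raw industry factor data obtained from a database, such as the automobile industry's raw upstream raw-material price data, downstream demand data, revenue data, and market capitalization data.
S112. Perform data preprocessing on the first industry factor data to obtain second industry factor data.
In this embodiment, data preprocessing includes missing-value handling and frequency conversion of the first industry factor data. Missing values may be handled by data simulation: for example, if the data for a certain period is missing, the gap can be filled by following the change curve of the same period in the previous year. In some embodiments, frequency conversion may unify data recorded by day, week, and so on into monthly data. Step S112 thus processes all factors into a structured, usable form.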
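The two preprocessing operations of S112 can be sketched in Python as follows, assuming months are keyed as (year, month) tuples. The gap-filling rule is one reading of "following the change curve of the same period in the previous year": a missing month is taken to be the previous month's value scaled by last year's month-over-month ratio. Averaging is only one possible frequency-conversion rule (sums or period-end values may suit some factors better), and the names `to_monthly` and `fill_like_last_year` are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, List, Optional, Tuple

Month = Tuple[int, int]  # (year, month)


def to_monthly(observations: Dict[date, float]) -> Dict[Month, float]:
    """Frequency conversion: average daily or weekly observations into one value per month."""
    buckets: Dict[Month, List[float]] = defaultdict(list)
    for day, value in observations.items():
        buckets[(day.year, day.month)].append(value)
    return {month: mean(values) for month, values in sorted(buckets.items())}


def previous_month(month: Month) -> Month:
    year, m = month
    return (year - 1, 12) if m == 1 else (year, m - 1)


def fill_like_last_year(series: Dict[Month, Optional[float]]) -> Dict[Month, Optional[float]]:
    """Fill a missing month by replaying last year's month-over-month change.

    missing(y, m) = value(y, m-1) * value(y-1, m) / value(y-1, m-1)
    Months are processed chronologically, so a value filled earlier can serve as
    the base for a later gap. Gaps lacking the reference data stay None.
    """
    filled = dict(series)
    for month in sorted(filled):
        if filled[month] is not None:
            continue
        base = filled.get(previous_month(month))
        same_month_last_year = filled.get((month[0] - 1, month[1]))
        prior_month_last_year = filled.get(previous_month((month[0] - 1, month[1])))
        # The truthiness test on the divisor also guards against division by zero.
        if base is not None and same_month_last_year is not None and prior_month_last_year:
            filled[month] = base * same_month_last_year / prior_month_last_year
    return filled
```

For example, if April and May 2020 were 100 and 110 and April 2021 is 200, a missing May 2021 is filled as 200 × 110 / 100 = 220.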
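S113. Perform feature engineering on the second industry factor data to obtain the target industry factor data.
In this embodiment, this step mainly performs feature engineering on the structured factors (the second industry factor data): it evaluates a validity weight for each factor based on characteristics such as factor importance and timeliness, or synthesizes new effective factors.
In this embodiment, X denotes the industry factor data and Y the prediction result; that is, X is the data input to the model and Y is the data output by the model. Y may specifically be industrial added value, production capacity and output value, or the like, and can be adjusted according to the target.
Evaluating the effective factors for the automobile industry's business climate mainly involves assessing: the data disclosure quality of each X factor (the lag between update time and release cycle, and whether that lag is stable across successive cycles); the degree of missing data (whether the factor has recently been updated steadily and normally); the correlation between each X factor and Y; and the correlation among the X factors (whether there are many similar factors). When there are too many X factors, principal component analysis (PCA) is applied to retain the principal components or effectively synthesized new factors. After this preliminary factor screening, the retained X factors are the industry factor data to be input into the models of the model set in this embodiment.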
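The two correlation criteria in this screening (correlation with Y, and redundancy among X factors) can be sketched as a greedy filter. This is an illustrative Python sketch, not the method's actual implementation: the thresholds `min_abs_corr_y` and `max_abs_corr_x`, the strongest-first ordering, and the factor names are assumptions, and the disclosure-quality, missing-data and PCA steps are left out.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b) if norm_a and norm_b else 0.0


def screen_factors(
    factors: Dict[str, List[float]],
    y: List[float],
    min_abs_corr_y: float = 0.3,
    max_abs_corr_x: float = 0.9,
) -> List[str]:
    """Keep factors correlated with Y, skipping near-duplicates of factors already kept."""
    strength = {name: abs(pearson(values, y)) for name, values in factors.items()}
    kept: List[str] = []
    for name in sorted(factors, key=lambda n: strength[n], reverse=True):
        if strength[name] < min_abs_corr_y:
            break  # sorted by strength, so every remaining factor is weaker
        redundant = any(abs(pearson(factors[name], factors[k])) > max_abs_corr_x for k in kept)
        if not redundant:
            kept.append(name)
    return kept
```

Processing the strongest factors first means that, of two near-duplicate factors, the one more correlated with Y is the one retained.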
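S120. Input the target industry factor data into the target regression model and the target classification model in the preset model set to obtain a regression prediction result and a classification prediction result respectively.
The model set includes a plurality of classification models and a plurality of regression models, and the target classification model and the target regression model are, respectively, the classification model with the highest quality score and the regression model with the highest quality score in the model set.
In this embodiment, there are n target regression models and n target classification models, where n is an integer greater than or equal to 1. When n is greater than 1, the target regression models are the n highest-scoring regression models and the target classification models are the n highest-scoring classification models. In that case, the regression prediction result may be the mean of the n regression prediction results, or their mean after the highest and lowest values are removed; likewise, the classification prediction result may be the mean of the n classification prediction results, or their mean after the highest and lowest values are removed.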
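The top-n selection and the two aggregation options above might look like this in Python, assuming each model carries a precomputed quality score and that classification outputs are numeric (for example, a probability of a rise) so that they can be averaged. `ScoredModel`, `top_n` and `combine` are hypothetical names, and the model names and scores in the test are illustrative only.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence


@dataclass
class ScoredModel:
    name: str
    kind: str  # "regression" or "classification"
    quality_score: float


def top_n(models: Sequence[ScoredModel], kind: str, n: int = 1) -> List[ScoredModel]:
    """Return the n highest-scoring models of one kind (n = 1 gives the single target model)."""
    same_kind = [m for m in models if m.kind == kind]
    return sorted(same_kind, key=lambda m: m.quality_score, reverse=True)[:n]


def combine(predictions: Sequence[float], trimmed: bool = False) -> float:
    """Mean of the n predictions, optionally after dropping one highest and one lowest value."""
    values = sorted(predictions)
    if trimmed and len(values) > 2:
        values = values[1:-1]
    return mean(values)
```

The trimmed variant only drops values when at least three predictions are available, since removing both extremes from two values would leave nothing to average.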
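In some embodiments, after the model set is constructed and the quality score of each model is determined from the corresponding historical industry factors and historical results, the target industry factor data is input only into the target regression model and the target classification model, rather than running all models in the set at once. Predictions by the other models are triggered only if the regression prediction result and the classification prediction result later turn out to have inconsistent directions, which reduces the power consumption of the computer device.
In other embodiments, after the model set is constructed and the quality score of each model is determined from the corresponding historical industry factors and historical results, the target industry factor data is input into all models in the model set to obtain all of their prediction results in advance. When the directions of the regression prediction result and the classification prediction result are inconsistent, the required predictions can then be retrieved directly under the correction rule, which speeds up prediction.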
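The difference between the lazy strategy (run additional models only when needed) and the eager strategy (run every model up front) can be illustrated with a small memoizing cache. `PredictionCache` and its method names are hypothetical, and the models are stood in for by plain functions of the factor data.

```python
from typing import Callable, Dict, Sequence

Predictor = Callable[[Sequence[float]], float]


class PredictionCache:
    """Memoizes per-model predictions for one set of target industry factor data."""

    def __init__(self, models: Dict[str, Predictor], features: Sequence[float]):
        self.models = models
        self.features = features
        self.results: Dict[str, float] = {}
        self.model_runs = 0  # how many times a model was actually executed

    def get(self, name: str) -> float:
        # Lazy mode: run a model only the first time its result is needed.
        if name not in self.results:
            self.results[name] = self.models[name](self.features)
            self.model_runs += 1
        return self.results[name]

    def run_all(self) -> None:
        # Eager mode: run every model up front so later look-ups need no model call.
        for name in self.models:
            self.get(name)
```

Either way, the correction loop can call `get` for the next-ranked model; only the moment at which the model actually runs differs.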
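S130. Determine whether the directions of the regression prediction result and the classification prediction result are consistent. If so, perform step S140; if not, perform step S150.
In this embodiment, the directions are inconsistent if the classification prediction result forecasts that the automobile industry's Y value will rise in June 2021 while the regression prediction result forecasts a June 2021 Y value below the previous period's actual Y value, or if the classification prediction result forecasts that the Y value will fall in June 2021 while the regression prediction result forecasts a June 2021 Y value above the previous period's actual Y value.
The directions are consistent if the classification prediction result forecasts that the automobile industry's Y value will rise in June 2021 and the regression prediction result forecasts a June 2021 Y value above the previous period's actual Y value, or if the classification prediction result forecasts that the Y value will fall in June 2021 and the regression prediction result forecasts a June 2021 Y value below the previous period's actual value.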
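The direction test in S130 compares the regression forecast with the previous period's actual Y value and checks that against the classifier's label. A minimal sketch, assuming the classifier emits the strings "up" or "down"; the text does not say how a forecast exactly equal to the previous actual value is treated, so this sketch labels it "flat", which matches neither classifier label. The function names are hypothetical.

```python
def regression_direction(predicted_y: float, last_actual_y: float) -> str:
    """Direction of a regression forecast relative to the previous period's actual Y."""
    if predicted_y > last_actual_y:
        return "up"
    if predicted_y < last_actual_y:
        return "down"
    return "flat"  # not covered by the text; here it matches neither label


def directions_consistent(predicted_y: float, last_actual_y: float, class_label: str) -> bool:
    """True when the regression forecast moves the way the classifier's label says."""
    return regression_direction(predicted_y, last_actual_y) == class_label
```

With a previous actual value of 100, a regression forecast of 105 agrees with an "up" label and a forecast of 95 agrees with a "down" label.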
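S140. Determine the target prediction result from the regression prediction result and the classification prediction result according to the target prediction type.
When the directions of the regression prediction result and the classification prediction result are consistent and the target prediction type is the regression type, the regression prediction result is determined as the target prediction result. In this embodiment, the classification prediction result of the classification model serves as an auxiliary check on the accuracy of the regression prediction result, which improves the accuracy of the final prediction result.
If the target prediction type is the classification type, the classification prediction result is determined as the target prediction result. In this embodiment, the regression prediction result of the regression model serves as an auxiliary check on the accuracy of the classification prediction result, which improves the accuracy of the final prediction result.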
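When the directions agree, S140 reduces to choosing by target prediction type. A trivial Python sketch; the type strings "regression" and "classification" and the function name are hypothetical.

```python
from typing import Union


def select_when_consistent(
    target_type: str, regression_result: float, classification_result: str
) -> Union[float, str]:
    """S140: both results point the same way, so return the one of the requested type."""
    if target_type == "regression":
        return regression_result
    if target_type == "classification":
        return classification_result
    raise ValueError(f"unknown target prediction type: {target_type!r}")
```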
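S150. Determine the target prediction result according to the preset correction rule.
In this embodiment, when the directions of the regression prediction result and the classification prediction result are inconsistent, the correction rule is triggered to adjust the predicted Y value.
In some embodiments, as shown in FIG. 4, step S150 includes:
S151. If the target prediction type is the regression type and the quality score of the target classification model is greater than the quality score of the target regression model, correct the regression prediction result according to the classification prediction result to obtain a corrected regression prediction result.
When there are multiple target classification models and multiple target regression models, the quality score of the target classification models may be the total or the average of their individual quality scores; likewise, the quality score of the target regression models may be the total or the average of their individual quality scores.
Specifically, in some embodiments, referring to FIG. 5, step S151 includes:
S1511. Change the target regression model according to the quality scores of the regression models in the model set to obtain a changed target regression model.
In this embodiment, when the directions of the regression prediction result and the classification prediction result are inconsistent, the target prediction type is regression, and the quality score of the target classification model is greater than that of the target regression model, the predicted direction of the regression result needs to be corrected according to the classification result.
Specifically, the target regression model may be reselected in order of regression model quality score: all of the target regression models are discarded (or, when there are several, some of them), and the regression model whose quality score ranks just below that of the original target regression model is selected as the new target regression model.
S1512. Input the target industry factor data into the changed target regression model to obtain a changed regression prediction result.
In some embodiments, if only the target regression model and the target classification model were run earlier, the target industry factor data now needs to be input into the changed target regression model to obtain the changed regression prediction result.
In other embodiments, if all models in the model set were already run on the target industry factor data, the corresponding model's regression prediction result is simply retrieved, without inputting the target industry factor data into the changed target regression model again.
S1513. Determine whether the directions of the changed regression prediction result and the classification prediction result are consistent. If so, perform step S1514; if not, return to step S1511.
S1514. Determine the changed regression prediction result as the corrected regression prediction result.
If the directions of the changed regression prediction result and the classification prediction result are consistent, the changed regression prediction result is taken as the corrected regression prediction result. This embodiment thus corrects the regression prediction result using the classification prediction result, improving the accuracy of the prediction result.
In this embodiment, if the directions of the changed regression prediction result and the classification prediction result are still inconsistent, the method returns to step S1511 and repeats until the directions are consistent, and then outputs the corrected regression prediction result.
Note that if, after all regression models have been tried, no regression prediction result has a direction consistent with the classification prediction result, the target prediction result may be determined with reference to the historical growth rate for the same period in previous years.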
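Steps S1511 to S1514, together with the fallback just described, can be sketched as a loop that walks down the regression models in descending quality-score order. This Python sketch assumes a single target regression model (so only the top-ranked model is discarded), represents each model as a hypothetical (name, quality score, predict function) tuple, and assumes the fallback multiplies the previous actual Y by one plus a same-period historical growth rate. The text only says that growth rate is "referred to", so that formula, like every name here, is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

# (name, quality score, predict function); all names and scores are illustrative
RegressionModel = Tuple[str, float, Callable[[Sequence[float]], float]]


def direction(predicted_y: float, last_actual_y: float) -> str:
    if predicted_y > last_actual_y:
        return "up"
    if predicted_y < last_actual_y:
        return "down"
    return "flat"


def correct_regression(
    models: List[RegressionModel],
    features: Sequence[float],
    last_actual_y: float,
    class_label: str,
    historical_growth_rate: float,
) -> Tuple[str, float]:
    """S1511-S1514: walk down the regression models by quality score until one agrees."""
    ranked = sorted(models, key=lambda m: m[1], reverse=True)
    # ranked[0] is the original target regression model, which already disagreed.
    for name, _score, predict in ranked[1:]:  # S1511: next-best model
        predicted_y = predict(features)  # S1512
        if direction(predicted_y, last_actual_y) == class_label:  # S1513
            return name, predicted_y  # S1514
    # Every regression model disagreed: fall back to the same-period historical growth rate.
    return "historical_growth_rate", last_actual_y * (1 + historical_growth_rate)
```

In an implementation that ran every model in advance, `predict(features)` would become a look-up of the stored result instead of a fresh model call.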
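S152. Determine the corrected regression prediction result as the target prediction result.
In this embodiment, after the regression prediction result is corrected according to the classification prediction result to obtain the corrected regression prediction result, the corrected regression prediction result is output as the target prediction result.
S153. If the target prediction type is the classification type and the quality score of the target regression model is greater than the quality score of the target classification model, correct the classification prediction result according to the regression prediction result to obtain a corrected classification prediction result.
Specifically, in some embodiments, referring to FIG. 6, step S153 includes:
S1531. Change the target classification model according to the quality scores of the classification models in the model set to obtain a changed target classification model.
In this embodiment, when the directions of the regression prediction result and the classification prediction result are inconsistent, the target prediction type is classification, and the quality score of the target regression model is greater than that of the target classification model, the predicted direction of the classification result needs to be corrected according to the regression result.
Specifically, the target classification model may be reselected in order of classification model quality score: all of the target classification models are discarded (or, when there are several, some of them), and the classification model whose quality score ranks just below that of the original target classification model is selected as the new target classification model.
S1532. Input the target industry factor data into the changed target classification model to obtain a changed classification prediction result.
In some embodiments, if only the target regression model and the target classification model were run earlier, the target industry factor data needs to be input into the changed target classification model to obtain the changed classification prediction result.
In other embodiments, if all models in the model set were already run on the target industry factor data, the corresponding model's classification prediction result is simply retrieved, without inputting the target industry factor data into the changed target classification model again.
S1533. Determine whether the directions of the changed classification prediction result and the regression prediction result are consistent. If so, perform step S1534; if not, return to step S1531.
S1534. Determine the changed classification prediction result as the corrected classification prediction result.
If the directions of the changed classification prediction result and the regression prediction result are consistent, the changed classification prediction result is taken as the corrected classification prediction result. This embodiment thus corrects the classification prediction result using the regression prediction result, improving the accuracy of the prediction result.
In this embodiment, if the directions of the changed classification prediction result and the regression prediction result are still inconsistent, the method returns to step S1531 and repeats until the directions are consistent, and then outputs the corrected classification prediction result.
Note that if, after all classification models have been tried, no classification prediction result has a direction consistent with the regression prediction result, the target prediction result may be determined with reference to the historical growth trend for the same period in previous years.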
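The classification branch (S1531 to S1534) mirrors the regression branch. In this sketch the regression forecast is first turned into an "up"/"down" direction relative to the previous actual Y value, and the classifiers below the original target model are then tried in descending quality-score order. The fallback simply returns a caller-supplied historical trend label. The tuple layout, model names and fallback form are assumptions.

```python
from typing import Callable, List, Sequence, Tuple

# (name, quality score, predict function returning "up" or "down"); illustrative only
ClassificationModel = Tuple[str, float, Callable[[Sequence[float]], str]]


def correct_classification(
    models: List[ClassificationModel],
    features: Sequence[float],
    regression_y: float,
    last_actual_y: float,
    historical_trend: str,
) -> Tuple[str, str]:
    """S1531-S1534: walk down the classification models by quality score until one agrees."""
    if regression_y > last_actual_y:
        wanted = "up"
    elif regression_y < last_actual_y:
        wanted = "down"
    else:
        wanted = "flat"  # no classifier label matches, so this ends in the fallback
    ranked = sorted(models, key=lambda m: m[1], reverse=True)
    # ranked[0] is the original target classification model, which already disagreed.
    for name, _score, predict in ranked[1:]:
        label = predict(features)  # S1532
        if label == wanted:  # S1533
            return name, label  # S1534
    # Every classifier disagreed: fall back to the same-period historical trend.
    return "historical_trend", historical_trend
```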
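S154. Determine the corrected classification prediction result as the target prediction result.
In this embodiment, after the classification prediction result is corrected according to the regression prediction result to obtain the corrected classification prediction result, the corrected classification prediction result is output as the target prediction result.
Note that in this embodiment, if the directions of the regression prediction result and the classification prediction result are inconsistent, the target prediction type is the regression type, and the quality score of the target regression model is higher than that of the target classification model, the regression prediction result may be output directly as the target prediction result.
Similarly, if the directions of the regression prediction result and the classification prediction result are inconsistent, the target prediction type is the classification type, and the quality score of the target classification model is higher than that of the target regression model, the classification prediction result may be output directly as the target prediction result.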
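Taken together with the two direct-output cases above, the correction rule of S150 is a dispatch on the target prediction type and on which target model scores higher. The sketch below passes the two correction procedures in as callables, and `combined_score` illustrates the total-or-average option from S151 for several target models. The text does not address equal quality scores; in this sketch a tie outputs the model's own result. All names are hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence, Union

Result = Union[float, str]  # regression value or classification label


def combined_score(scores: Sequence[float], how: str = "mean") -> float:
    """Quality score of several target models: their total ("total") or average ("mean")."""
    if how == "total":
        return sum(scores)
    return mean(scores)


def apply_correction_rule(
    target_type: str,
    regression_score: float,
    classification_score: float,
    regression_result: float,
    classification_result: str,
    correct_regression: Callable[[], float],
    correct_classification: Callable[[], str],
) -> Result:
    """S150: called only when the two results point in different directions."""
    if target_type == "regression":
        if classification_score > regression_score:
            return correct_regression()  # S151-S152
        return regression_result  # regression model at least as strong: output directly
    if target_type == "classification":
        if regression_score > classification_score:
            return correct_classification()  # S153-S154
        return classification_result  # classification model at least as strong
    raise ValueError(f"unknown target prediction type: {target_type!r}")
```

Passing the corrections as callables means the potentially expensive model walk only runs when the rule actually calls for it.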
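FIG. 7 is a schematic flowchart of a data prediction method based on a model set according to another embodiment of the present application. As shown in FIG. 7, the method of this embodiment includes steps S210 to S270. Steps S230 to S270 are similar to steps S110 to S150 of the above embodiment and are not repeated here. The added steps S210 and S220 are described in detail below.
S210. Construct the model set.
In this embodiment, the model set must be constructed before the step of inputting the target industry factor data into the target regression model and the target classification model of the preset model set. The model set includes a plurality of classification models and a plurality of regression models: the classification models include support vector machines, random forests, and similar models, and the regression models include linear regression, ridge regression, and similar models. After acquiring the classification models and regression models, the computer device in this embodiment constructs the model set from them.
S220. Determine the quality score of each model in the model set according to historical industry factor data and historical results.
Specifically, this step includes inputting the historical industry factor data into each model in the model set to obtain a prediction result for each model, and then determining the quality score of each model in the model set from the prediction results and the historical results.
The historical industry factor data in this embodiment corresponds to the target industry. For example, if this embodiment is to predict the industrial added value of the automobile industry, the historical industry factor data is the automobile industry's historical factor data. Specifically, this embodiment uses backtesting to evaluate the quality score of each model in the model set, as follows:
A validation period is set, and the prediction performance of the classification models and regression models is backtested under identical conditions; per-period classification accuracy, the degree of error deviation of the regression models, and similar measures are quantified as evaluation factors for each model. Taking the prediction of the automobile industry's industrial added value for June 2021 as an example, with the current date being May 31, 2021, the six months before June 2021 can be set as the validation period (that is, 2020-12, 2021-01, 2021-02, 2021-03, 2021-04, and 2021-05). Each model's performance over the validation period is then evaluated quantitatively, for example: the accuracy of the classification models over the validation period (obtained by comparing the Y values output by the model with the actual Y values), the error deviation rate of the regression models (likewise obtained by comparing the output Y values with the actual Y values), model execution performance (such as how quickly results are produced), and the stability of model results (such as the spread across repeated runs).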
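The backtest bookkeeping of S220 can be sketched as follows: compute the six validation months before the target month, the per-period classification accuracy, and the regression error deviation rate. The deviation rate is assumed here to be the mean absolute percentage error; the text does not say how these measures, execution speed and stability are weighted into one quality score, so that step is left out. Function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def validation_months(target_year: int, target_month: int, k: int = 6) -> List[str]:
    """The k months immediately before the target month, oldest first, as 'YYYY-MM'."""
    months = []
    year, month = target_year, target_month
    for _ in range(k):
        year, month = (year - 1, 12) if month == 1 else (year, month - 1)
        months.append(f"{year:04d}-{month:02d}")
    return list(reversed(months))


def classification_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Share of validation periods in which the predicted label matches the actual one."""
    return mean(1.0 if p == a else 0.0 for p, a in zip(predicted, actual))


def regression_deviation_rate(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute deviation relative to the actual Y value (assumed MAPE-style)."""
    return mean(abs(p - a) / abs(a) for p, a in zip(predicted, actual))
```

For the June 2021 example in the text, `validation_months(2021, 6)` yields the months 2020-12 through 2021-05.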
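Note that before the historical industry factor data in this embodiment is input into the models of the model set for testing, it must undergo data preprocessing and feature engineering; for the specific processing, refer to steps S112 and S113, which are not repeated here.
Note also that the models in this embodiment can make lagged predictions. For example, if the data for April and May 2021 are sparse or updated late, the data for March 2021 can be used to predict the Y value for June 2021. This addresses the problem that economic data commonly suffers from low frequency, small volume, update lag, and irregular gaps, which casts doubt on the reliability of predictions and weakens their interpretability.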
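Lagged prediction amounts to pairing factor data from month t with the Y value of month t plus a fixed lag, so that with a lag of 3, March data is used to forecast June. A minimal sketch with hypothetical helper names, months keyed as (year, month):

```python
from typing import Dict, List, Tuple

Month = Tuple[int, int]  # (year, month)


def shift(month: Month, k: int) -> Month:
    """Move a (year, month) key k months forward (negative k moves backward)."""
    index = month[0] * 12 + (month[1] - 1) + k
    return (index // 12, index % 12 + 1)


def lagged_training_pairs(
    x: Dict[Month, List[float]], y: Dict[Month, float], lag: int
) -> List[Tuple[List[float], float]]:
    """Pair the factors of each month t with the Y value of month t + lag, when known."""
    pairs = []
    for month, features in sorted(x.items()):
        target = shift(month, lag)
        if target in y:
            pairs.append((features, y[target]))
    return pairs
```

A model trained on these pairs is then applied to the most recent reliable month's factors, for example March 2021, to forecast `shift((2021, 3), 3)`, that is, June 2021.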
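In summary, in the embodiments of the present application, target industry factor data is first acquired; the target industry factor data is then input into the target regression model and the target classification model of the preset model set for result prediction, yielding a regression prediction result and a classification prediction result respectively, where the model set includes a plurality of classification models and a plurality of regression models, and the target classification model and the target regression model are, respectively, the classification model and the regression model with the highest quality scores in the model set; next, it is determined whether the directions of the regression prediction result and the classification prediction result are consistent; if they are, the target prediction result is determined from the regression prediction result and the classification prediction result according to the target prediction type; if not, the target prediction result is determined according to the preset correction rule. By combining classification models and regression models to make predictions, the embodiments analyze the same target problem from multiple angles, which can improve the accuracy of the prediction result.
In addition, the beneficial effects of the present application include the following. The data prediction method based on a model set covers as many classification or regression models suitable for structured economic factors as possible, avoiding the anomalous bias of any single model. Whether the target is a classification or a regression result, both types of model make predictions, so the same target problem is analyzed from multiple angles and the accuracy of the prediction result improves. By evaluating model performance and quantifying evaluation factors, the models themselves are used to select the model best suited to the current situation, taking into account that different models suit different scenarios; combined with the correction rule, this makes the model predictions more reasonable.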
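FIG. 8 is a schematic block diagram of a data prediction apparatus based on a model set according to an embodiment of the present application. As shown in FIG. 8, corresponding to the above data prediction method based on a model set, the present application further provides a data prediction apparatus based on a model set. The apparatus includes units for performing the above method and may be deployed in terminals such as desktop computers, tablet computers, and laptop computers. Specifically, referring to FIG. 8, the apparatus includes an acquisition unit 801, an input unit 802, a first determining unit 803, a second determining unit 804, and a third determining unit 805.
the acquisition unit 801, configured to acquire target industry factor data;
the input unit 802, configured to input the target industry factor data into the target regression model and the target classification model in the preset model set for result prediction, to obtain a regression prediction result and a classification prediction result respectively, wherein the model set includes a plurality of classification models and a plurality of regression models, and the target classification model and the target regression model are, respectively, the classification model with the highest quality score and the regression model with the highest quality score in the model set;
the first determining unit 803, configured to determine whether the directions of the regression prediction result and the classification prediction result are consistent;
the second determining unit 804, configured to, when the directions of the regression prediction result and the classification prediction result are consistent, determine a target prediction result from the regression prediction result and the classification prediction result according to a target prediction type;
the third determining unit 805, configured to, when the directions of the regression prediction result and the classification prediction result are inconsistent, determine the target prediction result according to a preset correction rule.
In some embodiments, the third determining unit 805 is specifically configured to:
if the target prediction type is a regression type and the quality score of the target classification model is greater than the quality score of the target regression model, correct the regression prediction result according to the classification prediction result to obtain a corrected regression prediction result;
determine the corrected regression prediction result as the target prediction result;
if the target prediction type is a classification type and the quality score of the target regression model is greater than the quality score of the target classification model, correct the classification prediction result according to the regression prediction result to obtain a corrected classification prediction result;
determine the corrected classification prediction result as the target prediction result.
In some embodiments, more specifically, the third determining unit 805 is configured to:
change the target regression model according to the quality scores of the regression models in the model set to obtain a changed target regression model;
input the target industry factor data into the changed target regression model to obtain a changed regression prediction result;
if the directions of the changed regression prediction result and the classification prediction result are consistent, determine the changed regression prediction result as the corrected regression prediction result;
if the directions of the changed regression prediction result and the classification prediction result are inconsistent, return to the step of changing the target regression model according to the quality scores of the regression models in the model set.
In some embodiments, more specifically, the third determining unit 805 is configured to:
change the target classification model according to the quality scores of the classification models in the model set to obtain a changed target classification model;
input the target industry factor data into the changed target classification model to obtain a changed classification prediction result;
if the directions of the changed classification prediction result and the regression prediction result are consistent, determine the changed classification prediction result as the corrected classification prediction result;
if the directions of the changed classification prediction result and the regression prediction result are inconsistent, return to the step of changing the target classification model according to the quality scores of the classification models in the model set.
In some embodiments, the acquisition unit 801 is specifically configured to:
acquire first industry factor data, the first industry factor data being unprocessed industry factor data;
perform data preprocessing on the first industry factor data to obtain second industry factor data;
perform feature engineering on the second industry factor data to obtain the target industry factor data.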
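FIG. 9 is a schematic block diagram of a data prediction apparatus based on a model set according to another embodiment of the present application. As shown in FIG. 9, the apparatus of this embodiment adds a construction unit 806 and a fourth determining unit 807 to the above embodiment.
the construction unit 806, configured to construct the model set;
the fourth determining unit 807, configured to determine the quality score of each model in the model set according to historical industry factor data and historical results.
In some embodiments, the fourth determining unit 807 is specifically configured to:
input the historical industry factor data into each model in the model set to obtain a prediction result corresponding to each model;
determine the quality score of each model in the model set according to the prediction results and the historical results.
It should be noted that a person skilled in the art can clearly understand that, for the specific implementation of the above data prediction apparatus based on a model set and of each of its units, reference may be made to the corresponding descriptions in the foregoing method embodiments; for convenience and brevity of description, the details are not repeated here.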
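The above data prediction apparatus based on a model set may be implemented in the form of a computer program that can run on the computer device shown in FIG. 10.
Refer to FIG. 10, which is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 1000 may be a terminal or a server. The terminal may be an electronic device with a communication function, such as a smartphone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, or a wearable device. The server may be a standalone server or a server cluster composed of multiple servers.
Referring to FIG. 10, the computer device 1000 includes a processor 1002, a memory, and a network interface 1005 connected through a system bus 1001, wherein the memory may include a non-volatile storage medium 1003 and an internal memory 1004.
The non-volatile storage medium 1003 may store an operating system 10031 and a computer program 10032. The computer program 10032 includes program instructions which, when executed, cause the processor 1002 to perform a data prediction method based on a model set.
The processor 1002 provides computing and control capabilities to support the operation of the entire computer device 1000.
The internal memory 1004 provides an environment for running the computer program 10032 stored in the non-volatile storage medium 1003; when executed by the processor 1002, the computer program 10032 causes the processor 1002 to perform a data prediction method based on a model set.
The network interface 1005 is used for network communication with other devices. A person skilled in the art can understand that the structure shown in FIG. 10 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device 1000 to which the solution is applied; a specific computer device 1000 may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
The processor 1002 is configured to run the computer program 10032 stored in the memory to implement the following steps:
acquiring target industry factor data;
inputting the target industry factor data into a target regression model and a target classification model in a preset model set for result prediction, to obtain a regression prediction result and a classification prediction result respectively, wherein the model set includes a plurality of classification models and a plurality of regression models, and the target classification model and the target regression model are, respectively, the classification model with the highest quality score and the regression model with the highest quality score in the model set;
determining whether the directions of the regression prediction result and the classification prediction result are consistent;
if the directions of the regression prediction result and the classification prediction result are consistent, determining a target prediction result from the regression prediction result and the classification prediction result according to a target prediction type;
if the directions of the regression prediction result and the classification prediction result are inconsistent, determining the target prediction result according to a preset correction rule.
In an embodiment, when implementing the step of determining the target prediction result according to the preset correction rule, the processor 1002 specifically implements the following steps:
if the target prediction type is a regression type and the quality score of the target classification model is greater than the quality score of the target regression model, correcting the regression prediction result according to the classification prediction result to obtain a corrected regression prediction result;
determining the corrected regression prediction result as the target prediction result;
if the target prediction type is a classification type and the quality score of the target regression model is greater than the quality score of the target classification model, correcting the classification prediction result according to the regression prediction result to obtain a corrected classification prediction result;
determining the corrected classification prediction result as the target prediction result.
In an embodiment, when implementing the step of correcting the regression prediction result according to the classification prediction result to obtain the corrected regression prediction result, the processor 1002 specifically implements the following steps:
changing the target regression model according to the quality scores of the regression models in the model set to obtain a changed target regression model;
inputting the target industry factor data into the changed target regression model to obtain a changed regression prediction result;
if the directions of the changed regression prediction result and the classification prediction result are consistent, determining the changed regression prediction result as the corrected regression prediction result;
if the directions of the changed regression prediction result and the classification prediction result are inconsistent, returning to the step of changing the target regression model according to the quality scores of the regression models in the model set.
In an embodiment, when implementing the step of correcting the classification prediction result according to the regression prediction result to obtain the corrected classification prediction result, the processor 1002 specifically implements the following steps:
changing the target classification model according to the quality scores of the classification models in the model set to obtain a changed target classification model;
inputting the target industry factor data into the changed target classification model to obtain a changed classification prediction result;
if the directions of the changed classification prediction result and the regression prediction result are consistent, determining the changed classification prediction result as the corrected classification prediction result;
if the directions of the changed classification prediction result and the regression prediction result are inconsistent, returning to the step of changing the target classification model according to the quality scores of the classification models in the model set.
In an embodiment, when implementing the step of acquiring target industry factor data, the processor 1002 specifically implements the following steps:
acquiring first industry factor data, the first industry factor data being unprocessed industry factor data;
performing data preprocessing on the first industry factor data to obtain second industry factor data;
performing feature engineering on the second industry factor data to obtain the target industry factor data.
In an embodiment, before implementing the step of inputting the target industry factor data into the target regression model and the target classification model in the preset model set for result prediction, the processor 1002 further implements the following steps:
constructing the model set;
determining the quality score of each model in the model set according to historical industry factor data and historical results.
In an embodiment, when implementing the step of determining the quality score of each model in the model set according to historical industry factor data and historical results, the processor 1002 specifically implements the following steps:
inputting the historical industry factor data into each model in the model set to obtain a prediction result corresponding to each model;
determining the quality score of each model in the model set according to the prediction results and the historical results.
It should be understood that in the embodiments of the present application, the processor 1002 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or any conventional processor.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments may be completed by a computer program instructing relevant hardware. The computer program includes program instructions and may be stored in a storage medium, which is a computer-readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the process steps of the above method embodiments.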
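Accordingly, the present application further provides a storage medium. The storage medium may be a computer-readable storage medium, which may be non-volatile or volatile. The storage medium stores a computer program including program instructions which, when executed by a processor, cause the processor to perform the following steps:
acquiring target industry factor data;
inputting the target industry factor data into a target regression model and a target classification model in a preset model set for result prediction, to obtain a regression prediction result and a classification prediction result respectively, wherein the model set includes a plurality of classification models and a plurality of regression models, and the target classification model and the target regression model are, respectively, the classification model with the highest quality score and the regression model with the highest quality score in the model set;
determining whether the directions of the regression prediction result and the classification prediction result are consistent;
if the directions of the regression prediction result and the classification prediction result are consistent, determining a target prediction result from the regression prediction result and the classification prediction result according to a target prediction type;
if the directions of the regression prediction result and the classification prediction result are inconsistent, determining the target prediction result according to a preset correction rule.
In an embodiment, when executing the program instructions to implement the step of determining the target prediction result according to the preset correction rule, the processor specifically implements the following steps:
if the target prediction type is a regression type and the quality score of the target classification model is greater than the quality score of the target regression model, correcting the regression prediction result according to the classification prediction result to obtain a corrected regression prediction result;
determining the corrected regression prediction result as the target prediction result;
if the target prediction type is a classification type and the quality score of the target regression model is greater than the quality score of the target classification model, correcting the classification prediction result according to the regression prediction result to obtain a corrected classification prediction result;
determining the corrected classification prediction result as the target prediction result.
In an embodiment, when executing the program instructions to implement the step of correcting the regression prediction result according to the classification prediction result to obtain the corrected regression prediction result, the processor specifically implements the following steps:
changing the target regression model according to the quality scores of the regression models in the model set to obtain a changed target regression model;
inputting the target industry factor data into the changed target regression model to obtain a changed regression prediction result;
if the directions of the changed regression prediction result and the classification prediction result are consistent, determining the changed regression prediction result as the corrected regression prediction result;
if the directions of the changed regression prediction result and the classification prediction result are inconsistent, returning to the step of changing the target regression model according to the quality scores of the regression models in the model set.
In an embodiment, when executing the program instructions to implement the step of correcting the classification prediction result according to the regression prediction result to obtain the corrected classification prediction result, the processor specifically implements the following steps:
changing the target classification model according to the quality scores of the classification models in the model set to obtain a changed target classification model;
inputting the target industry factor data into the changed target classification model to obtain a changed classification prediction result;
if the directions of the changed classification prediction result and the regression prediction result are consistent, determining the changed classification prediction result as the corrected classification prediction result;
if the directions of the changed classification prediction result and the regression prediction result are inconsistent, returning to the step of changing the target classification model according to the quality scores of the classification models in the model set.
In an embodiment, when executing the program instructions to implement the step of acquiring target industry factor data, the processor specifically implements the following steps:
acquiring first industry factor data, the first industry factor data being unprocessed industry factor data;
performing data preprocessing on the first industry factor data to obtain second industry factor data;
performing feature engineering on the second industry factor data to obtain the target industry factor data.
In an embodiment, before executing the program instructions to implement the step of inputting the target industry factor data into the target regression model and the target classification model in the preset model set for result prediction, the processor further implements the following steps:
constructing the model set;
determining the quality score of each model in the model set according to historical industry factor data and historical results.
In an embodiment, when executing the program instructions to implement the step of determining the quality score of each model in the model set according to historical industry factor data and historical results, the processor specifically implements the following steps:
inputting the historical industry factor data into each model in the model set to obtain a prediction result corresponding to each model;
determining the quality score of each model in the model set according to the prediction results and the historical results.
The storage medium may be any computer-readable storage medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. For example, the division into units is only a logical functional division, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.
The steps of the methods in the embodiments of the present application may be reordered, combined, or deleted according to actual needs. The units of the apparatus in the embodiments of the present application may be combined, divided, or deleted according to actual needs. In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically separately, or two or more units may be integrated into one unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a storage medium. Based on this understanding, the technical solution of the present application in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a terminal, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
The above are merely specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any equivalent modification or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.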

Claims (20)

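  1. A data prediction method based on a model set, comprising:
    acquiring target industry factor data;
    inputting the target industry factor data into a target regression model and a target classification model in a preset model set for result prediction, to obtain a regression prediction result and a classification prediction result respectively, wherein the model set comprises a plurality of classification models and a plurality of regression models, and the target classification model and the target regression model are, respectively, the classification model with the highest quality score and the regression model with the highest quality score in the model set;
    determining whether the directions of the regression prediction result and the classification prediction result are consistent;
    if the directions of the regression prediction result and the classification prediction result are consistent, determining a target prediction result from the regression prediction result and the classification prediction result according to a target prediction type;
    if the directions of the regression prediction result and the classification prediction result are inconsistent, determining the target prediction result according to the target prediction type and a preset correction rule.
  2. The method according to claim 1, wherein determining the target prediction result according to the target prediction type and the preset correction rule comprises:
    if the target prediction type is a regression type and the quality score of the target classification model is greater than the quality score of the target regression model, correcting the regression prediction result according to the classification prediction result to obtain a corrected regression prediction result;
    determining the corrected regression prediction result as the target prediction result;
    if the target prediction type is a classification type and the quality score of the target regression model is greater than the quality score of the target classification model, correcting the classification prediction result according to the regression prediction result to obtain a corrected classification prediction result;
    determining the corrected classification prediction result as the target prediction result.
  3. The method according to claim 2, wherein correcting the regression prediction result according to the classification prediction result to obtain the corrected regression prediction result comprises:
    changing the target regression model according to the quality scores of the regression models in the model set to obtain a changed target regression model;
    inputting the target industry factor data into the changed target regression model to obtain a changed regression prediction result;
    if the directions of the changed regression prediction result and the classification prediction result are consistent, determining the changed regression prediction result as the corrected regression prediction result;
    if the directions of the changed regression prediction result and the classification prediction result are inconsistent, returning to the step of changing the target regression model according to the quality scores of the regression models in the model set.
  4. The method according to claim 2, wherein correcting the classification prediction result according to the regression prediction result to obtain the corrected classification prediction result comprises:
    changing the target classification model according to the quality scores of the classification models in the model set to obtain a changed target classification model;
    inputting the target industry factor data into the changed target classification model to obtain a changed classification prediction result;
    if the directions of the changed classification prediction result and the regression prediction result are consistent, determining the changed classification prediction result as the corrected classification prediction result;
    if the directions of the changed classification prediction result and the regression prediction result are inconsistent, returning to the step of changing the target classification model according to the quality scores of the classification models in the model set.
  5. The method according to claim 1, wherein acquiring the target industry factor data comprises:
    acquiring first industry factor data, the first industry factor data being unprocessed industry factor data;
    performing data preprocessing on the first industry factor data to obtain second industry factor data;
    performing feature engineering on the second industry factor data to obtain the target industry factor data.
  6. The method according to claim 1, wherein before inputting the target industry factor data into the target regression model and the target classification model in the preset model set for result prediction, the method further comprises:
    constructing the model set;
    determining the quality score of each model in the model set according to historical industry factor data and historical results.
  7. The method according to claim 6, wherein determining the quality score of each model in the model set according to historical industry factor data and historical results comprises:
    inputting the historical industry factor data into each model in the model set to obtain a prediction result corresponding to each model;
    determining the quality score of each model in the model set according to the prediction results and the historical results.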
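  8. A data prediction apparatus based on a model set, comprising:
    an acquisition unit, configured to acquire target industry factor data;
    an input unit, configured to input the target industry factor data into a target regression model and a target classification model in a preset model set for result prediction, to obtain a regression prediction result and a classification prediction result respectively, wherein the model set comprises a plurality of classification models and a plurality of regression models, and the target classification model and the target regression model are, respectively, the classification model with the highest quality score and the regression model with the highest quality score in the model set;
    a first determining unit, configured to determine whether the directions of the regression prediction result and the classification prediction result are consistent;
    a second determining unit, configured to, when the directions of the regression prediction result and the classification prediction result are consistent, determine a target prediction result from the regression prediction result and the classification prediction result according to a target prediction type;
    a third determining unit, configured to, when the directions of the regression prediction result and the classification prediction result are inconsistent, determine the target prediction result according to a preset correction rule.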
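  9. A computer device, wherein the computer device comprises a memory and a processor, the memory stores a computer program, and the processor implements the following steps when executing the computer program:
    acquiring target industry factor data;
    inputting the target industry factor data respectively into a target regression model and a target classification model in a preset model set for result prediction, to obtain a regression prediction result and a classification prediction result respectively, wherein the model set comprises a plurality of classification models and a plurality of regression models, and the target classification model and the target regression model are, respectively, the classification model with the highest quality score and the regression model with the highest quality score in the model set;
    determining whether the regression prediction result and the classification prediction result are consistent in direction;
    if the regression prediction result and the classification prediction result are consistent in direction, determining a target prediction result from the regression prediction result and the classification prediction result according to a target prediction type;
    if the regression prediction result and the classification prediction result are inconsistent in direction, determining the target prediction result according to the target prediction type and a preset correction rule.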
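  10. The computer device according to claim 9, wherein determining the target prediction result according to the target prediction type and the preset correction rule comprises:
    if the target prediction type is a regression type and the quality score of the target classification model is greater than the quality score of the target regression model, correcting the regression prediction result according to the classification prediction result to obtain a corrected regression prediction result;
    determining the corrected regression prediction result as the target prediction result;
    if the target prediction type is a classification type and the quality score of the target regression model is greater than the quality score of the target classification model, correcting the classification prediction result according to the regression prediction result to obtain a corrected classification prediction result;
    determining the corrected classification prediction result as the target prediction result.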
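  11. The computer device according to claim 10, wherein correcting the regression prediction result according to the classification prediction result to obtain the corrected regression prediction result comprises:
    changing the target regression model according to the quality scores of the regression models in the model set to obtain a changed target regression model;
    inputting the target industry factor data into the changed target regression model to obtain a changed regression prediction result;
    if the changed regression prediction result and the classification prediction result are consistent in direction, determining the changed regression prediction result as the corrected regression prediction result;
    if the changed regression prediction result and the classification prediction result are inconsistent in direction, returning to the step of changing the target regression model according to the quality scores of the regression models in the model set.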
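  12. The computer device according to claim 10, wherein correcting the classification prediction result according to the regression prediction result to obtain the corrected classification prediction result comprises:
    changing the target classification model according to the quality scores of the classification models in the model set to obtain a changed target classification model;
    inputting the target industry factor data into the changed target classification model to obtain a changed classification prediction result;
    if the changed classification prediction result and the regression prediction result are consistent in direction, determining the changed classification prediction result as the corrected classification prediction result;
    if the changed classification prediction result and the regression prediction result are inconsistent in direction, returning to the step of changing the target classification model according to the quality scores of the classification models in the model set.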
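  13. The computer device according to claim 9, wherein acquiring the target industry factor data comprises:
    acquiring first industry factor data, the first industry factor data being unprocessed industry factor data;
    performing data preprocessing on the first industry factor data to obtain second industry factor data;
    performing feature engineering on the second industry factor data to obtain the target industry factor data.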
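  14. The computer device according to claim 9, wherein before the target industry factor data is input respectively into the target regression model and the target classification model in the preset model set for result prediction, the steps further comprise:
    constructing the model set;
    determining a quality score of each model in the model set according to historical industry factor data and historical results.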
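  15. The computer device according to claim 14, wherein determining the quality score of each model in the model set according to the historical industry factor data and the historical results comprises:
    inputting the historical industry factor data into each model in the model set to obtain a prediction result corresponding to each model;
    determining the quality score of each model in the model set according to the prediction results and the historical results.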
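  16. A computer-readable storage medium, wherein the storage medium stores a computer program comprising program instructions that, when executed by a processor, implement the following steps:
    acquiring target industry factor data;
    inputting the target industry factor data respectively into a target regression model and a target classification model in a preset model set for result prediction, to obtain a regression prediction result and a classification prediction result respectively, wherein the model set comprises a plurality of classification models and a plurality of regression models, and the target classification model and the target regression model are, respectively, the classification model with the highest quality score and the regression model with the highest quality score in the model set;
    determining whether the regression prediction result and the classification prediction result are consistent in direction;
    if the regression prediction result and the classification prediction result are consistent in direction, determining a target prediction result from the regression prediction result and the classification prediction result according to a target prediction type;
    if the regression prediction result and the classification prediction result are inconsistent in direction, determining the target prediction result according to the target prediction type and a preset correction rule.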
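  17. The storage medium according to claim 16, wherein determining the target prediction result according to the target prediction type and the preset correction rule comprises:
    if the target prediction type is a regression type and the quality score of the target classification model is greater than the quality score of the target regression model, correcting the regression prediction result according to the classification prediction result to obtain a corrected regression prediction result;
    determining the corrected regression prediction result as the target prediction result;
    if the target prediction type is a classification type and the quality score of the target regression model is greater than the quality score of the target classification model, correcting the classification prediction result according to the regression prediction result to obtain a corrected classification prediction result;
    determining the corrected classification prediction result as the target prediction result.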
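  18. The storage medium according to claim 17, wherein correcting the regression prediction result according to the classification prediction result to obtain the corrected regression prediction result comprises:
    changing the target regression model according to the quality scores of the regression models in the model set to obtain a changed target regression model;
    inputting the target industry factor data into the changed target regression model to obtain a changed regression prediction result;
    if the changed regression prediction result and the classification prediction result are consistent in direction, determining the changed regression prediction result as the corrected regression prediction result;
    if the changed regression prediction result and the classification prediction result are inconsistent in direction, returning to the step of changing the target regression model according to the quality scores of the regression models in the model set.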
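  19. The storage medium according to claim 17, wherein correcting the classification prediction result according to the regression prediction result to obtain the corrected classification prediction result comprises:
    changing the target classification model according to the quality scores of the classification models in the model set to obtain a changed target classification model;
    inputting the target industry factor data into the changed target classification model to obtain a changed classification prediction result;
    if the changed classification prediction result and the regression prediction result are consistent in direction, determining the changed classification prediction result as the corrected classification prediction result;
    if the changed classification prediction result and the regression prediction result are inconsistent in direction, returning to the step of changing the target classification model according to the quality scores of the classification models in the model set.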
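  20. The storage medium according to claim 16, wherein acquiring the target industry factor data comprises:
    acquiring first industry factor data, the first industry factor data being unprocessed industry factor data;
    performing data preprocessing on the first industry factor data to obtain second industry factor data;
    performing feature engineering on the second industry factor data to obtain the target industry factor data.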
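The decision logic of claims 1 to 4, together with the quality scoring of claims 6 and 7, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the applicant's implementation. The names `ScoredModel`, `direction`, `quality_score`, `ranked`, `correct` and `predict` are hypothetical. Three further choices are also assumptions because the claims leave them open: "direction" is taken as the sign of a prediction, the quality score is taken as directional hit rate on historical data, and a fallback applies when no alternative model agrees.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Features = Sequence[float]


@dataclass
class ScoredModel:
    """A model in the model set plus its quality score (hypothetical wrapper)."""
    name: str
    predict: Callable[[Features], float]  # regression: a value; classification: +1.0 / -1.0
    score: float = 0.0


def direction(value: float) -> int:
    """Assumed notion of 'direction': +1 for up, -1 for down, 0 for flat."""
    return (value > 0) - (value < 0)


def quality_score(model: ScoredModel, history_x: Sequence[Features],
                  history_y: Sequence[float]) -> float:
    """Assumed metric: share of historical samples whose predicted direction matches the actual one."""
    hits = sum(direction(model.predict(x)) == direction(y)
               for x, y in zip(history_x, history_y))
    return hits / len(history_y)


def ranked(models: List[ScoredModel]) -> List[ScoredModel]:
    """Order models by quality score, best first; the first one is the target model."""
    return sorted(models, key=lambda m: m.score, reverse=True)


def correct(alternatives: List[ScoredModel], x: Features,
            reference_dir: int, fallback: float) -> float:
    """Switch to the next-best model until its direction agrees with reference_dir."""
    for model in alternatives:
        result = model.predict(x)
        if direction(result) == reference_dir:
            return result
    return fallback  # assumption: the claims do not say what happens if no model agrees


def predict(x: Features, reg_models: List[ScoredModel],
            clf_models: List[ScoredModel], target_type: str) -> float:
    regs, clfs = ranked(reg_models), ranked(clf_models)
    reg_out, clf_out = regs[0].predict(x), clfs[0].predict(x)
    if direction(reg_out) == direction(clf_out):
        # Directions agree: return the result of the requested type.
        return reg_out if target_type == "regression" else clf_out
    if target_type == "regression":
        if clfs[0].score > regs[0].score:
            return correct(regs[1:], x, direction(clf_out), reg_out)
        return reg_out  # assumption: case not covered by the claims
    if regs[0].score > clfs[0].score:
        return correct(clfs[1:], x, direction(reg_out), clf_out)
    return clf_out  # assumption: case not covered by the claims


# Toy usage with made-up models and history data.
history_x = [[1.0], [-2.0], [3.0], [-1.0]]
history_y = [0.5, -1.0, 2.0, -0.3]
regs = [ScoredModel("reg_a", lambda x: x[0] - 2.5),
        ScoredModel("reg_b", lambda x: 0.1)]
clfs = [ScoredModel("clf_a", lambda x: 1.0 if x[0] > 0 else -1.0)]
for m in regs + clfs:
    m.score = quality_score(m, history_x, history_y)

print(predict([1.0], regs, clfs, "regression"))  # prints 0.1
```

In the toy run, the best regression model (`reg_a`, score 0.75) predicts down while the classifier (score 1.0) predicts up. Because the requested type is regression and the classifier scores higher, the sketch falls through to `reg_b`, whose upward prediction agrees. The sketch reads "changing the target model according to the quality scores" as iterating over the remaining models in descending score order. The fallback prevents the unbounded loop that a literal reading of claims 3 and 4 would allow when no model in the set agrees.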

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110600017.0 2021-05-31
CN202110600017.0A CN113205230A (zh) 2021-05-31 2021-05-31 Data prediction method, apparatus, device and storage medium based on model set

Publications (1)

Publication Number Publication Date
WO2022252630A1 (zh) 2022-12-08

Family

ID=77023762

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/071836 WO2022252630A1 (zh) 2021-05-31 2022-01-13 Data prediction method, apparatus, device and storage medium based on model set

Country Status (2)

Country Link
CN (1) CN113205230A (zh)
WO (1) WO2022252630A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
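CN113205230A (zh) * 2021-05-31 2021-08-03 平安科技(深圳)有限公司 Data prediction method, apparatus, device and storage medium based on model set
CN116705150A (zh) * 2023-06-05 2023-09-05 国家超级计算天津中心 Method, apparatus, device and medium for determining gene expression efficiency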

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170300605A1 (en) * 2016-04-19 2017-10-19 General Electric Company Creating predictive damage models by transductive transfer learning
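CN109800483A (zh) * 2018-12-29 2019-05-24 北京城市网邻信息技术有限公司 Prediction method and apparatus, electronic device and computer-readable storage medium
CN112541574A (zh) * 2020-12-03 2021-03-23 支付宝(杭州)信息技术有限公司 Privacy-preserving service prediction method and apparatus
CN113205230A (zh) * 2021-05-31 2021-08-03 平安科技(深圳)有限公司 Data prediction method, apparatus, device and storage medium based on model set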

Also Published As

Publication number Publication date
CN113205230A (zh) 2021-08-03

Similar Documents

Publication Publication Date Title
US10642642B2 (en) Techniques to manage virtual classes for statistical tests
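WO2022252630A1 (zh) Data prediction method, apparatus, device and storage medium based on model set
WO2019205325A1 (zh) Method for determining user risk level, terminal device and computer-readable storage medium
CN114118640B (zh) Long-term precipitation prediction model construction method, long-term precipitation prediction method and apparatus
CN112529301B (zh) Electricity consumption prediction method, device and storage medium
US20210103858A1 (en) Method and system for model auto-selection using an ensemble of machine learning models
US11650968B2 (en) Systems and methods for predictive early stopping in neural network training
CN107247666B (zh) Software defect count prediction method based on feature selection and ensemble learning
WO2021004324A1 (zh) Resource data processing method and apparatus, computer device and storage medium
US20150186907A1 (en) Data mining
CN112907062B (zh) Power grid electricity quantity prediction method, apparatus, medium and terminal incorporating temperature features
CN109426655B (zh) Data analysis method and apparatus, electronic device and computer-readable storage medium
CN114356895B (zh) Method, apparatus, device and storage medium based on abnormal operating condition database management
CN112365070A (zh) Power load prediction method, apparatus, device and readable storage medium
CN112766724A (zh) Service monitoring method, apparatus and device
Li et al. Improved LSTM-based prediction method for highly variable workload and resources in clouds
CN107644042B (zh) Software program click-through rate estimation and ranking method, and server
CN109829115B (zh) Search engine keyword optimization method
WO2022222230A1 (zh) Machine learning-based indicator prediction method, apparatus, device and storage medium
CN114387089A (zh) Customer credit risk assessment method, apparatus, device and storage medium
CN109344047B (zh) System regression testing method, computer-readable storage medium and terminal device
CN113191810A (zh) Game indicator prediction method, apparatus and electronic device
CN112734099A (zh) Enterprise risk prediction method, apparatus and server
US20190138931A1 (en) Apparatus and method of introducing probability and uncertainty via order statistics to unsupervised data classification via clustering
JP7322918B2 (ja) Program, information processing device, and method for generating learning model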

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 22814683
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 22814683
Country of ref document: EP
Kind code of ref document: A1