CN116342172A - Oil price prediction method, device and equipment based on combination of linear regression and decision tree - Google Patents

Info

Publication number
CN116342172A
Authority
CN
China
Prior art keywords
data
training
oil price
linear regression
decision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310640982.XA
Other languages
Chinese (zh)
Inventor
冀征
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yiyou Internet Technology Co ltd
Original Assignee
Beijing Yiyou Internet Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yiyou Internet Technology Co ltd
Priority to CN202310640982.XA
Publication of CN116342172A
Priority to CN202311739198.0A (CN117592012A)
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/27 Regression, e.g. linear or logistic regression
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides an oil price prediction method, device and equipment based on a combination of linear regression and a decision tree. In the method, a linear regression model is trained and a decision tree model is built, so that the two models can be combined to predict the oil price from the oil price data to be predicted. This effectively reduces the error of the oil price prediction result. During periods in which the oil price rises or falls continuously, the decision tree model is used to correct the error, which effectively improves the accuracy of the oil price prediction result.

Description

Oil price prediction method, device and equipment based on combination of linear regression and decision tree
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a method, an apparatus, and a device for predicting an oil price based on combination of linear regression and a decision tree.
Background
The method currently used to predict oil price adjustments relies on data in a single time dimension, and in essence the prediction concerns a single dimension: the price adjustment itself. The periodicity, trend, and similar properties of the time series data are used for prediction.
This conventional approach is very limited. First, many factors determine the price adjustment result, but only time series data in a single dimension is used, so data utilization is low and it is difficult to guarantee accuracy when training an algorithm model. Second, a time series algorithm extends the past trend, so it cannot predict turning points in the series. In such cases the forecaster often has to correct the prediction result using personal knowledge and experience, which is very inconvenient.
Disclosure of Invention
In view of the above, the present application aims to provide a method, a device and an apparatus for predicting oil price based on combination of linear regression and decision tree, so as to solve or partially solve the above technical problems.
Based on the above object, a first aspect of the present application provides an oil price prediction method based on combination of linear regression and decision tree, including:
acquiring oil price historical data in a preset time period;
determining a linear regression training sample according to the oil price historical data;
inputting the linear regression training sample into a pre-constructed initial linear regression model for training analysis;
according to the training result of the training analysis of the initial linear regression model, comparing and analyzing with the actual oil price result corresponding to the oil price historical data, correcting and adjusting the characteristic parameters of the initial linear regression model according to the comparison analysis result, and taking the initial linear regression model after final correction and adjustment as a linear regression model;
generating oil price historical backtest data in the training process by utilizing the oil price historical data, recording residual errors between the training result and the actual oil price result in the initial linear regression model training process, and obtaining residual error data corresponding to the historical backtest data;
constructing a decision tree model according to the residual data corresponding to the historical backtest data;
acquiring oil price data to be predicted, and inputting the oil price data to be predicted into the linear regression model for prediction processing to obtain a first predicted oil value;
determining a residual prediction value corresponding to the oil price data to be predicted by utilizing the decision tree model;
and carrying out difference operation on the first predicted oil value and the residual error predicted value to obtain a final predicted oil value.
Based on the same conception, a second aspect of the application provides an oil price prediction device based on combination of linear regression and decision tree, comprising:
a history data acquisition module configured to acquire oil price history data within a predetermined period of time;
a linear regression sample determination module configured to determine a linear regression training sample from the oil price history data;
the linear regression training module is configured to input the linear regression training sample into a pre-constructed initial linear regression model for training analysis;
the linear regression model determining module is configured to perform comparative analysis with the actual oil price result corresponding to the oil price historical data according to the training result of the training analysis of the initial linear regression model, further perform correction adjustment on the characteristic parameters of the initial linear regression model according to the comparative analysis result, and take the initial linear regression model after the final correction adjustment as a linear regression model;
the residual data determining module is configured to generate oil price historical backtest data in the training process by utilizing the oil price historical data, record residual errors between the training result and the actual oil price result in the initial linear regression model training process, and obtain residual error data corresponding to the historical backtest data;
the decision tree construction module is configured to construct a decision tree model according to the residual data corresponding to the historical backtest data;
the linear regression prediction module is configured to acquire oil price data to be predicted, input the oil price data to be predicted into the linear regression model for prediction processing, and obtain a first predicted oil value;
a residual value prediction module configured to determine a residual prediction value corresponding to the oil price data to be predicted using the decision tree model;
and an oil price prediction module configured to perform a difference operation on the first predicted oil value and the residual error predicted value to obtain a final predicted oil value.
Based on the same conception, a third aspect of the present application proposes an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, said processor implementing the method according to the first aspect when executing said program.
Based on the same conception, a fourth aspect of the present application proposes a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of the first aspect.
From the above, it can be seen that, according to the oil price prediction method, device and equipment based on the combination of linear regression and decision tree, the linear regression model and the decision tree model are combined to perform oil price prediction, so that the error of the oil price prediction result can be effectively reduced, and for the continuous rising period or falling period of the oil price, the decision tree model can be utilized to well play a role in repairing the error, so that the accuracy of the oil price prediction result can be effectively improved.
Drawings
In order to more clearly illustrate the technical solutions of the present application or related art, the drawings that are required to be used in the description of the embodiments or related art will be briefly described below, and it is apparent that the drawings in the following description are only embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort to those of ordinary skill in the art.
FIG. 1 is a schematic view of an application scenario in an embodiment of the present application;
FIG. 2 is a flow chart of a linear regression and decision tree combination based oil price prediction method according to an embodiment of the present application;
FIG. 3 is a block diagram of a linear regression and decision tree combination based oil price prediction device according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
It will be appreciated that the data (including but not limited to the data itself, the acquisition or use of the data) involved in the present technical solution should comply with the corresponding legal regulations and the requirements of the relevant regulations.
The principles and spirit of the present application will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are presented merely to enable one skilled in the art to better understand and practice the present application and are not intended to limit the scope of the present application in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It will be appreciated that before using the technical solutions of the various embodiments in the disclosure, the user may be informed of the type of personal information involved, the range of use, the use scenario, etc. in an appropriate manner, and obtain the authorization of the user.
For example, in response to receiving an active request from a user, prompt information is sent to the user to state explicitly that the requested operation will require obtaining and using the user's personal information. The user can then decide, according to the prompt information, whether to provide personal information to the software or hardware that executes the operations of the technical solution, such as the electronic device, application program, server, or storage medium.
As an alternative but non-limiting implementation, in response to receiving an active request from a user, the manner in which the prompt information is sent to the user may be, for example, a popup, in which the prompt information may be presented in a text manner. In addition, a selection control for the user to select to provide personal information to the electronic device in a 'consent' or 'disagreement' manner can be carried in the popup window.
It will be appreciated that the above-described notification and user authorization process is merely illustrative, and not limiting of the implementations of the present disclosure, and that other ways of satisfying relevant legal regulations may be applied to the implementations of the present disclosure.
In this document, it should be understood that any number of elements in the drawings is for illustration and not limitation, and that any naming is used only for distinction and not for any limitation.
Based on the above description of the background art, there are also the following cases in the related art:
The oil price adjustment is strongly influenced by indicators such as the crude oil price and the exchange rate, so data such as the Brent, WTI, and Oman crude oil prices and the exchange rate are collected and processed. Incorporating these data into the model that predicts the price adjustment result greatly improves the utilization of the available data and, more importantly, improves the prediction accuracy. A self-triggering correction learning model is trained on the errors so that the result can be corrected further automatically, which greatly improves the robustness of the algorithm and reduces manual involvement.
Based on the foregoing, the principles and spirit of the present application are explained in detail below with reference to several representative embodiments thereof.
Referring to FIG. 1, an application scenario diagram of the oil price prediction method based on combination of linear regression and decision tree according to an embodiment of the present application is provided. The application scenario includes a terminal device 101, a server 102, and a data storage system 103. The terminal device 101, the server 102 and the data storage system 103 may be connected through a wired or wireless communication network. Terminal device 101 includes, but is not limited to, a desktop computer, mobile phone, mobile computer, tablet, media player, smart wearable device, personal digital assistant (PDA) or other electronic device capable of performing the functions described above. The server 102 and the data storage system 103 may be independent physical servers, or may be a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, and basic cloud computing services such as big data and artificial intelligence platforms.
A user of the terminal device 101 can use it to train a linear regression model on the oil price history data and to train a decision tree model on the oil price historical backtest data generated during that training. The linear regression model then processes the oil price data to be predicted to obtain a first predicted oil price, the decision tree model determines the corresponding residual predicted value, and a difference operation on the first predicted oil price and the residual predicted value yields the final predicted oil price, completing the oil price prediction process. The server 102 provides data support for the data processing of the terminal device 101, and the data storage system 103 provides data storage support for the operation of the server 102.
An oil price prediction method based on a combination of linear regression and decision tree according to an exemplary embodiment of the present application is described below in conjunction with the application scenario of fig. 1. It should be noted that the above application scenario is only shown for the convenience of understanding the spirit and principles of the present application, and embodiments of the present application are not limited in any way in this respect. Rather, embodiments of the present application may be applied to any scenario where applicable.
The embodiment of the application provides an oil price prediction method based on combination of linear regression and a decision tree.
As shown in fig. 2, the method includes:
Step 201, acquiring oil price historical data in a preset time period.
Wherein the oil price history data includes: date, exchange rate data, Brent crude oil price data, WTI (West Texas Intermediate) crude oil price data, and Oman crude oil price data.
Step 202, determining a linear regression training sample according to the oil price historical data.
Step 203, inputting the linear regression training sample into a pre-constructed initial linear regression model for training analysis.
Linear regression analysis is a predictive modeling technique that studies the relationship between independent variables and a dependent variable. Typically, a line or curve is fitted to the data points, and the parameters are computed so as to minimize the distance between the curve and the data points.
Step 204, performing comparison analysis on the training result of the training analysis of the initial linear regression model and the actual oil price result corresponding to the oil price historical data, correcting and adjusting the characteristic parameters of the initial linear regression model according to the comparison analysis result, and taking the initial linear regression model after final correction and adjustment as a linear regression model.
Step 205, generating oil price historical backtest data in the training process by utilizing the oil price historical data, and recording residual errors between the training result and the actual oil price result in the initial linear regression model training process to obtain residual error data corresponding to the historical backtest data.
Step 206, constructing a decision tree model according to the residual data corresponding to the historical backtest data.
Where a Decision Tree (Decision Tree) is a Tree structure (which may be a binary Tree or a non-binary Tree), each non-leaf node corresponds to a feature, each branch of the node represents a value of the feature, and each leaf node stores a class or a regression function.
The decision making process using the decision tree is to extract the corresponding features of the items to be classified from the root node, select the output branches according to the values thereof, and sequentially go down until the leaf nodes are reached, and take the class stored in the leaf nodes or the operation result of the regression function as the output (decision) result.
Step 207, acquiring oil price data to be predicted, and inputting the oil price data to be predicted into the linear regression model for prediction processing to obtain a first predicted oil value.
Step 208, determining a residual prediction value corresponding to the oil price data to be predicted by using the decision tree model.
Step 209, performing a difference operation on the first predicted oil value and the residual predicted value to obtain a final predicted oil value.
Through the scheme, the linear regression model and the decision tree model are combined to conduct oil price prediction, so that errors of oil price prediction results can be effectively reduced, and for a continuous rising period or a falling period of oil price, the decision tree model can be utilized to well play a role in repairing errors, and further accuracy of the oil price prediction results is effectively improved.
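The flow of steps 203 to 209 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation of the application: it uses a single hypothetical input feature, a depth-1 regression tree in place of a full decision tree, made-up sample values, and it assumes the residual is defined as the training result minus the actual result, so that subtracting the predicted residual corrects the first predicted value. The names `fit_line`, `fit_stump`, and `predict` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def fit_stump(xs, residuals):
    """Depth-1 regression tree: one threshold, mean residual on each side."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


# Hypothetical values: mean crude price per period and adjusted retail price.
crude = [70.0, 72.0, 75.0, 80.0, 85.0, 90.0]
actual = [7000.0, 7150.0, 7400.0, 8100.0, 8600.0, 9050.0]

slope, intercept = fit_line(crude, actual)                 # steps 203-204
first = [slope * x + intercept for x in crude]
residuals = [f - a for f, a in zip(first, actual)]         # step 205 (assumed sign)
residual_tree = fit_stump(crude, residuals)                # step 206


def predict(x):
    first_value = slope * x + intercept                    # step 207
    return first_value - residual_tree(x)                  # steps 208-209


# On the training data the corrected prediction is never worse than the line alone.
linear_sse = sum(r ** 2 for r in residuals)
combined_sse = sum((predict(x) - a) ** 2 for x, a in zip(crude, actual))
```

Because each leaf stores the mean residual of its samples, the correction cannot increase the squared error on the training data. Whether it helps on unseen data still has to be checked by backtesting.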
In some embodiments, step 202 comprises:
Step 2021, determining a category of the oil price history data.
Step 2022, using a correlation coefficient calculation formula to calculate the correlation coefficient between the oil price historical data of each category and the oil price.
Step 2023, drawing a heat map according to the calculated correlation coefficients.
Step 2024, determining, from the oil price historical data of each category according to the heat map, the oil price historical data of at least one target category with a correlation coefficient greater than a predetermined value as a linear regression training sample.
In a specific implementation, data analysis of past years' oil price data shows that the price changes of the three crude oils are highly similar and fluctuate largely in step.
Analysis of the price trends of gasoline and diesel in past years shows that their price changes are also highly similar and fluctuate largely in step.
Comparing the four input fields (exchange rate, Brent crude oil price, WTI crude oil price, and Oman crude oil price) with the gasoline and diesel data shows that the price fluctuations of the three crude oils follow a trend similar to the target output and have a direct causal relationship with it, whereas the exchange rate field shows no direct causal relationship with the trend.
Preliminary correlation analysis:
correlation analysis refers to analyzing two or more variable elements with correlation, so as to measure the correlation degree of two variable factors.
$$\rho = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2}\,\sqrt{\sum_{i}(y_i - \bar{y})^2}}$$

where $i$ denotes the $i$-th sample, $x_i$ and $y_i$ are the values of the two variables, $\bar{x}$ is the mean of all $x_i$, and $\bar{y}$ is the mean of all $y_i$.
The correlation coefficient ρ measures linear correlation. If ρ = 0, it can only be said that there is no linear correlation between x and y; it cannot be said that there is no correlation at all. The larger the absolute value of the correlation coefficient, the stronger the correlation: the closer the coefficient is to 1 or -1, the stronger the correlation, and the closer it is to 0, the weaker the correlation.
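As a small illustration of these properties, the following sketch computes the linear (Pearson) correlation coefficient with the Python standard library. The function name `pearson` and the sample values are hypothetical and chosen only to show the two cases described above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Perfectly linear relationship: the coefficient is 1.
linear = pearson([1, 2, 3, 4], [10, 20, 30, 40])

# y = x**2 on a symmetric range: y depends on x, yet the coefficient is 0,
# because the dependence is not linear.
quadratic = pearson([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
```

The second case shows why a coefficient of 0 rules out only a linear relationship.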
The correlation coefficients are calculated, a heat map is drawn, and the correlation data are recorded. The analysis determines that the correlation between each of the three crude oil prices and the gasoline price adjustment is greater than 0.7, which makes them strongly correlated factors. A direct correlation of less than 0.5 is considered weak.
Since the exchange rate does not act directly on the price adjustment data, an attempt is made to multiply the exchange rate into the crude oil price and to compare the correlation of the product data.
The correlation is calculated and the heat map is drawn as follows: a data table is formed from the original data, namely the exchange rate, the Brent, WTI, and Oman crude oil prices, the adjusted retail price caps of gasoline and diesel, and the products of the exchange rate with each of the three crude oil prices, and the heat map is drawn from this table.
These fields are treated as input variables that can be used for the model. In the heat map, each variable is represented as a row or column, and the correlation coefficient between each pair of variables is represented as a color-coded square. The color and brightness of a square reflect the degree of correlation between the variables. According to the heat map, the correlation coefficient between the crude oil price multiplied by the exchange rate and the retail price adjustment becomes larger. It can be seen that the crude oil price data affects the price adjustment more directly.
Initial modeling trials can therefore be conducted on the basis of this understanding of the data. That is, according to the heat map, the oil price historical data of at least one target category whose correlation coefficient is larger than a preset value is determined, from the oil price historical data of each category, as the linear regression training sample.
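The selection in steps 2022 to 2024 can be sketched as follows. This is a minimal sketch under assumptions: the numbers are invented for illustration and are not market data, only one crude oil field is shown, the threshold of 0.7 is taken from the analysis above, and drawing the heat map itself is omitted because it would need a plotting library. The names `candidates`, `scores`, and `selected` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical illustrative values, not real market data.
rate = [6.5, 6.3, 6.6, 6.4, 6.5, 6.4]               # exchange rate
brent = [70.0, 74.0, 72.0, 80.0, 85.0, 90.0]        # crude oil price
target = [7000, 7300, 7100, 7900, 8400, 9000]       # adjusted retail price cap

candidates = {
    "rate": rate,
    "brent": brent,
    # Product field: exchange rate multiplied into the crude oil price.
    "brent_x_rate": [b * r for b, r in zip(brent, rate)],
}

THRESHOLD = 0.7  # predetermined value for a strong correlation

# One row of the correlation table: each candidate field against the target.
scores = {name: pearson(column, target) for name, column in candidates.items()}

# Keep only the fields whose correlation with the target exceeds the threshold.
selected = [name for name, rho in scores.items() if rho > THRESHOLD]
```

With these invented values the exchange rate alone falls below the threshold, while the crude oil price and the product field pass it.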
In some embodiments, step 202 comprises:
Step 2021', extracting crude oil price data from the oil price history data and dividing it by time period.
Step 2022', calculating mean data, slope data, median data, and fitted median data from the oil price history data.
Step 2023', adding the mean data, slope data, median data, or fitted median data to the crude oil price data as pre-training input data, wherein the mean data, slope data, median data, and fitted median data each correspond to one set of pre-training input data.
Step 2024', inputting the pre-training input data into the initial linear regression model for pre-training to obtain pre-training results, wherein each set of pre-training input data corresponds to one set of pre-training results.
Step 2025', comparing each set of pre-training results with the actual oil price results, and, if the set of pre-training results corresponding to the mean data differs least from the actual oil price results, combining the mean data and the crude oil price data as the linear regression training sample and inputting it into the initial linear regression model for training analysis.
Example one: in each group, the ten days of data in one crude oil price adjustment period are used as feature input. Only the 19 records for price adjustment days are selected; the first 14 records form the training set and the last 4 records are used for validation. The conclusion is that the average gap between the predicted and actual values for the last four records is within a predetermined gap value (e.g., 150).
Example two: the ten days of data in each price adjustment period are combined into summary features, and new index features are tried as inputs to improve the model: the mean, slope, median, and similar features of the inputs in each group are added (where the data of each ten-day price adjustment period forms one group).
The input is the mean of the crude oil prices over the 10 days of the price adjustment period. The conclusion is that the average gap between the predicted and actual values for the last four records is within a predetermined gap value (e.g., 150).
Thus, in addition to using the ten days of crude oil data in the price adjustment period and the ten-day crude oil mean as input, various combinations of indices such as the median, fitted median, and slope of the ten-day data were also tried. Overall, adding the mean alone as input gives the smallest difference between actual and predicted values.
Finally, the feature extraction is settled: the crude oil price data for the ten days of the price adjustment period, together with their mean, are used as the linear regression training sample and input into the initial linear regression model for training analysis.
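The per-period summary features named above can be computed as in the following sketch. It assumes one group is a plain list of ten daily crude oil prices and that the slope is the least-squares slope of price against day index. The fitted median is left out because the text does not define how it is computed. The function name `period_features` and the sample prices are hypothetical.

```python
import statistics


def period_features(prices):
    """Summary features for one price adjustment period (e.g. ten daily prices)."""
    n = len(prices)
    days = range(n)
    mean_day = (n - 1) / 2
    mean_price = statistics.fmean(prices)
    # Least-squares slope of price against day index (assumed definition).
    slope = sum(
        (d - mean_day) * (p - mean_price) for d, p in zip(days, prices)
    ) / sum((d - mean_day) ** 2 for d in days)
    return {
        "mean": mean_price,
        "median": statistics.median(prices),
        "slope": slope,
    }


# Hypothetical ten-day crude prices that rise by 1 per day.
features = period_features([80, 81, 82, 83, 84, 85, 86, 87, 88, 89])
```

Each feature can then be appended to the ten-day price data to form one set of pre-training input data.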
In some embodiments, step 202 comprises:
Step A1, dividing the oil price historical data by time period into N groups of oil price historical data.
Step A2, calculating mean data, slope data, median data, and fitted median data according to the oil price historical data.
Step A3, combining the mean data, the slope data, the median data, or the fitted median data with the N groups of oil price historical data, respectively, to obtain multiple kinds of feature training data.
Step A4, backtesting the initial linear regression model with the multiple kinds of feature training data to obtain a backtest result corresponding to each kind of feature training data.
Step A5, comparing the backtest result corresponding to each kind of feature training data with the actual oil price results, and ranking the backtest results by difference from smallest to largest.
Step A6, selecting a predetermined number of top-ranked kinds of feature training data as the linear regression training sample.
In specific implementation, the back measurement is to obtain a predicted result in the history based on the actual oil price data which has already occurred in the history, and compare the predicted result with the actual result according to the linear regression prediction mode. The performance of the linear regression model in the historical data is further analyzed.
The oil price history data is divided by time period, and N sets of oil price history data are divided, for example, 43 sets of oil price history data of one time period every 10 days, 2021 month 1 to 2022 month 11.
Back-testing determines which time periods work best as training set data. It was found that using all the data as the training set does not give good results, so the oil price historical data of only some of the period groups needs to be selected.
Back-testing further determines the best input features. Several strongly correlated indices (median, fitted median, slope) are determined as described in the above embodiments; these can be combined in many ways, and the back-testing data of the different feature models determines which combination works best.
The back-testing analyzed the predictions of linear regression models trained on data from time periods of 5 to 20 price adjustment periods, and determined that training with the data of 10 price adjustment periods as the training set gives better predictions. Back-testing was then carried out with feature training data formed from various combinations of the price adjustment period mean, median, fitted median and slope indices to determine the most suitable features (that is, the back-testing processing results corresponding to the various kinds of feature training data are ranked in ascending order of the difference value, and a predetermined number of kinds of feature training data at the front of the ranking are selected as the linear regression training sample). The back-testing results showed that the feature training data that adds the price adjustment period mean (that is, the linear regression training sample) gives the best result.
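The ranking described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes a walk-forward back-test, a single input feature per candidate, the mean absolute difference as the comparison measure, and the hypothetical names `fit_line`, `backtest_error` and `rank_features`.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with one input feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my  # constant feature: fall back to predicting the mean
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def backtest_error(feature, actual, window):
    """Fit on the previous `window` periods, predict the next one,
    and return the mean absolute difference from the actual result."""
    errors = []
    for t in range(window, len(actual)):
        a, b = fit_line(feature[t - window:t], actual[t - window:t])
        errors.append(abs(a * feature[t] + b - actual[t]))
    return mean(errors)

def rank_features(candidates, actual, window=10, keep=1):
    """Rank candidate features by back-test difference, smallest first,
    and keep a predetermined number from the front of the ranking."""
    ranked = sorted(
        candidates,
        key=lambda name: backtest_error(candidates[name], actual, window),
    )
    return ranked[:keep]
```

Here `window=10` mirrors the 10 price adjustment periods mentioned in the text; evaluating combinations of several features would require a multivariate regression in place of `fit_line`.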
In some embodiments, step 206 comprises:
Step 2061, determining a corresponding characteristic value from residual data corresponding to the historical return data, inputting the characteristic value as a decision training sample into a decision training algorithm, and performing decision training processing to obtain a decision training processing result.
Step 2062, determining the importance degree of each characteristic value according to the decision training processing result.
Step 2063, arranging the characteristic values in a tree structure in descending order of importance degree to obtain a decision tree model.
In specific implementation, a decision is made with the decision tree by starting from the root node, extracting the corresponding features of the item to be classified (namely, the residual data corresponding to each historical return data), selecting an output branch according to the importance degree, and descending in turn until a leaf node is reached; the class stored in the leaf node, or the result of its regression function, is taken as the output (decision) result.
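The traversal described above can be sketched as follows. This is an illustration under assumptions: the tree is taken to be already built with the most important feature at the root, each internal node is assumed to test one numeric feature against a threshold, and `Node`, `decide` and the feature names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[str] = None    # feature tested here (None for a leaf)
    threshold: float = 0.0           # assumed numeric split point
    low: Optional["Node"] = None     # branch for sample[feature] <= threshold
    high: Optional["Node"] = None    # branch for sample[feature] > threshold
    output: Optional[float] = None   # class or regression value stored in a leaf

def decide(node, sample):
    """Start at the root and descend until a leaf is reached,
    then return the value stored in that leaf."""
    while node.output is None:
        if sample[node.feature] <= node.threshold:
            node = node.low
        else:
            node = node.high
    return node.output
```

A leaf is recognized by a stored `output`; an internal node only routes the sample to one of its two branches.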
In some embodiments, step 2061 comprises:
Step 20611, identifying the category of the residual data corresponding to each historical return data according to the price adjustment result.
Step 20612, determining the feature data required by the residual data corresponding to each historical return data, and converting each piece of feature data into a numerical value to determine a characteristic value, thereby obtaining a decision training sample.
Step 20613, inputting the decision training sample into a decision training algorithm, and performing decision training processing to obtain a decision training processing result.
In specific implementation, suppose there are M pieces of residual data corresponding to the historical return data; these are used as M samples. The expected category of each sample (for example, the price adjustment result) is then marked, and some feature indices can be selected as decision conditions (for example, input indices). Finally, the required features of each sample are converted into numerical values, which serve as the characteristic values, yielding the decision training samples. The decision training samples are input into a decision training algorithm determined according to a certain principle; the algorithm determines the importance degree corresponding to each characteristic value, and the decision tree model is generated in descending order of importance degree.
In some embodiments, step 2062 comprises:
Step 20621, classifying the decision training processing results by sample to obtain n decision training processing categories.
Step 20622, calculating an information entropy result from the value probability $p_i$ of each decision training processing result in the sample set $S$ using the information entropy formula $\mathrm{Entropy}(S) = -\sum_{i=1}^{n} p_i \log_2 p_i$, where $\mathrm{Entropy}(S)$ is the information entropy and $i$ is the sequence number of the decision training processing result.
Step 20623, determining the feature $T$ corresponding to each decision training processing result sample in the sample set $S$, the values of the feature $T$ forming the feature value set $\mathrm{Value}(T)$.
Step 20624, calculating the information gain obtained by splitting the sample set $S$ based on the feature $T$: $\mathrm{InformationGain}(T) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Value}(T)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$, where $\mathrm{InformationGain}(T)$ is the information gain, $|S|$ is the number of samples in $S$, $v$ is a value of the feature $T$, $S_v$ is the set of samples in $S$ whose feature $T$ takes the value $v$, and $|S_v|$ is the number of samples in $S_v$.
Step 20625, taking the information gain result as the importance degree.
The decision tree construction process is iterative. In each iteration, a different characteristic value is used as the splitting point to divide the decision training samples into different categories. The feature used as a splitting point is called a splitting feature. The goal in selecting a splitting feature is for the decision training samples in each split subset to belong to the same category as far as possible. This is achieved with steps 20621 to 20625 above. The information entropy represents the degree of disorder of the information: the more disordered the information, the larger the information entropy. The information gain measures how much a split reduces this entropy, so the calculated information gain is used as the importance degree of the corresponding characteristic value. This helps the decision tree model generated according to the importance degree to be more accurate.
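The entropy and information gain calculation of steps 20621 to 20625 can be sketched as follows. This is a minimal illustration assuming categorical feature values and the standard definition of entropy with a leading minus sign; the function names and the sample fields are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum(p_i * log2(p_i)) over the categories in S."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(samples, labels, feature):
    """Entropy(S) minus the weighted entropy of the subsets S_v obtained
    by splitting S on each value v of the given feature."""
    total = len(labels)
    subsets = {}
    for sample, label in zip(samples, labels):
        subsets.setdefault(sample[feature], []).append(label)
    remainder = sum(len(sub) / total * entropy(sub) for sub in subsets.values())
    return entropy(labels) - remainder

def rank_by_gain(samples, labels, features):
    """Order features by importance degree (information gain), highest first."""
    return sorted(
        features,
        key=lambda f: information_gain(samples, labels, f),
        reverse=True,
    )
```

A feature that separates the categories perfectly yields a gain equal to the entropy of the whole set, while a feature unrelated to the categories yields a gain of zero, which is why the gain can serve as the importance degree.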
It should be noted that, the method of the embodiments of the present application may be performed by a single device, for example, a computer or a server. The method of the embodiment can also be applied to a distributed scene, and is completed by mutually matching a plurality of devices. In the case of such a distributed scenario, one of the devices may perform only one or more steps of the methods of embodiments of the present application, and the devices may interact with each other to complete the methods.
It should be noted that some embodiments of the present application are described above. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Based on the same conception, the application also provides an oil price prediction device based on combination of linear regression and a decision tree, which corresponds to the method of any embodiment. Referring to fig. 3, the apparatus includes:
A history data acquisition module 31 configured to acquire oil price history data for a predetermined period of time;
a linear regression sample determination module 32 configured to determine a linear regression training sample from the oil price history data;
a linear regression training module 33 configured to input the linear regression training samples into a pre-constructed initial linear regression model for training analysis;
the linear regression model determining module 34 is configured to perform comparative analysis with the actual oil price result corresponding to the oil price historical data according to the training result of the training analysis of the initial linear regression model, further perform correction adjustment on the characteristic parameters of the initial linear regression model according to the comparative analysis result, and take the initial linear regression model after the final correction adjustment as a linear regression model;
the residual data determining module 35 is configured to generate oil price historical return data in the training process by using the oil price historical data, record residual errors of a training result and an actual oil price result in the initial linear regression model training process, and obtain residual error data corresponding to the historical return data;
a decision tree construction module 36 configured to construct a decision tree model from residual data corresponding to the historical return data;
The linear regression prediction module 37 is configured to obtain oil price data to be predicted, input the oil price data to be predicted into the linear regression model for prediction processing, and obtain a first predicted oil value;
a residual value prediction module 38 configured to determine a residual prediction value corresponding to the oil price data to be predicted using the decision tree model;
the oil price prediction module 39 is configured to perform a difference operation on the first predicted oil value and the residual prediction value to obtain a final predicted oil value.
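How the last three modules fit together can be sketched as follows. This is an illustration under assumptions: the residual is taken to be the training result minus the actual oil price result, which is why subtracting the predicted residual moves the first prediction toward the actual value; the two models are passed in as plain callables, and all names are hypothetical.

```python
def residuals(training_results, actual_results):
    """Residual recorded during training: training result minus actual result
    (assumed sign convention)."""
    return [p - a for p, a in zip(training_results, actual_results)]

def predict_price(linear_model, residual_model, features):
    """Combine the two models by a difference operation."""
    first_prediction = linear_model(features)       # linear regression output
    residual_prediction = residual_model(features)  # decision tree output
    return first_prediction - residual_prediction
```

For instance, with a stand-in linear model `lambda x: 2 * x + 1` and a stand-in residual model that returns 0.5 for inputs above 3, an input of 4 gives a first prediction of 9 and a final value of 8.5.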
In some embodiments, the linear regression sample determination module 32 includes a correlation calculation training data determination unit configured to:
determining the category of the oil price historical data;
calculating, using a correlation coefficient calculation formula, the correlation coefficient between the oil price historical data of each category and the oil price;
drawing a heat map according to the calculated correlation coefficients;
and determining, according to the heat map, the oil price historical data of at least one target category whose correlation coefficient is larger than a preset value from the oil price historical data of each category as a linear regression training sample.
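The screening performed by this unit can be sketched as follows. The text does not name the correlation coefficient formula, so the Pearson coefficient is used here as an assumption; the heat map drawing is omitted, and the function names, category names and the 0.8 threshold are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (sx * sy)

def select_categories(categories, oil_price, threshold=0.8):
    """Keep the categories whose correlation with the oil price
    is larger than the preset value."""
    scores = {name: pearson(series, oil_price) for name, series in categories.items()}
    return {name: r for name, r in scores.items() if r > threshold}
```

The dictionary of scores is also what a heat map would visualize, one cell per category.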
In some embodiments, the linear regression sample determination module 32 includes a feature screening training data determination unit configured to:
Extracting crude oil price data from the oil price historical data and dividing the crude oil price data according to a time period;
calculating mean value data, slope data, median data and fitting median data according to the oil price historical data;
adding the mean value data, the slope data, the median data or the fitted median data to the crude oil price data as pre-training input data, wherein the mean value data, the slope data, the median data or the fitted median data respectively correspond to a group of pre-training input data;
inputting the pre-training input data into an initial linear regression model for pre-training to obtain pre-training results, wherein each group of pre-training input data corresponds to a group of pre-training results;
and comparing each group of pre-training results with the actual oil price results, and if the difference value between one group of pre-training results corresponding to the mean value data and the actual oil price results is the smallest, combining the mean value data and the crude oil price data as a linear regression training sample of an initial linear regression model to carry out input training analysis.
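The per-period statistics used as candidate features can be sketched as follows. This is an illustration under assumptions: the slope is taken to be the least-squares slope of price against the day index within a period, the fitted median is left out because this section does not define how it is computed, and the function names are hypothetical.

```python
from statistics import mean, median

def split_periods(prices, period_length):
    """Divide a price series into consecutive time periods
    (the last group may be shorter)."""
    return [prices[i:i + period_length] for i in range(0, len(prices), period_length)]

def period_features(prices):
    """Mean, median and least-squares slope of one period's prices."""
    n = len(prices)
    p_mean = mean(prices)
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(n))
    if denom == 0:
        slope = 0.0  # a single observation has no trend
    else:
        slope = sum((t - t_mean) * (p - p_mean) for t, p in enumerate(prices)) / denom
    return {"mean": p_mean, "median": median(prices), "slope": slope}
```

Each statistic would then be appended to the crude oil price data in turn to form one group of pre-training input data.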
In some embodiments, the linear regression sample determination module 32 includes a regression analysis training data determination unit configured to:
Dividing the oil price historical data according to a time period, and dividing N groups of oil price historical data;
calculating mean value data, slope data, median data and fitting median data according to the oil price historical data;
respectively combining the mean value data, the slope data, the median data or the fitting median data with N groups of oil price historical data to obtain various characteristic training data;
performing back-testing processing on the initial linear regression model by utilizing the multiple feature training data to obtain a back-testing processing result corresponding to each feature training data;
comparing the back-testing processing result corresponding to each kind of feature training data with the actual oil price result to obtain a difference value, and ranking the back-testing processing results corresponding to the multiple kinds of feature training data in ascending order of the difference value;
and selecting a predetermined number of kinds of feature training data at the front of the ranking as a linear regression training sample.
In some embodiments, decision tree construction module 36 includes:
the decision training processing unit is configured to determine corresponding characteristic values from residual data corresponding to the historical return data, input the characteristic values as decision training samples into a decision training algorithm, and perform decision training processing to obtain decision training processing results;
An importance degree determining unit configured to determine importance degrees of the respective feature values according to the decision training processing result;
the decision tree construction unit is configured to perform tree arrangement on the characteristic values according to the importance degree from high to low to obtain a decision tree model.
In some embodiments, the decision training processing unit is further configured to:
carrying out category identification on residual data corresponding to each historical return data according to a price adjustment result;
determining characteristic data required by residual data corresponding to each historical return data, and carrying out numerical processing on each characteristic data to determine a characteristic value so as to obtain a decision training sample;
and inputting the decision training sample into a decision training algorithm, and performing decision training processing to obtain a decision training processing result.
In some embodiments, the importance level determining unit is configured to:
sample classification is carried out on the decision training processing results to obtain n decision training processing categories;
calculating an information entropy result from the value probability $p_i$ of each decision training processing result in the sample set $S$ using the information entropy formula $\mathrm{Entropy}(S) = -\sum_{i=1}^{n} p_i \log_2 p_i$, where $\mathrm{Entropy}(S)$ is the information entropy and $i$ is the sequence number of the decision training processing result;
determining the feature $T$ corresponding to each decision training processing result sample in the sample set $S$, the values of the feature $T$ forming the feature value set $\mathrm{Value}(T)$;
calculating the information gain obtained by splitting the sample set $S$ based on the feature $T$: $\mathrm{InformationGain}(T) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Value}(T)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$, where $\mathrm{InformationGain}(T)$ is the information gain, $|S|$ is the number of samples in $S$, $v$ is a value of the feature $T$, $S_v$ is the set of samples in $S$ whose feature $T$ takes the value $v$, and $|S_v|$ is the number of samples in $S_v$;
and taking the information gain result as the importance degree.
For convenience of description, the above devices are described as being functionally divided into various modules, respectively. Of course, the functions of each module may be implemented in the same piece or pieces of software and/or hardware when implementing the present application.
The device of the foregoing embodiment is configured to implement the corresponding method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which is not described herein.
Based on the same conception, the application also provides electronic equipment corresponding to the method of any embodiment, comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the method of any embodiment when executing the program.
Fig. 4 shows a more specific hardware architecture of an electronic device according to this embodiment, where the device may include: processor 410, memory 420, input/output interface 430, communication interface 440, and bus 450. Wherein processor 410, memory 420, input/output interface 430, and communication interface 440 enable communication connections within the device between each other via bus 450.
The processor 410 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, etc. for executing relevant programs to implement the technical solutions provided in the embodiments of the present disclosure.
The memory 420 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 420 may store an operating system and other application programs, and when the technical solutions provided by the embodiments of the present specification are implemented in software or firmware, the relevant program codes are stored in the memory 420 and invoked for execution by the processor 410.
The input/output interface 430 is used to connect with an input/output module to realize information input and output. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
The communication interface 440 is used to connect communication modules (not shown) to enable communication interactions of the device with other devices. The communication module may implement communication in a wired manner (such as USB, network cable, etc.), or in a wireless manner (such as mobile network, Wi-Fi, Bluetooth, etc.).
Bus 450 includes a path to transfer information between components of the device (e.g., processor 410, memory 420, input/output interface 430, and communication interface 440).
It should be noted that although the above device only shows the processor 410, the memory 420, the input/output interface 430, the communication interface 440, and the bus 450, in the implementation, the device may further include other components necessary to achieve normal operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the embodiments of the present description, and not all the components shown in the drawings.
The electronic device of the foregoing embodiment is configured to implement the corresponding method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which is not described herein.
Based on the same conception, corresponding to any of the above embodiments of the method, the present application also provides a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method as described in any of the above embodiments.
The computer readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may be used to implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information that can be accessed by a computing device.
The storage medium of the foregoing embodiments stores computer instructions for causing the computer to perform the method of any of the foregoing embodiments, and has the advantages of the corresponding method embodiments, which are not described herein.
Based on the same conception, the application also provides a computer program product corresponding to the method of any embodiment, comprising computer program instructions, which when run on a computer, cause the computer to execute the method of any embodiment, and the method has the beneficial effects of the corresponding method embodiment, which are not repeated herein.
Those of ordinary skill in the art will appreciate that: the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the application (including the claims) is limited to these examples; the technical features of the above embodiments or in the different embodiments may also be combined within the idea of the present application, the steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present application as described above, which are not provided in detail for the sake of brevity.
Additionally, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures, in order to simplify the illustration and discussion, and so as not to obscure the embodiments of the present application. Furthermore, the devices may be shown in block diagram form in order to avoid obscuring the embodiments of the present application, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform on which the embodiments of the present application are to be implemented (i.e., such specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the application, it should be apparent to one skilled in the art that embodiments of the application can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.
While the present application has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of those embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed.
The present embodiments are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Accordingly, any omissions, modifications, equivalents, improvements and/or the like which are within the spirit and principles of the embodiments are intended to be included within the scope of the present application.

Claims (10)

1. The oil price prediction method based on the combination of linear regression and decision tree is characterized by comprising the following steps:
acquiring oil price historical data in a preset time period;
determining a linear regression training sample according to the oil price historical data;
inputting the linear regression training sample into a pre-constructed initial linear regression model for training analysis;
according to the training result of the training analysis of the initial linear regression model, comparing and analyzing with the actual oil price result corresponding to the oil price historical data, correcting and adjusting the characteristic parameters of the initial linear regression model according to the comparison analysis result, and taking the initial linear regression model after final correction and adjustment as a linear regression model;
generating oil price historical return data in the training process by utilizing the oil price historical data, recording residual errors of a training result and an actual oil price result in the initial linear regression model training process, and obtaining residual error data corresponding to the historical return data;
constructing a decision tree model according to residual data corresponding to the historical return data;
acquiring oil price data to be predicted, and inputting the oil price data to be predicted into the linear regression model for prediction processing to obtain a first predicted oil value;
determining a residual prediction value corresponding to the oil price data to be predicted by utilizing the decision tree model;
and carrying out difference operation on the first predicted oil value and the residual error predicted value to obtain a final predicted oil value.
2. The method of claim 1, wherein said determining a linear regression training sample from said oil price history data comprises:
determining the category of the oil price historical data;
calculating, using a correlation coefficient calculation formula, the correlation coefficient between the oil price historical data of each category and the oil price;
drawing a heat map according to the calculated correlation coefficients;
and determining, according to the heat map, the oil price historical data of at least one target category whose correlation coefficient is larger than a preset value from the oil price historical data of each category as a linear regression training sample.
3. The method of claim 1, wherein said determining a linear regression training sample from said oil price history data comprises:
extracting crude oil price data from the oil price historical data and dividing the crude oil price data according to a time period;
calculating mean value data, slope data, median data and fitting median data according to the oil price historical data;
adding the mean value data, the slope data, the median data or the fitted median data to the crude oil price data as pre-training input data, wherein the mean value data, the slope data, the median data or the fitted median data respectively correspond to a group of pre-training input data;
inputting the pre-training input data into an initial linear regression model for pre-training to obtain pre-training results, wherein each group of pre-training input data corresponds to a group of pre-training results;
and comparing each group of pre-training results with the actual oil price results, and if the difference value between one group of pre-training results corresponding to the mean value data and the actual oil price results is the smallest, combining the mean value data and the crude oil price data as a linear regression training sample of an initial linear regression model to carry out input training analysis.
4. The method of claim 1, wherein determining a linear regression training sample from the oil price history data comprises:
dividing the oil price historical data according to a time period, and dividing N groups of oil price historical data;
calculating mean value data, slope data, median data and fitting median data according to the oil price historical data;
respectively combining the mean value data, the slope data, the median data or the fitting median data with N groups of oil price historical data to obtain various characteristic training data;
performing back-testing processing on the initial linear regression model by utilizing the multiple feature training data to obtain a back-testing processing result corresponding to each feature training data;
comparing the back-testing processing result corresponding to each kind of feature training data with the actual oil price result to obtain a difference value, and ranking the back-testing processing results corresponding to the multiple kinds of feature training data in ascending order of the difference value;
and selecting a predetermined number of kinds of feature training data at the front of the ranking as a linear regression training sample.
5. The method of claim 1, wherein constructing a decision tree model from residual data corresponding to the historical return data comprises:
determining corresponding characteristic values from the residual data corresponding to the historical return data, inputting the characteristic values as decision training samples into a decision training algorithm, and performing decision training processing to obtain decision training processing results;
determining the importance degree of each characteristic value according to the decision training processing result;
and arranging the characteristic values in a tree structure in descending order of importance degree to obtain a decision tree model.
6. The method according to claim 5, wherein determining the corresponding characteristic values from the residual data corresponding to the historical return data, inputting the characteristic values as decision training samples into a decision training algorithm, and performing decision training processing to obtain a decision training processing result comprises:
carrying out category identification on residual data corresponding to each historical return data according to a price adjustment result;
determining characteristic data required by residual data corresponding to each historical return data, and carrying out numerical processing on each characteristic data to determine a characteristic value so as to obtain a decision training sample;
and inputting the decision training sample into a decision training algorithm, and performing decision training processing to obtain a decision training processing result.
7. The method according to claim 5 or 6, wherein determining the importance level of each feature value according to the decision training processing result comprises:
Sample classification is carried out on the decision training processing results to obtain n decision training processing categories;
calculating an information entropy result from the value probability $p_i$ of each decision training processing result in the sample set $S$ using the information entropy formula $\mathrm{Entropy}(S) = -\sum_{i=1}^{n} p_i \log_2 p_i$, where $\mathrm{Entropy}(S)$ is the information entropy and $i$ is the sequence number of the decision training processing result;
determining the feature $T$ corresponding to each decision training processing result sample in the sample set $S$, the values of the feature $T$ forming the feature value set $\mathrm{Value}(T)$;
calculating the information gain obtained by splitting the sample set $S$ based on the feature $T$: $\mathrm{InformationGain}(T) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Value}(T)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$, where $\mathrm{InformationGain}(T)$ is the information gain, $|S|$ is the number of samples in $S$, $v$ is a value of the feature $T$, $S_v$ is the set of samples in $S$ whose feature $T$ takes the value $v$, and $|S_v|$ is the number of samples in $S_v$;
and taking the information gain result as an importance degree.
8. An oil price prediction device based on combination of linear regression and decision tree, comprising:
a history data acquisition module configured to acquire oil price history data within a predetermined period of time;
a linear regression sample determination module configured to determine a linear regression training sample from the oil price history data;
the linear regression training module is configured to input the linear regression training sample into a pre-constructed initial linear regression model for training analysis;
the linear regression model determining module is configured to compare the training result of the training analysis of the initial linear regression model with the actual oil price result corresponding to the oil price historical data, correct and adjust the characteristic parameters of the initial linear regression model according to the comparison result, and take the initial linear regression model after the final correction and adjustment as the linear regression model;
the residual data determining module is configured to generate oil price historical return data in the training process by utilizing the oil price historical data, record residual errors of a training result and an actual oil price result in the initial linear regression model training process, and obtain residual error data corresponding to the historical return data;
the decision tree construction module is configured to construct a decision tree model according to residual data corresponding to the historical return data;
the linear regression prediction module is configured to acquire oil price data to be predicted, input the oil price data to be predicted into the linear regression model for prediction processing, and obtain a first predicted oil value;
a residual value prediction module configured to determine a residual prediction value corresponding to the oil price data to be predicted using the decision tree model;
and the oil price prediction module is configured to perform difference operation on the first predicted oil value and the residual error predicted value to obtain a final predicted oil value.
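The data flow through the modules of claim 8 can be illustrated with a small, dependency-free Python sketch. It makes several assumptions that the claim does not state: a single input feature, ordinary least squares as the linear regression, a depth-1 regression tree (a stump) standing in for the decision tree model, and the residual defined as the training result minus the actual price, so that the final value is the first prediction minus the predicted residual. All data values are invented.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with one feature."""
    mx, my = mean(xs), mean(ys)
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx


def fit_stump(xs, residuals):
    """Depth-1 regression tree: pick the threshold with the lowest squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        ml, mr = mean(left), mean(right)
        err = (sum((r - ml) ** 2 for r in left)
               + sum((r - mr) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, ml, mr)
    return best[1:]


def predict(x, line, stump):
    a, b = line
    t, ml, mr = stump
    first = a * x + b                  # first predicted value (linear model)
    residual = ml if x <= t else mr    # residual predicted by the tree
    return first - residual            # difference operation


# Hypothetical history: a linear trend with a step after x = 3.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 9, 11, 13]

line = fit_line(xs, ys)
# Residual = training result minus actual price (assumed sign convention).
residuals = [line[0] * x + line[1] - y for x, y in zip(xs, ys)]
stump = fit_stump(xs, residuals)

linear_sse = sum(r ** 2 for r in residuals)
combined_sse = sum((predict(x, line, stump) - y) ** 2 for x, y in zip(xs, ys))
```

On this toy data the residual correction lowers the squared error on the training points, which is the purpose of combining the two models; a real system would use multiple features and a deeper tree.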
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 7 when executing the program.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 7.
CN202310640982.XA 2023-06-01 2023-06-01 Oil price prediction method, device and equipment based on combination of linear regression and decision tree Pending CN116342172A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202310640982.XA CN116342172A (en) 2023-06-01 2023-06-01 Oil price prediction method, device and equipment based on combination of linear regression and decision tree
CN202311739198.0A CN117592012A (en) 2023-06-01 2023-12-18 Petroleum feature processing method, device and equipment based on linear regression and decision tree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310640982.XA CN116342172A (en) 2023-06-01 2023-06-01 Oil price prediction method, device and equipment based on combination of linear regression and decision tree

Publications (1)

Publication Number Publication Date
CN116342172A true CN116342172A (en) 2023-06-27

Family

ID=86880875

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202310640982.XA Pending CN116342172A (en) 2023-06-01 2023-06-01 Oil price prediction method, device and equipment based on combination of linear regression and decision tree
CN202311739198.0A Pending CN117592012A (en) 2023-06-01 2023-12-18 Petroleum feature processing method, device and equipment based on linear regression and decision tree

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202311739198.0A Pending CN117592012A (en) 2023-06-01 2023-12-18 Petroleum feature processing method, device and equipment based on linear regression and decision tree

Country Status (1)

Country Link
CN (2) CN116342172A (en)

Also Published As

Publication number Publication date
CN117592012A (en) 2024-02-23

Similar Documents

Publication Publication Date Title
CN113222700B (en) Session-based recommendation method and device
CN110008973B (en) Model training method, method and device for determining target user based on model
CN114298417A (en) Anti-fraud risk assessment method, anti-fraud risk training method, anti-fraud risk assessment device, anti-fraud risk training device and readable storage medium
CN110633859B (en) Hydrologic sequence prediction method integrated by two-stage decomposition
CN113538070B (en) User life value cycle detection method and device and computer equipment
CN111079944B (en) Transfer learning model interpretation realization method and device, electronic equipment and storage medium
CN113362118B (en) User electricity consumption behavior analysis method and system based on random forest
CN115238855A (en) Completion method of time sequence knowledge graph based on graph neural network and related equipment
CN114880505A (en) Image retrieval method, device and computer program product
CN116362823A (en) Recommendation model training method, recommendation method and recommendation device for behavior sparse scene
CN113656699B (en) User feature vector determining method, related equipment and medium
CN110929285B (en) Method and device for processing private data
KR20210143460A (en) Apparatus for feature recommendation and method thereof
CN112330442A (en) Modeling method and device based on ultra-long behavior sequence, terminal and storage medium
CN112463964B (en) Text classification and model training method, device, equipment and storage medium
CN116342172A (en) Oil price prediction method, device and equipment based on combination of linear regression and decision tree
CN111783453B (en) Text emotion information processing method and device
CN113850523A (en) ESG index determining method based on data completion and related product
CN114969543B (en) Popularization method, popularization system, electronic equipment and storage medium
WO2024113641A1 (en) Video recommendation method and apparatus, and electronic device, computer-readable storage medium and computer program product
US11669681B2 (en) Automated calculation predictions with explanations
CN112541705B (en) Method, device, equipment and storage medium for generating user behavior evaluation model
CN115130537A (en) Detection method, detection device, storage medium, equipment and program product
CN118154270A (en) Resource object recommendation method, device, computer equipment and storage medium
CN116775981A (en) System recommendation method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20230627
