CN109658241A - Screw-thread steel futures price rise/fall probability prediction method - Google Patents

Screw-thread steel futures price rise/fall probability prediction method

Info

Publication number
CN109658241A
CN109658241A (application CN201811403947.1A)
Authority
CN
China
Prior art keywords
data
node
feature
screw
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811403947.1A
Other languages
Chinese (zh)
Inventor
周振华 (Zhou Zhenhua)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Zhidaochuangyu Information Technology Co Ltd
Original Assignee
Chengdu Zhidaochuangyu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Zhidaochuangyu Information Technology Co Ltd filed Critical Chengdu Zhidaochuangyu Information Technology Co Ltd
Priority to CN201811403947.1A priority Critical patent/CN109658241A/en
Publication of CN109658241A publication Critical patent/CN109658241A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0278 Product appraisal
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Game Theory and Decision Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Human Resources & Organizations (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Operations Research (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Mathematical Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Technology Law (AREA)
  • Probability & Statistics with Applications (AREA)
  • Tourism & Hospitality (AREA)
  • Algebra (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)

Abstract

The invention discloses a method for predicting the rise/fall probability of screw-thread steel (rebar) futures prices. The method is as follows: screw-thread steel feature data are collected from the internet and from third-party databases; a decision tree is generated using the information gain ratio combined with the square-error minimization criterion, retaining the features with larger information gain; the empirical entropy of each node is then evaluated through a loss function, and the algorithm recursively backs up from the leaf nodes of the tree: if merging all leaf nodes of some parent node reduces the loss function, the tree is pruned and that parent node becomes a new leaf node; this step is repeated until no further merging is possible, which ultimately reduces the probability of over-fitting. The invention increases the speed of screw-thread steel futures price forecasting, saves manual analysis cost, and makes possible multi-dimensional big-data statistical analysis that is difficult to complete manually; at the same time, the model learns continuously, so prediction accuracy keeps improving.

Description

Screw-thread steel futures price rise/fall probability prediction method
Technical field
The present invention relates to the field of futures price forecasting, and in particular to a method for predicting the rise/fall probability of screw-thread steel futures prices.
Background art
Term definitions:
Screw-thread steel: the common name for hot-rolled ribbed bar (rebar). The grade designation of an ordinary hot-rolled ribbed bar consists of the prefix HRB followed by the minimum yield point of the grade. H, R and B are the initial letters of the three English words Hot-rolled, Ribbed and Bars respectively.
Futures: futures are entirely different from stocks. A futures contract does not take physical goods as its subject matter; rather, it is a standardized tradable contract based on a bulk commodity of defined quality, such as cotton, soybeans or petroleum, or on a financial asset such as stocks or bonds. The underlying can therefore be a commodity (such as gold, crude oil or agricultural products) or a financial instrument.
Decision tree: a decision tree (Decision Tree) is a decision-analysis method which, given the known probabilities of various outcomes, builds a tree to obtain the probability that the expected net present value (NPV) is greater than or equal to zero, assesses project risk and judges feasibility; it is an intuitive graphical application of probability analysis. Because the decision branches are drawn like the branches of a tree, it is called a decision tree.
Fitting: figuratively speaking, fitting means connecting a series of points in the plane with a smooth curve. Because there are infinitely many possible curves, there are many fitting methods. The fitted curve can generally be expressed as a function, and different functions correspond to different kinds of fitting.
Machine learning: machine learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, computational complexity theory and other subjects. It studies how computers simulate or realize human learning behaviour so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance.
Prediction model: when quantitative forecasting methods are used, the most important task is to establish a predictive mathematical model. A prediction model is a quantitative relationship between things, described in mathematical language or formulas and used for prediction. It reveals, to some extent, the inherent laws between things and serves as the direct basis for computing predicted values, so it has a significant effect on prediction accuracy. Any specific forecasting technique is characterized by its specific mathematical model; there are many kinds of forecasting techniques, each with a corresponding prediction model.
Basis: the basis is the difference between the spot price of a particular commodity at a given time and place and its futures price, calculated as spot price minus futures price. If the spot price is lower than the futures price the basis is negative; if the spot price is higher than the futures price the basis is positive.
Over-fitting: over-fitting means making the hypothesis overly strict in order to fit the training data consistently. Avoiding over-fitting is a core task in classifier design; classifier performance is usually evaluated by increasing the amount of data and using a test sample set.
The prior art "A method and device for forecasting the transaction price trend of plastic raw materials" obtains, for a preset historical period, order data for plastic raw materials, plastics futures price data, crude oil futures price data, bank interest rate data and exchange rate data; filters the order data according to preset conditions; and, from the filtered order data, plastics futures prices, crude oil futures prices, interest rates and exchange rates, computes an estimated transaction price for plastic raw materials over a preset future period.
Shortcomings of the prior art:
1. It does not analyse the characteristics of the screw-thread steel variety and cannot be used to predict the rise and fall of screw-thread steel futures.
2. It does not use machine learning techniques, has no self-learning capability, and requires repeated manual parameter tuning, which is time-consuming and laborious.
3. Analysis is slow, big-data scenarios are not supported, and adding features requires a large amount of work.
Summary of the invention
To solve the above problems, the present invention provides a method for predicting the rise/fall probability of screw-thread steel futures prices. The specific steps of this scheme are as follows:
1. Learning-sample data collection, including crawling data from the internet and purchasing data from third-party databases;
2. Data warehousing: after the data are obtained they are stored in a database; at storage time all feature values are collated and computed, for later use as training and test data;
3. Data feature selection and calculation
Take the data of a continuous period from the database as the training data set D; take another period of data that does not overlap with D as the test data set T; input the training data set D and feature A;
Compute the empirical entropy H(D) of data set D, the empirical conditional entropy H(D|A) of D given feature A, the information gain g(D, A), and the information gain ratio gR(D, A);
4. Decision-tree model generation and pruning
Generate the decision tree using the CART algorithm with the square-error minimization criterion. CART assumes the decision tree is a binary tree; by recursively splitting each feature in two, the feature space is partitioned into a finite number of cells, and the predicted probability distribution is determined on these cells. After the decision tree has been built it is pruned to remove noise nodes; pruning is realized by minimizing the loss function of the whole decision tree;
5. Model testing: input the previously prepared test data set T and compare the error between the model output and the target value to measure the quality of the training result; when the prediction accuracy exceeds 70%, the model is used for the next training step;
6. Rotation training: divide the old data in the data warehouse into multiple groups of training samples and test data to complete multiple rounds of training, and continuously acquire newly generated data as further training samples and test data; repeat steps 2-5, iterating the rotation training of the model until the accuracy reaches the designated value, then output the model;
7. Input the latest data set and output the rise/fall prediction for the future screw-thread steel futures price.
In this scheme, crawling data from the internet means using a timed script to crawl and parse the corresponding pages and storing the parsed data in the database; the timed crawling and parsing script can be implemented with the Python libraries requests, celery and beautifulsoup4. Purchasing data from third-party databases includes both free and paid use.
The above data include port inventory data, registered warehouse receipt data, spot data, futures data and basis data. The data are cleaned, collated and stored in the database after being merged on a daily basis; when the sampling interval of a data series is less than one day, the average of all values for that day is taken; data with a sampling interval greater than one day are not used (a minimal sketch of this daily merge follows).
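Purely as an illustrative sketch (not part of the claimed method), the daily-merge rule above could look as follows in pandas; the column names series, sample_time and value are assumptions made for the example.

```python
import pandas as pd

def merge_to_daily(raw: pd.DataFrame) -> pd.DataFrame:
    """Merge raw samples to one value per data series per day.

    Intraday samples are averaged over the day; series sampled less often
    than once per day are discarded, matching the rule described above.
    Assumed (illustrative) columns: 'series', 'sample_time', 'value'.
    """
    df = raw.copy()
    df["sample_time"] = pd.to_datetime(df["sample_time"])
    df["day"] = df["sample_time"].dt.normalize()

    kept = []
    for name, grp in df.groupby("series"):
        spacing = grp["sample_time"].sort_values().diff().median()
        if pd.notna(spacing) and spacing > pd.Timedelta(days=1):
            continue  # sampled less often than daily: drop per the rule above
        daily = grp.groupby("day", as_index=False)["value"].mean()
        daily["series"] = name
        kept.append(daily)
    return pd.concat(kept, ignore_index=True) if kept else pd.DataFrame()
```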
In this scheme, the feature values are calculated as follows:
Port inventory change = current port inventory - previous port inventory
Registered warehouse receipt change = current registered warehouse receipts - previous registered warehouse receipts
Basis = spot price - futures price
Basis rate = basis / spot price
Relative basis = basis - average basis
Relative basis rate = relative basis / spot price
Other features are taken directly from the database values;
these other features include the 3-day, 7-day, 15-day and 30-day spot average prices.
For the feature selection in step 3, the futures price data serve as the model output and the other feature data serve as the model input. Let the sample size of the current data set D be |D|, with K classes C_k, where |C_k| is the number of samples in class C_k. A feature A has n distinct values a_1, a_2, …, a_n. According to the value of feature A, the data set D can be partitioned into n subsets D_1, D_2, …, D_n, with |D_i| the number of samples in D_i; the set of samples in D_i that belong to class C_k is denoted D_ik, with |D_ik| its number of samples.
The empirical entropy H(D) of data set D is computed as
H(D) = -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|}
Entropy expresses the randomness, i.e. the degree of disorder, of the data samples.
The empirical conditional entropy H(D|A) of data set D given feature A is computed as
H(D|A) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \sum_{k=1}^{K} \frac{|D_{ik}|}{|D_i|} \log_2 \frac{|D_{ik}|}{|D_i|}
The conditional entropy expresses the entropy of data set D when the value of feature A is fixed.
The information gain g(D, A) is computed as
g(D, A) = H(D) - H(D|A)
The information gain expresses the degree to which knowing feature A reduces the entropy of the class labels of data set D.
The information gain ratio gR(D, A) is computed as
g_R(D, A) = \frac{g(D, A)}{H_A(D)}
where H_A(D) denotes the empirical entropy of training set D with respect to the values of feature A, that is, the entropy of data set D when the value of A is given, computed as
H_A(D) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \log_2 \frac{|D_i|}{|D|}
The information gain ratio of feature A with respect to training data set D is thus defined as the ratio of its information gain to the entropy of training set D with respect to the values of feature A. The larger the information gain ratio, the more effective the feature; when building the tree, the information gain ratio is computed at each node and finally determines which feature each node selects.
Further, in step 4 the decision tree is generated as follows:
(1) Starting from the root node, compute the information gain of every candidate feature at the node, select the feature with the largest information gain as the node's feature, and build child nodes according to the different values of this feature;
(2) Call the above procedure recursively on the child nodes to build the decision tree;
(3) Stop when the information gain of all features is small or no feature remains to be selected.
The square-error minimization criterion is as follows:
Assume the input space is partitioned into M cells R_1, R_2, …, R_M, and that each cell R_m has a fixed output value c_m; the regression tree can then be expressed as
f(x) = \sum_{m=1}^{M} c_m I(x \in R_m)
Once the partition of the input space is determined, the square error
\sum_{x_i \in R_m} (y_i - f(x_i))^2
is used to represent the prediction error of the regression tree on the training data, where y_i denotes the given output feature in the data set.
Also in step 4, the pruning of the decision tree adds, on top of the information gain, a penalty on the model complexity |T|, which gives the definition of the loss function:
C_\alpha(T) = \sum_{t=1}^{|T|} N_t H_t(T) + \alpha |T|
In the formula, N_t denotes the number of leaf nodes below the current node t; H_t(T) denotes the empirical entropy of the test data set computed downward from the current node t; |T| denotes the number of leaf nodes of the whole decision tree, i.e. the model complexity; and the size of α reflects the trade-off between the fit to the training set and the model complexity. H_t(T) is computed as
H_t(T) = -\sum_{i=1}^{n} \frac{|T_i|}{|T|} \log_2 \frac{|T_i|}{|T|}
where the data set T is partitioned according to the values of feature A into n subsets T_1, T_2, …, T_n, and |T_i| is the number of samples in T_i. The pruning process selects, for a given α, the model with the smallest loss function; the specific algorithm is as follows:
(1) Compute the empirical entropy of every node;
(2) Recursively back up from the leaf nodes of the tree; if merging all leaf nodes of some parent node reduces the loss function, prune and turn that parent node into a new leaf node;
(3) Repeat step (2) until no further merging is possible.
The technical solution of the present invention brings the following beneficial effects:
1. The speed of screw-thread steel futures price forecasting is increased;
2. Manual analysis costs are saved;
3. Multi-dimensional big-data statistical analysis that is difficult to complete manually is made possible;
4. The model learns continuously, so prediction accuracy becomes higher and higher.
Brief description of the drawings
Fig. 1 is the flow chart of this scheme.
Specific embodiments
The present invention is described in more detail below with reference to the accompanying drawing and an implementation method.
Fig. 1 is the flow chart of this scheme; the specific steps are as follows:
1. Learning-sample data collection, including crawling data from the internet and purchasing data from third-party databases;
1.1 Crawling data from the internet:
At present there are websites on the internet that publish commodity data for free, generally refreshed on a rolling schedule. A timed script can be used to crawl and parse the corresponding pages and store the parsed data in the database. The timed crawling and parsing script can be implemented with Python libraries such as requests, celery and beautifulsoup4.
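A minimal sketch of such a crawl-and-parse script is given below; the URL, page structure and column meanings are assumptions for illustration, and only the requests and beautifulsoup4 calls named above are used (the timed execution itself would be handled separately, e.g. by a celery beat schedule or a cron job).

```python
import requests
from bs4 import BeautifulSoup

DATA_URL = "https://example.com/rebar-spot-quotes"  # placeholder URL

def crawl_quotes():
    """Fetch one publicly posted quote page and parse it into records.

    The URL and the CSS selectors are placeholders; a real page will differ.
    """
    resp = requests.get(DATA_URL, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    records = []
    for tr in soup.select("table.quotes tr")[1:]:  # skip the header row
        cells = [td.get_text(strip=True) for td in tr.select("td")]
        if len(cells) >= 3:
            records.append({"date": cells[0],
                            "spot_price": float(cells[1]),
                            "futures_price": float(cells[2])})
    return records  # the caller stores these records in the database
```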
1.2 Purchasing data from third-party databases:
Some third-party databases already provide structured data that can be queried free of charge or for a fee.
2. Data warehousing: after the data are obtained they are stored in a database; at storage time all feature values are collated and computed, for later use as training and test data. The feature values are calculated as follows:
Port inventory change = current port inventory - previous port inventory
Registered warehouse receipt change = current registered warehouse receipts - previous registered warehouse receipts
Basis = spot price - futures price
Basis rate = basis / spot price
Relative basis = basis - average basis
Relative basis rate = relative basis / spot price
Other features include the 3-day, 7-day, 15-day and 30-day spot average prices, taken directly from the database values.
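The feature calculations of step 2 could be sketched as follows; the input column names (day, spot_price, futures_price, port_inventory, registered_receipts) are illustrative assumptions for a table holding one row per day.

```python
import pandas as pd

def add_feature_columns(daily: pd.DataFrame) -> pd.DataFrame:
    """Compute the feature values listed in step 2 on a one-row-per-day table.

    Assumed (illustrative) input columns: 'day', 'spot_price',
    'futures_price', 'port_inventory', 'registered_receipts'.
    """
    df = daily.sort_values("day").copy()

    df["port_inventory_change"] = df["port_inventory"].diff()
    df["receipt_change"] = df["registered_receipts"].diff()

    df["basis"] = df["spot_price"] - df["futures_price"]
    df["basis_rate"] = df["basis"] / df["spot_price"]
    df["relative_basis"] = df["basis"] - df["basis"].mean()
    df["relative_basis_rate"] = df["relative_basis"] / df["spot_price"]

    # 3-, 7-, 15- and 30-day spot average prices as additional input features
    for n in (3, 7, 15, 30):
        df[f"spot_ma_{n}"] = df["spot_price"].rolling(n).mean()
    return df
```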
3. Data feature selection and calculation
3.1 Take the data of a continuous period from the database as the training data set D; take another period of data that does not overlap with D as the test data set T; input the training data set D and feature A.
The futures price data serve as the model output and the other feature data serve as the model input. Let the sample size of the current data set D be |D|, with K classes C_k, where |C_k| is the number of samples in class C_k. A feature A has n distinct values a_1, a_2, …, a_n. According to the value of feature A, the data set D can be partitioned into n subsets D_1, D_2, …, D_n, with |D_i| the number of samples in D_i; the set of samples in D_i that belong to class C_k is denoted D_ik, with |D_ik| its number of samples.
3.2 Compute the empirical entropy H(D) of data set D. Entropy expresses the randomness (degree of disorder) of the data samples, i.e.:
H(D) = -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|}
3.3 Compute the empirical conditional entropy H(D|A) of data set D given feature A. The conditional entropy expresses the entropy of data set D when the value of feature A is fixed, i.e.:
H(D|A) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \sum_{k=1}^{K} \frac{|D_{ik}|}{|D_i|} \log_2 \frac{|D_{ik}|}{|D_i|}
3.4 Compute the information gain. The information gain expresses the degree to which knowing feature A reduces the entropy of the class labels of data set D, i.e.:
g(D, A) = H(D) - H(D|A)
3.5 Compute the information gain ratio:
Using the information gain alone as the feature-selection criterion tends to favour features with many values; this can be corrected by using the information gain ratio instead. The information gain ratio of feature A with respect to training data set D is defined as the ratio of its information gain to the entropy of training set D with respect to the values of feature A, i.e.
g_R(D, A) = \frac{g(D, A)}{H_A(D)}
where H_A(D) denotes the empirical entropy of training set D with respect to the values of feature A, that is, the entropy of data set D when the value of A is given, computed as
H_A(D) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \log_2 \frac{|D_i|}{|D|}
The larger the information gain ratio, the more effective the feature. When building the tree, the information gain ratio is computed at each node and finally determines which feature each node selects.
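A compact sketch of the entropy, information gain and information gain ratio computations of step 3 (for one discretised feature) follows; it is illustrative only and uses base-2 logarithms as in the formulas above.

```python
import math
from collections import Counter

def entropy(labels):
    """Empirical entropy H(D) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain_ratio(feature_values, labels):
    """Information gain g(D, A) and gain ratio gR(D, A) for one feature A.

    'feature_values' holds the (discretised) value of A for each sample and
    'labels' the class of each sample; both are plain Python sequences.
    """
    n = len(labels)
    h_d = entropy(labels)

    # Empirical conditional entropy H(D|A): weighted entropy of the subsets D_i
    subsets = {}
    for v, y in zip(feature_values, labels):
        subsets.setdefault(v, []).append(y)
    h_d_a = sum(len(sub) / n * entropy(sub) for sub in subsets.values())

    gain = h_d - h_d_a                       # g(D, A) = H(D) - H(D|A)
    h_a = entropy(list(feature_values))      # H_A(D), entropy of A's values
    ratio = gain / h_a if h_a > 0 else 0.0   # gR(D, A) = g(D, A) / H_A(D)
    return gain, ratio
```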
4. Decision-tree model generation and pruning
Generate the decision tree using the CART algorithm with the square-error minimization criterion. CART assumes the decision tree is a binary tree; by recursively splitting each feature in two, the feature space is partitioned into a finite number of cells, and the predicted probability distribution is determined on these cells.
The decision tree is generated as follows:
(1) Starting from the root node, compute the information gain of every candidate feature at the node, select the feature with the largest information gain as the node's feature, and build child nodes according to the different values of this feature;
(2) Call the above procedure recursively on the child nodes to build the decision tree;
(3) Stop when the information gain of all features is small or no feature remains to be selected.
The square-error minimization criterion is as follows:
Assume the input space is partitioned into M cells R_1, R_2, …, R_M, and that each cell R_m has a fixed output value c_m; the regression tree can then be expressed as
f(x) = \sum_{m=1}^{M} c_m I(x \in R_m)
Once the partition of the input space is determined, the square error
\sum_{x_i \in R_m} (y_i - f(x_i))^2
is used to represent the prediction error of the regression tree on the training data, where y_i denotes the given output feature in the data set; a brief sketch of this split selection is given below.
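As an illustration of the square-error minimization criterion, the following sketch finds, for a single feature, the binary split point whose two cells (each predicting its mean output, the c_m above) give the smallest squared error; it is a simplified stand-in for the full CART split search.

```python
import numpy as np

def best_split(x: np.ndarray, y: np.ndarray):
    """Binary split point of one feature that minimises the squared error.

    Each of the two resulting cells predicts its mean output, playing the
    role of the fixed value c_m in the regression-tree expression above.
    Returns (threshold, squared_error).
    """
    best_t, best_err = None, float("inf")
    for t in np.unique(x)[:-1]:              # candidate thresholds
        left, right = y[x <= t], y[x > t]
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err
```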
After the decision tree has been built it is pruned and noise nodes are removed; the pruning is realized by minimizing the loss function of the whole decision tree.
The pruning of the decision tree adds, on top of the information gain, a penalty on the model complexity |T|, which gives the definition of the loss function:
C_\alpha(T) = \sum_{t=1}^{|T|} N_t H_t(T) + \alpha |T|
In the formula, N_t denotes the number of leaf nodes below the current node t; H_t(T) denotes the empirical entropy of the test data set computed downward from the current node t; |T| denotes the number of leaf nodes of the whole decision tree, i.e. the model complexity; and the size of α reflects the trade-off between the fit to the training set and the model complexity. H_t(T) is computed as
H_t(T) = -\sum_{i=1}^{n} \frac{|T_i|}{|T|} \log_2 \frac{|T_i|}{|T|}
where the data set T is partitioned according to the values of feature A into n subsets T_1, T_2, …, T_n, and |T_i| is the number of samples in T_i. The pruning process selects, for a given α, the model with the smallest loss function; the specific algorithm is as follows:
(1) Compute the empirical entropy of every node;
(2) Recursively back up from the leaf nodes of the tree; if merging all leaf nodes of some parent node reduces the loss function, prune and turn that parent node into a new leaf node;
(3) Repeat step (2) until no further merging is possible; a sketch of this procedure follows below.
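A minimal sketch of this bottom-up pruning loop is given below, assuming the tree is stored as nested dictionaries holding a children list and the training labels that reach each node; for concreteness the sketch takes N_t as the number of training samples reaching a leaf (the usual CART convention) and prunes whenever merging does not increase the loss.

```python
import math
from collections import Counter

def leaf_entropy(labels):
    """Empirical entropy of the labels reaching one leaf."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def cost(node, alpha):
    """C_alpha(T) = sum over leaves of N_t * H_t(T) + alpha * (number of leaves)."""
    if node["children"] is None:                       # leaf node
        return len(node["labels"]) * leaf_entropy(node["labels"]) + alpha
    return sum(cost(child, alpha) for child in node["children"])

def prune(node, alpha):
    """Bottom-up pruning: collapse a parent into a leaf whenever doing so
    does not increase the loss C_alpha(T).

    A node is a dict with keys 'children' (list of nodes, or None for a
    leaf) and 'labels' (the training labels that reach the node).
    """
    if node["children"] is None:
        return node
    node["children"] = [prune(child, alpha) for child in node["children"]]

    as_leaf = {"children": None, "labels": node["labels"]}
    if cost(as_leaf, alpha) <= cost(node, alpha):
        return as_leaf      # merging the leaves does not increase the loss
    return node
```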
5. Model testing: input the previously prepared test data set T and compare the error between the model output and the target value to measure the quality of the training result; when the prediction accuracy exceeds 70%, the model is used for the next training step.
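Purely as an illustration of the 70% gate in step 5, a sketch follows; it assumes a fitted classifier exposing a scikit-learn style predict method and up/down labels.

```python
def passes_accuracy_gate(model, X_test, y_test, threshold=0.70):
    """Step 5: keep the model only if its rise/fall accuracy exceeds 70%.

    'model' is any fitted classifier with a scikit-learn style predict();
    the 70% threshold comes from the text above.
    """
    predictions = model.predict(X_test)
    correct = sum(int(p == y) for p, y in zip(predictions, y_test))
    return correct / len(y_test) > threshold
```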
6. Rotation training: divide the old data in the data warehouse into multiple groups of training samples and test data to complete multiple rounds of training, and continuously acquire newly generated data as further training samples and test data; repeat steps 2-5, iterating the rotation training of the model until the accuracy reaches the designated value, then output the model.
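The rotation training of step 6 could be sketched as below, reusing passes_accuracy_gate from the previous sketch; the batches argument (groups of old warehouse data plus newly collected data) and the make_model factory are assumptions for illustration.

```python
def rolling_train(make_model, batches):
    """Step 6: iterate training over successive groups of data.

    'batches' yields (X_train, y_train, X_test, y_test) tuples built from the
    old warehouse data plus newly collected data; 'make_model' builds a fresh
    decision-tree model. Both names are assumptions made for this sketch.
    """
    model = None
    for X_train, y_train, X_test, y_test in batches:
        model = make_model()
        model.fit(X_train, y_train)
        if passes_accuracy_gate(model, X_test, y_test):
            break            # accuracy reached the designated value
    return model
```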
7. Input the latest data set and output the rise/fall prediction for the future screw-thread steel futures price.

Claims (10)

1. A screw-thread steel futures price rise/fall probability prediction method, characterized by comprising the following steps:
(1) learning-sample data collection, including crawling data from the internet and purchasing data from third-party databases;
(2) data warehousing: after the data are obtained they are stored in a database, and at storage time all feature values are collated and computed for later use as training and test data;
(3) data feature selection and calculation:
taking the data of a continuous period from the database as a training data set D, taking another period of data that does not overlap with D as a test data set T, and inputting the training data set D and a feature A;
computing the empirical entropy H(D) of data set D, the empirical conditional entropy H(D|A) of D given feature A, the information gain g(D, A) and the information gain ratio gR(D, A);
(4) decision-tree model generation and pruning:
generating the decision tree using the CART algorithm with the square-error minimization criterion, wherein CART assumes the decision tree is a binary tree and, by recursively splitting each feature in two, partitions the feature space into a finite number of cells on which the predicted probability distribution is determined; after the decision tree has been built, pruning it to remove noise nodes, the pruning being realized by minimizing the loss function of the whole decision tree;
(5) model testing: inputting the previously prepared test data set T and comparing the error between the model output and the target value to measure the quality of the training result; when the prediction accuracy exceeds 70%, using the model for the next training step;
(6) rotation training: dividing the old data in the data warehouse into multiple groups of training samples and test data to complete multiple rounds of training, and continuously acquiring newly generated data as further training samples and test data; repeating steps (2)-(5) to iterate the rotation training of the model until the accuracy reaches a designated value, and outputting the model;
(7) inputting the latest data set and outputting the rise/fall prediction for the future screw-thread steel futures price.
2. The screw-thread steel futures price rise/fall probability prediction method according to claim 1, characterized in that crawling data from the internet means using a timed script to crawl and parse the corresponding pages and storing the parsed data in the database, the timed crawling and parsing script being implementable with the Python libraries requests, celery and beautifulsoup4, and that purchasing data from third-party databases includes both free and paid use;
the data include port inventory data, registered warehouse receipt data, spot data, futures data and basis data; the data are cleaned, collated and stored in the database after being merged on a daily basis;
when the sampling interval of a data series is less than one day, the average of all values for that day is taken;
data with a sampling interval greater than one day are not used.
3. The screw-thread steel futures price rise/fall probability prediction method according to claim 2, characterized in that the feature values are calculated as follows:
port inventory change = current port inventory - previous port inventory
registered warehouse receipt change = current registered warehouse receipts - previous registered warehouse receipts
basis = spot price - futures price
basis rate = basis / spot price
relative basis = basis - average basis
relative basis rate = relative basis / spot price
other features are taken directly from the database values;
the other features include the 3-day, 7-day, 15-day and 30-day spot average prices.
4. The screw-thread steel futures price rise/fall probability prediction method according to claim 2, characterized in that, for the feature selection, the futures price data serve as the model output and the other feature data serve as the model input; the sample size of the current data set D is |D|, with K classes C_k, where |C_k| is the number of samples in class C_k; a feature A has n distinct values a_1, a_2, …, a_n; according to the value of feature A, the data set D can be partitioned into n subsets D_1, D_2, …, D_n, where |D_i| is the number of samples in D_i, and the set of samples in D_i belonging to class C_k is denoted D_ik, with |D_ik| its number of samples.
5. The screw-thread steel futures price rise/fall probability prediction method according to claim 4, characterized in that the empirical entropy H(D) of the data set D is computed as
H(D) = -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|}
wherein the entropy expresses the randomness, i.e. the degree of disorder, of the data samples.
6. The screw-thread steel futures price rise/fall probability prediction method according to claim 4, characterized in that the empirical conditional entropy H(D|A) of the data set D given feature A is computed as
H(D|A) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \sum_{k=1}^{K} \frac{|D_{ik}|}{|D_i|} \log_2 \frac{|D_{ik}|}{|D_i|}
wherein the conditional entropy expresses the entropy of the data set D when the value of feature A is fixed.
7. The screw-thread steel futures price rise/fall probability prediction method according to claim 4, characterized in that the information gain g(D, A) is computed as
g(D, A) = H(D) - H(D|A)
wherein the information gain expresses the degree to which knowing feature A reduces the entropy of the class labels of data set D.
8. The screw-thread steel futures price rise/fall probability prediction method according to claim 4, characterized in that the information gain ratio gR(D, A) is computed as
g_R(D, A) = \frac{g(D, A)}{H_A(D)}
wherein H_A(D) denotes the empirical entropy of the training set D with respect to the values of feature A, that is, the entropy of data set D when the value of A is given, computed as
H_A(D) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \log_2 \frac{|D_i|}{|D|}
the information gain ratio of feature A with respect to the training data set D is thus defined as the ratio of its information gain to the entropy of the training set D with respect to the values of feature A; the larger the information gain ratio, the more effective the feature; when building the tree, the information gain ratio is computed at each node and finally determines which feature each node selects.
9. The screw-thread steel futures price rise/fall probability prediction method according to claim 8, characterized in that the decision tree is generated as follows:
(1) starting from the root node, computing the information gain of every candidate feature at the node, selecting the feature with the largest information gain as the node's feature, and building child nodes according to the different values of this feature;
(2) calling the above procedure recursively on the child nodes to build the decision tree;
(3) stopping when the information gain of all features is small or no feature remains to be selected;
the square-error minimization criterion is as follows:
assuming the input space is partitioned into M cells R_1, R_2, …, R_M, each cell R_m having a fixed output value c_m, the regression tree can be expressed as
f(x) = \sum_{m=1}^{M} c_m I(x \in R_m)
once the partition of the input space is determined, the square error
\sum_{x_i \in R_m} (y_i - f(x_i))^2
is used to represent the prediction error of the regression tree on the training data, where y_i denotes the given output feature in the data set.
10. The screw-thread steel futures price rise/fall probability prediction method according to claim 9, characterized in that the pruning of the decision tree adds, on top of the information gain, a penalty on the model complexity |T|, giving the definition of the loss function:
C_\alpha(T) = \sum_{t=1}^{|T|} N_t H_t(T) + \alpha |T|
wherein N_t denotes the number of leaf nodes below the current node t; H_t(T) denotes the empirical entropy of the test data set computed downward from the current node t; |T| denotes the number of leaf nodes of the whole decision tree, i.e. the model complexity; the size of α reflects the trade-off between the fit to the training set and the model complexity; H_t(T) is computed as
H_t(T) = -\sum_{i=1}^{n} \frac{|T_i|}{|T|} \log_2 \frac{|T_i|}{|T|}
the data set T being partitioned according to the values of feature A into n subsets T_1, T_2, …, T_n, with |T_i| the number of samples in T_i;
the pruning process selects, for a given α, the model with the smallest loss function; the specific algorithm is as follows:
(1) computing the empirical entropy of each node;
(2) recursively backing up from the leaf nodes of the tree; if merging all leaf nodes of some parent node reduces the loss function, pruning and turning that parent node into a new leaf node;
(3) repeating step (2) until no further merging is possible.
CN201811403947.1A 2018-11-23 2018-11-23 Screw-thread steel futures price rise/fall probability prediction method Pending CN109658241A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811403947.1A CN109658241A (en) Screw-thread steel futures price rise/fall probability prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811403947.1A CN109658241A (en) Screw-thread steel futures price rise/fall probability prediction method

Publications (1)

Publication Number Publication Date
CN109658241A true CN109658241A (en) 2019-04-19

Family

ID=66112435

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811403947.1A Pending CN109658241A (en) Screw-thread steel futures price rise/fall probability prediction method

Country Status (1)

Country Link
CN (1) CN109658241A (en)


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110288482A (en) * 2019-07-02 2019-09-27 欧冶云商股份有限公司 Steel mill's futures exchange method and system
CN110288482B (en) * 2019-07-02 2021-03-30 欧冶云商股份有限公司 Steel mill futures trading method and system
CN111062477A (en) * 2019-12-17 2020-04-24 腾讯云计算(北京)有限责任公司 Data processing method, device and storage medium
CN111062477B (en) * 2019-12-17 2023-12-08 腾讯云计算(北京)有限责任公司 Data processing method, device and storage medium
CN111861750A (en) * 2020-07-22 2020-10-30 北京睿知图远科技有限公司 Feature derivation system based on decision tree method and readable storage medium
CN112329809A (en) * 2020-09-25 2021-02-05 国网辽宁省电力有限公司大连供电公司 High-voltage circuit breaker fault diagnosis method based on decision tree algorithm
CN112489733A (en) * 2020-12-14 2021-03-12 郑州轻工业大学 Octane number loss prediction method based on particle swarm algorithm and neural network
CN112489733B (en) * 2020-12-14 2023-04-18 郑州轻工业大学 Octane number loss prediction method based on particle swarm algorithm and neural network
CN116720356A (en) * 2023-06-08 2023-09-08 中国汽车工程研究院股份有限公司 Design method of active safety module of vehicle based on accident damage prediction of cyclist

Similar Documents

Publication Publication Date Title
CN109658241A (en) Screw-thread steel futures price rise/fall probability prediction method
Li et al. The role of text-extracted investor sentiment in Chinese stock price prediction with the enhancement of deep learning
Ouyang et al. Agricultural commodity futures prices prediction via long-and short-term time series network
Moody et al. Architecture selection strategies for neural networks: Application to corporate bond rating prediction
Zhao et al. Sales forecast in e-commerce using convolutional neural network
Pham et al. Efficient estimation and optimization of building costs using machine learning
Nassar et al. Deep learning based approach for fresh produce market price prediction
Sbrana et al. Short-term inflation forecasting: the META approach
Wang et al. Cryptocurrency price prediction based on multiple market sentiment
Xu et al. An optimized decomposition integration framework for carbon price prediction based on multi-factor two-stage feature dimension reduction
He et al. End-to-end probabilistic forecasting of electricity price via convolutional neural network and label distribution learning
Houetohossou et al. Deep learning methods for biotic and abiotic stresses detection and classification in fruits and vegetables: State of the art and perspectives
Lalwani et al. The cross-section of Indian stock returns: evidence using machine learning
CN117807302B (en) Customer information processing method and device
Schosser Tensor extrapolation: Forecasting large-scale relational data
Feng et al. Predicting book sales trend using deep learning framework
Pattewar et al. Stock prediction analysis by customers opinion in Twitter data using an optimized intelligent model
Taherkhani et al. Intelligent decision support system using nested ensemble approach for customer churn in the hotel industry
O'Leary et al. An evaluation of machine learning approaches for milk volume prediction in Ireland
Kang et al. Predicting Stock Closing Price with Stock Network Public Opinion Based on AdaBoost-IWOA-Elman Model and CEEMDAN Algorithm
CN115841345A (en) Cross-border big data intelligent analysis method, system and storage medium
CN115187312A (en) Customer loss prediction method and system based on deep learning
Devyatkin et al. Neural networks for food export gain forecasting
CN111091410B (en) Node embedding and user behavior characteristic combined net point sales prediction method
Kangane et al. Analysis of different regression models for real estate price prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190419)