CN109658241A - A kind of screw-thread steel forward price ups and downs probability forecasting method - Google Patents
- Publication number
- CN109658241A CN109658241A CN201811403947.1A CN201811403947A CN109658241A CN 109658241 A CN109658241 A CN 109658241A CN 201811403947 A CN201811403947 A CN 201811403947A CN 109658241 A CN109658241 A CN 109658241A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0278—Product appraisal
Abstract
The invention discloses a screw-thread steel (rebar) futures price ups-and-downs probability forecasting method. The specific method is: rebar characteristic data are collected from the internet and from third-party databases; using the information gain ratio together with the squared-error minimization criterion, the features with larger information gain are retained and a decision tree is generated. The empirical entropy of each node is then computed through the loss function, and the tree is recursively retracted upward from its leaf nodes: if merging all the leaf nodes of some parent node reduces the loss function, the pruning is performed and the parent node becomes a new leaf node; this step is repeated until no further merging is possible, which finally reduces the probability of over-fitting. The invention improves the speed of rebar futures price forecasting, saves manual-analysis cost, and accomplishes multi-dimensional big-data statistical analysis that is difficult to complete manually; moreover, the model learns continuously, so its prediction accuracy grows higher and higher.
Description
Technical field
The present invention relates to the field of futures price forecasting, and in particular to a screw-thread steel futures price ups-and-downs probability forecasting method.
Background art
Term definition:
Screw-thread steel: screw-thread steel (rebar) is the common name for hot-rolled ribbed steel bars. The grade of ordinary hot-rolled rebar consists of HRB followed by the minimum yield point of the grade; H, R and B are the initial letters of the words Hot-rolled, Ribbed and Bars respectively.
Futures: futures (Futures) are entirely different from stocks. A futures contract is a standardized, tradeable contract whose subject matter is not the goods themselves but an underlying of specified quality, such as a commodity (e.g. cotton, soybeans, petroleum, gold, crude oil, agricultural products) or a financial asset (e.g. stocks, bonds). The subject matter can therefore be either a commodity or a financial instrument.
Decision tree: a decision tree (Decision Tree) is a decision-analysis method that, on the basis of the known probabilities of various outcomes, builds a tree to obtain the probability that the expected net present value (NPV) is greater than or equal to zero, thereby assessing project risk and judging feasibility; it is an intuitive graphical method that applies probability analysis. Because the decision branches are drawn like the branches of a tree, it is called a decision tree.
Fitting: figuratively speaking, fitting means connecting a series of points in the plane with a smooth curve. Because there are infinitely many possible curves, there are various fitting methods. The fitted curve can generally be expressed as a function; different functions give the fitting method different names.
Machine learning: machine learning (Machine Learning, ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It specializes in how computers simulate or realize human learning behavior, so as to acquire new knowledge or skills and reorganize the existing knowledge structure to continuously improve their own performance.
Prediction model: when predicting with a quantitative forecasting method, the most important work is to establish the prediction's mathematical model. A prediction model is a quantitative relation between things, described in mathematical language or formulas, that is used for prediction. It reveals the inherent laws between things to a certain extent and serves as the direct basis for computing predicted values, so it has a significant effect on prediction accuracy. Every specific prediction technique is characterized by its specific mathematical model; the many types of prediction techniques each have a corresponding prediction model.
Basis: the basis is the difference between the spot price and the futures price of a particular commodity at a given time and place. It is calculated as the spot price minus the futures price. If the spot price is lower than the futures price, the basis is negative; if the spot price is higher than the futures price, the basis is positive.
Over-fitting: over-fitting refers to making a hypothesis overly strict in order to obtain a consistent hypothesis. Avoiding over-fitting is a core task in classifier design; classifier performance is generally evaluated by increasing the amount of data and using a test sample set.
The prior art "A kind of plastic raw materials concluded price trend forecasting method and device" obtains, over a preset historical period, plastic raw-material order data, plastics futures price data, crude-oil futures price data, bank interest-rate data and exchange-rate data; screens the order data according to preset conditions; and, from the screened order data together with the plastics futures price, crude-oil futures price, interest-rate and exchange-rate data, calculates an estimated transaction price of plastic raw materials over a preset future period.
Shortcomings of the prior art:
1. It does not analyze the characteristics of the screw-thread steel variety and cannot be used for screw-thread steel futures rise/fall forecasting.
2. It uses no machine-learning techniques, has no self-learning property, and requires repeated manual parameter tuning, which is time-consuming and laborious.
3. Its analysis speed is slow, it is not applicable to big-data scenarios, and adding features greatly increases the workload.
Summary of the invention
To solve the above problems, the present invention provides a screw-thread steel futures price ups-and-downs probability forecasting method. The specific steps of this scheme are as follows:
1. Learning-sample data collection, including crawling data from the internet and purchasing data from third-party databases;
2. Data loading: after the data are obtained, they are stored in a database; at storage time all characteristic values are collated and calculated, to be used later as training and test data;
3. Data-feature selection and calculation
Take a continuous period of data from the database as training dataset D; take another period that does not overlap dataset D as test dataset T; input training dataset D and feature A;
Separately calculate the empirical entropy H(D) of dataset D, the empirical conditional entropy H(D|A) of feature A on dataset D, the information gain g(D, A), and the information gain ratio gR(D, A);
4. Decision-tree model generation and pruning
A decision tree is generated with the CART algorithm and the squared-error minimization criterion. CART assumes the decision tree is a binary tree; by recursively bisecting each feature, the feature space is divided into a finite number of units, and the predicted probability distribution is determined on these units. After the decision tree has been constructed, it is pruned and noise nodes are removed; pruning is realized by minimizing the loss function of the decision tree as a whole;
5. Model testing: input the previously prepared test dataset T and compare the error between the model output and the target value to measure the quality of the training result; when prediction accuracy exceeds 70%, proceed to the next training step;
6. Rolling training: the old data in the data warehouse are divided into multiple groups of training samples and test data to complete multiple rounds of training, and newly generated future data are continuously acquired as training samples and test data; steps 2-5 are repeated to iterate the rolling training of the model until the accuracy reaches a designated value, and the model is output;
7. Input the latest dataset and output the rise/fall forecast for the future screw-thread steel futures price.
In this scheme, crawling data from the internet means crawling the corresponding pages with a timed script and parsing them; the parsed data are stored in the database. The timed crawling-and-parsing script can be implemented with the Python libraries requests, celery and beautifulsoup4. Data purchased from third-party databases include both free and paid sources.
The above data include port inventory data, registered warehouse-receipt data, spot data, futures data and basis data. The data are cleaned, collated, merged by day and stored in the database. When the sampling interval of some data is less than one day, the average of all values for that day is taken; data whose sampling interval is greater than one day are not used.
In this scheme, the characteristic values are calculated with the following formulas:
Port inventory change = current port inventory − previous port inventory
Registered warehouse-receipt change = current registered warehouse-receipt amount − previous amount
Basis = spot price − futures price
Basis rate = basis / spot price
Relative basis = basis − average basis
Relative basis rate = relative basis / spot price
The other features take their values directly from the database;
they include the 3-day, 7-day, 15-day and 30-day average spot prices.
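The characteristic-value formulas above can be expressed as a short Python sketch. All field names and the sample figures in the usage call are illustrative assumptions, not values taken from the patent's data sources.

```python
def compute_features(port_stock, prev_port_stock,
                     warrants, prev_warrants,
                     spot_price, futures_price, avg_basis):
    """Derived features per the formulas above (names are illustrative)."""
    basis = spot_price - futures_price                    # basis = spot - futures
    return {
        "port_inventory_change": port_stock - prev_port_stock,
        "registered_warrant_change": warrants - prev_warrants,
        "basis": basis,
        "basis_rate": basis / spot_price,                 # basis / spot price
        "relative_basis": basis - avg_basis,
        "relative_basis_rate": (basis - avg_basis) / spot_price,
    }

# Hypothetical daily snapshot: spot 4000, futures 3900, average basis 80
feats = compute_features(port_stock=120.0, prev_port_stock=100.0,
                         warrants=50.0, prev_warrants=60.0,
                         spot_price=4000.0, futures_price=3900.0,
                         avg_basis=80.0)
```

With these sample figures the basis is 100 and the basis rate 100/4000 = 0.025, matching the formulas directly.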
For the feature selection in step 3, the futures price data serve as the model output and the other characteristic data serve as model inputs. The sample size of the current dataset D is |D|; there are k classes C_k, with |C_k| the number of samples in class C_k; a certain feature A has n different values a1, a2, …, an. According to the value of feature A, dataset D can be divided into n subsets D1, D2, …, Dn, with |Di| the number of samples in Di; the set of samples in Di that belong to class C_k is denoted Dik, and |Dik| is its number of samples.
The empirical entropy H(D) of dataset D is calculated as
H(D) = −Σ_k (|C_k| / |D|) · log2(|C_k| / |D|), summed over the k classes.
The entropy expresses the randomness of the data sample, i.e. its degree of disorder.
Feature A's empirical conditional entropy on dataset D is calculated as
H(D|A) = Σ_i (|D_i| / |D|) · H(D_i), i = 1, …, n, where H(D_i) = −Σ_k (|D_ik| / |D_i|) · log2(|D_ik| / |D_i|).
The conditional entropy expresses the entropy of dataset D when the value of feature A is fixed.
The information gain is calculated as
g(D, A) = H(D) − H(D|A)
The information gain expresses how much the entropy of dataset D is reduced once feature A is known.
The information gain ratio is calculated as
gR(D, A) = g(D, A) / H_A(D)
where H_A(D) denotes the empirical entropy of training set D with respect to the values of feature A, i.e. the entropy of the partition of D induced by the values of A:
H_A(D) = −Σ_i (|D_i| / |D|) · log2(|D_i| / |D|), i = 1, …, n.
The information gain ratio of feature A on training dataset D is thus defined as the ratio of its information gain to the entropy of training set D with respect to the values of feature A. The larger the information gain ratio, the more effective the feature; when constructing the tree, the information gain ratio is calculated at each node to determine the feature each node selects.
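The entropy and gain-ratio quantities above can be sketched as small Python helpers; the toy labels in the usage example ("up"/"down" price moves split by a two-valued feature) are illustrative.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Empirical entropy H(D) = -sum_k (|C_k|/|D|) * log2(|C_k|/|D|)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(feature_values, labels):
    """Information gain ratio gR(D, A) = g(D, A) / H_A(D)."""
    n = len(labels)
    # Partition the labels into subsets D_i, one per value of feature A
    groups = {}
    for a, y in zip(feature_values, labels):
        groups.setdefault(a, []).append(y)
    # H(D|A): weighted entropy of the subsets induced by A
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - cond                      # g(D, A)
    # H_A(D): entropy of D with respect to the values of A itself
    h_a = -sum((len(g) / n) * log2(len(g) / n) for g in groups.values())
    return gain / h_a if h_a > 0 else 0.0

# A perfectly informative feature: "hi" always up, "lo" always down
ratio = gain_ratio(["hi", "hi", "lo", "lo"], ["up", "up", "down", "down"])
```

Here the feature separates the classes exactly, so the gain equals H(D) = 1 bit and H_A(D) = 1 bit, giving a gain ratio of 1.0.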
Further, in step 4, the decision tree is generated as follows:
(1) Starting from the root node, compute the information gain of every candidate feature at the node, select the feature with the largest information gain as the node's feature, and construct child nodes according to the different values of this feature;
(2) Recursively apply the same method to the child nodes to construct the decision tree;
(3) Stop when the information gain of every feature is negligible or no feature remains to select.
The squared-error minimization criterion is as follows:
Suppose the input space is divided into M units R1, R2, …, RM, and on each unit Rm there is a fixed output value cm; the regression tree can then be expressed as
f(x) = Σ_m c_m · I(x ∈ R_m), m = 1, …, M.
Once the division of the input space is determined, the squared error
Σ_{x_i ∈ R_m} (y_i − f(x_i))²
can be used to represent the prediction error of the regression tree on the training data, where y_i denotes the output feature given in the dataset.
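The squared-error criterion can be illustrated with a minimal one-feature split search in Python: each candidate split point s yields two units whose fixed outputs are the means of y on each side, and the split minimizing the summed squared error is kept. The toy data are illustrative.

```python
def best_split(xs, ys):
    """Scan candidate split points s and pick the one minimizing
    sum_{x_i<=s}(y_i - c1)^2 + sum_{x_i>s}(y_i - c2)^2,
    where c1, c2 are the means of y on each side (CART regression criterion)."""
    def sq_err(vals):
        if not vals:
            return 0.0
        c = sum(vals) / len(vals)          # fixed output value c_m on the unit
        return sum((v - c) ** 2 for v in vals)

    best_s, best_err = None, float("inf")
    for s in sorted(set(xs))[:-1]:         # split between distinct x values
        left = [y for x, y in zip(xs, ys) if x <= s]
        right = [y for x, y in zip(xs, ys) if x > s]
        err = sq_err(left) + sq_err(right)
        if err < best_err:
            best_s, best_err = s, err
    return best_s, best_err

# Outputs jump from 1.0 to 5.0 between x=2 and x=3, so s=2 splits perfectly
s, err = best_split([1, 2, 3, 4], [1.0, 1.0, 5.0, 5.0])
```

On this toy series the best split is s = 2 with zero residual squared error, since each side is constant.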
Meanwhile in step 4, the beta pruning of decision tree passes through the complexity T to model on the basis of improving information gain
Apply punishment, just obtain the definition of loss function:
In above formula, NtIndicate the leaf node number below present node t;Ht(T) indicate that present node t calculates downwards test
The empirical entropy of data set;| T | indicate the leaf node number of whole decision tree, both model complexity;The size of α reflects pair in formula
The compromise of model training collection degree of fitting and model complexity considers;Wherein Ht(T) calculating formula are as follows:
Data set T can be divided into n subset T according to the value of feature A1,T2,……,Tn, | Ti| it is TiSample
Number;The process of the beta pruning exactly when α is determined, selects the smallest model of loss function, and specific algorithm is as follows:
(1) Compute the empirical entropy of each node;
(2) Recursively retract upward from the leaf nodes of the tree: if merging all the leaf nodes of some parent node reduces the loss function, perform the pruning and turn the parent node into a new leaf node;
(3) Repeat step (2) until no further merging is possible.
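The merge-or-keep decision in the pruning procedure above can be sketched in Python. As a simplifying assumption, each leaf is represented as a list of class labels and the leaf entropy is computed over those labels (the common formulation of C_α(T)); the labels and α values in the example are illustrative.

```python
from collections import Counter
from math import log2

def leaf_entropy(labels):
    """Empirical entropy of one leaf, over its class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def loss(leaves, alpha):
    """C_alpha(T) = sum_t N_t * H_t(T) + alpha * |T| over the leaf nodes."""
    return sum(len(l) * leaf_entropy(l) for l in leaves) + alpha * len(leaves)

def should_prune(child_leaves, alpha):
    """Merge a parent's leaves into one node iff that does not raise the loss."""
    merged = [y for leaf in child_leaves for y in leaf]
    return loss([merged], alpha) <= loss(child_leaves, alpha)

# Two pure leaves; large alpha penalizes the extra leaf enough to merge them
prune_big_alpha = should_prune([["up", "up"], ["down"]], alpha=5.0)
prune_small_alpha = should_prune([["up", "up"], ["down"]], alpha=0.5)
```

A larger α favors simpler trees, so the same pair of leaves is merged at α = 5 but kept split at α = 0.5.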
The technical solution of the present invention brings the following beneficial effects:
1. The speed of screw-thread steel futures price forecasting is improved;
2. Manual-analysis cost is saved;
3. Multi-dimensional big-data statistical analysis that is difficult to complete manually is accomplished;
4. The model learns continuously, so its prediction accuracy grows higher and higher.
Detailed description of the invention
Fig. 1 is the flow chart of this scheme.
Specific embodiment
The present invention and its implementation are described in more detail below with reference to the accompanying drawing.
Fig. 1 is the flow chart of this scheme; the specific steps are as follows:
1. Learning-sample data collection, including crawling data from the internet and purchasing data from third-party databases;
1.1 Crawling data from the internet:
At present some websites on the internet publish commodity data for free, generally refreshed on a rolling schedule. A timed script can be used to crawl and parse the corresponding pages, and the parsed data are stored in the database. The timed crawling-and-parsing script can be implemented with Python libraries such as requests, celery and beautifulsoup4.
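While the scheme names requests, celery and beautifulsoup4 for timed crawling, the parsing step can also be illustrated without those dependencies using the standard-library html.parser. The page structure below (a simple table of commodity rows) is hypothetical.

```python
from html.parser import HTMLParser

class PriceTableParser(HTMLParser):
    """Collects the text of <td> cells; the table layout is a hypothetical
    stand-in for a commodity-data page."""
    def __init__(self):
        super().__init__()
        self.in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False

    def handle_data(self, data):
        if self.in_td and data.strip():
            self.cells.append(data.strip())

# In the real scheme this HTML would come from a timed requests fetch
html = "<table><tr><td>rebar</td><td>3900</td></tr></table>"
parser = PriceTableParser()
parser.feed(html)
```

The parsed cells (here a name and a price) would then be collated into the characteristic values of step 2 and stored in the database.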
1.2 Purchasing data from third-party databases:
Some third-party databases already hold structured data that can be queried for free or for a fee.
2. Data loading: after the data are obtained, they are stored in the database; at storage time all characteristic values are collated and calculated, to be used later as training and test data. The characteristic values are calculated with the following formulas:
Port inventory change = current port inventory − previous port inventory
Registered warehouse-receipt change = current registered warehouse-receipt amount − previous amount
Basis = spot price − futures price
Basis rate = basis / spot price
Relative basis = basis − average basis
Relative basis rate = relative basis / spot price
The other features include the 3-day, 7-day, 15-day and 30-day average spot prices, taken directly from the database values.
3. Data-feature selection and calculation
3.1 Take a continuous period of data from the database as training dataset D; take another period that does not overlap dataset D as test dataset T; input training dataset D and feature A.
The sample size of the current dataset D is |D|; there are k classes C_k, with |C_k| the number of samples in class C_k; a certain feature A has n different values a1, a2, …, an. According to the value of feature A, dataset D can be divided into n subsets D1, D2, …, Dn, with |Di| the number of samples in Di; the set of samples in Di that belong to class C_k is denoted Dik, and |Dik| is its number of samples.
3.2 Compute the empirical entropy of dataset D, which expresses the randomness (degree of disorder) of the data sample:
H(D) = −Σ_k (|C_k| / |D|) · log2(|C_k| / |D|)
3.3 Compute feature A's empirical conditional entropy on dataset D, i.e. the entropy of dataset D when the value of feature A is fixed:
H(D|A) = Σ_i (|D_i| / |D|) · H(D_i), i = 1, …, n.
3.4 Compute the information gain, which expresses how much the entropy of dataset D is reduced once feature A is known:
g(D, A) = H(D) − H(D|A)
3.5 Compute the information gain ratio:
Using the information gain alone as the feature-selection criterion tends to favor features with many values. The information gain ratio corrects this problem. The information gain ratio of feature A on training dataset D is defined as the ratio of its information gain to the entropy of training set D with respect to the values of feature A, i.e.
gR(D, A) = g(D, A) / H_A(D)
where H_A(D) denotes the empirical entropy of training set D with respect to the values of feature A:
H_A(D) = −Σ_i (|D_i| / |D|) · log2(|D_i| / |D|), i = 1, …, n.
The larger the information gain ratio, the more effective the feature; when constructing the tree, the information gain ratio is calculated at each node to determine the feature each node selects.
4. Decision-tree model generation and pruning
A decision tree is generated with the CART algorithm and the squared-error minimization criterion. CART assumes the decision tree is a binary tree; by recursively bisecting each feature, the feature space is divided into a finite number of units, and the predicted probability distribution is determined on these units.
The decision tree is generated as follows:
(1) Starting from the root node, compute the information gain of every candidate feature at the node, select the feature with the largest information gain as the node's feature, and construct child nodes according to the different values of this feature;
(2) Recursively apply the same method to the child nodes to construct the decision tree;
(3) Stop when the information gain of every feature is negligible or no feature remains to select.
The squared-error minimization criterion is as follows:
Suppose the input space is divided into M units R1, R2, …, RM, and on each unit Rm there is a fixed output value cm; the regression tree can then be expressed as
f(x) = Σ_m c_m · I(x ∈ R_m), m = 1, …, M.
Once the division of the input space is determined, the squared error
Σ_{x_i ∈ R_m} (y_i − f(x_i))²
can be used to represent the prediction error of the regression tree on the training data, where y_i denotes the output feature given in the dataset.
After the decision tree has been constructed, it is pruned and noise nodes are removed; pruning is realized by minimizing the loss function of the decision tree as a whole.
The pruning is obtained by adding, on top of the information-gain criterion, a penalty on the model complexity |T|, which gives the definition of the loss function:
C_α(T) = Σ_t N_t · H_t(T) + α · |T|
In this formula, N_t denotes the number of samples under leaf node t; H_t(T) denotes the empirical entropy computed at node t on the test dataset; |T| denotes the number of leaf nodes of the whole decision tree, i.e. the model complexity; and the size of α reflects the trade-off between the fit to the training set and the model complexity. H_t(T) is calculated as
H_t(T) = −Σ_i (|T_i| / N_t) · log2(|T_i| / N_t), i = 1, …, n,
where, according to the values of feature A, the data at node t can be divided into n subsets T1, T2, …, Tn, with |Ti| the number of samples in Ti. The pruning process is exactly: with α fixed, select the model with the smallest loss function. The specific algorithm is as follows:
(1) Compute the empirical entropy of each node;
(2) Recursively retract upward from the leaf nodes of the tree: if merging all the leaf nodes of some parent node reduces the loss function, perform the pruning and turn the parent node into a new leaf node;
(3) Repeat step (2) until no further merging is possible.
5. Model testing: input the previously prepared test dataset T and compare the error between the model output and the target value to measure the quality of the training result; when prediction accuracy exceeds 70%, proceed to the next training step.
6. Rolling training: the old data in the data warehouse are divided into multiple groups of training samples and test data to complete multiple rounds of training, and newly generated future data are continuously acquired as training samples and test data; steps 2-5 are repeated to iterate the rolling training of the model until the accuracy reaches a designated value, and the model is output.
7. Input the latest dataset and output the rise/fall forecast for the future screw-thread steel futures price.
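The rolling-training step (dividing old data into successive training/test groups while continuously taking in new data) can be sketched as a walk-forward splitter over a time-ordered series. The window sizes below are illustrative, not values specified by the scheme.

```python
def walk_forward(data, train_size, test_size):
    """Split a time series into successive (train, test) windows: older data
    trains, the next slice tests, and the window slides forward as new data
    arrives, as in the rolling-training step."""
    splits = []
    start = 0
    while start + train_size + test_size <= len(data):
        train = data[start:start + train_size]
        test = data[start + train_size:start + train_size + test_size]
        splits.append((train, test))
        start += test_size          # slide forward; newer data becomes the test
    return splits

# Ten time-ordered observations, 4-sample training windows, 2-sample tests
splits = walk_forward(list(range(10)), train_size=4, test_size=2)
```

Each tuple is one round of training plus testing; in the scheme, a round whose test accuracy exceeds the threshold feeds into the next round until the designated precision is reached.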
Claims (10)
1. A screw-thread steel futures price ups-and-downs probability forecasting method, characterized by comprising the following steps:
(1) learning-sample data collection, including crawling data from the internet and purchasing data from third-party databases;
(2) data loading: after the data are obtained, storing them in a database, and at storage time collating and calculating all characteristic values to be used later as training and test data;
(3) data-feature selection and calculation: taking a continuous period of data from the database as training dataset D; taking another period that does not overlap dataset D as test dataset T; inputting training dataset D and feature A; separately calculating the empirical entropy H(D) of dataset D, the empirical conditional entropy H(D|A) of feature A on dataset D, the information gain g(D, A) and the information gain ratio gR(D, A);
(4) decision-tree model generation and pruning: generating a decision tree with the CART algorithm and the squared-error minimization criterion, wherein CART assumes the decision tree is a binary tree and, by recursively bisecting each feature, divides the feature space into a finite number of units and determines the predicted probability distribution on these units; after the decision tree has been constructed, pruning it and removing noise nodes, the pruning being realized by minimizing the loss function of the decision tree as a whole;
(5) model testing: inputting the previously prepared test dataset T and comparing the error between the model output and the target value to measure the quality of the training result; when prediction accuracy exceeds 70%, proceeding to the next training step;
(6) rolling training: dividing the old data in the data warehouse into multiple groups of training samples and test data to complete multiple rounds of training, continuously acquiring newly generated future data as training samples and test data, and repeating steps (2)-(5) to iterate the rolling training of the model until the accuracy reaches a designated value, then outputting the model;
(7) inputting the latest dataset and outputting the rise/fall forecast for the future screw-thread steel futures price.
2. The screw-thread steel futures price ups-and-downs probability forecasting method according to claim 1, characterized in that crawling data from the internet means crawling the corresponding pages with a timed script and parsing them, the parsed data being stored in the database; the timed crawling-and-parsing script can be implemented with the Python libraries requests, celery and beautifulsoup4; the data purchased from third-party databases include both free and paid sources;
the data include port inventory data, registered warehouse-receipt data, spot data, futures data and basis data; the data are cleaned, collated, merged by day and stored in the database;
when the sampling interval of some data is less than one day, the average of all values for that day is taken;
data whose sampling interval is greater than one day are not used.
3. The screw-thread steel futures price ups-and-downs probability forecasting method according to claim 2, characterized in that the characteristic values are calculated with the following formulas:
Port inventory change = current port inventory − previous port inventory
Registered warehouse-receipt change = current registered warehouse-receipt amount − previous amount
Basis = spot price − futures price
Basis rate = basis / spot price
Relative basis = basis − average basis
Relative basis rate = relative basis / spot price
the other features take their values directly from the database and include the 3-day, 7-day, 15-day and 30-day average spot prices.
4. The screw-thread steel futures price rise/fall probability forecasting method according to claim 2, wherein, in the feature selection, the futures price data serve as the model output and the other feature data serve as the model inputs; the current data set D has sample size |D| and k classes C_k, with |C_k| the number of samples in class C_k; a feature A takes n distinct values a_1, a_2, ..., a_n; according to the value of feature A, the data set D can be partitioned into n subsets D_1, D_2, ..., D_n, with |D_i| the number of samples in D_i; the set of samples in D_i that belong to class C_k is denoted D_ik, with |D_ik| the number of samples in D_ik.
5. The screw-thread steel futures price rise/fall probability forecasting method according to claim 4, wherein the empirical entropy H(D) of the data set D is calculated as

H(D) = -Σ_k (|C_k| / |D|) log2(|C_k| / |D|)

where the sum runs over the k classes; the entropy expresses the randomness of the data sample, i.e. its degree of disorder.
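The empirical entropy of a list of class labels can be computed directly from the class frequencies; a minimal sketch (the function name is hypothetical):

```python
from collections import Counter
from math import log2

def empirical_entropy(labels):
    """H(D) = -sum_k (|C_k|/|D|) * log2(|C_k|/|D|) over the classes in `labels`."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())
```

A 50/50 split of two classes gives 1 bit of entropy; a pure sample gives 0.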
6. The screw-thread steel futures price rise/fall probability forecasting method according to claim 4, wherein the empirical conditional entropy H(D|A) of the data set D given feature A is calculated as

H(D|A) = Σ_{i=1..n} (|D_i| / |D|) H(D_i)

The conditional entropy expresses the entropy of the data set D when the value of feature A is fixed.
7. The screw-thread steel futures price rise/fall probability forecasting method according to claim 4, wherein the information gain g(D, A) is calculated as

g(D, A) = H(D) - H(D|A)

The information gain expresses the degree to which the entropy of the data set D is reduced when feature A is known.
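Combining claims 5-7, the information gain is the drop in entropy after partitioning the labels by the value of feature A; a self-contained sketch (function names hypothetical):

```python
from collections import Counter
from math import log2

def entropy(labels):
    # empirical entropy H(D) of a list of class labels
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    """g(D, A) = H(D) - H(D|A); `values` holds each sample's value of feature A."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    h_cond = sum(len(g) / n * entropy(g) for g in groups.values())  # H(D|A)
    return entropy(labels) - h_cond
```

A feature that perfectly separates the classes yields gain H(D); an uninformative feature yields gain 0.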
8. The screw-thread steel futures price rise/fall probability forecasting method according to claim 4, wherein the information gain ratio g_R(D, A) is calculated as

g_R(D, A) = g(D, A) / H_A(D)

where H_A(D) denotes the empirical entropy of the training set D with respect to the values of feature A, i.e. the entropy of how D splits across the values taken by A, calculated as

H_A(D) = -Σ_{i=1..n} (|D_i| / |D|) log2(|D_i| / |D|)

The information gain ratio of feature A with respect to the training data set D is thus defined as the ratio of its information gain to the entropy of the training set D with respect to the values of feature A; the larger the information gain ratio, the more effective the feature. When constructing the tree, the information gain ratio can be calculated at each node to determine the feature selected at that node.
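A minimal sketch of the gain ratio above (names hypothetical; it assumes feature A takes at least two distinct values, otherwise H_A(D) = 0 and the ratio is undefined):

```python
from collections import Counter
from math import log2

def _entropy(counts, n):
    # entropy of a distribution given absolute counts summing to n
    return -sum(c / n * log2(c / n) for c in counts if c)

def gain_ratio(values, labels):
    """g_R(D, A) = g(D, A) / H_A(D); `values` holds each sample's value of A."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    h_d = _entropy(Counter(labels).values(), n)                      # H(D)
    h_cond = sum(len(g) / n * _entropy(Counter(g).values(), len(g))
                 for g in groups.values())                           # H(D|A)
    h_a = _entropy([len(g) for g in groups.values()], n)             # H_A(D)
    return (h_d - h_cond) / h_a
```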
9. The screw-thread steel futures price rise/fall probability forecasting method according to claim 8, wherein the decision tree is generated as follows:
(1), starting from the root node, calculate the information gain of every candidate feature at the node, select the feature with the maximum information gain as the splitting feature of the node, and construct child nodes from the distinct values of that feature;
(2), recursively apply the above procedure to the child nodes to construct the decision tree;
(3), continue until the information gain of every feature is negligible or no feature remains to be selected.
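The three steps above can be sketched as a small recursive tree builder; this is an illustrative ID3-style sketch under the simplifying assumption of categorical features stored as dicts, with all names hypothetical:

```python
from collections import Counter
from math import log2

def entropy(labels):
    # empirical entropy H(D) of a list of class labels
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(rows, labels, feature):
    # information gain g(D, A) for splitting on `feature`
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    h_cond = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - h_cond

def build_tree(rows, labels, features):
    # step (3): stop when the node is pure or no feature remains,
    # returning the majority class as a leaf
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    # step (1): pick the feature with maximum information gain
    best = max(features, key=lambda f: gain(rows, labels, f))
    remaining = [f for f in features if f != best]
    branches = {}
    for row, y in zip(rows, labels):
        branches.setdefault(row[best], ([], []))
        branches[row[best]][0].append(row)
        branches[row[best]][1].append(y)
    # step (2): recurse into one child node per distinct value of `best`
    return (best, {v: build_tree(rs, ys, remaining)
                   for v, (rs, ys) in branches.items()})
```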
The squared-error minimization criterion is as follows:
Assume the input space is divided into M units R_1, R_2, ..., R_M, and that on each unit R_m there is a fixed output value c_m; the regression tree can then be expressed as

f(x) = Σ_{m=1..M} c_m I(x ∈ R_m)

Once the partition of the input space is determined, the squared error Σ_i (y_i - f(x_i))^2 can be used to express the regression tree's prediction error on the training data, where y_i denotes the output feature given in the data set.
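Under the squared-error criterion above, the optimal constant output c_m on a fixed region R_m is the mean of the region's y_i; a minimal sketch (function name hypothetical):

```python
def region_fit(y_values):
    """For a fixed region R_m, the constant c minimizing sum (y_i - c)^2
    is the mean of the y_i; return (c_m, resulting squared error)."""
    c = sum(y_values) / len(y_values)
    sq_err = sum((y - c) ** 2 for y in y_values)
    return c, sq_err
```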
10. The screw-thread steel futures price rise/fall probability forecasting method according to claim 9, wherein the pruning of the decision tree imposes a penalty on the model complexity |T| on top of the information-gain criterion, which yields the definition of the loss function:

C_α(T) = Σ_{t=1..|T|} N_t H_t(T) + α|T|

In the formula above, N_t denotes the number of samples at leaf node t; H_t(T) denotes the empirical entropy of the data set at leaf node t; |T| denotes the number of leaf nodes of the whole decision tree, i.e. the model complexity; the value of α reflects the trade-off between the model's fit to the training set and the model's complexity. H_t(T) is calculated as

H_t(T) = -Σ_k (N_tk / N_t) log2(N_tk / N_t)

where N_tk is the number of samples of class C_k at leaf node t;
The data set T can be divided into n subsets T_1, T_2, ..., T_n according to the values of feature A, with |T_i| the number of samples in T_i;
Said pruning process selects, for a given α, the model with the smallest loss function; the specific algorithm is as follows:
(1), calculate the empirical entropy of each node;
(2), recursively retract upwards from the leaf nodes of the tree: if merging all leaf nodes of some parent node reduces the loss function, perform the pruning and turn the parent node into a new leaf node;
(3), repeat step (2) until no further merging is possible.
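The merge test in step (2) can be sketched as a comparison of the penalized loss C_α before and after collapsing one parent's leaves into a single leaf; all names below are hypothetical illustrations of this one-step decision:

```python
from collections import Counter
from math import log2

def leaf_cost(labels):
    """N_t * H_t(T): a leaf's empirical entropy weighted by its sample count."""
    n = len(labels)
    h = -sum(c / n * log2(c / n) for c in Counter(labels).values())
    return n * h

def should_prune(child_label_sets, alpha):
    """Merge a parent's leaves into one leaf iff the penalized loss
    C_alpha = sum_t N_t H_t(T) + alpha * |T| does not increase."""
    before = sum(leaf_cost(ls) for ls in child_label_sets) + alpha * len(child_label_sets)
    merged = [y for ls in child_label_sets for y in ls]
    after = leaf_cost(merged) + alpha * 1  # one new leaf replaces them all
    return after <= before
```

With two pure leaves, a small α keeps the split (merging raises the data-fit term), while a large enough α prefers the simpler merged leaf.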
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811403947.1A CN109658241A (en) | 2018-11-23 | 2018-11-23 | A kind of screw-thread steel forward price ups and downs probability forecasting method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109658241A true CN109658241A (en) | 2019-04-19 |
Family
ID=66112435
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811403947.1A Pending CN109658241A (en) | 2018-11-23 | 2018-11-23 | A kind of screw-thread steel forward price ups and downs probability forecasting method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109658241A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110288482A (en) * | 2019-07-02 | 2019-09-27 | 欧冶云商股份有限公司 | Steel mill's futures exchange method and system |
CN110288482B (en) * | 2019-07-02 | 2021-03-30 | 欧冶云商股份有限公司 | Steel mill futures trading method and system |
CN111062477A (en) * | 2019-12-17 | 2020-04-24 | 腾讯云计算(北京)有限责任公司 | Data processing method, device and storage medium |
CN111062477B (en) * | 2019-12-17 | 2023-12-08 | 腾讯云计算(北京)有限责任公司 | Data processing method, device and storage medium |
CN111861750A (en) * | 2020-07-22 | 2020-10-30 | 北京睿知图远科技有限公司 | Feature derivation system based on decision tree method and readable storage medium |
CN112329809A (en) * | 2020-09-25 | 2021-02-05 | 国网辽宁省电力有限公司大连供电公司 | High-voltage circuit breaker fault diagnosis method based on decision tree algorithm |
CN112489733A (en) * | 2020-12-14 | 2021-03-12 | 郑州轻工业大学 | Octane number loss prediction method based on particle swarm algorithm and neural network |
CN112489733B (en) * | 2020-12-14 | 2023-04-18 | 郑州轻工业大学 | Octane number loss prediction method based on particle swarm algorithm and neural network |
CN116720356A (en) * | 2023-06-08 | 2023-09-08 | 中国汽车工程研究院股份有限公司 | Design method of active safety module of vehicle based on accident damage prediction of cyclist |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190419 |