CN110824586B - Rainfall prediction method based on improved decision tree algorithm - Google Patents

Rainfall prediction method based on improved decision tree algorithm

Info

Publication number
CN110824586B
Authority
CN
China
Prior art keywords
training
decision tree
data
network
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911012069.5A
Other languages
Chinese (zh)
Other versions
CN110824586A (en)
Inventor
常敏
陈果
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Science And Technology Assets Management Co ltd
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology
Priority to CN201911012069.5A
Publication of CN110824586A
Application granted
Publication of CN110824586B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01W METEOROLOGY
    • G01W1/00 Meteorology
    • G01W1/10 Devices for predicting weather conditions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Environmental & Geological Engineering (AREA)
  • Atmospheric Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Ecology (AREA)
  • Environmental Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a rainfall prediction method based on an improved decision tree algorithm. Multi-year meteorological data for each region and the corresponding precipitation level data are collected. The meteorological data are normalized to obtain a normalized data set, which is divided proportionally into a training set and a test set. The training set is used to train the improved decision tree network, the test set is fed into the trained decision tree model to check the training result, and the data to be predicted are then input into the trained decision tree network, which outputs the predicted precipitation level. With the decision tree at its core, the method selects attributes by the average influence value of the independent variables (mean impact value, MIV), that is, according to the influence of each attribute on the result, and branches on the attribute with the greatest influence. Training with this improved decision tree algorithm makes full use of large volumes of data, improves prediction accuracy, and reduces misjudgments and missed judgments.

Description

Rainfall prediction method based on improved decision tree algorithm
Technical Field
The invention relates to a data mining technology, in particular to a rainfall prediction method based on an improved decision tree algorithm.
Background
With socioeconomic development and rising demands on weather services, the channels for acquiring meteorological data have become increasingly abundant and the volume of data keeps growing. Meteorological data have spatial attributes, high dimensionality, and instability, which makes research based on traditional weather forecasting approaches considerably more difficult. Research into the internal relationships among meteorological elements is particularly weak, so large amounts of collected meteorological data are not used effectively and contribute little to the development of weather forecasting. The interactions within the weather system are complex, and traditional meteorological research methods cannot uncover the value implicit in large amounts of collected data. Data mining technology provides a new way to study large volumes of meteorological data and plays an important role in discovering relationships among attributes in the meteorological field. Classification mining, which uses supervised learning to find the latent rules in historical meteorological data, can improve the accuracy of weather forecasting.
Disclosure of Invention
To address the high dimensionality and instability that accompany the continuous growth in data scale, the invention provides a rainfall prediction method based on an improved decision tree algorithm.
The technical scheme of the invention is as follows: a rainfall prediction method based on an improved decision tree algorithm specifically comprises the following steps:
1) collecting multi-year meteorological data for each location and the corresponding precipitation level data;
2) normalizing the obtained meteorological data to obtain a corresponding normalized data set, and dividing the normalized data set proportionally into a training set and a test set;
3) feeding the training set into the improved decision tree network for training, feeding the test set into the trained decision tree model to check the training result, then inputting the data to be predicted into the trained decision tree network for prediction, outputting the result, and evaluating the precipitation level of the data;
wherein the original decision tree algorithm selects attributes by information entropy and branches on the attribute with the largest value, whereas the improved decision tree network selects attributes by the average influence value of the independent variables (mean impact value, MIV), that is, according to the influence of each attribute on the result, and branches on the attribute with the greatest influence; the improvement comprises the following steps:
3.1) first, the entire normalized training set M is fed into a BP network for training; after training of the BP network terminates, each independent-variable feature in the training set M is increased by 10% and decreased by 10% relative to its original value, forming two new training samples M1 and M2;
3.2) then, M1 and M2 are each used as simulation samples and run through the established BP network to obtain two simulation results A1 and A2; the difference between A1 and A2 is the change in the output caused by changing that independent variable;
3.3) finally, the change values are averaged over the number of observations to obtain the influence of that independent variable on the BP network output, which is recorded as its MIV value;
3.4) the MIV value of each independent variable is calculated in turn according to the above steps; the independent variables are then sorted by the absolute value of their MIV to obtain a ranking of the relative importance of each variable's influence on the BP network output, from which the degree of influence of each input feature on the BP network result is judged, and the attribute with the greatest influence is selected for branching.
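As a compact restatement of steps 3.1) to 3.3), the MIV of a single independent variable can be written as below. The index symbols are assumptions introduced here for illustration and do not appear in the original text: j indexes the independent variable being perturbed, k indexes the observations, and n is the number of observations. A1 and A2 are the BP network outputs for the samples with variable j increased and decreased by 10%, respectively.

```latex
\mathrm{MIV}_j = \frac{1}{n} \sum_{k=1}^{n} \left( A_{1,k}^{(j)} - A_{2,k}^{(j)} \right)
```

Under this reading, the sign of the MIV indicates the direction of the influence, and the ranking in step 3.4) uses the absolute value.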
The invention has the beneficial effects that: the rainfall prediction method based on the improved decision tree algorithm takes the decision tree as a core, and the training is carried out by improving the decision tree algorithm, so that the prediction accuracy is improved, and the problems of erroneous judgment and missed judgment are reduced.
Drawings
FIG. 1 is a flow chart of a precipitation prediction method based on an improved decision tree algorithm according to the present invention;
FIG. 2 is a flow chart of the improved decision tree algorithm of the present invention.
Detailed Description
The rainfall prediction method based on the improved decision tree algorithm is shown in fig. 1 and comprises the following steps:
1. Meteorological data for each location from 2001 to 2011 and the corresponding precipitation level data are collected and organized into a data set containing each location's meteorological data and the corresponding precipitation level.
The collected data should include attributes such as maximum wind speed, average air pressure, daily maximum air pressure, daily minimum air pressure, average relative humidity, minimum relative humidity, evaporation capacity, average air temperature, daily maximum air temperature, daily minimum air temperature, hours of sunshine, and precipitation level.
2. The raw data are normalized to obtain a corresponding normalized data set. The normalized data set is divided into a training set and a test set at a ratio of 17:3.
Normalization process: [0,1] normalization is used, that is, the raw data are mapped into the interval [0,1].
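The normalization and 17:3 split described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the toy values, the shuffling before the split, and the handling of constant columns are all assumptions made here.

```python
import random

def min_max_normalize(rows):
    """Map each column of `rows` (a list of equal-length lists) into [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    normalized = []
    for row in rows:
        normalized.append([
            # A constant column has no spread; mapping it to 0.0 is an assumption.
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        ])
    return normalized

def split_17_to_3(rows, labels, seed=0):
    """Shuffle the samples and split them 17:3 (85% training, 15% test)."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # shuffling is an assumption
    cut = len(indices) * 17 // 20
    train = [(rows[i], labels[i]) for i in indices[:cut]]
    test = [(rows[i], labels[i]) for i in indices[cut:]]
    return train, test

# Hypothetical toy data: [average air temperature, average relative humidity]
features = [[10.0, 40.0], [20.0, 60.0], [30.0, 80.0], [15.0, 50.0]] * 5
levels = [0, 1, 2, 0] * 5  # hypothetical precipitation levels

normalized = min_max_normalize(features)
train_set, test_set = split_17_to_3(normalized, levels)
```

As in the text, normalization is applied to the whole data set before it is divided.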
3. The training set is fed into the improved decision tree network for training. The test set is then fed into the trained decision tree model to check the training result. Finally, the data to be predicted are input into the trained decision tree network, which outputs the result, and the precipitation level of the data is evaluated.
3.1. Improving the decision tree algorithm:
In the ID3 algorithm (a greedy algorithm for constructing decision trees), attributes are selected by information entropy, according to the formula:
$$E(S) = -\sum_{i=1}^{c} P_i \log_2 P_i$$
In the above formula, c represents the number of attributes that a data sample has, and P_i is the proportion of samples of the i-th attribute among the c attributes. The attribute i with the largest E(S) is selected for branching.
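The entropy formula can be illustrated with a short sketch. It assumes the common reading in which P_i is the fraction of samples falling into the i-th category of a label column; the function name and the example labels are hypothetical.

```python
import math
from collections import Counter

def information_entropy(labels):
    """Compute E(S) = -sum(P_i * log2(P_i)) over the distinct values in `labels`."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Two equally frequent categories give the maximum entropy for c = 2.
balanced = information_entropy(["rain", "dry", "rain", "dry"])

# A single category carries no uncertainty.
pure = information_entropy(["dry", "dry", "dry", "dry"])
```

Entropy is largest when the categories are evenly mixed and zero when all samples share one category.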
3.2. Improvement of the method. The original information-entropy selection tends to favor attributes with many distinct values. To remedy this defect, the attribute selection mode is changed: attributes are selected by the average influence value of the independent variables (mean impact value, MIV), that is, according to the influence of each attribute on the result. As shown in FIG. 2, the steps are as follows:
3.2.1. First, all normalized training samples M are fed into the BP network for training. After training of the BP network terminates, each independent-variable feature in the training sample M is increased by 10% and decreased by 10% relative to its original value, forming two new training samples M1 and M2.
3.2.2. M1 and M2 are each used as simulation samples and run through the established BP network to obtain two simulation results A1 and A2. The difference between A1 and A2 is the change in the output caused by changing that independent variable. This change is then averaged over the number of observations to obtain the influence of that independent variable on the BP network output, which is recorded as its MIV value.
3.2.3. The MIV value of each independent variable is calculated in turn according to the above steps. The independent variables are then sorted by the absolute value of their MIV to obtain a ranking of the relative importance of each variable's influence on the BP network output, from which the degree of influence of each input feature on the BP network result is judged. The attribute with the greatest influence is then selected for branching.
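Steps 3.2.1 to 3.2.3 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: each feature is perturbed one at a time, the trained BP network is represented by any callable that maps a feature vector to a number, and `stand_in_network` with its weights is a hypothetical placeholder for a network that would really be trained by backpropagation.

```python
def mean_impact_values(predict, samples, delta=0.10):
    """Return one MIV per input feature for an already trained model `predict`."""
    n_features = len(samples[0])
    mivs = []
    for j in range(n_features):
        diffs = []
        for x in samples:
            up = list(x)
            up[j] = x[j] * (1 + delta)      # sample from M1: feature j raised by 10%
            down = list(x)
            down[j] = x[j] * (1 - delta)    # sample from M2: feature j lowered by 10%
            diffs.append(predict(up) - predict(down))  # A1 - A2 for this observation
        mivs.append(sum(diffs) / len(diffs))  # average over the observations
    return mivs

def rank_by_miv(mivs, names):
    """Sort attribute names by the absolute value of their MIV, largest first."""
    return sorted(zip(names, mivs), key=lambda pair: abs(pair[1]), reverse=True)

def stand_in_network(x):
    # Hypothetical stand-in for the trained BP network (fixed illustrative weights).
    return 0.2 * x[0] - 0.7 * x[1] + 0.1 * x[2]

# Hypothetical normalized samples for three of the attributes named in the text.
names = ["average air pressure", "average relative humidity", "hours of sunshine"]
samples = [[0.5, 0.5, 0.5], [1.0, 0.2, 0.4], [0.3, 0.9, 0.1]]

mivs = mean_impact_values(stand_in_network, samples)
ranking = rank_by_miv(mivs, names)
branch_attribute = ranking[0][0]  # attribute with the greatest influence
```

The ranking uses the absolute value so that a strongly negative influence counts as much as a strongly positive one.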

Claims (1)

1. A rainfall prediction method based on an improved decision tree algorithm is characterized by comprising the following steps:
1) collecting multi-year meteorological data for each location and the corresponding precipitation level data;
2) normalizing the obtained meteorological data to obtain a corresponding normalized data set, and dividing the normalized data set proportionally into a training set and a test set; the normalization process adopts [0,1] normalization, that is, the raw data are mapped into the interval [0,1];
3) feeding the training set into the improved decision tree network for training, feeding the test set into the trained decision tree model to check the training result, then inputting the data to be predicted into the trained decision tree network for prediction, outputting the result, and evaluating the precipitation level of the data;
wherein the original decision tree algorithm selects attributes by information entropy and branches on the attribute with the largest value, whereas the improved decision tree network selects attributes by the average influence value of the independent variables (MIV), that is, according to the influence of each attribute on the result, and branches on the attribute with the greatest influence; the improvement comprises the following steps:
3.1) first, the entire normalized training set M is fed into a BP network for training; after training of the BP network terminates, each independent-variable feature in the training set M is increased by 10% and decreased by 10% relative to its original value, forming two new training samples M1 and M2;
3.2) then, M1 and M2 are each used as simulation samples and run through the BP network to obtain two simulation results A1 and A2; the difference between A1 and A2 is the change in the output caused by changing that independent variable;
3.3) finally, the change values are averaged over the number of observations to obtain the influence of that independent variable on the BP network output, which is recorded as its MIV value;
3.4) the MIV value of each independent variable is calculated in turn according to the above steps; the independent variables are then sorted by the absolute value of their MIV to obtain a ranking of the relative importance of each independent variable's influence on the BP network output, from which the degree of influence of each input feature on the BP network result is judged, and the attribute with the greatest influence is selected for branching.
CN201911012069.5A 2019-10-23 2019-10-23 Rainfall prediction method based on improved decision tree algorithm Active CN110824586B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911012069.5A CN110824586B (en) 2019-10-23 2019-10-23 Rainfall prediction method based on improved decision tree algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911012069.5A CN110824586B (en) 2019-10-23 2019-10-23 Rainfall prediction method based on improved decision tree algorithm

Publications (2)

Publication Number Publication Date
CN110824586A CN110824586A (en) 2020-02-21
CN110824586B true CN110824586B (en) 2021-11-19

Family

ID=69550243

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911012069.5A Active CN110824586B (en) 2019-10-23 2019-10-23 Rainfall prediction method based on improved decision tree algorithm

Country Status (1)

Country Link
CN (1) CN110824586B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111624681A (en) * 2020-05-26 2020-09-04 杨祺铭 Hurricane intensity change prediction method based on data mining
CN111832828B (en) * 2020-07-17 2023-12-19 国家卫星气象中心(国家空间天气监测预警中心) Intelligent precipitation prediction method based on wind cloud No. four meteorological satellites
CN112926664B (en) * 2021-03-01 2023-11-24 南京信息工程大学 Feature selection and CART forest short-time strong precipitation prediction method based on evolutionary algorithm
CN114397814A (en) * 2021-12-06 2022-04-26 中国电建集团贵州电力设计研究院有限公司 Thermal power generating unit optimal operation parameter searching method based on BP neural network
CN114545528B (en) * 2022-03-09 2024-02-06 北京墨迹风云科技股份有限公司 Machine learning-based correction method and device after meteorological numerical mode element forecast

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101609986B (en) * 2008-06-20 2012-11-21 上海申瑞电力科技股份有限公司 Multilevel joint coordination automatic voltage control method based on decision trees
CN101752866A (en) * 2008-12-10 2010-06-23 上海申瑞电力科技股份有限公司 Automatic heavy-load equipment early warning implementation method based on decision tree
US9188453B2 (en) * 2013-03-07 2015-11-17 Sas Institute Inc. Constrained service restoration with heuristics
CN109447325A (en) * 2018-09-30 2019-03-08 广州地理研究所 Precipitation data detection method, device and electronic equipment based on random forests algorithm
CN110059713A (en) * 2019-03-07 2019-07-26 中国人民解放军国防科技大学 Precipitation type identification method based on precipitation particle multi-feature parameters
CN109978263B (en) * 2019-03-27 2023-06-09 上海市园林设计研究总院有限公司 Garden water system water level early warning method

Also Published As

Publication number Publication date
CN110824586A (en) 2020-02-21

Similar Documents

Publication Publication Date Title
CN110824586B (en) Rainfall prediction method based on improved decision tree algorithm
CN111722046B (en) Transformer fault diagnosis method based on deep forest model
CN112101480A (en) Multivariate clustering and fused time sequence combined prediction method
CN106600037B (en) Multi-parameter auxiliary load prediction method based on principal component analysis
CN110990784B (en) Cigarette ventilation rate prediction method based on gradient lifting regression tree
CN113435707B (en) Soil testing formula fertilization method based on deep learning and weighting multi-factor evaluation
CN112270129B (en) Plant growth prediction method based on big data analysis
CN111369057A (en) Air quality prediction optimization method and system based on deep learning
CN106874950A (en) A kind of method for identifying and classifying of transient power quality recorder data
CN113076920B (en) Intelligent fault diagnosis method based on asymmetric domain confrontation self-adaptive model
CN111126511A (en) Vegetation index fusion-based LAI quantitative model establishment method
CN116699096B (en) Water quality detection method and system based on deep learning
CN109214591B (en) Method and system for predicting aboveground biomass of woody plant
CN114662790A (en) Sea cucumber culture water temperature prediction method based on multi-dimensional data
CN114217025B (en) Analysis method for evaluating influence of meteorological data on air quality concentration prediction
CN114048682B (en) Rolling bearing acoustic emission intelligent diagnosis method based on fusion of optimized wavelet basis and multidimensional depth characteristics
CN109409644A (en) A kind of student performance analysis method based on improved C4.5 algorithm
CN116796403A (en) Building energy saving method based on comprehensive energy consumption prediction of commercial building
CN118332521A (en) Crust deformation time sequence simulation method based on particle swarm optimization random forest
CN107808245A (en) Based on the network scheduler system for improving traditional decision-tree
CN115358636B (en) Gasification furnace operation state evaluation method and system based on industrial big data
CN116401962A (en) Method for pushing optimal characteristic scheme of water quality model
CN105139025A (en) Non-linear analysis method based online intelligent identification method for flow pattern of gas-solid fluidized bed
CN104239698B (en) Solid propellant rocket vibrates the time series modification method of distorted signal
CN117522950B (en) Geometric parameter measurement method for plant stem growth based on machine vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231228

Address after: Room 109, office building 2, No. 516, Jungong Road, Yangpu District, Shanghai 200093

Patentee after: Shanghai science and technology assets management Co.,Ltd.

Address before: No. 516, Jungong Road, Yangpu District, Shanghai 200093

Patentee before: University of Shanghai for Science and Technology