CN112132321A - Method for predicting and analyzing forest fire based on machine learning - Google Patents
- Publication number
- CN112132321A (application CN202010865182.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- forest
- fire
- adopting
- machine learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24143—Distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/24155—Bayesian classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Software Systems (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Strategic Management (AREA)
- General Health & Medical Sciences (AREA)
- Economics (AREA)
- Computational Linguistics (AREA)
- Marketing (AREA)
- Entrepreneurship & Innovation (AREA)
- General Business, Economics & Management (AREA)
- Quality & Reliability (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Operations Research (AREA)
- Tourism & Hospitality (AREA)
- Molecular Biology (AREA)
- Development Economics (AREA)
- Game Theory and Decision Science (AREA)
- Probability & Statistics with Applications (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Medical Informatics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a machine-learning-based method for forest fire prediction and analysis, in the field of predictive analysis. The method applies several machine learning algorithms to large-scale data to predict the probability of forest fire, and effectively avoids the excessive subjectivity, inconsistent evaluation standards, and widely varying results of traditional evaluation methods.
Description
Technical Field
The invention relates to the field of predictive analysis, and in particular to the predictive analysis of forest fires based on machine learning.
Background
At present, research on fire risk evaluation relies mainly on semi-quantitative methods. For example, in the fuzzy comprehensive evaluation method, the index method, and the matter-element analysis method, the evaluation indexes, their weights, and their scores are usually determined from expert experience; the evaluation is mainly linear and depends heavily on individual subjective judgment and experience. Qualitative evaluation methods such as safety checklists and preliminary hazard analysis lack clear, measurable evaluation criteria. Quantitative evaluation methods such as accident tree (fault tree) analysis also rely on expert judgment and on the probability of each event occurring. The theory for evaluating forest fire risk is not yet mature, the evaluation standards are not uniform, and the results are markedly subjective.
Moreover, existing prediction methods do not take the atmospheric environment or the flammability of vegetation into account, which leads to large deviations in the prediction results.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the defects of the prior art by providing a forest fire prediction analysis system based on machine learning.
The invention is realized by the following technical scheme:
The method applies several machine learning algorithms and predicts the forest fire probability through big data analysis. It comprises the following steps:
Data preprocessing: data preprocessing comprises data acquisition and data processing.
Data acquisition: the acquired data comprises two parts. The independent (predictor) variables are drawn from several sources, such as forest information, fire-fighting facilities, local geography and biology, weather, rainfall, and the surrounding environment; the dependent variable to be predicted is taken from the fire department's historical fire records.
Data processing: the raw data is cleaned and duplicated or redundant records are removed; non-numerical data in the raw data is then encoded.
For categorical (fixed-type) data, One-Hot encoding is applied to the forest structure type, forest block use, and forest combustible material type, converting them into vectors that a computer can process.
For short text data, such as the fire hazard descriptions and reports in the historical records, One-Hot encoding is applied and Word2vec is then used to capture the associations among words, converting them into dense word vectors.
For long text data, an LDA topic model is used to generate a corresponding vector for subsequent processing.
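As an illustration of the One-Hot step for categorical attributes, the following is a minimal sketch using only the Python standard library. The category names and the helper functions `fit_categories` and `one_hot` are hypothetical and are not taken from the patent.

```python
def fit_categories(values):
    """Return the sorted distinct categories seen in a column."""
    return sorted(set(values))


def one_hot(value, categories):
    """Encode one categorical value as a 0/1 vector over `categories`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


# Hypothetical combustible-material types for four forest blocks.
fuel_types = ["grass", "shrub", "conifer", "shrub"]
categories = fit_categories(fuel_types)
encoded = [one_hot(v, categories) for v in fuel_types]
```

Each row of `encoded` contains exactly one 1, at the position of that block's category. Raising on unseen categories is one possible design choice; a production pipeline might instead map them to an all-zero vector.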
Dimension reduction and feature selection:
Attributes closely related to fire occurrence are selected with the Relief feature selection method, attribute variables whose variance is below a threshold are deleted, and a deep belief network is then used for dimensionality reduction.
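The feature selection step can be sketched as follows, assuming a two-class problem and features already scaled to [0, 1]. This is the basic Relief update (nearest hit and nearest miss per sample) plus a simple variance filter; the toy data, the threshold value, and the function names are assumptions for illustration, and the deep belief network stage is not shown.

```python
from statistics import pvariance


def relief_weights(X, y):
    """Basic Relief for two classes; X holds features already scaled to [0, 1]."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    for i, (xi, yi) in enumerate(zip(X, y)):
        def dist(j):
            # Manhattan distance between sample i and sample j.
            return sum(abs(a - b) for a, b in zip(xi, X[j]))

        others = [j for j in range(len(X)) if j != i]
        hit = min((j for j in others if y[j] == yi), key=dist)   # nearest same-class sample
        miss = min((j for j in others if y[j] != yi), key=dist)  # nearest other-class sample
        for f in range(n_features):
            # Reward features that differ on the miss and agree on the hit.
            weights[f] += (abs(xi[f] - X[miss][f]) - abs(xi[f] - X[hit][f])) / len(X)
    return weights


def low_variance_columns(X, threshold):
    """Indices of columns whose population variance is below `threshold`."""
    return [f for f in range(len(X[0]))
            if pvariance([row[f] for row in X]) < threshold]


# Toy data: feature 0 separates the classes, feature 1 is noise,
# feature 2 is nearly constant.
X = [[0.0, 0.2, 0.50], [0.1, 0.9, 0.50], [0.9, 0.1, 0.50], [1.0, 0.8, 0.51]]
y = [0, 0, 1, 1]
weights = relief_weights(X, y)
dropped = low_variance_columns(X, threshold=0.001)
```

On this toy data the discriminative feature receives a positive weight, the noisy one a negative weight, and the near-constant column is flagged for removal.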
Model training:
Four algorithms are used: k-nearest neighbor, naive Bayes, random forest, and AdaBoost. 10-fold cross-validation is performed on the data, the accuracy of each classifier is used as its weight, and the weighted average of the classifiers' predictions is taken as the final prediction. The k-nearest-neighbor algorithm uses weighted voting, in which each neighbor's vote is weighted in inverse proportion to its distance, which increases discrimination; a KDTree is used as the search strategy to speed up the neighbor search. The data is normalized, the Euclidean distance is used as the distance metric, and the algorithm finally outputs the fire probability.
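The combination rule described above can be sketched as follows. The fold splitter and the averaging function are illustrative helpers, the accuracy and probability values are invented, and dividing by the sum of the weights is an assumption about how the "weighted average" is normalized.

```python
def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous folds of near-equal size (no shuffling)."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def weighted_fire_probability(probabilities, accuracies):
    """Accuracy-weighted average of per-classifier fire probabilities."""
    if probabilities.keys() != accuracies.keys():
        raise ValueError("every classifier needs both a probability and an accuracy")
    total = sum(accuracies.values())
    return sum(probabilities[name] * accuracies[name] for name in probabilities) / total


folds = k_fold_indices(25, k=10)

# Hypothetical cross-validated accuracies and predictions for one forest block.
accuracies = {"knn": 0.80, "naive_bayes": 0.70, "random_forest": 0.90, "adaboost": 0.85}
probabilities = {"knn": 0.60, "naive_bayes": 0.40, "random_forest": 0.75, "adaboost": 0.70}
final = weighted_fire_probability(probabilities, accuracies)
```

In a full pipeline each fold would serve once as the held-out set, and each classifier's mean held-out accuracy would supply its weight.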
In the random forest, let n be the total number of attributes. Each time, a subset of attributes is selected at random, with log2(n) attributes in the subset. The model is trained on a small data sample and the optimal parameters are selected, which determines the maximum depth of each decision tree and the number of decision trees.
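The attribute-subset rule can be illustrated with a short sketch. Rounding log2(n) down and enforcing a minimum of one attribute are assumptions, since the text does not say how non-integer values are handled; the function names are hypothetical.

```python
import math
import random


def subset_size(n_attributes):
    """Number of attributes per random subset: log2(n), rounded down, at least 1."""
    return max(1, int(math.log2(n_attributes)))


def random_attribute_subset(n_attributes, rng):
    """Pick a random subset of attribute indices without replacement."""
    return sorted(rng.sample(range(n_attributes), subset_size(n_attributes)))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
subset = random_attribute_subset(64, rng)
```

With 64 attributes each subset holds 6 of them; with 10 attributes it would hold 3.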
The AdaBoost model is trained on a small data sample, and the optimal number of base classifiers and the optimal learning rate are selected.
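Model evaluation:
Model evaluation uses the error rate, the accuracy, and the cost-sensitive error rate.
The error rate is the ratio of the number of misclassified samples to the total number of samples. For $m$ samples with true labels $y_i$ and predicted labels $f(x_i)$, it is defined as
$E = \frac{1}{m}\sum_{i=1}^{m}\mathbb{I}\left(f(x_i)\neq y_i\right)$
where $\mathbb{I}(\cdot)$ is 1 when its argument is true and 0 otherwise.
The accuracy is the proportion of correctly classified samples among all samples,
$\mathrm{acc} = \frac{1}{m}\sum_{i=1}^{m}\mathbb{I}\left(f(x_i)= y_i\right) = 1 - E$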
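The three evaluation measures can be sketched as below. The error rate and accuracy follow the verbal definitions in the text. The text does not define the cost-sensitive error rate, so the version here (average misclassification cost, with separate costs for a missed fire and a false alarm) is an assumed, commonly used form, and the cost values are invented.

```python
def error_rate(y_true, y_pred):
    """Share of samples whose predicted label differs from the true label."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def accuracy(y_true, y_pred):
    """Share of correctly classified samples, i.e. 1 minus the error rate."""
    return 1.0 - error_rate(y_true, y_pred)


def cost_sensitive_error_rate(y_true, y_pred, cost_missed_fire, cost_false_alarm):
    """Average misclassification cost (assumed definition; label 1 means fire)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:
            total += cost_missed_fire
        elif t == 0 and p == 1:
            total += cost_false_alarm
    return total / len(y_true)


# Hypothetical labels: one missed fire (index 2) and one false alarm (index 5).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
err = error_rate(y_true, y_pred)
acc = accuracy(y_true, y_pred)
cost = cost_sensitive_error_rate(y_true, y_pred, cost_missed_fire=5.0, cost_false_alarm=1.0)
```

Weighting a missed fire more heavily than a false alarm reflects the asymmetry of the two mistakes; the ratio to use is a policy decision, not something the source specifies.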
Word2vec uses the CBOW model to generate word vectors for the short texts, and the average of a text's word vectors is used to represent the short-text variable.
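The averaging step can be sketched as follows. The embedding table stands in for vectors that a CBOW-trained Word2vec model would supply; its values, the tokens, and the choice to skip out-of-vocabulary tokens are all assumptions made for illustration.

```python
def mean_vector(tokens, embeddings):
    """Average the embeddings of the tokens that have one; skip unknown tokens."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("no token has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical 3-dimensional word vectors; a trained model would provide these.
embeddings = {
    "dry": [0.5, 0.0, 1.0],
    "brush": [1.5, 1.0, 0.0],
    "road": [1.0, 2.0, 2.0],
}
text_vector = mean_vector(["dry", "brush", "near", "road"], embeddings)
```

The resulting fixed-length vector can then be concatenated with the other encoded attributes, regardless of how many words the original short text contained.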
The naive Bayes model uses a Gaussian naive Bayes classifier.
The beneficial effects of the invention are as follows: the excessive subjectivity, inconsistent evaluation standards, and widely varying results of traditional evaluation methods are effectively overcome; accumulated historical data and environmental conditions are incorporated into the prediction system; and systematic data processing is applied, providing an effective basis for the prediction of forest fires.
Drawings
FIG. 1 shows a model flow diagram according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood by those skilled in the art, the present invention will be further described in detail with reference to the accompanying drawings and preferred embodiments.
As shown in the figure, the machine-learning-based method for predicting and analyzing forest fires applies several machine learning algorithms and predicts the forest fire probability through big data analysis. It comprises the following steps:
Data preprocessing: data preprocessing comprises data acquisition and data processing.
Data acquisition: the acquired data comprises two parts. The independent (predictor) variables are drawn from several sources, such as forest information, fire-fighting facilities, local geography and biology, weather, rainfall, and the surrounding environment; the dependent variable to be predicted is taken from the fire department's historical fire records.
Data processing: the raw data is cleaned and duplicated or redundant records are removed; non-numerical data in the raw data is then encoded.
For categorical (fixed-type) data, One-Hot encoding is applied to the forest structure type, forest block use, and forest combustible material type, converting them into vectors that a computer can process.
For short text data, such as the fire hazard descriptions and reports in the historical records, One-Hot encoding is applied and Word2vec is then used to capture the associations among words, converting them into dense word vectors.
For long text data, an LDA topic model is used to generate a corresponding vector for subsequent processing.
Dimension reduction and feature selection:
Attributes closely related to fire occurrence are selected with the Relief feature selection method, attribute variables whose variance is below a threshold are deleted, and a deep belief network is then used for dimensionality reduction.
Model training:
Four algorithms are used: k-nearest neighbor, naive Bayes, random forest, and AdaBoost. 10-fold cross-validation is performed on the data, the accuracy of each classifier is used as its weight, and the weighted average of the classifiers' predictions is taken as the final prediction. The k-nearest-neighbor algorithm uses weighted voting, in which each neighbor's vote is weighted in inverse proportion to its distance, which increases discrimination; a KDTree is used as the search strategy to speed up the neighbor search. The data is normalized, the Euclidean distance is used as the distance metric, and the algorithm finally outputs the fire probability.
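The k-nearest-neighbor step of this embodiment can be sketched as follows, assuming min-max normalization and label 1 for fire. For brevity the sketch uses a brute-force neighbor search rather than the KDTree named in the text; the results are the same, only the search speed differs. The feature values (temperature, relative humidity) and function names are invented for illustration.

```python
import math


def min_max_normalize(rows):
    """Scale every column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]


def knn_fire_probability(train_X, train_y, query, k=3):
    """Inverse-distance-weighted share of fire labels among the k nearest neighbors."""
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )[:k]
    if neighbors[0][0] == 0.0:  # exact match: avoid dividing by zero
        exact = [label for d, label in neighbors if d == 0.0]
        return sum(exact) / len(exact)
    weighted = [(1.0 / d, label) for d, label in neighbors]
    return sum(w for w, label in weighted if label == 1) / sum(w for w, _ in weighted)


# Hypothetical rows: [temperature in deg C, relative humidity in %]; last row is the query.
raw = [[35.0, 20.0], [33.0, 25.0], [20.0, 70.0], [18.0, 80.0], [30.0, 30.0]]
labels = [1, 1, 0, 0]
scaled = min_max_normalize(raw)
train_X, query = scaled[:4], scaled[4]
p = knn_fire_probability(train_X, labels, query, k=3)
```

Because the two closest neighbors are fire records and the third, much farther one is not, the weighted vote gives a fire probability well above the unweighted 2/3.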
In the random forest, let n be the total number of attributes. Each time, a subset of attributes is selected at random, with log2(n) attributes in the subset. The model is trained on a small data sample and the optimal parameters are selected, which determines the maximum depth of each decision tree and the number of decision trees.
The AdaBoost model is trained on a small data sample, and the optimal number of base classifiers and the optimal learning rate are selected.
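Selecting the number of base classifiers and the learning rate on a small sample could look like the sketch below. It assumes discrete AdaBoost with one-feature threshold rules (decision stumps) as base classifiers and labels of -1 and +1; the patent does not state the base learner, and the grid values and toy data are invented. For brevity the grid is scored on training accuracy, whereas a real selection would score on held-out folds.

```python
import math


def best_stump(X, y, w):
    """Find the threshold rule on a single feature with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[f] > thr else -sign for row in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best


def adaboost_fit(X, y, n_estimators, learning_rate):
    """Discrete AdaBoost with decision stumps; labels must be -1 or +1."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(n_estimators):
        err, f, thr, sign = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = learning_rate * 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, sign))
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * t * (sign if row[f] > thr else -sign))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (s if row[f] > thr else -s) for a, f, thr, s in model)
    return 1 if score > 0 else -1


def training_accuracy(model, X, y):
    return sum(adaboost_predict(model, r) == t for r, t in zip(X, y)) / len(X)


# Toy sample that no single stump can separate.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
grid = [(n, lr) for n in (1, 3, 5) for lr in (0.5, 1.0)]
best_params = max(grid, key=lambda p: training_accuracy(adaboost_fit(X, y, *p), X, y))
```

On this sample a single stump misclassifies two points, while three boosted stumps classify all six correctly, so the search prefers a setting with at least three base classifiers.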
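Model evaluation:
Model evaluation uses the error rate, the accuracy, and the cost-sensitive error rate.
The error rate is the ratio of the number of misclassified samples to the total number of samples. For $m$ samples with true labels $y_i$ and predicted labels $f(x_i)$, it is defined as
$E = \frac{1}{m}\sum_{i=1}^{m}\mathbb{I}\left(f(x_i)\neq y_i\right)$
where $\mathbb{I}(\cdot)$ is 1 when its argument is true and 0 otherwise.
The accuracy is the proportion of correctly classified samples among all samples,
$\mathrm{acc} = \frac{1}{m}\sum_{i=1}^{m}\mathbb{I}\left(f(x_i)= y_i\right) = 1 - E$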
Word2vec uses the CBOW model to generate word vectors for the short texts, and the average of a text's word vectors is used to represent the short-text variable. The naive Bayes model uses a Gaussian naive Bayes classifier.
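A Gaussian naive Bayes classifier of the kind named above can be sketched in a few lines. It assumes each feature is normally distributed within each class and independent of the others given the class; the small variance floor, the toy weather data, and the function names are assumptions for illustration.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate the class prior and per-feature mean and variance for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(X),
            "mean": [mean(c) for c in cols],
            "var": [pvariance(c) + var_floor for c in cols],  # floor avoids zero variance
        }
    return model


def predict_proba(model, x):
    """Posterior probability of each class, computed in log space for stability."""
    log_scores = {}
    for label, p in model.items():
        log_p = math.log(p["prior"])
        for v, mu, var in zip(x, p["mean"], p["var"]):
            log_p += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        log_scores[label] = log_p
    top = max(log_scores.values())
    exp_scores = {k: math.exp(s - top) for k, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {k: s / total for k, s in exp_scores.items()}


# Hypothetical rows: [temperature in deg C, relative humidity in %]; label 1 means fire.
X = [[35.0, 20.0], [33.0, 25.0], [31.0, 30.0], [20.0, 70.0], [18.0, 80.0], [22.0, 75.0]]
y = [1, 1, 1, 0, 0, 0]
model = fit_gaussian_nb(X, y)
proba = predict_proba(model, [32.0, 28.0])
```

The returned dictionary maps each class label to its posterior probability, so `proba[1]` is the fire probability that would feed into the weighted combination of classifiers.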
The beneficial effects of the invention are as follows: a machine learning method is used to predict the probability of forest fire occurrence, establishing a quantitative forest fire risk assessment system. After the data is processed with One-Hot encoding, Word2vec, and the LDA topic model, a deep belief network is used for dimensionality reduction; a Gaussian Bayes classifier, the k-nearest-neighbor algorithm, a random forest, and the AdaBoost algorithm are then used to build separate classifiers, with classification accuracy as the weight. This effectively avoids the excessive subjectivity, inconsistent evaluation standards, and widely varying results of traditional evaluation methods.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.
Claims (3)
1. A method for predicting and analyzing forest fires based on machine learning, which applies several machine learning algorithms and predicts the forest fire probability through big data analysis, comprising the following steps:
Data preprocessing: data preprocessing comprises data acquisition and data processing.
Data acquisition: the acquired data comprises two parts. The independent (predictor) variables are drawn from several sources, such as forest information, fire-fighting facilities, local geography and biology, weather, rainfall, and the surrounding environment; the dependent variable to be predicted is taken from the fire department's historical fire records.
Data processing: the raw data is cleaned and duplicated or redundant records are removed; non-numerical data in the raw data is then encoded.
For categorical (fixed-type) data, One-Hot encoding is applied to the forest structure type, forest block use, and forest combustible material type, converting them into vectors that a computer can process.
For short text data, such as the fire hazard descriptions and reports in the historical records, One-Hot encoding is applied and Word2vec is then used to capture the associations among words, converting them into dense word vectors.
For long text data, an LDA topic model is used to generate a corresponding vector for subsequent processing.
Dimension reduction and feature selection:
Attributes closely related to fire occurrence are selected with the Relief feature selection method, attribute variables whose variance is below a threshold are deleted, and a deep belief network is then used for dimensionality reduction.
Model training:
Four algorithms are used: k-nearest neighbor, naive Bayes, random forest, and AdaBoost. 10-fold cross-validation is performed on the data, the accuracy of each classifier is used as its weight, and the weighted average of the classifiers' predictions is taken as the final prediction. The k-nearest-neighbor algorithm uses weighted voting, in which each neighbor's vote is weighted in inverse proportion to its distance, which increases discrimination; a KDTree is used as the search strategy to speed up the neighbor search. The data is normalized, the Euclidean distance is used as the distance metric, and the algorithm finally outputs the fire probability.
In the random forest, let n be the total number of attributes. Each time, a subset of attributes is selected at random, with log2(n) attributes in the subset. The model is trained on a small data sample and the optimal parameters are selected, which determines the maximum depth of each decision tree and the number of decision trees.
The AdaBoost model is trained on a small data sample, and the optimal number of base classifiers and the optimal learning rate are selected.
Model evaluation:
Model evaluation uses the error rate, the accuracy, and the cost-sensitive error rate.
The error rate is the ratio of the number of misclassified samples to the total number of samples. For $m$ samples with true labels $y_i$ and predicted labels $f(x_i)$, it is defined as
$E = \frac{1}{m}\sum_{i=1}^{m}\mathbb{I}\left(f(x_i)\neq y_i\right)$
where $\mathbb{I}(\cdot)$ is 1 when its argument is true and 0 otherwise.
The accuracy is the proportion of correctly classified samples among all samples,
$\mathrm{acc} = \frac{1}{m}\sum_{i=1}^{m}\mathbb{I}\left(f(x_i)= y_i\right) = 1 - E$
2. The method for predicting and analyzing forest fires based on machine learning according to claim 1, wherein Word2vec uses the CBOW model to generate word vectors for the short texts, and the average of a text's word vectors is used to represent the short-text variable.
3. The method for predicting and analyzing forest fires based on machine learning according to claim 1, wherein the naive Bayes model uses a Gaussian naive Bayes classifier.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010865182.4A CN112132321A (en) | 2020-08-25 | 2020-08-25 | Method for predicting and analyzing forest fire based on machine learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112132321A (en) | 2020-12-25 |
Family
ID=73848942
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202010865182.4A, published as CN112132321A (en) | Pending | 2020-08-25 | 2020-08-25 | Method for predicting and analyzing forest fire based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112132321A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107085904A (en) * | 2017-03-31 | 2017-08-22 | 上海事凡物联网科技有限公司 | Forest fire danger class decision method and system based on single classification SVM |
CN108921330A (en) * | 2018-06-08 | 2018-11-30 | 新疆林科院森林生态研究所 | A kind of forest management system |
US20200242202A1 (en) * | 2019-01-29 | 2020-07-30 | Shenzhen Fugui Precision Ind. Co., Ltd. | Fire development situation prediction device and method |
CN110956187A (en) * | 2019-11-28 | 2020-04-03 | 中国农业科学院农业信息研究所 | Unmanned aerial vehicle image plant canopy information extraction method based on ensemble learning |
Non-Patent Citations (1)
Title |
---|
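Sun Liyan et al.: "Forest fire prediction method based on deep learning of meteorological factors" (基于气象因子深度学习的森林火灾预测方法), Journal of Forestry Engineering (《林业工程学报》) * |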
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112766133A (en) * | 2021-01-14 | 2021-05-07 | 金陵科技学院 | Automatic driving deviation processing method based on Relieff-DBN |
CN113591873A (en) * | 2021-05-26 | 2021-11-02 | 东南大学 | Flame image classification method based on ensemble learning |
CN113762337A (en) * | 2021-07-29 | 2021-12-07 | 国网河北省电力有限公司经济技术研究院 | Initial fire determination method, device, terminal and storage medium |
CN117035197A (en) * | 2023-08-25 | 2023-11-10 | 成都理工大学 | Intelligent lost circulation prediction method with minimized cost |
CN117035197B (en) * | 2023-08-25 | 2024-06-04 | 成都理工大学 | Intelligent lost circulation prediction method with minimized cost |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112132321A (en) | Method for predicting and analyzing forest fire based on machine learning | |
CN110084151B (en) | Video abnormal behavior discrimination method based on non-local network deep learning | |
CN111967343B (en) | Detection method based on fusion of simple neural network and extreme gradient lifting model | |
CN111708343B (en) | Method for detecting abnormal behavior of field process behavior in manufacturing industry | |
CN112735097A (en) | Regional landslide early warning method and system | |
CN112231562A (en) | Network rumor identification method and system | |
CN112039903B (en) | Network security situation assessment method based on deep self-coding neural network model | |
CN111556016B (en) | Network flow abnormal behavior identification method based on automatic encoder | |
CN112131352A (en) | Method and system for detecting bad information of webpage text type | |
CN111859010B (en) | Semi-supervised audio event identification method based on depth mutual information maximization | |
CN110008699B (en) | Software vulnerability detection method and device based on neural network | |
CN112529638B (en) | Service demand dynamic prediction method and system based on user classification and deep learning | |
CN111641608A (en) | Abnormal user identification method and device, electronic equipment and storage medium | |
CN116307103A (en) | Traffic accident prediction method based on hard parameter sharing multitask learning | |
CN112329974B (en) | LSTM-RNN-based civil aviation security event behavior subject identification and prediction method and system | |
CN112395168A (en) | Stacking-based edge side service behavior identification method | |
CN115004652A (en) | Business wind control processing method and device, electronic equipment and storage medium | |
CN113435124A (en) | Water quality space-time correlation prediction method based on long-time and short-time memory and radial basis function neural network | |
CN115438102A (en) | Space-time data anomaly identification method and device and electronic equipment | |
CN115659244A (en) | Fault prediction method, device and storage medium | |
CN114881173A (en) | Resume classification method and device based on self-attention mechanism | |
CN113609480B (en) | Multipath learning intrusion detection method based on large-scale network flow | |
CN116629716A (en) | Intelligent interaction system work efficiency analysis method | |
Kim et al. | Anomaly pattern detection in streaming data based on the transformation to multiple binary-valued data streams | |
CN111708865A (en) | Technology forecasting and patent early warning analysis method based on improved XGboost algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20201225 |