CN109408774B - Method for predicting sewage effluent index based on random forest and gradient lifting tree

Method for predicting sewage effluent index based on random forest and gradient lifting tree

Info

Publication number
CN109408774B
CN109408774B (application CN201811323416.1A)
Authority
CN
China
Prior art keywords
sample
random forest
tree
gradient lifting
regression
Prior art date
Legal status
Active
Application number
CN201811323416.1A
Other languages
Chinese (zh)
Other versions
CN109408774A (en)
Inventor
张天麟 (Zhang Tianlin)
高俊波 (Gao Junbo)
孙伟 (Sun Wei)
赵友标 (Zhao Youbiao)
孙峰 (Sun Feng)
Current Assignee
Shanghai Maritime University
Original Assignee
Shanghai Maritime University
Priority date
Filing date
Publication date
Application filed by Shanghai Maritime University filed Critical Shanghai Maritime University
Priority to CN201811323416.1A priority Critical patent/CN109408774B/en
Publication of CN109408774A publication Critical patent/CN109408774A/en
Application granted granted Critical
Publication of CN109408774B publication Critical patent/CN109408774B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/18Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A20/00Water conservation; Efficient water supply; Efficient water use
    • Y02A20/152Water filtration

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Mathematical Analysis (AREA)
  • Operations Research (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Optimization (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Strategic Management (AREA)
  • Pure & Applied Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Evolutionary Biology (AREA)
  • Economics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)

Abstract

The invention discloses a method for predicting sewage effluent indexes based on random forests and gradient lifting trees, which comprises the following steps. Step 1: draw samples with replacement from the original training data set to form a plurality of sample sets. Step 2: construct a random forest from the sample sets, calculate feature importance from the random forest, and screen the attributes. Step 3: construct a gradient lifting tree model from the samples formed by the screened attributes. Step 4: feed real-time monitoring data into the gradient lifting tree model to predict the effluent indexes of the sewage plant over a future period. The invention combines the random forest and the gradient lifting tree model to establish a relational model of the sewage effluent index data; through the dimensionality reduction performed by the random forest and the high-precision training of the gradient lifting tree, the sewage effluent index data over a future period can be predicted more accurately.

Description

Method for predicting sewage effluent index based on random forest and gradient lifting tree
Technical Field
The invention relates to the technical field of sewage treatment and machine learning, in particular to a method for predicting a sewage effluent index based on a random forest and a gradient lifting tree.
Background
The municipal sewage treatment process is a complex biochemical reaction process accompanied by physicochemical reactions, biochemical reactions, phase changes, and the conversion and transfer of material and energy; the process is complex and difficult to model with traditional mathematical approaches. Many scholars have studied solving such problems with neural networks. Predicting sewage effluent indexes with a neural network solves the problem to a certain extent, but still suffers from slow training, and model accuracy needs to be improved. Moreover, such studies do not exclude irrelevant factors in the reaction process, which negatively affects both the training speed and the accuracy of the model.
Disclosure of Invention
The invention aims to provide a method for predicting sewage effluent indexes based on a random forest and a gradient lifting tree, so as to establish a relational model between the main sewage effluent index data and the sewage water quality index data and to obtain the main effluent index data from real-time monitoring.
In order to achieve the above aim, the invention provides a method for predicting sewage effluent indexes based on a random forest and a gradient lifting tree, comprising the following steps:
step 1: drawing samples with replacement from an original training data set to form a plurality of sample sets;
step 2: constructing a random forest from the sample sets; calculating feature importance from the random forest and screening the attributes;
step 3: constructing a gradient lifting tree model from the samples formed by the screened attributes;
step 4: feeding the real-time monitoring data into the gradient lifting tree model to predict the effluent indexes of the sewage plant over a future period.
In the method for predicting the sewage effluent index based on the random forest and the gradient lifting tree, step 1 further comprises the following steps: drawing samples randomly with replacement from the original training set to construct regression trees; the samples not drawn each time form out-of-bag sample sets equal in number to the regression trees.
In the method for predicting the sewage effluent index based on the random forest and the gradient lifting tree, step 2 comprises the following steps:
step 2.1: traversing the possible values under each feature attribute and finally selecting the point with the smallest sum of squared errors as the split point;
step 2.2: calculating the sum of squared errors of all attributes and selecting the attribute with the smallest error as the partition attribute;
step 2.3: constructing a regression tree for each partitioned sample set;
step 2.4: assembling the multiple regression trees into a random forest;
step 2.5: training the assembled random forest with the training set, and calculating the feature importance of the random forest by calculating the out-of-bag error on the out-of-bag samples;
step 2.6: sorting the features by importance and screening out the important features.
In the method for predicting the sewage effluent index based on the random forest and the gradient lifting tree, step 3 specifically comprises the following steps:
step 3.1: building a new training sample set from the samples with the screened features;
step 3.2: each regression tree approximating the loss of the current iteration with the negative gradient so as to determine its optimal parameters; the residual calculated by each regression tree being updated and passed to the next regression tree;
step 3.3: accumulating the multiple regression trees to form the gradient lifting tree model.
In the method for predicting the sewage effluent index based on the random forest and the gradient lifting tree, the gradient lifting tree model is:
f_M(x) = Σ_{m=1}^{M} Σ_{j=1}^{J} c_{m,j} · I(x ∈ R_{m,j})
where J is the number of leaf nodes, I(·) indicates whether x falls in the j-th leaf region R_{m,j} of the m-th tree, c_{m,j} is the output value of that leaf, and f_M(x) is the predicted value of the final model.
Compared with the prior art, the invention has the following beneficial effects:
in the sewage data link, not only expert opinions are referred to, but also the characteristic attributes suitable for the data can be screened out through the random forest model according to the collected sewage data, so that redundant characteristic attributes are deleted, the characteristic dimension reduction is realized, and the model training speed and the data quality are improved. The method uses a gradient lifting tree model method, which is higher in accuracy than methods such as a support vector machine and a neural network, and can improve the prediction accuracy of sewage; according to the method, a random forest and a gradient lifting tree model are combined to establish a relation model of sewage effluent index data, and the sewage effluent index data in a future period of time can be accurately predicted through dimensionality reduction of the random forest and high-precision training of the gradient lifting tree, so that a sewage plant can predict the sewage effluent index data in the future period of time according to the sewage effluent index data detected in real time, and then the sewage plant can judge that the effluent index of the sewage plant meets the national safety standard according to the predicted sewage effluent index data; furthermore, the sewage plant can control the amount of oxygen discharged into the sewage on the basis that the effluent index of the sewage meets the national safety standard, so that the aim of saving the cost of the plant is fulfilled; in a word, after the sewage plant uses the invention, the purposes of energy saving and emission reduction can be achieved, and the treatment cost of the sewage plant can also be reduced.
Drawings
FIG. 1 is a flow chart of the method for predicting sewage effluent indexes based on random forests and gradient lifting trees according to the present invention;
FIG. 2 is a flow chart of the steps of random forest screening attributes in the present invention;
FIG. 3 is a flowchart of the steps of constructing a gradient lifting tree model according to the present invention.
Detailed Description
The invention will be further described by means of specific examples in conjunction with the accompanying drawings, which are provided for illustration only and are not intended to limit the scope of the invention.
The invention provides a method for predicting a sewage effluent index based on a random forest and a gradient lifting tree, which comprises the following steps:
Step 1: drawing samples with replacement from the original training data set to form a plurality of sample sets;
step 1 further comprises the following steps: drawing samples randomly with replacement from the original training set to construct regression trees; the samples not drawn each time form out-of-bag sample sets equal in number to the regression trees.
Step 2: constructing a random forest from the sample sets; calculating the feature importance from the random forest and screening the attributes;
step 2 specifically comprises the following steps:
step 2.1: traversing the possible values under each feature attribute and finally selecting the point with the smallest sum of squared errors as the split point;
step 2.2: calculating the sum of squared errors of all attributes and selecting the attribute with the smallest error as the partition attribute;
step 2.3: constructing a regression tree for each partitioned sample set;
step 2.4: assembling the multiple regression trees into a random forest;
step 2.5: training the assembled random forest with the training set, and calculating the feature importance of the random forest by calculating the out-of-bag error on the out-of-bag samples;
step 2.6: sorting the features by importance and screening out the important features.
Step 3: constructing a gradient lifting tree model from the samples formed by the screened attributes;
step 3 specifically comprises the following steps:
step 3.1: building a new training sample set from the samples with the screened features;
step 3.2: each regression tree approximating the loss of the current iteration with the negative gradient so as to determine its optimal parameters; the residual calculated by each regression tree being updated and passed to the next regression tree;
step 3.3: accumulating the multiple regression trees to form the gradient lifting tree model.
Step 4: feeding the real-time monitoring data into the gradient lifting tree model to predict the effluent indexes of the sewage plant over a future period.
The gradient lifting tree model is:
f_M(x) = Σ_{m=1}^{M} Σ_{j=1}^{J} c_{m,j} · I(x ∈ R_{m,j})
where J is the number of leaf nodes, I(·) indicates whether x falls in the j-th leaf region R_{m,j} of the m-th tree, c_{m,j} is the output value of that leaf, and f_M(x) is the predicted value of the final model.
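For orientation, the following is a minimal sketch of the two-stage pipeline described above, written with scikit-learn; the estimator classes, the permutation-importance helper, the column handling and all hyper-parameters are illustrative assumptions rather than part of the claimed method.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

def train_effluent_model(df: pd.DataFrame, target: str, keep_top: int = 5):
    # Steps 1-2: bagged regression trees, then permutation-style importance ranking.
    X, y = df.drop(columns=[target]), df[target]
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
    rf = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
    rf.fit(X_tr, y_tr)
    imp = permutation_importance(rf, X_val, y_val, n_repeats=10, random_state=0)
    ranked = X.columns[np.argsort(imp.importances_mean)[::-1]]
    selected = list(ranked[:keep_top])                      # screened attributes

    # Step 3: gradient lifting (boosting) trees trained on the screened attributes only.
    gbt = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05, max_depth=3)
    gbt.fit(X_tr[selected], y_tr)
    return gbt, selected

The returned model and screened attribute list then serve step 4, in which the real-time monitoring data are fed to the model.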
In a more specific embodiment, the method for predicting the effluent index of sewage based on random forests and gradient lifting trees comprises the following steps:
Randomly drawing samples with replacement from the original training data set to construct a plurality of sample sets; the original training data set refers to the data of each sewage index acquired by the sewage plant sensors;
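A minimal sketch of this sampling-with-replacement step (the function name and array layout are assumptions made for illustration):

import numpy as np

def bootstrap_sample(X: np.ndarray, y: np.ndarray, rng: np.random.Generator):
    # Draw one bootstrap sample (with replacement) for a single regression tree
    # and return the out-of-bag row indices that were never drawn.
    n = len(X)
    idx = rng.integers(0, n, size=n)          # n rows drawn with replacement
    oob = np.setdiff1d(np.arange(n), idx)     # rows left out of this sample
    return X[idx], y[idx], oob

Repeating this once per tree yields the plurality of sample sets, together with one out-of-bag set per regression tree.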
selecting splitting attributes for each sample set and building a regression tree (the prediction target is a continuous variable) for the sample set according to the splitting attributes; a plurality of such regression trees form a random forest;
putting the samples to be trained into the random forest for training, the random forest obtaining the feature importance of the training data from the calculated out-of-bag errors;
the samples to be trained are the data sets randomly drawn from the original training data to be put into the random forest for training;
As an implementation mode, the method for calculating the importance of the sample features from the sample set comprises the following steps. Before describing the steps, the relationship between a sample and its attributes may be expressed as:
X = <x_1, x_2, x_3, …, x_n>
where X is a sample in the sample set and x_1, x_2, …, x_n are the attributes of that sample, assuming n attributes in total;
the attributes in the method refer to the sewage indexes collected by the sewage plant sensors, such as the aeration value (dissolved oxygen, DO), influent pH, influent chemical oxygen demand (COD), influent total phosphorus (TP), influent ammonia nitrogen (NH3-N), suspended solids concentration (SS), and mixed liquor suspended solids concentration (MLSS);
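Purely for illustration, one training record built from the indicators listed above could be laid out as follows; the column names and values are hypothetical:

import pandas as pd

record = pd.DataFrame([{
    "DO": 2.1,       # aeration value (dissolved oxygen), mg/L
    "pH": 7.3,       # influent pH
    "COD": 310.0,    # influent chemical oxygen demand, mg/L
    "TP": 4.2,       # influent total phosphorus, mg/L
    "NH3N": 25.6,    # influent ammonia nitrogen, mg/L
    "SS": 180.0,     # suspended solids concentration, mg/L
    "MLSS": 3500.0,  # mixed liquor suspended solids concentration, mg/L
}])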
selecting the attribute with the minimum sum of squared errors in the sample attributes as a splitting attribute;
As an embodiment, selecting the attribute with the smallest sum of squared errors in the training samples as the split attribute specifically includes the following steps:
the predicted value produced by the random forest is the average of the output data in the subset;
the subsets are the different subsets into which each tree of the random forest divides the samples to be trained, the output data are the actual values of the feature to be predicted for the samples in each subset, and the predicted value is the average of those actual values within the subset;
all possible values under each feature attribute are traversed, and the split point for which the resulting sum of squared errors is smallest is finally selected;
here, a value means a value of the feature attribute;
the sums of squared errors of all attributes are compared, the attribute with the smallest sum of squared errors is selected as the optimal partition attribute, and the subset to be trained is divided into two child nodes;
as an embodiment, the calculation formula of the sum of squared errors in the selected training subset is:
min_{j,s} [ min_{c1} Σ_{x_i ∈ R1(j,s)} (y_i − c1)² + min_{c2} Σ_{x_i ∈ R2(j,s)} (y_i − c2)² ]
The regression tree selects a split point that divides the attribute values into two parts, R1 and R2; y_i is the target value of the training sample, c1 and c2 are the averages of the target values within the two parts, s is the candidate split value (each feature has a number of candidate values s), and j denotes the feature being split;
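A minimal sketch of this split search, assuming a plain NumPy feature matrix: for every feature j and every candidate threshold s, the sum of squared errors of the two resulting parts is evaluated, and the (j, s) pair with the smallest sum is kept.

import numpy as np

def best_split(X: np.ndarray, y: np.ndarray):
    # Exhaustive CART-style split search under the squared-error criterion.
    best_j, best_s, best_sse = None, None, np.inf
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j]):
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            if len(left) == 0 or len(right) == 0:
                continue
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best_sse:
                best_j, best_s, best_sse = j, s, sse
    return best_j, best_s, best_sse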
as an embodiment, the calculating the feature importance specifically includes the following steps:
for each regression tree in the random forest (regression problem), its out-of-bag sample error, denoted err, is calculated using the corresponding out-of-bag sample 1
Randomly adding noise interference to the characteristics of the sample outside the bag, and calculating the error outside the bag again, and recording the error as err 2
Wherein, the out-of-bag sample refers to sample data which is not used as a training sample;
the formula for calculating the importance of a certain feature is as follows:
f = (1 / n) · Σ (err_2 − err_1)
where n is the number of out-of-bag samples, err_1 and err_2 are the out-of-bag errors before and after the noise perturbation, and f — the averaged difference between them — is taken as the importance value of the feature;
sorting the features according to the calculated feature importance, and screening out important features;
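A minimal sketch of this permutation-style importance calculation, assuming each fitted regression tree keeps the out-of-bag index set produced by the earlier bootstrap step; the tree interface (a fitted object with a predict method) is a placeholder assumption.

import numpy as np

def oob_permutation_importance(trees, oob_sets, X, y, feature, rng):
    # Average increase of the out-of-bag error after shuffling one feature.
    diffs = []
    for tree, oob in zip(trees, oob_sets):
        err1 = np.mean((y[oob] - tree.predict(X[oob])) ** 2)      # plain OOB error
        X_perm = X[oob].copy()
        X_perm[:, feature] = rng.permutation(X_perm[:, feature])  # noise the feature
        err2 = np.mean((y[oob] - tree.predict(X_perm)) ** 2)      # perturbed OOB error
        diffs.append(err2 - err1)
    return float(np.mean(diffs))                                  # importance value f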
constructing the sample with the screened characteristics into a new training sample;
putting the training samples into a first regression tree to predict the result of the training samples, and calculating an error value;
wherein, the error value refers to the error between the predicted value and the real value;
the error value is fed as the input to the next regression tree, which continues to calculate a new error value;
m regression trees are iterated in this way, and the fitted error values of the iterations are accumulated to form the gradient lifting tree model;
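A minimal sketch of this boosting loop under squared loss, where the error value carried to the next tree is the residual of the current model; the use of scikit-learn's DecisionTreeRegressor as the base learner and the shrinkage factor are assumptions.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbt(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    # Gradient lifting (boosting) trees: each new tree is fitted to the residuals.
    init = float(np.mean(y))
    f = np.full(len(y), init)                 # initial constant prediction
    trees = []
    for _ in range(n_trees):
        residual = y - f                      # error value; negative gradient of squared loss up to a factor
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        f += learning_rate * tree.predict(X)  # accumulate into the additive model
        trees.append(tree)
    return init, trees

def predict_gbt(init, trees, X, learning_rate=0.1):
    f = np.full(len(X), init)
    for tree in trees:
        f += learning_rate * tree.predict(X)
    return f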
As an embodiment, the construction of the regression tree specifically includes the following steps:
recursively constructing a regression tree according to a criterion of a least square error;
wherein the least squares error criterion is determined based on the following equation:
min_{j,s} [ min_{c1} Σ_{x_i ∈ R1(j,s)} (y_i − c1)² + min_{c2} Σ_{x_i ∈ R2(j,s)} (y_i − c2)² ]
according to the establishment mode of the decision tree, selecting the jth dimension feature and the corresponding threshold value s of the sample x at each decision node as a segmentation feature and a segmentation threshold value, dividing the node into two regions, wherein the specific formula is as follows:
R_1(j, s) = {x | x[j] ≤ s} and R_2(j, s) = {x | x[j] > s}
Here the sample x is one of the samples to be trained, x[j] is the value of the j-th feature of sample x, and the segmentation feature is the feature j that, together with the chosen segmentation threshold s, divides the feature attribute into the two regions;
as an embodiment, the calculation formula for dividing the node into two regions is as follows:
min_{c1} Σ_{x_i ∈ R1(j,s)} (y_i − c1)² + min_{c2} Σ_{x_i ∈ R2(j,s)} (y_i − c2)²
where y_i is the true value of the sample and c1, c2 are the predicted (mean) output values of the regression tree in the two regions;
A regression tree divides the feature space (the input space) into M units {R_1, R_2, …, R_M}; each leaf node of the regression tree corresponds to one unit and carries a fixed output value C_m. When an input feature vector x arrives, the regression tree routes it to a leaf node and returns the output value C_m of that leaf as the output of the tree. The calculation formula of the regression tree is:
f(x) = Σ_{m=1}^{M} C_m · I(x ∈ R_m)
as an embodiment, the constructing the gradient lifting tree model specifically includes the following steps:
the gradient lifting tree model is formed by iterative addition of M regression trees, and the calculation formula is as follows:
f_m(x) = f_{m−1}(x) + T(x; θ_m), m = 1, …, M,
where f_{m−1}(x) is the current lifting tree model, T(x; θ_m) is the newly generated regression tree, and θ_m are the parameters (coefficients) of that regression tree, chosen so that the error of the m-th regression tree is minimized;
the gradient lifting tree needs to select proper regression tree parameters to minimize the loss function, and the calculation formula is as follows:
θ_m = argmin_θ Σ_i L(y_i, f_{m−1}(x_i) + T(x_i; θ))
where y_i is the true value for the current m-th tree, and f_{m−1}(x_i) + T(x_i; θ_m) is the prediction of the current m-th tree, obtained by adding the output of the current m-th tree to the accumulated values of the previous m−1 trees. The purpose of this formula is to determine the parameter θ of the current m-th tree that minimizes the loss function L; unlike the previous formula, which describes the additive model of the gradient lifting tree, this one is used to obtain the parameters of the optimal regression tree so that the loss is minimized.
An approximation of the loss of the current round in the iterative process is fitted according to the negative gradient of the loss function, thereby determining the parameter θ that minimizes the loss function; using a squared loss function, this is represented by the following equation:
L(y_i, f_{m−1}(x_i) + T(x_i; θ_m)) = [y_i − f_{m−1}(x_i) − T(x_i; θ_m)]²
wherein, the loss function L is L in the calculation formula for determining the proper regression tree parameter;
the expression for the formula for approximating the loss value of the computational iteration using a negative gradient is as follows:
L(y_i, f_{m−1}(x_i) + T(x_i; θ_m)) = [y_i − f_{m−1}(x_i) − T(x_i; θ_m)]² = [r_{m,i} − T(x_i; θ_m)]²
r_{m,i} = −[∂L(y_i, f(x_i)) / ∂f(x_i)], evaluated at f(x) = f_{m−1}(x)
where L is the loss function (in this model the mean squared error is used as the loss function), f(x_i) is the predicted value obtained by training for one of the samples to be trained, and r_{m,i} is given by the negative-gradient formula; the parameter θ that minimizes the loss function is finally determined using the formulas above;
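As a worked check of this step (assuming the squared loss written above): with L(y_i, f(x_i)) = [y_i − f(x_i)]², the derivative with respect to f(x_i) is −2·[y_i − f(x_i)], so
r_{m,i} = −[∂L(y_i, f(x_i)) / ∂f(x_i)], evaluated at f = f_{m−1}, equals 2·[y_i − f_{m−1}(x_i)],
which is proportional to the residual y_i − f_{m−1}(x_i); if the loss is scaled by 1/2, as is common, the negative gradient equals the residual exactly. Under squared loss, therefore, fitting the negative gradient and fitting the error value passed to the next regression tree amount to the same thing.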
training each regression tree according to the method for determining the regression tree parameters in the previous step, and finally finishing the training when the mth tree is iteratively trained;
in training the mth regression tree, the output of the leaf nodes may be represented as follows:
c_{m,j} = argmin_c Σ_{x_i ∈ R_{m,j}} L(y_i, f_{m−1}(x_i) + c)
where c_{m,j} is the output of the j-th leaf of the m-th tree; it is accumulated with the predicted values obtained from the previous m−1 trees to give the prediction of the first m trees, so that the loss function is minimized and the first m regression trees are fully trained;
the model expression of the final gradient lifting tree is as follows:
f_M(x) = Σ_{m=1}^{M} Σ_{j=1}^{J} c_{m,j} · I(x ∈ R_{m,j})
where J is the number of leaf nodes, I(·) indicates whether x falls in the j-th leaf region R_{m,j} of the m-th tree, c_{m,j} is the output value of that leaf, and f_M(x) is the predicted value of the final model.
Finally, the data monitored by the sewage plant in real time are put into the gradient lifting tree model to predict the effluent indexes of the sewage plant over a future period.
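A minimal usage sketch of this final step, assuming the illustrative helper train_effluent_model sketched earlier in this description; the file names and the target column are hypothetical.

import pandas as pd

history = pd.read_csv("historical_monitoring.csv")           # assumed archive of sensor data
gbt, selected = train_effluent_model(history, target="effluent_COD")
realtime = pd.read_csv("realtime_monitoring.csv")            # assumed real-time monitoring export
forecast = gbt.predict(realtime[selected])                   # predicted effluent index values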
Obviously, the method for predicting the sewage effluent index based on the random forest and the gradient lifting tree provided by the invention can also be implemented as a system based on the combination of the random forest and the gradient lifting tree, comprising a sample construction module, a random forest module, a random forest training module, a data screening module, a gradient lifting tree construction module, a gradient lifting tree training module and a prediction module;
the sample construction module is used for randomly drawing samples with replacement from the original training set and constructing a plurality of sample sets, so as to facilitate the training of the subsequent random forest model;
the random forest module is used for integrating and constructing a random forest according to the established multiple regression trees;
the random forest training module is used for training the constructed sample set by the random forest so as to obtain the importance of each characteristic attribute;
the data screening module is used for sequencing the importance of each feature attribute calculated by the random forest training module, deleting the features with low importance and finally keeping the important features;
the gradient lifting tree construction module is used for constructing a gradient lifting tree model by iterating all regression trees;
the gradient lifting tree training module is used for training, with the constructed gradient lifting tree model, the training set constructed from the previously screened features;
the prediction module is used for obtaining the prediction data, putting them into the gradient lifting tree for prediction after feature screening, and obtaining the prediction result of the sewage effluent index data over a future period by accumulating the predictions of the successive trees.
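Purely as an illustration of how the listed modules could be composed, the following sketch wires scikit-learn-style regressors into one pipeline object; every class and method name here is hypothetical and not part of the claimed system.

class EffluentPredictionPipeline:
    """Composes the modules listed above: random forest training,
    data screening, gradient lifting tree training and prediction."""

    def __init__(self, rf, gbt, keep_top=5):
        self.rf, self.gbt, self.keep_top = rf, gbt, keep_top
        self.selected = None

    def fit(self, X, y):
        self.rf.fit(X, y)                                     # random forest training module
        order = self.rf.feature_importances_.argsort()[::-1]  # data screening module
        self.selected = X.columns[order[: self.keep_top]]
        self.gbt.fit(X[self.selected], y)                     # gradient lifting tree training module
        return self

    def predict(self, X_new):
        return self.gbt.predict(X_new[self.selected])         # prediction module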
In conclusion, the invention screens out the important factors with the random forest so as to further improve model training efficiency and accuracy, and predicts with a gradient lifting tree model that is more precise than a neural network, so that a sewage plant can predict whether the sewage quality over a future period will be within the national discharge standard. Finally, the aeration value supplied by the plant can be controlled so as to save plant operating cost and achieve green, safe discharge.
While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be limited only by the attached claims.

Claims (2)

1. A method for predicting sewage effluent indexes based on random forests and gradient lifting trees is characterized by comprising the following steps:
step 1: drawing samples with replacement from an original training data set to form a plurality of sample sets;
step 2: constructing a random forest from the sample sets; calculating feature importance from the random forest and performing attribute screening, wherein
step 2.1: traversing the possible values under each feature attribute and finally selecting the point with the smallest sum of squared errors as the split point;
step 2.2: calculating the sum of squared errors of all attributes and selecting the attribute with the smallest error as the partition attribute;
step 2.3: constructing a regression tree for each partitioned sample set;
step 2.4: assembling the multiple regression trees into a random forest;
step 2.5: training the assembled random forest with the training set, and calculating the feature importance of the random forest by calculating the out-of-bag error on the out-of-bag samples;
step 2.6: sorting the features by importance and screening out the important features;
step 3: constructing a gradient lifting tree model from the samples formed by the screened attributes, wherein
step 3.1: building a new training sample set from the samples with the screened features;
step 3.2: each regression tree approximating the loss of the current iteration with the negative gradient so as to determine its optimal parameters; the residual calculated by each regression tree being updated and passed to the next regression tree;
step 3.3: accumulating the multiple regression trees to form the gradient lifting tree model, wherein the gradient lifting tree model is:
f_M(x) = Σ_{m=1}^{M} Σ_{j=1}^{J} c_{m,j} · I(x ∈ R_{m,j})
where J is the number of leaf nodes, I(·) indicates whether x falls in the j-th leaf region R_{m,j} of the m-th tree, c_{m,j} is the output value of that leaf, and f_M(x) is the predicted value of the final model;
step 4: feeding the real-time monitoring data into the gradient lifting tree model to predict the effluent indexes of the sewage plant over a future period.
2. The method for predicting sewage effluent indexes based on a random forest and a gradient lifting tree as set forth in claim 1, wherein step 1 further comprises the following steps: randomly drawing samples with replacement from the original training set to construct regression trees; the samples not drawn each time form out-of-bag sample sets equal in number to the regression trees.
CN201811323416.1A 2018-11-07 2018-11-07 Method for predicting sewage effluent index based on random forest and gradient lifting tree Active CN109408774B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811323416.1A CN109408774B (en) 2018-11-07 2018-11-07 Method for predicting sewage effluent index based on random forest and gradient lifting tree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811323416.1A CN109408774B (en) 2018-11-07 2018-11-07 Method for predicting sewage effluent index based on random forest and gradient lifting tree

Publications (2)

Publication Number Publication Date
CN109408774A CN109408774A (en) 2019-03-01
CN109408774B true CN109408774B (en) 2022-11-08

Family

ID=65472116

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811323416.1A Active CN109408774B (en) 2018-11-07 2018-11-07 Method for predicting sewage effluent index based on random forest and gradient lifting tree

Country Status (1)

Country Link
CN (1) CN109408774B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110308705A (en) * 2019-06-19 2019-10-08 上海华高汇元工程服务有限公司 A kind of apparatus control method based on big data and artificial intelligence water quality prediction
CN112348039B (en) * 2019-08-07 2023-04-07 中国移动通信集团上海有限公司 Training method of driving behavior analysis model, driving behavior analysis method and equipment
CN110795846B (en) * 2019-10-29 2023-07-14 东北财经大学 Boundary forest model construction method, multi-task soft computing model updating method oriented to complex industrial process and application of multi-task soft computing model updating method
CN110956010B (en) * 2019-11-01 2023-04-18 国网辽宁省电力有限公司阜新供电公司 Large-scale new energy access power grid stability identification method based on gradient lifting tree
CN111429970B (en) * 2019-12-24 2024-03-22 大连海事大学 Method and system for acquiring multiple gene risk scores based on feature selection of extreme gradient lifting method
CN111260149B (en) * 2020-02-10 2023-06-23 北京工业大学 Dioxin emission concentration prediction method
CN111667107B (en) * 2020-05-29 2024-05-14 中国工商银行股份有限公司 Research and development management and control problem prediction method and device based on gradient random forest
CN112580703B (en) * 2020-12-07 2022-07-05 昆明理工大学 Method for predicting morbidity of panax notoginseng in high-incidence stage
CN112733903B (en) * 2020-12-30 2023-11-17 许昌学院 SVM-RF-DT combination-based air quality monitoring and alarming method, system, device and medium
CN113361199A (en) * 2021-06-09 2021-09-07 成都之维安科技股份有限公司 Multi-dimensional pollutant emission intensity prediction method based on time series
CN113344130B (en) * 2021-06-30 2022-01-11 广州市河涌监测中心 Method and device for generating differentiated river patrol strategy
CN113537585B (en) * 2021-07-09 2023-04-07 中海石油(中国)有限公司天津分公司 Oil field production increasing measure recommendation method based on random forest and gradient lifting decision tree
CN113743453A (en) * 2021-07-21 2021-12-03 东北大学 Population quantity prediction method based on random forest
CN114462699A (en) * 2022-01-28 2022-05-10 无锡雪浪数制科技有限公司 Optical fiber production qualification index prediction method based on random forest
CN114913683A (en) * 2022-04-22 2022-08-16 星慧照明工程集团有限公司 Traffic signal lamp monitoring system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106991437A (en) * 2017-03-20 2017-07-28 浙江工商大学 The method and system of sewage quality data are predicted based on random forest
CN108536650A (en) * 2018-04-03 2018-09-14 北京京东尚科信息技术有限公司 Generate the method and apparatus that gradient promotes tree-model

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106991437A (en) * 2017-03-20 2017-07-28 浙江工商大学 The method and system of sewage quality data are predicted based on random forest
CN108536650A (en) * 2018-04-03 2018-09-14 北京京东尚科信息技术有限公司 Generate the method and apparatus that gradient promotes tree-model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Prediction model of atmospheric corrosion rate based on manifold dimensionality reduction and gradient boosting tree (基于流形降维和梯度提升树的大气腐蚀速率预测模型); Liang Xiwang et al.; Equipment Environmental Engineering (装备环境工程); 2018-06-25 (No. 06); full text *

Also Published As

Publication number Publication date
CN109408774A (en) 2019-03-01

Similar Documents

Publication Publication Date Title
CN109408774B (en) Method for predicting sewage effluent index based on random forest and gradient lifting tree
CN110782093B (en) PM fusing SSAE deep feature learning and LSTM2.5Hourly concentration prediction method and system
CN109828089B (en) DBN-BP-based water quality parameter nitrous acid nitrogen online prediction method
CN108346293B (en) Real-time traffic flow short-time prediction method
CN110824915B (en) GA-DBN network-based intelligent monitoring method and system for wastewater treatment
CN109558893B (en) Rapid integrated sewage treatment fault diagnosis method based on resampling pool
CN111160776A (en) Method for detecting abnormal working condition in sewage treatment process by utilizing block principal component analysis
CN109919356B (en) BP neural network-based interval water demand prediction method
CN104123476A (en) Gas concentration prediction method and device based on extreme learning machine
CN110782658A (en) Traffic prediction method based on LightGBM algorithm
CN108647807B (en) River flow prediction method
CN112070356A (en) Method for predicting anti-carbonization performance of concrete based on RF-LSSVM model
CN112364560B (en) Intelligent prediction method for working hours of mine rock drilling equipment
CN115906954A (en) Multivariate time sequence prediction method and device based on graph neural network
CN111932039A (en) Train arrival late prediction method and device, electronic equipment and storage medium
CN108961460B (en) Fault prediction method and device based on sparse ESGP (Enterprise service gateway) and multi-objective optimization
CN106200381B (en) A method of according to the operation of processing water control by stages water factory
CN114417740B (en) Deep sea breeding situation sensing method
CN115147645A (en) Membrane module membrane pollution detection method based on multi-feature information fusion
CN115659774A (en) Dam risk Bayesian network model modeling method integrating machine learning
CN105372995A (en) Measurement and control method for sewage disposal system
CN114707692A (en) Wetland effluent ammonia nitrogen concentration prediction method and system based on hybrid neural network
KR101585545B1 (en) A method of Wavelet-based autoregressive fuzzy modeling for forecasting algal blooms
CN112819087B (en) Method for detecting abnormality of BOD sensor of outlet water based on modularized neural network
CN117196883A (en) Sewage treatment decision optimization method and system based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant