CN107301513A - Bloom prealarming method and apparatus based on CART decision trees - Google Patents

Water bloom early-warning method and apparatus based on CART decision trees

Publication number
CN107301513A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710501068.1A
Other languages
Chinese (zh)
Inventor
刘云翔
吴浩
徐琛
李晓丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Institute of Technology
Original Assignee
Shanghai Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Institute of Technology
Priority to CN201710501068.1A
Publication of CN107301513A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063: Operations research, analysis or management
    • G06Q10/0639: Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/24323: Tree-organised classifiers
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10: Services
    • G06Q50/26: Government or public services

Abstract

The invention provides a water bloom early-warning method and apparatus based on CART decision trees. K influence factors affecting water bloom are chosen, and the correlation coefficient between each of the K influence factors and a preset decision attribute is calculated. T influence factors are chosen as the condition attributes affecting water bloom, where T is an integer greater than 0 and less than or equal to K. Based on the Fayyad boundary point determination theorem, the GINI coefficient corresponding to each condition attribute in a data set is obtained; the data set is the set of parameters detected in different water environments. The cut point of each condition attribute is determined from the GINI coefficients, a decision tree is built recursively from the cut points and the correlation coefficients, and water bloom early warning is performed on a water body based on the decision tree. The method addresses the long run time and insufficient prediction accuracy of the traditional CART decision-tree model, predicts water bloom in a water body effectively, and has high prediction efficiency.

Description

Water bloom early-warning method and apparatus based on CART decision trees
Technical field
The present invention relates to the technical field of water environment protection, and in particular to a water bloom early-warning method and apparatus based on CART decision trees.
Background art
Large amounts of urban sewage and industrial and agricultural wastewater flow into water bodies such as rivers and lakes, so the pollution load of these water bodies keeps increasing and water bloom events occur frequently. To better monitor and manage eutrophic water bodies, many researchers have proposed building water bloom early-warning models to predict water bloom in a water body.
The classification and regression tree (CART) is a decision-tree algorithm proposed in 1984 by Breiman, Friedman, Olshen, and Stone. CART uses binary recursive partitioning. Unlike algorithms based on information entropy, CART calculates a GINI coefficient for each partition of a sample set. The GINI coefficient measures the impurity of a data partition or sample set E: the smaller the GINI coefficient, the more reasonable the partition. CART always splits the current sample set into two sub-sample sets, so each non-leaf node of the generated decision tree has exactly two branches, and the resulting tree is a structurally simple binary tree. A CART decision tree can be used either as a classification tree or as a regression tree. However, the traditional CART algorithm suffers from heavy computation and long run time.
Summary of the invention
In view of the defects in the prior art, the object of the present invention is to provide a water bloom early-warning method and apparatus based on CART decision trees.
In a first aspect, the present invention provides a water bloom early-warning method based on CART decision trees, comprising:
choosing K influence factors that affect water bloom, and calculating the correlation coefficient between each of the K influence factors and a preset decision attribute (chlorophyll a concentration), where K is an integer greater than 0;
choosing T influence factors from the K influence factors according to the values of the correlation coefficients, the T influence factors serving as the condition attributes affecting water bloom, where T is an integer greater than 0 and less than or equal to K;
obtaining, based on the Fayyad boundary point determination theorem, the GINI coefficient corresponding to each condition attribute in a data set, where the data set is the set of parameters detected in different water environments;
determining the cut point of each condition attribute according to the GINI coefficients;
building a decision tree recursively according to the cut points and the correlation coefficients;
performing water bloom early warning on a water body based on the decision tree.
Optionally, the correlation coefficient between each of the K influence factors and the preset decision attribute is calculated as follows:
$$\rho_{A_i B} = \frac{\mathrm{Cov}(A_i, B)}{\sqrt{D(A_i)}\,\sqrt{D(B)}}$$
where $\rho_{A_i B}$ denotes the correlation coefficient between the i-th condition attribute $A_i$ and the preset decision attribute B, $A_i$ denotes the i-th condition attribute, i is an integer greater than or equal to 1 and less than or equal to K, B denotes the preset decision attribute, $\mathrm{Cov}(A_i, B)$ denotes the covariance of $A_i$ and B, $D(A_i)$ denotes the variance of $A_i$, and $D(B)$ denotes the variance of B.
Optionally, obtaining the GINI coefficient corresponding to each condition attribute in the data set based on the Fayyad boundary point determination theorem comprises:
assuming that the data set E is divided into two subsets $E_1$ and $E_2$ according to any condition attribute A, the GINI coefficient of the data set E corresponding to the condition attribute A is calculated as follows:
$$\mathrm{GINI}_A(E) = \frac{|E_1|}{|E|}\,\mathrm{GINI}(E_1) + \frac{|E_2|}{|E|}\,\mathrm{GINI}(E_2)$$
where:
$$\mathrm{GINI}(E) = 1 - \sum_{i=1}^{m} p_i^2$$
In these formulas, $\mathrm{GINI}_A(E)$ denotes the Gini index of E under attribute A, $|E_1|/|E|$ denotes the proportion of subset $E_1$ in set E, $\mathrm{GINI}(E_1)$ denotes the Gini index of $E_1$, $|E_2|/|E|$ denotes the proportion of subset $E_2$ in set E, $\mathrm{GINI}(E_2)$ denotes the Gini index of $E_2$, $\mathrm{GINI}(E)$ denotes the Gini index of E, $p_i$ denotes the probability that a tuple in E belongs to the i-th final classification result of the decision attribute, and m denotes the number of classes of the decision attribute in the data set.
Optionally, determining the cut point of each condition attribute according to the GINI coefficients comprises:
obtaining a cut point for each condition attribute according to the minimum-GINI principle, which means that when the data set is split into two subsets on any condition attribute, the split is chosen so that the GINI coefficient of that condition attribute on the data set is minimal;
taking the cut point at which the GINI coefficient is minimal as the cut point of the condition attribute.
Optionally, building the decision tree recursively according to the cut points and the correlation coefficients comprises:
A1: choosing the condition attribute with the largest correlation coefficient in the data set as the layer-1 node, and taking that condition attribute as the root node;
A2: setting the initial value of N to 1;
A3: determining the data subsets at layer N+1 according to the cut point corresponding to the condition attribute of the layer-N node, and taking the condition attribute with the largest correlation coefficient in each layer-(N+1) data subset as a layer-(N+1) node;
A4: judging whether a divisible condition attribute exists in the layer-(N+1) data subsets; if so, incrementing N by 1 and returning to A3; if not, performing A5;
A5: generating the decision tree and ending the flow.
Optionally, the method further comprises: pruning the decision tree with a cost-complexity pruning algorithm to obtain a simplified decision tree.
Optionally, the method further comprises: performing water bloom early warning on the water body based on the simplified decision tree.
In a second aspect, the present invention provides a water bloom early-warning apparatus based on CART decision trees, comprising a first selection module, a second selection module, an acquisition module, a cut-point determination module, a decision-tree building module, and an early-warning module;
the first selection module is configured to choose K influence factors that affect water bloom and to calculate the correlation coefficient between each of the K influence factors and a preset decision attribute, where K is an integer greater than 0;
the second selection module is configured to choose T influence factors from the K influence factors according to the values of the correlation coefficients, the T influence factors serving as the condition attributes affecting water bloom, where T is an integer greater than 0 and less than or equal to K;
the acquisition module is configured to obtain, based on the Fayyad boundary point determination theorem, the GINI coefficient corresponding to each condition attribute in a data set, where the data set is the set of parameters detected in different water environments;
the cut-point determination module is configured to determine the cut point of each condition attribute according to the GINI coefficients;
the decision-tree building module is configured to build a decision tree recursively according to the cut points and the correlation coefficients;
the early-warning module is configured to perform water bloom early warning on a water body based on the decision tree.
Optionally, the cut-point determination module is specifically configured to:
obtain a cut point for each condition attribute according to the minimum-GINI principle, which means that when the data set is split into two subsets on any condition attribute, the split is chosen so that the GINI coefficient of that condition attribute on the data set is minimal;
take the cut point at which the GINI coefficient is minimal as the cut point of the condition attribute.
Optionally, the apparatus further comprises a pruning module configured to prune the decision tree with a cost-complexity pruning algorithm to obtain a simplified decision tree.
Compared with the prior art, the present invention has the following beneficial effects:
The water bloom early-warning method and apparatus based on CART decision trees provided by the present invention select the condition attributes by calculating the correlation coefficients between the influence factors of water bloom and the preset decision attribute; obtain the GINI coefficient of each condition attribute based on the Fayyad boundary point determination theorem; take the cut point with the minimum GINI coefficient as the cut point of the condition attribute; generate a decision tree recursively from the cut points of the condition attributes and the correlation coefficients; and use the decision tree to give early warning of water bloom in a water body. The method addresses the long run time and insufficient prediction accuracy of the traditional CART decision-tree model, predicts water bloom in a water body effectively, and has high prediction efficiency.
Brief description of the drawings
Other features, objects, and advantages of the present invention will become more apparent from the detailed description of non-limiting embodiments made with reference to the following drawings:
Fig. 1 is a flow chart of the water bloom early-warning method based on CART decision trees provided by Embodiment 1 of the present invention;
Fig. 2 is a flow chart of the CART decision-tree generation method provided by the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention, but do not limit the invention in any way. It should be pointed out that those of ordinary skill in the art can make several changes and improvements without departing from the inventive concept. These all fall within the protection scope of the present invention.
Fig. 1 shows the water bloom early-warning method based on CART decision trees provided by Embodiment 1 of the present invention. As shown in Fig. 1, the method in this embodiment can include:
S101: choose K influence factors that affect water bloom, and calculate the correlation coefficient between each of the K influence factors and the preset decision attribute, where K is an integer greater than 0.
In this embodiment, the influence factors of water bloom include water-quality and meteorological factors such as chlorophyll a (Chl-a), water temperature (T), pH, chemical oxygen demand (COD), total nitrogen (TN), total phosphorus (TP), dissolved oxygen (DO), and illumination (E); the preset condition attributes include total phosphorus (TP), total nitrogen (TN), temperature (T), chemical oxygen demand (COD), and pH.
S102: choose T influence factors from the K influence factors according to the values of the correlation coefficients, the T influence factors serving as the condition attributes affecting water bloom, where T is an integer greater than 0 and less than or equal to K.
In this embodiment, the T influence factors are chosen from the K influence factors according to the magnitude of the correlation coefficients; for example, the influence factors with large correlation coefficients can be selected.
S103: based on the Fayyad boundary point determination theorem, obtain the GINI coefficient corresponding to each condition attribute in the data set, where the data set is the set of parameters detected in different water environments.
In this embodiment, the GINI coefficient of each condition attribute is obtained by means of the Fayyad boundary point determination theorem. The GINI coefficient measures the impurity of a data partition or sample set E: the smaller the GINI coefficient, the more reasonable the partition.
S104: determine the cut point of each condition attribute according to the GINI coefficients.
In this embodiment, the cut point of each condition attribute is chosen by the magnitude of the GINI coefficient; for example, the candidate cut point with the minimum GINI coefficient can be chosen as the cut point of the condition attribute.
S105: build a decision tree recursively according to the cut points and the correlation coefficients.
In this embodiment, the condition attribute with the largest correlation coefficient can be chosen as the root node; the cut point of that node yields the sub-data-sets of the next layer. The sub-data-sets are divided step by step by judging whether a divisible condition attribute exists in each of them, until no divisible condition attribute remains in the bottom-layer sub-data-sets, at which point the decision tree is generated. The decision tree is a binary tree.
S106: perform water bloom early warning on the water body based on the decision tree.
Optionally, the correlation coefficient between each of the K influence factors and the preset decision attribute is calculated as follows:
$$\rho_{A_i B} = \frac{\mathrm{Cov}(A_i, B)}{\sqrt{D(A_i)}\,\sqrt{D(B)}}$$
where $\rho_{A_i B}$ denotes the correlation coefficient between the i-th condition attribute $A_i$ and the preset decision attribute B, $A_i$ denotes the i-th condition attribute, i is an integer greater than or equal to 1 and less than or equal to K, B denotes the preset decision attribute, $\mathrm{Cov}(A_i, B)$ denotes the covariance of $A_i$ and B, $D(A_i)$ denotes the variance of $A_i$, and $D(B)$ denotes the variance of B.
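The correlation-based selection of steps S101 and S102 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the sample measurements, and the choice to rank by the absolute value of the coefficient are all assumptions (the text only says that factors with large coefficients can be selected).

```python
from math import sqrt


def pearson(a, b):
    """Correlation coefficient Cov(A, B) / (sqrt(D(A)) * sqrt(D(B)))."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: treat a constant column as uncorrelated
    return cov / (sqrt(var_a) * sqrt(var_b))


def select_condition_attributes(factors, decision, t):
    """Keep the t factors with the largest |correlation| (ranking rule assumed)."""
    scores = {name: pearson(values, decision) for name, values in factors.items()}
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return ranked[:t], scores


# Hypothetical measurements, for illustration only.
factors = {
    "TP": [0.10, 0.20, 0.30, 0.40, 0.50],
    "TN": [1.0, 2.1, 2.9, 4.2, 5.0],
    "DO": [8.0, 7.5, 8.2, 7.9, 8.1],
}
chl_a = [5.0, 10.0, 15.0, 20.0, 25.0]  # preset decision attribute

selected, scores = select_condition_attributes(factors, chl_a, t=2)
print(selected)
```

With these made-up values, TP and TN track the chlorophyll a series closely while DO does not, so the two nutrient factors are kept as condition attributes.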
Optionally, obtaining the GINI coefficient corresponding to each condition attribute in the data set based on the Fayyad boundary point determination theorem comprises:
assuming that the data set E is divided into two subsets $E_1$ and $E_2$ according to any condition attribute A, the GINI coefficient of the data set E corresponding to the condition attribute A is calculated as follows:
$$\mathrm{GINI}_A(E) = \frac{|E_1|}{|E|}\,\mathrm{GINI}(E_1) + \frac{|E_2|}{|E|}\,\mathrm{GINI}(E_2)$$
where:
$$\mathrm{GINI}(E) = 1 - \sum_{i=1}^{m} p_i^2$$
In these formulas, $\mathrm{GINI}_A(E)$ denotes the Gini index of E under attribute A, $|E_1|/|E|$ denotes the proportion of subset $E_1$ in set E, $\mathrm{GINI}(E_1)$ denotes the Gini index of $E_1$, $|E_2|/|E|$ denotes the proportion of subset $E_2$ in set E, $\mathrm{GINI}(E_2)$ denotes the Gini index of $E_2$, $\mathrm{GINI}(E)$ denotes the Gini index of E, $p_i$ denotes the probability that a tuple in E belongs to the i-th final classification result of the decision attribute, and m denotes the number of classes of the decision attribute in the data set.
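The two Gini quantities defined above translate directly into code. The sketch below assumes the decision attribute has already been discretized into class labels (here the hypothetical labels "bloom" and "none"); the function names are illustrative.

```python
from collections import Counter


def gini(labels):
    """GINI(E) = 1 - sum of p_i squared over the classes present in E."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty subset contributes no impurity
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split(e1, e2):
    """GINI_A(E) = |E1|/|E| * GINI(E1) + |E2|/|E| * GINI(E2)."""
    n = len(e1) + len(e2)
    return len(e1) / n * gini(e1) + len(e2) / n * gini(e2)


# Hypothetical class labels of the two subsets produced by one split.
e1 = ["bloom", "bloom", "bloom", "none"]
e2 = ["bloom", "none", "none", "none", "none", "none"]

print(gini(e1 + e2))       # impurity before the split
print(gini_split(e1, e2))  # weighted impurity after the split
```

A split is useful when the weighted value after the split is lower than the impurity of the undivided set, which is the case for these labels.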
Optionally, determining the cut point of each condition attribute according to the GINI coefficients comprises:
obtaining a cut point for each condition attribute according to the minimum-GINI principle, which means that when the data set is split into two subsets on any condition attribute, the split is chosen so that the GINI coefficient of that condition attribute on the data set is minimal;
taking the cut point at which the GINI coefficient is minimal as the cut point of the condition attribute.
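One way to read the combination of the boundary point theorem with the minimum-GINI principle is sketched below: sort the values of a continuous attribute, keep only the midpoints where the class label can change between neighbouring values, and evaluate the split GINI at those candidates alone. The source does not spell out this procedure, so the candidate rule, the midpoint convention, and all names here are assumptions.

```python
from collections import Counter


def gini(labels):
    """GINI(E) = 1 - sum of p_i squared."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def boundary_candidates(values, labels):
    """Midpoints between adjacent distinct values where the class can change."""
    classes_at = {}
    for v, y in zip(values, labels):
        classes_at.setdefault(v, set()).add(y)
    distinct = sorted(classes_at)
    cuts = []
    for lo, hi in zip(distinct, distinct[1:]):
        # Skip the gap only when both neighbours hold one and the same class.
        same_pure_class = len(classes_at[lo]) == 1 and classes_at[lo] == classes_at[hi]
        if not same_pure_class:
            cuts.append((lo + hi) / 2)
    return cuts


def best_cut_point(values, labels):
    """Return (cut, split GINI) with the minimum split GINI, or None."""
    n = len(values)
    best = None
    for cut in boundary_candidates(values, labels):
        left = [y for v, y in zip(values, labels) if v <= cut]
        right = [y for v, y in zip(values, labels) if v > cut]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or score < best[1]:
            best = (cut, score)
    return best


# Hypothetical water temperatures and bloom labels.
temps = [18.0, 20.0, 22.0, 24.0, 26.0, 28.0]
labels = ["none", "none", "bloom", "none", "bloom", "bloom"]

candidates = boundary_candidates(temps, labels)
cut, score = best_cut_point(temps, labels)
print(candidates, cut, score)
```

For these six samples only three of the five possible midpoints are evaluated, which is where the reduction in computation comes from. Ties are resolved in favour of the smallest cut value, which is an arbitrary choice of this sketch.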
This embodiment selects the condition attributes by calculating the correlation coefficients between the influence factors of water bloom and the preset decision attribute; obtains the GINI coefficient of each condition attribute based on the Fayyad boundary point determination theorem; takes the cut point with the minimum GINI coefficient as the cut point of the condition attribute; generates a decision tree recursively from the cut points of the condition attributes and the correlation coefficients; and uses the decision tree to give early warning of water bloom in a water body. The method addresses the long run time and insufficient prediction accuracy of the traditional CART decision-tree model, predicts water bloom in a water body effectively, and has high prediction efficiency.
Fig. 2 is a flow chart of the CART decision-tree generation method provided by the present invention. As shown in Fig. 2, building the decision tree recursively according to the cut points and the correlation coefficients includes:
A1: choose the condition attribute with the largest correlation coefficient in the data set as the layer-1 node, and take that condition attribute as the root node;
A2: set the initial value of N to 1;
A3: determine the data subsets at layer N+1 according to the cut point corresponding to the condition attribute of the layer-N node, and take the condition attribute with the largest correlation coefficient in each layer-(N+1) data subset as a layer-(N+1) node;
A4: judge whether a divisible condition attribute exists in the layer-(N+1) data subsets; if so, increment N by 1 and return to A3; if not, perform A5;
A5: generate the decision tree and end the flow.
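Steps A1 to A5 can be sketched as a recursive function. Several points below are assumptions rather than statements from the source: recursion replaces the explicit layer counter N, an attribute counts as "divisible" when it has at least two distinct values in the subset, a subset with a single class becomes a leaf, every midpoint is tried as a candidate cut (the boundary-point restriction is omitted for brevity), and the field names and sample rows are hypothetical.

```python
from collections import Counter
from math import sqrt


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def best_cut(values, labels):
    """Minimum-GINI midpoint, or None when the attribute is not divisible."""
    distinct = sorted(set(values))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v <= cut]
        right = [y for v, y in zip(values, labels) if v > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (cut, score)
    return None if best is None else best[0]


def build(rows, attrs, target, label):
    labels = [r[label] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:  # assumption: a pure subset becomes a leaf
        return {"leaf": majority}
    decision = [r[target] for r in rows]
    # A1/A3: try attributes in order of |correlation| within this subset.
    ranked = sorted(attrs, reverse=True,
                    key=lambda a: abs(pearson([r[a] for r in rows], decision)))
    for attr in ranked:
        cut = best_cut([r[attr] for r in rows], labels)
        if cut is not None:  # A4: a divisible attribute exists, so split again
            left = [r for r in rows if r[attr] <= cut]
            right = [r for r in rows if r[attr] > cut]
            return {"attr": attr, "cut": cut,
                    "left": build(left, attrs, target, label),
                    "right": build(right, attrs, target, label)}
    return {"leaf": majority}  # A5: nothing left to divide


def predict(tree, sample):
    while "leaf" not in tree:
        tree = tree["left"] if sample[tree["attr"]] <= tree["cut"] else tree["right"]
    return tree["leaf"]


# Hypothetical samples: TP in mg/L, T in degrees Celsius, chl_a in ug/L.
rows = [
    {"TP": 0.05, "T": 18.0, "chl_a": 4.0, "bloom": "none"},
    {"TP": 0.08, "T": 25.0, "chl_a": 6.0, "bloom": "none"},
    {"TP": 0.12, "T": 20.0, "chl_a": 9.0, "bloom": "none"},
    {"TP": 0.20, "T": 24.0, "chl_a": 30.0, "bloom": "bloom"},
    {"TP": 0.25, "T": 22.0, "chl_a": 38.0, "bloom": "bloom"},
    {"TP": 0.30, "T": 27.0, "chl_a": 45.0, "bloom": "bloom"},
]
tree = build(rows, ["TP", "T"], target="chl_a", label="bloom")
print(tree["attr"], predict(tree, {"TP": 0.28, "T": 21.0}))
```

In this toy data total phosphorus correlates most strongly with chlorophyll a, so it becomes the root node, and a single cut on it already separates the two classes.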
Optionally, the method shown in Fig. 1 can further include: pruning the decision tree with a cost-complexity pruning algorithm to obtain a simplified decision tree.
Optionally, the method shown in Fig. 1 can further include: performing water bloom early warning on the water body based on the simplified decision tree.
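The source names cost-complexity pruning without giving its formulas. For orientation, the standard formulation from the CART literature is reproduced below; it is assumed, not confirmed by the source, that this is the variant intended. Here $R(T)$ is the misclassification cost of tree $T$ on the training data, $|\tilde{T}|$ is its number of leaves, $T_t$ is the subtree rooted at internal node $t$, and $\alpha \ge 0$ is the complexity parameter.

```latex
% Cost-complexity measure of a tree T
R_\alpha(T) = R(T) + \alpha \, |\tilde{T}|

% Effective alpha of an internal node t: the value of alpha at which
% collapsing the subtree T_t into a single leaf no longer costs more
g(t) = \frac{R(t) - R(T_t)}{|\tilde{T}_t| - 1}
```

Pruning repeatedly collapses the internal node with the smallest $g(t)$, which yields a nested sequence of progressively smaller trees; the final simplified tree is then usually chosen from that sequence with held-out data or cross-validation.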
The technical scheme of the present invention is explained in more detail below with reference to a specific embodiment.
First, several influence factors affecting water bloom are chosen, and the correlation coefficients between the influence factors and the condition attributes are calculated, as shown in Table 1.
Table 1: Correlation coefficients
In Table 1, TN denotes total nitrogen, TP denotes total phosphorus, pH denotes the hydrogen ion exponent, DO denotes dissolved oxygen, COD denotes chemical oxygen demand, T denotes temperature, E denotes illumination, and TN:TP denotes the nitrogen-to-phosphorus ratio; the condition attributes are total phosphorus (TP), total nitrogen (TN), temperature (T), chemical oxygen demand (COD), and pH.
Second, the five influence factors TN, TP, pH, COD, and T are selected from Table 1 as the condition attributes.
Using the Fayyad boundary point determination theorem, the GINI coefficient of each attribute in the data set is calculated, and for each attribute the candidate cut point with the minimum GINI coefficient is selected as that attribute's cut point.
The GINI coefficient measures the impurity of a data partition or sample set E. It is defined as follows:
$$\mathrm{GINI}(E) = 1 - \sum_{i=1}^{n} p_i^2$$
where the set of classes is $\{C_1, C_2, \ldots, C_n\}$, $|C_i|$ is the number of samples in E that belong to $C_i$, and $p_i = |C_i| / |E|$ is the probability that a sample in E belongs to class $C_i$.
When only binary splits of the sample set are considered, attribute A divides the training sample set E into two subsets $E_1$ and $E_2$, and the GINI coefficient of E after the split is
$$\mathrm{GINI}_A(E) = \sum_{k=1}^{2} \frac{|E_k|}{|E|}\,\mathrm{GINI}(E_k)$$
where $|E_k| / |E|$ is the probability that a sample in the sample set belongs to the k-th subset (k = 1, 2).
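A small worked example makes the minimum-GINI comparison concrete. The counts are invented for illustration: a sample set E of 8 samples, 4 labelled bloom and 4 labelled no bloom, and two candidate cut points on the same attribute.

```latex
% Impurity of the undivided set
\mathrm{GINI}(E) = 1 - \left(\tfrac{4}{8}\right)^2 - \left(\tfrac{4}{8}\right)^2 = 0.5

% Candidate cut 1: E_1 has 3 bloom + 1 none, E_2 has 1 bloom + 3 none
\mathrm{GINI}(E_1) = \mathrm{GINI}(E_2) = 1 - \left(\tfrac{3}{4}\right)^2 - \left(\tfrac{1}{4}\right)^2 = 0.375
\mathrm{GINI}_A(E) = \tfrac{4}{8}(0.375) + \tfrac{4}{8}(0.375) = 0.375

% Candidate cut 2: E_1 has 2 bloom + 0 none, E_2 has 2 bloom + 4 none
\mathrm{GINI}(E_1) = 0, \qquad
\mathrm{GINI}(E_2) = 1 - \left(\tfrac{2}{6}\right)^2 - \left(\tfrac{4}{6}\right)^2 = \tfrac{4}{9}
\mathrm{GINI}_A(E) = \tfrac{2}{8}(0) + \tfrac{6}{8}\cdot\tfrac{4}{9} = \tfrac{1}{3} \approx 0.333
```

Both cuts lower the impurity below 0.5, and under the minimum-GINI principle the second cut would be kept because 0.333 is smaller than 0.375.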
The optimal threshold points (cut points) of the condition attributes selected by the above method are shown in Table 2.
Table 2: Optimal thresholds of the attributes
The decision tree is generated recursively from the results in Table 2 and the correlation coefficients.
Further, the generated decision tree can be pruned with a cost-complexity pruning algorithm to form a concise decision tree.
An embodiment of the present invention also provides a water bloom early-warning apparatus based on CART decision trees, comprising a first selection module, a second selection module, an acquisition module, a cut-point determination module, a decision-tree building module, and an early-warning module;
the first selection module is configured to choose K influence factors that affect water bloom and to calculate the correlation coefficient between each of the K influence factors and a preset decision attribute, where K is an integer greater than 0;
the second selection module is configured to choose T influence factors from the K influence factors according to the values of the correlation coefficients, the T influence factors serving as the condition attributes affecting water bloom, where T is an integer greater than 0 and less than or equal to K;
the acquisition module is configured to obtain, based on the Fayyad boundary point determination theorem, the GINI coefficient corresponding to each condition attribute in a data set, where the data set is the set of parameters detected in different water environments;
the cut-point determination module is configured to determine the cut point of each condition attribute according to the GINI coefficients;
the decision-tree building module is configured to build a decision tree recursively according to the cut points and the correlation coefficients;
the early-warning module is configured to perform water bloom early warning on a water body based on the decision tree.
Optionally, the cut-point determination module is specifically configured to:
obtain a cut point for each condition attribute according to the minimum-GINI principle, which means that when the data set is split into two subsets on any condition attribute, the split is chosen so that the GINI coefficient of that condition attribute on the data set is minimal;
take the cut point at which the GINI coefficient is minimal as the cut point of the condition attribute.
Further, the apparatus also comprises a pruning module configured to prune the decision tree with a cost-complexity pruning algorithm to obtain a simplified decision tree.
It should be noted that the steps of the water bloom early-warning method based on CART decision trees provided by the present invention can be implemented with the corresponding modules and units of the water bloom early-warning apparatus based on CART decision trees. Those skilled in the art can refer to the technical scheme of the system to implement the step flow of the method; that is, the embodiments of the system can be understood as preferred examples of implementing the method, and are not described further here.
Those skilled in the art know that, besides implementing the system and its devices provided by the present invention purely as computer-readable program code, the method steps can be logically programmed so that the system and its devices realize the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. The system and its devices may therefore be regarded as a hardware component, and the devices included in it for realizing various functions may be regarded as structures within the hardware component; the devices for realizing various functions may also be regarded both as software modules implementing the method and as structures within the hardware component.
Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to the particular embodiments above; those skilled in the art can make various changes or modifications within the scope of the claims without affecting the substance of the invention. Where no conflict arises, the embodiments of this application and the features in the embodiments may be combined with one another arbitrarily.

Claims (10)

1. A water bloom early-warning method based on CART decision trees, characterized by comprising:
choosing K influence factors that affect water bloom, and calculating the correlation coefficient between each of the K influence factors and a preset decision attribute, where K is an integer greater than 0;
choosing T influence factors from the K influence factors according to the values of the correlation coefficients, the T influence factors serving as the condition attributes affecting water bloom, where T is an integer greater than 0 and less than or equal to K;
obtaining, based on the Fayyad boundary point determination theorem, the GINI coefficient corresponding to each condition attribute in a data set, where the data set is the set of parameters detected in different water environments;
determining the cut point of each condition attribute according to the GINI coefficients;
building a decision tree recursively according to the cut points and the correlation coefficients;
performing water bloom early warning on a water body based on the decision tree.
2. The water bloom early-warning method based on CART decision trees according to claim 1, characterized in that the correlation coefficient between each of the K influence factors and the preset decision attribute is calculated as follows:
$$\rho_{A_i B} = \frac{\mathrm{Cov}(A_i, B)}{\sqrt{D(A_i)}\,\sqrt{D(B)}}$$
where $\rho_{A_i B}$ denotes the correlation coefficient between the i-th condition attribute $A_i$ and the preset decision attribute B, $A_i$ denotes the i-th condition attribute, i is an integer greater than or equal to 1 and less than or equal to K, B denotes the preset decision attribute, $\mathrm{Cov}(A_i, B)$ denotes the covariance of $A_i$ and B, $D(A_i)$ denotes the variance of $A_i$, and $D(B)$ denotes the variance of B.
3. the bloom prealarming method according to claim 1 based on CART decision trees, it is characterised in that described to be based on Fayyad boundary point cor-responding identified theorems, obtain the corresponding GINI coefficients of each conditional attribute in data set, including:
Assuming that the data set E is divided into two subset E according to either condition attribute A1And E2, then the conditional attribute A is corresponding The calculation formula of data set E GINI coefficients is as follows:
<mrow> <msub> <mi>GINI</mi> <mi>A</mi> </msub> <mrow> <mo>(</mo> <mi>E</mi> <mo>)</mo> </mrow> <mo>=</mo> <mfrac> <mrow> <mo>|</mo> <msub> <mi>E</mi> <mn>1</mn> </msub> <mo>|</mo> </mrow> <mrow> <mo>|</mo> <mi>E</mi> <mo>|</mo> </mrow> </mfrac> <mi>G</mi> <mi>I</mi> <mi>N</mi> <mi>I</mi> <mrow> <mo>(</mo> <msub> <mi>E</mi> <mn>1</mn> </msub> <mo>)</mo> </mrow> <mo>+</mo> <mfrac> <mrow> <mo>|</mo> <msub> <mi>E</mi> <mn>2</mn> </msub> <mo>|</mo> </mrow> <mrow> <mo>|</mo> <mi>E</mi> <mo>|</mo> </mrow> </mfrac> <mi>G</mi> <mi>I</mi> <mi>N</mi> <mi>I</mi> <mrow> <mo>(</mo> <msub> <mi>E</mi> <mn>2</mn> </msub> <mo>)</mo> </mrow> </mrow>
where:
$$\mathrm{GINI}(E) = 1 - \sum_{i=1}^{m} p_i^2$$
In the formulas, $\mathrm{GINI}_A(E)$ denotes the Gini index of E under attribute A; $|E_1|/|E|$ denotes the proportion of subset E1 in set E; $\mathrm{GINI}(E_1)$ denotes the Gini index of E1; $|E_2|/|E|$ denotes the proportion of subset E2 in set E; $\mathrm{GINI}(E_2)$ denotes the Gini index of E2; $\mathrm{GINI}(E)$ denotes the Gini index of E; $p_i$ denotes the probability that a tuple in E belongs to the i-th final classification result of the decision attribute; and m denotes the number of classes of the decision attribute in the data set.
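The two quantities in this claim can be sketched in a few lines of Python, assuming the standard CART definition GINI(E) = 1 - sum of p_i squared over the m classes. The function names and the label lists are hypothetical and serve only as an illustration.

```python
from collections import Counter

def gini(labels):
    """GINI(E) = 1 - sum(p_i^2) over the classes present in labels."""
    n = len(labels)
    if n == 0:
        return 0.0  # an empty subset is treated as pure
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_split(left, right):
    """GINI_A(E): size-weighted Gini index of the two subsets E1 and E2."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical split of eight samples into E1 and E2.
e1 = ["no", "no", "no", "bloom"]
e2 = ["bloom", "bloom", "bloom", "bloom"]
print(gini(e1 + e2))       # 0.46875
print(gini_split(e1, e2))  # 0.1875
```

The split lowers the index from 0.46875 to 0.1875, which is the kind of reduction the minimum GINI principle looks for.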
4. The bloom early-warning method based on CART decision trees according to claim 1, characterized in that determining the cut-point of each conditional attribute according to the GINI coefficient comprises:
Obtaining a cut-point for each conditional attribute according to the minimum GINI coefficient principle, wherein the minimum GINI coefficient principle means that, when the data set is split into two subsets based on any conditional attribute, the split is chosen such that the GINI coefficient of that conditional attribute on the data set is minimal;
Taking the cut-point at which the GINI coefficient is minimal as the cut-point of the conditional attribute.
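The selection of a cut-point can be sketched as follows. This sketch assumes that the Fayyad boundary point theorem is used to limit the candidates to boundary points, that is, midpoints between adjacent distinct attribute values whose samples do not all share one class; that reading is an assumption, as are the function names and the sample data.

```python
from collections import Counter

def gini(labels):
    """GINI(E) = 1 - sum(p_i^2); an empty set is treated as pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_cut_point(values, labels):
    """Return (cut, weighted_gini) with minimal GINI_A(E), or (None, inf)."""
    # Group class labels by distinct attribute value so ties are handled.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    distinct = sorted(groups)
    n = len(values)
    best_cut, best_score = None, float("inf")
    for lo, hi in zip(distinct, distinct[1:]):
        # Skip non-boundary points: both neighbours hold one and the same class.
        if len(set(groups[lo]) | set(groups[hi])) == 1:
            continue
        cut = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v <= cut]
        right = [y for v, y in zip(values, labels) if v > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score

# Hypothetical water temperatures (deg C) with 0/1 bloom labels.
temps = [18, 20, 22, 24, 26, 28, 30]
bloom = [0, 0, 0, 1, 0, 1, 1]
cut, score = best_cut_point(temps, bloom)  # cut == 23.0
```

Only three of the six midpoints are evaluated here (23, 25 and 27), because the other three separate neighbours of the same class.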
5. The bloom early-warning method based on CART decision trees according to any one of claims 1-4, characterized in that recursively building the decision tree according to the cut-points and the correlation coefficients comprises:
A1: Select the conditional attribute with the largest correlation coefficient in the data set as the node of layer 1, and take this conditional attribute as the root node;
A2: Set the initial value of N to 1;
A3: Determine the data subsets located at layer N+1 according to the cut-point corresponding to the conditional attribute of the layer-N node, and take the conditional attribute with the largest correlation coefficient in each layer-(N+1) data subset as a node of layer N+1;
A4: Judge whether the data subsets located at layer N+1 have a splittable conditional attribute; if so, increment N by 1 and return to A3; if not, perform A5;
A5: Generate the decision tree and end the flow.
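Steps A1 to A5 can be sketched recursively as below. Several points are assumptions rather than statements of the claimed method: the absolute value of the correlation coefficient is used for ranking, the cut-point is recomputed on each subset, an attribute counts as splittable when it still has a boundary point, and leaves carry the majority class. All names and the sample rows are hypothetical.

```python
import math
from collections import Counter

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def correlation(a, b):
    """Pearson correlation; a constant column is treated as uncorrelated."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def best_cut_point(values, labels):
    """Boundary-point midpoint with minimal weighted Gini, or None."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    distinct = sorted(groups)
    best_cut, best_score = None, float("inf")
    for lo, hi in zip(distinct, distinct[1:]):
        if len(set(groups[lo]) | set(groups[hi])) == 1:
            continue  # not a boundary point
        cut = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v <= cut]
        right = [y for v, y in zip(values, labels) if v > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(values)
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut

def build_tree(rows, attributes, target):
    labels = [r[target] for r in rows]
    # A4: keep only attributes that can still split this subset.
    candidates = []
    for attr in attributes:
        values = [r[attr] for r in rows]
        cut = best_cut_point(values, labels)
        if cut is not None:
            candidates.append((abs(correlation(values, labels)), attr, cut))
    if not candidates:
        # A5: nothing left to split, emit a majority-class leaf.
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    # A1/A3: the attribute with the largest correlation becomes the node.
    _, attr, cut = max(candidates)
    left = [r for r in rows if r[attr] <= cut]
    right = [r for r in rows if r[attr] > cut]
    return {"attr": attr, "cut": cut,
            "left": build_tree(left, attributes, target),
            "right": build_tree(right, attributes, target)}

def classify(tree, row):
    """Walk from the root to a leaf and return its class."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["attr"]] <= tree["cut"] else tree["right"]
    return tree["leaf"]

# Hypothetical samples: temperature (deg C), total phosphorus (mg/L), bloom label.
rows = [
    {"temp": 18, "tp": 0.02, "bloom": 0}, {"temp": 20, "tp": 0.09, "bloom": 0},
    {"temp": 24, "tp": 0.03, "bloom": 0}, {"temp": 25, "tp": 0.10, "bloom": 1},
    {"temp": 27, "tp": 0.04, "bloom": 0}, {"temp": 29, "tp": 0.12, "bloom": 1},
]
tree = build_tree(rows, ["temp", "tp"], "bloom")
```

The recursion takes the place of the explicit layer counter N in steps A2 to A4: each call handles one data subset of the next layer, and it always terminates because every split produces two non-empty, strictly smaller subsets.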
6. The bloom early-warning method based on CART decision trees according to claim 1, characterized by further comprising: pruning the decision tree using a cost-complexity pruning algorithm to obtain a simplified decision tree.
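This claim names cost-complexity pruning without restating its criterion. For reference, the usual formulation from the CART literature is sketched below; the symbols R, α, T_t and g are not defined in this section and are introduced here only for illustration.

$$R_\alpha(T) = R(T) + \alpha\,|\tilde{T}|, \qquad g(t) = \frac{R(t) - R(T_t)}{|\tilde{T}_t| - 1}$$

Here R(T) is the training misclassification cost of tree T, |T̃| is its number of leaf nodes, T_t is the subtree rooted at internal node t, and R(t) is the cost if t were collapsed into a leaf. The internal node with the smallest g(t) is pruned first; repeating this gives a nested sequence of progressively simpler trees, from which one is typically chosen by validation error.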
7. The bloom early-warning method based on CART decision trees according to claim 6, characterized by further comprising: performing bloom early warning on a water body based on the simplified decision tree.
8. A bloom early-warning apparatus based on CART decision trees, characterized by comprising: a first selection module, a second selection module, an acquisition module, a cut-point determining module, a decision tree building module and a warning module;
The first selection module is configured to select K influencing factors that affect water bloom and to calculate the correlation coefficient between each of the K influencing factors and a preset decision attribute, where K is an integer greater than 0;
The second selection module is configured to select T influencing factors from the K influencing factors according to the values of the correlation coefficients, the T influencing factors serving as the conditional attributes that affect water bloom, where T is an integer greater than 0 and less than or equal to K;
The acquisition module is configured to obtain, based on the Fayyad boundary point determination theorem, the GINI coefficient corresponding to each conditional attribute in a data set, where the data set is the set of data formed by parameters detected in different water body environments;
The cut-point determining module is configured to determine the cut-point of each conditional attribute according to the GINI coefficients;
The decision tree building module is configured to recursively build a decision tree according to the cut-points and the correlation coefficients;
The warning module is configured to perform bloom early warning on a water body based on the decision tree.
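The work of the first and second selection modules can be sketched as ranking K candidate factors by correlation and keeping T of them. The claim only says the choice is made according to the value of the correlation coefficient, so taking the T largest absolute values is an assumption; the function names, factor names and readings are hypothetical.

```python
import math

def correlation(a, b):
    """Pearson correlation; a constant column is treated as uncorrelated."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def select_condition_attributes(factors, decision, t):
    """Keep the t factors with the largest absolute correlation to decision."""
    ranked = sorted(
        factors,
        key=lambda name: abs(correlation(factors[name], decision)),
        reverse=True,
    )
    return ranked[:t]

# K = 3 hypothetical influencing factors and a 0/1 bloom decision attribute.
factors = {
    "temperature": [18, 21, 24, 27, 30],
    "total_phosphorus": [0.02, 0.03, 0.08, 0.09, 0.10],
    "wind_speed": [3, 1, 2, 3, 1],
}
bloom = [0, 0, 1, 1, 1]
selected = select_condition_attributes(factors, bloom, t=2)
```

With these readings the wind speed is essentially uncorrelated with the bloom label and is dropped, leaving total phosphorus and temperature as the conditional attributes.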
9. The bloom early-warning apparatus based on CART decision trees according to claim 8, characterized in that the cut-point determining module is specifically configured to:
Obtain a cut-point for each conditional attribute according to the minimum GINI coefficient principle, wherein the minimum GINI coefficient principle means that, when the data set is split into two subsets based on any conditional attribute, the split is chosen such that the GINI coefficient of that conditional attribute on the data set is minimal;
Take the cut-point at which the GINI coefficient is minimal as the cut-point of the conditional attribute.
10. The bloom early-warning apparatus based on CART decision trees according to claim 8, characterized by further comprising a pruning module, the pruning module being configured to prune the decision tree using a cost-complexity pruning algorithm to obtain a simplified decision tree.
CN201710501068.1A 2017-06-27 2017-06-27 Bloom prealarming method and apparatus based on CART decision trees Pending CN107301513A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710501068.1A CN107301513A (en) 2017-06-27 2017-06-27 Bloom prealarming method and apparatus based on CART decision trees

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710501068.1A CN107301513A (en) 2017-06-27 2017-06-27 Bloom prealarming method and apparatus based on CART decision trees

Publications (1)

Publication Number Publication Date
CN107301513A true CN107301513A (en) 2017-10-27

Family

ID=60135971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710501068.1A Pending CN107301513A (en) 2017-06-27 2017-06-27 Bloom prealarming method and apparatus based on CART decision trees

Country Status (1)

Country Link
CN (1) CN107301513A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108961641A (en) * 2018-07-24 2018-12-07 民航成都电子技术有限责任公司 A method of the reduction capacitor based on classification tree encloses boundary's alarm system false-alarm
CN109146195A (en) * 2018-09-06 2019-01-04 北方爆破科技有限公司 A kind of blast fragmentation size prediction technique based on cart tree regression algorithm
CN109492712A (en) * 2018-12-17 2019-03-19 上海应用技术大学 The method for establishing internet finance air control model
CN109639283A (en) * 2018-11-26 2019-04-16 南京理工大学 Workpiece coding method based on decision tree
CN109858489A (en) * 2019-01-15 2019-06-07 青岛海信网络科技股份有限公司 A kind of alert method for early warning and equipment
CN113033110A (en) * 2021-05-27 2021-06-25 深圳市城市交通规划设计研究中心股份有限公司 Important area personnel emergency evacuation system and method based on traffic flow model
CN113110385A (en) * 2021-04-16 2021-07-13 广东电网有限责任公司计量中心 Decision tree-based start-stop early warning method and device for metering automatic verification system
CN113156529A (en) * 2021-05-07 2021-07-23 广东电网有限责任公司计量中心 Start-stop control method, system, terminal and storage medium of metrological verification assembly line
CN113505818A (en) * 2021-06-17 2021-10-15 广东工业大学 Aluminum melting furnace energy consumption abnormity diagnosis method, system and equipment with improved decision tree algorithm

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008214942A (en) * 2007-03-02 2008-09-18 Gunma Prefecture Prediction of generation of water-bloom and method for preventing its generation
CN106202163A (en) * 2016-06-24 2016-12-07 中国环境科学研究院 Tongjiang lake ecological monitoring information management and early warning system

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
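张亮, 宁芊: "Two improvements to the CART decision tree and their application", 《计算机工程与设计》 (Computer Engineering and Design) *
曾勇, 杨志峰, 刘静玲: "Study on a water bloom early-warning model for urban lakes", 《水科学进展》 (Advances in Water Science) *
肖永辉, 王志刚, 刘曙照: "Research progress on early-warning models for water eutrophication and cyanobacterial blooms", 《环境科学与技术》 (Environmental Science & Technology) *
董跃华: "A decision tree optimization algorithm based on the correlation coefficient", 《计算机工程与科学》 (Computer Engineering & Science) *
赵翔, 祁云嵩, 刘同明: "Application of covariance and the correlation coefficient in decision tree construction", 《华东船舶工业学院学报(自然科学版)》 (Journal of East China Shipbuilding Institute, Natural Science Edition) *
陈云, 戴锦芳: "A method for identifying cyanobacterial bloom information in Taihu Lake based on remote sensing data", 《湖泊科学》 (Journal of Lake Sciences) *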

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108961641A (en) * 2018-07-24 2018-12-07 民航成都电子技术有限责任公司 A method of the reduction capacitor based on classification tree encloses boundary's alarm system false-alarm
CN109146195A (en) * 2018-09-06 2019-01-04 北方爆破科技有限公司 A kind of blast fragmentation size prediction technique based on cart tree regression algorithm
CN109639283A (en) * 2018-11-26 2019-04-16 南京理工大学 Workpiece coding method based on decision tree
CN109639283B (en) * 2018-11-26 2022-11-04 南京理工大学 Workpiece coding method based on decision tree
CN109492712A (en) * 2018-12-17 2019-03-19 上海应用技术大学 The method for establishing internet finance air control model
CN109858489A (en) * 2019-01-15 2019-06-07 青岛海信网络科技股份有限公司 A kind of alert method for early warning and equipment
CN113110385A (en) * 2021-04-16 2021-07-13 广东电网有限责任公司计量中心 Decision tree-based start-stop early warning method and device for metering automatic verification system
CN113156529A (en) * 2021-05-07 2021-07-23 广东电网有限责任公司计量中心 Start-stop control method, system, terminal and storage medium of metrological verification assembly line
CN113033110A (en) * 2021-05-27 2021-06-25 深圳市城市交通规划设计研究中心股份有限公司 Important area personnel emergency evacuation system and method based on traffic flow model
CN113505818A (en) * 2021-06-17 2021-10-15 广东工业大学 Aluminum melting furnace energy consumption abnormity diagnosis method, system and equipment with improved decision tree algorithm

Similar Documents

Publication Publication Date Title
CN107301513A (en) Bloom prealarming method and apparatus based on CART decision trees
CN106846425A (en) A kind of dispersion point cloud compression method based on Octree
CN107622322B (en) Forecasting factor identification method of medium-long term runoff and forecasting method of medium-long term runoff
Cuss et al. Analysis of dissolved organic matter fluorescence using self-organizing maps: mini-review and tutorial
CN101826166A (en) Novel recognition method of neural network patterns
CN104268629A (en) Complex network community detecting method based on prior information and network inherent information
Matić et al. Oscillating Adriatic temperature and salinity regimes mapped using the Self-Organizing Maps method
Wang et al. A new approach of obtaining reservoir operation rules: Artificial immune recognition system
Liu et al. Water bloom warning model based on random forest
CN116245019A (en) Load prediction method, system, device and storage medium based on Bagging sampling and improved random forest algorithm
CN110909949A (en) Near-shore sea area chlorophyll a concentration prediction method based on clustering-regression algorithm
Ramik et al. Fuzzy mathematical programming: a unified approach based on fuzzy relations
CN113344130A (en) Method and device for generating differentiated river patrol strategy
CN107492129A (en) Non-convex compressed sensing optimal reconfiguration method with structuring cluster is represented based on sketch
CN113887119A (en) River water quality prediction method based on SARIMA-LSTM
Cao et al. Adopting improved Adam optimizer to train dendritic neuron model for water quality prediction
CN104504719A (en) Image edge detection method and equipment
Cao et al. Discovery of predictive rule sets for chlorophyll-a dynamics in the Nakdong River (Korea) by means of the hybrid evolutionary algorithm HEA
Feng Generalized rough fuzzy sets based on soft sets
CN103745079A (en) Curve fitting method based on abstract convex estimations
CN106888112A (en) A kind of node migration network blocks optimization method based on quick splitting algorithm
CN103218543B (en) A kind of method and system distinguishing protein coding gene and Noncoding gene
KR102432212B1 (en) Methods for quantification and evaluation of water quality using data-mining
Yalin et al. Dissolved oxygen prediction model which based on fuzzy neural network
Wang et al. Geographic distribution of bacterial communities of inland waters in China

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171027
