CN113505818A - Aluminum melting furnace energy consumption abnormity diagnosis method, system and equipment with improved decision tree algorithm - Google Patents

Publication number
CN113505818A
Authority
CN
China
Prior art keywords
decision tree
energy consumption
data set
melting furnace
tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110675134.3A
Other languages
Chinese (zh)
Inventor
杨海东
朱成就
徐康康
印四华
周俊霖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN202110675134.3A
Publication of CN113505818A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Abstract

The invention belongs to the technical field of energy consumption data diagnosis for aluminum melting furnaces, and in particular relates to a method, system and device for diagnosing abnormal energy consumption of an aluminum melting furnace based on an improved decision tree algorithm. The improved decision tree algorithm classifies the energy consumption data of the aluminum melting furnace efficiently and thereby diagnoses the data. In addition, having analyzed the characteristics of the decision tree algorithm, the method builds the decision tree on the basis of the Fayyad boundary theorem, which reduces the time spent traversing nodes during tree construction, and improves tree-building efficiency by pruning the Fayyad-CART tree with CCP post-pruning.

Description

Aluminum melting furnace energy consumption abnormity diagnosis method, system and equipment with improved decision tree algorithm
Technical Field
The invention belongs to the technical field of energy consumption data diagnosis for aluminum melting furnaces, and in particular relates to a method, system, device and storage medium for diagnosing abnormal energy consumption of an aluminum melting furnace based on an improved decision tree algorithm.
Background
In aluminum profile production, the smelting stage consumes the most energy, and abnormal energy losses during production are one of the main causes of energy waste in that stage. To achieve sustainable development and low-carbon, green production, enterprises therefore need to monitor the energy consumption of high-energy-consumption equipment (the aluminum melting furnace), so as to reduce energy consumption during production and protect the profitability of aluminum profile manufacturers.
Data generated by an aluminum melting furnace during normal production and under different failure modes have different characteristics. In actual production, the type and cause of a fault are determined from workers' experience, and corresponding measures are then taken. This approach is extremely inefficient; when a fault is handled improperly, energy consumption keeps rising and serious safety accidents may even occur. The traditional anomaly detection approach also increases labor costs for operation, maintenance and overhaul units. Research on a data-mining-based system for diagnosing abnormal energy consumption of aluminum melting furnaces is therefore of practical importance for production.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a method, system, device and storage medium for diagnosing abnormal energy consumption of an aluminum melting furnace based on an improved decision tree algorithm, so that the energy consumption data of the aluminum melting furnace can be classified efficiently and analyzed conveniently.
To solve the above technical problem, the invention adopts the following technical solution: a method for diagnosing abnormal energy consumption of an aluminum melting furnace based on an improved decision tree algorithm, comprising the following steps:
S1, data processing: collect an energy consumption data set of the aluminum melting furnace, and adopt standard UCI data sets; divide the data set into a training data set and a test data set;
S2, train on the training data set with the Fayyad-CART algorithm to generate an original decision tree;
S3, prune the original decision tree generated in step S2 with the CCP pruning method;
S4, test the accuracy of the pruned decision tree with the test data set; if it passes the accuracy test, the pruned decision tree from step S3 is the final usable decision tree; if not, return to step S1 and rebuild the decision tree;
S5, real-time data classification: acquire real-time energy consumption data of the aluminum melting furnace, extract the classification attributes, and classify the real-time data with the usable decision tree obtained in step S4, thereby completing the diagnosis of the energy consumption data of the aluminum melting furnace.
Further, the Fayyad-CART algorithm specifically comprises the following steps:
S21, preprocess the data set: sort each continuous attribute in ascending order of value, and leave discrete attributes unchanged;
S22, compute the optimal split point of each discrete attribute in the conventional Gini-value manner; for each continuous attribute, compute its boundary points according to the Fayyad boundary theorem and determine the optimal split point among them; compare the minimum Gini values of all attributes and select the attribute with the smallest one as the splitting attribute;
S23, take the splitting attribute obtained in step S22 as the root node and split the node at the optimal split point to form a left branch and a right branch; if the termination condition is not met, recurse on step S22 to generate new subtrees; otherwise, construction of the CART tree is complete.
Further, for a sample D, the Gini value is calculated using the following formula:
$$\mathrm{Gini}(D) = \sum_{k=1}^{N} \sum_{k' \neq k} p_k \, p_{k'} = 1 - \sum_{k=1}^{N} p_k^{2}$$
where N is the number of classes, $p_k$ is the probability of the k-th class, and $p_{k'}$ is the probability of the k'-th class, with k' ≠ k.
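As a minimal sketch of the Gini value, assuming the standard form $1 - \sum_k p_k^2$ (equivalently the double sum of $p_k p_{k'}$ over k' ≠ k) and estimating each $p_k$ as a class frequency, the computation might look as follows. The function names and the sample labels are illustrative and do not come from the patent.

```python
from collections import Counter

def gini(labels):
    """Gini value of a sample: 1 - sum_k p_k^2, with p_k estimated as class frequency."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_pairwise(labels):
    """The same quantity written literally as the double sum over k' != k."""
    n = len(labels)
    probs = [c / n for c in Counter(labels).values()]
    return sum(p * q for i, p in enumerate(probs) for j, q in enumerate(probs) if i != j)

sample = ["normal", "normal", "abnormal", "abnormal"]  # hypothetical labels
print(gini(sample))           # 0.5
print(gini_pairwise(sample))  # 0.5
```

A pure sample gives 0, and a sample spread evenly over two classes gives 0.5, the maximum for two classes.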
Furthermore, the Fayyad boundary theorem states that, regardless of the number of records, attributes or classes in a given training data set, the optimal split point of a continuous attribute always lies on a boundary point between different classes. Based on the Fayyad boundary theorem, computing the optimal split of a continuous attribute in the training data set only requires sorting the values of that attribute: after the training set D is sorted in ascending order of the continuous attribute S, if two adjacent records $D_1$, $D_2$ belong to two different classes and satisfy $S(D_1) < S(D_2)$, the boundary point is taken to be
$$\frac{S(D_1) + S(D_2)}{2}$$
Only the boundary points between different classes need to be compared when selecting the optimal split point.
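The boundary-point rule can be sketched as below. This is an illustration under two assumptions that are not stated in the patent text: the boundary point is taken as the midpoint of the two adjacent attribute values, and records with tied attribute values are simply skipped. The function name and the temperature-like values are hypothetical.

```python
def boundary_points(values, labels):
    """Candidate thresholds for one continuous attribute.

    Records are sorted by value; a midpoint is emitted only between adjacent
    records that have different class labels and strictly increasing values.
    """
    records = sorted(zip(values, labels), key=lambda r: r[0])
    points = []
    for (v1, c1), (v2, c2) in zip(records, records[1:]):
        if c1 != c2 and v1 < v2:
            points.append((v1 + v2) / 2)
    return points

values = [700, 710, 720, 730, 740, 750]  # hypothetical attribute values
labels = ["normal", "normal", "abnormal", "abnormal", "normal", "normal"]
print(boundary_points(values, labels))  # [715.0, 735.0]
```

Six records yield five possible midpoints, but only the two at which the class changes remain as candidates.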
Further, step S3 specifically comprises:
S31, input the decision tree $T_0$ and initialize the average error growth rate δ to infinity;
S32, from bottom to top, compute for each node t the error $R(T_t)$, the number of nodes $|N_{T_t}|$ of its subtree, and the current error growth rate
$$g(t) = \frac{R(t) - R(T_t)}{|N_{T_t}| - 1}$$
then compare δ with g(t) and update δ to the smaller of the two;
S33, traverse the internal nodes of the decision tree from top to bottom, and if g(t) = δ, prune the current node to obtain a new tree T;
S34, determine whether T consists of the root node only; if not, return to step S32; otherwise, go to step S35;
S35, select the optimal subtree from the subtree sequence by cross-validation.
Further, let the subtree rooted at node t be denoted $T_t$; the average growth rate of the error rate at each node can then be expressed as:
$$\delta = \frac{R(t) - R(T_t)}{|N_{T_t}| - 1}$$
where $R(t)$ is the error after the subtree at node t is pruned, $R(T_t)$ is the error before the subtree is pruned, and $|N_{T_t}|$ is the number of nodes of the subtree. In each pruning operation, the subtree with the smallest δ value is pruned.
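A small numeric sketch of this growth rate follows. It assumes the usual cost-complexity form, (R(t) − R(T_t)) / (|N_Tt| − 1), with |N_Tt| counted as the number of leaf nodes of the subtree; that counting convention is an assumption, since the text only says "number of nodes". The node names and error values are hypothetical.

```python
def error_growth_rate(r_pruned, r_subtree, n_leaves):
    """(R(t) - R(T_t)) / (|N_Tt| - 1) for one internal node t."""
    if n_leaves < 2:
        raise ValueError("the subtree of an internal node has at least two leaves")
    return (r_pruned - r_subtree) / (n_leaves - 1)

# Hypothetical internal nodes: (error if collapsed, error of subtree, leaf count).
candidates = {
    "t1": error_growth_rate(0.10, 0.04, 4),  # about 0.02
    "t2": error_growth_rate(0.05, 0.04, 2),  # about 0.01
}
weakest = min(candidates, key=candidates.get)
print(weakest)  # t2
```

Node `t2` adds the least error per removed leaf, so it would be pruned first.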
The invention also provides a system for diagnosing abnormal energy consumption of an aluminum melting furnace based on the improved decision tree algorithm, comprising the following modules:
a data processing module, configured to collect an energy consumption data set of the aluminum melting furnace, adopt standard UCI data sets, and divide the data set into a training data set and a test data set;
a decision tree classification module, configured to train on the training data set with the Fayyad-CART algorithm to generate an original decision tree;
a CCP pruning module, configured to prune the original decision tree generated by the decision tree classification module with the CCP pruning method;
a judging module, configured to test the accuracy of the pruned decision tree with the test data set; if it passes the accuracy test, the decision tree generated by the CCP pruning module is the final usable decision tree; if not, control returns to the data processing module and the decision tree is rebuilt;
a real-time data classification module, configured to acquire real-time energy consumption data of the aluminum melting furnace, extract the classification attributes, and classify the real-time data with the usable decision tree generated by the CCP pruning module, thereby completing the diagnosis of the energy consumption data of the aluminum melting furnace.
Further, the decision tree classification module includes:
a preprocessing unit, configured to preprocess the data set, sorting each continuous attribute in ascending order of value and leaving discrete attributes unchanged;
a split point determination unit, configured to compute the optimal split point of each discrete attribute in the conventional Gini-value manner, compute the boundary points of each continuous attribute according to the Fayyad boundary theorem and determine the optimal split point among them, and compare the minimum Gini values of all attributes to select the attribute with the smallest one as the splitting attribute;
a branching unit, configured to take the splitting attribute obtained by the split point determination unit as the root node and split the node at the optimal split point to form a left branch and a right branch; if the termination condition is not met, control returns to the split point determination unit to generate new subtrees; otherwise, construction of the CART tree is complete.
The invention also provides a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of the method described above when executing the computer program.
The invention also provides a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method described above.
Compared with the prior art, the beneficial effects are as follows: the invention provides a method, system, device and storage medium for diagnosing abnormal energy consumption of an aluminum melting furnace based on an improved decision tree algorithm, and the improved decision tree algorithm classifies the energy consumption data of the aluminum melting furnace efficiently and thereby diagnoses the data. In addition, having analyzed the characteristics of the decision tree algorithm, the method builds the decision tree on the basis of the Fayyad boundary theorem, which reduces the time spent traversing nodes during tree construction, and improves tree-building efficiency by pruning the Fayyad-CART tree with CCP post-pruning.
Drawings
FIG. 1 is a basic CART tree description diagram.
FIG. 2 is a schematic flow diagram of a basic CART tree algorithm.
FIG. 3 is a schematic diagram of a decision tree model building process according to the present invention.
FIG. 4 is a schematic diagram of a real-time data classification process according to the present invention.
FIG. 5 is a comparison of the classification accuracy of the algorithm in the embodiment.
FIG. 6 is a comparison diagram of the classification speed of the algorithm in the embodiment.
FIG. 7 is a diagram illustrating an embodiment of an energy consumption pattern recognition tree constructed by applying the improved decision tree algorithm provided by the present invention.
Detailed Description
The drawings are for illustration purposes only and are not to be construed as limiting the invention; for a better understanding of the present embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of the actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted. The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the invention.
As shown in FIGS. 3 and 4, a method for diagnosing abnormal energy consumption of an aluminum melting furnace based on an improved decision tree algorithm comprises the following steps:
S1, data processing: collect an energy consumption data set of the aluminum melting furnace, and adopt standard UCI data sets; divide the data set into a training data set and a test data set;
S2, train on the training data set with the Fayyad-CART algorithm to generate an original decision tree;
S3, prune the original decision tree generated in step S2 with the CCP pruning method;
S4, test the accuracy of the pruned decision tree with the test data set; if it passes the accuracy test, the pruned decision tree from step S3 is the final usable decision tree; if not, return to step S1 and rebuild the decision tree;
S5, real-time data classification: acquire real-time energy consumption data of the aluminum melting furnace, extract the classification attributes, and classify the real-time data with the usable decision tree obtained in step S4, thereby completing the diagnosis of the energy consumption data of the aluminum melting furnace.
To address the shortcomings of the basic CART decision tree, the invention proposes an improved CART decision tree model. Building on the basic CART tree, it uses Fayyad boundary point detection to select the optimal split threshold of continuous attributes, which reduces the time the algorithm spends repeatedly computing Gini coefficients; the decision tree is then pruned to prevent the high error rate caused by overfitting. The Fayyad boundary point optimization and the pruning method are described in detail below.
The Fayyad boundary theorem states that, regardless of the number of records, attributes or classes in a given training data set, the optimal split point of a continuous attribute must lie on a boundary point between different classes. Thus, for a continuous attribute, the optimal split is certain to be found at a boundary point.
Based on the Fayyad boundary theorem, computing the optimal split of a continuous attribute in the training data set only requires sorting the values of that attribute. For example, after the training set D is sorted in ascending order of the continuous attribute S, if two adjacent records $D_1$, $D_2$ belong to two different classes and satisfy $S(D_1) < S(D_2)$, the boundary point is taken to be
$$\frac{S(D_1) + S(D_2)}{2}$$
Only the boundary points between different classes need to be compared when selecting the optimal split point. For example, given a sorted data set $\{x_1, x_2, \ldots, x_{10}\}$ in which $\{x_1, x_2, \ldots, x_5\}$ belong to $class_1$ and the remaining $\{x_6, x_7, \ldots, x_{10}\}$ belong to $class_2$, a CART tree that uses the Fayyad boundary theorem no longer needs to evaluate every value; it only needs to consider the boundary point between $class_1$ and $class_2$:
$$\frac{x_5 + x_6}{2}$$
When the Fayyad boundary theorem is used to select the optimal split point of a continuous attribute, Gini values need to be computed only at the boundary points; compared with the basic CART decision tree algorithm, which must traverse the Gini values of all points, this greatly shortens the running time.
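The saving can be illustrated on a ten-record example with five records per class. The sketch below compares an exhaustive search over all midpoints with a search restricted to boundary points. It assumes midpoint thresholds and a size-weighted Gini value for a binary split; the function names and data are illustrative only.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def split_gini(records, threshold):
    """Size-weighted Gini value of splitting (value, label) records at threshold."""
    left = [c for v, c in records if v <= threshold]
    right = [c for v, c in records if v > threshold]
    n = len(records)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split(records, boundary_only):
    """Return (best threshold, its Gini value, number of thresholds evaluated)."""
    records = sorted(records, key=lambda r: r[0])
    candidates = []
    for (v1, c1), (v2, c2) in zip(records, records[1:]):
        # Exhaustive mode keeps every midpoint; boundary mode keeps class changes only.
        if v1 < v2 and (c1 != c2 or not boundary_only):
            candidates.append((v1 + v2) / 2)
    score, threshold = min((split_gini(records, t), t) for t in candidates)
    return threshold, score, len(candidates)

data = [(x, "class1") for x in range(1, 6)] + [(x, "class2") for x in range(6, 11)]
print(best_split(data, boundary_only=False))  # (5.5, 0.0, 9)
print(best_split(data, boundary_only=True))   # (5.5, 0.0, 1)
```

Both searches return the same threshold, but the boundary-only search evaluates one candidate instead of nine.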
The Fayyad-CART algorithm, designed on the basis of the Fayyad boundary theorem, specifically comprises the following steps:
S21, preprocess the data set: sort each continuous attribute in ascending order of value, and leave discrete attributes unchanged;
S22, compute the optimal split point of each discrete attribute in the conventional Gini-value manner; for each continuous attribute, compute its boundary points according to the Fayyad boundary theorem and determine the optimal split point among them; compare the minimum Gini values of all attributes and select the attribute with the smallest one as the splitting attribute;
S23, take the splitting attribute obtained in step S22 as the root node and split the node at the optimal split point to form a left branch and a right branch; if the termination condition is not met, recurse on step S22 to generate new subtrees; otherwise, construction of the CART tree is complete.
For a sample D, the Gini value is calculated using the following formula:
$$\mathrm{Gini}(D) = \sum_{k=1}^{N} \sum_{k' \neq k} p_k \, p_{k'} = 1 - \sum_{k=1}^{N} p_k^{2}$$
where N is the number of classes, $p_k$ is the probability of the k-th class, and $p_{k'}$ is the probability of the k'-th class, with k' ≠ k.
If the sample D is divided into two parts, $D_1$ and $D_2$, according to the value of feature A, the Gini coefficient is expressed as:
$$\mathrm{Gini}(D, A) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$
where $|D_1|$ is the number of records in the subset $D_1$ and $|D_2|$ is the number of records in the subset $D_2$.
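Steps S21 to S23 can be sketched as a small recursive builder. This is a simplified illustration, not the patented implementation: it handles continuous attributes only, takes boundary points as midpoints, uses the size-weighted Gini value of a binary split, and stops on pure nodes or a hypothetical `max_depth` parameter. All names and the (hour, gas consumption) rows are invented for the example.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def boundary_points(rows, labels, attr):
    """S21/S22: sort by one continuous attribute, keep midpoints where the class changes."""
    recs = sorted(zip((r[attr] for r in rows), labels), key=lambda p: p[0])
    return [(v1 + v2) / 2 for (v1, c1), (v2, c2) in zip(recs, recs[1:])
            if c1 != c2 and v1 < v2]

def best_split(rows, labels):
    """S22: smallest weighted Gini value over all attributes and their boundary points."""
    best = None  # (Gini value, attribute index, threshold)
    n = len(rows)
    for attr in range(len(rows[0])):
        for t in boundary_points(rows, labels, attr):
            left = [c for r, c in zip(rows, labels) if r[attr] <= t]
            right = [c for r, c in zip(rows, labels) if r[attr] > t]
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[0]:
                best = (score, attr, t)
    return best

def build_tree(rows, labels, max_depth=5):
    """S23: split recursively until a node is pure, unsplittable, or max_depth is reached."""
    majority = Counter(labels).most_common(1)[0][0]
    split = best_split(rows, labels) if max_depth > 0 and len(set(labels)) > 1 else None
    if split is None:
        return {"leaf": majority}
    _, attr, t = split
    left = [(r, c) for r, c in zip(rows, labels) if r[attr] <= t]
    right = [(r, c) for r, c in zip(rows, labels) if r[attr] > t]
    return {
        "attr": attr, "threshold": t,
        "left": build_tree([r for r, _ in left], [c for _, c in left], max_depth - 1),
        "right": build_tree([r for r, _ in right], [c for _, c in right], max_depth - 1),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["attr"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]

# Hypothetical (hour of day, gas consumption) records with made-up pattern labels.
rows = [(2, 30.0), (4, 32.0), (10, 80.0), (12, 85.0), (20, 55.0), (22, 58.0)]
labels = ["low", "low", "high", "high", "medium", "medium"]
tree = build_tree(rows, labels)
print(predict(tree, (11, 82.0)))  # high
```

Because every threshold lies strictly between two distinct attribute values, both branches of a split are non-empty and the recursion terminates.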
The CART decision tree based on the Fayyad boundary theorem improves the running efficiency of the algorithm but does not improve its accuracy. A CART decision tree algorithm improved with a pruning method is therefore proposed to address this. When CART generates a tree, noise is generally not taken into account and the training data set is usually fitted completely; a completely fitted model tends to overfit in engineering practice, which makes it hard for the generated tree to classify actual data correctly. Pruning reduces the effect of noise on the CART tree, and the simplified tree is also easier to understand.
CCP pruning of the CART decision tree consists of two main steps. First, a subtree sequence $\{T_0, T_1, \ldots, T_n\}$ is generated from the original tree according to heuristic rules, where $T_n$ consists of the root node only and $T_{i+1}$ is generated from $T_i$. Second, the optimal decision tree is selected from this subtree sequence according to an estimate of each tree's error rate.
Let the subtree rooted at node t be denoted $T_t$; the average growth rate of the error rate at each node can then be expressed as:
$$\delta = \frac{R(t) - R(T_t)}{|N_{T_t}| - 1}$$
where $R(t)$ is the error after the subtree at node t is pruned, $R(T_t)$ is the error before the subtree is pruned, and $|N_{T_t}|$ is the number of nodes of the subtree. In each pruning operation, the subtree with the smallest δ value is pruned.
Specifically, step S3 includes:
S31, input the decision tree $T_0$ and initialize the average error growth rate δ to infinity;
S32, from bottom to top, compute for each node t the error $R(T_t)$, the number of nodes $|N_{T_t}|$ of its subtree, and the current error growth rate
$$g(t) = \frac{R(t) - R(T_t)}{|N_{T_t}| - 1}$$
then compare δ with g(t) and update δ to the smaller of the two;
S33, traverse the internal nodes of the decision tree from top to bottom, and if g(t) = δ, prune the current node to obtain a new tree T;
S34, determine whether T consists of the root node only; if not, return to step S32; otherwise, go to step S35;
S35, select the optimal subtree from the subtree sequence by cross-validation.
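Steps S31 to S34 can be sketched as a loop that repeatedly removes the weakest link until only the root remains. The sketch assumes g(t) = (R(t) − R(T_t)) / (|N_Tt| − 1) with |N_Tt| counted in leaves, represents a tree as nested dictionaries with a per-node `error` field, and leaves out the cross-validation selection of S35. The tree and its error values are hypothetical.

```python
import copy

def leaves_and_error(node):
    """Return (number of leaves, R(T_t)) for the subtree rooted at node."""
    if not node.get("children"):
        return 1, node["error"]
    stats = [leaves_and_error(child) for child in node["children"]]
    return sum(s[0] for s in stats), sum(s[1] for s in stats)

def growth_rate(node):
    """g(t) for an internal node, using the leaf count of its subtree."""
    n_leaves, subtree_error = leaves_and_error(node)
    return (node["error"] - subtree_error) / (n_leaves - 1)

def internal_nodes(node):
    if node.get("children"):
        yield node
        for child in node["children"]:
            yield from internal_nodes(child)

def prune_weakest(node, delta, tol=1e-12):
    """S33: top-down, collapse every internal node whose g(t) equals delta."""
    if not node.get("children"):
        return
    if abs(growth_rate(node) - delta) <= tol:
        node["children"] = []
        return
    for child in node["children"]:
        prune_weakest(child, delta, tol)

def ccp_sequence(tree):
    """S31-S34: list of (delta, tree) pairs from T0 down to the root-only tree."""
    current = copy.deepcopy(tree)
    sequence = [(0.0, copy.deepcopy(current))]
    while current.get("children"):                                     # S34
        delta = min(growth_rate(t) for t in internal_nodes(current))   # S32
        prune_weakest(current, delta)                                  # S33
        sequence.append((delta, copy.deepcopy(current)))
    return sequence

# Hypothetical tree; "error" is R(t), the error if that node were made a leaf.
t0 = {"error": 0.40, "children": [
    {"error": 0.10, "children": [{"error": 0.04}, {"error": 0.05}]},
    {"error": 0.20, "children": [{"error": 0.02}, {"error": 0.03}]},
]}
seq = ccp_sequence(t0)
print([leaves_and_error(t)[0] for _, t in seq])  # [4, 3, 1]
```

Taking the minimum over all internal nodes plays the role of initializing δ to infinity and updating it in S32. The resulting sequence is what S35 would then score by cross-validation.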
Examples
1. Experimental Environment
The experimental environment is an Intel(R) Core(TM) i7-9750H 2.60 GHz processor with 32 GB of memory; model computation uses Jupyter Notebook from Anaconda (4.8.3).
2. Experimental data
The data comprise 8 standard data sets from UCI; their sample distributions are shown in Table 1. The experiments verify the feasibility of the improved CART algorithm by comparing the basic CART algorithm, the C4.5 algorithm and the improved CART algorithm.
Table 1 UCI standard data set description
[Table 1 appears only as an image in the original publication; its contents are not reproduced here.]
In the experiment, 80% of each data set was used for training and the remaining 20% for validating the results. The results are shown in FIGS. 5 and 6.
As can be seen from FIGS. 5 and 6, because the improved CART algorithm uses the Fayyad boundary theorem, it does not need to repeatedly compute Gini values for all nodes when processing continuous attributes; it processes only the boundary points, which reduces the amount of computation and gives decision tree generation stable and efficient running performance. In terms of accuracy, although all the classification algorithms perform worse on large, multi-attribute, complex data sets, the classification accuracy of the improved CART decision tree remains above 83%. Compared with the traditional C4.5 classification algorithm and the basic CART algorithm, the improved CART decision tree algorithm therefore has higher accuracy and runs faster. The invention uses the improved CART decision tree algorithm to judge and analyze the patterns of the clustered energy consumption data.
3. Energy consumption abnormal case analysis based on real-time monitoring system
To formulate clear rules for determining the energy consumption pattern, the gas data in the historical data set $X_{1-2}$ are taken as an example. Two data features, time point and date type, are added to the original data, where the date type has two values, factory working day and non-working day. The energy consumption pattern actually identified is also added to the original data as a feature. Part of the energy consumption data after adding these three features is shown in Table 2.
Table 2 Energy consumption data with added data features
[Table 2 appears only as an image in the original publication; its contents are not reproduced here.]
FIG. 7 shows the energy consumption pattern decision model constructed by applying the improved decision tree algorithm to the above energy consumption data.
As can be seen from FIG. 7, training on the historical data $X_{1-2}$ yields 3 energy consumption patterns, divided along the time dimension and applicable to the production of aluminum profiles of the current batch and brand. To verify the classification accuracy of the improved decision tree, ten-fold cross-validation was used: the data set was randomly divided into ten parts, nine of which served as the training set and one as the test set, and the resulting error rate was 8.1%. In conclusion, determining the aluminum melting furnace energy consumption pattern with the improved decision tree algorithm is effective and feasible. Subsequently, for real-time production data of aluminum profiles of the same batch and brand in the same environment, the energy consumption pattern to which the data belong can be identified accurately with the energy consumption pattern decision tree constructed here.
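The ten-fold procedure can be sketched as follows. The sketch assumes the conventional form in which each of the ten parts serves once as the test set and the ten error rates are averaged; the text does not say whether the experiment rotated through all ten parts. The helper names, the majority-class stand-in model and the data are hypothetical, and the data set needs at least k records.

```python
import random
from collections import Counter

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the record indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validated_error(rows, labels, train_fn, predict_fn, k=10, seed=0):
    """Average error rate over k rounds, each holding out one fold for testing."""
    errors = []
    for test_idx in k_fold_indices(len(rows), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        model = train_fn([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        wrong = sum(predict_fn(model, rows[i]) != labels[i] for i in test_idx)
        errors.append(wrong / len(test_idx))
    return sum(errors) / len(errors)

# Stand-in model: always predict the majority class of the training fold.
def train_majority(train_rows, train_labels):
    return Counter(train_labels).most_common(1)[0][0]

def predict_majority(model, row):
    return model

rows = list(range(20))                          # hypothetical records
labels = ["normal"] * 15 + ["abnormal"] * 5     # hypothetical labels
print(cross_validated_error(rows, labels, train_majority, predict_majority))  # 0.25
```

A real run would pass the tree-building and prediction functions in place of the majority-class stand-in.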
To verify the overall feasibility of the proposed anomaly diagnosis model, the experiment compares it with the enterprise's original Energy Management System (EMS). Because relatively little abnormal data is collected on site, the experimental data are the data $X_3$ collected on March 2, 2020, and some abnormal data were constructed on the basis of this data set to check the correctness of the proposed model. Twenty groups of anomalies (excluding the original abnormal data) were artificially created for each of the three patterns, giving new data sets $X_{3-1}$, $X_{3-2}$, $X_{3-3}$. The clustering and classification models of the proposed algorithm are those designed in section 4.3, and the algorithm model runs as a called service. The $X_{1-2}$ data set is input as the historical data set to train the proposed model, and the $X_{3-1}$, $X_{3-2}$, $X_{3-3}$ data sets are fed in as simulated real-time energy consumption data. The anomalies detected by the traditional EMS system (traditional) and by the dynamic model proposed here (new) are shown in FIG. 8.
As can be seen from FIG. 8, the traditional EMS system reports far more detections than the dynamic model proposed here; it raises too many abnormality alarms and its detection accuracy is low. In the traditional EMS system the threshold range is set too large and relies too heavily on manual experience, so data with fuzzy boundaries cannot be handled well, and the excessive alarms waste considerable resources. The dynamic model has high detection accuracy: it is combined with the traditional EMS system and retains the original system's idea of manually set thresholds, but with a smaller threshold range, and data on the fuzzy boundary are analyzed by the algorithm in the quantitative analysis model, so the detection results are more reasonable.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
It should be understood that the above examples are given only to illustrate the invention clearly and are not intended to limit its embodiments. Other variations and modifications will be apparent to those skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principle of the invention shall fall within the protection scope of the claims of the invention.

Claims (10)

1. A method for diagnosing abnormal energy consumption of an aluminum melting furnace based on an improved decision tree algorithm, characterized by comprising the following steps:
S1, data processing: collect an energy consumption data set of the aluminum melting furnace, and divide the data into a training data set and a test data set;
S2, train on the training data set with the Fayyad-CART algorithm to generate an original decision tree;
S3, prune the original decision tree generated in step S2 with the CCP pruning method;
S4, test the accuracy of the pruned decision tree with the test data set; if it passes the accuracy test, the pruned decision tree from step S3 is the final usable decision tree; if not, return to step S1 and rebuild the decision tree;
S5, real-time data classification: acquire real-time energy consumption data of the aluminum melting furnace, extract the classification attributes, and classify the real-time data with the usable decision tree obtained in step S4, thereby completing the diagnosis of the energy consumption data of the aluminum melting furnace.
2. The method for diagnosing energy consumption abnormity of an aluminum melting furnace using an improved decision tree algorithm as claimed in claim 1, wherein the Fayyad-CART algorithm specifically comprises the following steps:
S21, preprocessing the data set: sorting each continuous attribute in ascending order of value, while leaving discrete attributes unchanged;
S22, calculating the optimal split point of each discrete attribute using the conventional Gini value calculation; calculating the boundary points of each continuous attribute according to the Fayyad boundary theorem and determining its optimal split point; and comparing the minimum Gini values of all attributes, the attribute with the smallest Gini value being selected as the splitting attribute;
S23, taking the splitting attribute obtained in step S22 as the root node and its optimal split point as the split of that node, forming a left branch and a right branch; if the termination condition has not been reached, recursively repeating step S22 to generate new subtrees; otherwise, the construction of the CART tree is complete.
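Steps S21 to S23 can be sketched for continuous attributes as below. The sketch makes several assumptions that the claim does not state: the boundary point is taken as the midpoint of two adjacent attribute values, a split is scored by the size-weighted Gini value of its two branches, recursion ends when no boundary point remains or a node holds fewer than `min_size` records, and discrete attributes are left out. All function names and the sample data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini value 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """S21-S22: lowest weighted Gini over boundary-point candidates only."""
    best = None                                   # (score, attribute, cut)
    for a in range(len(rows[0])):
        classes_at = {}                           # attribute value -> classes seen
        for row, y in zip(rows, labels):
            classes_at.setdefault(row[a], set()).add(y)
        values = sorted(classes_at)               # S21: ascending order
        for v1, v2 in zip(values, values[1:]):
            if len(classes_at[v1] | classes_at[v2]) < 2:
                continue                          # same single class on both sides
            cut = (v1 + v2) / 2                   # assumed midpoint
            left = [y for row, y in zip(rows, labels) if row[a] <= cut]
            right = [y for row, y in zip(rows, labels) if row[a] > cut]
            score = (len(left) * gini(left)
                     + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, a, cut)
    return best


def build(rows, labels, min_size=2):
    """S23: recurse until no boundary point remains or the node is too small."""
    split = best_split(rows, labels) if len(labels) >= min_size else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority class
    _, a, cut = split
    go_left = [row[a] <= cut for row in rows]
    return {
        "attr": a, "cut": cut,
        "left": build([r for r, f in zip(rows, go_left) if f],
                      [y for y, f in zip(labels, go_left) if f], min_size),
        "right": build([r for r, f in zip(rows, go_left) if not f],
                       [y for y, f in zip(labels, go_left) if not f], min_size),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["attr"]] <= tree["cut"] else tree["right"]
    return tree


# Made-up records: two continuous attributes; labels are illustrative.
rows = [[520.0, 3.1], [535.0, 2.9], [548.0, 3.3],
        [690.0, 3.0], [705.0, 3.2], [731.0, 2.8]]
labels = ["normal"] * 3 + ["abnormal"] * 3
tree = build(rows, labels)
print(tree)
```

On this data the first attribute has a single boundary point, so only one candidate cut is scored for it, while the second attribute alternates classes and contributes five candidates.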
3. The method of claim 2, wherein for a sample set D, the Gini value is calculated by the following formula:
$$\mathrm{Gini}(D) = \sum_{k=1}^{N} \sum_{k' \neq k} p_k \, p_{k'} = 1 - \sum_{k=1}^{N} p_k^2$$
where N denotes the number of classes, p_k denotes the probability of the k-th class, and p_{k'} denotes the probability of the k'-th class, with k' ≠ k.
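A short numeric check of the Gini value may help. The sketch assumes the standard Gini definition, in which the sum of p_k times p_k' over all pairs with k' ≠ k equals 1 minus the sum of the squared class probabilities. The function name and the sample labels are illustrative only.

```python
from collections import Counter


def gini_value(labels):
    """Gini value of a list of class labels, computed in both equivalent forms."""
    n = len(labels)
    probs = [count / n for count in Counter(labels).values()]
    # Pairwise form: sum of p_k * p_k' over all ordered pairs with k' != k.
    pairwise = sum(p * q
                   for i, p in enumerate(probs)
                   for j, q in enumerate(probs) if i != j)
    # Closed form: 1 - sum of p_k squared.
    closed = 1.0 - sum(p * p for p in probs)
    assert abs(pairwise - closed) < 1e-12     # the two forms agree
    return closed


# Six normal and two abnormal records: p = 0.75 and 0.25.
sample = ["normal"] * 6 + ["abnormal"] * 2
print(gini_value(sample))                     # 1 - (0.5625 + 0.0625) = 0.375
```

A pure sample gives a Gini value of 0, and the value grows as the classes become more evenly mixed.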
4. The method for diagnosing energy consumption abnormity of an aluminum melting furnace using an improved decision tree algorithm as claimed in claim 2, wherein the Fayyad boundary theorem states that, regardless of the number of records, attributes, or classes in a given training data set, the optimal split point of a continuous attribute always lies on a boundary point between different classes; based on the Fayyad boundary theorem, computing the optimal split of a continuous attribute in the training data set only requires sorting the records by the value of that attribute; after the training set D is sorted in ascending order of the continuous attribute S, if two adjacent records D_1 and D_2 belong to two different classes and satisfy S(D_1) < S(D_2), the boundary point is taken as
$$\frac{S(D_1) + S(D_2)}{2}$$
and only the boundary points between different classes need to be compared when selecting the optimal split point.
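The saving from the boundary theorem can be illustrated by listing the candidate cuts for one continuous attribute. The sketch assumes the boundary point is the midpoint of two adjacent distinct attribute values, and it treats a gap as a boundary whenever the records on its two sides do not all share one class, which also covers repeated values. The function name and the readings are hypothetical.

```python
def boundary_points(values, labels):
    """Return midpoints between adjacent distinct values where the class changes."""
    classes_at = {}                               # value -> set of classes seen
    for v, y in zip(values, labels):
        classes_at.setdefault(v, set()).add(y)
    distinct = sorted(classes_at)                 # ascending order of the attribute
    return [(v1 + v2) / 2
            for v1, v2 in zip(distinct, distinct[1:])
            if len(classes_at[v1] | classes_at[v2]) > 1]


# Made-up readings of one continuous attribute, deliberately unsorted.
values = [610.0, 655.0, 640.0, 700.0, 690.0, 720.0, 730.0]
labels = ["normal", "normal", "normal", "abnormal", "abnormal", "normal", "normal"]
candidates = boundary_points(values, labels)
print(candidates)                                 # [672.5, 710.0]
print(len(set(values)) - 1, "midpoints reduced to", len(candidates))
```

Seven distinct values give six possible midpoints, but only the two at which the class changes need a Gini evaluation.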
5. The method for diagnosing energy consumption abnormity of an aluminum melting furnace using an improved decision tree algorithm as claimed in claim 2, wherein step S3 specifically comprises:
S31, inputting the decision tree T_0 and initializing the average error growth rate δ to infinity;
S32, calculating, from bottom to top, for each node t: the error R(T_t), the number of nodes of its subtree |N_{T_t}|, and the current error growth rate
$$g(t) = \frac{R(t) - R(T_t)}{|N_{T_t}| - 1}$$
then comparing δ with g(t) and updating δ to the smaller of the two;
S33, traversing the internal nodes of the decision tree from top to bottom and, if g(t) = δ, pruning the current node to obtain a new tree T;
S34, determining whether T consists of the root node only; if not, returning to step S32; otherwise, proceeding to step S35;
S35, selecting the optimal subtree from the resulting subtree sequence by cross-validation.
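Steps S31 to S34 can be sketched on a small hand-built tree as follows. The sketch assumes the standard cost-complexity reading, in which the node count of a subtree is its number of leaf nodes and g(t) = (R(t) - R(T_t)) / (leaf count - 1). The dictionary layout (`err`, `kids`), the function names and the error values are hypothetical, and the cross-validation of S35 is not included.

```python
import copy
import math


def leaves(node):
    """Return the leaf nodes of the subtree rooted at `node`."""
    if "kids" not in node:
        return [node]
    return [leaf for kid in node["kids"] for leaf in leaves(kid)]


def internal_nodes(node):
    """Return internal nodes in top-down (pre-order) order."""
    if "kids" not in node:
        return []
    return [node] + [n for kid in node["kids"] for n in internal_nodes(kid)]


def g(node):
    """Error growth rate (R(t) - R(T_t)) / (number of leaves - 1)."""
    lv = leaves(node)
    return (node["err"] - sum(leaf["err"] for leaf in lv)) / (len(lv) - 1)


def ccp_sequence(tree):
    """Return [(delta, subtree), ...] from the full tree down to the root."""
    tree = copy.deepcopy(tree)
    sequence = [(0.0, copy.deepcopy(tree))]
    while "kids" in tree:                        # S34: stop at a root-only tree
        delta = math.inf                         # S31: start from infinity
        for node in internal_nodes(tree):        # S32: keep the smallest g(t)
            delta = min(delta, g(node))
        for node in internal_nodes(tree):        # S33: top-down pruning
            if "kids" in node and g(node) == delta:
                del node["kids"]                 # the node becomes a leaf
        sequence.append((delta, copy.deepcopy(tree)))
    return sequence


# "err" is R(t): the error the node would contribute if it were a leaf.
tree = {"err": 0.40, "kids": [
    {"err": 0.10, "kids": [{"err": 0.02}, {"err": 0.06}]},
    {"err": 0.10},
]}
seq = ccp_sequence(tree)
for delta, subtree in seq:
    print(round(delta, 4), len(leaves(subtree)), "leaves")
```

Each pass removes the weakest branch first, so the returned sequence runs from the full tree to the bare root; a selection step such as S35 would then choose among these candidates.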
6. The method of claim 5, wherein, denoting the subtree rooted at node t as T_t, the average error growth rate of each node is expressed as:
$$\delta = \frac{R(t) - R(T_t)}{|N_{T_t}| - 1}$$
where R(t) denotes the error at node t after its subtree is pruned, R(T_t) denotes the error before the subtree is pruned, and |N_{T_t}| denotes the number of nodes of the subtree; in each pruning operation, the subtree with the smallest δ value is pruned.
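As a worked illustration with made-up numbers (not taken from the source), consider two candidate subtrees. The node count is read here as the number of leaf nodes, written |N_{T_t}|, as in standard cost-complexity pruning; that reading is an assumption. Subtree A has R(t) = 0.25, R(T_t) = 0.10 and 4 leaves; subtree B has R(t) = 0.12, R(T_t) = 0.10 and 2 leaves.

```latex
\delta_A = \frac{R(t) - R(T_t)}{|N_{T_t}| - 1} = \frac{0.25 - 0.10}{4 - 1} = 0.05,
\qquad
\delta_B = \frac{0.12 - 0.10}{2 - 1} = 0.02
```

Since δ_B is smaller than δ_A, subtree B would be pruned first: removing it costs the least additional error per leaf removed.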
7. A system for diagnosing energy consumption abnormity of an aluminum melting furnace using an improved decision tree algorithm, characterized by comprising:
a data processing module, configured to collect an energy consumption data set of the aluminum melting furnace, adopt a standard UCI data set, and divide the data set into a training data set and a test data set;
a decision tree classification module, configured to train on the training data set according to the Fayyad-CART algorithm to generate an original decision tree;
a CCP pruning module, configured to prune the original decision tree generated by the decision tree classification module using the CCP pruning method;
a judging module, configured to test the accuracy of the pruned decision tree using the test data set; if the pruned decision tree passes the accuracy test, the decision tree produced by the CCP pruning module is the final usable decision tree; if not, control returns to the data processing module and the decision tree is rebuilt;
a real-time data classification module, configured to acquire real-time energy consumption data of the aluminum melting furnace, extract the classification attributes, and classify the real-time data using the usable decision tree produced by the CCP pruning module, thereby completing the diagnosis of the energy consumption data of the aluminum melting furnace.
8. The system for diagnosing energy consumption abnormity of an aluminum melting furnace using an improved decision tree algorithm as claimed in claim 7, wherein the decision tree classification module comprises:
a preprocessing unit, configured to preprocess the data set by sorting each continuous attribute in ascending order of value, while leaving discrete attributes unchanged;
a split point determination unit, configured to calculate the optimal split point of each discrete attribute using the conventional Gini value calculation, to calculate the boundary points of each continuous attribute according to the Fayyad boundary theorem and determine its optimal split point, and to compare the minimum Gini values of all attributes, the attribute with the smallest Gini value being selected as the splitting attribute;
a branching unit, configured to take the splitting attribute obtained by the split point determination unit as the root node and its optimal split point as the split of that node, forming a left branch and a right branch; if the termination condition has not been reached, control returns to the split point determination unit to generate new subtrees; otherwise, the construction of the CART tree is complete.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
CN202110675134.3A 2021-06-17 2021-06-17 Aluminum melting furnace energy consumption abnormity diagnosis method, system and equipment with improved decision tree algorithm Pending CN113505818A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110675134.3A CN113505818A (en) 2021-06-17 2021-06-17 Aluminum melting furnace energy consumption abnormity diagnosis method, system and equipment with improved decision tree algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110675134.3A CN113505818A (en) 2021-06-17 2021-06-17 Aluminum melting furnace energy consumption abnormity diagnosis method, system and equipment with improved decision tree algorithm

Publications (1)

Publication Number Publication Date
CN113505818A true CN113505818A (en) 2021-10-15

Family

ID=78010072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110675134.3A Pending CN113505818A (en) 2021-06-17 2021-06-17 Aluminum melting furnace energy consumption abnormity diagnosis method, system and equipment with improved decision tree algorithm

Country Status (1)

Country Link
CN (1) CN113505818A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102289585A (en) * 2011-08-15 2011-12-21 重庆大学 Real-time monitoring method for energy consumption of public building based on data mining
CN106250905A (en) * 2016-07-08 2016-12-21 复旦大学 A kind of real time energy consumption method for detecting abnormality of combination colleges and universities building structure feature
CN107301513A (en) * 2017-06-27 2017-10-27 上海应用技术大学 Bloom prealarming method and apparatus based on CART decision trees
CN111161095A (en) * 2019-12-16 2020-05-15 南京松数科技有限公司 Method for detecting abnormal consumption of building energy
CN111191712A (en) * 2019-12-27 2020-05-22 浙江工业大学 Printing and dyeing setting machine energy consumption classification prediction method based on gradient lifting decision tree
US20200183769A1 (en) * 2018-12-10 2020-06-11 Vmware, Inc. Methods and systems that detect and classify incidents and anomolous behavior using metric-data observations
US20200242493A1 (en) * 2019-01-30 2020-07-30 International Business Machines Corporation Operational energy consumption anomalies in intelligent energy consumption systems

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
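Liu Yunxiang et al.: "Establishing a water bloom early-warning model based on an improved CART decision tree", 《中国农村水利水电》 (China Rural Water and Hydropower), no. 1, 15 January 2018 (2018-01-15), pages 26 - 28 *
Zhang Liang et al.: "Two improvements of the CART decision tree and their application", 《计算机工程与设计》 (Computer Engineering and Design), vol. 36, no. 5, 16 May 2015 (2015-05-16), pages 1209 - 1213 *
Wang Le: "Research on an intelligent optimization control strategy for cell voltage in the aluminum electrolysis process based on data mining", 《中国优秀硕士学位论文全文数据库(工程科技Ⅰ辑)》 (China Master's Theses Full-text Database, Engineering Science and Technology I), no. 02, 15 February 2018 (2018-02-15), pages 023 - 120 *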

Similar Documents

Publication Publication Date Title
CN106483947A (en) Distribution Running State assessment based on big data and method for early warning
US20210041862A1 (en) Malfunction early-warning method for production logistics delivery equipment
CN108345544B (en) Software defect distribution influence factor analysis method based on complex network
CN107918830B (en) Power distribution network running state evaluation method based on big data technology
CN110569867A (en) Decision tree algorithm-based power transmission line fault reason distinguishing method, medium and equipment
CN109977535A (en) A kind of line loss abnormality diagnostic method, device, equipment and readable storage medium storing program for executing
CN110335168B (en) Method and system for optimizing power utilization information acquisition terminal fault prediction model based on GRU
CN105701596A (en) Method for lean distribution network emergency maintenance and management system based on big data technology
CN106204330A (en) A kind of power distribution network intelligent diagnosis system
CN110750524A (en) Method and system for determining fault characteristics of active power distribution network
CN110135716B (en) Power grid infrastructure project dynamic early warning identification method and system
CN110826237B (en) Wind power equipment reliability analysis method and device based on Bayesian belief network
CN115508672B (en) Power grid main equipment fault tracing reasoning method, system, equipment and medium
CN110570012A (en) Storm-based power plant production equipment fault early warning method and system
CN112529053A (en) Short-term prediction method and system for time sequence data in server
CN111695666A (en) Wind power ultra-short term conditional probability prediction method based on deep learning
CN114118588A (en) Peak-facing summer power failure prediction method based on game feature extraction under clustering undersampling
CN110781206A (en) Method for predicting whether electric energy meter in operation fails or not by learning meter-dismantling and returning failure characteristic rule
CN116432123A (en) Electric energy meter fault early warning method based on CART decision tree algorithm
CN114548494A (en) Visual cost data prediction intelligent analysis system
CN113030633B (en) GA-BP neural network-based power distribution network fault big data analysis method and system
CN113469252A (en) Extra-high voltage converter valve operation state evaluation method considering unbalanced samples
CN110348005B (en) Distribution network equipment state data processing method and device, computer equipment and medium
CN112819208A (en) Spatial similarity geological disaster prediction method based on feature subset coupling model
CN109635008B (en) Equipment fault detection method based on machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20211015)