CN117574180B - Fuel production and emission system data correlation control management system - Google Patents
- Publication number
- CN117574180B CN202410065598.6A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
- G06F18/2113—Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
The invention discloses a data correlation control management system for a fuel production and emission system, and relates in particular to the field of data analysis. A segmentation rule is formulated, a fluctuation change index and a boundary effect influence degree index are extracted from the data set corresponding to each index, and a feasible measurement coefficient is formed using a machine learning technique. This helps determine the optimal segmentation rule for correlation analysis of thermal power plant indexes and improves the accuracy and interpretability of the analysis. A segmentation rule judged to yield an available signal is then used to segment and rank the data set and to calculate its Spearman correlation, which improves analysis efficiency while preserving analysis precision. The new segmentation rule ensures that the data within each small segment have similar characteristics, so the relationships between different data subsets are captured more finely. Averaging the Spearman correlation over the small segments provides a more robust measure of association, reduces dependence on individual segments, and lowers the risk of misleading conclusions.
Description
Technical Field
The present invention relates to the field of data analysis, and more particularly, to a fuel production and emission system data correlation control management system.
Background
The production process of a power plant is essentially a process of continuous energy conversion, and its parameters are correlated with one another. Correlation analysis can discover relationships between attributes in large volumes of operating data. It is objective and comprehensive, handles complex systems with varied data structures and ambiguous information well, and is therefore suitable for analyzing power plant big data. Using correlation analysis to mine the association relations among the operating data and operating parameters of a power plant is of great significance for managing and operating the plant. With a deeper understanding of these association relations, management can grasp the internal rules of the production process more accurately, which helps optimize the operation strategy, improve energy conversion efficiency, and continuously improve system performance. Association analysis also helps find potential problems in advance, optimize maintenance plans, and reduce the risk of equipment failure. Mining power plant big data with Spearman correlation analysis is therefore important for improving operating efficiency, reducing cost, and ensuring the stability of power supply.
In practice, conventional correlation analysis methods have some potential limitations. Spearman correlation analysis is a non-parametric statistical method. It places no specific requirement on the distribution of the original variables, is insensitive to data errors and extreme values, is not directly influenced by the differences between actual values, and therefore has a wide range of application. However, the data must be sorted first, which is a relatively time-consuming operation, so computational efficiency is low when the data volume is large. For this reason, some prior art divides the data into smaller segments according to a formulated segmentation rule, calculates the ranking and the Spearman correlation of each segment separately, and then integrates the results. In many cases this shortens the computation time, but the computational accuracy is susceptible to interference from the following conditions:
If the data vary significantly over the whole range, segmenting the data may produce inaccurate ranking and correlation results in some segments. This is more pronounced when the data distribution is non-uniform.
When computing within each segment, the data at a segment boundary may be only partially taken into account, which may make the results at the boundary inaccurate. In particular, if the relationship between the variables varies widely across the data set, the uncertainty at the boundary may be greater.
In order to solve the above problems, a technical solution is now provided.
Disclosure of Invention
In order to overcome the defects in the prior art, the embodiment of the invention formulates a segmentation rule, extracts a fluctuation change index and a boundary effect influence degree index from the data set corresponding to each index, and forms a feasible measurement coefficient using a machine learning technique. This process helps determine the optimal segmentation rule for correlation analysis of thermal power plant indexes and improves the accuracy and interpretability of the analysis. A segmentation rule judged to yield an available signal is then used to segment and rank the data set and to calculate its Spearman correlation, which improves the efficiency and robustness of the correlation analysis while preserving analysis precision. The new segmentation rule ensures that the data within each small segment have similar characteristics, so the relationships between different data subsets are captured more finely. Averaging the Spearman correlation over the small segments provides a more robust measure of association, reduces dependence on individual segments, lowers the risk of misleading conclusions, and solves the problems presented in the background art.
In order to achieve the above purpose, the present invention provides the following technical solutions:
the system comprises: a data acquisition module, a change degree analysis module, a boundary influence analysis module, a comprehensive analysis module, a correlation analysis module and a result visualization module;
the data acquisition module counts each index, invokes the source data of each index, gathers the source data of each index into the analysis pool for storage, gathers the periodic data set corresponding to each index, and transmits the data set to the change degree analysis module and the boundary influence analysis module;
the change degree analysis module segments the data set, calculates the fluctuation and the data dispersion degree of the data set, obtains the fluctuation change index of the data set in the whole range, and sends the fluctuation change index to the comprehensive analysis module;
the boundary influence analysis module segments the data set, calculates the difference degree of the mean value inside and outside the boundary, analyzes and defines the average level difference between the data subsets under the boundary condition, obtains the boundary effect influence degree index, and sends the boundary effect influence degree index to the comprehensive analysis module;
the comprehensive analysis module collects output data of the change degree analysis module and the boundary influence analysis module, analyzes the model through a machine learning technology framework, outputs a feasible measurement coefficient, judges a segmentation rule according to the feasible measurement coefficient, and sends a judgment result to the data screening module;
the correlation analysis module determines a segmentation rule according to the judging result, divides the data set into smaller segments, calculates the ranking and the Spearman correlation of each segment respectively, integrates the results to obtain the association relation result of each index, and sends the association relation result of each index to the result visualization module;
the result visualization module is used for visualizing the output content of the correlation analysis module.
In a preferred embodiment, the operation of the data acquisition module comprises the following:
extracting corresponding data from data sources of fuel indexes, boiler operation indexes, emission indexes and power generation indexes, summarizing original data of each index according to time stamps, dividing the data according to unit time and cycle time, setting time segment identifiers, organizing the data of each index into continuous unit time data sets according to the same identifiers, and further merging the continuous unit time data sets into periodic data sets to obtain the data sets with comparability corresponding to each index.
In a preferred embodiment, the operation of the change degree analysis module includes the following:
step one, a data set is called, and the data set is divided according to a set segmentation rule;
step two, calculating the ratio of the difference between the maximum value and the minimum value in each segment to the minimum value to obtain the relative difference of each segment;
step three, taking the average value of the relative differences of all the segments to obtain the overall difference;
and step four, summarizing the relative differences of all the segments, calculating to obtain the standard deviation of the relative differences, and calculating the ratio of the standard deviation of the relative differences to the overall difference to obtain the fluctuation change index.
In a preferred embodiment, the operation of the boundary influence analysis module includes the following:
step one, a data set is called, and the data set is divided into a plurality of subsets according to a formulated segmentation rule to form a boundary;
step two, calculating the average value of the data in each data subset to obtain the average value inside the boundary and the average value outside the boundary, and calculating the ratio of the difference between the average value inside the boundary and the average value outside the boundary to the average value outside the boundary to obtain the mean value difference degree;
step three, taking an average value of the average value difference degrees to obtain a comprehensive average value difference degree;
and step four, summarizing the mean value difference degree of all the segments, calculating to obtain a mean value difference degree standard deviation, and then calculating the ratio of the mean value difference degree standard deviation to the comprehensive mean value difference degree to obtain a boundary effect influence degree index.
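The four steps above can be illustrated with a minimal Python sketch. It rests on several assumptions that the text does not state: segments are fixed-length slices, the "average value inside the boundary" of a segment is the mean of that segment, the "average value outside the boundary" is the mean of all data not in that segment, the difference is taken as an absolute value, and the standard deviation is the population standard deviation. The function name `boundary_effect_index` is hypothetical.

```python
from statistics import mean, pstdev


def boundary_effect_index(data, segment_size):
    """Sketch of the boundary effect influence degree index.

    Assumptions (not from the source): fixed-length segments, "inside" means
    the current segment, "outside" means all other data, absolute differences,
    and the population standard deviation.
    """
    # Step one: split the data set into subsets; the cut points form the boundaries.
    segments = [data[i:i + segment_size] for i in range(0, len(data), segment_size)]
    if len(segments) < 2:
        raise ValueError("at least two segments are needed to form a boundary")

    # Step two: mean value difference degree of each segment.
    degrees = []
    for k, seg in enumerate(segments):
        inside = mean(seg)
        outside_values = [v for j, s in enumerate(segments) if j != k for v in s]
        outside = mean(outside_values)
        if outside == 0:
            raise ValueError("mean outside the boundary is zero; ratio undefined")
        degrees.append(abs(inside - outside) / outside)

    # Step three: comprehensive mean value difference degree.
    comprehensive = mean(degrees)
    if comprehensive == 0:
        return 0.0  # assumption: identical segment means imply no boundary effect

    # Step four: standard deviation of the degrees relative to their mean.
    return pstdev(degrees) / comprehensive
```

Under these assumptions, a data set whose segment means are identical gives an index of 0, and the index grows as the mean levels of the segments diverge unevenly.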
In a preferred embodiment, the operation of the comprehensive analysis module comprises the following:
summarizing the fluctuation change index and the boundary effect influence degree index, and distributing corresponding weights for the fluctuation change index and the boundary effect influence degree index by using a machine learning technology; and combining the fluctuation change index and the boundary effect influence degree index with the corresponding weight values, and carrying out weighted summation to obtain a feasible measurement coefficient.
In a preferred embodiment, the feasible measurement coefficient is compared with the classification threshold value; if the feasible measurement coefficient is greater than or equal to the classification threshold value, an adjustment signal is generated; if the feasible measurement coefficient is smaller than the classification threshold value, an available signal is generated.
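The weighted summation and threshold comparison can be sketched as follows. This is a minimal illustration only: the weights are passed in as plain numbers, whereas the text obtains them with a machine learning technique that is not specified here, and the function names and the returned labels `"adjustment"` and `"available"` are hypothetical.

```python
def feasible_measurement_coefficient(fluctuation_index, boundary_index,
                                     w_fluctuation, w_boundary):
    """Weighted sum of the two indexes.

    The weights are assumed to be supplied by some previously trained model;
    how they are learned is not shown in this sketch.
    """
    return w_fluctuation * fluctuation_index + w_boundary * boundary_index


def judge_segmentation_rule(coefficient, threshold):
    """Return "adjustment" when the coefficient reaches the threshold,
    otherwise "available" (labels are illustrative)."""
    return "adjustment" if coefficient >= threshold else "available"
```

For example, with hypothetical weights of 0.5 each, indexes of 0.2 and 0.4 give a coefficient of about 0.3, which yields an available signal against a threshold of 0.5.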
In a preferred embodiment, the operation of the correlation analysis module comprises the following:
using the segmentation rule judged to yield an available signal, the data set is divided into smaller segments; within each small segment, the data are sorted by the value of each index and the original values are replaced by their ranks; the Spearman correlation between the indexes is then calculated within each segment to obtain the correlation coefficient of each small segment; and the Spearman correlation coefficients of all small segments are averaged to obtain the average association relation between the indexes.
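A minimal Python sketch of this segment-rank-average procedure for two indexes is given below. It assumes fixed-length segments, assigns tied values the mean of their ranks, computes the Spearman coefficient as the Pearson correlation of the ranks, and requires every segment to contain at least two points that are not all equal. All function names are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def segmented_spearman(x, y, segment_size):
    """Average of the per-segment Spearman coefficients of two indexes."""
    coefficients = [
        spearman(x[i:i + segment_size], y[i:i + segment_size])
        for i in range(0, len(x), segment_size)
    ]
    return mean(coefficients)
```

Sorting each short segment separately is cheaper than sorting the whole series, which is the efficiency argument made in the text; the price is that relationships spanning segment boundaries are not seen, which is why the segmentation rule is screened first.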
In a preferred embodiment, the operation of the result visualization module comprises the following:
based on the average association relation between the indexes, a heat map is used to display the correlation results between the fuel index, the boiler operation index, the emission index, and the power generation index.
The fuel production and emission system data relevance control management system has the technical effects and advantages that:
1. According to the specified segmentation rule, the fluctuation change index and the boundary effect influence degree index are extracted from the periodic data set corresponding to each index. For the fluctuation change index, the relative difference of the data inside each segment is calculated; the fluctuation of the data inside each segment is measured as the difference between the maximum value and the minimum value relative to the minimum value. The boundary effect influence degree index evaluates the change in average level by calculating the mean value difference degree of each data subset, that is, the ratio of the difference between the mean inside and outside the boundary to the mean outside the boundary. A machine learning technique is then used to assign weights to the fluctuation change index and the boundary effect influence degree index, forming a feasible measurement coefficient. Applying the feasible measurement coefficient to evaluate the effectiveness of a specific segmentation rule determines the optimal segmentation rule for efficient correlation analysis of the indexes of a thermal power plant. This improves the accuracy and interpretability of the analysis, so that the segmentation rule fits the actual data characteristics more closely and better meets the requirements of a thermal power plant.
2. Segmenting and ranking the data set and calculating its Spearman correlation with a segmentation rule judged to yield an available signal improves the efficiency of the correlation analysis while preserving its accuracy. First, the new segmentation rule ensures that the data within each small segment have similar characteristics and represent a relatively stable data subset. This makes it possible to capture the relationships between different data subsets more finely than a single correlation analysis over the entire data set. Second, ranking each small segment and calculating its Spearman correlation handles non-linear relationships and outliers more effectively, because Spearman correlation is not sensitive to the specific values of the data and depends only on their order. This reduces the influence of outliers on the correlation analysis results and improves the robustness of the analysis. Finally, averaging the Spearman correlation coefficients of the small segments provides a more robust measure of association. The average association relation is more representative because it integrates the correlations of the different small segments and reduces excessive dependence on any individual segment. This helps avoid misleading conclusions caused by the particular situation of a single small segment.
Drawings
FIG. 1 is a schematic diagram of the fuel production and emission system data correlation control management system according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
FIG. 1 illustrates the fuel production and emission system data correlation control management system of the present invention, comprising: a data acquisition module, a change degree analysis module, a boundary influence analysis module, a comprehensive analysis module, a correlation analysis module and a result visualization module;
the data acquisition module counts each index, invokes the source data of each index, gathers the source data of each index into the analysis pool for storage, gathers the data set corresponding to the periodicity of each index, and transmits the data set to the change degree analysis module and the boundary influence analysis module;
the change degree analysis module segments the data set, calculates the fluctuation and the data dispersion degree of the data set, obtains the fluctuation change index of the data set in the whole range, and sends the fluctuation change index to the comprehensive analysis module;
the boundary influence analysis module segments the data set, calculates the difference degree of the mean value inside and outside the boundary, analyzes and defines the average level difference between the data subsets under the boundary condition, obtains the boundary effect influence degree index, and sends the boundary effect influence degree index to the comprehensive analysis module;
the comprehensive analysis module collects output data of the change degree analysis module and the boundary influence analysis module, analyzes the model through a machine learning technology framework, outputs a feasible measurement coefficient, judges a segmentation rule according to the feasible measurement coefficient, and sends a judgment result to the data screening module;
the correlation analysis module determines a segmentation rule according to the judging result, divides the data set into smaller segments, calculates the ranking and the Spearman correlation of each segment respectively, integrates the results to obtain the association relation result of each index, and sends the association relation result of each index to the result visualization module;
the result visualization module is used for visualizing the output content of the correlation analysis module.
The operation process of the data acquisition module comprises the following steps:
Corresponding data are extracted from the data source of the fuel index, using the corresponding sensors, metering equipment, or other data acquisition devices. The collected data include key information such as fuel type, consumption, and proportion;
related data are extracted from the data source of the boiler operation index, including parameters such as boiler temperature, pressure, and combustion efficiency, which are acquired using the corresponding sensors;
emission indexes are obtained, including nitrogen oxide concentration, exhaust gas emission, and waste water emission. These are obtained from monitoring devices, sensors, or online monitoring systems. To ensure the timeliness and accuracy of the data, issues such as sampling frequency and data precision must be handled;
data are extracted from the data source of the power generation index, including information such as power generation amount and power generation efficiency. Different types of power generation equipment require specific data extraction methods to ensure the integrity and reliability of the data.
The original data of each index are summarized according to their time stamps, and the data are divided by unit time and cycle time. A time segment identifier is set, the data of each index are organized into continuous unit time data sets according to the same identifier, and the continuous unit time data sets are further merged into periodic data sets to obtain the comparable data set corresponding to each index. This process ensures the time consistency and comparability of the data, so that the data of different indexes can be effectively compared and analyzed within the same time period.
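The time-based grouping can be sketched in Python as follows. The sketch makes assumptions that go beyond the text: the unit time is taken to be one hour, the time segment identifier is the time stamp truncated to the hour, each unit is summarized by its mean, and a period is a fixed number of consecutive units. Missing hours are not filled in. The function name `build_period_datasets` and its parameters are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def build_period_datasets(records, units_per_period=24):
    """Group (timestamp, value) records into unit-time values, then periods.

    Assumptions (not from the source): one-hour units, mean aggregation per
    unit, and periods made of `units_per_period` consecutive units.
    """
    by_unit = defaultdict(list)
    for ts, value in records:
        # Time segment identifier: the time stamp truncated to the hour.
        unit_id = ts.replace(minute=0, second=0, microsecond=0)
        by_unit[unit_id].append(value)

    # Unit-time data set ordered by identifier, one mean value per unit.
    unit_series = [(uid, sum(vals) / len(vals)) for uid, vals in sorted(by_unit.items())]

    # Merge consecutive units into periodic data sets.
    return [
        unit_series[i:i + units_per_period]
        for i in range(0, len(unit_series), units_per_period)
    ]


# Example with made-up readings for a single index.
example_records = [
    (datetime(2024, 1, 1, 0, 10), 10.0),
    (datetime(2024, 1, 1, 0, 40), 12.0),
    (datetime(2024, 1, 1, 1, 5), 14.0),
    (datetime(2024, 1, 1, 2, 30), 16.0),
]
example_periods = build_period_datasets(example_records, units_per_period=2)
```

Applying the same identifiers to every index is what makes the resulting data sets comparable: values that share a unit identifier describe the same time window.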
The operation process of the change degree analysis module comprises the following steps:
When the data vary significantly over the whole range, segmenting the data may produce inaccurate ranking and correlation results in some segments, especially when the data distribution is non-uniform. A non-uniform distribution means that the density and distribution characteristics of the data may differ greatly between intervals. In this case, simply dividing the data into segments may leave some intervals with too few or too many samples, which affects the accuracy of the ranking and the stability of the correlation.
In particular, if the data changes more strongly in one interval and less in another interval, splitting the data may result in a greater disturbance of the ordering and correlation results in the strongly changing interval. This may mask or exaggerate the true relationship of the data, distorting the analysis results. It is therefore important to ensure that the analysis of the data is able to take into account the overall trend and relationship, avoiding unilateral or local conclusions, especially in the case of non-uniform data changes. Finer approaches in the analysis, such as segmentation based on data density or more flexible analysis techniques, are needed to ensure that features and trends of the data can be accurately captured within each interval. Such analysis is more helpful in revealing the true structure of the data, improving the accuracy of understanding and interpretation of complex data distributions.
Step one, a data set is called, and the data set is divided according to a set segmentation rule;
step two, calculating the relative difference of the data in each segment. The relative difference is the ratio of the difference between the maximum value and the minimum value within the segment to the minimum value, and reflects the fluctuation of the data within the segment; namely:
$$R_i = \frac{x_i^{\max} - x_i^{\min}}{x_i^{\min}}$$
where $R_i$ is the relative difference of segment $i$, and $x_i^{\max}$ and $x_i^{\min}$ are the maximum and minimum values in segment $i$;
this value represents the relative change of the data within each segment.
Step three, integrating the relative differences of the segments to calculate the overall difference, which is the average value of the relative differences of all the segments; namely:
$$\bar{R} = \frac{1}{n} \sum_{i=1}^{n} R_i$$
where $\bar{R}$ is the overall difference and $n$ is the number of segments;
this step provides a comprehensive assessment of the fluctuation of the overall data set.
Step four, summarizing the relative differences of all the segments, calculating the standard deviation of the relative differences, and calculating the ratio of the standard deviation of the relative differences to the overall difference to obtain the fluctuation change index.
The fluctuation change index expresses the extent to which the data set changes significantly over the whole range. It is obtained by dividing the data set according to a set segmentation rule, calculating the relative difference within each segment, and integrating the relative differences. Specifically, if the fluctuation change index is large, the data exhibit significant fluctuation and change over the whole range: the data set contains larger peaks and valleys, and the magnitude of the data change differs more between segments. If the fluctuation change index is small, the fluctuation of the data is relatively smooth over the whole range: the relative differences within the individual segments are small, and the data as a whole show a smoother trend.
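The four steps above can be sketched in Python as follows. The sketch assumes fixed-length segments, strictly positive minimum values (so that the ratio in step two is defined), and the population standard deviation in step four; the text does not say whether the population or sample form is intended. The function name `fluctuation_change_index` is hypothetical.

```python
from statistics import mean, pstdev


def fluctuation_change_index(data, segment_size):
    """Sketch of the fluctuation change index.

    Assumptions (not from the source): fixed-length segments, positive
    segment minima, and the population standard deviation.
    """
    # Step one: divide the data set according to the segmentation rule.
    segments = [data[i:i + segment_size] for i in range(0, len(data), segment_size)]

    # Step two: relative difference (max - min) / min of each segment.
    relative = []
    for seg in segments:
        low, high = min(seg), max(seg)
        if low <= 0:
            raise ValueError("segment minimum must be positive for the ratio")
        relative.append((high - low) / low)

    # Step three: overall difference is the mean of the relative differences.
    overall = mean(relative)
    if overall == 0:
        return 0.0  # assumption: perfectly flat segments imply no fluctuation

    # Step four: standard deviation of the relative differences over the overall difference.
    return pstdev(relative) / overall
```

As defined, the index is the coefficient of variation of the per-segment relative differences, so it is zero whenever every segment fluctuates by the same relative amount and grows as the segments fluctuate unevenly.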
The operation process of the boundary influence analysis module comprises the following steps:
Analyzing segment boundaries matters in Spearman correlation analysis because it supports a full understanding of the data set's characteristics and helps ensure accurate results. Spearman correlation analysis is a nonparametric statistical method. Its advantages are that it places no specific requirement on the distribution of the original variables, it is insensitive to data errors and extreme values, the differences between actual values do not affect the result because only ranks are used, and it is widely applicable. However, to increase computational efficiency, the data is sometimes analyzed in smaller segments, and this is where segment boundaries become important. When the calculation is carried out within each segment, data near a boundary may be only partially taken into account, and the uncertainty at the boundary may be greater, especially when the relationship between the variables varies widely across the data set. Such uncertainty may make the calculation at the boundary inaccurate and thereby affect the final correlation analysis result. Studying the characteristics of the data set at the segment boundaries therefore makes the extent of the boundary effect easier to understand and helps identify potential sources of uncertainty. This allows the relationships in the data to be evaluated more accurately and keeps the results reliable and consistent throughout the analysis. A segmented analysis strategy thus requires full knowledge of the data set's characteristics and attention to potential uncertainties.
Step one: retrieve the data set and divide it into several subsets according to the established segmentation rule; the divisions between subsets form the boundaries.
Step two: calculate the mean of the data in each data subset to obtain the mean inside the boundary and the mean outside the boundary:
$$\mu_{\text{in}} = \frac{1}{n_{\text{in}}}\sum_{j=1}^{n_{\text{in}}} x^{\text{in}}_j$$
$$\mu_{\text{out}} = \frac{1}{n_{\text{out}}}\sum_{j=1}^{n_{\text{out}}} x^{\text{out}}_j$$
where $x^{\text{in}}_j$ and $x^{\text{out}}_j$ are the data points inside and outside the boundary, and $n_{\text{in}}$ and $n_{\text{out}}$ are their respective counts. The mean difference degree is the ratio of the difference between the means inside and outside the boundary to the mean outside the boundary, and it measures the difference in average level across the boundary:
$$D = \frac{\mu_{\text{in}} - \mu_{\text{out}}}{\mu_{\text{out}}}$$
The mean difference degree represents the difference in average level between the data subsets under the defined boundary conditions.
Step three: average the mean difference degrees of all segments to obtain the comprehensive mean difference degree $\bar{D}$.
Step four: collect the mean difference degrees of all segments, calculate their standard deviation $\sigma_D$, and take the ratio of this standard deviation to the comprehensive mean difference degree to obtain the boundary effect influence degree index, $B = \sigma_D / \bar{D}$.
The boundary effect influence degree index measures the difference in average level between data subsets under the defined segment boundary conditions. Specifically, it reflects the relative difference between the means of the data subsets under different boundary conditions. The larger the index, the more significant the difference between the subset means at the boundary; this may indicate that the data is less stable and fluctuates more near the boundary, so that the boundary conditions have a more pronounced effect on the data mean. Conversely, a smaller index indicates that the mean difference at the boundary is relatively small; this may indicate that the average level of the data changes more consistently across the boundary and that the boundary conditions have a milder effect on the mean.
Influence on correlation analysis: the boundary effect influence degree index provides information about how the data mean changes under the boundary conditions. In correlation analysis, a large mean difference near the boundary easily affects the interpretation of the correlation. Because correlation typically describes trends between variables, mean differences at the boundaries tend to introduce additional noise into that interpretation.
The larger the boundary effect influence degree index, the more carefully the correlation results must be interpreted and the harder they may be to interpret. This is detrimental to correlation analysis, particularly when the boundary conditions have a significant effect on the data.
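The boundary analysis steps can be sketched as follows. The text does not say precisely which data counts as "inside" and "outside" a boundary, so this sketch assumes that, for each segment, the inside mean is the mean of that segment and the outside mean is the mean of all remaining data. The function names and the population standard deviation are also assumptions made here.

```python
from statistics import mean, pstdev


def mean_difference_degrees(segments):
    """For each segment: (inside mean - outside mean) / outside mean.

    Assumption: 'inside' is the segment itself, 'outside' is all other data.
    """
    degrees = []
    for i, segment in enumerate(segments):
        outside = [x for j, other in enumerate(segments) if j != i for x in other]
        inside_mean = mean(segment)
        outside_mean = mean(outside)
        if outside_mean == 0:
            raise ValueError("outside mean is zero; ratio is undefined")
        degrees.append((inside_mean - outside_mean) / outside_mean)
    return degrees


def boundary_effect_index(segments):
    """Standard deviation of the mean difference degrees divided by their mean
    (the comprehensive mean difference degree)."""
    degrees = mean_difference_degrees(segments)
    combined = mean(degrees)
    if combined == 0:
        raise ValueError("comprehensive mean difference degree is zero")
    # Assumption: population standard deviation.
    return pstdev(degrees) / combined


# Hypothetical data already divided into three segments.
segments = [[10, 10], [20, 20], [30, 30]]
degrees = mean_difference_degrees(segments)   # about [-0.6, 0.0, 1.0]
index = boundary_effect_index(segments)
```

Because the differences are signed, positive and negative degrees can partly cancel and make the denominator small, which inflates the index. Using absolute differences would avoid this, but that would be a departure from the ratio as described.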
The operation process of the comprehensive analysis module comprises the following steps:
summarizing the fluctuation change index and the boundary effect influence degree index and normalizing them so that the two indices have similar numerical ranges; selecting an appropriate machine learning model, such as linear regression, a decision tree, or a random forest, as the model for assigning weights to the indices;
dividing the set of all sample data, which includes the fluctuation change index and the boundary effect influence degree index, into a training set and a test set for training and validating the model;
training the machine learning model on the training set, where the input features are the fluctuation change index and the boundary effect influence degree index and the output is the weights corresponding to the two indices;
validating the trained model on the test set and evaluating its performance;
obtaining the weights corresponding to the fluctuation change index and the boundary effect influence degree index output by the machine learning model;
combining the fluctuation change index and the boundary effect influence degree index with their corresponding weights and carrying out a weighted summation to obtain the feasible measurement coefficient, which can be calculated, for example, by the following formula:
$$K = w_1 F + w_2 B$$
where $K$ is the feasible measurement coefficient, $F$ and $B$ are the fluctuation change index and the boundary effect influence degree index, respectively, and $w_1$ and $w_2$ are the weights of the fluctuation change index and the boundary effect influence degree index, respectively, with $w_1 > 0$ and $w_2 > 0$.
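One way to read the weight-learning steps is as a linear regression whose fitted coefficients serve as the two weights. The sketch below follows that reading under strong assumptions: the text does not say what target the model is trained against, so a hypothetical "interference score" label per sample is assumed, and the sample values are invented purely so that the code runs. All names are illustrative.

```python
def fit_two_weights(features, targets):
    """Least-squares fit of target ~ w1 * f + w2 * b (no intercept),
    solved through the 2x2 normal equations."""
    sff = sum(f * f for f, _ in features)
    sbb = sum(b * b for _, b in features)
    sfb = sum(f * b for f, b in features)
    sft = sum(f * t for (f, _), t in zip(features, targets))
    sbt = sum(b * t for (_, b), t in zip(features, targets))
    det = sff * sbb - sfb * sfb
    if abs(det) < 1e-12:
        raise ValueError("features are collinear; weights are not identifiable")
    w1 = (sft * sbb - sbt * sfb) / det
    w2 = (sbt * sff - sft * sfb) / det
    return w1, w2


def mean_absolute_error(features, targets, weights):
    """Average absolute gap between the weighted sum and the label."""
    w1, w2 = weights
    errors = [abs(w1 * f + w2 * b - t) for (f, b), t in zip(features, targets)]
    return sum(errors) / len(errors)


# Hypothetical samples: (normalized fluctuation index, normalized boundary index).
samples = [(0.1, 0.9), (0.5, 0.2), (0.8, 0.4), (0.3, 0.7), (0.6, 0.6), (0.2, 0.3)]
# Hypothetical labels, generated from known weights purely for illustration.
labels = [0.6 * f + 0.4 * b for f, b in samples]

# Split into a training set and a test set, then train and validate.
train_x, test_x = samples[:4], samples[4:]
train_y, test_y = labels[:4], labels[4:]
weights = fit_two_weights(train_x, train_y)
test_error = mean_absolute_error(test_x, test_y, weights)

# The text requires both weights to be greater than 0.
weights_are_valid = all(w > 0 for w in weights)
```

An unconstrained fit can return a negative coefficient on real data. In that case a non-negative least-squares solver, or a different model such as the decision tree or random forest mentioned in the text, would be needed to satisfy the positivity requirement.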
The feasible measurement coefficient reflects the degree to which the set segmentation rule interferes with the precision of the correlation analysis. The larger the feasible measurement coefficient, the more the segmentation rule interferes with the correlation analysis. This means that the data set exhibits more complex fluctuations and boundary effects under the set segmentation rule, so that the result of the correlation analysis is strongly affected.
The smaller the feasible measurement coefficient, the less the segmentation rule interferes with the correlation analysis. This indicates that the data set is relatively stable under the set segmentation rule, that its fluctuation is gentle, and that the boundary effect has only a limited influence on the correlation analysis; the current segmentation rule therefore has little effect on the accuracy of the subsequent correlation analysis and is reasonable.
The feasible measurement coefficient is compared with a classification threshold:
if the feasible measurement coefficient is greater than or equal to the classification threshold, the segmentation rule interferes strongly with the correlation analysis. The data set exhibits more complex fluctuations and boundary effects under the set segmentation rule, and the result of the correlation analysis is strongly affected. In this case the segmentation rule introduces more noise or nonlinear relationships and needs to be modified, so an adjustment signal is generated;
if the feasible measurement coefficient is smaller than the classification threshold, the segmentation rule interferes only slightly with the correlation analysis. The data set is relatively stable under the set segmentation rule, its fluctuation is gentle, and the boundary effect has only a limited influence on the correlation analysis; the current segmentation rule has little effect on the accuracy of the subsequent correlation analysis and is reasonable, so an available signal is generated.
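The weighted summation and the threshold comparison can be sketched as below. The min-max normalization bounds, the weights, and the threshold value are hypothetical; the text leaves their selection to the practitioner. The signal names `"adjust"` and `"available"` are labels chosen here for the adjustment signal and the available signal.

```python
def min_max_normalize(value, lower, upper):
    """Scale a raw index into [0, 1] using assumed lower and upper bounds."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    return min(1.0, max(0.0, (value - lower) / (upper - lower)))


def feasible_measurement_coefficient(fluctuation, boundary, w_fluct, w_boundary):
    """Weighted sum of the two normalized indices; both weights must be > 0."""
    if w_fluct <= 0 or w_boundary <= 0:
        raise ValueError("weights must be greater than 0")
    return w_fluct * fluctuation + w_boundary * boundary


def judge_segmentation_rule(coefficient, threshold):
    """'adjust' if coefficient >= threshold, otherwise 'available'."""
    return "adjust" if coefficient >= threshold else "available"


# Hypothetical raw indices, bounds, weights, and threshold.
fluct = min_max_normalize(0.93, 0.0, 2.0)
bound = min_max_normalize(1.20, 0.0, 5.0)
coefficient = feasible_measurement_coefficient(fluct, bound, 0.6, 0.4)
signal = judge_segmentation_rule(coefficient, threshold=0.5)
```

A rule that receives `"adjust"` would be modified and evaluated again, while a rule that receives `"available"` is passed on to the correlation analysis.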
According to the specified segmentation rule, the fluctuation change index and the boundary effect influence degree index are extracted from the periodic data set corresponding to each indicator. For the fluctuation change index, the relative difference of the data inside each segment is calculated, and the fluctuation within each segment is measured as the difference between the maximum and minimum values relative to the minimum value. For the boundary effect influence degree index, the change in average level is evaluated by calculating the mean difference degree of each data subset, that is, the ratio of the difference between the means inside and outside the boundary to the mean outside the boundary. Machine learning is then used to assign weights to the two indices, which are combined into the feasible measurement coefficient. The feasible measurement coefficient is used to evaluate the effectiveness of a specific segmentation rule, so that a segmentation rule suited to efficient correlation analysis of the indicators of the thermal power plant can be determined. This improves the accuracy and interpretability of the analysis, makes the segmentation rule fit the actual data characteristics more closely, and better meets the requirements of the thermal power plant.
The operation process of the correlation analysis module comprises the following steps:
the data set is divided into smaller segments using the segmentation rule for which the available signal was generated, which ensures that the data within each segment has similar characteristics, so that each small segment represents a relatively stable data subset;
the data in each small segment is ranked by sorting it according to the value of each indicator and replacing the original values with their ranks; the Spearman correlation between the indicators is then calculated within each segment to obtain the correlation coefficient of each small segment;
the Spearman correlation coefficients of the small segments are averaged to obtain the average association between the indicators.
Segmenting the data set with a segmentation rule that received the available signal, ranking the data, and calculating the Spearman correlation per segment improves the efficiency of the correlation analysis while preserving its accuracy. First, the segmentation rule ensures that the data within each small segment has similar characteristics and represents a relatively stable data subset, which allows the relationships within different data subsets to be captured more finely than a single correlation analysis over the entire data set. Second, ranking the data and calculating the Spearman correlation for each small segment handles nonlinear relationships and outliers more effectively, because the Spearman correlation is not sensitive to the specific values of the data and depends only on their order. This reduces the influence of outliers on the correlation results and improves the robustness of the analysis. Finally, averaging the Spearman correlation coefficients of the small segments provides a more robust measure of association. The average is more representative because it combines the correlations of the different segments and reduces dependence on any individual segment, which helps avoid misleading conclusions caused by the particular circumstances of one segment.
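The rank-then-correlate procedure for two indicators can be sketched as follows. The fixed-length segmentation, the average-rank treatment of ties, and the function names are assumptions made for this illustration; the text only states that values are replaced by ranks and that the per-segment coefficients are averaged.

```python
from statistics import mean


def average_ranks(values):
    """Replace values by 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of the tied 1-based positions
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (vx * vy) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def segmented_spearman(xs, ys, segment_length):
    """Average of the per-segment Spearman coefficients."""
    coefficients = []
    for start in range(0, len(xs), segment_length):
        sx = xs[start:start + segment_length]
        sy = ys[start:start + segment_length]
        if len(sx) >= 2:  # a single point carries no rank information
            coefficients.append(spearman(sx, sy))
    if not coefficients:
        raise ValueError("no segment was long enough to correlate")
    return mean(coefficients)


# Hypothetical series for two indicators, two segments of three points.
fuel = [1, 2, 3, 4, 5, 6]
emission = [2, 4, 8, 16, 5, 1]
average_association = segmented_spearman(fuel, emission, 3)
```

In this example the first segment is perfectly increasing and the second perfectly decreasing, so the averaged coefficient is zero. Averaging can therefore hide opposing relationships in different segments, which is one reason to inspect the per-segment coefficients alongside the average.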
The operation process of the result visualization module comprises the following steps:
based on the average association between the individual indicators, a heat map is used to display the correlation results between the fuel indicators, the boiler operation indicators, the emission indicators, and the power generation indicators.
Displaying the average association visually makes the correlations between the indicators easy to grasp, which supports better-informed, data-driven decisions that optimize system operation, improve efficiency, and reduce potential risk. A heat map is an intuitive visualization that is well suited to conveying statistical results to non-specialists. By visualizing the average association, it communicates the content of the correlation matrix more effectively and makes complex statistical results easier to understand and share.
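As a rough illustration of the visualization step, the sketch below assembles the pairwise average associations into a symmetric matrix and renders it as a character-based heat map. A plotting library would normally be used instead; plain text is used here only to keep the example dependency-free. The indicator names and coefficient values are placeholders, not results from the patent.

```python
def build_symmetric_matrix(names, pair_values):
    """Assemble a symmetric correlation matrix from pairwise coefficients."""
    size = len(names)
    matrix = [[1.0 if r == c else 0.0 for c in range(size)] for r in range(size)]
    for r in range(size):
        for c in range(r + 1, size):
            key = (names[r], names[c])
            if key not in pair_values:
                key = (names[c], names[r])
            value = pair_values[key]  # raises KeyError if a pair is missing
            matrix[r][c] = value
            matrix[c][r] = value
    return matrix


SHADES = " .:-=+*#%@"  # light to dark, indexed by coefficient magnitude


def shade(value):
    """Map a coefficient in [-1, 1] to a character by its absolute value."""
    level = min(len(SHADES) - 1, int(abs(value) * (len(SHADES) - 1) + 0.5))
    return SHADES[level]


def render_text_heatmap(names, matrix):
    """One line per indicator: shade character followed by the signed value."""
    width = max(len(name) for name in names)
    lines = []
    for name, row in zip(names, matrix):
        cells = " ".join(f"{shade(v)}{v:+.2f}" for v in row)
        lines.append(f"{name:<{width}} {cells}")
    return "\n".join(lines)


# Placeholder indicator names and made-up coefficients, for illustration only.
names = ["fuel", "boiler", "emission", "power"]
pairs = {
    ("fuel", "boiler"): 0.80, ("fuel", "emission"): 0.65, ("fuel", "power"): 0.70,
    ("boiler", "emission"): 0.40, ("boiler", "power"): 0.90,
    ("emission", "power"): -0.20,
}
matrix = build_symmetric_matrix(names, pairs)
print(render_text_heatmap(names, matrix))
```

The shading uses the magnitude of the coefficient, so strong negative and strong positive associations look equally dark; the printed sign distinguishes them.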
The above formulas are all dimensionless and operate on numerical values only. They were obtained by collecting a large amount of data and running software simulations so as to approximate the real situation as closely as possible, and the preset parameters and thresholds in the formulas are set by those skilled in the art according to the actual situation.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that contains one or more sets of available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid-state drive.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in this application, it should be understood that the disclosed systems and apparatuses may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Finally: the foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.
Claims (8)
1. A fuel production and emission system data association control management system, comprising: the system comprises a data acquisition module, a change degree analysis module, a boundary influence analysis module, a comprehensive analysis module, a correlation analysis module and a result visualization module;
the data acquisition module counts each index, invokes the source data of each index, gathers the source data of each index into the analysis pool for storage, gathers the periodic data set corresponding to each index, and transmits the data set to the change degree analysis module and the boundary influence analysis module;
the change degree analysis module segments the data set, calculates the fluctuation and the data dispersion degree of the data set, obtains the fluctuation change index of the data set in the whole range, and sends the fluctuation change index to the comprehensive analysis module;
the boundary influence analysis module segments the data set, calculates the difference degree of the mean value inside and outside the boundary, analyzes and defines the average level difference between the data subsets under the boundary condition, obtains the boundary effect influence degree index, and sends the boundary effect influence degree index to the comprehensive analysis module;
the comprehensive analysis module collects output data of the change degree analysis module and the boundary influence analysis module, analyzes the model through a machine learning technology framework, outputs a feasible measurement coefficient, judges a segmentation rule according to the feasible measurement coefficient, and sends a judgment result to the data screening module;
the correlation analysis module determines a segmentation rule according to the judging result, divides the data set into smaller segments, calculates the ranking and the Spearman correlation of each segment respectively, integrates the results to obtain the association relation result of each index, and sends the association relation result of each index to the result visualization module;
the result visualization module is used for visualizing the output content of the correlation analysis module.
2. The fuel production and emission system data association control management system according to claim 1, wherein:
the operation process of the data acquisition module comprises the following steps:
extracting corresponding data from data sources of fuel indexes, boiler operation indexes, emission indexes and power generation indexes, summarizing original data of each index according to time stamps, dividing the data according to unit time and cycle time, setting time segment identifiers, organizing the data of each index into continuous unit time data sets according to the same identifiers, and further merging the continuous unit time data sets into periodic data sets to obtain the data sets with comparability corresponding to each index.
3. The fuel production and emission system data association control management system according to claim 2, wherein:
the operation process of the change degree analysis module comprises the following steps:
step one, a data set is called, and the data set is divided according to a set segmentation rule;
step two, calculating the ratio of the difference between the maximum value and the minimum value in each segment to the minimum value to obtain the relative difference of each segment;
step three, taking an average value of the relative differences of all the segments to obtain the overall difference;
and step four, summarizing the relative differences of all the segments, calculating to obtain the standard deviation of the relative differences, and calculating the ratio of the standard deviation of the relative differences to the overall difference to obtain the fluctuation change index.
4. The fuel production and emission system data association control management system according to claim 3, wherein:
the operation process of the boundary influence analysis module comprises the following steps:
step one, a data set is called, and the data set is divided into a plurality of subsets according to a formulated segmentation rule to form a boundary;
step two, calculating the average value of the data in each data subset to obtain the average value inside the boundary and the average value outside the boundary, and calculating the ratio of the difference between the average value inside the boundary and the average value outside the boundary to the average value outside the boundary to obtain the mean value difference degree;
step three, taking an average value of the average value difference degrees to obtain a comprehensive average value difference degree;
and step four, summarizing the mean value difference degree of all the segments, calculating to obtain a mean value difference degree standard deviation, and then calculating the ratio of the mean value difference degree standard deviation to the comprehensive mean value difference degree to obtain a boundary effect influence degree index.
5. The fuel production and emission system data association control management system according to claim 4, wherein:
the operation process of the comprehensive analysis module comprises the following steps:
summarizing the fluctuation change index and the boundary effect influence degree index, and distributing corresponding weights for the fluctuation change index and the boundary effect influence degree index by using a machine learning technology; and combining the fluctuation change index and the boundary effect influence degree index with the corresponding weight values, and carrying out weighted summation to obtain a feasible measurement coefficient.
6. The fuel production and emission system data association control management system according to claim 5, wherein:
comparing the feasible measurement coefficient with a classification threshold, and generating an adjustment signal if the feasible measurement coefficient is greater than or equal to the classification threshold; if the feasible measurement coefficient is smaller than the classification threshold value, a usable signal is generated.
7. The fuel production and emission system data association control management system according to claim 6, wherein:
the operation process of the correlation analysis module comprises the following steps:
using the segmentation rule for which the available signal was generated, the data set is divided into smaller segments; the data of each small segment is ranked by sorting it according to the value of each index and replacing the original values with their ranks; the Spearman correlation among the indexes is then calculated within each segment to obtain the correlation coefficient of each small segment; and the Spearman correlation coefficients of the small segments are averaged to obtain the average association relation among the indexes.
8. The fuel production and emission system data association control management system according to claim 7, wherein:
the operation process of the result visualization module comprises the following steps:
based on the average association between the individual indicators, a heat map is used to display the correlation results between the fuel indicators, the boiler operation indicators, the emission indicators, and the power generation indicators.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410065598.6A CN117574180B (en) | 2024-01-17 | 2024-01-17 | Fuel production and emission system data correlation control management system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410065598.6A CN117574180B (en) | 2024-01-17 | 2024-01-17 | Fuel production and emission system data correlation control management system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117574180A CN117574180A (en) | 2024-02-20 |
CN117574180B true CN117574180B (en) | 2024-03-19 |
Family
ID=89862954
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410065598.6A Active CN117574180B (en) | 2024-01-17 | 2024-01-17 | Fuel production and emission system data correlation control management system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117574180B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118071196A (en) * | 2024-02-21 | 2024-05-24 | 华能国际电力股份有限公司大连电厂 | Power generation set level index management system based on table format visualization |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109614981A (en) * | 2018-10-17 | 2019-04-12 | 东北大学 | The Power System Intelligent fault detection method and system of convolutional neural networks based on Spearman rank correlation |
CN114997496A (en) * | 2022-06-06 | 2022-09-02 | 四川大学 | Unsupervised reservoir intelligent segmentation method based on space-time sequence data constraint |
CN116345555A (en) * | 2023-03-29 | 2023-06-27 | 国网河南省电力公司安阳供电公司 | CNN-ISCA-LSTM model-based short-term photovoltaic power generation power prediction method |
CN117091846A (en) * | 2023-08-23 | 2023-11-21 | 中国船舶集团有限公司第七一一研究所 | Diesel engine abnormal state detection method and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
NL2015086B1 (en) * | 2015-07-03 | 2017-01-30 | Daf Trucks Nv | Method, apparatus, and system for diagnosing at least one NOx-sensor of a diesel engine system. |
Non-Patent Citations (2)
Title |
---|
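Cheng L et al. Journal of Engineering Mechanics. 2016, full text. * |
Analysis of the dynamic variation characteristics of NO_3-N in groundwater of the Dawu water source area and its influencing factors; Wu Qing; Guo Yongli; Zhai Yuanzheng; Yin Zhihua; Zhao Hongliang; Zhang Junjun; Li Changsuo; 水文 (Journal of China Hydrology); 20171225 (No. 06); full text * |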
Also Published As
Publication number | Publication date |
---|---|
CN117574180A (en) | 2024-02-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112596495B (en) | Industrial equipment fault diagnosis method and system based on knowledge graph | |
de la Hermosa González | Wind farm monitoring using Mahalanobis distance and fuzzy clustering | |
CN117574180B (en) | Fuel production and emission system data correlation control management system | |
Kirchen et al. | Metrics for the evaluation of data quality of signal data in industrial processes | |
CN117556366B (en) | Data abnormality detection system and method based on data screening | |
US20220179393A1 (en) | Machine tool evaluation method, machine tool evaluation system and medium | |
CN116432123A (en) | Electric energy meter fault early warning method based on CART decision tree algorithm | |
CN118211882B (en) | Product quality management system and method based on big data | |
CN116186624A (en) | Boiler assessment method and system based on artificial intelligence | |
CN117436569A (en) | Nuclear power equipment fault prediction and intelligent calibration method and system based on random forest | |
CN117035563B (en) | Product quality safety risk monitoring method, device, monitoring system and medium | |
CN117009741A (en) | Abnormal analysis method and system based on generated energy | |
Yuan et al. | Issues of intelligent data acquisition and quality for manufacturing decision-support in an Industry 4.0 context | |
CN116737549A (en) | Time sequence database stability test method | |
Ma et al. | A systematic data characteristic understanding framework towards physical-sensor big data challenges | |
CN111027680B (en) | Monitoring quantity uncertainty prediction method and system based on variational self-encoder | |
CN113469559A (en) | Quality bit design and display method and system based on data quality inspection | |
Sinha et al. | Real-Time Well Constraint Detection Using an Intelligent Surveillance System | |
Zhao et al. | Burst detection in district metering areas using flow subsequences clustering–reconstruction analysis | |
CN117932520B (en) | Solid biological waste treatment equipment monitoring method based on data identification | |
Weng et al. | A Correlation Analysis-Based Multivariate Alarm Method With Maximum Likelihood Evidential Reasoning | |
CN118071168B (en) | Comprehensive energy management system | |
CN118425098B (en) | Distributed laser methane detection method and system | |
CN118194138B (en) | Pressure-bearing equipment damage mode identification method and system | |
CN116130130A (en) | Nuclear power plant transient intelligent identification and classification method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |