CN116228176B - Sewage treatment data efficient management system based on data processing

Info

Publication number: CN116228176B
Application number: CN202310518393.4A
Authority: CN
Inventors: 吴用; 程凯; 褚巍; 周亚斌; 宋浩; 孙文潭
Original assignee: Anhui Wanxin Environmental Technology Co ltd
Current assignee: Anhui Kexin Environmental Protection Co.,Ltd.
Priority date: 2023-05-10
Filing date: 2023-05-10
Publication date: 2023-07-18
Anticipated expiration: 2043-05-10
Also published as: CN116228176A

Abstract

The invention relates to the technical field of data processing, in particular to a sewage treatment data efficient management system based on data processing, which comprises the following components: the data acquisition module is used for acquiring sewage data at each moment; the complexity analysis module is used for obtaining a local variation value, obtaining a first local correlation value according to the local variation value and the overall correlation of data points in the data, and obtaining information complexity by combining the first local correlation value and the overall correlation; the abnormality degree analysis module is used for acquiring initial abnormality degrees according to time sequence distribution and dispersion change trend of difference data points of the data points, and adjusting the initial abnormality degrees by combining the difference of the first local correlation value and the second local correlation value to obtain optimized abnormality degrees; and the sewage data management module is used for acquiring the Huffman coding priority of the data points by combining the information complexity, the optimized abnormality degree and the frequency. The invention encodes the data based on the Huffman coding priority, thereby improving the storage safety of the important data.

Description

Sewage treatment data efficient management system based on data processing

Technical Field

The invention relates to the technical field of data processing, in particular to a sewage treatment data efficient management system based on data processing.

Background

Guided by the concept of smart water management, many sewage treatment plants use a sewage treatment data efficient management system to manage and control sewage treatment data. Such a system uses sensing equipment for data acquisition and transmission to monitor the water quality data of sewage online, and completes water quality analysis and system control based on that data. In this process, the water quality monitoring data often needs to be stored in the system as a historical database for subsequent process analysis.

When sewage treatment data is stored using Huffman coding, the longer the code length of a piece of data, the more difficult it is to store and the more likely it is to be lost. Therefore, data carrying richer and more complex information should complete Huffman coding later, that is, its coding priority should be set lower, so that its code length is shortened before storage and the loss of important information is reduced. Abnormal data generated by abnormalities in the sewage treatment process occurs with low frequency in the sewage treatment data, so under traditional Huffman storage its code length is long, which means that its Huffman coding is completed earlier, that is, its coding priority is set higher. However, abnormal data is used to diagnose and identify abnormal sewage treatment conditions. When its priority is set higher, it is easily lost in storage and loses its original meaning; the priority assigned to the abnormal data is therefore wrong, and the diagnosis of abnormal sewage treatment conditions may be misjudged.

Disclosure of Invention

In order to solve the technical problem of data loss caused by inaccurate priority setting of sewage data, the invention aims to provide a sewage treatment data efficient management system based on data processing, and the adopted technical scheme is as follows:

the invention provides a sewage treatment data efficient management system based on data processing, which comprises:

the data acquisition module is used for acquiring data points of at least two types of sewage parameter data in the sewage at each moment;

the complexity analysis module is used for obtaining a local change value of each data point according to the change of the data value of each type of sewage parameter data in the adjacent data points; combining the local variation values of the corresponding data points in the two types of sewage parameter data at the same time and the overall correlation of the corresponding two types of sewage parameter data to obtain a first local correlation value between the corresponding data points; acquiring the information complexity of each data point according to the first local correlation value between each data point and all other types of sewage parameter data and the overall correlation of the class to which the data point belongs;

the abnormality degree analysis module is used for obtaining the dispersion of each data point in each type of sewage parameter data; under each type of sewage parameter data, obtaining difference data points according to the dispersion of each data point in a local window preset for each data point; acquiring the initial abnormality degree of each data point by combining the time sequence distribution of the difference data points, the dispersion of the difference data points and the dispersion change trend of the difference data points in the local window corresponding to each data point; obtaining a second local correlation value between the difference data point corresponding to each data point and other types of sewage parameter data; adjusting the initial abnormality degree according to the difference between the first local correlation value and the second local correlation value corresponding to each data point, and obtaining the optimized abnormality degree of each data point;

the sewage data management module is used for acquiring the Huffman coding priority of each data point by combining the information complexity of each data point, the optimized abnormality degree and the frequency of the corresponding data point; and encoding and storing the sewage data according to the Huffman coding priority.

Further, the method for obtaining the local variation value in the complexity analysis module includes:

and under each type of sewage parameter data, taking the difference value of the data value between each data point and the data point at the next moment of the data point as the local change value of the corresponding data point.

Further, the method for obtaining the first local correlation value in the complexity analysis module includes:

selecting any two types of sewage parameter data as the to-be-measured class data and the target class data respectively, and selecting any one data point in the to-be-measured class data as the data point to be measured; taking the data point in the target class data that is at the same moment as the data point to be measured as the target data point;

taking the ratio of the absolute value of the local variation value of the data point to be measured to the data value of the data point to be measured as the to-be-measured local change ratio, and taking the ratio of the absolute value of the local variation value of the target data point to the data value of the target data point as the target local change ratio; performing negative correlation mapping on the absolute value of the difference between the to-be-measured local change ratio and the target local change ratio to obtain an initial local change value between the data point to be measured and the target data point; taking the product of the overall correlation between the to-be-measured class data and the target class data and the initial local change value as the first local correlation value between the data point to be measured and the target data point;

and changing the to-be-measured class data and the target class data to obtain the first local correlation value between corresponding data points at the same moment in any two types of sewage parameter data.

Further, the method for acquiring the information complexity in the complexity analysis module comprises the following steps:

taking the product of the overall correlation between the to-be-measured class data and the target class data and the first local correlation value between the data point to be measured and the target data point as the initial information complexity of the data point to be measured; calculating the average of the initial information complexities of the data point to be measured over all target classes as the information complexity of the data point to be measured; and changing the data point to be measured to obtain the information complexity of each data point in each type of sewage parameter data.
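The averaging step above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the function name `information_complexity` and the list-based inputs (one entry per other parameter class) are hypothetical.

```python
def information_complexity(overall_corrs, first_local_corrs):
    """Information complexity of one data point (illustrative sketch).

    overall_corrs[k]     : overall correlation between the data point's own
                           class and the k-th other class
    first_local_corrs[k] : first local correlation value between the data
                           point and the same-moment point in the k-th class
    """
    if len(overall_corrs) != len(first_local_corrs) or not overall_corrs:
        raise ValueError("inputs must be non-empty and of equal length")
    # Initial information complexity per target class: product of the two values.
    initial = [s * c for s, c in zip(overall_corrs, first_local_corrs)]
    # Information complexity: mean over all target classes.
    return sum(initial) / len(initial)


# Example with two other classes (values are made up for illustration).
example = information_complexity([0.5, 1.0], [0.4, 0.2])
```

Averaging rather than summing keeps the result comparable when the number of parameter classes changes, which matches the "average value" wording of the text.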

Further, the method for acquiring the dispersion in the abnormality degree analysis module comprises the following steps:

acquiring the occurrence frequency of data values corresponding to data points in each type of sewage parameter data, and acquiring a data point corresponding to the maximum frequency value as a target frequency data point;

taking the absolute value of the difference between the data value of each data point in each type of sewage parameter data and the data value of the target frequency data point as the numerator, taking the data value of the data point as the denominator, and taking the resulting ratio as the dispersion of the data point.
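A minimal Python sketch of this dispersion calculation is given below. It assumes that all data values are non-zero (so the denominator is valid) and that, when several values share the maximum frequency, the first one encountered is used; neither point is specified in the text, and the function name `dispersions` is hypothetical.

```python
from collections import Counter


def dispersions(series):
    """Dispersion of every data point in one class of sewage parameter data.

    The most frequent data value plays the role of the "target frequency
    data point"; dispersion = |value - most frequent value| / value.
    Assumes every value is non-zero.
    """
    # most_common(1) returns [(value, count)] for the highest count;
    # ties are resolved by first occurrence (an assumption of this sketch).
    mode_value, _ = Counter(series).most_common(1)[0]
    return [abs(x - mode_value) / x for x in series]


# Made-up readings: 4 is the most frequent value, so its dispersion is 0.
example = dispersions([4, 4, 5, 4, 8])
```

Points equal to the most frequent value get dispersion 0, which is what the next step relies on when selecting difference data points.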

Further, the method for acquiring the difference data points in the abnormality degree analysis module comprises the following steps:

acquiring the dispersion of each data point in a local window preset by the data point to be measured, and taking the data point with the dispersion not being zero as a difference data point of the data point to be measured; and changing the data points to be detected, and acquiring the difference data points of each data point.
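The window-based selection can be illustrated with the short Python sketch below. The window is assumed to be symmetric around the data point with a hypothetical `half_width` parameter and to be truncated at the ends of the series; the text only says that the window is preset.

```python
def difference_points(dispersion, center, half_width):
    """Indices of the difference data points of the data point at `center`.

    A difference data point is any point inside the local window whose
    dispersion is not zero. `half_width` is an assumed window parameter.
    """
    lo = max(0, center - half_width)
    hi = min(len(dispersion), center + half_width + 1)
    return [i for i in range(lo, hi) if dispersion[i] != 0]


# Dispersions for five consecutive moments (made-up values).
disp = [0.0, 0.0, 0.2, 0.0, 0.5]
narrow = difference_points(disp, center=2, half_width=1)
wide = difference_points(disp, center=2, half_width=2)
```

Returning indices rather than values keeps the acquisition moment of each difference data point available for the later time-distance terms.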

Further, the method for acquiring the initial abnormality degree in the abnormality degree analysis module includes:

obtaining the initial abnormality degree according to an initial abnormality degree formula (the formula and its symbols are not reproduced in this text). The quantities used in the formula are: the initial abnormality degree of the data point; m, the number of difference data points within the local window preset with the data point as its center; the number of data points within the local window preset with the data point as its center; the dispersion of the i-th difference data point of the data point; the standard deviation of the dispersion of the difference data points within the local window preset with the data point as its center; the standard deviation of the dispersion of the difference data points within the local window preset with the i-th difference data point as its center; the time data corresponding to the acquisition moment of the data point; the time data corresponding to the acquisition moment of the i-th difference data point; a preset constant; and the absolute value function.

Further, the method for obtaining the optimized abnormality degree in the abnormality degree analysis module comprises the following steps:

selecting any one of the difference data points of the data point to be measured as the difference data point to be measured; taking the data point in the target class data that is at the same moment as the difference data point to be measured as the target difference data point;

normalizing the absolute value of the difference between the first local correlation value and the second local correlation value, and taking the normalized result as a first optimization abnormality degree of the data point to be measured;

changing the target class data and the difference data point to be detected, acquiring all the first optimization abnormal degrees of the data points in the data point to be detected and other class data, and taking the average value of all the first optimization abnormal degrees as the second optimization abnormal degree of the data point to be detected; taking the product of the initial abnormality degree and the second optimization abnormality degree of the data point to be measured as the optimization abnormality degree of the data point to be measured.
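The adjustment step can be sketched in Python as below. The text does not say how the absolute difference is normalized, so this sketch assumes the mapping 1 - exp(-x), which sends non-negative values into [0, 1); the function name and the flat list of value pairs (one pair per combination of difference data point and other class) are also assumptions.

```python
import math


def optimized_abnormality(initial_abnormality, first_corrs, second_corrs):
    """Optimized abnormality degree of one data point (illustrative sketch).

    first_corrs[k], second_corrs[k]: first and second local correlation
    values for the k-th (difference data point, other class) combination.
    """
    if len(first_corrs) != len(second_corrs) or not first_corrs:
        raise ValueError("inputs must be non-empty and of equal length")
    # First optimization abnormality degrees: normalized absolute differences.
    # 1 - exp(-x) is an assumed normalization, not taken from the source.
    firsts = [1.0 - math.exp(-abs(a - b))
              for a, b in zip(first_corrs, second_corrs)]
    # Second optimization abnormality degree: mean of the first ones.
    second = sum(firsts) / len(firsts)
    # Optimized abnormality degree: initial degree scaled by that mean.
    return initial_abnormality * second


same = optimized_abnormality(0.5, [0.3, 0.3], [0.3, 0.3])
shifted = optimized_abnormality(0.5, [0.3, 0.9], [0.1, 0.2])
```

With this choice, a data point whose first and second local correlation values agree everywhere keeps an optimized abnormality degree of 0, and larger disagreement moves the result toward the initial abnormality degree.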

Further, the method for acquiring the Huffman coding priority in the sewage data management module comprises the following steps:

and taking the sum of the information complexity and the optimized abnormality degree of each data point as the priority weight of the corresponding data point, and carrying out negative correlation mapping on the product of the priority weight and the occurrence frequency of the data value corresponding to the data point to obtain the Huffman coding priority of the data point.
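The priority calculation can be sketched in Python as follows. The negative correlation mapping is not specified in the text; exp(-x) is assumed here purely for illustration, and the function name `huffman_priority` is hypothetical.

```python
import math


def huffman_priority(info_complexity, optimized_abnormality, frequency):
    """Huffman coding priority of one data point (illustrative sketch).

    The priority weight is the sum of information complexity and optimized
    abnormality degree; the priority is a decreasing function of
    weight * frequency. exp(-x) is an assumed negative correlation mapping.
    """
    weight = info_complexity + optimized_abnormality
    return math.exp(-weight * frequency)


# Two data values with the same frequency: the more important one
# (larger weight) receives the lower priority, i.e. it is coded later.
ordinary = huffman_priority(0.1, 0.0, frequency=5)
important = huffman_priority(0.4, 0.3, frequency=5)
```

A lower priority means the data point is merged later during Huffman tree construction, which is what gives important data the shorter code length described in the text.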

Further, the method for obtaining the overall correlation in the complexity analysis module comprises the following steps:

and taking the absolute value of the Pearson correlation coefficient between any two types of sewage parameter data as the overall correlation between the two types of sewage parameter data.

The invention has the following beneficial effects:

In the embodiment of the invention, relations may exist among the different types of sewage parameter data in sewage. To make the analysis of the complexity and abnormality of the data points in each type of sewage parameter data more accurate, the correlation analysis between different types of sewage parameter data is based on the local change of each data point: the local variation value is obtained from the change in data value between adjacent data points. The overall correlation reflects the overall relation between two types of sewage parameter data, while the local variation values represent the local change of the corresponding data points at the same moment in the two types of data; analyzing the overall relation and the local relation together makes the first local correlation value between data points more accurate. Because a data point in one type of sewage parameter data may be correlated with the data points of several other types at the same moment, the first local correlation values between each data point and all other types of sewage parameter data are analyzed together with the overall correlation of the type to which the data point belongs, so that the information complexity of the data point better reflects the correlation between different types of data. When the sewage treatment process is abnormal, the number of normal data points is larger than the number of difference data points, so the difference data points of each data point can be obtained from the dispersion of the data points. Abnormal data points generated by an abnormality in the sewage treatment process are aggregated in time sequence and fluctuate more strongly than normal data, difference data points that are closer to a data point have a greater influence on it, and the dispersion change trend of the abnormal data points of a data point is smaller than the dispersion change trend within the local window preset for each of those abnormal data points; the initial abnormality degree of each data point is therefore obtained by combining the time sequence distribution of the difference data points, the dispersion of the difference data points and the dispersion change trend within the local window preset for each data point. Because the initial abnormality degree analyzes only single-class data, it is not accurate enough. To improve its accuracy, the local correlation between the difference data points in one type of sewage parameter data and the corresponding data points in the other types at the same moment is analyzed to obtain the second local correlation value, and the difference between the first local correlation value and the second local correlation value is used to adjust the initial abnormality degree, giving the optimized abnormality degree of each data point. The frequency of each data point is adjusted based on its information complexity and optimized abnormality degree to obtain the Huffman coding priority, and coding is carried out according to this priority, so that the code length of important data is significantly shorter than when Huffman coding is based directly on the frequency of the data points. The data points are stored according to the Huffman coding priority, which improves the efficient management of sewage data in the system.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

Fig. 1 is a system block diagram of a sewage treatment data efficient management system based on data processing according to an embodiment of the present invention.

Detailed Description

In order to further describe the technical means and effects adopted by the invention to achieve the preset aim, the following is a detailed description of the specific implementation, structure, characteristics and effects of the sewage treatment data efficient management system based on data processing according to the invention, with reference to the accompanying drawings and preferred embodiments. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The specific scenario addressed by the invention is as follows: after sewage quality data are collected, they need to be stored in the sewage treatment data management system.

The invention provides a sewage treatment data efficient management system based on data processing, which is specifically described below with reference to the accompanying drawings.

Referring to fig. 1, a system block diagram of a sewage treatment data efficient management system based on data processing according to an embodiment of the present invention is shown, where the system includes: a data acquisition module 101, a complexity analysis module 102, an abnormality degree analysis module 103 and a sewage data management module 104.

The data acquisition module 101 is used for acquiring data points of at least two types of sewage parameter data in sewage at each moment.

In the sewage treatment data efficient management system, water quality data are usually sampled by a sensor or a water quality analyzer, and water quality monitoring is completed through real-time data analysis and processing. In the embodiment of the invention, the real-time acquisition of the sewage treatment data is completed by the sensors arranged in the sewage treatment data efficient management system, and the specific data acquired are: the pH value, oxygen content, BOD, COD, total nitrogen, solubility, turbidity, chromaticity and other data of the sewage.

The complexity analysis module 102 is configured to obtain a local variation value of each data point according to the variation of the data value of each type of sewage parameter data in the adjacent data points; combining the local variation values of the corresponding data points in the two types of sewage parameter data at the same time and the overall correlation of the corresponding two types of sewage parameter data to obtain a first local correlation value between the corresponding data points; and acquiring the information complexity of each data point according to the first local correlation value between each data point and all other types of sewage parameter data and the overall correlation of the class to which the data point belongs.

Because of the large variety and number of collected wastewater data, the wastewater parameter data often needs to be transmitted to a system for storage for subsequent analysis. When the sewage data is stored by using the huffman coding, since the huffman coding is based on the occurrence frequency of the data, the data with a larger frequency has a shorter corresponding storage coding length and the data with a smaller frequency has a longer corresponding storage coding length.
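The relation between frequency and code length in standard Huffman coding can be illustrated with the short Python sketch below. It is a generic textbook construction using a min-heap, not the patent's modified coding; the function name `huffman_code_lengths` and the sample readings are illustrative assumptions.

```python
import heapq
from collections import Counter


def huffman_code_lengths(values):
    """Return {symbol: code length in bits} for a standard Huffman code."""
    freq = Counter(values)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit.
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol in them gets one bit deeper.
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


# Made-up pH readings: 7.0 is common, 9.5 is a rare (possibly abnormal) value.
readings = [7.0] * 8 + [7.1] * 4 + [6.9] * 3 + [9.5]
lengths = huffman_code_lengths(readings)
# lengths[7.0] is 1 bit and lengths[9.5] is 3 bits.
```

The rare value 9.5 is merged first and ends up with the longest code, which is exactly the behavior the invention seeks to counteract for abnormal data.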

When sewage treatment data is stored, the longer the code length of a piece of data, the more difficult it is to store and the more easily it is lost. Therefore, data with rich and complex information should be given a shorter code length before being stored, which reduces the loss of important information. Abnormal data is generated when the sewage treatment process is abnormal and can be used to identify abnormal sewage treatment conditions. If abnormal data is lost in storage, it loses its original meaning and abnormal conditions are misdiagnosed, so abnormal data should also be given a shorter code length before being stored in order to reduce the loss of abnormal information. To reduce the loss of important information when storing sewage parameter data, the code length of information-rich data and of abnormal data is therefore set shorter: complexity analysis and abnormality analysis are performed on the sewage parameter data, and the Huffman coding priority of each piece of data is obtained by combining its complexity and abnormality, which determines its code length in storage and makes important information less prone to loss.

Data acquisition is performed on sewage, D types of acquired data exist, D is set to be 20 in the embodiment of the invention, and the D can be adjusted according to specific implementation scenes. D different kinds of data are analyzed, and as certain correlation exists among the different kinds of data, for example, the solubility in the sewage data and the turbidity have correlation, and when the solubility is increased, the turbidity is reduced; and vice versa. Therefore, it is considered that the solubility data and the turbidity data in the sewage data have a certain correlation, and the larger the correlation is, the larger the amount of information contained in the data is, and the data is more important than other data, and when the data is stored, the corresponding code length should be made as short as possible.

In the conventional correlation analysis method, the overall relation among different types of data is mainly analyzed, and the local change condition of the data in each type of sewage parameter data is ignored, so that the complexity analysis of the data is not accurate enough, and therefore, the embodiment of the invention analyzes the complexity of the sewage parameter data by combining the local change of the data in each type of sewage parameter data and the correlation among the different types of data, and the analysis of the information complexity of the sewage data is more accurate.

The 20 kinds of sewage parameter data collected from sewage are analyzed, a certain correlation exists between any two kinds of sewage parameter data, and the larger the correlation between certain kind of sewage parameter data and other 19 kinds of sewage parameter data is, the more abundant information contained in each data point in the kind of sewage parameter data is indicated. In order to obtain the information complexity of each data point in each type of sewage parameter data, a first local correlation value between data points corresponding to any one data point in a certain type of sewage parameter data and other 19 types of sewage parameter data at the same acquisition time is required to be obtained, and then the first local correlation value between data points in the certain type of sewage parameter data and data points corresponding to the other 19 types of sewage parameter data at the same acquisition time is comprehensively analyzed to obtain the information complexity of each data point in each type of sewage parameter data. The analysis process of the information complexity of each data point in each type of sewage parameter data is specifically as follows:

First, the first local correlation value between any one data point in a certain type of sewage parameter data and the corresponding data points in the other 19 types of sewage parameter data at the same acquisition moment is acquired.

Because the correlation between the data points corresponding to different kinds of data at the same acquisition time is correlated with the whole data of the different kinds of data, the whole correlation between any two kinds of sewage parameter data is acquired as the basis for analyzing the first local correlation value.

Preferably, the overall correlation is obtained as follows: the absolute value of the Pearson correlation coefficient between any two types of sewage parameter data is taken as the overall correlation between the two types of sewage parameter data. As an example, take Q-class data Q = {Q_1, Q_2, …, Q_N} and W-class data W = {W_1, W_2, …, W_N}, where Q_1 is the data value of the data point acquired at the 1st moment of the Q-class data, Q_2 is the data value of the data point acquired at the 2nd moment of the Q-class data, and Q_N is the data value of the data point acquired at the N-th moment of the Q-class data; W_1 is the data value of the data point acquired at the 1st moment of the W-class data, W_2 is the data value of the data point acquired at the 2nd moment of the W-class data, and W_N is the data value of the data point acquired at the N-th moment of the W-class data. Since the Pearson correlation coefficient reflects the degree of linear correlation between two random variables, the Pearson correlation coefficient between the Q-class data and the W-class data reflects the correlation between the two classes of data; the absolute value of the Pearson correlation coefficient R_QW between the Q-class data and the W-class data is taken as the overall correlation between the Q-class data and the W-class data and is denoted S_QW. The Pearson correlation coefficient is a well-known technique, the value range of R_QW is [-1, 1], and the specific calculation method is not described in detail here.
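For reference, S_QW = |R_QW| can be computed with the plain Python sketch below. The function name `overall_correlation` is hypothetical, and returning 0 for a constant series (where the Pearson coefficient is undefined) is an assumption of this sketch rather than something stated in the text.

```python
import math


def overall_correlation(q, w):
    """Absolute value of the Pearson correlation coefficient of two series."""
    if len(q) != len(w) or len(q) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(q)
    mean_q = sum(q) / n
    mean_w = sum(w) / n
    cov = sum((a - mean_q) * (b - mean_w) for a, b in zip(q, w))
    var_q = sum((a - mean_q) ** 2 for a in q)
    var_w = sum((b - mean_w) ** 2 for b in w)
    if var_q == 0 or var_w == 0:
        # Pearson's coefficient is undefined for a constant series;
        # treating it as "no correlation" is an assumption of this sketch.
        return 0.0
    return abs(cov / math.sqrt(var_q * var_w))


# A perfectly anti-correlated pair still has overall correlation 1.
s_qw = overall_correlation([1, 2, 3, 4], [8, 6, 4, 2])
```

Taking the absolute value means strongly negative relations, such as the solubility and turbidity example above, count as strong correlations.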

In the embodiment of the invention, each data point in each type of sewage parameter data needs to be analyzed, but the overall correlation reflects the overall correlation of each data point in each type of sewage parameter data, the local variation of each data point in each type of sewage parameter data is ignored, and the correlation obtained by directly endowing the overall correlation to the local data is not accurate enough, so the embodiment of the invention can accurately obtain the correlation of the data points based on the local variation of different types of data in time sequence.

And analyzing the local change condition of each data point in time sequence of each type of sewage parameter data respectively, and providing data reference for analyzing the first local correlation value between the corresponding data points at the same time in any two types of sewage parameter data. Preferably, the local variation value of each data point is obtained by the following steps: and under each type of sewage parameter data, taking the data value difference between each data point and the data point at the next moment of the data point as the local change value of the corresponding data point.

Taking the Q-class data as an example, select the i-th data point Q_i of the Q-class data in time sequence; with Q_{i+1} being the (i+1)-th data point of the Q-class data in time sequence, the local variation value of data point Q_i in time sequence is obtained as Q_{i+1} - Q_i. The sign of the local variation value indicates how the data is changing: when the local variation value is positive, data point Q_i is increasing; when it is negative, data point Q_i is decreasing; when it is 0, data point Q_i is unchanged. In the same way, the i-th data point W_i of the W-class data in time sequence is analyzed to obtain its local variation value W_{i+1} - W_i, where W_{i+1} is the (i+1)-th data point of the W-class data in time sequence.
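The local variation value is a forward difference, which can be sketched in Python as below. The text does not say how the last data point (which has no next moment) is handled; this sketch simply omits it, which is an assumption, and the function name is hypothetical.

```python
def local_change_values(series):
    """Forward differences Q_{i+1} - Q_i for one class of parameter data.

    The final data point has no successor and is omitted (an assumption).
    """
    return [series[i + 1] - series[i] for i in range(len(series) - 1)]


# Made-up readings: rising, unchanged, then falling.
changes = local_change_values([6, 7, 7, 5])
# changes == [1, 0, -2]
```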

The overall correlation between any two types of sewage parameter data is combined with the degree of difference in local change between the corresponding data points at the same moment in the two types of data to obtain the first local correlation value between those data points. Preferably, the first local correlation value is obtained as follows: select any two types of sewage parameter data as the to-be-measured class data and the target class data respectively, and select any one data point in the to-be-measured class data as the data point to be measured; take the data point in the target class data that is at the same moment as the data point to be measured as the target data point; take the ratio of the absolute value of the local variation value of the data point to be measured to the data value of the data point to be measured as the to-be-measured local change ratio, and take the ratio of the absolute value of the local variation value of the target data point to the data value of the target data point as the target local change ratio; perform negative correlation mapping on the absolute value of the difference between the to-be-measured local change ratio and the target local change ratio to obtain the initial local change value between the data point to be measured and the target data point; take the product of the overall correlation between the to-be-measured class data and the target class data and the initial local change value as the first local correlation value between the data point to be measured and the target data point; and change the to-be-measured class data and the target class data to obtain the first local correlation value between corresponding data points at the same moment in any two types of sewage parameter data.

It should be noted that the embodiment of the invention analyzes different classes of data collected from the sewage. The units of the different classes are inconsistent, and the data values of their data points differ greatly. If only the data values are considered and the unit differences are ignored, directly subtracting the local variation values of data points at the same moment in different classes of data introduces a large error into the first local correlation value. To prevent unit differences from inflating the difference between local variation values, which would make it impossible to judge accurately how differently the data points change, the ratio of the local variation value of each data point to its data value is calculated to obtain the data change ratio of the data point. The difference between the data change ratios of data points at the same moment in different classes of data is then obtained, and the first local correlation value between two data points is derived from this difference.

Take the oxygen content (R-class data) and the pH value (L-class data) of the sewage as an example. The oxygen content $R_i$ at the i-th moment has data value […] and the pH value $L_i$ has data value 6; at the (i+1)-th moment the oxygen content $R_{i+1}$ has data value […] and the pH value $L_{i+1}$ has data value 7. The local variation value of $R_i$ is then […] and that of $L_i$ is 1, while the local variation ratio of $R_i$ is 0.1 and that of $L_i$ is $1/6 \approx 0.167$. Comparing the ratios rather than the raw variation values describes the difference between the local change conditions of $R_i$ and $L_i$ more accurately, so the first local correlation value between $R_i$ and $L_i$ is more accurate.

As one example, the first local correlation value between data point $Q_i$ and data point $W_i$ is obtained by combining the overall correlation between the Q-class data and the W-class data with the difference in local change between the two data points. The calculation formula is:

$$F_{Q_i W_i} = S_{QW} \times \exp\left(-\left|\frac{|\Delta Q_i|}{Q_i} - \frac{|\Delta W_i|}{W_i}\right|\right)$$

where $F_{Q_i W_i}$ is the first local correlation value between data point $Q_i$ and data point $W_i$; $S_{QW}$ is the overall correlation between the Q-class data and the W-class data; $\Delta Q_i$ is the local variation value of the i-th data point in the Q-class data; $Q_i$ is the data value of the i-th data point in the Q-class data; $\Delta W_i$ is the local variation value of the i-th data point in the W-class data; $W_i$ is the data value of the i-th data point in the W-class data; $\exp$ is the exponential function with the natural constant e as its base; and $|\cdot|$ is the absolute value function.

$S_{QW}$ is the overall correlation between the Q-class data and the W-class data: the larger $S_{QW}$, the stronger the correlation between the two classes and the larger the first local correlation value. $|\Delta Q_i|/Q_i$ is the local data change ratio of data point $Q_i$ at the i-th moment and expresses the degree of change of data point $Q_{i+1}$ at the (i+1)-th moment relative to $Q_i$; $|\Delta W_i|/W_i$ is the local data change ratio of data point $W_i$ at the i-th moment and expresses the degree of change of $W_{i+1}$ relative to $W_i$. Because the change of adjacent data points is analyzed within the same class of data, the local change of each data point is more meaningful as a reference. $\left|\,|\Delta Q_i|/Q_i - |\Delta W_i|/W_i\,\right|$ is the difference between the change ratios of $Q_i$ and $W_i$ in the time sequence; using ratios avoids an excessive difference between local variation values caused by the unit difference of the data. The smaller this difference, the more similar the local changes of $Q_i$ and $W_i$ within their respective classes of data, and the greater the first local correlation value between $Q_i$ and $W_i$.
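The first local correlation value can be sketched as below, following the steps in the text: a change ratio per data point, a negative correlation mapping of the ratio difference through `exp`, and a product with the overall correlation. The function names, the overall correlation of 0.8, and the oxygen readings 10 and 11 are hypothetical (the source does not give these values); data values are assumed to be non-zero.

```python
import math


def change_ratio(value: float, next_value: float) -> float:
    """|local variation value| divided by the data value (value must be non-zero)."""
    return abs(next_value - value) / value


def first_local_correlation(s_qw: float, q_i: float, q_next: float,
                            w_i: float, w_next: float) -> float:
    """Overall correlation times exp(-|ratio difference|)."""
    diff = abs(change_ratio(q_i, q_next) - change_ratio(w_i, w_next))
    return s_qw * math.exp(-diff)


# pH moves 6 -> 7 (ratio 1/6); a hypothetical second parameter moves 10 -> 11 (ratio 0.1).
print(change_ratio(6.0, 7.0))
print(first_local_correlation(0.8, 10.0, 11.0, 6.0, 7.0))
```

When both data points change by the same proportion, the exponential term equals 1 and the result reduces to the overall correlation; any mismatch in proportion pulls the value below it.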

Secondly, the first local correlation values between each data point in a given class of data and the data points of the other 19 classes at the same acquisition time are comprehensively analyzed to obtain the information complexity of each data point in each class of data.

In the embodiment of the invention, 20 different kinds of sewage parameter data are collected. Taking data point $Q_i$ as an example, the sensors record 19 other data values at the moment $Q_i$ is collected. The correlation between $Q_i$ and each of these 19 corresponding data points is analyzed, and all of these correlations are combined to obtain the global correlation of $Q_i$ with respect to the other 19 classes of data, thereby obtaining the information complexity of data point $Q_i$.

Preferably, the method for acquiring the information complexity of each data point in each type of sewage parameter data comprises the following steps: taking the product of the overall correlation between the data of the class to be measured and the data of the target class and the first local correlation value between the data point to be measured and the target data point as the initial information complexity of the data point to be measured; calculating an average value of initial information complexity of all data points to be measured as information complexity of the data points to be measured; changing data points to be detected, and obtaining the information complexity of each data point in each type of sewage parameter data.

As one example, the overall correlation between the different kinds of sewage parameter data is combined with the first local correlation values between corresponding data points to obtain the information complexity of each data point. The information complexity $U_{Qi}$ of data point $Q_i$ is calculated as:

$$U_{Qi} = \frac{1}{D-1} \sum_{j=1,\ j \neq Q}^{D} \left| S_{Qj} \times F_{Q_i j_i} \right|$$

where $U_{Qi}$ is the information complexity of data point $Q_i$; D is the number of kinds of data acquired during sewage treatment; $S_{Qj}$ is the overall correlation between the Q-class data and the j-th class data; $F_{Q_i j_i}$ is the first local correlation value between the i-th data point $Q_i$ in the Q-class data and the i-th data point $j_i$ in the j-th class data; and $|\cdot|$ is the absolute value function.

The larger the overall correlation $S_{Qj}$ between the Q-class data and the j-th class data, the more complex the information contained in each data point of the Q-class data. The overall correlation between the two classes acts as the weight of the first local correlation value, so the larger $S_{Qj}$, the larger this weight and the larger the information complexity $U_{Qi}$ of data point $Q_i$. When $U_{Qi}$ is calculated, the correlations between $Q_i$ and the remaining 19 classes of data are accumulated and averaged. The greater the first local correlation values between $Q_i$ and the remaining 19 classes of data, the more important the information carried by $Q_i$ and the larger its information complexity $U_{Qi}$.
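A minimal sketch of the information complexity follows, assuming it is the mean over the other classes of the absolute product of the overall correlation and the first local correlation value. The function name, the dictionary layout, and the class labels "W" and "R" are hypothetical; averaging over the number of other classes is an assumption consistent with "accumulated and averaged" in the text.

```python
from typing import Dict


def information_complexity(overall: Dict[str, float],
                           local: Dict[str, float]) -> float:
    """Mean of |S_Qj * F_Qj| over every other class j.

    overall[j]: overall correlation between class Q and class j.
    local[j]:   first local correlation value between Q_i and the
                data point of class j at the same moment.
    """
    terms = [abs(overall[j] * local[j]) for j in overall]
    return sum(terms) / len(terms)


# Two hypothetical other classes instead of the 19 used in the embodiment.
u = information_complexity({"W": 0.8, "R": 0.5}, {"W": 0.5, "R": 0.4})
print(u)
```

Classes that are strongly correlated with Q overall contribute more, which is the weighting effect the paragraph above describes.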

Thus, the construction of the information complexity of each data in each type of data is completed.

An anomaly analysis module 103, configured to obtain a dispersion of each data point in each type of sewage parameter data; under each type of sewage parameter data, obtaining difference data points according to the dispersion of each data point in a local window preset by each data point; acquiring initial anomaly degree of each data point by combining time sequence distribution of the difference data point, dispersion of the difference data point and dispersion change trend of the difference data point in the local window corresponding to each data point; obtaining a second local correlation value between the difference data point corresponding to each data point and other types of sewage parameter data; and adjusting the initial abnormality degree according to the difference between the first local correlation value and the second local correlation value corresponding to each data point, and obtaining the optimal abnormality degree of each data point.

For single data, each data point has its corresponding information complexity, which reflects the importance of the data point to some extent. If the information complexity of the data points is directly added to the construction of the data frequency index, when the sewage treatment process generates abnormal data due to abnormality, the abnormal data can cause the correlation between original data to be destroyed, and the information complexity of the abnormal data is lower, so that the coding length of the abnormal data is longer and the abnormal data is easy to lose when the Huffman coding is carried out on the abnormal data. In order to prevent the occurrence of such a situation, abnormal data in each type of data is acquired, and the degree of abnormality of the data is analyzed based on the difference between the abnormal data and the data acquired when the sewage treatment is normal, so that the analysis of the optimized degree of abnormality of the sewage data is more accurate.

The sewage data collected by the sewage treatment system is monitored. Because abnormal conditions in the sewage treatment process do not occur frequently, normal data makes up most of the monitored sewage parameter data and abnormal data makes up only a small part. The degree of dispersion of the data points in the sewage parameter data can reflect the abnormal condition of the data, so the dispersion of each data point is obtained. Preferably, the dispersion is acquired as follows: acquire the frequency of occurrence of the data value of each data point in each class of sewage parameter data, and take the data point corresponding to the maximum frequency as the target frequency data point; take the absolute value of the difference between the data value of each data point in the class and that of the target frequency data point as the numerator and the data value of the data point as the denominator, and take the resulting ratio as the dispersion of the data point.

Taking Q-class data as an example, count the frequency of occurrence of the data value of each data point in the Q-class data, and record the data point whose data value has the maximum frequency as the target frequency data point $Q_{\max}$. Taking data point $Q_A$ as an example, the dispersion $B_{Q_A}$ of $Q_A$ is the ratio whose numerator is $|Q_A - Q_{\max}|$ and whose denominator is the data value of the data point, as defined above. The dispersion of $Q_A$ is thus obtained from the difference between its data value and the data value with the maximum frequency in the Q-class data, so the larger the difference $|Q_A - Q_{\max}|$, the greater the dispersion of $Q_A$.
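The dispersion can be sketched as follows. The text leaves open which data value serves as the denominator; this sketch assumes it is the data value of the data point being measured, and that all data values are non-zero. The function name and sample series are hypothetical. Ties for the most frequent value are resolved by first occurrence, which is how `Counter.most_common` orders equal counts.

```python
from collections import Counter
from typing import List


def dispersion(series: List[float]) -> List[float]:
    """|value - most frequent value| / value for every data point."""
    mode_value = Counter(series).most_common(1)[0][0]
    return [abs(v - mode_value) / v for v in series]


# Hypothetical series: 5.0 is the most frequent (normal) value.
print(dispersion([5.0, 5.0, 5.0, 4.0, 10.0]))  # [0.0, 0.0, 0.0, 0.25, 0.5]
```

Data points equal to the most frequent value get a dispersion of exactly 0, which is what the later selection of difference data points relies on.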

Because the dispersion only reflects the difference between a single data point and normal data, it cannot distinguish interference such as noise points; that is, the dispersion alone cannot accurately reflect the abnormal degree of the data. The abnormal data points within the local range of each data point are therefore obtained, and the initial abnormality degree of each data point is judged from the position distribution of these abnormal data points. Preferably, the difference data points of each data point are acquired as follows: acquire the dispersion of each data point in the preset local window of the data point to be measured, and take the data points whose dispersion is not zero as the difference data points of the data point to be measured; change the data point to be measured to obtain the difference data points of every data point. In the embodiment of the invention, the size of the preset local window is set to an empirical value of […]; the practitioner can set it according to the actual situation.
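Selecting the difference data points of one data point can be sketched as below. The window is modelled as `half_width` points on each side of the center, clipped at the ends of the series; both the parameter name and the window shape are assumptions, since the window size used in the embodiment is not available here.

```python
from typing import List


def difference_points(dispersions: List[float], center: int,
                      half_width: int) -> List[int]:
    """Indices inside the local window whose dispersion is not zero."""
    lo = max(0, center - half_width)
    hi = min(len(dispersions), center + half_width + 1)
    return [k for k in range(lo, hi) if dispersions[k] != 0]


disp = [0.0, 0.0, 0.0, 0.25, 0.5]  # hypothetical dispersions
print(difference_points(disp, center=3, half_width=1))  # [3, 4]
print(difference_points(disp, center=0, half_width=1))  # []
```

The center point itself is included when its own dispersion is non-zero; whether the embodiment excludes it is not stated.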

The dispersions of abnormal data points caused by the same type of abnormality in the sewage treatment process often have a certain similarity, that is, the data values of abnormal data points differ greatly from those of normal data points, so the dispersion of the difference data points is an important factor in judging the abnormal condition of a data point. Meanwhile, abnormal data is easily affected by abnormal conditions in sewage treatment: when the system runs unstably, abnormal data points fluctuate more strongly than normal data, and the variation trend of the dispersion of the difference data points in the local window of an abnormal data point differs little from the variation trend of the dispersion in the local windows of each of its difference data points. The abnormal degree of each data point is therefore characterized by the variation trend of the dispersion of the difference data points in its preset local window. When the sewage treatment process is abnormal, the data points collected by the system are abnormal data points and they cluster together in time, and the closer a difference data point is to the data point, the greater its influence on that data point. The variation trend of the dispersion of the difference data points is therefore adjusted according to the time-sequence distribution of the data point and its difference data points to obtain the initial abnormality degree of each data point.

Preferably, the initial abnormality degree is obtained according to an initial abnormality degree formula, which is:

$$Y_{Q_A} = \frac{m}{n} \times \sum_{i=1}^{m} \frac{B_i}{\left|T_A - T_i\right| \times \left|\sigma_A - \sigma_i\right| + \varepsilon}$$

where $Y_{Q_A}$ is the initial abnormality degree of data point $Q_A$; m is the number of difference data points within the preset local window centered on $Q_A$; n is the number of data points within the preset local window centered on $Q_A$; $B_i$ is the dispersion of the i-th difference data point of $Q_A$; $\sigma_A$ is the standard deviation of the dispersion of the difference data points within the preset local window centered on $Q_A$; $\sigma_i$ is the standard deviation of the dispersion of the difference data points within the preset local window centered on the i-th difference data point of $Q_A$; $T_A$ is the time data corresponding to the acquisition of $Q_A$; $T_i$ is the time data corresponding to the acquisition of the i-th difference data point; $\varepsilon$ is a preset constant, taken as the small positive number 0.001, that prevents the denominator from being 0; and $|\cdot|$ is the absolute value function.

It should be noted that the more difference data points there are in the preset local window of data point $Q_A$, that is, the larger $m/n$, the more likely it is that $Q_A$ was collected while the sewage treatment process was abnormal, and the greater the initial abnormality degree of $Q_A$. The greater the dispersion $B_i$ of the i-th difference data point, the more its data value differs from the data values of the normal data points in the preset local window of $Q_A$, and the greater the initial abnormality degree of $Q_A$. Because a difference data point closer to $Q_A$ in time has a greater influence on $Q_A$, the time difference $|T_A - T_i|$ between the acquisition times of the data point and the difference data point is used as the weight of the difference between the dispersion standard deviations. When the system runs unstably, abnormal data points fluctuate more than normal data; the smaller the difference $|\sigma_A - \sigma_i|$ between the standard deviation of the dispersion of the difference data points in the window of $Q_A$ and that in the window of the i-th difference data point, the more similar the abnormal fluctuation of $Q_A$ is to that of its difference data points, and the greater the initial abnormality degree of $Q_A$.
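The relationships above can be sketched in Python under one stated assumption: that the initial abnormality degree has the form (m / n) times the sum over difference data points of B_i / (|T_A - T_i| * |sigma_A - sigma_i| + eps). This form is a reading consistent with the variable definitions and monotonic relationships in the text, not a confirmed formula. The helper names, the symmetric window of `half_width` points per side, and the use of the population standard deviation are all hypothetical choices.

```python
import statistics
from typing import List, Sequence


def window(n_points: int, center: int, half_width: int) -> range:
    """Index range of the local window, clipped to the series."""
    return range(max(0, center - half_width),
                 min(n_points, center + half_width + 1))


def dispersion_std(disp: Sequence[float], center: int, half_width: int) -> float:
    """Std of the non-zero dispersions (difference data points) in a window."""
    vals = [disp[k] for k in window(len(disp), center, half_width) if disp[k] != 0]
    return statistics.pstdev(vals) if vals else 0.0


def initial_abnormality(disp: Sequence[float], times: Sequence[float],
                        a: int, half_width: int, eps: float = 0.001) -> float:
    idx: List[int] = list(window(len(disp), a, half_width))
    diff_idx = [k for k in idx if disp[k] != 0]       # the m difference points
    sigma_a = dispersion_std(disp, a, half_width)
    total = 0.0
    for k in diff_idx:
        sigma_k = dispersion_std(disp, k, half_width)
        # Time gap weights the gap between dispersion standard deviations.
        total += disp[k] / (abs(times[a] - times[k]) * abs(sigma_a - sigma_k) + eps)
    return len(diff_idx) / len(idx) * total           # (m / n) * sum


disp = [0.0, 0.0, 0.5, 0.0, 0.0]   # hypothetical dispersions
times = [0, 1, 2, 3, 4]            # hypothetical acquisition times
print(initial_abnormality(disp, times, a=2, half_width=1))
```

A window with no difference data points yields 0, and the constant `eps` keeps the denominator positive when a difference data point coincides with the center or the two standard deviations are equal.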

The initial abnormality degree of a data point is thus obtained from the dispersion of the difference data points in its preset local window. However, the initial abnormality degree only analyzes data points within the same class of sewage parameter data, so it only describes the abnormal condition within a single class of data accurately. There is a certain correlation between different classes of data in the sewage treatment system, and when an abnormality occurs in the sewage treatment process the correlation between some data is destroyed. A specific scene is as follows: sodium hypochlorite is used to treat ammonia nitrogen wastewater, and the device that doses the sodium hypochlorite fails. Under normal conditions the sodium hypochlorite reacts with the ammonia nitrogen wastewater, so the contents of both in the sewage decrease together and the two have a certain correlation. When the dosing device fails, sodium hypochlorite can no longer be added and its concentration falls to a level at which the reaction is difficult, while ammonia nitrogen wastewater continues to flow in and the ammonia nitrogen content keeps increasing. The sodium hypochlorite content decreases while the ammonia nitrogen content increases, so the correlation between the two is destroyed.

Therefore, if only the abnormality degree within a single class of data is analyzed, the result is not accurate enough and misjudgment occurs easily. The local correlation between a data point in one class of sewage parameter data and the corresponding data points at the same moment in the other classes must be analyzed comprehensively, taking into account the correlation between the data point and the data points of the other 19 classes. That is, the data point is analyzed through both its first local correlation value and the second local correlation value of its difference data points, and the initial abnormality degree is adjusted according to the result, so that the abnormality degree of the data point becomes more comprehensive and accurate. In this way the optimized abnormality degree of each data point in each class of data is obtained.

The optimized abnormality degree is acquired as follows: select any one of the difference data points of the data point to be measured as the difference data point to be measured; take the data point in the target class data at the same moment as the difference data point to be measured as the target difference data point; normalize the absolute value of the difference between the first local correlation value and the second local correlation value, and take the normalized result as a first optimization abnormality degree of the data point to be measured; change the target class data and the difference data point to be measured to acquire all first optimization abnormality degrees between the data point to be measured and the data points in the other classes of data, and take the average of all first optimization abnormality degrees as the second optimization abnormality degree of the data point to be measured; take the product of the initial abnormality degree and the second optimization abnormality degree of the data point to be measured as its optimized abnormality degree. It should be noted that the first and second local correlation values are acquired by the same method; the only difference is that the first local correlation value is computed for the data point itself, whereas the second local correlation value is computed for the difference data points in the preset local window of that data point, so the method is not repeated here.

As one example, the first local correlation values between data point $Q_A$ and the data points of the other classes are combined with the second local correlation values between the difference data points of $Q_A$ and the other classes of data to optimize the initial abnormality degree of $Q_A$ and obtain its optimized abnormality degree. The calculation formula is:

$$Z_{Q_A} = Y_{Q_A} \times \frac{1}{(D-1)\, m} \sum_{j=1,\ j \neq Q}^{D} \sum_{i=1}^{m} \mathrm{Norm}\left( \left| F_{Q_A j_A} - F'_{Q_i j_i} \right| \right)$$

where $Z_{Q_A}$ is the optimized abnormality degree of data point $Q_A$; $Y_{Q_A}$ is the initial abnormality degree of data point $Q_A$; D is the number of kinds of data collected during sewage treatment; m is the number of difference data points in the preset local window of $Q_A$; $F'_{Q_i j_i}$ is the second local correlation value between the i-th difference data point of $Q_A$ and the i-th data point of the j-th class data; $F_{Q_A j_A}$ is the first local correlation value between $Q_A$ and the A-th data point in the j-th class data; $|\cdot|$ is the absolute value function; and Norm is the normalization function.

It should be noted that the larger the local correlation difference $\left| F_{Q_A j_A} - F'_{Q_i j_i} \right|$ between the first local correlation value of $Q_A$ with the corresponding data point of another class and the second local correlation value of a difference data point of $Q_A$ with the corresponding data point of that class, the greater the local difference between $Q_A$ and its difference data points with respect to the other classes of data, that is, the more $Q_A$ departs from its difference data points. The mean of the normalized local correlation differences acts as the weight of the initial abnormality degree $Y_{Q_A}$: the greater this mean, the greater the weight and the greater the optimized abnormality degree of $Q_A$. Likewise, the larger the initial abnormality degree $Y_{Q_A}$, the more abnormal $Q_A$ is within the Q-class data, and the larger the weighted optimized abnormality degree $Z_{Q_A}$.
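The adjustment described above can be sketched as follows, assuming the optimized abnormality degree is the initial abnormality degree multiplied by the mean of the normalized gaps between first and second local correlation values. The text does not specify the normalization function, so this sketch divides by the largest gap; that choice, the function name, and the dictionary layout are hypothetical.

```python
from typing import Dict, List


def optimized_abnormality(initial: float,
                          first_corr: Dict[str, float],
                          second_corr: Dict[str, List[float]]) -> float:
    """initial * mean(Norm(|first - second|)) over classes and difference points.

    first_corr[j]:     first local correlation between Q_A and class j.
    second_corr[j][i]: second local correlation between the i-th
                       difference data point of Q_A and class j.
    """
    gaps = [abs(first_corr[j] - s) for j in first_corr for s in second_corr[j]]
    if not gaps:                      # no difference data points
        return 0.0
    top = max(gaps)
    # Assumed normalization: scale every gap by the largest gap.
    normed = [g / top if top > 0 else 0.0 for g in gaps]
    return initial * sum(normed) / len(normed)


z = optimized_abnormality(2.0, {"W": 0.8}, {"W": [0.8, 0.4]})
print(z)
```

With identical first and second local correlation values every gap is 0 and the optimized abnormality degree collapses to 0, which reflects that the cross-class behaviour of the data point matches that of its neighbours.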

Thus, the construction of the optimized abnormal degree of each data in each type of data is completed.

The sewage data management module 104 is configured to acquire the Huffman coding priority of each data point by combining the information complexity of each data point, the optimized abnormality degree and the frequency of occurrence of the corresponding data point, and to encode and store the sewage data according to the Huffman coding priority.

The traditional Huffman coding firstly creates a tree according to the occurrence frequency of characters, then generates a specific code for each character through the tree structure, uses shorter codes for characters with high occurrence frequency and uses longer codes for characters with low occurrence frequency, thus reducing the average length of the character strings after the codes, and achieving the aim of lossless compression of data. According to the embodiment of the invention, the information complexity and the optimization abnormality degree of the data points are added into the encoding process of the traditional Huffman encoding based on frequency, so that the encoding length of the data containing rich information and the abnormal data containing the abnormal information in the sewage treatment process is shorter, and the information is stored, thereby reducing the phenomena of data loss and the like, and facilitating the subsequent construction of a database of sewage information and the recognition of the abnormal condition in the sewage treatment process. Herein, huffman coding is a well-known technique, and specific methods are not described herein.

In the Huffman coding process, the earlier a piece of information finishes coding, the longer its code; the later it finishes coding, the shorter its code. Data containing abundant information and abnormal data containing abnormal information about the sewage treatment process must therefore finish coding later, that is, have a lower coding priority, so the Huffman coding priority of each data point is set according to the complexity and abnormality degree of the information it contains. Preferably, the Huffman coding priority is acquired as follows: take the sum of the information complexity and the optimized abnormality degree of each data point as the priority weight of the data point, and perform negative correlation mapping on the product of the priority weight and the frequency of occurrence of the data value of the data point to obtain the Huffman coding priority of the data point.

The priority of each data point is analyzed by combining its complexity and abnormality degree to obtain its Huffman coding priority. The Huffman coding priority is calculated as follows:

$$P_{Q_A} = \exp\left( -\left( U_{Q_A} + Z_{Q_A} \right) \times f_{Q_A} \right)$$

where $P_{Q_A}$ is the Huffman coding priority of data point $Q_A$; $U_{Q_A}$ is the information complexity of data point $Q_A$; $Z_{Q_A}$ is the optimized abnormality degree of data point $Q_A$; $f_{Q_A}$ is the frequency of occurrence of the data value of $Q_A$ in the Q-class data; and exp is the exponential function with the natural constant e as its base.

It should be noted that the richer the information of data point $Q_A$ and the greater its abnormality degree, that is, the larger $U_{Q_A}$ and $Z_{Q_A}$, the later $Q_A$ needs to finish coding and the shorter its code. Since Huffman coding is based on the frequency of occurrence of characters, the combined complexity and abnormality result $U_{Q_A} + Z_{Q_A}$ is used to weight the frequency of occurrence $f_{Q_A}$ of the data value of $Q_A$ in the Q-class data. The richer the information of the data point and the greater its abnormality degree, the larger $\left( U_{Q_A} + Z_{Q_A} \right) \times f_{Q_A}$ and the lower the Huffman coding priority of $Q_A$.
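The priority calculation can be illustrated with a short sketch that applies `exp` with a negative sign as the negative correlation mapping of the product of the priority weight (information complexity plus optimized abnormality degree) and the frequency. The function name and the sample numbers are hypothetical.

```python
import math


def huffman_priority(complexity: float, abnormality: float,
                     frequency: float) -> float:
    """exp(-(U + Z) * f): a larger weight gives a lower priority."""
    return math.exp(-(complexity + abnormality) * frequency)


# Same frequency, different importance (hypothetical values).
plain = huffman_priority(complexity=0.1, abnormality=0.0, frequency=0.3)
important = huffman_priority(complexity=0.6, abnormality=0.9, frequency=0.3)
print(plain > important)  # True
```

For non-negative inputs the result lies in (0, 1], and a data point with richer or more abnormal information receives the lower priority, so it finishes coding later.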

And acquiring the Huffman coding priority of each data point in each type of sewage parameter data according to the method.

Based on the information complexity and the optimized abnormality degree of the sewage parameter data, the Huffman coding priority of each data point in the sewage treatment data system is obtained, and the data containing important information and abnormal data are deeply analyzed, so that Huffman coding of the data is completed later, the coding length of the data is shorter, the data is easier to store and read in the system, the data loss is difficult to occur, and the storage and reading efficiency and the safety of the important data are ensured.
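How such a priority could drive the code lengths is sketched below with a standard Huffman construction. The text does not state how the priority enters the tree building, so this sketch assumes that merging the highest-priority symbols first is equivalent to running ordinary Huffman coding on the weight (U + Z) * f, since the priority decreases as that weight grows. The function name and the sample weights are hypothetical.

```python
import heapq
from itertools import count
from typing import Dict


def huffman_codes(weights: Dict[str, float]) -> Dict[str, str]:
    """Standard Huffman coding: the two smallest weights merge first."""
    if len(weights) == 1:
        return {sym: "0" for sym in weights}
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = [(w, next(tie), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]


# Hypothetical weights (U + Z) * f; "c" is the most important data value.
codes = huffman_codes({"a": 0.1, "b": 0.2, "c": 0.7})
print(codes)  # {'a': '00', 'b': '01', 'c': '1'}
```

Symbols merged early end up deeper in the tree and receive longer codes, while the symbol with the largest weight is merged last and receives the shortest code, matching the behaviour the text describes.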

In summary, in the embodiment of the present invention, the data acquisition module is configured to acquire data points of various types of data in the sewage at each moment; the complexity analysis module is used for acquiring local variation values according to the data value difference of the adjacent data points, acquiring a first local correlation value according to the local variation values and the overall correlation of the data points in the data, and acquiring information complexity by combining the first local correlation value and the overall correlation; the abnormality degree analysis module is used for acquiring initial abnormality degrees according to time sequence distribution and dispersion change trend of difference data points of the data points, acquiring second local correlation values according to local change values of the difference data points of the data points and corresponding data points in other data, and adjusting the initial abnormality degrees by combining the difference between the first local correlation values and the second local correlation values to obtain optimized abnormality degrees; and the sewage data management module is used for acquiring the Huffman coding priority of the data points by combining the information complexity, the optimized abnormality degree and the frequency. The invention encodes the data based on the Huffman coding priority, thereby improving the storage safety of the important data.

It should be noted that: the sequence of the embodiments of the present invention is only for description, and does not represent the advantages and disadvantages of the embodiments. The processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments.

The foregoing description of the preferred embodiments of the present invention is not intended to be limiting, but rather, any modifications, equivalents, improvements, etc. that fall within the principles of the present invention are intended to be included within the scope of the present invention.

Claims

1. A sewage treatment data efficient management system based on data processing, characterized in that the system comprises:

the sewage data management module is used for acquiring the Huffman coding priority of each data point by combining the information complexity of each data point, the optimization abnormality degree and the frequency of the corresponding data point; encoding and storing sewage data according to the Huffman coding priority;

the method for acquiring the local variation value comprises the following steps:

Under each type of sewage parameter data, taking a data value difference value between each data point and a data point at the next moment of the data point as a local change value of the corresponding data point;

the method for acquiring the first local correlation value comprises the following steps:

Changing the data to be detected and the target data to obtain a first local correlation value between corresponding data points at the same moment in any two types of sewage parameter data;

the method for acquiring the information complexity comprises the following steps:

taking the product of the overall correlation between the data to be measured and the target class data and the first local correlation value between the data to be measured and the target data point as the initial information complexity of the data to be measured; calculating the average value of the initial information complexity of all the data points to be measured as the information complexity of the data points to be measured; changing the data points to be detected, and obtaining the information complexity of each data point in each type of sewage parameter data;

the method for acquiring the optimized abnormality degree comprises the following steps:

varying the target class data and the difference data point to be measured, obtaining all first optimized abnormality degrees between the data point to be measured and the data points in the other classes of data, and taking the average of all first optimized abnormality degrees as the second optimized abnormality degree of the data point to be measured; taking the product of the initial abnormality degree and the second optimized abnormality degree of the data point to be measured as the optimized abnormality degree of the data point to be measured;
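The final combination in this step can be sketched in a few lines. It assumes that the initial abnormality degree and the individual first optimized abnormality degrees of one data point have already been computed elsewhere; only the averaging and the product stated in the claim are shown, and the function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def optimized_abnormality_degree(
    initial_degree: float,
    first_optimized_degrees: Sequence[float],
) -> float:
    """Initial abnormality degree scaled by the mean first optimized degree."""
    if not first_optimized_degrees:
        raise ValueError("at least one first optimized abnormality degree is required")
    # Second optimized abnormality degree: average over all first optimized degrees.
    second_degree = mean(first_optimized_degrees)
    return initial_degree * second_degree


# Hypothetical values for one data point to be measured.
degree = optimized_abnormality_degree(0.5, [1.0, 3.0])
```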

the method for acquiring the Huffman coding priority comprises the following steps:

2. The efficient management system for sewage treatment data based on data processing according to claim 1, wherein the method for obtaining the dispersion in the abnormality degree analysis module comprises the following steps:

3. The efficient management system for sewage treatment data based on data processing according to claim 1, wherein the method for acquiring the difference data points in the abnormality degree analysis module comprises the following steps:

4. The efficient management system for sewage treatment data based on data processing according to claim 1, wherein the method for acquiring the initial abnormality degree in the abnormality degree analysis module comprises the following steps:

obtaining the initial abnormality degree according to an initial abnormality degree formula [formula and symbols not recovered from the source text]; the formula is defined in terms of the following quantities: the initial abnormality degree of data point Q_A; M, the number of difference data points within the preset local window centered on data point Q_A; the number of data points within the preset local window centered on data point Q_A; the dispersion of the i-th difference data point of data point Q_A; the standard deviation of the dispersions of the difference data points within the preset local window centered on data point Q_A; for the i-th difference data point lying in the preset local window centered on data point Q_A, the standard deviation of the dispersions of the difference data points within the preset local window centered on that i-th difference data point; the time data corresponding to the acquisition moment of data point Q_A; the time data corresponding to the i-th difference data point of data point Q_A; a preset constant; and the absolute value function.

5. The efficient management system for sewage treatment data based on data processing according to claim 1, wherein the method for obtaining the overall correlation in the complexity analysis module comprises the following steps: