CN115423036A

CN115423036A - Method and system for measuring sewage index

Info

Publication number: CN115423036A
Application number: CN202211174721.5A
Authority: CN
Inventors: 钟伟民; 杜文莉; 曹志兴; 钱锋; 彭鑫
Original assignee: East China University of Science and Technology
Current assignee: East China University of Science and Technology
Priority date: 2022-09-26
Filing date: 2022-09-26
Publication date: 2022-12-02

Abstract

The invention provides a method for measuring a sewage index, a system for measuring the sewage index and a storage medium. The method for measuring the sewage index comprises the following steps: acquiring historical data of sewage treatment and a query sample to be predicted; according to the historical data and the query sample, determining a similar training sample and an associated input variable corresponding to the query sample; and performing slow feature regression analysis on the query sample according to the similar training sample and the associated input variable to determine a numerical value of the sewage index of the query sample.

Description

Method and system for measuring sewage index

Technical Field

The invention relates to the technical field of sewage treatment, in particular to a sewage index measuring method, a sewage index measuring system and a corresponding computer readable storage medium.

Background

China's per capita water resources amount to only one quarter of the global average, which makes China a country with relatively scarce per capita water resources. Moreover, as Chinese cities keep expanding and large numbers of people keep moving into them, water consumption is steadily increasing, and so is the discharge of urban sewage. To meet the requirements of sustainable development and improve the utilization of water resources, sewage treatment is becoming increasingly important.

For the sewage treatment process, accurate prediction of the key sewage treatment indices can effectively help a plant judge whether the treated sewage meets the treatment standard and address other key issues, thereby reducing the plant's treatment cost and improving treatment efficiency. However, because of problems such as sensor failure, historical data are often missing, which makes predicting the key indices of sewage treatment very difficult. Soft sensing (soft measurement) is an effective technique for predicting key sewage indices. In sewage treatment, however, because the process is nonlinear and dynamic, common soft sensing methods such as Support Vector Regression (SVR) cannot handle missing data effectively, and their prediction performance is too poor to meet practical requirements.

To overcome the above defects of the prior art, there is a need in the art for a sewage index measuring method that can accurately predict the sewage index when data are missing, so as to determine an accurate value of the sewage index.

Disclosure of Invention

The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.

In order to overcome the above defects in the prior art, the present invention provides a method for measuring a sewage index, a device for measuring a sewage index, and a computer readable storage medium corresponding thereto, which can accurately predict a sewage index in the case of data loss to determine an accurate value of the sewage index.

Specifically, the sewage index measuring method provided according to the first aspect of the present invention includes the following steps: acquiring historical data of sewage treatment and a query sample to be predicted; determining, according to the historical data and the query sample, similar training samples and associated input variables corresponding to the query sample; and performing slow feature regression analysis on the query sample according to the similar training samples and the associated input variables, so as to determine the value of the sewage index of the query sample.

Further, in some embodiments of the present invention, the step of determining similar training samples and associated input variables corresponding to the query sample according to the historical data and the query sample includes: checking the integrity of the query sample and, if data are missing from the query sample, filling in the corresponding missing values; determining, according to the historical data, the historical samples with higher similarity to the query sample as the similar training samples; and determining, according to the historical data, the candidate input variables with higher correlation with the sewage index as the associated input variables.

Further, in some embodiments of the present invention, the step of checking the integrity of the query sample and, if data are missing, filling in the corresponding missing values includes: performing data preprocessing on the historical data; determining the latent posterior distribution of the historical data according to the preprocessed historical data; and filling in the missing data of the query sample according to the historical data and its latent posterior distribution.

Further, in some embodiments of the present invention, the step of preprocessing the historical data includes: performing mean-variance normalization on the historical data; and determining the dimension of the latent variables in the historical data according to the normalized historical data.

Further, in some embodiments of the present invention, the step of determining, according to the historical data, the historical samples with higher similarity to the query sample as the similar training samples includes: determining a first SKL divergence between each historical sample and the query sample; and determining, according to the first SKL divergence, the degree of difference between the latent variables of the historical data and those of the query sample, so as to select the similar training samples.

Further, in some embodiments of the present invention, the step of determining the degree of difference between the latent variables of the historical data and the query sample according to the first SKL divergence to select the similar training samples includes: selecting, according to a preset number N1, the N1 historical samples with the smallest first SKL divergence as the similar training samples.

Further, in some embodiments of the present invention, the step of determining, according to the historical data, the candidate input variables with higher correlation with the sewage index as the associated input variables includes: approximating the distribution of each candidate input variable and of the predicted value by Gaussian kernel density estimation; calculating the probability distribution function of each candidate input variable; determining a second SKL divergence of each candidate input variable according to the difference between its distribution and that of the predicted value; and determining the associated input variables from the candidate input variables according to the second SKL divergence.

Further, in some embodiments of the present invention, the step of determining the associated input variables from the candidate input variables according to the second SKL divergence includes: selecting, according to a preset number N2, the N2 candidate input variables with the smallest second SKL divergence as the associated input variables.

Further, in some embodiments of the present invention, the step of performing slow feature regression analysis on the query sample according to the similar training samples and the associated input variables to determine the value of the sewage index of the query sample includes: determining a sample weight matrix according to the query sample and the similar training samples; determining weighted mean matrices of the inputs and outputs of the training samples according to the sample weight matrix; constructing and training a first slow feature regression analysis model according to the associated input variables, the similar training samples and the weighted mean matrices; and predicting the sewage index according to the first slow feature regression analysis model and the query sample.

Further, in some embodiments of the invention, the step of predicting the sewage index according to the first slow feature regression analysis model and the query sample includes: discarding a second slow feature regression analysis model that was constructed or existed before the first slow feature regression analysis model, and inputting the query sample into the first slow feature regression analysis model to determine the value of the sewage index.

In addition, the sewage index measuring system provided by the second aspect of the present invention includes a memory and a processor. The processor is connected to the memory and is configured to implement the sewage index measuring method provided by the first aspect of the present invention.

Further, the computer-readable storage medium provided according to the third aspect of the present invention has computer instructions stored thereon. When executed by a processor, the computer instructions implement the sewage index measuring method provided by the first aspect of the present invention.

Drawings

The above features and advantages of the present disclosure will be better understood upon reading the detailed description of embodiments of the disclosure in conjunction with the following drawings. In the drawings, components are not necessarily drawn to scale, and components having similar relative characteristics or features may have the same or similar reference numerals.

FIG. 1 illustrates an architecture diagram of a sewage index measuring device provided according to some embodiments of the present invention;

FIG. 2 illustrates a flow chart of a sewage index measuring method provided according to some embodiments of the present invention;

FIG. 3 illustrates predicted values and actual values obtained by a sewage index measuring method according to some embodiments of the present invention;

FIG. 4 illustrates predicted values and actual values obtained by a sewage index measuring method according to some embodiments of the present invention;

FIG. 5 illustrates predicted values and actual values obtained by a sewage index measuring method according to some embodiments of the present invention.

Detailed Description

The following description of the embodiments of the present invention is provided for illustrative purposes, and other advantages and effects of the present invention will become apparent to those skilled in the art from the present disclosure. While the invention will be described in connection with the preferred embodiments, there is no intent to limit its features to those embodiments. On the contrary, the invention is described in connection with the embodiments in order to cover alternatives or modifications that may be derived from the claims of the invention. In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention; the invention may nevertheless be practiced without these details. Moreover, some specific details have been omitted from the description to avoid obscuring the focus of the present invention.

In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "coupled" are to be construed broadly: for example, a connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium; or an internal communication between two elements. Those skilled in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.

Additionally, the terms "upper," "lower," "left," "right," "top," "bottom," "horizontal," "vertical" and the like as used in the following description should be understood as referring to the orientation described in this paragraph and shown in the associated drawings. These relative terms are used for convenience of description only and do not imply that the described apparatus must be constructed or operated in a particular orientation; they should therefore not be construed as limiting the invention.

It will be understood that, although the terms first, second, third, etc. may be used herein to describe various elements, regions, layers and/or sections, these elements, regions, layers and/or sections should not be limited by these terms, but rather are used to distinguish one element, region, layer and/or section from another element, region, layer and/or section. Thus, a first component, region, layer or section discussed below could be termed a second component, region, layer or section without departing from some embodiments of the present invention.

As described above, in the sewage treatment process, accurate prediction of the key sewage treatment indices can effectively help a plant judge whether the treatment meets the treatment standard and address other key issues, thereby reducing the plant's treatment cost and improving treatment efficiency. However, because of problems such as sensor failure, historical data are often missing, which makes predicting the key indices of sewage treatment extremely difficult. Soft sensing is an effective technique for predicting key sewage indices. In sewage treatment, however, because the process is nonlinear and dynamic, common soft sensing methods such as SVR cannot handle missing data effectively, and their prediction performance is too poor to meet practical requirements.

In order to overcome the above-mentioned defects of the prior art, the present invention provides a method for determining a sewage index, a device for determining a sewage index and a computer readable storage medium corresponding thereto, which are used for accurately predicting a sewage index in the case of data loss to determine an accurate value of the sewage index.

In some non-limiting embodiments, the method for measuring the sewage index provided by the first aspect of the present invention may be implemented by the apparatus for measuring the sewage index provided by the second aspect of the present invention. Specifically, the sewage index measuring device is provided with a memory and a processor. The memory includes, but is not limited to, the above-described computer-readable storage medium provided by the third aspect of the present invention having computer instructions stored thereon. The processor is connected to the memory and configured to execute the computer instructions stored in the memory to implement the method for determining a wastewater index provided by the first aspect of the present invention.

Referring first to FIG. 1, FIG. 1 shows an architecture diagram of a sewage index measuring device according to some embodiments of the present invention. The sewage index measuring device includes an internal communication bus 301, a processor 302, a read-only memory (ROM) 303, a random access memory (RAM) 304, a communication port 305, and a hard disk 307. The internal communication bus 301 enables data communication between the components of the device. The processor 302 may make determinations and issue prompts. In some embodiments, the processor 302 may consist of one or more processors. The communication port 305 enables data transmission and communication between the device and external input/output devices. In some embodiments, the device may send and receive information and data over a network through the communication port 305. In some embodiments, the device may exchange data with an external input/output device in a wired manner through the input/output terminal 306. The device further includes various forms of program storage units and data storage units, such as the hard disk 307, the read-only memory (ROM) 303 and the random access memory (RAM) 304, which can store various data files used for computer processing and/or communication, as well as program instructions executed by the processor 302. The processor 302 executes these instructions to implement the main parts of the method. The processing results of the processor 302 are transmitted through the communication port 305 to an external output device and displayed on the user interface of that device.

The working principle of the sewage index measuring device will be described below with reference to some examples of the sewage index measuring method. Those skilled in the art will appreciate that these examples are merely non-limiting examples of the present invention, intended to clearly illustrate its main concept and to provide specific details that the public can readily implement, rather than to limit the overall function or operation of the device. Similarly, the sewage index measuring device is only a non-limiting embodiment provided by the present invention and does not limit the entity that executes each step of these sewage index measuring methods.

Referring to FIG. 2, FIG. 2 is a flow chart illustrating a sewage index measuring method according to some embodiments of the present invention.

As shown in step S1 of FIG. 2, the sewage index measuring method may first acquire historical data of sewage treatment and a query sample to be predicted. Then, as shown in step S2 of FIG. 2, similar training samples and associated input variables corresponding to the query sample may be determined according to the historical data and the query sample. Finally, as shown in step S3 of FIG. 2, slow feature regression analysis may be performed on the query sample according to the similar training samples and the associated input variables, so as to determine the value of the sewage index of the query sample.

Optionally, in some embodiments of the present invention, those skilled in the art may acquire sewage treatment data by installing sensors that measure the sewage indices, and establish a sewage index database containing the historical data.

It is understood by those skilled in the art that the above sensors are only a non-limiting embodiment provided by the present invention, intended to measure the indices of the sewage treatment process so as to obtain historical data of the sewage indices, and are not intended to limit the scope of the present invention.

Optionally, in some embodiments provided herein, the acquired sewage treatment history data may include, but are not limited to: five-day biochemical oxygen demand (BOD5), dissolved oxygen content, chemical oxygen demand, total suspended solids concentration, water environment quality index, soluble readily biodegradable organic matter concentration, readily degradable substrate concentration, NO₃⁻-N and NO₂⁻-N concentration, NH₄⁺-N and NH₃-N concentration, soluble biodegradable organic nitrogen concentration, and easily degradable substrate concentration.

Further, similar training samples and associated input variables corresponding to the query sample can be determined according to the historical data and the query sample. Specifically, the integrity of the query sample is checked, and if data are missing from the query sample, the corresponding missing values are filled in. Historical samples with higher similarity to the query sample are determined as the similar training samples according to the historical data, and candidate input variables with higher correlation with the sewage index are determined as the associated input variables.

Further, the integrity of the query sample can be checked and, if data are missing, the corresponding missing values filled in. Specifically, the historical data are preprocessed; the latent posterior distribution of the historical data is determined from the preprocessed historical data; and the missing data of the query sample are filled in according to the historical data and their latent posterior distribution.

Preferably, in some embodiments of the present invention, the step of performing data preprocessing on the historical data may be performing mean variance normalization on the historical data, and determining dimensions of latent variables in the historical data according to the historical data after the mean variance normalization.

It will be understood by those skilled in the art that the above-mentioned mean variance normalization method is only a preferred solution provided by the present invention, and is intended to pre-process data to determine the dimensions of latent variables in historical data, and is not intended to limit the scope of the present invention.

Specifically, mean-variance normalization is first applied to the historical data:

$$ z = \frac{x - \mu}{\sigma} $$

where z denotes the normalized data, x the original data, μ the expectation (mean) of the original data, and σ the standard deviation of the original data. This normalization puts all variables on a common scale and reduces the influence of variables with larger variance. After this, principal component analysis [1] (hereinafter PCA) is used to determine the dimension of the latent variables, which reduces the subsequent data processing time.
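The normalization and the PCA-based choice of the latent dimension can be sketched in Python with NumPy as follows. The cumulative-variance threshold (90% here) used to pick the latent dimension K is an assumption for illustration, since the text does not state which PCA criterion is applied; the helper names `zscore` and `pca_latent_dim` are hypothetical.

```python
import numpy as np

def zscore(X):
    """Mean-variance (z-score) normalization, column by column.

    Returns the normalized data together with the mean and standard deviation,
    so that a query sample can later be scaled with the same statistics.
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)
    sigma[sigma == 0] = 1.0  # guard against constant columns
    return (X - mu) / sigma, mu, sigma

def pca_latent_dim(Z, threshold=0.90):
    """Smallest K whose leading principal components explain at least
    `threshold` of the total variance (cumulative percent variance rule)."""
    # Eigenvalues of the sample covariance matrix, sorted in descending order
    eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    # First index where the cumulative ratio reaches the threshold, plus one
    return int(np.searchsorted(ratio, threshold) + 1)
```

A query sample would be scaled with the stored statistics, e.g. `(x_query - mu) / sigma`, before it enters the later steps.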

Then, the latent posterior distribution of the historical data can be obtained by VBPCA (variational Bayesian principal component analysis). The main steps are as follows. VBPCA builds on probabilistic principal component analysis (PPCA) and penalizes the parameter values by assuming that the PPCA parameters have Gaussian prior distributions:

$$ p(W) = \prod_{j=1}^{K} \mathcal{N}(w_j \mid 0, \varepsilon_{w_j} I), \qquad p(\mu) = \mathcal{N}(\mu \mid 0, \varepsilon_{\mu} I) $$

where $W \in \mathbb{R}^{D \times K}$ is the loading matrix, $w_j$ is the jth column of the loading matrix, $\varepsilon$ is the set of hyperparameters of the VBPCA probability model, and $\theta = \{W, \mu, e_j\}$ denotes the model parameters.

The posterior distribution $p(\theta \mid x_j, \varepsilon)$ cannot be obtained directly from the above steps, so a variational Bayesian EM algorithm is introduced to solve the problem: the mean-field approximation simplifies the posterior estimation, and the posterior distributions of the unknown parameters are then updated iteratively with a coordinate ascent strategy. The factorized posterior distribution of the parameters and the associated cost function are:

$$ q(\theta) = \prod_{i=1}^{K} q(w_i) \prod_{i=1}^{D} q(\mu_i) \prod_{j=1}^{N} q(z_j) $$

$$ C(q(\theta), \varepsilon) = \int q(\theta) \ln \frac{q(\theta)}{p(X, \theta \mid \varepsilon)} \, d\theta $$

where $x_j \in \mathbb{R}^{D \times 1}$ is the jth observation, $z_j \in \mathbb{R}^{K \times 1}$ is the latent variable of the jth sample, N is the number of samples, $w_i$ is the ith column of the loading matrix, $\mu_i$ is the ith element of the mean vector μ, $C(q(\theta), \varepsilon)$ is the cost function, and the probability density function $q(\theta)$ approximates the posterior distribution $p(\theta \mid x_j, \varepsilon)$.

Thus, the present invention can obtain the potential posterior distribution of the historical data by the method.
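Full VBPCA requires the iterative variational updates described above. As a simpler stand-in, the sketch below uses closed-form maximum-likelihood PPCA, the model that VBPCA extends, to show how a Gaussian latent posterior is computed from only the observed entries of a query sample and how the missing entries are then filled from it. Unlike VBPCA, this stand-in places no priors on W and μ and assumes the historical data used for fitting are complete; all function names are hypothetical.

```python
import numpy as np

def fit_ppca(Z, K):
    """Closed-form maximum-likelihood PPCA on complete data Z (N x D).

    Returns the mean vector, the loading matrix W (D x K) and the isotropic
    noise variance sigma2 (average of the discarded eigenvalues)."""
    mu = Z.mean(axis=0)
    S = np.cov(Z, rowvar=False, bias=True)
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]  # descending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sigma2 = eigvals[K:].mean()
    W = eigvecs[:, :K] * np.sqrt(np.maximum(eigvals[:K] - sigma2, 0.0))
    return mu, W, sigma2

def latent_posterior(x, mu, W, sigma2):
    """Gaussian posterior p(z | observed entries of x); NaN marks missing entries."""
    obs = ~np.isnan(x)
    Wo = W[obs]
    M = Wo.T @ Wo + sigma2 * np.eye(W.shape[1])
    Minv = np.linalg.inv(M)
    z_mean = Minv @ Wo.T @ (x[obs] - mu[obs])
    z_cov = sigma2 * Minv
    return z_mean, z_cov

def impute(x, mu, W, sigma2):
    """Fill the missing entries of x with the reconstruction W z_mean + mu."""
    z_mean, _ = latent_posterior(x, mu, W, sigma2)
    filled = x.copy()
    miss = np.isnan(x)
    filled[miss] = W[miss] @ z_mean + mu[miss]
    return filled
```

The posterior mean and covariance returned by `latent_posterior` are also the quantities needed by the SKL-based sample selection that follows.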

Further, the similar training samples can be selected by determining the first SKL divergence between each historical sample and the query sample, and using it to measure the degree of difference between their latent variables.

Further, according to the preset number N1, the N1 historical samples with the smallest first SKL divergence can be selected from the historical data as the similar training samples.

Specifically, when selecting samples and variables, the invention assumes that the measurement noise is Gaussian white noise and adopts a similarity criterion for Gaussian distributions. The criterion uses the SKL (symmetric Kullback-Leibler) divergence to quantify the difference between Gaussian distributions. The SKL divergence can be defined as follows:

$$ D_{SKL}(j) = \frac{1}{2} \operatorname{trace}\left( \Sigma_j^{-1} \Sigma_{query} + \Sigma_{query}^{-1} \Sigma_j \right) + \frac{1}{2} \left( \bar{z}_j - \bar{z}_{query} \right)^T \left( \Sigma_j^{-1} + \Sigma_{query}^{-1} \right) \left( \bar{z}_j - \bar{z}_{query} \right) - K $$

where $x_{j:}$ denotes the jth historical sample, $x_{query}$ the query sample, trace the trace operator, $z_{query}$ the latent variable of the query sample, $\bar{z}_j$, $\bar{z}_{query}$ and $\Sigma_j$, $\Sigma_{query}$ the means and covariances of the latent posterior distributions $p(z_j \mid x_{j:})$ and $p(z_{query} \mid x_{query})$, respectively, and K the dimension of the latent variables.

Samples with smaller SKL divergence are more similar to the query sample. The historical samples are therefore sorted in ascending order of SKL divergence, and the first N1 most similar samples, with N1 chosen empirically, are selected as the similar training samples for the subsequent variable correlation analysis.
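A minimal sketch of the Gaussian SKL divergence and the top-N1 sample selection, assuming the latent posterior of every historical sample and of the query sample is available as a mean vector and a covariance matrix (for example from the PPCA/VBPCA step). The SKL is taken here as the sum of the two KL directions; some authors use half of this value, which does not change the ranking. The function names are hypothetical.

```python
import numpy as np

def skl_gaussian(m1, C1, m2, C2):
    """Symmetric KL divergence KL(N1 || N2) + KL(N2 || N1) between two Gaussians.

    The log-determinant terms of the two KL directions cancel, leaving only
    trace and quadratic terms."""
    C1_inv = np.linalg.inv(C1)
    C2_inv = np.linalg.inv(C2)
    d = m1 - m2
    k = len(m1)
    return 0.5 * (np.trace(C1_inv @ C2 + C2_inv @ C1) + d @ (C1_inv + C2_inv) @ d) - k

def select_similar(post_means, post_covs, q_mean, q_cov, n1):
    """Indices of the n1 historical samples whose latent posteriors are closest
    (smallest SKL) to the query's posterior, plus all divergences."""
    div = np.array([skl_gaussian(m, C, q_mean, q_cov)
                    for m, C in zip(post_means, post_covs)])
    idx = np.argsort(div)[:n1]  # ascending order: most similar first
    return idx, div
```

For two one-dimensional unit-variance Gaussians whose means differ by 1, each KL direction equals 0.5, so `skl_gaussian` returns 1.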

Further, candidate input variables with higher correlation with the sewage index can be determined as the associated input variables according to the historical data. Specifically, the distribution of each candidate input variable and of the predicted value is approximated by Gaussian kernel density estimation, the probability distribution function of each candidate input variable is calculated, and a second SKL divergence of each candidate input variable is determined from the difference between its distribution and that of the predicted value. The associated input variables are then determined from the candidate input variables according to the second SKL divergence.

Specifically, in the just-in-time learning framework, once the samples have been selected, an appropriate variable selection must be performed on the selected samples. The main steps are as follows.

First, the distribution of each input variable and of the predicted value is approximated using Gaussian kernel density estimation:

$$ \hat{P}(x) = \frac{1}{M h} \sum_{i=1}^{M} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{(x - X_{i,j})^2}{2 h^2} \right) $$

where M is the length of the current database, $X_{:,j}$ denotes the jth candidate input variable, $X_{i,j}$ the jth candidate input variable of the ith sample, Y the prediction vector (whose density is estimated in the same way, with $Y_i$ in place of $X_{i,j}$), and h the bandwidth.

Then, after the probability distribution function of each candidate input variable has been calculated, the difference between its distribution and that of the predicted value is analyzed with the SKL divergence:

$$ D_{SKL}(X_{:,j}, Y) = \int P(X_{:,j}) \ln \frac{P(X_{:,j})}{P(Y)} \, dx + \int P(Y) \ln \frac{P(Y)}{P(X_{:,j})} \, dx $$

where $P(X_{:,j})$ and $P(Y)$ are the probability distribution functions of the jth candidate input variable and of the predicted value, respectively.

Further, according to the preset number N2, the N2 candidate input variables with the smallest second SKL divergence can be selected as the associated input variables.

Specifically, in some embodiments of the invention, variables with smaller SKL divergence values can be regarded as input variables more closely related to the predicted value. All variables are sorted in ascending order of SKL divergence, and the first N2 most relevant variables, with N2 chosen empirically, are selected as the associated input variables.
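The kernel density estimate and the second SKL divergence can be sketched as follows. The densities are evaluated on a common uniform grid and the integrals are approximated by a Riemann sum; the fixed bandwidth h, the grid size and the small constant added to avoid log(0) are illustrative assumptions rather than values from the text, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kde_grid(v, grid, h):
    """Gaussian KDE p(x) = 1/(M h) * sum_i phi((x - v_i) / h), evaluated on a grid."""
    u = (grid[:, None] - v[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(v) * h * np.sqrt(2 * np.pi))

def skl_density(p, q, dx, eps=1e-12):
    """Symmetric KL between two densities sampled on the same uniform grid:
    integral of (p - q) * ln(p / q), which equals KL(p||q) + KL(q||p)."""
    p = p + eps
    q = q + eps
    return np.sum((p - q) * np.log(p / q)) * dx

def select_variables(X, y, n2, h=0.3, n_grid=400):
    """Rank candidate inputs (columns of X) by SKL divergence to y and
    return the indices of the n2 smallest, plus all divergences."""
    lo = min(X.min(), y.min()) - 3 * h
    hi = max(X.max(), y.max()) + 3 * h
    grid = np.linspace(lo, hi, n_grid)
    dx = grid[1] - grid[0]
    py = gaussian_kde_grid(y, grid, h)
    div = np.array([skl_density(gaussian_kde_grid(X[:, j], grid, h), py, dx)
                    for j in range(X.shape[1])])
    return np.argsort(div)[:n2], div
```

In practice `X` and `y` would be the similar training samples selected in the previous step, restricted to the candidate variables.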

Further, after the associated input variables and the similar training samples have been determined, slow feature regression analysis can be performed on the query sample according to them, so as to determine the value of the sewage index of the query sample. First, a sample weight matrix is determined according to the query sample and the similar training samples. Then, weighted mean matrices of the inputs and outputs of the training samples are determined according to the sample weight matrix. Next, a first slow feature regression analysis model is constructed and trained according to the associated input variables, the similar training samples and the weighted mean matrices. Finally, the sewage index is predicted according to the first slow feature regression analysis model and the query sample.

Further, in some embodiments of the invention, a second slow feature regression analysis model that was built or existed before the first slow feature regression analysis model may first be discarded, and the query sample is then input into the constructed first slow feature regression analysis model to determine the value of the sewage index.

Specifically, the similarity matrix Ψ between the query sample $X_{query}$ and the training samples $X_{selected}$ is the diagonal matrix

$$ \Psi = \operatorname{diag}(\Psi_1, \Psi_2, \ldots, \Psi_{N_1}) $$

where $\Psi_i$ ($i = 1, 2, \ldots, N_1$), the ith diagonal element, is the similarity between the query sample $X_{query}$ and the ith training sample, i.e., the sample weight. It is computed from the dissimilarity of the corresponding latent distributions, using std(·), which calculates the standard deviation, and an adjustable parameter η. Notably, as η → +∞, the weights of all selected samples approach 1.

The weighted mean matrices of the training-sample inputs and outputs ($X_w$, $Y_w$) are computed as

$$ X_w = \frac{\sum_{i=1}^{N_1} \Psi_i x_i}{\sum_{i=1}^{N_1} \Psi_i}, \qquad Y_w = \frac{\sum_{i=1}^{N_1} \Psi_i y_i}{\sum_{i=1}^{N_1} \Psi_i} $$

where $x_i$ denotes the ith training sample and $y_i$ the BOD5 value of the ith training sample. Before the locally weighted slow feature analysis model is constructed, the weighted means must be removed from all variables of the training samples and the query sample:

$$ \bar{X} = X_{selected} - \mathbf{1}_{N_1} X_w, \qquad \bar{Y} = Y_{selected} - \mathbf{1}_{N_1} Y_w, \qquad \bar{x}_{query} = X_{query} - X_w $$

where $\mathbf{1}_{N_1}$ is an $N_1 \times 1$ matrix whose elements are all 1, and $N_1$ is the number of training samples.
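The sample weights and the weighted centering can be sketched as below. The sketch assumes the weight form commonly used in locally weighted modeling, $\Psi_i = \exp(-d_i / (\eta \cdot \mathrm{std}(d)))$, where $d_i$ is the first SKL divergence of the ith selected sample; this assumed form satisfies the stated limit that all weights approach 1 as η → +∞. The function names are hypothetical.

```python
import numpy as np

def sample_weights(div, eta=1.0):
    """ASSUMED weight form: psi_i = exp(-d_i / (eta * std(d)))."""
    s = np.std(div)
    if s == 0:
        return np.ones_like(div)  # all samples equally similar
    return np.exp(-div / (eta * s))

def weighted_center(X_sel, y_sel, x_query, psi):
    """Remove the psi-weighted means from the training inputs/outputs
    and from the query sample."""
    w = psi / psi.sum()
    X_w = w @ X_sel                  # weighted mean of the inputs, shape (D,)
    Y_w = w @ y_sel                  # weighted mean of the output (e.g. BOD5)
    ones = np.ones((len(psi), 1))    # the N1 x 1 all-ones matrix
    X_bar = X_sel - ones @ X_w[None, :]
    y_bar = y_sel - Y_w
    x_q_bar = x_query - X_w
    return X_bar, y_bar, x_q_bar, X_w, Y_w
```

Larger η flattens the weights toward a global (unweighted) model, while smaller η concentrates the model on the samples closest to the query.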

After scaling the selected input data and query samples, the training samples will be weighted. Then, a basic linear slow feature analysis algorithm will be implemented on the weighted samples:

in which a two-step Singular Value Decomposition (SVD) is employed. First, the covariance matrix of the input data is calculated as follows

The interrelationship between input variables can be eliminated by singular value decomposition

S＝UDU ^T

Then, a whitening matrix Q is obtained

Q＝D ^-1/2 U ^T

In addition, a whitening transformation is given

Wherein

In a similar way, another covariance can be obtained

Wherein

Is the first derivative of Z. Next, the orthogonal matrix P can be derived by the following SVD

The weighting matrix W can be calculated as follows

Finally, the slow characteristic can be written as follows

A coefficient theta between the slow feature and the output value is calculated. The relationship between latent variables and output values may be represented in a weighted linear regression form, which is formulated as follows:

After θ has been obtained, the query sample is predicted: its predicted output value is calculated and taken as the predicted value of the sewage index.
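Prediction for the query sample, sketched under the assumption that the query is centred with the training weighted mean X_w, projected with W, and shifted back by the weighted output mean Y_w. The function name is hypothetical.

```python
import numpy as np


def predict_query(x_query, x_w, y_w, W, theta):
    """Predict the sewage index of one query sample with the local model.

    Assumption: centre with x_w, project with W, add back y_w.
    """
    xq_c = np.asarray(x_query, dtype=float) - np.asarray(x_w, dtype=float)
    s_q = np.asarray(W, dtype=float) @ xq_c     # slow features of the query
    return float(s_q @ np.asarray(theta, dtype=float) + y_w)
```

For example, with W = I, θ = (2, 1), X_w = (1, 1), Y_w = 10 and query (2, 3), the centred query is (1, 2) and the prediction is 2·1 + 1·2 + 10 = 14.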

Then, when the query sample is updated, i.e., before the sewage index of the next sample is measured, the local prediction model constructed for the current query sample is discarded and the above procedure is repeated to predict the sewage index of the updated query sample. Building a fresh local model for each query in this way allows the sewage index to be predicted accurately even when data are missing.
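The just-in-time loop (build a local model for each query, predict, then discard it) can be outlined as below. The dissimilarity measure and the local model are passed in as callables; both are hypothetical hooks, with the latent SKL divergence and the weighted slow feature regression being the intended choices in this method. Keeping the selected samples in time order (so that SFA derivatives remain meaningful) is an assumption.

```python
import numpy as np


def jit_predict(X_hist, y_hist, queries, n1, dissimilarity, fit_local):
    """Just-in-time loop: build a local model per query, predict, then discard it.

    dissimilarity(X_hist, x_q) -> length-N array (hypothetical hook, e.g. SKL divergence).
    fit_local(X_sel, y_sel, d_sel) -> callable model(x_q) (hypothetical hook).
    """
    X_hist = np.asarray(X_hist, dtype=float)
    y_hist = np.asarray(y_hist, dtype=float)
    preds = []
    for x_q in np.asarray(queries, dtype=float):
        d = np.asarray(dissimilarity(X_hist, x_q), dtype=float)
        idx = np.sort(np.argsort(d)[:n1])   # N1 most similar samples, kept in time order
        model = fit_local(X_hist[idx], y_hist[idx], d[idx])
        preds.append(model(x_q))
        del model                           # local model is discarded before the next query
    return np.array(preds)
```

A trivial local-mean model with Euclidean distance is enough to exercise the loop; in the method itself, `fit_local` would wrap the weighting, SFA and regression steps.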

Refer further to FIG. 3, FIG. 4 and FIG. 5, which plot the predicted values and actual values of the sewage index obtained with the sewage index measurement method according to some embodiments of the present invention.

As shown in these figures, the predicted values deviate only slightly from the actual values, indicating that the model can predict the sewage index accurately, and thus determine its value, even when data are missing.

While, for purposes of simplicity of explanation, the methodologies are shown and described as a series of acts, it is to be understood and appreciated that the methodologies are not limited by the order of acts, as some acts may, in accordance with one or more embodiments, occur in different orders and/or concurrently with other acts from that shown and described herein or not shown and described herein, as would be understood by one skilled in the art.

Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

Although the sewage indicator measuring system described in the above embodiments can be implemented by a combination of software and hardware, it is understood that the sewage indicator measurement system may also be implemented in software or hardware alone. For a hardware implementation, the sewage indicator determination system may be implemented in one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, other electronic devices configured to perform the above functions, or a selected combination thereof. For a software implementation, the sewage indicator determination system may be implemented by separate software modules, such as program modules (procedures) and function modules (functions), running on a common chip, each of which performs one or more of the functions and operations described herein.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.

In one or more exemplary embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software as a computer program product, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a web site, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for measuring a sewage index, the method comprising the steps of:

acquiring historical data of sewage treatment and a query sample to be predicted;

determining, according to the historical data and the query sample, similar training samples and associated input variables corresponding to the query sample; and

performing slow feature regression analysis on the query sample according to the similar training samples and the associated input variables so as to determine the numerical value of the sewage index of the query sample.

2. The measurement method of claim 1, wherein the step of determining similar training samples and associated input variables corresponding to the query sample according to the historical data and the query sample comprises:

checking the integrity of the query sample and, if data are missing from the query sample, filling in the corresponding missing values;

determining, according to the historical data, historical data with higher similarity to the query sample as the similar training samples; and

determining, according to the historical data, candidate input variables with higher correlation with the sewage index as the associated input variables.

3. The measurement method of claim 2, wherein the step of checking the integrity of the query sample and filling in the corresponding missing values if data are missing from the query sample comprises:

performing data preprocessing on the historical data;

determining a latent posterior distribution of the historical data according to the preprocessed historical data; and

filling in the missing data of the query sample according to the historical data and the latent posterior distribution of the historical data.

4. The measurement method of claim 3, wherein the step of preprocessing the historical data comprises:

performing mean-variance normalization on the historical data; and

determining the dimensionality of the latent variables in the historical data according to the normalized historical data.

5. The measurement method of claim 4, wherein the step of determining, according to the historical data, historical data with higher similarity to the query sample as the similar training samples comprises:

determining latent variables of the historical data and a first SKL divergence of the historical data; and

determining, according to the first SKL divergence, the degree of difference between the latent variables of the historical data and those of the query sample, so as to select the similar training samples.

6. The measurement method of claim 5, wherein the step of determining, according to the first SKL divergence, the degree of difference between the latent variables of the historical data and those of the query sample to select the similar training samples comprises:

selecting from the historical data, according to a preset number N1, the N1 historical data samples with the smallest first SKL divergence as the similar training samples.

7. The measurement method of claim 2, wherein the step of determining, according to the historical data, candidate input variables with high correlation with the sewage index as the associated input variables comprises:

approximating the distribution of each candidate input variable and of the predicted value using Gaussian kernel density estimation;

calculating a probability distribution function of each candidate input variable;

determining a second SKL divergence of each candidate input variable according to the distribution difference between that candidate input variable and the predicted value; and

determining the associated input variables from the candidate input variables according to the second SKL divergence.

8. The measurement method of claim 7, wherein the step of determining the associated input variables from the candidate input variables according to the second SKL divergence comprises:

selecting, according to a preset number N2, the N2 candidate input variables with the smallest second SKL divergence as the associated input variables.

9. The measurement method of claim 8, wherein the step of performing slow feature regression analysis on the query sample according to the similar training samples and the associated input variables to determine the value of the sewage index of the query sample comprises:

determining a sample weight matrix according to the query sample and the similar training samples;

determining weighted mean matrices of the inputs and outputs of the training samples according to the sample weight matrix;

constructing and training a first slow feature regression analysis model according to the associated input variables, the similar training samples and the weighted mean matrices; and

predicting the sewage index according to the first slow feature regression analysis model and the query sample.

10. The measurement method of claim 9, wherein the step of predicting the sewage index according to the first slow feature regression analysis model and the query sample comprises:

discarding a second slow feature regression analysis model that was constructed or existed before the first slow feature regression analysis model, and inputting the query sample into the first slow feature regression analysis model to determine the numerical value of the sewage index.

11. A sewage index measurement system, comprising:

a memory; and

a processor coupled to the memory and configured to perform the method for measuring a sewage index of any one of claims 1 to 10.

12. A computer-readable storage medium having computer instructions stored thereon, wherein the computer instructions, when executed by a processor, perform the method for measuring a sewage index according to any one of claims 1 to 10.
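The variable-selection step of claims 7 and 8 (Gaussian kernel density estimates, a second SKL divergence, then the N2 smallest) can be sketched in NumPy. The bandwidth rule (the normal-reference rule 1.06·σ·n^(-1/5)), the evaluation grid, standardising each variable before comparison, and all function names are assumptions, not details given in the claims.

```python
import numpy as np


def gaussian_kde_pdf(samples, grid):
    """Gaussian kernel density estimate on a grid (normal-reference bandwidth, assumed)."""
    x = np.asarray(samples, dtype=float)
    h = 1.06 * np.std(x) * len(x) ** (-1 / 5)
    u = (grid[:, None] - x[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))


def skl_between(a, b, n_grid=512, eps=1e-12):
    """Symmetric KL divergence between KDE densities of two standardised variables."""
    a = (np.asarray(a, float) - np.mean(a)) / np.std(a)
    b = (np.asarray(b, float) - np.mean(b)) / np.std(b)
    lo = min(a.min(), b.min()) - 3.0
    hi = max(a.max(), b.max()) + 3.0
    grid = np.linspace(lo, hi, n_grid)
    p = gaussian_kde_pdf(a, grid) + eps
    q = gaussian_kde_pdf(b, grid) + eps
    p, q = p / p.sum(), q / q.sum()   # discrete probability distributions on the grid
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))


def select_inputs(X, y, n2):
    """Return indices of the n2 candidate inputs whose distribution is closest to y's."""
    X = np.asarray(X, dtype=float)
    scores = np.array([skl_between(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[:n2], scores
```

As recited in claim 7, this compares the marginal distributions of each candidate input and the output; `scipy.stats.gaussian_kde` could replace the hand-written estimator where SciPy is available.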