CN112949680A - Pollution source identification method based on corresponding analysis and multiple linear regression - Google Patents

Pollution source identification method based on corresponding analysis and multiple linear regression

Info

Publication number
CN112949680A
Authority
CN
China
Prior art keywords
pollution source
linear regression
multiple linear
source
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110102148.6A
Other languages
Chinese (zh)
Inventor
陈锋
曹张伟
刘凤明
周建华
司秀荣
丁玎
孟凡生
梅凯
刘艳梅
李国洪
薛浩
金永涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China Institute of Aerospace Engineering
Original Assignee
North China Institute of Aerospace Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China Institute of Aerospace Engineering
Priority to CN202110102148.6A
Publication of CN112949680A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Processing Of Solid Wastes (AREA)

Abstract

The invention discloses a source analysis method for environmental pollutants based on correspondence analysis and multiple linear regression, comprising the following steps: first, based on sample data of the pollution source, identifying the pollution source by correspondence analysis and determining the number of main factors; second, calculating the pollution source contribution rates of the factor loadings by multiple linear regression, thereby achieving source analysis of the characteristic pollutants. The method identifies pollution sources by correspondence analysis and calculates their contribution rates by multiple linear regression. It treats factor loading identification as a nonlinear, multi-factor comprehensive classification problem, that is, a pattern recognition process. The method is practical and widely applicable, and provides reliable technical support for environmental management departments in handling pollution accidents and controlling pollution risks.

Description

Pollution source identification method based on corresponding analysis and multiple linear regression
Technical Field
The invention relates to the technical field of pollution source identification, in particular to a pollution source identification method based on corresponding analysis and multiple linear regression.
Background
Pollution source identification technology comprises methods for distinguishing, analyzing and evaluating the sources of pollutants. Current techniques fall broadly into three categories: emission inventory analysis, diffusion models and receptor models. Emission inventory analysis builds an inventory model by observing and simulating the emission amounts, emission characteristics and geographical distribution of pollutant sources. A diffusion model is a predictive model: it predicts the spatio-temporal variation of pollutants from the emission data and related parameters of each pollution source. Receptor models determine the contribution rate of each pollution source through chemical and microscopic analysis of receptor samples; their ultimate goal is to identify the pollution sources that contribute to the receptor and to quantify the contribution rate of each. Among the chemical methods based on receptor models, multivariate statistical methods are simple to apply: the fingerprint spectrum of each pollution source need not be known in advance, the sources in the study area need not be monitored beforehand, and only receptor sample monitoring data are required. The positive matrix factorization (PMF) model is one such multivariate statistical method; it is a factor analysis method that constrains the elements of the decomposed matrices to be non-negative and uses the standard deviations of the data in the optimization.
The core idea of this technique derives from principal component analysis. Traditional principal component factor analysis (PCA) based on the least squares method distorts the data during factor analysis, because the receptor sample data D are normalized by row or by column. It has also been argued that least-squares PCA implicitly assumes unrealistic standard deviations for the sample data, so that PCA does not yield the minimum-variance optimal solution. Source analysis with a positive matrix factorization model has two core steps: non-negatively constrained factorization, and pollution source identification using the factor loading matrix.
At present there is little research on pollution source identification. The main approaches either compare the source spectrum with the factor loadings qualitatively by inspecting plots, or compare them semi-quantitatively by calculating the deviation between them. Most of these methods ignore the nonlinear characteristics of the pollution source spectrum, so the identification result cannot truly reflect the correspondence between the factor loadings and the pollution source spectrum.
Disclosure of Invention
The invention aims to provide a pollution source identification method based on correspondence analysis and multiple linear regression.
A pollution source identification method based on correspondence analysis and multiple linear regression comprises the following steps:
Step one: based on sample data of the pollution source, identifying the pollution source by correspondence analysis, which specifically comprises:
standardizing the sample data of the pollution source;
calculating the correlation coefficient matrix of the pollution source samples;
calculating the eigenvalues and corresponding eigenvectors of the correlation coefficient matrix;
taking all factors with an eigenvalue greater than 1 as main factors, and determining the number of main factors;
Step two: calculating the pollution source contribution rates of the factor loadings by multiple linear regression, thereby achieving source analysis of the characteristic pollutants, which specifically comprises:
establishing a multiple linear regression model:
$y = \sum_{i=1}^{P} m_i X_i + b$
where y is the total concentration of the pollution source, P is the number of extracted main factors, $X_i$ is the factor score, $m_i$ is the coefficient associated with pollutant class i, and b is the residual variable information not explained by the factors;
with the $X_i$ required to be non-linear, y and $X_i$ above are standardized to yield:
$z = \sum_{i=1}^{P} B_i X_i$
where z is the standardized normal deviate of the pollution source, $X_i$ is the factor score, and $B_i$ is the multiple linear regression coefficient;
calculating the average percentage contribution rate of pollution source i according to the formula $t_i(\%) = 100\,(B_i / \sum_i B_i)$.
Optionally, the sample data of the pollution source are standardized according to the formula:
Figure BDA0002916309360000023
where
Figure BDA0002916309360000024
i is the index of the monitoring point, j is the index of the pollutant class, $x_{ij}$ is the concentration of the class-j pollutant at the i-th monitoring point,
Figure BDA0002916309360000025
is the average concentration of the class-j pollutant over all monitoring points, and
Figure BDA0002916309360000026
is the pollutant concentration after standardization;
the correlation coefficient matrix of the pollution source samples is calculated according to the formula:
Figure BDA0002916309360000027
where
Figure BDA0002916309360000031
Optionally, the pollution source samples comprise polycyclic aromatic hydrocarbons or heavy metals.
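As a rough illustration of step one, the sketch below standardizes a concentration matrix, forms its correlation coefficient matrix, and keeps the factors whose eigenvalue exceeds 1. The formulas in this record survive only as image placeholders, so the z-score standardization with the sample standard deviation is an assumption. The function name `principal_factors` and the `demo` concentrations are hypothetical and serve only as an example.

```python
import numpy as np

def principal_factors(x):
    """x: (n monitoring points, p pollutant classes) concentration matrix."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # Assumed z-score standardization: subtract the column mean and divide
    # by the sample standard deviation of each pollutant class.
    x_std = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # Correlation coefficient matrix of the standardized data (p x p).
    corr = x_std.T @ x_std / (n - 1)
    # Eigen-decomposition of the symmetric matrix, sorted in descending order.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Main factors: all factors with an eigenvalue greater than 1.
    n_main = int(np.sum(eigvals > 1.0))
    return x_std, corr, eigvals, eigvecs, n_main

# Made-up concentrations (6 monitoring points, 3 pollutant classes).
demo = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 4.1, 4.0],
    [3.0, 6.2, 6.0],
    [4.0, 7.9, 3.0],
    [5.0, 10.1, 7.0],
    [6.0, 12.0, 2.0],
])
x_std, corr, eigvals, eigvecs, n_main = principal_factors(demo)
print("eigenvalues:", eigvals)
print("number of main factors:", n_main)
```

Because the eigenvalues of a correlation matrix sum to the number of pollutant classes, the "eigenvalue greater than 1" rule keeps only factors that explain more variance than a single standardized variable.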
Compared with the prior art, the pollution source identification method based on correspondence analysis and multiple linear regression has the following beneficial effects:
first, the method traces pollutant sources quickly and accurately, is practical and widely applicable, and provides reliable technical support for environmental management departments in handling pollution accidents and controlling pollution risks;
second, traditional pollutant source analysis techniques can only roughly indicate the class of pollution source that contributes most to an environmental receptor; they cannot give the contribution of a specific emission source to the receptor, and so offer little practical guidance for pollution prevention and control;
third, the invention provides technical support for formulating regional pollution control measures and improving regional environmental quality, so that environmental management departments facing future pollution problems can rapidly identify pollution sources through a systematic and complete source analysis method and a corresponding data information system, and carry out pollution prevention and control.
Detailed Description
All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a pollution source identification method based on correspondence analysis and multiple linear regression.
The present invention will be described in further detail with reference to specific embodiments in order to make the above objects, features and advantages more apparent and understandable.
A heavy metal pollution source identification method based on correspondence analysis and multiple linear regression comprises the following steps:
Step one: applying correspondence analysis to the heavy metal pollution source data to determine the number of principal component factors, comprising:
standardizing the data according to the formula:
Figure BDA0002916309360000032
where
Figure BDA0002916309360000041
calculating the correlation coefficient matrix of the samples according to the formula:
Figure BDA0002916309360000042
where
Figure BDA0002916309360000043
calculating the eigenvalues and corresponding eigenvectors of the correlation coefficient matrix:
eigenvalues: $\lambda_1, \lambda_2, \ldots, \lambda_p$
eigenvectors: $a_i = (a_{i1}, a_{i2}, \ldots, a_{ip})$, $i = 1, 2, \ldots, p$;
taking all factors with an eigenvalue greater than 1 as main factors, and determining the number of main factors;
Step two: calculating the pollution source contribution rates of the factor loadings by multiple linear regression, thereby achieving source analysis of the characteristic pollutants, comprising:
establishing a multiple linear regression model:
$y = \sum_{i=1}^{P} m_i X_i + b$
where y is the total concentration of the pollution source, P is the number of extracted main factors, $X_i$ is the factor score, $m_i$ is the coefficient associated with pollutant class i, i.e. a main factor, and b is the residual variable information not explained by the factors, which is a random error;
with the $X_i$ required to be non-linear, y and $X_i$ above are standardized to yield:
$z = \sum_{i=1}^{P} B_i X_i$
where z is the standardized normal deviate of the pollution source, $X_i$ is the factor score, and $B_i$ is the multiple linear regression coefficient;
calculating the average percentage contribution rate of pollution source i according to the formula $t_i(\%) = 100\,(B_i / \sum_i B_i)$.
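Step two can be sketched as follows, assuming the factor scores have already been computed. The sketch standardizes the total concentration and the factor scores, fits the regression coefficients by ordinary least squares, and converts them to percentage contributions with the formula 100(Bi/∑Bi). The function name `contribution_rates`, the use of `numpy.linalg.lstsq`, and the demo values are illustrative assumptions and are not taken from the patent.

```python
import numpy as np

def contribution_rates(factor_scores, total_conc):
    """factor_scores: (n, P) factor scores; total_conc: (n,) total concentration."""
    X = np.asarray(factor_scores, dtype=float)
    y = np.asarray(total_conc, dtype=float)
    # Standardize y and each factor score column (zero mean, unit sample std).
    z = (y - y.mean()) / y.std(ddof=1)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Ordinary least squares for z = sum_i B_i * X_i. No intercept is needed
    # because both sides are centered.
    B, *_ = np.linalg.lstsq(Xs, z, rcond=None)
    # Average percentage contribution of each source: t_i = 100 * B_i / sum(B).
    rates = 100.0 * B / B.sum()
    return B, rates

# Made-up example: two uncorrelated factors, y = 10 + 3*X1 + 1*X2 exactly.
scores = np.array([[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]])
total = 10.0 + 3.0 * scores[:, 0] + 1.0 * scores[:, 1]
B, rates = contribution_rates(scores, total)
print("regression coefficients:", B)
print("contribution rates (%):", rates)  # about 75 % and 25 % for this toy case
```

Note that the formula only yields meaningful percentages when all regression coefficients are positive; a negative coefficient would produce a negative contribution rate, which this sketch does not guard against.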
In step one, the main factors are determined by correspondence analysis, specifically:
(1) From the original data matrix X, calculating the normalized probability matrix
Figure BDA0002916309360000046
Figure BDA0002916309360000047
whose entries are the normalized pollutant concentrations, where T is the sum of the pollutant concentrations over all monitoring points;
(2) Calculating the transition matrix
$Z = (z_{ij})$
Figure BDA0002916309360000048
where i = 1, ..., n; j = 1, ..., p,
Figure BDA0002916309360000051
T is the sum of the pollutant concentrations over all monitoring points, and
$x_{i\cdot} = x_{i1} + x_{i2} + \cdots + x_{ip}$
$x_{\cdot j} = x_{1j} + x_{2j} + \cdots + x_{nj}$
(3) Performing factor analysis
R-type factor analysis:
Calculating the eigenvalues of the covariance matrix A = Z'Z, retaining the first m eigenvalues such that their cumulative percentage is at least 85%, and calculating the corresponding unit eigenvectors to obtain the R-type factor loading matrix;
Q-type factor analysis:
For the same m eigenvalues, calculating the corresponding unit eigenvectors of the matrix B = ZZ', thereby obtaining the Q-type factor loading matrix;
(4) Determining the main factors by combined R-Q analysis.
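The correspondence analysis steps above can be sketched as below. The transition-matrix formula is only an image placeholder in this record, so the sketch assumes the common textbook form z_ij = (x_ij - x_i. x_.j / T) / sqrt(x_i. x_.j). The function name `correspondence_analysis` and the demo matrix are hypothetical.

```python
import numpy as np

def correspondence_analysis(x, cum_threshold=0.85):
    """x: non-negative (n monitoring points, p pollutants) concentration matrix."""
    x = np.asarray(x, dtype=float)
    total = x.sum()                       # T
    row = x.sum(axis=1, keepdims=True)    # x_i. (row sums), shape (n, 1)
    col = x.sum(axis=0, keepdims=True)    # x_.j (column sums), shape (1, p)
    expected = row @ col                  # x_i. * x_.j for every cell
    # Assumed textbook transition matrix Z = (z_ij).
    z = (x - expected / total) / np.sqrt(expected)
    # R-type analysis: eigen-decomposition of A = Z'Z (p x p).
    eigvals, u = np.linalg.eigh(z.T @ z)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)  # remove tiny negative round-off
    u = u[:, order]
    # Keep the first m roots whose cumulative share reaches the threshold.
    cum = np.cumsum(eigvals) / eigvals.sum()
    m = int(np.searchsorted(cum, cum_threshold)) + 1
    roots = eigvals[:m]
    r_loadings = u[:, :m] * np.sqrt(roots)        # R-type factor loadings
    # Q-type analysis: unit eigenvectors of B = ZZ' for the same roots
    # are Z u / sqrt(lambda); scale by sqrt(lambda) to get loadings.
    v = z @ u[:, :m] / np.sqrt(roots)
    q_loadings = v * np.sqrt(roots)
    return z, roots, r_loadings, q_loadings

# Made-up concentrations (5 monitoring points, 3 pollutants).
demo_x = np.array([
    [12.0, 3.0, 5.0],
    [10.0, 4.0, 6.0],
    [3.0, 9.0, 2.0],
    [4.0, 11.0, 3.0],
    [6.0, 5.0, 12.0],
])
z_demo, roots_demo, r_load, q_load = correspondence_analysis(demo_x)
print("retained roots:", roots_demo)
```

Because Z'Z and ZZ' share their non-zero eigenvalues, the R-type and Q-type loadings refer to the same factors, which is what lets pollutants and monitoring points be interpreted on common factor axes.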
The method was applied to identify heavy metal pollution sources in sediments of the Yangjiang river basin. Surface sediment samples were collected at 10 stations with a grab sampler. The data analysis focused on As, Hg, Cd, Cr and Pb; the results are shown in Table 1.
TABLE 1
Figure BDA0002916309360000052
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (3)

1. An environmental pollutant source analysis method based on correspondence analysis and multiple linear regression, characterized by comprising the following steps:
Step one: based on sample data of the pollution source, identifying the pollution source by correspondence analysis, which specifically comprises:
standardizing the sample data of the pollution source;
calculating the correlation coefficient matrix of the pollution source samples;
calculating the eigenvalues and corresponding eigenvectors of the correlation coefficient matrix;
taking all factors with an eigenvalue greater than 1 as main factors, and determining the number of main factors;
Step two: calculating the pollution source contribution rates of the factor loadings by multiple linear regression, thereby achieving source analysis of the characteristic pollutants, which specifically comprises:
establishing a multiple linear regression model:
$y = \sum_{i=1}^{P} m_i X_i + b$
where y is the total concentration of the pollution source, P is the number of extracted main factors, $X_i$ is the factor score, $m_i$ is the coefficient associated with pollutant class i, and b is the residual variable information not explained by the factors;
with the $X_i$ required to be non-linear, y and $X_i$ above are standardized to yield:
$z = \sum_{i=1}^{P} B_i X_i$
where z is the standardized normal deviate of the pollution source, $X_i$ is the factor score, and $B_i$ is the multiple linear regression coefficient;
calculating the average percentage contribution rate of pollution source i according to the formula $t_i(\%) = 100\,(B_i / \sum_i B_i)$.
2. The pollution source identification method based on correspondence analysis and multiple linear regression according to claim 1, wherein the pollution source sample data are standardized according to the formula:
Figure FDA0002916309350000013
where
Figure FDA0002916309350000014
i is the index of the monitoring point, j is the index of the pollutant class, $x_{ij}$ is the concentration of the class-j pollutant at the i-th monitoring point,
Figure FDA0002916309350000015
is the average concentration of the class-j pollutant over all monitoring points, and
Figure FDA0002916309350000016
is the pollutant concentration after standardization;
the correlation coefficient matrix of the pollution source samples is calculated according to the formula:
Figure FDA0002916309350000021
where
Figure FDA0002916309350000022
3. The pollution source identification method based on correspondence analysis and multiple linear regression according to claim 1, wherein the pollution source samples comprise polycyclic aromatic hydrocarbons or heavy metals.
CN202110102148.6A 2021-01-26 2021-01-26 Pollution source identification method based on corresponding analysis and multiple linear regression Pending CN112949680A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110102148.6A CN112949680A (en) 2021-01-26 2021-01-26 Pollution source identification method based on corresponding analysis and multiple linear regression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110102148.6A CN112949680A (en) 2021-01-26 2021-01-26 Pollution source identification method based on corresponding analysis and multiple linear regression

Publications (1)

Publication Number Publication Date
CN112949680A true CN112949680A (en) 2021-06-11

Family

ID=76236809

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110102148.6A Pending CN112949680A (en) 2021-01-26 2021-01-26 Pollution source identification method based on corresponding analysis and multiple linear regression

Country Status (1)

Country Link
CN (1) CN112949680A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113706127A (en) * 2021-10-22 2021-11-26 长视科技股份有限公司 Water area analysis report generation method and electronic equipment
CN116148400A (en) * 2023-04-20 2023-05-23 北京大学 Quantitative source analysis method based on pollution source and pollution receptor high-resolution mass spectrum data

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110175647A (en) * 2019-05-28 2019-08-27 北华航天工业学院 A kind of pollution source discrimination clustered based on principal component analysis and K-means

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110175647A (en) * 2019-05-28 2019-08-27 北华航天工业学院 A kind of pollution source discrimination clustered based on principal component analysis and K-means

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘颖: "上海市土壤和水体沉积物中多环芳烃的测定方法、分布特征和源解析", 《中国博士学位论文全文数据库工程科技Ⅰ辑》, no. 8, 15 August 2008 (2008-08-15), pages 5 - 2 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113706127A (en) * 2021-10-22 2021-11-26 长视科技股份有限公司 Water area analysis report generation method and electronic equipment
CN113706127B (en) * 2021-10-22 2022-02-22 长视科技股份有限公司 Water area analysis report generation method and electronic equipment
CN116148400A (en) * 2023-04-20 2023-05-23 北京大学 Quantitative source analysis method based on pollution source and pollution receptor high-resolution mass spectrum data
CN116148400B (en) * 2023-04-20 2023-06-27 北京大学 Quantitative source analysis method based on pollution source and pollution receptor high-resolution mass spectrum data

Similar Documents

Publication Publication Date Title
CN112949680A (en) Pollution source identification method based on corresponding analysis and multiple linear regression
CN105631203A (en) Method for recognizing heavy metal pollution source in soil
CN110619691B (en) Prediction method and device for slab surface cracks
CN112198144B (en) Method and system for quickly tracing sewage
CN112101789A (en) Water pollution alarm grade identification method based on artificial intelligence
CN108052486B (en) Fine source analysis method based on inorganic components and organic markers of particulate matters
CN112904810B (en) Process industry nonlinear process monitoring method based on effective feature selection
CN115453064B (en) Fine particulate matter air pollution cause analysis method and system
CN113516228A (en) Network anomaly detection method based on deep neural network
CN109669017B (en) Refinery distillation tower top cut water ion concentration prediction method based on deep learning
CN108197280B (en) Mining ability evaluation method based on industrial equipment data
CN113918707A (en) Policy convergence and enterprise image matching recommendation method
CN104915563B (en) The chronic reference prediction method of fresh water based on metal quantitative structure activity relationship
CN114217025A (en) Analysis method for evaluating influence of meteorological data on air quality concentration prediction
CN111737993B (en) Method for extracting equipment health state from fault defect text of power distribution network equipment
CN115879379B (en) Intelligent corrosion monitoring and early warning method and system for equipment
CN117493759A (en) Gas methane distinguishing method and device based on principal component analysis and vector machine
CN112633528A (en) Power grid primary equipment operation and maintenance cost determination method based on support vector machine
CN113281229B (en) Multi-model self-adaptive atmospheric PM2.5 concentration prediction method based on small samples
Zhang et al. Determining statistical process control baseline periods in long historical data streams
CN112711911B (en) Rapid pollution tracing method applied to boundary observation based on pollution source spectrum library
CN115728290A (en) Method, system, equipment and storage medium for detecting chromium element in soil
CN114390002A (en) Network flow multi-module clustering anomaly detection method based on grouping conditional entropy
CN114611399A (en) PM2.5 concentration long time-series prediction method based on NGBoost algorithm
Koleva et al. Stochastic modelling of daily air pollution in Burgas, Bulgaria

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination