CN117216659A

CN117216659A - Atmospheric particulate source analysis method and system based on single-particle aerosol mass spectrum

Info

Publication number: CN117216659A
Application number: CN202311174457.XA
Authority: CN
Inventors: 李梅; 许永江; 成春雷
Original assignee: Jinan University
Current assignee: Jinan University
Priority date: 2023-09-12
Filing date: 2023-09-12
Publication date: 2023-12-12

Abstract

The invention provides an atmospheric particulate source analysis method and system based on single-particle aerosol mass spectrometry, belonging to the field of particulate identification. The method comprises the following steps: collecting mass spectrum data of atmospheric particulates with a single-particle aerosol mass spectrometer; and analyzing the mass spectrum data with a pre-trained deep learning model to determine the source of the atmospheric particulates, wherein the deep learning model comprises a one-dimensional convolutional neural network, a long short-term memory network and a multi-layer perceptron which are sequentially connected. Because particulate source analysis is performed end to end by the deep learning model, the path from input to output is more highly automated and the influence of subjective factors is reduced. The deep learning model has better feature extraction and generalization capability and can better capture the weak linear and nonlinear relations of the particulates, which further improves the accuracy of atmospheric particulate source analysis.

Description

Atmospheric particulate source analysis method and system based on single-particle aerosol mass spectrum

Technical Field

The invention relates to the field of particulate matter identification, in particular to an atmospheric particulate matter source analysis method and system based on single-particle aerosol mass spectrometry.

Background

Atmospheric particulates enter the human body through the respiratory tract and can impair heart and lung function, thereby harming human health. The chemical composition of the particles reflects their source and formation process, so studying that composition enables source analysis and helps people take appropriate measures to prevent and control atmospheric pollution and protect human health.

In atmospheric particulate research, online source analysis of particulate matter is often achieved using a single particle aerosol mass spectrometer (SPAMS). The single-particle aerosol mass spectrometer is an instrument that can directly monitor the chemical composition of particles in the atmosphere online without pretreatment and can provide spectral information on the particles at high time resolution; it is widely applied to the source analysis of atmospheric particulates.

The main current approaches to atmospheric particulate source analysis are positive matrix factorization (PMF), the ART-2a algorithm and the similarity method. Positive matrix factorization decomposes the observed atmospheric data into a linear combination of several latent factors and calculates the contribution of each factor to the observed data. The input data types must be selected manually and the source of each resulting factor must be judged manually, which introduces subjectivity; the selected features rarely cover the full behavior of the pollution sources, the physical meaning of the factors is unclear, and weak linear and nonlinear relations are not handled well. The ART-2a algorithm is based on an adaptive neural network: it clusters the particles, identifies the mass spectrum characteristics of each cluster, and judges the pollution source to which each cluster belongs. In practice redundant clusters often have to be merged manually, so the method is somewhat subjective and time-consuming. The similarity method judges attribution by comparing the cosine similarity of two particle mass spectra: particle mass spectrum data are generally collected from known pollution sources, and their cosine similarity with particles in the environment is calculated. This method is limited in handling weak linear and nonlinear relations, cannot reliably capture the differences between two highly similar particles from different sources, and requires a manually chosen cosine similarity threshold when judging particle attribution, so it is also somewhat subjective.
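To make the similarity method described above concrete, the following is a minimal sketch in Python. The function names, the dictionary of reference spectra and the default threshold of 0.8 are illustrative assumptions, not values taken from this document; the manually chosen threshold is exactly the subjective element the background criticizes.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length spectra (lists of peak intensities)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum is treated as similar to nothing
    return dot / (norm_a * norm_b)


def attribute_by_similarity(spectrum, reference_spectra, threshold=0.8):
    """Return (source, score) for the best-matching reference spectrum.

    reference_spectra maps a hypothetical source name to its reference spectrum.
    The source is None when the best score is below the hand-picked threshold.
    """
    best_source, best_score = None, 0.0
    for source, reference in reference_spectra.items():
        score = cosine_similarity(spectrum, reference)
        if score > best_score:
            best_source, best_score = source, score
    if best_score < threshold:
        return None, best_score
    return best_source, best_score
```

Two spectra that differ only in a few weak peaks can both score above the threshold against the same reference, which is the limitation on highly similar particles noted above.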

As atmospheric particulate source analysis develops further, the demand for fine analysis is increasing. Fine analysis means distinguishing two kinds of highly similar particles whose sources are themselves similar, such as particles emitted by gasoline vehicles and diesel vehicles, or particles emitted by agricultural machinery and construction machinery, both of which use diesel engines. In addition, although some particles emitted by a pollution source carry obvious tracer characteristics, such particles make up only a small proportion of the emissions, so judging sources by tracer characteristics suffers from insufficient coverage. Conventional methods are therefore limited in the fine analysis of highly similar particles.

In summary, existing techniques for the fine analysis of atmospheric particulates are immature. They cannot adequately capture the weak linear and nonlinear relations of the particulates, so more complex patterns and relations in the particulates go unidentified; they are also highly subjective and time-consuming, and cannot reliably distinguish similar particulates from different sources.

Disclosure of Invention

The invention aims to provide an atmospheric particulate source analysis method and an atmospheric particulate source analysis system based on single-particle aerosol mass spectrometry, which can improve the accuracy of atmospheric particulate source analysis.

In order to achieve the above object, the present invention provides the following solutions:

an atmospheric particulate source analysis method based on single particle aerosol mass spectrometry, comprising:

collecting mass spectrum data of atmospheric particulates by a single particle aerosol mass spectrometer;

analyzing the mass spectrum data based on a pre-trained deep learning model to determine the source of the atmospheric particulates; the deep learning model comprises a one-dimensional convolutional neural network, a long short-term memory network and a multi-layer perceptron which are sequentially connected.

Optionally, analyzing the mass spectrum data based on a pre-trained deep learning model to determine a source of atmospheric particulates, specifically including:

respectively carrying out normalization processing on the positive spectrogram and the negative spectrogram of the mass spectrum data to obtain mass spectrum one-dimensional data;

extracting a plurality of local features of the mass spectrum one-dimensional data by adopting a pre-trained one-dimensional convolutional neural network;

extracting the dependency relationships among the plurality of local features by adopting a pre-trained long short-term memory network to obtain sequence data;

and determining the source of the atmospheric particulates by adopting a pre-trained multi-layer perceptron according to the sequence data.

Optionally, the positive spectrogram and the negative spectrogram of the mass spectrum data are respectively normalized by adopting an L2 norm.

Optionally, the one-dimensional convolutional neural network extracts local features of the mass spectrum one-dimensional data using the following formula:

$Y_1 = \sigma_1(W_1 \cdot X_1 + b_1)$;

wherein $Y_1$ is the local feature, $W_1$ is the weight matrix of the one-dimensional convolutional neural network, $X_1$ is the mass spectrum one-dimensional data, $b_1$ is the bias vector of the one-dimensional convolutional neural network, and $\sigma_1$ is the activation function.

Optionally, the training process of the deep learning model includes:

obtaining a pollution spectrum library; the pollution spectrum library comprises a plurality of single-particle mass spectrum data and known sources corresponding to the single-particle mass spectrum data; the single particle mass spectrum data is one-dimensional data after normalization processing;

determining a loss function from the known sources of each single particle mass spectrum data;

and based on the loss function, performing iterative training on the deep learning model by adopting a back propagation algorithm until the loss function converges or reaches the maximum iteration number, so as to obtain a trained deep learning model.

Optionally, the loss function is a categorical cross-entropy loss.

In order to achieve the above purpose, the present invention also provides the following solutions:

an atmospheric particulate source resolving system based on single particle aerosol mass spectrometry, comprising:

the single-particle aerosol mass spectrometer is used for collecting mass spectrum data of atmospheric particulate matters;

the processor is connected with the single-particle aerosol mass spectrometer and is used for analyzing the mass spectrum data based on a pre-trained deep learning model so as to determine the source of atmospheric particulate matters; the deep learning model comprises a one-dimensional convolutional neural network, a long short-term memory network and a multi-layer perceptron which are sequentially connected.

According to the specific embodiments provided by the invention, the invention discloses the following technical effects: mass spectrum data of atmospheric particulates are acquired with a single-particle aerosol mass spectrometer, so the chemical composition information of each particle can be obtained; end-to-end particulate source analysis is carried out by a deep learning model consisting of a one-dimensional convolutional neural network, a long short-term memory network and a multi-layer perceptron, so the path from input to output is highly automated and the influence of subjective factors is reduced; and the deep learning model has better feature extraction and generalization capability and can better capture the weak linear and nonlinear relations of the particulates, which further improves the accuracy of atmospheric particulate source analysis.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of an atmospheric particulate source analysis method based on single particle aerosol mass spectrometry provided by the invention;

FIG. 2 is a schematic diagram of a deep learning model constructed in accordance with the present invention;

FIG. 3 is a schematic illustration of an atmospheric particulate source resolution process;

FIG. 4 is a schematic diagram of an atmospheric particulate source analysis system based on single particle aerosol mass spectrometry according to the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The invention aims to provide an atmospheric particulate source analysis method and system based on single-particle aerosol mass spectrometry. Chemical composition information of each particle is acquired with a single-particle aerosol mass spectrometer, and the source of the particle is analyzed end to end by a deep learning model formed from a one-dimensional convolutional neural network (1D-CNN), a long short-term memory network (LSTM) and a multi-layer perceptron (MLP). This achieves a higher degree of automation from input to output, reduces the influence of subjective factors and shortens the running time of the model. The deep learning model has better feature extraction and generalization capability and can better capture the weak linear and nonlinear relations of the particles, enabling real-time fine analysis of atmospheric particulate sources with high resolution and high accuracy.

In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.

Example 1

As shown in fig. 1, the embodiment provides an atmospheric particulate source analysis method based on single-particle aerosol mass spectrometry, which includes:

Step 100: collecting mass spectrum data of atmospheric particulates by a single particle aerosol mass spectrometer;

Step 200: analyzing the mass spectrum data based on a pre-trained deep learning model to determine the source of the atmospheric particulates. The deep learning model comprises a one-dimensional convolutional neural network, a long short-term memory network and a multi-layer perceptron which are sequentially connected.

Specifically, step 200 includes:

Step 201: normalizing the positive spectrogram and the negative spectrogram of the mass spectrum data separately to obtain mass spectrum one-dimensional data. In this embodiment, the positive spectrogram and the negative spectrogram are each normalized by the L2 norm, which converts them into one-dimensional arrays with values between 0 and 1.
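The normalization in Step 201 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names are hypothetical, and joining the two normalized spectrograms end to end into a single array is an assumption, since the text only states that a one-dimensional array is produced.

```python
import math


def l2_normalize(spectrum):
    """Divide every peak intensity by the L2 norm of the spectrum.

    For non-negative intensities each resulting value lies between 0 and 1.
    """
    norm = math.sqrt(sum(v * v for v in spectrum))
    if norm == 0.0:
        return [0.0] * len(spectrum)  # leave an empty spectrum as zeros
    return [v / norm for v in spectrum]


def preprocess(positive, negative):
    """Normalize the positive and negative spectrograms separately.

    Assumption: the two normalized halves are concatenated into one 1-D array.
    """
    return l2_normalize(positive) + l2_normalize(negative)
```

Normalizing each polarity separately keeps a very intense positive spectrum from suppressing the negative peaks of the same particle.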

Step 202: extracting a plurality of local features of the mass spectrum one-dimensional data by adopting a pre-trained one-dimensional convolutional neural network:

$Y_1 = \sigma_1(W_1 \cdot X_1 + b_1)$;

wherein $Y_1$ is the local feature, $W_1$ is the weight matrix of the one-dimensional convolutional neural network, $X_1$ is the mass spectrum one-dimensional data, $b_1$ is the bias vector of the one-dimensional convolutional neural network, and $\sigma_1$ is the activation function. In this embodiment, the activation function is the ReLU activation function, a nonlinear function whose expression is $\max(0, x)$, where $x$ is the argument of the activation function.
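As an illustration of Step 202, the sketch below applies one convolution kernel with a ReLU activation to a one-dimensional array. The single channel, stride of 1, absence of padding and the example kernel are all assumptions chosen for brevity; the document does not specify kernel sizes or the number of filters.

```python
def relu(x):
    """ReLU activation: max(0, x)."""
    return max(0.0, x)


def conv1d_relu(signal, kernel, bias=0.0):
    """Slide one kernel over a 1-D signal and apply ReLU to each window.

    Like most deep learning libraries, this computes a cross-correlation
    (the kernel is not flipped). Stride 1 and no padding are assumed.
    """
    k = len(kernel)
    return [
        relu(sum(kernel[j] * signal[i + j] for j in range(k)) + bias)
        for i in range(len(signal) - k + 1)
    ]


# A difference kernel responds where the intensity drops from one bin to the next.
features = conv1d_relu([0.0, 1.0, 0.0, 2.0], [1.0, -1.0])
```

Each output value depends only on a short window of neighboring mass-to-charge bins, which is why the features are called local.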

Step 203: extracting the dependency relationships among the plurality of local features by adopting a pre-trained long short-term memory network to obtain sequence data.

For research on the chemical composition of atmospheric particulates, not only local features but also the complex dependencies among features must be considered. The local features extracted by the one-dimensional convolutional neural network are therefore input into the long short-term memory network, which further extracts the dependencies among the features and identifies more complex patterns or relations. The long short-term memory network comprises an input gate, a forget gate, a first memory unit, a second memory unit, an output gate and a hidden state, with the following structure:

Input gate: $i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$;

Forget gate: $f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$;

First memory unit: $\tilde{C}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)$;

Second memory unit: $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$;

Output gate: $o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$;

Hidden state: $h_t = o_t \odot \tanh(C_t)$;

wherein $x_t$ is the input at the current time, $h_{t-1}$ is the output at the previous time, $i_t$ is the degree to which the current-time state is added, $f_t$ is the degree of retention, $\tilde{C}_t$ is the intermediate state, $C_{t-1}$ is the state at the previous time, $C_t$ is the current state, $o_t$ is the degree of output, $h_t$ is the output at the current time, $W_i$ is the weight matrix of the input gate, $W_f$ is the weight matrix of the forget gate, $W_c$ is the weight matrix of the first memory unit, $W_o$ is the weight matrix of the output gate, $b_i$ is the bias vector of the input gate, $b_f$ is the bias vector of the forget gate, $b_c$ is the bias vector of the first memory unit, $b_o$ is the bias vector of the output gate, $\tanh$ is the hyperbolic tangent function, and $\odot$ is the element-wise (dot) multiplication operation.

The principle of the long short-term memory network is as follows:

(1) Input gate: the LSTM receives an input sequence and decides, through a sigmoid activation function, which information needs to be updated. The input gate controls the importance of the input. The sigmoid activation function, which is a nonlinear function, has the expression $\sigma(x) = \frac{1}{1 + e^{-x}}$.

(2) Forget gate: the LSTM decides, through another sigmoid activation function, which information needs to be forgotten. The forget gate controls whether the memory of the previous moment needs to be forgotten.

(3) Memory unit: the LSTM uses a memory unit to store and transfer information. The memory is updated through the input gate, forgotten through the forget gate, and output through the output gate.

(4) Output gate: the LSTM decides, through a sigmoid activation function, which information needs to be output. The output gate controls the importance of the output.

(5) Hidden state: the LSTM computes the hidden state from the results of the input gate, the forget gate and the output gate. The hidden state is the output of the LSTM and is also the input for the next time step.
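The five points above can be sketched as a single LSTM time step. The code follows the standard LSTM cell update, which is assumed here to match the gate structure described in this section; the parameter dictionary layout and the helper names are hypothetical, and plain Python lists stand in for vectors and matrices.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, v, b):
    """Compute W . v + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; p holds W_i, W_f, W_c, W_o and b_i, b_f, b_c, b_o."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    i_t = [sigmoid(a) for a in affine(p["W_i"], z, p["b_i"])]        # input gate
    f_t = [sigmoid(a) for a in affine(p["W_f"], z, p["b_f"])]        # forget gate
    c_tilde = [math.tanh(a) for a in affine(p["W_c"], z, p["b_c"])]  # candidate state
    o_t = [sigmoid(a) for a in affine(p["W_o"], z, p["b_o"])]        # output gate
    # Keep part of the old memory and add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    # The hidden state is the gated, squashed memory.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

Running `lstm_step` over the sequence of local features, carrying `h_t` and `c_t` forward each time, yields the sequence data passed to the next stage.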

Step 204: determining the source of the atmospheric particulates by adopting a pre-trained multi-layer perceptron according to the sequence data.

The multi-layer perceptron receives the sequence data processed by the one-dimensional convolutional neural network and the long short-term memory network. It comprises an input layer, a hidden layer and an output layer which are sequentially connected, and outputs the final particle classification result, namely the source of the particles. The expression of the multi-layer perceptron is $Y_3 = \sigma_3(W_3 \cdot X_3 + b_3)$, where the activation function $\sigma_3$ is the softmax activation function, a nonlinear function with the expression $\sigma_3(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$; $Y_3$ is the output classification probability, $X_3$ is the input sequence data, $x_i$ is an element of $X_3$, $W_3$ is the weight matrix of the multi-layer perceptron, and $b_3$ is the bias vector of the multi-layer perceptron.
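The output stage can be illustrated with a softmax layer that turns a feature vector into class probabilities. This sketch shows only the final layer, not a full multi-layer perceptron, and the function names and the list of source labels are hypothetical. Subtracting the maximum before exponentiating is a common numerical-stability step that does not change the result.

```python
import math


def softmax(logits):
    """Softmax: exp(x_i) / sum_j exp(x_j), computed stably."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, W, b, sources):
    """Apply an output layer W . x + b, then softmax, and pick the likeliest source.

    sources is a hypothetical list of source labels, one per output row of W.
    """
    logits = [sum(w * x for w, x in zip(row, features)) + bias
              for row, bias in zip(W, b)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return sources[best], probs
```

The predicted source is the class with the highest probability, and the probabilities always sum to 1.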

A schematic of the deep learning model is shown in fig. 2.

Further, the training process of the deep learning model comprises the following steps:

Step 300: obtaining a pollution spectrum library. The pollution spectrum library comprises a plurality of single-particle mass spectrum data and the known source corresponding to each of them. The single-particle mass spectrum data are one-dimensional data after normalization processing. Pollutants of known sources in the atmosphere are collected through a vacuum bottle and analyzed with a single-particle aerosol mass spectrometer to obtain the single-particle mass spectrum data.

Step 400: determining the loss function from the known source of each single-particle mass spectrum datum. In this embodiment, the loss function is the categorical cross-entropy loss: $L = -\sum_{j=1}^{\mathrm{outputsize}} y_j \log(\hat{y}_j)$, where $L$ is the loss function value, the value of outputsize depends on the number of classes, $y_j$ is the value of the sample's true label for the j-th class under one-hot coding, and $\hat{y}_j$ is the predicted probability of belonging to the j-th class.
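A minimal sketch of the categorical cross-entropy loss for one sample follows. The function name and the small clipping constant `eps`, which only guards against taking the logarithm of zero, are assumptions added for illustration.

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Loss for one sample: minus the sum over classes of y_j * log(y_hat_j).

    y_true is a one-hot list; y_pred is a list of predicted class probabilities.
    """
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# With one-hot labels only the true class contributes to the loss.
loss = categorical_cross_entropy([0, 1, 0], [0.2, 0.5, 0.3])
```

The loss falls toward 0 as the predicted probability of the true source approaches 1.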

Step 500: based on the loss function, iteratively training the deep learning model with a back propagation algorithm until the loss function converges or the maximum number of iterations is reached, so as to obtain the trained deep learning model. The core idea of the back propagation algorithm is to calculate the gradients of the model parameters using the chain rule. Assume an N-layer neural network in which the weight matrix of the N-th layer is $W_N$, the bias vector is $b_N$, the activation function is $\sigma_N$, the loss function is $L$, and the linear output is $Z_N$. The procedure for calculating the gradient is as follows:

(1) calculating the output $Y$ of the deep learning model by forward propagation:

$Y = \sigma_N(W_N \cdot X_{N-1} + b_N)$.

(2) for layer D, calculating the gradient using the following formula:

$\delta_D = (W_D \cdot X_{D-1} + b_D)^{T}\,\delta_{D+1} \odot Y'_D$;

wherein $\delta_D$ is the gradient of layer D, $\odot$ denotes element-wise multiplication, and $Y'_D$ represents the derivative of the layer-D activation function.

(3) calculating, from the layer gradients, the weight gradient and the bias gradient from layer D to layer D+1.

In the training process, the weight matrix and the bias vector are updated by the Adam algorithm: the first moment estimate and the second moment estimate of the gradient are calculated first, bias correction is then applied to both estimates, and the bias-corrected estimates are used to update the weight matrix and the bias vector:

$m_k = \beta_1 m_{k-1} + (1 - \beta_1)\, g_k$;

$v_k = \beta_2 v_{k-1} + (1 - \beta_2)\, g_k^2$;

$\hat{m}_k = \frac{m_k}{1 - \beta_1^k}$;

$\hat{v}_k = \frac{v_k}{1 - \beta_2^k}$;

$\theta_k = \theta_{k-1} - \eta\, \frac{\hat{m}_k}{\sqrt{\hat{v}_k} + \epsilon}$;

where $k$ is the number of iterations, $m_k$ is the first moment estimate of the gradient at the k-th iteration, $v_k$ is the second moment estimate of the gradient at the k-th iteration, $\hat{m}_k$ is the bias-corrected first moment estimate of the gradient at the k-th iteration, $\hat{v}_k$ is the bias-corrected second moment estimate of the gradient at the k-th iteration, $\beta_1$ and $\beta_2$ are hyperparameters, $g_k$ is the gradient at the k-th iteration, $\eta$ is the learning rate, $\epsilon$ is the numerical stability constant, and $\theta_k$ denotes the weight matrix and the bias vector at the k-th iteration.
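The Adam update described above can be sketched for a flat list of parameters as follows. This follows the standard published form of Adam; the default hyperparameter values shown are common defaults, not values stated in this document, and the function name is hypothetical.

```python
import math


def adam_step(theta, grad, m, v, k, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters theta given gradients grad.

    m and v are the running first and second moment estimates; k is the
    1-based iteration number. Returns the updated (theta, m, v).
    """
    new_theta, new_m, new_v = [], [], []
    for t, g, m_i, v_i in zip(theta, grad, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g        # first moment estimate
        v_i = beta2 * v_i + (1.0 - beta2) * g * g    # second moment estimate
        m_hat = m_i / (1.0 - beta1 ** k)             # bias-corrected first moment
        v_hat = v_i / (1.0 - beta2 ** k)             # bias-corrected second moment
        new_theta.append(t - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v
```

On the first iteration the bias correction makes the step size close to the learning rate regardless of the gradient magnitude, so a gradient of 2.0 with `lr=0.1` moves a parameter by roughly 0.1.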

In summary, the deep learning model is trained as follows: the mass spectrum data of the input particulate matter are first passed through the deep learning model by forward propagation, and the loss function is then calculated. Training aims to minimize the loss function: gradients are calculated by back propagation, and the weight matrices and bias vectors of the deep learning model are then updated using the Adam algorithm. This process is iterated until the maximum number of iterations is reached or the loss function converges.

As shown in fig. 3, the present invention is mainly composed of four modules: pollution source collection, pollution spectrum library construction, model construction and actual environment source analysis. The method comprises the steps of pre-training a deep learning model by using data of a pollution spectrum library, storing the trained deep learning model, collecting mass spectrum data of single particles in an actual atmosphere environment through a single particle aerosol mass spectrometer, and importing the mass spectrum data into the trained deep learning model to realize fine analysis of a particle source.

The invention requires no feature selection and realizes an end-to-end training process, which effectively reduces subjectivity and shortens the time needed for particle source analysis. It can better capture the weak linear and nonlinear relations of the particles and identify complex patterns and relations, improving the ability to distinguish similar particles and realizing fine analysis of atmospheric particulate sources.

Example 2

To carry out the method of Example 1 and achieve the corresponding functions and technical effects, an atmospheric particulate source analysis system based on single particle aerosol mass spectrometry is provided below.

As shown in fig. 4, the atmospheric particulate source analysis system based on single particle aerosol mass spectrometry provided in this embodiment includes: a single particle aerosol mass spectrometer 1 and a processor 2.

Wherein the single particle aerosol mass spectrometer 1 is used for acquiring mass spectrum data of atmospheric particulate matters.

A processor 2 is connected to the single particle aerosol mass spectrometer 1, the processor 2 being configured to analyze the mass spectrum data based on a pre-trained deep learning model to determine the source of atmospheric particulates. The deep learning model comprises a one-dimensional convolutional neural network, a long short-term memory network and a multi-layer perceptron which are sequentially connected.

Compared with the prior art, the atmospheric particulate source analysis system based on single-particle aerosol mass spectrometry provided by this embodiment has the same beneficial effects as the atmospheric particulate source analysis method based on single-particle aerosol mass spectrometry provided by Example 1, which are not described again here.

In this specification, the embodiments are described progressively: each embodiment focuses on its differences from the others, and for identical or similar parts the embodiments may be referred to one another.

The principles and embodiments of the present invention have been described herein with reference to specific examples, which are intended only to assist in understanding the method of the present invention and its core ideas. Those of ordinary skill in the art may, in light of the ideas of the present invention, make changes to the specific embodiments and the scope of application. In view of the foregoing, this description should not be construed as limiting the invention.

Claims

1. The atmospheric particulate source analysis method based on the single-particle aerosol mass spectrum is characterized by comprising the following steps of:

2. The atmospheric particulate source analysis method based on single particle aerosol mass spectrometry of claim 1, wherein analyzing the mass spectrometry data based on a pre-trained deep learning model to determine the source of atmospheric particulates comprises:

3. The atmospheric particulate source analysis method based on single particle aerosol mass spectrometry of claim 2, wherein the positive spectrogram and the negative spectrogram of the mass spectrometry data are normalized by adopting an L2 norm, respectively.

4. The atmospheric particulate source analysis method based on single particle aerosol mass spectrometry of claim 2, wherein the one-dimensional convolutional neural network extracts local features of the mass spectrometry one-dimensional data using the following formula:

$Y_1 = \sigma_1(W_1 \cdot X_1 + b_1)$;

5. The atmospheric particulate source analysis method based on single particle aerosol mass spectrometry of claim 1, wherein the training process of the deep learning model comprises:

6. The atmospheric particulate source analysis method based on single particle aerosol mass spectrometry of claim 5, wherein the loss function is a class cross entropy loss.

7. An atmospheric particulate source analysis system based on single particle aerosol mass spectrometry, the atmospheric particulate source analysis system based on single particle aerosol mass spectrometry comprising: