CN106483077B

CN106483077B - Method for measuring element content of combustion body based on principal component and multiple linear regression

Info

Publication number: CN106483077B
Application number: CN201510552931.7A
Authority: CN
Inventors: 陈浩; 呙星; 张仁李; 盛卫星; 马晓峰; 韩玉兵
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2015-09-01
Filing date: 2015-09-01
Publication date: 2020-10-02
Anticipated expiration: 2035-09-01
Also published as: CN106483077A

Abstract

本发明提出一种基于主成分和多元线性回归的燃烧体元素含量测定方法，首先，对燃烧体发射光谱的光强度值进行数据筛选，对筛选后的光强度值进行去噪处理；然后，根据去噪处理后的光强度值获得光强度值的主成分得分；最后，根据光强度值的主成分得分计算获得燃烧体中待测元素的含量。本发明通过分析燃烧体的发射光谱的光强度值进行元素含量的测定，测定方法简单高效，测量精度高。The invention proposes a method for determining element content of a combustion body based on principal components and multiple linear regression. First, data screening is performed on the light intensity value of the emission spectrum of the combustion body, and the filtered light intensity value is denoised; then, according to The main component score of the light intensity value is obtained from the denoised light intensity value; finally, the content of the element to be tested in the combustion body is calculated according to the main component score of the light intensity value. In the invention, the element content is determined by analyzing the light intensity value of the emission spectrum of the combustion body, the determination method is simple and efficient, and the measurement accuracy is high.

Description

Determination method of element content in combustion body based on principal components and multiple linear regression

技术领域technical field

本发明属于光学技术领域，具体涉及一种基于主成分分析和多元线性回归法的燃烧体元素含量测定方法。The invention belongs to the technical field of optics, and in particular relates to a method for measuring element content of a combustion body based on principal component analysis and multiple linear regression method.

背景技术Background technique

对于燃烧体，要想测定其中含有哪些元素及各元素含量，需要通过复杂的化学测量分析方法，较为普遍的做法是采用对物质燃烧后的残留物进行元素分析，进而可以分析出原燃烧体中的元素及其含量，这种方法测量难度较大，同时对于含有无机元素的燃烧体，很难对燃烧后的残留物进行元素分析，此外，运用化学领域的方法测量元素及其含量的代价较高，测量成本较高。For the combustion body, in order to determine which elements and the content of each element contained in it, complex chemical measurement and analysis methods are required. Elements and their contents are difficult to measure by this method. At the same time, for combustion bodies containing inorganic elements, it is difficult to perform elemental analysis on the residues after combustion. In addition, the cost of measuring elements and their contents by methods in the chemical field is relatively high. high, and the measurement cost is high.

发明内容SUMMARY OF THE INVENTION

本发明的目的在于提供一种基于主成分分析和多元线性回归的燃烧体元素含量测定方法，通过分析燃烧体的发射光谱的光强度值进行元素含量的测定，测定方法简单高效，测量精度高。The purpose of the present invention is to provide a method for measuring the element content of a combustion body based on principal component analysis and multiple linear regression. The element content is measured by analyzing the light intensity value of the emission spectrum of the combustion body.

为了解决上述技术问题，本发明提供一种基于主成分和多元线性回归的燃烧体元素含量测定方法，首先，对燃烧体发射光谱的光强度值进行数据筛选，对筛选后的光强度值进行去噪处理；然后，根据去噪处理后的光强度值获得光强度值的主成分得分；最后，根据光强度值的主成分得分计算获得燃烧体中待测元素的含量，计算方式如公式所示，In order to solve the above-mentioned technical problems, the present invention provides a method for measuring element content of a combustion body based on principal components and multiple linear regression. First, data screening is performed on the light intensity value of the emission spectrum of the combustion body, and the filtered light intensity value is removed. Noise processing; then, the principal component score of the light intensity value is obtained according to the denoised light intensity value; finally, the content of the element to be tested in the combustion body is calculated according to the principal component score of the light intensity value, and the calculation method is shown in the formula ,

Y＝b₀+b₁S₁+...+b_pS_p Y=b ₀ +b ₁ S ₁ +...+b _p S _p

其中，Y表示燃烧体中待测元素Y的含量，S₁，S₂，….S_p表示主成分得分，b₀为常数项，b₁…b_p为回归系数，b₀以及b₁…b_p均为已知量。Among them, Y represents the content of the element Y to be measured in the combustion body, S ₁ , S ₂ , … _.Sp represents the principal component score, b ₀ is a constant term, b ₁ … b _p is the regression coefficient, b ₀ and b ₁ … _bp are all known quantities.

本发明与现有技术相比，其显著优点在于，(1)不需要采用化学领域的分析测量方法，很容易在实际中实现；(2)采用主成分分析法，测量精度比较高；(3)采用数学建模的方法，方法较为创新；(4)理论上可以用于测量自然界中的大部分元素，方法用途较广。Compared with the prior art, the present invention has significant advantages in that (1) it does not need to adopt the analysis and measurement method in the field of chemistry, and it is easy to realize in practice; (2) the principal component analysis method is adopted, and the measurement accuracy is relatively high; (3) ) using the mathematical modeling method, the method is more innovative; (4) theoretically can be used to measure most of the elements in nature, the method is widely used.

具体实施方式Detailed ways

容易理解，依据本发明的技术方案，在不变更本发明的实质精神的情况下，本领域的一般技术人员可以想象出本发明基于主成分分析和多元线性回归的燃烧体元素含量测定方法的多种实施方式。因此，以下具体实施方式仅是对本发明的技术方案的示例性说明，而不应当视为本发明的全部或者视为对本发明技术方案的限制或限定。It is easy to understand that, according to the technical solution of the present invention, without changing the essential spirit of the present invention, those skilled in the art can imagine the multiplication of the method for determining the element content of the combustion body based on principal component analysis and multiple linear regression of the present invention. an implementation. Therefore, the following specific embodiments are only exemplary descriptions of the technical solutions of the present invention, and should not be regarded as the whole of the present invention or as limitations or restrictions on the technical solutions of the present invention.

本发明对主成分得分变量S₁，S₂，….S_p与因变量Y(即燃烧体中待测元素Y的含量)进行多元线性回归分析，建立它们之间的多元线性回归模型方程，如式(1)所示：The present invention performs multiple linear regression analysis on the principal component _score variables S ₁ , S ₂ , . As shown in formula (1):

Y＝b₀+b₁S₁+...+b_pS_p (1)Y=b ₀ +b ₁ S ₁ +...+b _p S _p (1)

式(1)中，b₀为常数项；b₁…b_p为回归系数，b₁为S₁,S₂…固定时,S₁每增加一个单位对Y的效应，即S₁对Y的偏回归系数；同理b₂为S₁,S₂…固定时，S₂每增加一个单位对Y的效应，即S₂对Y的偏回归系数等等。回归系数可以通过实验获得，在每次进行燃烧体中某待测元素含量时，只需根据光强度值获得主成分得分S₁、S₂…S_p，将主成分得分代入式(1)即可实现对燃烧体中某元素Y含量的测定。In formula (1), b ₀ is a constant term; b ₁ ... b _p is the regression coefficient, and b ₁ is S ₁ , S ₂ ... When fixed, the effect of each additional unit of S ₁ on Y is the effect of S ₁ on Y. Partial regression coefficient; Similarly, when b ₂ is S ₁ , S ₂ . . . is fixed, the effect of each additional unit of S ₂ on Y is the partial regression coefficient of S ₂ on Y and so on. The regression coefficient can be obtained through experiments. When the content of a certain element to be measured in the combustion body is carried out each time, it is only necessary to _obtain the principal component scores S ₁ , S ₂ . . . It can realize the determination of the content of a certain element Y in the combustion body.

一、回归系数是事先通过多次试验获得，每次使用时，只需通过查表即可使用，回归系数获得过程如下：1. The regression coefficient is obtained through multiple experiments in advance. Each time it is used, it can be used only by looking up the table. The process of obtaining the regression coefficient is as follows:

在多次试验获中，测得的包含元素Y的燃烧体发射光谱的光强度值，并在实验前通过其他方式非常艰难地测得燃烧体中待测元素Y的含量值(即先验信息)，设每次试验中，燃烧体发射光谱中L个光波长对应的光强度值为X₁，X₂，…X_L，且X₁～X_L的光波长的波长差是一个常数。In many experiments, the light intensity value of the emission spectrum of the combustion body containing element Y was measured, and the content value of the element Y to be measured in the combustion body was measured very difficultly by other means before the experiment (that is, the prior information). ), suppose that in each test, the light intensity values corresponding to _L light wavelengths in the emission spectrum of the combustion body are X ₁ , X ₂ , . . . XL , and the wavelength difference of the light wavelengths from X ₁ to _XL is a constant.

第一步，根据相关光谱理论知识，对发射光谱的光强度值进行筛选操作，获得筛选后的发射光谱的光强度值。The first step is to perform a screening operation on the light intensity value of the emission spectrum according to the relevant spectral theoretical knowledge, and obtain the light intensity value of the filtered emission spectrum.

根据对相关光谱理论知识的研究，可知光谱数据的筛选正确与否对数据的后续处理分析有一定的影响作用。假设需要测量的为元素Y，根据现有知识可知，元素Y的燃烧发射光谱波长应当为lamda1、lamda2…、lamdan共n个波长值。由于光谱线测量时发射光谱的波长值可能会发生一定的平移，因此需要先将光强度值X₁，X₂，…X_L所对应的L个光波长值分别与已知标准的n个波长值lamda1，lamda2…lamdan进行比较，如果两者相差值在预先设定的误差Δl范围内的，则认为此波长所对应的谱线是元素Y的发射光谱谱线，予以保留，否则，则认为此波长所对应的谱线不是元素Y的发射光谱谱线，予以剔除；设经前述筛选后保留了m个光波长对应的光强度值X₁，X₂，…X_m，m<L。According to the research on relevant spectral theoretical knowledge, it can be known that the correctness of spectral data screening has a certain influence on the subsequent processing and analysis of the data. Assuming that the element Y needs to be measured, according to the existing knowledge, the wavelength of the combustion emission spectrum of element Y should be lamda1, lamda2..., lamdan, a total of n wavelength values. Since the wavelength value of the emission spectrum may shift to a certain extent during the spectral line measurement, it is necessary to firstly compare the _L light wavelength values corresponding to the light intensity values X ₁ , X ₂ , ... XL with the known standard n wavelengths respectively. Compare the values of lamda1, lamda2… The spectral line corresponding to this wavelength is not the emission spectral line of the element Y, so it is rejected; it is assumed that after the aforementioned _screening , the light intensity values X ₁ , X ₂ , . . .

第二步，根据相关去噪理论，对筛选后的发射光谱的光强度值X₁，X₂，…X_m进行去噪预处理。发射光谱数据在采集时往往会受到诸多因素的影响，使得光谱数据中除了包含有用的信息外，还混杂有背景和仪器噪声等无关信息。在分析处理光谱数据时，这些噪声会对分析结果产生不良的影响，降低光谱分析精度，因此需要先进行去噪预处理。作为一种优选方案，本发明去噪预处理采用信号平滑法，去除高频噪声，提高光谱信号的信噪比，信号平滑中具体采用移动平均平滑(Moving Aerage Smoothing，MAS)法。设经去噪预处理后的m个光波长对应的光强度值为

In the second step, according to the relevant denoising theory, denoising preprocessing is performed on the light intensity values X ₁ , X ₂ , . . . X _m of the filtered emission spectrum. The emission spectrum data is often affected by many factors during the collection, so that the spectrum data contains not only useful information, but also irrelevant information such as background and instrument noise. When analyzing and processing spectral data, these noises will adversely affect the analysis results and reduce the accuracy of spectral analysis, so it is necessary to perform denoising preprocessing first. As a preferred solution, the denoising preprocessing of the present invention adopts a signal smoothing method to remove high-frequency noise and improve the signal-to-noise ratio of the spectral signal. The moving average smoothing (MAS) method is specifically used for signal smoothing. Set the light intensity corresponding to m light wavelengths after denoising preprocessing as

第三步，根据主成分分析方法，对去噪预处理后的光强度值为

进行主成分分析，获得最终的主成分得分S_i(i＝1,2,...p)。In the third step, according to the principal component analysis method, the light intensity value after denoising preprocessing is

Principal component analysis is performed to obtain the final principal component score S _i (i=1,2,...p).

发射光谱的光强度值数据量大，其中包含了丰富的物质成分和结构信息，通过主成分分析法分析光谱数据，用少数几个主成分变量代替原始变量，主成分变量是原始变量的线性组合，能够最大程度地代表原始变量的信息。主成分分析法通过正交变换使主成分之间彼此互不相关，以此克服数据共线性问题。主成分分析具体分析过程如下：The light intensity value of the emission spectrum has a large amount of data, which contains rich material composition and structural information. The spectral data is analyzed by the principal component analysis method, and a few principal component variables are used to replace the original variables. The principal component variables are the linear combination of the original variables. , which can best represent the information of the original variable. The principal component analysis method makes the principal components uncorrelated with each other through orthogonal transformation, so as to overcome the problem of data collinearity. The specific analysis process of principal component analysis is as follows:

1.1设对于

共m个光强度值，每一个波长对应的光强度值有n个数据(即n次试验获得的结果)，将n次试验获得光强度值组成自变量矩阵X如式(2)所示，其中x_nm代表第n次试验中第m个波长对应的光强度值。1.1 Assuming that for

There are m light intensity values in total, and the light intensity value corresponding to each wavelength has n data (that is, the results obtained from n tests), and the light intensity values obtained from n tests are composed of an independent variable matrix X, as shown in formula (2), where x _nm represents the light intensity value corresponding to the mth wavelength in the nth experiment.

n表示矩阵X的行数，m表示矩阵X的列数n represents the number of rows of matrix X, m represents the number of columns of matrix X

1.2计算自变量矩阵X的相关系数矩阵R，关系数矩阵R如式(3)所示：1.2 Calculate the correlation coefficient matrix R of the independent variable matrix X, and the correlation coefficient matrix R is shown in formula (3):

式(3)中，r_ij(i,j＝1,2,...,m)为光强度值

和

的相关系数，其计算公式如式(4)所示：In formula (3), r _ij (i,j=1,2,...,m) is the light intensity value

and

The correlation coefficient of , and its calculation formula is shown in formula (4):

其中，

表示变量x_i的平均值，x_i表示矩阵X中第i列数据；k表示矩阵X的行数，k＝1,2,...,n。in,

Represents the average value of variable x _i , x _i represents the data in the i-th column of matrix X; k represents the number of rows in matrix X, k=1,2,...,n.

1.3根据相关系数矩阵R求解出特征方程|λI-R|＝0所需使用的特征值λ₁,λ₂...λ_m，其中I为一个单位矩阵，求解特征值λ₁,λ₂...λ_m常用的方法是雅可比(Jacobi)法，并将求解出的特征值λ₁,λ₂...λ_m按照从大到小的顺序排列λ₁≥λ₂≥...≥λ_m≥01.3 According to the correlation coefficient matrix R, the eigenvalues λ ₁ , λ ₂ ... λ _m required to be used for the characteristic equation |λI-R|=0 are solved, where I is a unit matrix, and the eigenvalues λ ₁ , λ ₂ are solved. ..λ _m The commonly used method is the Jacobi method, and the solved eigenvalues λ ₁ , λ ₂ ... λ _m are arranged in descending order λ ₁ ≥λ ₂ ≥...≥ λ _m ≥ 0

1.4将特征值λ₁,λ₂...λ_m代入特征方程|λI-R|＝0分别求出特征值λ_i对应的特征向量e_i，特征向量e_i为m行1列的矩阵，其中i取1～m范围内的整数。1.4 Substitute the eigenvalues λ ₁ , λ ₂ ... λ _m into the characteristic equation |λI-R|=0 to obtain the eigenvectors e _i corresponding to the eigenvalues λ _i respectively, and the eigenvectors e _i are a matrix with m rows and 1 column, where i is an integer in the range of 1 to m.

1.5根据特征值λ₁,λ₂...λ_m计算主成分的贡献率和累计贡献率，贡献率计算公式(5)所示，累计贡献率如公式(6)所示：1.5 Calculate the contribution rate and cumulative contribution rate of the principal component according to the eigenvalues λ ₁ , λ ₂ ... λ _m , the contribution rate calculation formula (5) is shown, and the cumulative contribution rate is shown in the formula (6):

累计贡献率表示所生成的新变量(即主成分)所包含原始数据信息的程度，为了更大程度地代表原来的变量，一般选取主成分累计贡献率≥95％的特征值λ₁,λ₂,...λ_p对应的主成分，p≤m。The cumulative contribution rate indicates the degree of original data information contained in the new variable (ie principal component) generated. In order to represent the original variable to a greater extent, the eigenvalues λ ₁ , λ ₂ with the cumulative contribution rate of the principal component ≥95% are generally selected. ,...λ _p corresponds to the principal component, p≤m.

1.6根据特征值λ₁,λ₂...λ_m和特征向量e_i计算各主成分载荷I_i，计算公式如式(7)所示：1.6 Calculate each principal component load I _i according to the eigenvalues λ ₁ , λ ₂ ... λ _m and the eigenvectors e _i , the calculation formula is shown in formula (7):

主成分载荷I_i为m行1列的矩阵；The principal component load I _i is a matrix with m rows and 1 column;

然后根据求出的各主成分的载荷I_i，计算得到各主成分得分S_i，如公式(8)所示，Then, according to the obtained load I _i of each principal component, the score S _i of each principal component is calculated, as shown in formula (8),

其中，S_i为n行1列的矩阵，表示的是主成分变量S_i在n次试验下的得分数值。Among them, Si is a matrix with _n rows and 1 column, which represents the score value of the principal component variable Si under _n trials.

由上可知，主成分分析是一种有损降维的方法，当主成分选取过少时,信息损失量将会比较大，当主成分数量选取过多时，计算机的运算量将会大大增加，因此在实际计算过程中应该根据需要合理选取主成分变量个数。It can be seen from the above that principal component analysis is a lossy dimensionality reduction method. When too few principal components are selected, the amount of information loss will be relatively large. In the calculation process, the number of principal component variables should be reasonably selected according to the needs.

第四步，根据上面主成分分析求出的主成分得分S_i(i＝1,2,...p)和已知的燃烧体中元素Y的含量值Y_i(i＝1,2,...n)，采用最小二乘法进行拟合，可以求出回归系数b₁…b_p。The fourth step is based on the principal component score Si ( _i =1,2,...p) obtained by the above principal component analysis and the known content value of element Y in the combustion body Yi ( _i =1,2, ...n), the least squares method is used for fitting, and the regression coefficients b ₁ ...b _p can be obtained.

二、主成分得分S₁、S₂…S_p的确定2. Determination of principal component scores S ₁ , S ₂ ... _Sp

设对于当前需要测定其中元素Y含量的燃烧体的发射光谱的光强度值为z₁，z₂，…z_L，按照前述筛选与预处理方法做相应的处理，得到处理后的光强度值

将

组成一个1行m列的矩阵Z，根据公式9可获得主成分得分的值S₁、S₂…S_p。Assuming that the light intensity values of the emission spectrum of the combustion body whose content of element Y needs to be determined at present are _z ₁ , z ₂ , .

Will

A matrix Z with 1 row and m columns is formed, and the values S ₁ , S ₂ . . . S _p of the principal component scores can be obtained according to formula 9.

其中，i＝1,2,...p。where i=1,2,...p.

本发明可以通过以下实验进一步说明。The present invention can be further illustrated by the following experiments.

本实验中的燃烧体发射光谱波长范围为245nm～1044nm，共3648个光波长值，变量w1，w2…w3648分别代表某一波长值对应的光强度值，且w1～w3648光波长差是一个常数，光强度值是某一种固定参考强度的倍数，每一个变量都有19次试验的结果，基于此，完成对燃烧体中碳元素含量的测定。The wavelength range of the emission spectrum of the combustion body in this experiment is 245nm to 1044nm, with a total of 3648 light wavelength values. The variables w1, w2...w3648 respectively represent the light intensity value corresponding to a certain wavelength value, and the light wavelength difference between w1 and w3648 is a constant. , the light intensity value is a multiple of a certain fixed reference intensity, and each variable has the results of 19 trials. Based on this, the determination of the carbon content in the combustion body is completed.

第一步，对19次试验的(不是本次待测燃烧体)原始的光强度值进行发射光谱的光强度值数据的筛选，获得筛选后的光强强度值，由查询到的碳元素发射光谱资料可得：碳元素发射光谱波长范围为175.136nm～453.18nm(具体波长值见附录)，根据相关的光谱理论知识可得由于光谱线测量时发射光谱的波长值可能会发生一定的平移，因此选取在碳元素发射光谱波长范围附近间隔小于等于0.2nm处的谱线为碳元素的发射光谱谱线，筛选后的光谱数据包含954个波长，w1，w2…w954分别代表对应波长的光强度值。The first step is to screen the light intensity value data of the emission spectrum for the original light intensity values of the 19 tests (not the combustion body to be tested this time), and obtain the filtered light intensity value, which is emitted by the carbon element inquired. Spectral data can be obtained: the wavelength range of carbon emission spectrum is 175.136nm ~ 453.18nm (see appendix for the specific wavelength value). According to the relevant spectral theory knowledge, the wavelength value of the emission spectrum may shift to a certain extent during the measurement of the spectral line. Therefore, the spectral lines at intervals of less than or equal to 0.2 nm near the wavelength range of the emission spectrum of carbon element are selected as the emission spectral lines of carbon element. The filtered spectral data contains 954 wavelengths, and w1, w2...w954 respectively represent the light intensity of the corresponding wavelength. value.

第二步，对第一步筛选出的发射光谱的光强度值进行数据预处理，获得预处理后的光强度值，采用移动平均平滑法对光谱数据进行预处理，经过预处理后的954个光波长对应的光强度值为

The second step is to perform data preprocessing on the light intensity values of the emission spectrum screened in the first step to obtain the preprocessed light intensity values, and use the moving average smoothing method to preprocess the spectral data. The light intensity value corresponding to the light wavelength is

第三步，对第二步预处理后的光强度值进行主成分分析(PCA)，获得主成分得分数值，在Matlab(一种数学分析计算软件)中运用主成分分析方法进行求解，

为预处理后的光强度值变量，共有19次试验的结果，将19次试验的结果组成矩阵W(19行954列的数据)如下，其中n为19，m为954，w_nm代表第n次试验中第m个波长对应的光强度值。The third step is to perform principal component analysis (PCA) on the light intensity value preprocessed in the second step to obtain the principal component score value, and use the principal component analysis method in Matlab (a mathematical analysis and calculation software) to solve,

For the variable of light intensity value after preprocessing, there are a total of 19 test results, the results of 19 tests are composed of matrix W (data in 19 rows and 954 columns) as follows, where n is 19, m is 954, and w _nm represents the nth The light intensity value corresponding to the mth wavelength in this experiment.

(1)对数据矩阵W按列进行标准化(每一列中的数据除以每一列数据的总和)，获得标准化后的数据矩阵。(1) Normalize the data matrix W by column (the data in each column is divided by the sum of the data in each column) to obtain the normalized data matrix.

(2)对(1)中的标准化后的数据矩阵W，按照以下公式求出矩阵W的相关系数矩阵R，(2) For the standardized data matrix W in (1), obtain the correlation coefficient matrix R of the matrix W according to the following formula,

式中，r_ij(i,j＝1,2,...,m)为原变量

和

的相关系数，其计算公式如下所示，

代表变量w_i的平均值。In the formula, r _ij (i,j=1,2,...,m) is the original variable

and

The correlation coefficient of , and its calculation formula is as follows:

represents the mean of the variable _wi .

(3)对(2)中的相关系数矩阵R求解特征值λ_i和对应的特征向量e_i(i＝1,2....954)，同时计算贡献率和累积贡献率，取累计贡献率达到96％时的特征值λ₁,λ₂,…,λ_p所对应的为第一、第二，…，第p(p≤954)个主成分，求出每个主成分变量对应的贡献率如表1所示，共求出16个主成分。(3) Solve the eigenvalue λ _i and the corresponding eigenvector e _i (i=1, 2....954) for the correlation coefficient matrix R in (2), calculate the contribution rate and the cumulative contribution rate at the same time, and take the cumulative contribution When the rate reaches 96%, the eigenvalues λ ₁ , λ ₂ ,…,λ _p correspond to the first, second,…, pth (p≤954) principal components, and find the corresponding The contribution rate is shown in Table 1, and a total of 16 principal components were obtained.

表1各主成分贡献率Table 1 Contribution rate of each principal component

各主成分贡献率The contribution rate of each principal component 主成分变量1principal component variable 1 0.6980160.698016 主成分变量2Principal Component Variable 2 0.2157540.215754 主成分变量3Principal Component Variable 3 0.0075370.007537 主成分变量4Principal Component Variable 4 0.0066900.006690 主成分变量5Principal Component Variables 5 0.0063840.006384 主成分变量6Principal Component Variables 6 0.0062270.006227 主成分变量7Principal Component Variables 7 0.0061240.006124 主成分变量8Principal Component Variables 8 0.0058490.005849 主成分变量9Principal Component Variables 9 0.0055560.005556 主成分变量10Principal Component Variables 10 0.0054760.005476 主成分变量11Principal Component Variables 11 0.0053390.005339 主成分变量12Principal Component Variables 12 0.0049110.004911 主成分变量13Principal Component Variables 13 0.0048320.004832 主成分变量14Principal Component Variables 14 0.0046340.004634 主成分变量15Principal Component Variables 15 0.0045040.004504 主成分变量16Principal Component Variables 16 0.004327 0.004327

(4)根据(3)中求出的特征值和特征向量计算主成分变量载荷，其计算公式为：(4) Calculate the principal component variable load according to the eigenvalues and eigenvectors obtained in (3), and the calculation formula is:

(954行1列的矩阵)

(Matrix with 954 rows and 1 column)

根据各主成分的载荷计算得到各主成分的得分：The score of each principal component is calculated according to the load of each principal component:

其中，S_i为19行1列的矩阵，表示的是主成分变量S_i在19次样本试验下的得分。Among them, _Si is a matrix with 19 rows and 1 column, which represents the score of the principal component variable _Si under 19 sample trials.

利用上式求出每一个主成分变量(共16个)得分，每一个主成分变量都有19次试验的结果。Use the above formula to obtain the score of each principal component variable (16 in total), and each principal component variable has the results of 19 trials.

第四步，对第三步中求出的16个主成分得分变量S₁，S₂，…S₁₆和试验中先验信息(实验前测得的碳元素含量)Y进行多元线性回归拟合，16个主成分得分变量作为自变量，碳元素含量作为因变量，求得碳含量Y与16个主成分变量的关系：The fourth step is to perform multiple linear regression fitting on the 16 principal component score variables S ₁ , S ₂ , ... S ₁₆ obtained in the third step and the prior information in the experiment (the carbon content measured before the experiment) Y , 16 principal component score variables are used as independent variables, and carbon element content is used as a dependent variable, and the relationship between carbon content Y and 16 principal component variables is obtained:

Y＝b₀+b₁S₁+...+b₁₆S₁₆ Y=b ₀ +b ₁ S ₁ +...+b ₁₆ S ₁₆

在Matlab中调用regress函数(最小二乘拟合法)求得上式中各项系数如表2所示：Call the regress function (least square fitting method) in Matlab to obtain the coefficients in the above formula as shown in Table 2:

表2多元线性回归方程拟合系数Table 2 Multiple linear regression equation fitting coefficients

多元线性回归拟合系数Multiple Linear Regression Fit Coefficients b0b0 7.0005647.000564 b1b1 0.3468450.346845 b2b2 -0.015874-0.015874 b3b3 -46.693979-46.693979 b4b4 -33.287644-33.287644 b5b5 55.12868055.128680 b6b6 7.8276107.827610 b7b7 2.9223072.922307 b8b8 4.5684494.568449 b9b9 -20.499940-20.499940 b10b10 -40.594263-40.594263 b11b11 96.92787696.927876 b12b12 5.3122275.312227 b13b13 14.55807314.558073 b14b14 123.090712123.090712 b15b15 -58.641999-58.641999 b16b16 9.248874 9.248874

运用拟合后得出的关系模型方程求解碳含量，实验前测得燃烧体碳含量真实值如表3所示：The relationship model equation obtained after fitting is used to solve the carbon content. The real value of the carbon content of the combustion body measured before the experiment is shown in Table 3:

表3碳含量真实值Table 3 Carbon content true value

试验次数nnumber of trials n 碳含量真实值True value of carbon content 11 21.00000021.000000 22 20.00000020.000000 33 18.00000018.000000 44 20.00000020.000000 55 18.48000018.480000 66 25.04000025.040000 77 29.35000029.350000 88 19.00000019.000000 99 24.00000024.000000 1010 19.60000019.600000 1111 25.00000025.000000 1212 25.31000025.310000 1313 30.00000030.000000 1414 27.75000027.750000 1515 15.17000015.170000 1616 20.00000020.000000 1717 17.92000017.920000 1818 20.44000020.440000 1919 14.280000 14.280000

运用拟合出的方程求出的碳含量如表4所示：The carbon content calculated using the fitted equation is shown in Table 4:

表4碳含量测量值Table 4 Carbon Content Measurements

试验次数nnumber of trials n 碳含量测量值Carbon content measurement 11 22.21492322.214923 22 20.84507820.845078 33 18.37391318.373913 44 20.35542420.355424 55 18.52450518.524505 66 23.56523523.565235 77 28.69186128.691861 88 18.74252918.742529 99 22.50987722.509877 1010 19.19090719.190907 1111 23.60210023.602100 1212 26.44718426.447184 1313 31.45960731.459607 1414 27.59899827.598998 1515 15.61303315.613033 1616 21.48484621.484846 1717 18.47247718.472477 1818 21.34211021.342110 1919 15.508944 15.508944

经分析可知，本发明通过建立主成分分析模型和多元线性回归模型，通过分析燃烧体的发射光谱的光强度值进行元素含量的测定，碳含量真实值和预测值之间的误差比较少，测量精度较高、方法简单高效。It can be seen from the analysis that the present invention determines the element content by establishing a principal component analysis model and a multiple linear regression model, and analyzing the light intensity value of the emission spectrum of the combustion body. The precision is high, and the method is simple and efficient.

Claims

1. A method for measuring the element content of a combustion body based on principal components and multiple linear regression is characterized in that,

carrying out data screening on the light intensity value of the emission spectrum of the combustion body, and carrying out denoising treatment on the screened light intensity value;

obtaining a principal component score of the light intensity value according to the de-noised light intensity value;

the content of the element to be measured in the combustion body is obtained by calculation according to the principal component score of the light intensity value, the calculation mode is shown as formula (1),

Y＝b₀+b₁S₁+...+b_pS_p(1)

wherein Y represents the content of the element to be measured in the combustion body, and S₁，S₂，….S_pRepresents the score of the principal component, b₀Is a constant term, b₁…b_pAs a regression coefficient, b₀And b₁…b_pAre all known amounts;

the regression coefficient acquisition process is as follows:

firstly, screening the light intensity value of the emission spectrum of the combustion body, and denoising the screened light intensity value to obtain the light intensity values corresponding to the m light wavelengths subjected to screening and denoising treatment

L is the number of light wavelengths in the emission spectrum of the combustion body;

secondly, according to the principal component analysis method, the light intensity value after denoising pretreatment is carried out

Performing principal component analysis to obtain principal component score S_iP, p is the number of principal components;

1.1 is provided for

M light intensity values are totally obtained, the light intensity values obtained by n times of tests form an independent variable matrix X shown as a formula (2), wherein X_nmRepresents the light intensity value corresponding to the mth wavelength in the nth test,

where n denotes the number of rows of the matrix X and m denotes the number of columns of the matrix X

1.2, calculating a correlation coefficient matrix R of the independent variable matrix X, wherein the correlation coefficient matrix R is shown as a formula (3):

in the formula (3), r_ij，i,j＝1,2,...,m，r_ijAs a light intensity value

And

the calculation formula of the correlation coefficient (c) is shown in formula (4):

wherein,

represents the variable x_iAverage value of (1), x_iRepresents the ith column of data in the matrix X; k represents the number of rows of matrix X, k being 1,2.., n;

1.3 solving the eigenvalue lambda according to the correlation coefficient matrix R₁,λ₂...λ_m；

1.4 dividing the characteristic value lambda₁,λ₂...λ_mSubstituting the characteristic equation | λ I-R | ═ 0 to respectively obtain characteristic values λ_iCorresponding feature vector e_iFeature vector e_iIs a matrix with m rows and 1 column, wherein i is an integer ranging from 1 to m;

1.5 according to the characteristic value lambda₁,λ₂...λ_mCalculating the contribution rate and the cumulative contribution rate of the principal component, wherein the contribution rate is calculated by formula (5), and the cumulative contribution rate is expressed by formula (6):

selecting p characteristic values lambda of which the accumulated contribution rate of the main components is more than or equal to 95 percent₁,λ₂,...λ_pTaking the corresponding principal component as the final principal component score, wherein p is less than or equal to m;

1.6 according to the characteristic value lambda₁,λ₂...λ_mAnd a feature vector e_iCalculating the loads I of the main components_iThe calculation formula is shown in formula (7):

principal component load I_iA matrix of m rows and 1 column;

then, based on the obtained loads I of the respective principal components_iCalculating to obtain score S of each principal component_iAs shown in the formula (8),

wherein S is_iIs a matrix of n rows and 1 column, and represents a principal component variable S_iScore values under n trials;

thirdly, according to the score S of the principal component_i(i ═ 1,2.. p) and the known content value Y of the element Y in the combustion body_i(i ═ 1,2.. n), fitting was performed by the least square method, and the regression coefficient b was determined₁…b_p；

Screening and denoising light intensity values of emission spectra of elements to be detected in a combustion body to form a matrix Z with 1 row and m columns, and obtaining principal component scores according to the following formula (9):