CN113848225B - XRF element quantitative analysis method based on PCA-SVR - Google Patents


Info

Publication number
CN113848225B
Authority
CN
China
Legal status
Active
Application number
CN202111073294.7A
Other languages
Chinese (zh)
Other versions
CN113848225A (en)
Inventor
杨婉琪
李福生
赵彦春
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Application filed by University of Electronic Science and Technology of China
Priority to CN202111073294.7A
Publication of CN113848225A
Application granted
Publication of CN113848225B
Active legal status
Anticipated expiration


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N23/00 Investigating or analysing materials by the use of wave or particle radiation, e.g. X-rays or neutrons, not covered by groups G01N3/00 – G01N17/00, G01N21/00 or G01N22/00
    • G01N23/22 Investigating or analysing materials by the use of wave or particle radiation, e.g. X-rays or neutrons, not covered by groups G01N3/00 – G01N17/00, G01N21/00 or G01N22/00 by measuring secondary emission from the material
    • G01N23/223 Investigating or analysing materials by the use of wave or particle radiation, e.g. X-rays or neutrons, not covered by groups G01N3/00 – G01N17/00, G01N21/00 or G01N22/00 by measuring secondary emission from the material by irradiating the sample with X-rays or gamma-rays and by measuring X-ray fluorescence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Analysing Materials By The Use Of Radiation (AREA)

Abstract

The invention discloses an XRF element quantitative analysis method based on PCA-SVR, comprising the steps of: reading element peak information and content information; determining the input and output of the PCA-SVR model; calculating a correlation coefficient matrix and unit eigenvectors; calculating the principal components; constructing a classification hyperplane and converting the optimal classification hyperplane problem into a quadratic programming model; performing parameter optimization and training of the PCA-SVR model and quantitatively predicting the element content; selecting the optimal number of principal components; and calculating the coefficient of determination to evaluate the prediction performance of the PCA-SVR model. The method is simple to operate and gives high prediction accuracy and intuitive results. It addresses problems such as overlapping-peak interference in X-ray fluorescence spectra and the inaccuracy of traditional instrument measurement methods, reduces the influence of the environmental background and the error caused by statistical fluctuation, and can predict the content of the elements in the object under test effectively and quickly.

Description

XRF element quantitative analysis method based on PCA-SVR
Technical Field
The invention relates to the field of element detection, in particular to an XRF element quantitative analysis method based on PCA-SVR.
Background
With the development of energy spectrum research, online qualitative and quantitative detection has become a new trend. After extensive research over the past decade, analysis of element content by X-ray fluorescence (XRF) spectroscopy has become a new analysis technique and is widely applied in fields such as metallurgy, building materials, geology and mining, commodity inspection, environmental protection, food hygiene, and nonferrous metals. It offers rapid analysis, leaves the sample undamaged, covers a wide analysis range, gives stable and reliable results, analyzes multiple elements simultaneously, and is simple to operate.
Traditional methods perform qualitative and quantitative analysis of trace elements directly with an XRF spectrometer and are prone to problems such as overlapping peak counts between element spectral lines, uncertain element information, and high element detection limits. How to improve the accuracy of quantitative element analysis under spectral-line overlap interference is therefore the focus of this invention. The invention applies the principal component analysis-support vector regression (PCA-SVR) algorithm to quantitative element analysis, addressing the inaccurate calculation and lack of data verification of traditional X-ray fluorescence spectrometers, and provides an alternative method for checking the quantitative results of an X-ray fluorescence spectrometer.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a method for predicting the content of elements in a substance to be tested based on a PCA-SVR algorithm.
To achieve this purpose, the invention provides an XRF element quantitative analysis method based on PCA-SVR, comprising the following steps:
Step 1: Determine a standard sample set. Suppose the standard sample set contains n samples to be tested. Take the union of all elements in the standard sample set that can be identified by an ED-XRF fluorescence spectrometer (energy-dispersive X-ray fluorescence spectrometer), i.e., elements No. 12 to 92 in the periodic table, to form the element set contained in the n samples, giving the element set A of elements with measurable content in the standard sample set;
Step 2: Read in element peak information and content information. Take any sample to be tested as the sample to be identified, and measure the peak information and content information of the elements in element set A with the ED-XRF fluorescence spectrometer, obtaining the measured component value (or peak count) X and the content value Y of each element.
Step 3: Determine the input and output of the PCA-SVR model. In constructing the PCA-SVR model, the element to be quantitatively analyzed is called the target element, and an element that interferes with the target element is called an interfering element. The matrix of measured component values (or peak counts) of the target elements and interfering elements in the standard sample set is the input of the PCA-SVR model, and the matrix of measured content values of the target element is the output. For example, the matrix $X_{n\times m}$ of measured component values (or peak counts) contains n samples to be tested, each described by the component values of m elements. The first column of $X_{n\times m}$ holds the measured component values of a single target element, and the other $m-1$ columns hold those of the other target elements and of the interfering elements of all target elements. Likewise, the matrix $Y_{n\times 1}$ of measured content values contains n samples to be tested, each described by the concentration value of that single target element;
Step 4: Standardize the XRF spectral data. Standardize the original matrix $X_{n\times m}$ to obtain the standardized matrix $\tilde{X}$. The row vector of the ith row of $X_{n\times m}$,
$$x_i = (x_{i1}, x_{i2}, \ldots, x_{im}),$$
is the vector of component values (or peak counts) of the m elements contained in the ith sample to be tested. $X_{n\times m}$ is standardized column by column as follows:
$$x'_{ij} = \frac{x_{ij} - \bar{x}_j}{S_j}$$
$$\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$$
$$S_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2}$$
$$\tilde{x}_i = (x'_{i1}, x'_{i2}, \ldots, x'_{im})$$
where $i = 1, 2, \ldots, n$ is the row index and $j = 1, 2, \ldots, m$ is the column index of the standardized matrix $\tilde{X}$; $x_{ij}$ is the component value (or peak count) of the jth element in $x_i$; $\bar{x}_j$ is the sample mean of column j of $X_{n\times m}$; $x'_{ij}$ is the standardized component value (or peak count) of the jth element of the ith sample to be tested; $S_j$ is the sample standard deviation of column j of $X_{n\times m}$; and $\tilde{x}_i$ is the row vector of the ith row of $\tilde{X}$.
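Step 4 can be sketched in plain Python as follows. This is a minimal illustration, not the inventors' code: it assumes the data are held as a list of rows, that $S_j$ is the sample standard deviation with divisor n - 1 (consistent with a correlation matrix $R = \tilde{X}^{\mathsf T}\tilde{X}/(n-1)$), and the function name `standardize` is hypothetical. A column with zero spread cannot be standardized, so the sketch rejects it.

```python
import math


def standardize(X):
    """Column-wise z-score standardization of an n x m matrix given as a list of rows.

    Returns the standardized matrix with entries x'_ij = (x_ij - mean_j) / S_j,
    where S_j is the sample standard deviation (divisor n - 1) of column j.
    """
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
        for j in range(m)
    ]
    if any(s == 0.0 for s in stds):
        raise ValueError("a column has zero standard deviation")
    return [[(row[j] - means[j]) / stds[j] for j in range(m)] for row in X]


# Hypothetical 3-sample, 2-element peak-count matrix (illustrative numbers only)
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
Xs = standardize(X)
```

After standardization every column has mean 0 and sample variance 1, so elements measured on very different count scales contribute equally to the correlation matrix of the next step.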
Step 5: Compute the correlation coefficient matrix R of the standardized matrix $\tilde{X}$:
$$R = \frac{1}{n-1}\tilde{X}^{\mathsf T}\tilde{X}$$
where n is the number of samples to be tested and $\tilde{X}$ is the standardized matrix.
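A minimal sketch of step 5, assuming list-of-rows matrices and the n - 1 divisor used for the standardization; `standardize` and `correlation_matrix` are hypothetical helper names. Because the columns are standardized, the diagonal of R is 1 and each off-diagonal entry equals the Pearson correlation between two elements' measured values.

```python
import math


def standardize(X):
    """Column-wise z-scores using the sample standard deviation (divisor n - 1)."""
    n, m = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(m)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in X) / (n - 1)) for j in range(m)]
    return [[(r[j] - means[j]) / stds[j] for j in range(m)] for r in X]


def correlation_matrix(Xs):
    """R = Xs^T Xs / (n - 1) for a column-standardized n x m matrix Xs."""
    n, m = len(Xs), len(Xs[0])
    return [
        [sum(Xs[i][a] * Xs[i][b] for i in range(n)) / (n - 1) for b in range(m)]
        for a in range(m)
    ]


# Hypothetical peak counts for two elements in three samples
R = correlation_matrix(standardize([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]))
```

Strongly correlated columns (large off-diagonal entries) are what PCA later merges into shared principal components.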
Step 6: Compute the unit eigenvectors $b_1^0, b_2^0, \ldots, b_m^0$ of the standardized matrix $\tilde{X}$ (the eigenvectors of its correlation coefficient matrix R). For each eigenvalue $\lambda_j$ of the correlation coefficient matrix R obtained in step 5, i.e., each root of the characteristic equation $|R - \lambda I| = 0$, ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$, solve the system $R b_j = \lambda_j b_j$ to obtain the eigenvector $b_j$, then normalize each eigenvector to obtain m unit eigenvectors:
$$b_j^0 = \frac{b_j}{\lVert b_j \rVert}$$
where $j = 1, 2, \ldots, m$ is the column index of the standardized matrix $\tilde{X}$, $b_j$ is an eigenvector, and $\lVert\cdot\rVert$ is a p-norm (the Euclidean norm, p = 2, gives the usual unit eigenvector).
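The source does not say how the eigenvalue problem is solved. One possible sketch uses the cyclic Jacobi rotation method, a standard technique for symmetric matrices swapped in here for illustration and not necessarily the one the inventors used. The function name `jacobi_eigen`, the tolerance, and the sweep limit are assumptions; eigenvalues are returned in descending order and each eigenvector is scaled to unit Euclidean norm as in step 6.

```python
import math


def jacobi_eigen(R, tol=1e-20, max_sweeps=100):
    """Eigenvalues and unit eigenvectors of a symmetric matrix R (list of rows).

    Returns (eigenvalues, eigenvectors) sorted by descending eigenvalue;
    each eigenvector b_j^0 has unit Euclidean norm.
    """
    m = len(R)
    A = [row[:] for row in R]  # working copy, rotated towards diagonal form
    V = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]  # product of rotations
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(m) for j in range(m) if i != j)
        if off < tol:  # squared off-diagonal mass is negligible
            break
        for p in range(m - 1):
            for q in range(p + 1, m):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle with |theta| <= pi/4 that zeroes A[p][q]:
                # tan(2 * theta) = 2 * A_pq / (A_qq - A_pp)
                diff = A[q][q] - A[p][p]
                if diff == 0.0:
                    theta = math.copysign(math.pi / 4, A[p][q])
                else:
                    theta = 0.5 * math.atan(2.0 * A[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(m):  # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p] = c * akp - s * akq
                    A[k][q] = s * akp + c * akq
                for k in range(m):  # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k] = c * apk - s * aqk
                    A[q][k] = s * apk + c * aqk
                for k in range(m):  # V <- V J; columns of V converge to eigenvectors
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p] = c * vkp - s * vkq
                    V[k][q] = s * vkp + c * vkq
    order = sorted(range(m), key=lambda j: A[j][j], reverse=True)
    eigenvalues = [A[j][j] for j in order]
    eigenvectors = []
    for j in order:
        b = [V[i][j] for i in range(m)]
        norm = math.sqrt(sum(v * v for v in b))  # step 6: b_j^0 = b_j / ||b_j||
        eigenvectors.append([v / norm for v in b])
    return eigenvalues, eigenvectors


vals, vecs = jacobi_eigen([[1.0, 0.5], [0.5, 1.0]])
```

For the 2 x 2 example the eigenvalues are 1.5 and 0.5 with eigenvectors along (1, 1) and (1, -1); the sign of each eigenvector is arbitrary.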
Step 7: Convert the m unit eigenvectors $b_1^0, b_2^0, \ldots, b_m^0$ into m principal components. The principal component $u_j = (u_{1j}, u_{2j}, \ldots, u_{nj})^{\mathsf T}$ is computed as
$$u_{ij} = \tilde{x}_i\, b_j^0$$
where $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, m$, and $\tilde{x}_i$ is the row vector of the ith row of the standardized matrix $\tilde{X}$.
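A minimal sketch of step 7, keeping only the first k components as in step 8. It assumes the unit eigenvectors are already sorted by descending eigenvalue; `principal_components` is a hypothetical name and the small matrices are illustrative, not patent data.

```python
def principal_components(Xs, unit_eigenvectors, k):
    """Return the n x k score matrix with entries u_ij = x~_i . b_j^0 for j = 1..k.

    Xs is the standardized matrix (list of rows); unit_eigenvectors must be
    sorted by descending eigenvalue so that the first k are the ones kept.
    """
    if not 1 <= k <= len(unit_eigenvectors):
        raise ValueError("k must satisfy 1 <= k <= m")
    basis = unit_eigenvectors[:k]
    return [[sum(x * b for x, b in zip(row, vec)) for vec in basis] for row in Xs]


r = 2 ** -0.5
B = [[r, r], [r, -r]]  # unit eigenvectors of [[1, 1], [1, 1]], largest eigenvalue first
Xs = [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
U = principal_components(Xs, B, k=1)
```

Here two perfectly correlated columns collapse into a single component with scores of about -1.414, 0, and 1.414, which is the dimension reduction the method relies on.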
Step 8: Reduce the m-dimensional data in each row of the original matrix $X_{n\times m}$ to k-dimensional data by PCA, i.e., keep k principal components, so that the new k-dimensional data reflect the information expressed by the original m-dimensional data. The k-dimensional element component value (or peak count) feature data are then mapped by a nonlinear function from the low-dimensional space, in which they are not linearly separable, into a high-dimensional feature space in which they are linearly separable. Construct the classification hyperplane in this high-dimensional feature space:
$$w^{\mathsf T}\varphi(x_p) + b = 0$$
where $p = 1, 2, \ldots, k$ ($k \le m$) and $h_p$ is the class label in the pth dimension: samples with $w^{\mathsf T}\varphi(x_p) + b \ge 1$ are labeled $h_p = 1$, and samples with $w^{\mathsf T}\varphi(x_p) + b \le -1$ are labeled $h_p = -1$. Here w is the feature weight vector, b is the offset, $x_p$ is the element component value (or peak count) vector of the sample to be tested after PCA reduction to p dimensions, and $\varphi(\cdot)$ is the nonlinear mapping function that maps $x_p$ into the high-dimensional linearly separable feature space. To simplify the notation, the sample subscript i of $x_p$ is omitted; different samples to be tested correspond to different $x_p$.
Step 9: To control the computation speed and reduce the training error, introduce a penalty factor C and slack variables $\xi_p$ as constraints, converting the classification hyperplane problem into a quadratic programming model:
$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{p=1}^{k}\xi_p \quad \text{s.t.}\quad h_p\left(w^{\mathsf T}\varphi(x_p) + b\right) \ge 1 - \xi_p,\quad \xi_p \ge 0 \qquad (9)$$
where w is the feature weight vector, C is the penalty factor, $\xi_p$ is the slack variable, $h_p$ is the class label, b is the offset, and $\varphi(\cdot)$ is the nonlinear mapping function that maps $x_p$ into the high-dimensional linearly separable feature space.
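For fixed w and b, the smallest slack values that satisfy the constraints of formula (9) are $\xi_p = \max(0,\ 1 - h_p(w^{\mathsf T}\varphi(x_p) + b))$, so the objective can be evaluated directly. The sketch below does this under the assumption that $\varphi$ is the identity (a linear kernel); it only evaluates the objective for given parameters and does not solve the quadratic program. The function name `soft_margin_objective` and the numbers are hypothetical.

```python
def soft_margin_objective(w, b, X, h, C):
    """Evaluate 0.5 * ||w||^2 + C * sum(xi_p) for fixed (w, b), with phi = identity.

    Each slack xi_p is the smallest value satisfying
    h_p * (w . x_p + b) >= 1 - xi_p and xi_p >= 0.
    Returns (objective, slacks).
    """
    slacks = []
    for x, label in zip(X, h):
        margin = label * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        slacks.append(max(0.0, 1.0 - margin))
    objective = 0.5 * sum(wi * wi for wi in w) + C * sum(slacks)
    return objective, slacks


obj, xi = soft_margin_objective(w=[1.0], b=0.0, X=[[2.0], [0.5], [-1.0]], h=[1, 1, -1], C=10.0)
```

Only the second point lies inside the margin (margin 0.5), so it alone receives a slack of 0.5, and the large penalty C = 10 dominates the objective (0.5 + 10 x 0.5 = 5.5). Raising C punishes such violations more heavily, which is the trade-off the penalty factor controls.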
Step 10: Optimize the parameters by grid-search-based cross-validation and train the PCA-SVR model. The optimal penalty factor C′ and slack variables $\xi'_p$ are obtained by iteratively searching for the optimal parameters; formula (9) is then solved by introducing Lagrange multipliers $\alpha_p$ and the kernel function K, where different samples to be tested correspond to different $\alpha_p$ and $\xi_p$. When the hyperplane satisfying the accuracy requirement of formula (9) is found, its output is the predicted content $\hat{y}_i$ of the target element; otherwise, the iteration continues until the optimal parameters C′ and $\xi'_p$ are found. The content of the single target element in any ith sample to be tested is predicted as
$$\hat{y}_i = \sum_{p=1}^{k}\alpha_p h_p K(x_p, x_i) + b$$
where $i = 1, 2, \ldots, n$, $p = 1, 2, \ldots, k$, $\alpha_p$ is a Lagrange multiplier, $h_p$ is the class label, K is the kernel function, and b is the offset.
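The prediction formula of step 10 is a kernel expansion over the training vectors. A minimal sketch follows, assuming a Gaussian (RBF) kernel $K(x, z) = \exp(-\gamma\lVert x - z\rVert^2)$, since the source does not name the kernel; `rbf_kernel`, `predict_content`, and all numeric values are hypothetical. (In the common epsilon-SVR formulation the coefficient of each kernel term is a difference of two multipliers, $\alpha_p - \alpha_p^*$, rather than $\alpha_p h_p$; the sketch follows the formula as written.)

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian kernel K(x, z) = exp(-gamma * ||x - z||^2); an assumed kernel choice."""
    return math.exp(-gamma * sum((a - c) ** 2 for a, c in zip(x, z)))


def predict_content(x, support_vectors, alphas, labels, b, gamma):
    """Kernel expansion y_hat = sum_p alpha_p * h_p * K(x_p, x) + b."""
    return sum(
        a * h * rbf_kernel(sv, x, gamma)
        for sv, a, h in zip(support_vectors, alphas, labels)
    ) + b


# Hypothetical trained quantities for two support vectors in a 1-D reduced space
y_hat = predict_content([0.0], [[0.0], [1.0]], [0.5, 0.25], [1, -1], b=0.1, gamma=1.0)
```

Only vectors with nonzero multipliers contribute, so once the model is trained a prediction costs one kernel evaluation per support vector.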
Step 11: Compare the content of the single target element predicted in step 10 using k principal components with its actual content: compute the root mean square error (RMSE) for different values of k and select the optimal number of principal components. RMSE measures how close the predicted values are to the actual values; the smaller the RMSE, the better the choice of the number of principal components and the more accurate the element content prediction. In general, RMSE decreases as the number of principal components increases until it reaches a minimum or stays constant. The value of k at which the RMSE curve, after dropping markedly, begins to level off is the optimal number of principal components $k_{optimal}$. That is, the m-dimensional data in each row of the original matrix $X_{n\times m}$ are mapped onto $k_{optimal}$-dimensional data, which reflect the information expressed by the m-dimensional data of $X_{n\times m}$, with $k_{optimal} \le m$. The RMSE evaluation index is calculated as
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$
where $\hat{y}_i$ is the predicted content of the single target element in the ith sample to be tested and $y_i$ is its actual content.
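The RMSE index and the "drops markedly, then levels off" rule for $k_{optimal}$ can be sketched as below. The source gives no numeric criterion for leveling off, so the relative-improvement threshold `rel_tol = 0.05` is purely an illustrative assumption, as are the function names and the example RMSE curve.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between actual and predicted contents."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def choose_k(rmse_by_k, rel_tol=0.05):
    """Smallest k after which one more component improves RMSE by less than rel_tol.

    rmse_by_k[0] is the RMSE obtained with k = 1, rmse_by_k[1] with k = 2, and so on.
    """
    for k in range(1, len(rmse_by_k)):
        prev, cur = rmse_by_k[k - 1], rmse_by_k[k]
        if prev == 0.0 or (prev - cur) / prev < rel_tol:
            return k
    return len(rmse_by_k)


# Hypothetical RMSE curve for k = 1..6
k_opt = choose_k([1.0, 0.6, 0.4, 0.3, 0.295, 0.294])
```

With this curve, going from k = 4 to k = 5 improves RMSE by under 2 %, so the rule returns k = 4. A looser or tighter threshold would move the choice, so in practice the curve itself should also be inspected, as the text describes.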
Step 12: With the number of principal components set to $k_{optimal}$, compare the prediction results with the actual contents of the standard samples and calculate the coefficient of determination $R^2$, which evaluates the prediction performance of the PCA-SVR model by describing how well the regression line fits the observed values. The larger $R^2$, the more accurate the element content prediction. $R^2$ is calculated as
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
where $y_i$ is the true content of the single target element in the ith sample to be tested, $\hat{y}_i$ is its predicted content, and $\bar{y}$ is the mean of the true contents of the single target element over the n samples to be tested.
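A minimal sketch of the coefficient of determination of step 12; `r_squared` is a hypothetical name and the values are illustrative. A constant true series has no variance to explain, so the sketch rejects it.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("true values have zero variance")
    return 1.0 - ss_res / ss_tot


perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

A perfect prediction gives $R^2 = 1$, while always predicting the mean content gives $R^2 = 0$; values between the two measure how much of the variation in true content the model explains.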
The method is scientifically sound, has a simple workflow, is convenient to operate, and gives high prediction accuracy with intuitive, easily understood results. It achieves high detection precision and prediction accuracy at low computational complexity, addresses overlapping-peak interference in X-ray fluorescence spectra and the inaccuracy of traditional instrument measurement methods, reduces the influence of the environmental background and the error caused by statistical fluctuation, and can predict the content of the elements in the object under test effectively and quickly.
Drawings
FIG. 1 is a flow chart of a PCA-SVR-based XRF elemental quantitative analysis method of the present invention;
FIG. 2 is a spectrum of a standard soil sample according to the present invention;
FIG. 3 is a graph of the results of principal component analysis based on the present invention;
FIG. 4 is a diagram showing the result of prediction of the content of soil elements according to the present invention.
Detailed Description
The following provides a more detailed description of the embodiments and the operation of the present invention with reference to the accompanying drawings.
This embodiment provides an XRF element quantitative analysis method based on PCA-SVR, whose workflow is shown in FIG. 1. The specific steps for obtaining element information and detection limits in standard soil samples are as follows:
Step 1: Determine the soil sample set. The set contains n = 57 soil samples, namely sample 1, sample 2, ..., sample 57. The elements that the spectrometer can identify in each soil sample form that sample's element set, giving 57 element sets A1 to A57; their union gives the element set A of elements with measurable content in the soil sample set. The elements in A all lie within elements No. 12 to 92 of the periodic table.
Step 2: 57 national standard samples are used as the standard samples, comprising GSS-series soil composition analysis reference materials, GBW-series soil composition analysis reference materials, and GSD stream-sediment composition analysis reference materials, namely GSS-1 to GSS-27, GSS-32, GBW 0070003 to GBW 0070006, and GSD-2a to GSD-33. Using an intelligent energy-dispersive fluorescence analysis method, the ED-XRF fluorescence spectrometer simultaneously yields the XRF spectrum of each sample together with its element component values X (or peak counts) and content values Y. The XRF spectrum of a standard soil sample is shown in FIG. 2.
Step 3: In the element set A, the union of each target element under study and its interfering elements is taken as the input variables of the PCA-SVR model. The objects of study in this embodiment are ten harmful soil elements: 23(V), 24(Cr), 25(Mn), 27(Co), 29(Cu), 30(Zn), 33(As), 42(Mo), 48(Cd), and 82(Pb); Cd is used here as the example for detailed element content prediction. The composition of some of the standard soil samples is shown in Table 1.
Table 1 Composition of some standard soil samples
For all 57 standard soil samples, the contents of the target elements were recorded once the raw data had been fully collected; details are given in Table 2.
Table 2 Composition of the national standard soil samples (ppm)
The matrix $X_{n\times m}$ of measured component values of the target element and its interfering elements is the input of the PCA-SVR model, and the matrix of target element contents is its output. The interfering elements are detailed in Table 3.
Table 3 Main interfering elements of the target elements
Taking Cd content prediction as an example, the input of the PCA-SVR model is a 57 × 21 component data matrix, i.e., a matrix of 57 samples, each described by the component values of twenty-one elements (the union of all target elements and their interfering elements): Cd, V, Ti, As, K, Cr, Se, Fe, Ni, Zn, Sr, Cu, P, Co, Mn, Pb, Ca, Mo, Nb, Ag, and Sb. The first column holds the component values of the single target element (Cd), the remaining 20 columns hold the component values of the other target elements and all interfering elements, and these 20 columns may be arranged in any order. Similarly, the output of the PCA-SVR model is a 57 × 1 matrix of 57 samples, each described by the content value of the single target element (Cd).
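The column layout described above (target element first, the other columns in any order) can be sketched as follows. The dictionary-based input format, the function name `build_model_io`, and the three-sample numbers are hypothetical and do not come from Tables 1 to 3.

```python
def build_model_io(columns, target, others, content):
    """Arrange measured values into the PCA-SVR input X (n x m) and output Y (n x 1).

    columns: dict mapping element symbol -> list of n measured component values
    target:  symbol of the target element, placed in column 0
    others:  remaining target and interfering elements, in any order
    content: list of n reference content values of the target element
    Returns (X, Y, column_order).
    """
    order = [target] + [e for e in others if e != target]
    n = len(content)
    for e in order:
        if len(columns[e]) != n:
            raise ValueError(f"column {e} does not have {n} samples")
    X = [[columns[e][i] for e in order] for i in range(n)]
    Y = [[c] for c in content]
    return X, Y, order


# Hypothetical peak counts for three samples (not data from the patent)
columns = {"Cd": [0.1, 0.2, 0.3], "Zn": [5.0, 6.0, 7.0], "Cu": [1.0, 1.5, 2.0]}
X, Y, order = build_model_io(columns, "Cd", ["Zn", "Cu"], [0.08, 0.15, 0.31])
```

For Cd in the embodiment, the same call with 21 element columns and 57 samples would yield the 57 × 21 input and 57 × 1 output described in the text.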
Step 4: Standardize the XRF spectral data. Standardize the original matrix $X_{n\times m}$ of size n × m to obtain the standardized matrix $\tilde{X}$. The row vector of the ith row of $X_{n\times m}$,
$$x_i = (x_{i1}, x_{i2}, \ldots, x_{im}),$$
is the vector of component values (or peak counts) of the m elements contained in the ith sample to be tested. $X_{n\times m}$ is standardized column by column as follows:
$$x'_{ij} = \frac{x_{ij} - \bar{x}_j}{S_j}$$
$$\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$$
$$S_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2}$$
$$\tilde{x}_i = (x'_{i1}, x'_{i2}, \ldots, x'_{im})$$
where $i = 1, 2, \ldots, n$ is the row index and $j = 1, 2, \ldots, m$ is the column index of the standardized matrix $\tilde{X}$; $x_{ij}$ is the component value (or peak count) of the jth element in $x_i$; $\bar{x}_j$ is the sample mean of column j of $X_{n\times m}$; $x'_{ij}$ is the standardized component value (or peak count) of the jth element of the ith sample to be tested; $S_j$ is the sample standard deviation of column j of $X_{n\times m}$; and $\tilde{x}_i$ is the row vector of the ith row of $\tilde{X}$.
Step 5: Compute the correlation coefficient matrix R of the standardized matrix $\tilde{X}$:
$$R = \frac{1}{n-1}\tilde{X}^{\mathsf T}\tilde{X}$$
where n is the number of samples to be tested and $\tilde{X}$ is the standardized matrix.
Step 6: Compute the unit eigenvectors $b_1^0, b_2^0, \ldots, b_m^0$ of the standardized matrix $\tilde{X}$ (the eigenvectors of its correlation coefficient matrix R). For each eigenvalue $\lambda_j$ of the correlation coefficient matrix R obtained in step 5, i.e., each root of the characteristic equation $|R - \lambda I| = 0$, ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$, solve the system $R b_j = \lambda_j b_j$ to obtain the eigenvector $b_j$, then normalize each eigenvector to obtain m unit eigenvectors:
$$b_j^0 = \frac{b_j}{\lVert b_j \rVert}$$
where $j = 1, 2, \ldots, m$ is the column index of the standardized matrix $\tilde{X}$, $b_j$ is an eigenvector, and $\lVert\cdot\rVert$ is a p-norm (the Euclidean norm, p = 2, gives the usual unit eigenvector).
Step 7: Convert the m unit eigenvectors $b_1^0, b_2^0, \ldots, b_m^0$ into m principal components. The principal component $u_j = (u_{1j}, u_{2j}, \ldots, u_{nj})^{\mathsf T}$ is computed as
$$u_{ij} = \tilde{x}_i\, b_j^0$$
where $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, m$, and $\tilde{x}_i$ is the row vector of the ith row of the standardized matrix $\tilde{X}$.
Step 8: Reduce the m-dimensional data in each row of the matrix $X_{n\times m}$ to k-dimensional data by principal component analysis (PCA), i.e., keep k principal components, so that the k-dimensional data reflect the information expressed by the original m-dimensional data. The k-dimensional element component data are then mapped by a nonlinear function from the low-dimensional space, in which they are not linearly separable, into a high-dimensional feature space in which they are linearly separable, and the classification hyperplane is constructed in that space:
$$w^{\mathsf T}\varphi(x_p) + b = 0$$
where $p = 1, 2, \ldots, k$ ($k \le m$) and $h_p$ is the class label in the pth dimension: samples with $w^{\mathsf T}\varphi(x_p) + b \ge 1$ are labeled $h_p = 1$, and samples with $w^{\mathsf T}\varphi(x_p) + b \le -1$ are labeled $h_p = -1$. Here w is the feature weight vector, b is the offset, $x_p$ is the element component value vector of the sample to be tested after PCA reduction to p dimensions, and $\varphi(\cdot)$ is the nonlinear mapping function that maps $x_p$ into the high-dimensional linearly separable feature space. To simplify the notation, the sample subscript i of $x_p$ is omitted (different samples to be tested correspond to different $x_p$).
Step 9: To control the computation speed and reduce the training error, introduce a penalty factor C and slack variables $\xi_p$ as constraints, converting the classification hyperplane problem into a quadratic programming model:
$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{p=1}^{k}\xi_p \quad \text{s.t.}\quad h_p\left(w^{\mathsf T}\varphi(x_p) + b\right) \ge 1 - \xi_p,\quad \xi_p \ge 0 \qquad (9)$$
where w is the feature weight vector, C is the penalty factor, $\xi_p$ is the slack variable, $h_p$ is the class label, b is the offset, and $\varphi(\cdot)$ is the nonlinear mapping function that maps $x_p$ into the high-dimensional linearly separable feature space.
Step 10: Optimize the parameters by grid-search-based cross-validation and train the PCA-SVR model. The optimal penalty factor C′ and optimal slack variables $\xi'_p$ are obtained by iteratively searching for the optimal parameters; formula (9) is then solved by introducing Lagrange multipliers $\alpha_p$ and the kernel function K, where different samples to be tested correspond to different $\alpha_p$ and $\xi_p$. Among the parameters w, b, $x_p$, C, $\xi_p$, $\alpha_p$, and K, those carrying the subscript p ($x_p$, $\xi_p$, $\alpha_p$) generally vary with the input sample, whereas those without it (w, b, C, K) generally do not. When the hyperplane satisfying the accuracy requirement of formula (9) is found, its output is the predicted content $\hat{y}_i$ of the single target element (Cd); otherwise, the iteration continues until the optimal parameters C′ and $\xi'_p$ are found. The content of the single target element (Cd) in any ith sample to be tested is predicted as
$$\hat{y}_i = \sum_{p=1}^{k}\alpha_p h_p K(x_p, x_i) + b$$
where $i = 1, 2, \ldots, n$, $p = 1, 2, \ldots, k$, $\alpha_p$ is the Lagrange multiplier of the ith sample to be tested, $h_p$ is the class label, K is the kernel function, and b is the offset.
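Step 10 does not specify the search grid, the number of folds, or the scoring rule. A minimal sketch of grid search with k-fold cross-validation is shown below, assuming RMSE as the validation score, contiguous folds without shuffling, and a caller-supplied training function. The grid searches C together with an RBF width `gamma`, the usual pairing in SVR practice (an assumption; the text names C′ and $\xi'_p$ as the optimized quantities). The names `kfold_indices`, `grid_search_cv`, and `toy_fit` are hypothetical; `toy_fit` ignores its training data and exists only to exercise the search loop. A real SVR trainer would be plugged in instead, for example scikit-learn's `sklearn.svm.SVR` combined with `sklearn.model_selection.GridSearchCV`.

```python
import itertools
import math


def kfold_indices(n, n_folds):
    """Split range(n) into n_folds contiguous validation folds (no shuffling)."""
    sizes = [n // n_folds + (1 if f < n % n_folds else 0) for f in range(n_folds)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search_cv(X, y, fit, grid, n_folds=5):
    """Return (best_params, best_rmse) over the Cartesian product of grid values.

    fit(X_train, y_train, **params) must return a callable predict(x) -> float.
    """
    names = sorted(grid)
    folds = kfold_indices(len(X), n_folds)
    best_params, best_score = None, math.inf
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        sq_err, count = 0.0, 0
        for val_idx in folds:
            held_out = set(val_idx)
            train = [i for i in range(len(X)) if i not in held_out]
            predict = fit([X[i] for i in train], [y[i] for i in train], **params)
            for i in val_idx:  # accumulate squared error on the held-out fold
                sq_err += (predict(X[i]) - y[i]) ** 2
                count += 1
        score = math.sqrt(sq_err / count)
        if score < best_score:  # keep the first parameter set reaching the lowest RMSE
            best_params, best_score = params, score
    return best_params, best_score


def toy_fit(X_train, y_train, C, gamma):
    """Hypothetical stand-in for an SVR trainer: predicts C * x[0], ignoring the data."""
    return lambda x: C * x[0]


X = [[float(i)] for i in range(10)]
y = [2.0 * x[0] for x in X]
best, score = grid_search_cv(X, y, toy_fit, {"C": [1.0, 2.0, 3.0], "gamma": [0.1, 1.0]})
```

Because every parameter set is scored only on samples it was not trained on, the selected C (and kernel width) reflects generalization to unseen soil samples rather than fit to the training set.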
Step 11: Compare the target element content predicted in step 10 using k principal components $u_j$ with the actual target element content: compute the root mean square error (RMSE) for different values of k and select the optimal number of principal components. RMSE measures how close the predicted values are to the actual values; the smaller the RMSE, the better the choice of the number of principal components and the more accurate the element content prediction. In general, RMSE decreases as the number of principal components increases until it reaches a minimum or stays constant. The value of k at which the RMSE curve, after dropping markedly, begins to level off is the optimal number of principal components $k_{optimal}$. That is, the m-dimensional data in each row of the matrix $X_{n\times m}$ are mapped onto $k_{optimal}$-dimensional data, which reflect the information expressed by the m-dimensional data of $X_{n\times m}$, with $k_{optimal} \le m$. The RMSE evaluation index is calculated as
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$
where $\hat{y}_i$ is the predicted content of the single target element in the ith sample to be tested and $y_i$ is its actual content.
FIG. 3 compares the RMSE curves of the conventional partial least squares regression (PLSR) method and the PCA-SVR method of the invention. The PCA-SVR method is optimal with 4 principal components and the PLSR method with 8; for the same number of principal components, the RMSE of PCA-SVR is smaller than that of PLSR, i.e., its predictions are more accurate.
Step 12: With the number of principal components set to $k_{optimal}$, compare the prediction results with the actual contents of the standard samples and calculate the coefficient of determination $R^2$, which evaluates the prediction performance of the PCA-SVR model by describing how well the regression line fits the observed values. The larger $R^2$, the more accurate the element content prediction. $R^2$ is calculated as
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
where $y_i$ is the true content of the single target element (Cd) in the ith sample to be tested, $\hat{y}_i$ is its predicted content, and $\bar{y}$ is the mean of the true Cd contents over the n samples to be tested.
The contents of the other 9 elements under study in element set A, namely 23(V), 24(Cr), 25(Mn), 27(Co), 29(Cu), 30(Zn), 33(As), 42(Mo), and 82(Pb), are predicted in the same way as that of element 48(Cd).
Table 4 compares the coefficients of determination $R^2$ of the element content predictions for the standard soil samples obtained with the PCA-SVR method of the invention and with the traditional partial least squares regression (PLSR) method:
Table 4 Comparison of the coefficient of determination $R^2$ for element content prediction in the standard soil samples
Figure GDA0003530064330000124
Taking Cd as an example, fig. 4 shows the predicted contents. Compared with the partial least squares regression (PLSR) method, the PCA-SVR-based quantitative XRF predictions agree more closely with the actual element contents. The PCA-SVR algorithm effectively handles the overlap of spectral lines and improves the accuracy of quantitative element analysis, which demonstrates the advantage of the method of the invention.
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps; any non-essential addition or replacement made by a person skilled in the art according to the technical features of the technical solution of the present invention is within the scope of the present invention.

Claims (5)

1. An XRF element quantitative analysis method based on PCA-SVR is characterized by comprising the following steps:
step 1: determine a standard sample set containing n samples to be tested. Take the union of all elements in the standard sample set that the ED-XRF spectrometer can identify. This union is the element set contained in the n samples to be tested, and it gives element set A, the elements with a content in the standard sample set. The elements identifiable by the ED-XRF spectrometer are those with atomic numbers 12 to 92 in the periodic table;
step 2: read element peak information and content information. Take any sample to be tested as the sample to be identified, and use the ED-XRF spectrometer to measure the peak information and content information of the elements of element set A. This yields the measured component value and content value of each element in the sample to be identified;
step 3: determine the input and output of the PCA-SVR model. Construct a PCA-SVR model in which an element to be quantitatively analyzed is called a target element, and an element that interferes with the target element is called an interference element. The input of the PCA-SVR model is the measured component value matrix X_{n×m}, composed of all target elements studied in element set A and their corresponding interference elements. The output is the measured content value matrix Y_{n×1}, composed of the contents of the target element. X_{n×m} contains n samples to be tested, each described by the measured component values of m elements. Its first column holds the measured component values of the single target element. The remaining m-1 columns hold the measured component values of the other target elements in element set A and of the interference elements corresponding to all target elements. Y_{n×1} contains the n samples to be tested, each described by the content value of the single target element;
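A minimal sketch of how the input and output matrices of step 3 could be assembled, assuming NumPy. Measured component values are assumed to be stored per element in a dictionary. The function `build_xy`, its arguments and the sample values are hypothetical illustrations, not part of the claimed method.

```python
import numpy as np

def build_xy(measured, contents, target, others):
    """Assemble X (n x m) and Y (n x 1) as described in step 3.

    measured: dict element -> measured component values for the n samples
    contents: dict element -> certified content values for the n samples
    target:   the single target element, placed in column 0 of X
    others:   the other studied target elements and interference elements
    """
    columns = [target] + [e for e in others if e != target]
    X = np.column_stack([np.asarray(measured[e], dtype=float) for e in columns])
    Y = np.asarray(contents[target], dtype=float).reshape(-1, 1)
    return X, Y, columns

# Hypothetical values for three samples (illustration only)
measured = {"Cd": [1, 2, 3], "Pb": [4, 5, 6], "Zn": [7, 8, 9]}
contents = {"Cd": [0.1, 0.2, 0.3]}
X, Y, cols = build_xy(measured, contents, "Cd", ["Pb", "Zn", "Cd"])
print(cols, X.shape, Y.shape)
```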
step 4: XRF spectral data standardization. Standardize matrix X_{n×m} to obtain the standardized matrix \tilde{X}. The ith row vector of X_{n×m}, x_i = (x_{i1}, x_{i2}, …, x_{im}), represents the measured component values of the m elements in the ith sample to be tested. X_{n×m} is standardized as follows:

$$\tilde{X} = \left(x'_{ij}\right)_{n\times m}, \qquad x'_{ij} = \frac{x_{ij} - \bar{x}_j}{S_j}$$

$$\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad S_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^{2}}$$

The symbols are defined as follows:
- i = 1, 2, …, n is the row index and j = 1, 2, …, m the column index of \tilde{X};
- x_{ij} is the measured component value of the jth element in x_i;
- \bar{x}_j is the sample mean of column j of X_{n×m};
- x'_{ij} is the standardized component value of the jth element of the ith sample to be tested;
- S_j is the sample standard deviation of column j of X_{n×m};
- x'_i = (x'_{i1}, x'_{i2}, …, x'_{im}) is the ith row vector of \tilde{X}.
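The column-wise z-score of step 4 is sketched below, assuming NumPy. The sample standard deviation is computed with ddof=1 (division by n-1), which matches the "sample standard deviation" wording; the function name `standardize` is hypothetical.

```python
import numpy as np

def standardize(X):
    """Column-wise z-score: x'_ij = (x_ij - mean_j) / S_j, with S_j the sample std (n-1)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)            # \bar{x}_j for every column j
    std = X.std(axis=0, ddof=1)      # S_j, sample standard deviation
    return (X - mean) / std

Z = standardize([[1, 2], [3, 4], [5, 9]])
print(Z)
```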
and 5: normalizing a matrix
Figure FDA0003530064320000028
Correlation coefficient matrix R:
Figure FDA0003530064320000029
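With standardization by the n-1 sample standard deviation, the matrix R of step 5 coincides with the Pearson correlation matrix of the original columns. The sketch below, assuming NumPy, computes it directly from the standardized data. The function name is hypothetical.

```python
import numpy as np

def correlation_matrix(X):
    """R = Z^T Z / (n - 1), where Z is the column-standardized X (sample std, n-1)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    return Z.T @ Z / (n - 1)

X = np.array([[1.0, 2.0, 0.0], [2.0, 1.0, 1.0], [3.0, 5.0, 0.0], [4.0, 3.0, 2.0]])
R = correlation_matrix(X)
print(np.round(R, 3))
```

As a cross-check, NumPy's `np.corrcoef(X, rowvar=False)` returns the same matrix.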
step 6: compute the unit eigenvectors of the standardized matrix \tilde{X}. Solve the characteristic equation |R - λI_m| = 0 to obtain the characteristic roots λ_1 ≥ λ_2 ≥ … ≥ λ_m. For each characteristic root λ_j and the correlation coefficient matrix R obtained in step 5, solve the system R b_j = λ_j b_j to obtain the eigenvector b_j. Then normalize each b_j to obtain the m unit eigenvectors b_j^0:

$$b_j^{0} = \frac{b_j}{\lVert b_j \rVert}, \qquad j = 1, 2, \ldots, m$$

where ||·|| is a p-norm;
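Because R is symmetric, its eigenpairs can be obtained with `numpy.linalg.eigh`. The sketch below makes two assumptions that the claim does not state explicitly. The roots are sorted in descending order, and the eigenvectors are normalized with the Euclidean (p = 2) norm; `eigh` already returns unit vectors, so the explicit division is only there to mirror the formula. The function name is hypothetical.

```python
import numpy as np

def unit_eigenvectors(R):
    """Eigenvalues (descending) and unit eigenvectors (as columns) of a symmetric matrix R."""
    eigvals, eigvecs = np.linalg.eigh(R)            # solves R b = lambda b for symmetric R
    order = np.argsort(eigvals)[::-1]               # sort lambda_1 >= lambda_2 >= ... >= lambda_m
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    eigvecs = eigvecs / np.linalg.norm(eigvecs, axis=0)   # b_j^0 = b_j / ||b_j|| (2-norm assumed)
    return eigvals, eigvecs

R = np.array([[1.0, 0.5], [0.5, 1.0]])
vals, vecs = unit_eigenvectors(R)
print(vals)
```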
step 7: convert the m normalized unit eigenvectors b_j^0 into m principal components. The principal component u_j is calculated as:

$$u_j = \tilde{X}\, b_j^{0}, \qquad j = 1, 2, \ldots, m$$
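Steps 4 to 7 can be chained into one function that returns the first k principal component scores. This is a sketch assuming NumPy, with the same descending-order assumption as above; the function name and the random test data are hypothetical. The sample variance of each score column equals the corresponding eigenvalue, which makes a convenient sanity check.

```python
import numpy as np

def principal_components(X, k):
    """Return the first k principal component scores u_j = Z b_j^0 and their eigenvalues."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # step 4: standardize
    R = Z.T @ Z / (n - 1)                               # step 5: correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)                # step 6: eigenpairs (unit vectors)
    order = np.argsort(eigvals)[::-1][:k]               # keep the k largest roots
    B = eigvecs[:, order]                               # columns are b_j^0
    U = Z @ B                                           # step 7: column j holds u_j
    return U, eigvals[order]

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
U, lam = principal_components(X, 3)
print(U.shape, np.round(lam, 3))
```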
step 8: map the m-dimensional data in each row of matrix X_{n×m} to k-dimensional data. That is, retain the k principal components obtained after dimensionality reduction by principal component analysis (PCA), so that the k-dimensional element component value data reflect the information expressed by the original m-dimensional data. A nonlinear mapping function then maps the k-dimensional data from the low-dimensional, nonlinearly separable space into a high-dimensional, linearly separable feature space. In that space a classification hyperplane is constructed:

$$w^{T}\varphi(x_p) + b = 0$$

The symbols are defined as follows:
- p = 1, 2, …, k, with k ≤ m;
- h_p is the class label in dimension p, defined as h_p = 1 above the hyperplane and h_p = -1 below it;
- w is the feature weight vector and b is the offset;
- x_p is the element component value vector of the sample to be tested after PCA reduction to dimension p;
- φ(x_p) is the nonlinear mapping function that maps x_p into the high-dimensional, linearly separable feature space.

To simplify the formula, the sample subscript i of x_p is omitted; different samples to be tested correspond to different x_p;
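The mapping φ is usually never computed explicitly. A kernel K(x, z) = φ(x)·φ(z) returns the inner product in the high-dimensional space directly. The patent does not name a kernel, so the sketch below is illustrative only and assumes NumPy. For the homogeneous degree-2 polynomial kernel, the explicit map is the set of all pairwise products, and the two computations agree. The Gaussian (RBF) kernel, a common choice whose feature space is infinite-dimensional, is included for comparison. All function names are hypothetical.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map: all pairwise products x_i * x_j."""
    x = np.asarray(x, dtype=float)
    return np.outer(x, x).ravel()

def poly2_kernel(x, z):
    """Homogeneous polynomial kernel K(x, z) = (x . z)^2, equal to phi(x) . phi(z)."""
    return float(np.dot(x, z)) ** 2

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * float(diff @ diff)))

x = [1.0, 2.0, 0.5]
z = [0.3, -1.0, 2.0]
print(np.dot(phi(x), phi(z)), poly2_kernel(x, z))
```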
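step 9: introduce a penalty factor C and slack variables ξ_p as constraints, converting the classification hyperplane problem into a quadratic programming model:

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{p=1}^{k}\xi_p \quad \text{s.t.}\quad h_p\left(w^{T}\varphi(x_p) + b\right) \ge 1 - \xi_p,\quad \xi_p \ge 0 \qquad (9)$$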
step 10: optimize the parameters by grid-search cross-validation and train the PCA-SVR model. Search iteratively for the optimal parameters to obtain the optimal penalty factor C' and the optimal slack variables ξ'_p. Then introduce Lagrange multipliers α_p and the kernel function K to solve formula (9). Different samples to be tested correspond to different α_p, and the resulting optimal slack variables ξ'_p also differ. The minimal classification hyperplane that satisfies the accuracy requirement of formula (9) gives the predicted content \hat{y}_i of the target element. The predicted content of the single target element in any ith sample to be tested is:

$$\hat{y}_i = \sum_{p=1}^{k}\alpha_p h_p K(x_p, x) + b$$

where x is the reduced element component value vector of the ith sample to be tested;
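Steps 4 to 10 can be approximated with scikit-learn, assuming that library is available. This is a stand-in and not the patented implementation:
- scikit-learn's `SVR` solves the ε-insensitive support vector regression problem rather than the classification-style formulation (9) above;
- the RBF kernel, the parameter grid and the 5-fold split are assumptions;
- the slack variables are handled inside the solver rather than tuned.

`StandardScaler` divides by the population standard deviation (n rather than n-1). This rescales the component scores by a constant but leaves the principal directions unchanged. The function name `fit_pca_svr` and the synthetic data in the usage lines are hypothetical; the shape 57 × 21 simply mirrors claim 2.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def fit_pca_svr(X, y, k, cv_folds=5, seed=0):
    """Grid-search C, gamma and epsilon of an RBF SVR on the first k principal components."""
    pipe = Pipeline([
        ("scale", StandardScaler()),      # z-score every element column (step 4)
        ("pca", PCA(n_components=k)),     # keep k principal components (steps 5 to 8)
        ("svr", SVR(kernel="rbf")),       # kernel regression on the reduced data
    ])
    grid = {
        "svr__C": [1, 10, 100, 1000],             # candidate penalty factors (assumed)
        "svr__gamma": ["scale", 0.01, 0.1, 1.0],  # candidate RBF widths (assumed)
        "svr__epsilon": [0.01, 0.1],              # epsilon-tube widths (assumed)
    }
    search = GridSearchCV(
        pipe, grid,
        cv=KFold(cv_folds, shuffle=True, random_state=seed),
        scoring="neg_root_mean_squared_error",
    )
    search.fit(X, np.ravel(y))
    return search

# Synthetic data only: 57 samples x 21 element columns
rng = np.random.default_rng(0)
X = rng.normal(size=(57, 21))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=57)
search = fit_pca_svr(X, y, k=4)
print(search.best_params_)
```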
step 11: compare the single target element contents predicted in step 10 with the actual contents by computing the root mean square error (RMSE) for different values of k. The RMSE decreases as the number of principal components increases until it reaches a minimum or levels off. The corresponding k is the optimal number of principal components, k_optimal, with k_optimal ≤ m. That is, the m-dimensional data in each row of X_{n×m} are mapped to k_optimal-dimensional data, which reflect the information expressed by the m-dimensional data of X_{n×m}. The RMSE evaluation index is calculated as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^{2}}$$

where \hat{y}_i is the predicted content of the single target element in the ith sample to be tested and y_i is its true content;
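A minimal sketch of the RMSE criterion and of the k-selection rule in step 11, assuming NumPy. The claim gives no numerical threshold for "levels off". The tolerance `tol` is therefore an assumption: the rule returns the smallest k whose RMSE is within `tol` of the minimum. The function names and the RMSE curve in the usage lines are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted contents."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def choose_k(rmse_by_k, tol=1e-3):
    """Smallest k whose RMSE is within tol of the minimum RMSE (assumed plateau rule)."""
    best = min(rmse_by_k.values())
    return min(k for k, v in rmse_by_k.items() if v <= best + tol)

# Hypothetical RMSE values for k = 1..6 (illustration only)
curve = {1: 0.9, 2: 0.5, 3: 0.31, 4: 0.30, 5: 0.3005, 6: 0.35}
print(choose_k(curve, tol=0.02), choose_k(curve, tol=0.0))
```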
step 12: with the number of principal components set to k_optimal, compare the predicted single target element contents with the actual contents, and calculate the coefficient of determination R² to evaluate the predictive performance of the model. R² is calculated as:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where \bar{y} is the mean of the true contents of the single target element over the n samples to be tested.
2. The PCA-SVR-based XRF element quantitative analysis method of claim 1, wherein n = 57 and m = 21.
3. The PCA-SVR-based XRF element quantitative analysis method of claim 2, characterized in that the samples to be tested in said standard sample set comprise GSS-series soil composition analysis reference materials, GBW-series soil composition analysis reference materials and GSD stream sediment composition analysis reference materials.
4. The PCA-SVR-based XRF elemental quantitative analysis method of claim 3 characterized in that all the target elements studied in said element set A include vanadium (V), chromium (Cr), manganese (Mn), cobalt (Co), copper (Cu), zinc (Zn), arsenic (As), molybdenum (Mo), cadmium (Cd), lead (Pb).
5. The PCA-SVR-based XRF elemental quantitative analysis method of claim 4 characterized in that said single target element is cadmium (Cd).
CN202111073294.7A 2021-09-14 2021-09-14 XRF element quantitative analysis method based on PCA-SVR Active CN113848225B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111073294.7A CN113848225B (en) 2021-09-14 2021-09-14 XRF element quantitative analysis method based on PCA-SVR

Publications (2)

Publication Number Publication Date
CN113848225A CN113848225A (en) 2021-12-28
CN113848225B true CN113848225B (en) 2022-06-03

Family

ID=78974124

Country Status (1)

Country Link
CN (1) CN113848225B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104198512A (en) * 2014-08-18 2014-12-10 北京农业质量标准与检测技术研究中心 Support vector machine-based X-ray fluorescence spectrum analysis method and support vector machine-based X-ray fluorescence spectrum analysis device
CN104897709A (en) * 2015-06-15 2015-09-09 江苏大学 Agricultural product element quantitative detection model building method based on X-ray fluorescence analysis
CN109829513A (en) * 2019-03-04 2019-05-31 武汉大学 A kind of sequential Wavelength Dispersive-X-Ray fluorescence spectrum intelligent analysis method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
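Liu Xu et al., "Application of the combined PCA-SVR algorithm to the near-infrared spectroscopic analysis of tobacco components", Spectroscopy and Spectral Analysis (光谱学与光谱分析), 2007-12-31, Vol. 27, No. 12, pp. 2460-2463 *
Hou Zhenyu et al., "Study on the principal component analysis-support vector regression modeling method and its application", Chinese Journal of Analytical Chemistry (分析化学), 2006-05-31, Vol. 34, No. 5, pp. 617-620 *
Zhong Yonglu et al., "Prediction of NOx emissions from coal-fired boilers based on PCA-SVR", Thermal Power Generation (热力发电), 2015-01-31, Vol. 44, No. 1, pp. 87-90 *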

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant