CN114676792A - Near infrared spectrum quantitative analysis dimensionality reduction method and system based on stochastic projection algorithm - Google Patents

Near infrared spectrum quantitative analysis dimensionality reduction method and system based on stochastic projection algorithm

Info

Publication number
CN114676792A
Authority
CN
China
Prior art keywords
spectrum
near infrared
matrix
avg
quantitative analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210385752.9A
Other languages
Chinese (zh)
Inventor
杜文莉
赵云蒙
何仁初
钟伟民
杨明磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China University of Science and Technology
Original Assignee
East China University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China University of Science and Technology filed Critical East China University of Science and Technology
Priority to CN202210385752.9A priority Critical patent/CN114676792A/en
Publication of CN114676792A publication Critical patent/CN114676792A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/359Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Optimization (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Biophysics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Databases & Information Systems (AREA)
  • Algebra (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The invention relates to the technical field of near-infrared modeling data processing, and in particular to a near-infrared spectrum quantitative analysis dimensionality reduction method and system based on a random projection algorithm. The method comprises the following steps: step S1, acquiring near-infrared spectrum samples x_val and the corresponding physicochemical property values y_val as a sample set; step S2, dividing the sample set into a correction set and a verification set, and calculating an average spectrum x_avg; step S3, preprocessing the near-infrared spectra x_val and the average spectrum x_avg, respectively, to obtain a spectrum matrix X_val and an average spectrum matrix X_avg; step S4, performing random dimensionality-reduction projection on the spectrum matrix X_val by a Gaussian random projection algorithm to obtain a dimensionality-reduced spectrum matrix X_valRed; step S5, establishing an artificial neural network prediction model; step S6, checking the model with the verification set; and step S7, performing quantitative analysis on an input near-infrared spectrum and outputting the corresponding predicted physicochemical property value. The invention requires no spectral wavelength selection, which reduces the modeling difficulty and shortens the modeling time.

Description

Near infrared spectrum quantitative analysis dimensionality reduction method and system based on stochastic projection algorithm
Technical Field
The invention relates to the technical field of near-infrared modeling data processing, in particular to a near-infrared spectrum quantitative analysis dimensionality reduction method and system based on a random projection algorithm.
Background
Near-infrared analysis exploits the absorption characteristics of the chemical components of a sample in the near-infrared spectral region, and analyzes the sample qualitatively and quantitatively by means of chemometric multivariate calibration methods and the slight differences in spectral information among samples. Owing to unstable factors such as spectrometer noise and changes in the external environment, some bands of the near-infrared spectrum have a low signal-to-noise ratio and poor spectral quality, and these bands can make the model unstable. In addition, the spectral wavelengths of a sample are multicollinear and the spectral information is redundant, which makes the computation of the near-infrared analysis model complex.
Therefore, wavelength selection is often required when building near-infrared analysis models. Current wavelength selection methods include the correlation coefficient method, genetic algorithms, simulated annealing, and interval partial least squares.
However, wavelength selection is very cumbersome and is the most time-consuming and laborious step before modeling.
Common near-infrared modeling methods currently include multiple linear regression, partial least squares, artificial neural networks, and support vector machines.
Partial least squares is one of the most common near-infrared modeling methods. It reduces dimensionality effectively, extracts the useful information in the independent-variable matrix, and captures the linear relationship between the near-infrared wavenumbers and the oil property to be analyzed, giving reliable and accurate models. However, partial least squares cannot effectively capture the nonlinear relationship between the near-infrared spectrum and the oil properties to be analyzed.
Therefore, there is a need for improvement to solve the above-mentioned deficiencies of the existing near infrared spectroscopy quantitative analysis technology.
Disclosure of Invention
The invention aims to provide a near infrared spectrum quantitative analysis dimensionality reduction method and system based on a random projection algorithm, and solves the problem that in the prior art, wavelength selection is needed for near infrared analysis, so that time and labor are wasted.
In order to achieve the aim, the invention provides a near infrared spectrum quantitative analysis dimension reduction method based on a stochastic projection algorithm, which comprises the following steps:
step S1, acquiring near-infrared spectrum samples x_val and the corresponding physicochemical property values y_val as a sample set;
step S2, dividing the sample set into a correction set and a verification set, and calculating the average spectrum x_avg from the near-infrared spectra x_val of the correction set;
step S3, preprocessing the near-infrared spectra x_val of the correction set to obtain a spectrum matrix X_val, and preprocessing the average spectrum x_avg of the correction set to obtain an average spectrum matrix X_avg;
step S4, performing random dimensionality-reduction projection on the spectrum matrix X_val of the correction set based on a Gaussian random projection algorithm to obtain a dimensionality-reduced spectrum matrix X_valRed;
step S5, establishing an artificial neural network prediction model based on the dimensionality-reduced spectrum matrix X_valRed;
step S6, checking the artificial neural network prediction model established in step S5 with the verification set;
step S7, performing quantitative analysis on an input near-infrared spectrum based on the artificial neural network prediction model checked in step S6, and outputting the corresponding predicted physicochemical property value.
In one embodiment, in step S2, the average spectrum x_avg is given by:
x_avg = (1/n) * Σ_{i=1}^{n} x_val,i
where n is the number of near-infrared spectra and x_val,i is the i-th spectrum.
In an embodiment, in step S2, dividing the sample set into a correction set and a verification set further includes:
selecting m spectra from the sample set as the correction set, using a K-S algorithm based on Euclidean distance or an SPXY algorithm based on property variables, and using the remaining samples as the verification set.
In an embodiment, the preprocessing manner in step S3 includes: first derivative, second derivative and max-min normalization.
In one embodiment, step S3 further includes:
applying multiple preprocessing methods simultaneously to the near-infrared spectrum x_val to obtain the spectrum matrix X_val;
applying multiple preprocessing methods simultaneously to the average spectrum x_avg to obtain the average spectrum matrix X_avg.
In an embodiment, step S4 further includes:
step S41, obtaining a Gaussian random projection transition matrix P from the average spectrum matrix X_avg;
step S42, performing random dimensionality-reduction projection on the spectrum matrix X_val based on the Gaussian random projection transition matrix P to obtain the dimensionality-reduced spectrum matrix X_valRed.
In an embodiment, step S41 further includes:
performing random dimensionality-reduction projection on the average spectrum matrix X_avg with p wavelength points to obtain a dimensionality-reduced average spectrum matrix X_avgRed with q wavelength points;
solving for the Gaussian random projection transition matrix P according to the expression X_avgRed = P * X_avg.
In one embodiment, the average spectrum matrix X_avg and the dimensionality-reduced average spectrum matrix X_avgRed satisfy the following inequality:
(1-eps)||X_avg-X_avgRed||^2 < ||X_avg-X_avgRed||^2 < (1+eps)||X_avg-X_avgRed||^2
The p wavelength points before dimensionality reduction and the q wavelength points after dimensionality reduction satisfy the following inequality:
Figure BDA0003593597340000031
wherein eps is the dimension reduction error.
In one embodiment, the artificial neural network prediction model is a two-dimensional convolution prediction model;
the step S5 further includes:
step S51, feeding the dimensionality-reduced spectrum matrix into the input layer of the two-dimensional convolution, passing it through two convolution layers, weights and activation functions, and a pooling layer, and, after several rounds of convolution and pooling, passing the result to the output layer;
step S52, comparing the predicted value obtained at the output layer with the expected value of the sample, and, if there is an error between them, returning to step S51 to adjust the weights until the difference between the predicted value and the expected value reaches a first threshold.
In an embodiment, step S6 further includes:
checking the artificial neural network prediction model established in step S5 with the verification set, and calculating the prediction standard deviation RMSEP, where the corresponding expression is:
Figure BDA0003593597340000032
where m is the number of spectra in the verification set, y_i,actual1 is the measured value of the i-th spectrum in the verification set, and y_i,predicted1 is the predicted value of the i-th spectrum in the verification set.
In an embodiment, step S6 further includes:
cross-validating the artificial neural network prediction model established in step S5 with the correction set, and calculating the cross-validation standard deviation RMSECV, where the corresponding expression is:
Figure BDA0003593597340000041
where n is the number of spectra in the correction set, y_i,actual2 is the measured value of the i-th spectrum in the correction set, and y_i,predicted2 is the predicted value of the i-th spectrum in the correction set.
In order to achieve the above object, the present invention provides a near infrared spectrum quantitative analysis dimension reduction system based on a stochastic projection algorithm, comprising:
a memory for storing instructions executable by the processor;
a processor for executing the instructions to implement the method described in any one of the above.
To achieve the above object, the present invention provides a computer readable medium having stored thereon computer instructions, wherein the computer instructions, when executed by a processor, perform the method as described in any one of the above.
According to the near infrared spectrum quantitative analysis dimensionality reduction method and system based on the stochastic projection algorithm, Gaussian stochastic projection is used for dimensionality reduction, spectral wavelength selection is not needed, modeling difficulty is reduced, modeling time is shortened, and simple and rapid modeling can be performed for near infrared analysis.
Drawings
The above and other features, properties and advantages of the present invention will become more apparent from the following description of the embodiments with reference to the accompanying drawings in which like reference numerals denote like features throughout the several views, wherein:
FIG. 1 is a flow chart of a near-infrared spectrum quantitative analysis dimensionality reduction method based on a stochastic projection algorithm according to an embodiment of the invention;
FIG. 2 shows the raw sample spectra according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a near-infrared spectrum quantitative analysis dimensionality reduction system based on a stochastic projection algorithm according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Aiming at the defects of the existing near-infrared technology, the invention provides a near-infrared spectrum quantitative analysis dimension reduction method and system based on a random projection algorithm, and the method and system can be widely applied to the industries of petrochemical industry, agriculture, food and the like.
FIG. 1 is a flow chart of the near-infrared spectrum quantitative analysis dimensionality reduction method based on the stochastic projection algorithm according to an embodiment of the invention. As shown in FIG. 1, the method specifically includes the following steps:
step S1, acquiring near-infrared spectrum samples x_val and the corresponding physicochemical property values y_val as a sample set;
step S2, dividing the sample set into a correction set and a verification set, and calculating the average spectrum x_avg from the near-infrared spectra x_val of the correction set;
step S3, preprocessing the near-infrared spectra x_val of the correction set to obtain a spectrum matrix X_val, and preprocessing the average spectrum x_avg of the correction set to obtain an average spectrum matrix X_avg;
step S4, performing random dimensionality-reduction projection on the spectrum matrix X_val of the correction set based on a Gaussian random projection algorithm to obtain a dimensionality-reduced spectrum matrix X_valRed;
step S5, establishing an artificial neural network prediction model based on the dimensionality-reduced spectrum matrix X_valRed;
step S6, checking the artificial neural network prediction model established in step S5 with the verification set;
step S7, performing quantitative analysis on an input near-infrared spectrum based on the artificial neural network prediction model checked in step S6, and outputting the corresponding predicted physicochemical property value.
These steps will be described in detail below. It is understood that within the scope of the present invention, the above-mentioned technical features of the present invention and the technical features described in detail below (e.g., the embodiments) can be combined with each other and associated with each other to constitute a preferred technical solution.
Step S1, acquiring near-infrared spectrum samples x_val and the corresponding physicochemical property values y_val as a sample set.
The near-infrared spectra of a batch of samples and their corresponding physicochemical property values are acquired for modeling.
The physicochemical properties include physical properties and chemical properties.
Optionally, the physical properties include, but are not limited to, density, freezing point, viscosity, distillation range, and the like.
Optionally, the chemical properties include component composition, element content, and the like.
Optionally, the physicochemical property values are measured values obtained by laboratory analysis.
In this embodiment, a batch of near-infrared spectra x_val and physicochemical property values y_val is acquired for modeling,
where the near-infrared spectra x_val comprise n spectra x_val,i, i = 1 to n, and x_val,i denotes the i-th spectrum;
the i-th spectrum x_val,i is labeled with its corresponding physicochemical property value y_val,i.
Each near-infrared spectrum has p wavelength points.
Step S2, dividing the sample set into a correction set and a verification set, and calculating the average spectrum x_avg from the near-infrared spectra x_val of the correction set.
Further, the average spectrum x_avg of the n near-infrared spectra is calculated according to formula (1):
x_avg = (1/n) * Σ_{i=1}^{n} x_val,i (1)
where n is the number of near-infrared spectra and x_val,i is the i-th spectrum.
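The averaging in formula (1) can be sketched in a few lines. This is a minimal illustration, assuming each spectrum is stored as a plain list of absorbance values sampled at the same p wavelength points; the function name `average_spectrum` and the numeric values are illustrative and do not come from the source.

```python
def average_spectrum(spectra):
    """Return the point-wise mean of n spectra that share the same p wavelength points."""
    if not spectra:
        raise ValueError("at least one spectrum is required")
    n = len(spectra)
    p = len(spectra[0])
    if any(len(s) != p for s in spectra):
        raise ValueError("all spectra must have the same number of wavelength points")
    # x_avg[j] = (1/n) * sum over i of x_val,i[j]
    return [sum(s[j] for s in spectra) / n for j in range(p)]


# Illustrative values only: three spectra with four wavelength points each.
x_val = [
    [0.10, 0.20, 0.30, 0.40],
    [0.20, 0.30, 0.40, 0.50],
    [0.30, 0.40, 0.50, 0.60],
]
x_avg = average_spectrum(x_val)
```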
Furthermore, a K-S algorithm based on Euclidean distance or an SPXY algorithm based on property variables is used to select m strongly representative spectra from the sample set as the correction set, and the remaining samples are used as the verification set.
The K-S (Kennard-Stone) algorithm treats all samples as training-set candidates and moves them into the training set one at a time. First, the two samples with the largest Euclidean distance between them are placed in the training set. Then, for each remaining sample, the distance to its nearest sample already in the training set is computed, and the remaining sample for which this nearest distance is largest is added to the training set. This step is repeated until the required number of samples has been selected.
The SPXY (sample set partitioning on joint x-y distance) algorithm was developed based on the K-S algorithm, which takes into account both the x-variable and the y-variable in the calculation of the distance between samples.
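The K-S selection described above can be sketched as follows. This is a minimal, unoptimized illustration using only the x-variables (spectra) and Euclidean distance; the function names and the toy one-point "spectra" are assumptions made for the example, and SPXY would additionally fold a y-distance into `dist`.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two spectra of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kennard_stone(spectra, m):
    """Return (correction_idx, verification_idx); the first list holds m indices."""
    n = len(spectra)
    if not 2 <= m <= n:
        raise ValueError("m must satisfy 2 <= m <= number of spectra")
    dist = [[euclidean(spectra[i], spectra[j]) for j in range(n)] for i in range(n)]

    # Start with the two spectra that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]

    while len(selected) < m:
        # For each candidate, take the distance to its nearest selected spectrum,
        # then pick the candidate for which that nearest distance is largest.
        best = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected, remaining


# Toy data: five "spectra" with a single wavelength point each (illustrative only).
toy_spectra = [[0.0], [1.0], [2.0], [10.0], [5.0]]
correction_idx, verification_idx = kennard_stone(toy_spectra, m=3)
```

In the toy data the two extremes (indices 0 and 3) are chosen first, and the third pick is the point that lies farthest from both of them.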
Step S3, preprocessing the near-infrared spectra x_val of the correction set to obtain a spectrum matrix X_val, and preprocessing the average spectrum x_avg of the correction set to obtain an average spectrum matrix X_avg.
Near-infrared spectra are susceptible to interference from environmental factors during measurement, which introduces noise, and they contain some wavelength points that cannot be used, so the spectra are preprocessed in this step.
Spectral preprocessing amplifies the useful information in the spectrum and filters out noise, which reduces modeling complexity and improves the robustness of the model.
The preprocessing methods include, but are not limited to, first derivative, second derivative, and max-min normalization.
Further, step S3 further includes:
applying multiple preprocessing methods simultaneously to the near-infrared spectrum x_val to obtain the spectrum matrix X_val;
applying multiple preprocessing methods simultaneously to the average spectrum x_avg to obtain the average spectrum matrix X_avg.
In this embodiment, the three preprocessing methods above are applied simultaneously, so that each sample spectrum x_val and the average spectrum x_avg become a spectrum matrix X_val and an average spectrum matrix X_avg, respectively, each containing the results of the three preprocessing methods.
That is, several preprocessing methods are applied to a sample spectrum, and the resulting preprocessed data are combined into one matrix.
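Stacking several preprocessed versions of one spectrum into a matrix can be sketched as below. This is a minimal illustration under stated assumptions: the source does not say how the derivatives are computed, so simple finite differences are used here (central differences inside, one-sided differences at the ends) to keep every row at length p. All function names are illustrative.

```python
def min_max_normalize(x):
    """Scale a spectrum linearly to the range [0, 1]."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0 for _ in x]
    return [(v - lo) / (hi - lo) for v in x]


def first_derivative(x):
    """Finite-difference derivative with the same length as the input."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two wavelength points")
    d = [0.0] * n
    d[0] = x[1] - x[0]          # one-sided difference at the left end
    d[-1] = x[-1] - x[-2]       # one-sided difference at the right end
    for j in range(1, n - 1):
        d[j] = (x[j + 1] - x[j - 1]) / 2.0  # central difference
    return d


def second_derivative(x):
    """Second derivative obtained by differentiating twice."""
    return first_derivative(first_derivative(x))


def preprocess(x):
    """Return a 3-row matrix: max-min normalized, first derivative, second derivative."""
    return [min_max_normalize(x), first_derivative(x), second_derivative(x)]


# Illustrative spectrum with five wavelength points.
X_val_i = preprocess([0.0, 1.0, 4.0, 9.0, 16.0])
```

Each spectrum of p points thus becomes a 3 x p matrix, which is the shape the later projection and two-dimensional convolution steps operate on.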
Step S4, performing random dimensionality-reduction projection on the spectrum matrix X_val of the correction set based on a Gaussian random projection algorithm to obtain a dimensionality-reduced spectrum matrix X_valRed.
Each sample matrix is projected onto a low-dimensional matrix by Gaussian random projection.
Further, step S4 further includes:
step S41, obtaining a Gaussian random projection transition matrix P from the average spectrum matrix X_avg;
step S42, performing random dimensionality-reduction projection on the spectrum matrix X_val based on the Gaussian random projection transition matrix P to obtain the dimensionality-reduced spectrum matrix X_valRed.
More specifically, the Gaussian random projection transition matrix P of step S41 is obtained as follows:
random dimensionality-reduction projection is performed on the average spectrum matrix X_avg with p wavelength points to obtain a dimensionality-reduced average spectrum matrix X_avgRed with q wavelength points, where X_avgRed satisfies formula (2):
X_avgRed = P * X_avg (2)
where P is the Gaussian random projection transition matrix (its entries follow a Gaussian distribution) obtained by reducing the dimension of the average spectrum matrix, and P can be solved from formula (2).
The average spectrum matrix X_avg and the dimensionality-reduced average spectrum matrix X_avgRed satisfy inequality (3):
(1-eps)||X_avg-X_avgRed||^2 < ||X_avg-X_avgRed||^2 < (1+eps)||X_avg-X_avgRed||^2 (3)
The p wavelength points before dimensionality reduction and the q wavelength points after dimensionality reduction satisfy inequality (4):
Figure BDA0003593597340000081
where eps is the dimensionality reduction error.
In this embodiment, eps takes the default value of 0.1.
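Inequality (4) is an image that is not reproduced in this text, so the following is only a sketch under an assumption: it uses the widely cited Johnson-Lindenstrauss bound q >= 4 ln(n) / (eps^2/2 - eps^3/3), where n is the number of points whose pairwise distances are to be preserved to within a factor of (1 ± eps). The function name `jl_min_dim` is illustrative. Whether the source uses exactly this bound, and which quantity plays the role of n, is not stated here.

```python
import math


def jl_min_dim(n_points, eps=0.1):
    """Smallest integer target dimension suggested by the assumed JL bound (rounded down)."""
    if n_points < 1:
        raise ValueError("n_points must be at least 1")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    denominator = eps ** 2 / 2.0 - eps ** 3 / 3.0
    return int(4.0 * math.log(n_points) / denominator)


# With 3 points (e.g. the 3 rows of a preprocessed spectrum matrix) and eps = 0.1.
q = jl_min_dim(3, eps=0.1)
```

Under this assumption, 3 points and eps = 0.1 give q = 941, which coincides with the 941 wavelength points reported in the aviation kerosene example later in this description. That agreement is suggestive only; it does not confirm that this is the bound behind inequality (4).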
Random dimensionality reduction is performed on each spectrum according to formula (5):
X_valRed,i = P * X_val,i (5)
where X_valRed,i is the i-th element of the dimensionality-reduced spectrum matrix X_valRed, and X_val,i is the i-th element of the spectrum matrix X_val to be reduced.
Formula (5) reduces the dimension from p wavelength points to q wavelength points while largely preserving the characteristics of the data.
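A Gaussian random projection of the kind used in formula (5) can be sketched as follows. The assumptions are labeled here and in the comments: P is taken to be a q x p matrix with independent N(0, 1/q) entries (a common construction), it is drawn once and reused for every spectrum, and it is applied to each length-p row of the 3-row sample matrix. The source obtains P via the average spectrum matrix; this sketch samples P directly, and the sizes are deliberately smaller than in the worked example.

```python
import random


def gaussian_projection_matrix(q, p, seed=0):
    """q x p matrix with independent entries drawn from N(0, 1/q) (assumed construction)."""
    rng = random.Random(seed)
    sigma = (1.0 / q) ** 0.5
    return [[rng.gauss(0.0, sigma) for _ in range(p)] for _ in range(q)]


def project_rows(P, X):
    """Apply P to every row of X, so each length-p row becomes a length-q row."""
    return [
        [sum(w * v for w, v in zip(p_row, x_row)) for p_row in P]
        for x_row in X
    ]


# Illustrative sizes only; the worked example in the text has p = 2074.
p, q = 300, 100
data_rng = random.Random(1)
X_val_i = [[data_rng.random() for _ in range(p)] for _ in range(3)]  # 3 preprocessed rows
P = gaussian_projection_matrix(q, p, seed=42)   # drawn once, reused for all spectra
X_valRed_i = project_rows(P, X_val_i)           # shape 3 x q
```

The 1/q variance makes squared distances between rows approximately unchanged in expectation, which is the property the eps inequality expresses.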
Step S5, establishing an artificial neural network prediction model based on the dimensionality-reduced spectrum matrix X_valRed.
The artificial neural network prediction model is established using the correction set. The artificial neural network model includes, but is not limited to, a multilayer perceptron prediction model, a back-propagation neural network prediction model, a convolutional neural network prediction model, and the like.
A multilayer perceptron (MLP) is a feedforward artificial neural network model that maps multiple input data sets onto a single output data set.
A convolutional neural network has a strong feature-extraction capability and can realize any nonlinear mapping from input to output, which overcomes the inability of partial least squares to capture nonlinear relationships.
In this embodiment, a two-dimensional convolutional neural network is used to build the analysis model and obtain the predicted value for quantitative analysis.
Thus, step S5 further includes:
step S51, feeding the dimensionality-reduced spectrum matrix X_valRed,i into the input layer of the two-dimensional convolution, passing it through two convolution layers, weights and activation functions, and a pooling layer, and, after several rounds of convolution and pooling, passing the result to the output layer;
step S52, comparing the predicted value obtained at the output layer with the expected value of the sample, and, if there is an error between them, returning to step S51 to adjust the weights, repeating the adjustment until the difference between the predicted value and the expected value reaches a first threshold.
The first threshold is a preset very small value or a minimum value.
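The forward pass in step S51 can be illustrated with a deliberately small sketch. It is not the network of the source: it has a single convolution layer instead of two, uses ReLU as an assumed activation, pools over pairs of columns, and omits the weight adjustment of step S52 (back-propagation) entirely. All names, the kernel, and the weights are hypothetical.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """2-D 'valid' convolution (cross-correlation form) of a matrix with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            bias + sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Element-wise ReLU activation (an assumed choice of activation function)."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool_1x2(fmap):
    """Max pooling over non-overlapping column pairs; a trailing odd column is dropped."""
    return [[max(row[c], row[c + 1]) for c in range(0, len(row) - 1, 2)] for row in fmap]


def dense(fmap, weights, bias=0.0):
    """Flatten the feature map and map it to a single predicted property value."""
    flat = [v for row in fmap for v in row]
    if len(flat) != len(weights):
        raise ValueError("weight count must match flattened feature map size")
    return bias + sum(w * v for w, v in zip(weights, flat))


def forward(x, kernel, weights):
    """Input layer -> convolution -> activation -> pooling -> output layer."""
    return dense(max_pool_1x2(relu(conv2d_valid(x, kernel))), weights)


# Hypothetical 3-row input with 4 reduced wavelength points, kernel, and output weights.
x = [[1.0] * 4 for _ in range(3)]
kernel = [[1.0, 0.0], [0.0, 1.0]]
weights = [0.5, 0.5]
y_pred = forward(x, kernel, weights)
```

In a full implementation, step S52 would compare `y_pred` with the measured property value and update `kernel` and `weights` by gradient descent until the error falls to the first threshold.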
Step S6, checking the artificial neural network prediction model established in step S5 with the verification set.
The artificial neural network prediction model established in step S5 is checked with the verification set, and the prediction standard deviation RMSEP is calculated according to formula (6):
Figure BDA0003593597340000091
where m is the number of spectra in the verification set, y_i,actual1 is the measured value of the i-th spectrum in the verification set, and y_i,predicted1 is the predicted value of the i-th spectrum in the verification set.
The artificial neural network prediction model established in step S5 is cross-validated with the correction set, and the cross-validation standard deviation RMSECV is calculated according to formula (7):
Figure BDA0003593597340000092
where n is the number of spectra in the correction set, y_i,actual2 is the measured value of the i-th spectrum in the correction set, and y_i,predicted2 is the predicted value of the i-th spectrum in the correction set.
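Formulas (6) and (7) are images that are not reproduced in this text. The sketch below assumes the conventional root-mean-square definition, in which the squared differences between measured and predicted values are averaged over the number of spectra before taking the square root. Some chemometrics texts divide by the number of spectra minus one instead, so the exact denominator in the source is an open point. The function name and the density values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between measured and predicted property values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical density values for three verification-set spectra.
y_actual = [0.780, 0.790, 0.800]
y_predicted = [0.781, 0.788, 0.803]
rmsep = rmse(y_actual, y_predicted)
```

The same function gives RMSECV when it is applied to the measured values of the correction set and the predictions collected across the cross-validation folds.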
The model is checked against the measured values of the verification-set spectra to judge whether it meets the prediction accuracy requirement. If it does not, the process returns to step S1 and modeling restarts; if it does, the process proceeds to step S7 and the model is applied.
Step S7, performing quantitative analysis on an input near-infrared spectrum based on the artificial neural network prediction model checked in step S6, and outputting the corresponding predicted physicochemical property value.
The model checked in step S6 meets the prediction accuracy requirement and can be used for quantitative analysis of actual near-infrared spectra. The near-infrared spectrum data to be analyzed are fed into the model, which outputs the corresponding predicted physicochemical property value.
While, for purposes of simplicity of explanation, the methodologies are shown and described as a series of acts, it is to be understood and appreciated that the methodologies are not limited by the order of acts, as some acts may, in accordance with one or more embodiments, occur in different orders and/or concurrently with other acts from that shown and described herein or not shown and described herein, as would be understood by one skilled in the art.
The near-infrared spectrum quantitative analysis dimensionality reduction method based on the random projection algorithm reduces dimensionality by Gaussian random projection without wavelength selection. It thereby avoids the extensive manual, experience-based intervention and the large time cost that wavelength selection requires in the prior art, turning a complex procedure into a simple one.
Meanwhile, the difficulty that the traditional method cannot fit nonlinearity is overcome by means of the excellent nonlinearity fitting capability of the neural network model and the strong feature extraction capability of the two-dimensional convolution neural network.
Therefore, the near infrared spectrum quantitative analysis dimensionality reduction method based on the stochastic projection algorithm can greatly reduce modeling time and establish a rapid and accurate near infrared quantitative analysis model while ensuring modeling precision.
Taking aviation kerosene near infrared spectra and the corresponding laboratory analysis report data as the experimental objects, the near infrared spectrum quantitative analysis dimension reduction method based on the stochastic projection algorithm provided by the invention is described in detail below.
Step S1, acquiring a batch of near infrared spectra and their corresponding physicochemical property values.
The specific process is as follows:
obtaining the spectra of 52 samples with a near-infrared spectrometer, wherein the number p of wavelength points per spectrum is 2074 and each sample spectrum corresponds to a physicochemical property value from the laboratory analysis report; the original sample spectra are shown in fig. 2.
Step S2, selecting 36 spectra with strong spectral representativeness from the sample set as a correction set by adopting a K-S algorithm based on Euclidean distance, and calculating an average spectrum x_avg.
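The selection in step S2 can be sketched as follows. This is a minimal illustration that assumes "K-S" denotes the Kennard-Stone algorithm in its usual form (start from the two most distant spectra, then repeatedly add the spectrum whose nearest already-selected neighbour is farthest away). The function names and the toy data are hypothetical and are not taken from the patent.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two spectra of equal length."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def kennard_stone(spectra, m):
    """Return the indices of m spectra chosen by the Kennard-Stone rule."""
    n = len(spectra)
    if not 2 <= m <= n:
        raise ValueError("m must be between 2 and the number of spectra")
    dist = [[euclidean(spectra[i], spectra[j]) for j in range(n)] for i in range(n)]
    # Start with the pair of spectra that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < m:
        # Add the candidate whose nearest selected neighbour is farthest away.
        best = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


def average_spectrum(spectra):
    """Point-wise mean over a list of spectra (x_avg in the text)."""
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]


# Toy data (hypothetical): 5 "spectra" with 3 wavelength points each.
toy = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [10.0, 0.0, 0.0],
    [5.0, 0.0, 0.0],
    [5.5, 0.0, 0.0],
]
correction_idx = kennard_stone(toy, 3)
verification_idx = [i for i in range(len(toy)) if i not in correction_idx]
x_avg = average_spectrum([toy[i] for i in correction_idx])
```

In the embodiment the same call would be made with the 52 sample spectra and m = 36, leaving the other 16 samples for verification.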
Step S3, applying three kinds of preprocessing to the near infrared spectrum simultaneously to obtain a spectrum matrix.
The specific process is as follows:
the near infrared spectrum is converted to a row matrix, and maximum-minimum normalization, first derivative and second derivative preprocessing are applied to it simultaneously to form a spectral matrix of 3 rows.
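A sketch of how one spectrum could be turned into the 3-row matrix is given below. The patent does not state how the derivatives are computed, so simple finite differences are assumed here (a Savitzky-Golay filter would be another common choice), and the row order is also an assumption. All function names are hypothetical.

```python
def min_max_normalize(spectrum):
    """Scale a spectrum linearly so that its values lie in [0, 1]."""
    lo, hi = min(spectrum), max(spectrum)
    if hi == lo:
        return [0.0 for _ in spectrum]
    return [(v - lo) / (hi - lo) for v in spectrum]


def first_derivative(spectrum):
    """Finite-difference derivative with the same length as the input.

    Central differences are used inside the spectrum and one-sided
    differences at the two ends (an assumption, not the patent's method).
    """
    n = len(spectrum)
    if n < 2:
        raise ValueError("a spectrum needs at least two points")
    out = []
    for i in range(n):
        if i == 0:
            out.append(spectrum[1] - spectrum[0])
        elif i == n - 1:
            out.append(spectrum[-1] - spectrum[-2])
        else:
            out.append((spectrum[i + 1] - spectrum[i - 1]) / 2.0)
    return out


def second_derivative(spectrum):
    """Second derivative obtained by differentiating twice."""
    return first_derivative(first_derivative(spectrum))


def preprocess(spectrum):
    """Stack the three preprocessed versions into a 3 x p matrix."""
    return [
        min_max_normalize(spectrum),
        first_derivative(spectrum),
        second_derivative(spectrum),
    ]


# Toy spectrum with 4 wavelength points (hypothetical values).
X = preprocess([1.0, 2.0, 4.0, 7.0])
```

Each row keeps the original number of wavelength points, so a spectrum with p = 2074 points yields a 3 x 2074 matrix.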
Step S4, performing Gaussian random projection on the spectral matrix X_val to obtain the dimensionality-reduced spectral matrix X_valRed of each sample.
The specific process is as follows:
the average spectrum matrix X_avg is calculated from the spectral matrix X_val, the Gaussian distribution transition matrix P is obtained through Gaussian random projection, and the dimensionality-reduced spectral matrix X_valRed is obtained, i.e. a spectral matrix of 3 rows and 941 wavelength points after dimensionality reduction.
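The projection step can be illustrated with the sketch below. It assumes the commonly used Johnson-Lindenstrauss bound q >= 4 ln(n) / (eps^2/2 - eps^3/3) for the target dimension and a transition matrix P whose entries are drawn from N(0, 1/q). The patent's own bound is given only as a formula image, so both choices are assumptions, and the function names are hypothetical. With n = 3 rows and eps = 0.1 (again assumptions about the configuration), the bound evaluates to 941, which agrees with the 941 wavelength points reported above.

```python
import math
import random


def jl_min_dim(n_samples, eps):
    """Johnson-Lindenstrauss lower bound on the target dimension q."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    denominator = eps ** 2 / 2.0 - eps ** 3 / 3.0
    return int(4.0 * math.log(n_samples) / denominator)


def gaussian_projection_matrix(p, q, seed=0):
    """Build a q x p transition matrix P with entries drawn from N(0, 1/q)."""
    rng = random.Random(seed)
    sigma = 1.0 / math.sqrt(q)
    return [[rng.gauss(0.0, sigma) for _ in range(p)] for _ in range(q)]


def project(X, P):
    """Map every row of X (length p) to length q by multiplying with P."""
    return [
        [sum(w * v for w, v in zip(p_row, x_row)) for p_row in P]
        for x_row in X
    ]


# Target dimension for a 3-row matrix with eps = 0.1 (assumed settings).
q = jl_min_dim(n_samples=3, eps=0.1)

# Small toy projection (hypothetical sizes) to keep the example fast:
# a 3 x 8 matrix is reduced to 3 x 4.
X_toy = [[float(i + j) for j in range(8)] for i in range(3)]
P_toy = gaussian_projection_matrix(p=8, q=4, seed=42)
X_toy_red = project(X_toy, P_toy)
```

Reusing one fixed P for every sample is what keeps the reduced matrices of the correction and verification sets comparable, which is presumably why P is derived once from the average spectrum matrix.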
Step S5, establishing a two-dimensional convolution prediction model;
the specific process is as follows:
the dimensionality-reduced spectral matrix sample X_valRed,i is fed into the input layer of the two-dimensional convolution, and the physicochemical property (density) is fitted through the convolution, pooling and output layers, generating a two-dimensional convolution prediction model referred to as the "method model".
For comparison, the spectral matrix X_val without dimensionality reduction is fed into the input layer of the two-dimensional convolution, and the density is fitted through the convolution, pooling and output layers, generating a two-dimensional convolution prediction model referred to as the "non-dimensionality-reduction model".
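To make the data flow through the convolution, pooling and output layers concrete, the following sketch runs a single forward pass on a toy 3-row input. The patent does not disclose kernel sizes, the activation function, the pooling scheme or the number of layers, so the 2 x 2 kernel, ReLU activation, width-wise max pooling and linear output used here are purely hypothetical, and training (weight adjustment) is omitted.

```python
def conv2d_valid(X, K):
    """2-D cross-correlation of X with kernel K, stride 1, no padding."""
    kh, kw = len(K), len(K[0])
    out_h, out_w = len(X) - kh + 1, len(X[0]) - kw + 1
    return [
        [
            sum(K[a][b] * X[i + a][j + b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(X):
    """Element-wise rectified linear activation."""
    return [[max(0.0, v) for v in row] for row in X]


def max_pool_width(X, size):
    """Max pooling along the wavelength axis with non-overlapping windows."""
    return [
        [max(row[j:j + size]) for j in range(0, len(row) - size + 1, size)]
        for row in X
    ]


def predict(X, K, weights, bias):
    """convolution -> ReLU -> pooling -> flatten -> linear output (one scalar)."""
    features = max_pool_width(relu(conv2d_valid(X, K)), size=2)
    flat = [v for row in features for v in row]
    return sum(w * v for w, v in zip(weights, flat)) + bias


# Toy 3 x 6 input standing in for a 3-row spectral matrix (hypothetical values).
X_in = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
    [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
]
kernel = [[1.0, 0.0], [0.0, -1.0]]
feature_map = conv2d_valid(X_in, kernel)
y_hat = predict(X_in, kernel, weights=[0.5, 0.25, 1.0, 1.0], bias=1.0)
```

With these hypothetical settings, a 3 x 941 input would give a 2 x 940 feature map after the convolution and a 2 x 470 map after pooling, compared with 2 x 2073 and 2 x 1036 for the unreduced 3 x 2074 input, which illustrates where the reduction in computation comes from.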
Step S6, verifying the established model;
the specific process is as follows:
the 16 samples of the verification set are fed into the "method model" and the "non-dimensionality-reduction model" respectively, and the density of each of the 16 samples is predicted.
The measured and predicted values for the verification set under the "method model" and the "non-dimensionality-reduction model" are shown in table 1, and the training time required to generate each model and the model evaluation results are shown in table 2.
Table 1 Measured and predicted density values of the verification set samples for the two models (in kg/m³)
Figure BDA0003593597340000111
Figure BDA0003593597340000121
TABLE 2 Comparison of model evaluation data for the two models

| Method | Training time (s) | Rmsep (kg/m³) | Rmsecv (kg/m³) |
|---|---|---|---|
| Method model | 65 | 2.90 | 2.38 |
| Non-dimensionality-reduction model | 446 | 2.80 | 2.22 |
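As can be seen from tables 1 and 2, the method model saves about 85% of the training time compared with the non-dimensionality-reduction model (65 s versus 446 s), while the prediction deviation increases by no more than 10%: Rmsep rises by about 3.6% (2.90 versus 2.80 kg/m³) and Rmsecv by about 7.2% (2.38 versus 2.22 kg/m³).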
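The comparison drawn from Table 2 can be reproduced with a few lines of arithmetic. The sketch below also includes a root-mean-square error helper. Because the patent gives the Rmsep and Rmsecv expressions only as formula images, the plain form with the number of samples in the denominator is an assumption, and the function name is hypothetical.

```python
import math


def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - b) ** 2 for a, b in zip(measured, predicted))
    return math.sqrt(squared / len(measured))


# Values copied from Table 2.
train_time = {"method": 65.0, "no_reduction": 446.0}   # seconds
rmsep = {"method": 2.90, "no_reduction": 2.80}         # kg/m^3
rmsecv = {"method": 2.38, "no_reduction": 2.22}        # kg/m^3

# Relative training-time saving and relative error increase of the method model.
time_saving = 1.0 - train_time["method"] / train_time["no_reduction"]
rmsep_increase = rmsep["method"] / rmsep["no_reduction"] - 1.0
rmsecv_increase = rmsecv["method"] / rmsecv["no_reduction"] - 1.0

print(f"training time saved: {time_saving:.1%}")
print(f"Rmsep increase:      {rmsep_increase:.1%}")
print(f"Rmsecv increase:     {rmsecv_increase:.1%}")
```

The `rmse` helper would be applied to the 16 measured and predicted densities of Table 1 to obtain Rmsep for each model.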
FIG. 3 is a block diagram of a near infrared spectroscopy quantitative analysis dimensionality reduction system based on a stochastic projection algorithm according to an embodiment of the invention. The near infrared spectroscopy quantitative analysis dimension reduction system based on the stochastic projection algorithm may include an internal communication bus 301, a processor (processor)302, a Read Only Memory (ROM)303, a Random Access Memory (RAM)304, a communication port 305, and a hard disk 307. The internal communication bus 301 can realize data communication among the components of the near infrared spectrum quantitative analysis dimensionality reduction system based on a stochastic projection algorithm. Processor 302 may make the determination and issue a prompt. In some embodiments, processor 302 may be comprised of one or more processors.
The communication port 305 can realize data transmission and communication between the near infrared spectrum quantitative analysis dimensionality reduction system based on the stochastic projection algorithm and an external input/output device. In some embodiments, a stochastic projection algorithm based near infrared spectroscopy dimension reduction system may send and receive information and data from the network through the communication port 305. In some embodiments, the stochastic projection algorithm based near infrared spectroscopy quantitative analysis dimension reduction system may transmit and communicate data between the external input/output device and the input/output terminal 306 in a wired manner.
The stochastic projection algorithm based near infrared spectroscopy dimension reduction system may also include various forms of program storage units and data storage units, such as a hard disk 307, Read Only Memory (ROM)303 and Random Access Memory (RAM)304, capable of storing various data files for computer processing and/or communication use, and possibly program instructions for execution by the processor 302. The processor 302 executes these instructions to implement the main parts of the method. The results of the processing by the processor 302 are communicated to an external output device via the communication port 305 for display on a user interface of the output device.
For example, the implementation process file of the above-mentioned near infrared spectroscopy quantitative analysis dimension reduction method based on the stochastic projection algorithm may be a computer program that is stored in the hard disk 307 and loaded into the processor 302 for execution, so as to implement the method of the present application.
When the implementation process file of the near infrared spectrum quantitative analysis dimensionality reduction method based on the stochastic projection algorithm is a computer program, the implementation process file can also be stored in a computer readable storage medium to be used as a product. For example, computer-readable storage media can include but are not limited to magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips), optical disks (e.g., Compact Disk (CD), Digital Versatile Disk (DVD)), smart cards, and flash memory devices (e.g., Erasable Programmable Read Only Memory (EPROM), card, stick, key drive). In addition, various storage media described herein can represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" can include, without being limited to, wireless channels and various other media (and/or storage media) capable of storing, containing, and/or carrying code and/or instructions and/or data.
Compared with the prior art, the invention provides a near infrared spectrum quantitative analysis dimensionality reduction method and system based on a random projection algorithm, and the method and system have the following beneficial effects:
1) complex wavelength selection is not needed, the reliability and accuracy of the model are ensured, the requirements on technical personnel and the modeling complexity are reduced, and the improvement of near-infrared quantitative modeling methods is promoted;
2) the dimension reduction is realized by using a Gaussian random projection method, so that the loss of information is avoided and the spatial dimension is reduced while sufficient spectral information is extracted, thereby reducing the processing data volume of modeling;
3) meanwhile, a plurality of typical preprocessing methods are used, so that the spectral noise is effectively reduced, and a solid foundation is laid for subsequent modeling.
As used in this application and the appended claims, the terms "a," "an," and/or "the" do not refer exclusively to the singular and may also include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps or elements are included; these steps and elements do not constitute an exclusive list, and the method or apparatus may comprise further steps or elements.
Those of skill in the art would understand that information, signals, and data may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits (bits), symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The various illustrative logical modules and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one or more exemplary embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software as a computer program product, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a web site, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The embodiments described above are provided to enable persons skilled in the art to make or use the invention. Persons skilled in the art may make modifications or variations to the embodiments described above without departing from the inventive concept of the present invention, so the scope of protection of the present invention is not limited by the embodiments described above but should be accorded the widest scope consistent with the innovative features set forth in the claims.

Claims (14)

1. A near infrared spectrum quantitative analysis dimensionality reduction method based on a stochastic projection algorithm is characterized by comprising the following steps:
step S1, acquiring near infrared spectra x_val and corresponding physicochemical property values y_val as a sample set;
step S2, dividing the sample set into a correction set and a verification set, and calculating the average spectrum x_avg from the near infrared spectra x_val of the correction set;
step S3, preprocessing the near infrared spectra x_val of the correction set to obtain a spectrum matrix X_val, and preprocessing the average spectrum x_avg of the correction set to obtain an average spectrum matrix X_avg;
step S4, based on the Gaussian random projection algorithm, performing random dimensionality reduction projection on the spectrum matrix X_val of the correction set to obtain a dimensionality-reduced spectrum matrix X_valRed;
step S5, establishing an artificial neural network prediction model based on the dimensionality-reduced spectrum matrix X_valRed;
step S6, checking the artificial neural network prediction model established in step S5 by using the verification set;
step S7, performing quantitative analysis on an input near infrared spectrum based on the artificial neural network prediction model checked in step S6, and outputting the corresponding predicted physicochemical property value.
2. The method for quantitative analysis and dimension reduction of near infrared spectrum based on stochastic projection algorithm as claimed in claim 1, wherein in step S2, the expression for the average spectrum x_avg is:
Figure FDA0003593597330000011
wherein n is the number of near infrared spectra and x_val,i is the i-th spectrum.
3. The method for quantitative analysis and dimension reduction of near infrared spectroscopy based on stochastic projection algorithm of claim 1, wherein the step S2 is to divide the sample set into a correction set and a verification set, further comprising:
and selecting m spectra from the sample set as a correction set and using the rest samples as a verification set by adopting a K-S algorithm based on Euclidean distance or an SPXY algorithm based on property variables.
4. The method for quantitative analysis and dimension reduction of near infrared spectrum based on stochastic projection algorithm according to claim 1, wherein the preprocessing in step S3 comprises: first derivative, second derivative and maximum-minimum normalization.
5. The method for quantitative analysis and dimension reduction in near infrared spectroscopy based on stochastic projection algorithm according to claim 1, wherein the step S3 further comprises:
performing multiple kinds of preprocessing simultaneously on the near infrared spectrum x_val to obtain the spectrum matrix X_val;
performing multiple kinds of preprocessing simultaneously on the average spectrum x_avg to obtain the average spectrum matrix X_avg.
6. The method for quantitative analysis and dimension reduction in near infrared spectroscopy based on stochastic projection algorithm according to claim 1, wherein the step S4 further comprises:
step S41, obtaining a Gaussian random projection transition matrix P according to the average spectrum matrix X_avg;
step S42, based on the Gaussian random projection transition matrix P, performing random dimensionality reduction projection on the spectrum matrix X_val to obtain the dimensionality-reduced spectrum matrix X_valRed.
7. The method for quantitative analysis and dimension reduction in near infrared spectroscopy based on stochastic projection algorithm according to claim 6, wherein the step S41 further comprises:
performing random dimensionality reduction projection on the average spectrum matrix X_avg of p wavelength points to obtain a dimensionality-reduced average spectrum matrix X_avgRed of q wavelength points;
solving for the Gaussian random projection transition matrix P according to the expression X_avgRed = P*X_avg.
8. The stochastic projection algorithm-based near infrared spectroscopic quantitative analysis dimension reduction method of claim 7, wherein the average spectrum matrix X_avg and the dimensionality-reduced average spectrum matrix X_avgRed satisfy the following inequality:
(1-eps)||X_avg-X_avgRed||^2 < ||X_avg-X_avgRed||^2 < (1+eps)||X_avg-X_avgRed||^2
the p wavelength points and the q wavelength points after dimensionality reduction satisfy the following inequality:
Figure FDA0003593597330000021
wherein eps is the dimension reduction error.
9. The method for quantitative analysis and dimension reduction in near infrared spectroscopy based on stochastic projection algorithm according to claim 1, wherein the artificial neural network prediction model of step S5 further comprises:
a multi-layer perceptron prediction model, a back propagation neural network prediction model, and a convolutional neural network prediction model.
10. The stochastic projection algorithm-based near infrared spectroscopy quantitative analysis dimension reduction method according to claim 1, wherein the artificial neural network prediction model is a two-dimensional convolution prediction model;
the step S5 further includes:
step S51, introducing the spectrum matrix after dimensionality reduction into an input layer of two-dimensional convolution, calculating through two layers of convolution layers, a weight and activation function and a pooling layer, and transmitting the spectrum matrix to an output layer after calculating through the convolution layers and the pooling layer for multiple times;
and step S52, comparing the predicted value obtained by the output layer with the sample expected value, and if an error exists between the predicted value and the sample expected value, returning to step S51 to adjust the weight until the difference between the predicted value and the sample expected value reaches a first threshold value.
11. The method for quantitative analysis and dimension reduction in near infrared spectroscopy based on stochastic projection algorithm according to claim 1, wherein the step S6 further comprises:
checking the artificial neural network prediction model established in step S5 by using the verification set, and calculating the prediction standard deviation Rmsep, wherein the corresponding expression is:
Figure FDA0003593597330000031
where m is the number of spectra in the verification set, y_i,actual1 is the measured value of the i-th spectrum in the verification set, and y_i,predicted1 is the predicted value of the i-th spectrum in the verification set.
12. The method for quantitative analysis and dimension reduction in near infrared spectroscopy based on stochastic projection algorithm according to claim 1, wherein the step S6 further comprises:
performing cross validation on the artificial neural network prediction model established in step S5 by using the correction set, and calculating the cross validation standard deviation Rmsecv, wherein the corresponding expression is:
Figure FDA0003593597330000041
where n is the number of spectra in the correction set, y_i,actual2 is the measured value of the i-th spectrum in the correction set, and y_i,predicted2 is the predicted value of the i-th spectrum in the correction set.
13. A near infrared spectrum quantitative analysis dimensionality reduction system based on a stochastic projection algorithm comprises:
a memory for storing instructions executable by the processor;
a processor for executing the instructions to implement the method of any one of claims 1-12.
14. A computer readable medium having computer instructions stored thereon, wherein the computer instructions, when executed by a processor, perform the method of any of claims 1-12.
Priority Applications (1)

| Application Number | Publication | Priority Date | Filing Date | Status | Title |
|---|---|---|---|---|---|
| CN202210385752.9A | CN114676792A (en) | 2022-04-13 | 2022-04-13 | Pending | Near infrared spectrum quantitative analysis dimensionality reduction method and system based on stochastic projection algorithm |

Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN114676792A | 2022-06-28 |

Family

ID=82078531

Country Status (1)

| Country | Link |
|---|---|
| CN (1) | CN114676792A (en) |

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination