CN114942233A - Near infrared spectrum characteristic wavelength selection method, device, equipment and storage medium - Google Patents
- Publication number
- CN114942233A (application CN202210474552.0A)
- Authority
- CN
- China
- Prior art keywords
- wavelength
- near infrared
- infrared spectrum
- correlation
- points
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
- G01N21/25—Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
- G01N21/31—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
- G01N21/35—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
- G01N21/359—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light
Abstract
The invention belongs to the technical field of near infrared spectrum analysis and provides a near infrared spectrum characteristic wavelength selection method, device, equipment and storage medium. The method comprises the following steps: acquiring each wavelength point variable to be selected in the near infrared spectrum data; selecting, from the wavelength point variables and according to correlation measurement information between each wavelength point variable and a response variable, first wavelength points whose correlation with the response variable is greater than a first preset threshold; and selecting, from the first wavelength points and according to correlation measurement information among the first wavelength points, second wavelength points whose correlation with the other first wavelength points is less than a second preset threshold. Because the first wavelength points are strongly correlated with the response variable and the second wavelength points are only weakly correlated with the other first wavelength points, data redundancy and the correlation among the characteristic wavelength variables are reduced, multicollinearity among the variables is avoided, and the precision of a subsequently established model is improved.
Description
Technical Field
The invention belongs to the technical field of near infrared spectrum analysis, and particularly relates to a near infrared spectrum characteristic wavelength selection method, device, equipment and storage medium.
Background
Near infrared radiation is the electromagnetic radiation lying between visible light and the mid-infrared. Scanning the near infrared spectrum of a sample yields characteristic information about the hydrogen-containing groups of the organic molecules in the sample. Analyzing samples with near infrared spectroscopy is convenient, rapid, efficient, accurate and low in cost, does not damage the sample, consumes no chemical reagents and causes no environmental pollution, so the technology is increasingly widely used.
However, analyzing a sample with near infrared spectroscopy involves modeling high-dimensional data. Such data usually contain a large number of characteristic variables with a large amount of redundancy among them, so the characteristic wavelengths of the near infrared spectrum generally need to be selected before modeling.
A commonly used existing method for selecting near infrared spectrum characteristic wavelengths is the successive projections algorithm (SPA), but models built from characteristic wavelength variables selected by the successive projections algorithm have low accuracy.
Disclosure of Invention
The embodiment of the invention aims to provide a near infrared spectrum characteristic wavelength selection method, and aims to solve the problem that the accuracy of a model established by a selected characteristic wavelength variable is low in the existing near infrared spectrum characteristic wavelength selection method.
An embodiment of the invention is implemented as follows. The method comprises: acquiring each wavelength point variable to be selected in the near infrared spectrum data;
selecting a first wavelength point with the correlation with the response variable larger than a first preset threshold value from the wavelength point variables according to the correlation measurement information between the wavelength point variables and the response variable;
and selecting a second wavelength point from the first wavelength points, wherein the correlation between the second wavelength point and other first wavelength points is less than a second preset threshold value according to the correlation measurement information among the first wavelength points, so that the selection of the characteristic wavelength of the near infrared spectrum is completed.
Another object of an embodiment of the present invention is to provide a near infrared spectrum characteristic wavelength selection device, including:
the acquisition module is used for acquiring various wavelength point variables to be selected in the near infrared spectrum data;
the first selection module is used for selecting a first wavelength point of which the correlation with the response variable is greater than a first preset threshold value from the wavelength point variables according to the correlation measurement information between the wavelength point variables and the response variable;
and the second selection module is used for selecting second wavelength points, the correlations of which with other first wavelength points are smaller than a second preset threshold value, from the first wavelength points according to the correlation measurement information among the first wavelength points so as to complete the selection of the characteristic wavelengths of the near infrared spectrum.
It is a further object of embodiments of the present invention to provide a computer device comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to perform the steps of the above-mentioned near infrared spectrum characteristic wavelength selection method.
It is another object of the embodiments of the present invention to provide a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, causes the processor to execute the steps of the near infrared spectrum characteristic wavelength selection method.
In the near infrared spectrum characteristic wavelength selection method provided by the embodiment of the invention, after each wavelength point variable to be selected in the near infrared spectrum data is acquired, first wavelength points having a larger correlation with the response variable are selected from the wavelength point variables according to the correlation measurement information between each wavelength point variable and the response variable. Second wavelength points having a smaller correlation with the other first wavelength points are then selected from the first wavelength points according to the correlation measurement information among the first wavelength points, and these second wavelength points serve as the selected characteristic wavelength variables used for modeling. Because the selected second wavelength points are only weakly correlated with one another, the accuracy of the established model is ensured. Because only wavelength points strongly correlated with the response variable are retained, data redundancy is removed during selection, which reduces the amount of calculation in near infrared spectrum data analysis and improves its response speed.
Drawings
FIG. 1 is a flow chart of a method for selecting a characteristic wavelength of a near infrared spectrum according to an embodiment of the present invention;
FIG. 2 is a flowchart of acquiring wavelength point variables to be selected from near infrared spectral data according to an embodiment of the present invention;
FIG. 3 is a raw data near infrared spectrum of a corn target sample provided by an embodiment of the present invention;
FIG. 4 is a graph of a pre-processed NIR spectrum according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a selected first wavelength point according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a finally selected second wavelength point according to an embodiment of the present invention;
FIG. 7 shows a PLS analysis model Full-PLS constructed from pre-processed near infrared spectra data according to an embodiment of the present invention;
FIG. 8 shows a PLS analysis model CC-PLS constructed from data after selection of a first wavelength point according to an embodiment of the invention;
FIG. 9 is a TSCA-PLS analysis model established based on the finally selected characteristic wavelength data according to an embodiment of the present invention;
FIG. 10 shows a TSCA-MLR model established based on the characteristic wavelength variable data finally selected by the present solution and an SPA-MLR model established based on the characteristic wavelength variable data selected by the successive projections algorithm, according to an embodiment of the present invention;
FIG. 11 is a block diagram of a near infrared spectrum characteristic wavelength selection device according to an embodiment of the present invention;
FIG. 12 is a block diagram showing an internal configuration of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It will be understood that the terms "first," "second," and the like may be used herein to describe various elements, but unless otherwise specified these elements are not limited by the terms, which serve only to distinguish one element from another. For example, without departing from the scope of the present application, a first xx script may be referred to as a second xx script, and similarly a second xx script may be referred to as a first xx script.
As shown in fig. 1, in an embodiment, a near infrared spectrum characteristic wavelength selection method is provided, and the near infrared spectrum characteristic wavelength selection method may specifically include the following steps:
Step S202: acquire each wavelength point variable to be selected in the near infrared spectrum data.
In the embodiment of the present application, the manner of acquiring each wavelength point variable to be selected in the near infrared spectrum data is not limited. For example, the computer device applying the near infrared spectrum characteristic wavelength selection method may obtain the variables directly from another device or from a storage device; alternatively, as shown in FIG. 2, they may be obtained through the following steps:
Step S302: obtain original near infrared spectrum data of a target to be subjected to near infrared spectrum analysis.
In the embodiment of the present application, the target to be subjected to near infrared spectrum analysis is not limited; it may be, for example, corn or soil. This embodiment takes corn as the example: data for 80 corn samples are provided, with a wavelength range of 1100 to 2498 nm and a sampling interval of 2 nm, giving 700 wavelength points in total. FIG. 3 shows the near infrared spectra of the original data of the corn samples. When the target is corn, the moisture content of the corn can be used as the response variable for selecting the characteristic wavelengths (variables).
Step S304: preprocess the original near infrared spectrum data so that overlapping peaks in the original data are highlighted, obtaining preprocessed near infrared spectrum data.
In the embodiment of the present application, the main purposes of preprocessing the original near infrared spectrum data are to correct the spectral baseline, eliminate baseline translation, increase spectral resolution and improve the signal-to-noise ratio, so that overlapping peaks in the original data are highlighted. The specific preprocessing method is not limited in this embodiment; for example, the original spectrum data may be preprocessed with a Savitzky-Golay (S-G) first-derivative filter with a window size of 13. After preprocessing, the overlapping peaks in the original spectrum are highlighted and the baseline shift is reduced; the preprocessed near infrared spectrum data are shown in FIG. 4.
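As an illustration of the preprocessing described above, the following is a minimal NumPy sketch of a Savitzky-Golay first-derivative filter with a window of 13 points. The polynomial order (2), the function name `savgol_first_derivative`, and the choice to drop positions where the window does not fit are assumptions; the source specifies only the method and the window size. Dropping 6 points at each end is at least consistent with the counts given in this embodiment (700 - 12 = 688), although the source does not state how the edges are handled.

```python
import numpy as np

def savgol_first_derivative(spectra, window=13, polyorder=2, step=2.0):
    """Savitzky-Golay first derivative along the wavelength axis.

    spectra: array of shape (n_samples, n_wavelengths).
    polyorder and the edge handling are assumptions, not taken from the source.
    Only positions where the full window fits are returned, so the result has
    n_wavelengths - (window - 1) columns.
    """
    spectra = np.asarray(spectra, dtype=float)
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Design matrix of the local polynomial fit: columns are offsets**0, **1, ...
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 1 of the pseudo-inverse yields the fitted linear coefficient, i.e. the
    # slope at the window centre per index; divide by step for slope per nm.
    weights = np.linalg.pinv(design)[1] / step
    # Correlate every spectrum with the weights ("valid" positions only).
    return np.apply_along_axis(
        lambda row: np.convolve(row, weights[::-1], mode="valid"), 1, spectra
    )

# Usage on a synthetic, perfectly linear spectrum (not the corn data).
wavelengths = np.arange(1100, 2500, 2)           # 1100 ... 2498 nm, 700 points
demo = (3.0 * wavelengths + 1.0)[np.newaxis, :]  # slope of 3 per nm
print(savgol_first_derivative(demo).shape)
```

A library routine such as SciPy's `savgol_filter` could be used instead; the hand-written version is shown only to make the windowed least-squares step explicit.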
Step S306: acquire each wavelength point variable to be selected according to the preprocessed near infrared spectrum data and a preset sampling interval.
In the embodiment of the present application, the specific size of the preset sampling interval is not limited; for example, a sampling interval of 2 nm may be used. The near infrared spectrum data of the corn samples cover the wavelength range 1100 to 2498 nm; with a 2 nm sampling interval, the 700 wavelength points in the original data yield 688 wavelength points after preprocessing.
Step S204, according to the correlation measurement information between each wavelength point variable and the response variable, selecting a first wavelength point from each wavelength point variable, wherein the correlation between the first wavelength point and the response variable is greater than a first preset threshold value.
In the embodiment of the present application, the correlation measurement information between each wavelength point variable and the response variable refers to the correlation measurement information between 688 wavelength points (variables) obtained through pretreatment and the moisture content (response variable) of the corn. In an embodiment, step S204 may specifically include the following steps:
Step S402: calculate, according to a preset correlation metric formula, the correlation between each column vector of the near infrared spectrum matrix $X_{n\times p}$ and the response variable $Y_{n\times 1}$, and thereby determine the correlation metric $\rho_j$ ($j = 1, 2, \ldots, p$) between each wavelength point variable and the response variable, where $n$ is the number of samples of the target to be subjected to near infrared spectrum analysis and $p$ is the number of wavelength point variables corresponding to each sample.
In the embodiment of the present application, the preset correlation metric formula is not limited; for example, the correlation metric may be the absolute value of the correlation coefficient or the absolute value of the included angle cosine, but is not limited thereto. When the absolute value of the correlation coefficient is used, for two vectors $A$ and $B$ of length $N$ the correlation metric is calculated as:
$$\rho(A,B) = \left| \frac{1}{N-1} \sum_{i=1}^{N} \left( \frac{A_i - \mu_A}{\sigma_A} \right) \left( \frac{B_i - \mu_B}{\sigma_B} \right) \right|$$
where $A$ and $B$ are two vectors of length $N$, $\mu_A$ is the mean and $\sigma_A$ the standard deviation of vector $A$, and $\mu_B$ is the mean and $\sigma_B$ the standard deviation of vector $B$ (standard deviations taken with the $N-1$ normalization).
When the absolute value of the included angle cosine is used, for two vectors $A$ and $B$ of length $N$ the correlation metric is calculated as:
$$\rho(A,B) = \frac{\left| \sum_{i=1}^{N} A_i B_i \right|}{\sqrt{\sum_{i=1}^{N} A_i^2}\,\sqrt{\sum_{i=1}^{N} B_i^2}}$$
where $A$ and $B$ are two vectors of length $N$, $A_i$ is the $i$-th element of vector $A$, and $B_i$ is the $i$-th element of vector $B$.
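The two candidate metrics named above, the absolute value of the correlation coefficient and the absolute value of the included angle cosine, can be sketched in NumPy as follows. The function names are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption; the value of the correlation coefficient does not depend on that choice as long as the same normalization is used throughout.

```python
import numpy as np

def abs_correlation(a, b):
    """Absolute value of the Pearson correlation coefficient of two vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Standardise each vector with its mean and sample standard deviation.
    za = (a - a.mean()) / a.std(ddof=1)
    zb = (b - b.mean()) / b.std(ddof=1)
    return abs(np.sum(za * zb) / (a.size - 1))

def abs_cosine(a, b):
    """Absolute value of the cosine of the angle between two vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return abs(np.dot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b))

# Perfectly anti-correlated vectors still give a metric of 1,
# because only the strength of the relationship matters here.
print(abs_correlation([1, 2, 3, 4], [8, 6, 4, 2]))
print(abs_cosine([1, 2], [-2, -4]))
```

The two metrics differ in that the correlation coefficient centres the vectors first, so it ignores constant offsets, while the cosine does not.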
In the embodiment of the present application, the correlation between each column vector of the near infrared spectrum matrix $X_{n\times p}$ and the response variable $Y_{n\times 1}$ may be calculated with the formula corresponding to either the absolute value of the correlation coefficient or the absolute value of the included angle cosine. Taking one of these formulas as an example, each column vector of $X_{n\times p}$ and the response variable matrix $Y_{n\times 1}$ can both be regarded as vectors of length $n$; substituting them for the vectors $A$ and $B$ in the formula gives the correlation metric $\rho_j$ ($j = 1, 2, \ldots, p$) between each wavelength point variable and the response variable. Here $X_{n\times p}$ is the matrix formed by $n$ samples of the target to be subjected to near infrared spectrum analysis when each sample has $p$ wavelength point variables, and each component of $Y_{n\times 1}$ is the response variable of one target sample.
In this embodiment of the application, for example, with data for 80 corn samples, characteristic wavelength variables may be selected from the near infrared spectrum data of all 80 samples for modeling. Alternatively, only part of the 80 samples may be used for modeling, with the remaining samples used for verification or prediction. For example, when 48 corn samples are used for modeling, the near infrared spectrum matrix is $X_{48\times 688}$ and the response variable matrix formed by the 48 corn samples is $Y_{48\times 1}$.
Step S404: according to the first preset threshold $t_1$, determine the matrix $Z$ formed by the columns whose $\rho_j$ is greater than $t_1$, i.e. the near infrared spectrum matrix corresponding to the first wavelength points:
$$Z_{n\times m} = \{\, z_j \text{ is the } j\text{-th column of } X \mid \rho_j > t_1 \,\}, \quad m < p.$$
In the embodiment of the present application, the specific value of the first preset threshold is not limited; for example, $t_1$ may be set to 0.4. The correlation metric $\rho_j$ between each wavelength point variable and the response variable obtained in the previous step is then compared with $t_1$. In general, the closer $\rho_j$ is to 1, the stronger the correlation between the $j$-th column vector of the near infrared spectrum matrix $X$ and the response variable $Y$; conversely, the closer $\rho_j$ is to 0, the weaker the correlation. The columns whose $\rho_j$ is greater than $t_1$ form the matrix $Z$; that is, the column vectors of the near infrared spectrum matrix that are more strongly correlated with the response variable are selected. If $m$ values of $\rho_j$ exceed $t_1$, the wavelength points corresponding to those $m$ values are the first wavelength points, and the near infrared spectrum matrix corresponding to the selected first wavelength points is denoted $Z_{n\times m}$. FIG. 5 is a schematic diagram of the first wavelength points selected in this embodiment.
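The first selection stage (steps S402 and S404) can be sketched as below, assuming the absolute correlation coefficient as the metric and the example threshold of 0.4. The function name `select_first_wavelength_points` and the toy data are hypothetical, and the sketch assumes no column of `X` is constant, since a constant column would make the correlation undefined.

```python
import numpy as np

def select_first_wavelength_points(X, y, t1=0.4):
    """Keep the columns of X whose |correlation| with y exceeds t1.

    Returns the reduced matrix Z, the kept column indices, and rho for all columns.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    Xc = X - X.mean(axis=0)          # centre every wavelength column
    yc = y - y.mean()                # centre the response variable
    # Absolute Pearson correlation of every column with y, computed at once.
    rho = np.abs(Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    keep = np.flatnonzero(rho > t1)
    return X[:, keep], keep, rho

# Toy data: columns 0 and 2 follow y exactly, column 1 only weakly.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X = np.column_stack([2 * y + 1, [1, -1, 1, -1, 1, -1], -y])
Z, keep, rho = select_first_wavelength_points(X, y)
print(keep, Z.shape)
```

Computing all correlations in one matrix product avoids a Python loop over the wavelength columns, which matters little for 688 columns but keeps the sketch short.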
Step S206, selecting a second wavelength point from the first wavelength points, wherein the correlation between the second wavelength point and other first wavelength points is smaller than a second preset threshold value, according to the correlation measurement information among the first wavelength points, and completing the selection of the characteristic wavelength of the near infrared spectrum.
In an embodiment of the present application, step S206 may specifically include the following steps:
Step S502: calculate, according to the preset correlation metric formula, the correlation matrix $R_{m\times m}$ between the column vectors of the matrix $Z_{n\times m}$, where the elements of $R_{m\times m}$ satisfy $r_{if} = r_{fi}$ and $r_{ii} = 1$ ($i, f = 1, 2, \ldots, m$).
In the embodiment of the present application, the preset correlation metric formula is not limited; as in step S402, the absolute value of the correlation coefficient or the absolute value of the included angle cosine may be used, with the same formulas and parameter meanings as described for step S402, which are not repeated here.
In the embodiment of the present application, the matrix $Z_{n\times m}$ is the near infrared spectrum matrix corresponding to the first wavelength points selected in the preceding step, and the correlation matrix $R_{m\times m}$ is the matrix formed by the correlation metrics between the column vectors of $Z_{n\times m}$. For example, if the near infrared spectrum matrix corresponding to the selected first wavelength points is $Z_{48\times 519}$, the correlation matrix between its column vectors is $R_{519\times 519}$. Each pair of column vectors is covered twice when every column is correlated with all the others: the correlation between the 2nd and 7th column vectors is computed once when the 2nd column is compared with the other columns and again when the 7th column is compared with the other columns. Hence $r_{if} = r_{fi}$ ($i, f = 1, 2, \ldots, m$). A column vector is completely correlated with itself, for example the 2nd column with the 2nd column, so the corresponding correlation metric is 1 and $r_{ii} = 1$.
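A minimal sketch of the correlation matrix between the columns of Z, assuming the absolute correlation coefficient as the metric, is given below. The function name `correlation_matrix` and the random demonstration data are hypothetical.

```python
import numpy as np

def correlation_matrix(Z):
    """Absolute correlation between every pair of columns of Z (shape n x m)."""
    # rowvar=False tells NumPy that the variables are the columns of Z.
    R = np.abs(np.corrcoef(np.asarray(Z, dtype=float), rowvar=False))
    np.fill_diagonal(R, 1.0)   # r_ii = 1 exactly, free of rounding error
    return R

# Demonstration on random data standing in for the selected first wavelength points.
rng = np.random.default_rng(0)
Z_demo = rng.normal(size=(48, 4))
R_demo = correlation_matrix(Z_demo)
print(R_demo.shape, np.allclose(R_demo, R_demo.T))
```

Because the matrix is symmetric, only the upper triangle carries independent information, which is why each pair of columns effectively appears twice.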
Step S504: calculate the mean and the standard deviation of each column of the square matrix $R_{m\times m}$ to obtain a mean vector $\mu$ and a standard deviation vector $\sigma$, where the element $\mu_i$ of $\mu$ is the mean of the elements in the $i$-th column of $R_{m\times m}$ and the element $\sigma_i$ of $\sigma$ is the standard deviation of the elements in the $i$-th column.
In the embodiment of the present application, the mean vector $\mu$ is the vector formed by the column means of $R_{m\times m}$ and the standard deviation vector $\sigma$ is the vector formed by its column standard deviations. Both $\mu_i$ and $\sigma_i$ are computed over the data in the $i$-th column excluding the diagonal element.
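The column statistics of step S504 can be sketched as follows, leaving out the diagonal entry of each column as described above. The function name `offdiag_column_stats` is hypothetical, and the sample standard deviation (`ddof=1`) is an assumption, since the source does not say which normalization is used.

```python
import numpy as np

def offdiag_column_stats(R, ddof=1):
    """Mean and standard deviation of each column of R, ignoring the diagonal."""
    R = np.asarray(R, dtype=float)
    m = R.shape[0]
    off_diagonal = ~np.eye(m, dtype=bool)
    # Row i of R.T is column i of R; masking removes its diagonal entry,
    # leaving m - 1 values per column.
    cols = R.T[off_diagonal].reshape(m, m - 1)
    return cols.mean(axis=1), cols.std(axis=1, ddof=ddof)

# Small worked example with a 3 x 3 correlation matrix.
R_demo = np.array([[1.0, 0.2, 0.4],
                   [0.2, 1.0, 0.6],
                   [0.4, 0.6, 1.0]])
mu, sigma = offdiag_column_stats(R_demo)
print(mu)      # column means without the diagonal: 0.3, 0.4, 0.5
print(sigma)
```

Excluding the diagonal matters because every diagonal entry equals 1 and would otherwise pull each column mean upward by the same uninformative amount.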
Step S506: according to the second preset threshold, determine the matrix $S$ formed by the columns of $Z_{n\times m}$ whose values are smaller than the second preset threshold, i.e. the near infrared spectrum matrix corresponding to the second wavelength points:
$$S_{n\times k} = \{\, s_i \text{ is the } i\text{-th column of } Z_{n\times m} \mid \mu_i < t_\mu \text{ and } \sigma_i < t_\sigma,\ i = 1, 2, \ldots, m \,\}, \quad k < m,$$
where the second preset threshold comprises $t_\mu$ and $t_\sigma$.
In the embodiment of the present application, the second preset threshold comprises a threshold $t_\mu$ for the mean and a threshold $t_\sigma$ for the standard deviation. Their specific values are not limited in this embodiment and can be determined by experiment; for example, both may be set to 0.4. In general, the lower $t_\mu$ and $t_\sigma$ are, the lower the correlation among the selected characteristic variables, but values that are too low may miss wavelength points that play a key role in modeling. The mean $\mu_i$ and standard deviation $\sigma_i$ associated with each column of $Z_{n\times m}$ are then compared with $t_\mu$ and $t_\sigma$ respectively, and the columns for which both are smaller than their thresholds are selected, giving the near infrared spectrum matrix $S_{n\times k}$ corresponding to the second wavelength points of each target sample and completing the selection of the near infrared spectrum characteristic wavelengths.
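The second selection stage (steps S502 to S506) can be sketched in one self-contained function, under the same assumptions as before: the absolute correlation coefficient as the metric, the sample standard deviation, and the example thresholds of 0.4. The function name `select_second_wavelength_points` and the toy data are hypothetical.

```python
import numpy as np

def select_second_wavelength_points(Z, t_mu=0.4, t_sigma=0.4, ddof=1):
    """Keep the columns of Z that are weakly correlated with the other columns."""
    Z = np.asarray(Z, dtype=float)
    m = Z.shape[1]
    # Correlation matrix between the columns of Z (step S502).
    R = np.abs(np.corrcoef(Z, rowvar=False))
    # Per-column mean and standard deviation without the diagonal (step S504).
    cols = R.T[~np.eye(m, dtype=bool)].reshape(m, m - 1)
    mu = cols.mean(axis=1)
    sigma = cols.std(axis=1, ddof=ddof)
    # Both statistics must fall below their thresholds (step S506).
    keep = np.flatnonzero((mu < t_mu) & (sigma < t_sigma))
    return Z[:, keep], keep

# Toy data: three mutually collinear columns and one nearly independent column.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
b = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
Z_demo = np.column_stack([a, a, 2 * a + 1, b])
S, keep = select_second_wavelength_points(Z_demo)
print(keep, S.shape)
```

In the toy data the three collinear columns each have a high mean correlation with the rest and are rejected, while the last column is retained. Note that this rule can discard every member of a highly correlated group rather than keeping one representative, which is one reason the thresholds need to be tuned by experiment.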
In the embodiment of the present application, for example, when $k = 13$, 13 second wavelength points are selected; the 13 points shown in FIG. 6 are the finally selected characteristic wavelengths.
In the embodiment of the present application, an MLR model between $S_{48\times 13}$ and the concentration vector $y$ may also be established and its prediction performance evaluated. As shown in FIGS. 7 to 9, PLS analysis models are established for the preprocessed near infrared spectrum data of the corn samples, the data after selection of the first wavelength points, and the data after selection of the characteristic wavelengths, giving Full-PLS, CC-PLS and TSCA-PLS respectively. The optimal number of principal factors is 6 for Full-PLS, 7 for CC-PLS and 8 for TSCA-PLS. In addition, as shown in FIG. 10, two MLR models are established: TSCA-MLR (built from the characteristic wavelength variable data finally selected by the present scheme) and SPA-MLR (built from the characteristic wavelength variable data selected by the successive projections algorithm), for 5 models in total. The model parameters of the 5 models on the validation set and the test set (Table 1) are compared to test the validity of the method. The models are evaluated with the coefficient of determination ($R^2$) and the root mean square error (RMSE): the closer $R^2$ is to 1 and the closer RMSE is to 0, the better the model fits and the higher its prediction accuracy. As shown in Table 1, the TSCA-MLR model outperforms the full-spectrum Full-PLS model on both the validation set and the prediction set. Comparing $R^2$ and RMSE on the validation and prediction sets for the four wavelength-selection-based models (CC-PLS, TSCA-PLS, SPA-MLR, TSCA-MLR) shows that TSCA-MLR performs best on the prediction set with no obvious overfitting, whereas TSCA-PLS and SPA-MLR show slight overfitting.
For the TSCA-PLS and TSCA-MLR models, which use the same wavelength selection method, the TSCA-PLS model is clearly overfitted. This indicates that the wavelengths selected by the TSCA method have essentially eliminated the collinearity among the modeling variables, so the PLS method is no longer needed to remove collinearity among wavelengths.
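The evaluation described above rests on an MLR fit and two metrics, the coefficient of determination and the root mean square error. A minimal sketch follows, using ordinary least squares with an intercept; the function names and the toy data are hypothetical, and the values printed here are not the results of Table 1.

```python
import numpy as np

def fit_mlr(S, y):
    """Ordinary least-squares MLR with an intercept; returns [b0, b1, ..., bk]."""
    S = np.asarray(S, dtype=float)
    A = np.column_stack([np.ones(S.shape[0]), S])   # prepend the intercept column
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef

def predict_mlr(coef, S):
    """Predict the response for the rows of S with fitted coefficients."""
    return coef[0] + np.asarray(S, dtype=float) @ coef[1:]

def r2_rmse(y_true, y_pred):
    """Coefficient of determination and root mean square error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = np.sqrt(np.mean(resid ** 2))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return r2, rmse

# Toy example: a response that is exactly linear in one selected wavelength.
S_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = 2.0 * S_demo[:, 0] + 1.0
coef = fit_mlr(S_demo, y_demo)
print(r2_rmse(y_demo, predict_mlr(coef, S_demo)))
```

Overfitting of the kind discussed above would show up as a noticeably higher $R^2$ and lower RMSE on the data used for fitting than on held-out samples, so both metrics should be computed on each set separately.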
TABLE 1 parameters of models under different characteristic wavelength selection methods for maize datasets
In summary, the near infrared spectrum characteristic wavelength selection method provided by the embodiment of the application works in two stages. First wavelength points having a larger correlation with the response variable are selected from the acquired wavelength point variables, which removes data redundancy, reduces the amount of calculation in near infrared spectrum data analysis and improves its response speed. Second wavelength points having a smaller correlation with the other first wavelength points are then selected from the first wavelength points and used as the characteristic wavelength variables for modeling; because these points are only weakly correlated with one another, the accuracy of the established model is ensured.
As shown in FIG. 11, in one embodiment, a near infrared spectrum characteristic wavelength selection apparatus is provided. The apparatus may be integrated in a computer device and may specifically include an acquisition module 610, a first selection module 620, and a second selection module 630.
The acquisition module 610 is configured to acquire the wavelength point variables to be selected in the near infrared spectrum data;
the first selection module 620 is configured to select, according to correlation metric information between each wavelength point variable and a response variable, first wavelength points whose correlation with the response variable is greater than a first preset threshold from the wavelength point variables;
the second selection module 630 is configured to select, according to correlation metric information among the first wavelength points, second wavelength points whose correlation with the other first wavelength points is smaller than a second preset threshold from the first wavelength points, so as to complete the selection of the characteristic wavelengths of the near infrared spectrum.
In the embodiment of the present application, the acquisition module 610, the first selection module 620, and the second selection module 630 of the near infrared spectrum characteristic wavelength selection apparatus correspond one to one to steps S202, S204, and S206 of the near infrared spectrum characteristic wavelength selection method. For their functional implementation and further details, refer to the specific embodiments of the method, which are not repeated here.
By providing the first selection module and the second selection module, the near infrared spectrum characteristic wavelength selection apparatus provided by the embodiment of the present application first selects, through the first selection module, the first wavelength points having a larger correlation with the response variable from the wavelength point variables. This removes data redundancy, which reduces the amount of calculation in near infrared spectrum data analysis and improves its response speed. The second selection module then selects wavelength points having a smaller correlation with one another, which reduces multicollinearity among the variables and thus ensures the accuracy of the subsequently built model.
FIG. 12 is a diagram illustrating the internal structure of a computer device in one embodiment. As shown in FIG. 12, the computer device includes a processor, a memory, a network interface, an input device, and a display screen connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the near infrared spectrum characteristic wavelength selection method. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform the near infrared spectrum characteristic wavelength selection method. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device may be a touch layer covering the display screen; a key, a trackball, or a touch pad arranged on the housing of the computer device; or an external keyboard, touch pad, or mouse.
Those skilled in the art will appreciate that the architecture shown in FIG. 12 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply, as a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, the near infrared spectrum characteristic wavelength selection apparatus provided herein may be implemented in the form of a computer program that is executable on a computer device as shown in fig. 12. The memory of the computer device may store various program modules constituting the near infrared spectral characteristic wavelength selection apparatus, such as the acquisition module 610, the first selection module 620, and the second selection module 630 shown in fig. 11. The program modules constitute computer programs that cause the processor to perform the steps of the methods for near infrared spectral characteristic wavelength selection of the various embodiments of the present application described in the present specification.
For example, the computer device shown in fig. 12 may execute step S202 by the acquisition module 610 in the near infrared spectrum characteristic wavelength selection apparatus shown in fig. 11. The computer device may perform step S204 through the first selection module 620. The computer device may perform step S206 through the second selection module 630.
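The mapping between program modules and method steps described above can be sketched as a small composition of three callables. This is only an illustrative sketch: the class name `WavelengthSelectionDevice`, its field names, and the stub modules in the usage example are hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


@dataclass
class WavelengthSelectionDevice:
    """Chains three program modules, mirroring steps S202, S204 and S206."""

    # S202: returns the spectrum matrix X and the response vector y.
    acquisition_module: Callable[[], Tuple[Matrix, Sequence[float]]]
    # S204: returns the column indices of the first wavelength points.
    first_selection_module: Callable[[Matrix, Sequence[float]], List[int]]
    # S206: returns the column indices of the second wavelength points.
    second_selection_module: Callable[[Matrix, List[int]], List[int]]

    def run(self) -> List[int]:
        """Execute the three steps in order and return the selected indices."""
        X, y = self.acquisition_module()
        first = self.first_selection_module(X, y)
        return self.second_selection_module(X, first)


# Usage with stub modules (placeholders, not real selection logic).
device = WavelengthSelectionDevice(
    acquisition_module=lambda: ([[0.1, 0.2, 0.3]], [1.0]),
    first_selection_module=lambda X, y: [0, 2],
    second_selection_module=lambda X, first: first[:1],
)
selected = device.run()
```

Keeping each module as an independent callable reflects the one-to-one correspondence with the method steps, so that a module can be replaced without touching the other two.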
In one embodiment, a computer device is proposed, the computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
step S202, obtaining variables of each wavelength point to be selected in near infrared spectrum data;
step S204, selecting a first wavelength point with the correlation with the response variable larger than a first preset threshold value from each wavelength point variable according to the correlation measurement information between each wavelength point variable and the response variable;
step S206, selecting, according to the correlation measurement information among the first wavelength points, a second wavelength point whose correlation with the other first wavelength points is smaller than a second preset threshold value from the first wavelength points, thereby completing the selection of the characteristic wavelength of the near infrared spectrum.
In one embodiment, a computer readable storage medium is provided, having a computer program stored thereon, which, when executed by a processor, causes the processor to perform the steps of:
step S202, obtaining variables of each wavelength point to be selected in near infrared spectrum data;
step S204, selecting a first wavelength point with the correlation with the response variable larger than a first preset threshold value from the wavelength point variables according to the correlation measurement information between the wavelength point variables and the response variable;
step S206, selecting, according to the correlation measurement information among the first wavelength points, a second wavelength point whose correlation with the other first wavelength points is smaller than a second preset threshold value from the first wavelength points, thereby completing the selection of the characteristic wavelength of the near infrared spectrum.
It should be understood that, although the steps in the flowcharts of the embodiments of the present invention are shown in sequence as indicated by the arrows, the steps are not necessarily performed in sequence as indicated by the arrows. The steps are not limited to being performed in the exact order illustrated and, unless explicitly stated herein, may be performed in other orders. Moreover, at least a portion of the steps in various embodiments may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times, and the order of performance of the sub-steps or stages is not necessarily sequential, but may be performed in turn or alternately with other steps or at least a portion of the sub-steps or stages of other steps.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program, which may be stored in a non-volatile computer readable storage medium and which, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments express only several embodiments of the present invention, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present invention. It should be noted that various changes and modifications can be made by those skilled in the art without departing from the spirit of the invention, and these changes and modifications are all within the scope of the invention. Therefore, the protection scope of the present patent should be subject to the appended claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (9)
1. A method for selecting a characteristic wavelength of a near infrared spectrum, the method comprising:
obtaining variables of each wavelength point to be selected in the near infrared spectrum data;
selecting a first wavelength point with the correlation with the response variable larger than a first preset threshold value from the wavelength point variables according to the correlation measurement information between the wavelength point variables and the response variable;
and selecting, according to the correlation measurement information among the first wavelength points, a second wavelength point whose correlation with the other first wavelength points is less than a second preset threshold value from the first wavelength points, thereby completing the selection of the characteristic wavelength of the near infrared spectrum.
2. The method according to claim 1, wherein the selecting a first wavelength point from the wavelength point variables, the correlation of which with the response variable is greater than a first preset threshold value, according to the correlation metric information between the wavelength point variables and the response variable comprises:
calculating, according to a preset correlation metric calculation formula, the correlation between each column vector of the near infrared spectrum matrix $X_{n\times p}$ and the response variable vector $Y_{n\times 1}$, and determining the correlation metric $\rho_j$ ($j = 1, 2, \ldots, p$) between each wavelength point variable and the response variable, where n is the number of samples to be subjected to infrared spectroscopic analysis, and p is the number of wavelength point variables corresponding to each sample;
determining, according to a first preset threshold $t_1$, the matrix Z formed by the columns whose $\rho_j$ is greater than $t_1$, i.e. the near infrared spectrum matrix corresponding to the first wavelength points:
$Z_{n\times m} = \{\, z_j \text{ is the } j\text{th column of } X \mid \rho_j > t_1 \,\},\quad m < p.$
3. The method according to claim 2, wherein the selecting, from the first wavelength points, second wavelength points whose correlation with other first wavelength points is less than a second predetermined threshold according to the correlation metric information between the first wavelength points comprises:
calculating, according to the preset correlation metric calculation formula, the correlation matrix $R_{m\times m}$ between the column vectors of the matrix $Z_{n\times m}$, where the elements of $R_{m\times m}$ satisfy $r_{if} = r_{fi}$ and $r_{ii} = 1$ ($i, f = 1, 2, \ldots, m$);
calculating the mean and the standard deviation of each column of the square matrix $R_{m\times m}$ to obtain a mean vector $\mu$ and a standard deviation vector $\sigma$, respectively, where the element $\mu_i$ of the mean vector $\mu$ is the mean of the elements in the $i$th column of $R_{m\times m}$, and the element $\sigma_i$ of the standard deviation vector $\sigma$ is the standard deviation of the elements in the $i$th column of $R_{m\times m}$;
determining, according to the second preset threshold, the matrix S formed by the columns of $Z_{n\times m}$ whose values are smaller than the second preset threshold, i.e. the near infrared spectrum matrix corresponding to the second wavelength points:
$S_{n\times k} = \{\, s_i \text{ is the } i\text{th column of } Z_{n\times m} \mid \mu_i < t_\mu \text{ and } \sigma_i < t_\sigma,\ i = 1, 2, \ldots, m \,\},\quad k < m$, wherein the second preset threshold comprises $t_\mu$ and $t_\sigma$.
4. A method according to claim 2 or 3, wherein said predetermined correlation metric is calculated by the formula:
where A and B are two vectors of length N, $\mu_A$ is the mean of vector A, $\sigma_A$ is the standard deviation of vector A, $\mu_B$ is the mean of vector B, and $\sigma_B$ is the standard deviation of vector B;
or, the preset correlation metric index calculation formula is as follows:
5. The method as claimed in claim 1, wherein the obtaining of the variable of each wavelength point to be selected in the near infrared spectrum data comprises:
acquiring original near infrared spectrum data of a target to be subjected to infrared spectrum analysis;
preprocessing the original near infrared spectrum data to enable overlapping peaks in the original near infrared spectrum data to be highlighted, and obtaining preprocessed near infrared spectrum data;
and acquiring the variable of each wavelength point to be selected according to the preprocessed near infrared spectrum data and preset sampling intervals.
6. The method of claim 5, wherein the pre-processing of the raw NIR spectra data comprises:
and processing the original near infrared spectrum data by utilizing a Savitzky-Golay filtering fitting method.
7. A near infrared spectrum characteristic wavelength selection device, characterized in that it comprises:
the acquisition module is used for acquiring various wavelength point variables to be selected in the near infrared spectrum data;
the first selection module is used for selecting a first wavelength point with the correlation with the response variable larger than a first preset threshold value from the wavelength point variables according to the correlation measurement information between the wavelength point variables and the response variable;
and the second selection module is used for selecting second wavelength points, the correlations of which with other first wavelength points are smaller than a second preset threshold value, from the first wavelength points according to the correlation measurement information among the first wavelength points so as to complete the selection of the characteristic wavelengths of the near infrared spectrum.
8. A computer device comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to carry out the steps of the method of selecting characteristic wavelengths for near infrared spectra according to any of claims 1 to 6.
9. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, causes the processor to carry out the steps of the method of selecting characteristic wavelengths for near infrared spectra according to any of claims 1 to 6.
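The preprocessing and sampling recited in claims 5 and 6 can be sketched as follows. This is a minimal sketch under stated assumptions: it uses the standard 5-point quadratic Savitzky-Golay smoothing kernel (-3, 12, 17, 12, -3)/35, leaves the two points at each edge unfiltered, and keeps every `interval`-th point as a candidate wavelength variable. The window length, polynomial order, edge handling and function names are assumptions, since the claims do not specify them. Derivative variants of the filter, which are often used to resolve overlapping peaks, would use different coefficients.

```python
def savgol_smooth_5pt(spectrum):
    """Smooth one spectrum with the 5-point quadratic Savitzky-Golay kernel."""
    kernel = (-3.0, 12.0, 17.0, 12.0, -3.0)  # normalized by 35 below
    out = list(spectrum)  # the two points at each edge are left unfiltered
    for i in range(2, len(spectrum) - 2):
        window = spectrum[i - 2:i + 3]
        out[i] = sum(c * v for c, v in zip(kernel, window)) / 35.0
    return out


def sample_wavelength_points(spectrum, interval):
    """Keep every `interval`-th point as a candidate wavelength point variable."""
    return spectrum[::interval]


# Hypothetical spectrum: a pure quadratic, which this kernel reproduces unchanged.
raw = [float(x * x) for x in range(7)]
smoothed = savgol_smooth_5pt(raw)
candidates = sample_wavelength_points(smoothed, 3)
```

A quadratic input is used here because a quadratic Savitzky-Golay fit leaves it unchanged, which makes the behaviour of the kernel easy to check by hand.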
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210474552.0A CN114942233A (en) | 2022-04-29 | 2022-04-29 | Near infrared spectrum characteristic wavelength selection method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114942233A (en) | 2022-08-26 |
Family
ID=82908104
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210474552.0A Pending CN114942233A (en) | 2022-04-29 | 2022-04-29 | Near infrared spectrum characteristic wavelength selection method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114942233A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116026780A (en) * | 2023-03-28 | 2023-04-28 | 江西中医药大学 | Method and system for online detection of coating moisture absorption rate based on series strategy wavelength selection |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110455722A (en) | Rubber tree blade phosphorus content EO-1 hyperion inversion method and system | |
CN107958267B (en) | Oil product property prediction method based on spectral linear representation | |
CN113049500B (en) | Water quality detection model training and water quality detection method, electronic equipment and storage medium | |
CN110503156B (en) | Multivariate correction characteristic wavelength selection method based on minimum correlation coefficient | |
Wang et al. | Near-infrared wavelength-selection method based on joint mutual information and weighted bootstrap sampling | |
CN114942233A (en) | Near infrared spectrum characteristic wavelength selection method, device, equipment and storage medium | |
Si et al. | Hierarchical temperature imaging using pseudoinversed convolutional neural network aided TDLAS tomography | |
CN114049525A (en) | Fusion neural network system, device and method for identifying gas types and concentrations | |
CN114112995A (en) | Aerosol optical characteristic data assimilation method and device based on three-dimensional variational technology | |
Ortiz-Herrero et al. | Multivariate (O) PLS regression methods in forensic dating | |
CN112990107B (en) | Hyperspectral remote sensing image underwater target detection method and device and computer equipment | |
CN114676636A (en) | Grassland area soil moisture rapid inversion method integrating vegetation and habitat characteristics | |
CN111896497B (en) | Spectral data correction method based on predicted value | |
Omidikia et al. | Uninformative variable elimination assisted by gram–Schmidt orthogonalization/successive projection algorithm for descriptor selection in QSAR | |
CN109145403B (en) | Near infrared spectrum modeling method based on sample consensus | |
CN116399836A (en) | Cross-talk fluorescence spectrum decomposition method based on alternating gradient descent algorithm | |
CN114739980B (en) | Element information prediction method, device, equipment and medium | |
CN110632024B (en) | Quantitative analysis method, device and equipment based on infrared spectrum and storage medium | |
CN112859034B (en) | Natural environment radar echo amplitude model classification method and device | |
Shan et al. | A nonlinear calibration transfer method based on joint kernel subspace | |
CN102057261B (en) | Method and apparatus for automatic calibration of spectrometers in chemometry by means of a bayes iterative estimation method | |
CN114141316A (en) | Method and system for predicting biological toxicity of organic matters based on spectrogram analysis | |
CN114398228A (en) | Method and device for predicting equipment resource use condition and electronic equipment | |
CN113609445A (en) | Multi-source heterogeneous monitoring data processing method, terminal device and readable storage medium | |
CN112884052A (en) | Method and device for extracting structural modal parameters, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||