CN114942233A - Near infrared spectrum characteristic wavelength selection method, device, equipment and storage medium - Google Patents

Info

Publication number
CN114942233A
Authority
CN
China
Prior art keywords
wavelength
near infrared
infrared spectrum
correlation
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210474552.0A
Other languages
Chinese (zh)
Inventor
陈争光
万岩
许楠
王雪
杨冬风
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Heilongjiang Bayi Agricultural University
Original Assignee
Heilongjiang Bayi Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Heilongjiang Bayi Agricultural University
Priority to CN202210474552.0A
Publication of CN114942233A
Legal status: Pending

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/359 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light

Abstract

The invention is applicable to the technical field of near infrared spectrum analysis and provides a near infrared spectrum characteristic wavelength selection method, device, equipment and storage medium. The method comprises the following steps: acquiring each wavelength point variable to be selected in the near infrared spectrum data; selecting, from the wavelength point variables and according to correlation metric information between each wavelength point variable and the response variable, first wavelength points whose correlation with the response variable is greater than a first preset threshold; and selecting, from the first wavelength points and according to correlation metric information among the first wavelength points, second wavelength points whose correlation with the other first wavelength points is less than a second preset threshold. Because the first wavelength points are strongly correlated with the response variable, and the second wavelength points chosen from them are only weakly correlated with the other first wavelength points, data redundancy and the correlation among the characteristic wavelength variables are reduced, multicollinearity among the variables is avoided, and the precision of the subsequently established model is improved.

Description

Near infrared spectrum characteristic wavelength selection method, device, equipment and storage medium
Technical Field
The invention belongs to the technical field of near infrared spectrum analysis, and particularly relates to a near infrared spectrum characteristic wavelength selection method, device, equipment and storage medium.
Background
The near infrared spectrum is the band of electromagnetic radiation between visible light and the mid-infrared. Scanning the near infrared spectrum of a sample yields characteristic information about the hydrogen-containing groups of the organic molecules in the sample. Near infrared spectroscopic analysis is convenient, rapid, efficient, accurate and inexpensive, does not damage the sample, consumes no chemical reagents and causes no environmental pollution, so the technology is increasingly favored.
However, analyzing a sample with near infrared spectroscopy involves modeling high-dimensional data. Such data often contain a large number of characteristic variables with a large amount of redundancy, so the characteristic wavelengths of the near infrared spectrum generally need to be selected before modeling.
A commonly used existing near infrared spectrum characteristic wavelength selection method is the successive projections algorithm (SPA), but models built from the characteristic wavelength variables selected by the successive projections algorithm have low accuracy.
Disclosure of Invention
An object of the embodiments of the invention is to provide a near infrared spectrum characteristic wavelength selection method that addresses the low accuracy of models built from the characteristic wavelength variables selected by existing near infrared spectrum characteristic wavelength selection methods.
The embodiments of the invention are realized as follows. The method comprises the following steps: acquiring each wavelength point variable to be selected in the near infrared spectrum data;
selecting, from the wavelength point variables and according to the correlation metric information between each wavelength point variable and the response variable, first wavelength points whose correlation with the response variable is greater than a first preset threshold;
and selecting, from the first wavelength points and according to the correlation metric information among the first wavelength points, second wavelength points whose correlation with the other first wavelength points is less than a second preset threshold, thereby completing the selection of the characteristic wavelengths of the near infrared spectrum.
Another object of an embodiment of the present invention is to provide a near infrared spectrum characteristic wavelength selection device, including:
the acquisition module is used for acquiring various wavelength point variables to be selected in the near infrared spectrum data;
the first selection module is used for selecting a first wavelength point of which the correlation with the response variable is greater than a first preset threshold value from the wavelength point variables according to the correlation measurement information between the wavelength point variables and the response variable;
and the second selection module is used for selecting second wavelength points, the correlations of which with other first wavelength points are smaller than a second preset threshold value, from the first wavelength points according to the correlation measurement information among the first wavelength points so as to complete the selection of the characteristic wavelengths of the near infrared spectrum.
It is a further object of embodiments of the present invention to provide a computer device comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to perform the steps of the above-mentioned near infrared spectrum characteristic wavelength selection method.
It is another object of the embodiments of the present invention to provide a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, causes the processor to execute the steps of the near infrared spectrum characteristic wavelength selection method.
In the near infrared spectrum characteristic wavelength selection method provided by the embodiment of the invention, after each wavelength point variable to be selected in the near infrared spectrum data is acquired, first wavelength points having a larger correlation with the response variable are selected from the wavelength point variables according to the correlation metric information between each wavelength point variable and the response variable. Second wavelength points having a smaller correlation with the other first wavelength points are then selected from the first wavelength points according to the correlation metric information among the first wavelength points, and these second wavelength points serve as the selected characteristic wavelength variables used for modeling. Because the selected second wavelength points are only weakly correlated with one another, the accuracy of the established model is ensured. In addition, selecting the wavelength points that are more strongly correlated with the response variable removes data redundancy during selection, which reduces the amount of calculation in near infrared spectrum data analysis and improves its response speed.
Drawings
FIG. 1 is a flow chart of a method for selecting a characteristic wavelength of a near infrared spectrum according to an embodiment of the present invention;
FIG. 2 is a flowchart of acquiring wavelength point variables to be selected from near infrared spectral data according to an embodiment of the present invention;
FIG. 3 is a raw data near infrared spectrum of a corn target sample provided by an embodiment of the present invention;
FIG. 4 is a graph of a pre-processed NIR spectrum according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a selected first wavelength point according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a finally selected second wavelength point according to an embodiment of the present invention;
FIG. 7 shows a PLS analysis model Full-PLS constructed from pre-processed near infrared spectra data according to an embodiment of the present invention;
FIG. 8 shows a PLS analysis model CC-PLS constructed from data after selection of a first wavelength point according to an embodiment of the invention;
FIG. 9 is a TSCA-PLS analysis model established based on the finally selected characteristic wavelength data according to an embodiment of the present invention;
FIG. 10 shows a TSCA-MLR model established based on the characteristic wavelength variable data finally selected by the present solution and an SPA-MLR model established based on the characteristic wavelength variable data selected by the successive projections algorithm according to an embodiment of the present invention;
FIG. 11 is a block diagram of a near infrared spectrum characteristic wavelength selection device according to an embodiment of the present invention;
FIG. 12 is a block diagram showing an internal configuration of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It will be understood that, as used herein, the terms "first," "second," and the like may be used herein to describe various elements, but these elements should not be limited by these terms unless otherwise specified. These terms are only used to distinguish one element from another. For example, a first xx script may be referred to as a second xx script, and similarly, a second xx script may be referred to as a first xx script, without departing from the scope of the present application.
As shown in fig. 1, in an embodiment, a near infrared spectrum characteristic wavelength selection method is provided, and the near infrared spectrum characteristic wavelength selection method may specifically include the following steps:
Step S202: acquire each wavelength point variable to be selected in the near infrared spectrum data.
In the embodiment of the present application, the manner of acquiring each wavelength point variable to be selected in the near infrared spectrum data is not limited. For example, the computer device to which the near infrared spectrum characteristic wavelength selection method of the present application is applied may obtain the variables directly from another device or from a storage device, or, as shown in FIG. 2, may obtain them through the following steps:
Step S302: obtain original near infrared spectrum data of a target to be subjected to near infrared spectroscopic analysis.
In the embodiment of the present application, the target to be subjected to near infrared spectroscopic analysis is not limited; for example, it may be corn or soil. This embodiment takes corn as the example target: there are 80 corn samples, the wavelength range is 1100 to 2498 nm, and the sampling interval is 2 nm, giving 700 wavelength points in total. FIG. 3 shows the near infrared spectrum of the original data of the corn target samples. When the target is corn, the moisture content of the corn can be used as the response variable for selecting the characteristic wavelengths (variables).
Step S304: preprocess the original near infrared spectrum data so that overlapping peaks in the original near infrared spectrum data are highlighted, obtaining the preprocessed near infrared spectrum data.
In the embodiment of the present application, the main purpose of preprocessing the original near infrared spectrum data is to correct the spectral baseline, eliminate baseline shift, increase the spectral resolution and improve the signal-to-noise ratio, so that overlapping peaks in the original near infrared spectrum data are highlighted. This embodiment does not limit the specific preprocessing method. For example, the original spectral data may be preprocessed by Savitzky-Golay (S-G) first-order derivation with a window size of 13. After preprocessing, the overlapping peaks in the original spectrum are highlighted and the baseline shift is reduced; the preprocessed near infrared spectrum data are shown in FIG. 4.
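The S-G first-order derivation mentioned above can be sketched in plain Python as follows. This is only an illustrative sketch: the source states the window size (13) but not the polynomial order, so a local fit of order 1 or 2 is assumed (both give the same first-derivative weights), and the function name `savgol_first_derivative` and the synthetic spectrum are hypothetical. Positions where the window does not fully fit are simply dropped, which is also an assumption.

```python
def savgol_first_derivative(spectrum, window=13, spacing=1.0):
    """First derivative from a Savitzky-Golay fit, only where the window fits.

    Assumes a polynomial order of 1 or 2 (the source gives only the window
    size); for these orders the weight of the point at offset k is
    k / sum(k^2), divided by the sampling spacing.
    """
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    half = window // 2
    offsets = range(-half, half + 1)
    norm = sum(k * k for k in offsets)  # 182 for a 13-point window
    derivative = []
    for center in range(half, len(spectrum) - half):
        weighted = sum(k * spectrum[center + k] for k in offsets)
        derivative.append(weighted / (norm * spacing))
    return derivative


# Synthetic stand-in spectrum (not patent data): a line with slope 0.5 per nm.
wavelengths_nm = [1100 + 2 * i for i in range(700)]
spectrum = [0.5 * w for w in wavelengths_nm]
derivative = savgol_first_derivative(spectrum, window=13, spacing=2.0)
```

A constant offset added to the whole spectrum leaves the result unchanged, which is why a first derivative removes baseline translation.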
Step S306: acquire each wavelength point variable to be selected from the preprocessed near infrared spectrum data at a preset sampling interval.
In the embodiment of the application, the specific size of the preset sampling interval is not limited; for example, a sampling interval of 2 nm may be selected. The near infrared spectrum data of the corn samples cover the wavelength range 1100 to 2498 nm, and with a 2 nm sampling interval, the 700 wavelength points in the original data yield 688 wavelength points after preprocessing.
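The point counts quoted above can be checked with a few lines of arithmetic. The source does not say why 700 points become 688; the sketch below assumes the 12 missing points are the 6 positions at each end where a 13-point window does not fully fit, which is one plausible explanation and not a statement from the source.

```python
START_NM, STOP_NM, STEP_NM = 1100, 2498, 2
WINDOW = 13  # S-G window size quoted in the text

wavelengths_nm = list(range(START_NM, STOP_NM + 1, STEP_NM))

# Assumption: only positions with a complete window on both sides are kept.
half = WINDOW // 2
kept_nm = wavelengths_nm[half:len(wavelengths_nm) - half]

print(len(wavelengths_nm), len(kept_nm), kept_nm[0], kept_nm[-1])
```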
Step S204, according to the correlation measurement information between each wavelength point variable and the response variable, selecting a first wavelength point from each wavelength point variable, wherein the correlation between the first wavelength point and the response variable is greater than a first preset threshold value.
In the embodiment of the present application, the correlation measurement information between each wavelength point variable and the response variable refers to the correlation measurement information between 688 wavelength points (variables) obtained through pretreatment and the moisture content (response variable) of the corn. In an embodiment, step S204 may specifically include the following steps:
Step S402: according to a preset correlation metric formula, calculate the correlation between each column vector of the near infrared spectrum matrix X_{n×p} and the response variable Y_{n×1}, and determine the correlation metric ρ_j (j = 1, 2, ..., p) between each wavelength point variable and the response variable, where n is the number of samples of the target to be subjected to infrared spectroscopic analysis and p is the number of wavelength point variables corresponding to each sample.
In the embodiment of the present application, the preset correlation metric formula is not limited. For example, the correlation metric may be, but is not limited to, the absolute value of the correlation coefficient or the absolute value of the cosine of the included angle. When the absolute value of the correlation coefficient is used as the metric, the correlation metric for two vectors A and B of length N is calculated as follows:
Figure BDA0003624786210000061
where A and B are two vectors of length N, μ_A is the mean of vector A, σ_A is the standard deviation of vector A, μ_B is the mean of vector B, and σ_B is the standard deviation of vector B.
When the absolute value of the cosine of the included angle is used as the metric, the correlation metric for two vectors A and B of length N is calculated as follows:
Figure BDA0003624786210000062
where A and B are two vectors of length N, A_i is the i-th element of vector A, and B_i is the i-th element of vector B.
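The two formula images referenced above are not reproduced in this text. Assuming they are the standard definitions implied by the symbol lists (the sample correlation coefficient with a 1/(N−1) normalization, and the cosine of the angle between two vectors), each taken in absolute value as the text describes, the two metrics would read:

```latex
\rho(A,B) = \left| \frac{1}{N-1} \sum_{i=1}^{N}
  \left( \frac{A_i - \mu_A}{\sigma_A} \right)
  \left( \frac{B_i - \mu_B}{\sigma_B} \right) \right|
\qquad
\cos(A,B) = \left| \frac{\sum_{i=1}^{N} A_i B_i}
  {\sqrt{\sum_{i=1}^{N} A_i^{2}} \, \sqrt{\sum_{i=1}^{N} B_i^{2}}} \right|
```

Both quantities lie between 0 and 1; the exact normalization used in the original figures is an assumption here.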
In the embodiment of the present application, the correlation between each column vector of the near infrared spectrum matrix X_{n×p} and the response variable Y_{n×1} may be calculated with the correlation metric formula corresponding to either the absolute value of the correlation coefficient or the absolute value of the cosine of the included angle. The present embodiment takes the formula
Figure BDA0003624786210000063
as an example for the calculation. Each column vector of the near infrared spectrum matrix X_{n×p} and the response variable matrix Y_{n×1} can be regarded as vectors of length n, so they can be substituted for the vectors A and B, respectively, in the correlation metric formula to obtain the correlation metric ρ_j (j = 1, 2, ..., p) between each wavelength point variable and the response variable. Here the near infrared spectrum matrix X_{n×p} is the matrix formed by n samples of the target to be subjected to infrared spectroscopic analysis when each sample has p wavelength point variables, and each component of the response variable matrix Y_{n×1} is the response variable of one target sample.
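Step S402 can be sketched as a loop over the columns of X. This is a minimal illustration using only the standard library; the function names, the tiny data set, and the choice to return 0 for a constant vector are assumptions, not part of the source.

```python
import math


def abs_pearson(a, b):
    """Absolute value of the Pearson correlation coefficient of two vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a constant vector counts as uncorrelated
    return abs(cov / math.sqrt(var_a * var_b))


def abs_cosine(a, b):
    """Absolute value of the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a zero vector counts as uncorrelated
    return abs(dot / (norm_a * norm_b))


def column_response_correlations(X, y, metric=abs_pearson):
    """Return rho_j for every column j of X (rows = samples)."""
    return [metric(column, y) for column in zip(*X)]


# Hypothetical data: 4 samples, 3 wavelength point variables.
X = [[1.0, 4.0, 2.0],
     [2.0, 3.0, 2.5],
     [3.0, 2.0, 1.0],
     [4.0, 1.0, 3.0]]
y = [1.1, 1.9, 3.2, 3.8]
rho = column_response_correlations(X, y)
```

The second column is an exact mirror of the first, so taking the absolute value gives both the same metric; the sign of the relationship does not matter for selection.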
In this embodiment of the application, for example, if there are 80 corn samples, characteristic wavelength variables may be selected from the near infrared spectrum data of all 80 samples for modeling. Alternatively, only part of the 80 samples may be selected for modeling, with the other samples used for verification or prediction. For example, when 48 corn samples are selected for modeling, the near infrared spectrum matrix is X_{48×688} and the response variable matrix formed by the 48 corn samples is Y_{48×1}.
Step S404: according to the first preset threshold t_1, determine the matrix Z corresponding to the elements of ρ_j that are greater than t_1, that is, the near infrared spectrum matrix corresponding to the first wavelength points:
Z_{n×m} = { Z_j is the j-th column of X | ρ_j > t_1 }, m < p.
In the embodiment of the present application, the specific value of the first preset threshold is not limited; for example, t_1 may be set to 0.4. The correlation metric ρ_j between each wavelength point variable and the response variable obtained in the previous step is then compared with t_1. In general, the closer ρ_j is to 1, the stronger the correlation between the j-th column vector of the near infrared spectrum matrix X and the response variable Y; conversely, the closer ρ_j is to 0, the weaker the correlation between the two. The columns whose ρ_j is greater than t_1 form the matrix Z, that is, the column vectors of the near infrared spectrum matrix that are more strongly correlated with the response variable are selected. Suppose m values of ρ_j are greater than t_1; the wavelength points corresponding to these m values are the first wavelength points, and the near infrared spectrum matrix corresponding to the selected first wavelength points is denoted Z_{n×m}. FIG. 5 is a schematic diagram of the first wavelength points selected in this embodiment.
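The thresholding in step S404 amounts to keeping the columns whose metric strictly exceeds t_1. The sketch below assumes the ρ_j values have already been computed; the function names and the sample values are hypothetical.

```python
def select_first_wavelength_points(rho, t1=0.4):
    """Indices j with rho_j strictly greater than the first threshold t1."""
    return [j for j, r in enumerate(rho) if r > t1]


def take_columns(X, indices):
    """Build Z from the selected columns of X (rows = samples)."""
    return [[row[j] for j in indices] for row in X]


# Hypothetical correlation metrics for 4 wavelength point variables.
rho = [0.91, 0.12, 0.40, 0.55]
first_indices = select_first_wavelength_points(rho, t1=0.4)

X = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
Z = take_columns(X, first_indices)
```

Because the comparison is strict, a variable whose metric equals the threshold exactly (0.40 here) is not selected.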
Step S206, selecting a second wavelength point from the first wavelength points, wherein the correlation between the second wavelength point and other first wavelength points is smaller than a second preset threshold value, according to the correlation measurement information among the first wavelength points, and completing the selection of the characteristic wavelength of the near infrared spectrum.
In an embodiment of the present application, step S206 may specifically include the following steps:
Step S502: according to the preset correlation metric formula, calculate the correlation matrix R_{m×m} between the column vectors of the matrix Z_{n×m}, where the elements of R_{m×m} satisfy r_{if} = r_{fi} and r_{ii} = 1 (i, f = 1, 2, ..., m).
In the embodiment of the present application, the preset correlation metric formula is not limited. For example, the correlation metric may again be the absolute value of the correlation coefficient or the absolute value of the cosine of the included angle. When the absolute value of the correlation coefficient is used as the metric, the correlation metric formula is the formula given above:
Figure BDA0003624786210000081
When the absolute value of the cosine of the included angle is used as the metric, the correlation metric formula is the formula given above:
Figure BDA0003624786210000082
The meaning of each parameter in the formulas is as described above and is not repeated here.
In the embodiment of the present application, the matrix Z_{n×m} is the near infrared spectrum matrix corresponding to the first wavelength points selected in the above step, and the correlation matrix R_{m×m} is the matrix formed by the correlation metrics between the column vectors of Z_{n×m}. For example, if the near infrared spectrum matrix corresponding to the first wavelength points selected in the above step is Z_{48×519}, the correlation matrix between its column vectors is R_{519×519}. When the correlation between each column vector of Z_{n×m} and the other column vectors is calculated, every pair of column vectors is computed twice: for example, calculating the correlations of the 2nd column vector with the other column vectors includes its correlation with the 7th column vector, and calculating the correlations of the 7th column vector includes the same pair again. The elements of R_{m×m} therefore satisfy r_{if} = r_{fi} (i, f = 1, 2, ..., m). The correlation of a column vector with itself, for example of the 2nd column with the 2nd column, is complete, so its correlation metric is 1 and r_{ii} = 1.
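The symmetry noted above means only the pairs with i < f need to be computed. A minimal sketch, assuming the absolute Pearson correlation as the metric; the function names and the small matrix Z are hypothetical.

```python
import math


def abs_pearson(a, b):
    """Absolute value of the Pearson correlation coefficient of two vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a constant vector counts as uncorrelated
    return abs(cov / math.sqrt(var_a * var_b))


def correlation_matrix(Z, metric=abs_pearson):
    """Symmetric m x m matrix R of pairwise column correlations of Z."""
    columns = list(zip(*Z))
    m = len(columns)
    R = [[1.0] * m for _ in range(m)]  # r_ii = 1 on the diagonal
    for i in range(m):
        for f in range(i + 1, m):      # each pair is computed once
            R[i][f] = R[f][i] = metric(columns[i], columns[f])
    return R


# Hypothetical Z: 4 samples, 3 first wavelength points.
Z = [[1.0, 2.0, 1.0],
     [2.0, 4.0, -1.0],
     [3.0, 6.0, 1.0],
     [4.0, 8.0, -1.0]]
R = correlation_matrix(Z)
```

For m = 519 columns this computes 519 × 518 / 2 pairs instead of 519 × 519 entries.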
Step S504: calculate the mean and the standard deviation of each column of the square matrix R_{m×m} to obtain a mean vector μ and a standard deviation vector σ, where the element μ_i of the mean vector μ is the mean of the elements in the i-th column of R_{m×m} and the element σ_i of the standard deviation vector σ is the standard deviation of the elements in the i-th column of R_{m×m}.
In the embodiments of the present application, the mean vector μ is the vector formed by the column means of the matrix R_{m×m}, and the standard deviation vector σ is the vector formed by its column standard deviations. The element μ_i is the mean of the data in the i-th column of R_{m×m}, excluding the diagonal element, and the element σ_i is the standard deviation of the data in the i-th column of R_{m×m}, excluding the diagonal element.
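The column statistics of step S504 can be sketched as follows. The source does not say whether the sample or the population standard deviation is meant; the sample form (`statistics.stdev`) is assumed here, and the function name and the example matrix are hypothetical.

```python
import statistics


def offdiag_column_stats(R):
    """Mean and standard deviation of each column of R, skipping r_ii."""
    m = len(R)
    means, stds = [], []
    for i in range(m):
        column = [R[f][i] for f in range(m) if f != i]
        means.append(statistics.fmean(column))
        # Assumption: sample standard deviation; needs at least 2 values.
        stds.append(statistics.stdev(column) if len(column) > 1 else 0.0)
    return means, stds


# Hypothetical 3 x 3 correlation matrix.
R = [[1.0, 0.2, 0.4],
     [0.2, 1.0, 0.6],
     [0.4, 0.6, 1.0]]
mu, sigma = offdiag_column_stats(R)
```

Leaving out the diagonal matters: each r_ii equals 1, so including it would inflate every column mean without saying anything about redundancy between different wavelength points.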
Step S506: according to the second preset threshold, determine the matrix S formed by the columns of Z_{n×m} whose values are smaller than the second preset threshold, that is, the near infrared spectrum matrix corresponding to the second wavelength points:
S_{n×k} = { s_i is the i-th column of Z_{n×m} | μ_i < t_μ and σ_i < t_σ, i = 1, 2, ..., m }, k < m, where the second preset threshold comprises t_μ and t_σ.
In the embodiment of the present application, the second preset threshold comprises a threshold t_μ on the mean and a threshold t_σ on the standard deviation. This embodiment does not limit the specific values of t_μ and t_σ, which can be determined experimentally; for example, both may be set to 0.4. In general, the lower t_μ and t_σ are, the lower the correlation among the selected characteristic variables, but values that are too low may miss wavelength points that play a key role in modeling. The mean μ_i and standard deviation σ_i corresponding to each column of Z_{n×m} are then compared with t_μ and t_σ, respectively, and the columns whose mean and standard deviation are both smaller than the thresholds are selected, giving the near infrared spectrum matrix S_{n×k} corresponding to the second wavelength points of the target samples and completing the selection of the characteristic wavelengths of the near infrared spectrum.
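The two-condition filter of step S506 can be sketched as below. It assumes the vectors μ and σ are already available and maps the surviving columns back to their wavelengths; the function name, the wavelength values and the statistics are hypothetical illustrations, not results from the source.

```python
def select_second_wavelength_points(first_wavelengths_nm, means, stds,
                                    t_mu=0.4, t_sigma=0.4):
    """Keep first wavelength points with mu_i < t_mu and sigma_i < t_sigma."""
    return [
        wavelength
        for wavelength, mu_i, sigma_i in zip(first_wavelengths_nm, means, stds)
        if mu_i < t_mu and sigma_i < t_sigma
    ]


# Hypothetical first wavelength points and their column statistics.
first_wavelengths_nm = [1432, 1908, 1910, 2250]
means = [0.30, 0.62, 0.61, 0.18]
stds = [0.14, 0.20, 0.21, 0.45]
selected_nm = select_second_wavelength_points(first_wavelengths_nm, means, stds)
```

In this example the two adjacent points at 1908 and 1910 nm fail the mean test, and the point at 2250 nm fails the standard deviation test even though its mean is low, so both conditions are doing work.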
In the embodiment of the present application, for example, when k = 13, 13 second wavelength points are selected; the 13 points shown in FIG. 6 are the finally selected characteristic wavelengths.
In the embodiment of the present application, an MLR model between S_{48×13} and the concentration vector y may also be established and its prediction performance evaluated. As shown in FIGS. 7 to 9, PLS analysis models are established for the preprocessed near infrared spectrum data of the corn samples, for the data after selection of the first wavelength points, and for the data after selection of the characteristic wavelengths, giving Full-PLS, CC-PLS and TSCA-PLS, respectively. The optimal number of principal factors is 6 for Full-PLS, 7 for CC-PLS and 8 for TSCA-PLS. In addition, as shown in FIG. 10, two MLR models are established: TSCA-MLR (an MLR model built from the characteristic wavelength variable data finally selected by the present solution) and SPA-MLR (an MLR model built from the characteristic wavelength variable data selected by the successive projections algorithm), for 5 models in total. The model parameters of the 5 models on the validation and test sets (Table 1) are compared to test the validity of the method. The invention evaluates the models with the coefficient of determination (R²) and the root mean square error (RMSE): the closer R² is to 1 and the closer RMSE is to 0, the better the model fits and the higher its prediction accuracy. As shown in Table 1 below, the TSCA-MLR model outperforms the full-spectrum Full-PLS model on both the validation set and the prediction set. Comparing R² and RMSE on the validation and prediction sets for the four wavelength-selection-based models (CC-PLS, TSCA-PLS, SPA-MLR and TSCA-MLR) shows that the TSCA-MLR model performs best on the prediction set and shows no obvious overfitting, whereas TSCA-PLS and SPA-MLR show slight overfitting.
Of the TSCA-PLS and TSCA-MLR models, which use the same wavelength selection method, the TSCA-PLS model is obviously overfitted. This indicates that the wavelengths selected by the TSCA method have largely eliminated the collinearity among the modeling variables, so the PLS method is no longer needed to eliminate collinearity among wavelengths.
Table 1. Model parameters under different characteristic wavelength selection methods for the corn dataset
Figure BDA0003624786210000101
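The two evaluation metrics used for the comparison, R² and RMSE, can be sketched with their usual definitions. The source names the metrics but does not print their formulas, so the standard forms are assumed; the function names and the numbers below are hypothetical and are not values from Table 1.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference and predicted moisture contents.
y_true = [10.0, 11.0, 12.0, 13.0]
y_pred = [10.5, 10.5, 12.5, 12.5]
print(rmse(y_true, y_pred), r_squared(y_true, y_pred))
```

A model that fits the calibration data with R² near 1 but shows a clearly lower R² and higher RMSE on held-out samples is the overfitting pattern discussed above.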
In the near infrared spectrum characteristic wavelength selection method provided by the embodiment of the application, after each wavelength point variable to be selected in the near infrared spectrum data is acquired, first wavelength points having a larger correlation with the response variable are selected from the wavelength point variables according to the correlation metric information between each wavelength point variable and the response variable. Second wavelength points having a smaller correlation with the other first wavelength points are then selected from the first wavelength points according to the correlation metric information among the first wavelength points, and these second wavelength points serve as the selected characteristic wavelength variables used for modeling. Because the selected second wavelength points are only weakly correlated with one another, the accuracy of the established model is ensured. In addition, selecting the wavelength points that are more strongly correlated with the response variable removes data redundancy during selection, which reduces the amount of calculation in near infrared spectrum data analysis and improves its response speed.
As shown in fig. 11, in an embodiment, a near infrared spectrum characteristic wavelength selection apparatus is provided, which may be integrated in a computer device, and specifically may include an obtaining module 610, a first selecting module 620, and a second selecting module 630.
The acquisition module 610 is configured to acquire each wavelength point variable to be selected in the near infrared spectrum data;
a first selecting module 620, configured to select, according to correlation metric information between each wavelength point variable and a response variable, a first wavelength point from each wavelength point variable, where correlation with the response variable is greater than a first preset threshold;
the second selecting module 630 is configured to select, according to the correlation metric information between the first wavelength points, a second wavelength point from the first wavelength points, where correlation with other first wavelength points is smaller than a second preset threshold, so as to complete selection of the characteristic wavelength of the near infrared spectrum.
In the embodiment of the present application, the obtaining module 610, the first selecting module 620 and the second selecting module 630 of the near infrared spectrum characteristic wavelength selection device correspond one to one to steps S202, S204 and S206 of the near infrared spectrum characteristic wavelength selection method. For their function implementation and related refinements, refer to the specific embodiments of the method, which are not repeated here.
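The one-to-one correspondence between modules and steps can be pictured as three pluggable callables chained in order. This is only a structural sketch: the class name, the field names and the stub callables are hypothetical, and the real selection logic of steps S204 and S206 is replaced by fixed return values.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


@dataclass
class WavelengthSelectionDevice:
    acquire: Callable[[], Tuple[Matrix, Sequence[float]]]          # module 610 / S202
    select_first: Callable[[Matrix, Sequence[float]], List[int]]   # module 620 / S204
    select_second: Callable[[Matrix], List[int]]                   # module 630 / S206

    def run(self) -> List[int]:
        """Return the column indices of X chosen as characteristic wavelengths."""
        X, y = self.acquire()
        first = self.select_first(X, y)
        Z = [[row[j] for j in first] for row in X]
        second = self.select_second(Z)       # indices into the columns of Z
        return [first[i] for i in second]    # map back to the columns of X


# Stub modules standing in for the real steps.
device = WavelengthSelectionDevice(
    acquire=lambda: ([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]], [0.1, 0.2]),
    select_first=lambda X, y: [1, 3],
    select_second=lambda Z: [1],
)
selected_columns = device.run()
```

Keeping the second module's output as indices into Z and mapping them back through the first selection is one design choice for preserving the link to the original wavelength axis.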
The near infrared spectrum characteristic wavelength selection apparatus provided by the embodiments of the present application is provided with a first selection module and a second selection module. The first selection module selects, from the wavelength point variables, the first wavelength points that are highly correlated with the response variable, which removes redundant data, reduces the amount of computation required to analyze the near infrared spectrum data, and improves the response speed of the analysis. The second selection module then selects wavelength points that are weakly correlated with one another, which reduces multicollinearity among the variables and thereby helps ensure the accuracy of the subsequently built model.
FIG. 12 is a diagram illustrating the internal structure of a computer device in one embodiment. As shown in fig. 12, the computer device includes a processor, a memory, a network interface, an input device, and a display screen connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the near infrared spectrum characteristic wavelength selection method. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform the near infrared spectrum characteristic wavelength selection method. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, a key, trackball, or touch pad arranged on the housing of the computer device, or an external keyboard, touch pad, or mouse.
Those skilled in the art will appreciate that the architecture shown in fig. 12 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, the near infrared spectrum characteristic wavelength selection apparatus provided herein may be implemented in the form of a computer program that is executable on a computer device as shown in fig. 12. The memory of the computer device may store various program modules constituting the near infrared spectral characteristic wavelength selection apparatus, such as the acquisition module 610, the first selection module 620, and the second selection module 630 shown in fig. 11. The program modules constitute computer programs that cause the processor to perform the steps of the methods for near infrared spectral characteristic wavelength selection of the various embodiments of the present application described in the present specification.
For example, the computer device shown in fig. 12 may execute step S202 by the acquisition module 610 in the near infrared spectrum characteristic wavelength selection apparatus shown in fig. 11. The computer device may perform step S204 through the first selection module 620. The computer device may perform step S206 through the second selection module 630.
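The correspondence between modules 610, 620, and 630 and steps S202, S204, and S206 can be sketched as a small composition of three callables. This is only an illustrative structure under stated assumptions: the class name `WavelengthSelectionApparatus`, its field names, and the `run` method are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class WavelengthSelectionApparatus:
    """Hypothetical container mirroring modules 610, 620, and 630."""

    acquire: Callable[[Any], Any]            # acquisition module 610 -> step S202
    first_select: Callable[[Any, Any], Any]  # first selection module 620 -> step S204
    second_select: Callable[[Any], Any]      # second selection module 630 -> step S206

    def run(self, spectra: Any, response: Any) -> Any:
        # S202: obtain the candidate wavelength point variables.
        candidates = self.acquire(spectra)
        # S204: keep the points strongly correlated with the response variable.
        first_points = self.first_select(candidates, response)
        # S206: keep the points weakly correlated with the other first points.
        return self.second_select(first_points)
```

Each callable can be replaced independently, for example to swap the correlation metric used in step S204 without touching the other two modules.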
In one embodiment, a computer device is proposed, the computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
step S202, obtaining variables of each wavelength point to be selected in near infrared spectrum data;
step S204, selecting a first wavelength point with the correlation with the response variable larger than a first preset threshold value from each wavelength point variable according to the correlation measurement information between each wavelength point variable and the response variable;
step S206, selecting a second wavelength point from the first wavelength points, wherein the correlation between the second wavelength point and other first wavelength points is smaller than a second preset threshold value, according to the correlation measurement information among the first wavelength points, and completing the selection of the characteristic wavelength of the near infrared spectrum.
In one embodiment, a computer readable storage medium is provided, having a computer program stored thereon, which, when executed by a processor, causes the processor to perform the steps of:
step S202, obtaining variables of each wavelength point to be selected in near infrared spectrum data;
step S204, selecting a first wavelength point with the correlation with the response variable larger than a first preset threshold value from the wavelength point variables according to the correlation measurement information between the wavelength point variables and the response variable;
step S206, selecting a second wavelength point from the first wavelength points, wherein the correlation between the second wavelength point and other first wavelength points is smaller than a second preset threshold value, according to the correlation measurement information among the first wavelength points, and completing the selection of the characteristic wavelength of the near infrared spectrum.
It should be understood that, although the steps in the flowcharts of the embodiments of the present invention are shown in sequence as indicated by the arrows, the steps are not necessarily performed in sequence as indicated by the arrows. The steps are not limited to being performed in the exact order illustrated and, unless explicitly stated herein, may be performed in other orders. Moreover, at least a portion of the steps in various embodiments may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times, and the order of performance of the sub-steps or stages is not necessarily sequential, but may be performed in turn or alternately with other steps or at least a portion of the sub-steps or stages of other steps.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program, which may be stored in a non-volatile computer readable storage medium and which, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The embodiments described above represent only several implementations of the present invention. Although they are described specifically and in detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, and all such changes and modifications fall within the scope of the invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (9)

1. A method for selecting a characteristic wavelength of a near infrared spectrum, the method comprising:
obtaining variables of each wavelength point to be selected in the near infrared spectrum data;
selecting a first wavelength point with the correlation with the response variable larger than a first preset threshold value from the wavelength point variables according to the correlation measurement information between the wavelength point variables and the response variable;
and selecting a second wavelength point from the first wavelength points, wherein the correlation between the second wavelength point and other first wavelength points is less than a second preset threshold value according to the correlation measurement information among the first wavelength points, so that the selection of the characteristic wavelength of the near infrared spectrum is completed.
2. The method according to claim 1, wherein the selecting a first wavelength point from the wavelength point variables, the correlation of which with the response variable is greater than a first preset threshold value, according to the correlation metric information between the wavelength point variables and the response variable comprises:
calculating, according to a preset correlation metric formula, the correlation between each column vector of the near infrared spectrum matrix $X_{n\times p}$ and the response variable $Y_{n\times 1}$, to determine the correlation metric $\rho_j$ $(j = 1, 2, \ldots, p)$ between each wavelength point variable and the response variable, where $n$ is the number of samples to be subjected to infrared spectroscopic analysis and $p$ is the number of wavelength point variables corresponding to each sample;
determining, according to the first preset threshold $t_1$, the matrix $Z$ formed by the columns whose $\rho_j$ is greater than $t_1$, that is, the near infrared spectrum matrix corresponding to the first wavelength points:
$Z_{n\times m} = \{ Z_j \text{ is the } j\text{-th column of } X \mid \rho_j > t_1 \},\quad m < p.$
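The first selection stage of claim 2 can be sketched in NumPy as follows. This is a minimal sketch, not the applicant's implementation: it assumes the Pearson correlation coefficient as the correlation metric, and the function name `select_first_wavelength_points` and its return layout are hypothetical. The comparison follows the claim literally (the metric must exceed the first threshold), so strongly negative correlations are not retained.

```python
import numpy as np


def select_first_wavelength_points(X, y, t1):
    """Return (indices, Z, rho) for the columns of X whose correlation with y exceeds t1.

    X is the n-by-p spectrum matrix, y the length-n response, t1 the first threshold.
    Pearson correlation is an assumption made for this sketch.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    Xc = X - X.mean(axis=0)          # center every wavelength column
    yc = y - y.mean()                # center the response
    denom = np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    rho = (Xc.T @ yc) / denom        # one coefficient per column; NaN for constant columns
    keep = np.flatnonzero(rho > t1)  # NaN compares False, so constant columns are dropped
    return keep, X[:, keep], rho
```

For example, with a response `y = [1, 2, 3, 4]` and columns equal to `2*y + 1`, `[1, -1, -1, 1]`, and `-y`, only the first column passes a threshold of 0.8.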
3. The method according to claim 2, wherein the selecting, from the first wavelength points, second wavelength points whose correlation with other first wavelength points is less than a second predetermined threshold according to the correlation metric information between the first wavelength points comprises:
calculating, according to the preset correlation metric formula, the correlation matrix $R_{m\times m}$ between the column vectors of the matrix $Z_{n\times m}$, wherein the elements of $R_{m\times m}$ satisfy $r_{if} = r_{fi}$ and $r_{ii} = 1$ $(i, f = 1, 2, \ldots, m)$;
calculating the mean and the standard deviation of each column of the square matrix $R_{m\times m}$ to obtain a mean vector $\mu$ and a standard deviation vector $\sigma$, respectively, wherein the element $\mu_i$ of the mean vector $\mu$ is the mean of the elements in the $i$-th column of $R_{m\times m}$, and the element $\sigma_i$ of the standard deviation vector $\sigma$ is the standard deviation of the elements in the $i$-th column of $R_{m\times m}$;
determining, according to the second preset threshold, the matrix $S$ formed by the columns of $Z_{n\times m}$ whose values are smaller than the second preset threshold, that is, the near infrared spectrum matrix corresponding to the second wavelength points:
$S_{n\times k} = \{ s_i \text{ is the } i\text{-th column of } Z_{n\times m} \mid \mu_i < t_\mu \text{ and } \sigma_i < t_\sigma,\ i = 1, 2, \ldots, m \},\quad k < m$, wherein the second preset threshold comprises $t_\mu$ and $t_\sigma$.
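The second selection stage of claim 3 can be sketched in the same way. The sketch assumes Pearson correlation (via `numpy.corrcoef`) and the population standard deviation (`ddof=0`); the claim fixes neither choice, and the function name `select_second_wavelength_points` is hypothetical. The diagonal entries of the correlation matrix are included in the column statistics, following the claim text literally.

```python
import numpy as np


def select_second_wavelength_points(Z, t_mu, t_sigma):
    """Return (indices, S, R) for the columns of Z that pass both second-stage thresholds.

    Z is the n-by-m matrix of first wavelength points; t_mu and t_sigma are the two
    parts of the second preset threshold.
    """
    Z = np.asarray(Z, dtype=float)
    # Correlation between columns; atleast_2d covers the single-column case.
    R = np.atleast_2d(np.corrcoef(Z, rowvar=False))
    mu = R.mean(axis=0)     # mean of each column of R
    sigma = R.std(axis=0)   # standard deviation of each column of R (ddof=0 assumed)
    keep = np.flatnonzero((mu < t_mu) & (sigma < t_sigma))
    return keep, Z[:, keep], R
```

A column that duplicates another column receives a high column mean in the correlation matrix and is filtered out, which is how the stage reduces collinearity among the retained wavelength points.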
4. A method according to claim 2 or 3, wherein said predetermined correlation metric is calculated by the formula:
Figure FDA0003624786200000021
where $A$ and $B$ are two vectors of length $N$, $\mu_A$ is the mean of vector $A$, $\sigma_A$ is the standard deviation of vector $A$, $\mu_B$ is the mean of vector $B$, and $\sigma_B$ is the standard deviation of vector $B$;
or, the preset correlation metric index calculation formula is as follows:
Figure FDA0003624786200000022
wherein $A$ and $B$ are two vectors of length $n$, $A_i$ is the $i$-th element of vector $A$, and $B_i$ is the $i$-th element of vector $B$.
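The formula images of claim 4 are not reproduced in this text. The symbols defined for the first formula (the means and standard deviations of two vectors of length N) are consistent with the Pearson correlation coefficient, so the sketch below assumes that form with the sample standard deviation; this is an assumption, not a transcription of the figure. The second formula is left unimplemented because its form cannot be recovered from the surrounding definitions. The function name `pearson` is hypothetical.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation of two equal-length vectors (assumed form of the first formula)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.size
    mu_a, mu_b = a.mean(), b.mean()
    # Sample standard deviations (ddof=1), matching the 1/(N-1) normalization below.
    sigma_a, sigma_b = a.std(ddof=1), b.std(ddof=1)
    # Average product of the standardized elements.
    return float(np.sum(((a - mu_a) / sigma_a) * ((b - mu_b) / sigma_b)) / (n - 1))
```

The result agrees with `numpy.corrcoef(a, b)[0, 1]` up to floating-point error, which gives a quick cross-check of the standardization.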
5. The method as claimed in claim 1, wherein the obtaining of the variable of each wavelength point to be selected in the near infrared spectrum data comprises:
acquiring original near infrared spectrum data of a target to be subjected to infrared spectrum analysis;
preprocessing the original near infrared spectrum data to enable overlapping peaks in the original near infrared spectrum data to be highlighted, and obtaining preprocessed near infrared spectrum data;
and acquiring the variable of each wavelength point to be selected according to the preprocessed near infrared spectrum data and preset sampling intervals.
6. The method of claim 5, wherein the preprocessing of the original near infrared spectrum data comprises:
and processing the original near infrared spectrum data by utilizing a Savitzky-Golay filtering fitting method.
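The preprocessing and sampling of claims 5 and 6 can be sketched with SciPy's `savgol_filter`. The window length, polynomial order, derivative order, and sampling step used here are illustrative assumptions, since the claims do not fix them, and the function name `preprocess_and_sample` is hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter


def preprocess_and_sample(raw, wavelengths, window_length=11, polyorder=2, deriv=0, step=2):
    """Savitzky-Golay filter each spectrum, then keep every `step`-th wavelength point.

    raw is an n-by-p array with one spectrum per row; wavelengths has length p.
    Setting deriv=1 or 2 returns a smoothed derivative instead of the smoothed signal.
    """
    raw = np.asarray(raw, dtype=float)
    wavelengths = np.asarray(wavelengths, dtype=float)
    # Filter along the wavelength axis of every sample.
    filtered = savgol_filter(raw, window_length=window_length,
                             polyorder=polyorder, deriv=deriv, axis=1)
    # Candidate wavelength point variables at the preset sampling interval.
    return filtered[:, ::step], wavelengths[::step]
```

The returned matrix plays the role of the candidate wavelength point variables, with one column per retained wavelength.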
7. A near infrared spectrum characteristic wavelength selection device, characterized in that it comprises:
the acquisition module is used for acquiring various wavelength point variables to be selected in the near infrared spectrum data;
the first selection module is used for selecting a first wavelength point with the correlation with the response variable larger than a first preset threshold value from the wavelength point variables according to the correlation measurement information between the wavelength point variables and the response variable;
and the second selection module is used for selecting second wavelength points, the correlations of which with other first wavelength points are smaller than a second preset threshold value, from the first wavelength points according to the correlation measurement information among the first wavelength points so as to complete the selection of the characteristic wavelengths of the near infrared spectrum.
8. A computer device comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to carry out the steps of the method of selecting characteristic wavelengths for near infrared spectra according to any of claims 1 to 6.
9. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, causes the processor to carry out the steps of the method of selecting characteristic wavelengths for near infrared spectra according to any of claims 1 to 6.
CN202210474552.0A 2022-04-29 2022-04-29 Near infrared spectrum characteristic wavelength selection method, device, equipment and storage medium Pending CN114942233A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210474552.0A CN114942233A (en) 2022-04-29 2022-04-29 Near infrared spectrum characteristic wavelength selection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210474552.0A CN114942233A (en) 2022-04-29 2022-04-29 Near infrared spectrum characteristic wavelength selection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114942233A true CN114942233A (en) 2022-08-26

Family

ID=82908104

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210474552.0A Pending CN114942233A (en) 2022-04-29 2022-04-29 Near infrared spectrum characteristic wavelength selection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114942233A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116026780A (en) * 2023-03-28 2023-04-28 江西中医药大学 Method and system for online detection of coating moisture absorption rate based on series strategy wavelength selection

Similar Documents

Publication Publication Date Title
CN110455722A (en) Rubber tree blade phosphorus content EO-1 hyperion inversion method and system
CN107958267B (en) Oil product property prediction method based on spectral linear representation
CN113049500B (en) Water quality detection model training and water quality detection method, electronic equipment and storage medium
CN110503156B (en) Multivariate correction characteristic wavelength selection method based on minimum correlation coefficient
Wang et al. Near-infrared wavelength-selection method based on joint mutual information and weighted bootstrap sampling
CN114942233A (en) Near infrared spectrum characteristic wavelength selection method, device, equipment and storage medium
Si et al. Hierarchical temperature imaging using pseudoinversed convolutional neural network aided TDLAS tomography
CN114049525A (en) Fusion neural network system, device and method for identifying gas types and concentrations
CN114112995A (en) Aerosol optical characteristic data assimilation method and device based on three-dimensional variational technology
Ortiz-Herrero et al. Multivariate (O) PLS regression methods in forensic dating
CN112990107B (en) Hyperspectral remote sensing image underwater target detection method and device and computer equipment
CN114676636A (en) Grassland area soil moisture rapid inversion method integrating vegetation and habitat characteristics
CN111896497B (en) Spectral data correction method based on predicted value
Omidikia et al. Uninformative variable elimination assisted by gram–Schmidt orthogonalization/successive projection algorithm for descriptor selection in QSAR
CN109145403B (en) Near infrared spectrum modeling method based on sample consensus
CN116399836A (en) Cross-talk fluorescence spectrum decomposition method based on alternating gradient descent algorithm
CN114739980B (en) Element information prediction method, device, equipment and medium
CN110632024B (en) Quantitative analysis method, device and equipment based on infrared spectrum and storage medium
CN112859034B (en) Natural environment radar echo amplitude model classification method and device
Shan et al. A nonlinear calibration transfer method based on joint kernel subspace
CN102057261B (en) Method and apparatus for automatic calibration of spectrometers in chemometry by means of a bayes iterative estimation method
CN114141316A (en) Method and system for predicting biological toxicity of organic matters based on spectrogram analysis
CN114398228A (en) Method and device for predicting equipment resource use condition and electronic equipment
CN113609445A (en) Multi-source heterogeneous monitoring data processing method, terminal device and readable storage medium
CN112884052A (en) Method and device for extracting structural modal parameters, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination