CN116879224A - Method for rapidly screening effective spectrum pretreatment scheme of near-infrared calibration model and application thereof
- Publication number
- CN116879224A (application number CN202310733507.7A)
- Authority
- CN
- China
- Prior art keywords
- mode
- near infrared
- spectrum
- variable
- derivative
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/28—Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
- G01N21/25—Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
- G01N21/31—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
- G01N21/35—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
- G01N21/359—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/27—Regression, e.g. linear or logistic regression
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention relates to the technical field of near infrared spectrum, in particular to a method for rapidly screening an effective spectrum pretreatment scheme of a near infrared calibration model and application thereof. The method comprises the following steps: determining the spectrum preprocessing modes, setting 1-2 variables in each mode and 2 levels for each variable, and constructing alternative spectrum preprocessing schemes accordingly; preprocessing near infrared spectrum data of a substance to be detected with each alternative spectrum preprocessing scheme, then constructing a near infrared calibration model of the content of the substance to be detected, obtaining the SEP of each calibration model, and calculating from the SEP the effect value $E_v$ of each variable on the calibration model, thereby determining an effective spectral pretreatment scheme. The method of the invention can rapidly, accurately and efficiently evaluate the effect of each spectrum processing method on the quality of the calibration model, and the screened spectral pretreatment scheme can ensure better effectiveness.
Description
Technical Field
The invention relates to the technical field of near infrared spectrum, in particular to a method for rapidly screening an effective spectrum pretreatment scheme of a near infrared calibration model and application thereof.
Background
Near infrared light is electromagnetic radiation with a wavelength between 780 and 2526 nm, lying between visible light (Vis) and mid-infrared (MIR). The near infrared spectrum consists of the overtone and combination-band absorptions of molecular vibrations. Near infrared spectrum technology uses the substance information contained in the near infrared spectrum to perform qualitative or quantitative analysis of the substance to be detected. It is quick and efficient, allows on-line detection, offers high detection precision, is low in cost and environment-friendly, and is widely applied in fields such as agriculture, the pharmaceutical industry and food.
When near infrared spectrum technology is used to determine the content of a substance in a sample, a near infrared calibration (prediction) model is first constructed for that substance, and the near infrared spectrum data acquired from the sample are then substituted into the calibration model to obtain the content. During construction of the calibration model, preprocessing the spectrum data reduces the influence of light scattering, noise interference and the like, which improves the prediction capability of the calibration model and the accuracy of the subsequent near infrared detection. The design of the spectral preprocessing scheme is therefore a key factor affecting the predictive power of the calibration model. However, an established calibration model can only be used for a specific substance in a specific sample or within the current period of time; calibration models transfer poorly between applications, and so do spectrum pretreatment schemes, which has become one of the important factors limiting the development of near infrared spectrum technology.
There are many common near infrared spectrum pretreatment methods, and several of them often have to be combined to achieve a good effect. At present, a spectrum pretreatment scheme is designed by picking a few combinations more or less at random, so the most suitable scheme cannot be guaranteed; the best calibration model obtained by such trial and error is uncertain and may not reach high accuracy. It also remains unknown how each pretreatment method affects the quality of the calibration model, that is, whether it improves the model or is a redundant processing step. If the effect of each spectrum data preprocessing mode on the calibration model could be evaluated rapidly and effectively, an effective spectrum preprocessing mode could be selected, which would be of important significance.
Disclosure of Invention
In order to solve the technical problems that the existing design method of the near infrared spectrum pretreatment scheme is difficult to ensure accuracy and cannot reflect the influence effect of the pretreatment method on the quality of the calibration model, the invention provides a method for rapidly screening an effective spectrum pretreatment scheme of a near infrared calibration model and application thereof. The method can rapidly, accurately and efficiently evaluate the influence effect of each spectrum processing method on the quality of the calibration model, and the screened spectrum pretreatment scheme can ensure better effectiveness and can ensure that the finally established near infrared calibration model has higher prediction capability.
The specific technical scheme of the invention is as follows:
in a first aspect, the invention provides a method for rapidly screening an effective spectrum pretreatment scheme of a near infrared calibration model, comprising the following steps:
(1) Determining a spectrum preprocessing mode, setting 1-2 variables in each mode, setting 2 levels in each variable, and constructing an alternative spectrum preprocessing scheme according to the modes;
(2) Preprocessing near infrared spectrum data of a substance to be detected by adopting each alternative spectrum preprocessing scheme respectively, then constructing a near infrared calibration model of the content of the substance to be detected, and obtaining a prediction standard deviation SEP (standard error of prediction) of each near infrared calibration model through the prediction of an external verification set;
(3) Calculating the effect value $E_v$ of each variable on the near infrared calibration model according to the prediction standard deviation SEP of each near infrared calibration model, the calculation formula being:
$$E_v = \overline{\mathrm{Log(SEP)}}_{n_2} - \overline{\mathrm{Log(SEP)}}_{n_1}$$
wherein $\overline{\mathrm{Log(SEP)}}_{n_2}$ and $\overline{\mathrm{Log(SEP)}}_{n_1}$ are, respectively, the mean Log(SEP) of all treatments at the $n_2$ level and the mean Log(SEP) of all treatments at the $n_1$ level of a given variable across all the alternative spectral preprocessing schemes;
(4) Screening according to $E_v$ to obtain an effective spectrum pretreatment scheme.
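Step (3) can be illustrated with a minimal sketch. The SEP values and variable names below are hypothetical and chosen only to make the arithmetic easy to follow; the base-10 logarithm is also an assumption, since the text writes Log(SEP) without stating a base.

```python
from math import log10
from statistics import mean


def effect_value(schemes, variable):
    """E_v = mean Log(SEP) at the treatment level (+) minus mean Log(SEP) at the control level (-)."""
    treated = [log10(sep) for levels, sep in schemes if levels[variable] == "+"]
    control = [log10(sep) for levels, sep in schemes if levels[variable] == "-"]
    return mean(treated) - mean(control)


# Hypothetical design: two variables, two levels each, one SEP per scheme.
schemes = [
    ({"first_derivative": "-", "snv": "-"}, 0.100),
    ({"first_derivative": "+", "snv": "-"}, 0.080),
    ({"first_derivative": "-", "snv": "+"}, 0.090),
    ({"first_derivative": "+", "snv": "+"}, 0.072),
]

for name in ("first_derivative", "snv"):
    print(name, round(effect_value(schemes, name), 4))
```

In this made-up design both variables lower the SEP, so both effect values come out negative, and the variable with the larger SEP reduction has the more negative value.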
The invention constructs a method for rapidly screening an effective spectrum pretreatment scheme of a near infrared calibration model, by setting a plurality of spectrum pretreatment modes, setting 1-2 variables in each mode (namely a specific spectrum pretreatment method in each mode), setting 2 levels for each variable, designing a permutation and combination, further calculating the influence effect of each variable on the quality of the calibration model, and screening an effective spectrum data pretreatment mode according to the influence effect. By the method, the number of times of combination can be reduced, and the influence effect of variables (spectrum processing methods) on the quality of the calibration model can be rapidly, accurately and efficiently evaluated, so that technical support is provided for the establishment of a near-infrared calibration model of a substance to be detected, and the prediction capability of the near-infrared calibration model is effectively improved.
In the present invention, the calculated $E_v$ characterizes the effect of each variable on the established near infrared calibration model. A negative $E_v$ indicates that the variable is an effective spectrum pretreatment method that improves the quality of the calibration model; a positive $E_v$ indicates that the variable is a spectral preprocessing method that degrades the quality of the calibration model; an $E_v$ whose absolute value is close to zero indicates that the variable is an ineffective spectral pretreatment method. The coefficient of determination RSQ can also represent the prediction capability of the near infrared calibration model to some extent, but its accuracy is limited: a calibration model with a high RSQ does not necessarily predict well when verified on new prediction samples. By contrast, the $E_v$ used in the present invention reflects the effect of each spectrum processing method on the quality of the calibration model and evaluates the prediction capability of the calibration model more accurately.
Preferably, the specific process of step (1) comprises the following steps:
(1.1) determining the modes of spectral pretreatment, denoted $X_1$, $X_2$, ..., $X_k$, wherein k is the total number of spectral preprocessing modes;
(1.2) setting 1-2 variables in each of modes $X_1$, $X_2$, ..., $X_k$, denoted $X_{i-j}$, where i is the i-th mode, i = 1, 2, 3, ..., k, and j is the j-th variable, j = 1 or 2;
(1.3) setting two levels, $n_1$ and $n_2$, for each variable, where $n_1$ is the control and $n_2$ is the treatment;
(1.4) preprocessing the near infrared spectrum data with modes $X_1$, $X_2$, ..., $X_k$ in sequence, wherein 1-2 variables are set in each mode according to step (1.2) and two levels are set for each variable according to step (1.3), thereby constructing S alternative spectrum preprocessing schemes.
When 1 variable is set in each mode, $S = 2^k$; when 2 variables are set in each mode, $S = 2^k \times 2^k$; when 2 variables are set in m modes and 1 variable is set in the remaining k-m modes, $S = 2^k \times 2^m$.
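The three cases above all amount to a two-level full factorial over the total number of variables. The following sketch, with a hypothetical helper name, counts the combinations by brute-force enumeration so the closed-form expressions can be checked against it.

```python
from itertools import product


def full_factorial_size(k, m):
    """Number of two-level combinations when m of the k modes carry 2 variables and the rest carry 1."""
    n_variables = 2 * m + (k - m)  # k + m variables in total
    return len(list(product("-+", repeat=n_variables)))


print(full_factorial_size(3, 0))  # 1 variable in every mode
print(full_factorial_size(3, 3))  # 2 variables in every mode
print(full_factorial_size(3, 2))  # 2 variables in 2 modes, 1 variable in the third
```

The last call corresponds to the layout used in Examples 1 and 2 (k = 3, m = 2).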
Preferably, the specific process of step (4) comprises the following steps: in each mode, selecting the variables whose $E_v$ is negative, the spectrum pretreatment scheme thus obtained being an effective spectrum pretreatment scheme; in each mode, selecting a variable whose $E_v$ is negative and, when the $E_v$ of both variables in a mode is negative, selecting the variable whose $E_v$ has the larger absolute value, the spectrum pretreatment scheme thus obtained being the optimal spectrum pretreatment scheme.
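The selection rule can be sketched as follows. The function name, the effect values and the `tolerance` parameter are all hypothetical: the text says only that an $E_v$ close to zero marks an ineffective method and gives no numeric threshold, so the cutoff used here is an assumption.

```python
def screen(effects_by_mode, tolerance=0.0):
    """Return (effective, optimal) variable choices per mode from their E_v values.

    A variable counts as effective when its E_v is below -tolerance.
    The optimal variable of a mode is the effective one with the most negative E_v.
    """
    effective, optimal = {}, {}
    for mode, effects in effects_by_mode.items():
        useful = {var: e for var, e in effects.items() if e < -tolerance}
        if useful:
            effective[mode] = sorted(useful)
            optimal[mode] = min(useful, key=useful.get)
    return effective, optimal


# Hypothetical effect values for illustration only.
example = {
    "derivative": {"first derivative": -0.10, "second derivative": -0.15},
    "scatter correction": {"MSC": -0.04, "SNV": 0.02},
    "smoothing": {"primary smoothing": -0.005},
}
effective, optimal = screen(example, tolerance=0.01)
print(effective)
print(optimal)
```

With the assumed tolerance, the smoothing mode drops out entirely because its only variable has an $E_v$ too close to zero, while the derivative mode keeps both variables as effective and the second derivative as optimal.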
Preferably, in step (1.1), the mode of the spectrum pretreatment is one or more of derivative treatment, light scattering correction and smoothing treatment, and k=1 to 3.
Preferably, in step (1.1), k=3, and modes $X_1$, $X_2$ and $X_3$ are, in this order, derivative processing, light scattering correction and smoothing processing.
Preferably, in step (1.2), the variable set in the derivative processing mode is a first derivative and/or a second derivative, the variable set in the light scattering correction mode is a multiple scattering correction and/or a standard normal transformation, and the variable set in the smoothing processing mode is a primary smoothing processing and/or a secondary smoothing processing.
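For readers unfamiliar with these operations, the sketch below shows textbook forms of the two scatter corrections and a plain finite-difference derivative. These are not taken from the patent, which does not specify the algorithms or their parameters (for example the derivative gap or the smoothing window), so every function here is an illustrative assumption; smoothing is left out for the same reason.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and standard deviation."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(a - m) / s for a in spectrum]


def msc(spectra):
    """Multiplicative scatter correction: regress each spectrum on the set mean, then remove offset and slope."""
    n_points = len(spectra[0])
    ref = [mean(s[i] for s in spectra) for i in range(n_points)]
    ref_mean = mean(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        slope = sum((r - ref_mean) * (a - s_mean) for r, a in zip(ref, s)) / sxx
        intercept = s_mean - slope * ref_mean
        corrected.append([(a - intercept) / slope for a in s])
    return corrected


def derivative(spectrum, order=1):
    """Finite-difference derivative with unit spacing; order=2 applies the difference twice."""
    out = list(spectrum)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


base = [0.1, 0.4, 0.9, 0.5, 0.2]
scattered = [2 * a + 0.3 for a in base]  # same shape, different offset and scale
print(snv(base))
print(msc([base, scattered]))
print(derivative([1, 4, 9, 16], order=2))
```

Both scatter corrections map `base` and `scattered` onto the same curve, which is the purpose of this mode: removing additive and multiplicative differences that do not carry composition information.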
Preferably, the substance to be tested is a pyrroline ring-containing compound; in step (1.2), the variables set in the derivative processing mode are the first derivative and the second derivative, the variables set in the light scattering correction mode are the multiple scattering correction and the standard normal transformation, and the variables set in the smoothing processing mode are the primary smoothing processing.
Preferably, the substance to be detected is protein, or the near infrared spectrum data of the substance to be detected contains primary or secondary frequency multiplication absorption bands of N-H stretching bonds; in step (1.2), the variables set in the derivative processing mode are a first derivative and a second derivative, the variables set in the light scattering correction mode are a multiple scattering correction, and the variables set in the smoothing processing mode are a primary smoothing processing and a secondary smoothing processing.
The invention is based on theoretical research and a large number of experiments, and can reduce the alternative range of a spectrum pretreatment method aiming at some characteristic substances, such as: when the substance to be detected is a compound containing a pyrroline ring, compared with the secondary smoothing treatment, the quality of the near infrared calibration model can be improved to a greater extent by adopting the primary smoothing treatment; when the substance to be detected is protein or the near infrared spectrum data of the substance to be detected contains a primary or secondary frequency multiplication absorption band of N-H stretching bonds, compared with standard normal transformation, the light scattering correction is carried out by adopting multi-element scattering correction, so that the quality of the near infrared calibration model can be improved to a greater extent. Based on the findings, the number of the alternative spectrum pretreatment schemes can be reduced, and the screening of the spectrum pretreatment schemes is faster and more efficient while the prediction capability of the near infrared calibration model is ensured.
Preferably, in step (1.3), the control is no treatment.
In a second aspect, the invention provides the use of said method for constructing a near infrared calibration model.
Compared with the prior art, the invention has the following advantages:
(1) The method is quick, accurate and efficient, reflects the effect of each spectrum processing method on the quality of the calibration model, screens spectrum pretreatment schemes well, and ensures the effectiveness of the screened spectrum pretreatment scheme, so that the established near infrared calibration model has higher prediction capability, which helps improve the accuracy of near infrared spectrum detection technology;
(2) Aiming at some characteristic substances, the invention shortens the alternative range of the spectrum pretreatment method while ensuring the effectiveness of the screened spectrum pretreatment scheme, so that the screening of the spectrum pretreatment scheme is faster and more efficient while ensuring the prediction capability of the near infrared calibration model.
Detailed Description
The invention is further described below with reference to examples.
The following description of the embodiments is provided to facilitate the understanding and use of the invention by those skilled in the art. It will be apparent to those skilled in the art that various modifications can readily be made to these embodiments and that the generic principles described herein may be applied to other embodiments without inventive effort. Therefore, the present invention is not limited to the following embodiments, and improvements and modifications made by those skilled in the art on the basis of the present disclosure without departing from the scope of the present invention should fall within the scope of protection of the present invention.
General examples
A method for rapidly screening an effective spectrum pretreatment scheme of a near infrared calibration model comprises the following steps:
(1) Determining the modes of spectral preprocessing, denoted $X_1$, $X_2$, ..., $X_k$, wherein k is the total number of spectral preprocessing modes;
(2) Setting 1-2 variables in each of modes $X_1$, $X_2$, ..., $X_k$, denoted $X_{i-j}$, where i is the i-th mode, i = 1, 2, 3, ..., k, and j is the j-th variable, j = 1 or 2;
(3) Setting two levels, $n_1$ and $n_2$, for each variable, where $n_1$ is the control and $n_2$ is the treatment;
(4) Preprocessing the near infrared spectrum data with modes $X_1$, $X_2$, ..., $X_k$ in sequence, wherein 1-2 variables are set in each mode according to step (2) and two levels are set for each variable according to step (3), thereby constructing S alternative spectrum preprocessing schemes;
(5) Preprocessing near infrared spectrum data of a substance to be detected by adopting each alternative spectrum preprocessing scheme obtained in the step (4), constructing a near infrared calibration model of the content of the substance to be detected by utilizing the preprocessed near infrared spectrum data, and obtaining a prediction standard deviation SEP of each near infrared calibration model through the prediction of an external verification set;
(6) Calculating the effect value $E_v$ of each variable on the near infrared calibration model according to the prediction standard deviation SEP of each near infrared calibration model obtained in step (5), the calculation formula being:
$$E_v = \overline{\mathrm{Log(SEP)}}_{n_2} - \overline{\mathrm{Log(SEP)}}_{n_1}$$
wherein $\overline{\mathrm{Log(SEP)}}_{n_2}$ is the mean Log(SEP) of all treatments at the $n_2$ level of a given variable across all the alternative spectral preprocessing schemes, and $\overline{\mathrm{Log(SEP)}}_{n_1}$ is the mean Log(SEP) of all treatments at the $n_1$ level of that variable across all the alternative spectral preprocessing schemes;
(7) Screening, according to the $E_v$ obtained in step (6), an effective spectrum pretreatment scheme from all the alternative spectrum pretreatment schemes.
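Step (5) depends on the SEP of each model, for which the text gives no formula. The sketch below uses a bias-corrected definition that is common in near infrared chemometrics; treating it as the definition intended here is an assumption, and the reference and predicted values are hypothetical.

```python
from math import sqrt
from statistics import mean


def sep(reference, predicted):
    """Bias-corrected standard error of prediction on an external validation set."""
    residuals = [p - r for r, p in zip(reference, predicted)]
    bias = mean(residuals)
    n = len(residuals)
    return sqrt(sum((e - bias) ** 2 for e in residuals) / (n - 1))


# Hypothetical reference values and model predictions for four validation samples.
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.2, 1.9, 3.1, 3.8]
print(round(sep(reference, predicted), 4))
```

Under this definition a constant offset between predictions and reference values contributes to the bias but not to the SEP, so the SEP measures the scatter of the prediction errors rather than their mean.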
As a specific embodiment, the specific process of step (7) includes the following steps: in each mode, selecting the variables whose $E_v$ is negative, the spectrum pretreatment scheme thus obtained being an effective spectrum pretreatment scheme; in each mode, selecting a variable whose $E_v$ is negative and, when the $E_v$ of both variables in a mode is negative, selecting the variable whose $E_v$ has the larger absolute value, the spectrum pretreatment scheme thus obtained being the optimal spectrum pretreatment scheme.
As a specific embodiment, in step (1), the mode of the spectrum pretreatment is one or more of derivative treatment, light scattering correction and smoothing treatment, and k=1 to 3.
In step (1), k=3, and modes $X_1$, $X_2$ and $X_3$ are, in this order, derivative processing, light scattering correction and smoothing processing.
In step (2), the variables set in the derivative processing mode are first-order derivatives and/or second-order derivatives, the variables set in the light scattering correction mode are multiple scattering correction and/or standard normal transformation, and the variables set in the smoothing processing mode are primary smoothing processing and/or secondary smoothing processing.
As a specific embodiment, the substance to be tested is a compound containing a pyrroline ring; in the step (2), the variables set in the derivative processing mode are the first derivative and the second derivative, the variables set in the light scattering correction mode are the multiple scattering correction and the standard normal transformation, and the variables set in the smoothing processing mode are the primary smoothing processing.
As another specific embodiment, the near infrared spectrum data of the substance to be detected contains a primary or secondary frequency multiplication absorption band of an N-H stretching bond; in the step (2), the variables set in the derivative processing mode are the first derivative and the second derivative, the variables set in the light scattering correction mode are the multiple scattering correction, and the variables set in the smoothing processing mode are the primary smoothing processing and the secondary smoothing processing.
As another specific embodiment, the substance to be tested is a protein; in the step (2), the variables set in the derivative processing mode are the first derivative and the second derivative, the variables set in the light scattering correction mode are the multiple scattering correction, and the variables set in the smoothing processing mode are the primary smoothing processing and the secondary smoothing processing.
As a specific embodiment, in step (3), the control is no treatment.
The application of the method in constructing a near infrared calibration model.
Example 1
In the process of constructing a near infrared calibration model for detecting the content of 2-acetyl-1-pyrroline in rice brown rice flour, a spectrum pretreatment scheme is screened by the following steps:
(1) The modes of spectral preprocessing were determined, 3 in total, in order: mode 1 ($X_1$) is the mathematical derivative mode, mode 2 ($X_2$) is light scattering correction, and mode 3 ($X_3$) is smoothing.
(2) 1-2 variables are set in each of the three modes, as follows: mode $X_1$ has 2 variables, first-derivative and second-derivative processing; mode $X_2$ has 2 variables, multivariate scattering correction (multiplicative scatter correction, MSC) and standard normal transformation (standard normal variate, SNV); mode $X_3$ has 1 variable, primary smoothing.
(3) 2 levels are set in each mode, a control (i.e., no treatment, indicated by the symbol "-") and a treatment (indicated by the symbol "+"), as follows: in mode 1 (mathematical derivative mode), control (no treatment, -) and high level (first derivative + or second derivative +); in mode 2 (light scattering correction), control (no treatment, -) and high level (multivariate scattering correction + or standard normal transformation +); in mode 3 (smoothing), control (no treatment, -) and high level (primary smoothing +).
(4) The near infrared spectrum data are subjected in sequence to mathematical derivative, light scattering correction and smoothing, with 2 variables set in modes 1 and 2 and 1 variable set in mode 3 according to step (2), and 2 levels set for each variable according to step (3). The specific permutations and combinations are shown in Table 1 below: there are 32 combinations in total, of which 14 are repeated, leaving 18 independent combinations, i.e., 18 alternative spectral pretreatment schemes.
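The counts in step (4) can be reproduced by enumeration. The sketch below assumes that a combination is discarded when two variables of the same mode are both at the "+" level, so that each mode applies at most one method; this reading is an assumption, since Table 1 is not reproduced here, but it yields exactly the 32, 14 and 18 stated above.

```python
from itertools import product

# Modes and variables of Example 1, in processing order.
MODES = {
    "derivative": ["first derivative", "second derivative"],
    "scatter correction": ["MSC", "SNV"],
    "smoothing": ["primary smoothing"],
}
VARIABLES = [(mode, var) for mode, names in MODES.items() for var in names]


def at_most_one_per_mode(levels):
    """True when no mode has more than one of its variables at the '+' level."""
    for mode in MODES:
        used = sum(1 for (m, _), lv in zip(VARIABLES, levels) if m == mode and lv == "+")
        if used > 1:
            return False
    return True


full = list(product("-+", repeat=len(VARIABLES)))            # every +/- assignment
independent = [row for row in full if at_most_one_per_mode(row)]

print(len(full), len(full) - len(independent), len(independent))
```

Equivalently, each two-variable mode offers three choices (none, first variable, second variable) and the smoothing mode offers two, giving 3 x 3 x 2 schemes.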
(5) Collecting near infrared spectrum data of a rice brown rice calibration sample set, respectively preprocessing the near infrared spectrum data by adopting various alternative spectrum preprocessing schemes, constructing a near infrared calibration model for detecting the content of 2-acetyl-1-pyrroline in the rice brown rice by adopting the preprocessed near infrared spectrum data, and obtaining the prediction standard deviation (SEP) of each near infrared calibration model through the prediction of an external verification set as shown in the following table 1.
(6) Calculated from Table 1: for first-derivative processing in the mathematical derivative mode, the mean Log(SEP) of the treatments, $\overline{\mathrm{Log(SEP)}}_{n_2}$, is -1.0733 and that of the controls, $\overline{\mathrm{Log(SEP)}}_{n_1}$, is -0.9447, giving an $E_v$ of -0.1286; the $E_v$ of second-derivative processing is -0.1726. In the light scattering correction mode, the $E_v$ of multivariate scattering correction is -0.055 and the $E_v$ of standard normal transformation is -0.0595. In the smoothing mode, the $E_v$ of primary smoothing is -0.0111.
Here $\overline{\mathrm{Log(SEP)}}_{n_2}$ is the mean Log(SEP) of all high-level treatments of a variable across all the alternative spectral pretreatment schemes in the table; $\overline{\mathrm{Log(SEP)}}_{n_1}$ is the mean Log(SEP) of all controls of that variable across all the alternative spectral pretreatment schemes; and the effect value of the variable on the near infrared calibration model is $E_v = \overline{\mathrm{Log(SEP)}}_{n_2} - \overline{\mathrm{Log(SEP)}}_{n_1}$.
(7) The $E_v$ values of the variables indicate the following. The $E_v$ values of the 4 variables in the mathematical derivative and light scattering correction modes are all negative, indicating that all 4 variables improve the quality of the calibration model, with the 2 variables in the mathematical derivative mode having the larger effect. The $E_v$ of the smoothing mode is also negative but already close to zero, indicating that this mode has essentially no effect and need not be used when constructing the calibration model for the substance to be measured. Thus, the effective spectral pretreatment scheme is: first-derivative or second-derivative processing followed by multivariate scattering correction or standard normal transformation; the optimal spectral pretreatment scheme is: second-derivative processing followed by standard normal transformation.
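Because $E_v$ is a difference of mean logarithms, raising the base to the power $E_v$ gives the ratio of the geometric-mean SEP with the treatment to that without it. The sketch below applies this to the values reported in step (6); it assumes Log denotes the base-10 logarithm, which the text does not state.

```python
# E_v values reported in step (6) of Example 1.
E_V = {
    "first derivative": -0.1286,
    "second derivative": -0.1726,
    "multivariate scattering correction": -0.055,
    "standard normal transformation": -0.0595,
    "primary smoothing": -0.0111,
}


def sep_ratio(e_v, base=10.0):
    """Ratio of geometric-mean SEP (treatment / control) implied by an effect value."""
    return base ** e_v


ratios = {name: sep_ratio(e) for name, e in E_V.items()}
for name, ratio in ratios.items():
    print(f"{name}: SEP ratio {ratio:.3f}")
```

On this reading, the derivative treatments correspond to the largest relative reductions in SEP and primary smoothing to a reduction of only a few percent, which matches the qualitative conclusion drawn in step (7).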
The data in Table 1 were also analyzed statistically with the professional statistical software DPS, using a three-factor, unreplicated design with Duncan's new multiple range test. The F values of the 3 spectrum pretreatment modes derivative treatment, smoothing treatment and light scattering correction are 3735, 2.16 and 653, corresponding to extremely significant, not significant and extremely significant differences, respectively. That is, derivative processing and light scattering correction are effective modes that improve the accuracy of the calibration model of the substance to be detected, whereas the effect of smoothing is not significant and smoothing may be omitted. This result is consistent with what the $E_v$ values calculated in this example reflect.
TABLE 1
Example 2
In constructing a near-infrared calibration model for detecting the protein content of brown rice flour, the spectral pretreatment scheme was screened by the following steps:
(1) The spectral pretreatment modes were determined, 3 in total, applied in the following order: mode 1 (X_1) is mathematical derivative, mode 2 (X_2) is light scattering correction, and mode 3 (X_3) is smoothing.
(2) Each of the three modes was assigned 1-2 variables, as follows: mode X_1 has 2 variables, first derivative and second derivative; mode X_2 has 1 variable, multiplicative scatter correction (MSC); mode X_3 has 2 variables, primary smoothing and secondary smoothing.
(3) Each variable was set to 2 levels, a control (no treatment, denoted "-") and a treatment (denoted "+"), as follows: mode 1 (mathematical derivative) has the control (no treatment, -) and the treatment (first derivative + or second derivative +); mode 2 (light scattering correction) has the control (no treatment, -) and the treatment (MSC +); mode 3 (smoothing) has the control (no treatment, -) and the treatment (primary smoothing + or secondary smoothing +).
(4) The near-infrared spectral data were subjected to mathematical derivative, light scattering correction, and smoothing in sequence, with 2 variables set in each of modes 1 and 3 and 1 variable set in mode 2 according to step (2), and 2 levels set for each variable according to step (3). The specific permutations and combinations are shown in Table 2 below: there were 32 combinations in total, of which 14 were repeats and 18 were independent, giving 18 alternative spectral pretreatment schemes.
(5) Near-infrared spectral data of a brown rice calibration sample set were collected and pretreated with each alternative spectral pretreatment scheme; a near-infrared calibration model for detecting the protein content of brown rice was constructed from each set of pretreated data; and the standard error of prediction (SEP) of each near-infrared calibration model was obtained by prediction on an external validation set, as shown in Table 2 below.
(6) Calculated from Table 2: for first-derivative treatment in the mathematical derivative mode, the average Log(SEP) of the treatment, $\overline{\mathrm{Log(SEP)}}_{n_2}$, is -0.5338 and that of the control, $\overline{\mathrm{Log(SEP)}}_{n_1}$, is -0.5187, giving an E_v of -0.015; the E_v of second-derivative treatment is -0.0135. The E_v of multiplicative scatter correction (MSC) in the light scattering correction mode is -0.094. The E_v values of primary and secondary smoothing in the smoothing mode are 0.005 and 0.006, respectively.
In the above, $\overline{\mathrm{Log(SEP)}}_{n_2}$ is the average Log(SEP) of all high levels of a variable across all alternative spectral pretreatment schemes; $\overline{\mathrm{Log(SEP)}}_{n_1}$ is the average Log(SEP) of all controls of that variable across all alternative spectral pretreatment schemes; and the effect value of the variable on the near-infrared calibration model is $E_v = \overline{\mathrm{Log(SEP)}}_{n_2} - \overline{\mathrm{Log(SEP)}}_{n_1}$.
(7) The E_v values of the variables indicate the following. The E_v values of the 3 variables in the mathematical derivative and light scattering correction modes are all negative, indicating that all 3 variables improve the quality of the calibration model, with multiplicative scatter correction (MSC) having the larger effect. The E_v values of primary and secondary smoothing in the smoothing mode are close to zero, indicating that this mode has essentially no effect and need not be used when constructing the calibration model for the substance to be measured. Thus, the effective spectral pretreatment scheme is: first- or second-derivative treatment followed by MSC; the optimal spectral pretreatment scheme is: first-derivative treatment followed by MSC.
The data in Table 2 were statistically analyzed with the professional statistical software DPS, using Duncan's new multiple range test for a three-factor design without replication. The F values of the 3 spectral pretreatment modes, namely derivative treatment, smoothing, and light scattering correction, were 90.4, 1.3, and 551, corresponding to extremely significant, not significant, and extremely significant differences, respectively. That is, derivative treatment and light scattering correction are effective modes that can improve the accuracy of the calibration model for the substance to be measured, whereas the effect of smoothing is not significant and smoothing need not be employed. This result is consistent with the E_v values calculated above.
TABLE 2
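The combination count in step (4) of this example can be reproduced with a short enumeration. The sketch below assumes, as one reading that the text does not spell out, that a combination counts as a repeat whenever both variables of the same mode are at the "+" level at once, since each mode applies at most one of its variables. The variable names are hypothetical labels.

```python
from itertools import product

# Modes and their variables as set in step (2) of Example 2.
MODES = {
    "derivative": ["first derivative", "second derivative"],
    "light scattering correction": ["MSC"],
    "smoothing": ["primary smoothing", "secondary smoothing"],
}
VARIABLES = [v for names in MODES.values() for v in names]

def is_independent(levels):
    """Assumed rule: at most one variable per mode may be at the '+' level."""
    return all(sum(levels[v] for v in names) <= 1 for names in MODES.values())

# Full two-level factorial: every variable is either '-' (False) or '+' (True).
combinations = [dict(zip(VARIABLES, flags))
                for flags in product([False, True], repeat=len(VARIABLES))]
independent = [c for c in combinations if is_independent(c)]

print(len(combinations), len(combinations) - len(independent), len(independent))
# prints: 32 14 18
```

Under this assumed rule the enumeration gives 32 combinations, 14 repeats, and 18 independent schemes, matching the totals stated for Example 2.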
Comparative example 1
In constructing a near-infrared calibration model for detecting the 2-acetyl-1-pyrroline content of brown rice flour, the spectral pretreatment scheme was screened by the following steps:
(1) The spectral pretreatment modes were determined, 3 in total, applied in the following order: mode 1 (X_1) is mathematical derivative, mode 2 (X_2) is light scattering correction, and mode 3 (X_3) is smoothing.
(2) Each of the three modes was assigned 1-2 variables, as follows: mode X_1 was set to no treatment; mode X_2 has 2 variables, multiplicative scatter correction (MSC) and standard normal variate (SNV) transformation; mode X_3 has 2 variables, primary smoothing and secondary smoothing.
(3) Each variable was set to 2 levels, a control (no treatment, denoted "-") and a treatment (denoted "+"), as follows: mode 2 (light scattering correction) has the control (no treatment, -) and the high level (MSC + or SNV transformation +); mode 3 (smoothing) has the control (no treatment, -) and the high level (primary smoothing + or secondary smoothing +).
(4) The near-infrared spectral data were subjected to mathematical derivative, light scattering correction, and smoothing in sequence, with 1-2 variables set in each of modes 1, 2, and 3 according to step (2), and 2 levels set for each variable according to step (3). The specific permutations and combinations are shown under numbers (1) to (2) in Table 3 below: there were 16 combinations in total, of which 6 were repeats and 10 were independent, giving 10 alternative spectral pretreatment schemes. In addition, the 9 combinations shown under number (3) in Table 3 were set up to examine the effect of derivative treatment on improving the quality of the calibration model.
(5) Near-infrared spectral data of a brown rice calibration sample set (the same as in Example 1) were collected and pretreated with each alternative spectral pretreatment scheme; a near-infrared calibration model for detecting the 2-acetyl-1-pyrroline content of brown rice was constructed from each set of pretreated data; and the standard error of prediction (SEP) of each near-infrared calibration model was obtained by prediction on an external validation set (the same as in Example 1), as shown in Table 3 below.
(6) Calculated from Table 3: the E_v of multiplicative scatter correction (MSC) in the light scattering correction mode is -0.1014, and the E_v of standard normal variate (SNV) transformation is -0.0723. The E_v values of primary and secondary smoothing in the smoothing mode are 0.0043 and -0.0009, respectively. (The E_v values were calculated from the first 16 combinations in Table 3. In the first 16 combinations, the mathematical treatment of mode 1 was set to the control only, i.e., no treatment, for comparison with Example 1. The last 9 combinations all underwent first-derivative treatment and were compared with the first 16 combinations to examine the effect of derivative treatment on improving the quality of the calibration model.)
In the above, $\overline{\mathrm{Log(SEP)}}_{n_2}$ is the average Log(SEP) of all high levels of a variable across all alternative spectral pretreatment schemes; $\overline{\mathrm{Log(SEP)}}_{n_1}$ is the average Log(SEP) of all controls of that variable across all alternative spectral pretreatment schemes; and the effect value of the variable on the near-infrared calibration model is $E_v = \overline{\mathrm{Log(SEP)}}_{n_2} - \overline{\mathrm{Log(SEP)}}_{n_1}$.
(7) The E_v values of the variables indicate the following. The E_v values of the 2 variables in the light scattering correction mode are both negative, so both improve the quality of the calibration model. The E_v of secondary smoothing is also negative but close to zero, and the E_v of primary smoothing is positive, indicating that the smoothing mode has essentially no effect and need not be used when constructing the calibration model for the substance to be measured. This result agrees with the E_v values calculated in Example 1, which showed that light scattering correction is effective and that smoothing is not. The coefficient of determination (RSQ) of the calibration models shows the same: in this comparative example, when X_1 received no mathematical treatment, the highest RSQ was 0.934, significantly lower than the RSQ values obtained after first-derivative treatment in this comparative example, which were all greater than 0.950. This again verifies the result of Example 1 that mathematical treatment with the first or second derivative is effective for improving the calibration model.
TABLE 3
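The two light scattering correction variables named in step (2), MSC and SNV, can be illustrated with their commonly used textbook definitions. The sketch below is an assumption about how these treatments are usually computed, not the procedure of the software used in the patent; the function names and the toy spectra are hypothetical.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own
    mean and sample standard deviation."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(x - m) / s for x in spectrum]

def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum x is fitted as x = a + b * reference by least squares,
    then corrected to (x - a) / b.
    """
    reference = [mean(column) for column in zip(*spectra)]
    ref_mean = mean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for x in spectra:
        x_mean = mean(x)
        b = sum((r - ref_mean) * (v - x_mean)
                for r, v in zip(reference, x)) / ref_ss
        a = x_mean - b * ref_mean
        corrected.append([(v - a) / b for v in x])
    return corrected

# Hypothetical absorbance values: the second spectrum is the first one
# with an added offset and a multiplicative scale.
spectra = [[1.0, 2.0, 3.0], [3.0, 5.0, 7.0]]
snv_spectra = [snv(s) for s in spectra]
msc_spectra = msc(spectra)
```

In this toy case both treatments map the two spectra onto the same curve, because they differ only by an offset and a scale, which is the kind of scatter-related variation these corrections are meant to remove.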
Comparative example 2
In constructing a near-infrared calibration model for detecting the protein content of brown rice flour, the spectral pretreatment scheme was screened by the following steps:
(1) The spectral pretreatment modes were determined, 3 in total, applied in the following order: mode 1 (X_1) is mathematical derivative, mode 2 (X_2) is light scattering correction, and mode 3 (X_3) is smoothing.
(2) Each of the three modes was assigned 1-2 variables, as follows: mode X_1 was set to no treatment; mode X_2 has 2 variables, multiplicative scatter correction (MSC) and standard normal variate (SNV) transformation; mode X_3 has 1 variable, primary smoothing.
(3) Each variable was set to 2 levels, a control (no treatment, denoted "-") and a treatment (denoted "+"), as follows: mode 2 (light scattering correction) has the control (no treatment, -) and the high level (MSC + or SNV transformation +); mode 3 (smoothing) has the control (no treatment, -) and the high level (primary smoothing + or secondary smoothing +).
(4) The near-infrared spectral data were subjected to mathematical derivative, light scattering correction, and smoothing in sequence, with 1-2 variables set in each of modes 1, 2, and 3 according to step (2), and 2 levels set for each variable according to step (3). The specific permutations and combinations are shown under numbers (1) to (2) in Table 4 below: there were 16 combinations in total, of which 7 were repeats and 9 were independent, giving 9 alternative spectral pretreatment schemes.
(5) Near-infrared spectral data of a brown rice calibration sample set were collected and pretreated with each alternative spectral pretreatment scheme; a near-infrared calibration model for detecting the protein content of brown rice was constructed from each set of pretreated data; and the standard error of prediction (SEP) of each near-infrared calibration model was obtained by prediction on an external validation set, as shown in Table 4 below. In addition, the 9 combinations shown under number (3) in Table 4 were set up to examine the effect of derivative treatment on improving the quality of the calibration model.
(6) Calculated from Table 4: the E_v of multiplicative scatter correction (MSC) in the light scattering correction mode is -0.0598, and the E_v of standard normal variate (SNV) transformation is -0.0286. The E_v values of primary and secondary smoothing in the smoothing mode are 0.0037 and 0.0025, respectively. (The E_v values were calculated from the first 16 combinations in Table 4. In the first 16 combinations, the mathematical treatment of mode 1 was set to the control only, i.e., no treatment, for comparison with Comparative Example 1. The last 9 combinations all underwent first-derivative treatment and were compared with the first 16 combinations to examine the effect of derivative treatment on improving the quality of the calibration model.)
In the above, $\overline{\mathrm{Log(SEP)}}_{n_2}$ is the average Log(SEP) of all high levels of a variable across all alternative spectral pretreatment schemes; $\overline{\mathrm{Log(SEP)}}_{n_1}$ is the average Log(SEP) of all controls of that variable across all alternative spectral pretreatment schemes; and the effect value of the variable on the near-infrared calibration model is $E_v = \overline{\mathrm{Log(SEP)}}_{n_2} - \overline{\mathrm{Log(SEP)}}_{n_1}$.
(7) The E_v values of the variables indicate the following. The E_v values of the 2 variables in the light scattering correction mode are both negative, so both improve the quality of the calibration model, and multiplicative scatter correction (MSC) performs better than standard normal variate (SNV) transformation. The E_v values of primary and secondary smoothing are positive, indicating that the smoothing mode has essentially no effect and need not be used when constructing the calibration model for the substance to be measured. This result agrees with the E_v values calculated in Example 2, which showed that light scattering correction is effective and that smoothing is not. The coefficient of determination (RSQ) of the calibration models shows the same: in this comparative example, when X_1 received no mathematical treatment, the highest RSQ was 0.9352, significantly lower than the mean RSQ of 0.9456 obtained after first-derivative treatment in this comparative example. This again verifies the result of Example 1 that mathematical treatment with the first or second derivative is effective for improving the calibration model.
TABLE 4
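The two model-quality indices used in the examples, SEP and RSQ, are not defined in the text. The sketch below uses one common convention as an assumption: SEP as the bias-corrected standard deviation of the prediction residuals on the external validation set, and RSQ as the squared Pearson correlation between reference and predicted values. The software used in the patent may define them differently, and the sample values are hypothetical.

```python
import math
from statistics import mean

def sep(reference, predicted):
    """Bias-corrected standard error of prediction (one common definition)."""
    residuals = [p - r for r, p in zip(reference, predicted)]
    bias = mean(residuals)
    return math.sqrt(sum((e - bias) ** 2 for e in residuals)
                     / (len(residuals) - 1))

def rsq(reference, predicted):
    """Squared Pearson correlation between reference and predicted values."""
    r_mean, p_mean = mean(reference), mean(predicted)
    cov = sum((r - r_mean) * (p - p_mean) for r, p in zip(reference, predicted))
    r_ss = sum((r - r_mean) ** 2 for r in reference)
    p_ss = sum((p - p_mean) ** 2 for p in predicted)
    return cov ** 2 / (r_ss * p_ss)

# Hypothetical reference and predicted contents for a four-sample validation set.
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
sep_value = sep(reference, predicted)
rsq_value = rsq(reference, predicted)
```

A lower SEP and a higher RSQ both point to a better calibration model, which is the direction in which the comparative examples read these two indices.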
In the present invention, unless otherwise indicated, scientific and technical terms used herein have the meanings commonly understood by one of ordinary skill in the art. Moreover, the experimental methods used herein, unless otherwise specified, are all conventional; the reagents, biological materials and apparatus used, unless otherwise indicated, are all commercially available.
The foregoing description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and any simple modification, variation and equivalent transformation of the above embodiment according to the technical substance of the present invention still fall within the scope of the technical solution of the present invention.
Claims (10)
1. A method for rapidly screening an effective spectrum pretreatment scheme of a near infrared calibration model, which is characterized by comprising the following steps:
(1) Determining a spectrum preprocessing mode, setting 1-2 variables in each mode, setting 2 levels in each variable, and constructing an alternative spectrum preprocessing scheme according to the modes;
(2) Preprocessing the near-infrared spectral data of the substance to be measured with each alternative spectral pretreatment scheme, then constructing a near-infrared calibration model for the content of the substance to be measured, and obtaining the standard error of prediction (SEP) of each near-infrared calibration model by prediction on an external validation set;
(3) Calculating the effect value E_v of each variable on the near-infrared calibration model from the SEP of each near-infrared calibration model, using the following formula:
$E_v = \overline{\mathrm{Log(SEP)}}_{n_2} - \overline{\mathrm{Log(SEP)}}_{n_1}$
wherein $\overline{\mathrm{Log(SEP)}}_{n_2}$ and $\overline{\mathrm{Log(SEP)}}_{n_1}$ are, respectively, the average Log(SEP) of all n_2-level treatments and the average Log(SEP) of all n_1-level treatments of a variable across all alternative spectral pretreatment schemes;
(4) Screening according to E_v to obtain an effective spectral pretreatment scheme.
2. The method of claim 1, wherein the specific process of step (1) comprises the steps of:
(1.1) determining the spectral pretreatment modes, denoted X_1, X_2, ..., X_k, where k is the total number of spectral pretreatment modes;
(1.2) setting 1-2 variables for each of modes X_1, X_2, ..., X_k, denoted X_{i-j}, where i denotes the i-th mode, i = 1, 2, 3, ..., k, and j denotes the j-th variable, j = 1 or 2;
(1.3) setting two levels, n_1 and n_2, for each variable, where n_1 is the control and n_2 is the treatment;
(1.4) preprocessing the near-infrared spectral data with modes X_1, X_2, ..., X_k in sequence, with 1-2 variables set for each mode according to step (1.2) and two levels set for each variable according to step (1.3), thereby constructing S alternative spectral pretreatment schemes.
3. The method of claim 1, wherein the specific process of step (4) comprises the steps of: selecting, in each mode, the variables whose E_v is negative, the resulting spectral pretreatment scheme being an effective spectral pretreatment scheme; and selecting, in each mode, the variable whose E_v is negative and, when the E_v of every variable in a mode is negative, selecting the variable with the larger absolute E_v in that mode, the resulting spectral pretreatment scheme being the optimal spectral pretreatment scheme.
4. The method of claim 2, wherein in step (1.1), the mode of spectral preprocessing is one or more of derivative processing, light scattering correction, and smoothing processing, and k = 1-3.
5. The method of claim 4, wherein in step (1.1), k = 3, and modes X_1, X_2, and X_3 are, in order, derivative processing, light scattering correction, and smoothing processing.
6. The method according to claim 4 or 5, wherein in step (1.2), the variable set in the derivative processing mode is the first derivative and/or the second derivative, the variable set in the light scattering correction mode is multiplicative scatter correction (MSC) and/or standard normal variate (SNV) transformation, and the variable set in the smoothing processing mode is primary smoothing and/or secondary smoothing.
7. The method of claim 6, wherein the substance to be measured is a pyrroline ring-containing compound; in the step (2), the variables set in the derivative processing mode are the first derivative and the second derivative, the variables set in the light scattering correction mode are MSC and SNV transformation, and the variable set in the smoothing processing mode is primary smoothing.
8. The method of claim 6, wherein the substance to be measured is a protein, or the near-infrared spectral data of the substance to be measured contain first or second overtone absorption bands of N-H bond stretching; in step (1.2), the variables set in the derivative processing mode are the first derivative and the second derivative, the variable set in the light scattering correction mode is MSC, and the variables set in the smoothing processing mode are primary smoothing and secondary smoothing.
9. The method of claim 2, wherein in step (1.3), the control is no treatment.
10. Use of the method according to one of claims 1 to 9 for constructing a near infrared calibration model.
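The selection rule of claim 3 can be sketched as follows, applied to the E_v values reported for Example 2. The function name `screen_schemes` and the `tolerance` parameter are hypothetical additions for illustration; the claim itself only requires E_v to be negative and gives no numerical cut-off for "close to zero".

```python
def screen_schemes(effects, tolerance=0.0):
    """Apply the negative-E_v rule mode by mode.

    effects: {mode: {variable: E_v}} in processing order.
    tolerance: hypothetical cut-off; a variable counts as effective only if
    E_v < -tolerance. The default 0.0 is the plain "E_v is negative" rule.
    Returns (effective, optimal): the effective variables of each mode, and
    the single variable per mode with the largest |E_v| among them.
    """
    effective, optimal = {}, {}
    for mode, values in effects.items():
        good = {v: e for v, e in values.items() if e < -tolerance}
        if good:
            effective[mode] = sorted(good)
            optimal[mode] = min(good, key=good.get)  # most negative E_v
    return effective, optimal

# E_v values reported for Example 2 (protein in brown rice flour).
example_2 = {
    "derivative": {"first derivative": -0.015, "second derivative": -0.0135},
    "light scattering correction": {"MSC": -0.094},
    "smoothing": {"primary smoothing": 0.005, "secondary smoothing": 0.006},
}
effective, optimal = screen_schemes(example_2)
print(optimal)
# prints: {'derivative': 'first derivative', 'light scattering correction': 'MSC'}
```

With these inputs the smoothing mode drops out and the optimal scheme is the first derivative followed by MSC, which matches the conclusion stated for Example 2.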
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310733507.7A CN116879224A (en) | 2023-06-20 | 2023-06-20 | Method for rapidly screening effective spectrum pretreatment scheme of near-infrared calibration model and application thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116879224A true CN116879224A (en) | 2023-10-13 |
Family
ID=88261232
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310733507.7A Pending CN116879224A (en) | 2023-06-20 | 2023-06-20 | Method for rapidly screening effective spectrum pretreatment scheme of near-infrared calibration model and application thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116879224A (en) |
-
2023
- 2023-06-20 CN CN202310733507.7A patent/CN116879224A/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117093841A (en) * | 2023-10-18 | 2023-11-21 | 中国科学院合肥物质科学研究院 | Abnormal spectrum screening model determining method, device and medium for wheat transmission spectrum |
CN117093841B (en) * | 2023-10-18 | 2024-02-09 | 中国科学院合肥物质科学研究院 | Abnormal spectrum screening model determining method, device and medium for wheat transmission spectrum |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||