CN116879224A - Method for rapidly screening effective spectrum pretreatment scheme of near-infrared calibration model and application thereof - Google Patents

Info

Publication number
CN116879224A
Authority
CN
China
Prior art keywords
mode
near infrared
spectrum
variable
derivative
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310733507.7A
Other languages
Chinese (zh)
Inventor
胡培松
唐绍清
谢黎虹
圣忠华
胡时开
魏祥进
焦桂爱
邵高能
王玲
赵凤利
陈颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China National Rice Research Institute
Original Assignee
China National Rice Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China National Rice Research Institute
Priority to CN202310733507.7A
Publication of CN116879224A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/28 Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/359 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/27 Regression, e.g. linear or logistic regression
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The invention relates to the technical field of near infrared spectroscopy, in particular to a method for rapidly screening an effective spectrum pretreatment scheme of a near infrared calibration model and application thereof. The method comprises the following steps: determining the spectrum preprocessing modes, setting 1-2 variables in each mode and 2 levels for each variable, and constructing alternative spectrum preprocessing schemes accordingly; preprocessing near infrared spectrum data of a substance to be detected with each alternative spectrum preprocessing scheme, then constructing a near infrared calibration model of the content of the substance to be detected, obtaining the SEP of each calibration model, and calculating the effect value E_v of each variable on the calibration model from the SEP, thereby determining an effective spectral pretreatment scheme. The method of the invention can rapidly, accurately and efficiently evaluate the effect of each spectrum processing method on the quality of the calibration model, and the screened spectral pretreatment scheme is reliably effective.

Description

Method for rapidly screening effective spectrum pretreatment scheme of near-infrared calibration model and application thereof
Technical Field
The invention relates to the technical field of near infrared spectrum, in particular to a method for rapidly screening an effective spectrum pretreatment scheme of a near infrared calibration model and application thereof.
Background
Near infrared (NIR) light is electromagnetic radiation with a wavelength between 780 and 2526 nm, lying between visible light (Vis) and the mid-infrared (MIR). The near infrared spectrum consists of the overtone and combination-band absorptions of molecular vibrations. Near infrared spectroscopy uses the substance information contained in the near infrared spectrum for qualitative or quantitative analysis of the substances to be detected. It is fast and efficient, supports on-line detection, offers high detection precision at low cost, and is environmentally friendly, and it is widely applied in agriculture, the pharmaceutical industry, food and other fields.
When the content of a substance in a sample is to be measured by near infrared spectroscopy, a near infrared calibration (prediction) model is first constructed for that substance in that sample type, and the acquired near infrared spectrum data of the sample are then substituted into the calibration model to obtain the content of the substance. During construction of the near infrared calibration model, preprocessing the spectrum data reduces the influence of light scattering, noise and similar interference, which improves the prediction capability of the calibration model and therefore the accuracy of near infrared detection. The design of the spectral preprocessing scheme is thus a key factor affecting the predictive power of the calibration model. An established calibration model can only be used to detect a specific substance in a specific sample type, or within the current period of time; calibration models transfer poorly to other cases, and so do spectrum pretreatment schemes. This has become one of the important factors limiting the development of near infrared spectroscopy.
There are many common near infrared spectrum pretreatment methods, and several of them usually have to be combined to achieve a good effect. At present, a spectrum pretreatment scheme is designed by picking a few combinations more or less arbitrarily, so there is no guarantee that the most suitable pretreatment scheme is found. The best calibration model obtained by such trial and error is therefore uncertain and may not reach high accuracy, and it remains unknown how each pretreatment method affects the quality of the calibration model, that is, whether it improves the model or is redundant. A way to evaluate rapidly and effectively how each spectrum preprocessing method affects the calibration model would allow effective preprocessing methods to be selected, which is of considerable significance.
Disclosure of Invention
In order to solve the technical problems that the existing design method of the near infrared spectrum pretreatment scheme is difficult to ensure accuracy and cannot reflect the influence effect of the pretreatment method on the quality of the calibration model, the invention provides a method for rapidly screening an effective spectrum pretreatment scheme of a near infrared calibration model and application thereof. The method can rapidly, accurately and efficiently evaluate the influence effect of each spectrum processing method on the quality of the calibration model, and the screened spectrum pretreatment scheme can ensure better effectiveness and can ensure that the finally established near infrared calibration model has higher prediction capability.
The specific technical scheme of the invention is as follows:
in a first aspect, the invention provides a method for rapidly screening an effective spectrum pretreatment scheme of a near infrared calibration model, comprising the following steps:
(1) Determining a spectrum preprocessing mode, setting 1-2 variables in each mode, setting 2 levels in each variable, and constructing an alternative spectrum preprocessing scheme according to the modes;
(2) Preprocessing near infrared spectrum data of a substance to be detected by adopting each alternative spectrum preprocessing scheme respectively, then constructing a near infrared calibration model of the content of the substance to be detected, and obtaining a prediction standard deviation SEP (standard error of prediction) of each near infrared calibration model through the prediction of an external verification set;
(3) Calculating the effect value E_v of each variable on the near infrared calibration model from the prediction standard deviation SEP of each near infrared calibration model, using the formula:
$$E_v = \overline{\mathrm{Log(SEP)}}_{n_2} - \overline{\mathrm{Log(SEP)}}_{n_1}$$
wherein $\overline{\mathrm{Log(SEP)}}_{n_2}$ and $\overline{\mathrm{Log(SEP)}}_{n_1}$ are, respectively, the mean Log(SEP) over all n_2 levels and the mean Log(SEP) over all n_1 levels of a given variable across all the alternative spectral preprocessing schemes;
(4) According to E v Screening to obtain an effective spectrum pretreatment scheme.
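Step (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's own implementation: it assumes that Log means the base-10 logarithm, that a variable's control schemes are those in which its mode is left untreated, and that E_v is the treated mean minus the control mean (the reading that reproduces the figures reported in Example 1). The function name `effect_values`, the `"none"` label and the toy data are all hypothetical.

```python
import math
from statistics import mean

def effect_values(schemes, seps, control="none"):
    """Return {variable: E_v} for every non-control variable.

    schemes: list of dicts mapping mode name -> variable applied (or `control`).
    seps:    list of SEP values, one per scheme, in the same order.
    """
    log_sep = [math.log10(s) for s in seps]  # assumption: Log is log base 10
    effects = {}
    for mode in schemes[0]:
        # Assumption: the n_1 (control) level is "this mode left untreated".
        control_logs = [v for sch, v in zip(schemes, log_sep) if sch[mode] == control]
        variables = {sch[mode] for sch in schemes} - {control}
        for var in sorted(variables):
            treated_logs = [v for sch, v in zip(schemes, log_sep) if sch[mode] == var]
            effects[var] = mean(treated_logs) - mean(control_logs)
    return effects

# Hypothetical toy data (not from the patent): two modes, four schemes.
schemes = [
    {"derivative": "none", "scatter": "none"},
    {"derivative": "d1", "scatter": "none"},
    {"derivative": "none", "scatter": "msc"},
    {"derivative": "d1", "scatter": "msc"},
]
seps = [0.10, 0.01, 0.10, 0.01]
effects = effect_values(schemes, seps)
```

In this toy data the derivative lowers SEP tenfold, so its E_v is strongly negative, while the scatter correction changes nothing and its E_v is zero.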
The invention constructs a method for rapidly screening an effective spectrum pretreatment scheme of a near infrared calibration model, by setting a plurality of spectrum pretreatment modes, setting 1-2 variables in each mode (namely a specific spectrum pretreatment method in each mode), setting 2 levels for each variable, designing a permutation and combination, further calculating the influence effect of each variable on the quality of the calibration model, and screening an effective spectrum data pretreatment mode according to the influence effect. By the method, the number of times of combination can be reduced, and the influence effect of variables (spectrum processing methods) on the quality of the calibration model can be rapidly, accurately and efficiently evaluated, so that technical support is provided for the establishment of a near-infrared calibration model of a substance to be detected, and the prediction capability of the near-infrared calibration model is effectively improved.
In the present invention, the calculated E_v characterizes the effect of each variable on the near infrared calibration model being built. A negative E_v indicates that the variable is an effective spectrum pretreatment method that improves the quality of the calibration model; a positive E_v indicates that the variable is a spectrum pretreatment method that degrades the quality of the calibration model; an E_v whose absolute value is near zero indicates that the variable is an ineffective spectrum pretreatment method. The coefficient of determination RSQ can also represent the prediction capability of a near infrared calibration model to some extent, but its accuracy is limited: a calibration model with a high RSQ does not necessarily predict well when it is validated on new samples. By contrast, the E_v used in the present invention reflects the effect of each spectrum processing method on the quality of the calibration model and evaluates the prediction capability of the calibration model more accurately.
Preferably, the specific process of step (1) comprises the following steps:
(1.1) determining the modes of spectral pretreatment, denoted X_1, X_2, ..., X_k, where k is the total number of spectral preprocessing modes;
(1.2) setting 1-2 variables in each of the modes X_1, X_2, ..., X_k, denoted X_{i-j}, where i is the i-th mode, i = 1, 2, 3, ..., k, and j is the j-th variable, j = 1 or 2;
(1.3) setting two levels, n_1 and n_2, for each variable, where n_1 is the control and n_2 is the treatment;
(1.4) preprocessing the near infrared spectrum data with modes X_1, X_2, ..., X_k in sequence, wherein 1-2 variables are set in each mode according to step (1.2) and two levels are set for each variable according to step (1.3), thereby constructing S alternative spectrum preprocessing schemes.
When 1 variable is set in each mode, S = 2^k; when 2 variables are set in each mode, S = 2^k × 2^k; when 2 variables are set in m modes and 1 variable is set in the other k-m modes, S = 2^k × 2^m.
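The scheme count can be checked with a short enumeration. The sketch below is illustrative only: `formula_count` evaluates S = 2^k × 2^m, while `unique_schemes` assumes that each mode applies at most one of its variables (or none), which is one reading that explains why Example 1 reports 32 combinations but only 18 independent ones. The function names and the `"none"` label are hypothetical.

```python
from itertools import product

def formula_count(k, m):
    """S = 2^k * 2^m for k modes, m of which carry two variables."""
    return 2 ** k * 2 ** m

def unique_schemes(modes):
    """Enumerate schemes in which each mode applies at most one of its variables.

    modes: dict mapping mode name -> list of its variables (1 or 2 entries).
    """
    names = list(modes)
    # Assumption: per mode, the choices are "none" plus each single variable.
    options = [["none"] + list(modes[name]) for name in names]
    return [dict(zip(names, combo)) for combo in product(*options)]

# Mode and variable names follow the configuration described for Example 1.
modes = {
    "derivative": ["first derivative", "second derivative"],
    "scatter correction": ["MSC", "SNV"],
    "smoothing": ["primary smoothing"],
}
total = formula_count(k=3, m=2)
independent = unique_schemes(modes)
```

With k = 3 and m = 2 the formula gives 32, and the enumeration yields 3 × 3 × 2 = 18 distinct schemes.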
Preferably, the specific process of step (4) comprises the following steps: in each mode, selecting the variables whose E_v is negative, the spectrum pretreatment scheme so obtained being an effective spectrum pretreatment scheme; and, in each mode, selecting a variable whose E_v is negative and, when the E_v values of both variables in a mode are negative, selecting the variable whose E_v has the larger absolute value, the spectrum pretreatment scheme so obtained being the optimal spectrum pretreatment scheme.
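The selection rule can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `select_schemes` is hypothetical, and so is the `tol` parameter, since the source describes E_v values "near zero" as ineffective without giving a numeric threshold. The E_v values are those reported in Example 1; the cut-off 0.02 is chosen here purely for illustration.

```python
def select_schemes(effects_by_mode, tol=0.0):
    """Return (effective, optimal) selections per mode.

    effects_by_mode: dict mode -> dict variable -> E_v.
    tol: hypothetical cut-off; variables with E_v >= -tol count as
         ineffective. The source gives no numeric threshold.
    """
    effective, optimal = {}, {}
    for mode, effects in effects_by_mode.items():
        good = {var: e for var, e in effects.items() if e < -tol}
        if good:
            effective[mode] = sorted(good)
            # Among negative E_v values, the most negative has the largest magnitude.
            optimal[mode] = min(good, key=good.get)
    return effective, optimal

# E_v values as reported in Example 1 of the source.
example_1 = {
    "derivative": {"first derivative": -0.1286, "second derivative": -0.1726},
    "scatter correction": {"MSC": -0.055, "SNV": -0.0595},
    "smoothing": {"primary smoothing": -0.0111},
}
effective, optimal = select_schemes(example_1, tol=0.02)
```

With this illustrative cut-off the smoothing mode drops out and the optimal choice is second derivative followed by SNV, matching the conclusion of Example 1. With `tol=0.0` the rule is applied literally and smoothing is also retained, because its E_v is still negative.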
Preferably, in step (1.1), the mode of the spectrum pretreatment is one or more of derivative treatment, light scattering correction and smoothing treatment, and k=1 to 3.
Preferably, in step (1.1), k = 3, and modes X_1, X_2 and X_3 are, in this order, derivative processing, light scattering correction, and smoothing processing.
Preferably, in step (1.2), the variable set in the derivative processing mode is a first derivative and/or a second derivative, the variable set in the light scattering correction mode is a multiple scattering correction and/or a standard normal transformation, and the variable set in the smoothing processing mode is a primary smoothing processing and/or a secondary smoothing processing.
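The source names the two light scattering corrections (multiple scattering correction, glossed later as multiplicative scatter correction, MSC, and standard normal transformation, glossed as standard normal variate, SNV) but does not give their formulas. The sketch below uses the textbook definitions as an assumption: SNV centres and scales each spectrum by its own mean and sample standard deviation, and MSC regresses each spectrum on the mean spectrum of the set and removes the fitted offset and slope. The function names are hypothetical.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    mu, sd = mean(spectrum), stdev(spectrum)
    return [(x - mu) / sd for x in spectrum]

def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum of the set."""
    n_points = len(spectra[0])
    ref = [mean(s[i] for s in spectra) for i in range(n_points)]
    ref_mu = mean(ref)
    ref_ss = sum((r - ref_mu) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mu = mean(s)
        # Least-squares fit of s = intercept + slope * ref.
        slope = sum((r - ref_mu) * (x - s_mu) for r, x in zip(ref, s)) / ref_ss
        intercept = s_mu - slope * ref_mu
        corrected.append([(x - intercept) / slope for x in s])
    return corrected

# Hypothetical toy spectra: the second is the first with doubled intensity.
toy = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
snv_out = snv(toy[0])
msc_out = msc(toy)
```

In the toy data the two spectra differ only by a multiplicative factor, so MSC maps both onto the mean spectrum.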
Preferably, the substance to be tested is a pyrroline ring-containing compound; in step (1.2), the variables set in the derivative processing mode are the first derivative and the second derivative, the variables set in the light scattering correction mode are the multiple scattering correction and the standard normal transformation, and the variables set in the smoothing processing mode are the primary smoothing processing.
Preferably, the substance to be detected is protein, or the near infrared spectrum data of the substance to be detected contains primary or secondary frequency multiplication absorption bands of N-H stretching bonds; in step (1.2), the variables set in the derivative processing mode are a first derivative and a second derivative, the variables set in the light scattering correction mode are a multiple scattering correction, and the variables set in the smoothing processing mode are a primary smoothing processing and a secondary smoothing processing.
The invention is based on theoretical research and a large number of experiments, and can reduce the alternative range of a spectrum pretreatment method aiming at some characteristic substances, such as: when the substance to be detected is a compound containing a pyrroline ring, compared with the secondary smoothing treatment, the quality of the near infrared calibration model can be improved to a greater extent by adopting the primary smoothing treatment; when the substance to be detected is protein or the near infrared spectrum data of the substance to be detected contains a primary or secondary frequency multiplication absorption band of N-H stretching bonds, compared with standard normal transformation, the light scattering correction is carried out by adopting multi-element scattering correction, so that the quality of the near infrared calibration model can be improved to a greater extent. Based on the findings, the number of the alternative spectrum pretreatment schemes can be reduced, and the screening of the spectrum pretreatment schemes is faster and more efficient while the prediction capability of the near infrared calibration model is ensured.
Preferably, in step (1.3), the control is no treatment.
In a second aspect, the invention provides the use of said method for constructing a near infrared calibration model.
Compared with the prior art, the invention has the following advantages:
(1) The method is fast, accurate and efficient, reflects the effect of each spectrum processing method on the quality of the calibration model, screens spectrum pretreatment schemes well, and ensures that the screened spectrum pretreatment scheme is effective, so that the established near infrared calibration model has higher prediction capability, which helps improve the accuracy of near infrared spectroscopic detection;
(2) Aiming at some characteristic substances, the invention shortens the alternative range of the spectrum pretreatment method while ensuring the effectiveness of the screened spectrum pretreatment scheme, so that the screening of the spectrum pretreatment scheme is faster and more efficient while ensuring the prediction capability of the near infrared calibration model.
Detailed Description
The invention is further described below with reference to examples.
The following description of the embodiments is provided to help those skilled in the art understand and use the invention. It will be apparent to those skilled in the art that various modifications can readily be made to these embodiments and that the generic principles described herein may be applied to other embodiments without inventive effort. Therefore, the present invention is not limited to the following embodiments, and improvements and modifications made by those skilled in the art on the basis of the present disclosure without departing from the scope of the present invention shall fall within its scope of protection.
General examples
A method for rapidly screening an effective spectrum pretreatment scheme of a near infrared calibration model comprises the following steps:
(1) Determining the modes of spectral preprocessing, denoted X_1, X_2, ..., X_k, where k is the total number of spectral preprocessing modes;
(2) Setting 1-2 variables in each of the modes X_1, X_2, ..., X_k, denoted X_{i-j}, where i is the i-th mode, i = 1, 2, 3, ..., k, and j is the j-th variable, j = 1 or 2;
(3) Setting two levels, n_1 and n_2, for each variable, where n_1 is the control and n_2 is the treatment;
(4) Preprocessing the near infrared spectrum data with modes X_1, X_2, ..., X_k in sequence, wherein 1-2 variables are set in each mode according to step (2) and two levels are set for each variable according to step (3), thereby constructing S alternative spectrum preprocessing schemes;
(5) Preprocessing near infrared spectrum data of a substance to be detected by adopting each alternative spectrum preprocessing scheme obtained in the step (4), constructing a near infrared calibration model of the content of the substance to be detected by utilizing the preprocessed near infrared spectrum data, and obtaining a prediction standard deviation SEP of each near infrared calibration model through the prediction of an external verification set;
(6) Calculating the effect value E_v of each variable on the near infrared calibration model from the prediction standard deviation SEP of each near infrared calibration model obtained in step (5), using the formula:
$$E_v = \overline{\mathrm{Log(SEP)}}_{n_2} - \overline{\mathrm{Log(SEP)}}_{n_1}$$
wherein $\overline{\mathrm{Log(SEP)}}_{n_2}$ is the mean Log(SEP) over all n_2 levels of a given variable across all the alternative spectral preprocessing schemes, and $\overline{\mathrm{Log(SEP)}}_{n_1}$ is the mean Log(SEP) over all n_1 levels of that variable across all the alternative spectral preprocessing schemes;
(7) E obtained according to step (6) v And screening to obtain an effective spectrum pretreatment scheme from all the alternative spectrum pretreatment schemes.
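Step (5) relies on the SEP obtained from an external validation set, but the source does not spell out how SEP is computed. The sketch below uses one common chemometrics definition as an assumption: the standard deviation of the prediction residuals after removing their mean (the bias), with n - 1 in the denominator. The function name and the sample values are hypothetical.

```python
import math

def standard_error_of_prediction(predicted, reference):
    """Bias-corrected standard deviation of prediction residuals.

    Assumption: SEP = sqrt(sum((e_i - bias)^2) / (n - 1)), with e_i the
    residual of sample i and bias the mean residual.
    """
    residuals = [p - r for p, r in zip(predicted, reference)]
    n = len(residuals)
    bias = sum(residuals) / n
    return math.sqrt(sum((e - bias) ** 2 for e in residuals) / (n - 1))

# Hypothetical validation data (not from the patent).
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
sep = standard_error_of_prediction(predicted, reference)
```

Under this definition a constant offset between predicted and reference values does not contribute to SEP; if the intended quantity is instead the root mean square error of prediction, the bias term would be dropped.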
As a specific embodiment, the specific process of step (7) includes the following steps: in each mode, selecting the variables whose E_v is negative, the spectrum pretreatment scheme so obtained being an effective spectrum pretreatment scheme; and, in each mode, selecting a variable whose E_v is negative and, when the E_v values of both variables in a mode are negative, selecting the variable whose E_v has the larger absolute value, the spectrum pretreatment scheme so obtained being the optimal spectrum pretreatment scheme.
As a specific embodiment, in step (1), the mode of the spectrum pretreatment is one or more of derivative treatment, light scattering correction and smoothing treatment, and k=1 to 3.
As a specific embodiment, in step (1), k = 3, and modes X_1, X_2 and X_3 are, in this order, derivative processing, light scattering correction, and smoothing processing.
In step (2), the variables set in the derivative processing mode are first-order derivatives and/or second-order derivatives, the variables set in the light scattering correction mode are multiple scattering correction and/or standard normal transformation, and the variables set in the smoothing processing mode are primary smoothing processing and/or secondary smoothing processing.
As a specific embodiment, the substance to be tested is a compound containing a pyrroline ring; in the step (2), the variables set in the derivative processing mode are the first derivative and the second derivative, the variables set in the light scattering correction mode are the multiple scattering correction and the standard normal transformation, and the variables set in the smoothing processing mode are the primary smoothing processing.
As another specific embodiment, the near infrared spectrum data of the substance to be detected contains a primary or secondary frequency multiplication absorption band of an N-H stretching bond; in the step (2), the variables set in the derivative processing mode are the first derivative and the second derivative, the variables set in the light scattering correction mode are the multiple scattering correction, and the variables set in the smoothing processing mode are the primary smoothing processing and the secondary smoothing processing.
As another specific embodiment, the substance to be tested is a protein; in the step (2), the variables set in the derivative processing mode are the first derivative and the second derivative, the variables set in the light scattering correction mode are the multiple scattering correction, and the variables set in the smoothing processing mode are the primary smoothing processing and the secondary smoothing processing.
As a specific embodiment, in step (3), the control is no treatment.
The application of the method in constructing a near infrared calibration model.
Example 1
In the process of constructing a near infrared calibration model for detecting the content of 2-acetyl-1-pyrroline in rice brown rice flour, a spectrum pretreatment scheme is screened by the following steps:
(1) The modes of spectral preprocessing were determined, 3 in total: in order, mode 1 (X_1) is the mathematical derivative, mode 2 (X_2) is light scattering correction, and mode 3 (X_3) is smoothing.
(2) The three modes were given 1-2 variables each, as follows: mode X_1 has 2 variables, first-derivative and second-derivative processing; mode X_2 has 2 variables, multivariate scattering correction (multiplicative scatter correction, MSC) and standard normal transformation (standard normal variate, SNV); mode X_3 has 1 variable, primary smoothing.
(3) Two levels were set in each mode, a control (no treatment, indicated by the symbol "-") and a treatment (indicated by the symbol "+"), as follows: mode 1 (mathematical derivative) has the control (no treatment, -) and the high level (first derivative + or second derivative +); mode 2 (light scattering correction) has the control (no treatment, -) and the high level (multiplicative scatter correction + or standard normal transformation +); mode 3 (smoothing) has the control (no treatment, -) and the high level (primary smoothing +).
(4) The near infrared spectrum data were subjected in sequence to the mathematical derivative, light scattering correction and smoothing; according to step (2), 2 variables were set in each of modes 1 and 2 and 1 variable in mode 3, and according to step (3), 2 levels were set for each variable. The specific permutations and combinations are shown in Table 1 below: there are 32 combinations in total, of which 14 are repeats, leaving 18 independent combinations, i.e., 18 alternative spectral pretreatment schemes.
(5) Collecting near infrared spectrum data of a rice brown rice calibration sample set, respectively preprocessing the near infrared spectrum data by adopting various alternative spectrum preprocessing schemes, constructing a near infrared calibration model for detecting the content of 2-acetyl-1-pyrroline in the rice brown rice by adopting the preprocessed near infrared spectrum data, and obtaining the prediction standard deviation (SEP) of each near infrared calibration model through the prediction of an external verification set as shown in the following table 1.
(6) Calculated from Table 1, in the mathematical derivative mode the mean Log(SEP) of first-derivative processing, $\overline{\mathrm{Log(SEP)}}_{n_2}$, is -1.0733 and that of the control, $\overline{\mathrm{Log(SEP)}}_{n_1}$, is -0.9447, giving E_v = -0.1286; the E_v of second-derivative processing is -0.1726. In the light scattering correction mode, the E_v of multiplicative scatter correction is -0.055 and the E_v of standard normal transformation is -0.0595. In the smoothing mode, the E_v of primary smoothing is -0.0111.
Here $\overline{\mathrm{Log(SEP)}}_{n_2}$ is the mean Log(SEP) over all high levels of a variable across all the alternative spectral pretreatment schemes in the table; $\overline{\mathrm{Log(SEP)}}_{n_1}$ is the mean Log(SEP) over all controls for this variable across all the alternative spectral pretreatment schemes; and the effect value of the variable on the near infrared calibration model is $E_v = \overline{\mathrm{Log(SEP)}}_{n_2} - \overline{\mathrm{Log(SEP)}}_{n_1}$.
(7) The E_v values of the variables indicate the following. The E_v values of the 4 variables in the mathematical derivative and light scattering correction modes are all negative, which shows that all 4 variables improve the quality of the calibration model, with the 2 variables in the mathematical derivative mode having the larger effect. The E_v of the smoothing mode is also negative but already near zero, which shows that this mode has essentially no effect and need not be used when constructing the calibration model for the substance to be measured. Thus, the effective spectral pretreatment scheme is: first-order or second-order derivative processing, followed by multiplicative scatter correction or standard normal transformation; the optimal spectral pretreatment scheme is: second-derivative processing followed by standard normal transformation.
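As a check on step (6), the first-derivative effect value can be recomputed from the two means quoted there. The last line reads E_v as a ratio of geometric-mean SEPs; this holds only under the assumption that Log denotes the base-10 logarithm, which the source does not state.

```latex
\begin{align*}
E_v(\text{first derivative})
  &= \overline{\mathrm{Log(SEP)}}_{n_2} - \overline{\mathrm{Log(SEP)}}_{n_1} \\
  &= -1.0733 - (-0.9447) = -0.1286, \\
10^{E_v} &= 10^{-0.1286} \approx 0.74 .
\end{align*}
```

Under that assumption, first-derivative processing lowers the geometric-mean SEP to roughly 74% of the control value.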
The data in Table 1 were analysed statistically with the statistical software DPS, using Duncan's new multiple range test for a three-factor design without replication. The F values of the 3 spectrum pretreatment modes (derivative processing, smoothing, and light scattering correction) are 3735, 2.16 and 653, and the differences are highly significant, not significant and highly significant, respectively. That is, derivative processing and light scattering correction are effective modes that improve the accuracy of the calibration model for the substance to be detected, whereas the effect of smoothing is not significant and smoothing need not be used. This result is consistent with what the E_v values calculated in this example indicate.
TABLE 1
Example 2
In the process of constructing a near infrared calibration model for detecting protein content in rice brown rice flour, a spectral pretreatment scheme is screened by the following steps:
(1) The modes of spectral preprocessing were determined, 3 in total: in order, mode 1 (X_1) is the mathematical derivative, mode 2 (X_2) is light scattering correction, and mode 3 (X_3) is smoothing.
(2) The three modes were given 1-2 variables each, as follows: mode X_1 has 2 variables, first-derivative and second-derivative processing; mode X_2 has 1 variable, multiplicative scatter correction; mode X_3 has 2 variables, primary smoothing and secondary smoothing.
(3) Two levels were set in each mode, a control (no treatment, indicated by the symbol "-") and a treatment (indicated by the symbol "+"), as follows: mode 1 (mathematical derivative) has the control (no treatment, -) and the treatment (first derivative + or second derivative +); mode 2 (light scattering correction) has the control (no treatment, -) and the treatment (multiplicative scatter correction +); mode 3 (smoothing) has the control (no treatment, -) and the treatment (primary smoothing + or secondary smoothing +).
(4) The near infrared spectrum data were subjected in sequence to the mathematical derivative, light scattering correction and smoothing; according to step (2), 2 variables were set in each of modes 1 and 3 and 1 variable in mode 2, and according to step (3), 2 levels were set for each variable. The specific permutations and combinations are shown in Table 2 below: there are 32 combinations in total, of which 14 are repeats, leaving 18 independent combinations, i.e., 18 alternative spectral pretreatment schemes.
(5) Collecting near infrared spectrum data of a rice brown rice calibration sample set, respectively preprocessing the near infrared spectrum data by adopting various alternative spectrum preprocessing schemes, constructing a near infrared calibration model for detecting protein content in the rice brown rice by adopting the preprocessed near infrared spectrum data, and obtaining prediction standard deviation (SEP) of each near infrared calibration model through the prediction of an external verification set as shown in the following table 2.
(6) From Table 2, the mean Log(SEP) for first-derivative processing in the mathematical derivative mode ($\bar{n}_2$) was -0.5338 and that of its control ($\bar{n}_1$) was -0.5187, giving an E_v value of -0.015; the E_v value for second-derivative processing was -0.0135. The E_v value for multiplicative scatter correction in the light scattering correction mode was -0.094. The E_v values for primary and secondary smoothing in the smoothing mode were 0.005 and 0.006, respectively.
Here $\bar{n}_2$ is the mean Log(SEP) of all high-level (treatment) runs of a variable across all alternative spectral pretreatment schemes; $\bar{n}_1$ is the mean Log(SEP) of all control runs of that variable across all alternative spectral pretreatment schemes; and the effect value of the variable on the near infrared calibration model is $E_v = \bar{n}_2 - \bar{n}_1$.
(7) The E_v values of the variables show the following. The E_v values of the 3 variables in the mathematical derivative and light scattering correction modes are all negative, indicating that all 3 variables improve the quality of the calibration model, with multiplicative scatter correction having the largest effect. The E_v values of primary and secondary smoothing in the smoothing mode are close to zero, indicating that this mode has essentially no effect and need not be used when constructing the calibration model for the substance to be measured. The effective spectral pretreatment scheme is therefore first- or second-derivative processing followed by multiplicative scatter correction; the optimal spectral pretreatment scheme is first-derivative processing followed by multiplicative scatter correction.
The data in Table 2 were also analyzed with the statistical software DPS, using a three-factor unreplicated design with Duncan's new multiple range test. The F values for the 3 spectral pretreatment modes (derivative processing, smoothing and light scattering correction) were 90.4, 1.3 and 551, respectively, corresponding to highly significant, not significant and highly significant differences. That is, derivative processing and light scattering correction are effective modes that improve the accuracy of the calibration model for the substance to be measured, whereas the effect of smoothing is not significant and smoothing need not be used. This result is consistent with the calculated E_v values.
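The selection logic of step (7) can be sketched in a few lines. This is a minimal illustration rather than the applicants' software: the dictionary layout, the function name `screen_variables` and the optional tolerance `tol` are assumptions introduced here, and the E_v values are those reported in step (6).

```python
# E_v values reported in step (6) of this example, grouped by mode.
# The data layout and the function name are illustrative assumptions.
EV_EXAMPLE_1 = {
    "derivative": {"first derivative": -0.015, "second derivative": -0.0135},
    "scatter correction": {"MSC": -0.094},
    "smoothing": {"primary smoothing": 0.005, "secondary smoothing": 0.006},
}


def screen_variables(ev_by_mode, tol=0.0):
    """Return (effective, optimal) selections per mode.

    effective: every variable whose E_v is below -tol (it lowers mean Log(SEP)).
    optimal:   within each mode, the selected variable with the largest
               absolute E_v; modes with no selected variable are dropped.
    tol is a hypothetical tolerance for treating near-zero E_v as "no effect".
    """
    effective, optimal = {}, {}
    for mode, variables in ev_by_mode.items():
        negative = {name: ev for name, ev in variables.items() if ev < -tol}
        if negative:
            effective[mode] = sorted(negative)
            optimal[mode] = min(negative, key=negative.get)
    return effective, optimal


effective, optimal = screen_variables(EV_EXAMPLE_1)
print(effective)
print(optimal)
```

With the default `tol=0.0` this is a strict sign test. The text also discounts negative values that are very close to zero, which a small positive `tol` would approximate; the source gives no numerical threshold for this.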
TABLE 2
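The combination counts given in step (4) can be reproduced with a short enumeration. The sketch below assumes that a combination counts as "repeated" when both variables of one mode are at the "+" level; the text does not state this rule, but it yields the reported 32, 14 and 18. The names `MODES` and `is_independent` are hypothetical.

```python
from itertools import product

# Variables per mode in this example (names follow step (2)).
MODES = {
    "derivative": ["first derivative", "second derivative"],
    "scatter correction": ["MSC"],
    "smoothing": ["primary smoothing", "secondary smoothing"],
}

variables = [(mode, var) for mode, names in MODES.items() for var in names]

# Full two-level factorial: every variable is either "-" (control) or "+".
raw = list(product("-+", repeat=len(variables)))


def is_independent(levels):
    """Assumed rule: at most one variable per mode may be at the '+' level."""
    active_per_mode = {}
    for (mode, _), level in zip(variables, levels):
        active_per_mode[mode] = active_per_mode.get(mode, 0) + (level == "+")
    return all(count <= 1 for count in active_per_mode.values())


independent = [levels for levels in raw if is_independent(levels)]
print(len(raw), len(raw) - len(independent), len(independent))
```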
Comparative example 1
In the process of constructing a near infrared calibration model for detecting the content of 2-acetyl-1-pyrroline in rice brown rice flour, a spectrum pretreatment scheme is screened by the following steps:
(1) The modes of spectral preprocessing were determined, 3 in total, applied in this order: mode 1 (X_1) is the mathematical derivative, mode 2 (X_2) is light scattering correction, and mode 3 (X_3) is smoothing.
(2) The three modes were configured as follows: mode X_1 was set to no treatment; mode X_2 was assigned 2 variables, multiplicative scatter correction (MSC) and standard normal variate (SNV); mode X_3 was assigned 2 variables, primary and secondary smoothing.
(3) Each variable was set to 2 levels, a control (i.e., no treatment, indicated by the symbol "-") and a treatment (the high level, indicated by the symbol "+"), as follows: in mode 2 (light scattering correction), the control (no treatment, "-") and the high level (MSC "+" or SNV "+"); in mode 3 (smoothing), the control (no treatment, "-") and the high level (primary smoothing "+" or secondary smoothing "+").
(4) The near infrared spectrum data were subjected in sequence to the mathematical derivative, light scattering correction and smoothing; modes 1, 2 and 3 were configured according to step (2), and each variable was set to 2 levels according to step (3). The specific permutations and combinations are shown under numbers (1) to (2) in Table 3 below: of 16 combinations in total, 6 were repeated and 10 were independent, giving 10 alternative spectral pretreatment schemes. In addition, the 9 combinations shown under number (3) in Table 3 were set up to examine the effect of derivative processing on the quality of the calibration model.
(5) Near infrared spectrum data of a rice brown rice calibration sample set (same as in example 1) are collected, each of the alternative spectrum pretreatment schemes is adopted to pretreat the near infrared spectrum data, a near infrared calibration model for detecting protein content in the rice brown rice is constructed by adopting the pretreated near infrared spectrum data, and a prediction standard deviation (SEP) of each near infrared calibration model is obtained through the prediction of an external verification set (same as in example 1) as shown in the following table 3.
(6) From Table 3, the E_v value for multiplicative scatter correction in the light scattering correction mode was -0.1014 and that for the standard normal variate was -0.0723. The E_v values for primary and secondary smoothing in the smoothing mode were 0.0043 and -0.0009, respectively. (The E_v values were calculated from the first 16 combinations in Table 3. In these 16 combinations mode 1, the mathematical derivative, was set only to the control, i.e. no treatment, for comparison with Example 1. The last 9 combinations all received first-derivative processing and were compared with the first 16 combinations to examine the effect of derivative processing on the quality of the calibration model.) Here $\bar{n}_2$ is the mean Log(SEP) of all high-level runs of a variable across all alternative spectral pretreatment schemes; $\bar{n}_1$ is the mean Log(SEP) of all control runs of that variable across all alternative spectral pretreatment schemes; and the effect value of the variable on the near infrared calibration model is $E_v = \bar{n}_2 - \bar{n}_1$.
(7) The E_v values of the variables show the following. The E_v values of the 2 variables in the light scattering correction mode are both negative, so both improve the quality of the calibration model. The E_v value of secondary smoothing is also negative but already close to zero, and the E_v value of primary smoothing is positive, indicating that the smoothing mode has essentially no effect and need not be used when constructing the calibration model for the substance to be measured. This agrees with the E_v values calculated in Example 1, which likewise indicated that light scattering correction is effective and smoothing is not. The coefficient of determination (RSQ) of the calibration models points the same way: with no mathematical derivative treatment in X_1, the highest RSQ value in this comparative example was 0.934, significantly lower than the RSQ values obtained after applying the first derivative in this comparative example, which were all greater than 0.950. This again confirms the result of Example 1 that mathematical derivative processing, whether first or second derivative, is effective in improving the calibration model.
TABLE 3
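The two light scattering correction variables compared above are named in the text but their formulas are not given. The following sketch uses the common textbook definitions of SNV and MSC; it is an assumption that the applicants' software follows the same definitions, and the toy spectra are hypothetical.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre one spectrum on its own mean and
    scale it by its own (sample) standard deviation."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(x - m) / s for x in spectrum]


def msc(spectra, reference=None):
    """Multiplicative scatter correction.

    Each spectrum x is fitted by least squares as x ≈ a + b * reference and
    replaced by (x - a) / b. The reference defaults to the mean spectrum.
    """
    if reference is None:
        reference = [mean(column) for column in zip(*spectra)]
    ref_mean = mean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for x in spectra:
        x_mean = mean(x)
        slope = sum((r - ref_mean) * (v - x_mean) for r, v in zip(reference, x)) / ref_ss
        intercept = x_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in x])
    return corrected


# Hypothetical toy data: one band shape distorted by an additive offset
# and a multiplicative gain, which is the distortion both methods target.
base = [0.10, 0.35, 0.80, 0.40, 0.15]
distortions = [(0.0, 1.0), (0.2, 1.5), (-0.1, 0.7)]
spectra = [[offset + gain * v for v in base] for offset, gain in distortions]

print(msc(spectra))
print([snv(s) for s in spectra])
```

On data of this idealized form, MSC maps every spectrum onto the mean spectrum and SNV maps every spectrum onto the same standardized curve, which is why either can serve as the "+" level of the light scattering correction mode.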
Comparative example 2
In the process of constructing a near infrared calibration model for detecting protein content in rice brown rice flour, a spectral pretreatment scheme is screened by the following steps:
(1) The modes of spectral preprocessing were determined, 3 in total, applied in this order: mode 1 (X_1) is the mathematical derivative, mode 2 (X_2) is light scattering correction, and mode 3 (X_3) is smoothing.
(2) The three modes were configured as follows: mode X_1 was set to no treatment; mode X_2 was assigned 2 variables, multiplicative scatter correction (MSC) and standard normal variate (SNV); mode X_3 was assigned 1 variable, primary smoothing.
(3) Each variable was set to 2 levels, a control (i.e., no treatment, indicated by the symbol "-") and a treatment (the high level, indicated by the symbol "+"), as follows: in mode 2 (light scattering correction), the control (no treatment, "-") and the high level (MSC "+" or SNV "+"); in mode 3 (smoothing), the control (no treatment, "-") and the high level (primary smoothing "+" or secondary smoothing "+").
(4) The near infrared spectrum data were subjected in sequence to the mathematical derivative, light scattering correction and smoothing; modes 1, 2 and 3 were configured according to step (2), and each variable was set to 2 levels according to step (3). The specific permutations and combinations are shown under numbers (1) to (2) in Table 4 below: of 16 combinations in total, 7 were repeated and 9 were independent, giving 9 alternative spectral pretreatment schemes.
(5) Collecting near infrared spectrum data of a rice brown rice calibration sample set, respectively preprocessing the near infrared spectrum data by adopting various alternative spectrum preprocessing schemes, constructing a near infrared calibration model for detecting protein content in the rice brown rice by adopting the preprocessed near infrared spectrum data, and obtaining prediction standard deviation (SEP) of each near infrared calibration model through the prediction of an external verification set, wherein the prediction standard deviation (SEP) of each near infrared calibration model is shown in the following table 4; at the same time, 9 combinations shown by the number (3) in table 4 were set for examining the application effect of the derivative processing on the improvement of the quality of the calibration model.
(6) From Table 4, the E_v value for multiplicative scatter correction in the light scattering correction mode was -0.0598 and that for the standard normal variate was -0.0286. The E_v values for primary and secondary smoothing in the smoothing mode were 0.0037 and 0.0025, respectively. (The E_v values were calculated from the first 16 combinations in Table 4. In these 16 combinations mode 1, the mathematical derivative, was set only to the control, i.e. no treatment, and they were thus comparable with Comparative example 1. The last 9 combinations all received first-derivative processing and were compared with the first 16 combinations to examine the effect of derivative processing on the quality of the calibration model.) Here $\bar{n}_2$ is the mean Log(SEP) of all high-level runs of a variable across all alternative spectral pretreatment schemes; $\bar{n}_1$ is the mean Log(SEP) of all control runs of that variable across all alternative spectral pretreatment schemes; and the effect value of the variable on the near infrared calibration model is $E_v = \bar{n}_2 - \bar{n}_1$.
(7) The E_v values of the variables show the following. The E_v values of the 2 variables in the light scattering correction mode are both negative, so both improve the quality of the calibration model, and multiplicative scatter correction is more effective than the standard normal variate. The E_v values of primary and secondary smoothing are both positive, indicating that the smoothing mode has essentially no effect and need not be used when constructing the calibration model for the substance to be measured. This agrees with the E_v values calculated in Example 2, which likewise indicated that light scattering correction is effective and smoothing is not. The coefficient of determination (RSQ) of the calibration models points the same way: with no mathematical derivative treatment in X_1, the highest RSQ value in this comparative example was 0.9352, significantly lower than the mean of 0.9456 obtained after applying the first derivative in this comparative example. This again confirms the result of Example 1 that mathematical derivative processing, whether first or second derivative, is effective in improving the calibration model.
TABLE 4
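Because E_v is a difference between two mean Log(SEP) values, it can be read as the logarithm of a ratio of geometric-mean SEPs (treatment over control). The sketch below converts the E_v values reported in step (6) into such ratios. It assumes that E_v is the high-level mean minus the control mean, as the figures in Example 1 suggest, and that Log denotes the base-10 logarithm, which the text does not state; the function name `sep_ratio` is hypothetical.

```python
def sep_ratio(ev, base=10.0):
    """Ratio of geometric-mean SEP (treatment / control) implied by E_v.

    Assumes Log(SEP) is a base-`base` logarithm; the base is an assumption.
    """
    return base ** ev


# E_v values reported in step (6) of this comparative example.
reported = {
    "MSC": -0.0598,
    "SNV": -0.0286,
    "primary smoothing": 0.0037,
    "secondary smoothing": 0.0025,
}

for name, ev in reported.items():
    ratio = sep_ratio(ev)
    print(f"{name}: SEP ratio {ratio:.3f} ({(ratio - 1) * 100:+.1f}%)")
```

A ratio below 1 means the treatment lowers the typical SEP, and a ratio close to 1 corresponds to the "essentially no effect" reading applied to smoothing.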
In the present invention, unless otherwise indicated, scientific and technical terms used herein have the meanings commonly understood by one of ordinary skill in the art. Moreover, the experimental methods used herein, unless otherwise specified, are all conventional; the reagents, biological materials and apparatus used, unless otherwise indicated, are all commercially available.
The foregoing description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and any simple modification, variation and equivalent transformation of the above embodiment according to the technical substance of the present invention still fall within the scope of the technical solution of the present invention.

Claims (10)

1. A method for rapidly screening an effective spectrum pretreatment scheme of a near infrared calibration model, which is characterized by comprising the following steps:
(1) Determining a spectrum preprocessing mode, setting 1-2 variables in each mode, setting 2 levels in each variable, and constructing an alternative spectrum preprocessing scheme according to the modes;
(2) Preprocessing near infrared spectrum data of a substance to be detected by adopting each alternative spectrum preprocessing scheme respectively, then constructing a near infrared calibration model of the content of the substance to be detected, and obtaining a prediction standard deviation SEP of each near infrared calibration model through the prediction of an external verification set;
(3) Calculating the effect value E_v of each variable on the near infrared calibration model from the prediction standard deviation SEP of each near infrared calibration model, using the formula:
$E_v = \bar{n}_2 - \bar{n}_1$
wherein $\bar{n}_2$ and $\bar{n}_1$ are, respectively, the mean Log(SEP) of all n_2-level treatments and the mean Log(SEP) of all n_1-level treatments of a given variable across all the alternative spectral preprocessing schemes;
(4) Screening according to E_v to obtain an effective spectral pretreatment scheme.
2. The method of claim 1, wherein the specific process of step (1) comprises the steps of:
(1.1) determining the modes of spectral pretreatment, denoted X_1, X_2, ……, X_k, wherein k is the total number of spectral preprocessing modes;
(1.2) setting 1-2 variables for each of modes X_1, X_2, ……, X_k, denoted X_{i-j}, where i is the i-th mode, i = 1, 2, 3, ……, k, and j is the j-th variable, j = 1 or 2;
(1.3) setting two levels, n_1 and n_2, for each variable, where n_1 is the control and n_2 is the treatment;
(1.4) preprocessing the near infrared spectrum data with modes X_1, X_2, ……, X_k in sequence, wherein each mode is given 1-2 variables according to step (1.2) and each variable is given two levels according to step (1.3), thereby constructing S alternative spectral preprocessing schemes.
3. The method of claim 1, wherein the specific process of step (4) comprises the steps of: selecting, in each mode, the variables whose E_v is negative, the resulting spectral pretreatment scheme being an effective spectral pretreatment scheme; and selecting, in each mode, the variable whose E_v is negative and, when the E_v values of all variables in a given mode are negative, selecting in that mode the variable whose E_v has the larger absolute value, the resulting spectral pretreatment scheme being the optimal spectral pretreatment scheme.
4. The method of claim 2, wherein in step (1.1), the mode of spectral preprocessing is one or more of derivative processing, light scattering correction, and smoothing processing, and k = 1-3.
5. The method of claim 4, wherein in step (1.1), k = 3 and modes X_1, X_2 and X_3 are, in this order, derivative processing, light scattering correction, and smoothing processing.
6. The method according to claim 4 or 5, wherein in the step (1.2), the variable set in the derivative processing mode is a first derivative and/or a second derivative, the variable set in the light scattering correction mode is a multiple scattering correction and/or a standard normal transformation, and the variable set in the smoothing processing mode is a primary smoothing processing and/or a secondary smoothing processing.
7. The method of claim 6, wherein the test substance is a pyrroline ring-containing compound; in the step (2), the variables set in the derivative processing mode are the first derivative and the second derivative, the variables set in the light scattering correction mode are the multiple scattering correction and the standard normal transformation, and the variables set in the smoothing processing mode are the primary smoothing processing.
8. The method of claim 6, wherein the test substance is a protein, or the near infrared spectral data of the test substance contain first or second overtone absorption bands of N-H bond stretching; in step (1.2), the variables set in the derivative processing mode are a first derivative and a second derivative, the variable set in the light scattering correction mode is a multiple scattering correction, and the variables set in the smoothing processing mode are a primary smoothing processing and a secondary smoothing processing.
9. The method of claim 2, wherein in step (1.3), the control is no treatment.
10. Use of the method according to one of claims 1 to 9 for constructing a near infrared calibration model.
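Steps (2) and (3) of claim 1 amount to a main-effect calculation on a two-level design. The sketch below is an illustration under stated assumptions, not the claimed implementation: it takes E_v as the mean Log(SEP) at the treatment level minus the mean at the control level, uses base-10 logarithms, and runs on hypothetical SEP values that are not taken from the tables. The function name `effect_values` and the data layout are likewise hypothetical.

```python
import math


def effect_values(schemes, seps):
    """Compute E_v for each variable.

    schemes: list of dicts mapping variable name -> "+" (treatment) or "-" (control)
    seps:    list of SEP values, one per scheme, in the same order
    Returns a dict: variable -> mean log10(SEP) at "+" minus mean log10(SEP) at "-".
    """
    logs = [math.log10(s) for s in seps]
    effects = {}
    for var in schemes[0]:
        high = [lg for sch, lg in zip(schemes, logs) if sch[var] == "+"]
        control = [lg for sch, lg in zip(schemes, logs) if sch[var] == "-"]
        effects[var] = sum(high) / len(high) - sum(control) / len(control)
    return effects


# Hypothetical SEP values for a 2-variable design (not taken from the tables).
schemes = [
    {"MSC": "-", "smoothing": "-"},
    {"MSC": "+", "smoothing": "-"},
    {"MSC": "-", "smoothing": "+"},
    {"MSC": "+", "smoothing": "+"},
]
seps = [0.40, 0.32, 0.40, 0.32]

effects = effect_values(schemes, seps)
print(effects)
```

In this toy design MSC lowers the SEP in every pairing and so receives a negative E_v, while smoothing changes nothing and receives an E_v of zero, mirroring the pattern the description reports for the real data.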
CN202310733507.7A 2023-06-20 2023-06-20 Method for rapidly screening effective spectrum pretreatment scheme of near-infrared calibration model and application thereof Pending CN116879224A (en)

Publications (1)

Publication Number Publication Date
CN116879224A true CN116879224A (en) 2023-10-13

Family

ID=88261232

Country Status (1)

Country Link
CN (1) CN116879224A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117093841A (en) * 2023-10-18 2023-11-21 中国科学院合肥物质科学研究院 Abnormal spectrum screening model determining method, device and medium for wheat transmission spectrum
CN117093841B (en) * 2023-10-18 2024-02-09 中国科学院合肥物质科学研究院 Abnormal spectrum screening model determining method, device and medium for wheat transmission spectrum

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination