CN116482037A - Model construction method and device for estimating nutritional ingredient content of wolfberry based on hyperspectrum - Google Patents

Model construction method and device for estimating nutritional ingredient content of wolfberry based on hyperspectrum

Info

Publication number
CN116482037A
Authority
CN
China
Prior art keywords
medlar
wavelength
hyperspectral
spectral reflectance
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310309351.XA
Other languages
Chinese (zh)
Inventor
赵金龙
张学艺
李阳
王云霞
张婍
南学军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NINGXIA HUI AUTONOMOUS REGION METEOROLOGICAL SCIENCE INSTITUTE
Original Assignee
NINGXIA HUI AUTONOMOUS REGION METEOROLOGICAL SCIENCE INSTITUTE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NINGXIA HUI AUTONOMOUS REGION METEOROLOGICAL SCIENCE INSTITUTE
Priority to CN202310309351.XA
Publication of CN116482037A
Legal status: Pending

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/47 Scattering, i.e. diffuse reflection
    • G01N21/4738 Diffuse reflection, e.g. also for testing fluids, fibrous materials
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The invention relates to a model construction method and device for estimating the nutritional ingredient content of wolfberry based on hyperspectral data. Raw spectral reflectance data of a dried wolfberry fruit powder sample are obtained by repeated spectrometer measurements, and their average is taken as the spectral reflectance data of the sample. The spectral reflectance data are preprocessed and randomly divided, in a given proportion, into a training set and a validation set. Sensitive bands are selected from the training set using the successive projections algorithm (SPA), and a hyperspectral estimation model of the content of the main nutritional ingredients of wolfberry is established from the sensitive bands by partial least squares regression (PLSR). The accuracy of the constructed model is then verified using the validation set. By combining hyperspectral technology with multivariate statistical analysis, the method addresses the time, labor, and cost of traditional chemical detection and enables rapid, nondestructive detection of wolfberry quality.

Description

Model construction method and device for estimating nutritional ingredient content of wolfberry based on hyperspectrum
Technical Field
The invention relates to the technical field of food nutrition, and in particular to a model construction method and device for estimating the nutritional ingredient content of wolfberry based on hyperspectral data.
Background
Chinese wolfberry (Lycium chinense Mill.) is a deciduous shrub of the genus Lycium (Lycium L.) in the family Solanaceae and has been cultivated in northwest China for 600 years. Its dried ripe fruit, known in traditional Chinese medicine as Fructus Lycii, is credited with nourishing the liver and kidney, relieving internal heat and thirst, lowering blood sugar, and anticancer effects. Since the beginning of the 21st century, with the rapid development of China's economy and society, rising living standards, and the continued deepening of supply-side structural reform, the Chinese food consumer market has trended toward healthy, safe, and high-quality products. As health awareness grows, wolfberry, which is both a medicine and a functional food, is also becoming popular in Europe and North America, where many commercial products have entered the health food market under the wolfberry name. Determining the nutritional ingredient content of wolfberry quickly and accurately helps growers understand the nutritional status of wolfberry trees in time and fertilize scientifically, and gives consumers an important reference for judging wolfberry quality. However, the nutritional ingredients of wolfberry are traditionally determined mainly by chemical detection methods, which are lengthy and complex.
Disclosure of Invention
In view of the technical problems in the prior art, the invention provides a model construction method and device for estimating the nutritional ingredient content of wolfberry based on hyperspectral data. The content is estimated using hyperspectral technology combined with multivariate statistical analysis, which addresses the time, labor, and cost of traditional chemical detection methods and enables rapid, nondestructive detection of wolfberry quality.
The technical solution to the above technical problems is as follows:
In a first aspect, the present invention provides a model construction method for estimating the nutritional ingredient content of wolfberry based on hyperspectral data, comprising:
obtaining raw spectral reflectance data of a dried wolfberry fruit powder sample by repeated spectrometer measurements, and taking the average as the spectral reflectance data of the sample;
preprocessing the spectral reflectance data, and randomly dividing the preprocessed spectral reflectance data into a training set and a validation set in a given proportion, wherein the training set contains 70-80% of all samples and the validation set contains 20-30% of all samples;
selecting sensitive bands from the training set using the successive projections algorithm (SPA), and establishing, based on the sensitive bands, a hyperspectral estimation model of the content of the main nutritional ingredients of wolfberry by partial least squares regression (PLSR), wherein the main nutritional ingredients of wolfberry include total sugar, polysaccharide, crude protein, manganese, copper, iron, zinc, and calcium;
and verifying the accuracy of the constructed hyperspectral estimation model using the validation set.
Further, selecting sensitive bands from the training set using the successive projections algorithm and establishing, based on the sensitive bands, the hyperspectral estimation model of the content of the main nutritional ingredients of wolfberry by partial least squares regression comprises:
constructing, from the spectral reflectance data, a spectral matrix $X_{n \times p}$ and a sample property parameter vector $y$, where $n$ is the number of samples and $p$ is the total number of wavelengths;
based on the spectral matrix $X$, building $p$ wavelength groups by orthogonal projection, each wavelength group containing $M$ wavelengths with $M \le \min(n, p)$, thereby constructing $p \times M$ wavelength variable subspaces;
using partial least squares regression with leave-one-out cross-validation, selecting the optimal wavelength group and the optimal wavelength points from the $p \times M$ wavelength variable subspaces by minimizing RMSECV.
Further, building $p$ wavelength groups by orthogonal projection based on the spectral matrix $X$, each wavelength group containing $M$ wavelengths with $M \le \min(n, p)$, and thereby constructing $p \times M$ wavelength variable subspaces, comprises:
S301, let $i = 1$ and assign the $k$-th column of the spectral matrix $X$ to $x_{k(1)}$, i.e. $k(1) = k$ and $x_{k(1)} = x_k$, and let $z_1 = x_{k(1)}$;
S302, denote the set of positions of the wavelength vectors $x_j$ not yet selected as $S_i$;
S303, based on $z_i$, construct the orthogonal projection operator $P_i = I - \dfrac{z_i z_i^{T}}{z_i^{T} z_i}$, where $I$ is the $n \times n$ identity matrix;
S304, compute the orthogonal projection $P_i x_j$ of each $x_j$ with $j \in S_i$, and select from them the wavelength position $k(i+1) = \arg\max_{j \in S_i} \lVert P_i x_j \rVert$;
S305, replace each $x_j$ by its projection $P_i x_j$, let $z_{i+1} = x_{k(i+1)}$, and let $i = i + 1$; if $i < M$, return to S302 to select the next wavelength vector.
Further, the cross-validation root mean square error RMSECV is calculated as follows:
$$\mathrm{RMSECV}_{k,i} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(y_j - \hat{y}_j\right)^2}$$
where $\mathrm{RMSECV}_{k,i}$ is the cross-validation root mean square error of the $k$-th wavelength group containing its first $i$ wavelength variables, and $y_j$ and $\hat{y}_j$ are respectively the measured value and the model-predicted value of the sample property parameter in the $j$-th validation test.
Further, verifying the accuracy of the constructed hyperspectral estimation model using the validation set comprises: using the coefficient of determination $R^2$, the root mean square error RMSE, and the residual predictive deviation RPD (relative analysis error) as accuracy indices, calculated as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$
$$\mathrm{RPD} = \frac{SD_p}{RMSE_p}$$
where $\hat{y}_i$ is the predicted value of the wolfberry nutritional ingredient content; $y_i$ is the measured value of the wolfberry nutritional ingredient content; $\bar{y}$ is the mean of the measured values; $n$ is the number of samples in the wolfberry prediction set; $SD_p$ is the standard deviation of the wolfberry prediction set samples; and $RMSE_p$ is the root mean square error of the wolfberry prediction set samples.
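The three accuracy indices can be computed as in the following minimal Python sketch. It assumes NumPy is available; the function name `validation_metrics` is hypothetical, and taking $SD_p$ as the sample standard deviation (`ddof=1`) of the measured values is an assumption, since the text does not state which convention is used.

```python
import numpy as np

def validation_metrics(y_measured, y_predicted):
    """Return (R2, RMSE, RPD) for one validation (prediction) set."""
    y = np.asarray(y_measured, dtype=float)
    y_hat = np.asarray(y_predicted, dtype=float)
    residual_ss = np.sum((y_hat - y) ** 2)      # sum of squared prediction errors
    total_ss = np.sum((y - y.mean()) ** 2)      # spread of the measured values
    r2 = 1.0 - residual_ss / total_ss
    rmse = np.sqrt(residual_ss / y.size)
    # Assumption: SD_p is the sample standard deviation (ddof=1) of the measured values.
    rpd = np.std(y, ddof=1) / rmse
    return r2, rmse, rpd

# Hypothetical measured and predicted contents for four validation samples.
r2, rmse, rpd = validation_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

A higher $R^2$ and RPD together with a lower RMSE indicate a more accurate model.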
Further, preprocessing the spectral reflectance data comprises: applying Savitzky-Golay (SG) smoothing to the spectral reflectance data as a first-stage preprocessing, and then applying a second-stage preprocessing to the SG-smoothed data, the method being chosen according to the nutritional ingredient to be estimated; the second-stage preprocessing methods include differential transformation, standard normal variate (SNV) transformation, and multiplicative scatter correction (MSC).
In a second aspect, the present invention provides a model construction device for estimating the nutritional ingredient content of wolfberry based on hyperspectral data, comprising:
a data acquisition module for obtaining raw spectral reflectance data of a dried wolfberry fruit powder sample by repeated spectrometer measurements, and taking the average as the spectral reflectance data of the sample;
a preprocessing module for preprocessing the spectral reflectance data and randomly dividing the preprocessed spectral reflectance data into a training set and a validation set in a given proportion, wherein the training set contains 70-80% of all samples and the validation set contains 20-30% of all samples;
a model construction module for selecting sensitive bands from the training set using the successive projections algorithm (SPA) and establishing, based on the sensitive bands, a hyperspectral estimation model of the content of the main nutritional ingredients of wolfberry by partial least squares regression (PLSR), wherein the main nutritional ingredients of wolfberry include total sugar, polysaccharide, crude protein, manganese, copper, iron, zinc, and calcium;
and a verification module for verifying the accuracy of the constructed hyperspectral estimation model using the validation set.
Further, the device comprises a first-stage preprocessing module and a second-stage preprocessing module;
the first-stage preprocessing module is used for applying Savitzky-Golay (SG) smoothing to the spectral reflectance data as a first-stage preprocessing;
the second-stage preprocessing module is used for applying a second-stage preprocessing to the SG-smoothed data, the method being chosen according to the nutritional ingredient to be estimated; the second-stage preprocessing methods include differential transformation, standard normal variate (SNV) transformation, and multiplicative scatter correction (MSC).
In a third aspect, the present invention provides an electronic device comprising:
a memory for storing a computer software program;
and a processor for reading and executing the computer software program so as to implement the model construction method for estimating the nutritional ingredient content of wolfberry based on hyperspectral data.
In a fourth aspect, the present invention provides a non-transitory computer readable storage medium, where a computer software program is stored, where the computer software program when executed by a processor implements a model construction method for estimating nutritional ingredient content of wolfberry based on hyperspectrum according to the first aspect of the present invention.
The beneficial effects of the invention are as follows. Indoor hyperspectral data of dried wolfberry fruit samples are acquired, different preprocessing is applied to the raw data, sensitive bands are extracted by a feature variable screening method, and estimation models of the main nutritional ingredients of wolfberry are established. Through extensive experimental comparison, the invention identifies the sensitive bands of the main nutritional ingredients of wolfberry (total sugar, polysaccharide, crude protein, manganese, copper, iron, zinc, and calcium); the selected bands are closely related to the chemical bonds of these ingredients and are therefore good indicators of wolfberry quality. Screening the sensitive bands greatly reduces model complexity, improves modeling efficiency, and enhances model interpretability. In addition, compared with traditional chemical determination, the hyperspectral technology adopted by the invention is simple to operate and requires no chemical reagents, providing a new approach to the rapid determination of wolfberry nutritional ingredients.
Drawings
FIG. 1 is a schematic flow chart of a model construction method for estimating nutritional ingredient content of wolfberry based on hyperspectrum, which is provided by the embodiment of the invention;
FIG. 2 is a schematic diagram of a model construction device for estimating nutritional ingredient content of wolfberry based on hyperspectrum, which is provided by the embodiment of the invention;
FIG. 3 is a schematic diagram of an embodiment of an electronic device according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an embodiment of a computer readable storage medium according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
In the description of the present application, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more of the described features. In the description of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
In the description of the present application, the term "for example" is used to mean "serving as an example, instance, or illustration." Any embodiment described herein as "for example" is not necessarily to be construed as preferred or advantageous over other embodiments. The following description is presented to enable any person skilled in the art to make and use the invention. In the following description, details are set forth for purposes of explanation. It will be apparent to one of ordinary skill in the art that the present invention may be practiced without these specific details. In other instances, well-known structures and processes have not been described in detail so as not to obscure the description of the invention with unnecessary detail. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
Hyperspectral technology offers many bands (350-2500 nm) and high spectral resolution (3-10 nm), and the diffuse reflectance spectra it produces carry rich information about the structure and composition of the reflecting material, providing an effective way to estimate the nutritional ingredient content of wolfberry efficiently and accurately. It can serve as an alternative to traditional chemical measurement and has important application value for the rapid quantitative detection of nutritional ingredients in agricultural products.
Differences in the characteristics of the nutritional ingredients of wolfberry from different regions are the most important factor determining its reflectance spectral characteristics. For hyperspectral detection of wolfberry nutritional ingredients, the prior art mainly addresses total sugar, polysaccharide, vitamin C, total antioxidants, phenols, anthocyanins, soluble solids content, and total acidity, whereas work on crude protein and mineral elements has concentrated on forage, rice, wheat, avocado, orange, nuts, and similar crops and has rarely been reported for wolfberry. The invention aims to make full use of hyperspectral technology for the nondestructive detection of crop nutritional ingredients and to address the time, labor, and cost of traditional chemical detection methods. Targeting rapid, nondestructive detection of wolfberry nutritional ingredients, it uses hyperspectral data of wolfberry collected indoors together with a small amount of chemical measurement data and, through spectral preprocessing and sensitive band selection, explores the spectral response characteristics and optimal sensitive bands of the main nutritional ingredients of wolfberry (total sugar, polysaccharide, crude protein, manganese, copper, iron, zinc, and calcium), yielding a method for estimating the nutritional ingredient content of wolfberry based on hyperspectral technology and multivariate statistical analysis.
The following embodiment further describes the method of the invention using 'Ningqi No. 7', the main wolfberry cultivar grown in Ningxia.
Wolfberry sample collection
In this embodiment, the wolfberry variety, sampling location, sampling time, and sampling method are specified. Recommended sampling locations are the main wolfberry producing areas of Ningxia, such as Huinong, Yinchuan, Zhongning, Tongxin, and Guyuan. Sampling is recommended during the ripening period of the summer fruit, with fixed sampling points across batches. Plants of similar age are selected, fruit is sampled at random from the upper, middle, and lower parts of the canopy, and the samples are placed in envelopes, kept refrigerated, and transported quickly to the laboratory.
Wolfberry drying
To reduce the interference of external factors with the determination of the nutritional ingredient content of wolfberry, soaking in Na₂CO₃ solution to remove the wax layer from the surface of the fresh fruit, as in the traditional drying process, is not recommended; instead, the whole drying process should be carried out by natural air drying on dedicated wooden boards. The fresh fruit is air dried naturally to constant weight, stems and damaged fruit are removed, and drying continues in an oven at 60 °C until the wet-basis moisture content of the wolfberry falls below 13%. The dried fruit is then ground to powder with a high-speed pulverizer, bagged, and stored in a desiccator to prevent moisture reabsorption. The total number of samples is preferably more than 50 bags, each weighing about 250 g.
Wolfberry spectral data acquisition
Each bag of dried wolfberry fruit powder is divided into 2 portions using 90 mm × 90 mm culture dishes. One portion, about 10 mm thick with a smooth, level surface, is used for spectral measurement, and the other is used to determine the nutritional ingredient content. Grinding the sample reduces the influence of surface wrinkles, texture, and shadows of the dried fruit on reflectance, making it better suited to building hyperspectral estimation models for different nutritional ingredients. Spectra are measured with a spectrometer, preferably over the 350-2500 nm range. To ensure the quality of the spectral data and improve modeling accuracy, the instrument is warmed up for 30 min before measurement; during measurement the probe points vertically downward about 10 cm above the sample surface, the light source is at a vertical height of about 30 cm, and its zenith angle is about 15°. The instrument is calibrated against a white reference panel before each spectral measurement, and measurements are made in a darkroom with a 100 W tungsten halogen lamp. Each sample is measured 5 to 10 times, and the arithmetic mean is taken as the spectral reflectance of the sample.
Obtaining measured values of wolfberry nutritional ingredients
Main nutritional ingredients of wolfberry, such as total sugar, polysaccharide, crude protein, manganese, copper, iron, zinc, and calcium, are selected. Measured values of each index are obtained in the laboratory following the determination methods specified in the relevant Chinese national standards, and are taken as the true values of the wolfberry nutritional ingredient content.
Preprocessing of wolfberry spectral data
First, the raw spectral reflectance data of the wolfberry surface acquired by the spectrometer are extracted. Because the raw data produced by the instrument are typically in .asd or .sed format, they must be extracted with the software supplied for the spectrometer (e.g., ViewSpec Pro for ASD instruments or DARWin SP for the SR-3500). Extraction consists of data reading, mean calculation, and data export. Data reading loads the raw spectral reflectance data using this software. Mean calculation averages the raw spectral reflectance data from the 5-10 repeated measurements that were read. Data export writes the averaged result in ASCII format so that it can later be opened as a .txt file or in Excel.
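The mean calculation and ASCII export can be illustrated with a minimal Python sketch. It assumes the repeated scans have already been read into a NumPy array of shape (repeats, wavelengths); reading .asd or .sed files is not shown. The function name `average_scans`, the column names, and the numeric values are hypothetical.

```python
import io
import numpy as np

def average_scans(scans):
    """Average repeated reflectance scans of one sample.

    scans: array of shape (n_repeats, n_wavelengths).
    """
    scans = np.asarray(scans, dtype=float)
    if scans.ndim != 2:
        raise ValueError("expected shape (n_repeats, n_wavelengths)")
    return scans.mean(axis=0)

# Hypothetical data: 3 repeated scans over 4 wavelengths (nm).
wavelengths = np.array([350.0, 351.0, 352.0, 353.0])
scans = np.array([
    [0.40, 0.41, 0.42, 0.43],
    [0.42, 0.41, 0.40, 0.45],
    [0.41, 0.44, 0.41, 0.44],
])
mean_reflectance = average_scans(scans)

# Export as tab-separated ASCII text; an in-memory buffer stands in for a .txt file.
buffer = io.StringIO()
np.savetxt(buffer, np.column_stack([wavelengths, mean_reflectance]),
           fmt="%.1f\t%.6f", header="wavelength_nm\treflectance")
ascii_text = buffer.getvalue()
```

Passing a file path instead of the buffer to `np.savetxt` would write the same table to disk.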
A spectrometer is subject to many interfering factors in use, so the raw spectral curve contains not only the characteristic spectrum of the measured target but also noise such as high-frequency random instrument noise and baseline drift. The extracted raw spectral data therefore need to be preprocessed before use to reduce or eliminate the influence of noise on the model. Preprocessing is preferably carried out with an open source spectral data processing package (e.g., the 'prospectr' package in R or the 'SPy' package in Python). Because such packages integrate many functions that are useful in near infrared and infrared spectroscopy, they are widely used for spectral data preprocessing and for screening representative samples or spectral bands. Spectral preprocessing methods mainly include Savitzky-Golay (SG) smoothing, differential transformation, standard normal variate (SNV) transformation, and multiplicative scatter correction (MSC). SG smoothing effectively reduces noise, and its effect is determined by the number of smoothing points: too few points leave noise in the data, while too many points tend to remove useful information. Based on comparison tests, the invention recommends SG smoothing with a first-order polynomial and 11 points, which markedly reduces random noise in the 838-1023 nm and 1735-2500 nm ranges. On this basis, first-order differentiation (1st D), second-order differentiation (2nd D), standard normal variate transformation, and multiplicative scatter correction are applied as further preprocessing.
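The recommended first-order, 11-point SG smoothing followed by differentiation can be sketched in Python with NumPy alone. This is an illustration under stated assumptions, not the implementation used in the invention: the function names `sg_smooth` and `derivative` are hypothetical, edge bands are handled by repeating the first and last values, and derivatives are taken by finite differences with `np.gradient`, whereas the text does not say how the differential transformation is computed.

```python
import numpy as np

def sg_smooth(spectrum, window=11, polyorder=1):
    """Savitzky-Golay smoothing of one spectrum (odd window, evenly spaced bands)."""
    spectrum = np.asarray(spectrum, dtype=float)
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Least-squares polynomial fit in each window: row 0 of the pseudo-inverse
    # holds the weights that evaluate the fitted polynomial at the window centre.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    weights = np.linalg.pinv(design)[0]
    # Assumption: edges are padded by repeating the first and last values.
    padded = np.pad(spectrum, half, mode="edge")
    # Smoothing weights are symmetric, so convolution equals correlation here.
    return np.convolve(padded, weights, mode="valid")

def derivative(spectrum, wavelengths, order=1):
    """Finite-difference derivative of the given order with respect to wavelength."""
    result = np.asarray(spectrum, dtype=float)
    for _ in range(order):
        result = np.gradient(result, wavelengths)
    return result

# Hypothetical spectrum: reflectance rising linearly over 50 bands.
wavelengths = np.arange(350.0, 400.0)
raw = 0.002 * wavelengths + 0.1
smoothed = sg_smooth(raw)                                   # first stage
first_derivative = derivative(smoothed, wavelengths, 1)     # second stage (1st D)
```

With a first-order polynomial the SG weights reduce to an 11-point moving average, which is why a larger window removes more noise but also flattens narrow absorption features.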
Compared with existing methods for preprocessing wolfberry spectral data, the combined preprocessing adopted by the invention reduces the heterogeneity among the spectral curves of different wolfberry samples, improves the quality of the spectral data, and increases modeling accuracy. Spectral differential transformation removes baseline and background interference, separates overlapping peaks, and improves spectral resolution and sensitivity; multiplicative scatter correction corrects baseline drift in diffuse reflectance spectra and effectively suppresses noise caused by sample inhomogeneity; standard normal variate transformation removes scattering effects caused by uneven particle size and distribution. Relative to the original spectral curve, the positions of the absorption peaks are unchanged after multiplicative scatter correction and standard normal variate transformation. After first-order differential transformation, the characteristic absorption bands become more prominent: strong absorption peaks appear near 1160, 1353, 1422, 1557, 1685, 1898, 2038, and 2246 nm in the near infrared, a strong reflection peak appears near 566 nm in the green, and the heterogeneity of the spectral curves is markedly improved.
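The two scatter corrections can be sketched as follows, assuming the spectra are stored row-wise in a NumPy array. The function names `snv` and `msc` are hypothetical, and using the mean spectrum of the set as the MSC reference is a common convention assumed here rather than something stated in the text.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction of each spectrum against a reference."""
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        # Assumption: use the mean spectrum of the set as the reference.
        reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, spectrum in enumerate(spectra):
        # Fit spectrum ~ intercept + slope * reference, then remove both effects.
        slope, intercept = np.polyfit(reference, spectrum, 1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected

# Hypothetical spectra: one base curve with different offsets and gains.
base = np.linspace(0.2, 0.6, 20) ** 2
spectra = np.vstack([base, 2.0 * base + 0.5, 0.5 * base - 0.1])
snv_spectra = snv(spectra)
msc_spectra = msc(spectra, reference=base)
```

When a model is later applied to new samples, the MSC reference would normally be the one computed from the training set so that both sets are corrected consistently.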
Establishing the wolfberry nutritional ingredient estimation model
All samples are randomly divided into two parts in a given proportion: training samples and validation samples. The training samples may make up 70-80% of all samples, preferably 75%, and the validation samples 20-30%, preferably 25%.
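A random 75/25 split can be sketched as below. The function name `split_samples` and the fixed seed are hypothetical; the seed is only there to make the split reproducible.

```python
import numpy as np

def split_samples(n_samples, train_fraction=0.75, seed=0):
    """Randomly split sample indices into training and validation index arrays."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return np.sort(order[:n_train]), np.sort(order[n_train:])

# Hypothetical example: 60 samples, 45 for training and 15 for validation.
train_idx, valid_idx = split_samples(60)
```

The returned index arrays can be used to select the matching rows of the preprocessed spectral matrix and of the measured nutritional ingredient values.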
Because hyperspectral data are high-dimensional and highly redundant, extracting sensitive bands before modeling reduces interference from irrelevant information and improves the predictive ability of the model. Many methods exist for selecting hyperspectral sensitive bands; after comparing the modeling performance of different selection methods in tests, the invention recommends the successive projections algorithm (SPA). SPA applies projection operations to the variables of the data matrix to find the combination of spectral feature variables with the least redundant information and the lowest collinearity. It extracts a small number of variables from the large raw spectral data matrix that summarize most of the spectral information, avoids overlap of spectral information as far as possible, and speeds up modeling by reducing computation and simplifying the model structure.
Given a spectral matrix $X_{n \times p}$ and a sample property parameter vector $y$, where $n$ is the number of samples and $p$ is the total number of wavelengths, SPA performs wavelength selection in two stages. The first stage selects $M$ wavelengths for each of $p$ wavelength groups [the largest possible value of $M$ is $\min(n, p)$]. For the $k$-th wavelength group ($k = 1, 2, \ldots, p$), the $M$ wavelength vectors are selected as follows:
Step 1, let $i = 1$ and assign the $k$-th column of the spectral matrix $X$ to $x_{k(1)}$, i.e. $k(1) = k$ and $x_{k(1)} = x_k$, and let $z_1 = x_{k(1)}$;
Step 2, denote the set of positions of the wavelength vectors $x_j$ not yet selected as $S_i$;
Step 3, based on $z_i$, construct the orthogonal projection operator $P_i = I - \dfrac{z_i z_i^{T}}{z_i^{T} z_i}$, where $I$ is the $n \times n$ identity matrix;
Step 4, compute the orthogonal projection $P_i x_j$ of each $x_j$ with $j \in S_i$, and select from them the wavelength position $k(i+1) = \arg\max_{j \in S_i} \lVert P_i x_j \rVert$;
Step 5, replace each $x_j$ by its projection $P_i x_j$, let $z_{i+1} = x_{k(i+1)}$, and let $i = i + 1$; if $i < M$, return to step 2 to select the next wavelength vector.
After the above steps are completed, a low-dimensional spectrum matrix is obtainedRepeating the above algorithm will produce a total of p wavelength packets, with each wavelength vector of the sample in turn being the preferred wavelength vector for each packet.
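The first-stage selection for one starting wavelength can be sketched in NumPy as below. This is an illustrative sketch under stated assumptions rather than the toolkit implementation referred to in the text: the function names `spa_chain` and `spa_all_chains` are hypothetical, indices are zero-based, and degenerate cases (for example a zero-norm projection vector) are not handled.

```python
import numpy as np

def spa_chain(X, k, M):
    """Select M column indices of X by successive projections, starting at column k."""
    n, p = X.shape
    if not 1 <= M <= min(n, p):
        raise ValueError("M must satisfy 1 <= M <= min(n, p)")
    work = X.astype(float).copy()   # working vectors x_j, overwritten by their projections
    selected = [k]                  # k(1) = k
    z = work[:, k].copy()           # z_1 = x_k(1)
    for _ in range(M - 1):
        remaining = [j for j in range(p) if j not in selected]   # the set S_i
        P = np.eye(n) - np.outer(z, z) / (z @ z)                 # orthogonal projection operator
        work[:, remaining] = P @ work[:, remaining]              # project every unselected vector
        norms = np.linalg.norm(work[:, remaining], axis=0)
        best = remaining[int(np.argmax(norms))]                  # largest projection wins
        selected.append(best)
        z = work[:, best].copy()                                 # defines the next projection
    return selected

def spa_all_chains(X, M):
    """Build one wavelength group per starting column (p groups in total)."""
    return [spa_chain(X, k, M) for k in range(X.shape[1])]

# Column 1 is collinear with column 0, so it is never preferred over the others.
X_demo = np.array([[1.0, 2.0, 0.0, 0.5],
                   [0.0, 0.0, 1.0, 0.5],
                   [0.0, 0.0, 0.0, 3.0]])
print(spa_chain(X_demo, k=0, M=3))
```

Because each unselected vector is projected onto the subspace orthogonal to the vectors already chosen, a column that is nearly a multiple of a selected one shrinks toward zero and is passed over, which is how the procedure suppresses collinearity.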
In the second stage of the SPA algorithm, $M^*$ ($M^* \geq 1$) optimal wavelengths are selected by means of a multivariate quantitative calibration model. The multivariate statistical analysis method may be principal component analysis, multiple linear regression, stepwise regression, nonlinear regression, etc., and is preferably partial least squares regression (PLSR). PLSR combines the advantages of three analysis methods (ordinary multiple linear regression, principal component analysis and canonical correlation analysis) and handles well the problems of multicollinearity among the independent variables, fewer samples than variables, and computational complexity. In addition, because testing wolfberry nutrient components is costly and the number of available samples is limited, partial least squares regression is the more suitable method for the invention.
The embodiment of the invention uses partial least squares regression (PLSR) with leave-one-out cross-validation. With the minimum cross-validation root mean square error (RMSECV) as the target, the optimal wavelength group $k(1) = k^*$ and its $M = M^*$ optimal wavelength points are selected from the $p \times M$ wavelength variable subspaces. For the $k$-th wavelength group containing the first $i$ wavelength variables, $\mathrm{RMSECV}_{k,i}$ is calculated as:
$$\mathrm{RMSECV}_{k,i} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(y_j - \hat{y}_j\right)^2}, \quad k = 1, 2, \ldots, p, \quad i = 1, 2, \ldots, M$$
where $y_j$ and $\hat{y}_j$ are, respectively, the measured value and the model-predicted value of the property parameter of the sample held out in the $j$-th round of leave-one-out cross-validation.
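The second-stage search can be sketched as below. This is an illustration only: a compact single-response PLS (NIPALS) written in NumPy stands in for whatever PLSR implementation is actually used, and the function names `pls1_fit`, `rmsecv` and `best_subset` as well as the `max_components` cap on the number of latent components are assumptions that do not appear in the text.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit single-response PLS by NIPALS; return (coefficients, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:                  # nothing left to explain
            break
        w = w / norm
        t = Xr @ w                        # scores
        tt = t @ t
        p_load = Xr.T @ t / tt            # X loadings
        c = (yr @ t) / tt                 # y loading
        Xr = Xr - np.outer(t, p_load)     # deflate X
        yr = yr - c * t                   # deflate y
        W.append(w); P.append(p_load); q.append(c)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q), x_mean, y_mean

def rmsecv(X, y, n_components):
    """Leave-one-out cross-validation root mean square error."""
    n = len(y)
    residuals = np.empty(n)
    for j in range(n):
        keep = np.arange(n) != j          # hold out sample j
        coef, x_mean, y_mean = pls1_fit(X[keep], y[keep], n_components)
        residuals[j] = y[j] - ((X[j] - x_mean) @ coef + y_mean)
    return float(np.sqrt(np.mean(residuals ** 2)))

def best_subset(X, y, chains, max_components=3):
    """Scan every group k and prefix length i; keep the subset with the lowest RMSECV."""
    best_score, best_cols = np.inf, None
    for chain in chains:
        for i in range(1, len(chain) + 1):
            cols = chain[:i]
            score = rmsecv(X[:, cols], y, min(i, max_components))
            if score < best_score:
                best_score, best_cols = score, cols
    return best_score, best_cols
```

Here `chains` is expected to be a list of wavelength-index lists, one per starting wavelength, as produced by the first stage. Only training-set spectra should be passed in, so the verification set remains untouched until the final accuracy test.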
The calculation is carried out in Matlab using the SPA toolkit. The input to the SPA sensitive-band screening is the full-band spectral data of the training samples, and the output is the sequence numbers of the selected sensitive bands.
The accuracy of the wolfberry nutrient estimation model constructed in the above steps is then tested using the spectral data of the verification set. The accuracy indices are the coefficient of determination ($R^2$), the root mean square error (RMSE) and the relative analysis error (RPD). $R^2$ reflects the stability of model building and verification: the closer $R^2$ is to 1, the more stable the model and the better the fit. RMSE is used to test the predictive ability of the model: the smaller the RMSE, the better the prediction. RPD is the ratio of the sample standard deviation to the root mean square error and is also used to evaluate predictive ability: when RPD < 1.4, the model cannot predict the samples; when 1.4 ≤ RPD < 2.0, the model is considered fair and can predict the samples roughly; when RPD ≥ 2.0, the model has excellent predictive ability. The calculation formulas are as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}, \qquad \mathrm{RPD} = \frac{SD_p}{\mathrm{RMSE}_p}$$
where $\hat{y}_i$ is the predicted value of the wolfberry nutrient content; $y_i$ is the measured value of the wolfberry nutrient content; $\bar{y}$ is the mean of the measured values of the wolfberry nutrient content; $n$ is the number of samples in the wolfberry prediction set; $SD_p$ is the standard deviation of the wolfberry prediction set samples; and $\mathrm{RMSE}_p$ is the root mean square error of the wolfberry prediction set samples.
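The three accuracy indices and the RPD thresholds stated above can be computed as in the following sketch. It rests on assumptions not fixed by the text: the standard deviation uses the n-1 denominator, $R^2$ is taken in the form of one minus the ratio of residual to total sum of squares, and the function names `evaluate_model` and `rpd_grade` are hypothetical.

```python
import math

def evaluate_model(y_true, y_pred):
    """Return R^2, RMSE and RPD for measured versus predicted contents."""
    n = len(y_true)
    mean = sum(y_true) / n
    sse = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))   # residual sum of squares
    sst = sum((t - mean) ** 2 for t in y_true)                # total sum of squares
    rmse = math.sqrt(sse / n)
    sd = math.sqrt(sst / (n - 1))      # sample standard deviation (assumed n-1 denominator)
    rpd = sd / rmse if rmse > 0 else float("inf")
    return {"r2": 1 - sse / sst, "rmse": rmse, "rpd": rpd}

def rpd_grade(rpd):
    """Map RPD to the three capability levels given in the text."""
    if rpd < 1.4:
        return "cannot predict"
    if rpd < 2.0:
        return "rough prediction"
    return "excellent prediction"

# Made-up numbers, for illustration only (not measured data).
scores = evaluate_model([1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.2, 3.8, 5.0])
print(scores, rpd_grade(scores["rpd"]))
```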
The invention recommends the successive projections algorithm as the first choice for screening sensitive bands to construct the model. When the prediction accuracy of the model does not meet the requirement, other sensitive-band selection methods (such as competitive adaptive reweighted sampling, principal component analysis, stepwise regression, etc.) may be considered to rebuild the model.
Based on a large number of experimental comparisons, the recommended optimal bands and optimal models for the main nutritional components of wolfberry are shown in the following table.
The raw hyperspectral data volume is large, and the hidden high correlation between bands can cause multicollinearity and distort the model. How to select sensitive bands is therefore both the difficulty and the key to estimating the nutrient content of wolfberry with hyperspectral technology. In terms of modeling, not all spectral bands are highly correlated with the measured chemical index, because chemical groups absorb the spectrum selectively. With full-spectrum modeling, the large number of spectral variables brings much irrelevant spectral information into the model, so the robustness and accuracy of the resulting model are unsatisfactory. Modeling on screened sensitive bands, on the one hand, reduces the information redundancy of the independent variables, enhances the interpretability of the model, and improves the timeliness and accuracy of wolfberry nutrient estimation; on the other hand, it offers an important reference for the future development and popularization of dedicated production equipment.
The method obtains indoor hyperspectral data of dried wolfberry fruit samples, applies different preprocessing to the raw data, extracts sensitive bands with a characteristic variable screening method, and establishes prediction models for the main nutritional components of wolfberry. Through a large number of experimental comparisons, the invention identifies the sensitive bands of the main nutritional components of wolfberry (total sugar, polysaccharide, crude protein, manganese, copper, iron, zinc and calcium). The selected sensitive bands are closely related to the chemical bonds of these components and are therefore good indicators of wolfberry quality. Screening the sensitive bands greatly reduces model complexity, improves modeling efficiency and enhances the interpretability of the model. In addition, compared with traditional chemical determination, the hyperspectral technology adopted by the invention has the clear advantages of simple operation and no need for chemical reagents, and provides a new approach to the rapid determination of wolfberry nutrient components.
On the basis of the above embodiment, the embodiment of the present invention further provides a model construction device for estimating the nutritional ingredient content of wolfberry based on hyperspectrum, as shown in Fig. 2, including:
a data acquisition module, used for observing the dried wolfberry fruit powder sample repeatedly with a spectrometer to obtain raw spectral reflectance data, and taking the average as the spectral reflectance data of the sample;
a preprocessing module, used for preprocessing the spectral reflectance data and randomly dividing the preprocessed spectral reflectance data into a training set and a verification set in a fixed proportion, wherein the training set contains 70-80% of all samples and the verification set contains 20-30% of all samples;
a model construction module, used for selecting sensitive bands from the training set with the successive projections algorithm and, based on the sensitive bands, establishing a hyperspectral estimation model of the content of the main nutritional components of wolfberry by partial least squares regression, wherein the main nutritional components of wolfberry comprise total sugar, polysaccharide, crude protein, manganese, copper, iron, zinc and calcium;
and a verification module, used for verifying the accuracy of the constructed hyperspectral estimation model with the verification set.
Further, the preprocessing module comprises a first-stage preprocessing module and a second-stage preprocessing module;
the first-stage preprocessing module is used for performing first-stage preprocessing on the spectral reflectance data by Savitzky-Golay (SG) smoothing filtering;
the second-stage preprocessing module is used for performing second-stage preprocessing on the SG-smoothed data, with a different preprocessing method selected according to the nutritional component to be estimated; the second-stage preprocessing methods comprise differential transformation, standard normal variate transformation and multiplicative scatter correction.
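The second-stage preprocessing options named above can be sketched in NumPy as follows. This is an illustrative sketch under assumptions: the function names are hypothetical, spectra are stored one sample per row, the mean spectrum serves as the scatter-correction reference, and the differential transformation is shown as a simple first difference along the wavelength axis. SG smoothing is not shown; a library routine such as SciPy's `savgol_filter` could fill that role.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(ref, row, 1)   # fit row = slope * ref + intercept
        corrected[i] = (row - intercept) / slope     # remove offset and scaling
    return corrected

def first_difference(spectra):
    """First-order differential transformation along the wavelength axis."""
    return np.diff(spectra, axis=1)

# Synthetic spectra that differ only by offset and scale (illustration only).
base = np.array([0.2, 0.4, 0.5, 0.3, 0.6])
demo = np.vstack([base, 2.0 * base + 0.1, 0.5 * base - 0.05])
print(np.allclose(snv(demo)[0], snv(demo)[1]))
```

Both SNV and MSC remove additive and multiplicative scatter effects between samples, which is why spectra that differ only by an offset and a scale factor collapse onto one another after either transform.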
Referring to Fig. 3, Fig. 3 is a schematic diagram of an embodiment of an electronic device according to an embodiment of the invention. As shown in Fig. 3, an embodiment of the present invention provides an electronic device 500, including a memory 510, a processor 520, and a computer program 511 stored in the memory 510 and executable on the processor 520, wherein the processor 520 executes the computer program 511 to implement the following steps:
observing the dried wolfberry fruit powder sample repeatedly with a spectrometer to obtain raw spectral reflectance data, and taking the average as the spectral reflectance data of the sample;
preprocessing the spectral reflectance data, and randomly dividing the preprocessed spectral reflectance data into a training set and a verification set in a fixed proportion, wherein the training set contains 70-80% of all samples and the verification set contains 20-30% of all samples;
selecting sensitive bands from the training set with the successive projections algorithm and, based on the sensitive bands, establishing a hyperspectral estimation model of the content of the main nutritional components of wolfberry by partial least squares regression, wherein the main nutritional components of wolfberry comprise total sugar, polysaccharide, crude protein, manganese, copper, iron, zinc and calcium;
and verifying the accuracy of the constructed hyperspectral estimation model with the verification set.
Referring to Fig. 4, Fig. 4 is a schematic diagram of an embodiment of a computer-readable storage medium according to an embodiment of the invention. As shown in Fig. 4, the present embodiment provides a computer-readable storage medium 600 on which a computer program 611 is stored, and the computer program 611, when executed by a processor, implements the following steps:
observing the dried wolfberry fruit powder sample repeatedly with a spectrometer to obtain raw spectral reflectance data, and taking the average as the spectral reflectance data of the sample;
preprocessing the spectral reflectance data, and randomly dividing the preprocessed spectral reflectance data into a training set and a verification set in a fixed proportion, wherein the training set contains 70-80% of all samples and the verification set contains 20-30% of all samples;
selecting sensitive bands from the training set with the successive projections algorithm and, based on the sensitive bands, establishing a hyperspectral estimation model of the content of the main nutritional components of wolfberry by partial least squares regression, wherein the main nutritional components of wolfberry comprise total sugar, polysaccharide, crude protein, manganese, copper, iron, zinc and calcium;
and verifying the accuracy of the constructed hyperspectral estimation model with the verification set.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of the other embodiments.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. A model construction method for estimating the nutritional ingredient content of wolfberry based on hyperspectrum, characterized by comprising the following steps:
observing the dried wolfberry fruit powder sample repeatedly with a spectrometer to obtain raw spectral reflectance data, and taking the average as the spectral reflectance data of the sample;
preprocessing the spectral reflectance data, and randomly dividing the preprocessed spectral reflectance data into a training set and a verification set in a fixed proportion, wherein the training set contains 70-80% of all samples and the verification set contains 20-30% of all samples;
selecting sensitive bands from the training set with the successive projections algorithm and, based on the sensitive bands, establishing a hyperspectral estimation model of the content of the main nutritional components of wolfberry by partial least squares regression, wherein the main nutritional components of wolfberry comprise total sugar, polysaccharide, crude protein, manganese, copper, iron, zinc and calcium;
and verifying the accuracy of the constructed hyperspectral estimation model with the verification set.
2. The method of claim 1, wherein selecting sensitive bands from the training set with the successive projections algorithm and, based on the sensitive bands, establishing a hyperspectral estimation model of the content of the main nutritional components of wolfberry by partial least squares regression comprises:
constructing, from the spectral reflectance data, a spectral matrix $X_{n\times p}$ and a sample property parameter vector, where $n$ is the sample size and $p$ is the total number of wavelengths;
based on the spectral matrix $X$, forming $p$ wavelength groups with an orthogonal projection algorithm, each wavelength group containing $M$ wavelengths, $M \leq \min(n, p)$, thereby constructing $p \times M$ wavelength variable subspaces;
using partial least squares regression with leave-one-out cross-validation and, with the minimum RMSECV as the target, selecting the optimal wavelength group and the optimal wavelength points from the $p \times M$ wavelength variable subspaces.
3. The model construction method according to claim 2, wherein forming $p$ wavelength groups with an orthogonal projection algorithm based on the spectral matrix $X$, each wavelength group containing $M$ wavelengths, $M \leq \min(n, p)$, thereby constructing $p \times M$ wavelength variable subspaces, comprises:
S301, let $i = 1$ and assign the $k$-th column of the spectral matrix $X$ to $x_{k(1)}$, i.e. $k(1) = k$, $x_{k(1)} = x_k$, and let $z_1 = x_{k(1)}$;
S302, denote by $S_i$ the set of positions of the wavelength vectors $x_j$ that have not yet been selected;
S303, based on $z_i$, construct the orthogonal projection operator $P_i = I - \dfrac{z_i z_i^{T}}{z_i^{T} z_i}$, where $I$ is the $n \times n$ identity matrix;
S304, compute the orthogonal projection of each unselected vector, $x_j \leftarrow P_i x_j$ for $j \in S_i$, and select from them the wavelength position $k(i+1) = \arg\max_{j \in S_i} \lVert x_j \rVert$;
S305, let $i = i + 1$ and $z_i = x_{k(i)}$; if $i < M$, return to S302 to select the next wavelength vector.
4. The model construction method according to claim 2, wherein the cross-validation root mean square error RMSECV is calculated as follows:
$$\mathrm{RMSECV}_{k,i} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(y_j - \hat{y}_j\right)^2}$$
where $\mathrm{RMSECV}_{k,i}$ is the cross-validation root mean square error of the $k$-th wavelength group containing the first $i$ wavelength variables, and $y_j$ and $\hat{y}_j$ are, respectively, the measured value and the model-predicted value of the sample property parameter in the $j$-th round of verification.
5. The model construction method according to claim 1, characterized in that verifying the accuracy of the constructed hyperspectral estimation model with the verification set comprises: using the coefficient of determination $R^2$, the root mean square error RMSE and the relative analysis error RPD as accuracy indices, calculated as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}, \qquad \mathrm{RPD} = \frac{SD_p}{\mathrm{RMSE}_p}$$
where $\hat{y}_i$ is the predicted value of the wolfberry nutrient content; $y_i$ is the measured value of the wolfberry nutrient content; $\bar{y}$ is the mean of the measured values of the wolfberry nutrient content; $n$ is the number of samples in the wolfberry prediction set; $SD_p$ is the standard deviation of the wolfberry prediction set samples; and $\mathrm{RMSE}_p$ is the root mean square error of the wolfberry prediction set samples.
6. The model construction method according to claim 1, characterized in that preprocessing the spectral reflectance data comprises: performing first-stage preprocessing on the spectral reflectance data by Savitzky-Golay (SG) smoothing filtering, and then performing second-stage preprocessing on the SG-smoothed data with a preprocessing method selected according to the nutritional component to be estimated; the second-stage preprocessing methods comprise differential transformation, standard normal variate transformation and multiplicative scatter correction.
7. A model construction device for estimating the nutritional ingredient content of wolfberry based on hyperspectrum, characterized by comprising:
a data acquisition module, used for observing the dried wolfberry fruit powder sample repeatedly with a spectrometer to obtain raw spectral reflectance data, and taking the average as the spectral reflectance data of the sample;
a preprocessing module, used for preprocessing the spectral reflectance data and randomly dividing the preprocessed spectral reflectance data into a training set and a verification set in a fixed proportion, wherein the training set contains 70-80% of all samples and the verification set contains 20-30% of all samples;
a model construction module, used for selecting sensitive bands from the training set with the successive projections algorithm and, based on the sensitive bands, establishing a hyperspectral estimation model of the content of the main nutritional components of wolfberry by partial least squares regression, wherein the main nutritional components of wolfberry comprise total sugar, polysaccharide, crude protein, manganese, copper, iron, zinc and calcium;
and a verification module, used for verifying the accuracy of the constructed hyperspectral estimation model with the verification set.
8. The device of claim 7, wherein the preprocessing module comprises a first-stage preprocessing module and a second-stage preprocessing module;
the first-stage preprocessing module is used for performing first-stage preprocessing on the spectral reflectance data by Savitzky-Golay (SG) smoothing filtering;
the second-stage preprocessing module is used for performing second-stage preprocessing on the SG-smoothed data, with a different preprocessing method selected according to the nutritional component to be estimated; the second-stage preprocessing methods comprise differential transformation, standard normal variate transformation and multiplicative scatter correction.
9. An electronic device, comprising:
a memory for storing a computer software program;
and a processor for reading and executing the computer software program so as to implement the model construction method for estimating the nutritional ingredient content of wolfberry based on hyperspectrum according to any one of claims 1 to 6.
10. A non-transitory computer-readable storage medium, wherein the storage medium stores a computer software program which, when executed by a processor, implements the model construction method for estimating the nutritional ingredient content of wolfberry based on hyperspectrum according to any one of claims 1 to 6.
CN202310309351.XA 2023-03-27 2023-03-27 Model construction method and device for estimating nutritional ingredient content of wolfberry based on hyperspectrum Pending CN116482037A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310309351.XA CN116482037A (en) 2023-03-27 2023-03-27 Model construction method and device for estimating nutritional ingredient content of wolfberry based on hyperspectrum

Publications (1)

Publication Number Publication Date
CN116482037A true CN116482037A (en) 2023-07-25

Family

ID=87216878

Country Status (1)

Country Link
CN (1) CN116482037A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination