CN111458308A - Near infrared spectrum gentian identification method and system - Google Patents

Info

Publication number
CN111458308A
Authority
CN
China
Prior art keywords
spectrum
spectral
determining
gentian
angle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010499629.0A
Other languages
Chinese (zh)
Inventor
孙明华
孔汶汶
孙永祺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Landa Technology Co ltd
Original Assignee
Hangzhou Landa Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Landa Technology Co ltd
Priority to CN202010499629.0A
Publication of CN111458308A

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/3563 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light for analysing solids; Preparation of samples therefor
    • G01N21/359 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light

Abstract

The invention relates to a near infrared spectrum gentian identification method and system. The method provided by the invention acquires the spectral data of the gentian to be detected and uses an established partial least squares identification model to detect the gentian quickly, non-destructively, efficiently and accurately, so as to identify its specific variety. Further, the near infrared spectrum gentian identification method and system provided by the invention have the characteristics of low cost, simple operation, high detection speed, few required samples, no pollution, high detection precision and strong reliability.

Description

Near infrared spectrum gentian identification method and system
Technical Field
The invention relates to the field of spectrum detection, in particular to a near infrared spectrum gentian identification method and system.
Background
Plants of the genus Gentiana mostly grow in mountains or on plateaus, where the growing environment and climatic conditions are extreme and variable. Because their growth environments and growth periods differ, home-grown and wild gentiana rigescens differ noticeably in several respects. In appearance, home-grown plants are plump and loose, whereas wild plants are thin, weak and compact. In addition, wild plants generally have a longer growth period, a richer and more balanced composition and a better curative effect, while home-grown medicinal material suffers from problems such as non-standard planting practices and hormone abuse. Distinguishing wild gentian from home-grown gentian is therefore particularly important.
In terms of traditional techniques, the quality of gentian is determined mainly by chemically measuring the proportions and contents of its various nutrient components. This approach offers good repeatability and high accuracy, but it is labor-intensive and complicated to operate, and the long time from sampling to result means that it cannot meet the market demand for rapid, real-time detection.
Therefore, it is a technical problem to be solved in the art to provide a method or system capable of identifying gentian rapidly in real time while ensuring accuracy.
Disclosure of Invention
The invention aims to provide a near infrared spectrum gentian identification method and system, which can be used for identifying gentian in real time and rapidly while ensuring the accuracy.
In order to achieve the purpose, the invention provides the following scheme:
a near infrared spectrum gentian identification method comprises the following steps:
obtaining a pressed-tablet sample set of gentian plants; the pressed-tablet sample set comprises pressed-tablet samples of a plurality of gentian plants;
collecting spectral data of the pressed-tablet samples with a spectrometer;
determining spectral angle data from the spectral data; the spectral angle data comprises a plurality of spectral angles; the spectrum angle is an included angle between a spectrum and an average spectrum in the spectrum data;
sequencing the spectrum angles in the spectrum angle data to obtain a sequencing result;
acquiring a maximum spectrum angle and a minimum spectrum angle in the sequencing result, and determining a spectrum corresponding to the maximum spectrum angle and a spectrum corresponding to the minimum spectrum angle to construct a modeling set;
dividing other spectrum angles except the maximum spectrum angle and the minimum spectrum angle into a plurality of intervals, and acquiring a spectrum corresponding to a middle spectrum angle in each interval to construct a prediction set; placing spectra corresponding to other spectral angles in the plurality of intervals except for an intermediate spectral angle into the modeling set;
acquiring the wavelength of each spectrum in the modeling set, and determining a first spectrum intensity corresponding to the wavelength;
determining a first spectral vector of the modeling set according to the first spectral intensity, and determining a first spectral matrix according to the first spectral vector;
determining a projection vector according to the first spectrum matrix by adopting a principal component analysis method;
updating the first spectrum matrix by adopting the projection vector to obtain a second spectrum matrix;
establishing a partial least square method identification model by taking the second spectrum matrix as input and the discrimination degree value as output;
acquiring spectral data of a gentian plant to be detected;
determining a spectrum matrix of the gentian to be detected according to the spectrum data of the gentian plant to be detected;
determining an identification result according to the spectrum matrix of the gentian to be detected by adopting the partial least square method identification model; the identification result is as follows: the gentian to be detected is wild gentian or the gentian to be detected is home-grown gentian.
Preferably, the other spectrum angles except the maximum spectrum angle and the minimum spectrum angle are divided into a plurality of intervals, and a spectrum corresponding to an intermediate spectrum angle in each interval is acquired to construct a prediction set; placing into the modeling set spectra corresponding to spectral angles other than the intermediate spectral angle in a plurality of the intervals, and thereafter further comprising:
judging whether the ratio of the number of spectra in the modeling set to the number of spectra in the prediction set is 2:1, to obtain a judgment result;
if the judgment result is yes, the number of the intervals is not adjusted;
if the judgment result is no, adjusting the number of the intervals, and returning to the step of "dividing other spectrum angles except the maximum spectrum angle and the minimum spectrum angle into a plurality of intervals, and acquiring a spectrum corresponding to a middle spectrum angle in each interval to construct a prediction set; placing spectra corresponding to other spectral angles in the plurality of intervals except for an intermediate spectral angle into the modeling set".
Preferably, the updating the first spectrum matrix by using the projection vector to obtain a second spectrum matrix specifically includes:
acquiring projection sub-vectors with contribution rates ranked in the first three in the projection vectors;
determining characteristic wavelengths according to the projection sub-vectors, acquiring spectral wavelengths of the spectrums in the modeling set, wherein the wavelengths of the spectrums are equal to the characteristic wavelengths, and determining the spectral intensities of the spectral wavelengths to obtain second spectral intensities;
determining a second spectral vector of the modeling set according to the second spectral intensity, and determining a second spectral matrix according to the second spectral vector.
Preferably, the partial least squares identification model is:
$$Y = \alpha_0 + \alpha_1 I_{p_1} + \alpha_2 I_{p_2} + \dots + \alpha_m I_{p_m}$$
wherein Y is the discrimination degree value, $\alpha_0, \dots, \alpha_m$ are coefficients, and $I_{p_1}, \dots, I_{p_m}$ are the spectral intensities at the different characteristic wavelengths, with m = 23.
Preferably, the identifying result is determined according to the spectrum matrix of the gentian to be detected by adopting the partial least square method identification model, and specifically comprises the following steps:
determining the discrimination degree value according to the spectrum matrix of the gentian to be detected by adopting the partial least square method discrimination model;
determining a difference value between the discrimination degree value and 1 to obtain a first difference value; if the absolute value of the first difference is less than 0.5, the identification result is: the gentiana rigescens to be detected is a home-grown gentiana rigescens;
determining a difference value between the discrimination degree value and 2 to obtain a second difference value; if the absolute value of the second difference is less than 0.5, the identification result is: the gentiana rigescens to be detected is wild gentiana rigescens.
A near infrared spectrum gentian identification system comprising:
the sample set acquisition module is used for acquiring a pressed-tablet sample set of gentian plants; the pressed-tablet sample set comprises pressed-tablet samples of a plurality of gentian plants;
the spectrum data acquisition module is used for collecting the spectral data of the pressed-tablet samples with a spectrometer;
the spectral angle data determining module is used for determining spectral angle data according to the spectral data; the spectral angle data comprises a plurality of spectral angles; the spectrum angle is an included angle between a spectrum and an average spectrum in the spectrum data;
the sequencing module is used for sequencing the spectrum angles in the spectrum angle data to obtain a sequencing result;
the modeling set construction module is used for acquiring a maximum spectrum angle and a minimum spectrum angle in the sequencing result, and determining a spectrum corresponding to the maximum spectrum angle and a spectrum corresponding to the minimum spectrum angle so as to construct a modeling set;
a prediction set construction module, configured to divide other spectral angles except the maximum spectral angle and the minimum spectral angle into a plurality of intervals, and obtain a spectrum corresponding to an intermediate spectral angle in each interval to construct a prediction set; placing spectra corresponding to other spectral angles in the plurality of intervals except for an intermediate spectral angle into the modeling set;
the first spectrum intensity determination module is used for acquiring the wavelength of each spectrum in the modeling set and determining the first spectrum intensity corresponding to the wavelength;
the first spectrum matrix determination module is used for determining a first spectrum vector of the modeling set according to the first spectrum intensity and determining a first spectrum matrix according to the first spectrum vector;
the projection vector determining module is used for determining a projection vector according to the first spectrum matrix by adopting a principal component analysis method;
the second spectrum matrix determining module is used for updating the first spectrum matrix by adopting the projection vector to obtain a second spectrum matrix;
the partial least square method identification model construction module is used for constructing a partial least square method identification model by taking the second spectrum matrix as input and the discrimination degree value as output;
the to-be-detected spectrum data acquisition module is used for acquiring the spectral data of the gentian plant to be detected;
the spectrum matrix determination module of the gentian to be detected is used for determining the spectrum matrix of the gentian to be detected according to the spectrum data of the gentian to be detected;
the identification module is used for determining an identification result from the spectrum matrix of the gentian to be detected by using the partial least squares identification model; the identification result is as follows: the gentian to be detected is wild gentian or the gentian to be detected is home-grown gentian.
Preferably, the system further comprises:
the judging module is used for judging whether the ratio of the number of spectra in the modeling set to the number of spectra in the prediction set is 2:1, to obtain a judgment result;
if the judgment result is yes, the number of the intervals is not adjusted;
if the judgment result is no, adjusting the number of the intervals, and returning to the step of "dividing other spectrum angles except the maximum spectrum angle and the minimum spectrum angle into a plurality of intervals, and acquiring a spectrum corresponding to a middle spectrum angle in each interval to construct a prediction set; placing spectra corresponding to other spectral angles in the plurality of intervals except for an intermediate spectral angle into the modeling set".
Preferably, the second spectrum matrix determining module specifically includes:
the projection sub-vector determining unit is used for acquiring projection sub-vectors with contribution rates ranked in the top three in the projection vectors;
the second spectral intensity determining unit is used for determining characteristic wavelengths according to the projection sub-vectors, acquiring spectral wavelengths of the spectrums in the modeling set, wherein the wavelengths of the spectrums are equal to the characteristic wavelengths, and determining the spectral intensities of the spectral wavelengths to obtain second spectral intensities;
and the second spectrum matrix determining unit is used for determining a second spectrum vector of the modeling set according to the second spectrum intensity and determining a second spectrum matrix according to the second spectrum vector.
Preferably, the authentication module specifically includes:
the discrimination degree value determining unit is used for determining the discrimination degree value from the spectrum matrix of the gentian to be detected by using the partial least squares identification model;
the first identification unit is used for determining the difference value between the discrimination degree value and 1 to obtain a first difference value; if the absolute value of the first difference is less than 0.5, the identification result is: the gentiana rigescens to be detected is a home-grown gentiana rigescens;
the second identification unit is used for determining the difference value between the discrimination degree value and 2 to obtain a second difference value; if the absolute value of the second difference is less than 0.5, the identification result is: the gentiana rigescens to be detected is wild gentiana rigescens.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
according to the near infrared spectrum gentian identification method and system, the spectrum data of the gentian to be detected are obtained, and the established partial least square method identification model is utilized to perform rapid, nondestructive, efficient and accurate detection on the gentian to be detected so as to identify and obtain the specific type of the gentian to be detected. Further, the near infrared spectrum gentiana identification method and system provided by the invention have the characteristics of low cost, simplicity in operation, high detection speed, few required samples, no pollution, high detection precision, strong reliability and the like.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of the near infrared spectrum gentian identification method provided by the present invention;
FIG. 2 is a graph of the average spectra of the 13 kinds of Gentiana rigescens in an embodiment of the present invention;
FIG. 3 is a first projection vector diagram of PCA analysis in an embodiment of the present invention;
FIG. 4 is a second projection vector diagram of the PCA analysis in an embodiment of the present invention;
FIG. 5 is a third projection vector diagram of the PCA analysis in an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of the near infrared spectrum gentian identification system provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a near infrared spectrum gentian identification method and system, which can be used for identifying gentian in real time and rapidly while ensuring the accuracy.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
FIG. 1 is a flowchart of the near infrared spectrum gentian identification method provided by the present invention. As shown in FIG. 1, the method includes:
Step 100: obtaining a pressed-tablet sample set of gentian plants; the pressed-tablet sample set includes pressed-tablet samples of a plurality of gentian plants.
Step 101: collecting spectral data of the pressed-tablet samples with a spectrometer.
Step 102: determining spectral angle data from the spectral data; the spectral angle data comprises a plurality of spectral angles; each spectral angle is the included angle between a spectrum in the spectral data and the average spectrum.
Step 103: sorting the spectral angles in the spectral angle data to obtain a sorting result.
Step 104: acquiring the maximum spectral angle and the minimum spectral angle in the sorting result, and determining the spectrum corresponding to the maximum spectral angle and the spectrum corresponding to the minimum spectral angle to construct a modeling set.
Step 105: dividing the spectral angles other than the maximum and minimum spectral angles into a plurality of intervals, and acquiring the spectrum corresponding to the middle spectral angle of each interval to construct a prediction set; placing the spectra corresponding to the other spectral angles in the intervals into the modeling set.
Step 106: acquiring the wavelengths of each spectrum in the modeling set, and determining the first spectral intensity corresponding to each wavelength.
Step 107: determining a first spectral vector of the modeling set from the first spectral intensities, and determining a first spectral matrix from the first spectral vector.
Step 108: determining a projection vector from the first spectral matrix by principal component analysis.
Step 109: updating the first spectral matrix using the projection vector to obtain a second spectral matrix. Step 109 specifically includes:
acquiring the projection sub-vectors whose contribution rates rank in the top three among the projection vectors;
determining characteristic wavelengths from the projection sub-vectors, acquiring, for the spectra in the modeling set, the spectral wavelengths equal to the characteristic wavelengths, and determining the spectral intensities at those wavelengths to obtain second spectral intensities;
determining a second spectral vector of the modeling set from the second spectral intensities, and determining a second spectral matrix from the second spectral vector.
Step 110: establishing a partial least square method identification model by taking the second spectrum matrix as input and the discrimination degree value as output; the partial least square method identification model is as follows:
$$Y = \alpha_0 + \alpha_1 I_{p_1} + \alpha_2 I_{p_2} + \dots + \alpha_m I_{p_m}$$
wherein Y is the discrimination degree value, $\alpha_0, \dots, \alpha_m$ are coefficients, and $I_{p_1}, \dots, I_{p_m}$ are the spectral intensities at the different characteristic wavelengths, with m = 23.
Step 111: acquiring the spectral data of the gentian plant to be detected.
Step 112: determining the spectrum matrix of the gentian to be detected from the spectral data of the gentian plant to be detected.
Step 113: determining an identification result according to the spectrum matrix of the gentian to be detected by adopting the partial least square method identification model; the identification result is as follows: the gentian to be detected is wild gentian or the gentian to be detected is home-grown gentian.
Step 113 specifically includes:
determining the discrimination degree value from the spectrum matrix of the gentian to be detected by using the partial least squares identification model;
Determining a difference value between the discrimination degree value and 1 to obtain a first difference value; if the absolute value of the first difference is less than 0.5, the identification result is: the gentiana rigescens to be detected is a home-grown gentiana rigescens;
determining a difference value between the discrimination degree value and 2 to obtain a second difference value; if the absolute value of the second difference is less than 0.5, the identification result is: the gentiana rigescens to be detected is wild gentiana rigescens.
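The decision rule of step 113 can be sketched as below. This is only an illustration: the function name `classify_gentian` and the "undetermined" fallback are assumptions, since the text does not say what happens when the discrimination degree value lies within 0.5 of neither 1 nor 2.

```python
def classify_gentian(discrimination_value: float) -> str:
    """Map a discrimination degree value Y to a label using the 0.5 tolerance from the text."""
    if abs(discrimination_value - 1.0) < 0.5:
        return "home-grown"      # |Y - 1| < 0.5
    if abs(discrimination_value - 2.0) < 0.5:
        return "wild"            # |Y - 2| < 0.5
    # Assumption: values matching neither rule are reported as undetermined.
    return "undetermined"
```

A value of exactly 1.5 satisfies neither strict inequality, so under this sketch it falls through to the fallback.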
In order to further improve the accuracy of the identification, after step 105, the method may further include:
judging whether the ratio of the number of spectra in the modeling set to the number of spectra in the prediction set is 2:1, to obtain a judgment result.
If the judgment result is yes, the number of the intervals is not adjusted.
If the judgment result is no, the number of the intervals is adjusted and the method returns to step 105.
The scheme of the invention is further illustrated below by a specific embodiment, which takes the identification of gentiana rigescens as an example; in specific applications, the scheme is also applicable to the identification of other kinds of gentian.
Step 1: samples were collected and pressed into tablets.
Gentiana rigescens plants from different producing areas in Yunnan province are collected; the roots are cleaned, dried and ground, and 0.15 g of each sample is pressed into a tablet. Pressing makes the tablet surface relatively flat, which helps reduce the experimental error caused by an uneven powder surface. The wild samples, ten plants in total, are taken from the rear rock mountain heads; the remaining kinds are shown in Table 1:
TABLE 1
Figure BDA0002524306830000091
Step 2: collecting spectra
The light source and the spectral-signal receiving probe are mounted on two separate tripods. The light source is a 50 W halogen lamp, and the probe is connected to an ASD FieldSpec 4 high-resolution spectrometer (Analytical Spectral Devices, Inc., Boulder, Colorado, USA) with a collectable spectral range of 350-2500 nm. The distance between the tablet sample and the probe is 2 cm. The light source illuminates the surface of the tablet sample, and the light reflected from the surface is collected by the probe and converted into a spectral signal by the spectrometer. Three spectra are collected per tablet, for a total of 384 spectra; each spectrum covers the wavelength range 350-2500 nm and contains 2151 wavelength points.
Step 3: partitioning the modeling and prediction sets according to spectral angles
Taking one spectrum of a wild gentiana rigescens tablet as an example, the included angle between this spectrum and the average spectrum (the average spectra of the 13 kinds of gentiana rigescens are shown in FIG. 2) is calculated. This included angle is called the spectral angle and is calculated as follows:
$$\theta = \arccos\left(\frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}\right)$$
wherein A is one spectrum of the wild gentiana rigescens tablets, B is the average spectrum of the wild gentiana rigescens tablets, and θ is the spectral angle.
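As a minimal sketch of this calculation, assuming each spectrum is stored as one row of a NumPy array (the function names `spectral_angle` and `angles_to_mean` are illustrative and do not come from the patent):

```python
import numpy as np

def spectral_angle(spectrum, reference):
    """Angle in radians between two spectra treated as vectors."""
    a = np.asarray(spectrum, dtype=float)
    b = np.asarray(reference, dtype=float)
    cos_theta = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clipping guards against rounding that lands slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def angles_to_mean(spectra):
    """Spectral angle between every row of `spectra` and the mean spectrum of those rows."""
    spectra = np.asarray(spectra, dtype=float)
    mean_spectrum = spectra.mean(axis=0)
    return np.array([spectral_angle(s, mean_spectrum) for s in spectra])
```

For the data described here, `angles_to_mean` would be called once per group of tablets (for example the wild samples), so that each spectrum is compared with the average spectrum of its own group.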
The spectra are sorted in ascending order of spectral angle θ, and the spectra corresponding to the maximum and minimum spectral angles are placed in the modeling set. The remaining spectral angles are manually divided into several arbitrary intervals; the spectrum corresponding to the spectral angle in the middle of each interval is placed in the prediction set, and the spectra corresponding to the other spectral angles are placed in the modeling set.
If there are too many intervals, the prediction set contains too many spectra; if there are too few, the modeling set contains too many. If, for the manually chosen intervals, the ratio of the number of spectra in the modeling set to that in the prediction set is not 2:1, the number of intervals is adjusted so that the ratio is about 2:1, which ensures that the spectra of the modeling set and the prediction set are uniformly distributed.
The 384 spectra of the gentiana rigescens are divided into a modeling set and a prediction set according to the spectrum angles, and as a result, the modeling set has 260 spectra and the prediction set has 124 spectra.
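The partitioning rule can be sketched as follows. The text divides the intervals manually, so the near-equal intervals produced by `np.array_split` and the function name `split_by_spectral_angle` are assumptions made for illustration only.

```python
import numpy as np

def split_by_spectral_angle(angles, n_intervals):
    """Return (modeling_indices, prediction_indices) for the given spectral angles."""
    angles = np.asarray(angles, dtype=float)
    order = np.argsort(angles)              # indices in ascending spectral angle
    modeling = [order[0], order[-1]]        # minimum and maximum go to the modeling set
    prediction = []
    # Assumption: the remaining spectra are cut into near-equal contiguous intervals.
    for interval in np.array_split(order[1:-1], n_intervals):
        if interval.size == 0:
            continue
        mid = interval.size // 2
        prediction.append(interval[mid])             # middle spectrum -> prediction set
        modeling.extend(np.delete(interval, mid))    # the rest -> modeling set
    return np.array(modeling, dtype=int), np.array(prediction, dtype=int)
```

Each non-empty interval contributes exactly one spectrum to the prediction set, so with n spectra a 2:1 ratio needs roughly n/3 intervals. For 384 spectra split in one pass, 128 intervals would give 256 modeling and 128 prediction spectra; the 260/124 split reported above comes from the manually chosen intervals.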
In essence, the spectral angle is the included angle between two spectral vectors and effectively characterises how far one spectrum deviates from another: the larger the angle, the greater the difference between the two spectra. For the gentiana rigescens spectra of each producing area, the spectral angle between each spectrum and that area's average spectrum is calculated, the spectral angles are divided into several intervals, and the spectrum corresponding to the middle spectral angle of each interval is placed in the prediction set. This effectively ensures the uniformity of the modeling set and the prediction set, improves the representativeness of the model and greatly improves the modeling result.
Step 4: characteristic wavelength screening
In order to reduce input information and improve model efficiency, characteristic wavelengths are screened based on Principal Component Analysis (PCA).
PCA is an unsupervised method. It linearly combines the spectral variables into new, linearly independent score variables that effectively represent the gentiana rigescens information. The linear combination takes the form of a vector projection, and the goal is to make the variance of the score variables after projection as large as possible; PCA generates projection vectors with different contribution rates according to the variance.
PCA is performed on the spectral matrix of the modeling set. The wavelengths of the spectral lines are recorded as $\lambda_1, \lambda_2, \dots, \lambda_p$, where p = 2151. Each spectrum thus has 2151 wavelengths with corresponding line intensities $I_{\lambda_1}, I_{\lambda_2}, \dots, I_{\lambda_p}$. The spectral vector of the modeling set can be represented as
$$X_i = [I_{\lambda_1}, I_{\lambda_2}, \dots, I_{\lambda_p}] \quad (i = 1, 2, \dots, 260),$$
and the spectral matrix of the modeling set can be represented as
$$X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_{260} \end{bmatrix}$$
Suppose projection vector V ═ a1,a2……ap]The score variable may be represented as X × V, and the projection vector may be calculated as follows:
variance σ of the score variables2Comprises the following steps:
Figure BDA0002524306830000111
wherein, it is made
Figure BDA0002524306830000112
Then
Figure BDA0002524306830000113
Then V ═ argmax (σ)2)=argmaxVTCV, different projection subvectors V' can be obtained by constructing Lagrange function, and the projection subvectors are disclosed
Figure BDA0002524306830000114
It follows that different projection subvectors V' yield different variances; a larger variance means a larger contribution rate for the corresponding projection subvector V'.
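For reference, the standard textbook form of this derivation is sketched below. It is not a transcription of the formula figures, whose notation may differ. It assumes that X is the n × p modeling-set matrix with each column mean-centred, that V is treated as a unit-length column vector, and that the variance is normalized by 1/(n−1).

```latex
% Assumptions (not taken from the figures): X is column-centred, V^T V = 1,
% and the sample variance uses the factor 1/(n-1).
\begin{align*}
t &= XV, \\
\sigma^2 &= \frac{1}{n-1}\, t^{\mathsf T} t
          = V^{\mathsf T}\Big(\frac{1}{n-1}\, X^{\mathsf T} X\Big) V
          = V^{\mathsf T} C V, \\
L(V,\lambda) &= V^{\mathsf T} C V - \lambda\,\big(V^{\mathsf T} V - 1\big), \\
\frac{\partial L}{\partial V} &= 2CV - 2\lambda V = 0
          \;\Longrightarrow\; CV = \lambda V, \\
\sigma^2 &= V^{\mathsf T} C V = \lambda .
\end{align*}
```

Under these assumptions each projection subvector is an eigenvector of C and its variance equals the corresponding eigenvalue, so the contribution rate of the k-th subvector is usually taken as λ_k divided by the sum of all eigenvalues.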
Using the above calculation, the three projection subvectors with the highest contribution rates, V1, V2 and V3, are found. The projection subvectors V1, V2 and V3 are shown in FIGS. 3 to 5.
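A minimal sketch of how the three leading projection subvectors and their contribution rates could be computed is given below. It uses a singular value decomposition of the mean-centred matrix instead of an explicit Lagrange solution, which yields the same eigenvectors; the function name `top_projection_vectors` and the 1/(n−1) normalization are assumptions, not part of the original text.

```python
import numpy as np


def top_projection_vectors(X, k=3):
    """Return the k leading PCA projection vectors and their contribution rates.

    X is an (n_samples, n_wavelengths) spectral matrix.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each wavelength column
    # Rows of Vt are the eigenvectors of the covariance matrix of Xc,
    # ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s ** 2 / (X.shape[0] - 1)
    rates = variances / variances.sum()
    return Vt[:k], rates[:k]
```

Working from the SVD avoids forming the 2151 × 2151 covariance matrix explicitly when there are far fewer spectra than wavelengths.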
For the three projection subvectors, the wavelengths at the inflection points are manually selected as the characteristic wavelengths P. Screening the characteristic wavelengths reduces background noise on the one hand and reduces the model input on the other, thereby improving the running speed and precision of the model. The 23 characteristic wavelengths screened from the three projection subvector axes are sorted as shown in Table 2 below.
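The selection above is made manually. As a hedged illustration only, candidate wavelengths could be pre-listed by locating the turning points (local peaks and valleys) of each projection subvector. Reading "inflection point" as a turning point is an assumption, and the helper names `local_extrema_indices` and `candidate_wavelengths` are hypothetical.

```python
def local_extrema_indices(loading):
    """Indices where a loading curve turns (local maximum or minimum)."""
    indices = []
    for i in range(1, len(loading) - 1):
        left = loading[i] - loading[i - 1]
        right = loading[i + 1] - loading[i]
        if left * right < 0:  # the slope changes sign at position i
            indices.append(i)
    return indices


def candidate_wavelengths(wavelengths, loadings):
    """Sorted union of turning-point wavelengths over several loading vectors."""
    picked = set()
    for loading in loadings:
        picked.update(wavelengths[i] for i in local_extrema_indices(loading))
    return sorted(picked)
```

Such a list would only be a starting point; the final 23 characteristic wavelengths in Table 2 were chosen by inspection.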
TABLE 2
Figure BDA0002524306830000115
Step 5: discrimination model formula
After characteristic wavelength screening, the number of spectral-line wavelengths is reduced from 2151 to 23; they are recorded as p1, p2, …, pm (m = 23), with corresponding line intensities Ip1, Ip2, …, Ipm. The new spectral vector of the modeling set can then be represented as X'i
Figure BDA0002524306830000116
(i = 1, 2, …, 260), and the new spectral matrix of the modeling set can be represented as
Figure BDA0002524306830000121
With the discrimination degree value as output and the new spectral matrix X' of the modeling set as input, a partial least squares identification model is established:
Figure BDA0002524306830000122
where Y is the discrimination degree value, α0…αm are coefficients, and
Figure BDA0002524306830000123
are the spectral intensities at the different characteristic wavelengths, with m = 23.
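The original text does not state which partial least squares algorithm or how many latent components were used. As one possible sketch, a single-response PLS (PLS1) fit with the NIPALS procedure returns an intercept and a coefficient vector of the same form as α0…αm. The function name `pls1_fit` and the parameter `n_components` are assumptions, and the training labels are assumed to be 1 for home-grown and 2 for wild samples.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit y ~ alpha0 + X @ alpha with PLS1 (NIPALS). Returns (alpha0, alpha)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean  # centred working copies
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)       # weight vector
        t = Xk @ w                   # score vector
        tt = t @ t
        p = Xk.T @ t / tt            # X loading
        c = yk @ t / tt              # y loading
        Xk = Xk - np.outer(t, p)     # deflate X
        yk = yk - c * t              # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # Regression coefficients expressed in the original variables.
    alpha = W @ np.linalg.solve(P.T @ W, q)
    alpha0 = y_mean - x_mean @ alpha
    return alpha0, alpha
```

With as many components as there are input variables this reduces to ordinary least squares; fewer components trade some fit for stability when the 23 intensities are strongly correlated.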
Combining the sampled gentiana rigescens data and solving gives the following equation for the partial least squares identification model:
Y = 2.41 + 8.36 Ip1 + 42.93 Ip2 - 188.78 Ip3 + 186.77 Ip4 - 164.82 Ip5 + 224.7 Ip6 - 339.9 Ip7 - 14.73 Ip8 + 209.39 Ip9 + 30.62 Ip10 + 66.5 Ip11 - 256.48 Ip12 - 538 Ip13 + 536.23 Ip14 + 274.76 Ip15 - 183 Ip16 + 19.93 Ip17 - 16.8 Ip18 + 376.19 Ip19 - 477.6 Ip20 - 35.73 Ip21 + 808.57 Ip22 - 568.97 Ip23
where Ip1, …, Ip23 denote the spectral intensities at the 23 characteristic wavelengths.
If the absolute value of the difference between the output discrimination degree value Y and 1 is less than 0.5, the sample is identified as home-grown Gentiana rigescens; if the absolute value of the difference between Y and 2 is less than 0.5, it is identified as wild Gentiana rigescens; otherwise it is identified as neither home-grown nor wild Gentiana rigescens.
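The equation and the decision rule can be sketched as follows. The intercept and coefficients are copied from the equation of the identification model; the names `ALPHA0`, `ALPHA`, `discrimination_value` and `classify` are hypothetical, and the input is assumed to hold the 23 intensities in the order Ip1 to Ip23.

```python
# Intercept and the 23 coefficients of the identification model equation.
ALPHA0 = 2.41
ALPHA = [
    8.36, 42.93, -188.78, 186.77, -164.82, 224.7, -339.9, -14.73,
    209.39, 30.62, 66.5, -256.48, -538.0, 536.23, 274.76, -183.0,
    19.93, -16.8, 376.19, -477.6, -35.73, 808.57, -568.97,
]


def discrimination_value(intensities):
    """Evaluate Y from the intensities at the 23 characteristic wavelengths."""
    if len(intensities) != len(ALPHA):
        raise ValueError("expected 23 spectral intensities")
    return ALPHA0 + sum(a * i for a, i in zip(ALPHA, intensities))


def classify(y):
    """Apply the 0.5 tolerance rule around the class values 1 and 2."""
    if abs(y - 1) < 0.5:
        return "home-grown"
    if abs(y - 2) < 0.5:
        return "wild"
    return "neither"
```

Because both comparisons are strict, a value of exactly 1.5 falls into neither class.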
The accuracies on the modeling set and the prediction set obtained with the equation of the partial least squares identification model are 97.69% and 97.79%, respectively.
In addition, corresponding to the near infrared spectrum gentian identification method, the invention also provides a near infrared spectrum gentian identification system. As shown in FIG. 6, the system comprises: a sample set acquisition module 200, a spectral data acquisition module 201, a spectral angle data determination module 202, a sorting module 203, a modeling set construction module 204, a prediction set construction module 205, a first spectral intensity determination module 206, a first spectral matrix determination module 207, a projection vector determination module 208, a second spectral matrix determination module 209, a partial least squares identification model construction module 210, a spectral data acquisition module 211 for the gentian to be detected, a spectral matrix determination module 212 for the gentian to be detected, and an identification module 213.
The sample set acquisition module 200 is used for acquiring a tabletting sample set of the gentian plants; the flaked sample set includes sample flaked tablets of a plurality of gentian plants.
The spectral data acquisition module 201 is used for acquiring the spectral data of the sample tablet by using a spectrometer.
The spectral angle data determination module 202 is configured to determine spectral angle data according to the spectral data; the spectral angle data comprises a plurality of spectral angles; and the spectrum angle is an included angle between the spectrum and the average spectrum in the spectrum data.
The sorting module 203 is configured to sort the spectrum angles in the spectrum angle data to obtain a sorting result.
The modeling set constructing module 204 is configured to obtain a maximum spectral angle and a minimum spectral angle in the sorting result, and determine a spectrum corresponding to the maximum spectral angle and a spectrum corresponding to the minimum spectral angle to construct a modeling set.
The prediction set constructing module 205 is configured to divide the other spectral angles except the maximum spectral angle and the minimum spectral angle into a plurality of intervals, and obtain a spectrum corresponding to an intermediate spectral angle in each interval to construct a prediction set; placing spectra corresponding to spectral angles other than the intermediate spectral angle in a plurality of said intervals into said modeling set.
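The behaviour of the modeling set and prediction set construction modules can be sketched as below. The text does not say whether the intervals are equal in sample count or in angular width; this sketch assumes consecutive, roughly equal-count chunks of the sorted angles. The function name `split_by_spectral_angle` is hypothetical.

```python
def split_by_spectral_angle(angles, n_intervals):
    """Return (modeling_indices, prediction_indices) for the given angles."""
    order = sorted(range(len(angles)), key=lambda i: angles[i])
    modeling = [order[0], order[-1]]  # smallest and largest spectral angle
    inner = order[1:-1]
    prediction = []
    # Cut the remaining sorted samples into n_intervals consecutive chunks.
    for k in range(n_intervals):
        start = k * len(inner) // n_intervals
        stop = (k + 1) * len(inner) // n_intervals
        chunk = inner[start:stop]
        if not chunk:
            continue
        middle = chunk[len(chunk) // 2]  # sample with the middle angle
        prediction.append(middle)
        modeling.extend(i for i in chunk if i != middle)
    return sorted(modeling), sorted(prediction)
```

With chunks of three samples each, one sample per chunk goes to the prediction set and two go to the modeling set, which approaches the 2:1 ratio mentioned in the embodiment.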
The first spectrum intensity determining module 206 is configured to obtain a wavelength of each spectrum in the modeling set, and determine a first spectrum intensity corresponding to the wavelength.
The first spectral matrix determination module 207 is configured to determine a first spectral vector of the modeling set according to the first spectral intensity, and determine a first spectral matrix according to the first spectral vector.
The projection vector determination module 208 is configured to determine a projection vector from the first spectral matrix using principal component analysis.
The second spectrum matrix determining module 209 is configured to update the first spectrum matrix with the projection vector to obtain a second spectrum matrix.
The partial least square method identification model construction module 210 is configured to construct a partial least square method identification model by using the second spectrum matrix as an input and the discrimination degree value as an output.
The spectrum data acquisition module 211 is used for acquiring the spectrum data of the gentian plant to be detected.
The spectrum matrix determining module 212 of the gentian to be detected is used for determining the spectrum matrix of the gentian to be detected according to the spectrum data of the gentian plant to be detected.
The identification module 213 is configured to determine an identification result according to the spectrum matrix of the gentian to be detected by using the partial least squares identification model; the identification result is as follows: the gentian to be detected is wild gentian or the gentian to be detected is home-grown gentian.
As another embodiment of the near infrared spectrum gentian identification system provided by the present invention, the system further comprises a judging module.
The judging module is used for judging whether the ratio of the number of spectra in the modeling set to the number of spectra in the prediction set is 2:1, to obtain a judgment result.
If the judgment result is yes, the number of the intervals is not adjusted.
If the judgment result is no, the number of the intervals is adjusted, and the process returns to the prediction set construction module to execute the step of "dividing the spectral angles other than the maximum spectral angle and the minimum spectral angle into a plurality of intervals, and acquiring the spectrum corresponding to the intermediate spectral angle in each interval to construct a prediction set; placing the spectra corresponding to the spectral angles other than the intermediate spectral angle in the plurality of intervals into the modeling set".
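The adjustment of the number of intervals can be sketched as a search over interval counts. This assumes that every non-empty interval contributes exactly one spectrum to the prediction set and that all other spectra, including the two with extreme angles, go to the modeling set; the names `set_sizes` and `choose_interval_count` are hypothetical.

```python
def set_sizes(n_samples, n_intervals):
    """Sizes (modeling, prediction) produced by the interval scheme."""
    inner = n_samples - 2  # samples left after removing the two extremes
    n_prediction = min(n_intervals, inner)
    return n_samples - n_prediction, n_prediction


def choose_interval_count(n_samples, target=2.0):
    """Interval count whose modeling:prediction ratio is closest to target."""
    best, best_error = None, float("inf")
    for n_intervals in range(1, n_samples - 1):
        n_modeling, n_prediction = set_sizes(n_samples, n_intervals)
        error = abs(n_modeling / n_prediction - target)
        if error < best_error:
            best, best_error = n_intervals, error
    return best
```

When the sample count is not divisible by three, an exact 2:1 ratio is unreachable under this assumption, and the search returns the closest achievable count.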
As another embodiment of the present invention, the second spectrum matrix determining module 209 specifically includes: a projection sub-vector determination unit, a second spectral intensity determination unit and a second spectral matrix determination unit.
The projection sub-vector determining unit is used for acquiring the projection sub-vectors whose contribution rates rank in the top three among the projection vectors.
The second spectrum intensity determining unit is used for determining characteristic wavelengths according to the projection sub-vectors, acquiring, from the spectra in the modeling set, the spectral wavelengths equal to the characteristic wavelengths, and determining the spectral intensities at those wavelengths to obtain second spectral intensities.
The second spectrum matrix determining unit is used for determining a second spectrum vector of the modeling set according to the second spectrum intensity and determining a second spectrum matrix according to the second spectrum vector.
As another embodiment of the present invention, the authentication module 213 specifically includes: a discrimination degree value determination unit, a first discrimination unit, and a second discrimination unit.
The discrimination degree value determining unit is used for determining the discrimination degree value according to the spectrum matrix of the gentian to be detected by using the partial least squares identification model.
The first identification unit is used for determining the difference between the discrimination degree value and 1 to obtain a first difference value; if the absolute value of the first difference value is less than 0.5, the identification result is: the gentiana rigescens to be detected is home-grown gentiana rigescens.
The second identification unit is used for determining the difference between the discrimination degree value and 2 to obtain a second difference value; if the absolute value of the second difference value is less than 0.5, the identification result is: the gentiana rigescens to be detected is wild gentiana rigescens.
Compared with the gentian identification method in the prior art, the technical scheme provided by the invention has the following advantages:
1. Near infrared spectroscopy can detect a sample rapidly, nondestructively and efficiently at low cost, without chemical reagents or excessive sample pretreatment; the model predicts well, and the method offers high detection speed, small sample requirements, no pollution, high detection precision and high reliability.
2. The characteristic wavelength is screened out, so that the input variable of the model can be reduced, and the efficiency of the model is improved.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (9)

1. A near infrared spectrum gentian identification method is characterized by comprising the following steps:
obtaining a tabletting sample set of gentian plants; the flaked sample set comprises a plurality of sample flaked tablets of gentian plants;
collecting spectral data of the sample tablets by a spectrometer;
determining spectral angle data from the spectral data; the spectral angle data comprises a plurality of spectral angles; the spectrum angle is an included angle between a spectrum and an average spectrum in the spectrum data;
sequencing the spectrum angles in the spectrum angle data to obtain a sequencing result;
acquiring a maximum spectrum angle and a minimum spectrum angle in the sequencing result, and determining a spectrum corresponding to the maximum spectrum angle and a spectrum corresponding to the minimum spectrum angle to construct a modeling set;
dividing other spectrum angles except the maximum spectrum angle and the minimum spectrum angle into a plurality of intervals, and acquiring a spectrum corresponding to a middle spectrum angle in each interval to construct a prediction set; placing spectra corresponding to other spectral angles in the plurality of intervals except for an intermediate spectral angle into the modeling set;
acquiring the wavelength of each spectrum in the modeling set, and determining a first spectrum intensity corresponding to the wavelength;
determining a first spectral vector of the modeling set according to the first spectral intensity, and determining a first spectral matrix according to the first spectral vector;
determining a projection vector according to the first spectrum matrix by adopting a principal component analysis method;
updating the first spectrum matrix by adopting the projection vector to obtain a second spectrum matrix;
establishing a partial least square method identification model by taking the second spectrum matrix as input and the discrimination degree value as output;
acquiring spectral data of a gentian plant to be detected;
determining a spectrum matrix of the gentian to be detected according to the spectrum data of the gentian plant to be detected;
determining an identification result according to the spectrum matrix of the gentian to be detected by adopting the partial least square method identification model; the identification result is as follows: the gentian to be detected is wild gentian or the gentian to be detected is home-grown gentian.
2. The near infrared spectrum gentian identification method of claim 1, wherein, after said dividing the spectral angles other than said maximum spectral angle and said minimum spectral angle into a plurality of intervals, acquiring the spectrum corresponding to the intermediate spectral angle in each interval to construct a prediction set, and placing the spectra corresponding to the spectral angles other than the intermediate spectral angle in the plurality of intervals into the modeling set, the method further comprises:
judging whether the ratio of the number of spectra in the modeling set to the number of spectra in the prediction set is 2:1 to obtain a judgment result;
if the judgment result is yes, the number of the intervals is not adjusted;
if the judgment result is no, adjusting the number of the intervals, and returning to the step of "dividing the spectral angles other than the maximum spectral angle and the minimum spectral angle into a plurality of intervals, and acquiring the spectrum corresponding to the intermediate spectral angle in each interval to construct a prediction set; placing the spectra corresponding to the spectral angles other than the intermediate spectral angle in the plurality of intervals into the modeling set".
3. The near infrared spectrum gentian identification method of claim 1, wherein said updating the first spectral matrix with the projection vector to obtain a second spectral matrix comprises:
acquiring projection sub-vectors with contribution rates ranked in the first three in the projection vectors;
determining characteristic wavelengths according to the projection sub-vectors, acquiring, from the spectra in the modeling set, the spectral wavelengths equal to the characteristic wavelengths, and determining the spectral intensities at those wavelengths to obtain second spectral intensities;
determining a second spectral vector of the modeling set according to the second spectral intensity, and determining a second spectral matrix according to the second spectral vector.
4. The near infrared spectrum gentian identification method of claim 3, wherein said partial least squares identification model is:
Figure FDA0002524306820000021
where Y is the discrimination degree value, α0…αm are coefficients, and
Figure FDA0002524306820000022
are the spectral intensities at the different characteristic wavelengths, with m = 23.
5. The near infrared spectrum gentian identification method of claim 1, wherein said determining an identification result according to the spectrum matrix of the gentian to be detected by using the partial least squares identification model specifically comprises:
determining the discrimination degree value according to the spectrum matrix of the gentian to be detected by using the partial least squares identification model;
determining a difference value between the discrimination degree value and 1 to obtain a first difference value; if the absolute value of the first difference is less than 0.5, the identification result is: the gentiana rigescens to be detected is a home-grown gentiana rigescens;
determining a difference value between the discrimination degree value and 2 to obtain a second difference value; if the absolute value of the second difference is less than 0.5, the identification result is: the gentiana rigescens to be detected is wild gentiana rigescens.
6. A near infrared spectrum gentian identification system, comprising:
the sample set acquisition module is used for acquiring a tabletting sample set of the gentian plants; the flaked sample set comprises a plurality of sample flaked tablets of gentian plants;
the spectrum data acquisition module is used for acquiring the spectrum data of the sample pressed sheet by adopting a spectrometer;
the spectral angle data determining module is used for determining spectral angle data according to the spectral data; the spectral angle data comprises a plurality of spectral angles; the spectrum angle is an included angle between a spectrum and an average spectrum in the spectrum data;
the sequencing module is used for sequencing the spectrum angles in the spectrum angle data to obtain a sequencing result;
the modeling set construction module is used for acquiring a maximum spectrum angle and a minimum spectrum angle in the sequencing result, and determining a spectrum corresponding to the maximum spectrum angle and a spectrum corresponding to the minimum spectrum angle so as to construct a modeling set;
a prediction set construction module, configured to divide other spectral angles except the maximum spectral angle and the minimum spectral angle into a plurality of intervals, and obtain a spectrum corresponding to an intermediate spectral angle in each interval to construct a prediction set; placing spectra corresponding to other spectral angles in the plurality of intervals except for an intermediate spectral angle into the modeling set;
the first spectrum intensity determination module is used for acquiring the wavelength of each spectrum in the modeling set and determining the first spectrum intensity corresponding to the wavelength;
the first spectrum matrix determination module is used for determining a first spectrum vector of the modeling set according to the first spectrum intensity and determining a first spectrum matrix according to the first spectrum vector;
the projection vector determining module is used for determining a projection vector according to the first spectrum matrix by adopting a principal component analysis method;
the second spectrum matrix determining module is used for updating the first spectrum matrix by adopting the projection vector to obtain a second spectrum matrix;
the partial least square method identification model construction module is used for constructing a partial least square method identification model by taking the second spectrum matrix as input and the discrimination degree value as output;
the spectrum data acquisition module is used for acquiring the spectrum data of the gentian plant to be detected;
the spectrum matrix determination module of the gentian to be detected is used for determining the spectrum matrix of the gentian to be detected according to the spectrum data of the gentian to be detected;
the identification module is used for determining an identification result according to the spectrum matrix of the gentian to be detected by using the partial least squares identification model; the identification result is as follows: the gentian to be detected is wild gentian or the gentian to be detected is home-grown gentian.
7. The near infrared spectrum gentian identification system of claim 6, further comprising:
the judging module is used for judging whether the ratio of the number of spectra in the modeling set to the number of spectra in the prediction set is 2:1 to obtain a judgment result;
if the judgment result is yes, the number of the intervals is not adjusted;
if the judgment result is no, adjusting the number of the intervals, and returning to the step of "dividing the spectral angles other than the maximum spectral angle and the minimum spectral angle into a plurality of intervals, and acquiring the spectrum corresponding to the intermediate spectral angle in each interval to construct a prediction set; placing the spectra corresponding to the spectral angles other than the intermediate spectral angle in the plurality of intervals into the modeling set".
8. The near infrared spectral gentian identification system of claim 6, wherein said second spectral matrix determination module comprises:
the projection sub-vector determining unit is used for acquiring projection sub-vectors with contribution rates ranked in the top three in the projection vectors;
the second spectral intensity determining unit is used for determining characteristic wavelengths according to the projection sub-vectors, acquiring, from the spectra in the modeling set, the spectral wavelengths equal to the characteristic wavelengths, and determining the spectral intensities at those wavelengths to obtain second spectral intensities;
and the second spectrum matrix determining unit is used for determining a second spectrum vector of the modeling set according to the second spectrum intensity and determining a second spectrum matrix according to the second spectrum vector.
9. The near infrared spectrum gentian identification system of claim 6, wherein the identification module comprises:
the discrimination degree value determining unit is used for determining the discrimination degree value according to the spectrum matrix of the gentian to be detected by using the partial least squares identification model;
the first identification unit is used for determining the difference between the discrimination degree value and 1 to obtain a first difference value; if the absolute value of the first difference value is less than 0.5, the identification result is: the gentiana rigescens to be detected is home-grown gentiana rigescens;
the second identification unit is used for determining the difference value between the discrimination degree value and 2 to obtain a second difference value; if the absolute value of the second difference is less than 0.5, the identification result is: the gentiana rigescens to be detected is wild gentiana rigescens.
CN202010499629.0A 2020-06-04 2020-06-04 Near infrared spectrum gentian identification method and system Pending CN111458308A (en)


Publications (1)

Publication Number Publication Date
CN111458308A true CN111458308A (en) 2020-07-28

Family

ID=71684886



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination