CN116992259A - Spectral feature extraction method and detection method suitable for biological quality detection - Google Patents

Spectral feature extraction method and detection method suitable for biological quality detection

Info

Publication number
CN116992259A
Authority
CN
China
Prior art keywords
data
models
spectral
feature extraction
extraction method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311265639.8A
Other languages
Chinese (zh)
Other versions
CN116992259B (en
Inventor
刘鸿飞
黄晓晓
熊康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Optosky Xiamen Optoelectronic Co ltd
Original Assignee
Optosky Xiamen Optoelectronic Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Optosky Xiamen Optoelectronic Co ltd
Priority to CN202311265639.8A
Publication of CN116992259A
Application granted
Publication of CN116992259B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/359 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/3563 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light for analysing solids; Preparation of samples therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2113 Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/27 Regression, e.g. linear or logistic regression
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/20 Identification of molecular entities, parts thereof or of chemical compositions
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biochemistry (AREA)
  • Evolutionary Biology (AREA)
  • Analytical Chemistry (AREA)
  • Pathology (AREA)
  • Immunology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The invention provides a spectral feature extraction method and a detection method suitable for biological quality detection. The spectral feature extraction method comprises the following steps: S1: dividing the preprocessed spectral data into K band intervals; S2: removing one band interval at a time from the input spectral data and modeling on the remaining spectral data and preset quality index data, until every band interval has been removed once, to obtain a plurality of models; S3: selecting a first optimal model from the plurality of models and recording the corresponding band intervals; S4: inputting the recorded band intervals into S2 until S3 records only one band interval, to obtain a plurality of corresponding first optimal models; S5: selecting a second optimal model from the plurality of first optimal models; S6: extracting characteristic wavelengths according to the band intervals corresponding to the second optimal model. The method accurately selects high-value characteristic wavelengths that are highly correlated with the quality index while improving modeling efficiency and modeling accuracy.

Description

Spectral feature extraction method and detection method suitable for biological quality detection
Technical Field
The invention relates to the technical field of near infrared spectrum analysis, in particular to a spectrum characteristic extraction method and a detection method suitable for biological quality detection.
Background
Because near-infrared measurement is non-destructive, requires no sample pretreatment, and causes no pollution, near-infrared spectral classification technology is increasingly used in post-harvest processing and quality grading of biological products such as fruits, vegetables, and meat. Most schemes follow the same basic flow: the sample is illuminated with a halogen lamp in diffuse-reflection mode and several groups of near-infrared spectral data are collected; quality index data such as moisture, sweetness, and protein content are determined by chemometric methods according to the characteristics of the sample; and a mapping between the near-infrared spectral data and the quality index data, generally called a model, is established. A common modeling flow is: preprocessing of the spectral data (convolution smoothing, derivative filtering, standard normalization, and the like), extraction of spectral features (correlation coefficient method, principal component analysis, and the like), and modeling with a suitable method (multiple linear regression, principal component regression, partial least squares, and the like). Extraction of the spectral characteristic wavelengths is particularly important: screening out the valuable wavelengths for the subsequent steps simplifies the model, reduces the amount of calculation, and improves modeling efficiency and accuracy.
However, existing spectral wavelength extraction methods have drawbacks. The correlation coefficient method, for example, requires a threshold that can only be set empirically, which makes the result too arbitrary. Partial least squares modeling, a common modeling method, already includes a principal component analysis process, so extracting spectral characteristic wavelengths by principal component analysis leads to redundant calculation.
Disclosure of Invention
The present invention aims to solve at least to some extent one of the technical problems in the above-described technology. Therefore, a first object of the present invention is to provide a spectral feature extraction method suitable for biological quality detection, which can accurately select high-value feature wavelengths with high correlation with quality indexes, and improve modeling efficiency and modeling accuracy.
A second object of the present invention is to provide a detection method suitable for quality detection, which improves the modeling efficiency and the modeling accuracy by improving the extraction accuracy of the spectral characteristic wavelength, thereby improving the quality detection accuracy.
A third object of the present invention is to provide a computer-readable storage medium storing a computer program which, when executed by a processor, can accurately select high-value characteristic wavelengths that are highly correlated with the quality index, thereby improving modeling efficiency and modeling accuracy.
To achieve the above objects, an embodiment of the present invention provides a spectral feature extraction method suitable for biological quality detection, including:
S1: dividing the preprocessed spectral data into K band intervals and using them as the input of step S2, where K is an integer greater than 2;
S2: removing one band interval at a time from the input spectral data and modeling on the remaining spectral data and preset quality index data, until every band interval has been removed once, to obtain a plurality of models;
S3: selecting a first optimal model from the plurality of models and recording the corresponding band intervals;
S4: taking the recorded band intervals as the input of step S2 and returning to step S2, until step S3 records only one band interval, to obtain a plurality of corresponding first optimal models;
S5: selecting a second optimal model from the plurality of first optimal models;
S6: extracting characteristic wavelengths according to the band intervals corresponding to the second optimal model.
In the spectral feature extraction method suitable for biological quality detection according to the embodiment of the present invention, a nested modeling loop and preferential iteration reduce the interference of irrelevant variables with the model, efficiently screen out the band intervals most highly correlated with the quality index, and thus allow the characteristic wavelengths to be extracted accurately. Selecting spectral features in this way requires no manually set hyperparameters, improves the validity of the selected characteristic wavelengths, and ensures that valuable spectral characteristic wavelengths are screened out for the subsequent steps; it also greatly simplifies the model, removes redundant calculation, and markedly improves efficiency and accuracy.
In addition, the spectral feature extraction method suitable for biological quality detection according to the above embodiment of the present invention may further have the following additional technical features:
Optionally, the step S6 includes:
S61: appending a random noise interval after the band interval corresponding to the second optimal model to obtain a mixed spectral data set;
S62: modeling on the mixed spectral data set using the leave-one-out method to obtain a plurality of models and their regression coefficient vectors, and forming a regression coefficient matrix from the regression coefficient vectors;
S63: calculating a first stability factor for each column of the regression coefficient matrix;
S64: obtaining the maximum stability factor and the minimum stability factor corresponding to the random noise interval;
S65: among the stability factors corresponding to the band interval, determining second stability factors that are larger than the minimum stability factor and smaller than the maximum stability factor;
S66: removing the columns corresponding to the second stability factors from the regression coefficient matrix to obtain the characteristic wavelengths.
Optionally, the step S63 includes:
calculating the standard deviation and the mean of each column of the regression coefficient matrix, and then calculating the first stability factor of each column from the standard deviation and the mean.
Optionally, before the step S1, the method includes:
S01: acquiring a plurality of original spectral data samples;
S02: first removing noise from each of the original spectral data samples using Savitzky-Golay smoothing, which is based on polynomial least-squares fitting, and then performing baseline correction on each denoised sample using adaptive iteratively reweighted least squares, to obtain a plurality of preprocessed spectral data.
Optionally, the step S1 includes:
S10: dividing the plurality of preprocessed spectral data into a training set and a test set using the SPXY algorithm;
S11: dividing the spectral data of the training set and of the test set into K band intervals each, and using the K band intervals as the input of step S2.
Optionally, the ratio of the training set to the test set is 3:1 or 4:1 or 5:1.
Optionally, the first optimal model and the second optimal model are selected using the root mean square error obtained by cross-validating the models as the model quality metric.
Optionally, the step S2 includes:
S21: traversing the input spectral data and removing one band interval;
S22: performing partial least squares regression modeling on the spectral data after removal and the preset quality index data to obtain a corresponding model;
S23: judging whether the traversal is complete; if yes, executing step S3; if not, returning to step S21.
To achieve the above object, according to a second aspect of the present invention, there is provided a method for detecting biological quality, comprising:
extracting spectral features of the detected organism according to the above spectral feature extraction method, and detecting the quality of the detected organism according to the spectral features; wherein the detected organisms include fruits, vegetables, and seafood.
According to the detection method suitable for biological quality detection, the efficiency and accuracy of selecting the characteristic wavelength are improved, so that the valuable spectral characteristic wavelength is ensured to be screened out for the subsequent link; and the efficiency and accuracy of biological quality detection are improved.
To achieve the above object, an embodiment of a third aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, is capable of implementing the above-described spectral feature extraction method.
With the computer-readable storage medium according to the embodiment of the present invention, when the computer program is executed by the processor, a nested modeling loop and preferential iteration reduce the interference of irrelevant variables with the model, efficiently screen out the band intervals most highly correlated with the quality index, and thus allow the characteristic wavelengths to be extracted accurately. Selecting spectral features in this way requires no manually set hyperparameters, improves the validity of the selected characteristic wavelengths, and ensures that valuable spectral characteristic wavelengths are screened out for the subsequent steps; it also greatly simplifies the model, removes redundant calculation, and markedly improves efficiency and accuracy.
Drawings
Fig. 1 is a schematic flow chart of a spectral feature extraction method suitable for biological quality detection according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a modeling loop and preferred iteration flow in a spectral feature extraction method suitable for biological quality detection according to an embodiment of the present invention;
FIG. 3 is a spectrum example of a mixed spectrum data set in a spectrum feature extraction method suitable for biological quality detection according to an embodiment of the present invention;
FIG. 4 is a schematic illustration of a final characteristic wavelength obtained in a spectral feature extraction method for quality detection according to an embodiment of the present invention;
Fig. 5 shows the root mean square error of cross-validation (RMSECV) of the models obtained in one iteration of a spectral feature extraction method suitable for fruit quality detection according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.
The invention reduces the interference of irrelevant variables with the model through a nested modeling loop and preferential iteration, efficiently screens out the band intervals most highly correlated with the quality index, and then accurately extracts the characteristic wavelengths.
In order that the above-described aspects may be better understood, exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In order to better understand the above technical solutions, the following detailed description will refer to the accompanying drawings and specific embodiments.
Existing spectral wavelength extraction methods, such as the correlation coefficient method, require manually set hyperparameters whose values can only be chosen from experience, which makes the result too arbitrary; and because the partial least squares modeling used in the modeling stage already includes a principal component analysis process, extracting spectral characteristic wavelengths by principal component analysis leads to redundant calculation. The present invention aims to solve at least the above problems.
Fig. 1 is a schematic flow chart of a spectral feature extraction method suitable for biological quality detection according to an embodiment of the present invention. As shown in fig. 1, a spectral feature extraction method suitable for biological quality detection according to an embodiment of the present invention includes:
S1: dividing the preprocessed spectral data into K band intervals and using them as the input of step S2, where K is an integer greater than 2.
Using preprocessed spectral data as the base data improves the accuracy and reliability of the subsequent data processing and thus the performance of the model.
The spectral data are preferably divided into K band intervals of equal width, which makes the analysis of the band intervals more efficient.
Optionally, K ranges from 12 to 19 and is preferably 15. Values in this range avoid excessive calculation time and help screen out the most valuable spectral characteristic wavelengths of suitable length.
S2: after each band interval in the input spectrum data is removed, modeling is carried out according to the removed spectrum data and preset quality index data until all band intervals are removed, and a plurality of models are obtained.
Namely, after spectral data is input, removing one band interval, and modeling according to the rest spectral data and the preset quality index data to obtain a corresponding model; and circulating in this way until each band interval of the input spectrum data is removed once, and finally obtaining K models.
Here, the preset quality index data refers to quality index data determined for the detected organism by chemometric methods. If the detected organism is a fruit, for example, the corresponding quality index data are indices such as water content and sweetness.
S3: and selecting a first optimal model in the plurality of models, and recording a corresponding wave band interval.
That is, the best model is selected from the K models obtained in step S2, and is labeled as the first best model.
The first optimal model may be selected in various ways, such as the K-fold method, the confusion matrix (precision, recall, F1 score), the ROC curve, the PR curve, and cross-validation. Preferably, the root mean square error obtained by cross-validation is used as the model quality metric: the smaller a model's root mean square error, the better its prediction performance.
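For reference, the cross-validated root mean square error mentioned above is conventionally written as follows. The symbols are assumptions introduced here for illustration and do not appear in the source: n is the number of samples, y_i is the preset quality index value of sample i, and ŷ_i^cv is the value predicted for sample i by a model trained without that sample's fold.

```latex
\mathrm{RMSECV} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_{i}^{\mathrm{cv}} - y_{i}\right)^{2}}
```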
Steps S1 to S3 form the modeling loop of this embodiment, whose purpose is to identify the least useful band interval in the spectral data obtained in step S1 and discard it. In other words, the modeling loop shows that removing the band interval that was left out of the first optimal model in step S2 gives a better model than removing any other band interval, so that band interval is the least useful one.
S4: and taking the recorded band intervals as the input of the step S2, and returning to the step S2 until only one band interval is recorded in the step S3, so as to obtain a plurality of corresponding first optimal models.
This begins the preferential iteration of this embodiment, in which the modeling loop is nested. The band intervals remaining after the first modeling loop has removed the least useful band interval are input to step S2, and the preferential iteration starts. Each round of preferential iteration runs one modeling loop, removes one least useful band interval, and passes the remaining band intervals as input to the next round. The iteration continues until only one band interval remains.
At this point, K-1 first optimal models have been obtained through the nested modeling loop and preferential iteration.
S5: and selecting a second optimal model in the corresponding plurality of first optimal models.
It will be appreciated that the better a first optimal model performs, the more valuable its corresponding band data are. The second optimal model, i.e. the best-performing model among the K-1 first optimal models, is therefore selected, and its corresponding band intervals are the most valuable band data.
That is, the nested modeling loop and preferential iteration aim to determine the most valuable band data in the spectral data obtained in step S1. These band data may correspond to one or more band intervals.
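The nested modeling loop and preferential iteration of steps S2 to S5 can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`pls1_coef`, `rmsecv`, `select_bands`), the NIPALS form of partial least squares, the number of latent components, and the interleaved 5-fold split are all choices made here for the example.

```python
import numpy as np

def pls1_coef(Xc, yc, n_comp):
    """PLS1 regression coefficients (NIPALS) for already centred Xc and yc."""
    Xc, yc = Xc.copy(), yc.copy()
    W, P, q = [], [], []
    for _ in range(min(n_comp, Xc.shape[1], Xc.shape[0] - 1)):
        w = Xc.T @ yc
        norm = np.linalg.norm(w)
        if norm < 1e-12:               # nothing left to explain
            break
        w = w / norm
        t = Xc @ w                     # scores
        tt = t @ t
        p = Xc.T @ t / tt              # X loadings
        c = yc @ t / tt                # y loading
        Xc = Xc - np.outer(t, p)       # deflate X and y
        yc = yc - c * t
        W.append(w); P.append(p); q.append(c)
    if not W:
        return np.zeros(Xc.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

def rmsecv(X, y, n_comp=3, n_folds=5):
    """Root mean square error of cross-validation (assumed interleaved folds)."""
    folds = np.arange(len(y)) % n_folds
    residuals = np.empty(len(y))
    for f in range(n_folds):
        test = folds == f
        xm, ym = X[~test].mean(axis=0), y[~test].mean()
        b = pls1_coef(X[~test] - xm, y[~test] - ym, n_comp)
        residuals[test] = (X[test] - xm) @ b + ym - y[test]
    return float(np.sqrt(np.mean(residuals ** 2)))

def select_bands(X, y, bands, **cv):
    """bands: list of column-index arrays. Returns (best RMSECV, kept band ids)."""
    active = list(range(len(bands)))
    history = []                       # one first optimal model per iteration
    while len(active) > 1:             # preferential iteration (S4)
        candidates = []
        for b in active:               # modeling loop: drop one band (S2)
            kept = [a for a in active if a != b]
            cols = np.concatenate([bands[a] for a in kept])
            candidates.append((rmsecv(X[:, cols], y, **cv), kept))
        best = min(candidates, key=lambda c: c[0])   # first optimal model (S3)
        history.append(best)
        active = best[1]
    return min(history, key=lambda h: h[0])          # second optimal model (S5)
```

Starting from K band intervals, the `while` loop runs K-1 times, so `history` holds the K-1 first optimal models described above, and the final `min` picks the second optimal model among them.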
S6: and extracting characteristic wavelengths according to the wave band interval corresponding to the second optimal model.
On the premise that the most valuable band data is acquired, there are various ways to extract the corresponding characteristic wavelength.
Referring to fig. 2 to fig. 4, fig. 2 is a schematic flow diagram of a modeling loop and a preferential iteration flow in a spectral feature extraction method suitable for biological quality detection according to an embodiment of the present invention; FIG. 3 is a spectrum example of a mixed spectrum data set in a spectrum feature extraction method suitable for biological quality detection according to an embodiment of the present invention; fig. 4 is a schematic illustration of a characteristic wavelength finally obtained in a spectral feature extraction method suitable for biological quality detection according to an embodiment of the present invention.
The embodiment is further expanded on the basis of the embodiment of fig. 1, and specifically, each step of the embodiment is further refined.
The spectral feature extraction method suitable for biological quality detection provided by the embodiment of the invention comprises the following steps:
S01: acquiring a plurality of original spectral data samples;
Each sample corresponds to the original spectral data from one acquisition; processing multiple samples improves the accuracy of the results.
S02: preprocessing an original spectrum data sample;
Optionally, the preprocessing specifically includes:
first removing noise from each of the original spectral data samples using Savitzky-Golay smoothing, which is based on polynomial least-squares fitting, and then performing baseline correction on each denoised sample using adaptive iteratively reweighted least squares, to obtain a plurality of preprocessed spectral data.
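The smoothing half of this preprocessing can be illustrated with a short NumPy sketch of Savitzky-Golay filtering derived directly from polynomial least-squares fitting. The window length, polynomial order, edge padding, and function names are assumptions chosen for the example; the source does not specify them, and the baseline-correction step is not shown.

```python
import numpy as np

def savgol_coeffs(window_length, polyorder):
    """Least-squares weights that evaluate the fitted polynomial at the window centre."""
    if window_length % 2 != 1 or polyorder >= window_length:
        raise ValueError("window_length must be odd and greater than polyorder")
    half = window_length // 2
    offsets = np.arange(-half, half + 1)
    # Columns are offsets**0, offsets**1, ..., offsets**polyorder.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 0 of the pseudo-inverse yields the constant term, i.e. the value at offset 0.
    return np.linalg.pinv(design)[0]

def savgol_smooth(spectrum, window_length=11, polyorder=2):
    """Smooth one spectrum; the ends are padded by repeating the edge values."""
    weights = savgol_coeffs(window_length, polyorder)
    half = window_length // 2
    padded = np.pad(np.asarray(spectrum, dtype=float), half, mode="edge")
    # np.convolve flips its second argument, so pre-flip to apply the weights as written.
    return np.convolve(padded, weights[::-1], mode="valid")
```

In practice a library routine such as SciPy's `savgol_filter` would normally be used; the sketch only makes the least-squares origin of the filter weights explicit.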
S1: preparing the pretreated spectrum data sample before modeling;
Optionally, the pre-modeling preparation includes:
S10: dividing the plurality of preprocessed spectral data into a training set and a test set using the SPXY algorithm;
The division ratio of training set to test set is optionally 3:1, 4:1, or 5:1, and preferably 3:1. A suitable division ratio improves model training efficiency while ensuring the reliability of the training result.
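A minimal sketch of an SPXY-style split is given below, assuming the commonly described form of the algorithm: Euclidean distances in the spectra (X) and in the quality index (y) are each normalized by their maximum and summed, and training samples are then chosen by the Kennard-Stone max-min rule. The function names and the way `n_train` is derived from the ratio are assumptions for illustration.

```python
import numpy as np

def _pairwise_distances(a):
    """Euclidean distance between every pair of rows."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def spxy_split(X, y, n_train):
    """Return (train indices, test indices) chosen by joint X-y distances."""
    dx = _pairwise_distances(X)
    dy = _pairwise_distances(y)
    d = dx / dx.max() + dy / dy.max()          # joint normalized distance
    # Start with the two samples that are farthest apart.
    first, second = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(first), int(second)]
    remaining = [i for i in range(len(d)) if i not in selected]
    while len(selected) < n_train:
        # For each candidate, distance to its nearest already-selected sample.
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected), np.array(remaining)
```

For a 3:1 ratio on n samples, `n_train` would be `3 * n // 4` under this sketch.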
S11: and dividing the spectral data of the training set and the testing set into K band intervals respectively, and taking the K band intervals as the input of the step S2.
Assuming the spectral data cover the full band (780-1800 nm) and are divided equally into 10 band intervals, the band intervals are 780-882, 882-984, 984-1086, and so on. When the band interval 882-984 is subsequently removed, the data in that interval are removed from all samples.
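The interval arithmetic in this example can be checked with a short sketch. The 1 nm wavelength grid, the random placeholder spectra, the function names, and the convention that a shared boundary wavelength (such as 882 nm) belongs to the upper interval are assumptions made here; the source does not state them.

```python
import numpy as np

def split_into_bands(wavelengths, k):
    """Return the k+1 interval edges and k arrays of column indices."""
    edges = np.linspace(wavelengths.min(), wavelengths.max(), k + 1)
    # Map each wavelength to an interval; the final edge is folded into the last one.
    band_of = np.clip(np.searchsorted(edges, wavelengths, side="right") - 1, 0, k - 1)
    return edges, [np.flatnonzero(band_of == b) for b in range(k)]

def drop_band(spectra, bands, b):
    """Remove band b from every sample (rows = samples, columns = wavelengths)."""
    keep = np.concatenate([idx for i, idx in enumerate(bands) if i != b])
    return spectra[:, keep]

wavelengths = np.arange(780, 1801)                 # assumed 1 nm sampling
edges, bands = split_into_bands(wavelengths, 10)   # edges: 780, 882, 984, ...
spectra = np.random.default_rng(0).random((5, wavelengths.size))  # placeholder data
reduced = drop_band(spectra, bands, 1)             # removes 882-984 nm from all samples
```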
As shown in fig. 2, the nested modeling loops and preferential iteration flow will then be entered:
S2: removing one band interval at a time from the input spectral data and modeling on the remaining spectral data and preset quality index data, until every band interval has been removed once, to obtain a plurality of models;
Optionally, this step specifically includes:
S21: traversing the input spectral data and removing one band interval;
Here, traversing means that the band intervals are removed one at a time, in sequence.
S22: performing partial least squares regression modeling on the spectral data after removal and the preset quality index data to obtain a corresponding model;
Assuming that the band interval 882-984 of the above 10 band intervals is removed, the data of the remaining 9 band intervals (780-882, 984-1086, and so on) are referred to as the "spectral data after removal".
S23: judging whether traversing is completed, namely whether all wave band intervals are eliminated once; if yes, executing the step S3; if not, returning to the step S21, and continuously eliminating the next wave band interval.
S3: selecting a first optimal model from the plurality of models and recording the corresponding band intervals;
Here, the first optimal model and the second optimal model described below are selected using the root mean square error obtained by cross-validating the models as the model quality metric.
S4: taking the set of band intervals recorded in step S3 as the input of step S2 and returning to step S2, until step S3 records only one band interval, to obtain a plurality of corresponding first optimal models;
S5: selecting a second optimal model from the plurality of first optimal models;
S6: extracting characteristic wavelengths according to the band intervals corresponding to the second optimal model.
Optionally, the step S6 includes:
S61: appending a random noise interval after the band interval corresponding to the second optimal model to obtain a mixed spectral data set; see fig. 3 for an example of a mixed spectral data set.
The value range of the random noise interval is preferably slightly wider than that of the band interval data corresponding to the second optimal model, i.e. its upper bound is slightly above the maximum of those data and its lower bound is slightly below their minimum.
S62: modeling on the mixed spectral data set using the leave-one-out method to obtain a plurality of models and their regression coefficient vectors, and forming a regression coefficient matrix from the regression coefficient vectors;
With the leave-one-out method, one of the spectral data obtained in step S02 is held out as the test set, and the remaining spectral data are modeled together with the preset quality index data to obtain the regression coefficient vector of the model; this is repeated with each spectral data held out in turn, and the regression coefficient vectors of all the models are combined into the regression coefficient matrix. Modeling by partial least squares is preferred.
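Steps S61 and S62 can be sketched as follows. This is an illustrative example under several assumptions that the source does not fix: the noise interval has as many columns as the selected band, the noise is uniform over a range 5 % wider than the band data on each side, partial least squares is implemented as NIPALS PLS1, and the names `pls1_coef` and `loo_coefficient_matrix` are hypothetical.

```python
import numpy as np

def pls1_coef(Xc, yc, n_comp):
    """PLS1 regression coefficients (NIPALS) for already centred Xc and yc."""
    Xc, yc = Xc.copy(), yc.copy()
    W, P, q = [], [], []
    for _ in range(min(n_comp, Xc.shape[1], Xc.shape[0] - 1)):
        w = Xc.T @ yc
        norm = np.linalg.norm(w)
        if norm < 1e-12:
            break
        w = w / norm
        t = Xc @ w
        tt = t @ t
        p = Xc.T @ t / tt
        c = yc @ t / tt
        Xc = Xc - np.outer(t, p)
        yc = yc - c * t
        W.append(w); P.append(p); q.append(c)
    if not W:
        return np.zeros(Xc.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

def loo_coefficient_matrix(X_band, y, n_comp=3, seed=0):
    """Return (regression coefficient matrix, number of real wavelength columns)."""
    rng = np.random.default_rng(seed)
    n, p = X_band.shape
    lo, hi = X_band.min(), X_band.max()
    margin = 0.05 * (hi - lo)                    # "slightly wider": 5 % is an assumption
    noise = rng.uniform(lo - margin, hi + margin, size=(n, p))
    mixed = np.hstack([X_band, noise])           # S61: band data followed by noise
    B = np.empty((n, 2 * p))
    for i in range(n):                           # S62: hold out one sample per model
        keep = np.arange(n) != i
        xm, ym = mixed[keep].mean(axis=0), y[keep].mean()
        B[i] = pls1_coef(mixed[keep] - xm, y[keep] - ym, n_comp)
    return B, p
```

Each row of the returned matrix is the regression coefficient vector of one leave-one-out model; the first `p` columns belong to real wavelengths and the remaining columns to the noise interval.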
S63: calculating a first stability factor corresponding to each column of data of the regression coefficient matrix;
The standard deviation and the mean of each column of the regression coefficient matrix are calculated, and the first stability factor of each column is then calculated from the standard deviation and the mean.
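A per-column stability factor can be sketched as below. The text states only that the factor is computed from the column mean and standard deviation; the ratio mean/std used here is the usual convention in uninformative-variable-elimination methods and is an assumption, as are the function name `stability_factors` and the use of the sample standard deviation.

```python
import statistics


def stability_factors(B):
    """Return one stability factor per column of the coefficient matrix B.

    B is a list of m rows (one per leave-one-out model). The factor is
    assumed to be mean / sample standard deviation of each column.
    """
    columns = list(zip(*B))  # transpose rows into columns
    return [statistics.mean(c) / statistics.stdev(c) for c in columns]


B = [
    [1.0, 2.0],
    [3.0, 2.5],
    [2.0, 3.0],
]
C = stability_factors(B)
```

A column whose coefficients are identical across all models has zero standard deviation and would raise `ZeroDivisionError` in this sketch; real data would need a guard for that case.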
S64: obtaining the maximum stability factor and the minimum stability factor among all the first stability factors of the random noise interval of the regression coefficient matrix;
S65: determining, among all the stability factors of the band interval of the regression coefficient matrix, each second stability factor that is larger than the minimum stability factor and smaller than the maximum stability factor;
S66: eliminating the column data corresponding to the second stability factors from the regression coefficient matrix, the remaining data being the required characteristic wavelengths. For an example of the extracted characteristic wavelengths, see Fig. 4.
The embodiment can effectively select the wavelength with the highest correlation degree with the quality index, and reduce the interference of irrelevant variables to the model.
The embodiment of the invention also provides a spectral feature extraction method suitable for fruit quality detection. The method comprises the following steps:
1. Determining quality index data Y, such as the water content and sweetness of the fruit, by chemometric methods;
2. After the original near-infrared spectrum sample data, collected by a spectrometer multiple times under the same conditions, have been acquired, each spectrum sample is preprocessed as follows: noise is removed by Savitzky-Golay smoothing, which is based on polynomial least-squares fitting, and baseline correction is performed by the adaptive iteratively reweighted least squares method.
3. Each preprocessed spectrum sample is divided uniformly into k band intervals; k is preferably set to 15.
In this embodiment, a model must be built k(k-1)/2 times. To avoid an excessively long computation time, this number should be about 100, so k is preferably in the range of 12 to 19.
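The interval division and the model-count estimate above can be checked with a short sketch. It reads the count variable in the formula as the number of band intervals k, which is an assumption; the helper names `split_intervals` and `models_needed`, and the choice to give leftover wavelengths to the first intervals, are also illustrative.

```python
def split_intervals(n_wavelengths, k):
    """Split wavelength indices 0..n_wavelengths-1 into k near-equal ranges.

    Returns a list of (start, stop) half-open index pairs. When the count
    is not divisible by k, the first intervals get one extra wavelength.
    """
    base, extra = divmod(n_wavelengths, k)
    bounds, start = [], 0
    for i in range(k):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def models_needed(k):
    """Model count k(k-1)/2 as stated in the text, for k band intervals."""
    return k * (k - 1) // 2


counts = {k: models_needed(k) for k in range(12, 20)}
```

With this formula, k = 12 gives 66 models, k = 15 gives 105, and k = 19 gives 171, which brackets the target of roughly 100.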
4. The spectrum sample data are divided into a training set and a test set by the SPXY algorithm at a ratio of 3:1 (or 4:1 or 5:1).
The SPXY algorithm is as follows: the Mahalanobis distance between every two samples is calculated as
$$d(x_i, x_j) = \sqrt{(x_i - x_j)^{T} S^{-1} (x_i - x_j)}$$
where $x_i$ is the vector form of one sample, $x_j$ is the vector form of another sample, $S^{-1}$ is the inverse of the covariance matrix of the overall sample, and $T$ denotes the matrix transpose. The two samples with the largest Mahalanobis distance are selected in turn and placed in the training set until the division ratio of the training set to the test set (namely 3:1) is met, in preparation for modeling.
5. After one band interval of the spectrum sample data is removed, the remaining data set is combined with the quality index data Y for partial least squares regression modeling. This operation is repeated until every band interval of the spectrum sample data has been removed once, finally yielding a plurality of models.
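The Mahalanobis distance used in the SPXY split of step 4 can be sketched in plain Python as follows. The function names `solve` and `mahalanobis` are illustrative, and the covariance matrix is passed in explicitly rather than estimated from the samples; instead of forming the inverse, the sketch solves a linear system, which yields the same quadratic form.

```python
import math


def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis(x, y, cov):
    """sqrt((x - y)^T cov^{-1} (x - y)) for two sample vectors."""
    d = [a - b for a, b in zip(x, y)]
    z = solve(cov, d)  # z = cov^{-1} d
    return math.sqrt(sum(di * zi for di, zi in zip(d, z)))


d_identity = mahalanobis([0, 0], [3, 4], [[1, 0], [0, 1]])
d_scaled = mahalanobis([0, 0], [2, 0], [[4, 0], [0, 1]])
```

With an identity covariance the result reduces to the Euclidean distance; a larger variance along one axis shrinks distances measured along that axis.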
6. The root mean square error of cross-validation (RMSECV) is used as the index for evaluating model quality: the smaller the RMSECV, the better the prediction performance of the model. The best model is selected from the plurality of models obtained in step 5, and its corresponding band intervals are recorded. Fig. 5 shows the RMSECV values of the models obtained in step 5 of one iteration.
$$\mathrm{RMSECV} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^{2}}$$
where $\hat{y}_i$ is the predicted value of a test-set sample, $y_i$ is the true value of that sample, and $N$ is the total number of samples in cross-validation.
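The RMSECV criterion of step 6 amounts to a few lines of code. The sketch below assumes the cross-validated predictions have already been collected; the names `rmsecv` and `pick_best` are illustrative.

```python
import math


def rmsecv(predicted, actual):
    """Root mean square error over all cross-validated predictions."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def pick_best(scores):
    """Index of the model with the smallest RMSECV."""
    return min(range(len(scores)), key=scores.__getitem__)


perfect = rmsecv([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
offset = rmsecv([2.0, 2.0], [0.0, 0.0])
best_index = pick_best([0.8, 0.3, 0.5])
```

Here `pick_best` returns the position of the candidate model whose removed interval hurts prediction the least, which is the model recorded in step 6.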
7. The data set corresponding to the band intervals of the best model from step 6 is input into step 4, and steps 4 to 7 are repeated until only one band interval is recorded;
Suppose the spectrum is divided into 15 band intervals. Step 5 then yields 15 models (each built after removing one band interval in turn), and each model has its own RMSECV; the smaller the RMSECV, the better. The best of the 15 models is selected, and its RMSECV and the corresponding 14 band intervals are recorded. Steps 5 and 6 together form one round of loop iteration. The 14 band intervals of this best model are then taken as input and steps 5 and 6 are repeated, that is, steps 5 to 7 are executed in a loop, each iteration being nested on the result of the previous one. Each iteration (steps 5 to 7) eliminates the least useful band interval.
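The nested elimination loop described above can be sketched with a pluggable scoring function. In the patent the score is the RMSECV of a partial least squares model built on the remaining intervals; here `toy_score` is a hypothetical stand-in so that the sketch runs without a modeling library, and the function name `backward_interval_elimination` is illustrative.

```python
def backward_interval_elimination(intervals, score):
    """Repeatedly drop the interval whose removal gives the lowest score.

    Returns (best, history): `history` holds one (score, intervals) pair
    per round, and `best` is the pair with the lowest score overall.
    """
    current = list(intervals)
    history = []
    while len(current) > 1:
        # One candidate per interval: the current set with that interval removed.
        candidates = [current[:i] + current[i + 1:] for i in range(len(current))]
        winner = min(candidates, key=score)
        history.append((score(winner), winner))
        current = winner
    best = min(history, key=lambda pair: pair[0])
    return best, history


INFORMATIVE = {1, 3}  # hypothetical: intervals that actually carry signal


def toy_score(subset):
    """Stand-in for RMSECV: penalise lost signal heavily, clutter lightly."""
    kept = set(subset)
    return 10 * len(INFORMATIVE - kept) + len(kept - INFORMATIVE)


best, history = backward_interval_elimination([0, 1, 2, 3, 4], toy_score)
```

Starting from five intervals, the loop runs four rounds and the overall best entry keeps exactly the two informative intervals, mirroring how the second optimal model is chosen among the per-round best models.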
8. From all the best models recorded in step 6, the best one is selected, and its band interval is taken as the result;
The characteristic wavelengths are then extracted from the band interval obtained in step 8:
9. The data set X (m × n) corresponding to the band interval from step 8 is acquired, a random noise set R (m × n) is created, and the noise set is spliced after the data set to form the mixed spectrum XR (m × 2n). The range of the random noise should extend from slightly below the minimum of the data set X to slightly above its maximum. The mixed spectrum XR is shown in Fig. 3, where m is the number of samples and n is the number of wavelengths in the band interval from step 8.
10. Using the leave-one-out method, one spectrum sample is held out as the test set, and the data set of the remaining spectrum samples is combined with the quality index data Y for partial least squares modeling, giving the regression coefficient vector b (1 × 2n) of the model. This is repeated m times, holding out a different spectrum sample each time, to build m models; the regression coefficient vectors b (1 × 2n) of the m models are combined into the regression coefficient matrix B (m × 2n);
11. The standard deviation $\operatorname{std}(b_i)$ and the mean $\operatorname{mean}(b_i)$ of the matrix B (m × 2n) are calculated column by column, and the stability factor $C_i$ of each column is then calculated.
The calculation formula is:
$$C_i = \frac{\operatorname{mean}(b_i)}{\operatorname{std}(b_i)}, \qquad \operatorname{mean}(b_i) = \frac{1}{m}\sum_{j=1}^{m} b_{ji}$$
where $b_i$ is the i-th column of the regression coefficient matrix B, and $b_{ji}$ is the element in the j-th row and i-th column of B.
12. The maximum Cmax and the minimum Cmin of the stability factor Ci are taken over the noise interval [n+1, 2n] of the regression coefficient matrix B;
13. The data interval [1, n] of the regression coefficient matrix B is traversed and the stability factor Ci of each column is checked; if Cmin < Ci < Cmax, the corresponding column is removed. The wavelengths remaining in the regression coefficient matrix B are the final spectral characteristic wavelengths.
The embodiment of the invention also provides a detection method suitable for biological quality detection, which comprises the following steps:
The spectral feature extraction method according to any one of the above embodiments is used to extract spectral features of a detected organism, and quality detection is performed on the detected organism according to the extracted spectral features; the detected organisms include fruits, vegetables and seafood.
The embodiment of the present invention further provides a computer readable storage medium, on which a computer program is stored, where the program, when executed by a processor, can implement the spectral feature extraction method described in any one of the above embodiments, so as to extract a spectral feature of a detected living being.
With the spectral feature extraction method suitable for biological quality detection, the detection method suitable for biological quality detection, and the computer-readable storage medium described above, the interference of irrelevant variables with the model is reduced through nested modeling loops and best-candidate iteration, the band interval most correlated with the quality index is screened efficiently, and the characteristic wavelengths are extracted accurately. Selecting spectral features in this way requires no manually set hyperparameters, improves the validity of the selected characteristic wavelengths, and ensures that useful spectral characteristic wavelengths are passed on to subsequent stages; it also greatly simplifies the model, removes redundant computation, and significantly improves efficiency and accuracy.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order. These words may be interpreted as names.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
In the description of the present invention, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
In the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," "secured," and the like are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communicated with the inside of two elements or the interaction relationship of the two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
In the present invention, unless expressly stated or limited otherwise, a first feature "up" or "down" a second feature may be the first and second features in direct contact, or the first and second features in indirect contact via an intervening medium. Moreover, a first feature being "above," "over" and "on" a second feature may be a first feature being directly above or obliquely above the second feature, or simply indicating that the first feature is level higher than the second feature. The first feature being "under", "below" and "beneath" the second feature may be the first feature being directly under or obliquely below the second feature, or simply indicating that the first feature is less level than the second feature.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms should not be understood as necessarily being directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification, and the features of the different embodiments or examples, may be combined by those skilled in the art provided they do not contradict one another.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.

Claims (10)

1. A spectral feature extraction method suitable for biological quality detection, comprising:
S1: dividing the preprocessed spectrum data into K band intervals and taking them as the input of step S2, wherein K is an integer greater than 2;
S2: modeling on the input spectrum data with one band interval removed together with the preset quality index data, and repeating until every band interval has been removed once, to obtain a plurality of models;
S3: selecting a first optimal model from the plurality of models, and recording the corresponding band intervals;
S4: taking the recorded band intervals as the input of step S2, and returning to step S2 until only one band interval is recorded in step S3, thereby obtaining a plurality of corresponding first optimal models;
S5: selecting a second optimal model from the plurality of first optimal models;
S6: extracting characteristic wavelengths according to the band interval corresponding to the second optimal model.
2. The spectral feature extraction method according to claim 1, wherein the step S6 includes:
S61: splicing a random noise section after the band interval corresponding to the second optimal model to obtain a mixed spectrum data set;
S62: modeling by the leave-one-out method on the mixed spectrum data set to obtain a plurality of models and the regression coefficient vector of each model, and obtaining a regression coefficient matrix from the regression coefficient vectors;
S63: calculating a first stability factor corresponding to each column of data of the regression coefficient matrix;
S64: obtaining the maximum stability factor and the minimum stability factor corresponding to the random noise interval;
S65: determining, among the stability factors corresponding to the band interval, each second stability factor that is larger than the minimum stability factor and smaller than the maximum stability factor;
S66: eliminating the column data corresponding to the second stability factors from the regression coefficient matrix to obtain the characteristic wavelengths.
3. The spectral feature extraction method according to claim 2, wherein step S63 comprises:
calculating the standard deviation and the mean of each column of data of the regression coefficient matrix, and then calculating the first stability factor of each column of data from the standard deviation and the mean.
4. The spectral feature extraction method according to claim 1, wherein before step S1 the method comprises:
S01: acquiring a plurality of original spectrum data samples;
S02: first removing noise from each of the plurality of original spectrum data samples by Savitzky-Golay smoothing based on polynomial least-squares fitting, and then performing baseline correction on each of the denoised samples by the adaptive iteratively reweighted least squares method, to obtain a plurality of preprocessed spectrum data.
5. The spectral feature extraction method according to claim 4, wherein the step S1 comprises:
S10: dividing the plurality of preprocessed spectrum data into a training set and a test set by the SPXY algorithm;
S11: dividing the spectrum data of the training set and of the test set into K band intervals respectively, and taking the K band intervals as the input of step S2.
6. The spectral feature extraction method of claim 5, wherein the ratio of training set to test set is 3:1 or 4:1 or 5:1.
7. The spectral feature extraction method according to claim 1, wherein the first optimal model and the second optimal model are selected using the root mean square error obtained by cross-validating each model as the index of model quality.
8. The spectral feature extraction method according to claim 1, wherein the step S2 comprises:
S21: traversing the band intervals of the input spectrum data and removing one band interval;
S22: performing partial least squares regression modeling on the spectrum data after removal together with the preset quality index data to obtain a corresponding model;
S23: judging whether the traversal is complete; if yes, executing step S3; if not, returning to step S21.
9. A detection method suitable for biological quality detection, comprising:
extracting spectral features of a detected organism by the spectral feature extraction method according to any one of claims 1 to 8, and performing quality detection on the detected organism according to the spectral features; wherein the detected organisms comprise fruits, vegetables and seafood.
10. A computer readable storage medium, characterized in that a computer program is stored thereon, which program, when being executed by a processor, is capable of realizing the spectral feature extraction method according to any of the preceding claims 1 to 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311265639.8A CN116992259B (en) 2023-09-28 2023-09-28 Spectral feature extraction method and detection method suitable for biological quality detection

Publications (2)

Publication Number Publication Date
CN116992259A true CN116992259A (en) 2023-11-03
CN116992259B CN116992259B (en) 2023-12-08

Family

ID=88528750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311265639.8A Active CN116992259B (en) 2023-09-28 2023-09-28 Spectral feature extraction method and detection method suitable for biological quality detection

Country Status (1)

Country Link
CN (1) CN116992259B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170261486A1 (en) * 2015-05-21 2017-09-14 Zhejiang University Of Science And Technology Pearl grading method
CN107271375A (en) * 2017-07-21 2017-10-20 石河子大学 A kind of high spectral image detecting method of quality of mutton index
CN108982406A (en) * 2018-07-06 2018-12-11 浙江大学 A kind of soil nitrogen near-infrared spectral characteristic band choosing method based on algorithm fusion
CN111999260A (en) * 2020-08-03 2020-11-27 自然资源实物地质资料中心 Method for identifying lithium-containing pegmatite by thermal infrared spectrum and application of thermal infrared spectrum
CN113435115A (en) * 2021-06-21 2021-09-24 安徽理工大学 Fluorescence spectrum characteristic wavelength screening method and device, computer equipment and readable storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117556246A (en) * 2024-01-09 2024-02-13 电信科学技术第五研究所有限公司 Method for separating single wave signal from carrier mixed signal
CN117556246B (en) * 2024-01-09 2024-03-19 电信科学技术第五研究所有限公司 Method for separating single wave signal from carrier mixed signal
CN117848973A (en) * 2024-03-07 2024-04-09 铜川市人民医院 Intelligent detection method and system for medicine components based on anti-infection clinical pharmacy
CN117848973B (en) * 2024-03-07 2024-05-28 铜川市人民医院 Intelligent detection method and system for medicine components based on anti-infection clinical pharmacy

Also Published As

Publication number Publication date
CN116992259B (en) 2023-12-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant