CN116008245A - Application of mulberry leaf Raman spectral fingerprint establishment combined with machine learning algorithm in mulberry leaf origin identification - Google Patents

Info

Publication number
CN116008245A
Authority
CN
China
Legal status
Pending
Application number
CN202210723329.5A
Other languages
Chinese (zh)
Inventor
王亮
马张文
唐佳伟
刘清华
Current Assignee
Guangdong General Hospital
Original Assignee
Guangdong General Hospital
Application filed by Guangdong General Hospital
Priority to CN202210723329.5A
Publication of CN116008245A
Legal status: Pending (current)

Landscapes

  • Investigating, Analyzing Materials By Fluorescence Or Luminescence (AREA)

Abstract

The invention relates to a method for establishing a mulberry leaf Raman spectral fingerprint and to the application of machine learning algorithms in identifying the origin of mulberry leaves. The method obtains a Raman spectral fingerprint of mulberry leaf extracts quickly and by a simple procedure, overcoming the complex operation, long analysis period and high cost of other current detection methods. In addition, by combining Raman spectral detection with a deep learning model, the problem of identifying Raman spectra is converted into a machine learning classification problem. This enables real-time batch processing and intelligent identification and classification of mulberry leaf extracts with higher speed, accuracy and recall.

Description

Application of mulberry leaf Raman spectral fingerprint establishment combined with machine learning algorithm in mulberry leaf origin identification
Technical Field
The invention belongs to the technical field of spectral analysis. It particularly relates to three things: the use of Raman spectroscopy for rapid analysis of mulberry leaves from different sources, the establishment of average Raman spectra of mulberry leaves, and the combination of machine learning and deep learning models to identify the origin of mulberry leaves.
Background
Mulberry leaf is the dried leaf of Morus alba L. (Moraceae). According to the Chinese Pharmacopoeia (2020 edition), it is collected after the first frost, cleaned of impurities and sun-dried. It is sweet and bitter in taste and cold in nature, acts on the lung and liver meridians, and has the effects of dispersing wind-heat, clearing lung heat, moistening dryness, clearing liver fire and improving eyesight. Mulberry leaves have been used widely as a medicinal material since ancient times. Their broad pharmacological effects are related to active constituents such as polysaccharides, phenols, alkaloids and sterols, and mainly include lowering blood lipids, lowering blood glucose, protecting the liver, antioxidant activity and anti-atherosclerotic activity. Mulberry leaves therefore play an important role, as a traditional Chinese medicine, in preventing and controlling disease.
Analyzing the components of plant medicinal materials and their contents is key to distinguishing medicinal materials from different producing areas. However, the content of active ingredients in mulberry leaves varies with producing area, variety, and harvesting and processing time, which in turn affects leaf quality. Common analytical methods include thin-layer chromatography (TLC), gas chromatography-mass spectrometry (GC-MS), high-performance liquid chromatography (HPLC) and high-throughput sequencing. These methods are mature and are widely applied in the quality control of medicinal materials. Traditional Chinese medicine (TCM) fingerprinting, established with classical HPLC and quantifying multiple components, is a comprehensive quantitative identification approach. It is built on research into the chemical composition of TCMs and is mainly used to evaluate the authenticity, quality and stability of TCM materials and of semi-finished TCM preparations. Although it reflects the integrity of a TCM, it is complex to operate, slow to give a result and costly. It also ignores how differences in the chemical composition of mulberry leaves, and in the amounts of their main components, affect medicinal quality. Establishing a simple, convenient, inexpensive, rapid and efficient detection method is therefore important for classifying, distinguishing and identifying mulberry leaves.
Raman spectroscopy carries rich information on material properties and is an important part of modern material characterization. Conventional Raman signals are weak. Surface-enhanced Raman spectroscopy (SERS) effectively overcomes this limitation and captures characteristic information that ordinary Raman spectroscopy has difficulty detecting. Raman spectroscopy is a fingerprinting technique used in analytical tools for molecular characterization, identification and quantification. It is based on the Raman scattering effect and yields a vibrational spectrum that carries molecular fingerprint information. Each substance has a unique spectrum that differs from those of other substances, so the functional groups within a substance can be fingerprinted specifically. Raman spectra are usually reported in units of wavenumbers (cm⁻¹). The acquired spectral range is not particularly limited. Useful ranges include Raman shifts in the chemical fingerprint region, which corresponds to the typical range of polyatomic vibrational frequencies, and in the structural fingerprint region, which corresponds to vibrational modes. Detecting the characteristic Raman shifts of molecules with a Raman spectrometer extracts as much chemical information as possible from a sample and provides a fingerprint of the mulberry leaves, so that the leaves can be identified from their Raman spectra. Raman signals contain fluorescence and noise from various interference sources, which strongly affect the measurement. Raman spectral preprocessing removes this fluorescence and noise from the measured spectra before qualitative and quantitative analysis of a sample, providing reliable and effective data for identifying substances by Raman spectroscopy.
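Because Raman spectra here are reported as shifts in wavenumbers, a small worked conversion may help. The sketch below uses the standard relation between the excitation wavelength λ₀, the scattered wavelength λₛ and the Raman shift: Δν̃ (cm⁻¹) = 10⁷/λ₀(nm) − 10⁷/λₛ(nm). The function names are illustrative only. The 785 nm value is the excitation wavelength given in this document's SERS acquisition conditions.

```python
def raman_shift_cm1(excitation_nm: float, scattered_nm: float) -> float:
    """Raman shift (cm^-1) from excitation and scattered wavelengths (nm).

    A wavelength of x nm corresponds to 1e7 / x cm^-1 (1 nm = 1e-7 cm).
    Positive results are Stokes shifts (scattered light at longer wavelength).
    """
    return 1e7 / excitation_nm - 1e7 / scattered_nm


def stokes_wavelength_nm(excitation_nm: float, shift_cm1: float) -> float:
    """Inverse relation: wavelength (nm) at which a given Stokes shift appears."""
    return 1e7 / (1e7 / excitation_nm - shift_cm1)


if __name__ == "__main__":
    # 785 nm excitation, as in the SERS acquisition conditions of this document.
    for shift in (520.0, 1000.0, 3000.0):
        print(f"{shift:6.0f} cm^-1 -> {stokes_wavelength_nm(785.0, shift):.1f} nm")
```

With 785 nm excitation, the 520 cm⁻¹ silicon reference peak therefore appears near 818.4 nm.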
In the prior art, plants from different origins are mostly classified with traditional machine learning methods. For example, patent application publication CN112485238A discloses a method for identifying the producing area of rhizoma polygonati oil using the PLS-DA algorithm. Such traditional algorithms are often inaccurate and require complex feature extraction. With the continuing development of machine learning and artificial intelligence, intelligent Raman spectral data processing and identification based on machine learning is expected to become the development trend for Raman instruments.
Disclosure of Invention
The invention aims to solve a problem that existing mulberry leaf quality testing methods cannot solve: tracing the regional origin of the leaves. Mulberry leaf extract samples are prepared using water and aqueous ethanol as the extraction solvents, giving water extracts and alcohol extracts. SERS detection is then used to overcome the low sensitivity of conventional Raman signals. The raw characteristic spectra of each sample are collected repeatedly at multiple sites, and all Raman spectra are processed. The average Raman spectrum and the corresponding characteristic peaks of each mulberry leaf sample are calculated, and mulberry leaves of different origins are distinguished by the differences in characteristic peaks and the corresponding functional groups. This provides a rapid, efficient, safe and reliable means of distinguishing and identifying mulberry leaves. It also avoids the long analysis times and complex operation of conventional methods such as chromatography and mass spectrometry.
The invention provides a machine learning method, based on Raman spectroscopy, for identifying and classifying the origin of mulberry leaves. SERS is combined with deep learning algorithms so that large amounts of data are quickly and effectively converted into valuable feature information. Subtle differences between mulberry leaf categories or subcategories are then analyzed, and classification is performed on the Raman spectral data of samples extracted from different mulberry leaves. This achieves automatic, high-speed and high-precision classification of mulberry leaf extracts.
In order to achieve the above object, the present invention has the following technical scheme:
the application of establishing a mulberry leaf Raman spectral fingerprint, combined with a machine learning algorithm, to identifying the origin of mulberry leaves comprises the following steps:
(1) Sample preparation: selecting mulberry leaves from a plurality of different origins, and preparing from the leaves of each origin both an alcohol extract sample and a water extract sample, by alcohol extraction and water extraction respectively;
(2) Sample pretreatment: adding sodium citrate solution to boiling silver nitrate solution and reacting with stirring, centrifuging the resulting reaction solution, and re-suspending the precipitate in deionized water to obtain a Raman-enhancing substrate of negatively charged silver nanoparticles;
dissolving each mulberry leaf alcohol extract sample obtained in step (1) in a 60-80% (v/v) aqueous ethanol solution, centrifuging, and mixing the supernatant with the prepared Raman-enhancing substrate to obtain a high-sensitivity Raman signal detection substrate for the mulberry leaf alcohol extract of each origin;
dissolving each mulberry leaf water extract sample obtained in step (1) in water, centrifuging, and mixing the supernatant with the prepared Raman-enhancing substrate to obtain a high-sensitivity Raman signal detection substrate for the mulberry leaf water extract of each origin;
(3) SERS detection: acquiring Raman spectra repeatedly and at multiple sites from the high-sensitivity Raman signal detection substrates obtained in step (2), for both the alcohol extracts and the water extracts of mulberry leaves of different origins. This yields the Raman spectra of each extract and thereby builds a Raman spectral database of mulberry leaves covering the different extraction modes and origins;
(4) Data preprocessing: sequentially performing curve smoothing, baseline correction and normalization on the Raman spectra of the alcohol extracts and water extracts of mulberry leaves of different origins obtained in step (3);
(5) Qualitative analysis: performing PCA on the preprocessed Raman spectral data from step (4). The Raman vectors of the mulberry leaf extracts of different extraction modes and/or origins are projected onto the principal components to form a PCA classification quadrant (score) plot. The extracts of different extraction modes and/or origins are then distinguished qualitatively, and the sample categories judged, by inspecting the differences in this plot;
(6) Data classification: automatically analyzing the preprocessed Raman signal data of all samples from step (4) with different machine learning algorithms. The Raman spectral data sets of mulberry leaf extracts of different extraction modes and/or origins are divided by uniform random sampling into a training set, a validation set and a test set. A classifier is trained and checked by K-fold cross-validation, where K is any integer from 1 to 10. The trained classifier then assigns and labels the category of the sample data, and the category assignments and labeled sample data are stored in a database;
(7) Data evaluation: selecting part of the remaining samples for a prediction-capability test. The final discrimination model is corrected using precision, recall, the ROC curve and the confusion matrix, the performance of the different machine learning algorithms is evaluated, and the optimal discrimination model is selected, specifically:
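$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$\mathrm{AUC} = \int_{0}^{1} \mathrm{TPR}\,d(\mathrm{FPR}),\qquad \mathrm{TPR} = \frac{TP}{TP + FN},\quad \mathrm{FPR} = \frac{FP}{FP + TN}$$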
where Precision is the precision and Recall is the recall. AUC is the area under the ROC curve and serves as a performance index. TP, FP, TN and FN are the numbers of true positives, false positives, true negatives and false negatives, respectively, and the confusion matrix is constructed from them.
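The evaluation indices above can be computed directly from a model's predictions. The following is a minimal plain-Python sketch, not the authors' implementation. The names `confusion_counts`, `precision_recall`, `macro_precision_recall` and `roc_auc` are hypothetical helpers. AUC is computed with the rank (Mann-Whitney) formulation, which equals the area under the empirical ROC curve when ties count as one half. For the five-class models, the document does not say how per-class values are combined, so the macro-averaging over one-vs-rest classes used here is an assumption.

```python
from typing import Hashable, Sequence, Tuple


def confusion_counts(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                     positive: Hashable) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def precision_recall(y_true, y_pred, positive) -> Tuple[float, float]:
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN); 0.0 when undefined."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def macro_precision_recall(y_true, y_pred) -> Tuple[float, float]:
    """Unweighted mean of one-vs-rest precision and recall over all labels."""
    labels = sorted(set(y_true) | set(y_pred), key=str)
    pairs = [precision_recall(y_true, y_pred, c) for c in labels]
    return (sum(p for p, _ in pairs) / len(pairs),
            sum(r for _, r in pairs) / len(pairs))


def roc_auc(y_true: Sequence[Hashable], scores: Sequence[float],
            positive: Hashable) -> float:
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else (0.5 if sp == sn else 0.0)
    return wins / (len(pos) * len(neg))
```

For example, take true labels 1, 1, 0, 0, 1, 0 and predictions 1, 0, 0, 1, 1, 0. This gives TP = 2, FP = 1, TN = 2 and FN = 1, so precision and recall are both 2/3.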
For the invention, in step (1), mulberry leaves from different regions are selected, and extract samples are prepared by the same procedure for the leaves of every origin. The mulberry leaves are crushed, boiled in the solvent and filtered to obtain an extract solution. The solution is concentrated and freeze-dried to obtain a dry extract sample, which is stored in a refrigerator at 4 ℃ until testing.
Preferably, mulberry leaves from at least two different origins are selected. The number of origins can be chosen as needed, for example 3, 5, 8 or 10.
Preferably, the extraction solvents used to extract the mulberry leaf components during sample preparation are pure water and a 60-80% aqueous ethanol solution, preferably a 70% aqueous ethanol solution.
In a preferred embodiment, in step (1), the sample preparation process is as follows:
the alcohol extract of mulberry leaves from each origin is prepared as follows: the mulberry leaf powder from each origin is mixed with a 60-80% (v/v) aqueous ethanol solution, boiled and filtered, and the filtrate is collected. More 60-80% aqueous ethanol solution is added to the filter residue, which is boiled again and filtered, and the filtrate is collected again. The two filtrates are combined, concentrated and lyophilized to obtain mulberry leaf alcohol extract samples of different origins.
The water extract of mulberry leaves from each origin is prepared as follows: the mulberry leaf powder from each origin is mixed with water, boiled and filtered, and the filtrate is collected. More water is added to the filter residue, which is boiled again and filtered, and the filtrate is collected again. The two filtrates are combined, concentrated and lyophilized to obtain mulberry leaf water extract samples of different origins.
In a more preferred embodiment, in step (1), the sample preparation process is as follows:
the alcohol extract of mulberry leaves from each origin is prepared as follows: the mulberry leaf powder from each origin is mixed with 70% (v/v) aqueous ethanol at a ratio of 1:15, kept boiling for 1 h and filtered, and the filtrate is collected. More 60-80% aqueous ethanol is added to the filter residue, which is kept boiling for 1 h and filtered, and the filtrate is collected again. The two filtrates are combined, concentrated and lyophilized to obtain mulberry leaf alcohol extract samples of different origins;
the water extract of mulberry leaves from each origin is prepared as follows: the mulberry leaf powder from each origin is mixed with water at a ratio of 1:15, kept boiling for 1 h and filtered, and the filtrate is collected. More water is added to the filter residue, which is kept boiling for 1 h and filtered, and the filtrate is collected again. The filtrates are combined, concentrated and lyophilized to obtain mulberry leaf water extract samples of different origins.
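The 1:15 ratio above fixes the solvent volume once the amount of leaf powder is chosen. The sketch below works through that arithmetic under explicit assumptions: the ratio is taken as grams of powder to millilitres of solvent, the aqueous ethanol is made up v/v from absolute ethanol and water, and the same ratio is reused for each boiling round. The document does not state any of these. The helper name and the 20 g batch are hypothetical, and the volume contraction on mixing ethanol with water is ignored, so component volumes are approximate.

```python
def extraction_volumes(powder_g: float, solvent_ratio: float = 15.0,
                       ethanol_fraction: float = 0.70) -> dict:
    """Hypothetical helper for one boiling round at a 1:solvent_ratio ratio.

    Assumes grams of powder to mL of solvent, and a v/v ethanol solution made
    from absolute ethanol and water (mixing contraction ignored).
    """
    total_ml = powder_g * solvent_ratio
    ethanol_ml = total_ml * ethanol_fraction
    return {"solvent_ml": total_ml,
            "ethanol_ml": ethanol_ml,
            "water_ml": total_ml - ethanol_ml}


if __name__ == "__main__":
    # Hypothetical batch: 20 g of leaf powder per extraction round.
    print(extraction_volumes(20.0))                        # 70% ethanol extraction
    print(extraction_volumes(20.0, ethanol_fraction=0.0))  # water extraction
```

Under these assumptions, 20 g of powder needs about 300 mL of solvent per round: roughly 210 mL of ethanol plus 90 mL of water for the alcohol extraction.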
For the present invention, in step (2), the sample pretreatment dissolves the alcohol-extracted and water-extracted mulberry leaf samples of different origins from step (1) and combines each dissolved sample with the Raman-enhancing substrate, to obtain the corresponding characteristic mulberry leaf Raman spectra. It comprises the following. Silver nitrate is dissolved in water, heated to boiling with stirring, and reduced with a reducing agent (such as sodium citrate) to obtain silver nanoparticles with a negatively charged surface. The particles are centrifuged and the precipitate is re-suspended in deionized water to prepare a negatively charged silver nanoparticle solution, which is the Raman-enhancing substrate. The dissolved alcohol extract and water extract samples of each origin are then mixed with the negatively charged silver nanoparticles (Raman-enhancing substrate) and left to stand. This gives high-sensitivity Raman signal detection substrates for the alcohol and water extracts of mulberry leaves of different origins and improves the signal intensity and repeatability of the Raman spectra.
In the preparation of the Raman-enhancing substrate, silver nitrate is dissolved in water, stirred and heated to boiling; the silver nitrate concentration of this solution is 0.5-1.5 mmol/L, preferably 1 mmol/L. The reducing agent is sodium citrate, used as a 0.5-1.5 wt% sodium citrate solution, preferably 1 wt%.
In a preferred embodiment, the raman-enhanced substrate is prepared as follows:
(1) Heating 1 mmol/L silver nitrate solution to boiling, adding 8 mL of 1 wt% sodium citrate solution while stirring, and reacting with stirring at 600-800 r/min for 30-60 min to obtain negatively charged silver nanoparticles;
(2) Centrifuging the reaction solution obtained in step (1) at 6000-8000 r/min for 5-10 min, discarding the supernatant, and re-suspending the precipitate in deionized water to obtain a negatively charged silver nanoparticle solution, which is the Raman-enhancing substrate.
In a more preferred embodiment, the raman-enhanced substrate is prepared as follows:
(1) Heating 1 mmol/L silver nitrate solution to boiling, adding 8 mL of 1 wt% sodium citrate solution while stirring, and reacting with stirring at 650 r/min for 40 min to obtain negatively charged silver nanoparticles;
(2) Centrifuging 1 mL of the reaction solution obtained in step (1) at 7000 r/min for 7 min, discarding the supernatant, and re-suspending the precipitate in 100 μL of deionized water to obtain a negatively charged silver nanoparticle solution, which is the Raman-enhancing substrate.
The negatively charged silver nanoparticles described above have a diameter of < 11 nm.
Preferably, in the sample pretreatment, the dissolved alcohol extract and water extract samples of each origin are mixed with the negatively charged silver nanoparticle solution (Raman-enhancing substrate) at a volume ratio of 2:1.
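In the preferred embodiment, 1 mL of reaction solution is centrifuged and the pellet is re-suspended in 100 μL, and the sample is then mixed with the substrate at 2:1 by volume. The arithmetic is simple but easy to get wrong at the bench. This sketch assumes complete pelleting of the silver nanoparticles. The 30 μL total working volume and both function names are hypothetical and do not come from the document.

```python
def concentration_factor(reaction_volume_ul: float, resuspension_volume_ul: float) -> float:
    """Fold-concentration when a pellet from `reaction_volume_ul` is re-suspended
    in `resuspension_volume_ul` (assumes all nanoparticles are pelleted)."""
    return reaction_volume_ul / resuspension_volume_ul


def mix_volumes(total_ul: float, sample_parts: float = 2.0,
                substrate_parts: float = 1.0) -> tuple:
    """Split a total volume into (sample, substrate) volumes at the given ratio."""
    unit = total_ul / (sample_parts + substrate_parts)
    return sample_parts * unit, substrate_parts * unit


if __name__ == "__main__":
    # 1 mL of reaction solution re-suspended in 100 uL, as in the preferred embodiment.
    print(concentration_factor(1000.0, 100.0))
    # Hypothetical 30 uL mixture at the preferred 2:1 sample-to-substrate ratio.
    print(mix_volumes(30.0))
```

The embodiment's volumes correspond to a tenfold concentration of the nanoparticles. A 30 μL mixture would combine 20 μL of dissolved extract with 10 μL of substrate.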
In the invention, in step (3), SERS detection measures the surface-enhanced Raman spectrum of each sample to obtain Raman spectral fingerprint data. Appropriate amounts of the high-sensitivity Raman signal detection substrates obtained in step (2), for the alcohol extracts and water extracts of mulberry leaves of different origins, are dropped onto a silicon wafer several times. The drops are allowed to dry naturally and are then measured to obtain the Raman spectra and spectral data.
Preferably, SERS data are collected with a B&W Tek Raman spectrometer under the following sampling conditions: excitation wavelength 785 nm; detector: high-sensitivity CCD array; exposure time 1000 ms; Raman shift range 0-3000 cm⁻¹.
Preferably, before spectral acquisition in SERS detection, the wavenumber axis is calibrated using the 520 cm⁻¹ Raman peak of a silicon wafer as the reference peak, and the dark current is subtracted at the same integration time.
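The calibration step above can be sketched as a constant wavenumber offset plus dark-frame subtraction. This is an assumption about how the correction is applied. The document does not describe the algorithm, and instrument software may fit the peak shape or correct the axis scale as well. The function names and the 480-560 cm⁻¹ search window are hypothetical.

```python
from typing import List, Sequence, Tuple


def subtract_dark(counts: Sequence[float], dark: Sequence[float]) -> List[float]:
    """Subtract a dark-current frame recorded with the same integration time."""
    if len(counts) != len(dark):
        raise ValueError("spectrum and dark frame must have the same length")
    return [c - d for c, d in zip(counts, dark)]


def calibrate_to_silicon(shifts: Sequence[float], silicon_counts: Sequence[float],
                         reference: float = 520.0,
                         window: Tuple[float, float] = (480.0, 560.0)) -> List[float]:
    """Apply a constant offset so the silicon peak apex lands at `reference` cm^-1.

    Only points inside `window` are searched for the peak. A single reference
    peak can correct an axis offset but not a scale (stretch) error.
    """
    in_window = [i for i, s in enumerate(shifts) if window[0] <= s <= window[1]]
    if not in_window:
        raise ValueError("no points inside the silicon search window")
    apex = max(in_window, key=lambda i: silicon_counts[i])
    offset = reference - shifts[apex]
    return [s + offset for s in shifts]


if __name__ == "__main__":
    axis = [400.0 + 2.0 * i for i in range(100)]           # hypothetical, offset axis
    si = [10.0 if s != 524.0 else 1000.0 for s in axis]    # Si peak observed at 524
    print(calibrate_to_silicon(axis, si)[62])              # 520.0
```

A constant offset is the simplest correction a single reference peak can support. Correcting an axis scale error would need at least a second reference band.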
Preferably, in SERS detection, 90 to 110 sites, more preferably 100 sites, are randomly selected on each sample for Raman spectrum measurement.
The Raman spectra are obtained for the mulberry leaf extracts of each extraction mode and origin as follows, to build a mulberry leaf Raman spectral database covering the different extraction modes and origins. The dissolved alcohol extracts and water extracts of mulberry leaves of different origins are mixed with the Raman-enhancing substrate obtained in step (2), and the Raman spectra of both kinds of extract are measured. This gives the Raman spectra of the mulberry leaf alcohol extracts and of the mulberry leaf water extracts.
In a preferred embodiment, in step (4), an average Raman spectrum is plotted for the spectral data of the mulberry leaf extracts of each origin and extraction mode. The average spectrum is calculated and a standard error band and characteristic peaks are added. The repeatability of the data is then judged from the smoothness of the spectrum and the width of the standard error band. The average Raman spectrum therefore serves to check data quality and to judge the repeatability of the Raman spectral data.
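The averaging in step (4) can be sketched as follows. Given the spectra from the roughly 100 sites measured on one sample, assumed here to share a common wavenumber axis, it computes the point-wise mean and the mean ± standard-error band. The document does not specify how characteristic peaks are picked, so a simple local-maximum search with a hypothetical height threshold stands in for that step. All names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def mean_and_sem(spectra: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Point-wise mean and standard error of the mean over repeated spectra.

    SEM = sample standard deviation (n - 1 denominator) / sqrt(n).
    All spectra are assumed to share the same wavenumber axis.
    """
    n = len(spectra)
    if n < 2:
        raise ValueError("need at least two spectra for a standard error")
    means, sems = [], []
    for column in zip(*spectra):
        m = sum(column) / n
        var = sum((v - m) ** 2 for v in column) / (n - 1)
        means.append(m)
        sems.append(math.sqrt(var / n))
    return means, sems


def error_band(means: Sequence[float], sems: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Lower and upper curves of the mean +/- SEM band."""
    return ([m - s for m, s in zip(means, sems)],
            [m + s for m, s in zip(means, sems)])


def local_maxima(y: Sequence[float], min_height: float) -> List[int]:
    """Indices of strict local maxima at or above `min_height` (candidate peaks)."""
    return [i for i in range(1, len(y) - 1)
            if y[i] > y[i - 1] and y[i] > y[i + 1] and y[i] >= min_height]
```

A narrow band across the whole axis points to good site-to-site repeatability. Wide bands around specific peaks would flag uneven nanoparticle-analyte contact on the wafer.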
For the present invention, in step (5), the qualitative analysis determines whether a sample falls within a specified region of the PCA plot and hence belongs to a given category.
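Step (5) relies on principal component analysis. Below is a dependency-free illustration, not the authors' code, assuming the spectra are already preprocessed onto a common axis. It centers the data, forms the sample covariance matrix, and extracts the leading components by power iteration with Hotelling deflation. For full-length spectra of thousands of points, a library routine such as scikit-learn's `PCA` or NumPy's SVD would be far faster, because the pure-Python covariance here grows with the square of the number of points. The function name is hypothetical.

```python
import math
import random
from typing import List, Sequence


def pca_scores(X: Sequence[Sequence[float]], n_components: int = 2,
               iters: int = 500, seed: int = 0) -> List[List[float]]:
    """Project samples (rows) onto their leading principal components."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d).
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm0 = math.sqrt(sum(x * x for x in v))
        v = [x / norm0 for x in v]
        for _ in range(iters):                     # power iteration
            w = [sum(c * x for c, x in zip(row, v)) for row in C]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:                       # no variance left in this direction
                break
            v = [x / norm for x in w]
        Cv = [sum(c * x for c, x in zip(row, v)) for row in C]
        eigenvalue = sum(a * b for a, b in zip(v, Cv))
        components.append(v)
        # Hotelling deflation: remove the found component before the next one.
        C = [[C[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(a * b for a, b in zip(row, pc)) for pc in components] for row in Xc]
```

The sign of each component is arbitrary. In a score plot, what matters is whether the groups separate, not which side of zero they fall on.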
For the present invention, in step (6), the data classification uses different machine learning algorithms to automatically analyze the Raman signals of the samples and to train classifiers on the labeled data.
When the origins are 5 provinces, the data classification in step (6) comprises three models: a two-class machine learning model distinguishing alcohol extraction from water extraction of mulberry leaves, a five-class model for the mulberry leaf alcohol extracts, and a five-class model for the mulberry leaf water extracts.
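The three models differ only in how labels are drawn from the same spectral database. This minimal sketch assumes each record stores its extraction mode, province and preprocessed spectrum. The record layout and the names `build_tasks` and `Record` are hypothetical. The province names are those listed among the advantages of the invention.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Tuple[str, str, Sequence[float]]   # (extraction mode, province, spectrum)


def build_tasks(records: Sequence[Record]) -> Dict[str, List[Tuple[Sequence[float], str]]]:
    """Derive the three labelled data sets from one spectral database."""
    return {
        # Two-class task: label = extraction mode, provinces pooled.
        "extraction_2class": [(spec, mode) for mode, _, spec in records],
        # Five-class tasks: label = province, one data set per extraction mode.
        "alcohol_5class": [(spec, prov) for mode, prov, spec in records if mode == "alcohol"],
        "water_5class": [(spec, prov) for mode, prov, spec in records if mode == "water"],
    }


if __name__ == "__main__":
    provinces = ["Jiangsu", "Anhui", "Henan", "Hebei", "Guangdong"]
    # Hypothetical toy database: one 3-point spectrum per (mode, province).
    db = [(mode, prov, [0.0, 1.0, 0.0]) for mode in ("alcohol", "water") for prov in provinces]
    for name, data in build_tasks(db).items():
        print(name, len(data), Counter(label for _, label in data))
```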
For the present invention, the machine learning process runs on a system comprising a computer memory, a computer processor, and a computer program stored in the memory and executable on the processor. The memory stores a parameter-optimized machine learning model, and the processor implements the following steps when executing the program:
1) Collecting Raman spectral fingerprints of the prepared samples, preprocessing them and performing PCA qualitative analysis to obtain the machine learning input data;
2) Dividing the preprocessed Raman spectral data set;
3) Classifying the Raman data of the different mulberry leaf extraction modes with the machine learning models of the data classification step to obtain the classification result for mulberry leaf origin;
4) Evaluating the performance of the different machine learning algorithms by data evaluation and selecting the optimal discrimination model.
The preprocessing in step 1) consists of curve smoothing (denoising), baseline correction and normalization of the Raman data.
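The three preprocessing operations can be sketched as below. The document does not name the specific algorithms, so each choice here is an assumption. Smoothing uses a 5-point quadratic Savitzky-Golay filter. The baseline is a straight line through the first and last points, a deliberately simple stand-in for methods such as polynomial fitting or asymmetric least squares. Normalization is min-max scaling to [0, 1]. Function names are hypothetical.

```python
from typing import List, Sequence


def savgol5(y: Sequence[float]) -> List[float]:
    """5-point quadratic Savitzky-Golay smoothing; two points at each end are left as-is."""
    coeffs = (-3.0, 12.0, 17.0, 12.0, -3.0)   # standard coefficients, normalized by 35
    out = [float(v) for v in y]
    for i in range(2, len(y) - 2):
        out[i] = sum(c * y[i + k - 2] for k, c in enumerate(coeffs)) / 35.0
    return out


def subtract_linear_baseline(y: Sequence[float]) -> List[float]:
    """Subtract the straight line joining the first and last points."""
    n = len(y)
    if n < 2:
        return [float(v) for v in y]
    slope = (y[-1] - y[0]) / (n - 1)
    return [v - (y[0] + slope * i) for i, v in enumerate(y)]


def minmax_normalize(y: Sequence[float]) -> List[float]:
    """Scale intensities to [0, 1]; a flat spectrum maps to all zeros."""
    lo, hi = min(y), max(y)
    if hi == lo:
        return [0.0 for _ in y]
    return [(v - lo) / (hi - lo) for v in y]


def preprocess(y: Sequence[float]) -> List[float]:
    """Smoothing, then baseline correction, then normalization (document order)."""
    return minmax_normalize(subtract_linear_baseline(savgol5(y)))
```

For real SERS spectra with strong fluorescence backgrounds, a curved baseline model would normally replace the straight line. SciPy's `savgol_filter` offers longer windows than the fixed five-point filter used here.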
Step 2) comprises: grouping the pre-constructed sample database by uniform random sampling into a training set and a validation set, and drawing a test set from the validation set by uniform random sampling. The training, validation and test sets are used to train, validate and test the model, respectively.
Step 3) comprises: automatically analyzing the Raman signal data of all samples with a data analysis tool, training a classifier, and checking it by K-fold cross-validation, where K is any integer from 1 to 10. The trained classifier then assigns and labels the category of the sample data, and the results are stored in a database.
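A sketch of the data split and the K-fold check follows. The document gives no split proportions, so 70/15/15 is an assumption, and the sketch uses the plain three-way uniform split of step (6) rather than drawing the test set from the validation set. The document allows K from 1 to 10. K-fold cross-validation needs at least two folds (scikit-learn's `KFold`, for example, rejects `n_splits` below 2), so this sketch raises an error for K = 1. Function names are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def split_indices(n: int, train: float = 0.7, val: float = 0.15,
                  seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Uniformly shuffle sample indices and cut them into train/validation/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * train)
    n_val = round(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def kfold(indices: Sequence[int], k: int,
          seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (fit, held_out) index lists for K-fold cross-validation."""
    if k < 2:
        raise ValueError("K-fold cross-validation needs k >= 2")
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]          # k nearly equal, disjoint folds
    for f in range(k):
        fit = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield fit, folds[f]
```

Each sample is held out exactly once across the K rounds, so averaging the per-fold scores uses every training spectrum for validation without touching the test set.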
Step 4) comprises: selecting part of the remaining samples for a prediction-capability test, correcting the final discrimination model using precision, recall, the ROC curve and the confusion matrix, evaluating the performance of the different machine learning algorithms, and selecting the optimal discrimination model.
By adopting the technical scheme of the invention, the advantages are as follows:
The mulberry leaf components are extracted in different ways, and a Raman-spectroscopy-based machine learning method for identifying and classifying mulberry leaves is realized. This provides a quick and effective analysis route for Raman spectrometers and demonstrates their application advantages.
A Raman spectral library of mulberry leaves under different extraction modes is established for machine learning, and newly acquired spectral data can be imported directly into the library for discrimination. Combining and comparing different machine learning methods and evaluation indices ensures the accuracy of the identification results.
High accuracy: in distinguishing mulberry leaf alcohol extracts from water extracts, the precision and recall of the invention are both 99.76%. In distinguishing the alcohol extracts of mulberry leaves from the 5 provinces of Jiangsu, Anhui, Henan, Hebei and Guangdong, the precision and recall are 94.10% and 92.22%, respectively. In distinguishing the water extracts of mulberry leaves from the same 5 provinces, the precision and recall are 94.92% and 92.00%, respectively.
Drawings
FIG. 1 is a flow chart of the analysis in the present invention, in which a mulberry leaf Raman spectral fingerprint is established and combined with machine learning algorithms to identify mulberry leaf origin;
FIG. 2 shows the PCA qualitative classification results for the different extraction modes in Example 1 of the present invention;
FIG. 3 shows the ROC results of models trained with several machine learning algorithms to discriminate the two mulberry leaf extraction modes in Example 1 of the present invention; from left to right along the diagonal dotted line, the curves represent CNN, SVM, AdaBoost, XGB, Random Forest and Decision Tree;
FIG. 4 shows the confusion matrix obtained when the model trained with the CNN algorithm in Example 1 discriminates the two mulberry leaf extraction modes;
FIG. 5 shows the average Raman spectra of the mulberry leaf ethanol extracts from 5 provinces in Example 2 of the present invention;
FIG. 6 shows the PCA qualitative classification results for the mulberry leaf ethanol extracts from 5 provinces in Example 2 of the present invention;
FIG. 7 shows the ROC results of models trained with several machine learning algorithms on the mulberry leaf ethanol extracts from 5 provinces in Example 2 of the present invention; from left to right along the diagonal dotted line, the curves represent CNN, XGB, Random Forest, SVM, AdaBoost and Decision Tree;
FIG. 8 shows the confusion matrix obtained when the model trained with the CNN algorithm in Example 2 of the present invention discriminates the mulberry leaf ethanol extracts from 5 provinces;
FIG. 9 shows the average Raman spectra of the pure-water extracts of mulberry leaves from 5 provinces in Example 3 of the present invention;
FIG. 10 shows the PCA qualitative classification results for the pure-water extracts of mulberry leaves from 5 provinces in Example 3 of the present invention;
FIG. 11 shows the ROC results of models trained with several machine learning algorithms on the pure-water extracts of mulberry leaves from 5 provinces in Example 3 of the present invention; from left to right along the diagonal dotted line, the curves represent CNN, XGB, Random Forest, SVM, Decision Tree and AdaBoost;
FIG. 12 shows the confusion matrix obtained when the model trained with the CNN algorithm in Example 3 of the present invention discriminates the pure-water extracts of mulberry leaves from 5 provinces.
Detailed Description
The raman spectrum (Raman Spectroscopy) technology is based on the principle that when a substance is irradiated by incident light of a laser light source, the incident light is scattered by molecules of the substance, a very small part of the scattered light has different frequencies from the incident light, the frequency of the scattered light is changed depending on the structural characteristics of the irradiated substance, and different substances generate scattered light with specific frequencies under the same laser irradiation, so that the raman spectrum technology can be used for realizing rapid, simple, repeatable and nondestructive detection of the substance components.
Artificial intelligence offers an efficient and accurate way to detect substance components from Raman spectra. Existing Raman machine learning approaches are designed for specific target substances: they recast Raman-based substance identification as a machine learning classification problem, train a model on standard Raman spectra of known substances, and use the trained model to identify test samples accurately.
In the description and claims, elements are distinguished by differences in their functions rather than by differences in their names. As used throughout the specification and claims, the word "comprising" is open-ended and should be interpreted as "including, but not necessarily limited to". "Substantially" means that, within an acceptable error range, a person skilled in the art can solve the technical problem and substantially achieve the technical effect. The present invention is described in further detail below with reference to examples, so that those skilled in the art can practice it by referring to the description. The equipment and materials used in the examples are all commercially available.
1.1 Instruments
Magnetic stirrer (DF-101S, Tianjin, China), B&W Tek Raman spectrometer (BWS 465-785S, USA), centrifuge (Centrifuge 5430R, Eppendorf, USA), freeze dryer (SCIENTZ-10N)
1.2 Drugs and reagents
Ethanol (AR), silver nitrate (Sinopharm, Beijing, China), trisodium citrate dihydrate (AR), ultrapure water. The sources, production areas and batch numbers of the mulberry leaves used in the examples are listed in Table 1 below:
Table 1 Source information of the mulberry leaves from 5 provinces
Example 1 Differentiation of alcohol-extracted and water-extracted mulberry leaf extracts using the method of the present invention
1. Method
(1) Sample preparation: mulberry leaf extracts from different origins are prepared by two methods, alcohol extraction and water extraction;
The mulberry leaf alcohol extract samples are prepared as follows: grind the mulberry leaves (Folium Mori) into powder with a pulverizer and pass the powder through an 80-mesh sieve; accurately weigh 20 g of the powder, mix it thoroughly with 300 mL of 70% (v/v) aqueous ethanol (solid-to-liquid ratio 1:15), bring to a boil, keep at a gentle boil for 1 h, filter and collect the filtrate; add 200 mL of 70% aqueous ethanol (solid-to-liquid ratio 1:20) to the filter residue, bring to a boil, keep at a gentle boil for 1 h, filter and collect the filtrate again; combine the two filtrates, concentrate and freeze-dry to obtain the mulberry leaf alcohol extract sample.
The mulberry leaf water extract samples are prepared as follows: grind the mulberry leaves into powder with a pulverizer and pass the powder through an 80-mesh sieve; accurately weigh 20 g of the powder, mix it with 300 mL of water (solid-to-liquid ratio 1:15), bring to a boil, keep at a gentle boil for 1 h, filter and collect the filtrate; add 200 mL of water (solid-to-liquid ratio 1:20) to the filter residue, bring to a boil, keep at a gentle boil for 1 h, filter and collect the filtrate again; combine the two filtrates, concentrate and freeze-dry to obtain the mulberry leaf water extract sample.
(2) Sample pretreatment
The Raman-enhancing substrate is prepared as follows:
1) Weigh 33.72 mg of silver nitrate and dissolve it in 200 mL of deionized water to obtain a 1 mmol/L silver nitrate solution; heat it to boiling on a magnetic stirrer and, while stirring, add 8 mL of 1 wt% sodium citrate solution all at once; continue stirring at 650 r/min for 40 min to obtain negatively charged silver nanoparticles with diameters smaller than 11 nm.
2) Centrifuge 1 mL of the reaction solution from step 1) at 7000 r/min for 7 min, discard the supernatant and re-suspend the precipitate in 100 µL of deionized water to obtain the Raman-enhancing substrate with negatively charged silver nanoparticles; store the resulting solution sealed at room temperature, protected from light, until use.
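As a quick consistency check on the quantities in steps 1) and 2), the sketch below recomputes the silver nitrate concentration, the approximate citrate-to-silver molar ratio and the concentration factor of the colloid. It assumes standard molar masses, that the 1 wt% citrate solution was made from the trisodium citrate dihydrate listed under reagents, and a solution density of about 1 g/mL; none of these derived figures are stated in the patent.

```python
AGNO3_MOLAR_MASS = 169.87    # g/mol, silver nitrate
CITRATE_MOLAR_MASS = 294.10  # g/mol, trisodium citrate dihydrate (assumed form of the citrate)


def molarity_mM(mass_mg: float, molar_mass: float, volume_mL: float) -> float:
    """Concentration in mmol/L from a dissolved mass (mg) and a solution volume (mL)."""
    mmol = mass_mg / molar_mass        # mg divided by g/mol gives mmol
    return mmol / (volume_mL / 1000.0)  # mmol per litre


ag_mM = molarity_mM(33.72, AGNO3_MOLAR_MASS, 200.0)
ag_mmol = 33.72 / AGNO3_MOLAR_MASS

# 8 mL of 1 wt% solution at ~1 g/mL holds ~8 g * 1% = 80 mg of solute (assumption).
citrate_mmol = 8.0 * 10.0 / CITRATE_MOLAR_MASS

# 1 mL of colloid is pelleted and re-suspended in 100 uL, i.e. concentrated 10-fold.
concentration_factor = 1000.0 / 100.0

print(f"AgNO3: {ag_mM:.3f} mM")  # ~0.993 mM, consistent with the stated 1 mmol/L
print(f"citrate : Ag molar ratio ~ {citrate_mmol / ag_mmol:.2f}")
print(f"colloid concentration factor: {concentration_factor:.0f}x")
```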
Weigh 10 mg of the mulberry leaf alcohol extract sample obtained in step (1), dissolve it in 1 mL of 70% (v/v) aqueous ethanol and centrifuge at 8000 r/min for 10 min; thoroughly mix 20 µL of the supernatant with 10 µL of the prepared Raman-enhancing substrate to obtain the high-sensitivity Raman signal detection substrate for the mulberry leaf alcohol extract.
Weigh 10 mg of the mulberry leaf water extract sample obtained in step (1), dissolve it in 1 mL of water and centrifuge at 8000 r/min for 10 min; thoroughly mix 20 µL of the supernatant with 10 µL of the prepared Raman-enhancing substrate to obtain the high-sensitivity Raman signal detection substrate for the mulberry leaf water extract.
(3) SERS detection: take 10 µL each of the high-sensitivity Raman signal detection substrates for the mulberry leaf alcohol extract and the mulberry leaf water extract obtained in step (2), drop each onto a silicon wafer and allow it to air-dry naturally; repeat the deposition and air-drying three times, then perform the measurement.
The Raman acquisition parameters are as follows: excitation wavelength 785 nm; detector: high-sensitivity, high-quantum-efficiency CCD array; exposure time 1000 ms; wavenumber scanning range 0-3000 cm⁻¹ (actual scanning range 195-2999 cm⁻¹); output data format: .csv; laser intensity setting: 2; spectrum acquisition time 1000 ms. Before spectrum acquisition, the wavenumber axis is calibrated using the Raman peak of a silicon wafer at 520 cm⁻¹ as the reference peak, and the dark current is subtracted at the same integration time. Raman spectra are acquired at 100 randomly selected positions on each sample.
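The calibration described above (aligning the silicon band to 520 cm⁻¹ and subtracting a dark frame) can be sketched in NumPy as follows. This is an illustration under assumptions, not the spectrometer vendor's procedure: the peak is located by simple maximum-picking in a 480-560 cm⁻¹ window, and the function names are hypothetical.

```python
import numpy as np

SI_REFERENCE_CM1 = 520.0  # silicon reference peak used for wavenumber calibration


def silicon_offset(axis_cm1, si_counts, window=(480.0, 560.0)):
    """Offset (cm^-1) that moves the measured silicon peak onto 520 cm^-1.

    The peak is taken as the highest point inside the search window, so the result
    is only as fine as the axis spacing; fitting a peak shape would refine it.
    """
    axis_cm1 = np.asarray(axis_cm1, dtype=float)
    si_counts = np.asarray(si_counts, dtype=float)
    mask = (axis_cm1 >= window[0]) & (axis_cm1 <= window[1])
    measured_peak = axis_cm1[mask][np.argmax(si_counts[mask])]
    return SI_REFERENCE_CM1 - measured_peak


def calibrate(axis_cm1, raw_counts, dark_counts, offset_cm1):
    """Shift the axis and subtract a dark frame recorded at the same integration time."""
    shifted_axis = np.asarray(axis_cm1, dtype=float) + offset_cm1
    corrected = np.asarray(raw_counts, dtype=float) - np.asarray(dark_counts, dtype=float)
    return shifted_axis, corrected


# Synthetic check: a silicon spectrum whose peak was recorded 3 cm^-1 too high.
axis = np.arange(195.0, 3000.0, 1.0)
si = 1000.0 * np.exp(-0.5 * ((axis - 523.0) / 3.0) ** 2)
offset = silicon_offset(axis, si)
print(offset)  # -3.0
```

The same offset would then be applied to every sample spectrum recorded in that session.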
For the mulberry leaf extract samples obtained by the different extraction methods, Raman spectra are acquired as described above, thereby constructing Raman spectrum databases of mulberry leaves from different origins and different extraction methods.
(4) Data preprocessing: the Raman spectra of the mulberry leaf alcohol extract and the mulberry leaf water extract obtained in step (3) are sequentially subjected to curve smoothing, baseline correction and normalization.
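The patent names the three preprocessing operations but not the algorithms or their settings. The NumPy-only sketch below is one plausible implementation under stated assumptions: Savitzky-Golay smoothing (window 11, cubic), an iterative modified-polynomial baseline (degree 5) and min-max normalization. All function names and parameter values are hypothetical, chosen for illustration.

```python
import numpy as np


def savgol_smooth(y, window=11, order=3):
    """Savitzky-Golay smoothing with NumPy only; window must be odd and > order."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    vander = np.vander(offsets, order + 1, increasing=True)
    weights = np.linalg.pinv(vander)[0]  # least-squares weights for the value at the window centre
    padded = np.pad(np.asarray(y, dtype=float), half, mode="edge")  # keep output length
    return np.convolve(padded, weights[::-1], mode="valid")


def modpoly_baseline(x, y, degree=5, n_iter=100):
    """Iterative modified-polynomial baseline: fit, clip the signal to the fit, repeat."""
    xs = (x - x.mean()) / (np.ptp(x) / 2)  # rescale to [-1, 1] for a well-conditioned fit
    work = np.asarray(y, dtype=float).copy()
    for _ in range(n_iter):
        fit = np.polyval(np.polyfit(xs, work, degree), xs)
        work = np.minimum(work, fit)  # points above the fit (peaks) are pulled down
    return fit


def preprocess(x, y):
    """Smoothing, then baseline correction, then min-max normalization, as in step (4)."""
    smoothed = savgol_smooth(y)
    corrected = smoothed - modpoly_baseline(x, smoothed)
    return (corrected - corrected.min()) / (corrected.max() - corrected.min())


# Synthetic spectrum: one band at 1340 cm^-1 on a sloping background with noise.
rng = np.random.default_rng(0)
x = np.linspace(195, 2999, 1500)
y = 500 * np.exp(-0.5 * ((x - 1340) / 12) ** 2) + 0.05 * x + 200 + rng.normal(0, 5, x.size)
spec = preprocess(x, y)
print(x[np.argmax(spec)])  # close to 1340
```

Vector (L2) normalization or an asymmetric-least-squares baseline would be equally reasonable substitutes; the choice should match whatever settings were actually used to build the database.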
(5) Qualitative analysis: PCA qualitative analysis is performed on the preprocessed Raman spectral data obtained in step (4). The sample categories are judged by comparing the differences between the mulberry leaf alcohol extract and the water extract, a PCA classification quadrant plot of the Raman vectors of the alcohol-extracted and water-extracted mulberry leaves is generated, and the mulberry leaf extracts from the two extraction methods are qualitatively distinguished by inspecting the quadrant plot.
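PCA itself can be sketched in a few lines: the spectra are mean-centred and decomposed by singular value decomposition, giving the sample scores plotted in the quadrant graph and the percentage contribution of each principal component (the figures quoted in the Results, such as 85.2051% and 12.1436%, are of this kind). The data below are synthetic stand-ins for the two extract groups, not measurements from the patent, and the function names are illustrative.

```python
import numpy as np


def pca(spectra, n_components=2):
    """PCA by SVD of the mean-centred matrix (rows = spectra, columns = wavenumbers)."""
    X = np.asarray(spectra, dtype=float)
    centred = X - X.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # coordinates for the score (quadrant) plot
    explained = s**2 / np.sum(s**2)                   # variance fraction of each component
    return scores, vt[:n_components], explained[:n_components]


def band(axis, centre, width=15.0):
    """Gaussian band used only to build synthetic test spectra."""
    return np.exp(-0.5 * ((axis - centre) / width) ** 2)


rng = np.random.default_rng(1)
axis = np.linspace(195, 2999, 800)
group_a = band(axis, 1000) + rng.normal(0, 0.02, (100, axis.size))  # synthetic "alcohol extract"
group_b = band(axis, 1600) + rng.normal(0, 0.02, (100, axis.size))  # synthetic "water extract"
scores, loadings, explained = pca(np.vstack([group_a, group_b]))
print(np.round(100 * explained, 2))  # percentage contribution of PC1 and PC2
```

Plotting scores[:, 0] against scores[:, 1] and colouring points by group gives the kind of two-cluster score plot described in the Results.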
(6) Data classification: the Raman signal data of all samples preprocessed in step (4) are analyzed automatically with different machine learning algorithms. The mulberry leaf alcohol/water extraction classification data set is divided into a training set, a validation set and a test set by uniform random sampling; classifiers are trained and evaluated by K-fold cross-validation, where K is any integer from 1 to 10; the trained classifiers assign a sample category and label to each sample, and the results are stored in a database. All parameters were optimized before training, and the machine learning parameter settings are shown in Table 2 below.
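The split-and-validate step can be sketched as follows. The 70/15/15 proportions, the per-class (stratified) sampling and K = 5 are assumptions, since the actual settings are given in Table 2; note also that K-fold cross-validation needs at least two folds, so the sketch rejects K = 1. Function names are hypothetical.

```python
import numpy as np


def stratified_split(labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Uniform random train/validation/test split, drawn separately within each class."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(fractions[0] * idx.size))
        n_val = int(round(fractions[1] * idx.size))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])
    return np.array(train), np.array(val), np.array(test)


def kfold(indices, k=5, seed=0):
    """Yield (fit, held_out) index pairs for K-fold cross-validation."""
    if k < 2:
        raise ValueError("K-fold cross-validation needs at least two folds")
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(indices), k)
    for i in range(k):
        fit = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield fit, folds[i]


# Usage: 5 classes (e.g. 5 provinces) x 100 spectra each.
labels = np.repeat(np.arange(5), 100)
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
for fit_idx, held_idx in kfold(train, k=5):
    pass  # train a classifier on fit_idx, score it on held_idx
```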
Table 2 Machine learning parameter settings for classifying mulberry leaf alcohol and water extracts
(7) Data evaluation: part of the remaining samples are used to test predictive ability; the final discrimination model is refined using precision, recall, the ROC curve and the confusion matrix, the performance of the different machine learning algorithms is evaluated, and the optimal discrimination model is selected, specifically as follows:
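$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$\mathrm{AUC} = \int_{0}^{1} \mathrm{TPR}\, d(\mathrm{FPR}), \qquad \mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$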
where precision (Precision), recall (Recall) and the area under the ROC curve (AUC) serve as performance indicators; TP, FP, TN and FN denote the numbers of true positives, false positives, true negatives and false negatives, respectively, from which the confusion matrix is constructed.
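The indicators defined above can be computed directly from predicted labels and scores. In this sketch the confusion matrix has true classes in rows and predicted classes in columns (a convention, not something the patent specifies), and the ROC AUC is computed in its equivalent pairwise form: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The function names are illustrative.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] counts samples of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def precision_recall(cm):
    """Per-class precision TP/(TP+FP) (column sums) and recall TP/(TP+FN) (row sums)."""
    tp = np.diag(cm).astype(float)
    return tp / cm.sum(axis=0), tp / cm.sum(axis=1)


def binary_auc(y_true, scores):
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    diff = scores[y_true == 1][:, None] - scores[y_true == 0][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


cm = confusion_matrix([0, 0, 0, 1, 1], [0, 0, 1, 1, 1], n_classes=2)
precision, recall = precision_recall(cm)
auc = binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
# cm = [[2, 1], [0, 2]]; class 1: precision 2/3, recall 1; AUC = 0.75
print(cm.tolist(), precision.round(3), recall.round(3), auc)
```

A class that is never predicted has an empty column, so its precision comes out as NaN; the sketch leaves that case to the caller.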
The Raman characteristic peaks and band assignments of the mulberry leaf alcohol extract and water extract are shown in Tables 3 and 4.
Table 3 Raman characteristic peaks and band assignments of the mulberry leaf alcohol extract
Table 4 Raman characteristic peaks and band assignments of the mulberry leaf water extract
2. Results
The PCA qualitative classification results for the mulberry leaf alcohol extract versus the water extract are shown in FIG. 2.
As can be seen from FIG. 2, PCA separates the mulberry leaf alcohol extract and water extract into two groups in the scatter plot, showing that PCA can be used to distinguish the Raman spectral data of mulberry leaves obtained by different extraction methods. The dashed circles in FIG. 2 mark the different groups, each labeled with its extraction method. The abscissa is the principal component with the largest contribution in the PCA (85.2051%), and the ordinate is the principal component with the second-largest contribution (12.1436%).
FIG. 3 shows the ROC analysis results of the different machine learning algorithms for classifying the mulberry leaf alcohol extract versus the water extract. As FIG. 3 shows, the CNN algorithm has the largest area under the curve, 1.00, whereas the areas under the curves of the other algorithms (XGB, Random Forest, SVM, AdaBoost and Decision Tree) are below 1.00. The CNN algorithm therefore achieves the highest accuracy and the best performance. Because the ROC analysis showed that none of the other algorithms matched the accuracy of the CNN, confusion matrix analysis was not performed for them.
FIG. 4 shows the confusion matrix of the CNN algorithm for classifying the mulberry leaf alcohol extract versus the water extract; as FIG. 4 shows, only 1% of the mulberry leaf alcohol extract spectra were classified as water extract.
The results of this example show that surface-enhanced Raman spectral data can be used to train an effective CNN model, yielding an accurate classification model for distinguishing mulberry leaf alcohol extracts from water extracts.
Example 2 Differentiation of mulberry leaf alcohol extracts from 5 provinces using the method of the present invention
1. Method
(1) Sample preparation: alcohol extracts of mulberry leaves from 5 different provinces are prepared;
The alcohol extract sample for each province is prepared as follows: grind the mulberry leaves (Folium Mori) into powder with a pulverizer and pass the powder through an 80-mesh sieve; accurately weigh 20 g of the powder, mix it thoroughly with 300 mL of 70% (v/v) aqueous ethanol (solid-to-liquid ratio 1:15), bring to a boil, keep at a gentle boil for 1 h, filter and collect the filtrate; add 200 mL of 70% aqueous ethanol (solid-to-liquid ratio 1:20) to the filter residue, bring to a boil, keep at a gentle boil for 1 h, filter and collect the filtrate again; combine the two filtrates, concentrate and freeze-dry to obtain the mulberry leaf alcohol extract sample.
(2) Sample pretreatment
The Raman-enhancing substrate is prepared as follows:
1) Weigh 33.72 mg of silver nitrate and dissolve it in 200 mL of deionized water to obtain a 1 mmol/L silver nitrate solution; heat it to boiling on a magnetic stirrer and, while stirring, add 8 mL of 1 wt% sodium citrate solution all at once; continue stirring at 650 r/min for 40 min to obtain negatively charged silver nanoparticles with diameters smaller than 11 nm.
2) Centrifuge 1 mL of the reaction solution from step 1) at 7000 r/min for 7 min, discard the supernatant and re-suspend the precipitate in 100 µL of deionized water to obtain the Raman-enhancing substrate with negatively charged silver nanoparticles; store the resulting solution sealed at room temperature, protected from light, until use.
Weigh 10 mg of each mulberry leaf alcohol extract sample obtained in step (1), dissolve it in 1 mL of 70% (v/v) aqueous ethanol and centrifuge at 8000 r/min for 10 min; thoroughly mix 20 µL of each supernatant in turn with 10 µL of the prepared Raman-enhancing substrate to obtain the high-sensitivity Raman signal detection substrate for each mulberry leaf alcohol extract.
(3) SERS detection: take 10 µL of each high-sensitivity Raman signal detection substrate for the mulberry leaf alcohol extracts obtained in step (2), drop it onto a silicon wafer and allow it to air-dry naturally; repeat the deposition and air-drying three times, then perform the measurement.
The Raman acquisition parameters are as follows: excitation wavelength 785 nm; detector: high-sensitivity, high-quantum-efficiency CCD array; exposure time 1000 ms; wavenumber scanning range 0-3000 cm⁻¹ (actual scanning range 195-2999 cm⁻¹); output data format: .csv; laser intensity setting: 2; spectrum acquisition time 1000 ms. Before spectrum acquisition, the wavenumber axis is calibrated using the Raman peak of a silicon wafer at 520 cm⁻¹ as the reference peak, and the dark current is subtracted at the same integration time. Raman spectra are acquired at 100 randomly selected positions on each sample.
Raman spectra are acquired in this way for the alcohol-extracted mulberry leaf samples from each province, thereby constructing a Raman spectrum database of mulberry leaf alcohol extracts from 5 provinces.
(4) Data preprocessing: the Raman spectra of the mulberry leaf alcohol extracts obtained in step (3) are sequentially subjected to curve smoothing, baseline correction and normalization.
For the Raman spectral data of mulberry leaf extracts from different origins and extraction methods, the average Raman spectrum of each group is calculated and plotted with a standard error band, and the characteristic peaks are marked; the repeatability of the data is judged from the smoothness of the spectra and the width of the standard error band. The average Raman spectrum thus serves to check data quality and the repeatability of the Raman spectral data. The average Raman spectra of the mulberry leaf alcohol extracts from 5 provinces are shown in FIG. 5.
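The average spectrum and its standard error band can be computed per wavenumber as follows. The sketch assumes the spectra of one group are stacked as rows of an array (for example the 100 positions measured per sample) and uses the standard error of the mean, s/√n; the function name is illustrative.

```python
import numpy as np


def mean_with_sem(spectra):
    """Return the mean spectrum and the lower/upper edges of a +/- 1 SEM band."""
    spectra = np.asarray(spectra, dtype=float)  # shape (n_spectra, n_wavenumbers)
    mean = spectra.mean(axis=0)
    sem = spectra.std(axis=0, ddof=1) / np.sqrt(spectra.shape[0])
    return mean, mean - sem, mean + sem


mean, lower, upper = mean_with_sem([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
print(mean, lower, upper)  # [2. 2. 2.] [1. 2. 1.] [3. 2. 3.]
```

A narrow band that tracks a smooth mean indicates good spot-to-spot repeatability; a wide or jagged band flags noisy or inconsistent spectra.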
(5) Qualitative analysis: PCA qualitative analysis is performed on the preprocessed Raman spectral data obtained in step (4). The sample categories are judged by comparing the differences among the mulberry leaf alcohol extracts from the 5 provinces, a PCA classification quadrant plot of their Raman vectors is generated, and the alcohol extracts from the 5 provinces are qualitatively distinguished by inspecting the quadrant plot.
(6) Data classification: the Raman signal data of all samples preprocessed in step (4) are analyzed automatically with different machine learning algorithms. The alcohol extraction data set of mulberry leaves from 5 provinces is divided into a training set, a validation set and a test set by uniform random sampling; classifiers are trained and evaluated by K-fold cross-validation, where K is any integer from 1 to 10; the trained classifiers assign a sample category and label to each sample, and the results are stored in a database. All parameters were optimized before training, and the machine learning parameter settings are shown in Table 5 below.
Table 5 Machine learning parameter settings for the alcohol extracts of mulberry leaves from 5 provinces
(7) Data evaluation: part of the remaining samples are used to test predictive ability; the final discrimination model is refined using precision, recall, the ROC curve and the confusion matrix, the performance of the different machine learning algorithms is evaluated, and the optimal discrimination model is selected, specifically as follows:
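$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$\mathrm{AUC} = \int_{0}^{1} \mathrm{TPR}\, d(\mathrm{FPR}), \qquad \mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$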
where precision (Precision), recall (Recall) and the area under the ROC curve (AUC) serve as performance indicators; TP, FP, TN and FN denote the numbers of true positives, false positives, true negatives and false negatives, respectively, from which the confusion matrix is constructed.
2. Results
FIG. 6 shows the PCA qualitative classification results for the five-class classification of mulberry leaf alcohol extracts from 5 provinces.
As can be seen from FIG. 6, PCA separates the mulberry leaf alcohol extracts from different origins into 5 groups, showing that PCA can be used to distinguish their Raman spectral data. The dashed circles in FIG. 6 mark the different groups, each labeled with its province of origin. The abscissa is the principal component with the largest contribution in the PCA (98.87%), and the ordinate is the principal component with the second-largest contribution (1.00%).
FIG. 7 shows the ROC analysis results of the different machine learning algorithms for the five-class classification of mulberry leaf alcohol extracts from 5 provinces. As FIG. 7 shows, the CNN algorithm has the largest area under the curve, 0.9470, higher than those of the other algorithms (XGB, Random Forest, SVM, AdaBoost and Decision Tree). The CNN algorithm therefore achieves the highest accuracy and the best performance.
Because the ROC analysis showed that none of the other algorithms matched the accuracy of the CNN, confusion matrix analysis was not performed for them.
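For the five-class problems the patent reports a single AUC per algorithm but does not say how the per-class ROC curves were combined. One common choice, assumed here, is to compute a one-vs-rest AUC for each province and take the unweighted (macro) average, as sketched below with illustrative function names.

```python
import numpy as np


def binary_auc(y_true, scores):
    """Pairwise ROC AUC: P(positive score > negative score), ties counted as 1/2."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    diff = scores[y_true == 1][:, None] - scores[y_true == 0][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def macro_ovr_auc(labels, probs):
    """Unweighted mean of one-vs-rest AUCs; probs[:, c] is the model score for class c."""
    labels, probs = np.asarray(labels), np.asarray(probs, dtype=float)
    per_class = [binary_auc((labels == c).astype(int), probs[:, c])
                 for c in range(probs.shape[1])]
    return float(np.mean(per_class)), per_class


# Usage with 5 classes (e.g. 5 provinces); a perfectly confident model scores 1.0.
labels = np.repeat(np.arange(5), 20)
perfect = np.eye(5)[labels]  # one-hot "probabilities"
print(macro_ovr_auc(labels, perfect)[0])  # 1.0
```

A weighted average (by class size) or a micro-averaged ROC curve would give slightly different single numbers, so the averaging scheme should be reported alongside the AUC.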
FIG. 8 shows the confusion matrix obtained by testing the mulberry leaf alcohol extracts from 5 provinces with the model trained by the CNN algorithm. As FIG. 8 shows, the average classification accuracy of the CNN algorithm for the five-class classification of the alcohol extracts from 5 provinces is 89.6%. The CNN recognizes the spectra of the Hebei and Henan extracts with 100% accuracy; for the Anhui extract the accuracy is 81%, with 19% of the spectra misidentified as the Henan fingerprint. For the Guangdong data, 4%, 5% and 4% of the spectra were attributed to Anhui, Hebei, Henan and Jiangsu. For the Jiangsu data, 6%, 5% and 5% of the spectra were attributed to Anhui, Guangdong and Henan, respectively.
The results of this example show that surface-enhanced Raman spectral data can be used to train an effective CNN model, yielding an accurate five-class model for distinguishing mulberry leaf alcohol extracts from different provinces.
Example 3 Differentiation of mulberry leaf water extracts from 5 provinces using the method of the present invention
1. Method
(1) Sample preparation: water extracts of mulberry leaves from 5 different provinces are prepared;
The water extract sample for each province is prepared as follows: grind the mulberry leaves (Folium Mori) into powder with a pulverizer and pass the powder through an 80-mesh sieve; accurately weigh 20 g of the powder, mix it with 300 mL of water (solid-to-liquid ratio 1:15), bring to a boil, keep at a gentle boil for 1 h, filter and collect the filtrate; add 200 mL of water (solid-to-liquid ratio 1:20) to the filter residue, bring to a boil, keep at a gentle boil for 1 h, filter and collect the filtrate again; combine the two filtrates, concentrate and freeze-dry to obtain the mulberry leaf water extract sample.
(2) Sample pretreatment
The Raman-enhancing substrate is prepared as follows:
1) Weigh 33.72 mg of silver nitrate and dissolve it in 200 mL of deionized water to obtain a 1 mmol/L silver nitrate solution; heat it to boiling on a magnetic stirrer and, while stirring, add 8 mL of 1 wt% sodium citrate solution all at once; continue stirring at 650 r/min for 40 min to obtain negatively charged silver nanoparticles with diameters smaller than 11 nm.
2) Centrifuge 1 mL of the reaction solution from step 1) at 7000 r/min for 7 min, discard the supernatant and re-suspend the precipitate in 100 µL of deionized water to obtain the Raman-enhancing substrate with negatively charged silver nanoparticles; store the resulting solution sealed at room temperature, protected from light, until use.
Weigh 10 mg of each mulberry leaf water extract sample obtained in step (1), dissolve it in 1 mL of water and centrifuge at 8000 r/min for 10 min; thoroughly mix 20 µL of each supernatant in turn with 10 µL of the prepared Raman-enhancing substrate to obtain the high-sensitivity Raman signal detection substrate for each mulberry leaf water extract.
(3) SERS detection: take 10 µL of each high-sensitivity Raman signal detection substrate for the mulberry leaf water extracts obtained in step (2), drop it onto a silicon wafer and allow it to air-dry naturally; repeat the deposition and air-drying three times, then perform the measurement.
The Raman acquisition parameters are as follows: excitation wavelength 785 nm; detector: high-sensitivity, high-quantum-efficiency CCD array; exposure time 1000 ms; wavenumber scanning range 0-3000 cm⁻¹ (actual scanning range 195-2999 cm⁻¹); output data format: .csv; laser intensity setting: 2; spectrum acquisition time 1000 ms. Before spectrum acquisition, the wavenumber axis is calibrated using the Raman peak of a silicon wafer at 520 cm⁻¹ as the reference peak, and the dark current is subtracted at the same integration time. Raman spectra are acquired at 100 randomly selected positions on each sample.
Raman spectra are acquired in this way for the water-extracted mulberry leaf samples from each province, thereby constructing a Raman spectrum database of mulberry leaf water extracts from 5 provinces.
(4) Data preprocessing: the Raman spectra of the mulberry leaf water extracts obtained in step (3) are sequentially subjected to curve smoothing, baseline correction and normalization.
For the Raman spectral data of mulberry leaf extracts from different origins and extraction methods, the average Raman spectrum of each group is calculated and plotted with a standard error band, and the characteristic peaks are marked; the repeatability of the data is judged from the smoothness of the spectra and the width of the standard error band. The average Raman spectrum thus serves to check data quality and the repeatability of the Raman spectral data. The average Raman spectra of the mulberry leaf water extracts from 5 provinces are shown in FIG. 9.
(5) Qualitative analysis: PCA qualitative analysis is performed on the preprocessed Raman spectral data obtained in step (4). The sample categories are judged by comparing the differences among the mulberry leaf water extracts from the 5 provinces, a PCA classification quadrant plot of their Raman vectors is generated, and the water extracts from the 5 provinces are qualitatively distinguished by inspecting the quadrant plot.
(6) Data classification: the Raman signal data of all samples preprocessed in step (4) are analyzed automatically with different machine learning algorithms. The water extraction data set of mulberry leaves from 5 provinces is divided into a training set, a validation set and a test set by uniform random sampling; classifiers are trained and evaluated by K-fold cross-validation, where K is any integer from 1 to 10; the trained classifiers assign a sample category and label to each sample, and the results are stored in a database. All parameters were optimized before training, and the machine learning parameter settings are shown in Table 6 below.
Table 6 Machine learning parameter settings for the water extracts of mulberry leaves from 5 provinces
(7) Data evaluation: part of the remaining samples are used to test predictive ability; the final discrimination model is refined using precision, recall, the ROC curve and the confusion matrix, the performance of the different machine learning algorithms is evaluated, and the optimal discrimination model is selected, specifically as follows:
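$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$\mathrm{AUC} = \int_{0}^{1} \mathrm{TPR}\, d(\mathrm{FPR}), \qquad \mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$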
where precision (Precision), recall (Recall) and the area under the ROC curve (AUC) serve as performance indicators; TP, FP, TN and FN denote the numbers of true positives, false positives, true negatives and false negatives, respectively, from which the confusion matrix is constructed.
2. Results
FIG. 10 shows the PCA qualitative classification results for the five-class classification of mulberry leaf water extracts from 5 provinces.
As can be seen from FIG. 10, PCA separates the pure-water extracts of mulberry leaves from different origins into 5 groups, showing that PCA can be used to distinguish their Raman spectral data. The dashed circles in FIG. 10 mark the different groups, each labeled with its province of origin. The abscissa is the principal component with the largest contribution in the PCA (96.25%), and the ordinate is the principal component with the second-largest contribution (2.62%).
FIG. 11 shows the ROC analysis results of the different machine learning algorithms for the five-class classification of mulberry leaf water extracts from 5 provinces. As FIG. 11 shows, the CNN algorithm has the largest area under the curve, 0.9622, higher than those of the other algorithms (XGB, Random Forest, SVM, AdaBoost and Decision Tree). The CNN algorithm therefore achieves the highest accuracy and the best performance.
Because the ROC analysis showed that none of the other algorithms matched the accuracy of the CNN, confusion matrix analysis was not performed for them.
FIG. 12 shows the confusion matrix of the CNN algorithm for the five-class classification of mulberry leaf water extracts from 5 provinces; as FIG. 12 shows, the average classification accuracy of the CNN algorithm in this case is 93.4%. The CNN recognizes the spectra of the Henan and Jiangsu extracts with 100% accuracy. For the Anhui, Guangdong and Hebei data, 17% of the spectra were classified as Henan, 6% as Guangdong and 10% as Hebei.
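As an arithmetic check, suppose the reported average classification accuracy is the unweighted mean of the per-class correct-classification rates (the diagonal of the row-normalized confusion matrix), and read the 17%, 6% and 10% figures as the misclassified fractions of the Anhui, Guangdong and Hebei spectra. Under those two assumptions, which are interpretations made here rather than statements in the patent, the figures in the text reproduce the 93.4% average.

```python
# Correct-classification rates (%) per province for the water extracts, under the
# hypothetical reading described above.
per_class_rates = {
    "Anhui": 100 - 17,
    "Guangdong": 100 - 6,
    "Hebei": 100 - 10,
    "Henan": 100,
    "Jiangsu": 100,
}

# Average classification accuracy taken as the unweighted mean of the diagonal.
average = sum(per_class_rates.values()) / len(per_class_rates)
print(f"{average:.1f}%")  # 93.4%
```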
The results of this example show that surface-enhanced Raman spectral data can be used to train an effective CNN model, yielding an accurate five-class model for distinguishing mulberry leaf water extracts from different provinces.
In conclusion, the surface-enhanced Raman spectroscopy-based method for identifying the origin of mulberry leaves can determine the origin of mulberry leaves rapidly and accurately, can effectively distinguish and evaluate extracts from different origins and different extraction methods, and has good application prospects.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments may be modified or some technical features may be replaced equivalently; such modifications and substitutions do not depart from the spirit of the invention.

Claims (10)

1. Use of a mulberry leaf Raman spectral fingerprint, established in combination with a machine learning algorithm, for identifying the origin of mulberry leaves, characterized by comprising the following steps:
(1) Sample preparation: selecting mulberry leaves from a plurality of different origins, and preparing the leaves from each origin into an alcohol extract sample and a water extract sample by alcohol extraction and water extraction, respectively;
(2) Sample pretreatment: adding sodium citrate solution to boiling silver nitrate solution and stirring to react, centrifuging the resulting reaction solution, and resuspending the precipitate in deionized water to obtain a Raman-enhancing substrate of negatively charged silver nanoparticles;
dissolving each mulberry leaf alcohol extract sample obtained in step (1) in an aqueous ethanol solution with a volume concentration of 60-80%, centrifuging, taking the supernatant, and mixing it with the prepared Raman-enhancing substrate to obtain a high-sensitivity Raman signal detection substrate for the alcohol extract of each origin;
dissolving each mulberry leaf water extract sample obtained in step (1) in water, centrifuging, taking the supernatant, and mixing it with the prepared Raman-enhancing substrate to obtain a high-sensitivity Raman signal detection substrate for the water extract of each origin;
(3) SERS detection: performing repeated, multi-site Raman spectrum sampling on each detection substrate for the alcohol extracts and water extracts of different origins obtained in step (2) to obtain the corresponding Raman spectra, thereby constructing a Raman spectrum database of mulberry leaves covering different extraction methods and different origins;
(4) Data preprocessing: sequentially performing curve smoothing, baseline correction and normalization on the Raman spectra of the alcohol extracts and water extracts of different origins obtained in step (3);
(5) Qualitative analysis: performing PCA on the preprocessed Raman spectrum data obtained in step (4), in which the differences between mulberry leaf extracts of different extraction methods and/or different origins form PCA classification quadrants of the Raman vectors that are used to judge the sample categories, so that extracts of different extraction methods and/or different origins are distinguished qualitatively by inspecting the PCA quadrant plot;
(6) Data classification: automatically analyzing the Raman signal data of all samples preprocessed in step (4) with different machine learning algorithms: dividing the Raman spectrum data sets of mulberry leaf extracts of different extraction methods and/or different origins into a training set, a validation set and a test set by uniform random sampling, training a classifier, and validating it by K-fold cross-validation, where K is any integer from 1 to 10; the trained classifier assigns each sample to a category and labels it, and the labeled sample data are stored in a database;
(7) Data evaluation: testing the predictive ability on part of the remaining samples, refining the final discrimination model using precision, recall, the ROC curve and the confusion matrix, evaluating the performance of the different machine learning algorithms, and selecting the optimal discrimination model, specifically according to the following formulas:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
AUC: area under the ROC curve (formula given only as an image, FDA0003710158420000023, in the original)
wherein Precision denotes precision, Recall denotes recall, and AUC denotes the area under the ROC curve, which serves as the performance index; TP, FP, TN and FN denote the numbers of true positives, false positives, true negatives and false negatives, respectively, from which the confusion matrix is constructed.
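The metrics of step (7) can be sketched as follows for a multi-class problem, computing per-class TP, FP and FN from a confusion matrix (rows: true class, columns: predicted class). The AUC formula in the original is an image and is not reproduced, so the AUC function here is an assumption: the standard rank-based (Mann-Whitney) estimate for one class against the rest, valid only when scores have no ties. The labels and scores are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix: cm[t, p] = number of samples of true class t predicted as p."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def precision_recall(cm):
    """Per-class Precision = TP/(TP+FP) and Recall = TP/(TP+FN)."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp   # predicted as class c but truly another class
    fn = cm.sum(axis=1) - tp   # truly class c but predicted as another class
    return tp / (tp + fp), tp / (tp + fn)

def binary_auc(y_true, scores):
    """Rank-based AUC for one class versus the rest (assumes no tied scores)."""
    y_true = np.asarray(y_true)
    ranks = np.argsort(np.argsort(scores)) + 1   # rank 1 = lowest score
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Hypothetical 3-class predictions.
cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
prec, rec = precision_recall(cm)
auc = binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(cm, prec, rec, auc)
```

Scikit-learn offers equivalent, well-tested functions; the hand-written version only makes the formulas explicit.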
2. The use according to claim 1, wherein in step (2) the Raman-enhancing substrate is prepared as follows:
(1) heating a 1 mmol/L silver nitrate solution to boiling, adding 8 mL of 1 wt% sodium citrate solution while stirring, and reacting with stirring at 600-800 r/min for 30-60 min to obtain negatively charged silver nanoparticles;
(2) centrifuging the reaction solution obtained in step (1) at 6000-8000 r/min for 5-10 min, discarding the supernatant, and resuspending the precipitate in deionized water to obtain a solution of negatively charged silver nanoparticles.
3. The use according to claim 2, wherein in step (2) the Raman-enhancing substrate is prepared as follows:
(1) heating a 1 mmol/L silver nitrate solution to boiling, adding 8 mL of 1 wt% sodium citrate solution while stirring, and reacting with stirring at 650 r/min for 40 min to obtain negatively charged silver nanoparticles;
(2) centrifuging 1 mL of the reaction solution obtained in step (1) at 7000 r/min for 7 min, discarding the supernatant, and resuspending the precipitate in 100 µL of deionized water to obtain a solution of negatively charged silver nanoparticles.
4. The use according to claim 2 or 3, wherein in step (1),
the alcohol extract of mulberry leaves from each origin is prepared as follows: mixing the mulberry leaf powder from each origin with a 60-80% (v/v) aqueous ethanol solution and boiling, filtering, and collecting the filtrate; adding 60-80% aqueous ethanol to the filter residue, boiling again, filtering, and collecting the filtrate again; combining the two filtrates, concentrating, and lyophilizing to obtain mulberry leaf alcohol extract samples of different origins;
the water extract of mulberry leaves from each origin is prepared as follows: mixing the mulberry leaf powder from each origin with water and boiling, filtering, and collecting the filtrate; adding water to the filter residue, boiling again, filtering, and collecting the filtrate again; combining the two filtrates, concentrating, and lyophilizing to obtain mulberry leaf water extract samples of different origins.
5. The use according to claim 4, wherein in step (1),
the alcohol extract of mulberry leaves from each origin is prepared as follows: mixing the mulberry leaf powder from each origin with 70% (v/v) aqueous ethanol at a ratio of 1:15, keeping it boiling for 1 h, filtering, and collecting the filtrate; adding 60-80% aqueous ethanol to the filter residue, keeping it boiling for 1 h, filtering, and collecting the filtrate again; combining the two filtrates, concentrating, and lyophilizing to obtain mulberry leaf alcohol extract samples of different origins;
the water extract of mulberry leaves from each origin is prepared as follows: mixing the mulberry leaf powder from each origin with water at a ratio of 1:15, keeping it boiling for 1 h, filtering, and collecting the filtrate; adding water to the filter residue, keeping it boiling for 1 h, filtering, and collecting the filtrate again; combining the filtrates, concentrating, and lyophilizing to obtain mulberry leaf water extract samples of different origins.
6. The use according to claim 1, wherein in step (3) the multi-site Raman spectrum sampling covers 90 to 110 sites, preferably 100 sites.
7. The use according to claim 1, wherein in step (3) the Raman spectrum sampling parameters are as follows: excitation wavelength 785 nm; detector: high-sensitivity CCD array; exposure time 1000 ms; Raman shift range 0-3000 cm⁻¹.
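Spectra acquired over the 0-3000 cm⁻¹ range of claim 7 are then smoothed, baseline-corrected and normalized as in step (4) of claim 1. The claims do not name the specific algorithms, so the sketch below assumes a moving-average smoother (a stand-in for, e.g., Savitzky-Golay), an iterative modified-polynomial baseline and min-max normalization; all function names and the synthetic spectrum are hypothetical.

```python
import numpy as np

def smooth(spectrum, window=5):
    """Moving-average smoothing; window must be odd."""
    half = window // 2
    padded = np.pad(spectrum, half, mode="edge")      # repeat edge values so the ends are not damped
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")  # output has the input's length

def baseline_correct(shift, spectrum, degree=3, n_iter=50):
    """Iterative polynomial baseline: clip the signal to the fit so peaks stop pulling it up."""
    x = (shift - shift.min()) / (shift.max() - shift.min())  # rescale for a well-conditioned fit
    work = spectrum.copy()
    for _ in range(n_iter):
        baseline = np.polyval(np.polyfit(x, work, degree), x)
        work = np.minimum(work, baseline)
    return spectrum - baseline

def normalize(spectrum):
    """Min-max normalization to [0, 1] (undefined for a flat spectrum)."""
    lo, hi = spectrum.min(), spectrum.max()
    return (spectrum - lo) / (hi - lo)

def preprocess(shift, spectrum):
    return normalize(baseline_correct(shift, smooth(spectrum)))

# Synthetic spectrum: one peak at 1000 cm^-1 on a sloping background plus noise.
rng = np.random.default_rng(0)
shift = np.linspace(0, 3000, 1501)
raw = (np.exp(-((shift - 1000) / 15) ** 2) + 0.2 + 1e-4 * shift
       + 0.01 * rng.standard_normal(shift.size))
out = preprocess(shift, raw)
```

The order (smooth, then baseline, then normalize) follows step (4); normalizing last keeps spectra from different sites on a common scale.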
8. The use according to claim 1, wherein the preprocessed Raman spectrum data obtained in step (4) are plotted as average Raman spectra: the average Raman spectrum of the mulberry leaf extracts of each origin under each extraction method is calculated, standard error bands and characteristic peaks are added, and the repeatability of the data is judged from the smoothness of the spectra and the width of the standard error bands.
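Claim 8's averaged spectra and the PCA of step (5) in claim 1 are both exploratory views of the preprocessed data. A minimal sketch, assuming a (replicates × points) array per sample group: the mean spectrum with a ±1 standard-error band, and PCA computed by SVD of the mean-centered data. Peak annotation is omitted, and the two synthetic "origins" are hypothetical.

```python
import numpy as np

def mean_with_se_band(spectra):
    """Mean spectrum with a +/- one standard-error band; spectra: (n_replicates, n_points)."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=0)
    se = spectra.std(axis=0, ddof=1) / np.sqrt(spectra.shape[0])  # standard error of the mean
    return mean, mean - se, mean + se

def pca_scores(X, n_components=2):
    """Project spectra (rows of X) onto their first principal components via SVD."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each Raman-shift variable
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T        # coordinates for the PC1/PC2 quadrant plot
    explained = S**2 / np.sum(S**2)          # explained-variance ratio per component
    return scores, explained[:n_components]

# Two synthetic groups whose spectra differ in peak position.
rng = np.random.default_rng(1)
shift = np.linspace(0, 3000, 300)

def make_group(center, n=20):
    peak = np.exp(-((shift - center) / 30) ** 2)
    return peak + 0.01 * rng.standard_normal((n, shift.size))

X = np.vstack([make_group(800), make_group(1200)])
mean, lower, upper = mean_with_se_band(X[:20])
scores, ratio = pca_scores(X)
```

A narrow band (small `upper - lower`) across the spectrum indicates good repeatability; in the score plot the two groups fall on opposite sides of PC1.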
9. The use according to claim 1, wherein in step (7) each of the different machine learning algorithms is a CNN, XGB, random forest, SVM, AdaBoost or Decision Tree learning algorithm.
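Comparing the algorithms listed in claim 9 relies on the partitioning of step (6) in claim 1: a uniform random split into training, validation and test sets, plus K-fold cross-validation. The sketch below assumes a 70/15/15 split and no stratification by origin; neither is stated in the claims. It requires K ≥ 2, since with a single fold nothing is left to validate on. Function names are hypothetical.

```python
import numpy as np

def split_indices(n_samples, fractions=(0.7, 0.15, 0.15), seed=0):
    """Uniform random split into train / validation / test index arrays."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)          # every ordering equally likely
    n_train = int(round(fractions[0] * n_samples))
    n_val = int(round(fractions[1] * n_samples))
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

def k_fold(indices, k=5):
    """Yield (train, validation) index pairs; each index is validated exactly once."""
    if k < 2:
        raise ValueError("K-fold cross-validation needs k >= 2")
    folds = np.array_split(indices, k)
    for i in range(k):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, folds[i]

# Example: 100 spectra (e.g. the 100 sampling sites of claim 6).
train, val, test = split_indices(100)
for fold_train, fold_val in k_fold(train, k=5):
    pass  # fit each candidate algorithm on fold_train and score it on fold_val
```

The test set stays untouched until the final comparison, so the selected model's reported performance is not biased by the model selection.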
10. The use according to claim 9, wherein in step (7) the machine learning algorithm is a CNN learning algorithm.
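The patent does not disclose the CNN architecture. As an illustration only, the sketch below shows how a one-dimensional CNN maps a preprocessed spectrum to class probabilities: one convolution layer with ReLU, global average pooling, a dense layer and softmax, with random, untrained weights. The layer sizes are assumptions, and five outputs mirror the five provinces of Fig. 12. Training would normally be done with a deep learning framework.

```python
import numpy as np

def conv1d(x, kernels, bias):
    """Valid 1D cross-correlation of spectrum x (length,) with kernels (n_filters, width)."""
    width = kernels.shape[1]
    windows = np.stack([x[i:i + width] for i in range(x.size - width + 1)])
    return windows @ kernels.T + bias          # (length - width + 1, n_filters)

def softmax(z):
    e = np.exp(z - z.max())                    # subtract the max for numerical stability
    return e / e.sum()

def cnn_forward(x, params):
    """Conv + ReLU -> global average pooling -> dense -> softmax probabilities."""
    h = np.maximum(conv1d(x, params["kernels"], params["conv_bias"]), 0.0)
    pooled = h.mean(axis=0)                    # one value per filter
    logits = pooled @ params["dense_w"] + params["dense_b"]
    return softmax(logits)

# Hypothetical sizes: 8 filters of width 9, 5 output classes (provinces).
rng = np.random.default_rng(0)
n_filters, width, n_classes = 8, 9, 5
params = {
    "kernels": rng.normal(0.0, 0.1, (n_filters, width)),
    "conv_bias": np.zeros(n_filters),
    "dense_w": rng.normal(0.0, 0.1, (n_filters, n_classes)),
    "dense_b": np.zeros(n_classes),
}
spectrum = rng.random(1501)                    # stands in for one preprocessed spectrum
probs = cnn_forward(spectrum, params)
```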
CN202210723329.5A 2022-06-23 2022-06-23 Application of mulberry leaf Raman spectral fingerprint establishment combined with machine learning algorithm in mulberry leaf origin identification Pending CN116008245A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210723329.5A CN116008245A (en) 2022-06-23 2022-06-23 Application of mulberry leaf Raman spectral fingerprint establishment combined with machine learning algorithm in mulberry leaf origin identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210723329.5A CN116008245A (en) 2022-06-23 2022-06-23 Application of mulberry leaf Raman spectral fingerprint establishment combined with machine learning algorithm in mulberry leaf origin identification

Publications (1)

Publication Number Publication Date
CN116008245A (en) 2023-04-25

Family

ID=86019774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210723329.5A Pending CN116008245A (en) 2022-06-23 2022-06-23 Application of mulberry leaf Raman spectral fingerprint establishment combined with machine learning algorithm in mulberry leaf origin identification

Country Status (1)

Country Link
CN (1) CN116008245A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116502130A (en) * 2023-06-26 2023-07-28 湖南大学 Method for identifying smell characteristics of algae source
CN116502130B (en) * 2023-06-26 2023-09-15 湖南大学 Method for identifying smell characteristics of algae source

Similar Documents

Publication Publication Date Title
WO2019192433A1 (en) Method for chemical pattern recognition of authenticity of traditional chinese medicine chinese honeylocust spine based on near-infrared spectroscopy
CN101961379B (en) Near infrared spectrum identification method for red sage roots
WO2021056814A1 (en) Chemical pattern recognition method for evaluating quality of traditional chinese medicine based on medicine effect information
CN101876633B (en) Terahertz time domain spectroscopy-based textile fiber identification method
CN101961360B (en) Near infrared spectrum identification method for pseudo-ginseng
CN109142317A (en) A kind of Raman spectrum substance recognition methods based on Random Forest model
CN103411906A (en) Near infrared spectrum qualitative identification method of pearl powder and shell powder
CN106841083A (en) Sesame oil quality detecting method based on near-infrared spectrum technique
CN112414967B (en) Near infrared quality control method for rapidly detecting processing of cattail pollen charcoal in real time
CN111007032B (en) Near-infrared spectroscopy for rapidly and nondestructively identifying liquorice and pseudo-product glycyrrhiza spinosa
CN111832477A (en) Novel coronavirus detection method and system
CN103364359A (en) Application of SIMCA pattern recognition method to near infrared spectrum recognition of medicinal material, rhubarb
CN116008245A (en) Application of mulberry leaf Raman spectral fingerprint establishment combined with machine learning algorithm in mulberry leaf origin identification
CN105181761A (en) Method for rapidly identifying irradiation absorbed dose of tea by using electronic nose
CN113049571A (en) Method for judging green tea fresh and old based on Raman spectrum
CN105334183A (en) Method for identifying certifiable Herba Ephedrae based on near infrared spectroscopy
CN108169204B (en) Raman spectrum preprocessing method based on database
CN110567907A (en) Method for rapidly identifying authenticity of traditional Chinese medicine based on infrared spectrum technology
CN108760679A (en) A kind of gastrodia elata f. glauca discriminating side based on near-infrared spectrum technique
CN110108661B (en) Tea near infrared spectrum classification method based on fuzzy maximum entropy clustering
CN112697743A (en) Method for identifying pterocarpus santalinus pen container based on two-dimensional correlation infrared spectrum
CN107389598B (en) Near infrared spectrum analysis method for identifying quality of sophora japonica
CN114076745A (en) Saffron identification method based on cloud-interconnection portable near-infrared technology and adulterated product quantitative prediction method thereof
CN111220561A (en) Infrared spectrum identification method for origin of swertia mussotii
CN114199818B (en) Construction method of near infrared quantitative detection model of fructus xanthil traditional Chinese medicine formula particles and quantitative detection method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination