CN110378373B - Tea variety classification method for fuzzy non-relevant linear discriminant analysis - Google Patents
- Publication number
- CN110378373B (application CN201910505655.7A)
- Authority
- CN
- China
- Prior art keywords
- tea
- sample
- matrix
- fuzzy
- class
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2132—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
- G06F18/21322—Rendering the within-class scatter matrix non-singular
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention discloses a tea variety classification method based on fuzzy non-relevant linear discriminant analysis. First, near-infrared diffuse reflection spectrum data of tea samples of several varieties are acquired with an Antaris II Fourier transform near-infrared spectrum analyzer. The collected spectra are then preprocessed with the Savitzky-Golay first derivative. Next, a fuzzy non-relevant linear discriminant analysis feature extraction method is applied to the preprocessed spectra to reduce their dimension and extract classification discriminant information. Finally, the tea varieties are classified by Gath-Geva fuzzy clustering. The method is a fuzzy extension of non-relevant linear discriminant analysis: it addresses the undersampling problem of linear discriminant analysis as well as the "hard" feature extraction problem of non-relevant linear discriminant analysis, and it has the advantages of being green and pollution-free, requiring few test samples, low identification cost, fast discrimination, and high classification accuracy.
Description
Technical Field
The invention relates to the field of pattern recognition and artificial intelligence, in particular to a tea variety classification method for fuzzy non-relevant linear discriminant analysis.
Background
Tea is a green, healthy beverage that, together with coffee and cocoa, is regarded as one of the world's three major beverages. With social progress and the rapid development of the modern food industry, tea products have become popular with consumers. Tea is rich in caffeine, catechins, amino acids, and trace elements, and has effects such as calming the mind, improving eyesight, promoting salivation, quenching thirst, clearing heat, relieving summer heat, aiding digestion, relieving hangovers, promoting urination, and detoxifying. At present, unit prices differ greatly between tea varieties, and because some varieties have a short storage life, the price of a single variety also fluctuates strongly with the season. The tea market therefore offers large profit margins, and unscrupulous merchants frequently pass off low-quality, inferior tea as high-quality tea. To regulate the tea market and protect consumers' interests, it is necessary to establish a simple, rapid, accurate, and non-destructive method for identifying tea varieties.
Near-infrared spectroscopy is rapid, non-destructive, and pollution-free, requires no sample pretreatment, and has a low analysis cost; in recent years it has been applied in many fields, particularly food research. The near-infrared spectrum refers to electromagnetic radiation with wavelengths in the range 780-2526 nm. It reflects the overtone and combination vibrations of molecular groups and thereby enables quantitative and qualitative analysis of characteristic components. Current near-infrared research on tea covers two main areas: quantitative analysis and measurement of tea components, and qualitative classification of tea grade, variety, place of origin, and the like. However, because near-infrared spectra are high-dimensional, overlapping, and redundant, a suitable feature extraction algorithm should be used to extract the useful information from the spectrum before analysis in order to obtain better model performance.
Currently, when near-infrared spectroscopy is applied to the detection and classification of foods, the most popular feature extraction method is linear discriminant analysis. Linear discriminant analysis is a supervised (label-based) dimension reduction technique that finds the optimal transformation vectors by maximizing the ratio of the between-class distance to the within-class distance, thereby achieving the best class separation. In practice, however, the sample dimension is often larger than the number of samples, which gives rise to the undersampling problem. Non-relevant (uncorrelated) linear discriminant analysis is an extension of linear discriminant analysis that reduces redundancy in the transformed space and also solves the undersampling problem. In essence, however, non-relevant linear discriminant analysis is still a "hard" feature extraction algorithm, and the features it extracts cannot fully reflect the original structural information of the samples. The invention introduces fuzzy set theory into non-relevant linear discriminant analysis and provides a tea near-infrared spectrum classification method based on fuzzy non-relevant linear discriminant analysis to discriminate tea varieties.
Disclosure of Invention
To address the undersampling problem of linear discriminant analysis and the "hard" feature extraction problem of non-relevant linear discriminant analysis, the invention combines fuzzy set theory with non-relevant linear discriminant analysis and provides a fuzzy non-relevant linear discriminant analysis feature extraction method for classifying the near-infrared spectra of tea. When extracting the classification discriminant information of tea varieties, the method solves both the undersampling problem of linear discriminant analysis and the "hard" feature extraction problem of non-relevant linear discriminant analysis. The invention also has the advantages of being green and pollution-free, requiring few test samples, low identification cost, fast discrimination, and high classification accuracy.
The technical scheme of the tea variety classification method based on fuzzy non-relevant linear discriminant analysis comprises the following steps:
step one, acquiring near-infrared diffuse reflection spectrum data of the tea samples;
step two, preprocessing the near-infrared diffuse reflection spectra of the tea samples;
step three, extracting discriminant information from the tea near-infrared spectra by fuzzy non-relevant linear discriminant analysis;
step four, classifying the tea varieties by Gath-Geva fuzzy clustering.
In step one, the near-infrared diffuse reflection spectrum data of the tea samples are collected in the integrating-sphere diffuse reflection mode of an Antaris II Fourier transform near-infrared spectrum analyzer. During acquisition, factors such as temperature and humidity are kept as stable as possible. The resulting near-infrared diffuse reflection spectrum of each tea sample is 1557-dimensional data covering the wavenumber range 10000 cm⁻¹ to 4000 cm⁻¹.
In step two, the collected near-infrared diffuse reflection spectrum data of the tea samples are preprocessed with the Savitzky-Golay first derivative, and the preprocessed tea sample data are divided into a training sample set and a test sample set.
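As an illustration of the Savitzky-Golay first-derivative preprocessing, the following is a minimal NumPy sketch. The window length, polynomial order, and the function name `savgol_first_derivative` are assumptions made for this example; the text does not state which settings were used. The filter fits a local polynomial in each window and returns its slope at the window center.

```python
import numpy as np

def savgol_first_derivative(spectra, window=11, polyorder=2, delta=1.0):
    """Savitzky-Golay first derivative along the last axis (hypothetical helper).

    spectra : array of shape (n_samples, n_points) or (n_points,)
    window  : odd window length (assumed value, not given in the text)
    """
    half = window // 2
    z = np.arange(-half, half + 1)
    # Least-squares polynomial fit on the window: columns are 1, z, z^2, ...
    A = np.vander(z, polyorder + 1, increasing=True)
    # Row 1 of the pseudo-inverse yields the fitted slope at z = 0.
    kernel = np.linalg.pinv(A)[1] / delta
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    # np.convolve flips its kernel, so reverse it to obtain a correlation.
    return np.array([np.convolve(row, kernel[::-1], mode="same") for row in spectra])

# Usage: a straight line has a constant derivative away from the edges.
x = np.arange(50.0)
d = savgol_first_derivative(2.0 * x + 1.0)
print(d[0, 5:-5].round(6))
```

Values within half a window of either end are affected by zero padding and would normally be trimmed or handled with a dedicated edge rule.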
In step three, the fuzzy non-relevant linear discriminant analysis feature extraction method is applied to the tea near-infrared diffuse reflection spectrum data preprocessed in step two to reduce their dimension and extract classification discriminant information. Before doing so, the number of classes $c$, the weight index $\eta$, the cluster centers $V$, and the fuzzy membership matrix $U$ must be initialized. The mean of each class of training samples is taken as the cluster center value $v_j$, and the entries $u_{ij}$ of the fuzzy membership matrix $U$ are calculated as follows:
where $x_i$ is the $i$-th tea near-infrared diffuse reflection spectrum training sample and $v_k$ is the class center of the $k$-th class.
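The membership formula itself is not reproduced in this text. As a hedged illustration only, the sketch below uses the familiar fuzzy c-means style membership, $u_{ij} = \left[\sum_{k=1}^{c} \left(\|x_i - v_j\|^2 / \|x_i - v_k\|^2\right)^{1/(\eta-1)}\right]^{-1}$, which is consistent with the symbols $x_i$, $v_k$, and $\eta$ defined above but is an assumption rather than the patent's confirmed formula. The function name `fcm_membership` is hypothetical.

```python
import numpy as np

def fcm_membership(X, V, eta=1.5, eps=1e-12):
    """Fuzzy c-means style memberships (assumed form, see lead-in).

    X : (n, p) samples as rows, V : (c, p) class centers, eta : weight index > 1.
    Returns U of shape (n, c) whose rows sum to 1.
    """
    # Squared Euclidean distance from every sample to every center.
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2) + eps
    inv = d2 ** (-1.0 / (eta - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

# Usage: samples close to a center receive a membership close to 1 for it.
X = np.array([[0.0, 0.0], [10.0, 10.0], [0.1, 0.0]])
V = np.array([[0.0, 0.0], [10.0, 10.0]])
print(fcm_membership(X, V).round(3))
```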
The specific process of performing the dimension reduction processing and the classification discrimination information extraction in the third step is as follows:
(1) Given a labeled training sample matrix, where $p_1$ is the sample dimension and $n$ is the number of samples, let $S_{ft}$, $S_{fb}$, and $S_{fw}$ denote the fuzzy total scatter matrix, the fuzzy between-class scatter matrix, and the fuzzy within-class scatter matrix of the training sample set, respectively, defined as follows:
where $c$ is the number of classes, $\eta$ is the weight index, $x_i$ is the $i$-th tea near-infrared diffuse reflection spectrum training sample, $u_{ij}$ is the fuzzy membership of sample $x_i$ in class $j$, and $v_j$ is the sample mean of the $j$-th class in the sample set ($j = 1, 2, 3, 4$); the definitions also use the overall sample mean of the training sample set.
(2) Construct the matrices $H_{ft}$, $H_{fb}$, and $H_{fw}$ so that they satisfy
(3) Compute the singular value decomposition of $H_{ft}$, $H_{ft} = G \Sigma S^T$, where $G = [G_1 \; G_2]$, $p_1$ is the sample dimension, and $t = \operatorname{rank}(H_{ft})$;
(4) Form the matrix $B$ using $\Sigma_t^{-1}$, the inverse of the matrix $\Sigma_t$, and $G_1^T$, the transpose of the matrix $G_1$, and compute the singular value decomposition of $B$, $B = P A O^T$;
(5) Form the matrix $Y$ and let $Y_q$ be the matrix consisting of the first $q$ columns of $Y$, where $q = \operatorname{rank}(H_{fb})$;
(6) Finally, the feature projection matrix of fuzzy non-relevant linear discriminant analysis is obtained as $W = Y_q$. The $i$-th ($i = 1, 2, \ldots, n$) training sample $x_i$ in the training sample set of step two is transformed to $x'_i = x_i W$, where $n$ is the number of training samples; the $k$-th ($k = 1, 2, \ldots, n_1$) test sample $y_k$ in the test sample set of step two is transformed to $z_k = y_k W$, where $n_1$ is the number of test samples.
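Steps (1) to (6) can be sketched in NumPy as follows. Because the defining formulas are not reproduced in this text, the sketch fills them with the usual uncorrelated-LDA construction, and every one of these choices is an assumption: $H_{ft}$ has columns $\sqrt{u_{ij}^{\eta}}\,(x_i - \bar{x})$ so that $S_{ft} = H_{ft} H_{ft}^T$, $H_{fb}$ has columns $\sqrt{\sum_i u_{ij}^{\eta}}\,(v_j - \bar{x})$, $B = \Sigma_t^{-1} G_1^T H_{fb}$, and $Y = G_1 \Sigma_t^{-1} P$. The function name `fulda_projection` is hypothetical.

```python
import numpy as np

def fulda_projection(X, U, V, eta=1.5, tol=1e-10):
    """Sketch of a fuzzy uncorrelated-LDA projection under assumed definitions.

    X : (n, p1) training samples as rows, U : (n, c) fuzzy memberships,
    V : (c, p1) class means. Returns W of shape (p1, q).
    """
    c = V.shape[0]
    xbar = X.mean(axis=0)
    Ue = U ** eta                                   # u_ij^eta, shape (n, c)
    Xc = (X - xbar).T                               # centered samples, (p1, n)
    # H_ft: p1 x (n*c); one block of n weighted columns per class j (assumed form).
    H_ft = np.hstack([Xc * np.sqrt(Ue[:, j]) for j in range(c)])
    # H_fb: p1 x c; weighted deviations of class means (assumed form).
    H_fb = ((V - xbar) * np.sqrt(Ue.sum(axis=0))[:, None]).T
    # Step (3): reduced SVD of H_ft and its numerical rank t.
    G, s, _ = np.linalg.svd(H_ft, full_matrices=False)
    t = int((s > tol * s[0]).sum())
    G1, s_t = G[:, :t], s[:t]
    # Step (4): B = Sigma_t^{-1} G1^T H_fb, then SVD of B (assumed form of B).
    B = (G1.T @ H_fb) / s_t[:, None]
    P, _, _ = np.linalg.svd(B, full_matrices=False)
    # Step (5): Y = G1 Sigma_t^{-1} P, keep the first q = rank(H_fb) columns.
    q = int(np.linalg.matrix_rank(H_fb))
    Y = (G1 / s_t) @ P
    return Y[:, :q]                                 # Step (6): W = Y_q
```

With this construction the projected features are uncorrelated with respect to the fuzzy total scatter, that is, $W^T S_{ft} W$ is the identity. Training and test samples stored as rows are then projected with `X @ W`, matching $x'_i = x_i W$ and $z_k = y_k W$.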
Step four, classifying tea varieties by Gath-Geva fuzzy clustering, proceeds as follows:
(1) Initialization: set the number of tea varieties $c$ ($2 \le c < +\infty$), the initial weight index $m_0$ ($1 < m_0 < +\infty$), the maximum number of iterations $r_{\max}$, the upper error limit $\varepsilon$, the number of training samples $n$, and the number of test samples $n_1$. The mean of each class of samples in the set formed by the training samples $x'_i$ from step three is taken as the initial class center $\gamma_i^{(0)}$, and the initial fuzzy membership is calculated as follows:
where $\gamma_i^{(0)}$ is the initial class center of class $i$ ($i = 1, 2, \ldots, c$) and $z_k$ is the $k$-th ($k = 1, 2, \ldots, n_1$) test sample from step three.
(2) Calculate the membership value $\mu_{ik}^{(r)}$ at the $r$-th ($r = 1, 2, \ldots, r_{\max}$) iteration;
The membership value $\mu_{ik}^{(r)}$ is the membership of the $k$-th sample in the $i$-th class at the $r$-th iteration. $D_{ik}$ is the distance norm from sample $z_k$ to the class center $\gamma_i^{(r-1)}$, where $z_k$ is the $k$-th test sample and $\gamma_i^{(r-1)}$ is the center of class $i$ calculated in iteration $r-1$. $S_{fi}$ is the fuzzy covariance matrix, $n_1$ is the number of test samples, and $\mu_{ik}^{(r-1)}$ is the fuzzy membership value calculated in iteration $r-1$. All fuzzy memberships together form the fuzzy membership matrix. $m_r$ is the weight index at the $r$-th iteration, with $m_r = m_0 - r\,\Delta m$ and $\Delta m = (m_0 - 1)/r_{\max}$;
(3) Calculate the learning rate $\alpha_{ik,r}$ at the $r$-th iteration
(4) Calculate the class center $\gamma_i^{(r)}$ ($i = 1, 2, \ldots, c$) at the $r$-th iteration
where $\gamma_i^{(r)}$ is the class center of class $i$ ($i = 1, 2, \ldots, c$) at the $r$-th iteration and $\gamma_i^{(r-1)}$ is the class center of class $i$ at iteration $r-1$;
(5) When the convergence condition is satisfied or $r = r_{\max} - 1$, the iteration ends; otherwise return to step (2) and continue iterating. After the iteration has converged, the tea variety to which the test sample $z_k$ belongs is determined from the final fuzzy membership $\mu_{ik}^{(r)}$.
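The iteration in steps (1) to (5) can be sketched as below. Only the weight-index schedule $m_r = m_0 - r\,\Delta m$ and the stop at $r = r_{\max} - 1$ (which keeps $m_r \ge 1 + \Delta m > 1$) are taken from the text. The rest is the textbook Gath-Geva form and is an assumption: the distance uses the fuzzy covariance and a prior equal to the mean membership, the initial memberships are fuzzy c-means style, convergence is tested on the largest membership change, and the centers are updated as plain weighted means instead of the learning-rate update, whose formula is not reproduced here. The function name `gath_geva` and the small regularization `reg` are hypothetical.

```python
import numpy as np

def gath_geva(Z, centers, m0=2.0, r_max=100, eps=1e-5, reg=1e-8):
    """Gath-Geva style fuzzy clustering sketch (assumed formulas, see lead-in).

    Z : (n1, d) projected test samples as rows, centers : (c, d) initial centers.
    Returns (mu, centers) with mu of shape (n1, c); rows of mu sum to 1.
    """
    n, d = Z.shape
    c = centers.shape[0]
    centers = np.array(centers, dtype=float)
    # Initial memberships from Euclidean distances (assumed FCM-style form).
    d2 = ((Z[:, None, :] - centers[None]) ** 2).sum(axis=-1) + 1e-12
    mu = (1.0 / d2) ** (1.0 / (m0 - 1.0))
    mu /= mu.sum(axis=1, keepdims=True)
    dm = (m0 - 1.0) / r_max
    for r in range(1, r_max):                  # last pass is r = r_max - 1
        m_r = m0 - r * dm                      # annealed weight index, > 1
        w = mu ** m_r
        log_d2 = np.empty((n, c))
        for i in range(c):
            diff = Z - centers[i]
            # Fuzzy covariance of class i from the previous memberships.
            F = (w[:, i, None] * diff).T @ diff / w[:, i].sum() + reg * np.eye(d)
            prior = max(mu[:, i].mean(), 1e-12)
            maha = np.einsum("kd,de,ke->k", diff, np.linalg.inv(F), diff)
            _, logdet = np.linalg.slogdet(F)
            # log of D_ik^2 = sqrt(det F)/prior * exp(maha/2), kept in log form.
            log_d2[:, i] = 0.5 * logdet - np.log(prior) + 0.5 * maha
        s = -log_d2 / (m_r - 1.0)
        s -= s.max(axis=1, keepdims=True)      # stabilize the exponentials
        new_mu = np.exp(s)
        new_mu /= new_mu.sum(axis=1, keepdims=True)
        new_w = new_mu ** m_r
        centers = (new_w.T @ Z) / new_w.sum(axis=0)[:, None]
        converged = np.abs(new_mu - mu).max() < eps
        mu = new_mu
        if converged:
            break
    return mu, centers
```

Note that this sketch stores memberships with samples as rows, so `mu[k, i]` corresponds to $\mu_{ik}$ in the text.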
The invention has the beneficial effects that:
The tea variety classification method based on fuzzy non-relevant linear discriminant analysis solves both the undersampling problem of linear discriminant analysis and the "hard" feature extraction problem of non-relevant linear discriminant analysis. It has the advantages of being green and pollution-free, requiring few test samples, low identification cost, fast discrimination, and high classification accuracy. It can be used for dimension reduction and discriminant feature extraction on tea near-infrared spectrum data, and also for the extraction and analysis of near-infrared spectrum data of other foods.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a near infrared diffuse reflectance spectrum of 260 tea samples;
FIG. 3 is a near infrared diffuse reflectance spectrum of tea after Savitzky-Golay first derivative pretreatment;
FIG. 4 is an initial fuzzy membership graph for fuzzy non-relevant linear discriminant analysis;
FIG. 5 is a diagram of test sample data obtained by extracting classification discrimination information from the preprocessed near infrared diffuse reflectance spectrum data of tea through fuzzy non-relevant linear discrimination analysis;
FIG. 6 is an initial fuzzy membership graph of Gath-Geva fuzzy clustering;
FIG. 7 is a final fuzzy membership graph of Gath-Geva fuzzy clustering.
Detailed Description
The invention is further described below with reference to the drawings and examples.
As shown in fig. 1, the specific implementation flow of the present invention is as follows:
Step one, acquiring near-infrared diffuse reflection spectrum data of the tea samples: four Anhui brand teas, Yuexi Cuilan, Liuan Guapian, Maofeng, and Huangshan Maofeng, were collected, with 65 samples of each tea and 260 tea samples in total. All tea samples were ground, crushed, and then passed through a 40-mesh sieve. During acquisition of the near-infrared diffuse reflection spectrum data, external conditions such as temperature and humidity were kept as stable and constant as possible. Spectrum acquisition comprised the following steps: first, the Antaris II Fourier transform near-infrared spectrum analyzer was switched on and preheated for 1 hour; second, the wavenumber range, scanning interval, and number of scans were set to 10000 cm⁻¹ to 4000 cm⁻¹, 3.857 cm⁻¹, and 32, respectively; third, the near-infrared diffuse reflection spectrum data of the tea samples were acquired in the integrating-sphere diffuse reflection mode of the analyzer, giving 1557-dimensional high-dimensional spectrum data. Each tea sample was measured 3 times, and the average of the 3 measurements was stored on a computer as experimental data for the subsequent modeling. Fig. 2 shows the near-infrared diffuse reflection spectra of the 260 tea samples.
Step two, preprocessing the near-infrared diffuse reflection spectra of the tea samples: the collected near-infrared diffuse reflection spectrum data are preprocessed with the Savitzky-Golay first derivative; the preprocessed spectra are shown in Fig. 3. The preprocessed tea sample data are then randomly divided into a training set and a test set: 22 samples are randomly drawn from each variety, so that 88 samples form the training sample set, and the remaining 43 samples of each variety (172 in total) form the test sample set.
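The random per-variety split can be sketched as follows. The function name `stratified_split` and the fixed seed are assumptions for illustration; the text only specifies that 22 samples per variety are drawn at random for training.

```python
import numpy as np

def stratified_split(labels, n_train_per_class=22, seed=0):
    """Randomly pick n_train_per_class indices per class; the rest form the test set."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        train_idx.extend(rng.choice(idx, size=n_train_per_class, replace=False))
    train_idx = np.sort(np.array(train_idx))
    test_idx = np.setdiff1d(np.arange(labels.size), train_idx)
    return train_idx, test_idx

# Usage: 4 varieties x 65 samples, as in the embodiment.
labels = np.repeat(np.arange(4), 65)
train_idx, test_idx = stratified_split(labels)
print(train_idx.size, test_idx.size)  # 88 172
```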
Step three, extracting discriminant information from the tea near-infrared spectra by fuzzy non-relevant linear discriminant analysis: the fuzzy non-relevant linear discriminant analysis feature extraction method is applied to the preprocessed tea near-infrared diffuse reflection spectrum data from step two to obtain the classification discriminant information of the tea varieties, yielding training samples and test samples that contain this discriminant information.
Before step three is carried out, the number of classes $c = 4$ and the weight index $\eta = 1.5$ are set, the mean of each class of training samples is taken as the cluster center value $v_j$, and the fuzzy membership values $u_{ij}$ are calculated as follows:
where $x_i$ is the $i$-th tea near-infrared diffuse reflection spectrum training sample and $v_k$ is the class center of class $k$ ($k = 1, 2, 3, 4$).
The calculated fuzzy membership values $u_{ij}$ are shown in Fig. 4.
The detailed process of extracting discriminant information from the tea near-infrared spectra by fuzzy non-relevant linear discriminant analysis is as follows:
(1) Given a labeled training sample matrix with sample dimension $p_1 = 1557$ and number of training samples $n = 88$, let $S_{ft}$, $S_{fb}$, and $S_{fw}$ denote the fuzzy total scatter matrix, the fuzzy between-class scatter matrix, and the fuzzy within-class scatter matrix of the training sample set, respectively, defined as follows:
where the number of classes is $c = 4$, the weight index is $\eta = 1.5$, $x_i$ is the $i$-th tea near-infrared diffuse reflection spectrum training sample, $u_{ij}$ is the fuzzy membership of sample $x_i$ in class $j$, and $v_j$ is the sample mean of the $j$-th class in the sample set ($j = 1, 2, 3, 4$); the definitions also use the overall sample mean of the training sample set and an intermediate variable.
(2) Construct the matrices $H_{ft}$, $H_{fb}$, and $H_{fw}$ so that they satisfy
(3) Compute the singular value decomposition of $H_{ft}$, $H_{ft} = G \Sigma S^T$, where $S$ is an orthogonal matrix of order $l \times l$, with $p_1 = 1557$, $l = 352$, and $t = 87$;
(4) Form the matrix $B$ using $\Sigma_t^{-1}$, the inverse of the matrix $\Sigma_t$, and $G_1^T$, the transpose of the matrix $G_1$, and compute the singular value decomposition of $B$, $B = P A O^T$, with $t = 87$. Here $A$ is a matrix of order $t \times r$, where $t = \operatorname{rank}(H_{ft})$ and $r = \operatorname{rank}(B)$; its leading $r \times r$ block is a diagonal matrix whose diagonal elements are the singular values of $B$, and all remaining elements are 0. $O$ is an orthogonal matrix of order $r \times r$.
(5) Form the matrix $Y$ and let $Y_q$ be the matrix consisting of the first $q$ columns of $Y$, where $q = 3$;
(6) Finally, the feature projection matrix of fuzzy non-relevant linear discriminant analysis is obtained as $W = Y_q$. The $i$-th ($i = 1, 2, \ldots, n$) training sample $x_i$ in the training sample set of step two is transformed to $x'_i = x_i W$, where $n$ is the number of training samples; the $k$-th ($k = 1, 2, \ldots, n_1$) test sample $y_k$ in the test sample set of step two is transformed to $z_k = y_k W$, where $n_1$ is the number of test samples. The data distribution of the test samples $z_k$ is shown in Fig. 5.
Step four, classifying tea varieties by Gath-Geva fuzzy clustering, wherein the specific process is described as follows:
(1) Initialization: set the number of tea varieties $c = 4$ ($2 \le c < +\infty$), the initial weight index $m_0 = 2.0$ ($1 < m_0 < +\infty$), the maximum number of iterations $r_{\max}$, the upper error limit $\varepsilon = 0.00001$, the number of training samples $n = 88$, and the number of test samples $n_1 = 172$. The mean of each class of samples in the set formed by the training samples $x'_i$ from step three is taken as the initial class center $\gamma_i^{(0)}$, and the initial fuzzy membership $\mu_{ik}^{(0)}$ is calculated as follows:
where $\gamma_i^{(0)}$ is the initial class center of class $i$ ($i = 1, 2, \ldots, c$) and $z_k$ is the $k$-th ($k = 1, 2, \ldots, n_1$) test sample from step three.
The calculated initial fuzzy memberships $\mu_{ik}^{(0)}$ are shown in Fig. 6.
(2) Calculate the membership value $\mu_{ik}^{(r)}$ at the $r$-th ($r = 1, 2, \ldots, r_{\max}$) iteration. The membership value $\mu_{ik}^{(r)}$ is the membership of the $k$-th sample in the $i$-th class at the $r$-th iteration. $D_{ik}$ is the distance norm from sample $z_k$ to the class center $\gamma_i^{(r-1)}$, where $z_k$ is the $k$-th test sample and $\gamma_i^{(r-1)}$ is the center of class $i$ calculated in iteration $r-1$. $S_{fi}$ is the fuzzy covariance matrix, $n_1$ is the number of test samples, and $\mu_{ik}^{(r-1)}$ is the fuzzy membership value calculated in iteration $r-1$. All fuzzy memberships together form the fuzzy membership matrix. $m_r$ is the weight index at the $r$-th iteration, with $m_r = m_0 - r\,\Delta m$ and $\Delta m = (m_0 - 1)/r_{\max}$;
(3) Calculate the learning rate $\alpha_{ik,r}$ at the $r$-th iteration
(4) Calculate the class center $\gamma_i^{(r)}$ ($i = 1, 2, \ldots, c$) at the $r$-th iteration
where $\gamma_i^{(r)}$ is the class center of class $i$ ($i = 1, 2, \ldots, c$) at the $r$-th iteration and $\gamma_i^{(r-1)}$ is the class center of class $i$ at iteration $r-1$;
(5) When the convergence condition is satisfied or $r = r_{\max} - 1$, the iteration ends; otherwise return to step (2) and continue iterating. After the iteration has converged, the tea variety to which the test sample $z_k$ belongs is determined from the final fuzzy membership $\mu_{ik}^{(r)}$.
Experimental results: the iteration terminates after $r = 2$ iterations, and the final fuzzy memberships $\mu_{ik}^{(2)}$ are shown in Fig. 7. Discriminating the tea samples in the test set according to these fuzzy memberships gives a classification accuracy of 100%.
The detailed description above covers only specific practical embodiments of the present invention and is not intended to limit its scope of protection; all equivalent embodiments or modifications that do not depart from the spirit of the present invention fall within the scope of the present invention.
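One simple way to turn final memberships into variety labels and an accuracy figure is to assign each test sample to the class with the largest membership. The maximum-membership rule and both function names below are assumptions for illustration; the text only says that samples are discriminated according to the final fuzzy membership. The memberships here are stored with samples as rows.

```python
import numpy as np

def classify_by_membership(mu):
    """Assign each sample (row) to the class with the largest membership."""
    return np.asarray(mu).argmax(axis=1)

def accuracy(mu, true_labels):
    """Fraction of samples whose maximum-membership class matches the true label."""
    predicted = classify_by_membership(mu)
    return float((predicted == np.asarray(true_labels)).mean())

# Usage with made-up memberships for three samples and two classes.
mu_demo = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
print(accuracy(mu_demo, [0, 1, 1]))  # 2 of 3 samples are assigned correctly
```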
Claims (6)
1. A tea variety classification method for fuzzy non-relevant linear discriminant analysis is characterized by comprising the following steps:
step 1, acquiring near infrared diffuse reflection spectrum data of a tea sample;
step 2, preprocessing a near infrared diffuse reflection spectrum of a tea sample;
step 3, extracting discriminant information from the tea near-infrared spectra by a fuzzy non-relevant linear discriminant analysis method;
step 3 is implemented as follows: dimension reduction and classification discriminant information extraction are performed on the tea near-infrared diffuse reflection spectrum data preprocessed in step 2; the specific steps are:
3.1, given a labeled training sample matrixp 1 For the dimension of the sample, n is the number of samples, S ft ,S fb ,S fw Respectively defining a fuzzy total scattering matrix, a fuzzy inter-class scattering matrix and a fuzzy intra-class scattering matrix of the training sample set:
wherein c is the number of categories, eta is the weight index, and x i For the i th tea near infrared diffuse reflection spectrum training sample,to train the overall sample mean of the sample set, u ij For sample x i Fuzzy membership belonging to class j, v j J=1, 2,3,4, which is the sample mean value of the j-th sample in the sample set;
3.2, constructing matrix H ft ,H fb ,H fw And make it meet
3.3, calculating matrix H ft Singular value decomposition of H ft =GΣS T Wherein the matrix g= [ G ] 1 G 2 ],Matrix arrayp 1 For the sample dimension, t=rank (H ft );
3.4, orderWherein, matrix->For matrix sigma t Inverse matrix of matrix->As a matrix G 1 And calculates the singular value decomposition of matrix B, b=pao T Wherein matrix->t=rank(H ft );
3.5, orderWherein matrix Y q Is a matrix consisting of the first q columns of matrix Y, q=rank (H fb );
3.6, finally obtaining the characteristic projection matrix W=Y of the fuzzy non-relevant linear discriminant analysis q The ith training sample x in the training sample set of the second step i Conversion to x' i =x i W, where n is the number of training samples; the kth test sample y in the test set of the step two is processed k Conversion to z k =y k W, where n 1 For the number of test samples; where i=1, 2, …, n, k=1, 2, …, n 1 ;
step 4, classifying the tea varieties by the Gath-Geva fuzzy clustering method.
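Steps 3.1 to 3.6 can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the explicit scatter-matrix formulas are not recoverable from the text, so the code assumes the common forms $S_{fw} = \sum_j \sum_i u_{ij}^{\eta} (x_i - v_j)(x_i - v_j)^T$, $S_{fb} = \sum_j \sum_i u_{ij}^{\eta} (v_j - \bar{x})(v_j - \bar{x})^T$ and $S_{ft} = S_{fb} + S_{fw}$. The function name `fuzzy_ulda` and the rank tolerance `1e-10` are hypothetical.

```python
import numpy as np

def fuzzy_ulda(X, U, V, eta=2.0):
    """Sketch of steps 3.1-3.6 (assumed scatter-matrix definitions).

    X: (n, p) training spectra as rows; U: (n, c) fuzzy memberships;
    V: (c, p) class means. Returns the projection matrix W of shape (p, q).
    """
    xbar = X.mean(axis=0)
    Ue = U ** eta                                    # u_ij^eta, shape (n, c)
    # Step 3.2: factors H with S = H H^T (assumed forms, see lead-in).
    # H_fb has one column per class: sqrt(sum_i u_ij^eta) * (v_j - xbar).
    H_fb = (np.sqrt(Ue.sum(axis=0))[:, None] * (V - xbar)).T            # (p, c)
    # H_fw has one column per (sample, class) pair: sqrt(u_ij^eta) * (x_i - v_j).
    diffs = X[:, None, :] - V[None, :, :]                               # (n, c, p)
    H_fw = (np.sqrt(Ue)[:, :, None] * diffs).reshape(-1, X.shape[1]).T  # (p, n*c)
    H_ft = np.hstack([H_fb, H_fw])       # assumes S_ft = S_fb + S_fw
    # Step 3.3: SVD of H_ft, keeping the t = rank(H_ft) leading components.
    G, s, _ = np.linalg.svd(H_ft, full_matrices=False)
    t = int(np.sum(s > 1e-10 * s[0]))
    G1, sig_inv = G[:, :t], 1.0 / s[:t]
    # Step 3.4: B = Sigma_t^{-1} G1^T H_fb and its SVD.
    B = (sig_inv[:, None] * G1.T) @ H_fb                                # (t, c)
    P, sb, _ = np.linalg.svd(B, full_matrices=True)                     # P is (t, t)
    # Step 3.5: Y = G1 Sigma_t^{-1} P; q = rank(B), which equals rank(H_fb) here.
    Y = G1 @ (sig_inv[:, None] * P)
    q = int(np.sum(sb > 1e-10))
    # Step 3.6: W = Y_q; a row-vector sample x is projected as x @ W.
    return Y[:, :q]
```

Under these assumptions the projected features are uncorrelated with respect to the fuzzy total scatter, that is, $W^T S_{ft} W$ is the identity matrix, which is a convenient check on any implementation.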
2. The method for classifying tea varieties according to claim 1, wherein the implementation method of step 1 is as follows: collecting near infrared diffuse reflection spectrum data of a tea sample by using an integrating sphere diffuse reflection mode of an Antaris II Fourier transform near infrared spectrum analyzer; specifically:
first, the Antaris II Fourier transform near-infrared spectrum analyzer is switched on and preheated for 1 hour;
second, the wavenumber range, scanning interval and number of scans are set to 10000 cm⁻¹ to 4000 cm⁻¹, 3.857 cm⁻¹ and 32, respectively;
third, the near-infrared diffuse reflection spectrum data of the tea samples are acquired in the integrating sphere diffuse reflection mode of the Antaris II Fourier transform near-infrared spectrum analyzer; the resulting tea spectrum data are high-dimensional, with 1557 dimensions per spectrum.
3. The method for classifying tea varieties by fuzzy non-relevant linear discriminant analysis according to claim 2, wherein the temperature and humidity are kept as stable as possible during collection.
4. The method for classifying tea varieties by fuzzy non-relevant linear discriminant analysis according to claim 1, wherein step 2 is implemented as follows: the collected near-infrared diffuse reflection spectrum data of the tea samples are preprocessed with the Savitzky-Golay first derivative, and the preprocessed tea sample data are divided into a training sample set and a test sample set.
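The Savitzky-Golay first derivative can be illustrated with a short pure-Python sketch. The claim does not state the window length or polynomial order, so the quadratic fit and the `half_window` default below are assumptions, and the function name is hypothetical. For a polynomial of degree 1 or 2 fitted over a symmetric window, the least-squares slope at the window center reduces to a fixed weighted sum of the neighboring points.

```python
def savgol_first_derivative(y, half_window=2, spacing=1.0):
    """First derivative of a 1-D spectrum via a quadratic Savitzky-Golay fit.

    For a symmetric window of 2*half_window + 1 equally spaced points, the
    fitted slope at the center is sum(k * y[i+k]) / (spacing * sum(k*k)),
    with k running from -half_window to half_window.
    Edge points without a full window are returned as None.
    """
    m = half_window
    norm = sum(k * k for k in range(-m, m + 1)) * spacing
    out = [None] * len(y)
    for i in range(m, len(y) - m):
        out[i] = sum(k * y[i + k] for k in range(-m, m + 1)) / norm
    return out
```

Applied to each 1557-point spectrum, `spacing` would be the scanning interval; how the edge points are treated is a design choice that the claim leaves open.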
5. The method for classifying tea varieties according to claim 1, further comprising: initializing the number of classes $c$, the weight index $\eta$, the cluster centers $V$ and the fuzzy membership matrix $U$; wherein the mean of each class of training samples is taken as the cluster center $v_j$, and the fuzzy membership $u_{ij}$ in the fuzzy membership matrix $U$ is calculated from $x_i$ and the class centers,
wherein $x_i$ is the $i$-th tea near-infrared diffuse reflection spectrum training sample and $v_k$ is the class center of the $k$-th class.
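The membership formula itself is not recoverable from this text. A minimal sketch, assuming the standard fuzzy C-means membership $u_{ij} = 1 / \sum_{k=1}^{c} \left( \lVert x_i - v_j \rVert / \lVert x_i - v_k \rVert \right)^{2/(\eta - 1)}$, which is consistent with the symbols defined above, is shown below; the function name is hypothetical.

```python
import math

def fcm_membership(x, centers, eta=2.0):
    """Membership of one sample x in each class (assumed FCM-style formula)."""
    d = [math.dist(x, v) for v in centers]
    if any(dj == 0.0 for dj in d):
        # The sample coincides with one or more centers: share membership among them.
        hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (eta - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in d) for dj in d]
```

With this form the memberships of each sample sum to 1, and a sample closer to a class center receives a larger membership in that class.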
6. The method for classifying tea varieties according to claim 1, wherein the implementation of step 4 comprises the following steps:
4.1, initialization: setting the number of tea varieties $c$, the initial weight index $m_0$, the maximum number of iterations $r_{max}$, the error upper limit $\varepsilon$, the number of training samples $n$ and the number of test samples $n_1$; taking the mean of each class of samples in the sample set composed of the training samples $x'_i$ from step 3 as the initial class center $\gamma_i^{(0)}$; and calculating the initial fuzzy membership,
wherein $\gamma_i^{(0)}$ is the initial class center of class $i$ and $z_k$ is the $k$-th test sample from step 3; $c \ge 2$, $m_0 > 1$, $i = 1, 2, \ldots, c$, $k = 1, 2, \ldots, n_1$;
4.2, calculating the membership value $\mu_{ik}^{(r)}$ at the $r$-th iteration, $r = 1, 2, \ldots, r_{max}$,
wherein the membership value $\mu_{ik}^{(r)}$ is the membership of the $k$-th sample in the $i$-th class at the $r$-th iteration; $D_{ik}$ is the distance norm from sample $z_k$ to the class center $\gamma_i^{(r-1)}$; $z_k$ is the $k$-th test sample; $\gamma_i^{(r-1)}$ is the class center of the $i$-th class calculated at the $(r-1)$-th iteration; $S_{fi}$ is the fuzzy covariance matrix; $n_1$ is the number of test samples; $\mu_{ik}^{(r-1)}$ is the fuzzy membership value calculated at the $(r-1)$-th iteration; all fuzzy memberships together form the fuzzy membership matrix; $m_r$ is the weight index at the $r$-th iteration, $m_r = m_0 - r \Delta m$, with $\Delta m = (m_0 - 1)/r_{max}$;
4.3, calculating the learning rate $\alpha_{ik,r}$ at the $r$-th iteration;
4.4, calculating the class center $\gamma_i^{(r)}$ at the $r$-th iteration, where $i = 1, 2, \ldots, c$,
wherein $\gamma_i^{(r)}$ is the class center of the $i$-th class at the $r$-th iteration and $\gamma_i^{(r-1)}$ is the class center of the $i$-th class at the $(r-1)$-th iteration;
4.5, when $\max_i \lVert \gamma_i^{(r)} - \gamma_i^{(r-1)} \rVert < \varepsilon$ or $r = r_{max} - 1$, ending the iteration; otherwise returning to step 4.2 to continue the iterative calculation; after the iteration converges, determining from the final fuzzy membership $\mu_{ik}^{(r)}$ which tea variety the test sample $z_k$ belongs to.
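The membership update of step 4.2 can be illustrated with the standard Gath-Geva formulation. This is a sketch under assumptions, since the patent's own distance, learning-rate and class-center formulas are not recoverable from this text: it uses the textbook exponential distance $D_{ik}^2 = \frac{\sqrt{\det S_{fi}}}{P_i} \exp\left(\tfrac{1}{2} (z_k - \gamma_i)^T S_{fi}^{-1} (z_k - \gamma_i)\right)$ with prior $P_i$ equal to the mean membership of class $i$, and omits the learning-rate center update of steps 4.3 and 4.4. Only the weight-index schedule is taken directly from step 4.2. Both function names are hypothetical.

```python
import numpy as np

def weight_index(r, m0, r_max):
    """m_r = m_0 - r * dm with dm = (m_0 - 1) / r_max, as stated in step 4.2."""
    return m0 - r * (m0 - 1.0) / r_max

def gath_geva_memberships(Z, centers, mu_prev, m):
    """One standard Gath-Geva membership update (assumed form, see lead-in).

    Z: (n1, q) projected test samples; centers: (c, q) class centers;
    mu_prev: (c, n1) memberships from the previous iteration; m > 1.
    """
    c, n1 = mu_prev.shape
    log_d2 = np.empty((c, n1))
    for i in range(c):
        w = mu_prev[i] ** m
        diff = Z - centers[i]                            # (n1, q)
        S = (w[:, None] * diff).T @ diff / w.sum()       # fuzzy covariance matrix
        prior = mu_prev[i].mean()                        # prior probability P_i
        maha = np.einsum("kq,qp,kp->k", diff, np.linalg.inv(S), diff)
        _, logdet = np.linalg.slogdet(S)
        log_d2[i] = 0.5 * logdet - np.log(prior) + 0.5 * maha
    # mu_ik = 1 / sum_j (D_ik^2 / D_jk^2)^(1/(m-1)), evaluated in log space
    # so that the exponential distance cannot overflow.
    a = -log_d2 / (m - 1.0)
    a -= a.max(axis=0)
    mu = np.exp(a)
    return mu / mu.sum(axis=0)
```

Because $m_r$ reaches 1 at $r = r_{max}$, where the exponent $1/(m_r - 1)$ is undefined, stopping at $r = r_{max} - 1$ as in step 4.5 keeps the update well defined.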
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910505655.7A CN110378373B (en) | 2019-06-12 | 2019-06-12 | Tea variety classification method for fuzzy non-relevant linear discriminant analysis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110378373A (en) | 2019-10-25 |
CN110378373B (en) | 2024-03-12 |
Family
ID=68250185
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910505655.7A Active CN110378373B (en) | 2019-06-12 | 2019-06-12 | Tea variety classification method for fuzzy non-relevant linear discriminant analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110378373B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111595803A (en) * | 2020-05-09 | 2020-08-28 | Chuzhou Vocational and Technical College | Apple near infrared spectrum classification method based on exponential distance measure fuzzy clustering |
CN112801174A (en) * | 2021-01-25 | 2021-05-14 | Jiangsu University | Tea variety classification method for fuzzy linear machine learning |
CN112801172A (en) * | 2021-01-25 | 2021-05-14 | Jiangsu University | Chinese cabbage pesticide residue qualitative analysis method based on fuzzy pattern recognition |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109685098A (en) * | 2018-11-12 | 2019-04-26 | Jiangsu University | Tea variety classification method for fuzzy inter-cluster separation and clustering |
Non-Patent Citations (1)
Title |
---|
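Classification of tea Fourier transform infrared spectra by fuzzy uncorrelated discriminant C-means clustering; Wu Xiaohong et al.; Spectroscopy and Spectral Analysis; 2018-06-30; Vol. 38, No. 6; pp. 1719-1723 * |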
Also Published As
Publication number | Publication date |
---|---|
CN110378373A (en) | 2019-10-25 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN107677647B (en) | Method for identifying origin of traditional Chinese medicinal materials based on principal component analysis and BP neural network | |
CN110378373B (en) | Tea variety classification method for fuzzy non-relevant linear discriminant analysis | |
CN101819141B (en) | Maize variety identification method based on near infrared spectrum and information processing | |
CN103048273B (en) | Fruit near infrared spectrum sorting method based on fuzzy clustering | |
CN110378374B (en) | Tea near infrared spectrum classification method for extracting fuzzy identification information | |
CN109685098B (en) | Tea variety classification method for fuzzy inter-cluster separation and clustering | |
CN106408012A (en) | Tea infrared spectrum classification method of fuzzy discrimination clustering | |
CN107192686B (en) | Method for identifying possible fuzzy clustering tea varieties by fuzzy covariance matrix | |
CN105181650A (en) | Method for quickly identifying tea varieties through near-infrared spectroscopy technology | |
CN104374739A (en) | Identification method for authenticity of varieties of seeds on basis of near-infrared quantitative analysis | |
CN108764288A (en) | A kind of GK differentiates the local tea variety sorting technique of cluster | |
CN103278467A (en) | Rapid nondestructive high-accuracy method with for identifying abundance degree of nitrogen element in plant leaf | |
CN107271394A (en) | A kind of fuzzy Kohonen differentiates the tealeaves infrared spectrum sorting technique of clustering network | |
CN108872128B (en) | Tea infrared spectrum classification method based on fuzzy non-correlated C-means clustering | |
CN109685099B (en) | Apple variety distinguishing method based on spectrum band optimization fuzzy clustering | |
CN108491894B (en) | Tea leaf classification method capable of fuzzy identification of C-means clustering | |
CN110414549B (en) | Tea near infrared spectrum classification method for fuzzy orthogonal linear discriminant analysis | |
CN109886296A (en) | A kind of authentication information extracts the local tea variety classification method of formula noise cluster | |
CN106570520A (en) | Infrared spectroscopy tea quality identification method mixed with GK clustering | |
CN109001181A (en) | A kind of edible oil type method for quick identification of Raman spectrum canonical correlation analysis fusion | |
CN111595804A (en) | Fuzzy clustering tea near infrared spectrum classification method | |
CN111881738B (en) | Near infrared spectrum classification method for tea leaves through nuclear fuzzy orthogonal discriminant analysis | |
CN110108661B (en) | Tea near infrared spectrum classification method based on fuzzy maximum entropy clustering | |
CN112801173B (en) | Lettuce near infrared spectrum classification method based on QR fuzzy discriminant analysis | |
CN102999765B (en) | The pork storage time decision method of adaptive boosting method and irrelevant discriminatory analysis |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 2024-07-19; Address after: 423000 Industrial Undertaking Park, Economic Development Zone, Yizhang County, Chenzhou City, Hunan Province; Patentee after: Yizhang Huyi Agricultural Development Co.,Ltd.; Country or region after: China; Address before: Zhenjiang City, Jiangsu Province, 212013 Jingkou District Road No. 301; Patentee before: JIANGSU University; Country or region before: China |