CN113313150A - Electronic tongue detection method and system based on PCA and random forest - Google Patents
- Publication number
- CN113313150A
- Authority
- CN
- China
- Prior art keywords
- random forest
- characteristic
- electronic tongue
- pca
- data set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The invention discloses an electronic tongue detection method and system based on PCA and random forest. The method comprises the following steps: step 1: collecting a liquid sample by using an integrated electrode of the electronic tongue to obtain response data X; step 2: dividing the response data X into cycles, extracting a plurality of features from the data in each cycle to obtain a new feature data set X', and performing PCA (principal component analysis) dimensionality reduction on the feature data set X' to obtain a single sample data set Y; step 3: dividing a plurality of sample data sets into training set samples and test set samples, constructing a random forest classifier by using the training set samples, and evaluating the constructed random forest classifier by using the test set samples; step 4: inputting the sample data set corresponding to the liquid sample to be detected into the trained random forest classifier to obtain the detection result of the liquid sample. The invention provides automatic data acquisition, processing and uploading; on the one hand it greatly reduces the size of the electronic tongue, and on the other hand it improves the efficiency and accuracy of liquid component analysis.
Description
Technical Field
The invention relates to the field of electronic tongues, in particular to a PCA (principal component analysis) and random forest based electronic tongue detection method and system.
Background
Food is the first necessity of the people, and with the rapid development of the Chinese economy, people's requirements for food taste are rising steadily. At present, however, taste research in the food industry relies mainly on human evaluation: the testing process consumes a great deal of manpower and time, and the results are subjective and not repeatable. Ordinary households also lack a means of quickly detecting food authenticity and toxic or harmful substances. The analysis of liquid components is likewise widely applied in industrial production, agricultural production control, medicine, security and national defense.
Detection methods based on PCA and random forest can currently be used for classification prediction. Because a random forest is not iterative and the depth of its trees is not limited, the model is relatively simple, is unlikely to overfit, and achieves high classification accuracy. However, PCA combined with a random forest algorithm has not yet been applied to liquid detection.
Disclosure of Invention
In order to solve the problems, the invention provides an electronic tongue detection method and system based on PCA and random forest, which utilizes integrated electrodes in an electronic tongue to collect liquid sample information, takes cross response as a basic principle, combines the PCA and the random forest detection method to analyze data, achieves the aim of detecting a liquid sample, has the functions of automatically collecting, processing and uploading data, greatly reduces the size of the electronic tongue on one hand, and improves the efficiency and the accuracy of liquid component analysis on the other hand.
In order to achieve the above purpose, the invention adopts a technical scheme that:
an electronic tongue detection method based on PCA and random forest is characterized by comprising the following steps:
step 1: collecting a liquid sample by using an integrated electrode of the electronic tongue to obtain response data X;
step 2: carrying out cycle division on the response data X, respectively extracting a plurality of features from the data in each cycle to obtain a new feature data set X ', and carrying out PCA (principal component analysis) dimensionality reduction on the feature data set X' to obtain a single sample data set Y;
step 3: dividing a plurality of sample data sets into training set samples and test set samples, constructing a random forest classifier by using the training set samples, and evaluating the constructed random forest classifier by using the test set samples;
step 4: inputting a sample data set corresponding to a liquid sample to be detected into the trained random forest classifier to obtain a plurality of preliminary decision tree classification results of the liquid sample, and obtaining a detection result of the liquid sample by applying a majority voting method to the plurality of preliminary decision tree classification results.
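The majority voting in step 4 can be illustrated with a minimal Python sketch. The function name `majority_vote`, the class labels and the tie-break rule are assumptions made for illustration; the source does not specify how ties are resolved.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the label predicted by the largest number of trees.

    Ties go to the label that appears first in tree_predictions
    (a tie-break rule assumed here, not taken from the source).
    """
    if not tree_predictions:
        raise ValueError("need at least one tree prediction")
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical preliminary results from seven decision trees.
votes = ["class 3", "class 1", "class 3", "class 3",
         "class 5", "class 1", "class 3"]
print(majority_vote(votes))
```

Each decision tree contributes one preliminary label, and the label with the most votes becomes the detection result.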
Further, step 2 comprises the steps of:
step 21: dividing the response data X into cycles of N data points each, giving X/N cycles in total, and extracting a plurality of features from the data in each cycle to obtain a new feature data set X';
step 22: standardizing the m × n feature data set $X'$, wherein m is the number of working electrodes in the integrated electrode, and n is the number of features in the plurality of cycles;
the elements of the matrix $\tilde{X}$ formed by standardizing the feature data set $X'$ are: $\tilde{X}_{ij} = \dfrac{X_{ij} - E(X_j)}{\sqrt{\mathrm{Var}(X_j)}}$, where $X_j$ is the j-th column vector of the feature data set $X'$, $X_{ij}$ is an element of the feature data set $X'$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$, and $E(X_j)$ and $\mathrm{Var}(X_j)$ are respectively the mean and variance of the elements of column j, i.e. $E(X_j) = \dfrac{1}{m}\sum_{i=1}^{m} X_{ij}$ and $\mathrm{Var}(X_j) = \dfrac{1}{m-1}\sum_{i=1}^{m}\left(X_{ij} - E(X_j)\right)^2$;
step 23: computing the correlation coefficient matrix $R = (r_{ij})_{n \times n}$ between the dimensions of the standardized matrix $\tilde{X}$, whose elements are calculated as: $r_{ij} = \dfrac{\mathrm{Cov}(\tilde{X}_i, \tilde{X}_j)}{\sqrt{\mathrm{Var}(\tilde{X}_i)}\,\sqrt{\mathrm{Var}(\tilde{X}_j)}}$, where $\mathrm{Cov}(\tilde{X}_i, \tilde{X}_j)$ is the covariance between the i-th and j-th columns of the standardized matrix $\tilde{X}$;
step 24: calculating the eigenvalues and eigenvectors: solving the characteristic equation $|\lambda I - R| = 0$ for the eigenvalues $\lambda_j$, $j = 1, 2, \ldots, n$, arranged in descending order; for each eigenvalue $\lambda_j$, finding its eigenvector $e_j$, $j = 1, 2, \ldots, n$;
step 25: calculating the principal component contribution rate and the cumulative contribution rate: the contribution rate of principal component $z_j$ is $\lambda_j \big/ \sum_{k=1}^{n} \lambda_k$, and the cumulative contribution rate is $\sum_{k=1}^{p} \lambda_k \big/ \sum_{k=1}^{n} \lambda_k$; generally, the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_p$ whose cumulative contribution rate reaches 85% to 95% are taken, corresponding to the first, second, ..., p-th (p ≤ n) principal components, thereby constituting the sample data set Y.
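Steps 22 to 25 can be sketched in Python with NumPy as follows. This is an illustrative sketch, not the patented implementation: the function name `pca_reduce`, the use of the sample variance (`ddof=1`), the guard for constant columns and the 0.85 default threshold (the lower end of the 85% to 95% range) are all assumptions, and the input is random stand-in data.

```python
import numpy as np


def pca_reduce(features, min_cumulative=0.85):
    """Project an m x n feature matrix onto its leading principal components."""
    x = np.asarray(features, dtype=float)
    # Step 22: standardize each column to zero mean and unit sample variance.
    std = x.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # assumption: leave constant columns unscaled
    z = (x - x.mean(axis=0)) / std
    # Step 23: for standardized columns the covariance matrix equals
    # the correlation coefficient matrix R (n x n).
    r = np.atleast_2d(np.cov(z, rowvar=False))
    # Step 24: eigenvalues and eigenvectors of the symmetric matrix R,
    # reordered so that the eigenvalues are descending.
    eigvals, eigvecs = np.linalg.eigh(r)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)  # drop tiny negative round-off
    eigvecs = eigvecs[:, order]
    # Step 25: keep the first p components whose cumulative contribution
    # rate reaches the threshold.
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    p = min(int(np.searchsorted(cumulative, min_cumulative)) + 1, len(eigvals))
    return z @ eigvecs[:, :p], cumulative[:p]


# Stand-in data: 6 working electrodes x 33 features (random, not measured).
rng = np.random.default_rng(0)
x_prime = rng.normal(size=(6, 33))
y, cumulative = pca_reduce(x_prime)
print(y.shape, cumulative[-1])
```

`numpy.linalg.eigh` is used because R is symmetric; it returns eigenvalues in ascending order, hence the explicit reordering.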
Further, step 3 comprises the steps of:
step 31: randomly dividing a plurality of sample data sets into training set samples and test set samples by bootstrap resampling, wherein the training set samples are used for constructing a random forest classifier, and the test set samples are used for evaluating the effect of the constructed random forest classifier;
step 32: constructing the decision trees in the random forest classifier by using the Gini coefficient, comprising the following steps:
step 321: for each decision tree in the random forest, randomly selecting t feature data from the training set samples according to a certain proportion, and calculating the impurity of each of the t feature nodes: $Gini(t) = 1 - \sum_{i} \left[p(i \mid t)\right]^2$, where $p(i \mid t)$ represents the proportion of samples belonging to class i in feature t;
step 322: selecting the feature node with the maximum impurity to start branching, carrying out the first split, and dividing the current feature node into a plurality of feature sub-nodes;
step 323: repeating the step 321 and the step 322 until the current feature node can not be branched any more, namely when the current feature node only contains one attribute class, a complete decision tree is constructed;
step 33: evaluating the constructed random forest classifier by using the test set samples, and if an over-fitting phenomenon or an under-fitting phenomenon occurs, entering step 34;
step 34: optimizing parameters such as the number of decision trees in the random forest, the maximum depth of the decision trees, the number of features and the minimum sample size by using a grid search method, thereby constructing an optimized random forest classifier.
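The impurity computed in step 321 can be illustrated with a short Python sketch. It assumes the standard Gini impurity, one minus the sum of squared class proportions, which is consistent with the description of the proportion of samples belonging to class i. The function name `gini_impurity` and the labels are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0  # assumption: an empty node is treated as pure
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


# Hypothetical class labels of the samples at three nodes.
pure_node = ["A", "A", "A", "A"]
mixed_node = ["A", "A", "B", "B"]
skewed_node = ["A", "A", "A", "B"]
for node in (pure_node, mixed_node, skewed_node):
    print(gini_impurity(node))
```

A node holding a single class has impurity 0, which matches the stopping condition in step 323; the more evenly the classes are mixed, the larger the value.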
Further, the features extracted in step 21 include a maximum-minimum value, an integrated area value, and a rising maximum slope value.
The invention also provides an electronic tongue detection system based on PCA and random forest, which comprises electronic tongue equipment and an upper computer, wherein the electronic tongue equipment comprises an integrated electrode, a controller, an acquisition module and a wireless module;
the upper computer is provided with a PCA and random forest algorithm program to realize the detection of the liquid sample, and the program comprises the following execution steps:
step 1: carrying out cycle division on the response data X, respectively extracting a plurality of characteristics from the liquid sample in each cycle to obtain a new characteristic data set X ', and carrying out PCA (principal component analysis) dimensionality reduction processing on the characteristic data set X' to obtain a single sample data set Y;
step 2: dividing a plurality of sample data sets into training set samples and test set samples, constructing a random forest classifier by using the training set samples, and evaluating the constructed random forest classifier by using the test set samples;
step 3: inputting a sample data set corresponding to a liquid sample to be detected into the trained random forest classifier to obtain a plurality of preliminary decision tree classification results of the liquid sample, and obtaining a detection result of the liquid sample by applying a majority voting method to the plurality of preliminary decision tree classification results.
Further, in step 2, the decision trees in the random forest classifier are constructed by using the Gini coefficient.
Further, the electronic tongue device further comprises a display module, the upper computer sends the detection result of the liquid sample to the electronic tongue device, and the controller is further connected with the display module and used for displaying the detection result of the liquid sample.
The invention has the beneficial effects that: the liquid sample detection method has the advantages that the integrated electrodes in the electronic tongue are used for collecting liquid sample information, cross response is used as a basic principle, PCA is used in an upper computer for carrying out dimensionality reduction on the sample information, and then a random forest classifier is used for carrying out detection, so that the purpose of detecting the liquid sample is achieved, the functions of automatically collecting, processing and uploading data are achieved, on one hand, the size of the electronic tongue is greatly reduced, and on the other hand, the efficiency and the accuracy of liquid component analysis are improved.
Drawings
FIG. 1 is a control diagram of an embodiment of an electronic tongue detection system based on PCA and random forest;
FIG. 2 is a flow chart of an embodiment of a PCA and random forest based electronic tongue detection method;
FIG. 3 is a diagram illustrating an overall structure of the electronic tongue device according to an embodiment;
FIG. 4 is a graph of response data for an integrated electrode of an electronic tongue to collect a liquid sample in one embodiment;
FIG. 5 is a feature data set X' formed after feature extraction of response data in one embodiment;
FIG. 6 is a schematic representation of a feature data set PCA dimensionality reduction in one example;
FIG. 7 is a diagram illustrating decision tree construction in a random forest classifier in an example;
FIG. 8 is a schematic diagram of an example of PCA dimensionality reduction for different types of white spirits;
FIG. 9 is a diagram of random forest confusion matrices for different classes of liquor in one example.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
As shown in fig. 1, an electronic tongue detection system based on PCA and random forest includes an electronic tongue device and an upper computer, the electronic tongue device includes an integrated electrode, a controller, an acquisition module, a wireless module and a display module, the controller is connected with the integrated electrode through the acquisition module, the controller utilizes the wireless module to send response data X obtained by acquiring a liquid sample by the integrated electrode to the upper computer, the upper computer sends a detection result of the liquid sample to the electronic tongue device, and the controller is further connected with the display module for displaying the detection result of the liquid sample.
The electronic tongue device in this embodiment is shown in fig. 3. The integrated electrode in the electronic tongue device has 6 working electrodes, and the working process is as follows: the controller controls the acquisition module to send a large-frequency pulse signal to the integrated electrode 3 for sample detection, where the pulse voltage range is -1V, the pulse step is 20-200 mV, and the pulse time is 0.01 s-1 s; the acquisition module simultaneously acquires the signal data output by the integrated electrode 3 and sends it to the controller, and the controller sends the signal data to the computer, which runs the relevant algorithm for detection. The computer processes the signal data as follows: it first extracts parameters such as extreme values, integral areas or average values to form a feature map, that is, it performs data compression; it then analyzes the compressed feature map with PCA and the random forest algorithm to detect the liquid.
As shown in fig. 2, an electronic tongue detection method based on PCA and random forest includes the following steps:
step 1: collecting a liquid sample by using an integrated electrode of the electronic tongue to obtain response data X, wherein the integrated electrode in the electronic tongue device in the embodiment is provided with 6 working electrodes, and the corresponding response data X is shown in FIG. 4;
step 2: dividing the response data X periodically, and extracting a plurality of features from the data in each period to obtain a new feature data set X ', wherein the feature data set X' is a 6X n-dimensional matrix as shown in FIG. 5, n is the total number of the features in the periods, and 6 is the number of the working electrodes; carrying out PCA dimensionality reduction on the feature data set X' to obtain a single sample data set Y;
step 3: dividing a plurality of sample data sets into training set samples and test set samples, constructing a random forest classifier by using the training set samples, and evaluating the constructed random forest classifier by using the test set samples;
step 4: inputting a sample data set corresponding to a liquid sample to be detected into the trained random forest classifier to obtain a plurality of preliminary decision tree classification results of the liquid sample, and obtaining a detection result of the liquid sample by applying a majority voting method to the plurality of preliminary decision tree classification results.
Further, as shown in fig. 6, step 2 includes the following steps:
step 21: dividing the response data X into cycles of N data points each, giving X/N cycles in total, and extracting a plurality of features from the data in each cycle to obtain a new feature data set X'; the features extracted in step 21 include a maximum-minimum value, an integrated area value (AUC), and a rising maximum slope value;
step 22: standardizing the m × n feature data set $X'$, wherein m is the number of working electrodes in the integrated electrode, and n is the number of features in the plurality of cycles;
the elements of the matrix $\tilde{X}$ formed by standardizing the feature data set $X'$ are: $\tilde{X}_{ij} = \dfrac{X_{ij} - E(X_j)}{\sqrt{\mathrm{Var}(X_j)}}$, where $X_j$ is the j-th column vector of the feature data set $X'$, $X_{ij}$ is an element of the feature data set $X'$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$, and $E(X_j)$ and $\mathrm{Var}(X_j)$ are respectively the mean and variance of the elements of column j, i.e. $E(X_j) = \dfrac{1}{m}\sum_{i=1}^{m} X_{ij}$ and $\mathrm{Var}(X_j) = \dfrac{1}{m-1}\sum_{i=1}^{m}\left(X_{ij} - E(X_j)\right)^2$;
step 23: computing the correlation coefficient matrix $R = (r_{ij})_{n \times n}$ between the dimensions of the standardized matrix $\tilde{X}$, whose elements are calculated as: $r_{ij} = \dfrac{\mathrm{Cov}(\tilde{X}_i, \tilde{X}_j)}{\sqrt{\mathrm{Var}(\tilde{X}_i)}\,\sqrt{\mathrm{Var}(\tilde{X}_j)}}$, where $\mathrm{Cov}(\tilde{X}_i, \tilde{X}_j)$ is the covariance between the i-th and j-th columns of the standardized matrix $\tilde{X}$;
step 24: calculating the eigenvalues and eigenvectors: solving the characteristic equation $|\lambda I - R| = 0$ for the eigenvalues $\lambda_j$, $j = 1, 2, \ldots, n$, arranged in descending order; for each eigenvalue $\lambda_j$, finding its eigenvector $e_j$, $j = 1, 2, \ldots, n$;
step 25: calculating the principal component contribution rate and the cumulative contribution rate: the contribution rate of principal component $z_j$ is $\lambda_j \big/ \sum_{k=1}^{n} \lambda_k$, and the cumulative contribution rate is $\sum_{k=1}^{p} \lambda_k \big/ \sum_{k=1}^{n} \lambda_k$; generally, the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_p$ whose cumulative contribution rate reaches 85% to 95% are taken, corresponding to the first, second, ..., p-th (p ≤ n) principal components, thereby constituting the sample data set Y.
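As a worked illustration of the selection rule in step 25, consider a hypothetical case with n = 4 features. The eigenvalues below are invented for the example; they sum to 4 because the trace of a correlation coefficient matrix equals its dimension.

```latex
\lambda_1 = 2.4,\quad \lambda_2 = 1.2,\quad \lambda_3 = 0.3,\quad \lambda_4 = 0.1,
\qquad \sum_{k=1}^{4} \lambda_k = 4.0

\frac{\lambda_1}{\sum_{k=1}^{4} \lambda_k} = \frac{2.4}{4.0} = 0.60,
\qquad
\frac{\lambda_1 + \lambda_2}{\sum_{k=1}^{4} \lambda_k} = \frac{3.6}{4.0} = 0.90
```

The first component alone contributes 60%, below the 85% to 95% range, while the first two together reach 90%, so p = 2 principal components would be kept in this hypothetical case.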
step 31: randomly dividing a plurality of sample data sets into training set samples and test set samples by bootstrap resampling, wherein the training set samples are used for constructing a random forest classifier, and the test set samples are used for evaluating the effect of the constructed random forest classifier;
step 32: constructing the decision trees in the random forest classifier by using the Gini coefficient, comprising the following steps:
step 321: for each decision tree in the random forest, randomly selecting t feature data from the training set samples according to a certain proportion, and calculating the impurity of each of the t feature nodes: $Gini(t) = 1 - \sum_{i} \left[p(i \mid t)\right]^2$, where $p(i \mid t)$ represents the proportion of samples belonging to class i in feature t;
step 322: selecting a characteristic node with the maximum impurity degree to start branching, carrying out first segmentation, and dividing the current characteristic node into a plurality of characteristic sub-nodes;
step 323: repeating the step 321 and the step 322 until the current feature node can not be branched any more, namely when the current feature node only contains one attribute class, a complete decision tree is constructed;
step 33: evaluating the constructed random forest classifier by using the test set samples, and if an over-fitting phenomenon or an under-fitting phenomenon occurs, entering step 34;
step 34: optimizing parameters such as the number of decision trees in the random forest, the maximum depth of the decision trees, the number of features and the minimum sample size by using a grid search method, thereby constructing an optimized random forest classifier.
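The grid search in step 34 amounts to scoring every combination of candidate parameter values and keeping the best one. The following Python sketch shows that loop only; the names `grid_search`, `param_grid` and `toy_score`, the candidate values and the scoring function are hypothetical. In practice the score would be the accuracy of a random forest trained with those parameters and evaluated on held-out samples.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every parameter combination and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:  # on ties the first combination found is kept
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical candidate values for the four parameters named in step 34.
param_grid = {
    "n_trees": [50, 100, 200],
    "max_depth": [3, 5, 8],
    "n_features": [2, 4],
    "min_samples": [1, 2],
}


def toy_score(params):
    # Stand-in for the accuracy of a forest trained with these parameters;
    # it simply peaks at n_trees=100 and max_depth=5.
    return -abs(params["n_trees"] - 100) / 100 - abs(params["max_depth"] - 5) / 5


best_params, best_score = grid_search(param_grid, toy_score)
print(best_params, best_score)
```

The cost grows with the product of the candidate list lengths (36 combinations here), so the grid is usually kept coarse.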
Fig. 7 is a schematic diagram of decision tree construction in a random forest classifier. There are 40 samples in total, in 5 classes, and branching proceeds in order of Gini coefficient from large to small. As can be seen from fig. 7, the preliminary result of one decision tree is: 8 samples belong to class 1, 8 to class 2, 9 to class 3, 7 to class 4 and 8 to class 5, so the sample is judged to belong to class 3. A majority voting method is then applied to the plurality of preliminary decision tree classification results to obtain the final detection result of the liquid sample.
In a specific embodiment, the response data set X consists of 880 data points divided into 11 cycles, and 3 features are extracted in each cycle, forming a feature data set X' with dimensions 6 × 33. The liquid samples are white spirits divided into 4 series. Series 1 is 5 kinds of wine used to screen white spirits of different years; #1 to #5 are respectively: excellent in 2006, excellent in 2009, excellent in 2012, excellent in 2015 and excellent in 2018. Series 2 is 5 types of white spirits; from #1 to #5: the rice wine is specially superior in 2019, lees in 2019, Xiaoqu in 2019 and fragrant in 2019. Series 3 is special grade wine doped at different concentrations in base wine; the doping concentrations from #1 to #5 are 3%, 6%, 9%, 12% and 15%, respectively. Series 4 is super wine from 5 consecutive years; #1 to #5 are 2016 to 2020, respectively. After PCA dimensionality reduction of the feature data set X', the sample data set Y is formed, as shown in FIG. 8, and the different types of wine in each series can be distinguished. As shown in fig. 9, the different white spirits can be analyzed and detected by the random forest classifier with high accuracy.
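The dimensions in this embodiment can be reproduced with a short Python sketch of the cycle division and feature extraction. Several points are assumptions: that each of the 6 working electrodes yields 880 data points (so N = 80 per cycle), that the maximum-minimum value means the difference between the maximum and the minimum, that the integrated area is computed with the trapezoidal rule, and that the rising maximum slope is the largest difference between consecutive points. The function name and the sine-wave input are hypothetical stand-ins for measured data.

```python
import math


def extract_cycle_features(signal, cycle_length, dt=1.0):
    """Return [max - min, area, max slope] for each complete cycle, concatenated."""
    features = []
    for start in range(0, len(signal) - cycle_length + 1, cycle_length):
        cycle = signal[start:start + cycle_length]
        pairs = list(zip(cycle, cycle[1:]))  # consecutive data points
        peak_to_peak = max(cycle) - min(cycle)
        # Integrated area by the trapezoidal rule (assumed integration method).
        area = sum((a + b) * dt / 2.0 for a, b in pairs)
        # Largest slope between consecutive points (0.0 for a one-point cycle).
        max_slope = max(((b - a) / dt for a, b in pairs), default=0.0)
        features.extend([peak_to_peak, area, max_slope])
    return features


# Stand-in responses: 6 electrodes x 880 points (synthetic sine waves).
channels = [
    [math.sin(2 * math.pi * k / 80 + electrode) for k in range(880)]
    for electrode in range(6)
]
x_prime = [extract_cycle_features(channel, cycle_length=80) for channel in channels]
print(len(x_prime), len(x_prime[0]))
```

With 880 points and N = 80 there are 11 cycles, so each electrode contributes 11 × 3 = 33 features and the 6 electrodes give the 6 × 33 matrix described above.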
The invention also provides an electronic tongue detection system based on PCA and random forest, which comprises electronic tongue equipment and an upper computer, wherein the electronic tongue equipment comprises an integrated electrode, a controller, an acquisition module and a wireless module;
the upper computer is provided with a PCA and random forest algorithm program to realize the detection of the liquid sample, and the program comprises the following execution steps:
step 1: carrying out cycle division on the response data X, respectively extracting a plurality of features from the data in each cycle to obtain a new feature data set X ', and carrying out PCA (principal component analysis) dimensionality reduction on the feature data set X' to obtain a single sample data set Y;
step 2: dividing a plurality of sample data sets into training set samples and test set samples, constructing a random forest classifier by using the training set samples, and evaluating the constructed random forest classifier by using the test set samples;
step 3: inputting a sample data set corresponding to a liquid sample to be detected into the trained random forest classifier to obtain a plurality of preliminary decision tree classification results of the liquid sample, and obtaining a detection result of the liquid sample by applying a majority voting method to the plurality of preliminary decision tree classification results.
Further, in step 2, the decision trees in the random forest classifier are constructed by using the Gini coefficient.
Further, the electronic tongue device further comprises a display module, the upper computer sends the detection result of the liquid sample to the electronic tongue device, and the controller is further connected with the display module and used for displaying the detection result of the liquid sample.
The above description is only a few of the preferred embodiments of the present application and is not intended to limit the present application, which may be modified and varied by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (7)
1. An electronic tongue detection method based on PCA and random forest is characterized by comprising the following steps:
step 1: collecting a liquid sample by using an integrated electrode of the electronic tongue to obtain response data X;
step 2: carrying out cycle division on the response data X, respectively extracting a plurality of features from the data in each cycle to obtain a new feature data set X ', and carrying out PCA (principal component analysis) dimensionality reduction on the feature data set X' to obtain a single sample data set Y;
step 3: dividing a plurality of sample data sets into training set samples and test set samples, constructing a random forest classifier by using the training set samples, and evaluating the constructed random forest classifier by using the test set samples;
step 4: inputting a sample data set corresponding to a liquid sample to be detected into the trained random forest classifier to obtain a plurality of preliminary decision tree classification results of the liquid sample, and obtaining a detection result of the liquid sample by applying a majority voting method to the plurality of preliminary decision tree classification results.
2. An electronic tongue detection method based on PCA and random forest as claimed in claim 1 wherein step 2 comprises the steps of:
step 21: dividing the response data X into cycles of N data points each, giving X/N cycles in total, and extracting a plurality of features from the data in each cycle to obtain a new feature data set X';
step 22: standardizing the m × n feature data set $X'$, wherein m is the number of working electrodes in the integrated electrode, and n is the number of features in the plurality of cycles;
the elements of the matrix $\tilde{X}$ formed by standardizing the feature data set $X'$ are: $\tilde{X}_{ij} = \dfrac{X_{ij} - E(X_j)}{\sqrt{\mathrm{Var}(X_j)}}$, where $X_j$ is the j-th column vector of the feature data set $X'$, $X_{ij}$ is an element of the feature data set $X'$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$, and $E(X_j)$ and $\mathrm{Var}(X_j)$ are respectively the mean and variance of the elements of column j, i.e. $E(X_j) = \dfrac{1}{m}\sum_{i=1}^{m} X_{ij}$ and $\mathrm{Var}(X_j) = \dfrac{1}{m-1}\sum_{i=1}^{m}\left(X_{ij} - E(X_j)\right)^2$;
step 23: computing the correlation coefficient matrix $R = (r_{ij})_{n \times n}$ between the dimensions of the standardized matrix $\tilde{X}$, whose elements are calculated as: $r_{ij} = \dfrac{\mathrm{Cov}(\tilde{X}_i, \tilde{X}_j)}{\sqrt{\mathrm{Var}(\tilde{X}_i)}\,\sqrt{\mathrm{Var}(\tilde{X}_j)}}$, where $\mathrm{Cov}(\tilde{X}_i, \tilde{X}_j)$ is the covariance between the i-th and j-th columns of the standardized matrix $\tilde{X}$;
step 24: calculating the eigenvalues and eigenvectors: solving the characteristic equation $|\lambda I - R| = 0$ for the eigenvalues $\lambda_j$, $j = 1, 2, \ldots, n$, arranged in descending order; for each eigenvalue $\lambda_j$, finding its eigenvector $e_j$, $j = 1, 2, \ldots, n$;
step 25: calculating the principal component contribution rate and the cumulative contribution rate: the contribution rate of principal component $z_j$ is $\lambda_j \big/ \sum_{k=1}^{n} \lambda_k$, and the cumulative contribution rate is $\sum_{k=1}^{p} \lambda_k \big/ \sum_{k=1}^{n} \lambda_k$; generally, the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_p$ whose cumulative contribution rate reaches 85% to 95% are taken, corresponding to the first, second, ..., p-th (p ≤ n) principal components, thereby constituting the sample data set Y.
3. An electronic tongue detection method based on PCA and random forest as claimed in claim 2 wherein step 3 comprises the steps of:
step 31: randomly dividing a plurality of sample data sets into training set samples and test set samples by bootstrap resampling, wherein the training set samples are used for constructing the random forest classifier and the test set samples are used for evaluating the performance of the constructed random forest classifier;
step 32: constructing the decision trees in the random forest classifier using the Gini coefficient, comprising the following steps:
step 321: for each decision tree in the random forest, randomly selecting t features from the training set samples according to a certain proportion, and calculating the impurity of each of the t feature nodes: $\mathrm{Gini}(t) = 1 - \sum_{i} [p(i \mid t)]^2$, where $p(i \mid t)$ is the proportion of samples belonging to category i in feature t;
step 322: selecting the feature node with the maximum impurity to start branching, performing the first split, and dividing the current feature node into a plurality of feature sub-nodes;
step 323: repeating step 321 and step 322 until the current feature node cannot be branched any further, that is, until the current feature node contains only one attribute class, at which point a complete decision tree has been constructed;
step 33: evaluating the constructed random forest classifier by using the test set samples, and if an over-fitting phenomenon or an under-fitting phenomenon occurs, entering step 34;
step 34: optimizing parameters such as the number of decision trees in the random forest, the maximum depth of the decision trees, the number of features, and the minimum sample size by grid search, thereby constructing an optimized random forest classifier.
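Two building blocks of step 3, the Gini impurity of a node and the bootstrap split of step 31, can be sketched in plain Python. The function names are hypothetical, and using the samples that are never drawn (the out-of-bag samples) as the test set is an assumption of this sketch; the claim only states that bootstrap resampling produces the training and test sets.

```python
import random
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_i p_i^2 of the class labels at a node."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def bootstrap_split(samples, seed=None):
    """Draw len(samples) items with replacement as the training set.

    Assumption: items that are never drawn (out-of-bag samples) are
    returned as the test set.
    """
    rng = random.Random(seed)
    indices = [rng.randrange(len(samples)) for _ in samples]
    drawn = set(indices)
    train = [samples[i] for i in indices]
    test = [s for i, s in enumerate(samples) if i not in drawn]
    return train, test


if __name__ == "__main__":
    print(gini_impurity(["a", "a", "b", "b"]))  # evenly mixed node
    print(gini_impurity(["a", "a", "a", "a"]))  # pure node
    train, test = bootstrap_split(list(range(10)), seed=1)
    print(len(train), len(test))
```

A node containing a single class has impurity 0, which corresponds to the stopping condition of step 323. The grid search of step 34 is not covered by this sketch.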
4. An electronic tongue detection method based on PCA and random forests as claimed in claim 2 wherein the features extracted in step 21 include maximum and minimum values, integrated area values and rising maximum slope values.
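The cycle division of step 21 and the four features named in this claim can be sketched as below. The function names, the uniform sampling interval `dt`, the trapezoidal rule for the integrated area, the point-to-point difference for the slope, and the dropping of an incomplete trailing cycle are all assumptions made for illustration.

```python
def split_into_cycles(response, n):
    """Split a response sequence into consecutive cycles of n points.

    Assumption: trailing points that do not fill a whole cycle are dropped.
    """
    if n < 2:
        raise ValueError("a cycle needs at least two points")
    return [response[i:i + n] for i in range(0, len(response) - n + 1, n)]


def cycle_features(cycle, dt=1.0):
    """Return (maximum, minimum, integrated area, maximum rising slope)."""
    pairs = list(zip(cycle, cycle[1:]))
    # Trapezoidal rule over uniformly spaced points
    area = sum((a + b) * dt / 2.0 for a, b in pairs)
    # Largest point-to-point slope within the cycle
    max_slope = max((b - a) / dt for a, b in pairs)
    return max(cycle), min(cycle), area, max_slope


def extract_features(response, n, dt=1.0):
    """Concatenate the four features of every cycle into one feature row."""
    features = []
    for cycle in split_into_cycles(response, n):
        features.extend(cycle_features(cycle, dt))
    return features


if __name__ == "__main__":
    # Hypothetical response of one working electrode, two cycles of 4 points
    response = [0, 1, 3, 2, 0, 2, 4, 1]
    print(extract_features(response, n=4))
    # prints [3, 0, 5.0, 2.0, 4, 0, 6.5, 2.0]
```

Applying `extract_features` to the response of each of the m working electrodes would yield one row per electrode of the m × n feature data set X'.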
5. The PCA and random forest based electronic tongue detection system as claimed in claim 1, wherein the electronic tongue detection system comprises electronic tongue equipment and an upper computer, the electronic tongue equipment comprises an integrated electrode, a controller, a collection module and a wireless module, the controller is connected with the integrated electrode through the collection module, and the controller utilizes the wireless module to send response data X obtained by collecting a liquid sample by the integrated electrode to the upper computer;
the upper computer is provided with a PCA and random forest algorithm program to realize the detection of the liquid sample, and the program comprises the following execution steps:
step 1: carrying out cycle division on the response data X, respectively extracting a plurality of characteristics from the liquid sample in each cycle to obtain a new characteristic data set X ', and carrying out PCA (principal component analysis) dimensionality reduction processing on the characteristic data set X' to obtain a single sample data set Y;
step 2: dividing a plurality of sample data sets into training set samples and test set samples, constructing a random forest classifier by using the training set samples, and evaluating the constructed random forest classifier by using the test set samples;
step 3: inputting the sample data set corresponding to a liquid sample to be detected into the trained random forest classifier to obtain a plurality of preliminary decision tree classification results for the liquid sample, and applying majority voting to the plurality of preliminary decision tree classification results to obtain the detection result of the liquid sample.
6. An electronic tongue detection system based on PCA and random forest as claimed in claim 5 wherein step 2 uses the Gini coefficient to construct the decision trees in the random forest classifier.
7. The PCA and random forest based electronic tongue detection system as claimed in claim 5, wherein the electronic tongue device further comprises a display module, the upper computer sends the detection result of the liquid sample to the electronic tongue device, and the controller is further connected with the display module for displaying the detection result of the liquid sample.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110533466.8A CN113313150A (en) | 2021-05-17 | 2021-05-17 | Electronic tongue detection method and system based on PCA and random forest |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110533466.8A CN113313150A (en) | 2021-05-17 | 2021-05-17 | Electronic tongue detection method and system based on PCA and random forest |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113313150A (en) | 2021-08-27 |
Family
ID=77373459
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110533466.8A Withdrawn CN113313150A (en) | 2021-05-17 | 2021-05-17 | Electronic tongue detection method and system based on PCA and random forest |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113313150A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114418041A (en) * | 2022-03-31 | 2022-04-29 | 合肥工业大学 | Electronic tongue liquor detection method based on IG-HSIC-SVM |
CN114636736A (en) * | 2021-11-08 | 2022-06-17 | 滁州怡然传感技术研究院有限公司 | Electronic tongue white spirit detection method based on AIF-1DCNN |
CN114925756A (en) * | 2022-05-07 | 2022-08-19 | 上海燕龙基再生资源利用有限公司 | Waste glass classified recovery method and device based on fine management |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114636736A (en) * | 2021-11-08 | 2022-06-17 | 滁州怡然传感技术研究院有限公司 | Electronic tongue white spirit detection method based on AIF-1DCNN |
CN114418041A (en) * | 2022-03-31 | 2022-04-29 | 合肥工业大学 | Electronic tongue liquor detection method based on IG-HSIC-SVM |
CN114418041B (en) * | 2022-03-31 | 2022-06-21 | 合肥工业大学 | Electronic tongue white spirit detection method based on IG-HSIC-SVM |
CN114925756A (en) * | 2022-05-07 | 2022-08-19 | 上海燕龙基再生资源利用有限公司 | Waste glass classified recovery method and device based on fine management |
CN114925756B (en) * | 2022-05-07 | 2022-11-11 | 上海燕龙基再生资源利用有限公司 | Waste glass classified recovery method and device based on fine management |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113313150A (en) | Electronic tongue detection method and system based on PCA and random forest | |
Souza et al. | Challenges in benchmarking stream learning algorithms with real-world data | |
CN111311128A (en) | Consumption financial credit scoring card development method based on third-party data | |
CN112580749B (en) | Intelligent fire detection method based on machine olfaction technology | |
CN103499609B (en) | A kind of method that honey fragrance intelligence sense of smell dynamic response feature and differentiation information dynamic characterization are studied | |
CN104849321B (en) | A kind of method based on smell finger-print quick detection Quality Parameters in Orange | |
CN108717548B (en) | Behavior recognition model updating method and system for dynamic increase of sensors | |
Shvydka et al. | Optimum sample size to estimate mean parasite abundance in fish parasite surveys | |
CN110794090A (en) | Emotion electronic nose implementation method | |
CN114113471A (en) | Method and system for detecting food freshness of artificial nose refrigerator based on machine learning | |
Ibrahim et al. | Palm leaf nutrient deficiency detection using convolutional neural network (CNN) | |
CN117172430B (en) | Deep learning-based water body environment assessment and prediction method and system | |
Pratiwi et al. | The application of graphology and enneagram techniques in determining personality type based on handwriting features | |
CN101950334A (en) | Information system danger sense method and system based on computer immunity | |
CN117789038A (en) | Training method of data processing and recognition model based on machine learning | |
Pratiwi et al. | Personality type assessment system by using enneagram-graphology techniques on digital handwriting | |
Wang et al. | The recognition of different odor using convolutional neural networks extracted from time and temperature features | |
CN117371604A (en) | Agricultural production prediction method and system based on intelligent perception | |
CN110096708A (en) | A kind of determining method and device of calibration collection | |
CN113378935B (en) | Intelligent olfactory sensation identification method for gas | |
Abas et al. | Agarwood oil quality classifier using machine learning | |
Liu et al. | Convenient and accurate method for the identification of Chinese teas by an electronic nose | |
CN112487991B (en) | High-precision load identification method and system based on characteristic self-learning | |
CN101884045A (en) | Mixed statistical and numerical model for sensor array detection and classification | |
CN110622042A (en) | Analysis device, stratum generation device, analysis method, stratum generation method, and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20210827 |