CN113313150A - Electronic tongue detection method and system based on PCA and random forest - Google Patents

Electronic tongue detection method and system based on PCA and random forest

Info

Publication number
CN113313150A
Authority
CN
China
Prior art keywords
random forest
characteristic
electronic tongue
pca
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110533466.8A
Other languages
Chinese (zh)
Inventor
章伟
朱亚龙
刘嘉明
朱晓龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Yideguan Electronic Technology Co ltd
Original Assignee
Nanjing Yideguan Electronic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Yideguan Electronic Technology Co ltd
Priority to CN202110533466.8A
Publication of CN113313150A
Legal status: Withdrawn

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/24323: Tree-organised classifiers
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses an electronic tongue detection method and system based on PCA and random forest. The method comprises the following steps. Step 1: collecting a liquid sample with the integrated electrode of the electronic tongue to obtain response data X. Step 2: dividing the response data X into cycles, extracting a plurality of features from the data in each cycle to obtain a new feature data set X', and performing PCA (principal component analysis) dimensionality reduction on the feature data set X' to obtain a single sample data set Y. Step 3: dividing a plurality of sample data sets into training set samples and test set samples, constructing a random forest classifier with the training set samples, and evaluating the constructed random forest classifier with the test set samples. Step 4: inputting the sample data set corresponding to the liquid sample to be detected into the trained random forest classifier to obtain the detection result of the liquid sample. The invention provides automatic acquisition, processing and uploading of data; on the one hand it greatly reduces the size of the electronic tongue, and on the other hand it improves the efficiency and accuracy of liquid component analysis.

Description

Electronic tongue detection method and system based on PCA and random forest
Technical Field
The invention relates to the field of electronic tongues, in particular to a PCA (principal component analysis) and random forest based electronic tongue detection method and system.
Background
Food is the first necessity of the people, and with the rapid development of the Chinese economy, people's expectations of food taste keep rising. However, taste research in the food industry still relies mainly on human sensory evaluation: the testing process consumes a great deal of manpower and time, and the results are subjective and not repeatable. Ordinary households likewise lack a means of quickly detecting counterfeit food and toxic or harmful substances. The analysis of liquid components is also widely applied in industrial production, agricultural production control, medicine, safety and national defence.
Detection methods based on PCA and random forest can currently be used for classification prediction. Because a random forest involves no iteration and places no limit on tree growth, the model is relatively simple, is unlikely to overfit and achieves high classification accuracy. However, the combination of PCA and random forest has not yet been applied to liquid detection.
Disclosure of Invention
To solve the above problems, the invention provides an electronic tongue detection method and system based on PCA and random forest. The integrated electrode in the electronic tongue collects liquid sample information on the basic principle of cross response, and the data are analysed by combining PCA with the random forest detection method, thereby detecting the liquid sample. The system automatically acquires, processes and uploads data; on the one hand the size of the electronic tongue is greatly reduced, and on the other hand the efficiency and accuracy of liquid component analysis are improved.
In order to achieve the above purpose, the invention adopts a technical scheme that:
An electronic tongue detection method based on PCA and random forest, characterized by comprising the following steps:
step 1: collecting a liquid sample with the integrated electrode of the electronic tongue to obtain response data X;
step 2: dividing the response data X into cycles, extracting a plurality of features from the data in each cycle to obtain a new feature data set X', and performing PCA (principal component analysis) dimensionality reduction on the feature data set X' to obtain a single sample data set Y;
step 3: dividing a plurality of sample data sets into training set samples and test set samples, constructing a random forest classifier with the training set samples, and evaluating the constructed random forest classifier with the test set samples;
step 4: inputting the sample data set corresponding to the liquid sample to be detected into the trained random forest classifier to obtain a plurality of preliminary decision tree classification results for the liquid sample, and applying the majority voting method to the plurality of preliminary decision tree classification results to obtain the detection result of the liquid sample.
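The four steps above can be sketched end to end with scikit-learn. This is only an illustrative sketch under stated assumptions, not the patented implementation: the data are synthetic, each sample is assumed to be a flattened 33-element feature vector, and PCA is fitted across samples (the text instead applies PCA to each sample's feature data set X'). The 90% variance threshold, the 75/25 split and all variable names are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_classes, per_class, n_features = 5, 20, 33  # hypothetical sizes

# Synthetic stand-in for extracted features: one cluster per liquid class.
centers = rng.normal(scale=3.0, size=(n_classes, n_features))
X = np.vstack([c + rng.normal(size=(per_class, n_features)) for c in centers])
y = np.repeat(np.arange(n_classes), per_class)

# Step 3: split into training and test samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 2 (standardize, then PCA keeping 90% of the variance) followed by
# the random forest classifier of steps 3 and 4.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.90, svd_solver="full"),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)   # evaluation on the test set
prediction = model.predict(X_test[:1])   # step 4: classify one new sample
```

Note that scikit-learn's forest combines trees by averaging their predicted class probabilities rather than by counting hard votes, so it approximates, but is not identical to, the majority voting described in step 4.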
Further, step 2 comprises the steps of:
step 21: dividing the response data X into cycles of N data points each, giving X/N cycles in total (the number of data points in X divided by N), and extracting a plurality of features from the data in each cycle to obtain a new feature data set X';
step 22: standardizing the m × n dimensional feature data set
$$X' = (X_{ij})_{m \times n}$$
where m is the number of working electrodes in the integrated electrode and n is the number of features over the plurality of cycles;
the elements of the matrix $\tilde{X}$ formed after standardization of the feature data set X' are:
$$\tilde{X}_{ij} = \frac{X_{ij} - E(X_j)}{\sqrt{\mathrm{Var}(X_j)}}$$
where $X_j$ is the j-th column vector of the feature data set X', $X_{ij}$ is an element of the feature data set X', i = 1,2,...,m, j = 1,2,...,n, and $E(X_j)$ and $\mathrm{Var}(X_j)$ are respectively the mean and variance of the elements of column j, i.e.
$$E(X_j) = \frac{1}{m}\sum_{i=1}^{m} X_{ij}, \qquad \mathrm{Var}(X_j) = \frac{1}{m-1}\sum_{i=1}^{m}\left(X_{ij} - E(X_j)\right)^2$$
step 23: computing the correlation coefficient matrix between the dimensions of the standardized matrix $\tilde{X}$, $R = (r_{ij})_{n \times n}$, whose elements are calculated as:
$$r_{ij} = \frac{\mathrm{Cov}(\tilde{X}_i, \tilde{X}_j)}{\sqrt{\mathrm{Var}(\tilde{X}_i)}\,\sqrt{\mathrm{Var}(\tilde{X}_j)}}$$
where $\mathrm{Cov}(\tilde{X}_i, \tilde{X}_j)$ is the covariance between the i-th and j-th columns of the standardized matrix $\tilde{X}$;
step 24: calculating the eigenvalues and eigenvectors: solving the characteristic equation $|\lambda I - R| = 0$ for the eigenvalues $\lambda_j$, j = 1,2,...,n, arranged in descending order, and finding for each eigenvalue $\lambda_j$ its eigenvector $e_j$, j = 1,2,...,n;
step 25: calculating the principal component contribution rate and the cumulative contribution rate, the contribution rate of principal component $z_j$ being
$$\frac{\lambda_j}{\sum_{k=1}^{n} \lambda_k}$$
and the cumulative contribution rate being
$$\frac{\sum_{k=1}^{p} \lambda_k}{\sum_{k=1}^{n} \lambda_k}$$
Generally, the eigenvalues $\lambda_1, \lambda_2, ..., \lambda_p$ whose cumulative contribution rate reaches 85-95% are taken, corresponding to the first, second, ..., p-th (p ≤ n) principal components, which constitute the sample data set Y.
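Steps 22 to 25 can be sketched in NumPy as follows. This is a minimal sketch under assumptions, not the patented implementation: the sample variance with divisor m - 1 is assumed, the 90% threshold is one arbitrary value inside the 85-95% band, the projection of the standardized data onto the retained eigenvectors is assumed to be what forms Y, and the function name `pca_correlation` is hypothetical.

```python
import numpy as np

def pca_correlation(X, threshold=0.90):
    """PCA via the correlation matrix, keeping components up to `threshold`."""
    X = np.asarray(X, dtype=float)
    # Step 22: standardize each column (feature) to zero mean, unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 23: n x n correlation coefficient matrix between the columns.
    R = np.corrcoef(Z, rowvar=False)
    # Step 24: eigenvalues and eigenvectors, sorted in descending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 25: contribution rates and the smallest p reaching the threshold.
    contribution = eigvals / eigvals.sum()
    cumulative = np.cumsum(contribution)
    p = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigvals))
    # Assumed projection: scores on the first p principal components.
    Y = Z @ eigvecs[:, :p]
    return Y, contribution, p

# Synthetic 6 x 33 feature data set (6 electrodes, 33 features).
rng = np.random.default_rng(0)
Y, contribution, p = pca_correlation(rng.normal(size=(6, 33)))
```

With only m = 6 rows, the centred data have rank at most 5, so at most 5 eigenvalues are non-zero and p cannot exceed 5 regardless of n.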
Further, step 3 comprises the steps of:
step 31: randomly dividing a plurality of sample data sets into training set samples and test set samples by the bootstrap resampling technique, wherein the training set samples are used for constructing the random forest classifier and the test set samples are used for evaluating the constructed random forest classifier;
step 32: constructing each decision tree in the random forest classifier using the Gini coefficient, comprising the following steps:
step 321: for each decision tree in the random forest, randomly selecting t features from the training set samples in a certain proportion, and calculating the impurity of each of the t feature nodes:
$$\mathrm{Gini}(t) = 1 - \sum_{i} p(i \mid t)^2$$
where $p(i \mid t)$ represents the proportion of samples belonging to class i at feature node t;
step 322: selecting the feature node with the maximum impurity to start branching, performing the first split, and dividing the current feature node into a plurality of feature sub-nodes;
step 323: repeating step 321 and step 322 until the current feature node can no longer be branched, that is, when the current feature node contains only one attribute class, at which point a complete decision tree has been constructed;
step 33: evaluating the constructed random forest classifier with the test set samples, and proceeding to step 34 if over-fitting or under-fitting occurs;
step 34: optimizing parameters such as the number of decision trees in the random forest, the maximum depth of the decision trees, the number of features and the minimum sample size by the grid search method, thereby constructing an optimized random forest classifier.
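The node impurity used in step 321 can be illustrated with a short function. The sketch assumes the standard Gini impurity, one minus the sum of squared class proportions; the function name `gini_impurity` is hypothetical.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum over classes of proportion squared."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

pure_node = gini_impurity([3, 3, 3, 3])    # a single class: impurity 0
mixed_node = gini_impurity([1, 1, 2, 2])   # two equal classes: impurity 0.5
```

A node holding a single class has impurity 0, which matches the stopping condition of step 323.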
Further, the features extracted in step 21 include a maximum-minimum value, an integrated area value, and a rising maximum slope value.
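The cycle division and the three features can be sketched for a single working electrode as follows. The exact definitions are not given in the text, so this sketch assumes that the maximum-minimum value is the difference between the largest and smallest reading in a cycle, that the integrated area is a trapezoidal sum, and that the rising maximum slope is the largest first difference; the names `cycle_features`, `extract_features` and the sampling interval `dt` are hypothetical.

```python
import numpy as np

def cycle_features(cycle, dt=1.0):
    """Three features of one cycle: max-min, integrated area, max slope."""
    cycle = np.asarray(cycle, dtype=float)
    max_min = cycle.max() - cycle.min()
    # Trapezoidal rule written out explicitly.
    area = float(np.sum((cycle[1:] + cycle[:-1]) * 0.5 * dt))
    max_slope = float(np.max(np.diff(cycle)) / dt)
    return np.array([max_min, area, max_slope])

def extract_features(signal, n_per_cycle, dt=1.0):
    """Split one electrode's response into cycles and concatenate features."""
    signal = np.asarray(signal, dtype=float)
    n_cycles = signal.size // n_per_cycle
    cycles = signal[: n_cycles * n_per_cycle].reshape(n_cycles, n_per_cycle)
    return np.concatenate([cycle_features(c, dt) for c in cycles])

single = cycle_features([0.0, 1.0, 3.0, 2.0])                  # [3.0, 5.0, 2.0]
row = extract_features(np.arange(8, dtype=float), n_per_cycle=4)
```

Applying `extract_features` to each working electrode and stacking the results row by row would give one feature data set X' per liquid sample.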
The invention also provides an electronic tongue detection system based on PCA and random forest, which comprises electronic tongue equipment and an upper computer, wherein the electronic tongue equipment comprises an integrated electrode, a controller, an acquisition module and a wireless module;
the upper computer is provided with a PCA and random forest algorithm program to realize the detection of the liquid sample, and the program comprises the following execution steps:
step 1: dividing the response data X into cycles, extracting a plurality of features from the data in each cycle to obtain a new feature data set X', and performing PCA (principal component analysis) dimensionality reduction on the feature data set X' to obtain a single sample data set Y;
step 2: dividing a plurality of sample data sets into training set samples and test set samples, constructing a random forest classifier with the training set samples, and evaluating the constructed random forest classifier with the test set samples;
step 3: inputting the sample data set corresponding to the liquid sample to be detected into the trained random forest classifier to obtain a plurality of preliminary decision tree classification results for the liquid sample, and applying the majority voting method to the plurality of preliminary decision tree classification results to obtain the detection result of the liquid sample.
Further, in step 2, each decision tree in the random forest classifier is constructed using the Gini coefficient.
Further, the electronic tongue device further comprises a display module, the upper computer sends the detection result of the liquid sample to the electronic tongue device, and the controller is further connected with the display module and used for displaying the detection result of the liquid sample.
The invention has the following beneficial effects: the integrated electrode in the electronic tongue collects liquid sample information on the basic principle of cross response; the upper computer reduces the dimensionality of the sample information with PCA and then performs detection with a random forest classifier, thereby detecting the liquid sample. The system automatically acquires, processes and uploads data; on the one hand the size of the electronic tongue is greatly reduced, and on the other hand the efficiency and accuracy of liquid component analysis are improved.
Drawings
FIG. 1 is a control diagram of an embodiment of an electronic tongue detection system based on PCA and random forest;
FIG. 2 is a flow chart of an embodiment of a PCA and random forest based electronic tongue detection method;
FIG. 3 is a diagram illustrating an overall structure of the electronic tongue device according to an embodiment;
FIG. 4 is a graph of response data for an integrated electrode of an electronic tongue to collect a liquid sample in one embodiment;
FIG. 5 is a feature data set X' formed after feature extraction of response data in one embodiment;
FIG. 6 is a schematic representation of a feature data set PCA dimensionality reduction in one example;
FIG. 7 is a diagram illustrating decision tree construction in a random forest classifier in an example;
FIG. 8 is a schematic diagram of an example of PCA dimensionality reduction for different types of white spirits;
FIG. 9 is a diagram of random forest confusion matrices for different classes of liquor in one example.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
As shown in fig. 1, an electronic tongue detection system based on PCA and random forest includes an electronic tongue device and an upper computer, the electronic tongue device includes an integrated electrode, a controller, an acquisition module, a wireless module and a display module, the controller is connected with the integrated electrode through the acquisition module, the controller utilizes the wireless module to send response data X obtained by acquiring a liquid sample by the integrated electrode to the upper computer, the upper computer sends a detection result of the liquid sample to the electronic tongue device, and the controller is further connected with the display module for displaying the detection result of the liquid sample.
The electronic tongue device of this embodiment is shown in fig. 3; its integrated electrode has 6 working electrodes. The device works as follows: the controller directs the acquisition module to send a large-frequency pulse signal to the integrated electrode 3 for sample detection, with a pulse voltage range of -1 V, a pulse step of 20-200 mV and a pulse time of 0.01 s to 1 s. The acquisition module simultaneously acquires the signal data output by the integrated electrode 3 and sends it to the controller, and the controller forwards the signal data to the computer, which runs the detection algorithm. The computer processes the signal data as follows: it first extracts parameters such as extreme values, integral areas or average values to form a feature map, that is, it compresses the data; it then analyses the compressed feature map with PCA and the random forest algorithm to detect the liquid.
As shown in fig. 2, an electronic tongue detection method based on PCA and random forest includes the following steps:
step 1: collecting a liquid sample with the integrated electrode of the electronic tongue to obtain response data X, wherein the integrated electrode of the electronic tongue device in this embodiment has 6 working electrodes and the corresponding response data X are shown in fig. 4;
step 2: dividing the response data X into cycles and extracting a plurality of features from the data in each cycle to obtain a new feature data set X', wherein, as shown in fig. 5, the feature data set X' is a 6 × n matrix, n being the total number of features over the cycles and 6 being the number of working electrodes; and performing PCA dimensionality reduction on the feature data set X' to obtain a single sample data set Y;
step 3: dividing a plurality of sample data sets into training set samples and test set samples, constructing a random forest classifier with the training set samples, and evaluating the constructed random forest classifier with the test set samples;
step 4: inputting the sample data set corresponding to the liquid sample to be detected into the trained random forest classifier to obtain a plurality of preliminary decision tree classification results for the liquid sample, and applying the majority voting method to the plurality of preliminary decision tree classification results to obtain the detection result of the liquid sample.
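The division into training and test samples in step 3 is described in this document as a bootstrap resampling. One common reading, assumed here and not confirmed by the text, is to draw the training indices with replacement and use the samples never drawn (the out-of-bag samples) as the test set. The function name `bootstrap_split` and the seed are hypothetical.

```python
import numpy as np

def bootstrap_split(n_samples, seed=0):
    """Draw n_samples training indices with replacement; the rest are test."""
    rng = np.random.default_rng(seed)
    train_idx = rng.integers(0, n_samples, size=n_samples)
    # Samples never drawn form the out-of-bag test set.
    test_idx = np.setdiff1d(np.arange(n_samples), train_idx)
    return train_idx, test_idx

train_idx, test_idx = bootstrap_split(40)
```

For large sample counts roughly 36.8% of the samples are expected to fall out of bag, so the test set is about a third of the data on average.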
Further, as shown in fig. 6, step 2 includes the following steps:
step 21: dividing the response data X into cycles of N data points each, giving X/N cycles in total (the number of data points in X divided by N), and extracting a plurality of features from the data in each cycle to obtain a new feature data set X'; the features extracted in step 21 include a maximum-minimum value, an integrated area value (AUC) and a rising maximum slope value;
step 22: standardizing the m × n dimensional feature data set
$$X' = (X_{ij})_{m \times n}$$
where m is the number of working electrodes in the integrated electrode and n is the number of features over the plurality of cycles;
the elements of the matrix $\tilde{X}$ formed after standardization of the feature data set X' are:
$$\tilde{X}_{ij} = \frac{X_{ij} - E(X_j)}{\sqrt{\mathrm{Var}(X_j)}}$$
where $X_j$ is the j-th column vector of the feature data set X', $X_{ij}$ is an element of the feature data set X', i = 1,2,...,m, j = 1,2,...,n, and $E(X_j)$ and $\mathrm{Var}(X_j)$ are respectively the mean and variance of the elements of column j, i.e.
$$E(X_j) = \frac{1}{m}\sum_{i=1}^{m} X_{ij}, \qquad \mathrm{Var}(X_j) = \frac{1}{m-1}\sum_{i=1}^{m}\left(X_{ij} - E(X_j)\right)^2$$
step 23: computing the correlation coefficient matrix between the dimensions of the standardized matrix $\tilde{X}$, $R = (r_{ij})_{n \times n}$, whose elements are calculated as:
$$r_{ij} = \frac{\mathrm{Cov}(\tilde{X}_i, \tilde{X}_j)}{\sqrt{\mathrm{Var}(\tilde{X}_i)}\,\sqrt{\mathrm{Var}(\tilde{X}_j)}}$$
where $\mathrm{Cov}(\tilde{X}_i, \tilde{X}_j)$ is the covariance between the i-th and j-th columns of the standardized matrix $\tilde{X}$;
step 24: calculating the eigenvalues and eigenvectors: solving the characteristic equation $|\lambda I - R| = 0$ for the eigenvalues $\lambda_j$, j = 1,2,...,n, arranged in descending order, and finding for each eigenvalue $\lambda_j$ its eigenvector $e_j$, j = 1,2,...,n;
step 25: calculating the principal component contribution rate and the cumulative contribution rate, the contribution rate of principal component $z_j$ being
$$\frac{\lambda_j}{\sum_{k=1}^{n} \lambda_k}$$
and the cumulative contribution rate being
$$\frac{\sum_{k=1}^{p} \lambda_k}{\sum_{k=1}^{n} \lambda_k}$$
Generally, the eigenvalues $\lambda_1, \lambda_2, ..., \lambda_p$ whose cumulative contribution rate reaches 85-95% are taken, corresponding to the first, second, ..., p-th (p ≤ n) principal components, which constitute the sample data set Y.
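As a worked illustration of the selection rule in step 25, consider a purely hypothetical case (the numbers are not taken from the patent) with n = 4 features whose correlation matrix has the eigenvalues shown below. Because the trace of a correlation matrix equals n, the eigenvalues sum to 4.

```latex
\lambda = (2.4,\ 1.2,\ 0.3,\ 0.1), \qquad \sum_{k=1}^{4} \lambda_k = 4

\text{contribution rates: } \frac{2.4}{4} = 60\%,\quad \frac{1.2}{4} = 30\%,\quad \frac{0.3}{4} = 7.5\%,\quad \frac{0.1}{4} = 2.5\%

\text{cumulative rates: } 60\%,\quad 90\%,\quad 97.5\%,\quad 100\%
```

The cumulative contribution rate first enters the 85-95% band at the second component, so p = 2 principal components would be retained in this example.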
Step 3 comprises the following steps:
step 31: randomly dividing a plurality of sample data sets into training set samples and test set samples by the bootstrap resampling technique, wherein the training set samples are used for constructing the random forest classifier and the test set samples are used for evaluating the constructed random forest classifier;
step 32: constructing each decision tree in the random forest classifier using the Gini coefficient, comprising the following steps:
step 321: for each decision tree in the random forest, randomly selecting t features from the training set samples in a certain proportion, and calculating the impurity of each of the t feature nodes:
$$\mathrm{Gini}(t) = 1 - \sum_{i} p(i \mid t)^2$$
where $p(i \mid t)$ represents the proportion of samples belonging to class i at feature node t;
step 322: selecting the feature node with the maximum impurity to start branching, performing the first split, and dividing the current feature node into a plurality of feature sub-nodes;
step 323: repeating step 321 and step 322 until the current feature node can no longer be branched, that is, when the current feature node contains only one attribute class, at which point a complete decision tree has been constructed;
step 33: evaluating the constructed random forest classifier with the test set samples, and proceeding to step 34 if over-fitting or under-fitting occurs;
step 34: optimizing parameters such as the number of decision trees in the random forest, the maximum depth of the decision trees, the number of features and the minimum sample size by the grid search method, thereby constructing an optimized random forest classifier.
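The grid search of step 34 can be sketched with scikit-learn's `GridSearchCV`. The mapping of the four named parameters onto `n_estimators`, `max_depth`, `max_features` and `min_samples_leaf` is an assumption (the minimum sample size could equally mean `min_samples_split`), and the candidate values, the 3-fold cross-validation and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for reduced sample data sets: 3 classes, 12 samples each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=k, size=(12, 4)) for k in range(3)])
y = np.repeat(np.arange(3), 12)

param_grid = {
    "n_estimators": [20, 50],        # number of decision trees
    "max_depth": [3, None],          # maximum depth of each tree
    "max_features": ["sqrt", None],  # features considered at each split
    "min_samples_leaf": [1, 3],      # assumed meaning of minimum sample size
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

best_params = search.best_params_     # best parameter combination found
best_model = search.best_estimator_   # forest refitted with those parameters
```

Every combination in the grid is scored by cross-validation, so the cost grows multiplicatively with the number of candidate values per parameter.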
Fig. 7 is a schematic diagram of decision tree construction in the random forest classifier. There are 40 samples in total, in 5 classes, and branching proceeds in order of Gini coefficient from large to small. As can be seen from fig. 7, the preliminary result of one decision tree is: 8 samples belong to class 1, 8 to class 2, 9 to class 3, 7 to class 4 and 8 to class 5, so the sample is judged to belong to class 3. The majority voting method is then applied to the preliminary classification results of the plurality of decision trees to obtain the final detection result of the liquid sample.
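The two decisions in this example can be illustrated in a few lines. The per-class counts are the ones given for fig. 7; reading the single-tree result as the class with the largest count is an interpretation of the text, and the list of votes from five trees is hypothetical.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Counts from the fig. 7 example: class label -> number of samples.
class_counts = {1: 8, 2: 8, 3: 9, 4: 7, 5: 8}
tree_label = max(class_counts, key=class_counts.get)  # class 3 has the most

# Hypothetical preliminary results from five decision trees.
forest_votes = [3, 3, 1, 3, 5]
final_label = majority_vote(forest_votes)
```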
In a specific embodiment, the response data set X contains 880 data points divided into 11 cycles, and 3 features are extracted in each cycle to form a feature data set X' of dimension 6 × 33. The liquid samples are white spirits divided into 4 series. Series 1 is 5 wines used to discriminate white spirits of different years, #1 to #5 being respectively: excellent grade 2006, excellent grade 2009, excellent grade 2012, excellent grade 2015 and excellent grade 2018. Series 2 is 5 types of white spirit, listed from #1 to #5 as: rice wine specially superior in 2019, lees in 2019, Xiaoqu in 2019 and fragrant in 2019. Series 3 is special grade wine blended into base wine at different concentrations, the concentrations of #1 to #5 being 3%, 6%, 9%, 12% and 15% respectively. Series 4 is super grade wine from 5 consecutive years, #1 to #5 being 2016 to 2020 respectively. After PCA dimensionality reduction of the feature data set X', a sample data set Y is formed, as shown in fig. 8, and the different wines in each series can then be distinguished; as shown in fig. 9, the random forest classifier can analyse and detect the different white spirits with high accuracy.
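The dimensions quoted in this embodiment can be checked with a short array sketch: 880 points in 11 cycles gives 80 points per cycle, and 11 cycles × 3 features gives 33 columns for each of the 6 working electrodes. The response data here are random placeholders, and the feature definitions (range, unit-spacing sum as area, largest first difference) are simplifying assumptions.

```python
import numpy as np

n_electrodes, n_points, n_cycles, n_feats = 6, 880, 11, 3
points_per_cycle = n_points // n_cycles  # 880 / 11

# Placeholder response data: one row of 880 readings per working electrode.
rng = np.random.default_rng(0)
X = rng.normal(size=(n_electrodes, n_points))

# Cycle division: (electrodes, cycles, points per cycle).
cycles = X.reshape(n_electrodes, n_cycles, points_per_cycle)

max_min = cycles.max(axis=2) - cycles.min(axis=2)
area = cycles.sum(axis=2)                       # unit-spacing approximation
max_slope = np.diff(cycles, axis=2).max(axis=2)

# Stack the 3 features per cycle, then flatten cycles x features per row.
features = np.stack([max_min, area, max_slope], axis=2)
X_prime = features.reshape(n_electrodes, n_cycles * n_feats)
```

Each row of `X_prime` then holds the features of one electrode ordered cycle by cycle, which is one possible column layout; the text does not specify the ordering.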
The invention also provides an electronic tongue detection system based on PCA and random forest, which comprises electronic tongue equipment and an upper computer, wherein the electronic tongue equipment comprises an integrated electrode, a controller, an acquisition module and a wireless module;
the upper computer is provided with a PCA and random forest algorithm program to realize the detection of the liquid sample, and the program comprises the following execution steps:
step 1: dividing the response data X into cycles, extracting a plurality of features from the data in each cycle to obtain a new feature data set X', and performing PCA (principal component analysis) dimensionality reduction on the feature data set X' to obtain a single sample data set Y;
step 2: dividing a plurality of sample data sets into training set samples and test set samples, constructing a random forest classifier with the training set samples, and evaluating the constructed random forest classifier with the test set samples;
step 3: inputting the sample data set corresponding to the liquid sample to be detected into the trained random forest classifier to obtain a plurality of preliminary decision tree classification results for the liquid sample, and applying the majority voting method to the plurality of preliminary decision tree classification results to obtain the detection result of the liquid sample.
Further, in step 2, each decision tree in the random forest classifier is constructed using the Gini coefficient.
Further, the electronic tongue device further comprises a display module, the upper computer sends the detection result of the liquid sample to the electronic tongue device, and the controller is further connected with the display module and used for displaying the detection result of the liquid sample.
The above description is only a few of the preferred embodiments of the present application and is not intended to limit the present application, which may be modified and varied by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (7)

1. An electronic tongue detection method based on PCA and random forest is characterized by comprising the following steps:
step 1: collecting a liquid sample with the integrated electrode of the electronic tongue to obtain response data X;
step 2: dividing the response data X into cycles, extracting a plurality of features from the data in each cycle to obtain a new feature data set X', and performing PCA (principal component analysis) dimensionality reduction on the feature data set X' to obtain a single sample data set Y;
step 3: dividing a plurality of sample data sets into training set samples and test set samples, constructing a random forest classifier with the training set samples, and evaluating the constructed random forest classifier with the test set samples;
step 4: inputting the sample data set corresponding to the liquid sample to be detected into the trained random forest classifier to obtain a plurality of preliminary decision tree classification results for the liquid sample, and applying the majority voting method to the plurality of preliminary decision tree classification results to obtain the detection result of the liquid sample.
2. An electronic tongue detection method based on PCA and random forest as claimed in claim 1 wherein step 2 comprises the steps of:
step 21: dividing the response data X into cycles of N data points each, giving X/N cycles in total (the number of data points in X divided by N), and extracting a plurality of features from the data in each cycle to obtain a new feature data set X';
step 22: standardizing the m × n dimensional feature data set
$$X' = (X_{ij})_{m \times n}$$
where m is the number of working electrodes in the integrated electrode and n is the number of features over the plurality of cycles;
the elements of the matrix $\tilde{X}$ formed by standardizing the feature data set X' are as follows:
$$\tilde{X}_{ij} = \frac{X_{ij} - E(X_j)}{\sqrt{\mathrm{Var}(X_j)}}$$
where $X_j$ is the j-th column vector of the feature data set X', $X_{ij}$ is an element of the feature data set X', i = 1,2,...,m, j = 1,2,...,n, and $E(X_j)$ and $\mathrm{Var}(X_j)$ are respectively the mean and variance of the elements of column j, i.e.
$$E(X_j) = \frac{1}{m}\sum_{i=1}^{m} X_{ij}, \qquad \mathrm{Var}(X_j) = \frac{1}{m-1}\sum_{i=1}^{m}\left(X_{ij} - E(X_j)\right)^2$$
step 23: computing the correlation coefficient matrix between the dimensions of the standardized matrix $\tilde{X}$, $R = (r_{ij})_{n \times n}$, whose elements are calculated as:
$$r_{ij} = \frac{\mathrm{Cov}(\tilde{X}_i, \tilde{X}_j)}{\sqrt{\mathrm{Var}(\tilde{X}_i)}\,\sqrt{\mathrm{Var}(\tilde{X}_j)}}$$
where $\mathrm{Cov}(\tilde{X}_i, \tilde{X}_j)$ is the covariance between the i-th and j-th columns of the standardized matrix $\tilde{X}$;
step 24: calculating the eigenvalues and eigenvectors: solving the characteristic equation $|\lambda I - R| = 0$ for the eigenvalues $\lambda_j$, j = 1,2,...,n, arranged in descending order, and finding for each eigenvalue $\lambda_j$ its eigenvector $e_j$, j = 1,2,...,n;
step 25: calculating the principal component contribution rate and the cumulative contribution rate, the contribution rate of principal component $z_j$ being
$$\frac{\lambda_j}{\sum_{k=1}^{n} \lambda_k}$$
and the cumulative contribution rate being
$$\frac{\sum_{k=1}^{p} \lambda_k}{\sum_{k=1}^{n} \lambda_k}$$
Generally, the eigenvalues $\lambda_1, \lambda_2, ..., \lambda_p$ whose cumulative contribution rate reaches 85-95% are taken, corresponding to the first, second, ..., p-th (p ≤ n) principal components, which constitute the sample data set Y.
3. An electronic tongue detection method based on PCA and random forest as claimed in claim 2 wherein step 3 comprises the steps of:
step 31: randomly dividing a plurality of sample data sets into training set samples and testing set samples by a self-help resampling technology, wherein the training set samples are used for constructing a random forest classifier, and the testing set samples are used for evaluating the effect of the constructed random forest classifier;
step 32: the method comprises the following steps of constructing a decision tree in a random forest classifier by using a kini coefficient:
step 321: for each decision tree in the random forest, randomly selecting t characteristic data from the training set samples according to a certain proportion, and calculating each characteristic in t characteristic nodesImpurity degree of the node:
Figure FDA0003068883580000023
where the formula uses the proportion of samples belonging to class i in feature t;
Step 322: selecting the feature node with the maximum impurity to start branching, performing the first split, and dividing the current feature node into a plurality of feature sub-nodes;
Step 323: repeating Step 321 and Step 322 until the current feature node can no longer be branched, that is, until the current feature node contains only one attribute class, whereupon a complete decision tree is constructed;
Step 33: evaluating the constructed random forest classifier using the test set samples, and proceeding to Step 34 if overfitting or underfitting occurs;
Step 34: optimizing parameters such as the number of decision trees in the random forest, the maximum depth of the decision trees, the number of features and the minimum sample size by grid search, thereby constructing an optimized random forest classifier.
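The bootstrap split of step 31 and the node impurity of step 321 might be sketched as follows. This is only an illustration under stated assumptions: the impurity formula of the claim is available only as a figure, so the standard Gini impurity 1 - Σ p_i² is assumed, and treating the out-of-bag samples as the test set is one common reading of bootstrap resampling rather than something the claim states. The function names are hypothetical.

```python
import random
from collections import Counter


def bootstrap_split(n_samples, seed=0):
    """Draw n_samples indices with replacement for training.

    Indices never drawn (the out-of-bag samples) are returned as the
    test set; this use of out-of-bag samples is an assumption.
    """
    rng = random.Random(seed)
    train = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(train)
    test = [i for i in range(n_samples) if i not in chosen]
    return train, test


def gini_impurity(labels):
    """Gini impurity 1 - sum_i p_i^2, with p_i the share of class i in `labels`."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())
```

A node holding two samples of each of two classes has impurity 0.5, while a node holding a single class has impurity 0, which is the stopping condition described in step 323.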
4. An electronic tongue detection method based on PCA and random forest as claimed in claim 2, wherein the features extracted in step 21 include the maximum value, the minimum value, the integrated area value and the maximum rising slope value.
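A minimal sketch of how the four features named in this claim could be computed for one cycle of response data is given below. The claim does not say how the area or the slope is evaluated, so the trapezoidal rule, the finite-difference slope, the uniform sampling interval dt and the function name are all assumptions made for illustration.

```python
def extract_cycle_features(samples, dt=1.0):
    """Return max, min, integrated area and maximum rising slope for one cycle.

    `samples` are response values taken at a uniform interval `dt`
    (uniform sampling is an assumption, not stated in the claim).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples per cycle")
    pairs = list(zip(samples, samples[1:]))
    # Trapezoidal rule for the area under the response curve.
    area = sum((a + b) / 2.0 * dt for a, b in pairs)
    # Largest finite-difference slope between consecutive samples.
    max_slope = max((b - a) / dt for a, b in pairs)
    return {
        "max": max(samples),
        "min": min(samples),
        "area": area,
        "max_rising_slope": max_slope,
    }
```

For the samples 0, 1, 3, 2 with dt = 1, this yields a maximum of 3, a minimum of 0, an area of 5.0 and a maximum rising slope of 2.0.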
5. The PCA and random forest based electronic tongue detection system as claimed in claim 1, wherein the electronic tongue detection system comprises electronic tongue equipment and an upper computer, the electronic tongue equipment comprises an integrated electrode, a controller, a collection module and a wireless module, the controller is connected with the integrated electrode through the collection module, and the controller utilizes the wireless module to send response data X obtained by collecting a liquid sample by the integrated electrode to the upper computer;
the upper computer is provided with a PCA and random forest algorithm program to realize the detection of the liquid sample, and the program comprises the following execution steps:
Step 1: carrying out cycle division on the response data X, respectively extracting a plurality of features from the liquid sample in each cycle to obtain a new feature data set X', and carrying out PCA (principal component analysis) dimensionality reduction on the feature data set X' to obtain a single sample data set Y;
Step 2: dividing a plurality of sample data sets into training set samples and test set samples, constructing a random forest classifier using the training set samples, and evaluating the constructed random forest classifier using the test set samples;
Step 3: inputting the sample data set corresponding to a liquid sample to be detected into the trained random forest classifier to obtain a plurality of preliminary decision tree classification results for the liquid sample, and obtaining the detection result of the liquid sample by applying majority voting to the plurality of preliminary decision tree classification results.
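The majority voting of step 3 can be sketched in a few lines of Python. The function name is hypothetical, and the claim does not say how ties between classes are resolved, so the sketch simply accepts whatever label `Counter.most_common` returns first.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class label predicted by the largest number of decision trees."""
    if not tree_predictions:
        raise ValueError("no predictions to vote on")
    # Tie handling is not specified in the claim; Counter's ordering decides.
    label, _count = Counter(tree_predictions).most_common(1)[0]
    return label
```

For instance, if three trees predict "tea", "wine" and "tea" for a liquid sample, the detection result is "tea".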
6. An electronic tongue detection system based on PCA and random forest as claimed in claim 5, wherein step 2 uses the Gini coefficient to construct the decision trees in the random forest classifier.
7. The PCA and random forest based electronic tongue detection system as claimed in claim 5, wherein the electronic tongue device further comprises a display module, the upper computer sends the detection result of the liquid sample to the electronic tongue device, and the controller is further connected with the display module for displaying the detection result of the liquid sample.
CN202110533466.8A 2021-05-17 2021-05-17 Electronic tongue detection method and system based on PCA and random forest Withdrawn CN113313150A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110533466.8A CN113313150A (en) 2021-05-17 2021-05-17 Electronic tongue detection method and system based on PCA and random forest

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110533466.8A CN113313150A (en) 2021-05-17 2021-05-17 Electronic tongue detection method and system based on PCA and random forest

Publications (1)

Publication Number Publication Date
CN113313150A true CN113313150A (en) 2021-08-27

Family

ID=77373459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110533466.8A Withdrawn CN113313150A (en) 2021-05-17 2021-05-17 Electronic tongue detection method and system based on PCA and random forest

Country Status (1)

Country Link
CN (1) CN113313150A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114418041A (en) * 2022-03-31 2022-04-29 合肥工业大学 Electronic tongue liquor detection method based on IG-HSIC-SVM
CN114636736A (en) * 2021-11-08 2022-06-17 滁州怡然传感技术研究院有限公司 Electronic tongue white spirit detection method based on AIF-1DCNN
CN114925756A (en) * 2022-05-07 2022-08-19 上海燕龙基再生资源利用有限公司 Waste glass classified recovery method and device based on fine management

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114636736A (en) * 2021-11-08 2022-06-17 滁州怡然传感技术研究院有限公司 Electronic tongue white spirit detection method based on AIF-1DCNN
CN114418041A (en) * 2022-03-31 2022-04-29 合肥工业大学 Electronic tongue liquor detection method based on IG-HSIC-SVM
CN114418041B (en) * 2022-03-31 2022-06-21 合肥工业大学 Electronic tongue white spirit detection method based on IG-HSIC-SVM
CN114925756A (en) * 2022-05-07 2022-08-19 上海燕龙基再生资源利用有限公司 Waste glass classified recovery method and device based on fine management
CN114925756B (en) * 2022-05-07 2022-11-11 上海燕龙基再生资源利用有限公司 Waste glass classified recovery method and device based on fine management

Similar Documents

Publication Publication Date Title
CN113313150A (en) Electronic tongue detection method and system based on PCA and random forest
Souza et al. Challenges in benchmarking stream learning algorithms with real-world data
CN111311128A (en) Consumption financial credit scoring card development method based on third-party data
CN112580749B (en) Intelligent fire detection method based on machine olfaction technology
CN103499609B (en) A kind of method that honey fragrance intelligence sense of smell dynamic response feature and differentiation information dynamic characterization are studied
CN104849321B (en) A kind of method based on smell finger-print quick detection Quality Parameters in Orange
CN108717548B (en) Behavior recognition model updating method and system for dynamic increase of sensors
Shvydka et al. Optimum sample size to estimate mean parasite abundance in fish parasite surveys
CN110794090A (en) Emotion electronic nose implementation method
CN114113471A (en) Method and system for detecting food freshness of artificial nose refrigerator based on machine learning
Ibrahim et al. Palm leaf nutrient deficiency detection using convolutional neural network (CNN)
CN117172430B (en) Deep learning-based water body environment assessment and prediction method and system
Pratiwi et al. The application of graphology and enneagram techniques in determining personality type based on handwriting features
CN101950334A (en) Information system danger sense method and system based on computer immunity
CN117789038A (en) Training method of data processing and recognition model based on machine learning
Pratiwi et al. Personality type assessment system by using enneagram-graphology techniques on digital handwriting
Wang et al. The recognition of different odor using convolutional neural networks extracted from time and temperature features
CN117371604A (en) Agricultural production prediction method and system based on intelligent perception
CN110096708A (en) A kind of determining method and device of calibration collection
CN113378935B (en) Intelligent olfactory sensation identification method for gas
Abas et al. Agarwood oil quality classifier using machine learning
Liu et al. Convenient and accurate method for the identification of Chinese teas by an electronic nose
CN112487991B (en) High-precision load identification method and system based on characteristic self-learning
CN101884045A (en) Mixed statistical and numerical model for sensor array detection and classification
CN110622042A (en) Analysis device, stratum generation device, analysis method, stratum generation method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210827