CN113313150A

CN113313150A - Electronic tongue detection method and system based on PCA and random forest

Info

Publication number: CN113313150A
Application number: CN202110533466.8A
Authority: CN
Inventors: 章伟; 朱亚龙; 刘嘉明; 朱晓龙
Original assignee: Nanjing Yideguan Electronic Technology Co ltd
Current assignee: Nanjing Yideguan Electronic Technology Co ltd
Priority date: 2021-05-17
Filing date: 2021-05-17
Publication date: 2021-08-27

Abstract

The invention discloses an electronic tongue detection method and system based on PCA and random forest. The method comprises the following steps. Step 1: collecting a liquid sample by using an integrated electrode of the electronic tongue to obtain response data X. Step 2: carrying out cycle division on the response data X, extracting a plurality of features from the data in each cycle to obtain a new feature data set X', and carrying out PCA (principal component analysis) dimensionality reduction on the feature data set X' to obtain a single sample data set Y. Step 3: dividing a plurality of sample data sets into training set samples and test set samples, constructing a random forest classifier by using the training set samples, and evaluating the constructed random forest classifier by using the test set samples. Step 4: inputting the sample data set corresponding to the liquid sample to be detected into the trained random forest classifier to obtain the detection result of the liquid sample. The invention provides automatic acquisition, processing and uploading of data; on the one hand it greatly reduces the size of the electronic tongue, and on the other hand it improves the efficiency and accuracy of liquid component analysis.

Description

Electronic tongue detection method and system based on PCA and random forest

Technical Field

The invention relates to the field of electronic tongues, in particular to a PCA (principal component analysis) and random forest based electronic tongue detection method and system.

Background

Food is the first necessity of the people, and with the rapid development of the Chinese economy, people's requirements for the taste of food are becoming higher and higher. However, at present, taste research in the food industry relies mainly on human sensory evaluation; the testing process consumes a great deal of manpower and time, and the results are subjective and not repeatable. Meanwhile, ordinary households also lack a means of quickly detecting food authenticity and toxic or harmful substances. The analysis of liquid components is also widely applied in industrial production, agricultural production control, medicine, safety and national defense.

Detection methods based on PCA and random forest are already used for classification and prediction. Because a random forest is not trained iteratively and the depth of its trees is not limited, the model is relatively simple, is unlikely to overfit and achieves high classification accuracy. However, the combination of PCA and the random forest algorithm has not yet been applied to liquid detection.

Disclosure of Invention

In order to solve the above problems, the invention provides an electronic tongue detection method and system based on PCA and random forest. The integrated electrodes in the electronic tongue collect liquid sample information on the basic principle of cross response, and the data are analyzed by combining PCA with the random forest detection method, thereby achieving the aim of detecting the liquid sample. The invention provides automatic collection, processing and uploading of data; on the one hand it greatly reduces the size of the electronic tongue, and on the other hand it improves the efficiency and accuracy of liquid component analysis.

In order to achieve the above purpose, the invention adopts a technical scheme that:

an electronic tongue detection method based on PCA and random forest is characterized by comprising the following steps:

step 1: collecting a liquid sample by using an integrated electrode of the electronic tongue to obtain response data X;

step 2: carrying out cycle division on the response data X, respectively extracting a plurality of features from the data in each cycle to obtain a new feature data set X ', and carrying out PCA (principal component analysis) dimensionality reduction on the feature data set X' to obtain a single sample data set Y;

step 3: dividing a plurality of sample data sets into training set samples and test set samples, constructing a random forest classifier by using the training set samples, and evaluating the constructed random forest classifier by using the test set samples;

step 4: inputting a sample data set corresponding to a liquid sample to be detected into the trained random forest classifier to obtain a plurality of preliminary decision tree classification results of the liquid sample, and obtaining a detection result of the liquid sample by applying a majority voting method to the plurality of preliminary decision tree classification results.
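The majority voting in step 4 can be illustrated with a minimal Python sketch. The function name `majority_vote` and the example votes are hypothetical and are not taken from the patent; the sketch only assumes that each decision tree outputs one class label.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class label predicted by the largest number of trees.

    If several labels share the top count, one of the tied labels is returned.
    """
    if not tree_predictions:
        raise ValueError("at least one tree prediction is required")
    label, _count = Counter(tree_predictions).most_common(1)[0]
    return label


# Hypothetical preliminary results from seven decision trees.
votes = [3, 3, 1, 3, 5, 2, 3]
result = majority_vote(votes)  # class 3 receives four of the seven votes
```

A library random forest normally performs this aggregation internally; the sketch only makes the voting rule explicit.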

Further, step 2 comprises the steps of:

step 21: carrying out cycle division on the response data X, every N data points forming one cycle, so that X is divided into X/N cycles in total (the number of data points in X divided by N), and respectively extracting a plurality of features from the data in each cycle to obtain a new feature data set X';

step 22: standardizing the m × n feature data set $X' = (X_{ij})_{m \times n}$, wherein m is the number of working electrodes in the integrated electrode, and n is the total number of features over the plurality of cycles;

the elements of the matrix $\tilde{X}$ formed by standardizing the feature data set X' are:

$$\tilde{X}_{ij} = \frac{X_{ij} - E(X_j)}{\sqrt{\mathrm{Var}(X_j)}}$$

in the formula, $X_j$ is the j-th column vector of the feature data set X', $X_{ij}$ is an element of the feature data set X', i = 1,2,...,m, j = 1,2,...,n, and $E(X_j)$ and $\mathrm{Var}(X_j)$ are respectively the mean and variance of the elements of column j, i.e.

$$E(X_j) = \frac{1}{m}\sum_{i=1}^{m} X_{ij}, \qquad \mathrm{Var}(X_j) = \frac{1}{m-1}\sum_{i=1}^{m}\left(X_{ij} - E(X_j)\right)^2$$

step 23: computing the matrix of correlation coefficients between the dimensions of the standardized matrix $\tilde{X}$, $R = (r_{ij})_{n \times n}$, whose elements are calculated as:

$$r_{ij} = \frac{\mathrm{cov}(\tilde{X}_i, \tilde{X}_j)}{\sqrt{\mathrm{Var}(\tilde{X}_i)}\,\sqrt{\mathrm{Var}(\tilde{X}_j)}}$$

in the formula, $\mathrm{cov}(\tilde{X}_i, \tilde{X}_j)$ is the covariance between the i-th and j-th columns of the standardized matrix $\tilde{X}$;

step 24: calculating the eigenvalues and eigenvectors: solving the characteristic equation $|\lambda I - R| = 0$ for the eigenvalues $\lambda_j$, j = 1,2,...,n, and arranging the eigenvalues in descending order; for each eigenvalue $\lambda_j$, finding its eigenvector $e_j$, j = 1,2,...,n;

step 25: calculating the principal component contribution rate and the cumulative contribution rate; the contribution rate of principal component $z_j$ is

$$\frac{\lambda_j}{\sum_{k=1}^{n}\lambda_k}$$

and the cumulative contribution rate is

$$\frac{\sum_{k=1}^{p}\lambda_k}{\sum_{k=1}^{n}\lambda_k}$$

Generally, the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_p$ whose cumulative contribution rate reaches 85-95% are taken, corresponding to the first, second, ..., p-th (p ≤ n) principal components, which constitute the sample data set Y.
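Steps 22 to 25 can be sketched in Python with NumPy as follows. This is an illustrative sketch and not the patented implementation: the function name `pca_by_correlation`, the 0.85 default threshold (the lower end of the 85-95% range), the use of the sample standard deviation (`ddof=1`) and the random demonstration data are all assumptions.

```python
import numpy as np


def pca_by_correlation(X, threshold=0.85):
    """PCA through the correlation matrix (rows: electrodes, columns: features)."""
    X = np.asarray(X, dtype=float)
    # Step 22: standardise each column (assumes no column is constant).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 23: n x n matrix of correlation coefficients between columns.
    R = np.corrcoef(Z, rowvar=False)
    # Step 24: eigh returns eigenvalues in ascending order, so reverse them.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)  # remove tiny negative round-off
    eigvecs = eigvecs[:, order]
    # Step 25: contribution rate and cumulative contribution rate.
    contribution = eigvals / eigvals.sum()
    cumulative = np.cumsum(contribution)
    # Smallest p whose cumulative contribution rate reaches the threshold.
    p = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigvals))
    Y = Z @ eigvecs[:, :p]  # project onto the first p principal components
    return Y, contribution, cumulative, p


# Synthetic 6 x 33 matrix standing in for a feature data set X'.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 33))
Y, contribution, cumulative, p = pca_by_correlation(X_demo)
```

With only 6 rows, at most 5 eigenvalues are non-zero, so the retained number of components p cannot exceed 5 for data of this shape.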

Further, step 3 comprises the steps of:

step 31: randomly dividing a plurality of sample data sets into training set samples and test set samples by bootstrap resampling, wherein the training set samples are used for constructing a random forest classifier, and the test set samples are used for evaluating the effect of the constructed random forest classifier;

step 32: constructing the decision trees in the random forest classifier by using the Gini coefficient, comprising the following steps:

step 321: for each decision tree in the random forest, randomly selecting t feature data from the training set samples according to a certain proportion, and calculating the impurity of each of the t feature nodes:

$$\mathrm{Gini}(t) = 1 - \sum_{i} p(i \mid t)^2$$

wherein $p(i \mid t)$ represents the proportion of samples belonging to class i in feature t;

step 322: selecting the feature node with the maximum impurity to start branching, carrying out the first split, and dividing the current feature node into a plurality of feature sub-nodes;

step 323: repeating step 321 and step 322 until the current feature node can no longer be branched, namely when the current feature node contains only one attribute class, at which point a complete decision tree has been constructed;

step 33: evaluating the constructed random forest classifier by using the test set samples, and if an over-fitting phenomenon or an under-fitting phenomenon occurs, entering step 34;

step 34: optimizing parameters such as the number of decision trees in the random forest, the maximum depth of the decision trees, the number of features and the minimum sample size by using a grid search method, thereby constructing an optimized random forest classifier.
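Two building blocks of step 3 can be sketched in plain Python: the bootstrap split of step 31 and the Gini impurity of step 321. The function names and the seed are hypothetical, and using the samples that were never drawn (the out-of-bag samples) as the test set is an assumption about how the bootstrap split is applied, not something the text states.

```python
import random
from collections import Counter


def bootstrap_split(n_samples, seed=None):
    """Draw n_samples indices with replacement; indices never drawn form the test set."""
    rng = random.Random(seed)
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


def gini_impurity(labels):
    """Gini(t) = 1 - sum_i p(i|t)^2 for the class labels at one node."""
    total = len(labels)
    if total == 0:
        raise ValueError("node has no samples")
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


train_idx, test_idx = bootstrap_split(40, seed=0)
pure = gini_impurity([1, 1, 1])   # a node holding a single class has impurity 0.0
mixed = gini_impurity([0, 1])     # two equally frequent classes give impurity 0.5
```

The impurity is zero exactly when a node contains one class, which matches the stopping condition in step 323.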

Further, the features extracted in step 21 include a maximum-minimum value, an integrated area value, and a rising maximum slope value.
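The per-cycle feature extraction can be sketched as below. Several points are assumptions made for illustration: the "maximum-minimum value" is read as the difference between the maximum and the minimum, the integrated area is computed with the trapezoidal rule, the rising maximum slope is taken as the largest difference between consecutive samples divided by the sampling interval `dt`, and all function names are hypothetical.

```python
def extract_cycle_features(cycle, dt=1.0):
    """Return (max - min, integrated area, maximum rising slope) for one cycle."""
    if len(cycle) < 2:
        raise ValueError("a cycle needs at least two data points")
    pairs = list(zip(cycle, cycle[1:]))
    peak_to_peak = max(cycle) - min(cycle)
    area = sum((a + b) * dt / 2.0 for a, b in pairs)    # trapezoidal rule
    max_rising_slope = max((b - a) / dt for a, b in pairs)
    return peak_to_peak, area, max_rising_slope


def extract_features(response, cycle_length, dt=1.0):
    """Split one electrode's response into cycles and concatenate the features."""
    if len(response) % cycle_length != 0:
        raise ValueError("response length must be a multiple of cycle_length")
    features = []
    for start in range(0, len(response), cycle_length):
        cycle = response[start:start + cycle_length]
        features.extend(extract_cycle_features(cycle, dt))
    return features


single = extract_cycle_features([0.0, 1.0, 3.0, 2.0])
row = extract_features([float(v) for v in range(8)], cycle_length=4)
```

Applying `extract_features` to the response of each working electrode gives one row of the feature data set X'.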

The invention also provides an electronic tongue detection system based on PCA and random forest, which comprises electronic tongue equipment and an upper computer, wherein the electronic tongue equipment comprises an integrated electrode, a controller, an acquisition module and a wireless module;

the upper computer is provided with a PCA and random forest algorithm program to realize the detection of the liquid sample, and the program comprises the following execution steps:

step 1: carrying out cycle division on the response data X, respectively extracting a plurality of characteristics from the liquid sample in each cycle to obtain a new characteristic data set X ', and carrying out PCA (principal component analysis) dimensionality reduction processing on the characteristic data set X' to obtain a single sample data set Y;

step 2: dividing a plurality of sample data sets into training set samples and test set samples, constructing a random forest classifier by using the training set samples, and evaluating the constructed random forest classifier by using the test set samples;

step 3: inputting a sample data set corresponding to a liquid sample to be detected into the trained random forest classifier to obtain a plurality of preliminary decision tree classification results of the liquid sample, and obtaining a detection result of the liquid sample by applying a majority voting method to the plurality of preliminary decision tree classification results.

Further, in step 2, the decision trees in the random forest classifier are constructed by using the Gini coefficient.

Further, the electronic tongue device further comprises a display module, the upper computer sends the detection result of the liquid sample to the electronic tongue device, and the controller is further connected with the display module and used for displaying the detection result of the liquid sample.

The invention has the following beneficial effects: the integrated electrodes in the electronic tongue collect liquid sample information on the basic principle of cross response; the upper computer uses PCA to reduce the dimensionality of the sample information and then uses a random forest classifier for detection, thereby achieving the purpose of detecting the liquid sample. The invention provides automatic collection, processing and uploading of data; on the one hand it greatly reduces the size of the electronic tongue, and on the other hand it improves the efficiency and accuracy of liquid component analysis.

Drawings

FIG. 1 is a control diagram of an embodiment of an electronic tongue detection system based on PCA and random forest;

FIG. 2 is a flow chart of an embodiment of a PCA and random forest based electronic tongue detection method;

FIG. 3 is a diagram illustrating an overall structure of the electronic tongue device according to an embodiment;

FIG. 4 is a graph of response data for an integrated electrode of an electronic tongue to collect a liquid sample in one embodiment;

FIG. 5 is a feature data set X' formed after feature extraction of response data in one embodiment;

FIG. 6 is a schematic representation of a feature data set PCA dimensionality reduction in one example;

FIG. 7 is a diagram illustrating decision tree construction in a random forest classifier in an example;

FIG. 8 is a schematic diagram of an example of PCA dimensionality reduction for different types of white spirits;

FIG. 9 is a diagram of random forest confusion matrices for different classes of liquor in one example.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.

As shown in fig. 1, an electronic tongue detection system based on PCA and random forest includes an electronic tongue device and an upper computer, the electronic tongue device includes an integrated electrode, a controller, an acquisition module, a wireless module and a display module, the controller is connected with the integrated electrode through the acquisition module, the controller utilizes the wireless module to send response data X obtained by acquiring a liquid sample by the integrated electrode to the upper computer, the upper computer sends a detection result of the liquid sample to the electronic tongue device, and the controller is further connected with the display module for displaying the detection result of the liquid sample.

The electronic tongue device in this embodiment is shown in fig. 3, and the integrated electrode in the electronic tongue device has 6 working electrodes. The working process is as follows: the controller controls the acquisition module to send a large-frequency pulse signal to the integrated electrode 3 for sample detection, the pulse voltage range is -1V, the pulse step is 20-200 mV, and the pulse time is 0.01 s-1 s; the acquisition module simultaneously acquires the signal data output by the integrated electrode 3 and sends the signal data to the controller, and the controller sends the signal data to the computer, which runs the relevant algorithm for detection. The computer processes the signal data as follows: it first extracts parameters such as extreme values, integral areas or average values to form a feature map, that is, it compresses the data; it then analyzes the compressed feature map by using PCA and the random forest algorithm to realize the detection of the liquid.

As shown in fig. 2, an electronic tongue detection method based on PCA and random forest includes the following steps:

step 1: collecting a liquid sample by using an integrated electrode of the electronic tongue to obtain response data X, wherein the integrated electrode in the electronic tongue device in the embodiment is provided with 6 working electrodes, and the corresponding response data X is shown in FIG. 4;

step 2: carrying out cycle division on the response data X, and extracting a plurality of features from the data in each cycle to obtain a new feature data set X', wherein the feature data set X' is a 6 × n matrix as shown in FIG. 5, n is the total number of features over the cycles, and 6 is the number of working electrodes; carrying out PCA dimensionality reduction on the feature data set X' to obtain a single sample data set Y;

Further, as shown in fig. 6, step 2 includes the following steps:

step 21: carrying out cycle division on the response data X, every N data points forming one cycle, so that X is divided into X/N cycles in total (the number of data points in X divided by N), and extracting a plurality of features from the data in each cycle to obtain a new feature data set X'; the features extracted in step 21 include a maximum-minimum value, an integrated area value (AUC), and a rising maximum slope value;

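step 22: standardizing the m × n feature data set $X' = (X_{ij})_{m \times n}$, wherein m is the number of working electrodes in the integrated electrode, and n is the total number of features over the plurality of cycles;

the elements of the matrix $\tilde{X}$ formed by standardizing the feature data set X' are:

$$\tilde{X}_{ij} = \frac{X_{ij} - E(X_j)}{\sqrt{\mathrm{Var}(X_j)}}$$

in the formula, $X_j$ is the j-th column vector of the feature data set X', $X_{ij}$ is an element of the feature data set X', i = 1,2,...,m, j = 1,2,...,n, and $E(X_j)$ and $\mathrm{Var}(X_j)$ are respectively the mean and variance of the elements of column j, i.e.

$$E(X_j) = \frac{1}{m}\sum_{i=1}^{m} X_{ij}, \qquad \mathrm{Var}(X_j) = \frac{1}{m-1}\sum_{i=1}^{m}\left(X_{ij} - E(X_j)\right)^2$$

step 23: computing the matrix of correlation coefficients between the dimensions of the standardized matrix $\tilde{X}$, $R = (r_{ij})_{n \times n}$, whose elements are calculated as:

$$r_{ij} = \frac{\mathrm{cov}(\tilde{X}_i, \tilde{X}_j)}{\sqrt{\mathrm{Var}(\tilde{X}_i)}\,\sqrt{\mathrm{Var}(\tilde{X}_j)}}$$

in the formula, $\mathrm{cov}(\tilde{X}_i, \tilde{X}_j)$ is the covariance between the i-th and j-th columns of the standardized matrix $\tilde{X}$;

step 24: solving the characteristic equation $|\lambda I - R| = 0$ for the eigenvalues $\lambda_j$, j = 1,2,...,n, arranged in descending order, and finding the eigenvector $e_j$ of each eigenvalue;

step 25: the contribution rate of principal component $z_j$ is

$$\frac{\lambda_j}{\sum_{k=1}^{n}\lambda_k}$$

and the cumulative contribution rate is

$$\frac{\sum_{k=1}^{p}\lambda_k}{\sum_{k=1}^{n}\lambda_k}$$

The first p (p ≤ n) principal components, whose cumulative contribution rate reaches 85-95%, constitute the sample data set Y.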

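Step 3 comprises steps 31 to 34 as described above. In step 321, the impurity of a feature node is calculated as

$$\mathrm{Gini}(t) = 1 - \sum_{i} p(i \mid t)^2$$

in the formula, $p(i \mid t)$ represents the proportion of samples belonging to class i in feature t;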

Fig. 7 is a schematic diagram of decision tree construction in a random forest classifier, in which there are 40 samples in total in 5 classes and branching proceeds in order of Gini coefficient from large to small. As can be seen from fig. 7, the preliminary judgment result of one decision tree is: 8 samples belong to class 1, 8 samples belong to class 2, 9 samples belong to class 3, 7 samples belong to class 4 and 8 samples belong to class 5, so the samples are judged to belong to class 3. A majority voting method is then applied to the plurality of preliminary decision tree classification results to obtain the final detection result of the liquid sample.

In a specific embodiment, the response data set X contains 880 data points divided into 11 cycles, and 3 features are extracted in each cycle to form a feature data set X' with dimensions of 6 × 33. The liquid samples are white spirits divided into 4 series. Series 1 is 5 kinds of wine, screened as white spirit from different years; from #1 to #5 they are respectively: excellent in 2006, excellent in 2009, excellent in 2012, excellent in 2015 and excellent in 2018. Series 2 is 5 types of white spirit; from #1 to #5: the rice wine is specially superior in 2019, lees in 2019, Xiaoqu in 2019 and fragrant in 2019. Series 3 is special grade wine doped at different concentrations into base wine; the doping concentrations from #1 to #5 are respectively 3%, 6%, 9%, 12% and 15%. Series 4 is super wine from 5 consecutive years; from #1 to #5 the years are respectively 2016 to 2020. A sample data set Y is formed after PCA dimensionality reduction of the feature data set X', as shown in FIG. 8, and at this point the different types of wine in each series can be distinguished. As shown in fig. 9, the random forest classifier can analyze and detect the different white spirits with high accuracy.
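The dimensions quoted in this embodiment follow from simple arithmetic. As a check, assuming the 880 data points are counted per working electrode, and writing N for the number of data points per cycle and n for the number of feature columns (the symbols of step 21 and step 22):

```latex
N = \frac{880}{11} = 80, \qquad
n = 11 \times 3 = 33, \qquad
X' \in \mathbb{R}^{6 \times 33}
```

Each of the 6 working electrodes therefore contributes one row of 33 features to X'.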

step 1: carrying out cycle division on the response data X, respectively extracting a plurality of features from the data in each cycle to obtain a new feature data set X ', and carrying out PCA (principal component analysis) dimensionality reduction on the feature data set X' to obtain a single sample data set Y;

The above description is only a few of the preferred embodiments of the present application and is not intended to limit the present application, which may be modified and varied by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. An electronic tongue detection method based on PCA and random forest is characterized by comprising the following steps:

2. An electronic tongue detection method based on PCA and random forest as claimed in claim 1 wherein step 2 comprises the steps of:

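step 22: standardizing the m × n feature data set $X' = (X_{ij})_{m \times n}$;

the elements of the matrix $\tilde{X}$ formed by standardizing the feature data set X' are:

$$\tilde{X}_{ij} = \frac{X_{ij} - E(X_j)}{\sqrt{\mathrm{Var}(X_j)}}$$

in the formula, $E(X_j)$ and $\mathrm{Var}(X_j)$ are respectively the mean and variance of the elements of column j of X';

step 23: computing the matrix of correlation coefficients $R = (r_{ij})_{n \times n}$ of the standardized matrix $\tilde{X}$:

$$r_{ij} = \frac{\mathrm{cov}(\tilde{X}_i, \tilde{X}_j)}{\sqrt{\mathrm{Var}(\tilde{X}_i)}\,\sqrt{\mathrm{Var}(\tilde{X}_j)}}$$

in the formula, $\mathrm{cov}(\tilde{X}_i, \tilde{X}_j)$ is the covariance between the i-th and j-th columns of the standardized matrix $\tilde{X}$;

the cumulative contribution rate is

$$\frac{\sum_{k=1}^{p}\lambda_k}{\sum_{k=1}^{n}\lambda_k}$$

in the formula, $\lambda_k$ are the eigenvalues of R arranged in descending order;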

3. An electronic tongue detection method based on PCA and random forest as claimed in claim 2 wherein step 3 comprises the steps of:

step 321: for each decision tree in the random forest, randomly selecting t feature data from the training set samples according to a certain proportion, and calculating the impurity of each of the t feature nodes:

$$\mathrm{Gini}(t) = 1 - \sum_{i} p(i \mid t)^2$$

wherein $p(i \mid t)$ represents the proportion of samples belonging to class i in feature t;

4. An electronic tongue detection method based on PCA and random forests as claimed in claim 2 wherein the features extracted in step 21 include maximum and minimum values, integrated area values and rising maximum slope values.

5. The PCA and random forest based electronic tongue detection system as claimed in claim 1, wherein the electronic tongue detection system comprises electronic tongue equipment and an upper computer, the electronic tongue equipment comprises an integrated electrode, a controller, a collection module and a wireless module, the controller is connected with the integrated electrode through the collection module, and the controller utilizes the wireless module to send response data X obtained by collecting a liquid sample by the integrated electrode to the upper computer;

6. An electronic tongue detection system based on PCA and random forest as claimed in claim 5, wherein in step 2 the Gini coefficient is used to construct the decision trees in the random forest classifier.

7. The PCA and random forest based electronic tongue detection system as claimed in claim 5, wherein the electronic tongue device further comprises a display module, the upper computer sends the detection result of the liquid sample to the electronic tongue device, and the controller is further connected with the display module for displaying the detection result of the liquid sample.