CN112613536A - Near infrared spectrum diesel grade identification method based on SMOTE and deep learning - Google Patents
- Publication number
- CN112613536A, CN202011443096.0A
- Authority
- CN
- China
- Prior art keywords
- diesel
- sample
- smote
- samples
- near infrared
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24147—Distances to closest patterns, e.g. nearest neighbour classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
Abstract
The invention discloses a near infrared spectrum diesel grade identification method based on SMOTE and deep learning, comprising the following steps: step 1, plotting the near infrared spectra of the diesel samples, analyzing the distribution of the different diesel grades, mapping the grade labels to class attributes, and taking the samples of the different grades as the sample set; step 2, balancing the sample set with SMOTE and dividing it into training set samples and test set samples; step 3, constructing a near infrared spectrum classification model based on a one-dimensional deep convolutional neural network from the training set samples; and step 4, feeding the test set samples into the established model to obtain the diesel grade identification results, drawing a multi-class confusion matrix, and analyzing the identification rate of each class. The invention does not require extensive preprocessing, improves the accuracy of classification and identification, and improves the identification rate of minority-class samples.
Description
Technical Field
The invention relates to the field of near infrared spectroscopy, and in particular to a near infrared spectrum diesel grade identification method based on SMOTE and deep learning.
Background
Owing to its high energy density, low fuel consumption and low price, petroleum-derived diesel still dominates the market. Improving the quality and detection precision of diesel to meet the changing demands of the diesel market remains one of the main directions for the future development of the global petroleum industry. According to the GB/T1.1-2020 standard, commercial diesel can be divided by condensation point into 6 grades: 5#, 0#, -10#, -20#, -35# and -50#. The lower the grade number, the less likely the diesel is to form wax and the higher its price tends to be. To increase profits, unscrupulous manufacturers often adulterate diesel or mislabel its grade; selling such non-compliant fuel not only damages engines but also increases pollutant emissions and can even endanger personal safety. Rapid and accurate identification of the diesel grade therefore gives supervisory authorities accurate and timely test data, and is of great significance for protecting consumers' rights and safety.
Identifying the diesel grade only by color, feel, smell and the like is common in daily life, but it is time-consuming, laborious and highly subjective, and is unsuitable for large-scale production testing. Near infrared spectroscopy (NIRS), a fast, green, low-cost, easy-to-operate and non-destructive technique, has been widely used in the petrochemical field. The NIRS of diesel contains the characteristic absorptions of hydrogen-containing groups (such as O-H, C-H and N-H) in a complex mixture, so identifying the diesel grade accurately from the spectrum is extremely difficult and requires computer-aided detection. Commonly used auxiliary models currently include partial least squares, support vector machines and artificial neural networks. Because NIRS covers a wide spectral range, carries weak useful information, suffers from considerable noise interference and shows severe overlap of spectral peaks, these traditional machine learning methods must be combined with extensive preprocessing such as denoising, feature extraction and dimension reduction to achieve fast detection and accurate prediction. This adds considerable workload, and the applicability and prediction accuracy of the resulting models still need improvement.
Deep learning, which is based on deep networks, is a new research direction in machine learning and has developed rapidly in recent years in application fields such as image processing, speech recognition and machine translation. The deep convolutional neural network (DCNN) is the most widely applied deep learning model. It can autonomously extract effective features from complex data and reduce their dimensionality, and it has stronger expressive power than traditional shallow models; however, because the DCNN is mainly used to process two-dimensional or three-dimensional images, it has rarely been applied to one-dimensional NIRS.
Disclosure of Invention
The invention aims to provide a near infrared spectrum diesel grade identification method based on SMOTE and deep learning that improves the accuracy of classification and identification and improves the identification rate of minority-class samples.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
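a near infrared spectrum diesel grade identification method based on SMOTE and deep learning comprises the following steps:
step 1, plotting the near infrared spectra of the diesel samples, analyzing the distribution of the different diesel grades, mapping the grade labels to class attributes, and taking the samples of the different grades as the sample set;
step 2, balancing the sample set with SMOTE and dividing it into training set samples and test set samples;
step 3, constructing a near infrared spectrum classification model based on a one-dimensional deep convolutional neural network from the training set samples;
and step 4, feeding the test set samples into the established model to obtain the diesel grade identification results, drawing a multi-class confusion matrix, and analyzing the identification rate of each class.
The technical scheme of the invention is further improved as follows: a near infrared spectrum is plotted for the diesel sample set, and the samples are divided by diesel condensation point into 5 classes, namely -10#, -20#, -35#, -50# and interference, which are mapped to classes 1, 2, 3, 4 and 0, respectively.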
The technical scheme of the invention is further improved as follows: the specific process of performing data equalization processing on the sample set by using SMOTE is as follows:
1): first, for each sample x in the minority class, calculating the Euclidean distance to all samples in the minority-class sample set to obtain the k nearest neighbors of x;
2): setting a sampling ratio according to the sample imbalance ratio to determine the sampling multiplier N, and randomly selecting several samples from the k nearest neighbors of each minority-class sample x, the selected neighbor being denoted $x_n$;
3): for each $x_n$, constructing a new sample according to formula <1>;
$x_{new} = x + \mathrm{rand}(0,1) \times |x - x_n|, \quad new \in \{1, 2, \dots, N\}$ <1>
4): finally, repeating the above steps N times to synthesize N new samples; if the minority class has T samples in total, NT new samples can be synthesized.
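The four SMOTE steps above can be sketched in plain Python. This is a minimal illustration, not the patent's implementation: the function name `smote` and the parameters `k`, `n_new_per_sample` and `seed` are hypothetical. Note also that formula <1> as printed uses the absolute difference |x - x_n|, whereas this sketch uses the signed difference (x_n - x) of the commonly cited SMOTE formulation, so that each new point lies on the segment between x and its neighbor.

```python
import math
import random


def smote(minority, k=5, n_new_per_sample=1, seed=0):
    """Synthesize new minority-class samples by neighbor interpolation.

    minority: list of feature vectors (lists of floats) from one minority
    class; at least two samples are required. k, n_new_per_sample and seed
    are illustrative assumptions, not values taken from the patent.
    """
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        # Step 1: rank the other minority samples by Euclidean distance to x.
        others = [s for j, s in enumerate(minority) if j != i]
        others.sort(key=lambda s: math.dist(x, s))
        neighbors = others[:k]
        # Steps 2-4: pick a random neighbor N times and interpolate toward it.
        for _ in range(n_new_per_sample):
            xn = rng.choice(neighbors)
            gap = rng.random()  # rand(0, 1)
            # Signed difference (xn - x): an assumption that differs from
            # the absolute value printed in formula <1>.
            synthetic.append([a + gap * (b - a) for a, b in zip(x, xn)])
    return synthetic
```

With T minority samples and a multiplier N, the function returns N x T synthetic samples, matching the count stated in step 4.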
The technical scheme of the invention is further improved as follows: the specific method for constructing the near infrared spectrum classification model based on a one-dimensional deep convolutional neural network comprises the following steps:
applying some transformations to the one-dimensional near infrared spectrum data so that the input signal meets the requirements of a convolutional neural network; regarding the one-dimensional near infrared spectrum as a special two-dimensional image containing only one row or one column, expanding the dimensions of the spectral signal accordingly, and converting the class labels into one-hot encoding; and constructing a one-dimensional deep convolutional neural network model with reference to LeNet-5, the model comprising an input layer, two convolutional layers, two pooling layers, two fully connected layers and an output layer.
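The dimension expansion and one-hot encoding described above might look as follows. This is a small sketch under the assumption that spectra are stored as lists of floats and labels as integers 0 to 4; the helper names `expand_dims` and `one_hot` are hypothetical.

```python
def expand_dims(spectra):
    """Turn each spectrum of length L into an L x 1 'single-column image'."""
    return [[[value] for value in spectrum] for spectrum in spectra]


def one_hot(labels, num_classes):
    """Encode integer class labels 0..num_classes-1 as one-hot vectors."""
    encoded = []
    for label in labels:
        row = [0] * num_classes
        row[label] = 1
        encoded.append(row)
    return encoded
```

For a spectrum with 401 wavelength points, `expand_dims` yields the (401, 1) shape that the input layer of the embodiment expects.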
The technical scheme of the invention is further improved as follows: the convolutional layer is composed of a group of convolution kernels with trainable parameters, the size of the convolution kernels is set to m × 1, and the convolution operation on the one-dimensional signal is shown in formula <2>:
where l is the current convolutional layer, l-1 is the (l-1)-th convolutional layer, $x_i$ and $y_j$ denote the i-th input feature map and the j-th output feature map respectively, * is the convolution operator, $\omega_{ij}$ is the convolution kernel, b is the bias, and f(·) is the activation function.
The technical scheme of the invention is further improved as follows: an activation function PReLU is introduced into the convolutional layer, and the expression of the function is shown in <3 >.
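Formulas <2> and <3> do not appear in this text, so the sketch below assumes their standard forms: $y_j^l = f\left(\sum_i x_i^{l-1} * \omega_{ij}^l + b_j^l\right)$ for the convolution and $f(x) = \max(0, x) + a \min(0, x)$ for PReLU. The function names, the absence of padding, the stride of 1 and the slope value 0.25 are all illustrative assumptions.

```python
def prelu(value, alpha=0.25):
    """PReLU: identity for positive inputs, slope alpha for negative inputs.

    In a real network alpha is learned; 0.25 is only a placeholder here.
    """
    return value if value > 0 else alpha * value


def conv1d(inputs, kernels, biases, alpha=0.25):
    """Unpadded, stride-1 one-dimensional convolutional layer with PReLU.

    inputs:  list of input feature maps x_i, each a list of floats.
    kernels: kernels[j][i] is the kernel linking input map i to output map j.
    biases:  one bias b_j per output map.
    """
    m = len(kernels[0][0])  # kernel length (the "m" in m x 1)
    out_len = len(inputs[0]) - m + 1
    outputs = []
    for j, kernel_row in enumerate(kernels):
        y_j = []
        for t in range(out_len):
            total = biases[j]
            for i, x_i in enumerate(inputs):
                w = kernel_row[i]
                # Sliding dot product (cross-correlation), which is what
                # deep learning frameworks usually call "convolution".
                total += sum(w[s] * x_i[t + s] for s in range(m))
            y_j.append(prelu(total, alpha))
        outputs.append(y_j)
    return outputs
```

Each output map sums the contributions of all input maps before the activation is applied, which is the role of the sum over i in the assumed formula.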
The technical scheme of the invention is further improved as follows: the operation of the pooling layer is shown in formula <4 >:
where l is the current pooling layer, l-1 is the (l-1)-th pooling layer, $y_j$ is the j-th output feature map, β is a multiplicative bias term, and b is the bias;
the pooling method is max pooling, whose sampling is calculated according to formula <5>:
where a feature map produced by the convolutional layer is divided into several regions $X_k$, $k \in \{1, 2, \dots, K\}$.
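Max pooling over the regions $X_k$ can be illustrated with a short function. This is a sketch only: formulas <4> and <5> are not reproduced in this text, the name `max_pool1d` is hypothetical, and the multiplicative bias β and additive bias b of formula <4> are omitted, as is usual for max pooling layers in current deep learning frameworks.

```python
def max_pool1d(feature_map, window=3, stride=1):
    """Slide a window over one feature map and keep the maximum of each region X_k."""
    last_start = len(feature_map) - window
    return [max(feature_map[s:s + window]) for s in range(0, last_start + 1, stride)]
```

With a window of 3 and a stride of 1, as in the embodiment, each pooling layer shortens a feature map by only two points.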
The technical scheme of the invention is further improved as follows: the fully-connected layer comprises a Flatten layer and two Dense layers, the activation function of the last Dense layer is Softmax, and a certain proportion of random deactivation is added into the fully-connected layer, wherein the fully-connected layer is calculated according to the formula <6 >:
$h_{\omega,b}(x) = f(\omega^{T} x + b)$ <6>
where ω is the weight vector of the neuron, b is the bias, T denotes the transpose, and $h_{\omega,b}(x)$ is the output of the neuron.
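Formula <6> and the Softmax output can be sketched as below. The helper names `dense` and `softmax` are hypothetical, and dropout is left out because it only acts during training.

```python
import math


def softmax(z):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x, weights, biases, activation):
    """Fully connected layer: one pre-activation w^T x + b per neuron, then f."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    return activation(scores)
```

Here the activation is applied to the whole score vector, which lets the same function serve both element-wise activations and Softmax.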
The technical scheme of the invention is further improved as follows: after the model is constructed, the training method is configured; the configuration comprises a loss function, an optimizer and an evaluation index, wherein the loss function is the cross entropy loss function, shown in formula <7>, the optimizer is Adam, and the evaluation index is the accuracy A, shown in formula <8>:
where $n_i$ is the number of samples whose predicted class matches the actual class, and n is the total number of samples.
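The loss function and evaluation index can be sketched as follows. Formulas <7> and <8> are not reproduced in this text, so the code assumes the standard categorical cross entropy averaged over samples and an accuracy equal to the fraction of correctly predicted samples; the function names and the small constant `eps` are illustrative.

```python
import math


def cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean categorical cross entropy between one-hot labels and predicted probabilities."""
    losses = [-sum(t * math.log(p + eps) for t, p in zip(ts, ps))
              for ts, ps in zip(y_true, y_prob)]
    return sum(losses) / len(losses)


def accuracy(true_labels, predicted_labels):
    """A = (number of correctly predicted samples) / (total number of samples)."""
    n_correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return n_correct / len(true_labels)
```

A perfect prediction gives a loss close to 0 and an accuracy of 1, which is the behavior the training curves of the embodiment approach.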
The technical scheme of the invention is further improved as follows: on the basis of the constructed one-dimensional deep convolutional neural network model, the diesel grade is predicted for both the test set after SMOTE oversampling and the original test set, giving the overall classification recognition rate; a multi-class confusion matrix is then drawn, and the precision, recall, accuracy and balanced F score are obtained from the confusion matrix as shown in <9>, <10>, <11> and <12>:
where TP is the number of samples for which positive examples are predicted as positive examples, FN is the number of samples for which positive examples are predicted as negative examples, FP is the number of samples for which negative examples are predicted as positive examples, and TN is the number of samples for which negative examples are predicted as negative examples.
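The four indicators can be computed directly from TP, FN, FP and TN. Formulas <9> to <12> are not reproduced in this text, so the sketch assumes their standard definitions; the function name `binary_metrics` is hypothetical, and zero denominators are not handled.

```python
def binary_metrics(tp, fn, fp, tn):
    """Precision, recall, accuracy and F1 from the four confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1
```

In a multi-class setting, these counts would be taken per class by treating that class as the positive example and all other classes as negative.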
Due to the adoption of the technical scheme, the invention has the technical progress that:
the near infrared spectrum diesel grade identification method based on SMOTE and deep learning greatly improves the accuracy of classification and identification without requiring complex operations such as manual feature extraction and dimension reduction, and, taking into account the class imbalance encountered in practice, improves the identification rate of minority-class samples. The model combining SMOTE and deep learning provided by the invention has strong applicability and extensibility, and supports the development of an NIRS-based rapid detection system that is accurate, simple to operate and portable.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a NIRS diagram for diesel fuel;
FIG. 3 is a distribution diagram of sample numbers of various categories of raw data;
FIG. 4 is a training set sample distribution graph after SMOTE oversampling processing;
FIG. 5 is a diagram of a NIRS-based one-dimensional depth convolution neural network classification model architecture;
FIG. 6 is a graph of a training set loss function variation;
FIG. 7 is a graph of training set accuracy change;
FIG. 8 is a diagram of a training set multi-class confusion matrix after SMOTE oversampling processing;
FIG. 9 is a diagram of a test set multi-class confusion matrix after SMOTE oversampling;
FIG. 10 is a diagram of an original test set multi-classification confusion matrix;
FIG. 11 is a comparison graph of prediction accuracy of the XGBoost, SVM and BP methods.
Detailed Description
The present invention will be described in further detail with reference to the following examples:
Referring to Fig. 1, the specific implementation steps of the present invention are as follows:
In the present embodiment, the details of the diesel grade sample set are shown in Table 1. There are 394 samples in total, and the grades -10#, -20#, -35#, -50# and interference are mapped to classes 1, 2, 3, 4 and 0, respectively. As the table shows, the number of samples differs from class to class, and the class distribution is extremely uneven.
TABLE 1 sample details of Diesel grade data set
The NIRS of the diesel samples is shown in Fig. 2; the spectral wavelength range is 750 nm to 1550 nm at an interval of 2 nm, giving 401 characteristic wavelength points in total. The figure shows that the 394 spectra overlap and cannot be told apart, that the spectral information is weak and that interference is strong. Accurately distinguishing the 5 classes from the NIRS plot alone is completely infeasible, so the data equalization of step 2 is required.
First, the sample set is automatically divided in a 7:3 ratio using a cross-validation method, giving 275 training set samples and 119 test set samples. To improve the generalization ability of the model and address class imbalance, the training set data are balanced with the SMOTE oversampling technique; the class distribution before SMOTE processing is shown in Fig. 3, and the class distribution of the training set after SMOTE processing is shown in Fig. 4. After processing, every class contains the same number of samples, 184, so the training set grows from the original 275 samples to 920 samples. For later comparison, SMOTE is also used to automatically generate samples for the imbalanced test set, giving 395 new test set samples in total.
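The sample counts quoted above can be checked with a few lines of arithmetic. This is only a consistency check: truncating 394 × 0.7 to get 275 is an assumption about how the split was rounded, and the per-class test count of 79 is inferred from the total of 395 rather than stated in the text.

```python
total_samples = 394
train_size = int(total_samples * 0.7)    # 275 (275.8 truncated; rounding rule assumed)
test_size = total_samples - train_size   # 119

num_classes = 5
per_class_after_smote = 184              # per-class training count stated in the text
balanced_train_size = num_classes * per_class_after_smote  # 920

balanced_test_size = 395
per_class_test = balanced_test_size // num_classes         # 79 (inferred, not stated)

print(train_size, test_size, balanced_train_size, per_class_test)
```

Because SMOTE raises every minority class to the size of the largest class, a per-class count of 184 suggests that the largest class contributed 184 training samples.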
In the present embodiment, the overall structure of the NIRS-based one-dimensional deep convolutional neural network classification model is shown in Fig. 5. The specific steps are as follows:
Step 3.1: input layer; a one-dimensional diesel NIRS signal is input, with input shape (401, 1).
Step 3.2: convolutional layer; a one-dimensional convolution is used with kernel size 40 × 1, 16 kernels, stride 1 and the PReLU activation function. PReLU converges quickly with a low error rate and effectively avoids vanishing and exploding gradients.
Step 3.3: pooling layer; max pooling with a pooling window of 3 × 1 and stride 1.
Step 3.4: convolutional layer; kernel size 40 × 1, 64 kernels, stride 1 and the PReLU activation function.
Step 3.5: pooling layer; max pooling with a pooling window of 3 × 1 and stride 1.
Step 3.6: Flatten layer; it flattens the multi-dimensional input into the one-dimensional input required by the fully connected layers.
Step 3.7: Dense layer; 128 neurons with the PReLU activation function, with dropout at a ratio of 0.1 added to reduce the risk of overfitting.
Step 3.8: Dense layer; 5 neurons, one for each of the 5 output classes, with the Softmax activation function.
Step 3.9: cross entropy is used as the loss function, the Adam optimizer is adopted, the accuracy A is used as the evaluation index, and the batch size is set to 16. The model is trained on the training set samples after SMOTE oversampling. The resulting iteration curve of the training set loss function is shown in Fig. 6: the loss value decreases steadily as training proceeds and finally approaches 0. The iteration curve of the training set accuracy is shown in Fig. 7: the accuracy increases gradually as training proceeds and finally approaches 1. The training results indicate that the model performs well and that the qualitative analysis model for diesel grade based on a one-dimensional deep convolutional neural network has been established successfully.
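The layer sizes in steps 3.1 to 3.8 determine the shape of the data at each stage. The walk-through below is a sketch that assumes no padding in the convolutional and pooling layers, which the text does not state; with "same" padding the lengths would stay at 401. The helper name `sliding_output_length` is hypothetical.

```python
def sliding_output_length(length, window, stride=1):
    """Output length of an unpadded sliding-window layer (assumed padding mode)."""
    return (length - window) // stride + 1


length, channels = 401, 1                      # Step 3.1: input shape (401, 1)
shapes = [("input", length, channels)]
for name, window, out_channels in [
    ("conv1", 40, 16),                         # Step 3.2
    ("pool1", 3, None),                        # Step 3.3 (channel count unchanged)
    ("conv2", 40, 64),                         # Step 3.4
    ("pool2", 3, None),                        # Step 3.5
]:
    length = sliding_output_length(length, window)
    channels = out_channels or channels
    shapes.append((name, length, channels))

flatten_size = length * channels               # Step 3.6
dense_sizes = [128, 5]                         # Steps 3.7 and 3.8

for name, n, c in shapes:
    print(f"{name}: ({n}, {c})")
print("flatten:", flatten_size, "dense:", dense_sizes)
```

Under this assumption the Flatten layer passes 20416 values to the first Dense layer, so most of the trainable weights sit between the Flatten layer and the 128 neurons.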
Here, it should be noted that:
In step 3.2 and step 3.4, the convolutional layer consists of a set of convolution kernels with trainable parameters; the kernels slide over the input data according to a fixed rule and perform the convolution operation, extracting local abstract features of the spectrum and generating the corresponding one-dimensional feature maps.
In step 3.3 and step 3.5, the pooling layer downsamples the feature maps generated by the convolution operation to reduce the dimensionality of the feature vectors from the convolutional layer; with the number of feature maps unchanged, reducing the data volume greatly speeds up the algorithm.
In step 3.6, the Flatten layer flattens the data so that it can be connected to the neurons in order.
In step 3.9, the cross entropy is used to evaluate the difference between the probability distribution obtained by the current training and the true distribution, indicating the distance between the probability of actual output and the probability of expected output.
Step 4: the test set samples are fed into the established model to obtain the diesel grade identification results, a multi-class confusion matrix is drawn, and the identification rate of each class is analyzed.
In this embodiment, classification prediction is first performed on the training set data using the constructed one-dimensional deep convolutional neural network classification model, giving a training set classification accuracy of 97.61%; feeding in the test set data after SMOTE oversampling gives a classification accuracy of 95.44%; and feeding in the original 119 test set samples gives a classification accuracy of 95.80%. To observe the recognition rate of each class, especially that of the minority classes, a multi-class confusion matrix is drawn. The multi-class confusion matrix extends the traditional two-class confusion matrix with a one-versus-one strategy; its rows represent the true classes of the data and its columns represent the predicted classes. The numbers on the main diagonal therefore give the number of samples whose predicted class matches the actual class, and the numbers off the diagonal give the number of misclassified samples.
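The construction of such a matrix, with rows as true classes and columns as predicted classes, can be sketched as follows. The function names are hypothetical, and labels are assumed to be integers from 0 to num_classes - 1.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Rows index the true class, columns index the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def per_class_recognition_rate(matrix):
    """Diagonal count divided by the row total, i.e. the recall of each true class."""
    return [row[k] / sum(row) if sum(row) else 0.0 for k, row in enumerate(matrix)]
```

Reading the diagonal against each row total gives the per-class recognition rate, which is how a 100% rate for a minority class would show up in the matrix.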
The multi-class confusion matrices for the above three cases are plotted in Fig. 8, Fig. 9 and Fig. 10, respectively. The prediction accuracy of each class is relatively high for both the training set and the test sets, and for the original test set samples the recognition rate of the minority classes, namely class 0, class 1 and class 4, reaches 100%. According to the confusion matrix and formulas <9>, <10>, <11> and <12>, the precision of the prediction model is 98.67%, the recall is 100%, the accuracy is 95.80% and the F1 value is 0.9933. The model therefore classifies diesel grades with high accuracy and generalizes well.
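As a quick consistency check, the reported F1 value follows from a precision of 98.67% and a recall of 100%, assuming the standard definition of F1 as the harmonic mean of the two.

```python
precision, recall = 0.9867, 1.0
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))
```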
To make the comparison more convincing, the XGBoost ensemble learning method, an SVM and a BP neural network are applied to the same diesel NIRS data set.
Specifically, the parameters of XGBoost are set as follows: the number of trees is 196, the maximum tree depth is 5, the minimum sum of leaf node weights is 1, the complexity control term gamma is 0.15, the weight of the L1 regularization term is 0.08, the weight of the L2 regularization term is 0.1, the random sampling ratio for each tree is 0.71, the column sampling ratio colsample_bytree for each tree is 0.69, the learning rate is 0.1, the weak learner is 'gbtree', the objective function is 'multi:softmax', the number of classes is 5, and the number of CPU threads is 4. With this model, XGBoost achieves a grade classification recognition rate of 75.63% on the original test set diesel samples.
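The settings above can be collected into a parameter dictionary. The mapping from the prose descriptions to XGBoost parameter names is an assumption: the names below exist in XGBoost (`n_estimators` belongs to its scikit-learn style wrapper), but the text does not say which interface or names were actually used.

```python
# Assumed mapping of the described settings to XGBoost parameter names.
xgb_params = {
    "n_estimators": 196,           # number of trees
    "max_depth": 5,                # maximum tree depth
    "min_child_weight": 1,         # minimum sum of leaf node weights
    "gamma": 0.15,                 # complexity control term
    "reg_alpha": 0.08,             # L1 regularization weight
    "reg_lambda": 0.1,             # L2 regularization weight
    "subsample": 0.71,             # row sampling ratio per tree
    "colsample_bytree": 0.69,      # column sampling ratio per tree
    "learning_rate": 0.1,
    "booster": "gbtree",
    "objective": "multi:softmax",
    "num_class": 5,
    "nthread": 4,                  # CPU threads
}
```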
The parameters of the SVM are set as follows: the range of the kernel parameter g and the penalty parameter c is [-10, 0.2]; RBF is used as the kernel function, and the parameters are optimized by grid search with cross validation. With this model, the SVM achieves a grade classification recognition rate of 78.99% on the original test set diesel samples, and the method runs somewhat slowly.
The parameters of the BP neural network are set as follows: the input layer has 275 nodes, the hidden layer has 9 nodes and the output layer has 5 nodes; Softmax converts the network output into a probability distribution, cross entropy is used as the loss function, and the learning rate is 0.05. With this model, the BP network achieves a grade classification recognition rate of 69.75% on the original test set diesel samples, and because all neurons in adjacent layers of a traditional BP neural network are fully connected, the model runs very slowly.
The classification results of the several methods are plotted in Fig. 11. Combining the SMOTE oversampling technique with the one-dimensional deep convolutional neural network, as proposed by the invention, clearly gives a much higher diesel grade classification recognition rate.
In summary, combining the SMOTE oversampling technique with a one-dimensional deep convolutional neural network not only addresses the class imbalance found in practice, but also avoids the complex preprocessing such as denoising, feature selection and dimension reduction required by traditional NIRS modeling methods. The overall classification recognition rate of the diesel grade is improved, the recognition rate of minority-class samples is greatly improved, and the model generalizes well and is practical to apply. Intelligent identification of diesel grade based on deep learning NIRS modeling replaces tedious manual identification and saves manpower and material resources. In addition, the method has good application prospects in the field of NIRS qualitative analysis.
The above-mentioned embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements of the technical solution of the present invention by those skilled in the art should fall within the protection scope defined by the claims of the present invention without departing from the spirit of the present invention.
Claims (10)
1. A near infrared spectrum diesel grade identification method based on SMOTE and deep learning is characterized by comprising the following steps:
step 1, plotting the near infrared spectra of the diesel samples, analyzing the distribution of the different diesel grades, performing attribute mapping on the grade labels, and taking the diesel samples of the different grades as a sample set;
step 2, performing data balancing on the sample set using SMOTE, and dividing the sample set into training set samples and test set samples;
step 3, constructing a near infrared spectrum classification model based on a one-dimensional deep convolutional neural network using the training set samples;
step 4, feeding the test set samples into the established model to obtain the diesel grade identification result, drawing a multi-class confusion matrix, and analyzing the recognition rate of each class.
2. The SMOTE and deep learning based near infrared spectrum diesel grade identification method according to claim 1, characterized in that: the near infrared spectra are plotted using the diesel sample set, the samples are divided by diesel condensation point into 5 classes, namely -10#, -20#, -35#, -50# and interference, and the grade labels are mapped to classes 1, 2, 3, 4 and 0, respectively.
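The attribute mapping in claim 2 can be pictured as a simple lookup table. The sketch below is only an illustration; the dictionary and function names are hypothetical, and the string spellings of the grade labels are assumed.

```python
# Class IDs as listed in claim 2: four grades plus an "interference" class.
GRADE_TO_CLASS = {
    "interference": 0,
    "-10#": 1,
    "-20#": 2,
    "-35#": 3,
    "-50#": 4,
}
# Reverse lookup, useful when reporting predictions as grade names.
CLASS_TO_GRADE = {cls: grade for grade, cls in GRADE_TO_CLASS.items()}

def encode_labels(grades):
    """Map a list of grade names to integer class IDs."""
    return [GRADE_TO_CLASS[g] for g in grades]

labels = encode_labels(["-10#", "interference", "-50#"])
```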
3. The SMOTE and deep learning based near infrared spectrum diesel grade identification method according to claim 2, characterized in that: the specific process of performing data equalization processing on the sample set by using SMOTE is as follows:
1): first, calculating the Euclidean distance from each sample x in the minority class to all other samples in the minority class sample set, to obtain the k nearest neighbors of each sample x;
2): setting a sampling ratio according to the sample imbalance ratio to determine the sampling multiplier N, and randomly selecting several samples from the k nearest neighbors of each minority class sample x, the selected neighbor being denoted $x_n$;
3): for each $x_n$, constructing a new sample according to formula <1>:
$x_{new} = x + \mathrm{rand}(0,1) \times |x - x_n|, \quad new \in \{1, 2, \dots, N\}$ <1>
4): finally, repeating the above steps N times to synthesize N new samples; if the minority class has a total of T samples, NT new samples can be synthesized.
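The oversampling steps of claim 3 can be sketched in plain Python as follows. This is a minimal illustration, not the patented implementation. The function names are hypothetical, and the interpolation uses the signed difference (x_n − x) of the commonly published SMOTE formulation, which is an assumption that differs from the literal |x − x_n| written in formula <1>.

```python
import math
import random

def k_nearest(sample, minority, k):
    """Return the k minority samples closest to `sample` (Euclidean distance)."""
    others = [s for s in minority if s is not sample]
    others.sort(key=lambda s: math.dist(sample, s))
    return others[:k]

def smote(minority, n_per_sample, k=5, seed=0):
    """Synthesize n_per_sample new points for every minority sample."""
    rng = random.Random(seed)
    synthetic = []
    for x in minority:
        neighbours = k_nearest(x, minority, k)
        for _ in range(n_per_sample):
            xn = rng.choice(neighbours)  # randomly chosen neighbour x_n
            gap = rng.random()           # rand(0, 1)
            # Assumption: signed difference (xn - x), as in standard SMOTE,
            # so the new point lies on the segment between x and xn.
            synthetic.append([a + gap * (b - a) for a, b in zip(x, xn)])
    return synthetic

# Toy minority class with T = 4 samples in two dimensions.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority, n_per_sample=3, k=2)
```

With T = 4 minority samples and N = 3, the sketch returns N × T = 12 synthetic samples, matching the count stated in step 4).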
4. The SMOTE and deep learning based near infrared spectrum diesel grade identification method according to claim 3, characterized in that: the specific method for constructing the near infrared spectrum classification model of the one-dimensional deep convolutional neural network comprises the following steps:
transforming the one-dimensional near infrared spectrum data so that the input signal meets the requirements of a convolutional neural network; regarding the one-dimensional near infrared spectrum as a special two-dimensional image comprising only one row or one column, performing the corresponding dimension expansion on the spectral signal, and converting the class labels into one-hot encoded form; and constructing a one-dimensional deep convolutional neural network model with reference to LeNet-5, wherein the model comprises an input layer, two convolutional layers, two pooling layers, two fully connected layers and an output layer.
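The two input transformations named in claim 4, dimension expansion and one-hot encoding, might look like the following sketch. The helper names are hypothetical, and the column layout (length × 1) is one of the two options the claim allows.

```python
def expand_dims(spectrum):
    """Reshape a 1-D spectrum of length L into an L x 1 'image' (one column)."""
    return [[value] for value in spectrum]

def one_hot(label, n_classes=5):
    """Encode an integer class label as a one-hot vector."""
    vec = [0] * n_classes
    vec[label] = 1
    return vec

signal = expand_dims([0.1, 0.2, 0.3])  # three points stand in for a spectrum
target = one_hot(3)                    # class 3 out of five classes
```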
5. The SMOTE and deep learning based near infrared spectrum diesel grade identification method according to claim 4, characterized in that: the convolutional layer is composed of a group of convolution kernels with trainable parameters, the size of the convolution kernels is set to m × 1, and the convolution operation on the one-dimensional signal is shown in formula <2>:
where l is the current convolutional layer, l-1 is the (l-1)-th convolutional layer, $x_i$ and $y_j$ denote the i-th input feature map and the j-th output feature map respectively, $*$ is the convolution operator, $\omega_{ij}$ is the convolution kernel, b is the bias, and f(·) is the activation function.
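Based on the symbol definitions in claim 5, a one-dimensional convolutional layer can be sketched as each output map y_j = f(Σ_i x_i * ω_ij + b_j). The code below is an illustrative reading of those definitions, not a reproduction of formula <2>. ReLU as the activation, stride 1, no padding, and the sliding dot product (cross-correlation, as deep learning libraries usually compute it) are all assumptions.

```python
def relu(v):
    return max(0.0, v)

def conv1d(inputs, kernels, biases, activation=relu):
    """1-D convolutional layer with stride 1 and no padding.

    inputs:  list of input feature maps x_i (each a list of floats)
    kernels: kernels[j][i] is the length-m kernel from input map i to output j
    biases:  one bias b_j per output feature map
    """
    m = len(kernels[0][0])
    out_len = len(inputs[0]) - m + 1
    outputs = []
    for j, bias in enumerate(biases):
        fmap = []
        for t in range(out_len):
            s = bias
            for i, x in enumerate(inputs):
                w = kernels[j][i]
                # Sliding dot product of the kernel with a window of the input.
                s += sum(w[p] * x[t + p] for p in range(m))
            fmap.append(activation(s))
        outputs.append(fmap)
    return outputs

# One input map, one output map, kernel size m = 3.
feature_maps = conv1d(
    inputs=[[1.0, 2.0, 3.0, 4.0, 5.0]],
    kernels=[[[-1.0, 0.0, 1.0]]],
    biases=[0.5],
)
```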
7. The SMOTE and deep learning based near infrared spectrum diesel grade identification method according to claim 4, characterized in that: the operation of the pooling layer is shown in formula <4>:
where l represents the current pooling layer, l-1 represents the (l-1)-th pooling layer, $y_j$ is the j-th output feature map, β is a multiplicative bias term, and b is the bias;
the pooling method is max pooling, and the sampling of the max pooling method is calculated according to formula <5>:
where one feature map obtained by the convolutional layer is divided into a plurality of regions $X_k$, $k \in \{1, 2, \dots, K\}$.
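Max pooling over the regions X_k can be sketched as below. The region size of 2, the non-overlapping layout, and dropping a trailing incomplete region are assumptions. The multiplicative term β and the bias b of formula <4> are left out, which corresponds to β = 1 and b = 0.

```python
def max_pool1d(feature_map, pool_size=2):
    """Split the map into non-overlapping regions X_k and keep each maximum."""
    n_regions = len(feature_map) // pool_size  # K; a trailing partial region is dropped
    return [
        max(feature_map[k * pool_size:(k + 1) * pool_size])
        for k in range(n_regions)
    ]

pooled = max_pool1d([1.0, 3.0, 2.0, 5.0, 4.0, 0.0], pool_size=2)
```

Each pooling step shortens the feature map by a factor of `pool_size`, which reduces the number of values passed to later layers.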
8. The SMOTE and deep learning based near infrared spectrum diesel grade identification method according to claim 4, characterized in that: the fully connected part comprises a Flatten layer and two Dense layers, the activation function of the last Dense layer is Softmax, and a certain proportion of random deactivation (dropout) is added to the fully connected layers, wherein the fully connected layer is calculated according to formula <6>:
$h_{\omega,b}(x) = f(\omega^{T} x + b)$ <6>
where ω is the weight vector of the neuron, b is the bias, T denotes the transpose, and $h_{\omega,b}(x)$ is the output of the neuron.
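The pieces named in claim 8 (flattening, a single neuron computed by formula <6>, and random deactivation) can be illustrated as follows. The function names are hypothetical, and the "inverted dropout" rescaling used here is an assumption, since the claim does not say how deactivation is implemented.

```python
import random

def flatten(feature_maps):
    """Concatenate all feature maps into one vector (the Flatten layer)."""
    return [v for fmap in feature_maps for v in fmap]

def neuron(x, w, b, f):
    """Formula <6>: h(x) = f(w^T x + b) for a single neuron."""
    return f(sum(wi * xi for wi, xi in zip(w, x)) + b)

def dropout(x, rate, rng):
    """Zero each value with probability `rate` and rescale the survivors.

    Assumption: inverted dropout, applied during training only.
    """
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]

vector = flatten([[1.0, 2.0], [3.0, 4.0]])
output = neuron([1.0, 2.0], w=[0.5, 1.0], b=0.25, f=lambda v: max(0.0, v))
thinned = dropout(vector, rate=0.5, rng=random.Random(0))
```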
9. The SMOTE and deep learning based near infrared spectrum diesel grade identification method according to claim 4, characterized in that: after the model is constructed, a training method is configured; the configured training method comprises a loss function, an optimizer and an evaluation index, wherein the loss function is the cross-entropy loss function, given by formula <7>, the optimizer is Adam, and the evaluation index is the accuracy A, given by formula <8>:
where $n_i$ is the number of samples whose predicted class is the same as the actual class, and n is the total number of samples.
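The loss and evaluation index of claim 9 can be sketched with the usual textbook definitions: the mean categorical cross entropy over one-hot targets, and accuracy as correctly predicted samples divided by all samples. These definitions are assumed here to correspond to formulas <7> and <8>, which are not reproduced in this text. The Adam optimizer is not shown.

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean cross entropy; y_true is one-hot, y_pred holds class probabilities."""
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        total -= sum(t * math.log(p + eps) for t, p in zip(t_row, p_row))
    return total / len(y_true)

def accuracy(y_true, y_pred):
    """Share of samples whose most probable class equals the true class."""
    hits = sum(
        1
        for t_row, p_row in zip(y_true, y_pred)
        if t_row.index(max(t_row)) == p_row.index(max(p_row))
    )
    return hits / len(y_true)

y_true = [[1, 0], [0, 1]]
y_pred = [[0.9, 0.1], [0.6, 0.4]]  # second sample is misclassified
loss = categorical_cross_entropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
```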
10. The SMOTE and deep learning based near infrared spectrum diesel grade identification method according to claim 4, characterized in that: on the basis of the constructed one-dimensional deep convolutional neural network model, the diesel grade is predicted on both the test set subjected to SMOTE oversampling and the original test set, and the overall classification recognition rate is obtained; a multi-class confusion matrix is then drawn, and the precision, the recall, the accuracy and the balanced F-score are obtained from the confusion matrix, as shown in formulas <9>, <10>, <11> and <12>:
where TP is the number of positive samples predicted as positive, FN is the number of positive samples predicted as negative, FP is the number of negative samples predicted as positive, and TN is the number of negative samples predicted as negative.
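The per-class metrics of claim 10 can be derived from a multi-class confusion matrix by treating one class as positive and all others as negative. The sketch below uses the standard definitions of precision, recall, accuracy and the F1 score, which are assumed to match formulas <9> to <12>. The function names and the toy labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """cm[t][p] counts samples of true class t predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def per_class_metrics(cm, c):
    """One-vs-rest metrics for class c, using TP, FN, FP and TN."""
    total = sum(sum(row) for row in cm)
    tp = cm[c][c]
    fn = sum(cm[c]) - tp                 # class c predicted as something else
    fp = sum(row[c] for row in cm) - tp  # other classes predicted as c
    tn = total - tp - fn - fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    acc = (tp + tn) / total
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "accuracy": acc, "f1": f1}

cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
metrics = per_class_metrics(cm, 1)
```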
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011443096.0A CN112613536A (en) | 2020-12-08 | 2020-12-08 | Near infrared spectrum diesel grade identification method based on SMOTE and deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011443096.0A CN112613536A (en) | 2020-12-08 | 2020-12-08 | Near infrared spectrum diesel grade identification method based on SMOTE and deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112613536A true CN112613536A (en) | 2021-04-06 |
Family
ID=75232922
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011443096.0A Pending CN112613536A (en) | 2020-12-08 | 2020-12-08 | Near infrared spectrum diesel grade identification method based on SMOTE and deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112613536A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113298791A (en) * | 2021-05-31 | 2021-08-24 | 中电福富信息科技有限公司 | Image detection method of mixed cartoon based on deep learning |
CN113378971A (en) * | 2021-06-28 | 2021-09-10 | 燕山大学 | Near infrared spectrum classification model training method and system and classification method and system |
CN113505730A (en) * | 2021-07-26 | 2021-10-15 | 全景智联(武汉)科技有限公司 | Model evaluation method, device, equipment and storage medium based on mass data |
CN113702328A (en) * | 2021-08-20 | 2021-11-26 | 广东省惠州市石油产品质量监督检验中心 | Method, device, equipment and storage medium for analyzing properties of product oil |
CN114659996A (en) * | 2022-05-19 | 2022-06-24 | 联桥网云信息科技(长沙)有限公司 | Hyperspectral oil detection method based on reflected light |
Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6134537A (en) * | 1995-09-29 | 2000-10-17 | Ai Ware, Inc. | Visualization and self organization of multidimensional data through equalized orthogonal mapping |
CN101866428A (en) * | 2010-07-13 | 2010-10-20 | 中国人民解放军总后勤部油料研究所 | Method for quickly identifying categories and brands of engine fuel |
CN104155245A (en) * | 2014-07-31 | 2014-11-19 | 中国科学院自动化研究所 | Method for detecting multiclass properties of oil product based on mode recognition and spectrogram mapping |
CN106845371A (en) * | 2016-12-31 | 2017-06-13 | 中国科学技术大学 | A kind of city road network automotive emission remote sensing monitoring system |
CN108647643A (en) * | 2018-05-11 | 2018-10-12 | 浙江工业大学 | A kind of packed tower liquid flooding state on-line identification method based on deep learning |
US20180299375A1 (en) * | 2015-04-27 | 2018-10-18 | Virtual Fluid Monitoring Services LLC | Fluid analysis and monitoring using optical spectroscopy |
CN109167680A (en) * | 2018-08-06 | 2019-01-08 | 浙江工商大学 | A kind of traffic classification method based on deep learning |
AU2019100354A4 (en) * | 2019-04-04 | 2019-05-16 | Chen, Mingjie Mr | An animal image search system based on convolutional neural network |
CN109933539A (en) * | 2019-04-15 | 2019-06-25 | 燕山大学 | A kind of Software Defects Predict Methods based on principal component analysis and combination sampling |
CN109992861A (en) * | 2019-03-21 | 2019-07-09 | 温州大学 | A kind of near infrared spectrum modeling method |
WO2019169816A1 (en) * | 2018-03-09 | 2019-09-12 | 中山大学 | Deep neural network for fine recognition of vehicle attributes, and training method thereof |
CN110443302A (en) * | 2019-08-02 | 2019-11-12 | 天津相和电气科技有限公司 | Load discrimination method and its application based on Fusion Features and deep learning |
CN110717368A (en) * | 2018-07-13 | 2020-01-21 | 北京服装学院 | Qualitative classification method for textiles |
CN111740971A (en) * | 2020-06-15 | 2020-10-02 | 郑州大学 | Network intrusion detection model SGM-CNN based on class imbalance processing |
CN111860124A (en) * | 2020-06-04 | 2020-10-30 | 西安电子科技大学 | Remote sensing image classification method based on space spectrum capsule generation countermeasure network |
CN111881987A (en) * | 2020-07-31 | 2020-11-03 | 西安工业大学 | Apple virus identification method based on deep learning |
CN111896495A (en) * | 2020-08-05 | 2020-11-06 | 安徽大学 | Method and system for discriminating Taiping Houkui production places based on deep learning and near infrared spectrum |
2020-12-08: CN application CN202011443096.0A (publication CN112613536A), status: Active, Pending
Non-Patent Citations (2)
Title |
---|
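He Dongyuan et al.: "Stellar spectral classification based on deep learning", Journal of Beijing Normal University (Natural Science), vol. 56, no. 1, pages 37 - 44 * |
Hu Xunyin; Guan Yepeng; Li Weidong; Luo Hongjie: "Classification method for ancient ceramics based on an ultraviolet-visible-near infrared spectral feature mapping matrix", Journal of the Chinese Ceramic Society, no. 09, pages 1280 - 1286 * |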
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||