CN112613536A - Near infrared spectrum diesel grade identification method based on SMOTE and deep learning - Google Patents


Publication number
CN112613536A
Authority
CN
China
Prior art keywords
diesel
sample
smote
samples
near infrared
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011443096.0A
Other languages
Chinese (zh)
Inventor
王书涛
刘诗瑜
崔凯
张靖昆
孔德明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yanshan University
Original Assignee
Yanshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yanshan University
Priority to CN202011443096.0A
Publication of CN112613536A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Abstract

The invention discloses a near infrared spectrum diesel grade identification method based on SMOTE and deep learning, which comprises the following steps: step 1, plotting the near infrared spectra of the diesel samples, analyzing the distribution of the different diesel grades, mapping the grade labels to class attributes, and taking the diesel samples of the different grades as a sample set; step 2, balancing the sample set with SMOTE and dividing it into training set samples and test set samples; step 3, constructing a near infrared spectrum classification model based on a one-dimensional deep convolutional neural network from the training set samples; and step 4, feeding the test set samples into the established model to obtain the diesel grade identification result, plotting a multi-class confusion matrix, and analyzing the identification rate of each class. The invention does not require extensive preprocessing, improves the accuracy of classification and identification, and improves the identification rate of minority-class samples.

Description

Near infrared spectrum diesel grade identification method based on SMOTE and deep learning
Technical Field
The invention relates to the field of near infrared spectrum, in particular to a near infrared spectrum diesel grade identification method based on SMOTE and deep learning.
Background
Owing to its high energy density, low fuel consumption and low price, petroleum-derived diesel still dominates the market. Improving diesel quality and detection precision to meet the changing demands of the diesel market remains one of the main directions of the global petroleum industry. According to the GB/T1.1-2020 standard, commercial diesel is divided by condensation point into 6 grades: 5#, 0#, -10#, -20#, -35# and -50#. The lower the diesel grade, the less likely it is to form wax and the higher its price. For profit, unscrupulous manufacturers often adulterate diesel or mislabel its grade; selling such non-compliant fuel not only damages engines but also increases pollutant emissions and can even endanger personal safety. Rapid and accurate identification of the diesel grade therefore gives supervisory authorities accurate and timely detection data, and is of great significance for protecting consumers' rights and safety.
Identifying the diesel grade only by color, feel, smell and the like is common in daily life, but it is time-consuming, laborious and highly subjective, and is unsuitable for large-scale production testing. Near infrared spectroscopy (NIRS) is a fast, green, low-cost, easy-to-operate and non-destructive technique that has been widely used in the petrochemical field. The NIRS of diesel involves the characteristic absorption of hydrogen-containing groups (such as O-H, C-H and N-H) in a complex mixture of hydrocarbons, so accurately identifying the diesel grade is extremely difficult and requires computer-aided detection. Commonly used auxiliary models currently include partial least squares, support vector machines and artificial neural networks. Because NIRS covers a wide spectral range and has weak useful information, heavy noise interference and severely overlapping peaks, these traditional machine learning methods must be combined with extensive preprocessing such as denoising, feature extraction and dimension reduction to obtain faster detection and more accurate predictions. This increases the workload, and the applicability and prediction accuracy of such models still need improvement.
Deep learning, which is built on deep networks, is a new research direction in machine learning and has in recent years advanced rapidly in application fields such as image processing, speech recognition and machine translation. The deep convolutional neural network (DCNN) is the most widely applied deep learning model; it can autonomously extract effective features from complex data and reduce dimensionality, and has stronger expressive power than traditional shallow models. However, because the DCNN is mainly used to process two-dimensional or three-dimensional images, it has seldom been applied to one-dimensional NIRS.
Disclosure of Invention
The invention aims to provide a near infrared spectrum diesel grade identification method based on SMOTE and deep learning, which improves the accuracy of classification and identification and improves the identification rate of minority-class samples.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
a near infrared spectrum diesel grade identification method based on SMOTE and deep learning comprises the following steps:
step 1, drawing a diesel near-infrared spectrogram, analyzing distribution conditions of different grades of diesel, performing attribute mapping on grade labels, and taking the different grades of the diesel as a sample set;
step 2, carrying out data equalization processing on the sample set by adopting SMOTE, and dividing the sample set into a training set sample and a test set sample;
step 3, constructing a near infrared spectrum classification model of the one-dimensional deep convolution neural network by using the training set sample;
and step 4, feeding the test set samples into the established model to obtain the diesel grade identification result, plotting a multi-class confusion matrix, and analyzing the identification rate of each class.
The technical scheme of the invention is further improved as follows: a near infrared spectrogram is plotted from the diesel sample set, and the samples are divided by condensation point into 5 classes, namely -10#, -20#, -35#, -50# and interference, which are mapped to classes 1, 2, 3, 4 and 0, respectively.
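The label mapping above can be written down directly. The following is a minimal sketch; the dictionary and function names are illustrative and do not come from the patent.

```python
# Grade-to-class mapping as stated in the text; names are illustrative.
GRADE_TO_CLASS = {"-10#": 1, "-20#": 2, "-35#": 3, "-50#": 4, "interference": 0}

def encode_grades(grades):
    """Map grade strings to their integer class labels."""
    return [GRADE_TO_CLASS[g] for g in grades]

labels = encode_grades(["-20#", "interference", "-50#"])
```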
The technical scheme of the invention is further improved as follows: the specific process of performing data equalization processing on the sample set by using SMOTE is as follows:
1): first, calculating the Euclidean distance from each sample x in the minority class to all other samples in the minority class sample set to obtain the k nearest neighbors of each sample x;
2): setting a sampling ratio according to the sample imbalance ratio to determine the sampling multiplier N, and randomly selecting several samples from the k nearest neighbors of each minority-class sample x, a selected neighbor being denoted $x_n$;
3): for each $x_n$, constructing a new sample according to formula <1>:
$x_{new} = x + \mathrm{rand}(0,1) \times |x - x_n|, \quad new \in \{1, 2, \dots, N\}$ <1>
4): finally, repeating the above steps N times to synthesize N new samples; if the minority class has T samples in total, NT new samples can be synthesized.
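The four SMOTE steps above can be sketched in plain Python as follows. This is an illustrative sketch rather than the patent's implementation: the function name, the default k and the seed are assumptions. It interpolates with the signed difference x_n − x, as in the commonly used SMOTE formulation, so that each new point lies on the segment between x and its neighbor, whereas formula <1> as printed uses |x − x_n|.

```python
import math
import random

def smote(minority, k=5, n_per_sample=2, seed=0):
    """Synthesize n_per_sample new points for every minority-class sample."""
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        # Step 1: k nearest neighbours of x inside the minority class (Euclidean).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        for _ in range(n_per_sample):
            # Step 2: pick one neighbour x_n at random.
            x_n = rng.choice(neighbours)
            # Step 3: interpolate between x and x_n with a random gap in [0, 1).
            gap = rng.random()
            synthetic.append([a + gap * (b - a) for a, b in zip(x, x_n)])
    # Step 4: T samples times N repetitions gives N*T synthetic samples.
    return synthetic
```

With T minority samples and `n_per_sample = N`, the function returns N·T synthetic samples, matching step 4.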
The technical scheme of the invention is further improved as follows: the specific method for constructing the near infrared spectrum classification model of the one-dimensional depth convolution neural network comprises the following steps:
transforming the one-dimensional near infrared spectrum data so that the input signal meets the requirements of a convolutional neural network; regarding the one-dimensional near infrared spectrum as a special two-dimensional image with only one row or one column, expanding the dimensions of the spectral signal accordingly, and converting the class labels into one-hot encoding; and constructing a one-dimensional deep convolutional neural network model with reference to LeNet-5, the model comprising an input layer, two convolutional layers, two pooling layers, two fully connected layers and an output layer.
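The two input transformations mentioned above, dimension expansion and one-hot encoding, might look like this in plain Python. The helper names are hypothetical, and nested lists stand in for the arrays a deep learning framework would use.

```python
def expand_dims(spectrum):
    """Turn a flat spectrum of length L into shape (L, 1): one channel per wavelength point."""
    return [[value] for value in spectrum]

def one_hot(label, num_classes=5):
    """Encode an integer class label as a one-hot vector."""
    vec = [0] * num_classes
    vec[label] = 1
    return vec
```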
The technical scheme of the invention is further improved as follows: the convolutional layer consists of a group of convolution kernels with trainable parameters, the kernel size is set to m × 1, and the convolution of the one-dimensional signal is given by formula <2>:
$y_j^{l} = f\left(\sum_{i} x_i^{l-1} * \omega_{ij}^{l} + b_j^{l}\right)$ <2>
wherein l is the current convolutional layer, l-1 is the preceding layer, $x_i$ and $y_j$ denote the i-th input feature map and the j-th output feature map, respectively, * is the convolution operator, $\omega_{ij}$ is the convolution kernel, b is the bias, and f(·) is the activation function.
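As an illustration of the one-dimensional convolution described above, the sketch below computes a single output feature map from several input maps. It assumes stride 1 and no padding, and, as deep learning frameworks usually do, it slides the kernel without flipping it; the function name is illustrative.

```python
def conv1d(feature_maps, kernels, bias, activation=lambda v: v):
    """Compute one output feature map y_j.

    feature_maps: list of input maps x_i (equal lengths).
    kernels: one kernel w_ij per input map (equal lengths m).
    """
    m = len(kernels[0])
    out_len = len(feature_maps[0]) - m + 1
    out = []
    for t in range(out_len):
        s = bias
        # Sum the contributions of every input map at position t.
        for x, w in zip(feature_maps, kernels):
            s += sum(x[t + u] * w[u] for u in range(m))
        out.append(activation(s))
    return out
```

Under these assumptions, a 401-point spectrum convolved with a kernel of length 40 yields 362 output points.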
The technical scheme of the invention is further improved as follows: the PReLU activation function is introduced into the convolutional layer, and its expression is given by formula <3>:
$f(x) = \max(0, x) + a \min(0, x)$ <3>
where a is the learnable coefficient that sets the slope for negative inputs.
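For reference, PReLU passes positive inputs unchanged and scales negative inputs by a learnable slope. A minimal sketch follows; the slope value 0.25 is only an illustrative constant, since in training the slope is learned.

```python
def prelu(x, a=0.25):
    """PReLU: identity for x > 0, slope a for x <= 0 (a is learned in practice)."""
    return x if x > 0 else a * x
```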
The technical scheme of the invention is further improved as follows: the operation of the pooling layer is given by formula <4>:
$y_j^{l} = f\left(\beta_j^{l}\,\mathrm{down}(y_j^{l-1}) + b_j^{l}\right)$ <4>
wherein l denotes the current pooling layer, l-1 denotes the preceding layer, $y_j$ is the j-th output feature map, β is a multiplicative bias term, b is the bias, down(·) is the down-sampling function, and f(·) is the activation function;
the pooling method is max pooling, which samples according to formula <5>:
$\mathrm{pool}_{max}(X_k) = \max_{a \in X_k} a$ <5>
wherein a feature map obtained from the convolutional layer is divided into several regions $X_k$, $k \in \{1, 2, \dots, K\}$, and the largest element a of each region is kept.
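Max pooling over one feature map can be sketched as below, with each window playing the role of a region X_k. The defaults (window 3, stride 1) follow the embodiment; the function name is illustrative.

```python
def max_pool1d(feature_map, window=3, stride=1):
    """Keep the maximum of each window (region X_k) of the feature map."""
    return [max(feature_map[s:s + window])
            for s in range(0, len(feature_map) - window + 1, stride)]
```

With stride 1, the regions overlap and the output is only `window - 1` points shorter than the input.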
The technical scheme of the invention is further improved as follows: the fully connected part comprises a Flatten layer and two Dense layers, the activation function of the last Dense layer is Softmax, and dropout (random deactivation) at a certain ratio is added to the fully connected part, which is calculated according to formula <6>:
$h_{\omega,b}(x) = f(\omega^{T} x + b)$ <6>
where ω is the weight vector of the neuron, b is the bias, T denotes the transpose, and $h_{\omega,b}(x)$ is the output of the neuron.
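A fully connected layer applies the neuron formula above once per neuron, and Softmax turns the final outputs into class probabilities. The sketch below is illustrative (function names are assumptions); the softmax subtracts the maximum before exponentiating for numerical stability.

```python
import math

def dense(x, weights, bias, activation=lambda v: v):
    """weights[j] is the weight vector of neuron j; returns f(w^T x + b) per neuron."""
    return [activation(sum(w * xi for w, xi in zip(w_j, x)) + b_j)
            for w_j, b_j in zip(weights, bias)]

def softmax(z):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]
```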
The technical scheme of the invention is further improved as follows: after the model is constructed, the training method is configured; it comprises a loss function, an optimizer and an evaluation index, wherein the loss function is the cross-entropy loss given by formula <7>, the optimizer is Adam, and the evaluation index is the accuracy A given by formula <8>:
$Loss = -\sum_{i} y_i \log \hat{y}_i$ <7>
$A = \dfrac{n_i}{n}$ <8>
wherein $y_i$ and $\hat{y}_i$ are the true label and the predicted probability for class i, $n_i$ is the number of samples whose predicted class matches the actual class, and n is the total number of samples.
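The loss and the evaluation index can be illustrated in a few lines. This sketch assumes one-hot true labels and averages the cross entropy over samples; the small `eps` guards against log(0) and is an implementation detail, not something stated in the patent.

```python
import math

def cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean categorical cross entropy; y_true rows are one-hot, y_prob rows are probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        total -= sum(ti * math.log(pi + eps) for ti, pi in zip(t, p))
    return total / len(y_true)

def accuracy(true_labels, pred_labels):
    """Fraction of samples whose predicted class equals the actual class."""
    n_correct = sum(t == p for t, p in zip(true_labels, pred_labels))
    return n_correct / len(true_labels)
```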
The technical scheme of the invention is further improved as follows: on the basis of the constructed one-dimensional deep convolutional neural network model, the diesel grade is predicted for both the SMOTE-oversampled test set and the original test set to obtain the overall classification recognition rate; a multi-class confusion matrix is then plotted, from which the precision, recall, accuracy and balanced F score are obtained according to formulas <9>, <10>, <11> and <12>:
$Precision = \dfrac{TP}{TP + FP}$ <9>
$Recall = \dfrac{TP}{TP + FN}$ <10>
$Accuracy = \dfrac{TP + TN}{TP + TN + FP + FN}$ <11>
$F_1 = \dfrac{2 \times Precision \times Recall}{Precision + Recall}$ <12>
where TP is the number of samples for which positive examples are predicted as positive examples, FN is the number of samples for which positive examples are predicted as negative examples, FP is the number of samples for which negative examples are predicted as positive examples, and TN is the number of samples for which negative examples are predicted as negative examples.
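Given the four counts defined above, the standard definitions of precision, recall, accuracy and the balanced F score can be computed as in this sketch. The function name is illustrative and the counts used in any call are made up for demonstration.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy and F1 from the four confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    acc = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, acc, f1
```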
Due to the adoption of the technical scheme, the invention has the technical progress that:
the near infrared spectrum diesel grade identification method based on SMOTE and deep learning greatly improves the accuracy of classification and identification without complex operations such as manual feature extraction and dimension reduction, and, in view of the class imbalance encountered in practice, improves the identification rate of minority-class samples. The proposed model combining SMOTE and deep learning has strong applicability and extensibility, and facilitates the development of NIRS-based rapid detection systems that are accurate, simple to operate and portable.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a NIRS diagram for diesel fuel;
FIG. 3 is a distribution diagram of sample numbers of various categories of raw data;
FIG. 4 is a training set sample distribution graph after SMOTE oversampling processing;
FIG. 5 is a diagram of a NIRS-based one-dimensional depth convolution neural network classification model architecture;
FIG. 6 is a graph of a training set loss function variation;
FIG. 7 is a graph of training set accuracy change;
FIG. 8 is a diagram of a training set multi-class confusion matrix after SMOTE oversampling processing;
FIG. 9 is a diagram of a test set multi-class confusion matrix after SMOTE oversampling;
FIG. 10 is a diagram of an original test set multi-classification confusion matrix;
FIG. 11 is a comparison graph of the prediction accuracy of the XGBoost, SVM and BP methods.
Detailed Description
The present invention will be described in further detail with reference to the following examples:
referring to fig. 1, the specific implementation steps of the present invention are as follows:
step 1, drawing a diesel NIRS diagram, analyzing distribution conditions of different grades of diesel, performing attribute mapping on grade labels, and taking the different grades of the diesel as a sample set;
in the present embodiment, the details of the diesel grade sample set used are shown in Table 1. There are 394 samples in total, and the grades -10#, -20#, -35#, -50# and interference are mapped to classes 1, 2, 3, 4 and 0, respectively. As can be seen from the table, the number of samples differs between classes, and the class distribution is extremely uneven.
TABLE 1 sample details of Diesel grade data set
[Table 1: image BDA0002823195880000061 in the original publication]
The NIRS of the diesel samples is shown in FIG. 2; the spectral range is 750 nm to 1550 nm at an interval of 2 nm, giving 401 wavelength points in total. As the figure shows, the spectra of the 394 samples overlap and cannot be told apart, the spectral information is weak and the interference is strong, so accurately distinguishing the 5 classes from the NIRS plot alone is not feasible; the data are therefore first balanced by the method of step 2.
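The count of 401 wavelength points follows from the stated range and interval, as this short check shows (the variable name is illustrative):

```python
# 750 nm to 1550 nm inclusive, in steps of 2 nm.
wavelengths = list(range(750, 1551, 2))
num_points = len(wavelengths)  # (1550 - 750) / 2 + 1
```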
Step 2, performing data equalization processing on the sample set by adopting an SMOTE method;
first, the sample set is automatically divided at a ratio of 7:3 using a cross-validation method, giving 275 training set samples and 119 test set samples. To improve the generalization ability of the model and resolve the class imbalance, the training set is balanced with the SMOTE oversampling technique; the class distribution before SMOTE processing is shown in FIG. 3, and the class distribution of the training set after SMOTE processing is shown in FIG. 4. After processing, every class has the same number of samples, 184, so the training set grows from the original 275 samples to 920. For later comparison, SMOTE is also applied to the unbalanced test set samples, and the resulting new test set contains 395 samples in total.
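The sample counts quoted above are mutually consistent, as the following arithmetic check illustrates. Rounding the 70% share down is an assumption about how the split was made, and the per-class test count of 79 is inferred from the total of 395 rather than stated in the text.

```python
n_total, n_classes = 394, 5

n_train = int(n_total * 0.7)          # 70% of 394 is 275.8, rounded down
n_test = n_total - n_train            # remaining samples form the test set

train_after_smote = 184 * n_classes   # every class raised to 184 samples
test_per_class, remainder = divmod(395, n_classes)  # balanced test set of 395
```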
Step 3, establishing an NIRS classification model of the one-dimensional deep convolution neural network by using a diesel training set sample;
in the present embodiment, the overall structure of the NIRS-based one-dimensional depth convolution neural network classification model is shown in fig. 5. The method comprises the following specific steps:
Step 3.1: input layer; the input is a one-dimensional diesel NIRS signal of shape (401, 1).
Step 3.2: convolutional layer, a one-dimensional convolution with kernel size 40 × 1, 16 kernels, stride 1 and PReLU activation; PReLU converges quickly with a low error rate and helps avoid vanishing and exploding gradients.
Step 3.3: pooling layer, max pooling with window size 3 × 1 and stride 1.
Step 3.4: convolutional layer, with kernel size 40 × 1, 64 kernels, stride 1 and PReLU activation.
Step 3.5: pooling layer, max pooling with window size 3 × 1 and stride 1.
Step 3.6: Flatten layer, which converts the multi-dimensional input into a one-dimensional one for the transition to the fully connected layers.
Step 3.7: Dense layer with 128 neurons and PReLU activation; to reduce the risk of overfitting, Dropout with a ratio of 0.1 is added.
Step 3.8: Dense layer with 5 neurons, one for each of the 5 output classes, and Softmax activation.
Step 3.9: cross entropy is used as the loss function, the Adam optimizer is adopted, the accuracy A is used as the evaluation index, and the batch size is set to 16. The model is trained on the SMOTE-oversampled training set samples. The resulting training loss curve is shown in FIG. 6: the loss decreases as training proceeds and finally approaches 0. The training accuracy curve is shown in FIG. 7: the accuracy increases gradually as training proceeds and finally approaches 1. The training results show that the model performs well and that the one-dimensional deep convolutional neural network model for qualitative analysis of diesel grade has been established successfully.
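The layer settings in steps 3.1 to 3.8 determine the feature-map length at each stage. The sketch below traces those lengths under the assumption that neither the convolutions nor the pooling layers pad their input; the patent does not state the padding mode, so the resulting numbers are illustrative only.

```python
def out_len(n, window, stride=1):
    """Output length of an unpadded 1-D convolution or pooling layer."""
    return (n - window) // stride + 1

shapes = [("input", 401, 1)]
n = out_len(401, 40); shapes.append(("conv1", n, 16))   # 16 kernels of size 40
n = out_len(n, 3);    shapes.append(("pool1", n, 16))   # window 3, stride 1
n = out_len(n, 40);   shapes.append(("conv2", n, 64))   # 64 kernels of size 40
n = out_len(n, 3);    shapes.append(("pool2", n, 64))   # window 3, stride 1

flat_size = n * 64  # number of inputs to the first Dense layer after Flatten
```

Under this assumption the Flatten layer feeds 20416 values into the 128-neuron Dense layer.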
Here, it should be noted that:
in step 3.2 and step 3.4, the convolutional layer consists of a set of convolution kernels with trainable parameters that slide over the input data according to a fixed rule and perform the convolution, extracting local abstract features of the spectrum and generating the corresponding one-dimensional feature maps.
In step 3.3 and step 3.5, the pooling layer down-samples the feature maps generated by the convolution to reduce the dimensionality of the convolutional layer's feature vectors; with the number of feature maps unchanged, the reduced data volume greatly speeds up the algorithm.
In step 3.6, the Flatten layer flattens the data so that it can be connected to the neurons in order.
In step 3.9, the cross entropy evaluates the difference between the probability distribution obtained by the current training and the true distribution, that is, the distance between the actual output probabilities and the expected output probabilities.
And step 4, feeding the test set samples into the established model to obtain the diesel grade identification result, plotting a multi-class confusion matrix, and analyzing the identification rate of each class.
In this embodiment, classification prediction is first performed on the training set data with the constructed one-dimensional deep convolutional neural network classification model, giving a training set classification accuracy of 97.61%; feeding in the SMOTE-oversampled test set gives a classification accuracy of 95.44%; and feeding in the original 119 test set samples gives a classification accuracy of 95.80%. To examine the recognition rate of each class, especially that of the minority classes, a multi-class confusion matrix is plotted. The multi-class confusion matrix extends the traditional two-class confusion matrix with a one-to-one strategy; its rows represent the true classes of the data and its columns the predicted classes. The numbers on the main diagonal are therefore the numbers of samples whose prediction agrees with the actual class, and the off-diagonal numbers are the numbers of misclassified samples.
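A multi-class confusion matrix with true classes as rows and predicted classes as columns can be built as in the sketch below. The function names are illustrative, and the per-class recognition rate is taken here to be the diagonal entry divided by the row total.

```python
def confusion_matrix(true_labels, pred_labels, num_classes=5):
    """cm[t][p] counts samples of true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, pred_labels):
        cm[t][p] += 1
    return cm

def per_class_recognition_rate(cm):
    """Diagonal entry over row total for each true class (0.0 for empty rows)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]
```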
Then, the multi-class confusion matrices for the above three cases are plotted in FIG. 8, FIG. 9 and FIG. 10, respectively. The prediction accuracy of every class is relatively high for both the training set and the test sets, and for the original test set the recognition rate of the minority classes, namely class 0, class 1 and class 4, reaches 100%. According to the confusion matrix and formulas <9>, <10>, <11> and <12>, the precision of the prediction model is 98.67%, the recall is 100%, the accuracy is 95.80% and the F1 value is 0.9933. The model thus classifies diesel grades with high accuracy and generalizes well.
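The reported F1 value can be cross-checked from the other figures. Taking 98.67% as the precision and 100% as the recall, the harmonic mean reproduces the quoted 0.9933:

```python
precision, recall = 0.9867, 1.0
f1 = 2 * precision * recall / (precision + recall)
f1_rounded = round(f1, 4)
```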
For comparison, the XGBoost ensemble learning method, an SVM and a BP neural network are applied to the same diesel NIRS data set.
Specifically, the parameters of XGBoost are set as follows: the number of trees is 196, the maximum tree depth is 5, the minimum sum of leaf node weights is 1, the complexity control term gamma is 0.15, the weight of the L1 regularization term is 0.08, the weight of the L2 regularization term is 0.1, the row subsampling ratio for each tree is 0.71, the column subsampling ratio colsample_bytree is 0.69, the learning rate is 0.1, the booster is 'gbtree', the objective function is 'multi:softmax', the number of classes is 5, and the number of CPU threads is 4. With this model, the XGBoost grade classification recognition rate on the original test set diesel samples is 75.63%.
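The settings listed above could be expressed as an XGBoost parameter dictionary along the following lines. The mapping from the translated descriptions to XGBoost's native parameter names is an assumption, and the dictionary is shown on its own without training code.

```python
# Assumed mapping of the listed settings to XGBoost's native parameter names.
xgb_params = {
    "booster": "gbtree",
    "objective": "multi:softmax",
    "num_class": 5,
    "max_depth": 5,
    "min_child_weight": 1,
    "gamma": 0.15,
    "alpha": 0.08,            # L1 regularization weight
    "lambda": 0.1,            # L2 regularization weight
    "subsample": 0.71,        # row sampling ratio per tree
    "colsample_bytree": 0.69, # column sampling ratio per tree
    "eta": 0.1,               # learning rate
    "nthread": 4,
}
num_boost_round = 196         # number of trees
```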
The parameters of the SVM are set as follows: the range of the kernel parameter g and the penalty parameter c is [-10,0.2], the RBF is used as the kernel function, and the parameters are optimized by grid search with cross validation. With this model, the SVM grade classification recognition rate on the original test set diesel samples is 78.99%, and the method runs somewhat slowly.
The parameters of the BP neural network are set as follows: the input layer has 275 nodes, the hidden layer has 9 nodes and the output layer has 5 nodes; Softmax turns the network output into a probability distribution, cross entropy is used as the loss function, and the learning rate is 0.05. With this model, the BP grade classification recognition rate on the original test set diesel samples is 69.75%, and because all neurons in adjacent layers of a traditional BP neural network are connected, the model runs very slowly.
The classification results of the methods are compared in FIG. 11. Combining the SMOTE oversampling technique with the one-dimensional deep convolutional neural network, as proposed by the invention, clearly gives a much higher diesel grade classification recognition rate.
In summary, combining the SMOTE oversampling technique with a one-dimensional deep convolutional neural network not only addresses the class imbalance found in practice but also avoids the complex preprocessing, such as denoising, feature selection and dimension reduction, required by traditional NIRS modeling methods. The overall classification recognition rate of the diesel grade is improved, the recognition rate of minority-class samples is greatly improved, and the model has strong generalization ability and practical applicability. Intelligent diesel grade identification based on deep learning NIRS modeling replaces tedious manual identification and saves manpower and material resources. The method also has good application prospects in the field of NIRS qualitative analysis.
The above-mentioned embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements of the technical solution of the present invention by those skilled in the art should fall within the protection scope defined by the claims of the present invention without departing from the spirit of the present invention.

Claims (10)

1. A near infrared spectrum diesel grade identification method based on SMOTE and deep learning is characterized by comprising the following steps:
step 1, drawing a diesel near-infrared spectrogram, analyzing distribution conditions of different grades of diesel, performing attribute mapping on grade labels, and taking the different grades of the diesel as a sample set;
step 2, carrying out data equalization processing on the sample set by adopting SMOTE, and dividing the sample set into a training set sample and a test set sample;
step 3, constructing a near infrared spectrum classification model of the one-dimensional deep convolution neural network by using the training set sample;
and step 4, feeding the test set samples into the established model to obtain the diesel grade identification result, plotting a multi-class confusion matrix, and analyzing the identification rate of each class.
2. The SMOTE and deep learning based near infrared spectrum diesel grade identification method according to claim 1, characterized in that: a near infrared spectrogram is plotted from the diesel sample set, and the samples are divided by condensation point into 5 classes, namely -10#, -20#, -35#, -50# and interference, which are mapped to classes 1, 2, 3, 4 and 0, respectively.
3. The SMOTE and deep learning based near infrared spectrum diesel grade identification method according to claim 2, characterized in that: the specific process of performing data balancing on the sample set using SMOTE is as follows:
1): first, for each sample x in the minority class, calculating the Euclidean distance from x to all samples in the minority class sample set to obtain the k nearest neighbors of x;
2): setting a sampling ratio according to the sample imbalance ratio to determine a sampling rate N, and, for each minority class sample x, randomly selecting several samples from its k nearest neighbors, a selected neighbor being denoted $x_n$;
3): for each $x_n$, constructing a new sample according to formula <1>;
$x_{new} = x + \mathrm{rand}(0,1) \times |x - x_n|,\ new \in 1, 2, \dots, N$ <1>
4): finally, repeating the above steps N times to synthesize N new samples; if the minority class has T samples in total, NT new samples can be synthesized.
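The four SMOTE steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the patented implementation: the function name and the parameters k, n_new and seed are hypothetical, and the interpolation uses the signed difference (x_n - x) of the standard SMOTE formulation, so that each new point lies on the segment between x and x_n, whereas formula <1> as written uses the absolute value |x - x_n|. The minority class is assumed to contain at least two samples.

```python
import math
import random

def smote(minority, k=5, n_new=1, seed=0):
    """Synthesize n_new samples for every minority sample.

    minority: list of equal-length feature lists (minority class only).
    Returns len(minority) * n_new synthetic samples (N*T in the claim's terms).
    """
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        # Step 1: rank the other minority samples by Euclidean distance to x.
        others = [s for j, s in enumerate(minority) if j != i]
        others.sort(key=lambda s: math.dist(x, s))
        neighbours = others[:k]
        # Steps 2 to 4: pick a random neighbour and interpolate, n_new times.
        for _ in range(n_new):
            xn = rng.choice(neighbours)
            gap = rng.random()  # rand(0, 1)
            # Assumption: signed difference (xn - x), as in standard SMOTE.
            synthetic.append([a + gap * (b - a) for a, b in zip(x, xn)])
    return synthetic
```

With T minority samples and n_new = N, the function returns N*T synthetic samples, matching the count given in step 4).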
4. The SMOTE and deep learning based near infrared spectrum diesel grade identification method according to claim 3, characterized in that: the specific method for constructing the near infrared spectrum classification model of the one-dimensional deep convolutional neural network comprises the following steps:
transforming the one-dimensional near infrared spectrum data so that the input signal meets the requirements of a convolutional neural network; treating the one-dimensional near infrared spectrum as a special two-dimensional image comprising only one row or one column, expanding the dimensions of the spectral signal accordingly, and converting the class labels into one-hot encoding; and constructing a one-dimensional deep convolutional neural network model with reference to LeNet-5, the model comprising an input layer, two convolutional layers, two pooling layers, two fully connected layers and an output layer.
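The two input transformations named in claim 4, dimension expansion and one-hot encoding, can be illustrated with nested lists. The helper names are hypothetical, and the default of 5 classes follows the class mapping of claim 2.

```python
def expand_dims(spectrum):
    """Turn a 1-D spectrum [v0, v1, ...] into a one-column 'image' [[v0], [v1], ...]."""
    return [[v] for v in spectrum]

def one_hot(label, num_classes=5):
    """Encode an integer class label as a one-hot vector of length num_classes."""
    vec = [0] * num_classes
    vec[label] = 1
    return vec
```

For example, class 3 (the -35# grade) becomes [0, 0, 0, 1, 0].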
5. The SMOTE and deep learning based near infrared spectrum diesel grade identification method according to claim 4, characterized in that: the convolutional layer is composed of a group of convolution kernels with trainable parameters, the size of the convolution kernels is set to m × 1, and the convolution operation on the one-dimensional signal is shown in formula <2>:
Figure FDA0002823195870000021
wherein l is the current convolutional layer, l-1 is the (l-1)-th convolutional layer, $x_i$ and $y_j$ denote the i-th input feature map and the j-th output feature map respectively, * is the convolution operator, $\omega_{ij}$ is the convolution kernel, b is the bias, and f(·) is the activation function.
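Formula <2> survives here only as an image reference. A form consistent with the symbol definitions would be y_j = f(sum over i of x_i * ω_ij + b_j); the sketch below assumes that form. It is an illustration only: the function name is hypothetical, and it computes a "valid" sliding dot product without flipping the kernel, which is the usual convention in deep learning frameworks.

```python
def conv1d_layer(inputs, kernels, biases, activation=lambda v: v):
    """Assumed form of formula <2>: y_j = f(sum_i x_i * w_ij + b_j).

    inputs:  list of input feature maps x_i (each a list of floats).
    kernels: kernels[j][i] is the length-m kernel from input map i to output map j.
    biases:  biases[j] is the scalar bias of output map j.
    """
    outputs = []
    for j, kernel_row in enumerate(kernels):
        m = len(kernel_row[0])
        out_len = len(inputs[0]) - m + 1  # 'valid' output length
        y_j = []
        for t in range(out_len):
            s = biases[j]
            for x_i, w_ij in zip(inputs, kernel_row):
                s += sum(x_i[t + p] * w_ij[p] for p in range(m))
            y_j.append(activation(s))
        outputs.append(y_j)
    return outputs
```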
6. The SMOTE and deep learning based near infrared spectrum diesel grade identification method according to claim 4, characterized in that: the PReLU activation function is introduced into the convolutional layer, and its expression is shown in formula <3>:
Figure FDA0002823195870000022
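Formula <3> is likewise only an image reference. The standard PReLU definition passes positive inputs unchanged and scales the rest by a coefficient a, which is learned during training. The sketch below assumes that standard definition; the default a = 0.25 is merely an illustrative value.

```python
def prelu(x, a=0.25):
    """Standard PReLU: x for x > 0, a * x otherwise (a is learnable in practice)."""
    return x if x > 0 else a * x
```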
7. The SMOTE and deep learning based near infrared spectrum diesel grade identification method according to claim 4, characterized in that: the operation of the pooling layer is shown in formula <4>:
Figure FDA0002823195870000031
wherein l denotes the current pooling layer, l-1 denotes the (l-1)-th pooling layer, $y_j$ is the j-th output feature map, β is a multiplicative bias term, and b is the bias;
the pooling method is max pooling, whose sampling is calculated according to formula <5>:
Figure FDA0002823195870000032
in the formula, a feature map obtained from the convolutional layer is divided into a plurality of regions $X_k$, k ∈ 1, 2, …, K.
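The max pooling described for formula <5> can be sketched as follows: the feature map is cut into K non-overlapping regions X_k and the largest value of each region is kept. The function name and the region size of 2 are assumptions. The multiplicative bias β and the bias b of formula <4> are omitted from the sketch.

```python
def max_pool1d(feature_map, size=2):
    """Split feature_map into non-overlapping regions of `size` and keep each maximum.

    A trailing region shorter than `size` is dropped.
    """
    return [max(feature_map[s:s + size])
            for s in range(0, len(feature_map) - size + 1, size)]
```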
8. The SMOTE and deep learning based near infrared spectrum diesel grade identification method according to claim 4, characterized in that: the fully connected layer comprises a Flatten layer and two Dense layers, the activation function of the last Dense layer is Softmax, and a certain proportion of random deactivation (dropout) is added to the fully connected layer, wherein the fully connected layer is calculated according to formula <6>:
$h_{\omega,b}(x) = f(\omega^{T}x + b)$ <6>
where ω is the weight of the neuron, b is the bias, T denotes the transpose, and $h_{\omega,b}(x)$ is the output of the neuron.
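Formula <6> and the Softmax output can be illustrated with two small functions. The names are hypothetical, and dropout is left out because it only applies during training.

```python
import math

def dense(x, weights, biases, activation=lambda v: v):
    """Formula <6> per neuron: f(w^T x + b), where weights[u] is the vector w of neuron u."""
    return [activation(sum(w * xi for w, xi in zip(w_u, x)) + b_u)
            for w_u, b_u in zip(weights, biases)]

def softmax(z):
    """Softmax over a list of scores, shifted by the maximum for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]
```

Applying softmax to the outputs of the last Dense layer yields one probability per diesel class.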
9. The SMOTE and deep learning based near infrared spectrum diesel grade identification method according to claim 4, characterized in that: after the model is constructed, the training procedure is configured, comprising a loss function, an optimizer and an evaluation metric; the loss function is the cross-entropy loss shown in formula <7>, the optimizer is Adam, and the evaluation metric is the accuracy A shown in formula <8>:
Figure FDA0002823195870000033
Figure FDA0002823195870000034
in the formula, $n_i$ is the number of samples whose predicted class matches the actual class, and n is the total number of samples.
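Formulas <7> and <8> are only image references here. The sketch below assumes the usual categorical cross entropy averaged over samples, and takes the accuracy as A = n_i / n following the symbol definitions. The function names and the clipping constant eps are assumptions.

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean categorical cross entropy; y_true is one-hot, y_pred holds probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        # eps guards against log(0); it is an implementation detail, not from the source.
        total -= sum(ti * math.log(max(pi, eps)) for ti, pi in zip(t, p))
    return total / len(y_true)

def accuracy(true_labels, pred_labels):
    """A = n_i / n, where n_i counts predictions equal to the actual class."""
    n_i = sum(1 for t, p in zip(true_labels, pred_labels) if t == p)
    return n_i / len(true_labels)
```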
10. The SMOTE and deep learning based near infrared spectrum diesel grade identification method according to claim 4, characterized in that: on the basis of the constructed one-dimensional deep convolutional neural network model, the diesel grade is predicted on both the test set after SMOTE oversampling and the original test set, and the overall classification recognition rate is obtained; a multi-class confusion matrix is then plotted, and the precision, the recall, the accuracy and the balanced F score are obtained from the confusion matrix, as shown in formulas <9>, <10>, <11> and <12>:
Figure FDA0002823195870000041
Figure FDA0002823195870000042
Figure FDA0002823195870000043
Figure FDA0002823195870000044
where TP is the number of samples for which positive examples are predicted as positive examples, FN is the number of samples for which positive examples are predicted as negative examples, FP is the number of samples for which negative examples are predicted as positive examples, and TN is the number of samples for which negative examples are predicted as negative examples.
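Formulas <9> to <12> are only image references here. The sketch below assumes the standard definitions: precision TP/(TP+FP), recall TP/(TP+FN), accuracy (TP+TN)/(TP+FN+FP+TN), and balanced F score 2PR/(P+R). Each class is treated in turn as the positive class of a multi-class confusion matrix. The function names are hypothetical.

```python
def confusion_matrix(true_labels, pred_labels, num_classes):
    """cm[t][p] counts samples of actual class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, pred_labels):
        cm[t][p] += 1
    return cm

def class_metrics(cm, c):
    """Precision, recall, accuracy and F1 for class c (one-vs-rest)."""
    total = sum(sum(row) for row in cm)
    tp = cm[c][c]
    fn = sum(cm[c]) - tp                 # actual c, predicted otherwise
    fp = sum(row[c] for row in cm) - tp  # predicted c, actually otherwise
    tn = total - tp - fn - fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    acc = (tp + tn) / total
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, acc, f1
```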
CN202011443096.0A 2020-12-08 2020-12-08 Near infrared spectrum diesel grade identification method based on SMOTE and deep learning Pending CN112613536A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011443096.0A CN112613536A (en) 2020-12-08 2020-12-08 Near infrared spectrum diesel grade identification method based on SMOTE and deep learning

Publications (1)

Publication Number Publication Date
CN112613536A true CN112613536A (en) 2021-04-06

Family

ID=75232922

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011443096.0A Pending CN112613536A (en) 2020-12-08 2020-12-08 Near infrared spectrum diesel grade identification method based on SMOTE and deep learning

Country Status (1)

Country Link
CN (1) CN112613536A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298791A (en) * 2021-05-31 2021-08-24 中电福富信息科技有限公司 Image detection method of mixed cartoon based on deep learning
CN113378971A (en) * 2021-06-28 2021-09-10 燕山大学 Near infrared spectrum classification model training method and system and classification method and system
CN113505730A (en) * 2021-07-26 2021-10-15 全景智联(武汉)科技有限公司 Model evaluation method, device, equipment and storage medium based on mass data
CN113702328A (en) * 2021-08-20 2021-11-26 广东省惠州市石油产品质量监督检验中心 Method, device, equipment and storage medium for analyzing properties of product oil
CN114659996A (en) * 2022-05-19 2022-06-24 联桥网云信息科技(长沙)有限公司 Hyperspectral oil detection method based on reflected light

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6134537A (en) * 1995-09-29 2000-10-17 Ai Ware, Inc. Visualization and self organization of multidimensional data through equalized orthogonal mapping
CN101866428A (en) * 2010-07-13 2010-10-20 中国人民解放军总后勤部油料研究所 Method for quickly identifying categories and brands of engine fuel
CN104155245A (en) * 2014-07-31 2014-11-19 中国科学院自动化研究所 Method for detecting multiclass properties of oil product based on mode recognition and spectrogram mapping
CN106845371A (en) * 2016-12-31 2017-06-13 中国科学技术大学 A kind of city road network automotive emission remote sensing monitoring system
CN108647643A (en) * 2018-05-11 2018-10-12 浙江工业大学 A kind of packed tower liquid flooding state on-line identification method based on deep learning
US20180299375A1 (en) * 2015-04-27 2018-10-18 Virtual Fluid Monitoring Services LLC Fluid analysis and monitoring using optical spectroscopy
CN109167680A (en) * 2018-08-06 2019-01-08 浙江工商大学 A kind of traffic classification method based on deep learning
AU2019100354A4 (en) * 2019-04-04 2019-05-16 Chen, Mingjie Mr An animal image search system based on convolutional neural network
CN109933539A (en) * 2019-04-15 2019-06-25 燕山大学 A kind of Software Defects Predict Methods based on principal component analysis and combination sampling
CN109992861A (en) * 2019-03-21 2019-07-09 温州大学 A kind of near infrared spectrum modeling method
WO2019169816A1 (en) * 2018-03-09 2019-09-12 中山大学 Deep neural network for fine recognition of vehicle attributes, and training method thereof
CN110443302A (en) * 2019-08-02 2019-11-12 天津相和电气科技有限公司 Load discrimination method and its application based on Fusion Features and deep learning
CN110717368A (en) * 2018-07-13 2020-01-21 北京服装学院 Qualitative classification method for textiles
CN111740971A (en) * 2020-06-15 2020-10-02 郑州大学 Network intrusion detection model SGM-CNN based on class imbalance processing
CN111860124A (en) * 2020-06-04 2020-10-30 西安电子科技大学 Remote sensing image classification method based on space spectrum capsule generation countermeasure network
CN111881987A (en) * 2020-07-31 2020-11-03 西安工业大学 Apple virus identification method based on deep learning
CN111896495A (en) * 2020-08-05 2020-11-06 安徽大学 Method and system for discriminating Taiping Houkui production places based on deep learning and near infrared spectrum

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
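何东远 et al.: "Stellar spectral classification based on deep learning", Journal of Beijing Normal University (Natural Science), vol. 56, no. 1, pages 37 - 44 *
胡薰尹; 管业鹏; 李伟东; 罗宏杰: "Classification method for ancient ceramics based on a feature mapping matrix of ultraviolet-visible-near infrared spectra", Journal of the Chinese Ceramic Society, no. 09, pages 1280 - 1286 *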

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination